Arxiv Day: Article

Advancing Network Security: A Comprehensive Testbed and Dataset for Machine Learning-Based Intrusion Detection

This paper introduces a Testbed designed for generating network traffic, leveraging the capabilities of containers, Kubernetes, and eBPF/XDP technologies. Our Testbed serves as an advanced platform for producing network traffic for machine learning-based network experiments. Using this Testbed, we publicly release a small malicious network traffic dataset that fully satisfies the ground-truth property.

Updated: 2024-10-23 23:58:10

Categories: cs.CR,cs.NI

Download: http://arxiv.org/abs/2410.18332v1

Optimal Partial Graph Matching

Partial graph matching addresses the limitations of traditional graph matching by allowing some nodes to remain unmatched, making it applicable to more complex scenarios. However, this flexibility introduces additional complexity, as both the subset of nodes to match and the optimal mapping must be determined. While recent studies have explored deep learning techniques for partial graph matching, a significant limitation remains: the absence of an optimization objective that fully captures the problem's intrinsic nature while enabling efficient solutions. In this paper, we propose a novel optimization framework for partial graph matching, inspired by optimal partial transport. Our approach formulates an objective that enables partial assignments while incorporating matching biases, using weighted total variation as the divergence function to guarantee optimal partial assignments. We employ the Hungarian algorithm to achieve efficient, exact solutions with cubic time complexity. Our contributions are threefold: (i) we introduce a robust optimization objective that balances matched and unmatched nodes; (ii) we establish a connection between partial graph matching and the linear sum assignment problem, enabling efficient solutions; (iii) we propose a deep graph matching architecture with a novel partial matching loss, providing an end-to-end solution. The empirical evaluations on standard graph matching benchmarks demonstrate the efficacy of the proposed approach.
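The reduction from partial matching to a linear sum assignment problem (LSAP) can be illustrated with a small sketch. It does not reproduce the paper's weighted-total-variation objective. Instead it uses a generic construction: the affinity matrix is padded with "dummy" rows and columns so that leaving a node unmatched is itself an assignment. The threshold `min_score` is a hypothetical stand-in for the paper's matching bias. Brute force replaces the Hungarian algorithm to keep the example short. Both solve the same LSAP exactly, but the Hungarian algorithm does so in cubic time.

```python
from itertools import permutations

INF = float("inf")


def partial_match(score, min_score):
    """Partial matching as an augmented linear sum assignment problem.

    score[i][j] is the affinity between source node i and target node j.
    min_score is a hypothetical threshold: a pair is matched only if its
    affinity exceeds it; otherwise both nodes may stay unmatched.
    """
    n, m = len(score), len(score[0])
    size = n + m
    # Rows: n sources + m target-dummies; columns: m targets + n source-dummies.
    cost = [[0.0] * size for _ in range(size)]
    for i in range(n):
        for j in range(m):
            cost[i][j] = min_score - score[i][j]  # cost of a real match
        for k in range(n):
            # Source i may only use its own dummy column (= stays unmatched).
            cost[i][m + k] = 0.0 if k == i else INF
    for k in range(m):
        for j in range(m):
            # Target j is left unmatched by pairing it with its own dummy row.
            cost[n + k][j] = 0.0 if j == k else INF
    # Dummy-to-dummy entries stay 0, so an all-unmatched solution always exists.

    # Brute force is fine for toy sizes; the Hungarian algorithm solves the
    # same assignment problem exactly in O(size^3).
    best, best_perm = INF, None
    for perm in permutations(range(size)):
        total = sum(cost[r][perm[r]] for r in range(size))
        if total < best:
            best, best_perm = total, perm
    return [(i, best_perm[i]) for i in range(n) if best_perm[i] < m]


print(partial_match([[0.9, 0.1], [0.2, 0.3]], min_score=0.5))  # [(0, 0)]
```

Raising `min_score` trades matched pairs for unmatched nodes. This is the kind of balance between matched and unmatched nodes the abstract describes, although the paper controls it through its own objective.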

Updated: 2024-10-23 23:58:06

Categories: cs.LG

Download: http://arxiv.org/abs/2410.16718v2

Unified Microphone Conversion: Many-to-Many Device Mapping via Feature-wise Linear Modulation

In this study, we introduce Unified Microphone Conversion, a unified generative framework to enhance the resilience of sound event classification systems against device variability. To address the limitations of previous work, we condition the generator network on frequency response information to achieve many-to-many device mapping. This approach overcomes an inherent limitation of CycleGAN, which requires a separate model for each device pair. Our framework leverages the strengths of CycleGAN for unpaired training to simulate device characteristics in audio recordings, and significantly extends its scalability by integrating frequency-response information via Feature-wise Linear Modulation. The experimental results show that our method outperforms the state-of-the-art method by 2.6% and reduces variability by 0.8% in macro-average F1 score.
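Feature-wise Linear Modulation (FiLM) conditions a network by predicting a per-channel scale (gamma) and shift (beta) from a conditioning vector. Here that vector is a device's frequency response. The sketch below shows only this conditioning step, in plain Python. The linear projections and weight names (`w_gamma`, `b_gamma`, `w_beta`, `b_beta`) are hypothetical placeholders. They are not the paper's generator architecture.

```python
def linear(weights, bias, x):
    """Dense layer y = W x + b on plain Python lists."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def film(features, gamma, beta):
    """FiLM: scale and shift every value of channel c by gamma[c] and beta[c]."""
    return [[g * v + b for v in channel] for channel, g, b in zip(features, gamma, beta)]


def condition_on_device(features, freq_response, w_gamma, b_gamma, w_beta, b_beta):
    # Predict one (gamma, beta) pair per channel from the target device's
    # frequency response, then modulate the generator's intermediate features.
    gamma = linear(w_gamma, b_gamma, freq_response)
    beta = linear(w_beta, b_beta, freq_response)
    return film(features, gamma, beta)


features = [[1.0, 2.0], [3.0, 4.0]]  # 2 channels x 2 time steps
print(film(features, gamma=[2.0, 0.5], beta=[0.0, 1.0]))  # [[2.0, 4.0], [2.5, 3.0]]
```

Because the device enters only through `freq_response`, one set of weights can serve many source/target device pairs. This is the scalability argument the abstract makes against training one CycleGAN per pair.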

Updated: 2024-10-23 23:10:09

Categories: cs.SD,cs.LG,cs.MM,eess.AS

Download: http://arxiv.org/abs/2410.18322v1

Infinite Width Models That Work: Why Feature Learning Doesn't Matter as Much as You Think

Common infinite-width architectures such as Neural Tangent Kernels (NTKs) have historically performed worse than finite models. This is usually attributed to the absence of feature learning. We show that this explanation is insufficient. Specifically, we show that infinite-width NTKs obviate the need for feature learning: they can learn identical behavior by selecting relevant subfeatures from their (infinite) frozen feature vector. Furthermore, we show experimentally that NTKs underperform traditional finite models even when feature learning is artificially disabled. Instead, we show that the weak performance is at least partly due to the fact that existing constructions depend on weak optimizers like SGD. We provide a new infinite-width limit based on ADAM-like learning dynamics and demonstrate empirically that the resulting models erase this performance gap.

Updated: 2024-10-23 23:08:50

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2406.18800v2

Calibrating Deep Neural Network using Euclidean Distance

Uncertainty is a fundamental aspect of real-world scenarios, where perfect information is rarely available. Humans naturally develop complex internal models to navigate incomplete data and effectively respond to unforeseen or partially observed events. In machine learning, Focal Loss is commonly used to reduce misclassification rates by emphasizing hard-to-classify samples. However, it does not guarantee well-calibrated predicted probabilities and may result in models that are overconfident or underconfident. High calibration error indicates a misalignment between predicted probabilities and actual outcomes, affecting model reliability. This research introduces a novel loss function called Focal Calibration Loss (FCL), designed to improve probability calibration while retaining the advantages of Focal Loss in handling difficult samples. By minimizing the Euclidean norm through a strictly proper loss, FCL penalizes the instance-wise calibration error and constrains bounds. We provide theoretical validation for the proposed method and apply it to calibrate CheXNet for potential deployment in web-based healthcare systems. Extensive evaluations on various models and datasets demonstrate that our method achieves state-of-the-art (SOTA) performance in both calibration and accuracy metrics.
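The combination the abstract describes can be sketched under one assumption: FCL is taken to be Focal Loss plus a weighted squared Euclidean distance between the softmax output and the one-hot label. That distance is the Brier score, a strictly proper scoring rule. The weight `lam` and the exact functional form are hypothetical, and the paper's definition may differ.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(probs, label, gamma=2.0):
    """Focal Loss: cross-entropy down-weighted by (1 - p_true)^gamma."""
    p = probs[label]
    return -((1.0 - p) ** gamma) * math.log(p)


def euclidean_calibration(probs, label):
    """Squared Euclidean distance to the one-hot label (Brier score)."""
    return sum((p - (1.0 if k == label else 0.0)) ** 2 for k, p in enumerate(probs))


def focal_calibration_loss(logits, label, gamma=2.0, lam=1.0):
    # Hypothetical combination: the focal term keeps the focus on hard samples,
    # while the proper Euclidean term penalizes miscalibrated probabilities.
    probs = softmax(logits)
    return focal_loss(probs, label, gamma) + lam * euclidean_calibration(probs, label)


print(focal_calibration_loss([0.0, 0.0], label=0))  # 0.25*ln(2) + 0.5, about 0.673
```

The focal term alone goes to zero quickly as the true-class probability grows. The Euclidean term keeps penalizing any mass placed on wrong classes, and this is the intuition behind adding a proper-loss component for calibration.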

Updated: 2024-10-23 23:06:50

Categories: cs.LG,cs.CV,stat.ML

Download: http://arxiv.org/abs/2410.18321v1

All Random Features Representations are Equivalent

Random features are a powerful technique for rewriting positive-definite kernels as linear products. They bring linear tools to bear in important nonlinear domains like KNNs and attention. Unfortunately, practical implementations require approximating an expectation, usually via sampling. This has led to the development of increasingly elaborate representations with ever lower sample error. We resolve this arms race by deriving an optimal sampling policy. Under this policy all random features representations have the same approximation error, which we show is the lowest possible. This means that we are free to choose whatever representation we please, provided we sample optimally.
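To make "rewriting a kernel as a linear product" concrete, here is the classic random Fourier feature construction for the Gaussian kernel. Frequencies are drawn i.i.d. from N(0, sigma^-2 I) and phases from U[0, 2π). The sampling is plain i.i.d. Monte Carlo, which is the baseline the paper improves on. It is not the paper's optimal sampling policy.

```python
import math
import random


def sample_rff(dim, num_features, sigma=1.0, seed=0):
    """Draw i.i.d. frequencies and phases for the Gaussian (RBF) kernel."""
    rng = random.Random(seed)
    omegas = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)] for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    return omegas, phases


def rff_map(x, omegas, phases):
    """Feature map z(x) with z(x) . z(y) ~ exp(-||x - y||^2 / (2 sigma^2))."""
    scale = math.sqrt(2.0 / len(omegas))
    return [scale * math.cos(sum(w * xi for w, xi in zip(omega, x)) + b)
            for omega, b in zip(omegas, phases)]


def approx_kernel(x, y, omegas, phases):
    # The nonlinear kernel becomes an ordinary (linear) dot product.
    return sum(a * b for a, b in zip(rff_map(x, omegas, phases), rff_map(y, omegas, phases)))


def gaussian_kernel(x, y, sigma=1.0):
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / (2.0 * sigma ** 2))


omegas, phases = sample_rff(dim=2, num_features=20000)
x, y = [0.3, -0.2], [0.1, 0.4]
print(approx_kernel(x, y, omegas, phases), gaussian_kernel(x, y))  # both close to 0.82
```

The estimation error shrinks roughly as one over the square root of `num_features`. That sampling error is what increasingly elaborate representations, and the paper's optimal sampling policy, aim to reduce.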

Updated: 2024-10-23 23:04:36

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2406.18802v2

Self-Supervised Learning for Time Series: A Review & Critique of FITS

Accurate time series forecasting is a highly valuable endeavour with applications across many industries. Despite recent deep learning advancements, increased model complexity, and larger model sizes, many state-of-the-art models often perform worse than, or on par with, simpler models. One such case is a recently proposed model, FITS, which claims competitive performance with significantly reduced parameter counts. By training a one-layer neural network in the complex frequency domain, we are able to replicate these results. Our experiments on a wide range of real-world datasets further reveal that FITS especially excels at capturing periodic and seasonal patterns, but struggles with trending, non-periodic, or random-resembling behavior. With our two novel hybrid approaches, which attempt to remedy the weaknesses of FITS by combining it with DLinear, we achieve the best results of any known open-source model on multivariate regression and promising results in multiple/linear regression on price datasets, in addition to vastly improving upon what FITS achieves as a standalone model.

Updated: 2024-10-23 23:03:09

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2410.18318v1

Time Matters: Scaling Laws for Any Budget

A primary cost driver for training large models is wall-clock training time. We show that popular time estimates based on FLOPs are poor estimates, and construct a more accurate proxy based on memory copies. This allows us to accurately estimate the training speed of a transformer model from its hyperparameters. Combined with a scaling law curve like Chinchilla, this allows us to accurately predict the final loss of a model from a simple equation. We show that this expression is accurate across a wide range of model hyperparameter values, enabling us to analytically make architectural decisions and train models more efficiently. Crucially, this analysis predicts that in contrast to existing literature, models should be wider rather than deeper, as the benefits of speed outweigh the benefits of depth.
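The idea of turning a training-time budget into a predicted final loss can be sketched with the Chinchilla parametric fit L(N, D) = E + A/N^α + B/D^β. The constants below are those commonly quoted from Hoffmann et al. (2022). The number of training tokens D is obtained as time multiplied by throughput. The `tokens_per_second` values are hypothetical inputs: the paper derives speed from a memory-copy proxy, which is not reproduced here.

```python
# Chinchilla parametric fit L(N, D) = E + A / N**alpha + B / D**beta
# (constants as commonly quoted from Hoffmann et al., 2022).
E, A, B, ALPHA, BETA = 1.69, 406.4, 410.7, 0.34, 0.28


def chinchilla_loss(n_params, n_tokens):
    return E + A / n_params ** ALPHA + B / n_tokens ** BETA


def loss_under_time_budget(n_params, seconds, tokens_per_second):
    """Predicted final loss when training is capped by wall-clock time.

    tokens_per_second is a hypothetical throughput estimate for a given
    architecture (e.g. a wide vs. a deep model with the same parameter count).
    """
    return chinchilla_loss(n_params, seconds * tokens_per_second)


one_day = 24 * 3600
wide = loss_under_time_budget(1e9, one_day, tokens_per_second=2e5)  # hypothetical faster shape
deep = loss_under_time_budget(1e9, one_day, tokens_per_second=1e5)  # hypothetical slower shape
print(wide < deep)  # True: same size, more tokens seen in the same time
```

For a fixed parameter count, whichever shape processes more tokens per second sees more data within the budget and therefore reaches a lower predicted loss. The abstract uses this argument to favor wider models over deeper ones.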

Updated: 2024-10-23 22:56:39

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2406.18922v2

The Factuality Tax of Diversity-Intervened Text-to-Image Generation: Benchmark and Fact-Augmented Intervention

Prompt-based "diversity interventions" are commonly adopted to improve the diversity of individuals with various racial or gender traits depicted by Text-to-Image (T2I) models. However, does this strategy result in nonfactual demographic distributions, especially when generating real historical figures? In this work, we propose DemOgraphic FActualIty Representation (DoFaiR), a benchmark to systematically quantify the trade-off between using diversity interventions and preserving demographic factuality in T2I models. DoFaiR consists of 756 meticulously fact-checked test instances that reveal the factuality tax of various diversity prompts through an automated, evidence-supported evaluation pipeline. Experiments on DoFaiR reveal that diversity-oriented instructions increase the number of different gender and racial groups in DALLE-3's generations at the cost of historically inaccurate demographic distributions. To resolve this issue, we propose Fact-Augmented Intervention (FAI), which instructs a Large Language Model (LLM) to reflect on verbalized or retrieved factual information about the gender and racial composition of generation subjects in history, and to incorporate it into the generation context of T2I models. By orienting model generations using the reflected historical truths, FAI significantly improves demographic factuality under diversity interventions while preserving diversity.

Updated: 2024-10-23 22:53:28

Categories: cs.CL,cs.AI,cs.CV,cs.CY

Download: http://arxiv.org/abs/2407.00377v2

The Male CEO and the Female Assistant: Evaluation and Mitigation of Gender Biases in Text-To-Image Generation of Dual Subjects

Recent large-scale T2I models like DALLE-3 have made progress in reducing gender stereotypes when generating single-person images. However, significant biases remain when generating images with more than one person. To systematically evaluate this, we propose the Paired Stereotype Test (PST) framework, which queries T2I models to depict two individuals assigned male-stereotyped and female-stereotyped social identities, respectively (e.g. "a CEO" and "an Assistant"). This contrastive setting often triggers T2I models to generate gender-stereotyped images. Using PST, we evaluate two aspects of gender bias: the well-known bias in gendered occupation and a novel aspect, bias in organizational power. Experiments show that over 74% of the images generated by DALLE-3 display gender-occupational biases. Additionally, compared to single-person settings, DALLE-3 is more likely to perpetuate male-associated stereotypes under PST. We further propose FairCritic, a novel and interpretable framework that leverages an LLM-based critic model to i) detect bias in generated images, and ii) adaptively provide feedback to T2I models for improving fairness. FairCritic achieves near-perfect fairness on PST, overcoming the limitations of previous prompt-based intervention approaches.

Updated: 2024-10-23 22:47:44

Categories: cs.CV,cs.AI,cs.CY

Download: http://arxiv.org/abs/2402.11089v3

Convolutions and More as Einsum: A Tensor Network Perspective with Advances for Second-Order Methods

Despite their simple intuition, convolutions are more tedious to analyze than dense layers, which complicates the transfer of theoretical and algorithmic ideas to convolutions. We simplify convolutions by viewing them as tensor networks (TNs) that allow reasoning about the underlying tensor multiplications by drawing diagrams, manipulating them to perform function transformations like differentiation, and efficiently evaluating them with einsum. To demonstrate their simplicity and expressiveness, we derive diagrams of various autodiff operations and popular curvature approximations with full hyper-parameter support, batching, channel groups, and generalization to any convolution dimension. Further, we provide convolution-specific transformations based on the connectivity pattern, which allow diagrams to be simplified before evaluation. Finally, we probe performance. Our TN implementation accelerates a recently proposed KFAC variant by up to 4.5x while removing the standard implementation's memory overhead, and enables new hardware-efficient tensor dropout for approximate backpropagation.
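A 1D convolution can be written as a single einsum by introducing a binary index-pattern tensor P[i, o, k]. The tensor is 1 exactly when input position i feeds output position o through kernel tap k, and it encodes stride, padding and dilation. The sketch below uses NumPy and follows the PyTorch-style cross-correlation convention. It checks the einsum form against a direct sliding-window loop. It illustrates the tensor-network view only and is not the paper's library code.

```python
import numpy as np


def index_pattern(in_len, kernel, stride=1, padding=0, dilation=1):
    """Binary tensor P[i, o, k] = 1 iff input i feeds output o via kernel tap k."""
    out_len = (in_len + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1
    pattern = np.zeros((in_len, out_len, kernel))
    for o in range(out_len):
        for k in range(kernel):
            i = o * stride + k * dilation - padding
            if 0 <= i < in_len:  # taps that land in the zero padding are dropped
                pattern[i, o, k] = 1.0
    return pattern


def conv1d_einsum(x, w, stride=1, padding=0, dilation=1):
    """x: (batch, c_in, length), w: (c_out, c_in, kernel) -> (batch, c_out, out_len)."""
    pattern = index_pattern(x.shape[2], w.shape[2], stride, padding, dilation)
    return np.einsum("nci,iok,dck->ndo", x, pattern, w)


def conv1d_direct(x, w, stride=1, padding=0, dilation=1):
    """Reference sliding-window cross-correlation for comparison."""
    batch, _, length = x.shape
    c_out, _, kernel = w.shape
    xp = np.pad(x, ((0, 0), (0, 0), (padding, padding)))
    out_len = (length + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1
    y = np.zeros((batch, c_out, out_len))
    for o in range(out_len):
        for k in range(kernel):
            # (batch, c_in) @ (c_in, c_out) -> (batch, c_out)
            y[:, :, o] += xp[:, :, o * stride + k * dilation] @ w[:, :, k].T
    return y


rng = np.random.default_rng(0)
x, w = rng.standard_normal((2, 3, 11)), rng.standard_normal((4, 3, 3))
print(np.allclose(conv1d_einsum(x, w, stride=2, padding=1),
                  conv1d_direct(x, w, stride=2, padding=1)))  # True
```

Once the convolution is a plain contraction, derivatives and curvature quantities become further einsums over the same operands. This is what makes the diagrammatic manipulation in the abstract possible.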

Updated: 2024-10-23 22:47:01

Categories: cs.LG,cs.CV,stat.ML

Download: http://arxiv.org/abs/2307.02275v2

Countering Autonomous Cyber Threats

With the capability to write convincing and fluent natural language and to generate code, Foundation Models (FMs) present dual-use concerns broadly and within the cyber domain specifically. Generative AI has already begun to impact cyberspace through a broad illicit marketplace of hundreds of malicious-AI-as-a-service tools that assist malware development and social engineering attacks. More alarming is that recent research has shown the potential for these advanced models to inform or independently execute offensive cyberspace operations. However, these previous investigations focused primarily on the threats posed by proprietary models, owing to the lack, until recently, of strong open-weight models, and they leave the impacts of network defenses and potential countermeasures unexplored. Understanding the aptitude of downloadable models to function as offensive cyber agents is critical, given that they are far more difficult to govern and their misuse is far harder to prevent. As such, this work evaluates several state-of-the-art FMs on their ability to compromise machines in an isolated network and investigates defensive mechanisms to defeat such AI-powered attacks. Using target machines from a commercial provider, the most recently released downloadable models are found to be on par with a leading proprietary model at conducting simple cyber attacks with common hacking tools against known vulnerabilities. To mitigate such LLM-powered threats, defensive prompt injection (DPI) payloads that disrupt the malicious cyber agent's workflow are demonstrated to be effective. From these results, the implications for AI safety and governance with respect to cybersecurity are analyzed.

Updated: 2024-10-23 22:46:44

Categories: cs.CR,cs.AI,cs.CY

Download: http://arxiv.org/abs/2410.18312v1

CoreInfer: Accelerating Large Language Model Inference with Semantics-Inspired Adaptive Sparse Activation

Large language models (LLMs) with billions of parameters have sparked a new wave of exciting AI applications. However, their high computational costs and memory demands during inference pose significant challenges. Adaptive sparse activation inference, which activates only a small number of neurons for each token, offers a novel way to accelerate model inference without degrading performance, showing great potential for resource-constrained hardware devices. Nevertheless, existing methods predict activated neurons from individual tokens using an additional MLP, which involves frequent changes in activation maps and resource calls, limiting the acceleration benefits of sparse activation. In this paper, we introduce CoreInfer, an MLP-free adaptive sparse activation inference method based on sentence-level prediction. Specifically, we propose the concept of sentence-wise core neurons, the subset of neurons most critical for a given sentence, and empirically demonstrate its effectiveness. To determine the core neurons, we explore the correlation between core neurons and the sentence's semantics. Remarkably, we discovered that core neurons exhibit both stability and similarity in relation to the sentence's semantics, an insight overlooked by previous studies. Building on this finding, we further design two semantic-based methods for predicting core neurons to fit different input scenarios. In CoreInfer, the core neurons are determined during the pre-filling stage and fixed during the encoding stage, enabling zero-cost sparse inference. We evaluated the model generalization and task generalization of CoreInfer across various models and tasks. Notably, on an NVIDIA TITAN XP GPU, CoreInfer achieved a 10.33x and 2.72x speedup compared to the Huggingface implementation and PowerInfer, respectively.
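The "decide once at prefill, reuse while generating" pattern can be sketched as follows. The abstract does not say how core neurons are scored beyond their correlation with sentence semantics. The selection rule below is therefore a hypothetical stand-in: it counts how often each neuron falls among a token's top activations during prefill. The two parameters `token_top_k` and `core_k` are illustrative.

```python
def core_neurons(prefill_acts, token_top_k, core_k):
    """Pick a fixed, sentence-level set of neurons from prefill activations.

    prefill_acts: per-token activation vectors (tokens x neurons).
    Hypothetical rule: keep the core_k neurons that most often appear in a
    token's top-token_top_k activations.
    """
    n_neurons = len(prefill_acts[0])
    counts = [0] * n_neurons
    for acts in prefill_acts:
        top = sorted(range(n_neurons), key=lambda j: acts[j], reverse=True)[:token_top_k]
        for j in top:
            counts[j] += 1
    ranked = sorted(range(n_neurons), key=lambda j: (-counts[j], j))
    return sorted(ranked[:core_k])


def sparse_ffn(x, w_in, w_out, core):
    """ReLU feed-forward layer evaluated only on the (fixed) core neurons.

    w_in[j] and w_out[j] hold neuron j's input and output weights.
    """
    out = [0.0] * len(w_out[0])
    for j in core:
        h = max(0.0, sum(a * b for a, b in zip(w_in[j], x)))
        for d, w in enumerate(w_out[j]):
            out[d] += h * w
    return out


acts = [[0.9, 0.1, 0.5, 0.0], [0.8, 0.2, 0.7, 0.1], [0.1, 0.9, 0.6, 0.0]]
print(core_neurons(acts, token_top_k=2, core_k=2))  # [0, 2]
```

Because the mask does not change from token to token, no per-token predictor runs during generation. This is where the abstract's "zero-cost" sparse inference comes from.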

Updated: 2024-10-23 22:45:23

Categories: cs.LG,cs.CL

Download: http://arxiv.org/abs/2410.18311v1

The Representation Jensen-Shannon Divergence

Quantifying the difference between probability distributions is crucial in machine learning. However, estimating statistical divergences from empirical samples is challenging due to unknown underlying distributions. This work proposes the representation Jensen-Shannon divergence (RJSD), a novel measure inspired by the traditional Jensen-Shannon divergence. Our approach embeds data into a reproducing kernel Hilbert space (RKHS), representing distributions through uncentered covariance operators. We then compute the Jensen-Shannon divergence between these operators, thereby establishing a proper divergence measure between probability distributions in the input space. We provide estimators based on kernel matrices and empirical covariance matrices using Fourier features. Theoretical analysis reveals that RJSD is a lower bound on the Jensen-Shannon divergence, enabling variational estimation. Additionally, we show that RJSD is a higher-order extension of the maximum mean discrepancy (MMD), providing a more sensitive measure of distributional differences. Our experimental results demonstrate RJSD's superiority in two-sample testing, distribution shift detection, and unsupervised domain adaptation, outperforming state-of-the-art techniques. RJSD's versatility and effectiveness make it a promising tool for machine learning research and applications.
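A kernel-matrix estimator in the spirit of RJSD can be sketched using von Neumann entropies. With a normalized kernel (k(x, x) = 1), the empirical uncentered covariance operator has the same nonzero spectrum as the Gram matrix divided by the sample size. The estimate is then the entropy of the pooled sample minus the average entropy of the two samples. This is our reading of the construction, assuming equal sample sizes and a Gaussian kernel with bandwidth `sigma`. The paper's exact estimators, including the Fourier-feature variant, may differ.

```python
import numpy as np


def gaussian_gram(a, b, sigma):
    sq = np.sum(a ** 2, 1)[:, None] + np.sum(b ** 2, 1)[None, :] - 2 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2 * sigma ** 2))


def von_neumann_entropy(gram):
    """S(rho) = -sum(lambda * log(lambda)) for the unit-trace normalized Gram matrix."""
    eig = np.linalg.eigvalsh(gram / np.trace(gram))
    eig = eig[eig > 1e-12]  # drop numerical zeros and tiny negative round-off
    return float(-np.sum(eig * np.log(eig)))


def rjsd(x, y, sigma=1.0):
    """Entropy of the 50/50 mixture minus the mean entropy (equal sample sizes assumed)."""
    z = np.vstack([x, y])
    mixture = von_neumann_entropy(gaussian_gram(z, z, sigma))
    parts = von_neumann_entropy(gaussian_gram(x, x, sigma)) + von_neumann_entropy(gaussian_gram(y, y, sigma))
    return mixture - 0.5 * parts


rng = np.random.default_rng(0)
x = rng.standard_normal((50, 2))
print(rjsd(x, x), rjsd(x, rng.standard_normal((50, 2)) + 3.0))  # ~0 vs. clearly positive
```

Concavity of the von Neumann entropy makes this quantity non-negative. It is zero when the two samples induce the same covariance operator.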

Updated: 2024-10-23 22:39:31

Categories: cs.LG,cs.IT,math.IT,stat.ML

Download: http://arxiv.org/abs/2305.16446v4

A Training Data Recipe to Accelerate A* Search with Language Models

Combining Large Language Models (LLMs) with heuristic search algorithms like A* holds the promise of enhanced LLM reasoning and scalable inference. To accelerate training and reduce computational demands, we investigate the coreset selection problem for the training data of LLM heuristic learning. Few methods for learning heuristic functions consider the interaction between the search algorithm and the machine learning model. In this work, we empirically disentangle the requirements of the A* search algorithm from the requirements of the LLM to generalise on this task. Surprisingly, we find an overlap between their requirements: A* requires more accurate predictions on search nodes near the goal, and LLMs need the same set of nodes for effective generalisation. With these insights, we derive a data-selection distribution for learning LLM-based heuristics. On three classical planning domains, maze navigation, Sokoban and sliding tile puzzles, our technique reduces the number of iterations required to find the solutions by up to 15x, with a wall-clock search speed-up of up to 5x. The codebase is at https://github.com/devaansh100/a_star.
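The sketch below shows why heuristic quality drives search cost. It is a standard A* on a 4-connected grid maze with a pluggable heuristic, and it counts node expansions. The learned LLM heuristic from the paper is replaced here by Manhattan distance, a hypothetical stand-in. The zero heuristic reduces A* to Dijkstra's algorithm.

```python
import heapq


def astar(grid, start, goal, heuristic):
    """A* on a 4-connected grid ('#' = wall). Returns (path_length, expansions)."""
    rows, cols = len(grid), len(grid[0])
    # Heap entries are (f, -g, node): among equal f, prefer deeper nodes.
    open_heap = [(heuristic(start, goal), 0, start)]
    best_g = {start: 0}
    expansions = 0
    while open_heap:
        _, neg_g, node = heapq.heappop(open_heap)
        g = -neg_g
        if g > best_g.get(node, float("inf")):
            continue  # stale heap entry
        expansions += 1
        if node == goal:
            return g, expansions
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#":
                ng = g + 1
                if ng < best_g.get((nr, nc), float("inf")):
                    best_g[(nr, nc)] = ng
                    heapq.heappush(open_heap, (ng + heuristic((nr, nc), goal), -ng, (nr, nc)))
    return None, expansions


def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def zero(a, b):
    return 0


maze = ["....",
        ".##.",
        "...."]
print(astar(maze, (0, 0), (2, 3), manhattan))  # (5, 6)
print(astar(maze, (0, 0), (2, 3), zero))       # (5, 10)
```

Both heuristics find the optimal path, but the informed one expands fewer nodes. A learned heuristic that is accurate near the goal, which is the region the abstract's data-selection distribution emphasizes, aims to cut expansions in the same way.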

Updated: 2024-10-23 22:37:31

Categories: cs.AI,cs.LG

Download: http://arxiv.org/abs/2407.09985v2

A Unified View of Group Fairness Tradeoffs Using Partial Information Decomposition

This paper introduces a novel information-theoretic perspective on the relationship between prominent group fairness notions in machine learning, namely statistical parity, equalized odds, and predictive parity. It is well known that simultaneous satisfiability of these three fairness notions is usually impossible, motivating practitioners to resort to approximate fairness solutions rather than stringent satisfiability of these definitions. However, a comprehensive analysis of their interrelations, particularly when they are not exactly satisfied, remains largely unexplored. Our main contribution lies in elucidating an exact relationship between these three measures of (un)fairness by leveraging a body of work in information theory called partial information decomposition (PID). In this work, we leverage PID to identify the granular regions where these three measures of (un)fairness overlap and where they disagree with each other leading to potential tradeoffs. We also include numerical simulations to complement our results.
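For reference, the three group-fairness notions the abstract relates can be measured as simple gaps between two groups:

- Statistical parity compares positive-prediction rates.
- Equalized odds compares true- and false-positive rates.
- Predictive parity compares precision.

The sketch computes these classical gaps only. It does not perform the paper's partial information decomposition. Aggregating equalized odds as the maximum of the TPR and FPR gaps is one common convention and is an assumption here.

```python
def _rate(values):
    return sum(values) / len(values) if values else float("nan")


def fairness_gaps(y_true, y_pred, group):
    """Absolute gaps between groups 0 and 1 for three classic fairness criteria."""
    rows = list(zip(y_true, y_pred, group))

    def gap(cond, outcome):
        # Rate of `outcome` among rows satisfying `cond`, compared across groups.
        r = [_rate([outcome(t, p) for t, p, z in rows if z == g and cond(t, p)]) for g in (0, 1)]
        return abs(r[0] - r[1])

    sp = gap(lambda t, p: True, lambda t, p: p)        # P(Yhat=1 | Z)
    tpr = gap(lambda t, p: t == 1, lambda t, p: p)     # P(Yhat=1 | Y=1, Z)
    fpr = gap(lambda t, p: t == 0, lambda t, p: p)     # P(Yhat=1 | Y=0, Z)
    ppv = gap(lambda t, p: p == 1, lambda t, p: t)     # P(Y=1 | Yhat=1, Z)
    return {"statistical_parity": sp, "equalized_odds": max(tpr, fpr), "predictive_parity": ppv}


y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
group = [0, 0, 0, 0, 1, 1, 1, 1]
print(fairness_gaps(y_true, y_pred, group))
```

In this toy example all three gaps are nonzero at once. The paper's PID analysis aims to explain such co-occurring gaps by splitting the dependence between predictions and group membership into overlapping and disagreeing regions.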

Updated: 2024-10-23 22:25:02

Categories: cs.IT,cs.CY,cs.LG,math.IT,stat.ML

Download: http://arxiv.org/abs/2406.04562v2

Learning Code Preference via Synthetic Evolution

Large Language Models (LLMs) have recently demonstrated remarkable coding capabilities. However, assessing generated code based on well-formed properties and aligning it with developer preferences remains challenging. In this paper, we explore two key questions under the new challenge of code preference learning: (i) How do we train models to predict meaningful preferences for code? and (ii) How do human and LLM preferences align with verifiable code properties and developer code tastes? To this end, we propose CodeFavor, a framework for training pairwise code preference models from synthetic evolution data, including code commits and code critiques. To evaluate code preferences, we introduce CodePrefBench, a benchmark comprising 1364 rigorously curated code preference tasks that cover three verifiable properties (correctness, efficiency, and security) along with human preference. Our evaluation shows that CodeFavor holistically improves the accuracy of model-based code preferences by up to 28.8%. Meanwhile, CodeFavor models can match the performance of models with 6-9x more parameters while being 34x more cost-effective. We also rigorously validate the design choices in CodeFavor via a comprehensive set of controlled experiments. Furthermore, we uncover the prohibitive costs and limitations of human-based code preference: despite spending 23.4 person-minutes on each task, 15.1-40.3% of tasks remain unsolved. Compared to model-based preference, human preference tends to be more accurate for the objective of code correctness, while being sub-optimal for non-functional objectives.

Updated: 2024-10-23 22:22:20

Categories: cs.LG,cs.CL,cs.SE

Download: http://arxiv.org/abs/2410.03837v2

Robust and Explainable Depression Identification from Speech Using Vowel-Based Ensemble Learning Approaches

This study investigates explainable machine learning algorithms for identifying depression from speech. Grounded in evidence from speech production that depression affects motor control and vowel generation, the study uses pre-trained vowel-based embeddings, which integrate semantically meaningful linguistic units. An ensemble learning approach then decomposes the problem into constituent parts characterized by specific depression symptoms and severity levels. Two methods are explored: a "bottom-up" approach with 8 models predicting individual Patient Health Questionnaire-8 (PHQ-8) item scores, and a "top-down" approach using a Mixture of Experts (MoE) with a router module for assessing depression severity. Both methods achieve performance comparable to state-of-the-art baselines, demonstrating robustness and reduced susceptibility to dataset mean/median values. The benefits of system explainability are discussed, highlighting its potential to assist clinicians in depression diagnosis and screening.

Updated: 2024-10-23 22:03:09

Categories: cs.LG,cs.CL,cs.SD,eess.AS

Download: http://arxiv.org/abs/2410.18298v1

Kenyan Sign Language (KSL) Dataset: Using Artificial Intelligence (AI) in Bridging Communication Barrier among the Deaf Learners

Kenyan Sign Language (KSL) is the primary language used by the deaf community in Kenya. It is the medium of instruction for deaf learners from Pre-primary 1 to university, facilitating their education and academic achievement. Kenyan Sign Language is used for social interaction, expressing needs, making requests, and general communication among deaf persons in Kenya. However, a language barrier exists between deaf and hearing people in Kenya. Thus, the AI4KSL innovation is key to eliminating this communication barrier. Artificial Intelligence for KSL (AI4KSL) is a two-year research project (2023-2024) that aims to create a digital, open-access AI dataset of spontaneous and elicited data from a representative sample of the Kenyan deaf community. The purpose of this study is to develop an AI assistive-technology dataset that translates English to KSL as a way of fostering inclusion and bridging language barriers among deaf learners in Kenya. The specific objectives are to build a KSL dataset of spoken English and video-recorded Kenyan Sign Language, and to transcribe the KSL signs into a phonetic-level interface of the sign language. In this paper, the methodology for building the dataset is described. Data were collected from 48 teachers and tutors of deaf learners and from 400 deaf learners. Participants engaged mainly in sign language elicitation tasks through reading and singing. The resulting dataset consists of about 14,000 English sentences with corresponding KSL glosses derived from a pool of about 4,000 words, and about 20,000 signed KSL videos of either individual words or sentences. The second level of data outcomes consists of 10,000 split and segmented KSL videos. The third outcome consists of 4,000 words transcribed into five articulatory parameters according to the HamNoSys system.

Updated: 2024-10-23 22:01:31

Categories: cs.AI,cs.CL

Download: http://arxiv.org/abs/2410.18295v1

Real-Time Summarization of Twitter

In this paper, we describe our approaches to the TREC Real-Time Summarization task on Twitter. We focus on the real-time push notification scenario, which requires a system to monitor the stream of sampled tweets and return tweets that are relevant and novel with respect to given interest profiles. Dirichlet scores with smoothing and with very little smoothing (baseline) are employed to classify whether a tweet is relevant to a given interest profile. Using metrics including Mean Average Precision (MAP), cumulative gain (CG) and discounted cumulative gain (DCG), the experiments indicate that our approach performs well. It is also desirable to remove redundant tweets from the push queue. Due to the precision limit, we only describe the algorithm in this paper.
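As a minimal sketch of the relevance scoring described above, the function below computes the standard Dirichlet-smoothed query likelihood of a tweet given the terms of an interest profile; a small `mu` corresponds to "very little smoothing". The function name, the background-probability floor, and the toy data are illustrative assumptions, not the authors' code.

```python
import math
from collections import Counter


def dirichlet_score(profile_terms, tweet_terms, collection_prob, mu=2000.0):
    """Log-likelihood of the profile terms under a Dirichlet-smoothed tweet language model.

    p(w | tweet) = (tf(w, tweet) + mu * p(w | C)) / (|tweet| + mu)
    """
    tf = Counter(tweet_terms)
    tweet_len = len(tweet_terms)
    score = 0.0
    for w in profile_terms:
        # Assumed floor for terms never seen in the background collection.
        p_c = collection_prob.get(w, 1e-6)
        score += math.log((tf[w] + mu * p_c) / (tweet_len + mu))
    return score


# Hypothetical background probabilities, interest profile and tweets.
collection = {"solar": 0.001, "eclipse": 0.001, "tonight": 0.01, "football": 0.01}
profile = ["solar", "eclipse"]
on_topic = dirichlet_score(profile, ["solar", "eclipse", "tonight"], collection, mu=10)
off_topic = dirichlet_score(profile, ["football", "tonight"], collection, mu=10)
print(on_topic > off_topic)  # True
```

A tweet would then be pushed when its score clears a tuned threshold; the abstract does not give that threshold, so none is shown here.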

Updated: 2024-10-23 22:01:13

Categories: cs.LG

Download: http://arxiv.org/abs/2407.08125v2

NexusIndex: Integrating Advanced Vector Indexing and Multi-Model Embeddings for Robust Fake News Detection

The proliferation of fake news on digital platforms has underscored the need for robust and scalable detection mechanisms. Traditional methods often fall short in handling large and diverse datasets due to limitations in scalability and accuracy. In this paper, we propose NexusIndex, a novel framework and model that enhances fake news detection by integrating advanced language models, an innovative FAISSNexusIndex layer, and attention mechanisms. Our approach leverages multi-model embeddings to capture rich contextual and semantic nuances, significantly improving text interpretation and classification accuracy. By transforming articles into high-dimensional embeddings and indexing them efficiently, NexusIndex facilitates rapid similarity searches across extensive collections of news articles. The FAISSNexusIndex layer further optimizes this process, enabling real-time detection and enhancing the system's scalability and performance. Our experimental results demonstrate that NexusIndex outperforms state-of-the-art methods in efficiency and accuracy across diverse datasets.
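The abstract does not specify the FAISSNexusIndex layer, so the sketch below only illustrates the underlying idea of exact similarity search over normalised embeddings, similar in spirit to FAISS's flat inner-product index (`IndexFlatIP`). The `FlatCosineIndex` class and the toy 2-D vectors are hypothetical; real article embeddings would come from the language models.

```python
import math


def _normalise(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


class FlatCosineIndex:
    """Exact (brute-force) cosine-similarity search over stored embeddings."""

    def __init__(self):
        self._vectors = []

    def add(self, vectors):
        # Normalising once at insert time turns inner product into cosine similarity.
        self._vectors.extend(_normalise(v) for v in vectors)

    def search(self, query, k):
        """Return the k most similar stored vectors as (similarity, position) pairs."""
        q = _normalise(query)
        scored = [(sum(a * b for a, b in zip(q, v)), i) for i, v in enumerate(self._vectors)]
        scored.sort(reverse=True)
        return scored[:k]


index = FlatCosineIndex()
index.add([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # toy "article embeddings"
print(index.search([1.0, 0.1], k=2))
```

A flat index costs O(n·d) per query; FAISS also provides approximate indexes (for example IVF- or HNSW-based ones) that trade exactness for speed on large collections.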

Updated: 2024-10-23 21:59:39

Categories: cs.IR,cs.DB,cs.LG,cs.NE

Download: http://arxiv.org/abs/2410.18294v1

1-2-3-Go! Policy Synthesis for Parameterized Markov Decision Processes via Decision-Tree Learning and Generalization

Despite the advances in probabilistic model checking, the scalability of the verification methods remains limited. In particular, the state space often becomes extremely large when instantiating parameterized Markov decision processes (MDPs), even with moderate parameter values. Synthesizing policies for such \emph{huge} MDPs is beyond the reach of available tools. We propose a learning-based approach to obtain a reasonable policy for such huge MDPs. The idea is to generalize optimal policies obtained by model-checking small instances to larger ones using decision-tree learning. Consequently, our method bypasses the need for explicit state-space exploration of large models, providing a practical solution to the state-space explosion problem. We demonstrate the efficacy of our approach by performing extensive experimentation on the relevant models from the quantitative verification benchmark set. The experimental results indicate that our policies perform well, even when the size of the model is orders of magnitude beyond the reach of state-of-the-art analysis tools.
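To make the generalisation idea concrete, the toy sketch below fits a depth-1 decision tree (a stump) to state-action pairs taken from an optimal policy of a small instance, then queries it on a state of a much larger instance. The scale-free feature (fill ratio), the actions and the data are invented for illustration; the paper's features and tree learner may differ.

```python
from collections import Counter


def fit_stump(samples):
    """Fit a depth-1 decision tree to (features, action) pairs by minimising errors."""
    best = None
    for f in range(len(samples[0][0])):
        for t in sorted({x[f] for x, _ in samples}):
            left = [a for x, a in samples if x[f] <= t]
            right = [a for x, a in samples if x[f] > t]
            left_a = Counter(left).most_common(1)[0][0]
            right_a = Counter(right).most_common(1)[0][0] if right else left_a
            errors = sum(a != left_a for a in left) + sum(a != right_a for a in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, left_a, right_a)
    return best[1:]  # (feature, threshold, action_if_le, action_if_gt)


def predict(stump, x):
    f, t, left_a, right_a = stump
    return left_a if x[f] <= t else right_a


# Hypothetical optimal actions of a small instance (capacity 4);
# feature = queue_length / capacity, so it is meaningful at any size.
small = [((q / 4,), "serve" if q / 4 > 0.5 else "wait") for q in range(5)]
stump = fit_stump(small)
print(stump)                          # (0, 0.5, 'wait', 'serve')
print(predict(stump, (900 / 1000,)))  # serve (state of a capacity-1000 instance)
```

The point of the design is that the learned rule never enumerates the large instance's state space; it only needs the features of the queried state.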

Updated: 2024-10-23 21:57:05

Categories: cs.AI,cs.LG,cs.LO,cs.SY,eess.SY

Download: http://arxiv.org/abs/2410.18293v1

Enhancing Enterprise Security with Zero Trust Architecture

Zero Trust Architecture (ZTA) represents a transformative approach to modern cybersecurity, directly addressing the shortcomings of traditional perimeter-based security models. With the rise of cloud computing, remote work, and increasingly sophisticated cyber threats, perimeter defenses have proven ineffective at mitigating risks, particularly those involving insider threats and lateral movement within networks. ZTA shifts the security paradigm by assuming that no user, device, or system can be trusted by default, requiring continuous verification and the enforcement of least-privilege access for all entities. This paper explores the key components of ZTA, such as identity and access management (IAM), micro-segmentation, continuous monitoring, and behavioral analytics, and evaluates their effectiveness in reducing vulnerabilities across diverse sectors, including finance, healthcare, and technology. Through case studies and industry reports, the advantages of ZTA in mitigating insider threats and minimizing attack surfaces are discussed. Additionally, the paper addresses the challenges faced during ZTA implementation, such as scalability, integration complexity, and costs, while providing best practices for overcoming these obstacles. Lastly, the paper examines future research directions on integrating emerging technologies such as AI, machine learning, and blockchain into ZTA to further enhance its capabilities.
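The following is a minimal sketch of two principles named above, default deny with least-privilege grants and verification of every request, applied to a single access decision. The request fields, the allow-list and the resource names are hypothetical; real ZTA deployments rely on an identity provider, device-posture signals and a policy engine rather than an in-memory dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRequest:
    user: str
    resource: str
    action: str
    mfa_verified: bool      # identity re-verified for this request
    device_compliant: bool  # device posture check passed


# Hypothetical least-privilege allow-list: (user, resource) -> permitted actions.
POLICY = {
    ("alice", "payroll-db"): {"read"},
    ("bob", "build-server"): {"read", "deploy"},
}


def authorize(req: AccessRequest) -> bool:
    """Evaluate each request on its own; nothing is trusted because of network location."""
    if not (req.mfa_verified and req.device_compliant):
        return False  # verification failed for this request
    # Default deny: only explicitly granted (user, resource, action) triples pass.
    return req.action in POLICY.get((req.user, req.resource), set())


print(authorize(AccessRequest("alice", "payroll-db", "read", True, True)))   # True
print(authorize(AccessRequest("alice", "payroll-db", "write", True, True)))  # False
```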

Updated: 2024-10-23 21:53:16

Categories: cs.CR

Download: http://arxiv.org/abs/2410.18291v1

LEGO: Language Model Building Blocks

Large language models (LLMs) are essential in natural language processing (NLP) but are costly in data collection, pre-training, fine-tuning, and inference. Task-specific small language models (SLMs) offer a cheaper alternative but lack robustness and generalization. This paper proposes LEGO, a novel technique to extract SLMs from an LLM and recombine them. Using state-of-the-art LLM pruning strategies, we can create task- and user-specific SLM building blocks that are efficient for fine-tuning and inference while also preserving user data privacy. LEGO utilizes Federated Learning and a novel aggregation scheme for the LLM reconstruction, maintaining robustness without high costs and preserving user data privacy. We experimentally demonstrate the versatility of LEGO, showing its ability to enable model heterogeneity and mitigate the effects of data heterogeneity while maintaining LLM robustness.

Updated: 2024-10-23 21:31:42

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2410.18287v1

No Filter: Cultural and Socioeconomic Diversity in Contrastive Vision-Language Models

We study cultural and socioeconomic diversity in contrastive vision-language models (VLMs). Using a broad range of benchmark datasets and evaluation metrics, we bring to attention several important findings. First, the common filtering of training data to English image-text pairs disadvantages communities of lower socioeconomic status and negatively impacts cultural understanding. Notably, this performance gap is not captured by - and even at odds with - the currently popular evaluation metrics derived from the Western-centric ImageNet and COCO datasets. Second, pretraining with global, unfiltered data before fine-tuning on English content can improve cultural understanding without sacrificing performance on said popular benchmarks. Third, we introduce the task of geo-localization as a novel evaluation metric to assess cultural diversity in VLMs. Our work underscores the value of using diverse data to create more inclusive multimodal systems and lays the groundwork for developing VLMs that better represent global perspectives.

Updated: 2024-10-23 21:25:39

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2405.13777v3

Augmenting Training Data with Vector-Quantized Variational Autoencoder for Classifying RF Signals

Radio frequency (RF) communication has been an important part of civil and military communication for decades. With the increasing complexity of wireless environments and the growing number of devices sharing the spectrum, it has become critical to efficiently manage and classify the signals that populate these frequencies. In such scenarios, the accurate classification of wireless signals is essential for effective spectrum management, signal interception, and interference mitigation. However, the classification of wireless RF signals often faces challenges due to the limited availability of labeled training data, especially under low signal-to-noise ratio (SNR) conditions. To address these challenges, this paper proposes the use of a Vector-Quantized Variational Autoencoder (VQ-VAE) to augment training data, thereby enhancing the performance of a baseline wireless classifier. The VQ-VAE model generates high-fidelity synthetic RF signals, increasing the diversity and fidelity of the training dataset by capturing the complex variations inherent in RF communication signals. Our experimental results show that incorporating VQ-VAE-generated data significantly improves the classification accuracy of the baseline model, particularly in low SNR conditions. This augmentation leads to better generalization and robustness of the classifier, overcoming the constraints imposed by limited real-world data. By improving RF signal classification, the proposed approach enhances the efficacy of wireless communication in both civil and tactical settings, ensuring reliable and secure operations. This advancement supports critical decision-making and operational readiness in environments where communication fidelity is essential.
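The step that gives VQ-VAE its name maps each continuous encoder output to its nearest codebook vector. The sketch below shows only that nearest-neighbour lookup on hypothetical 2-D vectors; the encoder, decoder, codebook and commitment losses, and the straight-through gradient used in training are omitted, as is the sampling used to generate synthetic RF signals.

```python
def quantize(vectors, codebook):
    """Replace each vector by its nearest codebook entry (squared Euclidean distance)."""
    indices, quantized = [], []
    for vec in vectors:
        dists = [sum((a - b) ** 2 for a, b in zip(vec, code)) for code in codebook]
        k = min(range(len(codebook)), key=dists.__getitem__)
        indices.append(k)
        quantized.append(list(codebook[k]))
    return indices, quantized


codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]      # hypothetical learned codebook
encoder_out = [[0.9, 1.2], [-0.8, 1.7], [0.1, -0.1]]  # hypothetical encoder outputs
idx, zq = quantize(encoder_out, codebook)
print(idx)  # [1, 2, 0]
```

Because the decoder only ever sees codebook vectors, new samples can be produced by choosing code indices and decoding them, which is what makes a trained VQ-VAE usable as a data generator.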

Updated: 2024-10-23 21:17:45

Categories: cs.LG,cs.NI,eess.SP

Download: http://arxiv.org/abs/2410.18283v1

Diffusion Bridge Implicit Models

Denoising diffusion bridge models (DDBMs) are a powerful variant of diffusion models for interpolating between two arbitrary paired distributions given as endpoints. Despite their promising performance in tasks like image translation, DDBMs require a computationally intensive sampling process that involves the simulation of a (stochastic) differential equation through hundreds of network evaluations. In this work, we take the first step in fast sampling of DDBMs without extra training, motivated by the well-established recipes in diffusion models. We generalize DDBMs via a class of non-Markovian diffusion bridges defined on the discretized timesteps used for sampling, which share the same marginal distributions and training objectives, and give rise to generative processes ranging from stochastic to deterministic, resulting in diffusion bridge implicit models (DBIMs). DBIMs are not only up to 25$\times$ faster than the vanilla sampler of DDBMs but also induce a novel, simple, and insightful form of ordinary differential equation (ODE) which inspires high-order numerical solvers. Moreover, DBIMs maintain generation diversity in a distinctive way, by using a booting noise in the initial sampling step, which enables faithful encoding, reconstruction, and semantic interpolation in image translation tasks. Code is available at \url{https://github.com/thu-ml/DBIM}.

Updated: 2024-10-23 21:04:27

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2405.15885v2

Screw Geometry Meets Bandits: Incremental Acquisition of Demonstrations to Generate Manipulation Plans

In this paper, we study the problem of methodically obtaining a sufficient set of kinesthetic demonstrations, one at a time, such that a robot can be confident of its ability to perform a complex manipulation task in a given region of its workspace. Although Learning from Demonstrations has been an active area of research, the problems of checking whether a set of demonstrations is sufficient, and of systematically seeking additional demonstrations, have remained open. We present a novel approach to address these open problems using (i) a screw geometric representation to generate manipulation plans from demonstrations, which makes the sufficiency of a set of demonstrations measurable; (ii) a sampling strategy based on PAC-learning from multi-armed bandit optimization to evaluate the robot's ability to generate manipulation plans in a subregion of its task space; and (iii) a heuristic to seek additional demonstrations from areas of weakness. Thus, we present an approach for the robot to incrementally and actively ask for new demonstration examples until it can assess with high confidence that it can perform the task successfully. We present experimental results on two example manipulation tasks, namely, pouring and scooping, to illustrate our approach. A short video on the method: https://youtu.be/R-qICICdEos
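The abstract states that the sampling strategy is based on PAC learning but gives no formulas. As a hedged illustration of the kind of guarantee involved, the sketch below uses the standard Hoeffding inequality: it computes how many trial executions suffice to estimate a success rate to within epsilon with confidence 1 - delta, and checks whether a one-sided lower confidence bound clears a target rate. The function names and thresholds are assumptions, not the authors' procedure.

```python
import math


def hoeffding_trials(epsilon, delta):
    """Trials n such that |p_hat - p| <= epsilon holds with probability >= 1 - delta."""
    if not (0 < epsilon < 1 and 0 < delta < 1):
        raise ValueError("epsilon and delta must lie in (0, 1)")
    # Two-sided Hoeffding: P(|p_hat - p| > eps) <= 2 exp(-2 n eps^2).
    return math.ceil(math.log(2 / delta) / (2 * epsilon ** 2))


def confidently_successful(successes, trials, target, delta):
    """True if the one-sided Hoeffding lower bound on the success rate is >= target."""
    p_hat = successes / trials
    margin = math.sqrt(math.log(1 / delta) / (2 * trials))
    return p_hat - margin >= target


print(hoeffding_trials(0.1, 0.05))                  # 185
print(confidently_successful(190, 200, 0.8, 0.05))  # True
print(confidently_successful(17, 20, 0.8, 0.05))    # False
```

In the setting of the abstract, a subregion that fails such a check would be a natural candidate for requesting an additional demonstration.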

Updated: 2024-10-23 20:57:56

Categories: cs.RO,cs.AI

Download: http://arxiv.org/abs/2410.18275v1

Hijack-GAN: Unintended-Use of Pretrained, Black-Box GANs

While Generative Adversarial Networks (GANs) show increasing performance and the level of realism is becoming indistinguishable from natural images, this also comes with high demands on data and computation. We show that state-of-the-art GAN models -- such as they are being publicly released by researchers and industry -- can be used for a range of applications beyond unconditional image generation. We achieve this by an iterative scheme that also allows gaining control over the image generation process despite the highly non-linear latent spaces of the latest GAN models. We demonstrate that this opens up the possibility to re-use state-of-the-art, difficult-to-train, pre-trained GANs with a high level of control even if only black-box access is granted. Our work also raises awareness of, and concern about, the fact that the use cases of a published GAN model may well reach beyond the creators' intention, which needs to be taken into account before a full public release. Code is available at https://github.com/hui-po-wang/hijackgan.

Updated: 2024-10-23 20:55:35

Categories: cs.CV,cs.CR

Download: http://arxiv.org/abs/2011.14107v3

Equivalence of the Empirical Risk Minimization to Regularization on the Family of f-Divergences

The solution to empirical risk minimization with $f$-divergence regularization (ERM-$f$DR) is presented under mild conditions on $f$. Under such conditions, the optimal measure is shown to be unique. Examples of the solution for particular choices of the function $f$ are presented. Previously known solutions to common regularization choices are obtained by leveraging the flexibility of the family of $f$-divergences. These include the unique solutions to empirical risk minimization with relative entropy regularization (Type-I and Type-II). The analysis of the solution unveils the following properties of $f$-divergences when used in the ERM-$f$DR problem: (i) $f$-divergence regularization forces the support of the solution to coincide with the support of the reference measure, which introduces a strong inductive bias that dominates the evidence provided by the training data; and (ii) any $f$-divergence regularization is equivalent to a different $f$-divergence regularization with an appropriate transformation of the empirical risk function.
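For the familiar special case of relative-entropy (KL) regularization, the minimizer of E_P[L] + λ·KL(P‖Q) is the Gibbs measure with dP*/dQ(θ) ∝ exp(-L(θ)/λ). The sketch below evaluates it on a hypothetical finite model set and illustrates property (i): a model with zero reference mass receives zero mass in the solution even though its empirical risk is the lowest. Solutions for other f-divergences differ and are not reproduced here.

```python
import math


def kl_regularized_erm(risks, reference, lam):
    """Minimiser of sum_i P_i * L_i + lam * KL(P || Q) over a finite model set.

    Solution: P_i proportional to Q_i * exp(-L_i / lam) (Gibbs measure).
    """
    m = min(risks)  # shift for numerical stability; cancels after normalisation
    weights = [q * math.exp(-(r - m) / lam) for r, q in zip(risks, reference)]
    total = sum(weights)
    return [w / total for w in weights]


risks = [0.1, 0.5, 0.0]      # hypothetical empirical risks of three models
reference = [0.5, 0.5, 0.0]  # reference measure Q puts no mass on model 2
p = kl_regularized_erm(risks, reference, lam=1.0)
print(p[2])  # 0.0: the solution's support matches the reference support
```

Lowering λ concentrates the solution on the lowest-risk models inside the support of Q, but never outside it.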

Updated: 2024-10-23 20:55:31

Categories: stat.ML,cs.IT,cs.LG,math.IT

Download: http://arxiv.org/abs/2402.00501v2

Efficient End-to-end Language Model Fine-tuning on Graphs

Learning from Text-Attributed Graphs (TAGs) has attracted significant attention due to its wide range of real-world applications. The rapid evolution of language models (LMs) has revolutionized the way we process textual data, which indicates a strong potential to replace the shallow text embeddings commonly used in Graph Neural Networks (GNNs). However, we find that existing LM approaches that exploit text information in graphs suffer from inferior computation and data efficiency. In this study, we introduce LEADING, a novel and efficient approach for end-to-end fine-tuning of language models on TAGs. To enhance data efficiency, LEADING efficiently transfers rich knowledge from LMs to downstream graph learning tasks with limited labeled data by employing end-to-end training of LMs and GNNs in a semi-supervised learning setting. To address the associated computation efficiency issues, it introduces two techniques: neighbor decoupling targeting LMs and implicit graph modeling targeting GNNs. Our proposed approach demonstrates superior performance, achieving state-of-the-art (SOTA) results on the ogbn-arxiv leaderboard, while maintaining computation cost and memory overhead comparable to graph-less fine-tuning of LMs. Through comprehensive experiments, we showcase its superior computation and data efficiency, presenting a promising solution for various LMs and graph learning tasks on TAGs.

Updated: 2024-10-23 20:53:48

Categories: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2312.04737v2

ProgFed: Effective, Communication, and Computation Efficient Federated Learning by Progressive Training

Federated learning is a powerful distributed learning scheme that allows numerous edge devices to collaboratively train a model without sharing their data. However, training is resource-intensive for edge devices, and limited network bandwidth is often the main bottleneck. Prior work often overcomes the constraints by condensing the models or messages into compact formats, e.g., by gradient compression or distillation. In contrast, we propose ProgFed, the first progressive training framework for efficient and effective federated learning. It inherently reduces computation and two-way communication costs while maintaining the strong performance of the final models. We theoretically prove that ProgFed converges at the same asymptotic rate as standard training on full models. Extensive results on a broad range of architectures, including CNNs (VGG, ResNet, ConvNets) and U-nets, and diverse tasks from simple classification to medical image segmentation show that our highly effective training approach saves up to $20\%$ computation and up to $63\%$ communication costs for converged models. As our approach is also complementary to prior work on compression, we can achieve a wide range of trade-offs by combining these techniques, showing reduced communication of up to $50\times$ at only $0.1\%$ loss in utility. Code is available at https://github.com/hui-po-wang/ProgFed.

Updated: 2024-10-23 20:52:23

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2110.05323v3

FedLAP-DP: Federated Learning by Sharing Differentially Private Loss Approximations

Conventional gradient-sharing approaches for federated learning (FL), such as FedAvg, rely on aggregation of local models and often face performance degradation under differential privacy (DP) mechanisms or data heterogeneity, which can be attributed to the inconsistency between the local and global objectives. To address this issue, we propose FedLAP-DP, a novel privacy-preserving approach for FL. Our formulation involves clients synthesizing a small set of samples that approximate local loss landscapes by simulating the gradients of real images within a local region. Acting as loss surrogates, these synthetic samples are aggregated on the server side to uncover the global loss landscape and enable global optimization. Building upon these insights, we offer a new perspective to enforce record-level differential privacy in FL. A formal privacy analysis demonstrates that FedLAP-DP incurs the same privacy costs as typical gradient-sharing schemes while achieving an improved trade-off between privacy and utility. Extensive experiments validate the superiority of our approach across various datasets with highly skewed distributions in both DP and non-DP settings. Beyond the promising performance, our approach converges faster than typical gradient-sharing methods and opens up the possibility of trading communication costs for better performance by sending a larger set of synthetic images. The source code is available at \url{https://github.com/hui-po-wang/FedLAP-DP}.
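The abstract does not describe the privacy mechanism in detail. Record-level differential privacy is commonly enforced with the Gaussian mechanism: clip each record's gradient to a fixed L2 norm and add Gaussian noise scaled to that norm. The sketch below shows that generic step only; whether FedLAP-DP applies it exactly this way, and the parameter values used here, are assumptions, and privacy accounting (computing epsilon and delta) is not shown.

```python
import math
import random


def clip_l2(vec, max_norm):
    """Scale vec down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in vec]


def gaussian_mechanism_sum(per_record_grads, max_norm, noise_multiplier, rng):
    """Sum of clipped per-record gradients plus N(0, (noise_multiplier * max_norm)^2) noise."""
    dim = len(per_record_grads[0])
    total = [0.0] * dim
    for g in per_record_grads:
        for i, x in enumerate(clip_l2(g, max_norm)):
            total[i] += x
    sigma = noise_multiplier * max_norm  # noise scaled to the per-record sensitivity
    return [t + rng.gauss(0.0, sigma) for t in total]


rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4]]  # hypothetical per-record gradients
print(clip_l2(grads[0], 1.0))     # approximately [0.6, 0.8]
noisy = gaussian_mechanism_sum(grads, max_norm=1.0, noise_multiplier=1.1, rng=rng)
```

Clipping bounds how much any single record can move the sum, which is what lets a fixed noise scale yield a record-level guarantee.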

Updated: 2024-10-23 20:45:38

Categories: cs.LG

Download: http://arxiv.org/abs/2302.01068v5

Self-training Language Models for Arithmetic Reasoning

Recent language models achieve impressive results in tasks involving complex multistep reasoning, but scaling these capabilities further traditionally requires the expensive collection of more annotated data. In this work, we explore the potential of improving models' reasoning capabilities without new data, merely using automated feedback on the validity of their predictions in arithmetic reasoning (self-training). In systematic experimentation across six different arithmetic reasoning datasets, we find that models can substantially improve in both single-round (offline) and online self-training, reaching a correct result in +13.9% and +25.9% more cases, respectively, underlining the importance of up-to-date self-training feedback. We further find that in single-round, offline self-training, traditional supervised training can deliver gains comparable to preference optimization, but in online self-training, preference optimization methods largely outperform supervised training thanks to their superior stability and robustness on unseen types of problems.

Updated: 2024-10-23 20:43:02

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2407.08400v3

Stabilizing black-box model selection with the inflated argmax

Model selection is the process of choosing from a class of candidate models given data. For instance, methods such as the LASSO and sparse identification of nonlinear dynamics (SINDy) formulate model selection as finding a sparse solution to a linear system of equations determined by training data. However, absent strong assumptions, such methods are highly unstable: if a single data point is removed from the training set, a different model may be selected. This paper presents a new approach to stabilizing model selection that leverages a combination of bagging and an "inflated" argmax operation. Our method selects a small collection of models that all fit the data, and it is stable in that, with high probability, the removal of any training point will result in a collection of selected models that overlaps with the original collection. In addition to developing theoretical guarantees, we illustrate this method in (a) a simulation in which strongly correlated covariates make standard LASSO model selection highly unstable and (b) a Lotka-Volterra model selection problem focused on identifying how competition in an ecosystem influences species' abundances. In both settings, the proposed method yields stable and compact collections of selected models, outperforming a variety of benchmarks.

Updated: 2024-10-23 20:39:07

Categories: stat.ML,cs.LG,stat.ME

Download: http://arxiv.org/abs/2410.18268v1

Backdoor in Seconds: Unlocking Vulnerabilities in Large Pre-trained Models via Model Editing

Large pre-trained models have achieved notable success across a range of downstream tasks. However, recent research shows that a type of adversarial attack ($\textit{i.e.,}$ backdoor attack) can manipulate the behavior of machine learning models by contaminating their training dataset, posing a significant threat to real-world applications of large pre-trained models, especially customized ones. Therefore, addressing the unique challenges of exploring the vulnerability of pre-trained models is of paramount importance. Through empirical studies on the feasibility of backdoor attacks on large pre-trained models ($\textit{e.g.,}$ ViT), we identify the following unique challenges of attacking such models: 1) the inability to manipulate or even access large training datasets, and 2) the substantial computational resources required for training or fine-tuning these models. To address these challenges, we establish new standards for an effective and feasible backdoor attack in the context of large pre-trained models. In line with these standards, we introduce our EDT model, an \textbf{E}fficient, \textbf{D}ata-free, \textbf{T}raining-free backdoor attack method. Inspired by model editing techniques, EDT injects an editing-based lightweight codebook into the backdoor of large pre-trained models, which replaces the embedding of the poisoned image with that of the target image without poisoning the training dataset or training the victim model. Our experiments, conducted across various pre-trained models such as ViT, CLIP, BLIP, and stable diffusion, and on downstream tasks including image classification, image captioning, and image generation, demonstrate the effectiveness of our method. Our code is available in the supplementary material.

Updated: 2024-10-23 20:32:14

Fields: cs.AI

Download: http://arxiv.org/abs/2410.18267v1

Federated learning with differential privacy and an untrusted aggregator

Federated learning for training models over mobile devices is gaining popularity. Current systems for this task exhibit significant trade-offs among model accuracy, privacy guarantees, and device efficiency. For instance, Oort (OSDI 2021) provides excellent accuracy and efficiency but requires a trusted central server. Orchard (OSDI 2020), on the other hand, provides good accuracy and the rigorous guarantee of differential privacy over an untrusted server, but imposes huge overhead on the devices. This paper describes Aero, a new federated learning system that significantly improves this trade-off. Aero achieves good accuracy and guarantees differential privacy over an untrusted server while keeping device overhead low. The key idea of Aero is to tailor the system architecture and design to a specific set of popular federated learning algorithms. This tailoring requires novel optimizations and techniques, e.g., a new protocol to securely aggregate updates from devices. An evaluation of Aero demonstrates that it provides accuracy comparable to plain federated learning (without differential privacy) and improves efficiency (CPU and network) over Orchard by up to $10^5\times$.
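
The abstract does not describe Aero's aggregation protocol. As a point of reference only, the sketch below shows the classic pairwise-masking idea used in secure aggregation: each pair of clients shares a random mask that one adds and the other subtracts, so an untrusted server learns only the sum. This is not Aero's protocol, and the differential-privacy noise is omitted. The modulus, the seeded mask generation (a stand-in for a real key agreement), and all function names are illustrative assumptions.

```python
import random
from itertools import combinations
from typing import Dict, List, Tuple

MODULUS = 2 ** 32  # illustrative ring size for quantized integer updates


def pairwise_masks(n_clients: int, dim: int, seed: int = 0) -> Dict[Tuple[int, int], List[int]]:
    """One shared random mask per client pair (i < j); in practice derived via key agreement."""
    rng = random.Random(seed)
    return {
        (i, j): [rng.randrange(MODULUS) for _ in range(dim)]
        for i, j in combinations(range(n_clients), 2)
    }


def mask_update(cid: int, update: List[int], masks: Dict[Tuple[int, int], List[int]]) -> List[int]:
    """Client cid adds m_ij for every pair it leads and subtracts m_ij for every pair it trails."""
    masked = [u % MODULUS for u in update]
    for (i, j), m in masks.items():
        if cid == i:
            masked = [(x + y) % MODULUS for x, y in zip(masked, m)]
        elif cid == j:
            masked = [(x - y) % MODULUS for x, y in zip(masked, m)]
    return masked


def aggregate(masked_updates: List[List[int]]) -> List[int]:
    """The server only sums; the pairwise masks cancel and only the total is revealed."""
    total = [0] * len(masked_updates[0])
    for upd in masked_updates:
        total = [(t + u) % MODULUS for t, u in zip(total, upd)]
    return total


updates = [[3, 1, 4], [1, 5, 9], [2, 6, 5]]
masks = pairwise_masks(n_clients=3, dim=3, seed=42)
masked = [mask_update(c, u, masks) for c, u in enumerate(updates)]
print(aggregate(masked))  # element-wise sum of the raw updates
```

Each individual masked vector looks uniformly random to the server, while the sum is exact. Handling client dropouts and adding calibrated noise, both needed for a real DP system, are beyond this sketch.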

Updated: 2024-10-23 20:24:09

Fields: cs.CR

Download: http://arxiv.org/abs/2312.10789v2

FairWASP: Fast and Optimal Fair Wasserstein Pre-processing

Recent years have seen a surge of machine learning approaches aimed at reducing disparities in model outputs across different subgroups. In many settings, training data may be used in multiple downstream applications by different users, which means it may be most effective to intervene on the training data itself. In this work, we present FairWASP, a novel pre-processing approach designed to reduce disparities in classification datasets without modifying the original data. FairWASP returns sample-level weights such that the reweighted dataset minimizes the Wasserstein distance to the original dataset while satisfying (an empirical version of) demographic parity, a popular fairness criterion. We show theoretically that integer weights are optimal, which means our method can be equivalently understood as duplicating or eliminating samples. FairWASP can therefore be used to construct datasets which can be fed into any classification method, not just methods which accept sample weights. Our work is based on reformulating the pre-processing task as a large-scale mixed-integer program (MIP), for which we propose a highly efficient algorithm based on the cutting plane method. Experiments demonstrate that our proposed optimization algorithm significantly outperforms state-of-the-art commercial solvers in solving both the MIP and its linear program relaxation. Further experiments highlight the competitive performance of FairWASP in reducing disparities while preserving accuracy in downstream classification settings.
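
FairWASP's cutting-plane solver is beyond a short example. The sketch below only illustrates two facts the abstract relies on:

- how the (empirical) demographic-parity constraint is evaluated on a reweighted dataset;
- why integer weights are equivalent to duplicating or dropping samples.

The weights here are hand-picked for illustration and are not the output of FairWASP's Wasserstein-minimizing optimization. The helper names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def positive_rate_by_group(groups: Sequence[str], labels: Sequence[int],
                           weights: Sequence[float]) -> Dict[str, float]:
    """Weighted P(y = 1 | group) for each group."""
    num: Dict[str, float] = Counter()
    den: Dict[str, float] = Counter()
    for g, y, w in zip(groups, labels, weights):
        den[g] += w
        num[g] += w * y
    return {g: num[g] / den[g] for g in den if den[g] > 0}


def dp_gap(rates: Dict[str, float]) -> float:
    """Demographic-parity gap: largest minus smallest positive rate across groups."""
    return max(rates.values()) - min(rates.values())


def duplicate(rows: List[Tuple[str, int]], weights: Sequence[int]) -> List[Tuple[str, int]]:
    """Integer weights as duplication: w = 0 drops a row, w = 2 keeps two copies."""
    return [row for row, w in zip(rows, weights) for _ in range(w)]


rows = [("a", 1), ("a", 1), ("a", 0), ("b", 1), ("b", 0), ("b", 0)]
groups = [g for g, _ in rows]
labels = [y for _, y in rows]

before = positive_rate_by_group(groups, labels, [1] * len(rows))
weights = [1, 1, 2, 1, 1, 0]  # hand-picked integer weights (not from FairWASP)
after = positive_rate_by_group(groups, labels, weights)
dup = duplicate(rows, weights)
dup_rates = positive_rate_by_group([g for g, _ in dup], [y for _, y in dup], [1] * len(dup))
print(dp_gap(before), dp_gap(after), dup_rates == after)
```

The duplicated dataset yields the same group rates as the weighted one. This is why a dataset with integer weights can be fed to any classifier, including ones that do not accept sample weights.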

Updated: 2024-10-23 20:22:37

Fields: cs.LG,stat.ML

Download: http://arxiv.org/abs/2311.00109v3

Hamiltonian Matching for Symplectic Neural Integrators

Hamilton's equations of motion form a fundamental framework in various branches of physics, including astronomy, quantum mechanics, particle physics, and climate science. Classical numerical solvers are typically employed to compute the time evolution of these systems. However, when a system spans multiple spatial and temporal scales, numerical errors can accumulate and reduce accuracy. To address the challenges of evolving such systems over long timescales, we propose SympFlow, a novel neural network-based symplectic integrator. SympFlow is composed of a sequence of exact flow maps of parametrised time-dependent Hamiltonian functions. This architecture allows for a backward error analysis: we can identify an underlying Hamiltonian function of the architecture and use it to define a Hamiltonian matching objective function, which we use for training. In numerical experiments, SympFlow shows promising results, with qualitative energy-conservation behaviour similar to that of time-stepping symplectic integrators.
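
SympFlow builds its map from exact flows of learned Hamiltonians. The classical intuition behind that choice can be shown without any learning. Take H(q, p) = p²/2 + q²/2, a harmonic oscillator chosen here purely as an example:

- The exact flow of the potential part alone is a "kick" that changes only p.
- The exact flow of the kinetic part alone is a "drift" that changes only q.
- Composing the two gives the symplectic Euler method, whose energy error stays bounded.

Explicit Euler, by contrast, is not symplectic, and its energy grows without bound. This is a textbook illustration, not the SympFlow architecture.

```python
from typing import Callable, Tuple

State = Tuple[float, float]  # (q, p)


def energy(state: State) -> float:
    q, p = state
    return 0.5 * p * p + 0.5 * q * q


def kick(state: State, h: float) -> State:
    """Exact time-h flow of H_V = q^2 / 2: p changes, q is frozen."""
    q, p = state
    return q, p - h * q


def drift(state: State, h: float) -> State:
    """Exact time-h flow of H_T = p^2 / 2: q changes, p is frozen."""
    q, p = state
    return q + h * p, p


def symplectic_euler(state: State, h: float) -> State:
    # A composition of exact Hamiltonian flows, hence a symplectic map.
    return drift(kick(state, h), h)


def explicit_euler(state: State, h: float) -> State:
    q, p = state
    return q + h * p, p - h * q


def max_energy_error(step: Callable[[State, float], State],
                     h: float = 0.1, n_steps: int = 10_000) -> float:
    state: State = (1.0, 0.0)
    e0 = energy(state)
    worst = 0.0
    for _ in range(n_steps):
        state = step(state, h)
        worst = max(worst, abs(energy(state) - e0))
    return worst


print(max_energy_error(symplectic_euler))  # stays small (on the order of h)
print(max_energy_error(explicit_euler))    # grows without bound
```

For this map, q² + p² − h·q·p is exactly conserved, so the true energy can only oscillate within a band of width O(h). The backward error analysis in the abstract generalizes this idea: a symplectic map conserves some nearby Hamiltonian.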

Updated: 2024-10-23 20:21:56

Fields: cs.LG,cs.NA,math.NA,physics.comp-ph

Download: http://arxiv.org/abs/2410.18262v1

Accelerating ERM for data-driven algorithm design using output-sensitive techniques

Data-driven algorithm design is a promising, learning-based approach for beyond worst-case analysis of algorithms with tunable parameters. An important open problem is the design of computationally efficient data-driven algorithms for combinatorial algorithm families with multiple parameters. When one fixes the problem instance and varies the parameters, the "dual" loss function typically has a piecewise-decomposable structure, i.e., it is well-behaved except at certain sharp transition boundaries. In this work, we initiate the study of techniques for developing efficient ERM learning algorithms for data-driven algorithm design. These techniques enumerate the pieces of the sum dual loss function for a collection of problem instances. The running time of our approach scales with the actual number of pieces that appear, as opposed to worst-case upper bounds on the number of pieces. Our approach involves two novel ingredients. The first is an output-sensitive algorithm that uses tools from computational geometry to enumerate the polytopes induced by a set of hyperplanes. The second is an execution graph that compactly represents all the states the algorithm could attain for all possible parameter values. We illustrate our techniques by giving algorithms for pricing problems, linkage-based clustering, and dynamic-programming-based sequence alignment.
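
The sketch below is a one-parameter illustration of the output-sensitive idea, under a simplification of our own. We assume each problem instance's dual loss is piecewise constant in a scalar parameter ρ, given as sorted breakpoints plus one loss value per piece. Summing the instances and minimizing then only requires sweeping the breakpoints that actually occur: O(K log K) for K breakpoints in total, independent of any worst-case bound on the number of pieces. The paper's multi-parameter polytope enumeration and execution graphs are not shown. The function and type names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# One instance's dual loss: values[0] on (-inf, breaks[0]), values[i] on
# [breaks[i-1], breaks[i]), and values[-1] on [breaks[-1], +inf).
PiecewiseConstant = Tuple[Sequence[float], Sequence[float]]


def erm_over_pieces(instances: List[PiecewiseConstant]) -> Tuple[float, Tuple[float, float]]:
    """Return (minimum average loss, an interval of minimizing parameter values)."""
    # Far to the left, every instance is in its first piece.
    total = sum(values[0] for _, values in instances)
    # Crossing breakpoint i of an instance changes the summed loss by values[i+1] - values[i].
    events = sorted(
        (b, values[i + 1] - values[i])
        for breaks, values in instances
        for i, b in enumerate(breaks)
    )
    best = total
    best_interval = (-math.inf, events[0][0] if events else math.inf)
    k = 0
    while k < len(events):
        x = events[k][0]
        # Apply every change located at the same breakpoint before comparing.
        while k < len(events) and events[k][0] == x:
            total += events[k][1]
            k += 1
        right = events[k][0] if k < len(events) else math.inf
        if total < best:
            best, best_interval = total, (x, right)
    return best / len(instances), best_interval


instances = [
    ([1.0, 3.0], [5, 2, 4]),  # loss 5 for rho < 1, 2 on [1, 3), 4 for rho >= 3
    ([2.0], [3, 1]),          # loss 3 for rho < 2, 1 for rho >= 2
]
print(erm_over_pieces(instances))  # minimum average loss 1.5 on [2.0, 3.0)
```

The work done is proportional to the breakpoints that exist in the data. This is the sense in which the abstract's approach is "output-sensitive", here shown only in the simplest 1-D case.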

Updated: 2024-10-23 20:21:18

Fields: cs.DS,cs.LG

Download: http://arxiv.org/abs/2204.03569v3

Transformer models as an efficient replacement for statistical test suites to evaluate the quality of random numbers

Random numbers are essential in a variety of fields, and validating them remains important for safety. A Quantum Random Number Generator (QRNG) can theoretically generate truly random numbers; however, their quality still needs to be thoroughly validated. The task of validating random numbers has generally been delegated to statistical tests such as those in the NIST Statistical Test Suite (STS), which are often slow and perform only one test at a time. Our work presents a deep learning model based on the Transformer architecture that 1) performs multiple NIST STS tests at once and 2) runs much faster. The model outputs multi-label classification results indicating whether the input passes each of these statistical tests. We performed a thorough hyperparameter optimization to converge on the best possible model and, as a result, achieved high accuracy, with a macro F1-score above 0.96. We also compared this model to a conventional deep learning method for quantifying randomness, Long Short-Term Memory (LSTM) recurrent neural networks. Our model achieved similar performance while being much more efficient and scalable. The high performance and efficiency of this Transformer-based model show that it can be a viable replacement for the NIST STS for validating random numbers.
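
For context on what the Transformer is trained to predict, the sketch below implements one of the NIST STS tests: the frequency (monobit) test from NIST SP 800-22. The paper's model outputs pass/fail labels for several such tests at once. The significance level α = 0.01 is the suite's customary default, and the function names are ours.

```python
import math
from typing import Sequence


def monobit_p_value(bits: Sequence[int]) -> float:
    """NIST SP 800-22 frequency (monobit) test P-value."""
    n = len(bits)
    s_n = sum(2 * b - 1 for b in bits)  # map 0 -> -1 and 1 -> +1, then sum
    s_obs = abs(s_n) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))


def passes_monobit(bits: Sequence[int], alpha: float = 0.01) -> bool:
    """A sequence is considered random for this test when P-value >= alpha."""
    return monobit_p_value(bits) >= alpha


example = [int(c) for c in "1011010101"]
print(round(monobit_p_value(example), 6))  # 0.527089, the worked example in SP 800-22
print(passes_monobit(example))             # True
print(passes_monobit([1] * 100))           # False: far too many ones
```

Running the full suite means computing many such statistics per sequence, one test at a time. The abstract's multi-label Transformer replaces this with a single forward pass.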

Updated: 2024-10-23 20:18:14

Fields: cs.LG

Download: http://arxiv.org/abs/2405.03904v2

AutoSpec: Automated Generation of Neural Network Specifications

The increasing adoption of neural networks in learning-augmented systems highlights the importance of model safety and robustness, particularly in safety-critical domains. Despite progress in the formal verification of neural networks, current practices require users to manually define model specifications -- properties that dictate expected model behavior in various scenarios. This manual process, however, is prone to human error, limited in scope, and time-consuming. In this paper, we introduce AutoSpec, the first framework to automatically generate comprehensive and accurate specifications for neural networks in learning-augmented systems. We also propose the first set of metrics for assessing the accuracy and coverage of model specifications, establishing a benchmark for future comparisons. Our evaluation across four distinct applications shows that AutoSpec outperforms human-defined specifications as well as two baseline approaches introduced in this study.

Updated: 2024-10-23 20:05:48

Fields: cs.LG,cs.SE

Download: http://arxiv.org/abs/2409.10897v2

Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models

The dominant paradigm for RLHF is online and on-policy RL: synchronously generating from the large language model (LLM) policy, labelling with a reward model, and learning using feedback on the LLM's own outputs. While performant, this paradigm is computationally inefficient. Inspired by classical deep RL literature, we propose separating generation and learning in RLHF. This enables asynchronous generation of new samples while simultaneously training on old samples, leading to faster training and more compute-optimal scaling. However, asynchronous training relies on an underexplored regime, online but off-policy RLHF: learning on samples from previous iterations of our model. To understand the challenges in this regime, we investigate a fundamental question: how much off-policyness can we tolerate for asynchronous training to speed up learning but maintain performance? Among several RLHF algorithms we tested, we find that online DPO is most robust to off-policy data, and robustness increases with the scale of the policy model. We study further compute optimizations for asynchronous RLHF but find that they come at a performance cost, giving rise to a trade-off. Finally, we verify the scalability of asynchronous RLHF by training LLaMA 3.1 8B on an instruction-following task 40% faster than a synchronous run while matching final performance.
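
The abstract singles out online DPO as the most robust to off-policy data. For reference, the standard DPO loss for one preference pair is −log σ(β[(log π(y_w) − log π_ref(y_w)) − (log π(y_l) − log π_ref(y_l))]). In the asynchronous setting, the pair is generated by an older policy snapshot, which is what makes the update off-policy. The sketch below is the textbook per-pair loss, not the authors' training code, and β = 0.1 is an illustrative default.

```python
import math


def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss from sequence log-probabilities.

    The loss is computed the same way whichever policy snapshot sampled the
    (chosen, rejected) pair; in asynchronous RLHF that snapshot may lag the
    policy being trained.
    """
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    # -log(sigmoid(beta * margin)) == softplus(-beta * margin), computed stably
    z = -beta * margin
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


print(dpo_loss(-10.0, -12.0, -10.0, -12.0))  # policy equals reference: log(2)
print(dpo_loss(-8.0, -14.0, -10.0, -12.0))   # chosen response up-weighted: smaller loss
```

The loss depends only on log-probability ratios against a fixed reference. One plausible reading of the abstract's finding is that this lets DPO tolerate samples from stale snapshots; the abstract itself establishes that robustness empirically.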

Updated: 2024-10-23 19:59:50

Fields: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2410.18252v1

Lessons from Learning to Spin "Pens"

In-hand manipulation of pen-like objects is an important skill in our daily lives, as many tools, such as hammers and screwdrivers, are similarly shaped. However, current learning-based methods struggle with this task due to a lack of high-quality demonstrations and the significant gap between simulation and the real world. In this work, we push the boundaries of learning-based in-hand manipulation systems by demonstrating the capability to spin pen-like objects. We first use reinforcement learning to train an oracle policy with privileged information and generate a high-fidelity trajectory dataset in simulation. This serves two purposes: 1) pre-training a sensorimotor policy in simulation, and 2) conducting open-loop trajectory replay in the real world. We then fine-tune the sensorimotor policy on these real-world trajectories to adapt it to real-world dynamics. With fewer than 50 trajectories, our policy learns to rotate more than ten pen-like objects with different physical properties for multiple revolutions. We present a comprehensive analysis of our design choices and share the lessons learned during development.

Updated: 2024-10-23 19:56:39

Fields: cs.RO,cs.AI,cs.LG

Download: http://arxiv.org/abs/2407.18902v2

Efficient Inference for Augmented Large Language Models

Augmented Large Language Models (LLMs) enhance the capabilities of standalone LLMs by integrating external data sources through API calls. In interactive LLM applications, efficient scheduling is crucial for maintaining low request completion times, which directly affect user engagement. However, these augmentations introduce scheduling challenges, because the limited memory for cached information (KV caches) must be managed. As a result, traditional size-based scheduling algorithms, such as Shortest Job First (SJF), become less effective at minimizing completion times. Existing work focuses only on handling requests during API calls by preserving, discarding, or swapping memory, without considering how to schedule requests with API calls. In this paper, we propose LAMPS, a novel inference framework for augmented LLMs. LAMPS minimizes request completion time through a unified scheduling approach that considers both the total length of requests and their handling strategies during API calls. Recognizing that LLM inference is memory-bound, our approach ranks requests by their memory consumption over time. This consumption depends on both the output sizes and how a request is managed during its API calls. To implement this scheduling, LAMPS predicts the strategy that minimizes a request's memory waste during its API calls, aligning with, but improving upon, existing approaches. We also propose starvation prevention techniques and optimizations to mitigate the overhead of our scheduling. We implement LAMPS on top of vLLM and evaluate its performance against baseline LLM inference systems. Compared with the existing augmented-LLM system, LAMPS improves end-to-end latency by 27%-85% and reduces TTFT by 4%-96%, with even greater gains over vLLM.
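
The sketch below is a toy version of memory-over-time ranking under a cost model we assume for illustration; it is not LAMPS's predictor. A request holding `ctx` KV-cache tokens that makes an API call of `api_time` seconds has two options:

- Preserve its cache, holding ctx × api_time token-seconds.
- Discard the cache and rebuild it afterwards, crudely charged as ctx × recompute time.

Swapping is omitted. Each request takes its cheaper option, and requests are ordered by total memory-time rather than by output length alone, as SJF would do. All constants and names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

RECOMPUTE_SEC_PER_TOKEN = 0.001  # hypothetical re-prefill cost per context token
DECODE_SEC_PER_TOKEN = 0.02      # hypothetical decode time per output token


@dataclass
class Request:
    name: str
    ctx: int          # tokens in the KV cache when the API call starts
    output_len: int   # predicted number of tokens still to generate
    api_time: float   # predicted API-call duration in seconds (0 = no call)


def api_waste(r: Request) -> Tuple[str, float]:
    """Cheaper KV-cache handling during the API call, in token-seconds (toy model)."""
    if r.api_time == 0:
        return "none", 0.0
    preserve = r.ctx * r.api_time                        # cache sits idle in memory
    discard = r.ctx * (r.ctx * RECOMPUTE_SEC_PER_TOKEN)  # crude charge for rebuilding it
    return ("preserve", preserve) if preserve <= discard else ("discard", discard)


def memory_time(r: Request) -> float:
    """Approximate KV token-seconds held while decoding, plus the API-call waste."""
    decode = DECODE_SEC_PER_TOKEN * r.output_len * (r.ctx + r.output_len / 2)
    return decode + api_waste(r)[1]


def sjf_order(reqs: List[Request]) -> List[str]:
    return [r.name for r in sorted(reqs, key=lambda r: r.output_len)]


def memory_time_order(reqs: List[Request]) -> List[str]:
    return [r.name for r in sorted(reqs, key=memory_time)]


reqs = [
    Request("A", ctx=2000, output_len=50, api_time=5.0),
    Request("B", ctx=500, output_len=200, api_time=0.0),
]
print(sjf_order(reqs))          # SJF runs the short-output request A first
print(memory_time_order(reqs))  # memory-time ranking runs B first
print(api_waste(reqs[0]))       # A is cheaper to discard than to keep during its call
```

In this toy model, A's large context and long API call make it memory-expensive despite its short output. SJF's ranking ignores exactly this case.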

Updated: 2024-10-23 19:53:30

Fields: cs.LG,cs.AI

Download: http://arxiv.org/abs/2410.18248v1

Strongly-polynomial time and validation analysis of policy gradient methods

This paper proposes a novel termination criterion, termed the advantage gap function, for finite state and action Markov decision processes (MDPs) and reinforcement learning (RL). We incorporate this advantage gap function into the design of step-size rules and derive a new linear rate of convergence that is independent of the stationary state distribution of the optimal policy. With these, we demonstrate that policy gradient methods can solve MDPs in strongly-polynomial time. To the best of our knowledge, this is the first time such strong convergence properties have been established for policy gradient methods. Moreover, in the stochastic setting, where only stochastic estimates of policy gradients are available, we show two further properties. The advantage gap function closely approximates the optimality gap for each individual state, and it converges at a sublinear rate at every state. The advantage gap function can be easily estimated in the stochastic case. Together with easily computable upper bounds on policy values, it provides a convenient way to validate the solutions generated by policy gradient methods. Our developments therefore offer a principled and computable measure of optimality for RL, whereas current practice tends to rely on algorithm-to-algorithm or baseline comparisons with no certificate of optimality.
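
The abstract does not state the formula for the advantage gap function. The sketch below assumes a natural form for reward maximization: gap(s) = max_a Q^π(s, a) − V^π(s), the largest one-step improvement available at state s. This quantity is nonnegative, and it is zero at every state exactly when π is optimal. It is computed here with exact (iterative) policy evaluation on a tiny hand-made MDP, not with the paper's stochastic estimators. Names and the example MDP are ours.

```python
from typing import List, Tuple

# P[s][a] is a list of (probability, next_state); R[s][a] is the expected reward.
Transitions = List[List[List[Tuple[float, int]]]]
Rewards = List[List[float]]
Policy = List[List[float]]  # policy[s][a] = probability of taking action a in state s


def q_value(s: int, a: int, V: List[float], P: Transitions, R: Rewards, gamma: float) -> float:
    return R[s][a] + gamma * sum(p * V[t] for p, t in P[s][a])


def evaluate(policy: Policy, P: Transitions, R: Rewards, gamma: float,
             iters: int = 2000) -> List[float]:
    """Iterative policy evaluation of V^pi."""
    V = [0.0] * len(P)
    for _ in range(iters):
        V = [
            sum(policy[s][a] * q_value(s, a, V, P, R, gamma) for a in range(len(P[s])))
            for s in range(len(P))
        ]
    return V


def advantage_gap(policy: Policy, P: Transitions, R: Rewards, gamma: float = 0.9) -> List[float]:
    """Assumed form: gap(s) = max_a Q^pi(s, a) - V^pi(s)."""
    V = evaluate(policy, P, R, gamma)
    return [
        max(q_value(s, a, V, P, R, gamma) for a in range(len(P[s]))) - V[s]
        for s in range(len(P))
    ]


# Two states, two actions; staying in state 1 earns reward 1 per step.
P = [[[(1.0, 0)], [(1.0, 1)]],   # state 0: a0 stays, a1 moves to state 1
     [[(1.0, 1)], [(1.0, 0)]]]   # state 1: a0 stays, a1 moves to state 0
R = [[0.0, 0.0], [1.0, 0.0]]

lazy = [[1.0, 0.0], [1.0, 0.0]]     # always action 0: never leaves state 0
optimal = [[0.0, 1.0], [1.0, 0.0]]  # move to state 1, then stay
print(advantage_gap(lazy, P, R))     # about [9.0, 0.0]: state 0 is far from optimal
print(advantage_gap(optimal, P, R))  # about [0.0, 0.0]
```

Because the gap is reported per state, it shows where a policy is suboptimal, not just whether it is. This matches the per-state validation role described in the abstract.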

Updated: 2024-10-23 19:40:50

Fields: cs.LG,cs.AI,cs.DS,math.OC,49K45, 49M05, 90C05, 90C26, 90C40, 90C46

Download: http://arxiv.org/abs/2409.19437v2

Two-Step Offline Preference-Based Reinforcement Learning with Constrained Actions

Preference-based reinforcement learning (PBRL) in the offline setting has achieved great success in industrial applications such as chatbots. A two-step learning framework, in which a reinforcement learning step follows a reward modeling step, has been widely adopted for this problem. However, such a method faces two challenges: the risk of reward hacking and the complexity of reinforcement learning. Our key insight is that both challenges stem from state-actions not supported by the dataset. Such state-actions are unreliable and increase the complexity of the reinforcement learning problem in the second step. Based on this insight, we develop a novel two-step learning method called PRC: preference-based reinforcement learning with constrained actions. The high-level idea is to restrict the reinforcement learning agent to optimizing over a constrained action space that excludes out-of-distribution state-actions. We empirically verify that our method achieves high learning efficiency on various datasets in robotic control environments.
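
The sketch below illustrates the action-constraint idea. We assume discrete states and actions, and a reward model already learned in step one (a hand-written table stands in for it here). The greedy step in the second stage considers only actions that appear together with the state in the offline dataset. As a result, an over-optimistic reward on an unseen action cannot be exploited. The support test (`min_count`), the `None` return for unseen states, and all names are hypothetical. PRC's actual construction of the constrained action space may differ.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, Optional, Set, Tuple


def supported_actions(dataset: Iterable[Tuple[Hashable, Hashable]],
                      min_count: int = 1) -> Dict[Hashable, Set[Hashable]]:
    """Actions observed with each state at least `min_count` times in the offline data."""
    counts: Dict[Hashable, Dict[Hashable, int]] = defaultdict(lambda: defaultdict(int))
    for state, action in dataset:
        counts[state][action] += 1
    return {s: {a for a, c in acts.items() if c >= min_count} for s, acts in counts.items()}


def constrained_greedy(state: Hashable, actions: List[Hashable],
                       reward_model: Callable[[Hashable, Hashable], float],
                       support: Dict[Hashable, Set[Hashable]]) -> Optional[Hashable]:
    """Greedy action under the learned reward, restricted to in-distribution actions."""
    allowed = [a for a in actions if a in support.get(state, set())]
    if not allowed:
        return None  # state never seen in the data: no trustworthy choice
    return max(allowed, key=lambda a: reward_model(state, a))


# Stand-in for the step-one reward model, wrongly optimistic about the unseen "jump".
learned_reward = {("s0", "left"): 0.0, ("s0", "right"): 1.0, ("s0", "jump"): 100.0}


def reward_model(state: Hashable, action: Hashable) -> float:
    return learned_reward.get((state, action), 0.0)


data = [("s0", "left"), ("s0", "right"), ("s1", "left")]
support = supported_actions(data)
actions = ["left", "right", "jump"]
print(max(actions, key=lambda a: reward_model("s0", a)))          # unconstrained: "jump"
print(constrained_greedy("s0", actions, reward_model, support))  # constrained: "right"
```

The unconstrained maximizer picks the action the reward model knows nothing about, which is the reward-hacking failure the abstract describes. The constrained one stays within the data.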

Updated: 2024-10-23 19:38:34

Fields: cs.LG,cs.AI

Download: http://arxiv.org/abs/2401.00330v2

Human-Agent Coordination in Games under Incomplete Information via Multi-Step Intent

Strategic coordination between autonomous agents and human partners under incomplete information can be modeled as turn-based cooperative games. We extend a turn-based game under incomplete information, the shared-control game, to allow players to take multiple actions per turn rather than a single action. The extension enables the use of multi-step intent, which we hypothesize will improve performance in long-horizon tasks. To synthesize cooperative policies for the agent in this extended game, we propose an approach with two components. The first is a memory module that maintains a running probabilistic belief about the environment dynamics. The second is an online planning algorithm called IntentMCTS. This algorithm strategically selects the next action by leveraging any communicated multi-step intent via reward augmentation while considering the current belief. Agent-to-agent simulations in the Gnomes at Night testbed demonstrate that IntentMCTS requires fewer steps and control switches than baseline methods. A human-agent user study corroborates these findings, showing an 18.52% higher success rate than the heuristic baseline and a 5.56% improvement over the single-step prior work. Participants also report lower cognitive load and frustration, and higher satisfaction, with the IntentMCTS agent partner.

Updated: 2024-10-23 19:37:19

Fields: cs.AI,cs.HC

Download: http://arxiv.org/abs/2410.18242v1

Characterising Open Source Co-opetition in Company-hosted Open Source Software Projects: The Cases of PyTorch, TensorFlow, and Transformers

Companies, including market rivals, have long collaborated on the development of open source software (OSS), resulting in a tangle of co-operation and competition known as "open source co-opetition". While prior work investigates open source co-opetition in OSS projects that are hosted by vendor-neutral foundations, we have a limited understanding thereof in OSS projects that are hosted and governed by one company. Given their prevalence, it is timely to investigate open source co-opetition in such contexts. Towards this end, we conduct a mixed-methods analysis of three company-hosted OSS projects in the artificial intelligence (AI) industry: Meta's PyTorch (prior to its donation to the Linux Foundation), Google's TensorFlow, and Hugging Face's Transformers. We contribute three key findings. First, while the projects exhibit similar code authorship patterns between host and external companies (80%/20% of commits), collaborations are structured differently (e.g., decentralised vs. hub-and-spoke networks). Second, host and external companies engage in strategic, non-strategic, and contractual collaborations, with varying incentives and collaboration practices. Some of the observed collaborations are specific to the AI industry (e.g., hardware-software optimizations or AI model integrations), while others are typical of the broader software industry (e.g., bug fixing or task outsourcing). Third, single-vendor governance creates a power imbalance that influences open source co-opetition practices and possibilities, from the host company's singular decision-making power (e.g., the risk of license change) to their community involvement strategy (e.g., from over-control to over-delegation). We conclude with recommendations for future research.

Updated: 2024-10-23 19:35:41

Fields: cs.SE,cs.AI,cs.CY

Download: http://arxiv.org/abs/2410.18241v1

E2E-Swin-Unet++: An Enhanced End-to-End Swin-Unet Architecture With Dual Decoders For PTMC Segmentation

Efficiently managing papillary thyroid microcarcinoma (PTMC) while minimizing patient discomfort poses a significant clinical challenge. Radiofrequency ablation (RFA) offers a less invasive alternative to surgery and radiation therapy for PTMC treatment, with shorter recovery times and less pain. As an image-guided procedure, RFA generates localized heat by delivering high-frequency electrical currents through electrodes to the targeted area under ultrasound imaging guidance. However, accurate guidance with current ultrasound B-mode imaging still demands considerable precision and skill from operators, which remains a significant challenge. To address these challenges, we develop a novel AI segmentation model, E2E-Swin-Unet++. This model enhances ultrasound B-mode imaging by enabling real-time identification and segmentation of PTMC tumors and monitoring of the region of interest for precise targeting during treatment. E2E-Swin-Unet++ is an advanced end-to-end extension of the Swin-Unet architecture. It incorporates thyroid region information to minimize the risk of false PTMC segmentation while providing fast inference. Experimental results on a real clinical RFA dataset demonstrate the superior performance of E2E-Swin-Unet++ compared to related models. Our proposed solution significantly improves the precision and control of RFA treatment by enabling real-time identification and segmentation of PTMC margins during the procedure.

Updated: 2024-10-23 19:33:33

Fields: eess.IV,cs.AI,cs.CV

Download: http://arxiv.org/abs/2410.18239v1

Bayesian optimization for robust robotic grasping using a sensorized compliant hand

One of the first tasks we learn as children is to grasp objects based on tactile perception. Incorporating such a skill into robots would enable multiple applications, such as increasing flexibility in industrial processes or providing assistance to people with physical disabilities. The difficulty, however, lies in adapting grasping strategies to a large variety of tasks and objects, which are often unknown. The brute-force solution, learning new grasps by trial and error, is inefficient and ineffective. In contrast, Bayesian optimization applies active learning, adding information to the approximation of an optimal grasp. This paper proposes the use of Bayesian optimization techniques to perform robotic grasping safely. We analyze different grasp metrics to provide realistic grasp optimization in a real system that includes tactile sensors. An experimental evaluation on the robotic system shows that the method is useful for grasping unknown objects, even in the presence of the noise and uncertainty inherent in real-world environments.

Updated: 2024-10-23 19:33:14

Fields: cs.RO,cs.AI

Download: http://arxiv.org/abs/2410.18237v1

Universal Black-Box Reward Poisoning Attack against Offline Reinforcement Learning

We study the problem of universal black-box reward poisoning attacks against general offline reinforcement learning with deep neural networks. We consider a black-box threat model in which the attacker is entirely oblivious to the learning algorithm. The attacker's budget is limited by constraining both the amount of corruption at each data point and the total perturbation. We require the attack to be universally efficient against any efficient algorithm the agent might use. We propose an attack strategy called the `policy contrast attack'. The idea is to find low- and high-performing policies covered by the dataset and make them appear to the agent as high- and low-performing, respectively. To the best of our knowledge, ours is the first universal black-box reward poisoning attack in the general offline RL setting. We provide theoretical insights on the attack design and show empirically that our attack is efficient against current state-of-the-art offline RL algorithms across different learning datasets.

Updated: 2024-10-23 19:31:22

Fields: cs.LG,cs.AI,cs.CR

Download: http://arxiv.org/abs/2402.09695v2

Multi-Draft Speculative Sampling: Canonical Architectures and Theoretical Limits

We consider multi-draft speculative sampling, where the proposal sequences are sampled independently from different draft models. At each step, a token-level draft selection scheme takes a list of valid tokens as input and produces an output token whose distribution matches that of the target model. Previous works have demonstrated that the optimal scheme (which maximizes the probability of accepting one of the input tokens) can be cast as the solution to a linear program. In this work, we show that the optimal scheme can be decomposed into a two-step solution. In the first step, an importance sampling (IS) type scheme is used to select one intermediate token. In the second step, (single-draft) speculative sampling is applied to generate the output token. For the case of two identical draft models, we further 1) establish a necessary and sufficient condition on the distributions of the target and draft models for the acceptance probability to equal one, and 2) provide an explicit expression for the optimal acceptance probability. Our theoretical analysis also motivates a new class of token-level selection schemes based on weighted importance sampling. Our experimental results demonstrate consistent improvements in the achievable block efficiency and token rates over baseline schemes in a number of scenarios.
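
The second step of the decomposition in the abstract is ordinary single-draft speculative sampling. The sketch below implements its token-level rule for explicit distributions over a small vocabulary. The draft token x is accepted with probability min(1, p(x)/q(x)); otherwise a replacement is sampled from the normalized residual max(0, p − q). The `output_distribution` helper computes the resulting output law in closed form, which confirms that it equals the target p. The first (importance-sampling) step and the multi-draft selection are not shown, and the function names are ours.

```python
import random
from typing import List, Sequence


def residual(p: Sequence[float], q: Sequence[float]) -> List[float]:
    """Normalized max(0, p - q); equals p when p == q (rejection then never happens)."""
    r = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    z = sum(r)
    return [x / z for x in r] if z > 0 else list(p)


def speculative_step(draft_token: int, p: Sequence[float], q: Sequence[float],
                     rng: random.Random) -> int:
    """Accept the draft token or resample; if draft_token ~ q, the output is ~ p."""
    if q[draft_token] > 0 and rng.random() < min(1.0, p[draft_token] / q[draft_token]):
        return draft_token
    return rng.choices(range(len(p)), weights=residual(p, q), k=1)[0]


def output_distribution(p: Sequence[float], q: Sequence[float]) -> List[float]:
    """Exact law of speculative_step's output when the draft token is drawn from q."""
    accept = [min(pi, qi) for pi, qi in zip(p, q)]  # q(x) * min(1, p(x)/q(x))
    reject_mass = 1.0 - sum(accept)
    return [a + reject_mass * r for a, r in zip(accept, residual(p, q))]


p = [0.5, 0.3, 0.2]  # target model
q = [0.2, 0.5, 0.3]  # draft model
print(output_distribution(p, q))  # equals p up to floating-point rounding

rng = random.Random(0)
n = 100_000
counts = [0, 0, 0]
for _ in range(n):
    x = rng.choices(range(3), weights=q, k=1)[0]  # draft proposal
    counts[speculative_step(x, p, q, rng)] += 1
print([c / n for c in counts])  # empirically close to p
```

The acceptance probability of this single-draft step is the sum of min(p, q) over the vocabulary, which is 0.7 in this example. The abstract's multi-draft analysis is about raising this probability when several draft tokens are available.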

Updated: 2024-10-23 19:28:34

Fields: cs.CL,cs.DC,cs.IT,cs.LG,math.IT

Download: http://arxiv.org/abs/2410.18234v1

DMTG: A Human-Like Mouse Trajectory Generation Bot Based on Entropy-Controlled Diffusion Networks

CAPTCHAs protect against resource misuse and data theft by distinguishing human activity from automated bots. Advances in machine learning have made traditional image and text-based CAPTCHAs vulnerable to attacks, leading modern CAPTCHAs, such as GeeTest and Akamai, to incorporate behavioral analysis like mouse trajectory detection. Existing bypass techniques struggle to fully mimic human behavior, making it difficult to evaluate the effectiveness of anti-bot measures. To address this, we propose a diffusion model-based mouse trajectory generation framework (DMTG), which controls trajectory complexity and produces realistic human-like mouse movements. DMTG also provides white-box and black-box testing methods to assess its ability to bypass CAPTCHA systems. In experiments, DMTG reduces bot detection accuracy by 4.75%-9.73% compared to other models. Additionally, it mimics physical human behaviors, such as slow initiation and directional force differences, demonstrating improved performance in both simulation and real-world CAPTCHA scenarios.

Updated: 2024-10-23 19:27:30

Fields: cs.CR

Download: http://arxiv.org/abs/2410.18233v1

Binary Reward Labeling: Bridging Offline Preference and Reward-Based Reinforcement Learning

Offline reinforcement learning has become one of the most practical RL settings. However, most existing work on offline RL focuses on the standard setting with scalar reward feedback. It remains unknown how to universally transfer the existing rich understanding of offline RL from the reward-based to the preference-based setting. In this work, we propose a general framework to bridge this gap. Our key insight is to transform preference feedback into scalar rewards via binary reward labeling (BRL); any reward-based offline RL algorithm can then be applied to the reward-labeled dataset. In practical learning scenarios, binary reward labeling minimizes the information lost when converting the feedback signal. We theoretically show the connection between several recent preference-based RL (PBRL) techniques and our framework combined with specific offline RL algorithms. By combining reward labeling with different algorithms, our framework can lead to new and potentially more efficient offline PBRL algorithms. We empirically test our framework on preference datasets based on the standard D4RL benchmark. When combined with a variety of efficient reward-based offline RL algorithms, the learning results achieved under our framework are comparable to training the same algorithm on the dataset with actual rewards in many cases, and better than recent PBRL baselines in most cases.
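
A minimal sketch of the relabeling idea, under an assumed labeling rule: every transition of the preferred trajectory in a pair gets reward 1 and every transition of the other gets reward 0. The authors' exact rule may differ; the point is that the output is an ordinary `(transition, reward)` dataset that any reward-based offline RL algorithm can consume.

```python
def binary_reward_labels(pairs):
    """pairs: list of (traj_a, traj_b, pref), where each trajectory is a list
    of transitions and pref == 0 means traj_a is preferred, 1 means traj_b.
    Returns a flat list of (transition, reward) tuples (hypothetical rule)."""
    labeled = []
    for traj_a, traj_b, pref in pairs:
        win, lose = (traj_a, traj_b) if pref == 0 else (traj_b, traj_a)
        labeled += [(t, 1.0) for t in win]   # preferred segment -> reward 1
        labeled += [(t, 0.0) for t in lose]  # other segment -> reward 0
    return labeled

print(binary_reward_labels([(["s0", "s1"], ["s2"], 1)]))
# [('s2', 1.0), ('s0', 0.0), ('s1', 0.0)]
```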

Updated: 2024-10-23 19:26:47

Fields: cs.LG

Download: http://arxiv.org/abs/2406.10445v3

Assessment of Developmental Dysgraphia Utilising a Display Tablet

Even though computerised assessment of developmental dysgraphia (DD) based on online handwriting processing is increasingly popular, most solutions rely on a setup in which a child writes on paper fixed to a digitizing tablet connected to a computer. Although this approach allows the standard way of writing with an inking pen, it is difficult for children to administer by themselves. The main goal of this study is therefore to explore whether quantitative analysis of online handwriting recorded via a display-screen tablet can also sufficiently support the assessment of DD. For this study, we enrolled 144 children (attending the 3rd and 4th grades of primary school) whose handwriting proficiency was assessed by a special education counsellor and who also assessed themselves using the Handwriting Proficiency Screening Questionnaire for Children (HPSQ-C). Using machine learning models based on a gradient-boosting algorithm, we were able to support the DD diagnosis with up to 83.6% accuracy. The HPSQ-C total score was estimated with a minimum error of 10.34%. Children with DD spent significantly more time in the air, had a higher number of pen elevations, produced taller on-surface strokes, had a lower in-air tempo, and showed higher variation in angular velocity. Although this study shows that DD assessment via display tablets is promising, it also highlights that modelling subjective scores is challenging and that a complex, data-driven quantification of DD manifestations is needed.
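
Two of the reported features, total in-air time and the number of pen elevations, can be computed directly from the tablet's sample stream. The sketch assumes a simplified input of `(timestamp_seconds, on_surface)` pairs; real display-tablet recordings carry more channels (coordinates, pressure, tilt), and the study's exact feature definitions are not given in the abstract.

```python
def pen_features(samples):
    """samples: time-ordered list of (t_seconds, on_surface: bool).
    Returns (total in-air time, number of pen elevations)."""
    in_air = 0.0
    elevations = 0
    for (t0, on0), (t1, on1) in zip(samples, samples[1:]):
        if not on0:
            in_air += t1 - t0   # interval is attributed to the state at its start
        if on0 and not on1:
            elevations += 1     # pen lifted: on-surface -> in-air transition
    return in_air, elevations

samples = [(0.0, True), (0.5, True), (1.0, False),
           (1.5, False), (2.0, True), (2.5, False)]
print(pen_features(samples))  # (1.0, 2)
```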

Updated: 2024-10-23 19:24:58

Fields: cs.HC,cs.LG

Download: http://arxiv.org/abs/2410.18230v1

Hotel Booking Cancellation Prediction Using Applied Bayesian Models

This study applies Bayesian models to predict hotel booking cancellations, a key challenge affecting resource allocation, revenue, and customer satisfaction in the hospitality industry. Using a Kaggle dataset with 36,285 observations and 17 features, Bayesian Logistic Regression and Beta-Binomial models were implemented. The logistic model, applied to 12 features and 5,000 randomly selected observations, outperformed the Beta-Binomial model in predictive accuracy. Key predictors included the number of adults, children, stay duration, lead time, car parking space, room type, and special requests. Model evaluation using Leave-One-Out Cross-Validation (LOO-CV) confirmed strong alignment between observed and predicted outcomes, demonstrating the model's robustness. Special requests and parking availability were found to be the strongest predictors of cancellation. This Bayesian approach provides a valuable tool for improving booking management and operational efficiency in the hotel industry.
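
For the Beta-Binomial model, the posterior over the cancellation rate has a closed form. The sketch below assumes a uniform Beta(1, 1) prior; the counts are made up for illustration and are not taken from the Kaggle dataset.

```python
def beta_binomial_posterior(cancelled, total, a=1.0, b=1.0):
    """Conjugate update: Beta(a, b) prior on the cancellation rate and a
    Binomial likelihood for `cancelled` cancellations out of `total` bookings."""
    a_post = a + cancelled
    b_post = b + (total - cancelled)
    mean = a_post / (a_post + b_post)  # posterior mean cancellation rate
    return a_post, b_post, mean

a_post, b_post, mean = beta_binomial_posterior(30, 100)
print(a_post, b_post, round(mean, 3))  # 31.0 71.0 0.304
```

Unlike the logistic model, this treats every booking as sharing one cancellation rate, which is consistent with it being the less accurate of the two.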

Updated: 2024-10-23 19:13:31

Fields: cs.LG,cs.AI

Download: http://arxiv.org/abs/2410.16406v2

Disjointness Violations in Wikidata

Disjointness checks are among the most important constraint checks in a knowledge base and can be used to help detect and correct incorrect statements and internal contradictions. Wikidata is a very large, community-managed knowledge base. Because of both its size and construction, Wikidata contains many incorrect statements and internal contradictions. We analyze the current modeling of disjointness on Wikidata, identify patterns that cause these disjointness violations and categorize them. We use SPARQL queries to identify each "culprit" causing a disjointness violation and lay out formulas to identify and fix conflicting information. We finally discuss how disjointness information could be better modeled and expanded in Wikidata in the future.
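
The paper locates violations with SPARQL queries against Wikidata. As an offline illustration of the same check, the sketch below works on a small in-memory graph: an item violates a disjointness statement if, following instance-of and then subclass-of edges transitively, it falls under two classes declared disjoint. The class and item names are hypothetical, and no Wikidata property IDs are assumed.

```python
def superclasses(cls, subclass_of):
    """All classes reachable from cls via subclass-of edges (including cls)."""
    seen, stack = {cls}, [cls]
    while stack:
        for parent in subclass_of.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def disjointness_violations(instance_of, subclass_of, disjoint_pairs):
    """Return (item, class_a, class_b) for items under two disjoint classes."""
    violations = []
    for item, classes in instance_of.items():
        types = set()
        for c in classes:
            types |= superclasses(c, subclass_of)
        for a, b in disjoint_pairs:
            if a in types and b in types:
                violations.append((item, a, b))
    return violations

subclass_of = {"city": ["settlement"], "settlement": ["geographic entity"],
               "river": ["body of water"]}
instance_of = {"Riverside": ["city", "river"], "Springfield": ["city"]}
print(disjointness_violations(instance_of, subclass_of,
                              [("settlement", "body of water")]))
# [('Riverside', 'settlement', 'body of water')]
```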

Updated: 2024-10-23 19:12:05

Fields: cs.AI,cs.IR

Download: http://arxiv.org/abs/2410.13707v2

CollabStory: Multi-LLM Collaborative Story Generation and Authorship Analysis

The rise of unifying frameworks that enable seamless interoperability of Large Language Models (LLMs) has made LLM-LLM collaboration for open-ended tasks a possibility. Despite this, there have not been efforts to explore such collaborative writing. We take the next step beyond human-LLM collaboration to explore this multi-LLM scenario by generating the first exclusively LLM-generated collaborative stories dataset, called CollabStory. We focus on single-author ($N=1$) to multi-author (up to $N=5$) scenarios, where multiple LLMs co-author stories. We generate over 32k stories using open-source instruction-tuned LLMs. Further, we take inspiration from the PAN tasks that have set the standard for human-human multi-author writing tasks and analysis. We extend their authorship-related tasks to multi-LLM settings and present baselines for LLM-LLM collaboration. We find that current baselines are not able to handle this emerging scenario. Thus, CollabStory is a resource that could help propel an understanding as well as the development of techniques to discern the use of multiple LLMs. This is crucial to study in the context of writing tasks since LLM-LLM collaboration could potentially overwhelm ongoing challenges related to plagiarism detection, credit assignment, maintaining academic integrity in educational settings, and addressing copyright infringement concerns. We make our dataset and code available at https://github.com/saranya-venkatraman/multi_llm_story_writing.

Updated: 2024-10-23 19:05:12

Fields: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.12665v2

Data Augmentation for Automated Adaptive Rodent Training

Fully optimized automation of behavioral training protocols for lab animals such as rodents has long been a coveted goal for researchers. Without automation, training is a labor-intensive and time-consuming process that demands close interaction between the animal and the researcher. In this work, we used a data-driven approach to optimize the way rodents are trained in labs. To this end, we looked at data augmentation, a technique that scales well in data-poor environments. Using data augmentation, we built several artificial rodent models, which can in turn be used to build an efficient, automatic trainer. We then developed a novel similarity metric based on the action probability distribution to measure how closely the behavior of our models resembles that of real rodents.
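
The abstract does not define the new similarity metric beyond its basis in action probability distributions. As a stand-in, the sketch scores behavioral resemblance as one minus the Jensen-Shannon divergence (base 2, so it lies in [0, 1]) between a model's and a real rodent's action distributions, averaged over matched trials. This choice of divergence is an assumption, not the paper's definition.

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits between two discrete distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    def kl(a, b):
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def behavioral_similarity(model_probs, rodent_probs):
    """Mean of 1 - JSD over matched trials; 1.0 means identical action choices."""
    scores = [1.0 - js_divergence(p, q) for p, q in zip(model_probs, rodent_probs)]
    return sum(scores) / len(scores)

# Trial 1 matches exactly, trial 2 is completely opposite.
print(behavioral_similarity([[0.7, 0.3], [1.0, 0.0]],
                            [[0.7, 0.3], [0.0, 1.0]]))  # 0.5
```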

Updated: 2024-10-23 18:51:11

Fields: cs.AI

Download: http://arxiv.org/abs/2410.18221v1

FLUX: Fast Software-based Communication Overlap On GPUs Through Kernel Fusion

Large deep learning models have demonstrated a strong ability to solve many tasks across a wide range of applications. These large models typically require distributed training and inference. Tensor parallelism is a common technique that partitions the computation of an operation or layer across devices to overcome the memory capacity limitation of a single processor and/or to accelerate computation to meet a certain latency requirement. However, this kind of parallelism introduces additional communication that can account for a significant portion of the overall runtime. This limits the scalability of the technique within a group of devices with high-speed interconnects, such as NVLink-connected GPUs in a node. This paper proposes a novel method, Flux, to significantly hide communication latencies behind dependent computations on GPUs. Flux over-decomposes communication and computation operations into much finer-grained operations and further fuses them into a larger kernel to effectively hide communication without compromising kernel efficiency. Flux can potentially overlap up to 96% of communication given a fused kernel. Overall, it achieves up to 1.24x speedups for training over Megatron-LM on a cluster of 128 GPUs with various GPU generations and interconnects, and up to 1.66x and 1.30x speedups for prefill and decoding inference over vLLM on a cluster of 8 GPUs with various GPU generations and interconnects.
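
Why over-decomposition helps can be seen with a toy two-stream timing model: if computation on chunk i may start as soon as chunk i's communication has arrived, splitting both operations into more chunks lets them overlap. The sketch ignores per-chunk launch and efficiency overheads, which are exactly what Flux's kernel fusion aims to keep small; all timings are illustrative.

```python
def makespan(comm_time, comp_time, chunks):
    """Total time for a communication step followed by dependent computation,
    each split into `chunks` equal pieces running on separate streams."""
    c, m = comm_time / chunks, comp_time / chunks
    comm_done = comp_done = 0.0
    for _ in range(chunks):
        comm_done += c                             # chunk i has arrived
        comp_done = max(comp_done, comm_done) + m  # compute waits for its chunk
    return comp_done

print(makespan(4.0, 8.0, 1), makespan(4.0, 8.0, 4))  # 12.0 9.0
```

With one chunk, communication and computation run back to back; with more chunks the total approaches the larger of the two plus one chunk of the smaller.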

Updated: 2024-10-23 18:45:33

Fields: cs.LG,cs.DC

Download: http://arxiv.org/abs/2406.06858v5

Optimizing the role of human evaluation in LLM-based spoken document summarization systems

The emergence of powerful LLMs has led to a paradigm shift in abstractive summarization of spoken documents. The properties that make LLMs so valuable for this task -- creativity, ability to produce fluent speech, and ability to abstract information from large corpora -- also present new challenges to evaluating their content. Quick, cost-effective automatic evaluations such as ROUGE and BERTScore offer promise, but do not yet show competitive performance when compared to human evaluations. We draw on methodologies from the social sciences to propose an evaluation paradigm for spoken document summarization explicitly tailored for generative AI content. We provide detailed evaluation criteria and best practices guidelines to ensure robustness in the experimental design, replicability, and trustworthiness of human evaluation studies. We additionally include two case studies that show how these human-in-the-loop evaluation methods have been implemented at a major U.S. technology company.

Updated: 2024-10-23 18:37:14

Fields: cs.AI,cs.CL,cs.SD,eess.AS

Download: http://arxiv.org/abs/2410.18218v1

Neural Cover Selection for Image Steganography

In steganography, selecting an optimal cover image, referred to as cover selection, is pivotal for effective message concealment. Traditional methods have typically employed exhaustive searches to identify images that conform to specific perceptual or complexity metrics. However, the relationship between these metrics and the actual message hiding efficacy of an image is unclear, often yielding less-than-ideal steganographic outcomes. Inspired by recent advancements in generative models, we introduce a novel cover selection framework, which involves optimizing within the latent space of pretrained generative models to identify the most suitable cover images, distinguishing itself from traditional exhaustive search methods. Our method shows significant advantages in message recovery and image quality. We also conduct an information-theoretic analysis of the generated cover images, revealing that message hiding predominantly occurs in low-variance pixels, reflecting the waterfilling algorithm's principles in parallel Gaussian channels. Our code can be found at: https://github.com/karlchahine/Neural-Cover-Selection-for-Image-Steganography.
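
For reference, the classic water-filling allocation for parallel Gaussian channels gives each channel power max(0, mu - N_i), where N_i is its noise variance and the water level mu is chosen so the powers sum to the budget; low-noise channels receive the most. A minimal bisection sketch with illustrative numbers follows; how the paper maps pixels onto channels is not reproduced here.

```python
def water_filling(noise, total_power, iters=100):
    """Power allocation maximizing sum 0.5*log2(1 + P_i/N_i) subject to
    sum P_i = total_power: P_i = max(0, mu - N_i), with mu found by bisection."""
    lo, hi = min(noise), max(noise) + total_power  # bracket the water level
    for _ in range(iters):
        mu = (lo + hi) / 2
        used = sum(max(0.0, mu - n) for n in noise)
        if used > total_power:
            hi = mu
        else:
            lo = mu
    mu = (lo + hi) / 2
    return [max(0.0, mu - n) for n in noise]

print([round(x, 6) for x in water_filling([1.0, 2.0, 4.0], 3.0)])  # [2.0, 1.0, 0.0]
```

The noisiest channel gets nothing because the water level (3.0) never reaches its noise floor (4.0).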

Updated: 2024-10-23 18:32:34

Fields: cs.AI

Download: http://arxiv.org/abs/2410.18216v1

Advancing NLP Security by Leveraging LLMs as Adversarial Engines

This position paper proposes a novel approach to advancing NLP security by leveraging Large Language Models (LLMs) as engines for generating diverse adversarial attacks. Building upon recent work demonstrating LLMs' effectiveness in creating word-level adversarial examples, we argue for expanding this concept to encompass a broader range of attack types, including adversarial patches, universal perturbations, and targeted attacks. We posit that LLMs' sophisticated language understanding and generation capabilities can produce more effective, semantically coherent, and human-like adversarial examples across various domains and classifier architectures. This paradigm shift in adversarial NLP has far-reaching implications, potentially enhancing model robustness, uncovering new vulnerabilities, and driving innovation in defense mechanisms. By exploring this new frontier, we aim to contribute to the development of more secure, reliable, and trustworthy NLP systems for critical applications.

Updated: 2024-10-23 18:32:03

Fields: cs.AI,cs.CL

Download: http://arxiv.org/abs/2410.18215v1

Towards Understanding the Fragility of Multilingual LLMs against Fine-Tuning Attacks

Recent advancements in Large Language Models (LLMs) have sparked widespread concerns about their safety. Recent work demonstrates that safety alignment of LLMs can be easily removed by fine-tuning with a few adversarially chosen instruction-following examples, i.e., fine-tuning attacks. We take a further step to understand fine-tuning attacks in multilingual LLMs. We first discover cross-lingual generalization of fine-tuning attacks: using a few adversarially chosen instruction-following examples in one language, multilingual LLMs can also be easily compromised (e.g., multilingual LLMs fail to refuse harmful prompts in other languages). Motivated by this finding, we hypothesize that safety-related information is language-agnostic and propose a new method termed Safety Information Localization (SIL) to identify the safety-related information in the model parameter space. Through SIL, we validate this hypothesis and find that changing only 20% of the weight parameters in fine-tuning attacks can break safety alignment across all languages. Furthermore, we provide evidence for the alternative-pathways hypothesis to explain why freezing safety-related parameters does not prevent fine-tuning attacks, and we demonstrate that our attack vector can still jailbreak LLMs adapted to new languages.
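
The abstract does not detail how SIL scores parameters. Purely to illustrate what "changing only 20% of the weight parameters" means, the sketch keeps the fine-tuning update on the 20% of parameters whose values changed most and reverts the rest. Ranking by absolute change is an assumption, and a real model would be handled per tensor rather than as a flat list.

```python
def top_fraction_changed(before, after, fraction=0.2):
    """Indices of the `fraction` of parameters with the largest absolute
    change between two checkpoints (ties broken by lower index)."""
    deltas = [abs(a - b) for a, b in zip(after, before)]
    k = max(1, round(fraction * len(deltas)))
    ranked = sorted(range(len(deltas)), key=lambda i: (-deltas[i], i))
    return sorted(ranked[:k])

def restrict_update(before, after, keep):
    """Apply the fine-tuning update only at the selected indices."""
    keep = set(keep)
    return [a if i in keep else b for i, (b, a) in enumerate(zip(before, after))]

before = [0.0] * 10
after = [0.1, 0.0, 0.5, 0.0, 0.0, 0.9, 0.0, 0.0, 0.0, 0.2]
keep = top_fraction_changed(before, after)
print(keep)                                  # [2, 5]
print(restrict_update(before, after, keep))  # only indices 2 and 5 updated
```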

Updated: 2024-10-23 18:27:36

Fields: cs.CL,cs.AI,cs.CR,cs.LG

Download: http://arxiv.org/abs/2410.18210v1

Automated Defect Detection and Grading of Piarom Dates Using Deep Learning

Grading and quality control of Piarom dates, a premium and high-value variety cultivated predominantly in Iran, present significant challenges due to the complexity and variability of defects, as well as the absence of specialized automated systems tailored to this fruit. Traditional manual inspection methods are labor-intensive, time-consuming, and prone to human error, while existing AI-based sorting solutions are insufficient for addressing the nuanced characteristics of Piarom dates. In this study, we propose an innovative deep learning framework designed specifically for the real-time detection, classification, and grading of Piarom dates. Leveraging a custom dataset comprising over 9,900 high-resolution images annotated across 11 distinct defect categories, our framework integrates state-of-the-art object detection algorithms and Convolutional Neural Networks (CNNs) to achieve high precision in defect identification. Furthermore, we employ advanced segmentation techniques to estimate the area and weight of each date, thereby optimizing the grading process according to industry standards. Experimental results demonstrate that our system significantly outperforms existing methods in terms of accuracy and computational efficiency, making it highly suitable for industrial applications requiring real-time processing. This work not only provides a robust and scalable solution for automating quality control in the Piarom date industry but also contributes to the broader field of AI-driven food inspection technologies, with potential applications across various agricultural products.

Updated: 2024-10-23 18:25:20

Fields: cs.CV,cs.LG

Download: http://arxiv.org/abs/2410.18208v1

Melody Construction for Persian lyrics using LSTM recurrent neural networks

The present paper investigates automatic melody construction with Persian lyrics as input. It was assumed that there is a phonological correlation between the lyric syllables and the melody of a song. To investigate this assumption, a seq2seq neural network was developed and trained on parallel syllable and note sequences from Persian songs so that it can suggest a pleasant melody for a new sequence of syllables. Because no dataset of Persian digital music was available, more than 100 pieces of Persian music were collected and converted from print to digital format. Finally, 14 new lyrics were given to the model as input, and the suggested melodies were performed and recorded by music experts to evaluate the trained model. The evaluation was conducted using an audio questionnaire answered by more than 170 people. In ratings of melodic pleasantness, the system outputs scored an average of 3.005 out of 5, while human-made melodies for the same lyrics obtained an average score of 4.078.

Updated: 2024-10-23 18:11:44

Fields: cs.SD,cs.LG,eess.AS

Download: http://arxiv.org/abs/2410.18203v1

TensorOpera Router: A Multi-Model Router for Efficient LLM Inference

With the rapid growth of Large Language Models (LLMs) across various domains, numerous new LLMs have emerged, each possessing domain-specific expertise. This proliferation has highlighted the need for quick, high-quality, and cost-effective LLM query response methods. Yet no single LLM efficiently balances this trilemma: some models are powerful but extremely costly, while others are fast and inexpensive but qualitatively inferior. To address this challenge, we present TO-Router, a non-monolithic LLM querying system that seamlessly integrates various LLM experts into a single query interface and dynamically routes each incoming query to the best-performing expert based on the query's requirements. Through extensive experiments, we demonstrate that, compared to standalone expert models, TO-Router improves query efficiency by up to 40% and reduces costs by up to 30%, while maintaining or enhancing model performance by up to 10%.
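
One simple way to express the quality/cost/latency trade-off is to filter experts by a latency limit and then pick the one with the best predicted quality minus a cost penalty. The sketch is hypothetical: the `Expert` fields, the linear score and `cost_weight` are assumptions, and the abstract does not describe TO-Router's actual routing model.

```python
from dataclasses import dataclass

@dataclass
class Expert:
    name: str
    quality: float   # predicted answer quality for this query, in [0, 1]
    cost: float      # price per query (arbitrary units)
    latency: float   # expected response time in seconds

def route(experts, max_latency, cost_weight=0.5):
    """Pick the eligible expert maximizing quality - cost_weight * cost."""
    eligible = [e for e in experts if e.latency <= max_latency]
    if not eligible:
        return None
    return max(eligible, key=lambda e: e.quality - cost_weight * e.cost)

experts = [Expert("big", 0.9, 1.0, 3.0),
           Expert("mid", 0.8, 0.2, 1.0),
           Expert("small", 0.7, 0.1, 0.5)]
print(route(experts, max_latency=2.0).name)                   # mid
print(route(experts, max_latency=5.0, cost_weight=0.0).name)  # big
```

Setting `cost_weight` to zero recovers "always use the strongest model that fits the latency budget".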

Updated: 2024-10-23 18:11:42

Fields: cs.AI,cs.LG,I.2; I.5

Download: http://arxiv.org/abs/2408.12320v3

PyTSC: A Unified Platform for Multi-Agent Reinforcement Learning in Traffic Signal Control

Multi-Agent Reinforcement Learning (MARL) presents a promising approach for addressing the complexity of Traffic Signal Control (TSC) in urban environments. However, existing platforms for MARL-based TSC research face challenges such as slow simulation speeds and convoluted, difficult-to-maintain codebases. To address these limitations, we introduce PyTSC, a robust and flexible simulation environment that facilitates the training and evaluation of MARL algorithms for TSC. PyTSC integrates multiple simulators, such as SUMO and CityFlow, and offers a streamlined API, empowering researchers to explore a broad spectrum of MARL approaches efficiently. PyTSC accelerates experimentation and provides new opportunities for advancing intelligent traffic management systems in real-world applications.

Updated: 2024-10-23 18:10:38

Fields: cs.MA,cs.AI

Download: http://arxiv.org/abs/2410.18202v1

Evaluation of Language Models in the Medical Context Under Resource-Constrained Settings

Since the Transformer architecture emerged, language model development has grown, driven by their promising potential. Releasing these models into production requires properly understanding their behavior, particularly in sensitive domains like medicine. Despite this need, the medical literature still lacks practical assessment of pre-trained language models, which are especially valuable in settings where only consumer-grade computational resources are available. To address this gap, we have conducted a comprehensive survey of language models in the medical field and evaluated a subset of these for medical text classification and conditional text generation. The subset includes 53 models with 110 million to 13 billion parameters, spanning the Transformer-based model families and knowledge domains. Different approaches are employed for text classification, including zero-shot learning, enabling tuning without the need to train the model. These approaches are helpful in our target settings, where many users of language models find themselves. The results reveal remarkable performance across the tasks and datasets evaluated, underscoring the potential of certain models to contain medical knowledge, even without domain specialization. This study thus advocates for further exploration of model applications in medical contexts, particularly in computational resource-constrained settings, to benefit a wide range of users. The code is available on https://github.com/anpoc/Language-models-in-medicine.

Updated: 2024-10-23 18:10:29

Fields: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.16611v2

Rethinking Positive Pairs in Contrastive Learning

Contrastive learning, a prominent approach to representation learning, traditionally assumes that positive pairs are closely related samples (the same image or class) and negative pairs are distinct samples. We challenge this assumption by proposing to learn from arbitrary pairs, allowing any pair of samples to be positive within our framework. The primary challenge of the proposed approach lies in applying contrastive learning to disparate pairs that are semantically distant. Motivated by the discovery that SimCLR can separate given arbitrary pairs (e.g., garter snake and table lamp) in a subspace, we propose a feature filter, conditioned on class pairs, that creates the requisite subspaces through gate vectors that selectively activate or deactivate dimensions. This filter can be optimized through gradient descent within a conventional contrastive learning mechanism. We present Hydra, a universal contrastive learning framework for visual representations that extends conventional contrastive learning to accommodate arbitrary pairs. Our approach is validated on IN1K, where 1K diverse classes compose 500,500 pairs, most of them distinct. Surprisingly, Hydra achieves superior performance in this challenging setting. Additional benefits include the prevention of dimensional collapse and the discovery of class relationships. Our work highlights the value of learning common features of arbitrary pairs and potentially broadens the applicability of contrastive learning techniques to sample pairs with weak relationships.
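
The feature filter can be pictured as an elementwise gate applied to both embeddings before their similarity is computed. The sketch assumes a sigmoid gate over learnable logits, with one gate vector per class pair; the paper's exact parameterization may differ. With the gate equally open on all dimensions, the two example vectors are orthogonal, but restricting to the first dimension (a subspace where they agree) makes them nearly identical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gated_cosine(u, v, gate_logits):
    """Cosine similarity after an elementwise gate g = sigmoid(gate_logits)
    that keeps (g near 1) or suppresses (g near 0) each dimension."""
    g = [sigmoid(z) for z in gate_logits]
    gu = [gi * ui for gi, ui in zip(g, u)]
    gv = [gi * vi for gi, vi in zip(g, v)]
    dot = sum(a * b for a, b in zip(gu, gv))
    norm_u = math.sqrt(sum(a * a for a in gu))
    norm_v = math.sqrt(sum(b * b for b in gv))
    return dot / (norm_u * norm_v)

u, v = [1.0, 0.0, 1.0], [1.0, 1.0, -1.0]
print(round(gated_cosine(u, v, [0.0, 0.0, 0.0]), 3))      # 0.0
print(round(gated_cosine(u, v, [10.0, -10.0, -10.0]), 3))  # 1.0
```

Because the gate logits enter through a smooth sigmoid, they can be trained by gradient descent together with the encoder, in line with the abstract's description.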

Updated: 2024-10-23 18:07:18

Fields: cs.CV,cs.LG

Download: http://arxiv.org/abs/2410.18200v1

LeTO: Learning Constrained Visuomotor Policy with Differentiable Trajectory Optimization

This paper introduces LeTO, a method for learning constrained visuomotor policies with differentiable trajectory optimization. Our approach integrates a differentiable optimization layer into the neural network. By formulating the optimization layer as a trajectory optimization problem, we enable the model to generate actions end-to-end in a safe, constraint-controlled fashion without extra modules. Our method allows constraint information to be introduced during training, thereby balancing the training objectives of satisfying constraints, smoothing trajectories, and minimizing error with respect to demonstrations. This "gray box" method marries the safety and interpretability of optimization with the powerful representational abilities of neural networks. We quantitatively evaluate LeTO in simulation and on a real robot. The results demonstrate that LeTO performs well in both simulated and real-world tasks. In addition, it generates trajectories that are less uncertain, of higher quality, and smoother than those of existing imitation learning methods. LeTO therefore provides a practical example of how to integrate neural networks with trajectory optimization. We release our code at https://github.com/ZhengtongXu/LeTO.
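
As a rough illustration of a differentiable trajectory-optimization layer, the sketch below solves a small equality-constrained quadratic program. It stays close to a reference trajectory `r`, which stands in for a network's raw output. It penalizes second differences for smoothness and pins the start and end points. The solution comes from a single linear KKT solve, so it is a differentiable function of `r`. Writing the same code with `jax.numpy` would let gradients flow through it. This is our own simplified stand-in, not LeTO's formulation, which the abstract does not specify. `smooth_weight` and the endpoint constraints are assumptions.

```python
import numpy as np


def smooth_trajectory(r, x_start, x_end, smooth_weight=10.0):
    # Solve  min_x ||x - r||^2 + w ||D x||^2  s.t.  x[0] = x_start, x[-1] = x_end,
    # where D is the second-difference operator, so ||D x||^2 penalizes jerky motion.
    T = len(r)
    D = np.diff(np.eye(T), n=2, axis=0)            # shape (T-2, T)
    H = np.eye(T) + smooth_weight * D.T @ D        # (half) Hessian of the objective
    A = np.zeros((2, T))
    A[0, 0], A[1, -1] = 1.0, 1.0                   # equality constraints A x = b
    b = np.array([x_start, x_end])
    # KKT system: [H  A^T; A  0] [x; nu] = [r; b]
    kkt = np.block([[H, A.T], [A, np.zeros((2, 2))]])
    rhs = np.concatenate([r, b])
    return np.linalg.solve(kkt, rhs)[:T]
```

If the reference already satisfies the endpoint constraints, the result is never less smooth than the reference. The reference is itself feasible, so the optimum's objective, and therefore its smoothness penalty, cannot exceed the reference's smoothness penalty.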

Updated: 2024-10-23 18:04:54

Fields: cs.RO,cs.AI

Download: http://arxiv.org/abs/2401.17500v3

ECG Semantic Integrator (ESI): A Foundation ECG Model Pretrained with LLM-Enhanced Cardiological Text

The use of deep learning in electrocardiogram (ECG) analysis has advanced the accuracy and efficiency of cardiac healthcare diagnostics. Leveraging the capabilities of deep learning in semantic understanding, especially in feature extraction and representation learning, this study introduces a new multimodal contrastive pretraining framework that aims to improve the quality and robustness of learned representations of 12-lead ECG signals. Our framework comprises two key components: the Cardio Query Assistant (CQA) and the ECG Semantics Integrator (ESI). CQA integrates a retrieval-augmented generation (RAG) pipeline that leverages large language models (LLMs) and external medical knowledge to generate detailed textual descriptions of ECGs. The generated text is enriched with information about demographics and waveform patterns. ESI combines contrastive and captioning losses to pretrain ECG encoders for enhanced representations. We validate our approach on various downstream tasks, including arrhythmia detection and ECG-based subject identification. Our experimental results demonstrate substantial improvements over strong baselines in these tasks. These baselines encompass supervised and self-supervised learning methods, as well as prior multimodal pretraining approaches.
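
The abstract says ESI combines a contrastive loss with a captioning loss but gives no formulas. The sketch below assumes a CLIP-style symmetric InfoNCE term between paired ECG and text embeddings. It adds a token-level cross-entropy captioning term, scaled by a hypothetical weight `caption_weight`. The temperature value and all function names are our own choices.

```python
import numpy as np


def log_softmax(x, axis=-1):
    # Numerically stable log-softmax along the given axis.
    m = x.max(axis=axis, keepdims=True)
    return x - m - np.log(np.exp(x - m).sum(axis=axis, keepdims=True))


def clip_loss(ecg_emb, txt_emb, temperature=0.07):
    # Symmetric InfoNCE: the i-th ECG should match the i-th text and vice versa.
    e = ecg_emb / np.linalg.norm(ecg_emb, axis=1, keepdims=True)
    t = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = e @ t.T / temperature               # logits[i, j] = sim(ECG i, text j)
    idx = np.arange(len(e))
    loss_ecg_to_text = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_text_to_ecg = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_ecg_to_text + loss_text_to_ecg)


def caption_loss(token_logits, token_ids):
    # Token-level cross-entropy; token_logits: (B, L, V), token_ids: (B, L).
    logp = log_softmax(token_logits, axis=-1)
    picked = np.take_along_axis(logp, token_ids[..., None], axis=-1)[..., 0]
    return -picked.mean()


def total_loss(ecg_emb, txt_emb, token_logits, token_ids, caption_weight=1.0):
    # caption_weight is a hypothetical balancing hyperparameter.
    return clip_loss(ecg_emb, txt_emb) + caption_weight * caption_loss(token_logits, token_ids)
```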

Updated: 2024-10-23 18:01:50

Fields: eess.SP,cs.AI

Download: http://arxiv.org/abs/2405.19366v2

ZIP-FIT: Embedding-Free Data Selection via Compression-Based Alignment

Data selection is crucial for optimizing language model (LM) performance on specific tasks, yet most existing methods fail to effectively consider the target task distribution. Current approaches either ignore task-specific requirements entirely or rely on approximations that fail to capture the nuanced patterns needed for tasks like Autoformalization or code generation. Methods that do consider the target distribution often rely on simplistic, sometimes noisy, representations, like hashed n-gram features, which can lead to collisions and introduce noise. We introduce ZIP-FIT, a data selection framework that uses gzip compression to directly measure alignment between potential training data and the target task distribution. In extensive evaluations on Autoformalization and Python code generation, ZIP-FIT significantly outperforms leading baselines like DSIR and D4. Models trained on ZIP-FIT-selected data achieve their lowest cross-entropy loss up to 85.1% faster than baselines, demonstrating that better task alignment leads to more efficient learning. In addition, ZIP-FIT performs selection up to 65.8% faster than DSIR and two orders of magnitude faster than D4. Notably, ZIP-FIT shows that smaller, well-aligned datasets often outperform larger but less targeted ones, demonstrating that a small amount of higher-quality data is superior to a large amount of lower-quality data. Our results imply that task-aware data selection is crucial for efficient domain adaptation, and that compression offers a principled way to measure task alignment. By showing that targeted data selection can dramatically improve task-specific performance, our work provides new insights into the relationship between data quality, task alignment, and model learning efficiency.
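
Below is a minimal sketch of compression-based alignment. It assumes the score is built from the normalized compression distance (NCD), computed with Python's standard `gzip` module and averaged over a set of target examples. ZIP-FIT's exact scoring and batching may differ, and `alignment` and `select_top_k` are hypothetical names. A high alignment score means the candidate compresses well together with the target data, i.e., the two share patterns.

```python
import gzip


def compressed_size(data: bytes) -> int:
    return len(gzip.compress(data))


def ncd(x: str, y: str) -> float:
    # Normalized compression distance: roughly 0 for near-duplicates, roughly 1 for unrelated inputs.
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy, cxy = compressed_size(bx), compressed_size(by), compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)


def alignment(candidate: str, targets: list[str]) -> float:
    # Average (1 - NCD) against the target examples; higher means better aligned.
    return sum(1.0 - ncd(candidate, t) for t in targets) / len(targets)


def select_top_k(candidates: list[str], targets: list[str], k: int) -> list[str]:
    # Rank candidate training examples by alignment with the target task and keep k.
    ranked = sorted(candidates, key=lambda c: alignment(c, targets), reverse=True)
    return ranked[:k]
```

No embedding model is involved. The only "representation" is whatever redundancy gzip's compressor can exploit, which is the property the title's "embedding-free" refers to.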

Updated: 2024-10-23 18:01:06

Fields: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2410.18194v1

TabDPT: Scaling Tabular Foundation Models

The challenges faced by neural networks on tabular data are well-documented and have hampered the progress of tabular foundation models. Techniques leveraging in-context learning (ICL) have shown promise here, allowing for dynamic adaptation to unseen data. ICL can provide predictions for entirely new datasets without further training or hyperparameter tuning, therefore providing very fast inference when encountering a novel task. However, scaling ICL for tabular data remains an issue: approaches based on large language models cannot efficiently process numeric tables, and tabular-specific techniques have not been able to effectively harness the power of real data to improve performance and generalization. We are able to overcome these challenges by training tabular-specific ICL-based architectures on real data with self-supervised learning and retrieval, combining the best of both worlds. Our resulting model, the Tabular Discriminative Pre-trained Transformer (TabDPT), achieves state-of-the-art performance on the CC18 (classification) and CTR23 (regression) benchmarks with no task-specific fine-tuning, demonstrating the adaptability and speed of ICL once the model is pre-trained. TabDPT also demonstrates strong scaling as both model size and amount of available data increase, pointing towards future improvements simply through the curation of larger tabular pre-training datasets and training larger models.
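
The retrieval half of this recipe can be pictured as building each query's context from its nearest training rows. In the sketch below, Euclidean k-nearest-neighbour retrieval is an assumption. The majority vote over retrieved labels is only a placeholder for the pretrained transformer, which would instead read the (features, label) context rows and predict in one forward pass, without fine-tuning.

```python
import numpy as np


def retrieve_context(X_train, y_train, x_query, k):
    # Return the k training rows closest to x_query as the in-context examples.
    dist = np.linalg.norm(X_train - x_query, axis=1)
    idx = np.argsort(dist)[:k]
    return X_train[idx], y_train[idx]


def placeholder_predict(ctx_X, ctx_y):
    # Stand-in for the ICL model (ctx_X is unused here): majority label of the context.
    values, counts = np.unique(ctx_y, return_counts=True)
    return values[np.argmax(counts)]
```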

Updated: 2024-10-23 18:00:00

Fields: cs.LG,cs.AI,stat.ML

Download: http://arxiv.org/abs/2410.18164v1

Prioritized Generative Replay

Sample-efficient online reinforcement learning often uses replay buffers to store experience for reuse when updating the value function. However, uniform replay is inefficient, since certain classes of transitions can be more relevant to learning. While prioritization of more useful samples is helpful, this strategy can also lead to overfitting, since useful samples are likely to be rarer. In this work, we instead propose a prioritized, parametric version of an agent's memory, using generative models to capture online experience. This paradigm enables (1) densification of past experience, with new generations that benefit from the generative model's generalization capacity, and (2) guidance via a family of "relevance functions" that push these generations towards more useful parts of an agent's acquired history. We show this recipe can be instantiated using conditional diffusion models and simple relevance functions such as curiosity- or value-based metrics. Our approach consistently improves performance and sample efficiency in both state- and pixel-based domains. We expose the mechanisms underlying these gains, showing how guidance promotes diversity in our generated transitions and reduces overfitting. We also showcase how our approach can train policies with even higher update-to-data ratios than before, opening up avenues to better scale online RL agents.
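
One of the relevance functions mentioned above is curiosity. The minimal sketch below assumes curiosity is measured as the prediction error of a forward-dynamics model. A linear model fitted by least squares stands in for the learned model. The top fraction of transitions by this score could then serve as the conditioning signal for the generative model. The linear model, `top_fraction_mask`, and the fraction threshold are our assumptions, not details from the paper.

```python
import numpy as np


def curiosity_relevance(states, actions, next_states):
    # Fit a linear forward model next_state ~ [state, action, 1] @ W by least squares
    # and score each transition by its prediction error (a curiosity-style signal).
    X = np.hstack([states, actions, np.ones((len(states), 1))])
    W, *_ = np.linalg.lstsq(X, next_states, rcond=None)
    return np.linalg.norm(X @ W - next_states, axis=1)


def top_fraction_mask(scores, frac):
    # Boolean mask selecting (roughly) the top `frac` of transitions by relevance.
    k = max(1, int(round(frac * len(scores))))
    threshold = np.sort(scores)[-k]
    return scores >= threshold
```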

Updated: 2024-10-23 17:59:52

Fields: cs.LG

Download: http://arxiv.org/abs/2410.18082v1

ALTA: Compiler-Based Analysis of Transformers

We propose a new programming language called ALTA and a compiler that can map ALTA programs to Transformer weights. ALTA is inspired by RASP, a language proposed by Weiss et al. (2021), and Tracr (Lindner et al., 2023), a compiler from RASP programs to Transformer weights. ALTA complements and extends this prior work, offering the ability to express loops and to compile programs to Universal Transformers, among other advantages. ALTA allows us to constructively show how Transformers can represent length-invariant algorithms for computing parity and addition, as well as a solution to the SCAN benchmark of compositional generalization tasks, without requiring intermediate scratchpad decoding steps. We also propose tools to analyze cases where the expressibility of an algorithm is established, but end-to-end training on a given training set fails to induce behavior consistent with the desired algorithm. To this end, we explore training from ALTA execution traces as a more fine-grained supervision signal. This enables additional experiments and theoretical analyses relating the learnability of various algorithms to data availability and modeling decisions, such as positional encodings. We make the ALTA framework -- language specification, symbolic interpreter, and weight compiler -- available to the community to enable further applications and insights.
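
"Length-invariant" can be made concrete. Both parity and addition can be computed by a loop that carries a fixed-size state across positions, so the same per-step rule handles inputs of any length. The plain-Python reference below is the kind of per-step program such a loop would encode. It is not ALTA syntax, which the abstract does not show.

```python
def parity(bits):
    # Running XOR: one bit of state, one update per input position.
    state = 0
    for b in bits:
        state ^= b
    return state


def add_digits(a, b):
    # Add two equal-length little-endian digit lists with a single carry state.
    out, carry = [], 0
    for x, y in zip(a, b):
        total = x + y + carry
        out.append(total % 10)
        carry = total // 10
    if carry:
        out.append(carry)
    return out


print(add_digits([9, 9], [1, 0]))  # 99 + 1 = 100 -> [0, 0, 1]
```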

Updated: 2024-10-23 17:58:49

Fields: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2410.18077v1

Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration

Unsupervised pretraining has been transformative in many supervised domains. However, applying such ideas to reinforcement learning (RL) presents a unique challenge in that fine-tuning does not involve mimicking task-specific data, but rather exploring and locating the solution through iterative self-improvement. In this work, we study how unlabeled prior trajectory data can be leveraged to learn efficient exploration strategies. While prior data can be used to pretrain a set of low-level skills, or as additional off-policy data for online RL, it has been unclear how to combine these ideas effectively for online exploration. Our method SUPE (Skills from Unlabeled Prior data for Exploration) demonstrates that a careful combination of these ideas compounds their benefits. Our method first extracts low-level skills using a variational autoencoder (VAE), and then pseudo-relabels unlabeled trajectories using an optimistic reward model, transforming prior data into high-level, task-relevant examples. Finally, SUPE uses these transformed examples as additional off-policy data for online RL to learn a high-level policy that composes pretrained low-level skills to explore efficiently. We empirically show that SUPE reliably outperforms prior strategies, successfully solving a suite of long-horizon, sparse-reward tasks. Code: https://github.com/rail-berkeley/supe.
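
The abstract does not define the optimistic reward model. One common way to build optimism is an upper confidence bound from a bootstrapped ensemble, shown here purely as an assumed stand-in. Each unlabeled transition gets the ensemble mean plus `beta` times the ensemble standard deviation, so uncertain regions receive higher pseudo-rewards. The linear reward models and all names are hypothetical.

```python
import numpy as np


def fit_reward_ensemble(X, r, n_models=5, seed=0):
    # Bootstrapped ensemble of linear reward models fitted by least squares.
    rng = np.random.default_rng(seed)
    weights = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X), size=len(X))        # bootstrap resample
        Xb = np.hstack([X[idx], np.ones((len(idx), 1))])
        w, *_ = np.linalg.lstsq(Xb, r[idx], rcond=None)
        weights.append(w)
    return np.stack(weights)                              # (n_models, n_features + 1)


def optimistic_relabel(weights, X_unlabeled, beta=1.0):
    # Upper-confidence pseudo-reward: ensemble mean + beta * ensemble std.
    Xb = np.hstack([X_unlabeled, np.ones((len(X_unlabeled), 1))])
    preds = Xb @ weights.T                                # (n_points, n_models)
    return preds.mean(axis=1) + beta * preds.std(axis=1)
```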

Updated: 2024-10-23 17:58:45

Fields: cs.LG,cs.AI,stat.ML

Download: http://arxiv.org/abs/2410.18076v1

ProFL: Performative Robust Optimal Federated Learning

Performative prediction (PP) is a framework that captures distribution shifts that occur during the training of machine learning models as a result of their deployment. As the trained model is used, the data it generates can cause the model to evolve, leading to deviations from the original data distribution. The impact of such model-induced distribution shifts in the federated learning (FL) setup remains unexplored, even though such shifts are increasingly likely to occur in real-life use cases. Although Jin et al. (2024) recently extended PP to FL in a straightforward manner, the resulting model only converges to a performative stable point, which may be far from optimal. The methods of Izzo et al. (2021) and Miller et al. (2021) can find a performative optimal point in centralized settings, but they require the performative risk to be convex and the training data to be noiseless, assumptions often violated in realistic FL systems. This paper overcomes all of these shortcomings and proposes Performative robust optimal Federated Learning (ProFL), an algorithm that finds performative optimal points in FL from noisy and contaminated data. We present a convergence analysis under the Polyak-Łojasiewicz condition, which applies to non-convex objectives. Extensive experiments on multiple datasets validate the efficiency of our proposed algorithms.
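
A one-dimensional toy problem shows why a performatively stable point can be far from optimal. This is our own illustration, not ProFL. The deployed parameter θ shifts the data to z ~ N(μ + εθ, 1), and the loss is ℓ(z, θ) = -zθ + θ²/2. Repeatedly retraining on the data the current model induces gives θ ← μ + εθ, which converges to θ = μ/(1 - ε). Minimizing the performative risk PR(θ) = -(μ + εθ)θ + θ²/2 instead gives θ = μ/(1 - 2ε), assuming ε < 1/2.

```python
def performative_risk(theta, mu, eps):
    # E_{z ~ N(mu + eps*theta, 1)} [ -z*theta + theta**2 / 2 ]
    return -(mu + eps * theta) * theta + 0.5 * theta ** 2


def repeated_retraining(mu, eps, theta0=0.0, iters=200):
    # Each round minimizes the loss on data induced by the current model:
    # argmin_t' E[-z t' + t'^2 / 2] = E[z] = mu + eps * theta.
    theta = theta0
    for _ in range(iters):
        theta = mu + eps * theta
    return theta


def performative_gradient_descent(mu, eps, theta0=0.0, lr=0.1, iters=2000):
    # The gradient of PR also accounts for how theta moves the data distribution.
    theta = theta0
    for _ in range(iters):
        grad = -mu - 2 * eps * theta + theta
        theta -= lr * grad
    return theta
```

With μ = 1 and ε = 0.25, retraining settles at θ = 4/3 (risk about -0.889). The performative optimum is θ = 2 (risk -1.0).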

Updated: 2024-10-23 17:57:14

Fields: cs.LG,cs.IT,math.IT

Download: http://arxiv.org/abs/2410.18075v1

UnCLe: Unsupervised Continual Learning of Depth Completion

We propose UnCLe, a standardized benchmark for Unsupervised Continual Learning of a multimodal depth estimation task: depth completion, which aims to infer a dense depth map from a synchronized pair of an RGB image and a sparse depth map. We benchmark depth completion models under the practical scenario of unsupervised learning over continuous streams of data. Existing methods are typically trained on a static, or stationary, dataset. However, when adapting to novel non-stationary distributions, they "catastrophically forget" previously learned information. UnCLe simulates these non-stationary distributions by adapting depth completion models to sequences of datasets containing diverse scenes captured in distinct domains with different visual and range sensors. We adopt representative methods from continual learning paradigms and translate them to enable unsupervised continual learning of depth completion. We benchmark these models in indoor and outdoor settings and investigate the degree of catastrophic forgetting through standard quantitative metrics. Furthermore, we introduce model inversion quality as an additional measure of forgetting. We find that unsupervised continual learning of depth completion is an open problem, and we invite researchers to leverage UnCLe as a development platform.
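
Continual-learning benchmarks commonly quantify catastrophic forgetting with error metrics (lower is better); the scheme below is assumed here rather than taken from UnCLe. Record the error on every dataset after each training stage. Then, for each earlier dataset, report how much worse its final error is than the best error it reached before the last stage. The function name and matrix layout are hypothetical.

```python
import numpy as np


def forgetting(err):
    # err[t, d]: error (lower is better) on dataset d after training stage t,
    # with datasets visited in order 0, 1, ..., T-1. For every dataset except
    # the last: forgetting = final error - best error before the final stage.
    T = err.shape[0]
    return np.array([err[T - 1, d] - err[d:T - 1, d].min() for d in range(T - 1)])
```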

Updated: 2024-10-23 17:56:33

Fields: cs.CV,cs.LG

Download: http://arxiv.org/abs/2410.18074v1

TP-Eval: Tap Multimodal LLMs' Potential in Evaluation by Customizing Prompts

Recently, multimodal large language models (MLLMs) have received much attention for their impressive capabilities. Evaluation is becoming critical for analyzing the attributes of MLLMs and providing valuable insights. However, current benchmarks overlook the problem of prompt sensitivity: minor prompt variations may lead to significant performance fluctuations. Thus, inappropriate prompts may obscure a model's capabilities and cause its performance to be underestimated. Moreover, different models prefer different prompts, so using the same prompt for all models introduces evaluation bias. This paper analyzes this deficiency in existing benchmarks and introduces a new evaluation framework named TP-Eval, which uses a prompt customization method to reduce evaluation bias and tap models' potential. TP-Eval rewrites the original prompts into customized prompts for each model. In particular, we propose several well-designed modules for prompt customization tailored to the scenario of MLLM evaluation. Extensive experiments demonstrate the effectiveness of our approach in uncovering models' capabilities, and TP-Eval should help the community develop more comprehensive and convincing MLLM evaluation benchmarks.

Updated: 2024-10-23 17:54:43

Fields: cs.CV,cs.AI,cs.CL

Download: http://arxiv.org/abs/2410.18071v1

Pruning By Explaining Revisited: Optimizing Attribution Methods to Prune CNNs and Transformers

To solve ever more complex problems, Deep Neural Networks are scaled to billions of parameters, leading to huge computational costs. An effective approach to reducing computational requirements and increasing efficiency is to prune unnecessary components of these often over-parameterized networks. Previous work has shown that attribution methods from the field of eXplainable AI serve as effective means to extract and prune the least relevant network components in a few-shot fashion. We extend the current state of the art by proposing to explicitly optimize the hyperparameters of attribution methods for the task of pruning, and we further include transformer-based networks in our analysis. Our approach yields higher compression rates for large transformer and convolutional architectures (VGG, ResNet, ViT) than previous works, while still attaining high performance on ImageNet classification tasks. Our experiments also indicate that transformers have a higher degree of over-parameterization than convolutional neural networks. Code is available at https://github.com/erfanhatefi/Pruning-by-eXplaining-in-PyTorch.
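
Below is a minimal sketch of attribution-based structured pruning. Each channel is scored by its attribution averaged over a few reference samples, and the lowest-scoring fraction is removed. Gradient × activation serves only as a simple stand-in for the attribution methods whose hyperparameters the paper tunes. The array shapes and `prune_frac` are assumptions.

```python
import numpy as np


def channel_relevance(acts, grads):
    # acts, grads: (num_reference_samples, num_channels).
    # |gradient x activation| averaged over the few reference samples.
    return np.abs(acts * grads).mean(axis=0)


def prune_mask(relevance, prune_frac):
    # Keep-mask that drops the prune_frac least relevant channels.
    k = int(prune_frac * len(relevance))
    mask = np.ones(len(relevance), dtype=bool)
    mask[np.argsort(relevance)[:k]] = False
    return mask
```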

Updated: 2024-10-23 17:53:24

Fields: cs.AI,cs.CV,cs.LG

Download: http://arxiv.org/abs/2408.12568v2

Training Free Guided Flow Matching with Optimal Control

Controlled generation with pre-trained Diffusion and Flow Matching models has vast applications. One strategy for guiding ODE-based generative models is to optimize a target loss $R(x_1)$ while staying close to the prior distribution. Along this line, recent work has shown the effectiveness of guiding flow models by differentiating through their ODE sampling process. Despite their superior performance, the theoretical understanding of these methods is still preliminary, leaving room for algorithmic improvement. Moreover, existing methods predominantly focus on Euclidean data manifolds, and there is a compelling need for guided flow methods on complex geometries such as SO(3), which are prevalent in high-stakes scientific applications like protein design. We present OC-Flow, a general and theoretically grounded training-free framework for guided flow matching using optimal control. Building upon advances in optimal control theory, we develop effective and practical algorithms for solving optimal control in guided ODE-based generation and provide a systematic theoretical analysis of the convergence guarantees in both Euclidean space and SO(3). We show that existing backprop-through-ODE methods can be interpreted as special cases of Euclidean OC-Flow. OC-Flow achieves superior performance in extensive experiments on text-guided image manipulation, conditional molecule generation, and all-atom peptide design.
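
The backprop-through-ODE baseline mentioned above can be shown in a toy setting. With a linear velocity field $v(x) = Ax$ and Euler sampling, the map from initial noise $x_0$ to sample $x_1$ is a matrix $M$. The gradient of $R(x_1) = \|x_1 - \text{target}\|^2$ with respect to $x_0$ is therefore exact. A penalty $\lambda \|x_0 - x_0^{\text{init}}\|^2$ keeps the guided noise near its original draw, loosely playing the role of staying close to the prior. This simplified illustration shows the baseline, not OC-Flow. The linear field, target, and hyperparameters are assumptions.

```python
import numpy as np


def euler_transition(A, steps):
    # Euler integration of dx/dt = A x over t in [0, 1] is the linear map
    # (I + h A)^steps with step size h = 1 / steps.
    h = 1.0 / steps
    return np.linalg.matrix_power(np.eye(A.shape[0]) + h * A, steps)


def guide_initial_noise(A, x0_init, target, lam=0.1, lr=0.1, iters=500, steps=20):
    # Gradient descent on ||M x0 - target||^2 + lam * ||x0 - x0_init||^2.
    M = euler_transition(A, steps)
    x0 = x0_init.copy()
    for _ in range(iters):
        x1 = M @ x0
        grad = 2 * M.T @ (x1 - target) + 2 * lam * (x0 - x0_init)
        x0 = x0 - lr * grad
    return x0, M @ x0
```

Because everything here is linear, the optimum also has the closed form $x_0^* = (M^\top M + \lambda I)^{-1}(M^\top \text{target} + \lambda x_0^{\text{init}})$. This makes the sketch easy to check.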

Updated: 2024-10-23 17:53:11

Fields: cs.LG,cs.AI

Download: http://arxiv.org/abs/2410.18070v1

Correlated Proxies: A New Definition and Improved Mitigation for Reward Hacking

Because it is difficult to precisely specify complex objectives, reinforcement learning policies are often optimized using flawed proxy rewards that seem to capture the true objective. However, optimizing proxy rewards frequently leads to reward hacking: the optimized reward function ceases to be a good proxy, and the resulting policy performs poorly with respect to the unspecified true reward. Principled solutions to reward hacking have been impeded by the lack of a good definition for the problem. To address this, we introduce a definition of reward hacking based on the correlation between proxy and true rewards for states and actions seen by a "base policy" that breaks down under optimization. We show that this definition captures reward hacking behavior across several realistic settings, including in reinforcement learning from human feedback (RLHF). We then show theoretically that regularization to the base policy can effectively prevent reward hacking. While current RLHF approaches apply a KL penalty between the action distributions of policies, our theory suggests that it is more effective to regularize using the $\chi^2$ divergence between the policies' occupancy measures. We intuitively show why this type of regularization is superior and demonstrate that it better mitigates reward hacking in practice across four realistic domains, including RLHF for LLMs. Our code is available at https://github.com/cassidylaidlaw/orpo.
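
A small tabular example makes the contrast between regularizing action distributions and regularizing occupancy measures concrete. The sketch computes a policy's discounted state occupancy by solving a linear system. It then compares the KL and $\chi^2$ divergences from the base policy's occupancy. The helper names are hypothetical, and the paper's practical estimator is not reproduced here. For discrete distributions, KL never exceeds $\log(1 + \chi^2)$. The $\chi^2$ penalty is therefore never smaller than the KL penalty, and it grows much faster when the policy concentrates on states the base policy rarely visits.

```python
import numpy as np


def state_occupancy(P_pi, rho0, gamma):
    # Discounted occupancy d(s) = (1 - gamma) * sum_t gamma^t * Pr(s_t = s),
    # computed as d = (1 - gamma) (I - gamma P_pi^T)^{-1} rho0, where P_pi[s, s']
    # is the state-transition matrix under the policy. The state-action
    # occupancy would be d[s] * pi(a | s).
    n = len(rho0)
    return (1 - gamma) * np.linalg.solve(np.eye(n) - gamma * P_pi.T, rho0)


def kl_divergence(p, q):
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


def chi2_divergence(p, q):
    # sum_s (p - q)^2 / q, which equals sum_s p^2 / q - 1; requires q > 0.
    return float(np.sum((p - q) ** 2 / q))
```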

Updated: 2024-10-23 17:52:57

Fields: cs.LG,cs.AI

Download: http://arxiv.org/abs/2403.03185v2

Physical Reasoning and Object Planning for Household Embodied Agents

In this study, we explore the sophisticated domain of task planning for robust household embodied agents, with a particular emphasis on the intricate task of selecting substitute objects. We introduce the CommonSense Object Affordance Task (COAT), a novel framework designed to analyze reasoning capabilities in commonsense scenarios. This approach is centered on understanding how these agents can effectively identify and utilize alternative objects when executing household tasks, thereby offering insights into the complexities of practical decision-making in real-world environments. Drawing inspiration from factors affecting human decision-making, we explore how large language models tackle this challenge through four meticulously crafted commonsense question-and-answer datasets featuring refined rules and human annotations. Our evaluation of state-of-the-art language models on these datasets sheds light on three pivotal considerations: 1) aligning an object's inherent utility with the task at hand, 2) navigating contextual dependencies (societal norms, safety, appropriateness, and efficiency), and 3) accounting for the current physical state of the object. To maintain accessibility, we introduce five abstract variables reflecting an object's physical condition, modulated by human insights, to simulate diverse household scenarios. Our contributions include insightful human preference mappings for all three factors and four extensive QA datasets (2K, 15K, 60K, 70K questions) probing the intricacies of utility dependencies, contextual dependencies, and object physical states. The datasets, along with our findings, are accessible at: https://github.com/Ayush8120/COAT. This research not only advances our understanding of physical commonsense reasoning in language models but also paves the way for future improvements in household agent intelligence.

Updated: 2024-10-23 17:50:54

Fields: cs.AI

Download: http://arxiv.org/abs/2311.13577v2

Beyond position: how rotary embeddings shape representations and memory in autoregressive transformers

Rotary Positional Embeddings (RoPE) enhance positional encoding in Transformer models, yet their full impact on model dynamics remains underexplored. This paper studies how RoPE introduces position-dependent rotations, causing phase shifts in token embeddings that influence higher-frequency components within the model's internal representations. Through spectral analysis, we demonstrate that RoPE's rotation matrices induce oscillatory behaviors in embeddings, affecting information retention across layers and shaping temporal modeling capabilities. We show that activation functions in feed-forward networks interact with RoPE-modulated embeddings to generate harmonics, leading to constructive or destructive interference based on phase alignment. Our findings reveal that phase alignment amplifies activations and sharpens attention, while misalignment weakens activations and disrupts focus on positional patterns. This study underscores the importance of frequency components as intrinsic elements of model behavior, offering new insights beyond traditional analyses.
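
The "position-dependent rotations" work as follows. RoPE rotates each consecutive pair of query/key dimensions by an angle proportional to the token position, with a different frequency per pair. The sketch uses the interleaved-pair convention and base 10000 found in many implementations; some libraries pair the two halves of the vector instead. It checks the property that makes RoPE relative: the rotated query-key dot product depends only on the position offset.

```python
import numpy as np


def rope(x, pos, base=10000.0):
    # Rotate consecutive dimension pairs (0, 1), (2, 3), ... of x by pos * freq_i.
    d = x.shape[-1]
    if d % 2:
        raise ValueError("embedding dimension must be even")
    freqs = base ** (-np.arange(0, d, 2) / d)   # one frequency per pair, from fast to slow
    angles = pos * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x_even * cos - x_odd * sin
    out[1::2] = x_even * sin + x_odd * cos
    return out
```

Low-index pairs rotate quickly with position, while high-index pairs rotate slowly. This spread of rotation frequencies is the source of the position-dependent phase shifts the abstract analyzes.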

Updated: 2024-10-23 17:48:28

Fields: cs.LG,cs.AI

Download: http://arxiv.org/abs/2410.18067v1

MADial-Bench: Towards Real-world Evaluation of Memory-Augmented Dialogue Generation

Long-term memory is important for chatbots and dialogue systems (DS) to create consistent and human-like conversations, as evidenced by the many memory-augmented DS (MADS) that have been developed. To evaluate the effectiveness of such MADS, commonly used evaluation metrics, such as retrieval accuracy and perplexity (PPL), mainly focus on query-oriented factualness and language quality. However, these metrics often lack practical value. Moreover, their evaluation dimensions are insufficient for human-like assessment in DS. Regarding memory-recall paradigms, current evaluation schemes only consider passive memory retrieval while ignoring diverse memory recall with rich triggering factors, e.g., emotions and surroundings, which can be essential in emotional support scenarios. To bridge the gap, we construct a novel Memory-Augmented Dialogue Benchmark (MADial-Bench) covering various memory-recall paradigms based on cognitive science and psychology theories. The benchmark assesses two tasks separately, memory retrieval and memory recognition, incorporating both passive and proactive memory recall data. We introduce new scoring criteria to the evaluation, including memory injection, emotion support (ES) proficiency, and intimacy, to comprehensively assess generated responses. Results from cutting-edge embedding models and large language models on this benchmark indicate the potential for further advancement. Extensive testing further reveals correlations between memory injection, ES proficiency, and intimacy.

Updated: 2024-10-23 17:47:58

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2409.15240v2

KeySpace: Public Key Infrastructure Considerations in Interplanetary Networks

As satellite networks grow larger and begin to incorporate interplanetary communication, there is increasing interest in the unsolved problem of how to approach public key infrastructure (PKI) under these conditions. In this paper we explore the goals and requirements for implementing key management systems in satellite networks, focusing on megaconstellations and interplanetary networks. We design a set of standardized experiments that can be used to compare systems against one another for particular network topologies. Using these, we demonstrate that terrestrial PKI techniques are feasible in highly distributed interplanetary networks, showing that PKI systems can be configured to achieve efficient, low-latency connection establishment and to minimize the impact of attacks through effective revocation. We evaluate this by building the Deep Space Network Simulator (DSNS), a novel network simulator aimed at efficient simulation of large space networks. We run simulations evaluating connection establishment and key revocation under a wide range of PKI configurations. Finally, we propose and evaluate two additional configuration options: OCSP Hybrid, and the use of relay nodes as a firewall. Together, these minimize the extent of the network an attacker can reach with a compromised key and reduce the attacker's load on interplanetary relay links.

Updated: 2024-10-23 17:47:20

Categories: cs.CR,cs.NI

Download: http://arxiv.org/abs/2408.10963v2
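
At interplanetary distances every extra round trip during connection establishment or revocation checking is expensive. The sketch below is back-of-the-envelope arithmetic, not DSNS: it assumes a certificate status query (for example an OCSP request) must travel from Mars to a responder on Earth and back, uses approximate closest and farthest Earth-Mars distances of about 55 and 400 million km, and ignores processing, queuing and link outages.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458


def one_way_delay_s(distance_km: float) -> float:
    """Light-time delay in seconds along a straight path of the given length."""
    return distance_km / SPEED_OF_LIGHT_KM_S


def status_check_delay_s(distance_km: float, local_responder: bool) -> float:
    """Extra wait before a connection can proceed because of a status query.

    Simplifying assumption: a remote responder costs one full round trip,
    while a responder or cached status on the requester's side costs ~0
    at this scale.
    """
    return 0.0 if local_responder else 2 * one_way_delay_s(distance_km)


# Approximate closest and farthest Earth-Mars distances in km.
for label, distance_km in [("closest", 55e6), ("farthest", 400e6)]:
    minutes = status_check_delay_s(distance_km, local_responder=False) / 60
    print(f"Earth-Mars {label}: remote status check adds ~{minutes:.1f} min")
```

Under these assumptions a remote check adds roughly 6 to 45 minutes, which illustrates why where status information lives matters so much in this setting; how OCSP Hybrid actually distributes that information is defined in the paper, not here.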

The Double-Edged Sword of Behavioral Responses in Strategic Classification: Theory and User Studies

When humans are subject to an algorithmic decision system, they can strategically adjust their behavior accordingly ("game" the system). While a growing line of literature on strategic classification has used game-theoretic modeling to understand and mitigate such gaming, these existing works consider standard models of fully rational agents. In this paper, we propose a strategic classification model that considers behavioral biases in human responses to algorithms. We show how misperceptions of a classifier (specifically, of its feature weights) can lead to different types of discrepancies between biased and rational agents' responses, and identify when behavioral agents over- or under-invest in different features. We also show that strategic agents with behavioral biases can benefit or (perhaps unexpectedly) harm the firm compared to fully rational strategic agents. We complement our analytical results with user studies, which support our hypothesis of behavioral biases in human responses to the algorithm. Together, our findings highlight the need to account for human (cognitive) biases when designing AI systems for strategic humans in the loop, and when explaining those systems to them.

Updated: 2024-10-23 17:42:54

Categories: cs.LG,cs.GT,cs.HC

Download: http://arxiv.org/abs/2410.18066v1
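
How misperceived feature weights turn into over- and under-investment can be sketched in a few lines. This assumes a linear score w·x and a quadratic effort cost, a common setup in the strategic-classification literature and not necessarily the paper's exact model; the weights and costs are invented.

```python
from typing import List


def best_response(weights: List[float], costs: List[float]) -> List[float]:
    """Effort per feature maximizing sum(w_j * e_j) - sum(c_j * e_j**2) / 2.

    The first-order condition w_j - c_j * e_j = 0 gives e_j = w_j / c_j;
    negative effort is clipped to zero.
    """
    return [max(w / c, 0.0) for w, c in zip(weights, costs)]


def investment_gaps(true_w: List[float], perceived_w: List[float],
                    costs: List[float]) -> List[str]:
    """Label each feature 'over', 'under' or 'equal' for a biased vs. rational agent."""
    rational = best_response(true_w, costs)
    biased = best_response(perceived_w, costs)
    return ["over" if b > r else "under" if b < r else "equal"
            for r, b in zip(rational, biased)]


def true_score_gain(true_w: List[float], effort: List[float]) -> float:
    """Score increase the classifier actually awards for an effort profile."""
    return sum(w * e for w, e in zip(true_w, effort))


# Hypothetical agent who perceives the weights as flatter than they really are.
true_w = [0.7, 0.2, 0.1]
perceived_w = [0.5, 0.3, 0.2]
costs = [1.0, 1.0, 1.0]
print(investment_gaps(true_w, perceived_w, costs))  # ['under', 'over', 'over']
```

Whether such distorted effort helps or hurts the deploying firm depends on what the firm values in each feature, which is the question the paper analyzes.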

Conditional Language Policy: A General Framework for Steerable Multi-Objective Finetuning

Reward-based finetuning is crucial for aligning language policies with intended behaviors (e.g., creativity and safety). A key challenge is to develop steerable language models that trade off multiple (conflicting) objectives in a flexible and efficient manner. This paper presents Conditional Language Policy (CLP), a general framework for finetuning language models on multiple objectives. Building on techniques from multi-task training and parameter-efficient finetuning, CLP learns steerable models that effectively trade off conflicting objectives at inference time. Notably, this does not require training or maintaining multiple models to achieve different trade-offs between the objectives. Through extensive experiments and ablations on two summarization datasets, we show that CLP learns steerable language models that outperform and Pareto-dominate existing approaches for multi-objective finetuning.

Updated: 2024-10-23 17:42:39

Categories: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2407.15762v2
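
"Pareto-dominate" uses the standard notion: one trade-off point dominates another if it is at least as good on every objective and strictly better on at least one. The sketch below implements that check and a linear scalarization with a user-chosen weight vector, one common way to express a steerable trade-off; the reward values are invented and the scalarization is an assumption, not CLP's training objective.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep points that no other point dominates (a point never dominates itself)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def scalarize(rewards: Sequence[float], weights: Sequence[float]) -> float:
    """Linear scalarization: a weighted sum of per-objective rewards."""
    return sum(w * r for w, r in zip(weights, rewards))


# Hypothetical (objective_1, objective_2) rewards for four finetuned checkpoints.
candidates: List[Point] = [(0.9, 0.2), (0.6, 0.6), (0.5, 0.5), (0.2, 0.9)]
front = pareto_front(candidates)  # (0.5, 0.5) is dominated by (0.6, 0.6)

for weights in [(0.9, 0.1), (0.5, 0.5), (0.1, 0.9)]:
    best = max(front, key=lambda p: scalarize(p, weights))
    print(weights, "->", best)
```

Here each weight setting selects a different stored checkpoint; the abstract's point is that CLP reaches such trade-offs with a single conditioned model instead.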

SPIRE: Synergistic Planning, Imitation, and Reinforcement Learning for Long-Horizon Manipulation

Robot learning has proven to be a general and effective technique for programming manipulators. Imitation learning can teach robots solely from human demonstrations but is bottlenecked by the quality of those demonstrations. Reinforcement learning uses exploration to discover better behaviors; however, the space of possible improvements can be too large to search from scratch. For both techniques, learning difficulty increases in proportion to the length of the manipulation task. To address this, we propose SPIRE, a system that first uses Task and Motion Planning (TAMP) to decompose tasks into smaller learning subproblems and then combines imitation and reinforcement learning to exploit the strengths of each. We develop novel strategies for training learning agents deployed within a planning system. We evaluate SPIRE on a suite of long-horizon and contact-rich robot manipulation problems. We find that SPIRE outperforms prior approaches that integrate imitation learning, reinforcement learning, and planning by 35% to 50% in average task performance, is 6 times more data efficient in the number of human demonstrations needed to train proficient agents, and learns to complete tasks nearly twice as efficiently. See https://sites.google.com/view/spire-corl-2024 for more details.

Updated: 2024-10-23 17:42:07

Categories: cs.RO,cs.AI,cs.CV,cs.LG

Download: http://arxiv.org/abs/2410.18065v1

Utilitarian Algorithm Configuration for Infinite Parameter Spaces

Utilitarian algorithm configuration is a general-purpose technique for automatically searching the parameter space of a given algorithm to optimize its performance, as measured by a given utility function, on a given set of inputs. Recently introduced utilitarian configuration procedures offer optimality guarantees about the returned parameterization while provably adapting to the hardness of the underlying problem. However, the applicability of these approaches is severely limited by the fact that they only search a finite, relatively small set of parameters. They cannot effectively search the configuration space of algorithms with continuous or uncountable parameters. In this paper we introduce a new procedure, which we dub COUP (Continuous, Optimistic Utilitarian Procrastination). COUP is designed to search infinite parameter spaces efficiently to find good configurations quickly. Furthermore, COUP maintains the theoretical benefits of previous utilitarian configuration procedures when applied to finite parameter spaces but is significantly faster, both provably and experimentally.

Updated: 2024-10-23 17:33:57

Categories: cs.AI

Download: http://arxiv.org/abs/2405.18246v2

Explaining Bayesian Networks in Natural Language using Factor Arguments. Evaluation in the medical domain

In this paper, we propose a model for building natural language explanations of Bayesian Network Reasoning in terms of factor arguments, which are argumentation graphs of flowing evidence relating the observed evidence to a target variable we want to learn about. We introduce the notion of factor argument independence to address the outstanding question of when arguments should be presented jointly or separately, and we present an algorithm that, starting from the evidence nodes and a target node, produces a list of all independent factor arguments ordered by their strength. Finally, we implement a scheme to build natural language explanations of Bayesian Reasoning using this approach. Our proposal has been validated in the medical domain through a human-driven evaluation study in which we compare the Bayesian Network Reasoning explanations obtained using factor arguments with an alternative explanation method. Evaluation results indicate that users deem our proposed explanation approach significantly more useful for understanding Bayesian Network Reasoning than the existing explanation method it is compared with.

Updated: 2024-10-23 17:33:27

Categories: cs.AI,cs.LO

Download: http://arxiv.org/abs/2410.18060v1
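
Ordering arguments by strength can be illustrated with a much simpler stand-in than factor arguments. The sketch below measures each finding's contribution as its log-likelihood ratio for a binary target, assuming the findings are conditionally independent given the target (a naive-Bayes assumption the paper does not make); all probabilities are invented. It shows the idea of ranking evidence by strength, not the paper's algorithm.

```python
import math
from typing import Dict, List, Tuple


def strength(p_given_true: float, p_given_false: float) -> float:
    """Log-likelihood ratio of one finding; positive values support the target."""
    return math.log(p_given_true) - math.log(p_given_false)


def explain(prior: float,
            findings: Dict[str, Tuple[float, float]]) -> Tuple[float, List[Tuple[str, float]]]:
    """Posterior of a binary target plus findings ranked by |strength|.

    Under conditional independence the log-likelihood ratios simply add
    to the prior log-odds.
    """
    log_odds = math.log(prior / (1.0 - prior))
    scored = [(name, strength(pt, pf)) for name, (pt, pf) in findings.items()]
    log_odds += sum(s for _, s in scored)
    posterior = 1.0 / (1.0 + math.exp(-log_odds))
    ranked = sorted(scored, key=lambda item: abs(item[1]), reverse=True)
    return posterior, ranked


# Hypothetical numbers: (P(finding | disease), P(finding | no disease)).
findings = {"fever": (0.8, 0.1), "cough": (0.6, 0.4), "normal_xray": (0.2, 0.7)}
posterior, ranked = explain(prior=0.05, findings=findings)
print(f"P(disease | findings) = {posterior:.3f}")
for name, s in ranked:
    direction = "for" if s > 0 else "against"
    print(f"  {name}: log-LR {s:+.2f} ({direction} the diagnosis)")
```

Real networks break the independence assumption, which is why the paper needs a notion of argument independence before deciding what to present jointly.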

B-Side: Binary-Level Static System Call Identification

System call filtering is widely used to secure programs in multi-tenant environments and to sandbox applications in modern desktop software deployment and package management systems. Filtering rules are hard to write and maintain manually, so generating them automatically is essential. To that end, analysis tools able to identify every system call that a program can legitimately invoke are needed. Existing static analysis works lack precision because of a high number of false positives, and/or assume the availability of program/library source code, which is unrealistic in many scenarios such as cloud production environments. We present B-Side, a static binary analysis tool able to identify a superset of the system calls that an x86-64 static/dynamic executable may invoke at runtime. B-Side assumes no access to program/library sources and achieves a good degree of precision by leveraging symbolic execution combined with a heuristic to detect system call wrappers, which are an important source of precision loss in existing works. B-Side can also statically detect phases of execution in a program to which different filtering rules can be applied. We validate B-Side and demonstrate its higher precision compared to state-of-the-art works: over a set of popular applications, B-Side's average $F_1$ score is 0.81, vs. 0.31 and 0.53 for competitors. Across 557 statically and dynamically compiled binaries taken from the Debian repositories, B-Side identifies an average of 43 system calls, vs. 271 and 95 for two state-of-the-art competitors. We further evaluate the strictness of the phase-based filtering policies that can be obtained with B-Side.

Updated: 2024-10-23 17:26:52

Categories: cs.CR,cs.OS

Download: http://arxiv.org/abs/2410.18053v1
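
The $F_1$ figures compare a predicted set of system calls with a reference set. As a sketch, assuming the reference is the set of system calls actually observed at runtime (an assumption about the evaluation protocol), set-based precision, recall and $F_1$ can be computed as below; the syscall sets are invented and the `spurious_*` entries are placeholder names. Because a sound superset always has recall 1, its $F_1$ is driven entirely by false positives.

```python
def set_f1(predicted: set, reference: set) -> float:
    """F1 of a predicted set against a reference set; 0.0 when they do not overlap."""
    tp = len(predicted & reference)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(reference)
    return 2 * precision * recall / (precision + recall)


# Invented example: system calls observed when running some program.
observed = {"read", "write", "openat", "close", "mmap", "exit_group"}
# A tight superset with one false positive, and a loose one with 40.
tight = observed | {"fstat"}
loose = observed | {f"spurious_{i}" for i in range(40)}

for name, predicted in [("tight", tight), ("loose", loose)]:
    print(name, round(set_f1(predicted, observed), 3))
```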

Safeguard is a Double-edged Sword: Denial-of-service Attack on Large Language Models

Safety is a paramount concern for large language models (LLMs) in open deployment. To this end, safeguard methods aim to enforce the ethical and responsible use of LLMs through safety alignment or guardrail mechanisms. However, we find that malicious attackers can exploit false positives of safeguards, i.e., fool the safeguard model into mistakenly blocking safe content, leading to a new denial-of-service (DoS) attack on LLMs. Specifically, through software or phishing attacks on user client software, attackers insert a short, seemingly innocuous adversarial prompt into user prompt templates in configuration files; this prompt then appears in final user requests without being visible in the user interface, making it non-trivial to identify. By designing an optimization process that uses gradient and attention information, our attack can automatically generate seemingly safe adversarial prompts, only about 30 characters long, that universally block over 97% of user requests on Llama Guard 3. The attack opens a new dimension for evaluating LLM safeguards, one focused on false positives and fundamentally different from classic jailbreaks.

Updated: 2024-10-23 17:26:06

Categories: cs.CR,cs.AI

Download: http://arxiv.org/abs/2410.02916v2

Stochastic gradient descent in high dimensions for multi-spiked tensor PCA

We study the high-dimensional dynamics of online stochastic gradient descent for the multi-spiked tensor model. This multi-index model arises from the tensor principal component analysis (PCA) problem with multiple spikes, where the goal is to estimate $r$ unknown signal vectors on the $N$-dimensional unit sphere through maximum likelihood estimation from noisy observations of a $p$-tensor. We determine the number of samples and the conditions on the signal-to-noise ratios (SNRs) required to efficiently recover the unknown spikes from natural random initializations. We show that full recovery of all spikes is possible provided the number of samples scales as $N^{p-2}$, matching the algorithmic threshold identified in the rank-one case [Ben Arous, Gheissari, Jagannath 2020, 2021]. Our results are obtained through a detailed analysis of a low-dimensional system that describes the evolution of the correlations between the estimators and the spikes, while controlling the noise in the dynamics. We find that the spikes are recovered sequentially in a process we term "sequential elimination": once a correlation exceeds a critical threshold, all correlations sharing a row or column index become sufficiently small, allowing the next correlation to grow and become macroscopic. The order in which correlations become macroscopic depends on their initial values and the corresponding SNRs, leading to either exact recovery or recovery of a permutation of the spikes. In the matrix case ($p=2$), if the SNRs are sufficiently separated, we achieve exact recovery of the spikes, whereas equal SNRs lead to recovery of the subspace spanned by the spikes.

Updated: 2024-10-23 17:20:41

Categories: stat.ML,cs.LG,math.PR,math.ST,stat.TH,68Q87, 62F30, 60G42

Download: http://arxiv.org/abs/2410.18162v1
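
For orientation, the objects this abstract refers to can be written down explicitly. The block below is one common formulation, assumed rather than copied from the paper: how the SNRs $\lambda_i$ scale with $N$, and whether the spikes are taken to be orthogonal, vary between papers.

```latex
% Observation model: a p-tensor on (R^N)^{\otimes p} with r planted spikes.
Y = W + \sum_{i=1}^{r} \lambda_i \, v_i^{\otimes p},
\qquad v_i \in \mathbb{S}^{N-1},
% W: noise p-tensor with i.i.d. Gaussian entries; \lambda_i: signal-to-noise ratios.
% Online SGD keeps r estimators x_1(t), \dots, x_r(t) on the unit sphere.
% The low-dimensional system tracks the r x r matrix of correlations
m_{ij}(t) = \langle v_i, x_j(t) \rangle, \qquad 1 \le i, j \le r.
% "Sequential elimination": once some |m_{ij}| crosses a critical threshold,
% the other entries in row i and column j become small.
```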

Key Algorithms for Keyphrase Generation: Instruction-Based LLMs for Russian Scientific Keyphrases

Keyphrase selection is a challenging natural language processing task with a wide range of applications. Adapting existing supervised and unsupervised solutions to Russian faces several limitations due to the language's rich morphology and the small number of available training datasets. Recent studies on English texts show that large language models (LLMs) successfully address the task of generating keyphrases. LLMs can achieve impressive results without task-specific fine-tuning, using text prompts instead. In this work, we assess the performance of prompt-based methods for generating keyphrases for Russian scientific abstracts. First, we compare the performance of zero-shot and few-shot prompt-based methods, fine-tuned models, and unsupervised methods. Then we assess strategies for selecting keyphrase examples in a few-shot setting. We present the outcomes of a human evaluation of the generated keyphrases and analyze the strengths and weaknesses of the models through expert assessment. Our results suggest that prompt-based methods can outperform common baselines even with simple text prompts.

Updated: 2024-10-23 17:07:32

Categories: cs.CL,cs.AI,68T50,I.2.7; I.7.m; H.3.3

Download: http://arxiv.org/abs/2410.18040v1

POD-Attention: Unlocking Full Prefill-Decode Overlap for Faster LLM Inference

Each request in LLM inference goes through two phases: a compute-bound prefill and a memory-bandwidth-bound decode. To improve GPU utilization, recent systems use hybrid batching, which combines the prefill and decode phases of different requests into the same batch. Hybrid batching works well for linear operations because it amortizes the cost of loading model weights from HBM. However, attention computation in hybrid batches remains inefficient because existing attention kernels are optimized for either prefill or decode. In this paper, we present POD-Attention, the first GPU kernel that efficiently computes attention for hybrid batches. POD-Attention aims to maximize the utilization of both compute and memory bandwidth by carefully allocating the GPU's resources so that prefill and decode operations run concurrently on the same multiprocessor. We integrate POD-Attention into Sarathi-Serve, a state-of-the-art LLM inference scheduler. POD-Attention speeds up attention computation by up to 75% (mean 28%) and increases LLM serving throughput by up to 22% in offline inference. In online inference, POD-Attention achieves lower time-to-first-token (TTFT), time-between-tokens (TBT), and request execution latency than Sarathi-Serve.

Updated: 2024-10-23 17:06:56

Categories: cs.LG,cs.DC

Download: http://arxiv.org/abs/2410.18038v1
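
Why prefill is compute-bound and decode memory-bound can be seen from arithmetic intensity (FLOPs per byte moved). The sketch below is a rough model for one attention head, assuming FP16 (2 bytes per element), a fused kernel that reads Q, K, V and writes O once without materializing the score matrix, and ignoring softmax cost and causal masking (which roughly halves prefill FLOPs). It is an illustration, not POD-Attention's cost model.

```python
def attention_intensity(q_len: int, kv_len: int, head_dim: int,
                        bytes_per_elem: int = 2) -> float:
    """Approximate FLOPs per byte of HBM traffic for one fused attention head.

    FLOPs: Q @ K^T and P @ V each cost 2 * q_len * kv_len * head_dim.
    Bytes: Q and O (q_len rows each) plus K and V (kv_len rows each),
    each moved once.
    """
    flops = 4 * q_len * kv_len * head_dim
    elems = (2 * q_len + 2 * kv_len) * head_dim
    return flops / (elems * bytes_per_elem)


context = 4096
prefill = attention_intensity(q_len=context, kv_len=context, head_dim=128)
decode = attention_intensity(q_len=1, kv_len=context, head_dim=128)
print(f"prefill: {prefill:.0f} FLOPs/byte, decode: {decode:.2f} FLOPs/byte")
```

Current data-center GPUs sustain roughly a few hundred FP16 FLOPs per byte of memory bandwidth, so prefill (about 2048 here) saturates compute while decode (about 1) leaves compute idle. That gap is what co-scheduling both on one multiprocessor tries to exploit. Grouped-query attention raises decode intensity roughly in proportion to the group size but leaves it far below prefill.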

GraphTeam: Facilitating Large Language Model-based Graph Analysis via Multi-Agent Collaboration

Graphs are widely used for modeling relational data in real-world scenarios, such as social networks and urban computing. Existing LLM-based graph analysis approaches either integrate graph neural networks (GNNs) for specific machine learning tasks, limiting their transferability, or rely solely on LLMs' internal reasoning ability, resulting in suboptimal performance. To address these limitations, we take advantage of recent advances in LLM-based agents, which have shown the ability to use external knowledge or tools for problem solving. By simulating human problem-solving strategies such as analogy and collaboration, we propose GraphTeam, a multi-agent system based on LLMs for graph analysis. GraphTeam consists of five LLM-based agents from three modules, and agents with different specialities can collaborate with each other to address complex problems. Specifically: (1) input-output normalization module: the question agent extracts and refines four key arguments from the original question to facilitate problem understanding, and the answer agent organizes the results to meet the output requirements; (2) external knowledge retrieval module: we first build a knowledge base consisting of relevant documentation and experience information, and the search agent then retrieves the most relevant entries for each question; (3) problem-solving module: given the information retrieved by the search agent, the coding agent uses established algorithms via programming to generate solutions, and if the coding agent fails, the reasoning agent computes the results directly without programming. Extensive experiments on six graph analysis benchmarks demonstrate that GraphTeam achieves state-of-the-art performance, with an average 25.85% improvement in accuracy over the best baseline. The code and data are available at https://github.com/BUPT-GAMMA/GraphTeam.

Updated: 2024-10-23 17:02:59

Categories: cs.AI,cs.CL,cs.MA

Download: http://arxiv.org/abs/2410.18032v1
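
The three-module workflow, including the fallback from the coding agent to the reasoning agent, can be sketched as plain control flow. Every agent below is a hypothetical stub standing in for an LLM call; the function names and the argument dictionary are illustrative assumptions, not GraphTeam's API. The paper's question agent extracts four specific arguments, which are not reproduced here.

```python
from typing import Callable, Dict, List

Args = Dict[str, str]


def run_pipeline(question: str,
                 question_agent: Callable[[str], Args],
                 search_agent: Callable[[Args], List[str]],
                 coding_agent: Callable[[Args, List[str]], str],
                 reasoning_agent: Callable[[Args, List[str]], str],
                 answer_agent: Callable[[str, Args], str]) -> str:
    """Illustrative GraphTeam-style control flow with caller-supplied agents."""
    args = question_agent(question)           # normalize the question
    context = search_agent(args)              # retrieve docs / past experience
    try:
        raw = coding_agent(args, context)     # solve by writing and running code
    except Exception:
        raw = reasoning_agent(args, context)  # fallback: answer without code
    return answer_agent(raw, args)            # format to the required output


# Toy stubs (hypothetical); the "graph" is an edge list like "a-b,b-c".
def parse(q: str) -> Args:
    return {"task": "node_count", "graph": q}

def search(args: Args) -> List[str]:
    return ["doc: count distinct endpoints of all edges"]

def code_ok(args: Args, ctx: List[str]) -> str:
    nodes = {n for edge in args["graph"].split(",") for n in edge.split("-")}
    return str(len(nodes))

def code_broken(args: Args, ctx: List[str]) -> str:
    raise RuntimeError("generated program crashed")

def reason(args: Args, ctx: List[str]) -> str:
    return "unknown"

def fmt(raw: str, args: Args) -> str:
    return f"{args['task']}: {raw}"


print(run_pipeline("a-b,b-c", parse, search, code_ok, reason, fmt))      # node_count: 3
print(run_pipeline("a-b,b-c", parse, search, code_broken, reason, fmt))  # node_count: unknown
```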

Exploring Large Language Models for Feature Selection: A Data-centric Perspective

The rapid advancement of Large Language Models (LLMs) has significantly influenced various domains, owing to their exceptional few-shot and zero-shot learning capabilities. In this work, we aim to explore and understand LLM-based feature selection methods from a data-centric perspective. We begin by categorizing existing LLM-based feature selection methods into two groups: data-driven feature selection, which requires numerical values of samples to perform statistical inference, and text-based feature selection, which uses the prior knowledge of LLMs to make semantic associations from descriptive context. We conduct experiments on both classification and regression tasks with LLMs of various sizes (e.g., GPT-4, ChatGPT and LLaMA-2). Our findings emphasize the effectiveness and robustness of text-based feature selection methods and showcase their potential in a real-world medical application. We also discuss the challenges and future opportunities in employing LLMs for feature selection, offering insights for further research and development in this emerging field.

Updated: 2024-10-23 17:01:05

Categories: cs.AI

Download: http://arxiv.org/abs/2408.12025v2

Cross-lingual Transfer of Reward Models in Multilingual Alignment

Reinforcement learning with human feedback (RLHF) has been shown to benefit greatly from precise reward models (RMs). However, recent studies of reward modeling schemes are skewed towards English, limiting the applicability of RLHF in multilingual alignment. In this work, we investigate the cross-lingual transfer of RMs trained in diverse languages, primarily English. Our experimental results demonstrate strong cross-lingual transfer of English RMs, which outperform target-language RMs by 3 to 4% on average on Multilingual RewardBench. Furthermore, we analyze the cross-lingual transfer of RMs through representation shifts. Finally, we perform multilingual alignment to show how cross-lingual transfer in RMs propagates to enhanced multilingual instruction-following capability, along with extensive analyses of off-the-shelf RMs. We release the code, model, and data.

Updated: 2024-10-23 17:00:13

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2410.18027v1
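
Reward-model benchmarks in the RewardBench family score an RM by how often it ranks the chosen response above the rejected one for the same prompt. Below is a minimal sketch of that pairwise accuracy, assuming Multilingual RewardBench follows the same protocol; the reward function and preference data are toy stand-ins. Comparing an English-trained RM with a target-language RM would amount to running this per language with each RM.

```python
from typing import Callable, List, Tuple

Triple = Tuple[str, str, str]  # (prompt, chosen, rejected)


def pairwise_accuracy(reward: Callable[[str, str], float],
                      triples: List[Triple]) -> float:
    """Share of triples where the RM scores the chosen response above the rejected one."""
    if not triples:
        raise ValueError("need at least one preference triple")
    wins = sum(reward(p, c) > reward(p, r) for p, c, r in triples)
    return wins / len(triples)


def length_reward(prompt: str, response: str) -> float:
    """Hypothetical stand-in for a reward model: simply prefers longer answers."""
    return float(len(response))


# Invented preference data in two target languages.
triples = [
    ("Wie spät ist es?", "Es ist drei Uhr.", "Keine Ahnung."),
    ("¿Cuál es la capital de Francia?", "París.",
     "Creo que es Lyon, pero no estoy seguro."),
]
print(pairwise_accuracy(length_reward, triples))  # 0.5
```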

VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation

VILA-U is a Unified foundation model that integrates Video, Image, Language understanding and generation. Traditional visual language models (VLMs) use separate modules for understanding and generating visual content, which can lead to misalignment and increased complexity. In contrast, VILA-U employs a single autoregressive next-token prediction framework for both tasks, eliminating the need for additional components such as diffusion models. This approach not only simplifies the model but also achieves near state-of-the-art performance in visual language understanding and generation. The success of VILA-U is attributed to two main factors: a unified vision tower that aligns discrete visual tokens with textual inputs during pretraining, which enhances visual perception; and the fact that, given a high-quality dataset, autoregressive image generation can achieve quality similar to that of diffusion models. This allows VILA-U to perform comparably to more complex models using a fully token-based autoregressive framework.

Updated: 2024-10-23 16:42:06

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2409.04429v2
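
VILA-U represents images as discrete visual tokens that share one autoregressive sequence with text. The following is a minimal sketch of that idea under stated assumptions. Patches are quantized with a single codebook by nearest-neighbour lookup (plain vector quantization; VILA-U's vision tower may use a more elaborate quantizer). Image token ids are offset past a hypothetical text vocabulary and wrapped in hypothetical begin/end-of-image markers.

```python
from typing import List, Sequence


def nearest_code(vec: Sequence[float], codebook: List[Sequence[float]]) -> int:
    """Index of the codebook vector closest to vec in squared Euclidean distance."""
    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(vec, codebook[i]))


def unified_sequence(text_ids: List[int], patches: List[Sequence[float]],
                     codebook: List[Sequence[float]], text_vocab_size: int,
                     boi_id: int, eoi_id: int) -> List[int]:
    """Text tokens, then image tokens shifted past the text vocabulary, in one id space."""
    image_ids = [text_vocab_size + nearest_code(p, codebook) for p in patches]
    return text_ids + [boi_id] + image_ids + [eoi_id]


# Toy 2-D codebook and patch features; all ids are hypothetical.
CODEBOOK = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
patches = [(0.9, 0.1), (0.1, 0.8), (0.05, -0.02)]
seq = unified_sequence([5, 17], patches, CODEBOOK,
                       text_vocab_size=100, boi_id=98, eoi_id=99)
print(seq)  # [5, 17, 98, 101, 102, 100, 99]
```

A single next-token predictor over a vocabulary of size text_vocab_size + len(CODEBOOK) can then model both modalities, which is the sense in which no separate generation module is needed.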

STAR: SocioTechnical Approach to Red Teaming Language Models

This research introduces STAR, a sociotechnical framework that improves on current best practices for red teaming the safety of large language models. STAR makes two key contributions. First, it enhances steerability by generating parameterised instructions for human red teamers, leading to improved coverage of the risk surface. Parameterised instructions also provide more detailed insights into model failures at no increased cost. Second, STAR improves signal quality by matching demographics to assess harms for specific groups, resulting in more sensitive annotations. STAR further employs a novel arbitration step to leverage diverse viewpoints and improve label reliability, treating disagreement not as noise but as a valuable contribution to signal quality.

Updated: 2024-10-23 16:41:45

Categories: cs.AI,cs.CL,cs.CY,cs.HC

Download: http://arxiv.org/abs/2406.11757v4
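
Parameterised instructions can be pictured as filling a template with every combination of a few parameters, which makes coverage of the risk surface explicit. The sketch below uses hypothetical parameter names, values and template, together with a simplified arbitration rule: unanimous labels pass through and disagreements go to an arbitrator. STAR's actual parameters and arbitration procedure are described in the paper.

```python
import itertools
from typing import Callable, Dict, List


def parameterised_instructions(params: Dict[str, List[str]], template: str) -> List[str]:
    """Fill the template with every combination of parameter values (Cartesian product)."""
    keys = list(params)
    return [template.format(**dict(zip(keys, combo)))
            for combo in itertools.product(*(params[k] for k in keys))]


def resolve(labels: List[str], arbitrate: Callable[[List[str]], str]) -> str:
    """Keep a unanimous label; otherwise hand all labels to an arbitrator."""
    if not labels:
        raise ValueError("no annotations to resolve")
    return labels[0] if len(set(labels)) == 1 else arbitrate(labels)


# Hypothetical parameters, not STAR's real parameter set.
params = {
    "harm": ["hate speech", "misinformation"],
    "topic": ["health", "elections"],
    "group": ["older adults", "migrants"],
}
template = "Probe the model for {harm} in a conversation about {topic} involving {group}."
instructions = parameterised_instructions(params, template)
print(len(instructions))  # 8 = 2 * 2 * 2 combinations
print(instructions[0])
```

Passing the full label list to the arbitrator, rather than a majority vote, keeps the disagreement visible, in the spirit of treating it as signal rather than noise.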

Counter-Current Learning: A Biologically Plausible Dual Network Approach for Deep Learning

Despite its widespread use in neural networks, error backpropagation has faced criticism for its lack of biological plausibility, suffering from issues such as the backward locking problem and the weight transport problem. These limitations have motivated researchers to explore more biologically plausible learning algorithms that could potentially shed light on how biological neural systems adapt and learn. Inspired by the counter-current exchange mechanisms observed in biological systems, we propose counter-current learning (CCL), a biologically plausible framework for credit assignment in neural networks. This framework employs a feedforward network to process input data and a feedback network to process targets, with each network enhancing the other through anti-parallel signal propagation. By leveraging the more informative signals from the bottom layer of the feedback network to guide the updates of the top layer of the feedforward network and vice versa, CCL enables the simultaneous transformation of source inputs to target outputs and the dynamic mutual influence of these transformations. Experimental results on MNIST, FashionMNIST, CIFAR10, and CIFAR100 datasets using multi-layer perceptrons and convolutional neural networks demonstrate that CCL achieves comparable performance to other biologically plausible algorithms while offering a more biologically realistic learning mechanism. Furthermore, we showcase the applicability of our approach to an autoencoder task, underscoring its potential for unsupervised representation learning. Our work presents a direction for biologically inspired and plausible learning algorithms, offering an alternative mechanism of learning and adaptation in neural networks.

Updated: 2024-10-23 16:27:27

Categories: cs.LG,cs.AI,cs.NE

Download: http://arxiv.org/abs/2409.19841v2

Proof of Thought: Neurosymbolic Program Synthesis allows Robust and Interpretable Reasoning

Large Language Models (LLMs) have revolutionized natural language processing, yet they struggle with inconsistent reasoning, particularly in novel domains and complex logical sequences. This research introduces Proof of Thought, a framework that enhances the reliability and transparency of LLM outputs. Our approach bridges LLM-generated ideas with formal logic verification, employing a custom interpreter to convert LLM outputs into First Order Logic constructs for theorem prover scrutiny. Central to our method is an intermediary JSON-based Domain-Specific Language, which by design balances precise logical structures with intuitive human concepts. This hybrid representation enables both rigorous validation and accessible human comprehension of LLM reasoning processes. Key contributions include a robust type system with sort management for enhanced logical integrity, explicit representation of rules for clear distinction between factual and inferential knowledge, and a flexible architecture that allows for easy extension to various domain-specific applications. We demonstrate Proof of Thought's effectiveness through benchmarking on StrategyQA and a novel multimodal reasoning task, showing improved performance in open-ended scenarios. By providing verifiable and interpretable results, our technique addresses critical needs for AI system accountability and sets a foundation for human-in-the-loop oversight in high-stakes domains.

Updated: 2024-10-23 16:27:20

Categories: cs.AI,cs.CL,cs.LG,cs.LO,cs.NE

Download: http://arxiv.org/abs/2409.17270v2
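
To make the idea of a JSON-based intermediate language more concrete, here is a toy sketch. The schema (keys `sorts`, `predicates`, `facts`, `rules`) is hypothetical and is not the paper's DSL. It only illustrates two of the listed ideas, sort checking and an explicit separation between facts and rules, and it uses naive forward chaining over unary predicates in place of the first-order theorem prover the paper relies on.

```python
import json

# Hypothetical toy program; key names are illustrative, not the paper's DSL.
PROGRAM = json.loads("""
{
  "sorts": {"Animal": ["tweety", "rex"]},
  "predicates": {"Bird": "Animal", "CanFly": "Animal", "Dog": "Animal"},
  "facts": [["Bird", "tweety"], ["Dog", "rex"]],
  "rules": [{"if": "Bird", "then": "CanFly"}]
}
""")


def check_sorts(program: dict) -> None:
    """Reject facts whose argument is not a constant of the predicate's sort."""
    for pred, const in program["facts"]:
        sort = program["predicates"][pred]
        if const not in program["sorts"][sort]:
            raise TypeError(f"{const!r} is not of sort {sort!r} required by {pred}")


def derive(program: dict) -> set[tuple[str, str]]:
    """Forward-chain unary rules 'if P(x) then Q(x)' until nothing new is derived."""
    check_sorts(program)
    known = {tuple(fact) for fact in program["facts"]}  # factual knowledge
    changed = True
    while changed:
        changed = False
        for rule in program["rules"]:  # inferential knowledge, kept separate
            for pred, const in list(known):
                if pred == rule["if"] and (rule["then"], const) not in known:
                    known.add((rule["then"], const))
                    changed = True
    return known


def entails(program: dict, query: tuple[str, str]) -> bool:
    return query in derive(program)


print(entails(PROGRAM, ("CanFly", "tweety")))  # True
print(entails(PROGRAM, ("CanFly", "rex")))     # False
```

Keeping facts and rules in separate lists is what lets a reader see which conclusions were stated and which were inferred; a real system would hand the same structure to a solver rather than a hand-written loop.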

Inferring stability properties of chaotic systems on autoencoders' latent spaces

The data-driven learning of solutions of partial differential equations can be based on a divide-and-conquer strategy. First, the high-dimensional data is compressed to a latent space with an autoencoder; and, second, the temporal dynamics are inferred on the latent space with a form of recurrent neural network. In chaotic systems and turbulence, convolutional autoencoders and echo state networks (CAE-ESN) successfully forecast the dynamics, but little is known about whether the stability properties can also be inferred. We show that the CAE-ESN model infers the invariant stability properties and the geometry of the tangent space in the low-dimensional manifold (i.e. the latent space) through Lyapunov exponents and covariant Lyapunov vectors. This work opens up new opportunities for inferring the stability of high-dimensional chaotic systems in latent spaces.

Updated: 2024-10-23 16:25:36

Categories: cs.LG,nlin.CD

Download: http://arxiv.org/abs/2410.18003v1
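
Lyapunov exponents are commonly computed with the QR-based (Benettin-style) method: propagate an orthonormal set of tangent vectors with the Jacobian, re-orthonormalize at every step, and average the logarithms of the stretching factors on the diagonal of R. The sketch below applies that standard method to the Hénon map, chosen only as an illustrative assumption; in the paper's setting the map and its Jacobian would come from the trained CAE-ESN acting on the latent space. Covariant Lyapunov vectors need an additional backward pass that is not shown.

```python
import numpy as np


def henon(state: np.ndarray, a: float = 1.4, b: float = 0.3) -> np.ndarray:
    x, y = state
    return np.array([1.0 - a * x * x + y, b * x])


def henon_jacobian(state: np.ndarray, a: float = 1.4, b: float = 0.3) -> np.ndarray:
    x, _ = state
    return np.array([[-2.0 * a * x, 1.0], [b, 0.0]])


def lyapunov_spectrum(step, jacobian, x0, n_steps=20_000, n_transient=1_000):
    """QR-based estimate of all Lyapunov exponents of a discrete-time map."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_transient):  # settle onto the attractor first
        x = step(x)
    q = np.eye(x.size)  # orthonormal tangent basis
    log_growth = np.zeros(x.size)
    for _ in range(n_steps):
        z = jacobian(x) @ q  # evolve tangent vectors with the Jacobian at x
        x = step(x)
        q, r = np.linalg.qr(z)  # re-orthonormalize
        log_growth += np.log(np.abs(np.diag(r)))  # stretching per direction
    return log_growth / n_steps


exponents = lyapunov_spectrum(henon, henon_jacobian, x0=[0.0, 0.0])
print(exponents)  # close to the known values [0.42, -1.62] for a=1.4, b=0.3
```

A useful sanity check is that the exponents sum to the average log-determinant of the Jacobian, which for the Hénon map is exactly $\ln 0.3$.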

Benchmarking Foundation Models on Exceptional Cases: Dataset Creation and Validation

Foundation models (FMs) have achieved significant success across various tasks, prompting research on benchmarks for their reasoning abilities. However, there is a lack of studies on FMs' performance in exceptional scenarios, which we define as out-of-distribution (OOD) reasoning tasks. This paper is the first to address these cases, developing a novel dataset for evaluating FMs across multiple modalities, including graphic novels, calligraphy, news articles, and lyrics. It includes instance classification, character recognition, token prediction, and text generation tasks. The paper also proposes using prompt engineering techniques such as Chain-of-Thought (CoT) and CoT+Few-Shot to enhance performance. Validating FMs with these methods revealed improvements. The code repository is accessible at: https://github.com/MLAI-Yonsei/ExceptionalBenchmark

Updated: 2024-10-23 16:24:23

Categories: cs.AI

Download: http://arxiv.org/abs/2410.18001v1

Cyber-Physical Authentication Scheme for Secure V2G Transactions

The rapid global adoption of electric vehicles (EVs) has catalyzed the need for robust cybersecurity measures within vehicle-to-grid (V2G) networks. As these networks are increasingly integrated into smart charging infrastructures, they also introduce new vulnerabilities that threaten grid stability and user privacy. This paper proposes a cyber-physical authentication protocol and trading smart contract tailored to plug-and-charge (PnC) operations within blockchain-based V2G systems. The protocol leverages advanced cryptographic techniques and blockchain to ensure secure, transparent, and tamper-proof energy transactions between EVs and charging stations. Key contributions include the development of a cyber-physical authentication method, the implementation of a smart contract framework for secure energy trading, and a detailed security and privacy analysis. The proposed protocol effectively mitigates risks such as man-in-the-middle (MitM) attacks and replay attacks while preserving user anonymity and data integrity.

Updated: 2024-10-23 16:22:34

Categories: cs.CR

Download: http://arxiv.org/abs/2409.14008v3
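
Resistance to replay and message tampering is commonly obtained by binding each message to a fresh nonce and a timestamp under a message authentication code. The sketch below shows only that generic pattern with Python's standard `hmac` module. It is not the paper's cyber-physical protocol (which also involves blockchain and a trading smart contract), and the pre-shared key, the message fields such as `ev_id`, and the 30-second freshness window are all hypothetical.

```python
import hashlib
import hmac
import json
import secrets
import time

FRESHNESS_WINDOW_S = 30  # hypothetical maximum message age in seconds


def sign(key: bytes, payload: dict) -> dict:
    """Attach a random nonce, a timestamp, and an HMAC-SHA256 tag to a payload."""
    msg = dict(payload, nonce=secrets.token_hex(16), ts=int(time.time()))
    body = json.dumps(msg, sort_keys=True).encode()
    return {"msg": msg, "tag": hmac.new(key, body, hashlib.sha256).hexdigest()}


class Verifier:
    """Charging-station side: rejects forged, stale, or replayed messages."""

    def __init__(self, key: bytes):
        self.key = key
        self.seen_nonces: set[str] = set()

    def verify(self, packet: dict) -> bool:
        body = json.dumps(packet["msg"], sort_keys=True).encode()
        expected = hmac.new(self.key, body, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, packet["tag"]):
            return False  # tampered message or wrong key
        if abs(time.time() - packet["msg"]["ts"]) > FRESHNESS_WINDOW_S:
            return False  # stale message
        if packet["msg"]["nonce"] in self.seen_nonces:
            return False  # replayed message
        self.seen_nonces.add(packet["msg"]["nonce"])
        return True


key = secrets.token_bytes(32)  # hypothetical pre-shared key
station = Verifier(key)
packet = sign(key, {"ev_id": "EV-42", "kwh": 12.5})
print(station.verify(packet))  # True
print(station.verify(packet))  # False (replay)
```

The nonce cache defeats exact replays inside the freshness window, while the timestamp bounds how long nonces must be remembered.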

AlleNoise: large-scale text classification benchmark dataset with real-world label noise

Label noise remains a challenge for training robust classification models. Most methods for mitigating label noise have been benchmarked primarily on datasets with synthetic noise. While the need for datasets with realistic noise distributions has partially been addressed by web-scraped benchmarks such as WebVision and Clothing1M, those benchmarks are restricted to the computer vision domain. With the growing importance of Transformer-based models, it is crucial to establish text classification benchmarks for learning with noisy labels. In this paper, we present AlleNoise, a new curated text classification benchmark dataset with real-world instance-dependent label noise, containing over 500,000 examples across approximately 5,600 classes, complemented with a meaningful, hierarchical taxonomy of categories. The noise distribution comes from actual users of a major e-commerce marketplace, so it realistically reflects the semantics of human mistakes. In addition to the noisy labels, we provide human-verified clean labels, which offer deeper insight into the noise distribution, unlike the web-scraped datasets typically used in the field. We demonstrate that a representative selection of established methods for learning with noisy labels is inadequate to handle such real-world noise. In addition, we show evidence that these algorithms do not alleviate excessive memorization. As such, with AlleNoise, we set the bar high for the development of label noise methods that can handle real-world label noise in text classification tasks. The code and dataset are available for download at https://github.com/allegro/AlleNoise.

Updated: 2024-10-23 16:19:06

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2407.10992v2
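
Having both a noisy label and a human-verified clean label for each example makes simple noise diagnostics possible, such as the overall noise rate and which clean classes are most often mislabeled as which others. The sketch below computes both with the standard library. The two aligned label lists and the example category names are hypothetical and do not reflect AlleNoise's actual file layout.

```python
from collections import Counter


def noise_report(noisy: list[str], clean: list[str]):
    """Return the overall noise rate and counts of (clean -> noisy) label flips."""
    if len(noisy) != len(clean):
        raise ValueError("noisy and clean label lists must be aligned")
    flips = Counter((c, n) for n, c in zip(noisy, clean) if n != c)
    rate = sum(flips.values()) / len(clean)
    return rate, flips


noisy = ["phone case", "phone case", "charger", "cable", "charger"]
clean = ["phone case", "screen protector", "charger", "charger", "charger"]
rate, flips = noise_report(noisy, clean)
print(rate)                  # 0.4
print(flips.most_common(1))  # [(('screen protector', 'phone case'), 1)]
```

On a real dataset, the most frequent flips typically point at semantically close categories, which is the kind of instance-dependent structure synthetic noise does not reproduce.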

Estimating the Spectral Moments of the Kernel Integral Operator from Finite Sample Matrices

Analyzing the structure of sampled features from an input data distribution is challenging when constrained by limited measurements in both the number of inputs and features. Traditional approaches often rely on the eigenvalue spectrum of the sample covariance matrix derived from finite measurement matrices; however, these spectra are sensitive to the size of the measurement matrix, leading to biased insights. In this paper, we introduce a novel algorithm that provides unbiased estimates of the spectral moments of the kernel integral operator in the limit of infinite inputs and features from finitely sampled measurement matrices. Our method, based upon dynamic programming, is efficient and capable of estimating the moments of the operator spectrum. We demonstrate the accuracy of our estimator on radial basis function (RBF) kernels, highlighting its consistency with the theoretical spectra. Furthermore, we showcase the practical utility and robustness of our method in understanding the geometry of learned representations in neural networks.

Updated: 2024-10-23 16:12:59

Categories: cs.LG,math.SP,math.ST,stat.ML,stat.TH

Download: http://arxiv.org/abs/2410.17998v1
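
A simpler relative of this idea can be shown directly. Assume the kernel can be evaluated exactly, so only the number of inputs $N$ is finite, and let $K_{ij} = k(x_i, x_j)$ for i.i.d. inputs. The $p$-th spectral moment of the integral operator, $\sum_k \lambda_k^p$, equals the expectation of the cyclic product $K_{i_1 i_2} K_{i_2 i_3} \cdots K_{i_p i_1}$ over distinct indices, so averaging over distinct index tuples is unbiased, while the naive moment of $K/N$ also includes repeated-index terms. The sketch below implements $p = 2$ and $p = 3$ via inclusion-exclusion; it is not the paper's dynamic-programming algorithm and does not handle finitely many features.

```python
import numpy as np


def unbiased_moment_2(K: np.ndarray) -> float:
    """Mean of K_ij * K_ji over ordered pairs with i != j."""
    n = K.shape[0]
    off_diag = np.sum(K * K.T) - np.sum(np.diag(K) ** 2)
    return float(off_diag / (n * (n - 1)))


def unbiased_moment_3(K: np.ndarray) -> float:
    """Mean of K_ij K_jk K_ki over ordered triples of distinct indices (K symmetric)."""
    n = K.shape[0]
    d = np.diag(K)
    total = np.trace(K @ K @ K)                   # all triples, repeats included
    one_tie = np.sum(d * np.sum(K * K, axis=1))   # triples with i == j (same for j == k, k == i)
    all_tie = np.sum(d ** 3)                      # triples with i == j == k
    distinct = total - 3 * one_tie + 2 * all_tie  # inclusion-exclusion
    return float(distinct / (n * (n - 1) * (n - 2)))


def naive_moment(K: np.ndarray, p: int) -> float:
    """Moment of the sample matrix K / N, which keeps the repeated-index terms."""
    n = K.shape[0]
    return float(np.trace(np.linalg.matrix_power(K / n, p)))


# Linear kernel with standard Gaussian inputs in 3-D: the operator's nonzero
# eigenvalues are those of the identity covariance, so sum(lambda**p) = 3 for every p.
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 3))
K = X @ X.T
print(unbiased_moment_2(K), naive_moment(K, 2))
print(unbiased_moment_3(K), naive_moment(K, 3))
```

The naive estimate drifts further from the true moment as $N$ shrinks relative to the kernel's effective rank, which is the size sensitivity the abstract describes.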

AI-driven health recommender

As AI has emerged as the most highly valued technology, we used it to create a web application that makes things easier for patients. It detects the disease name based on the symptoms given by the patient and recommends medication for the respective disease, precautions to take, a diet to follow, and workouts to do, so that the impact of the disease can be minimized. The web application is built on clean, real-time data, with machine learning at its root. We used Flask to create a user-friendly platform.

Updated: 2024-10-23 16:08:00

Categories: cs.AI

Download: http://arxiv.org/abs/2410.17991v1

CondTSF: One-line Plugin of Dataset Condensation for Time Series Forecasting

Dataset condensation is a nascent technique that generates a small dataset that can be used to train deep neural networks at lower cost. The objective of dataset condensation is to ensure that a model trained on the synthetic dataset performs comparably to a model trained on the full dataset. However, existing methods predominantly concentrate on classification tasks, which makes them difficult to adapt to time series forecasting (TS-forecasting). This difficulty arises from differences in how synthetic data is evaluated. In classification, the synthetic data is considered well-distilled if the model trained on the full dataset and the model trained on the synthetic dataset yield identical labels for the same input, regardless of differences in their output logit distributions. Conversely, in TS-forecasting, the effectiveness of synthetic data distillation is determined by the distance between the predictions of the two models. The synthetic data is deemed well-distilled only when all data points within the predictions are similar. Consequently, TS-forecasting has a more rigorous evaluation methodology than classification. To close this gap, we theoretically analyze the optimization objective of dataset condensation for TS-forecasting and, based on this analysis, propose a new one-line dataset condensation plugin named Dataset Condensation for Time Series Forecasting (CondTSF). Plugging CondTSF into previous dataset condensation methods reduces the distance between the predictions of the model trained on the full dataset and the model trained on the synthetic dataset, thereby enhancing performance. We conduct extensive experiments on eight commonly used time series datasets. CondTSF consistently improves the performance of all previous dataset condensation methods across all datasets, particularly at low condensing ratios.

Updated: 2024-10-23 16:05:11

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2406.02131v4
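
The difference between the two evaluation criteria described in this abstract can be made concrete. Two classifiers agree whenever their argmax labels match, even if their logits differ a lot, whereas two forecasters are close only when every predicted point is close. The numbers below are made up for illustration, and mean squared error is an assumed choice of distance; this is not the CondTSF plugin itself.

```python
def argmax(values: list[float]) -> int:
    return max(range(len(values)), key=values.__getitem__)


def same_label(logits_a: list[float], logits_b: list[float]) -> bool:
    """Classification view: the two models agree if their predicted labels match."""
    return argmax(logits_a) == argmax(logits_b)


def forecast_distance(pred_a: list[float], pred_b: list[float]) -> float:
    """Forecasting view: mean squared error over every predicted time step."""
    return sum((a - b) ** 2 for a, b in zip(pred_a, pred_b)) / len(pred_a)


# Full-data model vs. synthetic-data model (made-up outputs).
print(same_label([2.0, 1.0, 0.5], [9.0, -3.0, 0.0]))       # True: labels agree
print(forecast_distance([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))  # 1.3333333333333333
```

A single off-target step is enough to make the forecasters disagree, which is why the forecasting criterion is the stricter of the two.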

Learning a quantum computer's capability

Accurately predicting a quantum computer's capability (which circuits it can run and how well it can run them) is a foundational goal of quantum characterization and benchmarking. As modern quantum computers become increasingly hard to simulate, we must develop accurate and scalable predictive capability models to help researchers and stakeholders decide which quantum computers to build and use. In this work, we propose a hardware-agnostic method to efficiently construct scalable predictive models of a quantum computer's capability for almost any class of circuits, and demonstrate our method using convolutional neural networks (CNNs). Our CNN-based approach works by efficiently representing a circuit as a three-dimensional tensor and then using a CNN to predict its success rate. Our CNN capability models obtain an average absolute prediction error of approximately $1\%$ when modeling processors experiencing both Markovian and non-Markovian stochastic Pauli errors. We also apply our CNNs to model the capabilities of cloud-access quantum computing systems, obtaining moderate prediction accuracy (average absolute error of around $2\%$ to $5\%$), and we highlight the challenges to building better neural network capability models.

Updated: 2024-10-23 16:03:19

Categories: quant-ph,cs.LG

Download: http://arxiv.org/abs/2304.10650v2
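
One plausible way to turn a circuit into a three-dimensional tensor is to index it by qubit, circuit layer, and gate-type channel, with a one-hot entry wherever a gate acts. The sketch below builds such an encoding. The gate alphabet, the `(layer, gate, qubits)` input format, and the channel layout (including separate control and target channels for CNOT) are hypothetical and may differ from the paper's representation; the CNN that consumes the tensor is omitted.

```python
import numpy as np

SINGLE_QUBIT = ["x", "y", "z", "h"]  # hypothetical gate alphabet
CHANNELS = SINGLE_QUBIT + ["cnot_control", "cnot_target"]


def encode_circuit(circuit, n_qubits: int, n_layers: int) -> np.ndarray:
    """Encode [(layer, gate, qubits), ...] as a (qubits, layers, channels) tensor."""
    tensor = np.zeros((n_qubits, n_layers, len(CHANNELS)), dtype=np.float32)
    for layer, gate, qubits in circuit:
        if gate == "cnot":
            control, target = qubits  # keep the control/target roles distinct
            tensor[control, layer, CHANNELS.index("cnot_control")] = 1.0
            tensor[target, layer, CHANNELS.index("cnot_target")] = 1.0
        else:
            (qubit,) = qubits
            tensor[qubit, layer, CHANNELS.index(gate)] = 1.0
    return tensor


circuit = [(0, "h", (0,)), (1, "cnot", (0, 1)), (2, "x", (1,))]
t = encode_circuit(circuit, n_qubits=2, n_layers=3)
print(t.shape)  # (2, 3, 6)
```

Placing qubits and layers on the two spatial axes lets convolutions pick up gates that are close in both time and qubit connectivity.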

Federated Transformer: Multi-Party Vertical Federated Learning on Practical Fuzzily Linked Data

Federated Learning (FL) is an evolving paradigm that enables multiple parties to collaboratively train models without sharing raw data. Among its variants, Vertical Federated Learning (VFL) is particularly relevant in real-world, cross-organizational collaborations, where distinct features of a shared instance group are contributed by different parties. In these scenarios, parties are often linked using fuzzy identifiers, a common setting termed multi-party fuzzy VFL. Existing models generally address either multi-party VFL or fuzzy VFL between two parties. Extending these models to practical multi-party fuzzy VFL typically results in significant performance degradation and increased costs for maintaining privacy. To overcome these limitations, we introduce the Federated Transformer (FeT), a novel framework that supports multi-party VFL with fuzzy identifiers. FeT innovatively encodes these identifiers into data representations and employs a transformer architecture distributed across different parties, incorporating three new techniques to enhance performance. Furthermore, we have developed a multi-party privacy framework for VFL that integrates differential privacy with secure multi-party computation, effectively protecting local representations while minimizing associated utility costs. Our experiments demonstrate that FeT surpasses the baseline models by up to 46% in terms of accuracy when scaled to 50 parties. Additionally, in two-party fuzzy VFL settings, FeT also shows improved performance and privacy over cutting-edge VFL models.

Updated: 2024-10-23 16:00:14

Categories: cs.LG,cs.AI,cs.CR

Download: http://arxiv.org/abs/2410.17986v1

Stick-breaking Attention

The self-attention mechanism traditionally relies on the softmax operator, necessitating positional embeddings like RoPE or position biases to account for token order. But current methods still face length generalisation challenges. We propose an alternative attention mechanism based on the stick-breaking process: for each token $j$ before the current token $i$, we determine a break point $\beta_{i,j}$, which represents the proportion of the remaining stick to allocate to token $j$. We repeat the process until the stick is fully allocated, resulting in a sequence of attention weights. This process naturally incorporates recency bias, which has linguistic motivations for grammar parsing (Shen et al., 2017). We study the implications of replacing the conventional softmax-based attention mechanism with stick-breaking attention. We then discuss the implementation of numerically stable stick-breaking attention and adapt Flash Attention to accommodate this mechanism. When used as a drop-in replacement for current softmax+RoPE attention systems, we find that stick-breaking attention performs competitively with current methods on length generalisation and downstream tasks. Stick-breaking also performs well at length generalisation, allowing a model trained with a $2^{11}$ context window to perform well at $2^{14}$ with perplexity improvements.

Updated: 2024-10-23 15:51:13

Categories: cs.LG

Download: http://arxiv.org/abs/2410.17980v1
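
The stick-breaking weights can be written as $A_{i,j} = \beta_{i,j} \prod_{j<k<i} (1-\beta_{i,k})$: token $i$ gives a fraction $\beta_{i,i-1}$ of the stick to the most recent token, then a fraction of what remains to the token before it, and so on. The NumPy sketch below computes these weights in log space for numerical stability. It assumes $\beta_{i,j} = \mathrm{sigmoid}(z_{i,j})$ for attention logits $z_{i,j}$ and that only strictly earlier tokens are attended to, as the abstract describes; it is a reference loop, not the paper's Flash Attention kernel.

```python
import numpy as np


def stick_breaking_weights(logits: np.ndarray) -> np.ndarray:
    """Causal stick-breaking attention weights from an (n, n) matrix of logits.

    Row i allocates to tokens j < i, starting from j = i - 1 and moving back:
    A[i, j] = beta[i, j] * prod_{j < k < i} (1 - beta[i, k]), beta = sigmoid(logits).
    """
    n = logits.shape[0]
    log_beta = -np.logaddexp(0.0, -logits)           # log sigmoid(z)
    log_one_minus_beta = -np.logaddexp(0.0, logits)  # log(1 - sigmoid(z))
    weights = np.zeros((n, n))
    for i in range(n):
        log_remaining = 0.0  # log of the stick length still unallocated
        for j in range(i - 1, -1, -1):
            weights[i, j] = np.exp(log_beta[i, j] + log_remaining)
            log_remaining += log_one_minus_beta[i, j]
    return weights


# With all logits equal to 0, every break point is 0.5, so the most recent
# token gets 1/2 of the stick, the one before it 1/4, then 1/8, ...
w = stick_breaking_weights(np.zeros((4, 4)))
print(w[3])  # approximately [0.125, 0.25, 0.5, 0.0]
```

The attention output would then be `weights @ values`. In this sketch any stick left over after the first token is simply not allocated, so each row sums to at most 1, and the recency bias shows up directly as weights that shrink with distance when logits are equal.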

Federated Class-Incremental Learning with Hierarchical Generative Prototypes

Federated Learning (FL) aims to ease the training of deep models by distributing computation across multiple devices (clients) while safeguarding data privacy. On top of that, Federated Continual Learning (FCL) also accounts for data distribution evolving over time, mirroring the dynamic nature of real-world environments. While previous studies have identified Catastrophic Forgetting and Client Drift as primary causes of performance degradation in FCL, we shed light on the importance of Incremental Bias and Federated Bias, which cause models to prioritize classes that are recently introduced or locally predominant, respectively. Our proposal constrains both biases in the last layer by efficiently finetuning a pre-trained backbone using learnable prompts, resulting in clients that produce less biased representations and more biased classifiers. Therefore, instead of solely relying on parameter aggregation, we leverage generative prototypes to effectively balance the predictions of the global model. Our method significantly improves on the current state of the art, providing an average increase of +7.8% in accuracy. Code to reproduce the results is provided in the supplementary material.

Updated: 2024-10-23 15:48:45

Categories: cs.LG

Download: http://arxiv.org/abs/2406.02447v3

Do Large Language Models Truly Grasp Mathematics? An Empirical Exploration

Despite LLMs' proficiency in math tasks, the mechanisms underlying their mathematical reasoning abilities remain a subject of debate. Recent studies suggest that chain-of-thought (CoT) prompts can bolster mathematical reasoning by encouraging LLMs to employ human-like logical reasoning (System 2), enabling them to excel on the Cognitive Reflection Test (CRT). To assess whether LLMs genuinely possess System 2-like logical reasoning, we introduced targeted modifications to CRT problems. Our findings reveal that, despite the use of CoT prompts, mainstream LLMs, including the latest o1-preview model, continue to exhibit a significant error rate. Further analysis indicates that they predominantly rely on System 1-like intuitive reasoning and pattern matching derived from training data, rather than demonstrating mastery of mathematical thinking. This discovery challenges the prevailing notion that LLMs possess genuine logical reasoning abilities and that CoT can enhance them. Consequently, this work may temper overly optimistic projections regarding LLMs' advancement toward artificial general intelligence.

Updated: 2024-10-23 15:43:28

Categories: cs.AI,cs.CL,cs.LG

Download: http://arxiv.org/abs/2410.14979v2

Credal Learning Theory

Statistical learning theory is the foundation of machine learning, providing theoretical bounds for the risk of models learned from a (single) training set, assumed to be drawn from an unknown probability distribution. In actual deployment, however, the data distribution may (and often does) vary, causing domain adaptation/generalization issues. In this paper we lay the foundations for a 'credal' theory of learning, using convex sets of probabilities (credal sets) to model the variability in the data-generating distribution. Such credal sets, we argue, may be inferred from a finite sample of training sets. Bounds are derived for the case of finite hypothesis spaces (both with and without assuming realizability), as well as for infinite model spaces, which directly generalize classical results.

Updated: 2024-10-23 15:40:23

Categories: cs.LG,cs.AI,stat.ML

Download: http://arxiv.org/abs/2402.00957v4
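
To make "credal set" concrete: when a credal set is the convex hull of finitely many distributions, the expected loss of a fixed hypothesis is linear in the distribution, so its worst-case (upper) and best-case (lower) values over the whole set are attained at extreme points. The sketch below computes this risk range over a finite label space. The vertex distributions and per-label losses are made-up numbers, and the code illustrates the modelling object only, not the paper's generalization bounds.

```python
def expected_loss(dist: list[float], losses: list[float]) -> float:
    """Risk of a fixed hypothesis under one distribution over outcomes."""
    return sum(p * l for p, l in zip(dist, losses))


def credal_risk_range(vertices: list[list[float]], losses: list[float]) -> tuple[float, float]:
    """Lower and upper risk over the convex hull of the given distributions."""
    values = [expected_loss(v, losses) for v in vertices]
    return min(values), max(values)


# Three hypothetical data-generating distributions over three outcomes.
vertices = [[0.5, 0.3, 0.2], [0.7, 0.2, 0.1], [0.4, 0.4, 0.2]]
losses = [0.0, 1.0, 1.0]  # the hypothesis is only correct on outcome 0
lower, upper = credal_risk_range(vertices, losses)
print(lower, upper)  # about 0.3 and 0.6
```

Because any mixture of the vertices has a risk between these two values, a bound on the upper risk holds for every distribution in the credal set.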

metasnf: Meta Clustering with Similarity Network Fusion in R

metasnf is an R package that enables users to apply meta clustering (a method for efficiently searching a broad space of cluster solutions by clustering the solutions themselves) to clustering workflows based on similarity network fusion (SNF). SNF is a multi-modal data integration algorithm commonly used for biomedical subtype discovery. The package also contains functions to assist with cluster visualization, characterization, and validation. This package can help researchers identify SNF-derived cluster solutions that are guided by context-specific utility rather than context-agnostic measures of quality.

Updated: 2024-10-23 15:39:08

Categories: stat.CO,cs.LG

Download: http://arxiv.org/abs/2410.17976v1
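
The core of meta clustering, clustering the cluster solutions themselves, can be sketched in a few lines: compare every pair of solutions with the adjusted Rand index (ARI) and group solutions that are sufficiently similar. metasnf itself is an R package; the Python below is only a language-neutral illustration, the 0.5 threshold and single-linkage grouping are assumptions, and none of the function names correspond to metasnf functions.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(a: list[int], b: list[int]) -> float:
    """Agreement between two partitions of the same items, corrected for chance."""
    n = len(a)
    pairs = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = pairs_a * pairs_b / comb(n, 2)
    max_index = (pairs_a + pairs_b) / 2
    if max_index == expected:  # both partitions are the same trivial partition
        return 1.0
    return (pairs - expected) / (max_index - expected)


def meta_cluster(solutions: list[list[int]], threshold: float = 0.5) -> list[list[int]]:
    """Single-linkage grouping: link solutions whose ARI >= threshold."""
    parent = list(range(len(solutions)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(solutions)):
        for j in range(i + 1, len(solutions)):
            if adjusted_rand_index(solutions[i], solutions[j]) >= threshold:
                parent[find(i)] = find(j)
    groups: dict[int, list[int]] = {}
    for i in range(len(solutions)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


s1 = [0, 0, 1, 1, 2, 2]
s2 = [1, 1, 0, 0, 2, 2]  # same partition as s1 with relabelled clusters
s3 = [0, 1, 0, 1, 0, 1]  # a genuinely different partition
print(meta_cluster([s1, s2, s3]))  # [[0, 1], [2]]
```

Each resulting group is a family of near-equivalent solutions, so a researcher can inspect one representative per group instead of every solution.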

Dynamic Spectrum Access for Ambient Backscatter Communication-assisted D2D Systems with Quantum Reinforcement Learning

Spectrum access is an essential problem in device-to-device (D2D) communications. However, with the recent growth in the number of mobile devices, the wireless spectrum is becoming scarce, resulting in low spectral efficiency for D2D communications. To address this problem, this paper aims to integrate ambient backscatter communication technology into D2D devices, allowing them to backscatter ambient RF signals to transmit their data when the shared spectrum is occupied by mobile users. Deep reinforcement learning (DRL) can be adopted to obtain the optimal spectrum access policy, i.e., whether to stay idle or to access the shared spectrum and either perform active transmissions or backscatter ambient RF signals, so as to maximize the average throughput of D2D users. However, DRL-based solutions may require long training times due to the curse of dimensionality and complex deep neural network architectures. To address this, we develop a novel quantum reinforcement learning (RL) algorithm that can achieve a faster convergence rate with fewer training parameters than DRL, thanks to the principles of quantum superposition and quantum entanglement. Specifically, instead of using conventional deep neural networks, the proposed quantum RL algorithm uses a parametrized quantum circuit to approximate an optimal policy. Extensive simulations demonstrate that the proposed solution not only significantly improves the average throughput of D2D devices when the shared spectrum is busy but also achieves much better convergence rate and learning complexity than existing DRL-based methods.

Updated: 2024-10-23 15:36:43

Categories: cs.NI,cs.AI

Download: http://arxiv.org/abs/2410.17971v1

Optical Generative Models

Generative models cover various application areas, including image, video and music synthesis, natural language processing, and molecular design, among many others. As digital generative models become larger, scalable inference in a fast and energy-efficient manner becomes a challenge. Here, we present optical generative models inspired by diffusion models, where a shallow and fast digital encoder first maps random noise into phase patterns that serve as optical generative seeds for a desired data distribution; a jointly-trained free-space-based reconfigurable decoder all-optically processes these generative seeds to create novel images (never seen before) following the target data distribution. Except for the illumination power and the random seed generation through a shallow encoder, these optical generative models do not consume computing power during the synthesis of novel images. We report the optical generation of monochrome and multi-color novel images of handwritten digits, fashion products, butterflies, and human faces, following the data distributions of MNIST, Fashion MNIST, Butterflies-100, and Celeb-A datasets, respectively, achieving an overall performance comparable to digital neural network-based generative models. To experimentally demonstrate optical generative models, we used visible light to generate, in a snapshot, novel images of handwritten digits and fashion products. These optical generative models might pave the way for energy-efficient, scalable and rapid inference tasks, further exploiting the potentials of optics and photonics for artificial intelligence-generated content.

Updated: 2024-10-23 15:36:08

Categories: cs.NE,cs.LG,physics.app-ph,physics.optics

Download: http://arxiv.org/abs/2410.17970v1

On the potential of Optimal Transport in Geospatial Data Science

Prediction problems in geographic information science and transportation are often motivated by the possibility of enhancing operational efficiency and thereby reducing emissions. Examples range from predicting car sharing demand for relocation planning to forecasting traffic congestion for navigation purposes. However, conventional accuracy metrics ignore the spatial distribution of the errors, despite its relevance for operations. Here, we put forward a spatially aware evaluation metric and loss function based on Optimal Transport (OT). Our framework leverages partial OT and can minimize relocation costs in any spatial prediction problem. We showcase the advantages of OT-based evaluation over conventional metrics and further demonstrate the application of an OT loss function for improving forecasts of bike sharing demand and charging station occupancy. Thus, our framework not only aligns with operational considerations, but also signifies a step forward in refining predictions within geospatial applications. All code is available at https://github.com/mie-lab/geospatialOT.

Updated: 2024-10-23 15:35:57

Categories: cs.LG,cs.CE

Download: http://arxiv.org/abs/2410.11709v2
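
Why an OT-based metric is spatially aware can be seen in one dimension, where the optimal transport cost between two distributions of equal total mass has a closed form: the mass that must cross each gap between neighbouring locations, times the gap length. The sketch assumes stations placed along a single line (a simplification of the 2-D setting) and balanced OT; the paper's framework uses partial OT, which also handles unequal totals. Station positions and demands are hypothetical.

```python
def mae(pred: list[float], obs: list[float]) -> float:
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)


def ot_cost_1d(positions: list[float], pred: list[float], obs: list[float]) -> float:
    """Earth mover's distance between equal-mass distributions on sorted 1-D positions."""
    if abs(sum(pred) - sum(obs)) > 1e-9:
        raise ValueError("balanced OT needs equal total mass")
    cost, cum_pred, cum_obs = 0.0, 0.0, 0.0
    for k in range(len(positions) - 1):
        cum_pred += pred[k]
        cum_obs += obs[k]
        # net mass that must cross the gap between station k and station k + 1
        cost += abs(cum_pred - cum_obs) * (positions[k + 1] - positions[k])
    return cost


positions = [0.0, 1.0, 10.0]  # hypothetical stations along a corridor (km)
observed = [0, 1, 0]          # one bike demanded at the middle station
near_miss = [1, 0, 0]         # predicted at a station 1 km away
far_miss = [0, 0, 1]          # predicted at a station 9 km away
print(mae(near_miss, observed) == mae(far_miss, observed))  # True
print(ot_cost_1d(positions, near_miss, observed))           # 1.0
print(ot_cost_1d(positions, far_miss, observed))            # 9.0
```

MAE scores both predictions identically, while the OT cost reflects how far a vehicle would actually have to be relocated.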

POMDP-Driven Cognitive Massive MIMO Radar: Joint Target Detection-Tracking In Unknown Disturbances

The joint detection and tracking of a moving target embedded in an unknown disturbance is a key feature motivating the development of the cognitive radar paradigm. Building on recent advances in robust target detection with multiple-input multiple-output (MIMO) radars, this work applies a Partially Observable Markov Decision Process (POMDP) framework to enhance tracking and detection in a statistically unknown environment. In the POMDP setup, the radar system is treated as an intelligent agent that continuously senses the surrounding environment and optimizes its actions to maximize the probability of detection ($P_D$) and improve the estimates of target position and velocity, while keeping the probability of false alarm ($P_{FA}$) constant. The proposed approach uses an online algorithm that requires no a priori knowledge of the noise statistics and relies on a much more general observation model than the range-azimuth-elevation model used by conventional tracking algorithms. Simulation results show a substantial performance improvement of the POMDP-based algorithm over the State-Action-Reward-State-Action (SARSA)-based algorithm recently investigated in the context of massive MIMO (MMIMO) radar systems.
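
In a POMDP the agent never observes the target state directly. Instead it keeps a belief, a probability distribution over states, and updates it after each action and observation. Below is a minimal sketch of that discrete Bayes update. It uses a toy one-cell "target present / empty" model, and the action name, transition and observation probabilities are all made up. The paper's radar observation model and action space are far richer.

```python
def belief_update(belief, action, obs, transition, observation):
    """Bayes filter step for a discrete POMDP:

    b'(s') is proportional to O(obs | s', action) * sum_s T(s' | s, action) * b(s)
    """
    states = list(belief)
    unnormalized = {}
    for s_next in states:
        predicted = sum(transition[action][s][s_next] * belief[s] for s in states)
        unnormalized[s_next] = observation[action][s_next][obs] * predicted
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("observation has zero probability under the model")
    return {s: p / total for s, p in unnormalized.items()}


# Hypothetical toy model: one resolution cell that holds a target or is empty.
transition = {"dwell": {"target": {"target": 0.9, "empty": 0.1},
                        "empty": {"target": 0.05, "empty": 0.95}}}
observation = {"dwell": {"target": {"detect": 0.8, "miss": 0.2},
                         "empty": {"detect": 0.1, "miss": 0.9}}}

belief = {"target": 0.5, "empty": 0.5}
belief = belief_update(belief, "dwell", "detect", transition, observation)
print(round(belief["target"], 3))  # 0.879
```

A POMDP policy then chooses the next action (for example, which waveform or beam to use) based on this belief rather than on a single point estimate.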

Updated: 2024-10-23 15:34:11

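Title: POMDP-Driven Cognitive Massive MIMO Radar: Joint Target Detection-Tracking In Unknown Disturbances

Abstract: The joint detection and tracking of a moving target embedded in unknown disturbances is a key feature motivating the development of the cognitive radar paradigm. Building on recent advances in robust target detection with multiple-input multiple-output (MIMO) radars, this paper explores applying a Partially Observable Markov Decision Process (POMDP) framework to enhance tracking and detection in a statistically unknown environment. In the POMDP setup, the radar system is treated as an intelligent agent that continuously senses its surroundings and optimizes its actions to maximize the probability of detection ($P_D$) and improve target position and velocity estimation, while keeping a constant probability of false alarm ($P_{FA}$). The proposed approach uses an online algorithm that requires no a priori knowledge of the noise statistics and relies on a more general observation model than the traditional range-azimuth-elevation model used by conventional tracking algorithms. Simulation results clearly show a significant performance improvement of the POMDP-based algorithm over the State-Action-Reward-State-Action (SARSA)-based algorithm recently studied in the context of massive MIMO (MMIMO) radar systems.

Updated: 2024-10-23 15:34:11

Categories: cs.LG,stat.AP

Download: http://arxiv.org/abs/2410.17967v1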

Quantum Architecture Search with Unsupervised Representation Learning

Unsupervised representation learning presents new opportunities for advancing Quantum Architecture Search (QAS) on Noisy Intermediate-Scale Quantum (NISQ) devices. QAS is designed to optimize quantum circuits for Variational Quantum Algorithms (VQAs). Most QAS algorithms tightly couple the search space and search algorithm, typically requiring the evaluation of numerous quantum circuits, resulting in high computational costs and limiting scalability to larger quantum circuits. Predictor-based QAS algorithms mitigate this issue by estimating circuit performance based on structure or embedding. However, these methods often demand time-intensive labeling to optimize gate parameters across many circuits, which is crucial for training accurate predictors. Inspired by the classical neural architecture search algorithm Arch2vec, we investigate the potential of unsupervised representation learning for QAS without relying on predictors. Our framework decouples unsupervised architecture representation learning from the search process, enabling the learned representations to be applied across various downstream tasks. Additionally, it integrates an improved quantum circuit graph encoding scheme, addressing the limitations of existing representations and enhancing search efficiency. This predictor-free approach removes the need for large labeled datasets. During the search, we employ REINFORCE and Bayesian Optimization to explore the latent representation space and compare their performance against baseline methods. Our results demonstrate that the framework efficiently identifies high-performing quantum circuits with fewer search iterations.
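
Representation learning for QAS needs a graph view of each circuit that a graph autoencoder can embed, in the spirit of Arch2vec. Below is a minimal sketch of one such encoding. Nodes are gates, each with a one-hot gate type and one-hot qubits. An edge i → j means gate j is the next gate on a qubit that gate i used. The gate vocabulary and the encoding itself are our assumptions and are not the paper's improved scheme.

```python
GATES = ["h", "rx", "ry", "rz", "cx"]  # hypothetical gate vocabulary


def circuit_to_graph(circuit, n_qubits):
    """Encode a circuit (list of (gate, qubits)) as node features and a DAG adjacency.

    Features: one-hot gate type + one-hot qubits touched.
    Edge i -> j when gate j is the next gate acting on a qubit used by gate i.
    """
    n = len(circuit)
    adj = [[0] * n for _ in range(n)]
    features, last_on_qubit = [], [None] * n_qubits
    for j, (gate, qubits) in enumerate(circuit):
        type_onehot = [int(gate == g) for g in GATES]
        qubit_onehot = [int(q in qubits) for q in range(n_qubits)]
        features.append(type_onehot + qubit_onehot)
        for q in qubits:
            if last_on_qubit[q] is not None:
                adj[last_on_qubit[q]][j] = 1
            last_on_qubit[q] = j
    return features, adj


toy = [("h", [0]), ("cx", [0, 1]), ("rz", [1]), ("ry", [0])]
feats, adj = circuit_to_graph(toy, n_qubits=2)
```

A (features, adjacency) pair like this is what an unsupervised graph autoencoder would map to a latent vector. REINFORCE or Bayesian Optimization would then search that latent space instead of the raw circuit space.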

Updated: 2024-10-23 15:30:50

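Title: Quantum Architecture Search with Unsupervised Representation Learning

Abstract: Unsupervised representation learning offers new opportunities for advancing Quantum Architecture Search (QAS) on Noisy Intermediate-Scale Quantum (NISQ) devices. QAS aims to optimize quantum circuits for Variational Quantum Algorithms (VQAs). Most QAS algorithms tightly couple the search space and the search algorithm, typically requiring the evaluation of numerous quantum circuits. This leads to high computational costs and limits scalability to larger quantum circuits. Predictor-based QAS algorithms mitigate this issue by estimating circuit performance from structure or embeddings. However, these methods usually require time-consuming labeling to optimize gate parameters across many circuits, which is crucial for training accurate predictors. Inspired by the classical neural architecture search algorithm Arch2vec, we investigate the potential of unsupervised representation learning for QAS without relying on predictors. Our framework decouples unsupervised architecture representation learning from the search process, so that the learned representations can be applied to various downstream tasks. In addition, it integrates an improved quantum circuit graph encoding scheme that addresses the limitations of existing representations and improves search efficiency. This predictor-free approach eliminates the need for large labeled datasets. During the search, we employ REINFORCE and Bayesian Optimization to explore the latent representation space and compare their performance against baseline methods. Our results show that the framework efficiently identifies high-performing quantum circuits with fewer search iterations.

Updated: 2024-10-23 15:30:50

Categories: quant-ph,cs.LG

Download: http://arxiv.org/abs/2401.11576v3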

A Time-Aware Approach to Early Detection of Anorexia: UNSL at eRisk 2024

The eRisk laboratory addresses issues related to early risk detection on the Web. This year's edition proposed three tasks, of which Task 2 concerned the early detection of signs of anorexia. Early risk detection (ERD) is a problem in which precision and speed are two crucial objectives. Our research group tackled Task 2 with two approaches. The first is a CPI+DMC approach that addresses the two objectives independently. The second is a time-aware approach in which precision and speed are combined into a single objective. We implemented the latter approach by explicitly integrating time into the learning process, using the ERDE$_\theta$ metric as the training objective. This also allowed us to incorporate temporal metrics to validate and select the optimal models. We achieved outstanding results on the ERDE$_{50}$ metric and on ranking-based metrics, demonstrating consistency in solving ERD problems.
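
ERDE$_\theta$ is what makes a single time-aware objective possible: a correct positive decision costs more the later it is made. Below is a minimal sketch following the standard eRisk definition (Losada and Crestani), assuming $c_{fn} = c_{tp} = 1$ and $c_{fp}$ set to the proportion of positive users. The helper names and the example run are ours and are not taken from the UNSL system.

```python
import math


def latency_cost(k, theta):
    """lc_theta(k) = 1 - 1 / (1 + e^(k - theta)), which equals sigmoid(k - theta).

    Written in a numerically stable form so large k cannot overflow exp().
    """
    z = k - theta
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def erde(decision, truth, k, theta, c_fp, c_fn=1.0, c_tp=1.0):
    """ERDE cost of one user's decision, taken after reading k writings."""
    if decision and not truth:
        return c_fp                           # false positive
    if not decision and truth:
        return c_fn                           # false negative
    if decision and truth:
        return latency_cost(k, theta) * c_tp  # true positive, penalized for delay
    return 0.0                                # true negative


def mean_erde(results, theta, c_fp):
    """results: list of (decision, truth, k) tuples; lower is better."""
    return sum(erde(d, t, k, theta, c_fp) for d, t, k in results) / len(results)


# Hypothetical run; c_fp = 0.1 stands in for the share of positive users.
results = [(True, True, 5), (True, True, 120), (True, False, 3), (False, False, 200)]
score = mean_erde(results, theta=50, c_fp=0.1)
```

Because `latency_cost` is smooth in k, a model trained against this objective is pushed to raise correct alarms before roughly θ writings have been read.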

Updated: 2024-10-23 15:30:37

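Title: A Time-Aware Approach to Early Detection of Anorexia: UNSL at eRisk 2024

Abstract: The eRisk laboratory aims to address issues related to early risk detection on the Web. This year's edition proposed three tasks, of which Task 2 concerned the early detection of signs of anorexia. Early risk detection (ERD) is a problem that requires both precision and speed. Our research group addressed Task 2 in two ways: a CPI+DMC approach that handles these two key objectives separately, and a time-aware approach that treats precision and speed as a single combined objective. We implemented the latter by explicitly integrating time into the learning process and using the ERDE$_\theta$ metric as the training objective. This also allowed us to incorporate temporal metrics into the validation and selection of the best models. We achieved outstanding results on the ERDE$_{50}$ metric and on ranking-based metrics, demonstrating consistency in solving ERD problems.

Updated: 2024-10-23 15:30:37

Categories: cs.CY,cs.CL,cs.LG

Download: http://arxiv.org/abs/2410.17963v1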

Closed-form merging of parameter-efficient modules for Federated Continual Learning

Model merging has emerged as a crucial technique in deep learning, enabling multiple models to be integrated into a unified system while preserving performance and scalability. In this respect, the compositional properties of low-rank adaptation techniques (e.g., LoRA) have proven beneficial: simply averaging LoRA modules yields a single model that largely integrates the capabilities of all individual modules. Building on LoRA, we go a step further by requiring that the merged model match the responses of all learned modules. Solving this objective in closed form yields an indeterminate system with A and B as unknown variables, which means infinitely many closed-form solutions exist. To address this challenge, we introduce LoRM, an alternating optimization strategy that trains one LoRA matrix at a time. Solving for each unknown variable individually yields a unique solution. We apply the proposed methodology to Federated Class-Incremental Learning (FCIL), ensuring that model responses are aligned both between clients and across tasks. Our method achieves state-of-the-art performance across a range of FCIL scenarios.
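
The indeterminacy is easy to see. For any invertible M, the pair (B M, M⁻¹ A) has the same product B A, so fitting both factors jointly has no unique solution. Fixing one factor turns the problem into ordinary least squares for the other. Below is a minimal pure-Python sketch of that alternating idea under a simplified objective of our own: fit a rank-1 product b aᵀ to a target matrix W in Frobenius norm, where W is the average of two hypothetical clients' updates. This is not the LoRM objective, which matches module responses on data.

```python
def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def outer(b, a):
    return [[bi * aj for aj in a] for bi in b]


def fit_rank1(W, a_init, steps=20):
    """Alternating least squares for W ~ b a^T in Frobenius norm.

    With a fixed, the unique minimizer over b is W a / (a^T a); with b fixed,
    the unique minimizer over a is W^T b / (b^T b). a_init must not be
    orthogonal to the rows of W, otherwise b collapses to zero.
    """
    a = list(a_init)
    for _ in range(steps):
        b = [x / dot(a, a) for x in matvec(W, a)]             # solve "B", "A" frozen
        a = [x / dot(b, b) for x in matvec(transpose(W), b)]  # solve "A", "B" frozen
    return b, a


# Two hypothetical clients' rank-1 updates; their average is the target.
W1 = outer([1.0, 2.0], [1.0, 0.0, 1.0])
W2 = outer([2.0, 4.0], [1.0, 0.0, 1.0])
target = [[(x + y) / 2 for x, y in zip(r1, r2)] for r1, r2 in zip(W1, W2)]
b, a = fit_rank1(target, a_init=[1.0, 1.0, 1.0])
merged = outer(b, a)
```

The scale ambiguity (2b, a/2) still gives the same product, which is the rank-1 case of the (B M, M⁻¹ A) freedom. Each half-step, however, has exactly one solution.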

Updated: 2024-10-23 15:30:13

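Title: Closed-form merging of parameter-efficient modules for Federated Continual Learning

Abstract: Model merging has become a key technique in deep learning, allowing multiple models to be integrated into a unified system while preserving performance and scalability. In this respect, the compositional properties of low-rank adaptation techniques (e.g., LoRA) have proven beneficial, since simply averaging LoRA modules yields a single model that integrates the capabilities of all individual modules. Building on LoRA, we go a step further by requiring the merged model to match the responses of all learned modules. Solving this objective in closed form yields an indeterminate system with A and B as unknown variables, indicating that infinitely many closed-form solutions exist. To address this challenge, we introduce LoRM, an alternating optimization strategy that trains one LoRA matrix at a time. This allows each unknown variable to be solved for separately, yielding a unique solution. We apply the proposed methodology to Federated Class-Incremental Learning (FCIL), ensuring that model responses are aligned between clients and across tasks. Our method achieves state-of-the-art performance across a range of FCIL scenarios.

Updated: 2024-10-23 15:30:13

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2410.17961v1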

MOTIVE: A Drug-Target Interaction Graph For Inductive Link Prediction

Drug-target interaction (DTI) prediction is crucial for identifying new therapeutics and detecting mechanisms of action. While structure-based methods accurately model physical interactions between a drug and its protein target, cell-based assays such as Cell Painting can better capture complex drug-target interactions. This paper introduces MOTIVE, a Morphological cOmpound Target Interaction Graph dataset comprising Cell Painting features for 11,000 genes and 3,600 compounds, along with their relationships extracted from seven publicly available databases. We provide random, cold-source (new drugs), and cold-target (new genes) data splits to enable rigorous evaluation under realistic use cases. Our benchmark results show that graph neural networks that use Cell Painting features consistently outperform those that learn from graph structure alone, feature-based models, and topological heuristics. MOTIVE accelerates both graph ML research and drug discovery by promoting the development of more reliable DTI prediction models. MOTIVE resources are available at https://github.com/carpenter-singh-lab/motive.
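
A cold-source split guarantees that every test drug is unseen during training, so a model cannot score well by memorizing drug identities. A cold-target split does the same for genes. Below is a minimal sketch of such a split over (compound, gene) edges, using a hypothetical toy edge list and a fixed seed. For actual benchmarking, the splits released with MOTIVE should be used instead.

```python
import random


def cold_split(edges, side, test_frac=0.2, seed=0):
    """Split (compound, gene) edges so that held-out nodes on `side` never appear in training.

    side=0 -> cold-source (new compounds); side=1 -> cold-target (new genes).
    """
    nodes = sorted({edge[side] for edge in edges})
    rng = random.Random(seed)
    rng.shuffle(nodes)
    n_test = max(1, int(len(nodes) * test_frac))
    test_nodes = set(nodes[:n_test])
    train = [e for e in edges if e[side] not in test_nodes]
    test = [e for e in edges if e[side] in test_nodes]
    return train, test


# Hypothetical toy interactions.
edges = [("cmpd_a", "GENE1"), ("cmpd_a", "GENE2"), ("cmpd_b", "GENE2"),
         ("cmpd_c", "GENE3"), ("cmpd_d", "GENE1"), ("cmpd_e", "GENE4")]
train, test = cold_split(edges, side=0)             # cold-source
train_t, test_t = cold_split(edges, side=1)         # cold-target
```

A random edge split, by contrast, would let the same compound appear on both sides and overstate performance on genuinely new drugs.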

Updated: 2024-10-23 15:29:28

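Title: MOTIVE: A Drug-Target Interaction Graph For Inductive Link Prediction

Abstract: Drug-target interaction (DTI) prediction is crucial for discovering new therapeutics and detecting mechanisms of action. While structure-based methods accurately model the physical interactions between a drug and its protein target, cell-based assays such as Cell Painting can better capture complex drug-target interactions. This paper introduces MOTIVE, a Morphological cOmpound Target Interaction Graph dataset comprising Cell Painting features for 11,000 genes and 3,600 compounds, together with their relationships extracted from seven publicly available databases. We provide random, cold-source (new drugs), and cold-target (new genes) data splits to enable rigorous evaluation under realistic use cases. Our benchmark results show that graph neural networks using Cell Painting features consistently outperform models that learn from graph structure alone, feature-based models, and topological heuristics. MOTIVE accelerates both graph machine learning research and drug discovery by promoting the development of more reliable DTI prediction models. MOTIVE resources are available at https://github.com/carpenter-singh-lab/motive.

Updated: 2024-10-23 15:29:28

Categories: cs.LG

Download: http://arxiv.org/abs/2406.08649v2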

Bounded KRnet and its applications to density estimation and approximation

In this paper, we develop an invertible mapping on a bounded domain, called B-KRnet, and apply it to density estimation/approximation for data or for the solutions of PDEs such as the Fokker-Planck equation and the Keller-Segel equation. Like KRnet, B-KRnet incorporates the pseudo-triangular structure into a normalizing flow model. The main difference between them is that B-KRnet is defined on a hypercube, whereas KRnet is defined on the whole space; B-KRnet therefore introduces a new mechanism to maintain exact invertibility. Using B-KRnet as a transport map, we obtain an explicit probability density function (PDF) model that corresponds to the pushforward of a prior (uniform) distribution on the hypercube. It can be applied directly to density estimation when only data are available. By coupling KRnet and B-KRnet, we define a deep generative model on a high-dimensional domain where some dimensions are bounded and others are unbounded. A typical case is the solution of the stationary kinetic Fokker-Planck equation, which is a PDF of position and momentum. Based on B-KRnet, we develop an adaptive learning approach to approximate partial differential equations whose solutions are PDFs or can be treated as PDFs. A variety of numerical experiments are presented to demonstrate the effectiveness of B-KRnet.
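
A flow on a bounded domain rests on a bijection of the hypercube onto itself with a tractable Jacobian. The change-of-variables formula then turns a uniform prior into an explicit density. Below is a one-dimensional sketch using a hypothetical monotone quadratic map y = x + c·x(1 − x) with |c| < 1, which is not the B-KRnet layer. This map sends [0, 1] onto [0, 1], has a closed-form inverse, and its pushforward density is 1 / f′(f⁻¹(y)).

```python
import math
import random


def forward(x, c):
    """Monotone bijection of [0, 1] onto itself, valid for |c| < 1."""
    return x + c * x * (1.0 - x)


def inverse(y, c):
    """Closed-form inverse: rationalized root of c x^2 - (1 + c) x + y = 0.

    The rationalized form stays accurate as c -> 0 (where it reduces to x = y).
    """
    disc = (1.0 + c) ** 2 - 4.0 * c * y
    return 2.0 * y / ((1.0 + c) + math.sqrt(disc))


def density(y, c):
    """Pushforward of Uniform[0, 1] through `forward`: p(y) = 1 / f'(f^{-1}(y))."""
    x = inverse(y, c)
    return 1.0 / (1.0 + c * (1.0 - 2.0 * x))


c = 0.6
rng = random.Random(0)
samples = [forward(rng.random(), c) for _ in range(5)]     # sampling: push prior draws forward
n = 10_000
mass = sum(density((i + 0.5) / n, c) for i in range(n)) / n  # midpoint rule, should be ~1
```

Density evaluation uses the inverse direction and sampling uses the forward direction. Both are exact here, which is the property a bounded-domain flow has to preserve layer by layer.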

Updated: 2024-10-23 15:28:59

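Title: Bounded KRnet and its applications to density estimation and approximation

Abstract: In this paper, we develop an invertible mapping on a bounded domain, called B-KRnet, and apply it to density estimation or approximation for data or for the solutions of PDEs, such as the Fokker-Planck equation and the Keller-Segel equation. Like KRnet, B-KRnet incorporates the pseudo-triangular structure into a normalizing flow model. The main difference between B-KRnet and KRnet is that B-KRnet is defined on a hypercube while KRnet is defined on the whole space; in other words, B-KRnet introduces a new mechanism to maintain exact invertibility. Using B-KRnet as a transport map, we obtain an explicit probability density function (PDF) model corresponding to the pushforward of a prior (uniform) distribution on the hypercube. It can be applied directly to density estimation when only data are available. By coupling KRnet and B-KRnet, we define a deep generative model on a high-dimensional domain in which some dimensions are bounded and others are unbounded. A typical case is the solution of the stationary kinetic Fokker-Planck equation, which is a PDF of position and momentum. Based on B-KRnet, we develop an adaptive learning approach to approximate partial differential equations whose solutions are PDFs or can be treated as PDFs. A variety of numerical experiments are presented to demonstrate the effectiveness of B-KRnet.

Updated: 2024-10-23 15:28:59

Categories: cs.LG

Download: http://arxiv.org/abs/2305.09063v3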

StockGPT: A GenAI Model for Stock Prediction and Trading

This paper introduces StockGPT, an autoregressive "number" model trained and tested on 70 million daily U.S. stock returns spanning nearly 100 years. Treating each return series as a sequence of tokens, StockGPT automatically learns the hidden patterns predictive of future returns via its attention mechanism. On a held-out test sample from 2001 to 2023, daily and monthly rebalanced long-short portfolios formed from StockGPT predictions yield strong performance. The StockGPT-based portfolios span momentum and long-/short-term reversals, eliminating the need for manually crafted price-based strategies. They also yield highly significant alphas against leading stock market factors, suggesting a novel AI pricing effect. This highlights the immense promise of generative AI in surpassing humans at making complex financial investment decisions.
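
Treating returns as tokens requires discretizing a continuous return into a finite vocabulary. Below is a minimal sketch that assumes hypothetical 50-basis-point bins with returns clipped to ±100%. The paper's exact binning scheme may differ.

```python
import math

BIN_BPS = 50                  # hypothetical bin width in basis points
MAX_BPS = 10_000              # hypothetical clip at +/-100%
OFFSET = MAX_BPS // BIN_BPS   # token index of a 0% return
VOCAB_SIZE = 2 * OFFSET + 1   # 401 tokens under these assumptions


def encode(ret):
    """Map a simple daily return (0.0123 = +1.23%) to an integer token."""
    bps = max(-MAX_BPS, min(MAX_BPS, ret * 10_000))
    return math.floor(bps / BIN_BPS + 0.5) + OFFSET  # round half up to the nearest bin


def decode(token):
    """Return the bin-centre return that a token stands for."""
    return (token - OFFSET) * BIN_BPS / 10_000


daily_returns = [0.0123, -0.004, 0.0, 0.031, -0.5, 2.5]
tokens = [encode(r) for r in daily_returns]  # one "sentence" for the language model
```

Once returns are integers in a fixed vocabulary, an ordinary autoregressive transformer can be trained with next-token cross-entropy. Each prediction is then a distribution over return bins.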

Updated: 2024-10-23 15:28:45

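Title: StockGPT: A GenAI Model for Stock Prediction and Trading

Abstract: This paper introduces StockGPT, an autoregressive "number" model trained and tested on 70 million daily U.S. stock returns spanning nearly 100 years. Treating each return series as a sequence of tokens, StockGPT automatically learns hidden patterns that predict future returns through its attention mechanism. On a held-out test sample from 2001 to 2023, daily and monthly rebalanced long-short portfolios formed from StockGPT predictions perform strongly. The StockGPT-based portfolios span momentum and long-/short-term reversals, eliminating the need for manually crafted price-based strategies. They also produce highly significant alphas against leading stock market factors, indicating a novel AI pricing effect. This highlights the great potential of generative AI to surpass humans in making complex financial investment decisions.

Updated: 2024-10-23 15:28:45

Categories: q-fin.CP,cs.AI,q-fin.PM,q-fin.PR,q-fin.ST

Download: http://arxiv.org/abs/2404.05101v3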

Linear Adversarial Concept Erasure

Modern neural models trained on textual data rely on pre-trained representations that emerge without direct supervision. As these representations are increasingly used in real-world applications, the inability to control their content becomes an increasingly important problem. We formulate the problem of identifying and erasing a linear subspace that corresponds to a given concept, in order to prevent linear predictors from recovering the concept. We model this problem as a constrained, linear maximin game and show that existing solutions are generally not optimal for this task. We derive a closed-form solution for certain objectives and propose a convex relaxation, R-LACE, that works well for others. When evaluated in the context of binary gender removal, the method recovers a low-dimensional subspace whose removal mitigates bias according to both intrinsic and extrinsic evaluation. We show that the method is highly expressive, effectively mitigating bias in deep nonlinear classifiers while maintaining tractability and interpretability.
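
The basic operation behind linear concept erasure is an orthogonal projection that removes a subspace from every representation, so no linear predictor can read the removed direction. Below is a minimal sketch of rank-1 erasure. It assumes the concept direction u is already known, and here u is a hypothetical toy vector. The paper's contribution is finding that subspace adversarially; this sketch does not attempt that step.

```python
def erase_direction(x, u):
    """Project x onto the orthogonal complement of direction u: x - (x.u / u.u) u."""
    norm_sq = sum(ui * ui for ui in u)
    coef = sum(xi * ui for xi, ui in zip(x, u)) / norm_sq
    return [xi - coef * ui for xi, ui in zip(x, u)]


u = [1.0, 1.0, 0.0]                          # hypothetical concept direction
reps = [[2.0, 0.0, 1.0], [-1.0, 3.0, 0.5]]   # toy representations
cleaned = [erase_direction(x, u) for x in reps]
# A linear probe w = alpha * u now scores every cleaned vector as 0.
```

Because the projection is idempotent and leaves everything orthogonal to u untouched, it removes only the concept subspace. This is why the approach stays interpretable.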

Updated: 2024-10-23 15:28:38

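Title: Linear Adversarial Concept Erasure

Abstract: Modern neural models trained on textual data rely on pre-trained representations that emerge without direct supervision. As these representations are increasingly used in real-world applications, the inability to control their content becomes an increasingly important problem. We formulate the problem of identifying and erasing a linear subspace corresponding to a given concept, in order to prevent linear predictors from recovering that concept. We model this problem as a constrained, linear maximin game and show that existing solutions are generally not suited to this task. We derive closed-form solutions for some objectives and propose a convex relaxation, R-LACE, that works well for others. When evaluated in the context of binary gender removal, the method recovers a low-dimensional subspace whose removal mitigates bias under both intrinsic and extrinsic evaluation. We show that the method is highly expressive, effectively mitigating bias in deep nonlinear classifiers while maintaining tractability and interpretability.

Updated: 2024-10-23 15:28:38

Categories: cs.LG,cs.CL

Download: http://arxiv.org/abs/2201.12091v5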

Medical Imaging Complexity and its Effects on GAN Performance

The proliferation of machine learning models in diverse clinical applications has led to a growing need for high-fidelity medical image training data. Such data are often scarce due to cost constraints and privacy concerns. To alleviate this burden, medical image synthesis via generative adversarial networks (GANs) has emerged as a powerful method for generating photo-realistic synthetic images from existing sets of real medical images. However, the exact image set size required to train such a GAN efficiently is unclear. In this work, we experimentally establish benchmarks that measure the relationship between sample dataset size and the fidelity of the generated images, given the dataset's distribution of image complexities. We analyze statistical metrics based on delentropy, an image complexity measure rooted in Shannon's entropy from information theory. For our pipeline, we conduct experiments with two state-of-the-art GANs, StyleGAN 3 and SPADE-GAN, trained on multiple medical imaging datasets with variable sample sizes. Across both GANs, general performance improved with increasing training set size but degraded with increasing complexity.
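
Delentropy is computed from the joint distribution of image gradients (the "deldensity"), not from pixel intensities. Below is a minimal sketch assuming forward-difference gradients, exact integer gradient bins, and the ½ factor from Larkin's definition. The abstract does not specify the paper's own preprocessing or binning.

```python
import math
from collections import Counter


def delentropy(img):
    """img: 2D list of grey levels. Returns delentropy in bits.

    Builds the joint histogram of (d/dx, d/dy) forward differences and
    returns -1/2 * sum p log2 p over that histogram.
    """
    h, w = len(img), len(img[0])
    pairs = Counter(
        (img[r][c + 1] - img[r][c], img[r + 1][c] - img[r][c])
        for r in range(h - 1)
        for c in range(w - 1)
    )
    n = sum(pairs.values())
    return -0.5 * sum((k / n) * math.log2(k / n) for k in pairs.values())


flat = [[7] * 4 for _ in range(4)]
ramp = [list(range(4)) for _ in range(4)]                       # varying intensities
checker = [[(r + c) % 2 * 255 for c in range(4)] for r in range(4)]
scores = {name: delentropy(im) for name, im in
          [("flat", flat), ("ramp", ramp), ("checker", checker)]}
```

The ramp shows how delentropy differs from intensity entropy. Its pixel values vary, yet every gradient is identical, so its delentropy is zero. Only the checkerboard, whose gradients take several values, scores above zero.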

Updated: 2024-10-23 15:28:25

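Title: Medical Imaging Complexity and its Effects on GAN Performance

Abstract: The proliferation of machine learning models in diverse clinical applications has created a growing need for high-fidelity medical image training data. Such data are often scarce due to cost constraints and privacy concerns. Medical image synthesis via generative adversarial networks (GANs) has emerged as a powerful method for synthetically generating realistic images from existing sets of real medical images. However, the exact image set size required to train such a GAN effectively remains unclear. In this work, we experimentally establish benchmarks measuring the relationship between sample dataset size and the fidelity of the generated images, taking into account the distribution of image complexity in the dataset. We analyze statistical metrics based on delentropy, an image complexity measure rooted in Shannon's entropy from information theory. For our pipeline, we run experiments with two state-of-the-art GANs, StyleGAN 3 and SPADE-GAN, on multiple medical imaging datasets with variable sample sizes. For both GANs, overall performance improved as the training set size increased but degraded as complexity increased.

Updated: 2024-10-23 15:28:25

Categories: eess.IV,cs.CV,cs.LG

Download: http://arxiv.org/abs/2410.17959v1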

MCUBERT: Memory-Efficient BERT Inference on Commodity Microcontrollers

In this paper, we propose MCUBERT to enable language models like BERT on tiny microcontroller units (MCUs) through network and scheduling co-optimization. We observe that the embedding table is the major storage bottleneck for tiny BERT models. Hence, at the network level, we propose an MCU-aware two-stage neural architecture search algorithm based on clustered low-rank approximation for embedding compression. To reduce inference memory requirements, we further propose a novel fine-grained, MCU-friendly scheduling strategy. Through careful computation tiling and re-ordering, as well as kernel design, we drastically increase the input sequence lengths supported on MCUs without any latency or accuracy penalty. MCUBERT reduces the parameter size of BERT-tiny and BERT-mini by 5.7× and 3.0×, and the execution memory by 3.5× and 4.3×, respectively. MCUBERT also achieves a 1.5× latency reduction. For the first time, MCUBERT enables lightweight BERT models to run on commodity MCUs and to process more than 512 tokens with less than 256KB of memory.
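
The reason the embedding table dominates storage, and how low-rank factorization helps, comes down to parameter arithmetic. Below is a sketch assuming BERT-tiny's 30,522-token WordPiece vocabulary and 128-dimensional hidden size. It also uses a hypothetical rank r = 32 and a simplified "clustered" variant in which each of k = 4 token clusters gets its own r × d projection. The abstract does not give the configuration MCUBERT's search actually selects.

```python
def full_embedding_params(vocab, dim):
    """Dense embedding table: one dim-sized vector per token."""
    return vocab * dim


def low_rank_embedding_params(vocab, dim, rank, clusters=1):
    """Low-rank table: a rank-sized code per token plus one rank x dim
    projection per token cluster (clusters=1 is plain low-rank factorization)."""
    return vocab * rank + clusters * rank * dim


VOCAB, HIDDEN = 30_522, 128  # BERT-tiny vocabulary and hidden size
full = full_embedding_params(VOCAB, HIDDEN)
compressed = low_rank_embedding_params(VOCAB, HIDDEN, rank=32, clusters=4)  # hypothetical r, k
print(full, compressed)  # 3906816 993088
```

Even at one byte per parameter, the dense table alone is about 3.9 MB, roughly 15 times a 256KB budget. Clustering adds only k·r·d parameters, because the per-token codes (vocab × r) dominate the compressed size.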

Updated: 2024-10-23 15:27:37

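Title: MCUBERT: Memory-Efficient BERT Inference on Commodity Microcontrollers

Abstract: In this paper, we propose MCUBERT, which enables BERT-like language models to run on tiny microcontroller units (MCUs) through network and scheduling co-optimization. We observe that the embedding table is the main storage bottleneck for tiny BERT models. Therefore, at the network level, we propose an MCU-aware two-stage neural architecture search algorithm based on clustered low-rank approximation for embedding compression. To reduce inference memory requirements, we further propose a novel fine-grained, MCU-friendly scheduling strategy. Through careful computation tiling and re-ordering, as well as kernel design, we significantly increase the input sequence length supported on MCUs without any latency or accuracy loss. MCUBERT reduces the parameter size of BERT-tiny and BERT-mini by 5.7× and 3.0×, respectively, and their execution memory by 3.5× and 4.3×. MCUBERT also achieves a 1.5× latency reduction. For the first time, MCUBERT enables lightweight BERT models to run on commodity MCUs and to process more than 512 tokens with less than 256KB of memory.

Updated: 2024-10-23 15:27:37

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2410.17957v1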

ExpertFlow: Optimized Expert Activation and Token Allocation for Efficient Mixture-of-Experts Inference

Sparse Mixture of Experts (MoE) models outperform dense Large Language Models (LLMs), but their high memory demands make them hard to deploy for inference. Existing offloading techniques, which swap activated and idle experts between the GPU and CPU, often suffer from rigid expert caching mechanisms. These mechanisms either fail to adapt to dynamic routing, leading to inefficient cache utilization, or incur prohibitive costs for training the predictor. To tackle these inference-specific challenges, we introduce ExpertFlow, a comprehensive system designed to enhance inference efficiency by accommodating flexible routing and enabling efficient expert scheduling between CPU and GPU. This reduces overhead and boosts system performance. Central to our approach is a predictive routing path-based offloading mechanism that uses a lightweight predictor to forecast routing paths accurately before computation begins. This proactive strategy allows real-time error correction in expert caching, significantly increasing cache hit ratios and reducing the frequency of expert transfers, thereby minimizing I/O overhead. Additionally, we implement a dynamic token scheduling strategy that optimizes MoE inference by rearranging input tokens across different batches. This method not only reduces the number of activated experts per batch but also improves computational efficiency. Our extensive experiments show that ExpertFlow achieves up to 93.72% GPU memory savings and speeds up inference by 2 to 10 times compared to baseline methods, highlighting its effectiveness as a robust solution for resource-constrained inference scenarios.
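
The intuition behind rearranging tokens across batches is simple. If tokens routed to the same expert are batched together, each batch activates, and must keep resident, fewer distinct experts. Below is a minimal sketch assuming top-1 routing with already-known expert IDs and a simple sort-then-chunk heuristic. ExpertFlow's actual scheduler and its routing-path predictor are more involved.

```python
def schedule(expert_ids, batch_size, regroup):
    """Return batches of token indices; optionally regroup tokens by routed expert."""
    order = list(range(len(expert_ids)))
    if regroup:
        # stable sort: tokens routed to the same expert keep their arrival order
        order.sort(key=lambda t: expert_ids[t])
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]


def experts_per_batch(batches, expert_ids):
    """Distinct experts each batch needs resident in GPU memory."""
    return [len({expert_ids[t] for t in batch}) for batch in batches]


routing = [3, 0, 3, 1, 0, 2, 3, 1, 2, 0, 1, 2]  # toy top-1 expert choice per token
naive = experts_per_batch(schedule(routing, 4, regroup=False), routing)
grouped = experts_per_batch(schedule(routing, 4, regroup=True), routing)
print(naive, grouped)  # [3, 4, 3] [2, 2, 2]
```

In this toy case the total number of expert activations drops from 10 to 6. A real scheduler must also bound how far any one request's tokens are delayed, a trade-off this sketch ignores.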

Updated: 2024-10-23 15:24:54

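Title: ExpertFlow: Optimized Expert Activation and Token Allocation for Efficient Mixture-of-Experts Inference

Abstract: Sparse Mixture of Experts (MoE) models outperform dense Large Language Models (LLMs), but their high memory demands pose major deployment challenges during inference. Existing offloading techniques swap activated and idle experts between the GPU and CPU and are often hampered by rigid expert caching mechanisms. These mechanisms cannot adapt to dynamic routing, which leads to inefficient cache utilization, or they incur high costs for predictor training. To address these inference-specific challenges, we introduce ExpertFlow, a comprehensive system designed to improve inference efficiency by accommodating flexible routing and enabling efficient expert scheduling between CPU and GPU. This reduces overhead and improves system performance. At the core of our approach is a predictive routing path-based offloading mechanism that uses a lightweight predictor to accurately forecast routing paths before computation begins. This proactive strategy enables real-time error correction in expert caching, significantly increasing cache hit ratios and reducing the frequency of expert transfers, thereby minimizing I/O overhead. In addition, we implement a dynamic token scheduling strategy that optimizes MoE inference by rearranging input tokens across different batches. This not only reduces the number of activated experts per batch but also improves computational efficiency. Our extensive experiments show that ExpertFlow achieves up to 93.72% GPU memory savings and speeds up inference by 2 to 10 times compared to baseline methods, highlighting its effectiveness and practicality as a robust solution for resource-constrained inference scenarios.

Updated: 2024-10-23 15:24:54

Categories: cs.AI,cs.CL

Download: http://arxiv.org/abs/2410.17954v1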

SimRAG: Self-Improving Retrieval-Augmented Generation for Adapting Large Language Models to Specialized Domains

Retrieval-augmented generation (RAG) enhances the question-answering (QA) abilities of large language models (LLMs) by integrating external knowledge. However, adapting general-purpose RAG systems to specialized fields such as science and medicine poses unique challenges due to distribution shifts and limited access to domain-specific data. To tackle this, we propose SimRAG, a self-training approach that equips the LLM with joint capabilities of question answering and question generation for domain adaptation. Our method first fine-tunes the LLM on instruction-following, question-answering, and search-related data. It then prompts the same LLM to generate diverse domain-relevant questions from unlabeled corpora, with an additional filtering strategy to retain high-quality synthetic examples. By leveraging these synthetic examples, the LLM can improve its performance on domain-specific RAG tasks. Experiments on 11 datasets, spanning two backbone sizes and three domains, demonstrate that SimRAG outperforms baselines by 1.2% to 8.6%.

Updated: 2024-10-23 15:24:16

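Title: SimRAG: Self-Improving Retrieval-Augmented Generation for Adapting Large Language Models to Specialized Domains

Abstract: Retrieval-augmented generation (RAG) enhances the question-answering abilities of large language models (LLMs) by integrating external knowledge. However, adapting general-purpose RAG systems to specialized fields such as science and medicine poses unique challenges due to distribution shifts and limited domain-specific data. To address this, we propose SimRAG, a self-training approach that equips the LLM with joint question-answering and question-generation capabilities for domain adaptation. Our method first fine-tunes the LLM on instruction-following, question-answering, and search-related data. It then prompts the same LLM to generate diverse domain-relevant questions from unlabeled corpora, applying an additional filtering strategy to retain high-quality synthetic examples. By leveraging these synthetic examples, the LLM can improve its performance on domain-specific RAG tasks. Experiments on 11 datasets spanning two backbone sizes and three domains show that SimRAG outperforms baselines by 1.2% to 8.6%.

Updated: 2024-10-23 15:24:16

Categories: cs.CL,cs.AI,cs.IR,cs.LG

Download: http://arxiv.org/abs/2410.17952v1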

Benchmarking Floworks against OpenAI & Anthropic: A Novel Framework for Enhanced LLM Function Calling

Large Language Models (LLMs) have shown remarkable capabilities in various domains, yet their economic impact has been limited by challenges in tool use and function calling. This paper introduces ThorV2, a novel architecture that significantly enhances LLMs' function calling abilities. We develop a comprehensive benchmark focused on HubSpot CRM operations to evaluate ThorV2 against leading models from OpenAI and Anthropic. Our results demonstrate that ThorV2 outperforms existing models in accuracy, reliability, latency, and cost efficiency for both single and multi-API calling tasks. We also show that ThorV2 is far more reliable and scales better to multistep tasks compared to traditional models. Our work offers the tantalizing possibility of more accurate function-calling compared to today's best-performing models using significantly smaller LLMs. These advancements have significant implications for the development of more capable AI assistants and the broader application of LLMs in real-world scenarios.

Updated: 2024-10-23 15:23:23

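Title: Benchmarking Floworks against OpenAI & Anthropic: A Novel Framework for Enhanced LLM Function Calling

Abstract: Large Language Models (LLMs) have shown remarkable capabilities across many domains, but their economic impact has been limited by challenges in tool use and function calling. This paper introduces ThorV2, a novel architecture that significantly enhances the function-calling abilities of LLMs. We develop a comprehensive benchmark focused on HubSpot CRM operations to evaluate ThorV2 against leading models from OpenAI and Anthropic. Our results show that ThorV2 outperforms existing models in accuracy, reliability, latency, and cost efficiency on both single- and multi-API calling tasks. We also show that ThorV2 is more reliable and scales better to multistep tasks than traditional models. Our work offers the tantalizing possibility of more accurate function calling than today's best-performing models while using smaller LLMs. These advances have significant implications for developing more capable AI assistants and for the broader application of LLMs in real-world scenarios.

Updated: 2024-10-23 15:23:23

Categories: cs.AI

Download: http://arxiv.org/abs/2410.17950v1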

Generalized Resubstitution for Regression Error Estimation

We propose generalized resubstitution error estimators for regression, a broad family of estimators, each corresponding to a choice of empirical probability measure and loss function. The usual sum-of-squares criterion is a special case corresponding to the standard empirical probability measure and the quadratic loss. Other choices of empirical probability measure lead to more general estimators with superior bias and variance properties. We prove that these error estimators are consistent under broad assumptions. In addition, we propose and investigate procedures for choosing the empirical measure based on the method of moments and maximum pseudo-likelihood. Detailed experiments with polynomial regression demonstrate empirically the superior finite-sample bias and variance properties of the proposed estimators. The R code for the experiments is provided.
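
The family is easiest to see written as an expected loss on the training data under a chosen empirical measure. Uniform weights 1/n combined with squared loss recover the usual mean squared residual. Below is a minimal sketch that uses a weighted discrete measure and a swappable loss purely for illustration. It does not reproduce the specific measures studied in the paper or its R code.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y ~ a + b x; returns the fitted function."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return lambda x: (my - slope * mx) + slope * x


def resubstitution(model, xs, ys, weights=None, loss=lambda y, yhat: (y - yhat) ** 2):
    """Expected loss of `model` on its own training data under a discrete measure.

    weights=None means the standard empirical measure (mass 1/n on each point).
    """
    n = len(xs)
    w = weights if weights is not None else [1.0 / n] * n
    if abs(sum(w) - 1.0) > 1e-9:
        raise ValueError("weights must form a probability measure")
    return sum(wi * loss(y, model(x)) for wi, x, y in zip(w, xs, ys))


xs, ys = [0.0, 1.0, 2.0, 3.0], [0.1, 0.9, 2.2, 2.8]   # toy data
m = fit_line(xs, ys)
standard = resubstitution(m, xs, ys)                                   # mean squared residual
reweighted = resubstitution(m, xs, ys, weights=[0.4, 0.1, 0.1, 0.4])   # hypothetical measure
absolute = resubstitution(m, xs, ys, loss=lambda y, yhat: abs(y - yhat))
```

Plain resubstitution is optimistically biased because the model was fitted to these same points. Changing the measure is one lever for correcting that bias.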

Updated: 2024-10-23 15:22:21

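Title: Generalized Resubstitution for Regression Error Estimation

Abstract: We propose generalized resubstitution error estimators for regression, a broad family of estimators, each corresponding to a choice of empirical probability measure and loss function. The usual sum-of-squares criterion is a special case corresponding to the standard empirical probability measure and the quadratic loss. Other choices of empirical probability measure lead to more general estimators with better bias and variance properties. We prove that these error estimators are consistent under broad assumptions. In addition, we propose and investigate procedures for choosing the empirical measure based on the method of moments and maximum pseudo-likelihood. Detailed experimental results with polynomial regression empirically demonstrate the superior finite-sample bias and variance properties of the proposed estimators. The R code for the experiments is provided.

Updated: 2024-10-23 15:22:21

Categories: cs.LG

Download: http://arxiv.org/abs/2410.17948v1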

Theoretically Grounded Pruning of Large Ground Sets for Constrained, Discrete Optimization

Modern instances of combinatorial optimization problems often exhibit billion-scale ground sets, which have many uninformative or redundant elements. In this work, we develop light-weight pruning algorithms to quickly discard elements that are unlikely to be part of an optimal solution. Under mild assumptions on the instance, we prove theoretical guarantees on the fraction of the optimal value retained and the size of the resulting pruned ground set. Through extensive experiments on real-world datasets for various applications, we demonstrate that our algorithm, QuickPrune, efficiently prunes over 90% of the ground set and outperforms state-of-the-art classical and machine learning heuristics for pruning.

Updated: 2024-10-23 15:18:07

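Title: Theoretically Grounded Pruning of Large Ground Sets for Constrained, Discrete Optimization

Abstract: Modern instances of combinatorial optimization problems often involve billion-scale ground sets containing many uninformative or redundant elements. In this work, we develop lightweight pruning algorithms that quickly discard elements unlikely to be part of an optimal solution. Under mild assumptions on the instance, we prove theoretical guarantees on the fraction of the optimal value retained and on the size of the pruned ground set. Through extensive experiments on real-world datasets for various applications, we show that our algorithm, QuickPrune, efficiently prunes over 90% of the ground set and outperforms state-of-the-art classical and machine learning heuristics for pruning.

Updated: 2024-10-23 15:18:07

Categories: cs.DS,cs.LG

Download: http://arxiv.org/abs/2410.17945v1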

Optimizing Travel Itineraries with AI Algorithms in a Microservices Architecture: Balancing Cost, Time, Preferences, and Sustainability

This research investigates how implementing AI algorithms in a microservices architecture can improve travel itineraries with respect to cost, time, user preferences, and environmental sustainability. It uses machine learning models for cost forecasting and personalization, a genetic algorithm for itinerary optimization, and heuristics for sustainability checking. The primary evaluation parameters are latency, ability to satisfy user preferences, cost, and environmental impact. The experimental results show an average response time of 4.5 seconds with 1000 concurrent users and 92% accuracy on user preferences. Cost efficiency is demonstrated, with 95% of the proposed trips falling within the budget declared by the user. The system also implements measures to alleviate the negative externalities of travel. Of the offered travel plans, 60% incorporated green options, resulting in carbon emissions 15% lower on average than those of traditional travel plans. The genetic algorithm, with time complexity O(g·p·f), provides the optimal solution within 100 generations. Each iteration improves solution quality by 5%, which enables effective use in optimization problems where time is measured in seconds. Finally, the system is designed to be fault-tolerant, with 99.9% functional availability, so services can still be provided even when requirements are exceeded. This microservices-based architecture makes the travel optimization platform dynamic and efficient, providing enhanced scaling, asynchronous communication, and real-time changes. By incorporating AI, cost control, and eco-friendly approaches, the system addresses the diverse needs of users in today's travel business.
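
The O(g·p·f) bound makes sense if g is the number of generations, p the population size, and f the cost of one fitness evaluation. This is our reading, since the abstract does not define the symbols: a generational GA scores every individual once per generation. Below is a minimal sketch with a hypothetical itinerary encoding (one transport option per leg, each with made-up cost, hours, and CO2) and hypothetical weights. A counter confirms the (g + 1)·p fitness calls, the extra p coming from the final selection.

```python
import random

# Hypothetical options per leg: (cost in EUR, travel hours, kg CO2).
LEGS = [
    [(120, 1.5, 90), (60, 4.0, 15), (40, 6.0, 25)],
    [(200, 2.0, 150), (90, 5.5, 20)],
    [(80, 1.0, 60), (35, 3.0, 8), (25, 4.5, 12)],
]
WEIGHTS = (1.0, 10.0, 0.5)  # hypothetical weights for cost, time and CO2


class Fitness:
    """Weighted-sum itinerary score (lower is better) that counts its calls."""

    def __init__(self):
        self.calls = 0

    def __call__(self, genome):
        self.calls += 1
        return sum(w * v
                   for choice, leg in zip(genome, LEGS)
                   for w, v in zip(WEIGHTS, leg[choice]))


def evolve(generations, pop_size, mutation_rate=0.2, seed=0):
    """Generational GA with elitism and truncation selection; needs pop_size >= 4."""
    rng = random.Random(seed)
    fitness = Fitness()
    pop = [[rng.randrange(len(leg)) for leg in LEGS] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness)      # p fitness evaluations
        parents = ranked[: pop_size // 2]      # truncation selection
        pop = [ranked[0]]                      # elitism: carry the best over unchanged
        while len(pop) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, len(LEGS))  # one-point crossover
            child = a[:cut] + b[cut:]
            for i, leg in enumerate(LEGS):     # per-gene mutation
                if rng.random() < mutation_rate:
                    child[i] = rng.randrange(len(leg))
            pop.append(child)
    best = min(pop, key=fitness)               # p more evaluations
    return best, fitness.calls


best, calls = evolve(generations=30, pop_size=10)
```

Elitism keeps the best itinerary found so far, but the GA still gives no optimality guarantee on its own. With only 18 possible itineraries here, brute force would be the right tool.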

Updated: 2024-10-23 15:15:56

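Title: Optimizing Travel Itineraries with AI Algorithms in a Microservices Architecture: Balancing Cost, Time, Preferences, and Sustainability

Abstract: The goal of this research is to improve travel itineraries in terms of cost, time, user preferences, and environmental sustainability by implementing AI algorithms in a microservices architecture. It uses machine learning models for cost forecasting and personalization, a genetic algorithm for itinerary optimization, and heuristics for sustainability checking. The main evaluation parameters are latency, ability to satisfy user preferences, cost, and environmental impact. Experimental results show an average response time of 4.5 seconds with 1000 concurrent users and 92% accuracy on user preferences. Cost efficiency is demonstrated, with 95% of the proposed trips falling within the budget declared by the user. The system also implements measures to mitigate the negative externalities of travel. Of the travel plans, 60% included green options, leading to carbon emissions 15% lower on average than those of traditional travel plans. The genetic algorithm has time complexity O(g·p·f) and provides the optimal solution within 100 generations. Each iteration improves solution quality by 5%, allowing effective use in optimization problems where time is measured in seconds. Finally, the system is designed to be fault-tolerant, with 99.9% functional availability, so services can be provided even when requirements are exceeded. This microservices-based architecture makes the travel optimization platform dynamic and efficient, providing enhanced scalability, asynchronous communication, and real-time changes. By incorporating AI, cost control, and eco-friendly approaches, the system meets the diverse needs of users in today's travel business.

Updated: 2024-10-23 15:15:56

Categories: cs.SE,cs.AI,cs.CY,cs.LG

Download: http://arxiv.org/abs/2410.17943v1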

Spiking Graph Neural Network on Riemannian Manifolds

Graph neural networks (GNNs) have become the dominant solution for learning on graphs, which are typical non-Euclidean structures. Conventional GNNs, built from artificial neural networks (ANNs), achieve impressive performance at the cost of high computation and energy consumption. In parallel, spiking GNNs with brain-like spiking neurons are drawing increasing research attention owing to their energy efficiency. So far, existing spiking GNNs consider graphs in Euclidean space, ignoring the structural geometry. They also suffer from high latency because they rely on Back-Propagation-Through-Time (BPTT) with surrogate gradients. In light of these issues, we explore spiking GNNs on Riemannian manifolds and present a Manifold-valued Spiking GNN (MSG). In particular, we design a new spiking neuron on geodesically complete manifolds with the diffeomorphism, so that BPTT with respect to the spikes is replaced by the proposed differentiation via manifold. Theoretically, we show that MSG approximates a solver of the manifold ordinary differential equation. Extensive experiments on common graphs show that MSG outperforms previous spiking GNNs and is more energy-efficient than conventional GNNs.
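
For readers new to spiking networks, a spiking neuron integrates its input over time, emits a binary spike when its membrane potential crosses a threshold, and then resets. The spike is non-differentiable, which is why training usually relies on BPTT with surrogate gradients, the cost MSG is designed to avoid. Below is a minimal Euclidean leaky integrate-and-fire (LIF) sketch with hypothetical constants. It is not MSG's manifold-valued neuron.

```python
def lif(inputs, tau=2.0, threshold=1.0, v_reset=0.0):
    """Discrete-time leaky integrate-and-fire neuron.

    v[t] = v[t-1] + (input[t] - (v[t-1] - v_reset)) / tau
    Emit 1 and hard-reset when v[t] >= threshold, otherwise emit 0.
    """
    v, spikes = v_reset, []
    for x in inputs:
        v = v + (x - (v - v_reset)) / tau  # leaky integration toward the input
        if v >= threshold:
            spikes.append(1)
            v = v_reset                    # reset after firing
        else:
            spikes.append(0)
    return spikes


strong = lif([1.5] * 6)   # drive above threshold: fires periodically
weak = lif([0.5] * 20)    # potential settles at 0.5 and never fires
```

The step from potential to spike (`v >= threshold`) has zero gradient almost everywhere. Surrogate-gradient training replaces it with a smooth stand-in during backpropagation through every time step.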

Updated: 2024-10-23 15:09:02

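Title: Spiking Graph Neural Network on Riemannian Manifolds

Abstract: Graph neural networks (GNNs) have become the main solution for learning on graphs, which are typical non-Euclidean structures. Conventional GNNs, built on artificial neural networks (ANNs), achieve impressive performance at the cost of high computation and energy consumption. Meanwhile, spiking GNNs with brain-like spiking neurons are attracting increasing research attention thanks to their energy efficiency. So far, existing spiking GNNs treat graphs as lying in Euclidean space, ignoring the structural geometry. They also suffer from high latency caused by Back-Propagation-Through-Time (BPTT) with surrogate gradients. Considering these issues, we explore spiking GNNs on Riemannian manifolds and propose a Manifold-valued Spiking GNN (MSG). Specifically, we design a new spiking neuron on geodesically complete manifolds with the diffeomorphism, so that BPTT with respect to the spikes is replaced by differentiation via the manifold. Theoretically, we show that MSG approximates a solver of the manifold ordinary differential equation. Extensive experiments on common graphs show that the proposed MSG outperforms previous spiking GNNs and is more energy-efficient than conventional GNNs.

Updated: 2024-10-23 15:09:02

Categories: cs.LG

Download: http://arxiv.org/abs/2410.17941v1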

Accessible, At-Home Detection of Parkinson's Disease via Multi-task Video Analysis

Limited access to neurological care leads to underdiagnosis of Parkinson's Disease (PD), preventing early intervention. Existing AI-based PD detection methods primarily focus on unimodal analysis of motor or speech tasks, overlooking the multifaceted nature of the disease. To address this, we introduce a large-scale, multi-task video dataset of 1102 sessions from 845 participants (272 with PD). Each session contains webcam videos of finger-tapping, facial-expression, and speech tasks. We propose a novel Uncertainty-calibrated Fusion Network (UFNet) that leverages this multimodal data to enhance diagnostic accuracy. UFNet employs independent task-specific networks, trained with Monte Carlo Dropout for uncertainty quantification. Their features are then fused with self-attention, and the attention weights are adjusted dynamically based on task-specific uncertainties. To ensure patient-centered evaluation, the participants were randomly split into three sets: 60% for training, 20% for model selection, and 20% for final performance evaluation. UFNet significantly outperformed single-task models in accuracy, area under the ROC curve (AUROC), and sensitivity while maintaining non-inferior specificity. Withholding uncertain predictions further boosted performance to 88.0 ± 0.3% accuracy, 93.0 ± 0.2% AUROC, 79.3 ± 0.9% sensitivity, and 92.6 ± 0.3% specificity, at the cost of making no prediction for 2.3 ± 0.3% of the data (± denotes the 95% confidence interval). Further analysis suggests that the trained model exhibits no detectable bias across sex and ethnic subgroups and is most effective for individuals aged between 50 and 80. Because it requires only a webcam and a microphone, our approach facilitates accessible home-based PD screening, especially in regions with limited healthcare resources.
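
The idea of weighting tasks by their uncertainty and withholding unsure predictions can be illustrated without deep learning machinery. Below is a minimal sketch that assumes each task model already returns several Monte Carlo Dropout probability samples. It weights tasks by a softmax over negative standard deviations and abstains above a spread threshold; the temperature, threshold, and sample values are all hypothetical. UFNet itself fuses learned features with self-attention rather than averaging probabilities.

```python
import math
import statistics


def mc_summary(samples):
    """Mean PD probability and its spread across Monte Carlo Dropout passes."""
    return statistics.fmean(samples), statistics.pstdev(samples)


def fuse(task_samples, temperature=0.05):
    """Weight tasks by softmax(-std / temperature): more confident tasks count more."""
    summaries = {task: mc_summary(s) for task, s in task_samples.items()}
    logits = {task: -std / temperature for task, (_, std) in summaries.items()}
    top = max(logits.values())
    exps = {task: math.exp(l - top) for task, l in logits.items()}
    z = sum(exps.values())
    weights = {task: e / z for task, e in exps.items()}
    prob = sum(weights[t] * summaries[t][0] for t in summaries)
    spread = sum(weights[t] * summaries[t][1] for t in summaries)
    return prob, spread, weights


def predict_or_abstain(task_samples, max_spread=0.15):
    """True/False screening result, or None to withhold an uncertain prediction."""
    prob, spread, _ = fuse(task_samples)
    if spread > max_spread:
        return None
    return prob >= 0.5


session = {"finger_tapping": [0.8, 0.85, 0.75, 0.8],
           "facial_expression": [0.6, 0.7, 0.65, 0.65],
           "speech": [0.2, 0.9, 0.5, 0.4]}          # noisy task gets little weight
noisy = {task: [0.1, 0.9, 0.2, 0.8] for task in session}
```

Abstaining trades coverage for reliability, the same trade-off behind the reported 2.3% of sessions left without a prediction.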

Updated: 2024-10-23 15:08:59

Fields: cs.CV,cs.HC,cs.LG

Download: http://arxiv.org/abs/2406.14856v2

Certifiably Robust Policies for Uncertain Parametric Environments

We present a data-driven approach for producing policies that are provably robust across unknown stochastic environments. Existing approaches can learn a model of a single environment as an interval Markov decision process (IMDP) and produce a robust policy with a probably approximately correct (PAC) guarantee on its performance. However, they are unable to reason about the impact of the environmental parameters underlying the uncertainty. We propose a framework based on parametric Markov decision processes (MDPs) with unknown distributions over parameters. We learn and analyse IMDPs for a set of unknown sample environments induced by parameters. The key challenge is then to produce meaningful performance guarantees that combine the two layers of uncertainty: (1) multiple environments induced by parameters with an unknown distribution; (2) unknown induced environments, which are approximated by IMDPs. We present a novel approach based on scenario optimisation that yields a single PAC guarantee quantifying the risk level for which a specified performance level can be assured in unseen environments, plus a means to trade off risk and performance. We implement and evaluate our framework using multiple robust policy generation methods on a range of benchmarks. We show that our approach produces tight bounds on a policy's performance with high confidence.
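The per-environment analysis relies on IMDPs. Below is a minimal sketch of standard pessimistic (robust) value iteration for reachability on an IMDP. It assumes a dictionary encoding `imdp[state][action] = (lower, upper)`, chosen here for illustration. It does not reproduce the paper's scenario-optimisation layer over sampled environments.

```python
def worst_case_expectation(values, lower, upper):
    """min over p of sum_j p[j]*values[j], s.t. lower <= p <= upper and sum(p) == 1."""
    p = list(lower)
    remaining = 1.0 - sum(lower)
    # Greedy: hand the free probability mass to the lowest-valued successors first.
    for j in sorted(range(len(values)), key=lambda k: values[k]):
        extra = min(upper[j] - lower[j], remaining)
        p[j] += extra
        remaining -= extra
    return sum(pj * vj for pj, vj in zip(p, values))

def robust_reach_probability(imdp, n_states, targets, iters=200):
    """Lower bound on the maximal probability of reaching `targets`.

    imdp[s] maps each action to (lower, upper) interval vectors over successors;
    states without actions are treated as non-target sinks.
    """
    v = [1.0 if s in targets else 0.0 for s in range(n_states)]
    for _ in range(iters):
        new_v = []
        for s in range(n_states):
            if s in targets:
                new_v.append(1.0)
            elif imdp.get(s):
                # Best action against the worst distribution inside the intervals.
                new_v.append(max(worst_case_expectation(v, lo, up)
                                 for lo, up in imdp[s].values()))
            else:
                new_v.append(0.0)
        v = new_v
    return v
```

In the example used for testing, state 0 has two actions with interval transitions to a goal (state 1) and an absorbing sink (state 2). The greedy inner step pushes the free mass toward the sink, so the guaranteed value is the lower bound on reaching the goal under the better action.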

Updated: 2024-10-23 15:01:34

Fields: cs.LG,cs.AI,cs.SY,eess.SY

Download: http://arxiv.org/abs/2408.03093v2

Semi-Implicit Functional Gradient Flow

Particle-based variational inference methods (ParVIs) use non-parametric variational families represented by particles to approximate the target distribution, following the kernelized Wasserstein gradient flow of the Kullback-Leibler (KL) divergence. Recent works introduce functional gradient flows that replace the kernel for greater flexibility. However, their deterministic update mechanism may suffer from limited exploration and requires expensive repeated runs to obtain new samples. In this paper, we propose Semi-Implicit Functional Gradient flow (SIFG), a functional gradient ParVI method that uses perturbed particles as the approximation family. The corresponding functional gradient flow, which can be estimated via denoising score matching, comes with strong theoretical convergence guarantees. We also present an adaptive version of our method that automatically chooses a suitable noise magnitude. Extensive experiments demonstrate the effectiveness and efficiency of the proposed framework on both simulated and real data problems.
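Here is a one-dimensional sketch of the "perturbed particles + denoising score matching" idea. It rests on heavy simplifications:

- a standard-normal target;
- a linear score model fitted by least squares;
- a fixed noise level `sigma`.

The paper's neural score estimator and adaptive noise selection are not reproduced. Each particle moves along grad log p minus the estimated score of the perturbed particle distribution, both evaluated at its perturbed position.

```python
import random

def grad_log_target(z):
    # Assumed target for illustration: standard normal, so grad log p(z) = -z.
    return -z

def fit_linear_score(z, targets):
    """Least-squares fit of s(z) = a*z + b to denoising-score-matching targets."""
    n = len(z)
    mz, mt = sum(z) / n, sum(targets) / n
    cov = sum((zi - mz) * (ti - mt) for zi, ti in zip(z, targets)) / n
    var = sum((zi - mz) ** 2 for zi in z) / n
    a = cov / var
    return a, mt - a * mz

def sifg_sketch(n=1000, sigma=0.5, step=0.1, iters=200, seed=0):
    rng = random.Random(seed)
    x = [rng.uniform(3.0, 5.0) for _ in range(n)]  # particles start far from the target
    for _ in range(iters):
        noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = [xv + sigma * e for xv, e in zip(x, noise)]  # perturbed particles
        # DSM target: grad_z log N(z; x, sigma^2) = -(z - x) / sigma^2 = -e / sigma.
        a, b = fit_linear_score(z, [-e / sigma for e in noise])
        # Functional-gradient step: grad log p(z) minus the estimated score at z.
        x = [xv + step * (grad_log_target(zv) - (a * zv + b))
             for xv, zv in zip(x, z)]
    return x
```

With a standard-normal target, the perturbed particles x + sigma*e should approach N(0, 1). The unperturbed particles therefore settle near variance 1 - sigma**2 (0.75 here).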

Updated: 2024-10-23 15:00:30

Fields: stat.ML,cs.LG

Download: http://arxiv.org/abs/2410.17935v1

Retrieving snow depth distribution by downscaling ERA5 Reanalysis with ICESat-2 laser altimetry

Estimating the variability of seasonal snow cover, in particular snow depth in remote areas, poses significant challenges due to limited spatial and temporal data availability. This study uses snow depth measurements from the ICESat-2 satellite laser altimeter, which are sparse in both space and time, and combines them with climate reanalysis data in a downscaling-calibration scheme to produce monthly gridded snow depth maps at microscale (10 m). Snow surface elevation measurements from ICESat-2 along profiles are compared to a digital elevation model to determine snow depth at each point. To efficiently turn sparse measurements into snow depth maps, a regression model is fitted to establish a relationship between the retrieved snow depth and the corresponding ERA5 Land snow depth. This relationship, referred to as subgrid variability, is then applied to downscale the monthly ERA5 Land snow depth data. The method can provide time series of monthly snow depth maps for the entire ERA5 time range (since 1950). The downscaled snow depth data were validated at an intermediate scale (100 m x 500 m) using datasets from airborne laser scanning (ALS) in the Hardangervidda region of southern Norway. Results show that snow depth prediction achieved R² values ranging from 0.74 to 0.88 (post-calibration). The method relies on globally available data and is applicable to other snow regions above the treeline. Though it requires area-specific calibration, our approach has the potential to provide snow depth maps in areas where no such data exist and can be used to extrapolate existing snow surveys in time and over larger areas. As such, it can offer valuable input data for hydrological, ecological, or permafrost modeling tasks.
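The retrieval and downscaling steps can be sketched as follows. The sketch assumes a deliberately simple regression form: snow depth at each fine pixel is proportional to the co-located ERA5 Land value, with the factor fitted through the origin per pixel. The paper's actual regression model and predictors may differ, and the pixel IDs are hypothetical.

```python
from collections import defaultdict

def snow_depth_from_altimetry(snow_surface_elev, dem_elev):
    """Point snow depth (m): ICESat-2 snow-surface elevation minus snow-free DEM elevation."""
    return max(0.0, snow_surface_elev - dem_elev)

def fit_subgrid_factors(samples):
    """samples: (pixel_id, era5_depth, measured_depth) triples.

    Fits measured ~ c_pixel * era5 by least squares through the origin, per pixel.
    """
    num, den = defaultdict(float), defaultdict(float)
    for pixel, era5, depth in samples:
        num[pixel] += era5 * depth
        den[pixel] += era5 * era5
    return {pixel: num[pixel] / den[pixel] for pixel in num if den[pixel] > 0}

def downscale(era5_depth, factors):
    """Map one monthly ERA5 Land value onto the fine pixels via the fitted factors."""
    return {pixel: c * era5_depth for pixel, c in factors.items()}

def r_squared(observed, predicted):
    """Coefficient of determination, the validation metric quoted in the abstract."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

Once the factors are fitted from the sparse tracks, every month of ERA5 Land can be downscaled. This is what makes a 1950-onward time series possible in principle.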

Updated: 2024-10-23 14:59:06

Fields: physics.geo-ph,cs.LG

Download: http://arxiv.org/abs/2410.17934v1

Multi-Continental Healthcare Modelling Using Blockchain-Enabled Federated Learning

One of the biggest challenges in building artificial intelligence (AI) models for healthcare is data sharing. Since healthcare data are private, sensitive, and heterogeneous, collecting sufficient data for modelling is exhausting, costly, and sometimes impossible. In this paper, we propose a framework for global healthcare modelling that uses datasets from multiple continents (Europe, North America, and Asia) without sharing the local datasets, and we choose glucose management as a case study to verify its effectiveness. Technically, blockchain-enabled federated learning is adapted to meet the privacy and safety requirements of healthcare data, while its on-chain incentive mechanism rewards honest participation and penalizes malicious activities. Experimental results show that the proposed framework is effective, efficient, and privacy-preserving. Its prediction accuracy is much better than that of models trained on limited personal data and is similar to, or even slightly better than, the results obtained from a centralized dataset. This work paves the way for international collaborations on healthcare projects, where additional data is crucial for reducing bias and providing benefits to humanity.
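Below is a minimal sketch of one aggregation round. It assumes plain FedAvg (sample-size-weighted averaging) and a hypothetical incentive rule. Clients whose update lies near the coordinate-wise median are rewarded. Outliers are excluded and have their stake reduced. Neither the blockchain layer nor the paper's actual incentive mechanism is modelled.

```python
import math
import statistics

def coordinate_median(updates):
    return [statistics.median(col) for col in zip(*updates)]

def aggregate_round(updates, sizes, stakes, max_dist=1.0, reward=1.0, penalty=2.0):
    """One federated round under a hypothetical honesty rule.

    Returns (global_update, new_stakes). Clients within max_dist of the
    coordinate-wise median are averaged (FedAvg) and rewarded; others are penalised.
    """
    med = coordinate_median(updates)
    honest = [i for i, u in enumerate(updates) if math.dist(u, med) <= max_dist]
    total = sum(sizes[i] for i in honest)
    global_update = [sum(updates[i][k] * sizes[i] for i in honest) / total
                     for k in range(len(med))]
    new_stakes = [s + reward if i in honest else s - penalty
                  for i, s in enumerate(stakes)]
    return global_update, new_stakes
```

Only model updates leave each site; raw patient records stay local, which is the property the framework relies on.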

Updated: 2024-10-23 14:55:53

Fields: cs.LG,cs.AI,cs.CR

Download: http://arxiv.org/abs/2410.17933v1

SCA: Highly Efficient Semantic-Consistent Unrestricted Adversarial Attack

Deep neural network based systems deployed in sensitive environments are vulnerable to adversarial attacks. Unrestricted adversarial attacks typically manipulate the semantic content of an image (e.g., color or texture) to create adversarial examples that are both effective and photorealistic. Recent works have used the diffusion inversion process to map images into a latent space, where high-level semantics are manipulated by introducing perturbations. However, these methods often result in substantial semantic distortions in the denoised output and suffer from low efficiency. In this study, we propose a novel framework called Semantic-Consistent Unrestricted Adversarial Attacks (SCA), which employs an inversion method to extract edit-friendly noise maps and uses a Multimodal Large Language Model (MLLM) to provide semantic guidance throughout the process. Conditioned on the rich semantic information provided by the MLLM, we perform each step of the DDPM denoising process using a series of edit-friendly noise maps, and leverage DPM Solver++ to accelerate this process, enabling efficient sampling with semantic consistency. Compared to existing methods, our framework efficiently generates adversarial examples that exhibit minimal discernible semantic changes. Consequently, we introduce, for the first time, Semantic-Consistent Adversarial Examples (SCAE). Extensive experiments and visualizations demonstrate the high efficiency of SCA, which is on average 12 times faster than state-of-the-art attacks. Our research can further draw attention to the security of multimedia information.
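SCA builds on edit-friendly noise maps for DDPM inversion. The sketch below shows the core trick on a scalar "image". It assumes a linear beta schedule and a hypothetical stand-in noise predictor `eps_model`. Each x_t is drawn independently from q(x_t | x_0). The noise map z_t is then solved for, so that the DDPM sampler reproduces the trajectory exactly. The MLLM guidance and the DPM Solver++ acceleration are not shown.

```python
import math
import random

def linear_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule and cumulative products alpha_bar (index t-1 for step t)."""
    betas = [beta_start + (beta_end - beta_start) * i / (T - 1) for i in range(T)]
    alpha_bars, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return betas, alpha_bars

def posterior_mean(x_t, t, eps_model, betas, alpha_bars):
    """DDPM mean of p(x_{t-1} | x_t) from a noise prediction (t is 1-based)."""
    beta, abar = betas[t - 1], alpha_bars[t - 1]
    eps = eps_model(x_t, t)
    return (x_t - beta / math.sqrt(1.0 - abar) * eps) / math.sqrt(1.0 - beta)

def edit_friendly_inversion(x0, eps_model, betas, alpha_bars, rng):
    T = len(betas)
    # Each x_t is sampled independently from q(x_t | x_0), not as one Markov chain.
    xs = [x0] + [math.sqrt(ab) * x0 + math.sqrt(1.0 - ab) * rng.gauss(0.0, 1.0)
                 for ab in alpha_bars]
    # Solve for z_t so that mean + sigma_t * z_t lands exactly on x_{t-1}.
    zs = {}
    for t in range(T, 0, -1):
        mu = posterior_mean(xs[t], t, eps_model, betas, alpha_bars)
        zs[t] = (xs[t - 1] - mu) / math.sqrt(betas[t - 1])
    return xs[T], zs

def sample_with_noise_maps(x_T, zs, eps_model, betas, alpha_bars):
    """DDPM sampler with sigma_t = sqrt(beta_t), reusing the stored noise maps."""
    x = x_T
    for t in range(len(betas), 0, -1):
        x = posterior_mean(x, t, eps_model, betas, alpha_bars) + math.sqrt(betas[t - 1]) * zs[t]
    return x
```

Reusing the same `zs` with a modified predictor is how such noise maps are typically used for editing: the output changes while staying anchored to the original trajectory.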

Updated: 2024-10-23 14:53:38

Fields: cs.CV,cs.AI

Download: http://arxiv.org/abs/2410.02240v4

On provable privacy vulnerabilities of graph representations

Graph representation learning (GRL) is critical for extracting insights from complex network structures, but it also raises security concerns due to potential privacy vulnerabilities in these representations. This paper investigates structural vulnerabilities in graph neural models, where sensitive topological information can be inferred through edge reconstruction attacks. Our research primarily addresses the theoretical underpinnings of similarity-based edge reconstruction attacks (SERA), providing a non-asymptotic analysis of their reconstruction capacities. Moreover, we present empirical evidence that such attacks can perfectly reconstruct sparse graphs as graph size increases. Conversely, we establish that sparsity is a critical factor for SERA's effectiveness, as demonstrated through analysis and experiments on (dense) stochastic block models. Finally, we explore the resilience of private graph representations produced via the noisy aggregation (NAG) mechanism against SERA. Through theoretical analysis and empirical assessments, we confirm that NAG mitigates SERA. In parallel, we empirically identify instances in which SERA is, and is not, an effective instrument for elucidating the trade-off between privacy and utility.
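Below is a toy instance of a similarity-based edge reconstruction attack. It assumes:

- one round of sum aggregation over A + I with one-hot node features;
- a hypothetical cosine-similarity threshold of 0.5.

On this small, sparse path graph the attack recovers every edge and no non-edge.

```python
import math

def aggregate(adj, feats):
    """One round of sum aggregation over each node's closed neighbourhood (A + I)."""
    n, d = len(adj), len(feats[0])
    return [[sum(feats[j][k] for j in range(n) if j == i or adj[i][j]) for k in range(d)]
            for i in range(n)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def sera(reps, threshold):
    """Predict an edge for every node pair whose representations are similar enough."""
    n = len(reps)
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if cosine(reps[i], reps[j]) > threshold}
```

Adjacent nodes share two neighbourhood members, while nodes two hops apart share only one, so a single threshold separates them. Adding noise to the aggregated representations, as NAG does, blurs exactly this gap.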

Updated: 2024-10-23 14:50:51

Fields: cs.LG

Download: http://arxiv.org/abs/2402.04033v3

Future Token Prediction -- Causal Language Modelling with Per-Token Semantic State Vector for Multi-Token Prediction

Causal decoder-only transformer models used for generative language modelling, such as Generative Pre-trained Transformers (GPT), are trained to predict the next token in a sequence based only on its previous tokens. Despite this simple training objective, they have proved to be powerful AI tools. However, predicting only the next token results in top-layer embedding vectors that are highly token-focused. There may be benefits in generating embedding vectors at each token position that better capture the overall meaning of longer sequences of future text. Recent studies matching brain scans with deep language models suggest that humans also predict upcoming words when listening or reading, but consider multiple future tokens rather than just one. This research investigates a new pretraining method called Future Token Prediction (FTP). In FTP, a large transformer encoder generates top-layer embedding vectors for each token position. Instead of being passed to a language head, these vectors are linearly and expansively projected to a pseudo-sequence, which a small transformer decoder cross-attends to in order to predict the next N tokens forward from that position in the sequence. The top-layer embedding vectors from FTP models exhibit distinct properties compared to those from standard GPT models, varying smoothly along a text sequence as measured by cosine similarity between adjacent tokens. Text generated by FTP models shows improved topic coherence compared to standard GPT-like models trained to the same next-token prediction perplexity. Text classification examples show that the vectors better represent the topic of the text. On a toy, but complex, coding problem, FTP networks produce significantly better results than GPT networks.
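Two ingredients from the description can be sketched directly:

- the N-token-ahead targets that each position is trained on;
- the expansive linear projection of one embedding into a pseudo-sequence.

The adjacent-token cosine similarity used as the smoothness measure is included as well. The dimensions and the projection matrix are illustrative assumptions, not the paper's configuration.

```python
import math

def future_token_targets(tokens, n_future, pad=None):
    """Position i is trained to predict tokens[i+1 : i+1+n_future], padded at the end."""
    targets = []
    for i in range(len(tokens)):
        window = tokens[i + 1:i + 1 + n_future]
        targets.append(window + [pad] * (n_future - len(window)))
    return targets

def expand_to_pseudo_sequence(h, W, n_future):
    """Project embedding h with an (n_future*d_out) x d matrix W, reshaped to n_future rows."""
    flat = [sum(w * x for w, x in zip(row, h)) for row in W]
    d_out = len(flat) // n_future
    return [flat[k * d_out:(k + 1) * d_out] for k in range(n_future)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def mean_adjacent_cosine(vectors):
    """Smoothness measure: average cosine similarity between neighbouring positions."""
    sims = [cosine(u, v) for u, v in zip(vectors, vectors[1:])]
    return sum(sims) / len(sims)
```

In FTP, the rows returned by `expand_to_pseudo_sequence` would serve as the memory that the small decoder cross-attends to.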

Updated: 2024-10-23 14:50:15

Fields: cs.CL,cs.LG,I.2.6; I.2.7

Download: http://arxiv.org/abs/2410.18160v1

Adaptive Variance Reduction for Stochastic Optimization under Weaker Assumptions

This paper explores adaptive variance reduction methods for stochastic optimization based on the STORM technique. Existing adaptive extensions of STORM rely on strong assumptions like bounded gradients and bounded function values, or incur an additional $\mathcal{O}(\log T)$ term in the convergence rate. To address these limitations, we introduce a novel adaptive STORM method that, with a newly designed learning rate strategy, achieves an optimal convergence rate of $\mathcal{O}(T^{-1/3})$ for non-convex functions. Compared with existing approaches, our method requires weaker assumptions and attains the optimal convergence rate without the additional $\mathcal{O}(\log T)$ term. We also extend the proposed technique to stochastic compositional optimization, obtaining the same optimal rate of $\mathcal{O}(T^{-1/3})$. Furthermore, we investigate the non-convex finite-sum problem and develop another innovative adaptive variance reduction method that achieves an optimal convergence rate of $\mathcal{O}(n^{1/4} T^{-1/2})$, where $n$ represents the number of component functions. Numerical experiments across various tasks validate the effectiveness of our method.
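For reference, the STORM estimator is d_t = g(x_t; xi_t) + (1 - a)(d_{t-1} - g(x_{t-1}; xi_t)). It reuses one sample at both the current and the previous iterate. Below is a minimal sketch on a noisy 1-D quadratic. It assumes a constant step size and a constant momentum parameter `a`. The paper's adaptive learning-rate schedule is not reproduced.

```python
import random

def storm_quadratic(x0=0.0, target=3.0, lr=0.1, a=0.1, noise=1.0, iters=2000, seed=0):
    """Minimise 0.5*(x - target)^2 from noisy gradients with the STORM estimator."""
    rng = random.Random(seed)

    def stoch_grad(x, xi):
        # True gradient (x - target) corrupted by the sample noise xi.
        return (x - target) + xi

    x_prev = x0
    d = stoch_grad(x_prev, rng.gauss(0.0, noise))  # d_1: a plain stochastic gradient
    x = x_prev - lr * d
    for _ in range(iters):
        xi = rng.gauss(0.0, noise)  # one sample, evaluated at both x_t and x_{t-1}
        d = stoch_grad(x, xi) + (1.0 - a) * (d - stoch_grad(x_prev, xi))
        x_prev, x = x, x - lr * d
    return x
```

Because the same sample appears in both gradient evaluations, the estimator's error obeys e_t = a*xi_t + (1 - a)*e_{t-1}. Its variance is therefore roughly a/2 times the raw gradient noise for small a.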

Updated: 2024-10-23 14:49:39

Fields: math.OC,cs.LG

Download: http://arxiv.org/abs/2406.01959v2

PnLCalib: Sports Field Registration via Points and Lines Optimization

Camera calibration in broadcast sports videos presents numerous challenges for accurate sports field registration due to multiple camera angles, varying camera parameters, and frequent occlusions of the field. Traditional search-based methods depend on initial camera pose estimates and can struggle with non-standard camera positions and dynamic environments. In response, we propose an optimization-based calibration pipeline that leverages a 3D soccer field model and a predefined set of keypoints to overcome these limitations. Our method also introduces a novel refinement module that improves the initial calibration by using detected field lines in a non-linear optimization process. This approach outperforms existing techniques in both multi-view and single-view 3D camera calibration tasks, while maintaining competitive performance in homography estimation. Extensive experimentation on real-world soccer datasets, including SoccerNet-Calibration, WorldCup 2014, and TS-WorldCup, highlights the robustness and accuracy of our method across diverse broadcast scenarios. Our approach offers significant improvements in camera calibration precision and reliability.
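Keypoint-based field registration ultimately fits a camera model to pitch-to-image correspondences. As a minimal illustration, assume exactly four non-degenerate correspondences and h33 fixed to 1. The standard direct linear transform for a homography then reduces to an 8x8 linear solve. This is not PnLCalib's 3D optimization or its line-based refinement.

```python
def solve_linear(A, b):
    """Gaussian elimination with partial pivoting for a square system A x = b."""
    n = len(A)
    M = [list(map(float, row)) + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def homography_from_4_points(src, dst):
    """DLT with h33 = 1: two linear equations per (x, y) -> (u, v) correspondence."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = solve_linear(A, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]

def apply_h(H, pt):
    x, y = pt
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)
```

With more than four keypoints, one would instead solve a least-squares or robust problem. Per the abstract, PnLCalib goes further and refines the calibration using detected field lines.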

Updated: 2024-10-23 14:48:44

Fields: cs.CV,cs.AI,I.2; I.4; I.5

Download: http://arxiv.org/abs/2404.08401v3

Posterior Sampling-based Online Learning for Episodic POMDPs

Learning in POMDPs is known to be significantly harder than in MDPs. In this paper, we consider the online learning problem for episodic POMDPs with unknown transition and observation models. We propose a Posterior Sampling-based reinforcement learning algorithm for POMDPs (PS4POMDPs), which is much simpler and more implementable compared to state-of-the-art optimism-based online learning algorithms for POMDPs. We show that the Bayesian regret of the proposed algorithm scales as the square root of the number of episodes and is polynomial in the other parameters. In a general setting, the regret scales exponentially in the horizon length $H$, and we show that this is inevitable by providing a lower bound. However, when the POMDP is undercomplete and weakly revealing (a common assumption in the recent literature), we establish a polynomial Bayesian regret bound. We finally propose a posterior sampling algorithm for multi-agent POMDPs, and show it too has sublinear regret.
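Posterior sampling for a POMDP needs, at minimum, a belief filter and a posterior over candidate models. Below is a minimal sketch. It assumes a finite set of candidate (T, O) models and the array layout `T[a][s][s2]`, `O[a][s2][o]`, both chosen here for illustration. The likelihood of an episode is the product of the filter's normalisers. PS4POMDPs itself may use a different prior and parameterisation.

```python
import math
import random

def belief_update(belief, action, obs, T, O):
    """Bayes filter step; returns (new_belief, P(obs | belief, action))."""
    n = len(belief)
    # Predict: push the belief through the transition model.
    pred = [sum(belief[s] * T[action][s][s2] for s in range(n)) for s2 in range(n)]
    # Correct: weight by the observation likelihood and normalise.
    unnorm = [pred[s2] * O[action][s2][obs] for s2 in range(n)]
    z = sum(unnorm)
    if z == 0.0:
        return list(pred), 0.0  # observation impossible under this model
    return [u / z for u in unnorm], z

def episode_likelihood(model, init_belief, history):
    """P(observations | actions) for one episode given as (action, obs) pairs."""
    T, O = model
    belief, lik = list(init_belief), 1.0
    for action, obs in history:
        belief, p = belief_update(belief, action, obs, T, O)
        lik *= p
    return lik

def sample_model(models, prior, histories, init_belief, rng):
    """Draw a model index from the posterior over a finite candidate set."""
    weights = [p * math.prod(episode_likelihood(m, init_belief, h) for h in histories)
               for m, p in zip(models, prior)]
    return rng.choices(range(len(models)), weights=weights, k=1)[0]
```

At the start of each episode, the agent would sample a model this way, plan for it, act, and append the new episode to `histories`.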

Updated: 2024-10-23 14:47:42

Fields: cs.LG,cs.AI,cs.SY,eess.SY,stat.ML,93E35

Download: http://arxiv.org/abs/2310.10107v4

SJMalloc: the security-conscious, fast, thread-safe and memory-efficient heap allocator

Heap-based exploits that leverage memory management errors continue to pose a significant threat to application security. The root cause of these vulnerabilities is memory management errors within the applications; however, various hardened allocator designs have been proposed as mitigation. A common feature of these designs is the decision to store heap metadata separately from the application data in use, thereby reducing the risk that metadata corruption leads to security breaches. Despite their potential benefits, hardened allocators have not been widely adopted in real-world applications. The primary barrier to their adoption is the performance overhead they introduce, which can reduce the efficiency and speed of applications, a critical consideration for developers and system administrators. Building on lessons from previous implementations, we developed SJMalloc, a general-purpose, high-performance allocator that addresses these concerns. SJMalloc stores its metadata out-of-band, away from the application's data on the heap. This design choice not only enhances security but also improves performance. Across a variety of real-world workloads, SJMalloc demonstrates a ~6% performance improvement compared to the GLibc allocator, while using only ~5% more memory. Furthermore, SJMalloc passes the generic elements of the GLibc malloc test suite and can thus be used as a drop-in replacement for the standard allocator, offering an easy upgrade path for enhanced security and performance without requiring changes to existing applications.
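To illustrate what out-of-band metadata buys, here is a toy fixed-size-chunk heap. Its chunk bookkeeping lives in separate Python lists rather than in headers inside the arena. This is a conceptual sketch, not SJMalloc's design. An overflow from one chunk into its neighbour corrupts data but cannot touch the allocator's metadata, and double frees remain detectable.

```python
class OutOfBandHeap:
    """Toy heap of fixed-size chunks; metadata is kept outside the data arena."""

    def __init__(self, chunk_size=16, n_chunks=8):
        self.chunk_size = chunk_size
        self.arena = bytearray(chunk_size * n_chunks)       # application data only
        self.in_use = [False] * n_chunks                    # out-of-band metadata
        self.free_list = list(range(n_chunks - 1, -1, -1))  # pop() yields chunk 0 first

    def malloc(self):
        if not self.free_list:
            raise MemoryError("toy heap exhausted")
        idx = self.free_list.pop()
        self.in_use[idx] = True
        return idx * self.chunk_size  # offset of the chunk in the arena

    def free(self, offset):
        idx, rem = divmod(offset, self.chunk_size)
        if rem or not 0 <= idx < len(self.in_use) or not self.in_use[idx]:
            raise ValueError("invalid pointer or double free")
        self.in_use[idx] = False
        self.free_list.append(idx)

    def write(self, offset, data):
        # Deliberately no per-chunk bounds check, mimicking a buggy C program.
        if offset + len(data) > len(self.arena):
            raise IndexError("write past end of arena")
        self.arena[offset:offset + len(data)] = data
```

In an in-band allocator, the same overflow would overwrite the next chunk's header. That kind of header corruption is what many heap exploits build on.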

Updated: 2024-10-23 14:47:12

Fields: cs.OS,cs.CR,D.4.2

Download: http://arxiv.org/abs/2410.17928v1

PixLore: A Dataset-driven Approach to Rich Image Captioning

In the domain of vision-language integration, generating detailed image captions poses a significant challenge due to the lack of curated, rich datasets. This study introduces PixLore, a novel method that leverages Querying Transformers by fine-tuning the BLIP-2 model with the LoRA method on a standard commercial GPU. The approach trains on a carefully assembled dataset built from state-of-the-art Computer Vision models, combined and augmented by ChatGPT. It addresses the question of whether intricate image understanding can be achieved with an ensemble of smaller-scale models, referred to as Knowledge Stitching. Comparative evaluations against major models such as GPT-4 and Google Bard show that PixLore-2.7B, despite having considerably fewer parameters, is rated higher than the existing State-of-the-Art models in over half of the assessments. Specifically, PixLore outperforms Bard and BLIP-2, which score approximately 35.18% and 27.98% lower than PixLore, respectively, in the task of image captioning. This research not only presents a groundbreaking approach but also highlights the importance of well-curated datasets in enhancing the performance of smaller models.
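For readers unfamiliar with LoRA (Low-Rank Adaptation), below is a minimal pure-Python sketch of one adapted linear layer. It assumes the common formulation W + (alpha / r) * B @ A, with B zero-initialised so that training starts from the pretrained behaviour. The rank and scaling values are illustrative, not PixLore's settings.

```python
import random

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

class LoRALinear:
    """Frozen weight W (d_out x d_in) plus a trainable low-rank update B @ A."""

    def __init__(self, W, r=2, alpha=4.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(W), len(W[0])
        self.W = W                                                              # frozen
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]  # r x d_in
        self.B = [[0.0] * r for _ in range(d_out)]                              # d_out x r
        self.scale = alpha / r

    def effective_weight(self):
        BA = matmul(self.B, self.A)
        return [[w + self.scale * d for w, d in zip(wr, dr)] for wr, dr in zip(self.W, BA)]

    def forward(self, x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.effective_weight()]

    def trainable_params(self):
        # Only A and B are trained: r * (d_in + d_out) values instead of d_in * d_out.
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])
```

Training only `A` and `B` is what makes fine-tuning a multi-billion-parameter model like BLIP-2 feasible on a single commercial GPU.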

Updated: 2024-10-23 14:47:10

Fields: cs.CV,cs.AI

Download: http://arxiv.org/abs/2312.05349v3

Efficient Sign-Based Optimization: Accelerating Convergence via Variance Reduction

Sign stochastic gradient descent (signSGD) is a communication-efficient method that transmits only the sign of stochastic gradients for parameter updating. Existing literature has demonstrated that signSGD can achieve a convergence rate of $\mathcal{O}(d^{1/2}T^{-1/4})$, where $d$ represents the dimension and $T$ is the iteration number. In this paper, we improve this convergence rate to $\mathcal{O}(d^{1/2}T^{-1/3})$ by introducing the Sign-based Stochastic Variance Reduction (SSVR) method, which employs variance reduction estimators to track gradients and leverages their signs to update. For finite-sum problems, our method can be further enhanced to achieve a convergence rate of $\mathcal{O}(m^{1/4}d^{1/2}T^{-1/2})$, where $m$ denotes the number of component functions. Furthermore, we investigate the heterogeneous majority vote in distributed settings and introduce two novel algorithms that attain improved convergence rates of $\mathcal{O}(d^{1/2}T^{-1/2} + dn^{-1/2})$ and $\mathcal{O}(d^{1/4}T^{-1/4})$ respectively, outperforming the previous results of $\mathcal{O}(dT^{-1/4} + dn^{-1/2})$ and $\mathcal{O}(d^{3/8}T^{-1/8})$, where $n$ represents the number of nodes. Numerical experiments across different tasks validate the effectiveness of our proposed methods.
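For the distributed setting, below is a minimal sketch of one majority-vote signSGD step. Each worker sends only the sign of its gradient, and the server applies the sign of the vote sum. The variance-reduced estimators that SSVR feeds into the sign operation are not shown.

```python
def sign(v):
    return [1 if g > 0 else -1 if g < 0 else 0 for g in v]

def majority_vote_step(x, worker_grads, lr):
    """One step of signSGD with majority vote: 1 bit per coordinate per worker."""
    votes = [sign(g) for g in worker_grads]
    agg = sign([sum(col) for col in zip(*votes)])  # coordinate-wise majority
    return [xi - lr * s for xi, s in zip(x, agg)]
```

A worker with a huge gradient entry (for example -4.0) still casts only one vote. This bounds any single worker's influence, but it is also why heterogeneous workers make the analysis harder.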

Updated: 2024-10-23 14:42:35

Fields: cs.LG,math.OC

Download: http://arxiv.org/abs/2406.00489v2

Securing Stack Smashing Protection in WebAssembly Applications

WebAssembly is an instruction set architecture and binary format standard designed for secure execution by an interpreter. Previous work has shown that WebAssembly is vulnerable to buffer overflows due to the lack of effective protection mechanisms. In this paper, we evaluate the implementation of Stack Smashing Protection (SSP) in standalone WebAssembly runtimes and uncover two weaknesses in their current implementation. The first is the possibility of overwriting the SSP reference value, because the memory zones inside a WebAssembly process are contiguous. The second stems from WebAssembly's reliance on the runtime to provide the randomness used to initialize the SSP reference value, which impacts the robustness of the solution. We address these two flaws by hardening the SSP implementation against both storage overwrites and random generator failure, in a way that generalizes to all of WebAssembly. We evaluate our new, more robust solution and show that the implemented improvements do not reduce the efficiency of SSP.
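Below is a toy model of the two weaknesses and one way to address each. It assumes:

- a frame whose buffer and canary copy share one contiguous bytearray, standing in for linear memory;
- a reference value kept outside that bytearray;
- an entropy source that fails closed.

This illustrates the idea only; it is not the paper's hardened implementation.

```python
import secrets

class ToyFrame:
    """Buffer + canary copy in contiguous 'linear memory'; reference kept elsewhere."""

    def __init__(self, buf_size=8, entropy=secrets.token_bytes):
        try:
            self._reference = bytes(entropy(8))  # reference canary, outside linear memory
        except Exception as exc:
            # Fail closed rather than fall back to a predictable canary.
            raise RuntimeError("no randomness for SSP canary") from exc
        self.buf_size = buf_size
        # Contiguous memory: the buffer immediately followed by the canary copy.
        self.memory = bytearray(buf_size) + bytearray(self._reference)

    def unsafe_copy(self, data):
        # Unchecked copy (like strcpy): bytes beyond buf_size land on the canary copy.
        n = min(len(data), len(self.memory))
        self.memory[0:n] = data[:n]

    def check_on_return(self):
        if self.memory[self.buf_size:] != self._reference:
            raise RuntimeError("stack smashing detected")
```

Because the overflow can reach only the copy and never the reference, the attacker cannot make the two match by overwriting both.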

Updated: 2024-10-23 14:41:59

Fields: cs.CR

Download: http://arxiv.org/abs/2410.17925v1

Guide for Defense (G4D): Dynamic Guidance for Robust and Balanced Defense in Large Language Models

With the extensive deployment of Large Language Models (LLMs), ensuring their safety has become increasingly critical. However, existing defense methods often struggle with two key issues: (i) inadequate defense capabilities, particularly in domain-specific scenarios like chemistry, where a lack of specialized knowledge can lead to harmful responses to malicious queries; and (ii) over-defensiveness, which compromises the general utility and responsiveness of LLMs. To mitigate these issues, we introduce a multi-agent defense framework, Guide for Defense (G4D), which leverages accurate external information to provide an unbiased summary of user intentions and analytically grounded safety response guidance. Extensive experiments on popular jailbreak attacks and benign datasets show that G4D can enhance LLMs' robustness against jailbreak attacks in general and domain-specific scenarios without compromising the model's general functionality.

Updated: 2024-10-23 14:40:37

Fields: cs.AI

Download: http://arxiv.org/abs/2410.17922v1

Addressing Asynchronicity in Clinical Multimodal Fusion via Individualized Chest X-ray Generation

Integrating multi-modal clinical data, such as electronic health records (EHR) and chest X-ray images (CXR), is particularly beneficial for clinical prediction tasks. However, in a temporal setting, multi-modal data are often inherently asynchronous. EHR can be collected continuously, but CXR is generally taken at much longer intervals due to its high cost and radiation dose. When a clinical prediction is needed, the last available CXR image may be outdated, leading to suboptimal predictions. To address this challenge, we propose DDL-CXR, a method that dynamically generates an up-to-date latent representation of the individualized CXR images. Our approach leverages latent diffusion models for patient-specific generation, strategically conditioned on a previous CXR image and the EHR time series, which provide information on anatomical structures and disease progression, respectively. In this way, the interaction across modalities can be better captured by the latent CXR generation process, ultimately improving prediction performance. Experiments on MIMIC datasets show that the proposed model effectively addresses asynchronicity in multimodal fusion and consistently outperforms existing methods.

Updated: 2024-10-23 14:34:39

Fields: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2410.17918v1

regAL: Python Package for Active Learning of Regression Problems

A growing number of research areas rely on machine learning methods to accelerate discovery while saving resources. Machine learning models, however, usually require large datasets of experimental or computational results. In certain fields, such as (bio)chemistry, materials science, or medicine, such datasets are rarely available and often prohibitively expensive to obtain. To bypass that obstacle, active learning methods are employed to develop machine learning models with a desired performance while requiring as few computational or experimental results from the application domain as possible. For this purpose, the model's knowledge about different regions of the application domain is estimated to guide the choice of the model's training set. Although active learning is widely studied for classification problems (discrete outcomes), comparatively few works apply it to regression problems (continuous outcomes). In this work, we present our Python package regAL, which allows users to evaluate different active learning strategies for regression problems. Requiring only the dataset in question as input, while offering many additional customization and insight options, the package is intended for anyone who aims to perform and understand active learning within their problem-specific scope.
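As an example of the kind of strategy such a package evaluates, below is a minimal sketch of greedy sampling on the inputs. It queries the pool point farthest from all labeled points. It assumes 1-D inputs for brevity and does not use regAL's actual API.

```python
def greedy_sampling(pool, labeled, n_queries):
    """Repeatedly query the pool point whose nearest labeled point is farthest away.

    `labeled` must be non-empty; ties go to the earliest point in `pool` order.
    """
    labeled = list(labeled)
    remaining = list(pool)
    chosen = []
    for _ in range(n_queries):
        best = max(remaining, key=lambda x: min(abs(x - l) for l in labeled))
        chosen.append(best)
        labeled.append(best)  # pretend the expensive label is now acquired
        remaining.remove(best)
    return chosen
```

This strategy explores the input space uniformly without consulting the model. Uncertainty-based strategies, such as ensemble variance, instead target regions where the current regressor is least sure.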

Updated: 2024-10-23 14:34:36

Categories: cs.LG,cs.SE

Download: http://arxiv.org/abs/2410.17917v1
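
The regAL abstract describes estimating where the model's knowledge is weakest and using that estimate to pick the next training points. Below is a minimal sketch of such a pool-based loop for regression. It assumes an ensemble-disagreement (query-by-committee style) uncertainty estimate over polynomial surrogates. The function names are hypothetical, and this is not regAL's API.

```python
import numpy as np

def fit_ensemble(x, y, n_models=10, degree=2, frac=0.8, seed=0):
    """Fit polynomial regressors on random subsets of the labeled data."""
    rng = np.random.default_rng(seed)
    size = max(degree + 1, int(frac * len(x)))
    models = []
    for _ in range(n_models):
        idx = rng.choice(len(x), size=size, replace=False)  # random subsample
        models.append(np.polyfit(x[idx], y[idx], degree))
    return models

def ensemble_std(models, x):
    """Disagreement between ensemble members: a proxy for model uncertainty at x."""
    preds = np.stack([np.polyval(coef, x) for coef in models])
    return preds.std(axis=0)

def active_learning(pool_x, oracle, n_init=8, n_queries=10, seed=0):
    """Pool-based active learning: repeatedly label the most uncertain pool point."""
    rng = np.random.default_rng(seed)
    labeled = [int(i) for i in rng.choice(len(pool_x), size=n_init, replace=False)]
    for step in range(n_queries):
        x_lab = pool_x[labeled]
        y_lab = oracle(x_lab)  # the costly experiment or simulation we want to call rarely
        models = fit_ensemble(x_lab, y_lab, seed=seed + step)
        std = ensemble_std(models, pool_x)
        std[labeled] = -np.inf  # never query an already labeled point
        labeled.append(int(np.argmax(std)))
    return labeled
```

For example, `active_learning(np.linspace(0, 6, 200), np.sin)` returns the indices of the 18 pool points it chose to label. Swapping `ensemble_std` for another uncertainty estimate changes the query strategy without touching the loop.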

Deep learning for model correction of dynamical systems with data scarcity

We present a deep learning framework for correcting existing dynamical system models using only a scarce high-fidelity data set. In many practical situations, one has a low-fidelity model that captures the dynamics reasonably well but lacks high resolution, owing to the inherent limitations of the model and the complexity of the underlying physics. When high-resolution data become available, it is natural to seek a model correction that improves the resolution of the model predictions. We focus on the case in which the amount of high-fidelity data is so small that most existing data-driven modeling methods cannot be applied. In this paper, we address these challenges with a model-correction method that requires only a scarce high-fidelity data set. Our method first seeks a deep neural network (DNN) model that approximates the existing low-fidelity model. Using the scarce high-fidelity data, the method then corrects the DNN model via transfer learning (TL). After TL, an improved DNN model with high prediction accuracy for the underlying dynamics is obtained. One distinct feature of the proposed method is that it does not assume a specific form for the model correction terms. Instead, it provides an inherent correction to the low-fidelity model via TL. A set of numerical examples is presented to demonstrate the effectiveness of the proposed method.

Updated: 2024-10-23 14:33:11

Categories: cs.LG,math.DS,stat.ML

Download: http://arxiv.org/abs/2410.17913v1
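
Below is a minimal sketch of the two-stage idea in this abstract. It assumes:

- a one-dimensional system with an explicit-Euler flow map;
- one common transfer-learning recipe: keep the hidden layer learned from abundant low-fidelity data, and refit only the output layer on the scarce high-fidelity data.

The vector fields `f_low` and `f_high` and the `TinyMLP` class are hypothetical. The paper's network and exact TL procedure may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
dt = 0.1

def f_low(x):   # hypothetical low-fidelity model of the vector field
    return -x

def f_high(x):  # hypothetical "true" dynamics, observed only sparsely
    return -x + 0.5 * np.sin(3 * x)

def flow_map(f, x):  # one explicit Euler step: x_{n+1} = x_n + dt * f(x_n)
    return x + dt * f(x)

class TinyMLP:
    """One-hidden-layer tanh network approximating the flow map x_n -> x_{n+1}."""

    def __init__(self, width=16):
        self.W1 = rng.normal(0.0, 1.0, (1, width))
        self.b1 = np.zeros(width)
        self.W2 = rng.normal(0.0, 0.1, (width, 1))
        self.b2 = np.zeros(1)

    def hidden(self, x):
        return np.tanh(x[:, None] @ self.W1 + self.b1)

    def __call__(self, x):
        return (self.hidden(x) @ self.W2 + self.b2).ravel()

    def train(self, x, y, lr=0.02, epochs=4000):
        """Full-batch gradient descent on the mean squared error."""
        for _ in range(epochs):
            h = self.hidden(x)
            err = (h @ self.W2 + self.b2).ravel() - y
            g_out = 2.0 * err[:, None] / len(x)       # dLoss/dOutput
            g_h = (g_out @ self.W2.T) * (1.0 - h**2)  # backprop through tanh
            self.W2 -= lr * h.T @ g_out
            self.b2 -= lr * g_out.sum(axis=0)
            self.W1 -= lr * x[None, :] @ g_h
            self.b1 -= lr * g_h.sum(axis=0)

    def transfer(self, x, y):
        """Transfer step: freeze the hidden layer and refit the output layer
        by least squares on the scarce high-fidelity data."""
        H = np.hstack([self.hidden(x), np.ones((len(x), 1))])
        coef, *_ = np.linalg.lstsq(H, y, rcond=None)
        self.W2, self.b2 = coef[:-1, None], coef[-1:]

# Stage 1: learn the low-fidelity model from abundant synthetic data.
x_lf = rng.uniform(-2, 2, 500)
net = TinyMLP()
net.train(x_lf, flow_map(f_low, x_lf))

# Stage 2: correct it with only 30 high-fidelity samples.
x_hf = rng.uniform(-2, 2, 30)
y_hf = flow_map(f_high, x_hf)
err_before = np.mean((net(x_hf) - y_hf) ** 2)
net.transfer(x_hf, y_hf)
err_after = np.mean((net(x_hf) - y_hf) ** 2)
```

Refitting only the last layer keeps the number of free parameters (17 here) below the number of high-fidelity samples. This is one simple way to avoid overfitting when the data are this scarce.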

Anomaly Prediction: A Novel Approach with Explicit Delay and Horizon

Anomaly detection in time series data is a critical challenge across various domains. Traditional methods typically focus on identifying anomalies in the immediately subsequent steps, often underestimating the significance of temporal dynamics such as the delay time and horizon of anomalies, which generally require extensive post-analysis. This paper introduces a novel approach to time series anomaly prediction that incorporates this temporal information directly into the prediction results. We propose a new dataset specifically designed to evaluate this approach and conduct comprehensive experiments using several state-of-the-art methods. Our results demonstrate the efficacy of our approach in providing timely and accurate anomaly predictions, setting a new benchmark for future research in this field.

Updated: 2024-10-23 14:29:56

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2408.04377v3
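
One way to make delay and horizon explicit is to build the training targets from them. Below is a minimal sketch, assuming integer time steps and known anomaly onset indices. `prediction_labels` is a hypothetical helper, and the paper's exact definitions of delay and horizon may differ.

```python
import numpy as np

def prediction_labels(anomaly_starts, length, delay, horizon):
    """Label time step t with 1 if an anomaly begins in the window
    [t + delay, t + delay + horizon), else 0.

    delay:   minimum lead time the prediction must provide (in steps)
    horizon: width of the look-ahead window (in steps)
    """
    labels = np.zeros(length, dtype=int)
    for s in anomaly_starts:
        # t must satisfy t + delay <= s < t + delay + horizon
        lo = max(0, s - delay - horizon + 1)
        hi = min(length, s - delay + 1)
        if lo < hi:
            labels[lo:hi] = 1
    return labels
```

With an onset at step 10, `delay=2` and `horizon=3`, steps 6, 7 and 8 are positive. For each of them, the onset falls at least two steps ahead and within a three-step window.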

Slot: Provenance-Driven APT Detection through Graph Reinforcement Learning

Advanced Persistent Threats (APTs) are sophisticated cyberattacks characterized by their ability to remain undetected within the victim system for extended periods, aiming to exfiltrate sensitive data or disrupt operations. Existing detection approaches often struggle to identify these complex threats effectively, to construct the attack chain to facilitate defense, or to resist adversarial attacks. To overcome these challenges, we propose Slot, an advanced APT detection approach based on provenance graphs and graph reinforcement learning. Slot excels at uncovering multi-level hidden relationships among system behaviors, such as causal, contextual, and indirect connections, through provenance graph mining. By pioneering the integration of graph reinforcement learning, Slot dynamically adapts to new user activities and evolving attack strategies, enhancing its resilience against adversarial attacks. In addition, Slot uses clustering algorithms to construct the attack chain automatically from detected attacks, precisely identifying attack paths and facilitating the development of defense strategies. Evaluations on real-world datasets demonstrate Slot's outstanding accuracy, efficiency, adaptability, and robustness in APT detection, with most metrics surpassing state-of-the-art methods. Finally, case studies assessing Slot's effectiveness in supporting APT defense further establish it as a practical and reliable tool for cybersecurity protection.

Updated: 2024-10-23 14:28:32

Categories: cs.CR

Download: http://arxiv.org/abs/2410.17910v1
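
The Slot abstract says that attack chains are built from detected attacks with clustering algorithms. As a simplified stand-in, the sketch below groups detector-flagged provenance nodes into connected components. This swaps in plain connected components for Slot's own clustering step. The helper name and node names are hypothetical.

```python
from collections import defaultdict

def attack_chains(edges, flagged):
    """Group detector-flagged provenance nodes into candidate attack chains.

    edges:   iterable of (src, dst) provenance relations, e.g. ("bash", "/tmp/x")
    flagged: set of nodes the detector marked as malicious
    Returns a list of node sets; each set is one connected group of flagged nodes.
    """
    adj = defaultdict(set)
    for src, dst in edges:
        if src in flagged and dst in flagged:  # keep only edges between flagged nodes
            adj[src].add(dst)
            adj[dst].add(src)  # ignore edge direction when grouping
    seen, chains = set(), []
    for start in sorted(flagged):
        if start in seen:
            continue
        stack, chain = [start], set()
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node in chain:
                continue
            chain.add(node)
            stack.extend(adj[node] - chain)
        seen |= chain
        chains.append(chain)
    return chains
```

Ordering the nodes of each group by event timestamps would turn a group into an ordered attack path. That step is omitted here because the sketch carries no timing information.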

Leveraging Deep Learning for Time Series Extrinsic Regression in predicting photometric metallicity of Fundamental-mode RR Lyrae Stars

Astronomy is entering an unprecedented era of Big Data science, driven by missions like ESA's Gaia telescope, which aims to map the Milky Way in three dimensions. Gaia's vast dataset presents a monumental challenge for traditional analysis methods: its sheer scale exceeds the capabilities of manual exploration, necessitating advanced computational techniques. In response, we developed a novel approach that leverages deep learning to estimate the metallicity of fundamental-mode (ab-type) RR Lyrae stars from their light curves in the Gaia optical G-band. Our study explores the use of deep learning techniques, particularly advanced neural network architectures, to predict photometric metallicity from time-series data. Our deep learning models demonstrated notable predictive performance, achieving a mean absolute error (MAE) of 0.0565, a root mean square error (RMSE) of 0.0765, and an $R^2$ of 0.9401, measured by cross-validation. The weighted mean absolute error (wMAE) is 0.0563, and the weighted root mean square error (wRMSE) is 0.0763. These results show that our approach estimates metallicity values accurately. Our work underscores the importance of deep learning in astronomical research, particularly for large datasets from missions like Gaia. Deep learning methods enable precise analysis of such vast datasets, contributing to more comprehensive insights into complex astronomical phenomena.

Updated: 2024-10-23 14:26:35

Categories: cs.AI,astro-ph.IM

Download: http://arxiv.org/abs/2410.17906v1
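
The abstract reports both unweighted and weighted error metrics. Below is a minimal sketch of the standard definitions, assuming one weight w_i per star. The abstract does not say how the weights are chosen. Inverse-variance weights from the metallicity uncertainties would be one hypothetical choice.

```python
import numpy as np

def regression_metrics(y_true, y_pred, weights=None):
    """MAE, RMSE, R^2 and their weighted counterparts (wMAE, wRMSE)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    w = np.ones_like(err) if weights is None else np.asarray(weights, dtype=float)
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err**2))
    r2 = 1.0 - np.sum(err**2) / np.sum((y_true - y_true.mean()) ** 2)
    wmae = np.sum(w * np.abs(err)) / np.sum(w)         # weights normalized to sum to 1
    wrmse = np.sqrt(np.sum(w * err**2) / np.sum(w))
    return {"MAE": mae, "RMSE": rmse, "R2": r2, "wMAE": wmae, "wRMSE": wrmse}
```

With uniform weights, wMAE and wRMSE reduce to MAE and RMSE. The reported pairs (0.0565 vs. 0.0563, 0.0765 vs. 0.0763) are close, which is consistent with weights that do not shift the error distribution much.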

Acquiring Better Load Estimates by Combining Anomaly and Change Point Detection in Power Grid Time-series Measurements

In this paper we present a novel methodology for automatic anomaly and switch-event filtering to improve load estimation in power grid systems. By combining unsupervised methods with supervised optimization, our approach prioritizes interpretability while ensuring robust and generalizable performance on unseen data. Our experiments show that combining binary segmentation for change point detection with statistical process control for anomaly detection is the most effective strategy, specifically when the two are ensembled in a novel sequential manner. The results show that considerable potential is wasted when no filtering is applied. The automatic load estimation is also fairly accurate, with approximately 90% of estimates falling within a 10% error margin and only a single significant failure in both the minimum and maximum load estimates across 60 measurements in the test set. Our methodology's interpretability makes it particularly suitable for critical infrastructure planning, thereby enhancing decision-making processes.

Updated: 2024-10-23 14:24:50

Categories: cs.LG,cs.AI,stat.AP,stat.ML

Download: http://arxiv.org/abs/2405.16164v3
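
The two components named in this abstract can be sketched as follows. The sketch assumes a mean-shift cost for binary segmentation and a k-sigma control limit for statistical process control. The ordering (SPC filtering first, then segmentation) and the default `k` and `penalty` values are hypothetical. The paper's sequential ensemble may combine the two differently.

```python
import numpy as np

def spc_filter(x, k=3.0):
    """Statistical process control: flag points outside mean +/- k*std as anomalies."""
    mu, sigma = x.mean(), x.std()
    return np.abs(x - mu) > k * sigma

def segment_cost(x):
    """Squared error around the segment mean (cost of a constant-level fit)."""
    return np.sum((x - x.mean()) ** 2) if len(x) else 0.0

def binary_segmentation(x, penalty, min_size=5):
    """Recursively split x at the single mean shift that most reduces the cost."""
    def split(lo, hi):
        seg = x[lo:hi]
        best_gain, best_t = 0.0, None
        for t in range(min_size, len(seg) - min_size + 1):
            gain = segment_cost(seg) - segment_cost(seg[:t]) - segment_cost(seg[t:])
            if gain > best_gain:
                best_gain, best_t = gain, t
        if best_t is None or best_gain <= penalty:
            return []  # no split is worth the penalty
        cp = lo + best_t
        return split(lo, cp) + [cp] + split(cp, hi)
    return split(0, len(x))

def estimate_load(x, k=3.0, penalty=10.0):
    """Hypothetical sequential pipeline: SPC filter, segment the cleaned series,
    then report the minimum and maximum load of the cleaned data."""
    clean = x[~spc_filter(x, k)]
    return binary_segmentation(clean, penalty), clean.min(), clean.max()
```

Both steps are transparent: each flagged point exceeds a stated control limit, and each change point has an explicit cost reduction. This matches the interpretability the abstract emphasizes.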

Reinforcement Learning under Latent Dynamics: Toward Statistical and Algorithmic Modularity

Real-world applications of reinforcement learning often involve environments where agents operate on complex, high-dimensional observations, but the underlying ("latent") dynamics are comparatively simple. However, outside of restrictive settings such as small latent spaces, the fundamental statistical requirements and algorithmic principles for reinforcement learning under latent dynamics are poorly understood. This paper addresses the question of reinforcement learning under general latent dynamics from a statistical and algorithmic perspective. On the statistical side, our main negative result shows that most well-studied settings for reinforcement learning with function approximation become intractable when composed with rich observations; we complement this with a positive result, identifying latent pushforward coverability as a general condition that enables statistical tractability. Algorithmically, we develop provably efficient observable-to-latent reductions (that is, reductions that transform an arbitrary algorithm for the latent MDP into an algorithm that can operate on rich observations) in two settings: one where the agent has access to hindsight observations of the latent dynamics [LADZ23], and one where the agent can estimate self-predictive latent models [SAGHCB20]. Together, our results serve as a first step toward a unified statistical and algorithmic theory for reinforcement learning under latent dynamics.

Updated: 2024-10-23 14:22:49

Categories: cs.LG,cs.AI,math.OC,stat.ML

Download: http://arxiv.org/abs/2410.17904v1

Scalable Offline Reinforcement Learning for Mean Field Games

Reinforcement learning algorithms for mean-field games offer a scalable framework for optimizing policies in large populations of interacting agents. Existing methods often depend on online interactions or access to system dynamics, limiting their practicality in real-world scenarios where such interactions are infeasible or difficult to model. In this paper, we present Offline Munchausen Mirror Descent (Off-MMD), a novel mean-field RL algorithm that approximates equilibrium policies in mean-field games using purely offline data. By leveraging iterative mirror descent and importance sampling techniques, Off-MMD estimates the mean-field distribution from static datasets without relying on simulation or environment dynamics. Additionally, we incorporate techniques from offline reinforcement learning to address common issues like Q-value overestimation, ensuring robust policy learning even with limited data coverage. Our algorithm scales to complex environments and demonstrates strong performance on benchmark tasks like crowd exploration or navigation, highlighting its applicability to real-world multi-agent systems where online experimentation is infeasible. We empirically demonstrate the robustness of Off-MMD to low-quality datasets and conduct experiments to investigate its sensitivity to hyperparameter choices.

Updated: 2024-10-23 14:16:34

Categories: cs.LG,cs.MA

Download: http://arxiv.org/abs/2410.17898v1
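
Two ingredients named in the Off-MMD abstract can be shown in tabular form:

- a mirror-descent (KL-regularized) policy update;
- importance-sampling reweighting of logged transitions, to estimate a state distribution under a policy other than the one that collected the data.

This is a simplified sketch, not Off-MMD itself. It assumes finite states and actions, a known behavior policy and a single reweighting step. As far as we understand, Munchausen-style methods obtain a similar KL-regularized update implicitly, through a log-policy bonus in the reward, rather than by storing the previous policy.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def mirror_descent_step(log_pi, q_values, eta=0.5):
    """KL-regularized policy update, per state s:
    pi_new(a|s) is proportional to pi_old(a|s) * exp(eta * Q(s, a)).
    Arrays have shape (n_states, n_actions); returns log pi_new."""
    return np.log(softmax(log_pi + eta * q_values, axis=1))

def is_next_state_distribution(states, actions, next_states,
                               pi_target, pi_behavior, n_states):
    """Estimate the next-state distribution under pi_target from transitions
    logged under pi_behavior, weighting each sample by pi_target(a|s) / pi_behavior(a|s)."""
    w = pi_target[states, actions] / pi_behavior[states, actions]
    mu = np.bincount(next_states, weights=w, minlength=n_states)
    return mu / mu.sum()
```

Repeating `mirror_descent_step` with the same Q values concentrates the policy on the greedy actions. The step size `eta` controls how far each update moves from the previous policy.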

A spring-block theory of feature learning in deep neural networks

Feature-learning deep nets progressively collapse data to a regular low-dimensional geometry. How this phenomenon emerges from the collective action of nonlinearity, noise, learning rate, and other choices that shape the dynamics has eluded first-principles theories built from microscopic neuronal dynamics. We exhibit a noise-nonlinearity phase diagram that identifies regimes where shallow or deep layers learn more effectively. We then propose a macroscopic mechanical theory that reproduces the diagram, explaining why some DNNs are lazy and some active, and linking feature learning across layers to generalization.

Updated: 2024-10-23 14:11:34

Categories: cond-mat.dis-nn,cond-mat.stat-mech,cs.LG,stat.ML

Download: http://arxiv.org/abs/2407.19353v2

CPE-Pro: A Structure-Sensitive Deep Learning Method for Protein Representation and Origin Evaluation

Protein structures are key to understanding protein functions and interactions. Currently, many protein structure prediction methods are enriching structure databases. Discriminating the origin of structures is crucial for distinguishing between experimentally resolved and computationally predicted structures, evaluating the reliability of prediction methods, and guiding downstream biological studies. Building on work in structure prediction, we developed a structure-sensitive supervised deep learning model, Crystal vs Predicted Evaluator for Protein Structure (CPE-Pro), to represent and discriminate the origin of protein structures. CPE-Pro learns the structural information of proteins and captures inter-structural differences to achieve accurate traceability across four data classes, and it is expected to extend to more. In parallel, we used Foldseek to encode protein structures into "structure-sequences" and trained a protein Structural Sequence Language Model, SSLM. Preliminary experiments demonstrated that, compared with large-scale protein language models pre-trained on vast amounts of amino acid sequences, the "structure-sequence" representation enables the language model to learn more informative protein features, enhancing and optimizing structural representations. We have provided the code, model weights, and all related materials at https://github.com/GouWenrui/CPE-Pro-main.git.

Updated: 2024-10-23 14:08:10

Categories: q-bio.BM,cs.CL,cs.LG,q-bio.QM

Download: http://arxiv.org/abs/2410.15592v2

Towards Croppable Implicit Neural Representations

Implicit Neural Representations (INRs) have attracted growing interest in recent years due to their ability to encode natural signals using neural networks. While INRs allow for useful applications such as interpolating new coordinates and signal compression, their black-box nature makes it difficult to modify them post-training. In this paper we explore the idea of editable INRs, and specifically focus on the widely used cropping operation. To this end, we present Local-Global SIRENs, a novel INR architecture that supports cropping by design. Local-Global SIRENs are based on combining local and global feature extraction for signal encoding. What makes their design unique is the ability to effortlessly remove specific portions of an encoded signal, with a proportional decrease in the number of weights. This is achieved by eliminating the corresponding weights from the network, without the need for retraining. We further show how this architecture can be used to support the straightforward extension of previously encoded signals. Beyond signal editing, we examine how the Local-Global approach can accelerate training, enhance encoding of various signals, improve downstream performance, and be applied to modern INRs such as INCODE, highlighting its potential and flexibility. Code is available at https://github.com/maorash/Local-Global-INRs.

Updated: 2024-10-23 14:02:12

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2409.19472v2
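
Below is a toy one-dimensional illustration of why a local-global split makes cropping cheap. A small global model is shared across the whole signal, and each tile has its own independent local parameters. Cropping then only deletes the local parameters of the discarded tiles, so the parameter count drops in proportion to the removed area. Polynomials stand in for the paper's SIREN layers here. `LocalGlobalModel` is hypothetical and is not the authors' architecture.

```python
import numpy as np

class LocalGlobalModel:
    """Toy 1-D local-global representation: one global polynomial shared by
    all tiles plus a small residual polynomial fitted per tile."""

    def __init__(self, signal, n_tiles, global_deg=3, local_deg=2):
        self.n = len(signal)
        self.t = np.arange(self.n) / self.n  # normalized coordinates
        self.bounds = np.linspace(0, self.n, n_tiles + 1).astype(int)
        self.global_coef = np.polyfit(self.t, signal, global_deg)
        resid = signal - np.polyval(self.global_coef, self.t)
        self.local = {}
        for i in range(n_tiles):
            lo, hi = self.bounds[i], self.bounds[i + 1]
            self.local[i] = np.polyfit(self.t[lo:hi], resid[lo:hi], local_deg)

    def n_params(self):
        return self.global_coef.size + sum(c.size for c in self.local.values())

    def crop(self, keep_tiles):
        """Crop by deleting the local weights of discarded tiles; no retraining."""
        self.local = {i: c for i, c in self.local.items() if i in keep_tiles}

    def decode(self):
        """Reconstruct every remaining tile as global + local prediction."""
        out = {}
        for i, c in self.local.items():
            lo, hi = self.bounds[i], self.bounds[i + 1]
            ts = self.t[lo:hi]
            out[i] = np.polyval(self.global_coef, ts) + np.polyval(c, ts)
        return out
```

After `crop({0, 1})` on a four-tile model, the two kept tiles decode exactly as before. The local parameter count halves, while the shared global part is unchanged.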

Breaking Class Barriers: Efficient Dataset Distillation via Inter-Class Feature Compensator

Dataset distillation has emerged as a technique aiming to condense informative features from large, natural datasets into a compact and synthetic form. While recent advancements have refined this technique, its performance is bottlenecked by the prevailing class-specific synthesis paradigm. Under this paradigm, synthetic data is optimized exclusively for a pre-assigned one-hot label, creating an implicit class barrier in feature condensation. This leads to inefficient utilization of the distillation budget and oversight of inter-class feature distributions, which ultimately limits the effectiveness and efficiency, as demonstrated in our analysis. To overcome these constraints, this paper presents the Inter-class Feature Compensator (INFER), an innovative distillation approach that transcends the class-specific data-label framework widely utilized in current dataset distillation methods. Specifically, INFER leverages a Universal Feature Compensator (UFC) to enhance feature integration across classes, enabling the generation of multiple additional synthetic instances from a single UFC input. This significantly improves the efficiency of the distillation budget. Moreover, INFER enriches inter-class interactions during the distillation, thereby enhancing the effectiveness and generalizability of the distilled data. By allowing for the linear interpolation of labels similar to those in the original dataset, INFER meticulously optimizes the synthetic data and dramatically reduces the size of soft labels in the synthetic dataset to almost zero, establishing a new benchmark for efficiency and effectiveness in dataset distillation.

Updated: 2024-10-23 14:01:27

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2408.06927v2

Do LLMs Have Distinct and Consistent Personality? TRAIT: Personality Testset designed for LLMs with Psychometrics

Recent advancements in Large Language Models (LLMs) have led to their adoption as conversational agents in various domains. We wonder: can personality tests be applied to these agents to analyze their behavior, as with humans? We introduce TRAIT, a new benchmark consisting of 8K multiple-choice questions designed to assess the personality of LLMs. TRAIT is built on two psychometrically validated small human questionnaires, the Big Five Inventory (BFI) and the Short Dark Triad (SD-3), expanded into a variety of real-world scenarios using the ATOMIC-10X knowledge graph. TRAIT also outperforms existing personality tests for LLMs in terms of reliability and validity, achieving the highest scores across four key metrics: Content Validity, Internal Validity, Refusal Rate, and Reliability. Using TRAIT, we reveal two notable insights into the personalities of LLMs: 1) LLMs exhibit distinct and consistent personalities, which are highly influenced by their training data (e.g., data used for alignment tuning), and 2) current prompting techniques have limited effectiveness in eliciting certain traits, such as high psychopathy or low conscientiousness, suggesting the need for further research in this direction.

Updated: 2024-10-23 14:01:14

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.14703v2

I've Got 99 Problems But FLOPS Ain't One

Hyperscalers dominate the landscape of large network deployments, yet they rarely share data or insights about the challenges they face. In light of this dominance, what problems can we find to solve in this space? We take an unconventional approach to finding relevant research directions, starting from public plans to build a $100 billion datacenter for machine learning applications. Leveraging language-model scaling laws, we determine what workloads such a datacenter might carry and explore the challenges one may encounter in doing so, with a focus on networking research. We conclude that building the datacenter and training such models is technically possible, but that it requires novel wide-area transports for inter-datacenter communication; a multipath transport and novel datacenter topologies for intra-datacenter communication; and high-speed scale-up networks and transports. Together, these outline a rich research agenda for the networking community.

Updated: 2024-10-23 14:00:36

Categories: cs.DC,cs.CL,cs.LG,cs.NI

Download: http://arxiv.org/abs/2407.12819v2
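
This abstract reasons from scaling laws to datacenter workloads. Below is a back-of-the-envelope sketch of that kind of calculation. It assumes two widely used approximations: training costs about 6 FLOPs per parameter per token, and a Chinchilla-style compute-optimal run uses roughly 20 training tokens per parameter. All numbers in the example are hypothetical and are not taken from the paper.

```python
def chinchilla_tokens(n_params, tokens_per_param=20):
    """Compute-optimal token count under the ~20 tokens/parameter heuristic."""
    return tokens_per_param * n_params

def training_flops(n_params, n_tokens):
    """Total training compute with the common C ~= 6 * N * D approximation."""
    return 6 * n_params * n_tokens

def training_days(total_flops, n_accelerators, peak_flops, utilization):
    """Wall-clock days given sustained throughput = count * peak FLOP/s * utilization."""
    return total_flops / (n_accelerators * peak_flops * utilization) / 86_400

# Hypothetical example: a 1-trillion-parameter model on 100,000 accelerators,
# each with 1e15 FLOP/s peak, running at 40% utilization.
n = 1e12
d = chinchilla_tokens(n)                     # 2e13 tokens
c = training_flops(n, d)                     # 1.2e26 FLOPs
days = training_days(c, 100_000, 1e15, 0.4)  # about 34.7 days
```

Calculations of this kind turn a budget into accelerator counts and training times. From there one can estimate the traffic the network must carry.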

R-CoT: Reverse Chain-of-Thought Problem Generation for Geometric Reasoning in Large Multimodal Models

Existing Large Multimodal Models (LMMs) struggle with mathematical geometric reasoning due to a lack of high-quality image-text paired data. Current geometric data generation approaches, which apply preset templates to generate geometric data or use Large Language Models (LLMs) to rephrase questions and answers (Q&A), unavoidably limit data accuracy and diversity. To synthesize higher-quality data, we propose a two-stage Reverse Chain-of-Thought (R-CoT) geometry problem generation pipeline. First, we introduce GeoChain to produce high-fidelity geometric images and corresponding descriptions highlighting relations among geometric elements. We then design a Reverse A&Q method that reasons step-by-step based on the descriptions and generates questions in reverse from the reasoning results. Experiments demonstrate that the proposed method brings significant and consistent improvements on multiple LMM baselines, achieving new performance records in the 2B, 7B, and 8B settings. Notably, R-CoT-8B significantly outperforms previous state-of-the-art open-source mathematical models by 16.6% on MathVista and 9.2% on GeoQA, while also surpassing the closed-source model GPT-4o by an average of 13% across both datasets. The code is available at https://github.com/dle666/R-CoT.

Updated: 2024-10-23 13:58:39

Categories: cs.AI,cs.CV

Download: http://arxiv.org/abs/2410.17885v1

Lightweight Neural App Control

This paper introduces a novel mobile phone control architecture, termed "app agents", for efficient interactions and controls across various Android apps. The proposed Lightweight Multi-modal App Control (LiMAC) takes as input a textual goal and a sequence of past mobile observations, such as screenshots and corresponding UI trees, to generate precise actions. To address the computational constraints inherent to smartphones, LiMAC introduces a small Action Transformer (AcT) integrated with a fine-tuned vision-language model (VLM) for real-time decision-making and task execution. We evaluate LiMAC on two open-source mobile control datasets, demonstrating the superior performance of our small-form-factor approach against fine-tuned versions of open-source VLMs, such as Florence2 and Qwen2-VL. It also significantly outperforms prompt engineering baselines utilising closed-source foundation models like GPT-4o. More specifically, LiMAC increases the overall action accuracy by up to 19% compared to fine-tuned VLMs, and up to 42% compared to prompt-engineering baselines.

Updated: 2024-10-23 13:57:00

Categories: cs.AI

Download: http://arxiv.org/abs/2410.17883v1

Identifiable Representation and Model Learning for Latent Dynamic Systems

Learning identifiable representations and models from low-level observations is useful for enabling an intelligent spacecraft to reliably complete downstream tasks. For temporal observations, to ensure that the data-generating process can be provably inverted, most existing works either assume that the noise variables in the dynamic mechanisms are (conditionally) independent, or require interventions that can directly affect each latent variable. However, in practice, the relationship between the exogenous inputs/interventions and the latent variables may follow complex deterministic mechanisms. In this work, we study the problem of identifiable representation and model learning for latent dynamic systems. The key idea is to use an inductive bias inspired by controllable canonical forms, which is invariant, sparse, and input-dependent by definition. We prove that, for linear or affine nonlinear latent dynamic systems, it is possible to identify the representations up to scaling and to determine the models up to some simple transformations. These results could provide theoretical guarantees for developing more trustworthy decision-making and control methods for intelligent spacecraft.

Updated: 2024-10-23 13:55:42

Categories: cs.LG,cs.SY,eess.SY,stat.ML

Download: http://arxiv.org/abs/2410.17882v1
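
For reference, the controllable canonical form that inspires the paper's inductive bias can be built directly from a characteristic polynomial. Note its fixed sparsity pattern, and that the input enters a single state. This sketch only constructs the form and checks controllability. It is not the paper's identification method, and the function names are illustrative.

```python
import numpy as np

def controllable_canonical_form(a):
    """Build (A, B) in controllable canonical (companion) form for the
    characteristic polynomial s^n + a[n-1] s^(n-1) + ... + a[1] s + a[0]."""
    n = len(a)
    A = np.zeros((n, n))
    A[:-1, 1:] = np.eye(n - 1)        # chain structure: x_i' = x_{i+1}
    A[-1, :] = -np.asarray(a, float)  # last row carries the polynomial coefficients
    B = np.zeros((n, 1))
    B[-1, 0] = 1.0                    # input enters only the last state
    return A, B

def controllability_matrix(A, B):
    """Stack [B, AB, ..., A^(n-1) B]; full rank means (A, B) is controllable."""
    n = A.shape[0]
    cols = [B]
    for _ in range(n - 1):
        cols.append(A @ cols[-1])
    return np.hstack(cols)
```

For example, `a = [6, 11, 6]` encodes s^3 + 6 s^2 + 11 s + 6 = (s + 1)(s + 2)(s + 3). The resulting A has eigenvalues -1, -2 and -3 and only five nonzero entries.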

AdaRankGrad: Adaptive Gradient-Rank and Moments for Memory-Efficient LLMs Training and Fine-Tuning

Training and fine-tuning large language models (LLMs) come with challenges related to memory and computational requirements due to the increasing size of the model weights and the optimizer states. Various techniques have been developed to tackle these challenges, such as low-rank adaptation (LoRA), which introduces a parallel trainable low-rank matrix alongside the fixed pre-trained weights at each layer. However, these methods often fall short of full-rank weight training, as they restrict the parameter search to a low-rank subspace. This limitation can disrupt training dynamics and require a full-rank warm start to mitigate its impact. In this paper, we introduce a new method inspired by a phenomenon we formally prove: as training progresses, the rank of the estimated layer gradients gradually decreases and asymptotically approaches one. Leveraging this, our approach adaptively reduces the rank of the gradients during Adam optimization steps, using an efficient, online-updated low-rank projection rule. We further present a randomized SVD scheme for efficiently finding the projection matrix. Our technique enables full-parameter fine-tuning with adaptive low-rank gradient updates, significantly reducing overall memory requirements during training compared to state-of-the-art methods while improving model performance in both pretraining and fine-tuning. Finally, we provide a convergence analysis of our method and demonstrate its merits for training and fine-tuning language and biological foundation models.

Updated: 2024-10-23 13:53:26

Subjects: cs.LG

Download: http://arxiv.org/abs/2410.17881v1

GeoCode-GPT: A Large Language Model for Geospatial Code Generation Tasks

The increasing demand for spatiotemporal data and modeling tasks in geosciences has made geospatial code generation technology a critical factor in enhancing productivity. Although large language models (LLMs) have demonstrated potential in code generation tasks, they often encounter issues such as refusal to code or hallucination in geospatial code generation due to a lack of domain-specific knowledge and code corpora. To address these challenges, this paper presents and open-sources the GeoCode-PT and GeoCode-SFT corpora, along with the GeoCode-Eval evaluation dataset. Additionally, by leveraging QLoRA and LoRA for pretraining and fine-tuning, we introduce GeoCode-GPT-7B, the first LLM focused on geospatial code generation, fine-tuned from Code Llama-7B. Furthermore, we establish a comprehensive geospatial code evaluation framework, incorporating option matching, expert validation, and prompt engineering scoring for LLMs, and systematically evaluate GeoCode-GPT-7B using the GeoCode-Eval dataset. Experimental results show that GeoCode-GPT outperforms other models in multiple-choice accuracy by 9.1% to 32.1%, in code summarization ability by 1.7% to 25.4%, and in code generation capability by 1.2% to 25.1%. This paper provides a solution and empirical validation for enhancing LLMs' performance in geospatial code generation, extends the boundaries of domain-specific model applications, and offers valuable insights into unlocking their potential in geospatial code generation.

Updated: 2024-10-23 13:52:51

Subjects: cs.SE,cs.AI

Download: http://arxiv.org/abs/2410.17031v2

Relaxed Equivariance via Multitask Learning

Incorporating equivariance as an inductive bias into deep learning architectures to take advantage of data symmetry has been successful in multiple applications, such as chemistry and dynamical systems. In particular, roto-translations are crucial for effectively modeling geometric graphs and molecules, where understanding the 3D structures enhances generalization. However, equivariant models often pose challenges due to their high computational complexity. In this paper, we introduce REMUL, a training procedure for approximating equivariance with multitask learning. We show that unconstrained models (which do not build equivariance into the architecture) can learn approximate symmetries by minimizing an additional simple equivariance loss. By formulating equivariance as a new learning objective, we can control the level of approximate equivariance in the model. Our method achieves competitive performance compared to equivariant baselines while being $10 \times$ faster at inference and $2.5 \times$ faster at training.
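The multitask idea can be sketched as a weighted sum of a task loss and an equivariance penalty $\lVert f(Rx) - R f(x)\rVert^2$. The sketch assumes 2D points and planar rotations only, whereas REMUL targets 3D roto-translations. The weight `alpha` is a hypothetical knob that sets how strongly equivariance is encouraged. This is an illustration, not the authors' code.

```python
import math
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Model = Callable[[List[Point]], List[Point]]

def rotate(points: List[Point], theta: float) -> List[Point]:
    """Rotate 2D points counter-clockwise by `theta` radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

def equivariance_loss(model: Model, points: List[Point], theta: float) -> float:
    """Mean squared gap between f(R x) and R f(x); zero for an equivariant model."""
    transformed_then_mapped = model(rotate(points, theta))
    mapped_then_transformed = rotate(model(points), theta)
    return sum(
        (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
        for a, b in zip(transformed_then_mapped, mapped_then_transformed)
    ) / len(points)

def multitask_loss(task_loss: float, equiv_loss: float, alpha: float) -> float:
    """Objective for an unconstrained model: task term plus a weighted
    equivariance term (`alpha` is a hypothetical weighting knob)."""
    return task_loss + alpha * equiv_loss
```

A uniform scaling map is rotation-equivariant and scores zero. A map that adds a fixed offset is not equivariant, so it is penalized.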

Updated: 2024-10-23 13:50:27

Subjects: cs.LG

Download: http://arxiv.org/abs/2410.17878v1

Understanding Layer Significance in LLM Alignment

Aligning large language models (LLMs) through fine-tuning is essential for tailoring them to specific applications. Therefore, understanding what LLMs learn during the alignment process is crucial. Recent studies suggest that alignment primarily adjusts a model's presentation style rather than its foundational knowledge, indicating that only certain components of the model are significantly impacted. To delve deeper into LLM alignment, we propose to identify which layers within LLMs are most critical to the alignment process, thereby uncovering how alignment influences model behavior at a granular level. We propose a novel approach to identify the important layers for LLM alignment (ILA). It involves learning a binary mask for each incremental weight matrix in the LoRA algorithm, indicating the significance of each layer. ILA consistently identifies important layers across various alignment datasets, with nearly 90% overlap even with substantial dataset differences, highlighting fundamental patterns in LLM alignment. Experimental results indicate that freezing non-essential layers improves overall model performance, while selectively tuning the most critical layers significantly enhances fine-tuning efficiency with minimal performance loss.
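Suppose per-layer importance scores have already been learned; the abstract describes a binary mask over LoRA's incremental weight matrices. Comparing the layers selected on two alignment datasets then reduces to set arithmetic. The sketch assumes the scores are given. Its overlap measure, |A ∩ B| / max(|A|, |B|), and the layer names are illustrative assumptions, not definitions taken from the paper.

```python
from typing import Dict, Set

def binary_mask(scores: Dict[str, float], k: int) -> Dict[str, int]:
    """Mark the k highest-scoring layers with 1 (tune) and the rest with 0 (freeze)."""
    ranked = sorted(scores, key=lambda name: scores[name], reverse=True)
    keep = set(ranked[:k])
    return {name: int(name in keep) for name in scores}

def important_layers(scores: Dict[str, float], k: int) -> Set[str]:
    """Names of the layers the mask keeps trainable."""
    return {name for name, bit in binary_mask(scores, k).items() if bit}

def overlap(a: Set[str], b: Set[str]) -> float:
    """Share of selected layers common to two runs (hypothetical definition)."""
    if not a and not b:
        return 1.0
    return len(a & b) / max(len(a), len(b))
```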

Updated: 2024-10-23 13:47:05

Subjects: cs.CL,cs.AI

Download: http://arxiv.org/abs/2410.17875v1

Efficient Sketches for Training Data Attribution and Studying the Loss Landscape

The study of modern machine learning models often necessitates storing vast quantities of gradients or Hessian vector products (HVPs). Traditional sketching methods struggle to scale under these memory constraints. We present a novel framework for scalable gradient and HVP sketching, tailored for modern hardware. We provide theoretical guarantees and demonstrate the power of our methods in applications like training data attribution, Hessian spectrum analysis, and intrinsic dimension computation for pre-trained language models. Our work sheds new light on the behavior of pre-trained language models, challenging assumptions about their intrinsic dimensionality and Hessian properties.
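The textbook building block behind gradient sketching is a random linear map. It compresses a d-dimensional gradient to k ≪ d numbers while approximately preserving inner products (Johnson-Lindenstrauss). The sketch below uses a dense Rademacher (±1/√k) matrix whose entries are regenerated from a seed, so the matrix is never stored. It is a generic baseline for orientation, not the hardware-tailored scheme proposed in the paper.

```python
import math
import random
from typing import List, Sequence

def sketch(vector: Sequence[float], k: int, seed: int = 0) -> List[float]:
    """Project `vector` to k dimensions with a seeded Rademacher matrix.

    Entries of the (k x d) matrix are +-1/sqrt(k), drawn in a fixed order from
    `seed`, so the same seed always reproduces the same linear map.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    out = []
    for _ in range(k):
        acc = 0.0
        for x in vector:
            acc += x if rng.random() < 0.5 else -x  # random sign per entry
        out.append(acc * scale)
    return out

def sketched_dot(a: Sequence[float], b: Sequence[float], k: int, seed: int = 0) -> float:
    """Unbiased estimate of <a, b> from two sketches built with the same map."""
    return sum(x * y for x, y in zip(sketch(a, k, seed), sketch(b, k, seed)))
```

The map is linear. Sketches of many per-example gradients can therefore be compared or accumulated without ever materializing the full gradients side by side.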

Updated: 2024-10-23 13:44:08

Subjects: cs.LG,stat.ML

Download: http://arxiv.org/abs/2402.03994v2

Gradient-based Jailbreak Images for Multimodal Fusion Models

Augmenting language models with image inputs may enable more effective jailbreak attacks through continuous optimization, unlike text inputs that require discrete optimization. However, new multimodal fusion models tokenize all input modalities using non-differentiable functions, which hinders straightforward attacks. In this work, we introduce the notion of a tokenizer shortcut that approximates tokenization with a continuous function and enables continuous optimization. We use tokenizer shortcuts to create the first end-to-end gradient image attacks against multimodal fusion models. We evaluate our attacks on Chameleon models and obtain jailbreak images that elicit harmful information for 72.5% of prompts. Jailbreak images outperform text jailbreaks optimized with the same objective and require a 3x lower compute budget to optimize 50x more input tokens. Finally, we find that representation engineering defenses, like Circuit Breakers, trained only on text attacks can effectively transfer to adversarial image inputs.

Updated: 2024-10-23 13:38:29

Subjects: cs.CR,cs.AI

Download: http://arxiv.org/abs/2410.03489v2

TargetCall: Eliminating the Wasted Computation in Basecalling via Pre-Basecalling Filtering

Basecalling is an essential step in nanopore sequencing analysis where the raw signals of nanopore sequencers are converted into nucleotide sequences, i.e., reads. State-of-the-art basecallers employ complex deep learning models to achieve high basecalling accuracy. This makes basecalling computationally inefficient and memory-hungry, bottlenecking the entire genome analysis pipeline. However, for many applications, the majority of reads do not match the reference genome of interest (i.e., target reference) and thus are discarded in later steps in the genomics pipeline, wasting the basecalling computation. To overcome this issue, we propose TargetCall, the first pre-basecalling filter to eliminate the wasted computation in basecalling. TargetCall's key idea is to discard reads that will not match the target reference (i.e., off-target reads) prior to basecalling. TargetCall consists of two main components: (1) LightCall, a lightweight neural network basecaller that produces noisy reads; and (2) Similarity Check, which labels each of these noisy reads as on-target or off-target by matching them to the target reference. Our thorough experimental evaluations show that TargetCall 1) improves the end-to-end basecalling runtime performance of the state-of-the-art basecaller by 3.31x while maintaining high (98.88%) recall in keeping on-target reads, 2) maintains high accuracy in downstream analysis, and 3) achieves better runtime performance, throughput, recall, precision, and generality compared to prior works. TargetCall is available at https://github.com/CMU-SAFARI/TargetCall.
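The filtering logic can be illustrated with a k-mer containment test standing in for Similarity Check: a noisy read is kept only if enough of its k-mers occur in the target reference. This is an assumption made for illustration. The abstract says only that Similarity Check matches noisy reads to the target reference, and `k` and `threshold` here are hypothetical parameters.

```python
from typing import Iterable, List, Set

def kmers(seq: str, k: int) -> Set[str]:
    """All length-k substrings of `seq`."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def containment(read: str, reference_kmers: Set[str], k: int) -> float:
    """Fraction of the read's k-mers that also appear in the reference."""
    read_kmers = kmers(read, k)
    if not read_kmers:
        return 0.0
    return len(read_kmers & reference_kmers) / len(read_kmers)

def filter_on_target(reads: Iterable[str], reference: str, k: int = 5,
                     threshold: float = 0.5) -> List[str]:
    """Keep reads that look on-target; off-target reads skip full basecalling.
    (k-mer containment is a stand-in, not TargetCall's actual matcher.)"""
    ref_kmers = kmers(reference, k)
    return [r for r in reads if containment(r, ref_kmers, k) >= threshold]
```

Only the reads that survive this cheap check would be handed to the expensive, accurate basecaller.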

Updated: 2024-10-23 13:36:37

Subjects: q-bio.GN,cs.AI,cs.LG

Download: http://arxiv.org/abs/2212.04953v3

Population stratification for prediction of mortality in post-AKI patients

Acute kidney injury (AKI) is a serious clinical condition that affects up to 20% of hospitalised patients. AKI is associated with short-term unplanned hospital readmission and post-discharge mortality risk. Patient risk and healthcare expenditures can be minimised by follow-up planning grounded in predictive models and machine learning. Since AKI is multifactorial, predictive models specialised for different categories of patients can increase prediction accuracy. In this article, we present results obtained with this approach.
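Population stratification amounts to routing each patient to a model trained on their own subgroup, with a fallback for patients who fit no stratum. The sketch below is minimal. It assumes the stratum label is computed elsewhere and that risk models are plain callables. The paper's actual strata and models are not described in the abstract, so all names here are hypothetical.

```python
from typing import Callable, Dict, Mapping

RiskModel = Callable[[Mapping[str, float]], float]

class StratifiedPredictor:
    """Dispatch each patient to a stratum-specific risk model (illustrative only)."""

    def __init__(self, models: Dict[str, RiskModel], fallback: RiskModel):
        self.models = models      # hypothetical stratum name -> specialised model
        self.fallback = fallback  # used when no specialised model exists

    def predict(self, patient: Mapping[str, float], stratum: str) -> float:
        model = self.models.get(stratum, self.fallback)
        return model(patient)
```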

Updated: 2024-10-23 13:36:23

Subjects: cs.LG

Download: http://arxiv.org/abs/2410.17865v1

CASCRNet: An Atrous Spatial Pyramid Pooling and Shared Channel Residual based Network for Capsule Endoscopy

This manuscript summarizes MISAHUB's work on the Capsule Vision Challenge 2024. To address the multi-class disease classification task, which is challenging due to the complexity and imbalance of the Capsule Vision Challenge dataset, this paper proposes CASCRNet (Capsule endoscopy-Aspp-SCR-Network), a novel, parameter-efficient model built from Shared Channel Residual (SCR) blocks and Atrous Spatial Pyramid Pooling (ASPP) blocks. The performance of the proposed model is also compared with other well-known approaches, and the experimental results show that the proposed model achieves better disease classification results. The proposed model classifies diseases with an F1 score of 78.5% and a mean AUC of 98.3%, which is promising given its compact architecture.
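ASPP runs several atrous (dilated) convolutions with different dilation rates in parallel and stacks their outputs. Each branch therefore sees a different receptive field at the same resolution. The sketch below is a 1D, single-channel simplification with zero padding that shows only the mechanism. The rates (1, 2, 4) and the kernel are assumptions, and the real blocks are 2D, multi-channel, and learned. The Shared Channel Residual blocks are not illustrated.

```python
from typing import List, Sequence

def atrous_conv1d(x: Sequence[float], kernel: Sequence[float], rate: int) -> List[float]:
    """'Same'-size 1D dilated convolution (cross-correlation) with zero padding."""
    center = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            pos = i + (j - center) * rate  # taps are spaced `rate` samples apart
            if 0 <= pos < len(x):
                acc += w * x[pos]
        out.append(acc)
    return out

def aspp(x: Sequence[float], kernel: Sequence[float],
         rates: Sequence[int] = (1, 2, 4)) -> List[List[float]]:
    """One output row per dilation rate, mimicking ASPP's parallel branches."""
    return [atrous_conv1d(x, kernel, r) for r in rates]
```

Feeding an impulse through the branches makes the effect visible: the same 3-tap kernel spreads the impulse over neighbours 1, 2 or 4 samples away, depending on the rate.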

Updated: 2024-10-23 13:35:18

Subjects: eess.IV,cs.CV,cs.LG

Download: http://arxiv.org/abs/2410.17863v1

DataTales: A Benchmark for Real-World Intelligent Data Narration

We introduce DataTales, a novel benchmark designed to assess the proficiency of language models in data narration, a task crucial for transforming complex tabular data into accessible narratives. Existing benchmarks often fall short in capturing the analytical complexity required for practical applications. DataTales addresses this gap by offering 4.9k financial reports paired with corresponding market data, which require models to create clear narratives and analyze large datasets while understanding the specialized terminology of the field. Our findings highlight the significant challenge that language models face in achieving the precision and analytical depth needed for proficient data narration, and suggest promising avenues for future model development and evaluation methodologies.

Updated: 2024-10-23 13:30:02

Subjects: cs.AI

Download: http://arxiv.org/abs/2410.17859v1

ROCKET-1: Master Open-World Interaction with Visual-Temporal Context Prompting

Vision-language models (VLMs) have excelled in multimodal tasks, but adapting them to embodied decision-making in open-world environments presents challenges. A key issue is the difficulty in smoothly connecting individual entities in low-level observations with abstract concepts required for planning. A common approach to this problem is to use hierarchical agents, where VLMs serve as high-level reasoners that break down tasks into executable sub-tasks, typically specified using language and imagined observations. However, language often fails to effectively convey spatial information, while generating future images with sufficient accuracy remains challenging. To address these limitations, we propose visual-temporal context prompting, a novel communication protocol between VLMs and policy models. This protocol leverages object segmentation from both past and present observations to guide policy-environment interactions. Using this approach, we train ROCKET-1, a low-level policy that predicts actions based on concatenated visual observations and segmentation masks, with real-time object tracking provided by SAM-2. Our method unlocks the full potential of VLMs' visual-language reasoning abilities, enabling them to solve complex creative tasks, especially those heavily reliant on spatial understanding. Experiments in Minecraft demonstrate that our approach allows agents to accomplish previously unattainable tasks, highlighting the effectiveness of visual-temporal context prompting in embodied decision-making. Code and demos will be available on the project page: https://craftjarvis.github.io/ROCKET-1.

Updated: 2024-10-23 13:26:59

Subjects: cs.CV,cs.AI

Download: http://arxiv.org/abs/2410.17856v1

TAGE: Trustworthy Attribute Group Editing for Stable Few-shot Image Generation

Generative Adversarial Networks (GANs) have emerged as a prominent research focus for image editing tasks, leveraging the powerful image generation capabilities of the GAN framework to produce remarkable results. However, prevailing approaches depend on extensive training datasets and explicit supervision, which makes it difficult to manipulate the diverse attributes of new image classes when only a few samples are available. To overcome this hurdle, we introduce TAGE, an image generation network comprising three integral modules: the Codebook Learning Module (CLM), the Code Prediction Module (CPM), and the Prompt-driven Semantic Module (PSM). The CLM module captures the semantic dimensions of category-agnostic attributes and encapsulates them within a discrete codebook. This module is predicated on the idea that images are assemblages of attributes; thus, by editing these category-independent attributes, it is theoretically possible to generate images from unseen categories. Subsequently, the CPM module facilitates naturalistic image editing by predicting indices of category-independent attribute vectors within the codebook. Additionally, the PSM module generates semantic cues that are integrated into the Transformer architecture of the CPM, enhancing the model's comprehension of the attributes targeted for editing. With these semantic cues, the model can generate images that accentuate the desired attributes more prominently while maintaining the integrity of the original category, even with a limited number of samples. We have conducted extensive experiments on the Animal Faces, Flowers, and VGGFaces datasets. The results demonstrate that our proposed method not only achieves superior performance but also exhibits a high degree of stability compared to other few-shot image generation techniques.
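With a discrete attribute codebook, each continuous attribute feature is represented by the index of a code vector. The sketch assumes the common vector-quantization rule, nearest code vector by squared Euclidean distance, to show how indices map to and from the codebook. In TAGE the CPM predicts these indices with a Transformer, which is not reproduced here. The toy codebook is hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]

def nearest_code(feature: Vector, codebook: Sequence[Vector]) -> int:
    """Index of the code vector closest (squared L2) to `feature`."""
    def dist(code: Vector) -> float:
        return sum((f - c) ** 2 for f, c in zip(feature, code))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))

def encode(features: Sequence[Vector], codebook: Sequence[Vector]) -> List[int]:
    """Quantize each attribute feature to a codebook index."""
    return [nearest_code(f, codebook) for f in features]

def decode(indices: Sequence[int], codebook: Sequence[Vector]) -> List[Vector]:
    """Look indices back up; editing an attribute = swapping an index before decoding."""
    return [codebook[i] for i in indices]
```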

Updated: 2024-10-23 13:26:19

Subjects: cs.CV,cs.AI

Download: http://arxiv.org/abs/2410.17855v1

The Probabilistic Tsetlin Machine: A Novel Approach to Uncertainty Quantification

Tsetlin Machines (TMs) have emerged as a compelling alternative to conventional deep learning methods, offering notable advantages such as smaller memory footprint, faster inference, fault-tolerant properties, and interpretability. Although various adaptations of TMs have expanded their applicability across diverse domains, a fundamental gap remains in understanding how TMs quantify uncertainty in their predictions. In response, this paper introduces the Probabilistic Tsetlin Machine (PTM) framework, aimed at providing a robust, reliable, and interpretable approach for uncertainty quantification. Unlike the original TM, the PTM learns the probability of staying on each state of each Tsetlin Automaton (TA) across all clauses. These probabilities are updated using the feedback tables that are part of the TM framework: Type I and Type II feedback. During inference, TAs decide their actions by sampling states based on learned probability distributions, akin to Bayesian neural networks when generating weight values. In our experimental analysis, we first illustrate the spread of the probabilities across TA states for the noisy-XOR dataset. Then we evaluate the PTM alongside benchmark models using both simulated and real-world datasets. The experiments on the simulated dataset reveal the PTM's effectiveness in uncertainty quantification, particularly in delineating decision boundaries and identifying regions of high uncertainty. Moreover, when applied to multiclass classification tasks using the Iris dataset, the PTM demonstrates competitive performance in terms of predictive entropy and expected calibration error, showcasing its potential as a reliable tool for uncertainty estimation. Our findings underscore the importance of selecting appropriate models for accurate uncertainty quantification in predictive tasks, with the PTM offering a particularly interpretable and effective solution.
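A standard Tsetlin Automaton has 2N states: the lower N mean "exclude the literal" and the upper N mean "include". The PTM replaces the single current state with a learned distribution over states and samples from it at inference. The sketch below covers only that inference step. It assumes the state probabilities are already learned (the Type I/II feedback updates are not shown), and the function names are illustrative.

```python
import random
from typing import Sequence

def include_probability(state_probs: Sequence[float]) -> float:
    """P(action = include): total probability mass on the upper half of the 2N states."""
    n = len(state_probs) // 2
    return sum(state_probs[n:])

def sample_action(state_probs: Sequence[float], rng: random.Random) -> bool:
    """Sample a TA state from its learned distribution; True means 'include'."""
    n = len(state_probs) // 2
    state = rng.choices(range(len(state_probs)), weights=state_probs, k=1)[0]
    return state >= n  # 0-based indices n..2n-1 are the include states
```

Repeated sampling gives a spread of clause outputs rather than a single deterministic vote. That spread is the source of the predictive uncertainty discussed above.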

Updated: 2024-10-23 13:20:42

Subjects: cs.LG,cs.AI

Download: http://arxiv.org/abs/2410.17851v1

Interpreting Context Look-ups in Transformers: Investigating Attention-MLP Interactions

Understanding the inner workings of large language models (LLMs) is crucial for advancing their theoretical foundations and real-world applications. While the attention mechanism and multi-layer perceptrons (MLPs) have been studied independently, their interactions remain largely unexplored. This study investigates how attention heads and next-token neurons interact in LLMs to predict new words. We propose a methodology to identify next-token neurons, find prompts that highly activate them, and determine the upstream attention heads responsible. We then generate and evaluate explanations for the activity of these attention heads in an automated manner. Our findings reveal that some attention heads recognize specific contexts relevant to predicting a token and activate a downstream token-predicting neuron accordingly. This mechanism provides a deeper understanding of how attention heads work with MLP neurons to perform next-token prediction. Our approach offers a foundation for further research into the intricate workings of LLMs and their impact on text generation and understanding.
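One common way to flag a "next-token neuron" is to project the neuron's MLP output weights through the unembedding matrix and check whether a single token dominates. The sketch below assumes that definition plus a hypothetical dominance margin. The paper's exact identification criterion is not given in the abstract, and the toy vocabulary and weights are made up for illustration.

```python
from typing import Dict, Optional, Sequence

def token_logits(w_out: Sequence[float],
                 unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Direct effect of a neuron's output direction on each token's logit."""
    return {tok: sum(a * b for a, b in zip(w_out, row)) for tok, row in unembed.items()}

def next_token_neuron(w_out: Sequence[float], unembed: Dict[str, Sequence[float]],
                      margin: float = 1.0) -> Optional[str]:
    """Return the promoted token if it beats the runner-up by `margin`
    (a hypothetical criterion), else None."""
    ranked = sorted(token_logits(w_out, unembed).items(),
                    key=lambda kv: kv[1], reverse=True)
    if not ranked:
        return None
    if len(ranked) == 1 or ranked[0][1] - ranked[1][1] >= margin:
        return ranked[0][0]
    return None
```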

Updated: 2024-10-23 13:20:15

Subjects: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2402.15055v2

Ornstein-Uhlenbeck Adaptation as a Mechanism for Learning in Brains and Machines

Learning is a fundamental property of intelligent systems, observed across biological organisms and engineered systems. While modern intelligent systems typically rely on gradient descent for learning, the need for exact gradients and complex information flow makes its implementation in biological and neuromorphic systems challenging. This has motivated the exploration of alternative learning mechanisms that can operate locally and do not rely on exact gradients. In this work, we introduce a novel approach that leverages noise in the parameters of the system and global reinforcement signals. Using an Ornstein-Uhlenbeck process with adaptive dynamics, our method balances exploration and exploitation during learning, driven by deviations from error predictions, akin to reward prediction error. Operating in continuous time, Ornstein-Uhlenbeck adaptation (OUA) is proposed as a general mechanism for learning in dynamic, time-evolving environments. We validate our approach across diverse tasks, including supervised learning and reinforcement learning in feedforward and recurrent systems. Additionally, we demonstrate that it can perform meta-learning, adjusting hyper-parameters autonomously. Our results indicate that OUA provides a viable alternative to traditional gradient-based methods, with potential applications in neuromorphic computing. It also hints at a possible mechanism for noise-driven learning in the brain, where stochastic neurotransmitter release may guide synaptic adjustments.
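The parameter noise comes from an Ornstein-Uhlenbeck process, $d\theta = \lambda(\mu - \theta)\,dt + \sigma\,dW$, simulated here with Euler-Maruyama steps. The mean $\mu$ then adapts according to how the reward compares with its prediction. The mean-update and baseline rules below are a simplified, hypothetical form chosen to show the reward-prediction-error idea. They are not the paper's exact equations.

```python
import math
import random

def ou_step(theta: float, mu: float, lam: float, sigma: float,
            dt: float, rng: random.Random) -> float:
    """Euler-Maruyama step of d(theta) = lam * (mu - theta) dt + sigma dW."""
    noise = rng.gauss(0.0, 1.0) * math.sqrt(dt)
    return theta + lam * (mu - theta) * dt + sigma * noise

def mean_step(mu: float, theta: float, reward: float, baseline: float,
              eta: float, dt: float) -> float:
    """Hypothetical adaptation rule: pull mu toward the explored theta when the
    reward beats its prediction, push it away when the reward falls short."""
    return mu + eta * (reward - baseline) * (theta - mu) * dt

def baseline_step(baseline: float, reward: float, rho: float, dt: float) -> float:
    """Running reward prediction as an exponential moving average."""
    return baseline + rho * (reward - baseline) * dt
```

A full learner would interleave the three updates at every time step. Noise drives exploration, and the reward-prediction error decides which explored directions the mean keeps.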

Updated: 2024-10-23 13:19:26

Subjects: cs.LG

Download: http://arxiv.org/abs/2410.13563v2

FOOGD: Federated Collaboration for Both Out-of-distribution Generalization and Detection

Federated learning (FL) is a promising machine learning paradigm in which client models collaborate to capture global knowledge. However, deploying FL models in real-world scenarios remains unreliable due to the coexistence of in-distribution data and unexpected out-of-distribution (OOD) data, such as covariate-shift and semantic-shift data. Current FL research typically addresses either covariate-shift data through OOD generalization or semantic-shift data through OOD detection, overlooking the simultaneous occurrence of various OOD shifts. In this work, we propose FOOGD, a method that estimates the probability density of each client and obtains a reliable global distribution as guidance for the subsequent FL process. First, SM3D in FOOGD estimates a score model for arbitrary distributions without prior constraints and detects semantic-shift data effectively. Then, SAG in FOOGD provides invariant yet diverse knowledge for both local covariate-shift generalization and client performance generalization. In empirical validations, FOOGD offers three main advantages: (1) reliably estimating non-normalized decentralized distributions, (2) detecting semantic-shift data via score values, and (3) generalizing to covariate-shift data by regularizing the feature extractor. The project is available at https://github.com/XeniaLLL/FOOGD-main.git.
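Score-based OOD detection rests on the score function $\nabla_x \log p(x)$: samples far from the data tend to have large score norms. The sketch uses a stand-in for the learned score model in SM3D. It takes the analytic score of an isotropic Gaussian fitted to in-distribution features, $-(x - \mu)/\sigma^2$, and applies a hypothetical threshold. It illustrates the detection rule only, not FOOGD's federated training.

```python
import math
from typing import List, Sequence

def gaussian_score(x: Sequence[float], mean: Sequence[float], var: float) -> List[float]:
    """Score of an isotropic Gaussian: grad_x log N(x; mean, var * I)."""
    return [-(xi - mi) / var for xi, mi in zip(x, mean)]

def score_norm(x: Sequence[float], mean: Sequence[float], var: float) -> float:
    """Euclidean norm of the score vector at x."""
    return math.sqrt(sum(s * s for s in gaussian_score(x, mean, var)))

def is_semantic_shift(x: Sequence[float], mean: Sequence[float], var: float,
                      threshold: float) -> bool:
    """Flag inputs whose score norm exceeds the (hypothetical) threshold."""
    return score_norm(x, mean, var) > threshold
```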

Updated: 2024-10-23 13:14:21

Subjects: cs.LG

Download: http://arxiv.org/abs/2410.11397v2

Is the GPU Half-Empty or Half-Full? Practical Scheduling Techniques for LLMs

Serving systems for Large Language Models (LLMs) improve throughput by processing several requests concurrently. However, multiplexing hardware resources between concurrent requests involves non-trivial scheduling decisions. Practical serving systems typically implement these decisions at two levels: First, a load balancer routes requests to different servers which each hold a replica of the LLM. Then, on each server, an engine-level scheduler decides when to run a request, or when to queue or preempt it. Improved scheduling policies may benefit a wide range of LLM deployments and can often be implemented as "drop-in replacements" to a system's current policy. In this work, we survey scheduling techniques from the literature and from practical serving systems. We find that schedulers from the literature often achieve good performance but introduce significant complexity. In contrast, schedulers in practical deployments often leave easy performance gains on the table but are easy to implement, deploy and configure. This finding motivates us to introduce two new scheduling techniques, which are both easy to implement, and outperform current techniques on production workload traces.
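The load-balancer level of the two-level design can be illustrated by routing each request to the replica with the least outstanding work. This is a common baseline policy, not one of the two new techniques proposed in the paper, which the abstract does not detail. The token-count cost model and all names are assumptions.

```python
from typing import Dict, List

class LeastOutstandingTokensBalancer:
    """Route each request to the replica with the fewest outstanding tokens."""

    def __init__(self, replicas: List[str]):
        self.outstanding: Dict[str, int] = {r: 0 for r in replicas}

    def route(self, prompt_tokens: int, max_new_tokens: int) -> str:
        cost = prompt_tokens + max_new_tokens  # crude, hypothetical work estimate
        # min() returns the first minimum, so ties go to the replica listed first.
        target = min(self.outstanding, key=self.outstanding.get)
        self.outstanding[target] += cost
        return target

    def complete(self, replica: str, prompt_tokens: int, max_new_tokens: int) -> None:
        """Release the work charged to `replica` once its request finishes."""
        self.outstanding[replica] -= prompt_tokens + max_new_tokens
```

Because the policy only reads and updates a per-replica counter, it can act as a drop-in replacement for round-robin routing. The engine-level run/queue/preempt decisions stay untouched on each server.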

Updated: 2024-10-23 13:05:46

Subjects: cs.LG

Download: http://arxiv.org/abs/2410.17840v1

Few-Shot Adversarial Prompt Learning on Vision-Language Models

The vulnerability of deep neural networks to imperceptible adversarial perturbations has attracted widespread attention. Inspired by the success of vision-language foundation models, previous efforts achieved zero-shot adversarial robustness by aligning adversarial visual features with text supervision. However, in practice, they are still unsatisfactory due to several issues, including heavy adaptation cost, suboptimal text supervision, and uncontrolled natural generalization capacity. In this paper, to address these issues, we propose a few-shot adversarial prompt framework in which adapting input sequences with limited data significantly improves adversarial robustness. Specifically, we achieve this by providing adversarially correlated text supervision that is learned end-to-end from adversarial examples. We also propose a novel training objective that enhances the consistency of multi-modal features while encouraging differentiated uni-modal features between natural and adversarial examples. The proposed framework makes it possible to learn adversarial text supervision, which provides superior cross-modal adversarial alignment and matches state-of-the-art zero-shot adversarial robustness with only 1% of the training data. Code is available at: https://github.com/lionel-w2/FAP.

Updated: 2024-10-23 13:01:14

Subjects: cs.CV,cs.CL,cs.CR,cs.LG

Download: http://arxiv.org/abs/2403.14774v2

Do Large Language Models Have an English Accent? Evaluating and Improving the Naturalness of Multilingual LLMs

Current Large Language Models (LLMs) are predominantly designed with English as the primary language, and even the few that are multilingual tend to exhibit strong English-centric biases. Much like speakers who might produce awkward expressions when learning a second language, LLMs often generate unnatural outputs in non-English languages, reflecting English-centric patterns in both vocabulary and grammar. Despite the importance of this issue, the naturalness of multilingual LLM outputs has received limited attention. In this paper, we address this gap by introducing novel automatic corpus-level metrics to assess the lexical and syntactic naturalness of LLM outputs in a multilingual context. Using our new metrics, we evaluate state-of-the-art LLMs on a curated benchmark in French and Chinese, revealing a tendency towards English-influenced patterns. To mitigate this issue, we also propose a simple and effective alignment method to improve the naturalness of an LLM in a target language and domain, achieving consistent improvements in naturalness without compromising the performance on general-purpose benchmarks. Our work highlights the importance of developing multilingual metrics, resources and methods for the new wave of multilingual LLMs.
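A corpus-level lexical naturalness check can be framed as a divergence between the word distribution of model outputs and that of native text. The sketch assumes base-2 Jensen-Shannon divergence over unigram frequencies, so the value lies in [0, 1]. The paper's actual metrics may be defined differently. Whitespace tokenization is a simplification that would not suit Chinese, which needs a proper word segmenter.

```python
import math
from collections import Counter
from typing import Dict, Iterable

def unigram_dist(texts: Iterable[str]) -> Dict[str, float]:
    """Relative frequency of each whitespace-separated token in a corpus."""
    counts = Counter(tok for t in texts for tok in t.split())
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}

def jensen_shannon(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Base-2 JSD: 0 for identical distributions, 1 for disjoint vocabularies."""
    vocab = set(p) | set(q)
    m = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in vocab}

    def kl_to_mixture(a: Dict[str, float]) -> float:
        return sum(v * math.log2(v / m[t]) for t, v in a.items() if v > 0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)
```

A lower divergence between model outputs and a native reference corpus would indicate more native-like word choice under this assumed metric.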

Updated: 2024-10-23 13:00:27

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2410.15956v2

ShapefileGPT: A Multi-Agent Large Language Model Framework for Automated Shapefile Processing

Vector data is one of the two core data structures in geographic information science (GIS), essential for accurately storing and representing geospatial information. Shapefile, the most widely used vector data format, has become the industry standard supported by all major geographic information systems. However, processing this data typically requires specialized GIS knowledge and skills, creating a barrier for researchers from other fields and impeding interdisciplinary research in spatial data analysis. Moreover, while large language models (LLMs) have made significant advancements in natural language processing and task automation, they still face challenges in handling the complex spatial and topological relationships inherent in GIS vector data. To address these challenges, we propose ShapefileGPT, an innovative framework powered by LLMs, specifically designed to automate Shapefile tasks. ShapefileGPT utilizes a multi-agent architecture, in which the planner agent is responsible for task decomposition and supervision, while the worker agent executes the tasks. We developed a specialized function library for handling Shapefiles and provided comprehensive API documentation, enabling the worker agent to operate Shapefiles efficiently through function calling. For evaluation, we developed a benchmark dataset based on authoritative textbooks, encompassing tasks in categories such as geometric operations and spatial queries. ShapefileGPT achieved a task success rate of 95.24%, outperforming GPT-series models. Compared with traditional LLMs, ShapefileGPT effectively handles complex vector data analysis tasks, overcoming their limitations in spatial analysis. This breakthrough opens new pathways for advancing automation and intelligence in the GIS field, with significant potential in interdisciplinary data analysis and application contexts.
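
The planner/worker split with function calling can be sketched as follows. A planner emits a list of tool calls, and a worker dispatches each call to a registered function. Everything here is hypothetical:

- the tool names;
- the hard-coded plan (a real system would obtain it from an LLM);
- the two small geometry functions, which stand in for the paper's Shapefile function library.

```python
# Hypothetical tool registry standing in for a Shapefile function library.
TOOLS = {}

def tool(fn):
    """Register a function so the worker can call it by name."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def polygon_area(coords):
    """Planar area of a simple polygon via the shoelace formula."""
    n = len(coords)
    s = sum(coords[i][0] * coords[(i + 1) % n][1] - coords[(i + 1) % n][0] * coords[i][1]
            for i in range(n))
    return abs(s) / 2.0

@tool
def point_in_bbox(point, coords):
    """Cheap spatial query: is the point inside the polygon's bounding box?"""
    xs, ys = zip(*coords)
    return min(xs) <= point[0] <= max(xs) and min(ys) <= point[1] <= max(ys)

def planner(task):
    """Stand-in for the LLM planner: ignores `task` and returns a fixed decomposition."""
    square = [(0, 0), (2, 0), (2, 2), (0, 2)]
    return [("polygon_area", {"coords": square}),
            ("point_in_bbox", {"point": (1, 1), "coords": square})]

def worker(plan):
    """Execute each planned call; the planner could then check these results."""
    return [TOOLS[name](**args) for name, args in plan]

results = worker(planner("area of the parcel and whether (1, 1) falls in it"))
print(results)  # [4.0, True]
```

Keeping tools in a name-to-function registry mirrors how function-calling interfaces usually work: the model only produces a name and arguments, and ordinary code performs the geometry.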

Updated: 2024-10-23 12:58:14

Categories: cs.AI

Download: http://arxiv.org/abs/2410.12376v2

Enhancing Interaction Modeling with Agent Selection and Physical Coefficient for Trajectory Prediction

A thorough understanding of the interaction between the target agent and surrounding agents is a prerequisite for accurate trajectory prediction. Although many methods have been explored, they all assign correlation coefficients to surrounding agents in a purely learning-based manner. In this study, we present ASPILin, which manually selects interacting agents and calculates their correlations directly instead of learning attention scores. Surprisingly, these simple modifications can significantly improve prediction performance and substantially reduce computational costs. Additionally, ASPILin models the interacting agents at each past time step separately, rather than only modeling the interacting agents at the current time step. This clarifies the causal chain of the target agent's historical trajectory and helps the model better understand dynamic interactions. We intentionally simplified our model in other aspects, such as map encoding. Remarkably, experiments on the INTERACTION, highD, and CitySim datasets show that this efficient and straightforward method outperforms other state-of-the-art methods.
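
The sketch below illustrates two ideas: choosing interacting agents by an explicit rule, and giving them coefficients computed from physical quantities rather than learned attention scores. It also shows the per-time-step selection described above. The distance threshold and the inverse-distance weighting are assumptions made for this sketch. The abstract does not specify the selection rule or the physical coefficient ASPILin actually uses.

```python
import math

def select_and_weight(target, neighbors, radius=10.0):
    """Keep neighbors within `radius` of the target and return normalized
    inverse-distance coefficients (a hypothetical physical coefficient)."""
    selected = {}
    for agent_id, pos in neighbors.items():
        d = math.dist(target, pos)
        if 0.0 < d <= radius:  # rule-based selection instead of attending to every agent
            selected[agent_id] = 1.0 / d
    total = sum(selected.values())
    return {k: v / total for k, v in selected.items()} if total else {}

def weights_over_history(target_hist, neighbor_hist, radius=10.0):
    """Apply the selection separately at every past time step, not only the current one."""
    return [select_and_weight(t, n, radius) for t, n in zip(target_hist, neighbor_hist)]

# Hypothetical two-step history of positions (metres).
target_hist = [(0.0, 0.0), (1.0, 0.0)]
neighbor_hist = [{"a": (2.0, 0.0), "b": (20.0, 0.0)},
                 {"a": (3.0, 0.0), "b": (5.0, 0.0)}]
w = weights_over_history(target_hist, neighbor_hist)
print(w)  # agent "b" is ignored at step 0 and weighted at step 1
```

Because the coefficients come from a closed-form rule, no pairwise attention needs to be learned, which is one plausible source of the computational savings the abstract reports.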

Updated: 2024-10-23 12:56:05

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2405.13152v3

Optimal Design for Reward Modeling in RLHF

Reinforcement Learning from Human Feedback (RLHF) has become a popular approach to align language models (LMs) with human preferences. This method involves collecting a large dataset of human pairwise preferences across various text generations and using it to infer (implicitly or explicitly) a reward model. Numerous methods have been proposed to learn the reward model and align an LM with it. However, the costly process of collecting human preferences has received little attention and could benefit from theoretical insights. This paper addresses this issue and aims to formalize the reward training model in RLHF. We frame the selection of an effective dataset as a simple regret minimization task, using a linear contextual dueling bandit method. Given the potentially large number of arms, this approach is more coherent than the best-arm identification setting. We then propose an offline framework for solving this problem. Under appropriate assumptions (linearity of the reward model in the embedding space and boundedness of the reward parameter), we derive upper bounds on the simple regret. Finally, we provide a lower bound that matches our upper bound up to constant and logarithmic terms. To our knowledge, this is the first theoretical contribution in this area to provide an offline approach as well as worst-case guarantees.
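
Here is a concrete view of the linear dueling setting. With reward r(x) = theta^T phi(x), the Bradley-Terry model gives P(x preferred to x') = sigmoid(theta^T (phi(x) - phi(x'))). Each comparison is therefore informative only along the difference vector. A classic offline-design heuristic picks the pairs whose difference vectors most increase the determinant of the information matrix (greedy D-optimal design). This generic sketch is for intuition only and is not the selection rule analyzed in the paper. The 2-D embeddings and the regularizer are hypothetical.

```python
import itertools
import math

def bt_prob(theta, phi_a, phi_b):
    """Bradley-Terry probability that response a is preferred to b under a linear reward."""
    z = sum(t * (a - b) for t, a, b in zip(theta, phi_a, phi_b))
    return 1.0 / (1.0 + math.exp(-z))

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def greedy_d_optimal(embeddings, budget, reg=1e-3):
    """Greedily choose `budget` pairs maximizing det(reg*I + sum_k z_k z_k^T),
    where z_k is the embedding difference of pair k (2-D embeddings only)."""
    info = [[reg, 0.0], [0.0, reg]]
    chosen = []
    for _ in range(budget):
        best, best_det, best_info = None, -1.0, info
        for i, j in itertools.combinations(range(len(embeddings)), 2):
            if (i, j) in chosen:
                continue
            z = [embeddings[i][k] - embeddings[j][k] for k in range(2)]
            cand = [[info[r][c] + z[r] * z[c] for c in range(2)] for r in range(2)]
            if det2(cand) > best_det:
                best, best_det, best_info = (i, j), det2(cand), cand
        chosen.append(best)
        info = best_info
    return chosen

# Hypothetical response embeddings.
emb = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (0.1, 0.1)]
pairs = greedy_d_optimal(emb, budget=2)
print(pairs)  # [(1, 2), (0, 1)]
print(round(bt_prob((1.0, -1.0), emb[1], emb[2]), 3))  # sigmoid(2) = 0.881
```

The pair (0, 3) is never chosen: its two responses have nearly identical embeddings, so a human label on it would reveal little about theta.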

Updated: 2024-10-23 12:55:39

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2410.17055v2

Optimal Streaming Algorithms for Multi-Armed Bandits

This paper studies two variants of the best arm identification (BAI) problem under the streaming model, where we have a stream of $n$ arms with reward distributions supported on $[0,1]$ and unknown means. The arms in the stream arrive one by one, and the algorithm cannot access an arm unless it is stored in a limited-size memory. We first study the streaming $\epsilon$-top-$k$ arms identification problem, which asks, with probability at least $1-\delta$, for $k$ arms whose reward means are lower than that of the $k$-th best arm by at most $\epsilon$. For general $\epsilon \in (0,1)$, the existing solution for this problem assumes $k = 1$ and achieves the optimal sample complexity $O(\frac{n}{\epsilon^2} \log \frac{1}{\delta})$ using $O(\log^*(n))$ memory and a single pass of the stream, where $\log^*(n)$ is the number of times the logarithm must be applied to $n$ before the result is at most 1. We propose an algorithm that works for any $k$ and achieves the optimal sample complexity $O(\frac{n}{\epsilon^2} \log\frac{k}{\delta})$ using a single-arm memory and a single pass of the stream. Second, we study the streaming BAI problem, where the objective is to identify the arm with the maximum reward mean with probability at least $1-\delta$, using a single-arm memory and as few passes over the input stream as possible. We present a single-arm-memory algorithm that achieves a near instance-dependent optimal sample complexity within $O(\log \Delta_2^{-1})$ passes, where $\Delta_2$ is the gap between the mean of the best arm and that of the second-best arm.
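
Two pieces of the abstract are easy to make concrete:

- the iterated logarithm log*(n) that appears in the memory bound of the earlier solution;
- the general shape of a single-arm-memory streaming algorithm, which stores one "king" arm and lets each arriving arm challenge it.

The challenge rule below is a simplification for illustration and does not reproduce the paper's sample-complexity guarantees. Each arm gets a fixed number of pulls, and the arms' empirical means are compared. The Bernoulli arms and the pull budget are hypothetical.

```python
import math
import random

def log_star(n):
    """Iterated logarithm (base 2): how many times log2 must be applied to n
    before the result is at most 1."""
    count, x = 0, float(n)
    while x > 1.0:
        x = math.log2(x)
        count += 1
    return count

def stream_single_arm_memory(arm_means, pulls_per_challenge=2000, seed=0):
    """One pass over a stream of Bernoulli arms, storing only the current king arm.
    True means are used only to simulate pulls; decisions use empirical means."""
    rng = random.Random(seed)

    def empirical_mean(p):
        return sum(rng.random() < p for _ in range(pulls_per_challenge)) / pulls_per_challenge

    king_idx, king_mean = None, None
    for idx, mean in enumerate(arm_means):  # arms arrive one at a time
        if king_idx is None:
            king_idx, king_mean = idx, mean
        elif empirical_mean(mean) > empirical_mean(king_mean):
            # The challenger replaces the stored arm only if it looks better on fresh samples.
            king_idx, king_mean = idx, mean
    return king_idx

print(log_star(16), log_star(65536))  # 3 4
print(stream_single_arm_memory([0.2, 0.5, 0.9, 0.4]))  # index of the best arm, 2
```

log*(n) grows extremely slowly (it is 5 for any n up to 2^65536), which is why an O(log*(n)) memory bound is nearly constant in practice.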

Updated: 2024-10-23 12:54:04

Categories: cs.LG

Download: http://arxiv.org/abs/2410.17835v1

Non-intrusive Speech Quality Assessment with Diffusion Models Trained on Clean Speech

Diffusion models have found great success in generating high quality, natural samples of speech, but their potential for density estimation for speech has so far remained largely unexplored. In this work, we leverage an unconditional diffusion model trained only on clean speech for the assessment of speech quality. We show that the quality of a speech utterance can be assessed by estimating the likelihood of a corresponding sample in the terminating Gaussian distribution, obtained via a deterministic noising process. The resulting method is purely unsupervised, trained only on clean speech, and therefore does not rely on annotations. Our diffusion-based approach leverages clean speech priors to assess quality based on how the input relates to the learned distribution of clean data. Our proposed log-likelihoods show promising results, correlating well with intrusive speech quality metrics such as POLQA and SI-SDR.
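
The likelihood computation rests on the change-of-variables formula. If a deterministic, invertible map sends a clean sample x_0 to a terminal sample x_T with a known Gaussian density, then log p(x_0) = log N(x_T) + log|det dx_T/dx_0|. In practice the log-determinant term is accumulated along the deterministic (probability-flow) trajectory. The sketch assumes a 1-D linear map, a toy setting where everything has a closed form, so the formula can be checked against the exact density.

```python
import math

def gaussian_logpdf(x, std):
    return -0.5 * math.log(2 * math.pi * std**2) - x**2 / (2 * std**2)

# Toy assumption: clean "speech" x0 ~ N(0, sigma_data^2), and a deterministic noising
# map x_T = s * x0 whose output matches the terminal Gaussian N(0, sigma_T^2).
sigma_data, sigma_T = 0.5, 2.0
s = sigma_T / sigma_data

def loglik_via_terminal(x0):
    """log p(x0) = log N(x_T; 0, sigma_T^2) + log|dx_T/dx0| (change of variables)."""
    x_T = s * x0
    return gaussian_logpdf(x_T, sigma_T) + math.log(abs(s))

for x0 in (0.1, 1.5):  # typical input vs. input far from the clean distribution
    print(x0, round(loglik_via_terminal(x0), 4), round(gaussian_logpdf(x0, sigma_data), 4))
```

The two printed columns agree, and the atypical input receives a much lower log-likelihood. This is the property a quality score of this kind relies on: degraded speech should lie farther from the learned clean-speech distribution.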

Updated: 2024-10-23 12:53:58

Categories: eess.AS,cs.LG,cs.SD

Download: http://arxiv.org/abs/2410.17834v1

Reducing Variance in Meta-Learning via Laplace Approximation for Regression Tasks

Given a finite set of sample points, meta-learning algorithms aim to learn an optimal adaptation strategy for new, unseen tasks. Often, this data can be ambiguous as it might belong to different tasks concurrently. This is particularly the case in meta-regression tasks. In such cases, the estimated adaptation strategy is subject to high variance due to the limited amount of support data for each task, which often leads to sub-optimal generalization performance. In this work, we address the problem of variance reduction in gradient-based meta-learning and formalize the class of problems prone to this, a condition we refer to as "task overlap". Specifically, we propose a novel approach that reduces the variance of the gradient estimate by weighting each support point individually by the variance of its posterior over the parameters. To estimate the posterior, we utilize the Laplace approximation, which allows us to express the variance in terms of the curvature of the loss landscape of our meta-learner. Experimental results demonstrate the effectiveness of the proposed method and highlight the importance of variance reduction in meta-learning.
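
A one-parameter example shows why weighting by posterior variance reduces variance. Assume each support point (x_i, y_i) follows y_i = theta * x_i plus noise of variance sigma^2. Its squared-error loss then has curvature x_i^2 / sigma^2 in theta. The Laplace approximation therefore gives a per-point posterior variance of sigma^2 / x_i^2, so points with a flat loss (small |x_i|) are down-weighted. This inverse-variance combination is a textbook simplification of the idea, not the paper's gradient-based estimator, and the support set is hypothetical.

```python
def laplace_weighted_estimate(support, noise_var=1.0):
    """Combine per-point estimates theta_i = y_i / x_i with weights equal to the
    inverse Laplace variance, i.e. the loss curvature x_i^2 / noise_var."""
    estimates, precisions = [], []
    for x, y in support:
        if x == 0:
            continue  # zero curvature: the point carries no information about theta
        estimates.append(y / x)
        precisions.append(x * x / noise_var)  # Hessian of 0.5 * (y - theta*x)^2 / noise_var
    total = sum(precisions)
    theta = sum(p * e for p, e in zip(precisions, estimates)) / total
    return theta, 1.0 / total  # combined estimate and its Laplace variance

def unweighted_estimate(support):
    vals = [y / x for x, y in support if x != 0]
    return sum(vals) / len(vals)

# Hypothetical support set: the point at x = 0.1 is nearly uninformative.
support = [(2.0, 4.2), (1.5, 2.9), (0.1, 0.9)]
theta, var = laplace_weighted_estimate(support)
print(round(theta, 3), round(var, 3))          # close to theta = 2
print(round(unweighted_estimate(support), 3))  # pulled toward 9 by the noisy point
```

In this linear case the weighted estimate coincides with the least-squares fit sum(x*y) / sum(x^2). The curvature-based weighting recovers the statistically efficient estimator.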

Updated: 2024-10-23 12:53:49

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2410.01476v2

Conquering the Communication Constraints to Enable Large Pre-Trained Models in Federated Learning

Federated learning (FL) has emerged as a promising paradigm for enabling the collaborative training of models without centralized access to the raw data on local devices. In the typical FL paradigm (e.g., FedAvg), model weights are exchanged between the server and the participating clients in every round. Recently, the use of small pre-trained models has been shown to be effective in federated learning optimization and in improving convergence. However, recent state-of-the-art pre-trained models are getting more capable but also have more parameters. In conventional FL, sharing the enormous model weights can quickly put a massive communication burden on the system, especially if more capable models are employed. Can we find a solution to enable those strong and readily available pre-trained models in FL to achieve excellent performance while simultaneously reducing the communication burden? To this end, we investigate the use of parameter-efficient fine-tuning in federated learning and thus introduce a new framework: FedPEFT. Specifically, we systematically evaluate the performance of FedPEFT across a variety of client stability, data distribution, and differential privacy settings. By only locally tuning and globally sharing a small portion of the model weights, significant reductions in the total communication overhead can be achieved while maintaining competitive or even better performance in a wide range of federated learning scenarios, providing insight into a new paradigm for practical and effective federated systems.
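
The communication argument comes down to how many parameters travel each round, and server-side aggregation only needs to touch the shared subset. The sketch below runs a FedAvg-style average, weighted by client sample counts, over a small dictionary of trainable (PEFT) parameters; the frozen backbone never leaves the clients. The parameter names, model sizes and client data counts are hypothetical, and 32-bit floats are assumed.

```python
def fedavg(client_updates):
    """FedAvg over only the shared (trainable) parameters.
    client_updates: list of (num_samples, {param_name: list_of_floats})."""
    total = sum(n for n, _ in client_updates)
    first = client_updates[0][1]
    return {
        name: [sum(n * params[name][i] for n, params in client_updates) / total
               for i in range(len(first[name]))]
        for name in first
    }

def upload_megabytes(num_params, bytes_per_param=4):
    """Per-client upload size for one round, assuming float32 parameters."""
    return num_params * bytes_per_param / 1e6

# Hypothetical sizes: an 86M-parameter backbone vs. a 0.3M-parameter adapter/head.
full_model, peft_only = 86_000_000, 300_000
print(f"full: {upload_megabytes(full_model):.1f} MB/round, "
      f"PEFT: {upload_megabytes(peft_only):.1f} MB/round")

clients = [(100, {"adapter.w": [1.0, 2.0]}), (300, {"adapter.w": [3.0, 6.0]})]
print(fedavg(clients))  # {'adapter.w': [2.5, 5.0]}
```

Under these assumed sizes the per-round upload shrinks from 344 MB to about 1.2 MB per client. The actual savings depend on the backbone and the PEFT method chosen.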

Updated: 2024-10-23 12:50:18

Categories: cs.LG,cs.CV

Download: http://arxiv.org/abs/2210.01708v4

RE-tune: Incremental Fine Tuning of Biomedical Vision-Language Models for Multi-label Chest X-ray Classification

In this paper we introduce RE-tune, a novel approach for fine-tuning pre-trained Multimodal Biomedical Vision-Language models (VLMs) in Incremental Learning scenarios for multi-label chest disease diagnosis. RE-tune freezes the backbones and only trains simple adaptors on top of the Image and Text encoders of the VLM. By engineering positive and negative text prompts for diseases, we leverage the ability of Large Language Models to steer the training trajectory. We evaluate RE-tune in three realistic incremental learning scenarios: class-incremental, label-incremental, and data-incremental. Our results demonstrate that Biomedical VLMs are natural continual learners and prevent catastrophic forgetting. RE-tune not only achieves accurate multi-label classification results but also prioritizes patient privacy, and its exceptional computational efficiency makes it highly suitable for broad adoption in real-world healthcare settings.
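
Positive and negative prompts turn multi-label diagnosis into independent per-disease decisions. A disease is predicted present when the image embedding is closer to its positive-prompt embedding than to its negative one. The sketch assumes the embeddings have already been produced by the frozen encoders plus adaptors. The prompt wording, the 3-D vectors and the temperature are hypothetical.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def multilabel_predict(image_emb, prompt_embs, temperature=0.1):
    """Per disease: two-way softmax over (positive, negative) prompt similarities.
    prompt_embs maps disease -> (embedding of 'findings of X', embedding of 'no X')."""
    probs = {}
    for disease, (pos, neg) in prompt_embs.items():
        s_pos = cosine(image_emb, pos) / temperature
        s_neg = cosine(image_emb, neg) / temperature
        probs[disease] = 1.0 / (1.0 + math.exp(s_neg - s_pos))  # P(disease present)
    return probs

# Hypothetical prompt and image embeddings.
prompts = {
    "effusion": ((1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)),
    "cardiomegaly": ((0.0, 1.0, 0.0), (0.0, -1.0, 0.0)),
}
image = (0.9, -0.3, 0.1)
preds = multilabel_predict(image, prompts)
print({k: round(v, 3) for k, v in preds.items()})
```

Because each disease has its own positive/negative pair, adding a new label in a label-incremental scenario only requires adding a new prompt pair rather than resizing a shared softmax head.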

Updated: 2024-10-23 12:40:33

Categories: cs.AI

Download: http://arxiv.org/abs/2410.17827v1

Generative AI Models for Different Steps in Architectural Design: A Literature Review

Recent advances in generative artificial intelligence (AI) technologies have been significantly driven by models such as generative adversarial networks (GANs), variational autoencoders (VAEs), and denoising diffusion probabilistic models (DDPMs). Although architects recognize the potential of generative AI in design, personal barriers often restrict their access to the latest technological developments, thereby causing the application of generative AI in architectural design to lag behind. Therefore, it is essential to comprehend the principles and advancements of generative AI models and analyze their relevance in architecture applications. This paper first provides an overview of generative AI technologies, with a focus on probabilistic diffusion models (DDPMs), 3D generative models, and foundation models, highlighting their recent developments and main application scenarios. Then, the paper explains how the abovementioned models could be utilized in architecture. We subdivide the architectural design process into six steps and review related research projects in each step from 2020 to the present. Lastly, this paper discusses potential future directions for applying generative AI in the architectural design steps. This research can help architects quickly understand the development and latest progress of generative AI and contribute to the further development of intelligent architecture.

Updated: 2024-10-23 12:38:40

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2404.01335v2

On the limits of agency in agent-based models

Agent-based modeling (ABM) seeks to understand the behavior of complex systems by simulating a collection of agents that act and interact within an environment. Their practical utility requires capturing realistic environment dynamics and adaptive agent behavior while efficiently simulating million-size populations. Recent advancements in large language models (LLMs) present an opportunity to enhance ABMs by using LLMs as agents with further potential to capture adaptive behavior. However, the computational infeasibility of using LLMs for large populations has hindered their widespread adoption. In this paper, we introduce AgentTorch -- a framework that scales ABMs to millions of agents while capturing high-resolution agent behavior using LLMs. We benchmark the utility of LLMs as ABM agents, exploring the trade-off between simulation scale and individual agency. Using the COVID-19 pandemic as a case study, we demonstrate how AgentTorch can simulate 8.4 million agents representing New York City, capturing the impact of isolation and employment behavior on health and economic outcomes. We compare the performance of different agent architectures based on heuristic and LLM agents in predicting disease waves and unemployment rates. Furthermore, we showcase AgentTorch's capabilities for retrospective, counterfactual, and prospective analyses, highlighting how adaptive agent behavior can help overcome the limitations of historical data in policy design. AgentTorch is an open-source project actively being used for policy-making and scientific discovery around the world. The framework is available here: github.com/AgentTorch/AgentTorch.

Updated: 2024-10-23 12:37:10

Categories: cs.MA,cs.AI

Download: http://arxiv.org/abs/2409.10568v2

Att2CPC: Attention-Guided Lossy Attribute Compression of Point Clouds

With the great progress of 3D sensing and acquisition technology, the volume of point cloud data has grown dramatically, which urges the development of efficient point cloud compression methods. In this paper, we focus on the task of learned lossy point cloud attribute compression (PCAC). We propose an efficient attention-based method for lossy compression of point cloud attributes built on an autoencoder architecture. Specifically, at the encoding side, we conduct multiple downsampling steps to best exploit the local attribute patterns, in which an effective External Cross Attention (ECA) is devised to hierarchically aggregate features by integrating attribute and geometry contexts. At the decoding side, the attributes of the point cloud are progressively reconstructed based on the multi-scale representation and the zero-padding upsampling tactic. To the best of our knowledge, this is the first approach to introduce an attention mechanism to the point-based lossy PCAC task. We verify the compression efficiency of our model on various sequences, including human body frames, sparse objects, and large-scale point cloud scenes. Experiments show that our method achieves average improvements of 1.15 dB and 2.13 dB in BD-PSNR for the Y channel and the YUV channel, respectively, compared with the state-of-the-art point-based method Deep-PCAC. Code for this paper is available at https://github.com/I2-Multimedia-Lab/Att2CPC.
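
As a rough picture of cross attention between attribute and geometry contexts, the sketch lets attribute features act as queries over keys built from point coordinates. The values are the attributes themselves, combined by scaled dot-product attention. The abstract does not give the internals of External Cross Attention. The choice of queries, keys and values, and the absence of learned projections, are therefore assumptions, and the three-point neighbourhood is hypothetical.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query aggregates `values` weighted by
    its similarity to `keys` (no learned projections in this sketch)."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        out.append([sum(wi * v[j] for wi, v in zip(w, values)) for j in range(len(values[0]))])
    return out

# Hypothetical local neighbourhood: 3 points with xyz geometry and RGB attributes.
geometry = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
attributes = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
fused = cross_attention(attributes, geometry, attributes)
print([[round(x, 3) for x in row] for row in fused])
```

Each output row is a convex combination of neighbouring attributes. A learned version would add projection matrices and stack this aggregation across the downsampling levels.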

Updated: 2024-10-23 12:32:21

Categories: cs.LG,cs.CV,eess.IV

Download: http://arxiv.org/abs/2410.17823v1

Simplifying Deep Temporal Difference Learning

Q-learning played a foundational role in the field of reinforcement learning (RL). However, TD algorithms with off-policy data, such as Q-learning, or with nonlinear function approximation like deep neural networks require several additional tricks to stabilise training, primarily a replay buffer and target networks. Unfortunately, the delayed updating of frozen network parameters in the target network harms the sample efficiency and, similarly, the replay buffer introduces memory and implementation overheads. In this paper, we investigate whether it is possible to accelerate and simplify TD training while maintaining its stability. Our key theoretical result demonstrates for the first time that regularisation techniques such as LayerNorm can yield provably convergent TD algorithms without the need for a target network, even with off-policy data. Empirically, we find that online, parallelised sampling enabled by vectorised environments stabilises training without the need for a replay buffer. Motivated by these findings, we propose PQN, our simplified deep online Q-Learning algorithm. Surprisingly, this simple algorithm is competitive with more complex methods such as Rainbow in Atari, R2D2 in Hanabi, QMix in Smax, and PPO-RNN in Craftax, and can be up to 50x faster than traditional DQN without sacrificing sample efficiency. In an era where PPO has become the go-to RL algorithm, PQN reestablishes Q-learning as a viable alternative.
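
The two ingredients of the theoretical result fit in a few lines:

- LayerNorm applied to the features of the Q-network;
- a TD target computed from the same online parameters instead of a frozen target network.

This is a minimal linear-Q sketch with hypothetical features, step size and discount. It omits vectorised environments and every other component of PQN.

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalize a feature vector to zero mean and unit variance (no learned scale/shift)."""
    mu = sum(x) / len(x)
    var = sum((v - mu) ** 2 for v in x) / len(x)
    return [(v - mu) / math.sqrt(var + eps) for v in x]

def q_values(weights, features):
    """Linear Q-head on layer-normalized features: one weight vector per action."""
    h = layer_norm(features)
    return [sum(w * f for w, f in zip(wa, h)) for wa in weights]

def td_update(weights, s, a, r, s_next, done, gamma=0.99, lr=0.1):
    """Online Q-learning step. The bootstrap target uses the current weights,
    i.e. there is no separate target network."""
    target = r if done else r + gamma * max(q_values(weights, s_next))
    h = layer_norm(s)
    td_error = target - q_values(weights, s)[a]
    weights[a] = [w + lr * td_error * f for w, f in zip(weights[a], h)]
    return td_error

weights = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # 2 actions, 3 features
err = td_update(weights, s=[1.0, 2.0, 3.0], a=1, r=1.0, s_next=[0.5, 0.1, 0.2], done=False)
print(err, weights)
```

Only the row for the taken action changes. Without a target network, the next call to `q_values` already bootstraps from the updated weights; the normalization keeps the feature scale bounded, which is the kind of regularisation the convergence result refers to.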

Updated: 2024-10-23 12:27:12

Categories: cs.LG

Download: http://arxiv.org/abs/2407.04811v2

Stable generative modeling using Schrödinger bridges

We consider the problem of sampling from an unknown distribution for which only a sufficiently large number of training samples are available. Such settings have recently drawn considerable interest in the context of generative modelling and Bayesian inference. In this paper, we propose a generative model combining Schrödinger bridges and Langevin dynamics. Schrödinger bridges over an appropriate reversible reference process are used to approximate the conditional transition probability from the available training samples, which is then implemented in a discrete-time reversible Langevin sampler to generate new samples. By setting the kernel bandwidth in the reference process to match the time step size used in the unadjusted Langevin algorithm, our method effectively circumvents any stability issues typically associated with the time-stepping of stiff stochastic differential equations. Moreover, we introduce a novel split-step scheme, ensuring that the generated samples remain within the convex hull of the training samples. Our framework can be naturally extended to generate conditional samples and to Bayesian inference problems. We demonstrate the performance of our proposed scheme through experiments on synthetic datasets of increasing dimension, on a conditional sampling problem for a stochastic subgrid-scale parametrization, and on generating sample trajectories of a dynamical system using conditional sampling.
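
A simplified setting shows why a scheme built from training samples can keep new samples inside their convex hull. Suppose a step moves the state to a softmax-weighted average of the training samples, with weights from a Gaussian kernel of bandwidth eps. The result is a convex combination, so it cannot leave the hull. Below, a noise step precedes each averaging step, and the noise variance equals the kernel bandwidth, echoing the abstract's bandwidth-matches-step-size choice. This is a hypothetical simplification for intuition, not the paper's split-step scheme or its Schrödinger-bridge transition estimate.

```python
import math
import random

def kernel_mean_step(x, samples, eps):
    """Move x to a Gaussian-kernel-weighted average of the training samples.
    Weights are nonnegative and sum to one, so the output is a convex combination."""
    logits = [-(x - y) ** 2 / (2 * eps) for y in samples]
    m = max(logits)
    w = [math.exp(z - m) for z in logits]
    total = sum(w)
    return sum(wi * y for wi, y in zip(w, samples)) / total

rng = random.Random(0)
train = [rng.gauss(0.0, 1.0) for _ in range(50)]  # hypothetical 1-D training set
eps = 0.1                                         # kernel bandwidth = noise step variance
x = 10.0                                          # start far outside the data
for _ in range(5):
    x_noisy = x + math.sqrt(eps) * rng.gauss(0.0, 1.0)  # stochastic (Langevin-like) part
    x = kernel_mean_step(x_noisy, train, eps)           # convex-combination part
print(round(x, 3), min(train) <= x <= max(train))
```

Even though the start and the noise can push the state outside the data range, every averaging step returns it to [min(train), max(train)].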

Updated: 2024-10-23 12:19:16

Categories: stat.ML,cs.LG,cs.NA,math.NA,stat.CO,60H10, 62F15, 62F30, 65C05, 65C40

Download: http://arxiv.org/abs/2401.04372v3

Causality-Aware Spatiotemporal Graph Neural Networks for Spatiotemporal Time Series Imputation

Spatiotemporal time series are usually collected via monitoring sensors placed at different locations and often contain missing values due to various failures, such as mechanical damage and Internet outages. Imputing the missing values is crucial for analyzing time series. When recovering a specific data point, most existing methods consider all the information relevant to that point regardless of the cause-and-effect relationship. During data collection, it is inevitable that some unknown confounders are included, e.g., background noise in time series and non-causal shortcut edges in the constructed sensor network. These confounders could open backdoor paths and establish non-causal correlations between the input and output. Over-exploiting these non-causal correlations could cause overfitting. In this paper, we first revisit spatiotemporal time series imputation from a causal perspective and show how to block the confounders via the frontdoor adjustment. Based on the results of frontdoor adjustment, we introduce a novel Causality-Aware Spatiotemporal Graph Neural Network (Casper), which contains a novel Prompt Based Decoder (PBD) and a Spatiotemporal Causal Attention (SCA). PBD could reduce the impact of confounders and SCA could discover the sparse causal relationships among embeddings. Theoretical analysis reveals that SCA discovers causal relationships based on the values of gradients. We evaluate Casper on three real-world datasets, and the experimental results show that Casper could outperform the baselines and could effectively discover causal relationships.
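
For discrete variables, the front-door adjustment has a closed form:

P(y | do(x)) = sum over m of P(m | x) * sum over x' of P(y | m, x') * P(x')

The sketch evaluates it on a hypothetical joint distribution over binary X (input), M (mediator) and Y (output). It illustrates the adjustment formula only, not how Casper's Prompt Based Decoder realizes it.

```python
from itertools import product

def front_door(joint, x, y):
    """P(Y=y | do(X=x)) via front-door adjustment from a joint table P(x, m, y)."""
    def p(**fixed):
        # Marginal probability of the fixed assignments, summing out the rest.
        return sum(pr for (xv, mv, yv), pr in joint.items()
                   if all({"x": xv, "m": mv, "y": yv}[k] == v for k, v in fixed.items()))

    total = 0.0
    for m in (0, 1):
        p_m_given_x = p(x=x, m=m) / p(x=x)
        inner = sum(p(x=xp, m=m, y=y) / p(x=xp, m=m) * p(x=xp) for xp in (0, 1))
        total += p_m_given_x * inner
    return total

# Hypothetical joint distribution with all cells positive (sums to 1).
probs = [0.10, 0.05, 0.15, 0.10, 0.05, 0.15, 0.10, 0.30]
joint = dict(zip(product((0, 1), repeat=3), probs))  # keys are (x, m, y)
print(round(front_door(joint, x=1, y=1), 4), round(front_door(joint, x=1, y=0), 4))
```

The inner sum averages over the input's own distribution. This blocks the backdoor path from X to Y through an unobserved confounder, as long as the mediator M fully carries the effect of X.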

Updated: 2024-10-23 12:18:49

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2403.11960v4

Learning Lossless Compression for High Bit-Depth Volumetric Medical Image

Recent advances in learning-based methods have markedly enhanced the capabilities of image compression. However, these methods struggle with high bit-depth volumetric medical images, facing issues such as degraded performance, increased memory demand, and reduced processing speed. To address these challenges, this paper presents the Bit-Division based Lossless Volumetric Image Compression (BD-LVIC) framework, which is tailored for high bit-depth medical volume compression. The BD-LVIC framework skillfully divides the high bit-depth volume into two lower bit-depth segments: the Most Significant Bit-Volume (MSBV) and the Least Significant Bit-Volume (LSBV). The MSBV concentrates on the most significant bits of the volumetric medical image, capturing vital structural details in a compact manner. This reduction in complexity greatly improves compression efficiency using traditional codecs. Conversely, the LSBV deals with the least significant bits, which encapsulate intricate texture details. To compress this detailed information effectively, we introduce an effective learning-based compression model equipped with a Transformer-Based Feature Alignment Module, which exploits both intra-slice and inter-slice redundancies to accurately align features. Subsequently, a Parallel Autoregressive Coding Module merges these features to precisely estimate the probability distribution of the least significant bit-planes. Our extensive testing demonstrates that the BD-LVIC framework not only sets new performance benchmarks across various datasets but also maintains a competitive coding speed, highlighting its significant potential and practical utility in the realm of volumetric medical image compression.
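
The bit-division step is a lossless split of each voxel value into high and low bit segments. The minimal sketch below assumes 16-bit voxels split evenly into 8 most significant and 8 least significant bits; the split point used by BD-LVIC is not stated in the abstract. After the split, the MSB volume could go to a traditional codec and the LSB volume to the learned model. The slice values are hypothetical.

```python
def bit_divide(values, total_bits=16, low_bits=8):
    """Split each voxel of a 2-D slice into its most and least significant bit segments."""
    assert all(0 <= v < (1 << total_bits) for row in values for v in row)
    mask = (1 << low_bits) - 1
    msbv = [[v >> low_bits for v in row] for row in values]
    lsbv = [[v & mask for v in row] for row in values]
    return msbv, lsbv

def bit_merge(msbv, lsbv, low_bits=8):
    """Exact inverse of bit_divide, so the split itself loses no information."""
    return [[(h << low_bits) | l for h, l in zip(hr, lr)] for hr, lr in zip(msbv, lsbv)]

# Hypothetical 2x3 slice of 16-bit CT values.
ct_slice = [[1024, 1030, 65535], [0, 300, 4095]]
msbv, lsbv = bit_divide(ct_slice)
print(msbv)                               # [[4, 4, 255], [0, 1, 15]]
print(lsbv)                               # [[0, 6, 255], [0, 44, 255]]
print(bit_merge(msbv, lsbv) == ct_slice)  # True
```

The MSB segment changes slowly across neighbouring voxels (4, 4 in the first row), which is why it suits conventional codecs. The LSB segment looks closer to noise and carries the fine texture the learned model must handle.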

Updated: 2024-10-23 12:18:36

Categories: eess.IV,cs.CV,cs.LG

Download: http://arxiv.org/abs/2410.17814v1

PGDiffSeg: Prior-Guided Denoising Diffusion Model with Parameter-Shared Attention for Breast Cancer Segmentation

Early detection through imaging and accurate diagnosis is crucial in mitigating the high mortality rate associated with breast cancer. However, locating tumors from low-resolution and high-noise medical images is extremely challenging. Therefore, this paper proposes a novel PGDiffSeg (Prior-Guided Diffusion Denoising Model with Parameter-Shared Attention) that applies diffusion denoising methods to breast cancer medical image segmentation, accurately recovering the affected areas from Gaussian noise. Firstly, we design a parallel pipeline for noise processing and semantic information processing and propose a multi-layer parameter-shared attention module (PSA) that seamlessly integrates these two pipelines. This integration empowers PGDiffSeg to incorporate semantic details at multiple levels during the denoising process, producing highly accurate segmentation maps. Secondly, we introduce a guided strategy that leverages prior knowledge to simulate the decision-making process of medical professionals, thereby enhancing the model's ability to locate tumor positions precisely. Finally, we provide the first-ever discussion on the interpretability of the generative diffusion model in the context of breast cancer segmentation. Extensive experiments have demonstrated the superiority of our model over the current state-of-the-art approaches, confirming its effectiveness as a flexible diffusion denoising method suitable for medical image research. Our code will be publicly available later.
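
Diffusion-based segmentation starts from the standard DDPM forward process applied to the ground-truth mask:

x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise

A network conditioned on the image (and, in PGDiffSeg, on prior guidance) is then trained to undo it. The sketch shows only this generic forward noising, using the common linear beta schedule from 1e-4 to 0.02 over 1000 steps. The schedule choice and the flattened toy mask are assumptions, not details taken from the paper.

```python
import math
import random

def linear_alpha_bars(T=1000, beta_start=1e-4, beta_end=0.02):
    """alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule."""
    alpha_bars, prod = [], 1.0
    for t in range(T):
        beta = beta_start + (beta_end - beta_start) * t / (T - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars

def q_sample(mask, t, alpha_bars, rng):
    """Noise a binary mask (values in {0, 1}, rescaled to {-1, 1}) to step t."""
    ab = alpha_bars[t]
    return [math.sqrt(ab) * (2 * m - 1) + math.sqrt(1 - ab) * rng.gauss(0.0, 1.0) for m in mask]

rng = random.Random(0)
abars = linear_alpha_bars()
mask = [0, 0, 1, 1, 1, 0]  # hypothetical flattened tumour mask
print([round(v, 2) for v in q_sample(mask, 10, abars, rng)])   # still close to +/-1
print([round(v, 2) for v in q_sample(mask, 999, abars, rng)])  # nearly pure noise
```

At inference time the process runs in reverse, starting from pure Gaussian noise. This is the sense in which the abstract speaks of recovering the affected areas from Gaussian noise.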

Updated: 2024-10-23 12:17:03

Categories: eess.IV,cs.AI,cs.CV

Download: http://arxiv.org/abs/2410.17812v1

MoC-System: Efficient Fault Tolerance for Sparse Mixture-of-Experts Model Training

As large language models continue to scale up, distributed training systems have expanded beyond 10k nodes, intensifying the importance of fault tolerance. Checkpointing has emerged as the predominant fault-tolerance strategy, with extensive studies dedicated to optimizing its efficiency. However, sparse Mixture-of-Experts (MoE) models present new challenges: their model size increases substantially even though their computational demands are comparable to those of dense models. In this work, we propose the Mixture-of-Checkpoint System (MoC-System) to orchestrate the vast number of checkpoint shards produced in distributed training systems. MoC-System features a novel Partial Experts Checkpointing (PEC) mechanism, an algorithm-system co-design that strategically saves a selected subset of experts, effectively reducing the MoE checkpoint size to levels comparable with dense models. Building on hybrid parallel strategies, MoC-System adopts fully sharded checkpointing to distribute the workload evenly across distributed ranks. Furthermore, MoC-System introduces a two-level checkpointing management method that asynchronously handles in-memory snapshots and persistence. We build MoC-System on the Megatron-DeepSpeed framework. During MoE model training with ZeRO-2 data parallelism and expert parallelism, it achieves up to a 98.9% reduction in overhead for each checkpointing process compared to the original method. Additionally, extensive empirical analyses show that our methods improve efficiency while maintaining comparable model accuracy, even achieving an average accuracy increase of 1.08% on downstream tasks.
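
Partial expert checkpointing can be sketched minimally with a simple round-robin policy, in which each checkpoint saves only `k` of the `num_experts` expert shards (non-expert parameters would be saved in full). This policy is an assumption; the paper's actual selection strategy may differ, and `select_experts` is a hypothetical helper. Under this policy, every expert is persisted at least once every `ceil(num_experts / k)` checkpoints.

```python
import math

def select_experts(ckpt_idx: int, num_experts: int, k: int) -> list[int]:
    """Hypothetical round-robin choice of k experts to save at checkpoint ckpt_idx."""
    start = (ckpt_idx * k) % num_experts
    return [(start + i) % num_experts for i in range(k)]

def checkpoint_plan(num_checkpoints: int, num_experts: int, k: int) -> list[list[int]]:
    """Expert indices saved at each of the first num_checkpoints checkpoints."""
    return [select_experts(c, num_experts, k) for c in range(num_checkpoints)]

num_experts, k = 8, 2
plan = checkpoint_plan(num_checkpoints=6, num_experts=num_experts, k=k)
# plan[0] == [0, 1], plan[1] == [2, 3], plan[2] == [4, 5], plan[3] == [6, 7], plan[4] == [0, 1]
cycle = math.ceil(num_experts / k)
saved_in_first_cycle = {e for ckpt in plan[:cycle] for e in ckpt}
```

With `k = 2` of 8 experts, each checkpoint writes a quarter of the expert parameters. The trade-off is that an expert restored after a failure may be up to `cycle - 1` checkpoints stale.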

Updated: 2024-10-23 12:08:33

Categories: cs.DC,cs.LG

Download: http://arxiv.org/abs/2408.04307v2

OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation

Full-duplex spoken dialogue systems represent a significant advance over traditional turn-based dialogue systems, as they allow simultaneous bidirectional communication that closely mirrors human-human interaction. However, achieving low latency and natural interaction in full-duplex dialogue systems remains a significant challenge, especially given human conversational dynamics such as interruptions, backchannels, and overlapping speech. In this paper, we introduce OmniFlatten, a novel end-to-end GPT-based model for full-duplex conversation that can effectively model the complex behaviors inherent to natural conversation with low latency. To achieve full-duplex communication capabilities, we propose a multi-stage post-training scheme. It progressively adapts a text-based large language model (LLM) backbone into a speech-text dialogue LLM capable of generating text and speech in real time, without modifying the architecture of the backbone LLM. The training process comprises three stages: modality alignment, half-duplex dialogue learning, and full-duplex dialogue learning. Throughout all training stages, we standardize the data using a flattening operation, which allows us to unify the training methods and the model architecture across different modalities and tasks. Our approach offers a straightforward modeling technique and a promising research direction for developing efficient and natural end-to-end full-duplex spoken dialogue systems. Audio samples of dialogues generated by OmniFlatten can be found at https://omniflatten.github.io/.
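
The sketch below shows one hypothetical form of a flattening step: interleaving fixed-size chunks of text tokens and speech tokens into a single sequence that a decoder-only LLM can model without architectural changes. The chunk sizes and integer token IDs are illustrative assumptions, not OmniFlatten's actual configuration.

```python
from itertools import zip_longest

def chunks(seq, size):
    """Split a token list into consecutive chunks of at most `size` tokens."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]

def flatten_streams(text_tokens, speech_tokens, text_chunk=2, speech_chunk=4):
    """Interleave text and speech chunks: T T S S S S T T S S S S ..."""
    flat = []
    for t, s in zip_longest(chunks(text_tokens, text_chunk),
                            chunks(speech_tokens, speech_chunk), fillvalue=[]):
        flat.extend(t)
        flat.extend(s)
    return flat

text = [101, 102, 103]                  # hypothetical text token IDs
speech = [900 + i for i in range(10)]   # hypothetical speech codec token IDs
flat = flatten_streams(text, speech)
# [101, 102, 900, 901, 902, 903, 103, 904, 905, 906, 907, 908, 909]
```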

Updated: 2024-10-23 11:58:58

Categories: cs.CL,cs.AI,cs.SD,eess.AS

Download: http://arxiv.org/abs/2410.17799v1

Automated Contrastive Learning Strategy Search for Time Series

In recent years, Contrastive Learning (CL) has become a predominant representation learning paradigm for time series. Most existing methods manually build specific CL Strategies (CLS) using human heuristics for particular datasets and tasks. However, manually developing CLS usually requires extensive prior knowledge about the data and a large number of experiments to determine the detailed CL configuration. In this paper, we present Automated Contrastive Learning (AutoCL), an Automated Machine Learning (AutoML) practice at Microsoft that automatically learns CLS for time series datasets and tasks. We first construct a principled search space whose size exceeds $3\times10^{12}$, covering data augmentation, embedding transformation, contrastive pair construction, and contrastive losses. We then introduce an efficient reinforcement learning algorithm that optimizes CLS based on performance on validation tasks, in order to obtain effective CLS within this space. Experimental results on various real-world datasets demonstrate that AutoCL can automatically find suitable CLS for a given dataset and task. From the candidate CLS found by AutoCL on several public datasets and tasks, we compose a transferable Generally Good Strategy (GGS) that performs strongly on other datasets. We also provide empirical analysis as a guide for the future design of CLS.
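
To make the idea of a factored strategy space concrete, the sketch below represents a toy CLS space as a dictionary of independent choices. It computes the space's size as the product of the option counts and samples one strategy. The dimensions and options are hypothetical placeholders, far smaller than AutoCL's space. Uniform random sampling stands in for the paper's reinforcement learning search and is not the AutoCL algorithm.

```python
import math
import random

SEARCH_SPACE = {  # hypothetical, heavily reduced dimensions
    "augmentation": ["jitter", "scaling", "masking", "none"],
    "embedding_transform": ["identity", "l2_normalize", "dropout"],
    "pair_construction": ["instance", "temporal", "cross_scale"],
    "loss": ["infonce", "triplet"],
    "temperature": [0.05, 0.1, 0.5, 1.0],
}

def space_size(space):
    """Number of distinct strategies = product of the option counts."""
    return math.prod(len(options) for options in space.values())

def sample_strategy(space, rng):
    """Draw one strategy uniformly (a stand-in for a learned search policy)."""
    return {name: rng.choice(options) for name, options in space.items()}

rng = random.Random(0)
size = space_size(SEARCH_SPACE)          # 4 * 3 * 3 * 2 * 4 = 288
strategy = sample_strategy(SEARCH_SPACE, rng)
```

Because the size is multiplicative, adding a few options per dimension quickly reaches the trillions of combinations mentioned above. This growth is why exhaustive search is impractical.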

Updated: 2024-10-23 11:54:42

Categories: cs.LG

Download: http://arxiv.org/abs/2403.12641v3

A Comprehensive Analysis on the Learning Curve in Kernel Ridge Regression

This paper conducts a comprehensive study of the learning curves of kernel ridge regression (KRR) under minimal assumptions. Our contributions are threefold: 1) we analyze the role of key properties of the kernel, such as its spectral eigen-decay, the characteristics of the eigenfunctions, and the smoothness of the kernel; 2) we demonstrate the validity of the Gaussian Equivalent Property (GEP), which states that the generalization performance of KRR remains the same when the whitened features are replaced by standard Gaussian vectors, thereby shedding light on the success of previous analyses under the Gaussian Design Assumption; 3) we derive novel bounds that improve over existing bounds across a broad range of settings, such as (in)dependent feature vectors and various combinations of eigen-decay rates in the over- and underparameterized regimes.
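
As a concrete reference for the object being analyzed, the sketch below fits KRR in closed form, $\hat f(x) = k(x, X)\,(K + n\lambda I)^{-1} y$, and traces an empirical learning curve of test error versus sample size $n$. The RBF kernel, the regularization $\lambda$, the noise level, and the synthetic target are illustrative assumptions, not the paper's setting.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=0.5):
    """Gaussian (RBF) kernel matrix between the rows of A and B."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-np.maximum(sq, 0) / (2 * lengthscale**2))

def krr_fit_predict(X, y, X_test, lam=1e-3):
    """Closed-form KRR: alpha = (K + n * lam * I)^{-1} y, prediction = k(X_test, X) @ alpha."""
    n = len(X)
    K = rbf_kernel(X, X)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y)
    return rbf_kernel(X_test, X) @ alpha

def target(X):
    return np.sin(3 * X[:, 0])

rng = np.random.default_rng(0)
X_test = rng.uniform(-1, 1, size=(500, 1))
curve = {}
for n in [10, 40, 160]:
    X = rng.uniform(-1, 1, size=(n, 1))
    y = target(X) + 0.1 * rng.standard_normal(n)        # noisy observations
    pred = krr_fit_predict(X, y, X_test)
    curve[n] = np.mean((pred - target(X_test)) ** 2)    # test MSE at sample size n
```

The learning curves studied in the paper describe how quantities like `curve[n]` scale with $n$, as a function of the kernel's eigen-decay and the regularization.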

Updated: 2024-10-23 11:52:52

Categories: cs.LG

Download: http://arxiv.org/abs/2410.17796v1

Enhancing Federated Learning Convergence with Dynamic Data Queue and Data Entropy-driven Participant Selection

Federated Learning (FL) is a decentralized approach for collaborative model training on edge devices. This distributed method of model training offers advantages in privacy, security, regulatory compliance, and cost-efficiency. Our emphasis in this research lies in addressing statistical complexity in FL, especially when the data stored locally across devices is not independent and identically distributed (non-IID). We have observed an accuracy reduction of up to approximately 10% to 30%, particularly in skewed scenarios where each edge device trains with only one class of data. This reduction is attributed to weight divergence, quantified using the Euclidean distance between device-level class distributions and the population distribution, which yields a bias term ($\delta_k$). As a solution, we present Dynamic Data queue-driven Federated Learning (DDFL), a method that improves convergence in FL by creating a global subset of data on the server and dynamically distributing it across devices. Next, we leverage Data Entropy metrics to monitor the process during each training round and enable reasonable device selection for aggregation. Furthermore, we provide a convergence analysis of the proposed DDFL to justify its viability in practical FL scenarios, aiming for better device selection, a global model that is not sub-optimal, and faster convergence. We observe that our approach yields a substantial accuracy boost of approximately 5% for the MNIST dataset, around 18% for CIFAR-10, and 20% for CIFAR-100 with a 10% global subset of data, outperforming state-of-the-art (SOTA) aggregation algorithms.
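
The sketch below computes two quantities the abstract mentions, using small hypothetical label sets. The first is the Shannon entropy of each device's class-label distribution, one plausible form of "data entropy"; the paper's exact metric may differ. The second is the Euclidean distance between a device's class distribution and the population distribution, which the abstract uses to quantify $\delta_k$. Selecting the `k` highest-entropy devices is an illustrative policy assumed here, not necessarily DDFL's rule.

```python
import numpy as np

def class_distribution(labels, num_classes):
    """Empirical class-label distribution of one device."""
    counts = np.bincount(labels, minlength=num_classes)
    return counts / counts.sum()

def entropy(p):
    """Shannon entropy in bits, ignoring empty classes."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def select_devices(device_labels, num_classes, k):
    dists = [class_distribution(labels, num_classes) for labels in device_labels]
    population = class_distribution(np.concatenate(device_labels), num_classes)
    scores = [entropy(p) for p in dists]
    deltas = [float(np.linalg.norm(p - population)) for p in dists]  # delta_k per device
    chosen = sorted(range(len(dists)), key=lambda i: scores[i], reverse=True)[:k]
    return chosen, scores, deltas

device_labels = [
    np.array([0] * 50),                 # single-class (skewed) device
    np.array([0, 1, 2, 3] * 10),        # balanced device
    np.array([1] * 30 + [2] * 10),      # partially skewed device
]
chosen, scores, deltas = select_devices(device_labels, num_classes=4, k=2)
# chosen == [1, 2]; the single-class device has entropy 0 and the largest delta
```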

Updated: 2024-10-23 11:47:04

Categories: cs.LG,cs.AI,cs.CR,14J60 (Primary),I.2.11; I.5.1; I.5.4

Download: http://arxiv.org/abs/2410.17792v1

Large Language Models Engineer Too Many Simple Features For Tabular Data

Tabular machine learning problems often require time-consuming and labor-intensive feature engineering. Recent efforts have focused on using large language models (LLMs) to capitalize on their potential domain knowledge. At the same time, researchers have observed ethically concerning negative biases in other LLM-related use cases, such as text generation. These developments motivated us to investigate whether LLMs exhibit a bias that negatively impacts the performance of feature engineering. While not ethically concerning, such a bias could hinder practitioners from fully utilizing LLMs for automated data science. Therefore, we propose a method to detect potential biases by detecting anomalies in the frequency of operators (e.g., adding two features) suggested by LLMs when engineering new features. Our experiments evaluate the bias of four LLMs, two large frontier models and two small open-source models, across 27 tabular datasets. Our results indicate that LLMs are biased toward simple operators, such as addition, and can fail to utilize more complex operators, such as grouping followed by aggregation. Furthermore, this bias can negatively impact predictive performance when LLM-generated features are used. Our results call for mitigating bias when using LLMs for feature engineering.

Updated: 2024-10-23 11:37:20

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2410.17787v1

A Review of Prominent Paradigms for LLM-Based Agents: Tool Use (Including RAG), Planning, and Feedback Learning

Tool use, planning, and feedback learning are currently three prominent paradigms for developing Large Language Model (LLM)-based agents across various tasks. Although numerous frameworks have been devised for each paradigm, their intricate workflows and inconsistent taxonomies make it difficult to understand and review frameworks across paradigms. This survey introduces a unified taxonomy to systematically review and discuss these frameworks. Specifically, 1) the taxonomy defines environments/tasks, common LLM-profiled roles (LMPRs: policy models, evaluators, and dynamic models), and universally applicable workflows found in prior work; and 2) it enables a comparison of key perspectives on LMPR implementations and workflow designs across different agent paradigms and frameworks. 3) Finally, we identify three limitations in existing workflow designs and systematically discuss future work.

Updated: 2024-10-23 11:36:57

Categories: cs.AI,cs.CL,cs.SE

Download: http://arxiv.org/abs/2406.05804v4

Holon Programming Model -- A Software-Defined Approach for System of Systems

As Systems of Systems (SoS) evolve into increasingly complex networks, harnessing their collective potential becomes paramount. Traditional SoS engineering approaches lack the programmability needed to develop third-party SoS-level behaviors. To address this challenge, we propose a software-defined approach that enables flexible and adaptive programming of SoS. We introduce the Holon Programming Model, a software-defined framework designed to meet these needs. The Holon Programming Model empowers developers to design and orchestrate complex system behaviors effectively, as illustrated in our disaster management scenario. This research outlines the theoretical underpinnings and practical applications of the Holon Programming Model, with the aim of driving further exploration and advancement in the field of software-defined SoS.

Updated: 2024-10-23 11:34:29

Categories: cs.AI,cs.ET,cs.SE

Download: http://arxiv.org/abs/2410.17784v1

Evaluating Explanations Through LLMs: Beyond Traditional User Studies

As AI becomes fundamental in sectors like healthcare, explainable AI (XAI) tools are essential for trust and transparency. However, the traditional user studies used to evaluate these tools are often costly, time-consuming, and difficult to scale. In this paper, we explore using Large Language Models (LLMs) to replicate human participants in order to streamline XAI evaluation. We reproduce a user study comparing counterfactual and causal explanations, simulating the human participants with seven LLMs under various settings. Our results show that (i) LLMs can replicate most conclusions of the original study, (ii) different LLMs yield varying levels of alignment with the original results, and (iii) experimental factors such as LLM memory and output variability affect alignment with human responses. These initial findings suggest that LLMs could provide a scalable and cost-effective way to simplify qualitative XAI evaluation.

Updated: 2024-10-23 11:31:52

Categories: cs.AI

Download: http://arxiv.org/abs/2410.17781v1

Scaling Robot Policy Learning via Zero-Shot Labeling with Foundation Models

A central challenge in developing robots that can relate human language to their perception and actions is the scarcity of natural language annotations in diverse robot datasets. Moreover, robot policies that follow natural language instructions are typically trained on either templated language or expensive human-labeled instructions, hindering their scalability. To this end, we introduce NILS: Natural language Instruction Labeling for Scalability. NILS automatically labels uncurated, long-horizon robot data at scale in a zero-shot manner without any human intervention. NILS combines pretrained vision-language foundation models to detect objects in a scene, detect object-centric changes, segment tasks from large datasets of unlabeled interaction data, and ultimately label behavior datasets. Evaluations on BridgeV2, Fractal, and a kitchen play dataset show that NILS can autonomously annotate diverse robot demonstrations in unlabeled and unstructured datasets. It also alleviates several shortcomings of crowdsourced human annotations, such as low data quality and diversity. We use NILS to label over 115k trajectories obtained from over 430 hours of robot data. We open-source our auto-labeling code and generated annotations on our website: http://robottasklabeling.github.io.

Updated: 2024-10-23 11:19:48

Categories: cs.RO,cs.AI,cs.CV,cs.LG

Download: http://arxiv.org/abs/2410.17772v1

Locating Information in Large Language Models via Random Matrix Theory

As large language models (LLMs) become central to AI applications, gaining a deeper understanding of their inner workings is increasingly important. In this work, we analyze the weight matrices of pretrained transformer models -- specifically BERT and Llama -- using random matrix theory (RMT) as a zero-information hypothesis. While randomly initialized weights perfectly agree with RMT predictions, deviations emerge after training, allowing us to locate learned structures within the models. We identify layer-type specific behaviors that are consistent across all blocks and architectures considered. By pinpointing regions that deviate from RMT predictions, we highlight areas of feature learning and confirm this through comparisons with the activation covariance matrices of the corresponding layers. Our method provides a diagnostic tool for identifying relevant regions in transformer weights using only the trained matrices. Additionally, we address the ongoing debate regarding the significance of small singular values in the context of fine-tuning and alignment in LLMs. Our findings reveal that, after fine-tuning, small singular values play a crucial role in the models' capabilities, suggesting that removing them in an already aligned transformer can be detrimental, as it may compromise model alignment.
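
The RMT baseline can be sketched as follows. For an $N \times M$ matrix with i.i.d. entries of variance $\sigma^2$ and $q = M/N \le 1$, the eigenvalues of $W^\top W / N$ follow the Marchenko-Pastur law, supported on $[\sigma^2(1-\sqrt{q})^2,\ \sigma^2(1+\sqrt{q})^2]$. A randomly initialized matrix stays essentially inside these bounds; eigenvalues outside them are the kind of deviation the paper uses to locate learned structure. The planted rank-one spike standing in for "training" and the 10% tolerance are illustrative assumptions.

```python
import numpy as np

def mp_bounds(n_rows, n_cols, sigma2=1.0):
    """Marchenko-Pastur support edges for W^T W / n_rows (assumes n_cols <= n_rows)."""
    q = n_cols / n_rows
    return sigma2 * (1 - np.sqrt(q)) ** 2, sigma2 * (1 + np.sqrt(q)) ** 2

def outlier_eigs(W, sigma2=1.0, tol=0.1):
    """Eigenvalues of W^T W / n_rows that fall outside the MP bulk (with a tolerance)."""
    n_rows, n_cols = W.shape
    eigs = np.linalg.eigvalsh(W.T @ W / n_rows)
    lo, hi = mp_bounds(n_rows, n_cols, sigma2)
    return eigs[(eigs < lo * (1 - tol)) | (eigs > hi * (1 + tol))]

rng = np.random.default_rng(0)
N, M = 2000, 500
W_random = rng.standard_normal((N, M))                  # "untrained" weights
u, v = rng.standard_normal(N), rng.standard_normal(M)
spike = np.outer(u / np.linalg.norm(u), v / np.linalg.norm(v))
W_spiked = W_random + 3.0 * np.sqrt(N) * spike          # planted low-rank "structure"
```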

Updated: 2024-10-23 11:19:08

Categories: cs.LG,cond-mat.dis-nn

Download: http://arxiv.org/abs/2410.17770v1

GRAMMAR: Grounded and Modular Methodology for Assessment of Closed-Domain Retrieval-Augmented Language Model

Retrieval-Augmented Generation (RAG) systems are widely used across various industries for querying closed-domain and in-house knowledge bases. However, evaluating these systems presents significant challenges due to the private nature of closed-domain data and a scarcity of queries with verifiable ground truths. Moreover, there is a lack of analytical methods to diagnose problematic modules and identify types of failure, such as those caused by knowledge deficits or issues with robustness. To address these challenges, we introduce GRAMMAR (GRounded And Modular Methodology for Assessment of RAG), an evaluation framework comprising a grounded data generation process and an evaluation protocol that effectively pinpoints defective modules. Our validation experiments reveal that GRAMMAR provides a reliable approach for identifying vulnerable modules and supports hypothesis testing for textual form vulnerabilities. An open-source tool accompanying this framework is available in our GitHub repository (see https://github.com/xinzhel/grammar), allowing for easy reproduction of our results and enabling reliable and modular evaluation in closed-domain settings.

Updated: 2024-10-23 11:19:02

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2404.19232v7

Belt and Braces: When Federated Learning Meets Differential Privacy

Federated learning (FL) has great potential for large-scale machine learning (ML) without exposing raw data. Differential privacy (DP) is the de facto standard for privacy protection with provable guarantees. Advances in ML suggest that DP would be a perfect fit for FL, providing comprehensive privacy preservation. Hence, extensive efforts have been devoted to achieving practically usable FL with DP, which nevertheless remains challenging. Practitioners are often not fully aware of its development and categorization, and they also face a hard choice between privacy and utility. This calls for a holistic review of current advances and an investigation of the challenges and opportunities for highly usable FL systems with a DP guarantee. In this article, we first introduce the primary concepts of FL and DP and highlight the benefits of integrating them. We then review current developments by categorizing different paradigms and notions. Aiming at usable FL with DP, we present optimization principles for seeking a better tradeoff between model utility and privacy loss. Finally, we discuss future challenges in emergent areas and relevant research topics.
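
The sketch below shows one common way to combine FL with DP, in the style of DP-FedAvg. The server clips each client's model update to an L2 norm bound `clip_norm`, adds Gaussian noise with standard deviation `noise_multiplier * clip_norm` to the sum, and then averages. The parameter values are illustrative. The privacy accounting that maps them to an $(\varepsilon, \delta)$ guarantee is omitted, and this is only one of the paradigms such a review covers.

```python
import numpy as np

def clip_update(update, clip_norm):
    """Scale an update down so its L2 norm is at most clip_norm."""
    norm = np.linalg.norm(update)
    return update * min(1.0, clip_norm / (norm + 1e-12))

def dp_aggregate(client_updates, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """Noisy average of clipped client updates (Gaussian mechanism on the sum)."""
    if rng is None:
        rng = np.random.default_rng()
    clipped = [clip_update(u, clip_norm) for u in client_updates]
    total = np.sum(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / len(client_updates)

rng = np.random.default_rng(0)
updates = [rng.normal(size=10) * s for s in (0.1, 1.0, 50.0)]  # one outsized client
avg_no_noise = dp_aggregate(updates, clip_norm=1.0, noise_multiplier=0.0, rng=rng)
avg_private = dp_aggregate(updates, clip_norm=1.0, noise_multiplier=1.0, rng=rng)
```

Clipping bounds any single client's influence on the aggregate, which is what makes the added noise sufficient for a DP guarantee. The utility cost comes from both the clipping bias and the noise variance.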

Updated: 2024-10-23 11:17:12

Categories: cs.CR

Download: http://arxiv.org/abs/2404.18814v2

Collaborative AI in Sentiment Analysis: System Architecture, Data Prediction and Deployment Strategies

The advancement of artificial intelligence technologies based on large language models (LLMs) has been a game-changer, particularly in sentiment analysis. This progress has enabled a shift from highly specialized research environments to practical, widespread applications in industry. However, integrating diverse AI models to process complex multimodal data, together with the high cost of feature extraction, presents significant challenges. Motivated by the needs of marketing-oriented software development, our study introduces a collaborative AI framework designed to efficiently distribute and resolve tasks across various AI systems to address these issues. We first elucidate the key solutions derived from our development process, highlighting the role of generative AI models such as ChatGPT and Google Gemini in simplifying intricate sentiment analysis tasks into manageable, phased objectives. Furthermore, we present a detailed case study of our collaborative AI system deployed at the edge and in the cloud, showcasing its effectiveness in analyzing sentiment across diverse online media channels.

Updated: 2024-10-23 11:09:57

Categories: cs.SE,cs.AI,cs.HC

Download: http://arxiv.org/abs/2410.13247v2

Faster Language Models with Better Multi-Token Prediction Using Tensor Decomposition

We propose a new model for multi-token prediction in transformers, aiming to enhance sampling efficiency without compromising accuracy. Motivated by recent work that predicts the probabilities of subsequent tokens using multiple heads, we connect this approach to rank-$1$ canonical tensor decomposition. By generalizing it to a rank-$r$ canonical probability decomposition, we develop an improved model that predicts multiple tokens simultaneously. This model can also be interpreted as a mixture of experts, allowing us to leverage successful techniques from that domain for efficient and robust training. Importantly, the overall overhead for training and sampling remains low. Our method demonstrates significant improvements in inference speed for both text and code generation tasks, proving particularly beneficial within the self-speculative decoding paradigm. It maintains its effectiveness across various model sizes and training epochs, highlighting its robustness and scalability.
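
The sketch below shows a rank-$r$ canonical (CP) decomposition of the joint distribution over the next $n$ tokens, $p(t_1,\dots,t_n) = \sum_{k=1}^{r} w_k \prod_{j=1}^{n} p_k^{(j)}(t_j)$. With $r = 1$, the tokens are predicted independently, one head per position. With $r > 1$, the model behaves like a mixture of experts: sample a component, then sample each token from it. Random logits stand in for the transformer's head outputs, and the function names and shapes are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cp_joint(mix_logits, head_logits):
    """mix_logits: (r,); head_logits: (r, n, vocab). Returns the joint over n tokens."""
    w = softmax(mix_logits)                   # mixture weights w_k
    factors = softmax(head_logits, axis=-1)   # p_k^{(j)}(t_j)
    r, n, vocab = factors.shape
    joint = np.zeros((vocab,) * n)
    for k in range(r):
        outer = factors[k, 0]
        for j in range(1, n):
            outer = np.multiply.outer(outer, factors[k, j])
        joint += w[k] * outer                 # rank-one term for component k
    return joint

def sample_tokens(mix_logits, head_logits, rng):
    """Pick a component, then draw each future token independently from it."""
    k = rng.choice(len(mix_logits), p=softmax(mix_logits))
    factors = softmax(head_logits[k], axis=-1)
    return [int(rng.choice(factors.shape[-1], p=p)) for p in factors]

rng = np.random.default_rng(0)
r, n, vocab = 3, 2, 5
mix_logits, head_logits = rng.normal(size=r), rng.normal(size=(r, n, vocab))
joint = cp_joint(mix_logits, head_logits)
tokens = sample_tokens(mix_logits, head_logits, rng)
```

Sampling never materializes the full `vocab ** n` joint; `cp_joint` is only there to show that the factorization defines a valid distribution.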

Updated: 2024-10-23 11:06:36

Categories: cs.LG

Download: http://arxiv.org/abs/2410.17765v1

Beyond Backpropagation: Optimization with Multi-Tangent Forward Gradients

The gradients used to train neural networks are typically computed using backpropagation. Although backpropagation is an efficient way to obtain exact gradients, it is computationally expensive, hinders parallelization, and is biologically implausible. Forward gradients approximate the gradient from directional derivatives along random tangents, computed by forward-mode automatic differentiation. So far, research has focused on using a single tangent per step. This paper provides an in-depth analysis of multi-tangent forward gradients and introduces an improved approach, based on orthogonal projections, for combining the forward gradients from multiple tangents. We demonstrate that increasing the number of tangents improves both approximation quality and optimization performance across various tasks.
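
The sketch below computes single- and multi-tangent forward gradients on a quadratic test function. With tangents stacked as columns of $V \in \mathbb{R}^{d \times k}$ and directional derivatives $s = V^\top \nabla f$, the plain estimate averages $(\nabla f \cdot v_i)\,v_i$. The orthogonal-projection estimate $V (V^\top V)^{-1} s$ projects $\nabla f$ onto the span of the tangents and recovers it exactly when $k = d$. Central finite differences stand in for forward-mode JVPs here; they are exact for a quadratic up to rounding. This substitution keeps the example dependency-free, and the paper's combination rule may differ in detail.

```python
import numpy as np

def directional_derivative(f, x, v, h=1e-4):
    """Stand-in for a forward-mode JVP: central difference of f along v."""
    return (f(x + h * v) - f(x - h * v)) / (2 * h)

def forward_gradient_mean(f, x, V):
    """Average of single-tangent estimates (grad . v_i) v_i over the columns of V."""
    s = np.array([directional_derivative(f, x, v) for v in V.T])
    return V @ s / V.shape[1]

def forward_gradient_projection(f, x, V):
    """Orthogonal projection of the gradient onto span(V), using only s = V^T grad."""
    s = np.array([directional_derivative(f, x, v) for v in V.T])
    return V @ np.linalg.solve(V.T @ V, s)

rng = np.random.default_rng(0)
d = 20
A = rng.normal(size=(d, d))
A = A @ A.T / d + np.eye(d)                   # symmetric positive definite
b = rng.normal(size=d)

def f(x):
    return 0.5 * x @ A @ x - b @ x

x0 = rng.normal(size=d)
true_grad = A @ x0 - b

def cosine(u, w):
    return float(u @ w / (np.linalg.norm(u) * np.linalg.norm(w)))

cos_by_k = {k: cosine(forward_gradient_projection(f, x0, rng.normal(size=(d, k))), true_grad)
            for k in (1, 5, 20)}
```

The cosine similarity to the true gradient typically grows with the number of tangents `k` and reaches 1 at `k = d`.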

Updated: 2024-10-23 11:02:59

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2410.17764v1

Bayesian Analysis of Combinatorial Gaussian Process Bandits

We consider the combinatorial volatile Gaussian process (GP) semi-bandit problem. In each round, an agent is given a set of available base arms and must select a subset of them to maximize the long-term cumulative reward. We study the Bayesian setting and provide novel Bayesian cumulative regret bounds for three GP-based algorithms: GP-UCB, GP-BayesUCB and GP-TS. Our bounds extend previous results for GP-UCB and GP-TS to the infinite, volatile and combinatorial setting, and to the best of our knowledge, we provide the first regret bound for GP-BayesUCB. Volatile arms encompass other widely considered bandit problems such as contextual bandits. Furthermore, we employ our framework to address the challenging real-world problem of online energy-efficient navigation, where we demonstrate its effectiveness compared with alternative approaches.
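
The sketch below shows one round of a combinatorial GP-UCB semi-bandit. It computes the GP posterior over the base arms available this round (the volatile set) and scores each arm by $\mu(x) + \sqrt{\beta}\,\sigma(x)$. It then selects the $k$ highest-scoring arms, which is the best super arm when the reward is the sum of base-arm rewards under a cardinality constraint. The RBF kernel, noise level, $\beta$, and the top-$k$ constraint are assumptions. The paper's combinatorial constraints, such as paths in navigation, are more general.

```python
import numpy as np

def rbf(A, B, ls=0.3):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ls**2))

def gp_posterior(X_obs, y_obs, X_new, noise=0.1):
    """Posterior mean and std of a zero-mean GP with unit prior variance."""
    K = rbf(X_obs, X_obs) + noise**2 * np.eye(len(X_obs))
    K_s = rbf(X_new, X_obs)
    mean = K_s @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.einsum("ij,ji->i", K_s, np.linalg.solve(K, K_s.T))
    return mean, np.sqrt(np.maximum(var, 0.0))

def gp_ucb_select(X_obs, y_obs, available_arms, k, beta=4.0):
    """Indices of the k available base arms with the highest UCB scores."""
    mean, std = gp_posterior(X_obs, y_obs, available_arms)
    ucb = mean + np.sqrt(beta) * std
    return np.argsort(-ucb)[:k]

rng = np.random.default_rng(0)
X_obs = rng.uniform(0, 1, size=(15, 2))
y_obs = np.sin(4 * X_obs[:, 0]) + 0.1 * rng.standard_normal(15)
available = rng.uniform(0, 1, size=(30, 2))    # this round's (volatile) arm set
chosen = gp_ucb_select(X_obs, y_obs, available, k=5)
```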

Updated: 2024-10-23 11:01:45

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2312.12676v2

Anomaly Resilient Temporal QoS Prediction using Hypergraph Convoluted Transformer Network

Quality-of-Service (QoS) prediction is a critical task in the service lifecycle, enabling precise and adaptive service recommendations by anticipating performance variations over time in response to evolving network uncertainties and user preferences. However, contemporary QoS prediction methods frequently encounter data sparsity and cold-start issues, which hinder accurate QoS predictions and limit the ability to capture diverse user preferences. Additionally, these methods often assume QoS data reliability, neglecting potential credibility issues such as outliers and the presence of greysheep users and services with atypical invocation patterns. Furthermore, traditional approaches fail to leverage diverse features, including domain-specific knowledge and complex higher-order patterns, essential for accurate QoS predictions. In this paper, we introduce a real-time, trust-aware framework for temporal QoS prediction to address the aforementioned challenges, featuring an end-to-end deep architecture called the Hypergraph Convoluted Transformer Network (HCTN). HCTN combines a hypergraph structure with graph convolution over hyper-edges to effectively address high-sparsity issues by capturing complex, high-order correlations. Complementing this, the transformer network utilizes multi-head attention along with parallel 1D convolutional layers and fully connected dense blocks to capture both fine-grained and coarse-grained dynamic patterns. Additionally, our approach includes a sparsity-resilient solution for detecting greysheep users and services, incorporating their unique characteristics to improve prediction accuracy. Trained with a robust loss function resistant to outliers, HCTN demonstrated state-of-the-art performance on the large-scale WSDREAM-2 datasets for response time and throughput.
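
The sketch below implements a single hypergraph convolution layer in the widely used HGNN form $X' = \sigma(D_v^{-1/2} H W D_e^{-1} H^\top D_v^{-1/2} X \Theta)$, where $H$ is the node-hyperedge incidence matrix. In a QoS setting, one might hypothetically let nodes be users and services and let each hyperedge group entities that share an attribute such as region. HCTN's exact hypergraph construction and normalization may differ.

```python
import numpy as np

def hypergraph_conv(X, H, Theta, edge_weights=None):
    """X: (n_nodes, f_in); H: (n_nodes, n_edges) incidence; Theta: (f_in, f_out)."""
    n_edges = H.shape[1]
    w = np.ones(n_edges) if edge_weights is None else edge_weights
    dv = H @ w                          # weighted node degrees
    de = H.sum(axis=0)                  # hyperedge degrees
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(dv))
    De_inv = np.diag(1.0 / de)
    G = Dv_inv_sqrt @ H @ np.diag(w) @ De_inv @ H.T @ Dv_inv_sqrt
    return np.maximum(G @ X @ Theta, 0.0)   # ReLU activation

# 4 users + 3 services = 7 nodes; 3 hypothetical hyperedges (e.g., shared regions)
H = np.array([
    [1, 0, 0], [1, 1, 0], [0, 1, 0], [0, 0, 1],
    [1, 0, 0], [0, 1, 1], [0, 0, 1],
], dtype=float)
rng = np.random.default_rng(0)
X = rng.normal(size=(7, 8))
Theta = rng.normal(size=(8, 4))
out = hypergraph_conv(X, H, Theta)
```

A single hyperedge connects any number of nodes, so one layer mixes information among all users and services in a group at once. This higher-order propagation is what helps when individual user-service observations are sparse.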

Updated: 2024-10-23 11:01:39

Categories: cs.LG

Download: http://arxiv.org/abs/2410.17762v1

Topology meets Machine Learning: An Introduction using the Euler Characteristic Transform

This overview article makes the case for how topological concepts can enrich research in machine learning. Using the Euler Characteristic Transform (ECT), a geometrical-topological invariant, as a running example, I present different use cases that result in more efficient models for analyzing point clouds, graphs, and meshes. Moreover, I outline a vision for how topological concepts could be used in the future, comprising (1) the learning of functions on topological spaces, (2) the building of hybrid models that imbue neural networks with knowledge about the topological information in data, and (3) the analysis of qualitative properties of neural networks. With current research already addressing some of these aspects, this article thus serves as an introduction and invitation to this nascent area of research.
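
To make the ECT concrete for graphs: each unit direction assigns every vertex a height, and the Euler characteristic (vertices minus edges) of each sublevel set traces out a curve; the ECT is the collection of these curves over directions. The sketch below is a minimal NumPy version of that standard definition for graphs embedded in the plane, not the article's implementation. It assumes an edge enters a sublevel set once both endpoints have.

```python
import numpy as np

def euler_characteristic_curve(coords, edges, direction, thresholds):
    """Euler characteristic (#vertices - #edges) of sublevel sets along one direction."""
    heights = coords @ direction
    # An edge appears once both endpoints have appeared, i.e. at its max endpoint height.
    edge_heights = np.array([max(heights[i], heights[j]) for i, j in edges])
    curve = []
    for t in thresholds:
        n_vertices = int(np.sum(heights <= t))
        n_edges = int(np.sum(edge_heights <= t))
        curve.append(n_vertices - n_edges)
    return np.array(curve)

def euler_characteristic_transform(coords, edges, n_directions, thresholds):
    """Stack Euler characteristic curves for evenly spaced directions on the unit circle."""
    angles = np.linspace(0.0, 2.0 * np.pi, n_directions, endpoint=False)
    directions = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    return np.stack([euler_characteristic_curve(coords, edges, d, thresholds)
                     for d in directions])
```

For a square cycle, a partially revealed cycle is a path with Euler characteristic 1, while the complete cycle has characteristic 0; these shape-dependent transitions are what the ECT records.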

Updated: 2024-10-23 10:56:05

Categories: cs.LG,math.AT,55N31, 62R40, 68T09

Download: http://arxiv.org/abs/2410.17760v1

Escaping the Forest: Sparse Interpretable Neural Networks for Tabular Data

Tabular datasets are widely used in scientific disciplines such as biology. Although these disciplines have already adopted AI methods to enhance their findings and analyses, they mainly use tree-based methods because of their interpretability. Artificial neural networks (ANNs), meanwhile, have been shown to offer superior flexibility and depth for rich and complex non-tabular problems, but on tabular data they lag behind tree-based models in both performance and interpretability. Sparsity has been shown to improve the interpretability and performance of ANN models on complex non-tabular datasets, yet how to enforce sparsity structurally and formatively for tabular data before training the model remains an open question. To address this question, we propose a method that infuses sparsity into neural networks by using attention mechanisms to capture feature importance in tabular datasets. We show that our models, Sparse TABular NET (sTAB-Net) with attention mechanisms, are more effective than tree-based models, reaching the state of the art on biological datasets. They also allow insights to be extracted from these datasets and achieve better performance than post-hoc methods such as SHAP.
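
The abstract does not specify how sTAB-Net's attention produces sparsity. One well-known way to get attention weights that are exactly zero for unimportant features is sparsemax, a Euclidean projection onto the probability simplex. The sketch below uses it as a swapped-in illustration with a hypothetical per-sample scoring matrix `W`; it is not the paper's mechanism.

```python
import numpy as np

def sparsemax(z):
    """Project scores z onto the probability simplex; low scores get exactly zero weight."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]
    k = np.arange(1, z.size + 1)
    cumsum = np.cumsum(z_sorted)
    in_support = 1.0 + k * z_sorted > cumsum
    k_max = k[in_support][-1]                   # size of the support
    tau = (cumsum[k_max - 1] - 1.0) / k_max     # threshold subtracted from every score
    return np.maximum(z - tau, 0.0)

def sparse_feature_attention(x, W):
    """Hypothetical feature gate: score each feature of one sample, keep a sparse subset."""
    weights = sparsemax(x @ W)
    return x * weights, weights
```

Unlike softmax, sparsemax returns an exact zero for any feature whose score falls below the threshold `tau`, so the surviving weights can be read directly as a feature-importance ranking.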

Updated: 2024-10-23 10:50:07

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2410.17758v1

On the explainability of quantum neural networks based on variational quantum circuits

Ridge functions are used to describe and study the lower bound of the approximation achieved by neural networks that can be written as linear combinations of activation functions. If the activation functions are themselves ridge functions, such networks are called explainable neural networks. In this brief paper, we first show, using matrix notation, that quantum neural networks based on variational quantum circuits can be written as linear combinations of ridge functions. Consequently, the interpretability and explainability of such quantum neural networks can be considered and studied directly as approximation by linear combinations of ridge functions.
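
A toy case shows why variational circuits give rise to ridge functions. Assuming the input enters a single qubit through an angle rotation RY(w.x + b) applied to |0>, the measured <Z> equals cos(w.x + b), which depends on x only through w.x; a weighted sum of several such readouts is therefore a linear combination of ridge functions. This small simulation is an illustrative assumption, not the paper's matrix-notation derivation.

```python
import numpy as np

PAULI_Z = np.diag([1.0, -1.0])

def ry(theta):
    """Single-qubit rotation about the Y axis."""
    c, s = np.cos(theta / 2.0), np.sin(theta / 2.0)
    return np.array([[c, -s], [s, c]])

def qubit_readout(x, w, b):
    """<Z> after applying RY(w.x + b) to |0>; amplitudes are real, so no conjugation is needed."""
    psi = ry(np.dot(w, x) + b) @ np.array([1.0, 0.0])
    return float(psi @ PAULI_Z @ psi)

def toy_qnn(x, weights, biases, coeffs):
    """Hypothetical model: weighted sum of independent single-qubit readouts (a sum of ridge functions)."""
    return sum(c * qubit_readout(x, w, b) for w, b, c in zip(weights, biases, coeffs))
```

Moving x in any direction orthogonal to w leaves each readout unchanged, which is exactly the defining property of a ridge function.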

Updated: 2024-10-23 10:31:03

Categories: quant-ph,cs.LG

Download: http://arxiv.org/abs/2301.05549v3

VISAGE: Video Synthesis using Action Graphs for Surgery

Surgical data science (SDS) analyzes patient data before, during, and after surgery to improve surgical outcomes and skills. However, surgical data are scarce, heterogeneous, and complex, which limits the applicability of existing machine learning methods. In this work, we introduce the novel task of future video generation in laparoscopic surgery. This task can augment and enrich existing surgical data and enable applications such as simulation, analysis, and robot-aided surgery. It requires not only understanding the current state of the operation but also accurately predicting the dynamic and often unpredictable course of surgical procedures. Our proposed method, VISAGE (VIdeo Synthesis using Action Graphs for Surgery), uses action scene graphs to capture the sequential nature of laparoscopic procedures and diffusion models to synthesize temporally coherent video sequences. VISAGE predicts future frames given only a single initial frame and the action graph triplets. By incorporating domain-specific knowledge through the action graph, VISAGE ensures that the generated videos adhere to the visual and motion patterns observed in real laparoscopic procedures. Our experiments demonstrate high-fidelity video generation for laparoscopic procedures, enabling various applications in SDS.

Updated: 2024-10-23 10:28:17

Categories: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2410.17751v1

Can Uncertainty Quantification Enable Better Learning-based Index Tuning?

Index tuning is crucial for optimizing database performance, as it selects the optimal indexes for a given workload. The key to this process is an accurate and efficient benefit estimator. Traditional methods that rely on what-if tools often suffer from inefficiency and inaccuracy. Learning-based models, in contrast, offer a promising alternative but face challenges such as instability, lack of interpretability, and complex management. To overcome these limitations, we adopt a novel approach: quantifying the uncertainty in the results of learning-based models, thereby combining the strengths of traditional and learning-based methods for reliable index tuning. We propose Beauty, the first uncertainty-aware framework that enhances learning-based models with uncertainty quantification and uses what-if tools as a complementary mechanism to improve reliability and reduce management complexity. Specifically, we introduce a novel method that combines an AutoEncoder with Monte Carlo Dropout to jointly quantify uncertainty, tailored to the characteristics of benefit estimation tasks. In experiments involving sixteen models, our approach outperformed existing uncertainty quantification methods in the majority of cases. We also conducted index tuning tests on six datasets: applying the Beauty framework eliminated worst-case scenarios and more than tripled the occurrence of best-case scenarios.
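
Beauty combines an AutoEncoder with Monte Carlo Dropout; the sketch below illustrates only the MC Dropout half, under simplifying assumptions: a tiny two-layer benefit estimator whose dropout stays active at inference, repeated forward passes whose spread serves as the uncertainty, and a hypothetical `what_if` callable consulted when that spread exceeds a threshold. The weights, threshold, and fallback rule are illustrative, not Beauty's.

```python
import numpy as np

def dropout_forward(x, W1, W2, p_drop, rng):
    """Tiny benefit estimator: ReLU hidden layer with dropout kept active at inference."""
    h = np.maximum(x @ W1, 0.0)
    keep = rng.random(h.shape) >= p_drop        # keep each unit with probability 1 - p_drop
    h = h * keep / (1.0 - p_drop)               # inverted-dropout rescaling
    return h @ W2

def mc_dropout_estimate(x, W1, W2, p_drop=0.2, n_samples=50, seed=0):
    """Mean and standard deviation over repeated stochastic forward passes."""
    rng = np.random.default_rng(seed)
    preds = np.array([dropout_forward(x, W1, W2, p_drop, rng) for _ in range(n_samples)])
    return preds.mean(axis=0), preds.std(axis=0)

def estimate_benefit(x, W1, W2, what_if, max_std, p_drop=0.2):
    """Trust the model when it is confident; otherwise defer to a hypothetical what-if tool."""
    mean, std = mc_dropout_estimate(x, W1, W2, p_drop=p_drop)
    if np.all(std <= max_std):
        return mean, "model"
    return what_if(x), "what-if"
```

The design choice mirrors the abstract's idea: the cheap learned estimator handles confident cases, and the slower but dependable what-if path is used only where the model is uncertain.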

Updated: 2024-10-23 10:23:53

Categories: cs.DB,cs.LG

Download: http://arxiv.org/abs/2410.17748v1

Learning Versatile Skills with Curriculum Masking

Masked prediction has emerged as a promising pretraining paradigm in offline reinforcement learning (RL) because its versatile masking schemes enable flexible inference across various downstream tasks with a unified model. Despite this versatility, it remains unclear how to balance the learning of skills at different levels of complexity. To address this, we propose CurrMask, a curriculum masking pretraining paradigm for sequential decision making. Motivated by how humans learn by organizing knowledge into a curriculum, CurrMask adjusts its masking scheme during pretraining in order to learn versatile skills. Through extensive experiments, we show that CurrMask achieves superior zero-shot performance on skill-prompting and goal-conditioned planning tasks, as well as competitive fine-tuning performance on offline RL tasks. Additionally, our analysis of training dynamics reveals that CurrMask gradually acquires skills of varying complexity by dynamically adjusting its masking scheme.

Updated: 2024-10-23 10:17:13

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2410.17744v1

Emotion Recognition with Facial Attention and Objective Activation Functions

In this paper, we study the effect of adding channel and spatial attention mechanisms, namely SEN-Net, ECA-Net, and CBAM, to existing CNN vision models such as VGGNet, ResNet, and ResNetV2 for the Facial Emotion Recognition task. We show not only that attention can significantly improve the performance of these models, but also that combining it with a different activation function can increase their performance further.
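
Of the attention modules named above, the squeeze-and-excitation (SE) block is the simplest to show. The NumPy sketch below follows the standard SE design (global average pooling, a two-layer bottleneck, and a sigmoid gate that rescales each channel) on a single (C, H, W) feature map. The weight shapes and the bottleneck ratio are assumptions for illustration, not the paper's configuration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def squeeze_excite(feature_map, W1, W2):
    """Squeeze-and-Excitation channel attention for one (C, H, W) feature map.

    W1: (C // r, C) bottleneck weights, W2: (C, C // r) expansion weights (r assumed).
    """
    squeezed = feature_map.mean(axis=(1, 2))               # squeeze: global average pool -> (C,)
    gates = sigmoid(W2 @ np.maximum(W1 @ squeezed, 0.0))   # excitation: FC -> ReLU -> FC -> sigmoid
    return feature_map * gates[:, None, None], gates       # rescale each channel by its gate
```

Each gate lies strictly between 0 and 1, so the block can only reweight channels relative to one another; in a CNN such as ResNet it is typically inserted after a convolutional stage.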

Updated: 2024-10-23 10:14:37

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2410.17740v1

New Insight in Cervical Cancer Diagnosis Using Convolution Neural Network Architecture

The Pap smear is a screening method for early cervical cancer diagnosis. Selecting the right optimizer for a convolutional neural network (CNN) model is key to the CNN's success in image classification, including the classification of cervical cancer Pap smear images. In this study, the stochastic gradient descent (SGD), RMSprop, Adam, AdaGrad, AdaDelta, Adamax, and Nadam optimizers were used to classify cervical cancer Pap smear images from the SipakMed dataset. The CNN architectures used are Resnet-18, Resnet-34, and VGG-16, each with a transfer-learning model. Based on the test results, we conclude that the transfer-learning model performs better across all CNNs and optimization techniques, and that with transfer learning the choice of optimizer has little influence on training. Adamax achieved the best accuracy for the VGG-16 and Resnet-18 architectures, at 72.8% and 66.8%, respectively. With Resnet-34, Adamax reached 54.0%, which is 0.034% lower than Nadam. Overall, Adamax is a suitable optimizer for CNN-based cervical cancer classification with the Resnet-18, Resnet-34, and VGG-16 architectures. This study provides new insights into the configuration of CNN models for Pap smear image analysis.
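
For reference, Adamax (the infinity-norm variant of Adam) replaces Adam's second-moment estimate with an exponentially weighted maximum of absolute gradients. The sketch below follows the update rule described in the original Adam paper, with a small `eps` added to the denominator as some implementations do. The defaults are the commonly cited ones, not the settings used in this study.

```python
import numpy as np

def adamax_step(theta, grad, m, u, t, lr=0.002, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adamax update; t is the 1-based step count used for bias correction."""
    m = beta1 * m + (1.0 - beta1) * grad               # first-moment (momentum) estimate
    u = np.maximum(beta2 * u, np.abs(grad))            # exponentially weighted infinity norm
    theta = theta - (lr / (1.0 - beta1 ** t)) * m / (u + eps)
    return theta, m, u
```

Usage on f(theta) = theta**2, whose gradient is 2 * theta:

```python
theta, m, u = 1.0, 0.0, 0.0
for t in range(1, 501):
    theta, m, u = adamax_step(theta, 2.0 * theta, m, u, t, lr=0.05)
```

Because `u` tracks a running maximum rather than a mean of squares, the effective step size stays bounded by roughly `lr`, which is one reason Adamax is often described as stable.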

Updated: 2024-10-23 10:11:39

Categories: eess.IV,cs.AI,cs.CV

Download: http://arxiv.org/abs/2410.17735v1

Conformal Prediction for Causal Effects of Continuous Treatments

Uncertainty quantification of causal effects is crucial for safety-critical applications such as personalized medicine. A powerful approach for this is conformal prediction, which has several practical benefits due to model-agnostic finite-sample guarantees. Yet, existing methods for conformal prediction of causal effects are limited to binary/discrete treatments and make highly restrictive assumptions such as known propensity scores. In this work, we provide a novel conformal prediction method for potential outcomes of continuous treatments. We account for the additional uncertainty introduced through propensity estimation so that our conformal prediction intervals are valid even if the propensity score is unknown. Our contributions are three-fold: (1) We derive finite-sample prediction intervals for potential outcomes of continuous treatments. (2) We provide an algorithm for calculating the derived intervals. (3) We demonstrate the effectiveness of the conformal prediction intervals in experiments on synthetic and real-world datasets. To the best of our knowledge, we are the first to propose conformal prediction for continuous treatments when the propensity score is unknown and must be estimated from data.
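
For readers new to conformal prediction, the sketch below shows plain split conformal regression. Absolute residuals on a held-out calibration set yield a quantile that widens point predictions into intervals whose marginal coverage is at least 1 - alpha under exchangeability. This is only the baseline idea. The paper's method for continuous treatments additionally accounts for propensity estimation, which this sketch does not attempt.

```python
import numpy as np

def split_conformal_interval(cal_residuals, test_predictions, alpha=0.1):
    """Split conformal prediction intervals from calibration residuals y - f(x)."""
    scores = np.sort(np.abs(np.asarray(cal_residuals, dtype=float)))
    n = scores.size
    k = int(np.ceil((n + 1) * (1 - alpha)))    # rank of the finite-sample-corrected quantile
    q = np.inf if k > n else scores[k - 1]     # too few calibration points -> infinite interval
    preds = np.asarray(test_predictions, dtype=float)
    return preds - q, preds + q
```

The `(n + 1)` correction is what turns an empirical quantile into a finite-sample guarantee; with very small calibration sets and small `alpha`, the interval becomes unbounded rather than overconfident.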

Updated: 2024-10-23 10:09:10

Categories: cs.LG,cs.AI,stat.ME

Download: http://arxiv.org/abs/2407.03094v2

From Keywords to Structured Summaries: Streamlining Scholarly Information Access

This paper highlights the growing importance of information retrieval (IR) engines in the scientific community, where the rising volume of publications makes traditional keyword-based search engines inefficient. The proposed solution uses structured records to underpin advanced information technology (IT) tools, including visualization dashboards, in order to revolutionize how researchers access and filter articles and to replace the traditional text-heavy approach. This vision is illustrated by a proof of concept on the research theme "reproductive number estimate of infectious diseases", in which a fine-tuned large language model (LLM) automates the creation of structured records that populate a backend database going beyond keywords. The result is a next-generation IR-based information access system, available at https://orkg.org/usecases/r0-estimates.

Updated: 2024-10-23 10:06:10

Categories: cs.IR,cs.AI,cs.CL,cs.DL

Download: http://arxiv.org/abs/2402.14622v2

FuzzWiz -- Fuzzing Framework for Efficient Hardware Coverage

The ever-increasing design complexity of Systems-on-Chip (SoCs) has led to significant verification challenges. Unlike software bugs, bugs in a hardware design are permanent: once the hardware is fabricated, it cannot be repaired with a patch. Although dynamic random simulation is one of the most powerful verification techniques, it cannot provide sufficient confidence in complex Register Transfer Level (RTL) designs during the pre-silicon design phase. In particular, achieving coverage targets and exposing bugs with random simulation is a complicated task. In this paper, we take fuzzing, an established testing technique from the software world, and apply it to hardware verification to reach coverage targets quickly. We built FuzzWiz, an automated hardware fuzzing framework, using metamodeling and Python. Its flow parses the RTL design module, converts it into C/C++ models, creates a generic testbench with assertions, and then performs fuzzer-specific compilation, linking, and fuzzing. The framework is configurable and provides a debug flow if any crash is detected during fuzzing. We apply it to four IP blocks from Google's OpenTitan chip with various fuzzing engines to show its scalability and compatibility. Our benchmarking results show that it achieves around 90% coverage 10 times faster than a traditional simulation-regression-based approach.
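
FuzzWiz compiles C/C++ models of the RTL and hands them to existing fuzzing engines. The core idea those engines share, coverage-guided mutation, can be sketched in pure Python: mutate inputs drawn from a corpus and keep only those that reach new coverage points. `toy_design` is a hypothetical stand-in for a generated model, and its string branch IDs stand in for real coverage counters; none of this is FuzzWiz's actual pipeline.

```python
import random

def toy_design(data, coverage):
    """Hypothetical stand-in for a C/C++ model of an RTL block; records the branch IDs it reaches."""
    coverage.add("entry")
    if len(data) > 0 and data[0] == 0xA5:
        coverage.add("magic")
        if len(data) > 1 and data[1] & 0x80:
            coverage.add("mode_hi")
        else:
            coverage.add("mode_lo")
    else:
        coverage.add("idle")

def mutate(data, rng):
    """Either overwrite one random byte or append a random byte."""
    buf = bytearray(data)
    if buf and rng.random() < 0.5:
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    else:
        buf.append(rng.randrange(256))
    return bytes(buf)

def fuzz(target, seeds, iterations, seed=0):
    """Coverage-guided loop: keep a mutated input only if it reaches new coverage."""
    rng = random.Random(seed)
    corpus = list(seeds)
    total = set()
    for s in corpus:
        target(s, total)
    for _ in range(iterations):
        candidate = mutate(rng.choice(corpus), rng)
        coverage = set()
        target(candidate, coverage)
        if not coverage <= total:          # new branch reached: save the input for further mutation
            total |= coverage
            corpus.append(candidate)
    return total, corpus
```

Keeping only coverage-increasing inputs is what lets a fuzzer climb past "magic value" checks that pure random stimulus rarely satisfies twice in a row.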

Updated: 2024-10-23 10:06:08

Categories: cs.AR,cs.AI,cs.SE

Download: http://arxiv.org/abs/2410.17732v1

Time-to-Lie: Identifying Industrial Control System Honeypots Using the Internet Control Message Protocol

The convergence of information technology and operational technology networks has created previously unforeseen security issues. To address these issues, researchers and practitioners have integrated threat intelligence methods into the security operations of converged networks; among the most valuable tools are honeypots that imitate industrial control systems (ICS). However, developing and deploying such honeypots is a process full of pitfalls, which can lead to undiagnosed weaknesses in the threat intelligence being gathered. This paper presents a side-channel method for covertly identifying ICS honeypots using the time-to-live (TTL) values of target devices. We show that many ICS honeypots can be readily identified through minimal interaction, using only basic networking tools. In a study of over 8,000 devices presenting as ICS, we compare our method with an existing honeypot detection approach and outline what our methodology reveals about the current population of live ICS honeypots. By demonstrating our method, this study aims to raise awareness of the viability of the TTL heuristic and of how prevalent TTL misconfiguration remains, even though the heuristic is already described in the literature.
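
The TTL heuristic can be sketched under assumptions not stated in the abstract. Network stacks start from a small set of default initial TTLs (commonly 64 on Linux and 128 on Windows, with 255 used by many network devices), so the initial value behind an observed TTL can be inferred as the nearest default at or above it. A reply whose inferred default differs from what the claimed ICS device would use hints that the responder may be a honeypot running on commodity hardware. The per-device expected value is a hypothetical input here.

```python
COMMON_INITIAL_TTLS = (32, 64, 128, 255)   # assumed set of default initial TTLs

def infer_initial_ttl(observed_ttl):
    """Smallest common default initial TTL that is >= the observed TTL."""
    for initial in COMMON_INITIAL_TTLS:
        if observed_ttl <= initial:
            return initial
    raise ValueError("an IPv4 TTL must be at most 255")

def ttl_mismatch(observed_ttl, expected_initial_ttl):
    """Return (mismatch, hop_estimate) for one reply, e.g. an ICMP echo reply.

    expected_initial_ttl is a hypothetical per-device value for the claimed ICS product.
    """
    initial = infer_initial_ttl(observed_ttl)
    return initial != expected_initial_ttl, initial - observed_ttl
```

The hop estimate is only meaningful when the path is shorter than the gap between adjacent defaults; a very long path could make a 128-start reply look like a 64-start one.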

Updated: 2024-10-23 10:06:02

Categories: cs.CR,cs.SY,eess.SY

Download: http://arxiv.org/abs/2410.17731v1

OWL2Vec4OA: Tailoring Knowledge Graph Embeddings for Ontology Alignment

Ontology alignment is integral to achieving semantic interoperability, as the number of available ontologies covering overlapping domains keeps growing. This paper proposes OWL2Vec4OA, an extension of the ontology embedding system OWL2Vec*. While OWL2Vec* has emerged as a powerful technique for ontology embedding, it currently lacks a mechanism to tailor the embedding to the ontology alignment task. OWL2Vec4OA incorporates edge confidence values from seed mappings to guide the random walk strategy. We present the theoretical foundations, implementation details, and experimental evaluation of our proposed extension, demonstrating its potential effectiveness for ontology alignment tasks.
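
OWL2Vec* builds its embeddings from random walks over the ontology graph. The abstract says OWL2Vec4OA uses seed-mapping confidences to guide those walks. One plausible reading, assumed here, is that each next hop is chosen with probability proportional to the edge weight, with ordinary ontology edges weighted 1.0 and cross-ontology mapping edges weighted by their confidence. The adjacency format is hypothetical.

```python
import random

def weighted_random_walk(graph, start, length, rng):
    """Random walk where the next hop is chosen with probability proportional to edge weight.

    graph: hypothetical adjacency dict, node -> list of (neighbor, weight), e.g. weight =
    seed-mapping confidence for cross-ontology edges and 1.0 for ordinary edges.
    """
    walk = [start]
    node = start
    for _ in range(length - 1):
        neighbors = graph.get(node, [])
        if not neighbors:
            break                               # dead end: stop the walk early
        nodes, weights = zip(*neighbors)
        node = rng.choices(nodes, weights=weights, k=1)[0]
        walk.append(node)
    return walk
```

The resulting walks would then be fed to a word2vec-style model, as OWL2Vec* does, so that frequently co-visited entities across the two ontologies end up with nearby embeddings.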

Updated: 2024-10-23 09:59:15

Categories: cs.AI

Download: http://arxiv.org/abs/2408.06310v2

Over-the-Air Federated Learning in Cell-Free MIMO with Long-term Power Constraint

Wireless networks that support artificial intelligence have gained significant attention, and Over-the-Air Federated Learning has emerged as a key application because of its distinctive transmission and distributed-computing characteristics. This paper derives error bounds for Over-the-Air Federated Learning in a Cell-free MIMO system and formulates an optimization problem that minimizes the optimality gap by jointly optimizing power control and beamforming. We introduce the MOP-LOFPC algorithm, which uses Lyapunov optimization to decouple the long-term constraints across rounds while requiring only causal channel state information. Experimental results show that, compared with existing baselines, MOP-LOFPC achieves a better and more flexible trade-off between the model's training loss and adherence to the long-term power constraints.
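
The abstract gives no formulas for MOP-LOFPC, but the standard Lyapunov drift-plus-penalty template shows how a long-term average power budget can be decoupled across rounds. A virtual queue accumulates any excess over the budget, and each round's decision trades a penalty against the current backlog using only information available in that round. The penalty function is a hypothetical stand-in for a training-error term, and beamforming is omitted.

```python
def lyapunov_power_control(candidates, penalty, p_avg, V, rounds):
    """Drift-plus-penalty power control under a long-term average power budget p_avg.

    penalty(t, p) is a hypothetical per-round cost (e.g. a training-error proxy);
    V trades that cost against constraint violation.
    """
    queue = 0.0          # virtual queue: accumulated excess power
    chosen = []
    for t in range(rounds):
        # Causal decision: uses only the current round's penalty and queue backlog.
        p = min(candidates, key=lambda c: V * penalty(t, c) + queue * c)
        chosen.append(p)
        queue = max(queue + p - p_avg, 0.0)
    return chosen, queue
```

Because the queue satisfies Q(T) >= sum over t of (p_t - p_avg), the time-average power exceeds the budget by at most Q(T)/T, so keeping the queue bounded enforces the long-term constraint; a larger V favours the penalty term at the cost of a larger backlog.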

Updated: 2024-10-23 09:51:11

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2410.05354v3

A Type System to Ensure Non-Interference in ReScript

Preventing confidential data from leaking is a critical challenge in computer systems, particularly given the growing number of observers on the internet. Limiting information flow with robust security policies is therefore increasingly vital. We focus on the non-interference policy, whose goal is to ensure that confidential data cannot influence public data. This paper presents a type system for a subset of ReScript syntax that is designed to enforce non-interference. We conclude with a soundness proof for the type system, demonstrating that every typable expression is non-interferent. In addition, we give a brief overview of a type checker that implements this type system.
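
The paper's ReScript typing rules are not reproduced in the abstract. The Python sketch below shows the general shape of a security type system for non-interference over a hypothetical tuple-encoded expression language. Labels form the two-point lattice LOW <= HIGH, an expression's label is the join of its parts, a conditional also carries its guard's label (implicit flow), and an assignment is rejected if a HIGH expression would flow into a LOW target.

```python
LOW, HIGH = 0, 1   # two-point security lattice: LOW <= HIGH

def join(a, b):
    """Least upper bound of two labels."""
    return max(a, b)

def label_of(expr, env):
    """Infer the label of a hypothetical tuple-encoded expression; env maps variables to labels."""
    kind = expr[0]
    if kind == "lit":
        return LOW
    if kind == "var":
        return env[expr[1]]
    if kind == "add":
        return join(label_of(expr[1], env), label_of(expr[2], env))
    if kind == "if":
        # Implicit flow: which branch is taken depends on the guard, so the result carries its label.
        guard, then_e, else_e = expr[1], expr[2], expr[3]
        return join(label_of(guard, env), join(label_of(then_e, env), label_of(else_e, env)))
    raise ValueError(f"unknown expression kind: {kind}")

def can_assign(target_label, expr, env):
    """Allow the flow only if the expression's label is at or below the target's (no HIGH -> LOW)."""
    return label_of(expr, env) <= target_label
```

The `if` rule is the interesting one: even though both branches are public literals, branching on a secret reveals it, so the whole expression must be treated as secret.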

Updated: 2024-10-23 09:47:10

Categories: cs.CR,cs.PL

Download: http://arxiv.org/abs/2410.18157v1

Improve Value Estimation of Q Function and Reshape Reward with Monte Carlo Tree Search

Reinforcement learning has achieved remarkable success in perfect-information games such as Go and Atari, enabling agents to compete against human players at the highest levels. However, research on reinforcement learning for imperfect-information games has been relatively limited because of their more complex game structures and randomness. Traditional methods struggle to train and improve in imperfect-information games because of issues such as inaccurate Q-value estimation and reward sparsity. In this paper, we focus on Uno, an imperfect-information game, and address these problems by reducing Q-value overestimation and reshaping the reward function. We propose a novel algorithm that uses Monte Carlo Tree Search to average the value estimates in the Q function. Although we use Double Deep Q-Learning as the foundational framework in this paper, our method generalizes to any algorithm that requires Q-value estimation, such as Actor-Critic methods. We additionally use Monte Carlo Tree Search to reshape the reward structure of the game environment. We compare our algorithm with several traditional game-playing methods, such as Double Deep Q-Learning, Deep Monte Carlo, and Neural Fictitious Self-Play, and the experiments show that it consistently outperforms them, especially as the number of players in Uno, and hence the difficulty of the game, increases.
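
For reference, the Double DQN target that the paper builds on uses the online network to select the next action and the target network to evaluate it, which reduces overestimation. The abstract does not detail how the MCTS estimate is averaged in, so `blend_with_search` and its `weight` parameter are hypothetical.

```python
import numpy as np

def double_dqn_targets(rewards, dones, q_online_next, q_target_next, gamma=0.99):
    """Double DQN targets: the online net picks argmax actions, the target net scores them."""
    best_actions = np.argmax(q_online_next, axis=1)
    next_values = q_target_next[np.arange(len(best_actions)), best_actions]
    return rewards + gamma * (1.0 - dones) * next_values   # no bootstrap on terminal states

def blend_with_search(td_targets, search_values, weight=0.5):
    """Hypothetical: average the TD target with a search-based (e.g. MCTS) value estimate."""
    return (1.0 - weight) * td_targets + weight * search_values
```

Decoupling selection from evaluation means a single network's noisy over-estimate is not both chosen and trusted, which is the overestimation the abstract targets.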

Updated: 2024-10-23 09:43:03

Categories: cs.LG,cs.AI,cs.MA

Download: http://arxiv.org/abs/2410.11642v2

Learning to Plan for Retrieval-Augmented Large Language Models from Knowledge Graphs

Improving the performance of large language models (LLMs) in complex question-answering (QA) scenarios has long been a research focus. Recent studies have attempted to enhance LLM performance by combining step-wise planning with external retrieval. While this is effective for advanced models such as GPT-3.5, smaller LLMs struggle to decompose complex questions, which necessitates supervised fine-tuning. Previous work has relied on manual annotation and on knowledge distillation from teacher LLMs, both of which are time-consuming and not sufficiently accurate. In this paper, we introduce a novel framework that enhances the planning capabilities of LLMs using planning data derived from knowledge graphs (KGs). LLMs fine-tuned on this data show improved planning capabilities and are better equipped to handle complex QA tasks that involve retrieval. Evaluations on multiple datasets, including our newly proposed benchmark, highlight the effectiveness of our framework and the benefits of KG-derived planning data.

Updated: 2024-10-23 09:42:59

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.14282v3

DIP-Watermark: A Double Identity Protection Method Based on Robust Adversarial Watermark

The wide deployment of Face Recognition (FR) systems poses privacy risks. One countermeasure is adversarial attacks that deceive unauthorized, malicious FR systems; however, such attacks also disrupt the regular identity verification performed by trusted authorizers, exacerbating the potential threat of identity impersonation. To address this, we propose DIP-Watermark, the first double identity protection scheme based on traceable adversarial watermarking. DIP-Watermark uses a one-time watermark embedding to deceive unauthorized FR models while allowing authorizers to verify identity by extracting the watermark. Specifically, we propose an information-guided adversarial attack against FR models: the encoder embeds an identity-specific watermark into the deep feature space of the carrier, guiding the recognizable features of the image away from the source identity. We further adopt a collaborative meta-optimization strategy compatible with the sub-tasks, which regularizes the joint optimization direction of the encoder and decoder. This strategy strengthens the representation of universal carrier features and mitigates multi-objective optimization conflicts in watermarking. Experiments confirm that DIP-Watermark achieves high attack success rates and traceability accuracy on state-of-the-art FR models, with robustness that surpasses existing privacy protection methods based on adversarial attacks, deep watermarking, or simple combinations of the two. Our work may open up new directions for proactive protection of FR privacy.

Updated: 2024-10-23 09:42:29

Categories: cs.CR,cs.CV,eess.IV

Download: http://arxiv.org/abs/2404.14693v2

Continual Learning on a Data Diet

Continual Learning (CL) methods usually learn from all available data. Human cognition, in contrast, efficiently focuses on key experiences while disregarding redundant information. Similarly, not all data points in a dataset have equal potential; some are more informative than others. This disparity may significantly affect performance, because both the quality and the quantity of samples directly influence a model's generalizability and efficiency. Drawing inspiration from this, we explore the potential of learning from important samples and present an empirical study of coreset selection techniques in the context of CL, to stimulate research in this unexplored area. We train different continual learners on increasing amounts of selected samples and investigate their learning-forgetting dynamics, shedding light on the mechanisms behind their improved stability-plasticity balance. We report several significant observations: learning from selectively chosen samples (i) enhances incremental accuracy, (ii) improves knowledge retention of previous tasks, and (iii) refines learned representations. This analysis contributes to a deeper understanding of selective learning strategies in CL scenarios.
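
The abstract does not list which coreset selection techniques were evaluated. As one well-known example, greedy k-center selection picks samples that cover feature space well by repeatedly adding the point farthest from everything chosen so far. The sketch assumes features are given as a NumPy array, for instance embeddings from the current model; it is an illustration, not the study's setup.

```python
import numpy as np

def k_center_greedy(features, budget, first=0):
    """Greedy k-center coreset: repeatedly add the point farthest from the current selection."""
    budget = min(budget, len(features))
    selected = [first]
    # Distance from every point to its nearest selected point.
    dist = np.linalg.norm(features - features[first], axis=1)
    while len(selected) < budget:
        nxt = int(np.argmax(dist))
        selected.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(features - features[nxt], axis=1))
    return selected
```

Near-duplicate samples are skipped until everything else is covered, which matches the intuition in the abstract that redundant data points add little.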

Updated: 2024-10-23 09:42:17

Categories: cs.LG,cs.CV

Download: http://arxiv.org/abs/2410.17715v1

Using Stochastic Gradient Descent to Smooth Nonconvex Functions: Analysis of Implicit Graduated Optimization

The graduated optimization approach is a heuristic method for finding globally optimal solutions of nonconvex functions by smoothing the function with stochastic noise. We show that the stochastic noise in stochastic gradient descent (SGD) has the effect of smoothing the objective function, to a degree determined by the learning rate, the batch size, and the variance of the stochastic gradient. Using this finding, we propose and analyze a new graduated optimization algorithm that varies the degree of smoothing by varying the learning rate and batch size, and we provide experimental results on image classification tasks with ResNets that support our theoretical findings. We further show an interesting correlation between the degree of smoothing by SGD's stochastic noise, the well-studied "sharpness" indicator, and the generalization performance of the model.
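
Graduated optimization works with a smoothed function f_delta(x) = E[f(x - delta * u)], u ~ N(0, 1). The sketch below evaluates this expectation with Gauss-Hermite quadrature for a wiggly one-dimensional function and counts local minima before and after smoothing. The `smoothing_degree` helper encodes an assumed scaling, delta proportional to lr * sigma / sqrt(batch_size), chosen to match the abstract's statement that the degree depends on learning rate, batch size, and gradient variance; the paper's exact expression is not given here.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def gaussian_smooth(f, xs, delta, n_nodes=40):
    """f_delta(x) = E_{u ~ N(0,1)}[f(x - delta * u)] via Gauss-Hermite quadrature."""
    nodes, weights = hermegauss(n_nodes)        # quadrature for weight function exp(-u^2 / 2)
    weights = weights / weights.sum()           # normalize to a probability measure
    return np.array([np.dot(weights, f(x - delta * nodes)) for x in xs])

def count_local_minima(values):
    """Number of strict interior local minima of a sampled 1D function."""
    inner = values[1:-1]
    return int(np.sum((inner < values[:-2]) & (inner < values[2:])))

def smoothing_degree(lr, batch_size, grad_std):
    """Assumed scaling of SGD's implicit smoothing: grows with lr, shrinks with sqrt(batch size)."""
    return lr * grad_std / np.sqrt(batch_size)

f = lambda x: x ** 2 + np.cos(10.0 * x)
xs = np.linspace(-2.0, 2.0, 2001)
```

Under this assumed scaling, a graduated schedule would start with a large learning rate or small batch (heavy smoothing) and then decay the rate or grow the batch, so the optimizer tracks a sequence of progressively less smoothed objectives.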

Updated: 2024-10-23 09:40:44

Fields: cs.LG,math.OC

Download: http://arxiv.org/abs/2311.08745v5

CogSteer: Cognition-Inspired Selective Layer Intervention for Efficient Semantic Steering in Large Language Models

Despite their impressive capabilities, large language models (LLMs) often lack interpretability and can generate toxic content. While using LLMs as foundation models and applying semantic steering methods are widely practiced, we believe that efficient methods should be based on a thorough understanding of LLM behavior. To this end, we propose using eye movement measures to interpret LLM behavior across layers. We find that LLMs exhibit patterns similar to human gaze across layers and that different layers function differently. Inspired by these findings, we introduce a heuristic steering layer selection and apply it to layer intervention methods via fine-tuning and inference. Using language toxification and detoxification as testbeds, we demonstrate that our proposed CogSteer methods achieve better toxicity scores while saving 97% of the computational resources and 60% of the training time. Our model-agnostic approach can be adopted in various LLMs, contributing to their interpretability and promoting trustworthiness for safe deployment.

Updated: 2024-10-23 09:40:15

Fields: cs.CL,cs.AI

Download: http://arxiv.org/abs/2410.17714v1

A Data-Driven Odyssey in Solar Vehicles

Solar vehicles, which simultaneously produce and consume energy, require meticulous energy management. However, potential users often feel uncertain about their operation compared to conventional vehicles. This study presents a simulator designed to help users understand long-distance travel in solar vehicles and recognize the importance of proper energy management. By utilizing Google Maps data and weather information, the simulator replicates real-world driving conditions and provides a dashboard displaying vehicle status, updated hourly based on user-inputted speed. Users can explore various speed policy scenarios and receive recommendations for optimal driving strategies. The simulator's effectiveness was validated using the route of the World Solar Challenge (WSC). This research enables users to monitor energy dynamics before a journey, enhancing their understanding of energy management and informing appropriate speed decisions.
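
The core of such a simulator is an hour-by-hour energy balance: solar input from the weather data is added, drive power for the user's chosen speed is subtracted, and the battery state is clamped to its capacity. The sketch below assumes a placeholder power model (a rolling term linear in speed plus an aerodynamic term cubic in speed) and made-up battery and solar numbers; the actual simulator also uses route data from Google Maps, which is not modeled here.

```python
from dataclasses import dataclass

@dataclass
class HourlyStatus:
    hour: int
    distance_km: float
    battery_kwh: float

def power_demand_kw(speed_kmh, rolling=0.01, aero=2e-6):
    """Hypothetical drive power: rolling term linear in speed plus an
    aerodynamic term cubic in speed (coefficients are placeholders)."""
    return rolling * speed_kmh + aero * speed_kmh ** 3

def simulate_trip(speeds_kmh, solar_kw, capacity_kwh=5.0, start_kwh=5.0):
    """Hour-by-hour energy balance for a user-chosen speed policy.

    speeds_kmh[i]: speed held during hour i (user input).
    solar_kw[i]:   mean solar power during hour i, e.g. from a weather
                   forecast; 1 kW sustained for 1 h yields 1 kWh.
    Stops early if the battery would be exhausted.
    """
    battery, distance, log = start_kwh, 0.0, []
    for hour, (speed, sun) in enumerate(zip(speeds_kmh, solar_kw)):
        battery = min(capacity_kwh, battery + sun - power_demand_kw(speed))
        if battery < 0:
            break
        distance += speed
        log.append(HourlyStatus(hour, distance, battery))
    return log

# Compare two speed policies over the same (made-up) sunny day.
sun = [0.2, 0.6, 1.0, 1.2, 1.2, 1.0, 0.6, 0.2]
for policy in ([60] * 8, [90] * 8):
    log = simulate_trip(policy, sun)
    print(policy[0], "km/h:", len(log), "h driven,", log[-1].distance_km if log else 0, "km")
```

Because drive power grows faster than linearly with speed in this model, a faster policy can cover less total distance once the battery runs out, which is the kind of trade-off the dashboard is meant to expose.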

Updated: 2024-10-23 09:39:26

Fields: cs.AI

Download: http://arxiv.org/abs/2410.17712v1

Beware of Calibration Data for Pruning Large Language Models

As large language models (LLMs) are widely applied across various fields, model compression has become increasingly crucial for reducing costs and improving inference efficiency. Post-training pruning is a promising method that does not require resource-intensive iterative training and needs only a small amount of calibration data to assess the importance of parameters. Previous research has primarily focused on designing advanced pruning methods, while the impact of different calibration data on pruning performance still lacks systematic exploration. We fill this gap and, surprisingly, observe that the choice of calibration data can matter even more than the design of advanced pruning strategies, especially at high sparsity. Our preliminary exploration also shows that using calibration data similar to the training data can yield better performance. As pre-training data is usually inaccessible for advanced LLMs, we further provide a self-generating calibration data synthesis strategy to construct feasible calibration data. We conduct experiments on recent strong open-source LLMs (e.g., DCLM and LLaMA-3), and the results show that the proposed method outperforms commonly used calibration data and can effectively enhance strong pruning methods (e.g., Wanda, OWL).
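
Why calibration data matters is easy to see in Wanda, one of the pruning methods named above: each weight is scored by |W_ij| times the L2 norm of input feature j over the calibration activations, and the lowest-scoring weights in each output row are removed. The pure-Python sketch below is a simplified illustration (the real implementation works on tensors per layer); the weights and the two calibration sets are toy numbers, not data from the paper.

```python
import math

def wanda_prune(weight, calib_inputs, sparsity):
    """Wanda-style pruning of one linear layer (simplified sketch).

    weight:       list of rows; weight[i][j] maps input feature j to output i.
    calib_inputs: calibration activation vectors, one per token.
    sparsity:     fraction of weights to zero in every output row.
    Score: |W_ij| * ||X_j||_2, with ||X_j||_2 the L2 norm of input
    feature j across the calibration tokens.
    """
    n_in = len(weight[0])
    col_norms = [math.sqrt(sum(x[j] ** 2 for x in calib_inputs)) for j in range(n_in)]
    k = int(sparsity * n_in)  # weights removed per output row
    pruned = []
    for row in weight:
        scores = [abs(w) * col_norms[j] for j, w in enumerate(row)]
        drop = set(sorted(range(n_in), key=lambda j: scores[j])[:k])
        pruned.append([0.0 if j in drop else w for j, w in enumerate(row)])
    return pruned

W = [[0.5, -0.4, 0.1, 0.3]]
calib_a = [[1.0, 0.1, 2.0, 0.1]]  # one toy calibration set
calib_b = [[0.1, 1.0, 0.1, 2.0]]  # a different toy calibration set
print(wanda_prune(W, calib_a, 0.5))
print(wanda_prune(W, calib_b, 0.5))
```

With the same weights and the same sparsity, the two calibration sets keep disjoint sets of weights, which is the sensitivity the paper studies at scale.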

Updated: 2024-10-23 09:36:21

Fields: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2410.17711v1

Do causal predictors generalize better to new domains?

We study how well machine learning models trained on causal features generalize across domains. We consider 16 prediction tasks on tabular datasets covering applications in health, employment, education, social benefits, and politics. Each dataset comes with multiple domains, allowing us to test how well a model trained in one domain performs in another. For each prediction task, we select features that have a causal influence on the target of prediction. Our goal is to test the hypothesis that models trained on causal features generalize better across domains. Without exception, we find that predictors using all available features, regardless of causality, have better in-domain and out-of-domain accuracy than predictors using causal features. Moreover, even the absolute drop in accuracy from one domain to the other is no better for causal predictors than for models that use all features. In addition, we show that recent causal machine learning methods for domain generalization do not perform better in our evaluation than standard predictors trained on the set of causal features. Likewise, causal discovery algorithms either fail to run or select causal variables that perform no better than our selection. Extensive robustness checks confirm that our findings are stable under variable misclassification.
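
The comparison described above reduces, for each predictor and pair of domains, to three numbers: in-domain accuracy, out-of-domain accuracy, and the absolute drop between them. A minimal sketch of that bookkeeping follows; the labels and predictions are toy values, not the paper's datasets or results.

```python
def accuracy(y_true, y_pred):
    """Fraction of exact matches."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def domain_report(y_in, pred_in, y_out, pred_out):
    """In-domain accuracy, out-of-domain accuracy and the absolute drop."""
    acc_in, acc_out = accuracy(y_in, pred_in), accuracy(y_out, pred_out)
    return {"in_domain": acc_in, "out_of_domain": acc_out, "drop": acc_in - acc_out}

# Toy labels and predictions for one predictor trained on the source domain.
y_src, y_tgt = [1, 0, 1, 1], [1, 0, 0, 1]
report = domain_report(y_src, [1, 0, 1, 0], y_tgt, [1, 1, 0, 0])
print(report)
```

The hypothesis under test then asks whether the causal-feature predictor has a higher `out_of_domain` value, or a smaller `drop`, than the predictor using all features.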

Updated: 2024-10-23 09:29:39

Fields: cs.LG,stat.ML

Download: http://arxiv.org/abs/2402.09891v2

Explainable Hierarchical Urban Representation Learning for Commuting Flow Prediction

Commuting flow prediction is an essential task for real-world municipal operations. Previous studies have revealed that it is feasible to estimate commuting origin-destination (OD) demand within a city using multiple auxiliary data sources. However, most existing methods are not suited to similar tasks at a larger scale, such as within a prefecture or across an entire nation, owing to the increased number of geographical units that must be maintained. In addition, region representation learning is a universal approach for gaining urban knowledge for diverse metropolitan downstream tasks. Although many researchers have developed comprehensive frameworks to describe urban units from multi-source data, they have not clarified the relationships between the selected geographical elements. Furthermore, metropolitan areas naturally exhibit hierarchical structures, such as cities and the districts they contain, which makes it necessary to elucidate the relations between urban units across levels. Therefore, we develop a heterogeneous graph-based model to generate meaningful region embeddings at multiple spatial resolutions for predicting different types of inter-level OD flows. To demonstrate the effectiveness of the proposed method, extensive experiments were conducted using real-world aggregated mobile phone datasets collected from Shizuoka Prefecture, Japan. The results indicate that our proposed model outperforms existing models in terms of a uniform urban structure. We further interpret the predicted results with reasonable explanations to enhance the credibility of the model.
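
What "inter-level" OD flows look like can be shown on a toy two-level hierarchy of districts inside cities: fine-level flows can be rolled up to city-to-city flows, or only the destination can be coarsened to get district-to-city flows. The unit names and counts below are hypothetical, and the paper predicts such flows with a heterogeneous graph model rather than aggregating a known fine-level matrix; the sketch only illustrates the target quantities.

```python
from collections import defaultdict

def aggregate_od(od_flows, parent):
    """Roll fine-level OD flows up to the coarse level.

    od_flows: dict (origin, destination) -> commuter count at the fine level.
    parent:   dict fine unit -> coarse unit (e.g. district -> city).
    Intra-city commuting shows up as (city, city).
    """
    coarse = defaultdict(int)
    for (o, d), n in od_flows.items():
        coarse[(parent[o], parent[d])] += n
    return dict(coarse)

def inter_level_flows(od_flows, parent):
    """District-to-city flows: origin kept fine, destination coarsened."""
    mixed = defaultdict(int)
    for (o, d), n in od_flows.items():
        mixed[(o, parent[d])] += n
    return dict(mixed)

parent = {"d1": "A", "d2": "A", "d3": "B"}  # hypothetical districts and cities
flows = {("d1", "d2"): 10, ("d1", "d3"): 5, ("d2", "d3"): 7, ("d3", "d1"): 2}
print(aggregate_od(flows, parent))
print(inter_level_flows(flows, parent))
```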

Updated: 2024-10-23 09:25:58

Fields: cs.LG,cs.SI

Download: http://arxiv.org/abs/2408.14762v4

Scalable Random Feature Latent Variable Models

Random feature latent variable models (RFLVMs) represent the state of the art in latent variable models: they can handle non-Gaussian likelihoods and effectively uncover patterns in high-dimensional data. However, their heavy reliance on Monte Carlo sampling leads to scalability issues, which make it difficult to use these models on datasets with a massive number of observations. To scale up RFLVMs, we turn to optimization-based variational Bayesian inference (VBI), which is known for its scalability compared to sampling-based methods. However, implementing VBI for RFLVMs poses challenges, such as the lack of explicit probability distribution functions (PDFs) for the Dirichlet process (DP) in the kernel learning component, and the incompatibility of existing VBI algorithms with RFLVMs. To address these issues, we introduce a stick-breaking construction for the DP to obtain an explicit PDF, together with a novel VBI algorithm called "block coordinate descent variational inference" (BCD-VI). This enables the development of a scalable version of RFLVMs, SRFLVM for short. Across various real-world datasets, our proposed method demonstrates scalability, computational efficiency, superior performance in generating informative latent representations, and the ability to impute missing data, outperforming state-of-the-art competitors.
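
The stick-breaking construction mentioned above writes the DP mixture weights explicitly: draw v_k ~ Beta(1, alpha) and set pi_k = v_k * prod_{j<k} (1 - v_j). In a truncated version with K sticks, the last stick takes the remaining mass so the weights sum to one. The sketch below only samples these weights; how SRFLVM places variational distributions on the sticks inside BCD-VI is not shown, and the truncation level is an arbitrary choice.

```python
import random

def stick_breaking_weights(alpha, truncation, rng=None):
    """Truncated stick-breaking construction of Dirichlet-process weights.

    v_k ~ Beta(1, alpha),  pi_k = v_k * prod_{j<k} (1 - v_j).
    The last stick takes whatever mass remains, so the weights sum to 1.
    """
    rng = rng or random.Random()
    weights, remaining = [], 1.0
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, alpha)
        weights.append(v * remaining)   # break off a fraction v of what is left
        remaining *= 1.0 - v
    weights.append(remaining)
    return weights

w = stick_breaking_weights(alpha=2.0, truncation=10, rng=random.Random(0))
print([round(x, 3) for x in w], sum(w))
```

Smaller alpha tends to put most mass on the first few components; larger alpha spreads it over more of them.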

Updated: 2024-10-23 09:22:43

Fields: cs.LG,cs.AI

Download: http://arxiv.org/abs/2410.17700v1

Dreaming Learning

Incorporating novelties into deep learning systems remains a challenging problem. Introducing new information to a machine learning system can interfere with previously stored data and potentially alter the global model paradigm, especially when dealing with non-stationary sources. In such cases, traditional approaches based on validation error minimization offer limited advantages. To address this, we propose a training algorithm inspired by Stuart Kauffman's notion of the Adjacent Possible. This novel training methodology explores new data spaces during the learning phase. It predisposes the neural network to smoothly accept and integrate data sequences whose statistical characteristics differ from those expected. The maximum distance compatible with such inclusion depends on a specific parameter: the sampling temperature used in the explorative phase of the method. This algorithm, called Dreaming Learning, anticipates potential regime shifts over time, enhancing the neural network's responsiveness to non-stationary events that alter statistical properties. To assess the advantages of this approach, we apply it to unexpected statistical changes in Markov chains and to non-stationary dynamics in textual sequences. We demonstrate its ability to improve the auto-correlation of generated textual sequences by ~29% and to increase the speed of loss convergence by ~100% in the case of a paradigm shift in Markov chains.
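
The parameter that controls how far the explorative phase strays is a sampling temperature. The mechanism is the standard temperature-scaled softmax: dividing logits by T > 1 flattens the distribution (higher entropy, more novel sequences), while T < 1 sharpens it. In the sketch, `next_logits` is a hypothetical stand-in for the trained network, and how the sampled sequences are fed back into training is not shown.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Temperature-scaled softmax: T > 1 flattens, T < 1 sharpens."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def sample_sequence(next_logits, length, temperature, rng):
    """Draw tokens one at a time; next_logits(seq) is a hypothetical
    callback returning the model's logits for the next token."""
    seq = []
    for _ in range(length):
        probs = softmax_with_temperature(next_logits(seq), temperature)
        seq.append(rng.choices(range(len(probs)), weights=probs)[0])
    return seq

logits = [2.0, 1.0, 0.1]
for t in (0.5, 1.0, 2.0):
    print(t, round(entropy(softmax_with_temperature(logits, t)), 3))
```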

Updated: 2024-10-23 09:17:31

Fields: cs.LG,physics.data-an

Download: http://arxiv.org/abs/2410.18156v1

Optimizing Load Scheduling in Power Grids Using Reinforcement Learning and Markov Decision Processes

Power grid load scheduling is a critical task that ensures the balance between electricity generation and consumption while minimizing operational costs and maintaining grid stability. Traditional optimization methods often struggle with the dynamic and stochastic nature of power systems, especially when faced with renewable energy sources and fluctuating demand. This paper proposes a reinforcement learning (RL) approach using a Markov Decision Process (MDP) framework to address the challenges of dynamic load scheduling. The MDP is defined by a state space representing grid conditions, an action space covering control operations like generator adjustments and storage management, and a reward function balancing economic efficiency and system reliability. We investigate the application of various RL algorithms, from basic Q-Learning to more advanced Deep Q-Networks (DQN) and Actor-Critic methods, to determine optimal scheduling policies. The proposed approach is evaluated through a simulated power grid environment, demonstrating its potential to improve scheduling efficiency and adapt to variable demand patterns. Our results show that the RL-based method provides a robust and scalable solution for real-time load scheduling, contributing to the efficient management of modern power grids.
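
The MDP ingredients listed above map directly onto tabular Q-learning, the simplest algorithm the paper considers: Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)). In the sketch, the discretized states, the action names and the reward shape (cost plus a penalty on supply-demand imbalance) are hypothetical placeholders; DQN would replace the table with a neural network.

```python
import random
from collections import defaultdict

# Hypothetical discrete actions for generator and storage control.
ACTIONS = ["raise_gen", "lower_gen", "charge", "discharge"]

def q_update(Q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.95):
    """Tabular Q-learning:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
    return Q[(state, action)]

def epsilon_greedy(Q, state, actions, epsilon, rng):
    """Explore with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def step_reward(cost, imbalance_mw, penalty=10.0):
    """Hypothetical reward trading off operating cost against supply-demand
    imbalance (a stand-in for the reliability term)."""
    return -(cost + penalty * abs(imbalance_mw))

Q = defaultdict(float)  # unseen (state, action) pairs start at 0
rng = random.Random(0)
state = ("peak_demand", "storage_low")  # hypothetical discretized grid state
action = epsilon_greedy(Q, state, ACTIONS, epsilon=0.1, rng=rng)
r = step_reward(cost=120.0, imbalance_mw=3.0)
q_update(Q, state, action, r, ("peak_demand", "storage_mid"), ACTIONS)
```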

Updated: 2024-10-23 09:16:22

Fields: cs.LG

Download: http://arxiv.org/abs/2410.17696v1

An Adaptive Framework for Generating Systematic Explanatory Answer in Online Q&A Platforms

Question Answering (QA) systems face challenges in handling complex questions that require multi-domain knowledge synthesis. Naive RAG models, although effective at information retrieval, struggle with complex questions that require comprehensive, in-depth answers. We define this pioneering task as explanatory answer generation, which entails addressing challenges such as the need for comprehensive information and logical coherence in the generated content. To address these issues, we draw on systematic thinking theory and propose SynthRAG, an innovative framework designed to enhance QA performance. SynthRAG improves on conventional models by employing adaptive outlines for dynamic content structuring, generating systematic information to ensure detailed coverage, and producing customized answers tailored to specific user inquiries. This structured approach ensures logical coherence and thorough integration of information, yielding responses that are both insightful and methodically organized. Empirical evaluations underscore SynthRAG's effectiveness, demonstrating its superiority in handling complex questions, overcoming the limitations of naive RAG models, and significantly improving answer quality and depth. Furthermore, an online deployment on the Zhihu platform revealed that SynthRAG's answers achieved notable user engagement, with each response averaging 5.73 upvotes and surpassing the performance of 79.8% of human contributors, highlighting the practical relevance and impact of the proposed framework. Our code is available at https://github.com/czy1999/SynthRAG .

Updated: 2024-10-23 09:14:57

Fields: cs.CL,cs.AI,I.2.7

Download: http://arxiv.org/abs/2410.17694v1

Generative Forests

We focus on generative AI for a type of data that still represents one of the most prevalent forms of data: tabular data. Our paper introduces two key contributions: a new powerful class of forest-based models fit for such tasks, and a simple training algorithm with strong convergence guarantees in a boosting model that parallels that of the original weak/strong supervised learning setting. This algorithm can be implemented with a few tweaks to the most popular induction scheme for decision tree induction (i.e., supervised learning) with two classes. Experiments on the quality of generated data show substantial improvements over the state of the art. The losses our algorithm minimizes and the structure of our models make them practical for related tasks that require fast estimation of a density given a generative model and an observation (even a partially specified one); such tasks include missing data imputation and density estimation. Additional experiments on these tasks reveal that our models can be notably good contenders against diverse state-of-the-art methods, relying on models as diverse as (or mixing elements of) trees, neural nets, kernels, or graphical models.

Updated: 2024-10-23 09:11:00

Fields: cs.LG,I.2.6

Download: http://arxiv.org/abs/2308.03648v2

Probabilistic ML Verification via Weighted Model Integration

In machine learning (ML) verification, the majority of procedures are non-quantitative and therefore cannot be used for verifying probabilistic models, or be applied in domains where hard guarantees are practically unachievable. The probabilistic formal verification (PFV) of ML models is in its infancy, with the existing approaches limited to specific ML models, properties, or both. This contrasts with standard formal methods techniques, whose successful adoption in real-world scenarios is also due to their support for a wide range of properties and diverse systems. We propose a unifying framework for the PFV of ML systems based on Weighted Model Integration (WMI), a relatively recent formalism for probabilistic inference with algebraic and logical constraints. Crucially, reducing the PFV of ML models to WMI enables the verification of many properties of interest over a wide range of systems, addressing multiple limitations of deterministic verification and ad-hoc algorithms. We substantiate the generality of the approach on prototypical tasks involving the verification of group fairness, monotonicity, robustness to noise, probabilistic local robustness and equivalence among predictors. We characterize the challenges related to the scalability of the approach and, through our WMI-based perspective, we show how successful scaling techniques in the ML verification literature can be generalized beyond their original scope.
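
WMI generalizes model counting to continuous domains: given a support formula Delta, a property phi and a weight function w, the probability that phi holds is WMI(Delta and phi; w) / WMI(Delta; w). In one variable, where every satisfying region is an interval and the weight is a polynomial, this ratio can be integrated exactly. The model f(x) = 2x - 1, the weight w(x) = x on [0, 2] and the property "f(x) >= 0" below are a hypothetical toy instance; real WMI solvers handle many variables and Boolean structure, which this sketch does not.

```python
from fractions import Fraction

def integrate_poly(coeffs, lo, hi):
    """Exact integral of sum_k coeffs[k] * x**k over [lo, hi]."""
    return sum(Fraction(c) / (k + 1) * (Fraction(hi) ** (k + 1) - Fraction(lo) ** (k + 1))
               for k, c in enumerate(coeffs))

def wmi_interval(coeffs, intervals):
    """WMI of a formula whose models form disjoint intervals, under a
    polynomial weight given by its coefficient list."""
    return sum(integrate_poly(coeffs, lo, hi) for lo, hi in intervals)

# Support Delta: 0 <= x <= 2, unnormalized weight w(x) = x.
weight = [0, 1]
support = [(0, 2)]
# Property for the toy model f(x) = 2x - 1:  f(x) >= 0  <=>  x >= 1/2.
phi_and_support = [(Fraction(1, 2), 2)]

p = wmi_interval(weight, phi_and_support) / wmi_interval(weight, support)
print(p)
```

Using `Fraction` keeps the result exact, mirroring the fact that WMI computes probabilities rather than sampling-based estimates.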

Updated: 2024-10-23 09:04:57

Fields: cs.AI,cs.LG

Download: http://arxiv.org/abs/2402.04892v2

Multi-Excitation Projective Simulation with a Many-Body Physics Inspired Inductive Bias

With the impressive progress of deep learning, applications relying on machine learning are increasingly being integrated into daily life. However, most deep learning models have an opaque, oracle-like nature making it difficult to interpret and understand their decisions. This problem led to the development of the field known as eXplainable Artificial Intelligence (XAI). One method in this field known as Projective Simulation (PS) models a chain-of-thought as a random walk of a particle on a graph with vertices that have concepts attached to them. While this description has various benefits, including the possibility of quantization, it cannot be naturally used to model thoughts that combine several concepts simultaneously. To overcome this limitation, we introduce Multi-Excitation Projective Simulation (mePS), a generalization that considers a chain-of-thought to be a random walk of several particles on a hypergraph. A definition for a dynamic hypergraph is put forward to describe the agent's training history along with applications to AI and hypergraph visualization. An inductive bias inspired by the remarkably successful few-body interaction models used in quantum many-body physics is formalized for our classical mePS framework and employed to tackle the exponential complexity associated with naive implementations of hypergraphs. We prove that our inductive bias reduces the complexity from exponential to polynomial, with the exponent representing the cutoff on how many particles can interact. We numerically apply our method to two toy environments and a more complex scenario modelling the diagnosis of a broken computer. These environments demonstrate the resource savings provided by an appropriate choice of inductive bias, as well as showcasing aspects of interpretability. A quantum model for mePS is also briefly outlined and some future directions for it are discussed.
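
The complexity claim can be made concrete by counting candidate hyperedges. For illustration, assume a naive implementation needs one hyperedge per non-empty subset of n concept vertices (2^n - 1 of them), while a few-body cutoff k keeps only subsets of size at most k, a count that grows like O(n^k) for fixed k. This counting model is an assumption; the paper's exact bookkeeping may differ.

```python
from math import comb

def hyperedges_full(n_concepts):
    """Every non-empty subset of concepts: exponential in n."""
    return 2 ** n_concepts - 1

def hyperedges_cutoff(n_concepts, k):
    """Subsets of size at most k (at most k particles interacting):
    polynomial in n with degree k for fixed k."""
    return sum(comb(n_concepts, j) for j in range(1, k + 1))

for n in (10, 20, 40):
    print(n, hyperedges_full(n), hyperedges_cutoff(n, 2), hyperedges_cutoff(n, 3))
```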

Updated: 2024-10-23 08:39:00

Fields: cs.LG,cs.AI,cs.DM,quant-ph

Download: http://arxiv.org/abs/2402.10192v3

Uncertainty Estimation and Quantification for LLMs: A Simple Supervised Approach

In this paper, we study the problem of uncertainty estimation and calibration for LLMs. We begin by formulating the uncertainty estimation problem, a relevant yet underexplored area in existing literature. We then propose a supervised approach that leverages labeled datasets to estimate the uncertainty in LLMs' responses. Based on the formulation, we illustrate the difference between the uncertainty estimation for LLMs and that for standard ML models and explain why the hidden neurons of the LLMs may contain uncertainty information. Our designed approach demonstrates the benefits of utilizing hidden activations to enhance uncertainty estimation across various tasks and shows robust transferability in out-of-distribution settings. We distinguish the uncertainty estimation task from the uncertainty calibration task and show that better uncertainty estimation leads to better calibration performance. Furthermore, our method is easy to implement and adaptable to different levels of model accessibility including black box, grey box, and white box.
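
The supervised idea (learn a mapping from an LLM's hidden activations to whether its answer is correct, using a labeled dataset) can be sketched with a logistic-regression probe trained by gradient descent. This corresponds to the white-box setting only. The two-dimensional "activations" and labels are toy numbers, and the paper's choice of features, layers and estimator may differ.

```python
import math

def predict(w, b, x):
    """Probe output: estimated probability that the answer is correct."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def train_probe(features, labels, lr=0.5, epochs=500):
    """Fit a logistic-regression probe by full-batch gradient descent.

    features: hidden-activation vectors, one per (question, answer) pair.
    labels:   1 if the LLM's answer was judged correct, else 0.
    """
    dim, n = len(features[0]), len(features)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            err = predict(w, b, x) - y  # gradient of the log loss w.r.t. z
            for j in range(dim):
                gw[j] += err * x[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

# Toy 2-d "activations"; real hidden states have thousands of dimensions.
X = [[1.0, 0.2], [0.9, 0.1], [0.1, 0.8], [0.2, 0.9]]
y = [1, 1, 0, 0]
w, b = train_probe(X, y)
print(predict(w, b, [1.0, 0.1]), predict(w, b, [0.1, 1.0]))
```

The probe's output can then serve as the uncertainty estimate; calibrating it is a separate step, in line with the abstract's distinction between estimation and calibration.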

Updated: 2024-10-23 08:33:54

Fields: cs.LG,cs.CL,68T07, 68T50

Download: http://arxiv.org/abs/2404.15993v4

Towards Foundation Model for Chemical Reactor Modeling: Meta-Learning with Physics-Informed Adaptation

In this work, we present a novel application of foundation models for chemical reactor modeling. Accurate modeling of real-world chemical reactors through first-principles is often challenging, and the process of rebuilding and retraining models for each new chemical process is inefficient. This raises a critical question: can we develop a single, universal neural network (i.e., a foundation model) that can rapidly adapt to any new chemical process in a reactor? To address this, we propose a foundation model for chemical reactor modeling that employs a meta-learning approach, followed by physics-informed fine-tuning on new tasks with only a few data samples. Our model is designed to generalize across three classic reactor types: continuous stirred tank reactors, batch reactors, and plug flow reactors. Compared to conventional methods such as data-driven learning, physics-informed learning, transfer learning, and meta-learning, our approach demonstrates superior performance in few-shot scenarios. Specifically, it shows rapid adaptation to unseen reactions with varying integer orders across different reactor set-ups, requiring minimal data for fine-tuning. Source code is available at https://github.com/killingbear999/chemical-reactor-foundation-model.
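
For the batch reactor, an n-th order reaction with rate constant k obeys dC/dt = -k C^n, so a physics-informed fine-tuning loss can penalize the residual of this equation along the network's predicted concentration trajectory. The sketch computes only that residual term, using central finite differences on a sampled trajectory; the paper's actual loss, its CSTR and plug flow counterparts, and the meta-learning stage are not reproduced, and the rate constant and trajectories are toy values.

```python
import math

def batch_reactor_residual(times, conc, k, order):
    """Mean squared residual of dC/dt = -k * C**order on a trajectory.

    Derivatives use central finite differences, so the first and last
    points are skipped. A physics-informed loss would add this term to
    the data-fitting loss during fine-tuning.
    """
    res = []
    for i in range(1, len(times) - 1):
        dcdt = (conc[i + 1] - conc[i - 1]) / (times[i + 1] - times[i - 1])
        res.append(dcdt + k * conc[i] ** order)
    return sum(r * r for r in res) / len(res)

def first_order_solution(c0, k, t):
    """Analytic batch-reactor solution for a first-order reaction."""
    return c0 * math.exp(-k * t)

ts = [0.01 * i for i in range(101)]
exact = [first_order_solution(1.0, 0.5, t) for t in ts]
wrong = [1.0 - 0.3 * t for t in ts]  # a trajectory that ignores the kinetics
print(batch_reactor_residual(ts, exact, k=0.5, order=1))
print(batch_reactor_residual(ts, wrong, k=0.5, order=1))
```

Changing `order` is how the same residual covers reactions of different integer orders.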

Updated: 2024-10-23 08:29:20

Fields: cs.CE,cs.LG

Download: http://arxiv.org/abs/2405.11752v2

PETAH: Parameter Efficient Task Adaptation for Hybrid Transformers in a resource-limited Context

Following their success in natural language processing (NLP), there has been a shift towards transformer models in computer vision. While transformers perform well and offer promising multi-tasking performance, their high compute requirements mean that many resource-constrained applications still rely on convolutional or hybrid models, which combine the benefits of convolution and attention layers and achieve the best results in the sub-100M parameter range. At the same time, task adaptation techniques, which allow one shared transformer backbone to be used for multiple downstream tasks and yield large storage savings at negligible cost in performance, have not yet been adopted for hybrid transformers. In this work, we investigate how to achieve the best task-adaptation performance and introduce PETAH: Parameter Efficient Task Adaptation for Hybrid Transformers. We further combine PETAH adaptation with pruning to achieve highly performant and storage-friendly models for multi-tasking. In our extensive evaluation on classification and other vision tasks, we demonstrate that our PETAH-adapted hybrid models outperform established task-adaptation techniques for ViTs while requiring fewer parameters and being more efficient on mobile hardware.
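
The storage argument behind a shared backbone is simple arithmetic: full fine-tuning stores one backbone copy per task, whereas parameter-efficient adaptation stores one frozen backbone plus small per-task modules. The sketch assumes LoRA-style low-rank adapters (rank * (d_in + d_out) parameters per adapted layer) as the parameter-efficient method; PETAH's own scheme for hybrid convolution/attention layers may differ, and all sizes are hypothetical.

```python
def full_finetune_params(backbone, head, tasks):
    """Every task stores its own copy of the backbone plus a head."""
    return tasks * (backbone + head)

def lora_params(layer_shapes, rank):
    """Low-rank update B @ A per adapted layer: rank * (d_in + d_out)."""
    return sum(rank * (d_in + d_out) for d_in, d_out in layer_shapes)

def shared_backbone_params(backbone, head, tasks, layer_shapes, rank):
    """One frozen backbone shared by all tasks, plus per-task adapters and heads."""
    return backbone + tasks * (lora_params(layer_shapes, rank) + head)

# Hypothetical sub-100M model: 10M-parameter backbone, 12 adapted 256x256 layers.
shapes = [(256, 256)] * 12
print(full_finetune_params(10_000_000, 100_000, tasks=5))
print(shared_backbone_params(10_000_000, 100_000, tasks=5, layer_shapes=shapes, rank=8))
```

Pruning, which the paper combines with adaptation, would shrink the shared backbone term as well.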

Updated: 2024-10-23 08:24:47

Fields: cs.AI,cs.LG

Download: http://arxiv.org/abs/2410.17661v1

Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs

Reward models trained on human preference data have been proven to effectively align Large Language Models (LLMs) with human intent within the framework of reinforcement learning from human feedback (RLHF). However, current reward models have limited generalization capabilities to unseen prompts and responses, which can lead to an unexpected phenomenon known as reward over-optimization, where excessive optimization of rewards results in a decline in actual performance. While previous research has advocated constraining policy optimization, our study introduces a novel approach that enhances the reward model's generalization ability against distribution shifts by regularizing the hidden states. Specifically, we retain the base model's language model head and incorporate a suite of text-generation losses to preserve the hidden states' text-generation capabilities, while concurrently learning a reward head on top of the same hidden states. Our experimental results demonstrate that the introduced regularization technique markedly improves the accuracy of learned reward models across a variety of out-of-distribution (OOD) tasks and effectively alleviates the over-optimization issue in RLHF, offering a more reliable and robust preference learning paradigm.
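
Structurally, the method trains two heads on shared hidden states, so the objective combines a pairwise reward loss with a text-generation term. A hedged sketch: the standard Bradley-Terry loss -log sigmoid(r_chosen - r_rejected) for the reward head, plus a weighted language-modeling loss from the retained LM head. Which text-generation losses are in the suite, and the weight `reg_weight`, are assumptions; plain numbers stand in for model outputs.

```python
import math

def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))

def preference_loss(r_chosen, r_rejected):
    """Bradley-Terry pairwise loss: -log sigmoid(r_chosen - r_rejected)."""
    return -log_sigmoid(r_chosen - r_rejected)

def regularized_reward_loss(r_chosen, r_rejected, lm_loss, reg_weight=0.01):
    """Reward-head loss plus a text-generation loss from the retained LM
    head on the same hidden states (the weighting is an assumption)."""
    return preference_loss(r_chosen, r_rejected) + reg_weight * lm_loss

print(preference_loss(2.0, 0.0), preference_loss(0.0, 2.0))
print(regularized_reward_loss(1.0, 1.0, lm_loss=3.0, reg_weight=0.1))
```

Setting `reg_weight` to zero recovers an ordinary reward model; the regularization term is what keeps the shared hidden states useful for generation.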

Updated: 2024-10-23 08:22:44

Fields: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.10216v2

AutoRNet: Automatically Optimizing Heuristics for Robust Network Design via Large Language Models

Achieving robust networks is a challenging problem due to its NP-hard nature and complex solution space. Current methods, from handcrafted feature extraction to deep learning, have made progress but remain rigid, requiring manual design and large labeled datasets. To address these issues, we propose AutoRNet, a framework that integrates large language models (LLMs) with evolutionary algorithms to generate heuristics for robust network design. We design network optimization strategies to provide domain-specific prompts for LLMs, utilizing domain knowledge to generate advanced heuristics. Additionally, we introduce an adaptive fitness function to balance convergence and diversity while maintaining degree distributions. AutoRNet is evaluated on sparse and dense scale-free networks, outperforming current methods by reducing the need for manual design and large datasets.
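
The abstract does not define its robustness measure. A common choice in robust network design is Schneider et al.'s R: remove nodes one at a time under a degree-based attack, record the fraction s(q) of nodes in the largest connected component after q removals, and average, R = (1/N) * sum_{q=1..N} s(q). The sketch assumes this measure with a non-adaptive attack ordered by initial degree, as an example of the fitness an LLM-generated heuristic would try to raise while preserving the degree distribution; it is not taken from AutoRNet.

```python
from collections import defaultdict

def largest_component(nodes, adj):
    """Size of the largest connected component among the given nodes."""
    seen, best = set(), 0
    for start in nodes:
        if start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v in nodes and v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best

def robustness_r(edges, n):
    """R = (1/N) * sum_{q=1..N} s(q), where s(q) is the largest-component
    fraction after removing the q highest-degree nodes (initial degrees,
    ties broken by node id)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    order = sorted(range(n), key=lambda u: (-len(adj[u]), u))
    alive = set(range(n))
    total = 0.0
    for u in order:
        alive.discard(u)
        total += largest_component(alive, adj) / n
    return total / n

star = [(0, 1), (0, 2), (0, 3)]
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(robustness_r(star, 4), robustness_r(ring, 4))
```

Both toy graphs have the same number of nodes, yet the ring scores higher because no single hub holds it together.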

Updated: 2024-10-23 08:18:38

标题: AutoRNet: 通过大型语言模型自动优化启发式算法，实现鲁棒网络设计

摘要: 实现鲁棒网络是一个具有挑战性的问题，因为它具有 NP 难的性质和复杂的解空间。当前的方法，从手工特征提取到深度学习，取得了进展，但仍然不够灵活，需要手动设计和大规模标记数据集。为了解决这些问题，我们提出了 AutoRNet，这是一个将大型语言模型（LLMs）与进化算法相结合的框架，用于生成用于鲁棒网络设计的启发式。我们设计了网络优化策略，为LLMs提供特定领域提示，利用领域知识生成先进的启发式。此外，我们引入了自适应适应度函数以平衡收敛性和多样性，同时保持度分布。AutoRNet 在稀疏和密集的无标度网络上进行评估，通过减少对手动设计和大型数据集的需求，优于当前方法。

更新时间: 2024-10-23 08:18:38

领域: cs.AI

下载: http://arxiv.org/abs/2410.17656v1

Mapping the Media Landscape: Predicting Factual Reporting and Political Bias Through Web Interactions

Bias assessment of news sources is paramount for professionals, organizations, and researchers who rely on truthful evidence for information gathering and reporting. While certain bias indicators are discernible from content analysis, descriptors like political bias and fake news pose greater challenges. In this paper, we propose an extension to a recently presented news media reliability estimation method that focuses on modeling outlets and their longitudinal web interactions. Concretely, we assess the classification performance of four reinforcement learning strategies on a large news media hyperlink graph. Our experiments, targeting two challenging bias descriptors, factual reporting and political bias, showed a significant performance improvement at the source media level. Additionally, we validate our methods on the CLEF 2023 CheckThat! Lab challenge, outperforming the reported results in both F1-score and the official MAE metric. Furthermore, we contribute by releasing the largest annotated dataset of news source media, categorized with factual reporting and political bias labels. Our findings suggest that profiling news media sources based on their hyperlink interactions over time is feasible, offering a bird's-eye view of evolving media landscapes.

Updated: 2024-10-23 08:18:26

标题: 绘制媒体景观：通过网络互动预测事实报道和政治偏见

摘要: 新闻来源的偏见评估对于依赖真实证据进行信息收集和报道的专业人士、组织和研究人员至关重要。虽然某些偏见指标可以从内容分析中辨识出来，但政治偏见和虚假新闻等描述符提出了更大的挑战。在本文中，我们提出了对最近提出的新闻媒体可靠性估计方法进行扩展，重点是对媒体和它们的纵向网络互动进行建模。具体来说，我们评估了四种强化学习策略在大型新闻媒体超链接图上的分类性能。我们的实验针对两个具有挑战性的偏见描述符，事实报道和政治偏见，显示出了在源媒体级别显著的性能改进。此外，我们在CLEF 2023 CheckThat!实验室挑战中验证了我们的方法，在F1分数和官方MAE度量标准上超过了报告的结果。此外，我们通过发布最大的新闻来源媒体的带有事实报道和政治偏见标签的注释数据集，做出了贡献。我们的研究结果表明，基于它们长期的超链接互动来对新闻媒体来源进行剖析是可行的，从而提供了一个俯瞰不断发展的媒体景观的全貌。

更新时间: 2024-10-23 08:18:26

领域: cs.AI,cs.CY,cs.LG

下载: http://arxiv.org/abs/2410.17655v1

xLSTM-Mixer: Multivariate Time Series Forecasting by Mixing via Scalar Memories

Time series data is prevalent across numerous fields, necessitating the development of robust and accurate forecasting models. Capturing patterns both within and between temporal and multivariate components is crucial for reliable predictions. We introduce xLSTM-Mixer, a model designed to effectively integrate temporal sequences, joint time-variate information, and multiple perspectives for robust forecasting. Our approach begins with a linear forecast shared across variates, which is then refined by xLSTM blocks. These blocks serve as key elements for modeling the complex dynamics of challenging time series data. xLSTM-Mixer ultimately reconciles two distinct views to produce the final forecast. Our extensive evaluations demonstrate xLSTM-Mixer's superior long-term forecasting performance compared to recent state-of-the-art methods. A thorough model analysis provides further insights into its key components and confirms its robustness and effectiveness. This work contributes to the resurgence of recurrent models in time series forecasting.
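
The first stage described above, a linear forecast shared across variates, can be sketched as one weight matrix (lookback `L` to horizon `H`) applied identically to every variate. This is an illustrative assumption about the shape of that stage; the xLSTM refinement blocks and the two-view reconciliation are not shown.

```python
def shared_linear_forecast(series: list[list[float]],
                           W: list[list[float]],
                           b: list[float]) -> list[list[float]]:
    """Apply the same linear map (H x L weights, H biases) to each variate's lookback window.

    series: one lookback window of length L per variate.
    Returns one H-step forecast per variate. Sharing W and b across variates
    keeps the parameter count independent of the number of variates.
    """
    L = len(W[0])
    forecasts = []
    for window in series:
        if len(window) != L:
            raise ValueError("every variate needs a lookback window of length L")
        forecasts.append([sum(w * x for w, x in zip(row, window)) + bias
                          for row, bias in zip(W, b)])
    return forecasts


# Toy weights that simply repeat the last observed value for H = 2 steps.
W = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]
print(shared_linear_forecast([[1.0, 2.0, 3.0], [5.0, 6.0, 7.0]], W, [0.0, 0.0]))
# [[3.0, 3.0], [7.0, 7.0]]
```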

Updated: 2024-10-23 08:13:11

标题: xLSTM-Mixer：通过标量记忆混合的多元时间序列预测

摘要: 时间序列数据在许多领域中普遍存在，需要开发出稳健且准确的预测模型。捕捉时间序列和多变量组件内部和之间的模式对于可靠的预测至关重要。我们引入了xLSTM-Mixer，这是一个旨在有效整合时间序列、联合时间-变量信息和多个视角以进行稳健预测的模型。我们的方法从一个跨变量共享的线性预测开始，然后通过xLSTM块进行精细调整。这些块是模拟具有挑战性的时间序列数据复杂动态的关键元素。xLSTM-Mixer最终调和了两种不同的视角，以生成最终的预测。我们的广泛评估显示，与最近的最先进方法相比，xLSTM-Mixer具有更优越的长期预测性能。彻底的模型分析进一步提供了对其关键组件的见解，并确认了其稳健性和有效性。这项工作为时间序列预测中循环模型的复苏做出了贡献。

更新时间: 2024-10-23 08:13:11

领域: cs.LG

下载: http://arxiv.org/abs/2410.16928v2

The Art of Imitation: Learning Long-Horizon Manipulation Tasks from Few Demonstrations

Task Parametrized Gaussian Mixture Models (TP-GMM) are a sample-efficient method for learning object-centric robot manipulation tasks. However, there are several open challenges to applying TP-GMMs in the wild. In this work, we tackle three crucial challenges synergistically. First, end-effector velocities are non-Euclidean and thus hard to model using standard GMMs. We thus propose to factorize the robot's end-effector velocity into its direction and magnitude, and model them using Riemannian GMMs. Second, we leverage the factorized velocities to segment and sequence skills from complex demonstration trajectories. Through the segmentation, we further align skill trajectories and hence leverage time as a powerful inductive bias. Third, we present a method to automatically detect relevant task parameters per skill from visual observations. Our approach enables learning complex manipulation tasks from just five demonstrations while using only RGB-D observations. Extensive experimental evaluations on RLBench demonstrate that our approach achieves state-of-the-art performance with 20-fold improved sample efficiency. Our policies generalize across different environments, object instances, and object positions, while the learned skills are reusable.
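
The factorization step can be illustrated as follows: a velocity vector is split into a unit direction, which lives on the sphere, and a non-negative magnitude. Riemannian GMMs on the sphere typically rely on its logarithmic and exponential maps, which are also sketched here. This is a generic geometric sketch under those assumptions, not the authors' implementation; fitting the Riemannian GMM itself is not shown.

```python
import math


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def factorize_velocity(v, eps=1e-9):
    """Split v into (unit direction, magnitude); direction is None when v is ~0."""
    m = norm(v)
    if m < eps:
        return None, 0.0
    return [x / m for x in v], m


def sphere_log(p, q):
    """Log map on the unit sphere: tangent vector at p pointing along the geodesic to q."""
    c = max(-1.0, min(1.0, sum(a * b for a, b in zip(p, q))))
    theta = math.acos(c)
    if theta < 1e-12:
        return [0.0] * len(p)
    if math.pi - theta < 1e-9:
        raise ValueError("log map is undefined for antipodal points")
    scale = theta / math.sin(theta)
    return [scale * (b - c * a) for a, b in zip(p, q)]


def sphere_exp(p, v):
    """Exp map on the unit sphere: move from p along tangent vector v."""
    n = norm(v)
    if n < 1e-12:
        return list(p)
    return [math.cos(n) * a + math.sin(n) * b / n for a, b in zip(p, v)]


direction, speed = factorize_velocity([3.0, 0.0, 4.0])
print(direction, speed)  # [0.6, 0.0, 0.8] 5.0
```

Modeling the magnitude separately keeps a Euclidean GMM meaningful for speed, while the direction is handled in the tangent space given by `sphere_log`.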

Updated: 2024-10-23 08:07:05

标题: 模仿的艺术：从少量示范学习长时程操纵任务

摘要: 任务参数化高斯混合模型（TP-GMM）是一种学习以物体为中心的机器人操作任务的高效方法。然而，在野外应用TP-GMM面临几个挑战。在这项工作中，我们协同解决了三个关键挑战。首先，末端执行器速度是非欧几里德的，因此很难使用标准GMM进行建模。因此，我们提出将机器人的末端执行器速度分解为方向和大小，并使用黎曼GMM进行建模。其次，我们利用分解后的速度从复杂示范轨迹中分割和排序技能。通过分割，我们进一步对齐技能轨迹，从而利用时间作为一个强大的归纳偏差。第三，我们提出了一种从视觉观察中自动检测每个技能相关任务参数的方法。我们的方法能够仅通过五次示范学习复杂的操作任务，同时仅使用RGB-D观测。在RLBench上进行的大量实验评估表明，我们的方法实现了具有20倍改进的采样效率的最新性能。我们的策略可以在不同环境、物体实例和物体位置之间泛化，而学到的技能是可重复使用的。

更新时间: 2024-10-23 08:07:05

领域: cs.RO,cs.LG

下载: http://arxiv.org/abs/2407.13432v3

Towards Active Participant-Centric Vertical Federated Learning: Some Representations May Be All You Need

Vertical Federated Learning (VFL) enables collaborative model training across different participants with distinct features and common samples, while preserving data privacy. Existing VFL methodologies often struggle with realistic data partitions, typically incurring high communication costs and significant operational complexity. In this work, we introduce a novel simplified approach to VFL, Active Participant-Centric VFL (APC-VFL), that, to the best of our knowledge, is the first to require only a single communication round between participants, and allows the active participant to perform inference in a non-collaborative fashion. This method integrates unsupervised representation learning with knowledge distillation to achieve comparable accuracy to traditional VFL methods based on vertical split learning in classical settings, reducing required communication rounds by up to $4200\times$, while being more flexible. Our approach also shows improvements compared to non-federated local models, as well as a comparable VFL proposal, VFedTrans, offering an efficient and flexible solution for collaborative learning.

Updated: 2024-10-23 08:07:00

标题: 朝向主动参与者中心的垂直联邦学习：有些表示可能是你所需的一切

摘要: Vertical Federated Learning（VFL）实现了不同参与者之间的协作模型训练，这些参与者具有不同的特征和共同的样本，同时保护数据隐私。现有的VFL方法通常在现实数据分区中遇到困难，通常导致高通信成本和显着的操作复杂性。在这项工作中，我们介绍了一种新颖的简化VFL方法，Active Participant-Centric VFL（APC-VFL），据我们所知，这是第一个只需要参与者之间进行一轮通信的方法，并允许主动参与者以非协作方式进行推断。该方法将无监督表示学习与知识蒸馏相结合，以在传统设置中基于垂直拆分学习的传统VFL方法达到可比的准确性，将所需的通信轮次减少了高达4200倍，同时更加灵活。我们的方法还显示出了与非联邦本地模型以及可比的VFL提案VFedTrans相比的改进，为协作学习提供了高效且灵活的解决方案。

更新时间: 2024-10-23 08:07:00

领域: cs.LG

下载: http://arxiv.org/abs/2410.17648v1

Integral Operator Approaches for Scattered Data Fitting on Spheres

This paper focuses on scattered data fitting problems on spheres. We study the approximation performance of a class of weighted spectral filter algorithms, including Tikhonov regularization, Landweber iteration, spectral cut-off, and iterated Tikhonov, in fitting noisy data with possibly unbounded random noise. For the analysis, we develop an integral operator approach that can be regarded as an extension of the widely used sampling inequality approach and norming set method in the scattered data fitting community. After establishing an equivalence between operator differences and quadrature rules, we derive optimal Sobolev-type error estimates for weighted spectral filter algorithms. Our error estimates do not suffer from the saturation phenomenon of Tikhonov regularization reported in the literature or from the native-space barrier of existing error analyses, and they adapt to different embedding spaces. We also propose a divide-and-conquer scheme that reduces the computational burden of weighted spectral filter algorithms, and we present optimal approximation error bounds for it.
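
For readers unfamiliar with spectral filter algorithms, the sketch below lists the standard filter functions $g_\lambda(\sigma)$ for the four methods named above, as they appear in the general spectral-regularization literature, and applies one to coefficients in an eigenbasis. The paper's weighted variants, the Landweber parameterization (step size `eta`, iteration count tied to `1/lam` here), and the number of iterated-Tikhonov steps `m` are assumptions for illustration.

```python
def tikhonov(sigma: float, lam: float) -> float:
    return 1.0 / (sigma + lam)


def spectral_cutoff(sigma: float, lam: float) -> float:
    # Keep (and invert) only eigen-components with eigenvalue >= lam.
    return 1.0 / sigma if sigma >= lam else 0.0


def landweber(sigma: float, lam: float, eta: float = 1.0) -> float:
    # k = 1/lam gradient steps; assumes 0 < eta * sigma <= 1 for stability.
    k = max(1, round(1.0 / lam))
    return (1.0 - (1.0 - eta * sigma) ** k) / sigma


def iterated_tikhonov(sigma: float, lam: float, m: int = 3) -> float:
    # m = 1 recovers ordinary Tikhonov regularization.
    return ((sigma + lam) ** m - lam ** m) / (sigma * (sigma + lam) ** m)


def filtered_coefficients(eigenvalues, projections, g, lam):
    """Regularized solution coefficients g(sigma_i) * <y, phi_i> in the operator's eigenbasis."""
    return [g(s, lam) * b for s, b in zip(eigenvalues, projections)]


print(filtered_coefficients([1.0, 0.5, 0.01], [2.0, 1.0, 0.3], spectral_cutoff, 0.1))
# [2.0, 2.0, 0.0]
```

Each filter approximates $1/\sigma$ on large eigenvalues and damps small ones; they differ in how sharply and how far they can exploit solution smoothness, which is where saturation enters for plain Tikhonov.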

Updated: 2024-10-23 08:06:25

标题: 在球面上散点数据拟合的积分算子方法

摘要: 这篇论文关注球面上的散乱数据拟合问题。我们研究了一类加权谱滤波算法，包括Tikhonov正则化、Landweber迭代、谱截断和迭代Tikhonov，在拟合可能具有无界随机噪声的含噪数据时的逼近性能。为了进行分析，我们开发了一种积分算子方法，可以被看作是散乱数据拟合领域中广泛使用的采样不等式方法和规范集方法的延伸。在建立了算子差与求积规则之间的等价性后，我们推导出了加权谱滤波算法的最优Sobolev型误差估计。我们推导的误差估计不受文献中Tikhonov正则化的饱和现象以及现有误差分析中原生空间障碍的影响，并且能够适应不同的嵌入空间。我们还提出了一个分而治之的方案，以减少加权谱滤波算法的计算负担，并给出了最优逼近误差界。

更新时间: 2024-10-23 08:06:25

领域: math.NA,cs.LG,cs.NA

下载: http://arxiv.org/abs/2401.15294v3

Hadamard Representations: Augmenting Hyperbolic Tangents in RL

Activation functions are one of the key components of a deep neural network. The most commonly used activation functions fall into two categories, continuously differentiable functions (e.g., tanh) and linear-unit functions (e.g., ReLU), each with its own strengths and drawbacks in terms of downstream performance and the representation capacity retained through learning (e.g., as measured by the number of dead neurons and the effective rank). In reinforcement learning, continuously differentiable activations often underperform linear-unit functions. We provide insights into the vanishing gradients associated with the former, and show that the dying neuron problem is not exclusive to ReLUs. To alleviate vanishing gradients and the resulting dying neuron problem in continuously differentiable activations, we propose a Hadamard representation. Using deep Q-networks and proximal policy optimization in the Atari domain, we show faster learning, fewer dead neurons, and increased effective rank.
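
One way to read "Hadamard representation" is as the element-wise (Hadamard) product of two independently parameterized tanh layers applied to the same input; the sketch below assumes that form and is not the authors' network. Because $\partial(\tanh a \cdot \tanh c)/\partial a = (1-\tanh^2 a)\tanh c$, the gradient through each branch is scaled by the other branch's activation.

```python
import math


def linear(W, b, x):
    """Dense layer W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def hadamard_tanh(W1, b1, W2, b2, x):
    """Element-wise product of two tanh layers that share the same input x (assumed form)."""
    a = [math.tanh(z) for z in linear(W1, b1, x)]
    c = [math.tanh(z) for z in linear(W2, b2, x)]
    return [ai * ci for ai, ci in zip(a, c)]


print(hadamard_tanh([[1.0, 0.0]], [0.0], [[0.0, 1.0]], [0.0], [0.5, 0.3]))
```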

Updated: 2024-10-23 08:05:57

标题: 哈达玛表示法：在强化学习中增强双曲正切函数

摘要: 激活函数是深度神经网络的关键组成部分之一。最常用的激活函数可以分为连续可微的（如tanh）和线性单元函数（如ReLU）两类，它们在下游性能和表示容量方面各有优势和缺点（例如通过死神经元数量和有效秩来衡量）。在强化学习中，与线性单元函数相比，连续可微激活函数的性能常常表现不佳。我们提供了与前者相关的梯度消失的见解，并展示了濒死神经元问题并不仅限于ReLU。为了缓解连续可微激活函数导致的梯度消失和由此产生的濒死神经元问题，我们提出了一个Hadamard表示。在Atari领域使用深度Q网络和近端策略优化，我们展示了更快的学习速度，死神经元数量减少和有效秩增加。

更新时间: 2024-10-23 08:05:57

领域: cs.LG

下载: http://arxiv.org/abs/2406.09079v2

Entity-based Reinforcement Learning for Autonomous Cyber Defence

A significant challenge for autonomous cyber defence is ensuring a defensive agent's ability to generalise across diverse network topologies and configurations. This capability is necessary for agents to remain effective when deployed in dynamically changing environments, such as an enterprise network where devices may frequently join and leave. Standard approaches to deep reinforcement learning, where policies are parameterised using a fixed-input multi-layer perceptron (MLP), expect fixed-size observation and action spaces. In autonomous cyber defence, this makes it hard to develop agents that generalise to environments with network topologies different from those trained on, as the number of nodes affects the natural size of the observation and action spaces. To overcome this limitation, we reframe the problem of autonomous network defence using entity-based reinforcement learning, where the observation and action space of an agent are decomposed into a collection of discrete entities. This framework enables the use of policy parameterisations specialised in compositional generalisation. Namely, we train a Transformer-based policy on the Yawning Titan cyber-security simulation environment and test its generalisation capabilities across various network topologies. We demonstrate that this approach significantly outperforms an MLP-based policy on fixed networks and can generalise zero-shot to networks of a different size from those seen in training. These findings highlight the potential for entity-based reinforcement learning to advance the field of autonomous cyber defence by providing more generalisable policies capable of handling variations in real-world network environments.
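
The key property, a policy whose parameter count does not depend on the number of entities, can be sketched with a single self-attention layer whose weights are shared across entities, followed by a shared head that scores each entity. The weights, feature sizes, and the "one logit per entity" action parameterization below are hypothetical; the paper's Transformer policy and the Yawning Titan interface are not reproduced here.

```python
import math


def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def entity_policy(entities, Wq, Wk, Wv, w_out):
    """Action distribution over a variable-size set of entities (e.g., network nodes).

    Every entity is projected with the same Wq/Wk/Wv, attends to all entities,
    and the same head w_out turns its attended features into one logit.
    """
    q = [matvec(Wq, e) for e in entities]
    k = [matvec(Wk, e) for e in entities]
    v = [matvec(Wv, e) for e in entities]
    scale = math.sqrt(len(q[0]))
    logits = []
    for qi in q:
        att = softmax([sum(a * b for a, b in zip(qi, kj)) / scale for kj in k])
        h = [sum(att[j] * v[j][t] for j in range(len(v))) for t in range(len(v[0]))]
        logits.append(sum(wo * ht for wo, ht in zip(w_out, h)))
    return softmax(logits)


I2 = [[1.0, 0.0], [0.0, 1.0]]
print(entity_policy([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]], I2, I2, I2, [1.0, -1.0]))
```

The same weights handle three entities here and any other number at test time, and permuting the entities permutes the output probabilities accordingly.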

Updated: 2024-10-23 08:04:12

标题: 基于实体的强化学习在自主网络防御中的应用

摘要: 自主网络防御面临的一个重要挑战是确保防御代理能够在不同网络拓扑和配置之间泛化。这种能力对于代理在动态变化的环境中保持有效至关重要，比如在企业网络中，设备可能经常加入和离开。标准的深度强化学习方法，其中策略使用固定输入的多层感知器（MLP）参数化，期望固定大小的观测和动作空间。在自主网络防御中，这使得难以开发能够泛化到不同于训练网络拓扑的环境的代理，因为节点数量影响观测和动作空间的自然大小。为了克服这一限制，我们重新构建了自主网络防御问题，使用基于实体的强化学习，其中代理的观测和动作空间被分解为一组离散实体。这个框架使得可以使用专门用于组合泛化的策略参数化。具体来说，我们在Yawning Titan网络安全仿真环境上训练了一个基于Transformer的策略，并测试了它在不同网络拓扑上的泛化能力。我们证明了这种方法在固定网络上明显优于基于MLP的策略，并能够零样本泛化到与训练时规模不同的网络。这些发现突显了基于实体的强化学习在推动自主网络防御领域的潜力，提供了更具泛化性能的策略，能够处理现实世界网络环境中的变化。

更新时间: 2024-10-23 08:04:12

领域: cs.LG,cs.CR

下载: http://arxiv.org/abs/2410.17647v1

Binarized Simplicial Convolutional Neural Networks

Graph Neural Networks are limited to processing features on graph nodes, neglecting data on higher-dimensional structures such as edges and triangles. Simplicial Convolutional Neural Networks (SCNN) represent higher-order structures using simplicial complexes to break this limitation, but they remain computationally inefficient. In this paper, we propose a novel neural network architecture on simplicial complexes named Binarized Simplicial Convolutional Neural Networks (Bi-SCNN), which combines simplicial convolution with a binary-sign forward propagation strategy. Applying the Hodge Laplacian within binary-sign forward propagation enables Bi-SCNN to efficiently and effectively represent simplicial features that have higher-order structures than traditional graph node representations. Compared to previous Simplicial Convolutional Neural Networks, the reduced model complexity of Bi-SCNN shortens execution time without sacrificing prediction performance, and Bi-SCNN is less prone to over-smoothing. Experiments on real-world citation and ocean-drifter data confirm that the proposed Bi-SCNN is efficient and accurate.
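
The Hodge Laplacian at the heart of simplicial convolution can be built from boundary matrices; for edge signals it is $L_1 = B_1^\top B_1 + B_2 B_2^\top$. The sketch below builds $L_1$ for a toy complex (one filled triangle plus a pendant edge) and applies a polynomial filter to a sign-binarized edge signal, with sign-binarized taps and a mean-magnitude scale factor. That binarization scheme is an assumption borrowed from generic binary neural networks, not necessarily the exact Bi-SCNN forward pass.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def hodge_laplacian_1(B1, B2):
    """Edge (1-)Hodge Laplacian L1 = B1^T B1 + B2 B2^T."""
    down = matmul(transpose(B1), B1)
    up = matmul(B2, transpose(B2))
    return [[d + u for d, u in zip(rd, ru)] for rd, ru in zip(down, up)]


def sign(v):
    return 1 if v >= 0 else -1


def binarized_simplicial_conv(L, x, taps):
    """y = alpha * sum_k sign(h_k) L^k sign(x), with alpha = mean |h_k| (assumed scheme)."""
    s = [sign(v) for v in x]
    alpha = sum(abs(h) for h in taps) / len(taps)
    out = [0] * len(s)
    power = s  # L^0 applied to the binarized signal
    for h in taps:
        out = [o + sign(h) * p for o, p in zip(out, power)]
        power = [sum(L[i][j] * power[j] for j in range(len(power))) for i in range(len(L))]
    return [alpha * o for o in out]


# Nodes 0..3; edges (0,1), (0,2), (1,2), (2,3) oriented low -> high; filled triangle (0,1,2).
B1 = [[-1, -1, 0, 0],
      [1, 0, -1, 0],
      [0, 1, 1, -1],
      [0, 0, 0, 1]]
B2 = [[1], [-1], [1], [0]]
L1 = hodge_laplacian_1(B1, B2)
print(L1)  # [[3, 0, 0, 0], [0, 3, 0, -1], [0, 0, 3, -1], [0, -1, -1, 2]]
```

With binarized inputs and taps, the inner products reduce to integer additions and subtractions, which is the source of the speed-up that binary networks aim for.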

Updated: 2024-10-23 07:57:41

标题: 二值化单纯形卷积神经网络

摘要: 图神经网络有一个局限性，即仅处理图节点上的特征，忽略了高维结构（如边和三角形）上的数据。单纯卷积神经网络（SCNN）利用单纯复形表示高阶结构，以打破这一限制，尽管仍然缺乏时间效率。在本文中，我们提出了一种基于单纯复形的神经网络架构，称为二值化单纯卷积神经网络（Bi-SCNN），它将单纯卷积与二进制符号前向传播策略相结合。在二进制符号前向传播中使用霍奇拉普拉斯算子，使Bi-SCNN能够高效且有效地表示具有比传统图节点表示更高阶结构的单纯特征。与以前的单纯卷积神经网络相比，Bi-SCNN降低的模型复杂度缩短了执行时间，而不会牺牲预测性能，并且不太容易受到过度平滑效应的影响。通过对真实世界的引文和海洋漂流数据进行实验，证实了我们提出的Bi-SCNN是高效且准确的。

更新时间: 2024-10-23 07:57:41

领域: cs.LG,eess.SP

下载: http://arxiv.org/abs/2405.04098v2

MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models

Visual preference alignment involves training Large Vision-Language Models (LVLMs) to predict human preferences between visual inputs. This is typically achieved by using labeled datasets of chosen/rejected pairs and employing optimization algorithms like direct preference optimization (DPO). Existing visual alignment methods, primarily designed for single-image scenarios, struggle to effectively handle the complexity of multi-image tasks due to the scarcity of diverse training data and the high cost of annotating chosen/rejected pairs. We present Multi-Image Augmented Direct Preference Optimization (MIA-DPO), a visual preference alignment approach that effectively handles multi-image inputs. MIA-DPO mitigates the scarcity of diverse multi-image training data by extending single-image data with unrelated images arranged in grid collages or pic-in-pic formats, significantly reducing the costs associated with multi-image data annotations. Our observation reveals that attention values of LVLMs vary considerably across different images. We use attention values to identify and filter out rejected responses the model may have mistakenly focused on. Our attention-aware selection constructs the chosen/rejected pairs without relying on (i) human annotation, (ii) extra data, or (iii) external models or APIs. MIA-DPO is compatible with various architectures and outperforms existing methods on five multi-image benchmarks, achieving an average performance boost of 3.0% on LLaVA-v1.5 and 4.3% on the recent InternLM-XC2.5. Moreover, MIA-DPO has a minimal effect on the model's ability to understand single images.
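
The DPO objective that MIA-DPO builds on can be written per chosen/rejected pair as $-\log\sigma\big(\beta[(\log\pi_\theta(y_w)-\log\pi_{\text{ref}}(y_w))-(\log\pi_\theta(y_l)-\log\pi_{\text{ref}}(y_l))]\big)$. The sketch computes it from summed sequence log-probabilities; `beta` and the numbers are illustrative, and MIA-DPO's multi-image augmentation and attention-aware pair selection are not shown.

```python
import math


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair from summed sequence log-probabilities."""
    margin = beta * ((logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)) = log(1 + exp(-margin)), evaluated stably
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


print(dpo_loss(-12.0, -15.0, -13.0, -14.0))
```

When the policy equals the reference model the loss is $\log 2$; raising the chosen response's likelihood relative to the rejected one, compared with the reference, drives it lower.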

Updated: 2024-10-23 07:56:48

标题: MIA-DPO:用于大型视觉语言模型的多图像增强直接偏好优化

摘要: 视觉偏好对齐涉及训练大型视觉-语言模型（LVLMs）来预测人类在视觉输入之间的偏好。通常通过使用选择/拒绝对的标记数据集，并利用直接偏好优化（DPO）等优化算法来实现。现有的视觉对齐方法主要设计用于单图像场景，由于多图像任务的复杂性、多样化训练数据的稀缺性和选择/拒绝对注释的高成本，这些方法往往难以有效处理。我们提出了多图像增强直接偏好优化（MIA-DPO），这是一种有效处理多图像输入的视觉偏好对齐方法。MIA-DPO通过将单图像数据与不相关的图像组成的网格拼贴或画中画格式进行扩展，显著降低了多图像数据注释的成本。我们的观察发现，LVLMs的注意力值在不同图像之间变化很大。我们使用注意力值来识别并过滤模型可能错误关注的拒绝响应。我们的注意力感知选择用于构建选择/拒绝对，不依赖于（i）人类注释，（ii）额外数据，以及（iii）外部模型或API。MIA-DPO与各种架构兼容，在五个多图像基准测试中表现优异，LLaVA-v1.5上平均性能提升3.0％，最近的InternLM-XC2.5上提升4.3％。此外，MIA-DPO对模型理解单图像的能力影响极小。

更新时间: 2024-10-23 07:56:48

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2410.17637v1

Uncovering the Genetic Basis of Glioblastoma Heterogeneity through Multimodal Analysis of Whole Slide Images and RNA Sequencing Data

Glioblastoma is a highly aggressive form of brain cancer characterized by rapid progression and poor prognosis. Despite advances in treatment, the underlying genetic mechanisms driving this aggressiveness remain poorly understood. In this study, we employed multimodal deep learning approaches to investigate glioblastoma heterogeneity using joint image/RNA-seq analysis. Our results reveal novel genes associated with glioblastoma. By leveraging a combination of whole-slide images and RNA-seq, as well as introducing novel methods to encode RNA-seq data, we identified specific genetic profiles that may explain different patterns of glioblastoma progression. These findings provide new insights into the genetic mechanisms underlying glioblastoma heterogeneity and highlight potential targets for therapeutic intervention.

Updated: 2024-10-23 07:55:40

标题: 揭示胶质母细胞瘤异质性的遗传基础：通过对全切片图像和RNA测序数据进行多模态分析

摘要: Glioblastoma是一种高度侵袭性的脑癌，其特点是快速进展和预后不良。尽管治疗取得了进展，但驱动这种侵袭性的基因机制仍不明确。在这项研究中，我们采用多模态深度学习方法，利用联合图像/RNA-seq分析来研究Glioblastoma的异质性。我们的结果揭示了与Glioblastoma相关的新基因。通过利用整个切片图像和RNA-seq，以及引入新的方法来编码RNA-seq数据，我们识别出可能解释不同Glioblastoma进展模式的特定遗传特征谱。这些发现为理解Glioblastoma异质性的遗传机制提供了新的见解，并突出了治疗干预的潜在靶点。

更新时间: 2024-10-23 07:55:40

领域: q-bio.QM,cs.AI

下载: http://arxiv.org/abs/2410.18710v1

Markov Chain of Thought for Efficient Mathematical Reasoning

Multi-step Chain of Thought (CoT) benefits from the logical structure of the reasoning steps and task-specific actions, significantly enhancing the mathematical reasoning capabilities of large language models. As long CoT becomes prevalent, the number of reasoning steps can exceed manageable token limits and lead to higher computational demands. Inspired by the fundamental logic of human cognition, "derive, then reduce", we conceptualize the standard multi-step CoT as a novel Markov Chain of Thought (MCoT). In this study, we consider the mathematical reasoning task, defining each reasoning step as text accompanied by a Python code snippet. To facilitate a longer reasoning path, self-correction is enabled through interactions with the code interpreter. Our MCoT aims to compress previous reasoning steps into a simplified question, enabling efficient next-step inference without relying on a lengthy KV cache. In our experiments, we curate the MCoTInstruct dataset, and the empirical results indicate that MCoT not only significantly enhances efficiency but also maintains comparable accuracy. While much remains to be explored, this work paves the way for exploring the long CoT reasoning abilities of LLMs.

Updated: 2024-10-23 07:53:29

标题: 思维的马尔可夫链用于高效数学推理

摘要: 多步思维链（CoT）得益于推理步骤的逻辑结构和任务特定操作，显著增强了大型语言模型的数学推理能力。随着长CoT的普及，推理步骤的数量可能超过可管理的令牌限制，并导致更高的计算需求。受到人类认知的基本逻辑“推导，然后简化”的启发，我们将标准的多步CoT概念化为一种新颖的思维马尔可夫链（MCoT）。在这项研究中，我们考虑数学推理任务，将每个推理步骤定义为文本配以Python代码片段。为了促进更长的推理路径，通过与代码解释器互动实现自我纠正。我们的MCoT旨在将以前的推理步骤压缩成简化的问题，从而实现在不依赖冗长的KV缓存的情况下进行高效的下一步推理。在我们的实验中，我们整理了MCoTInstruct数据集，实证结果表明MCoT不仅显著提高了效率，而且保持了可比较的准确性。虽然还有许多待探索的地方，但这项工作为探索LLM的长CoT推理能力铺平了道路。

更新时间: 2024-10-23 07:53:29

领域: cs.AI,cs.CL

下载: http://arxiv.org/abs/2410.17635v1

LMLPA: Language Model Linguistic Personality Assessment

Large Language Models (LLMs) are increasingly used in everyday life and research. One of the most common use cases is conversational interactions, enabled by the language generation capabilities of LLMs. Just as between two humans, a conversation between an LLM-powered entity and a human depends on the personality of the conversants. However, measuring the personality of a given LLM is currently a challenge. This paper introduces the Language Model Linguistic Personality Assessment (LMLPA), a system designed to evaluate the linguistic personalities of LLMs. Our system helps to understand LLMs' language generation capabilities by quantitatively assessing the distinct personality traits reflected in their linguistic outputs. Unlike traditional human-centric psychometrics, the LMLPA adapts a personality assessment questionnaire, specifically the Big Five Inventory, to align with the operational capabilities of LLMs, and also incorporates the findings from previous language-based personality measurement literature. To mitigate sensitivity to the order of options, our questionnaire is designed to be open-ended, resulting in textual answers. Thus, an AI rater is needed to transform ambiguous personality information from text responses into clear numerical indicators of personality traits. Utilising Principal Component Analysis and reliability validations, our findings demonstrate that LLMs possess distinct personality traits that can be effectively quantified by the LMLPA. This research contributes to Human-Computer Interaction and Human-Centered AI, providing a robust framework for future studies to refine AI personality assessments and expand their applications in multiple areas, including education and manufacturing.
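
The abstract mentions reliability validation without naming a statistic; Cronbach's alpha is a common choice for questionnaire scales, so the sketch below assumes it. It takes per-item numeric scores (for example, scores an AI rater assigned to several items of one Big Five trait) across respondents or sampled LLM runs. Treat this as an illustrative assumption, not the paper's exact procedure.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for k items; item_scores[i][n] is item i's score for respondent n."""
    k = len(item_scores)
    if k < 2:
        raise ValueError("need at least two items")
    totals = [sum(col) for col in zip(*item_scores)]
    item_var = sum(pvariance(item) for item in item_scores)
    return k / (k - 1) * (1 - item_var / pvariance(totals))


# Three items of one trait, scored on a 1-5 scale for four LLM runs (made-up numbers).
print(cronbach_alpha([[4, 5, 3, 4], [4, 4, 3, 5], [5, 5, 2, 4]]))
```

Values near 1 indicate that the items move together and plausibly measure one trait; the population and sample variance conventions give the same alpha as long as one is used consistently.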

Updated: 2024-10-23 07:48:51

标题: LMLPA: 语言模型语言人格评估

摘要: 大型语言模型（LLMs）在日常生活和研究中越来越广泛地被使用。其中最常见的用例之一是通过LLMs的语言生成能力实现的对话交互。就像两个人之间的对话一样，由LLM提供动力的实体与人之间的对话取决于交谈者的个性。然而，目前衡量给定LLM的个性是一项挑战。本文介绍了语言模型语言个性评估（LMLPA），这是一个旨在评估LLMs的语言个性的系统。我们的系统通过定量评估其语言输出中反映的不同个性特征，帮助理解LLMs的语言生成能力。与传统的以人为中心的心理测量不同，LMLPA采用了一个个性评估问卷，具体是五大人格问卷，以与LLMs的操作能力保持一致，并且还结合了先前基于语言的个性测量文献的发现。为了减轻对选项顺序的敏感性，我们的问卷设计为开放式，产生文本答案。因此，需要AI评分器将来自文本回答的模糊个性信息转化为清晰的个性特征数字指标。利用主成分分析和可靠性验证，我们的研究结果表明，LLMs具有可以通过LMLPA有效量化的独特个性特征。这项研究对人机交互和以人为中心的人工智能做出了贡献，为未来的研究提供了一个坚固的框架，以完善AI个性评估并将其应用扩展到包括教育和制造在内的多个领域。

更新时间: 2024-10-23 07:48:51

领域: cs.CL,cs.AI,I.2

下载: http://arxiv.org/abs/2410.17632v1

Exploring structure diversity in atomic resolution microscopy with graph neural networks

The emergence of deep learning (DL) has provided great opportunities for the high-throughput analysis of atomic-resolution micrographs. However, DL models trained on fixed-size image patches generally lack efficiency and flexibility when processing micrographs containing diverse atomic configurations. Herein, inspired by the similarity between atomic structures and graphs, we describe a few-shot learning framework based on an equivariant graph neural network (EGNN) to analyze a library of atomic structures (e.g., vacancies, phases, grain boundaries, doping, etc.), showing significantly improved robustness and a three-orders-of-magnitude reduction in computing parameters compared to image-driven DL models, which is especially evident for aggregated vacancy lines with flexible lattice distortion. Besides, the intuitiveness of graphs enables quantitative and straightforward batch extraction of atomic-scale structural features, thus statistically unveiling the self-assembly dynamics of vacancy lines under electron beam irradiation. A versatile model toolkit is established by integrating EGNN sub-models for single-structure recognition into a task chain that processes images involving varied configurations, leading to the discovery of novel doping configurations with superior electrocatalytic properties for the hydrogen evolution reaction. This work provides a powerful tool to explore structure diversity in a fast, accurate, and intelligent manner.

Updated: 2024-10-23 07:48:35

标题: 用图神经网络探索原子分辨率显微镜中的结构多样性

摘要: 深度学习（DL）的出现为原子分辨率显微图的高通量分析提供了巨大机遇。然而，通常通过固定大小的图像块训练的DL模型在处理包含不同原子构型的显微图时缺乏效率和灵活性。在此基础上，受原子结构与图形之间的相似性启发，我们描述了一个基于等变图神经网络（EGNN）的少样本学习框架，用于分析包含各种原子结构（如空位、相位、晶界、掺杂等）的库，相比基于图像的DL模型，显著提高了鲁棒性，并将计算参数减少了三个数量级，尤其对于具有灵活晶格畸变的聚合空位线。此外，图形的直观性使得能够批量定量且直接地提取原子尺度结构特征，从而统计揭示电子束照射下空位线的自组装动力学。通过集成EGNN子模型构建了一个通用模型工具包，用于单个结构识别并以任务链的形式处理涉及多种构型的图像，从而发现具有优越电催化性能的氢气生成反应的新型掺杂构型。这项工作提供了一种快速、准确和智能的探索结构多样性的强大工具。

更新时间: 2024-10-23 07:48:35

领域: cond-mat.mtrl-sci,cond-mat.mes-hall,cs.LG

下载: http://arxiv.org/abs/2410.17631v1

Graph Signal Adaptive Message Passing

This paper proposes Graph Signal Adaptive Message Passing (GSAMP), a novel message passing method that simultaneously conducts online prediction, missing data imputation, and noise removal on time-varying graph signals. Unlike conventional Graph Signal Processing methods that apply the same filter to the entire graph, the spatiotemporal updates of GSAMP employ a distinct approach that utilizes localized computations at each node. This update is based on an adaptive solution obtained from an optimization problem designed to minimize the discrepancy between observed and estimated values. GSAMP effectively processes real-world, time-varying graph signals under Gaussian and impulsive noise conditions.

Updated: 2024-10-23 07:44:56

标题: 图信号自适应消息传递

摘要: 本文提出了一种名为图信号自适应消息传递（GSAMP）的新型消息传递方法，该方法同时对时变图信号进行在线预测、缺失数据插补和噪声去除。与对整个图应用相同滤波器的传统图信号处理方法不同，GSAMP的时空更新采用一种独特的方法，在每个节点上利用局部计算。这种更新基于从一个优化问题中获得的自适应解决方案，该问题旨在最小化观测值和估计值之间的差异。GSAMP有效地处理了受高斯和脉冲噪声影响的真实世界的时变图信号。

更新时间: 2024-10-23 07:44:56

领域: eess.SP,cs.AI

下载: http://arxiv.org/abs/2410.17629v1

Feature Learning in Attention Mechanisms Is More Compact and Stable Than in Convolution

Attention and convolution are fundamental techniques in machine learning. While they use different approaches to learn features - attention mechanisms capture both global and local data relationships, while convolutional layers focus on local patterns - both methods are effective for various tasks. Although the feature learning of both models is well-studied individually, there has not been a direct comparison of their feature learning dynamics. In this paper, we compare their Lipschitz continuity with respect to the Wasserstein distance and covering numbers under similar settings. We demonstrate that attention processes data in a more compact and stable manner. Compactness refers to the lower variance and intrinsic dimensionality of the activation outputs, while stability refers to the changes between inputs and outputs. We validate our findings through experiments using topological data analysis, measuring the 1-, 2-, and infinity-Wasserstein distances between the outputs of each layer from both models. Furthermore, we extend our comparison to Vision Transformers (ViTs) and ResNets, showing that while ViTs have higher output variance, their feature learning is more stable than that of ResNets.
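
For one-dimensional empirical distributions with the same number of samples, the $p$-Wasserstein distance reduces to matching sorted samples: $W_p = \big(\frac{1}{n}\sum_i |x_{(i)} - y_{(i)}|^p\big)^{1/p}$ and $W_\infty = \max_i |x_{(i)} - y_{(i)}|$. The sketch computes these for, say, scalar summaries of two layers' activations. The paper's measurements use topological data analysis, which this simplified 1-D version does not reproduce.

```python
import math


def wasserstein_1d(x, y, p=1.0):
    """p-Wasserstein distance between two equal-size 1-D empirical samples.

    Uses the sorted-sample (quantile) coupling, which is optimal in 1-D.
    Pass p=math.inf for the infinity-Wasserstein distance.
    """
    if len(x) != len(y):
        raise ValueError("this sketch assumes equal sample sizes")
    gaps = [abs(a - b) for a, b in zip(sorted(x), sorted(y))]
    if math.isinf(p):
        return max(gaps)
    return (sum(g ** p for g in gaps) / len(gaps)) ** (1.0 / p)


x, y = [0.0, 0.0], [2.0, 0.0]
print(wasserstein_1d(x, y, 1), wasserstein_1d(x, y, 2), wasserstein_1d(x, y, math.inf))
# 1.0 1.4142135623730951 2.0
```

The example shows why the three distances can rank differences differently: $W_\infty$ is driven by the single largest displacement, while $W_1$ averages it out.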

Updated: 2024-10-23 07:44:14

标题: 注意机制中的特征学习比卷积更紧凑和稳定

摘要: 注意力和卷积是机器学习中的基本技术。虽然它们使用不同的方法来学习特征 - 注意机制捕捉全局和局部数据关系，而卷积层专注于局部模式 - 但两种方法对于各种任务都是有效的。尽管这两种模型的特征学习分别得到了深入研究，但它们的特征学习动态尚未直接进行比较。在本文中，我们在相似设置下比较它们关于Wasserstein距离的Lipschitz连续性以及覆盖数。我们证明了注意力以更紧凑和稳定的方式处理数据。紧凑性指的是激活输出的较低方差和内在维度，而稳定性指的是输入和输出之间的变化。我们通过使用拓扑数据分析实验证实了我们的发现，测量了两种模型的每一层输出之间的1、2和无穷大Wasserstein距离。此外，我们将比较扩展到视觉Transformer（ViTs）和ResNets，表明虽然ViTs具有更高的输出方差，但它们的特征学习比ResNets更稳定。

更新时间: 2024-10-23 07:44:14

领域: cs.LG

下载: http://arxiv.org/abs/2410.17628v1

Understanding the Effectiveness of Large Language Models in Detecting Security Vulnerabilities

While automated vulnerability detection techniques have made promising progress in detecting security vulnerabilities, their scalability and applicability remain challenging. The remarkable performance of Large Language Models (LLMs), such as GPT-4 and CodeLlama, on code-related tasks has prompted recent works to explore if LLMs can be used to detect vulnerabilities. In this paper, we perform a more comprehensive study by concurrently examining a higher number of datasets, languages and LLMs, and qualitatively evaluating performance across prompts and vulnerability classes while addressing the shortcomings of existing tools. Concretely, we evaluate the effectiveness of 16 pre-trained LLMs on 5,000 code samples from five diverse security datasets. These balanced datasets encompass both synthetic and real-world projects in Java and C/C++ and cover 25 distinct vulnerability classes. Overall, LLMs across all scales and families show modest effectiveness in detecting vulnerabilities, obtaining an average accuracy of 62.8% and F1 score of 0.71 across datasets. They are significantly better at detecting vulnerabilities only requiring intra-procedural analysis, such as OS Command Injection and NULL Pointer Dereference. Moreover, they report higher accuracies on these vulnerabilities than popular static analysis tools, such as CodeQL. We find that advanced prompting strategies that involve step-by-step analysis significantly improve performance of LLMs on real-world datasets in terms of F1 score (by up to 0.18 on average). Interestingly, we observe that LLMs show promising abilities at performing parts of the analysis correctly, such as identifying vulnerability-related specifications and leveraging natural language information to understand code behavior (e.g., to check if code is sanitized). We expect our insights to guide future work on LLM-augmented vulnerability detection systems.

Updated: 2024-10-23 07:32:15

标题: 理解大型语言模型在检测安全漏洞中的有效性

摘要: 虽然自动化漏洞检测技术在检测安全漏洞方面取得了令人期待的进展，但其可扩展性和适用性仍然具有挑战性。如GPT-4和CodeLlama等大型语言模型（LLMs）在代码相关任务上表现出色的性能，已经促使最近的研究探索LLMs是否可以用于检测漏洞。在本文中，我们通过同时检查更多数据集、语言和LLMs，并在解决现有工具的缺点的同时，定性评估了在提示和漏洞类别方面的性能，进行了更全面的研究。具体来说，我们评估了16个预训练的LLMs在来自五个不同安全数据集的5,000个代码样本上的有效性。这些平衡数据集涵盖了Java和C/C++中的合成和真实项目，并涵盖了25个不同的漏洞类别。总的来说，LLMs在所有规模和系列上显示出在检测漏洞方面的适度有效性，在数据集上获得平均准确率为62.8%和F1分数为0.71。它们在仅需要进行过程内分析的漏洞（如操作系统命令注入和空指针解引用）的检测方面明显更好。此外，它们对这些漏洞的准确性报告比流行的静态分析工具（如CodeQL）更高。我们发现，涉及逐步分析的高级提示策略显著提高了LLMs在实际数据集上的性能，表现为F1分数（平均提高了0.18）。有趣的是，我们观察到LLMs在正确执行部分分析方面表现出有希望的能力，例如识别与漏洞相关的规范，并利用自然语言信息来理解代码行为（例如，检查代码是否经过了净化）。我们期望我们的见解能指导未来关于LLM增强漏洞检测系统的工作。

更新时间: 2024-10-23 07:32:15

领域: cs.CR,cs.PL,cs.SE

下载: http://arxiv.org/abs/2311.16169v3

Incremental Learning of Affordances using Markov Logic Networks

Affordances enable robots to have a semantic understanding of their surroundings. This gives them greater flexibility in how they act when completing a given task. Capturing object affordances in a machine learning model is difficult because affordances depend on contextual information. Markov Logic Networks (MLN) combine probabilistic reasoning with logic that is able to capture such context. Mobile robots operate in partially known environments in which previously unseen object affordances can be observed. This new information must be incorporated into the existing knowledge without retraining the MLN from scratch. We introduce the MLN Cumulative Learning Algorithm (MLN-CLA). MLN-CLA learns new relations in various knowledge domains by retaining existing knowledge and retraining the MLN only on the knowledge that has changed. We show that MLN-CLA is effective for cumulative learning and zero-shot affordance inference, outperforming strong baselines.

Updated: 2024-10-23 07:29:30

标题: 使用马尔可夫逻辑网络进行功能的增量学习

摘要: Affordances使机器人能够对周围环境进行语义理解。这使它们在完成给定任务时具有更多的行动灵活性。在机器学习模型中捕获对象的affordances是一项困难的任务，因为它们依赖于上下文信息。马尔可夫逻辑网络（MLN）将概率推理与能够捕获这种上下文的逻辑结合在一起。移动机器人在部分已知的环境中运行，在这些环境中，未见的对象affordances可以被观察到。这些新信息必须被整合到现有的知识中，而不必从头开始重新训练MLN。我们引入了MLN累积学习算法（MLN-CLA）。MLN-CLA通过保留已有知识、仅针对发生变化的知识重新训练MLN，来学习各种知识领域中的新关系。我们展示了MLN-CLA对于累积学习和零样本affordance推断的有效性，优于强基准方法。

更新时间: 2024-10-23 07:29:30

领域: cs.LG,cs.RO

下载: http://arxiv.org/abs/2410.17624v1

Understanding Gradient Boosting Classifier: Training, Prediction, and the Role of $γ_j$

The Gradient Boosting Classifier (GBC) is a widely used machine learning algorithm for binary classification, which builds decision trees iteratively to minimize prediction errors. This document explains the GBC's training and prediction processes, focusing on the computation of terminal node values $\gamma_j$, which are crucial to optimizing the logistic loss function. We derive $\gamma_j$ through a Taylor series approximation and provide a step-by-step pseudocode for the algorithm's implementation. The guide explains the theory of GBC and its practical application, demonstrating its effectiveness in binary classification tasks. We provide a step-by-step example in the appendix to help readers understand.
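
As a minimal sketch of the computation the abstract refers to: for logistic loss with labels $y_i \in \{0,1\}$, a second-order Taylor (Newton) step gives the standard closed form $\gamma_j = \sum_{i \in R_j}(y_i - p_i) / \sum_{i \in R_j} p_i(1-p_i)$, where $p_i = \sigma(F_{m-1}(x_i))$. The Python below evaluates it for one boosting round. The four-sample data, the fixed two-leaf split and the learning rate `nu` are illustrative assumptions, not values from the paper.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def initial_log_odds(y):
    """F_0: the constant log-odds that minimizes logistic loss on labels y in {0, 1}."""
    p = sum(y) / len(y)
    return math.log(p / (1.0 - p))

def leaf_gamma(y_leaf, f_leaf):
    """Newton-step value of one terminal node R_j under logistic loss.

    gamma_j = sum(y_i - p_i) / sum(p_i * (1 - p_i)),  with p_i = sigmoid(F_{m-1}(x_i)).
    """
    p = [sigmoid(f) for f in f_leaf]
    num = sum(yi - pi for yi, pi in zip(y_leaf, p))   # sum of pseudo-residuals (negative gradient)
    den = sum(pi * (1.0 - pi) for pi in p)            # sum of second derivatives (Hessian)
    return num / den

# One boosting round on toy data; the split into two leaves is assumed, not learned.
y = [1, 1, 0, 1]
F = [initial_log_odds(y)] * len(y)                  # F_0 = log(3) for every sample
leaves = {0: [0, 1], 1: [2, 3]}                     # leaf index -> sample indices
gammas = {j: leaf_gamma([y[i] for i in idx], [F[i] for i in idx]) for j, idx in leaves.items()}
nu = 0.1                                            # learning rate (assumed)
for j, idx in leaves.items():
    for i in idx:
        F[i] += nu * gammas[j]                      # F_1(x) = F_0(x) + nu * gamma_j
probs = [sigmoid(f) for f in F]
```

Both leaves start from $p = 0.75$, so $\gamma_0 = 0.5/0.375 = 4/3$ and $\gamma_1 = -4/3$; the update moves each leaf's log-odds toward its own labels.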

Updated: 2024-10-23 07:28:19

标题: 理解梯度提升分类器：训练、预测和$γ_j$的作用

摘要: 梯度提升分类器（GBC）是一种广泛使用的用于二元分类的机器学习算法，它通过迭代构建决策树来最小化预测错误。本文说明了GBC的训练和预测过程，重点在计算终端节点值$\gamma_j$上，这对优化逻辑损失函数至关重要。我们通过泰勒级数逼近推导出$\gamma_j$，并提供了算法实现的逐步伪代码。该指南解释了GBC的理论和实际应用，展示了它在二元分类任务中的有效性。附录中提供了一个逐步示例，以帮助读者理解。

更新时间: 2024-10-23 07:28:19

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.05623v2

Statistical Efficiency of Distributional Temporal Difference Learning

Distributional reinforcement learning (DRL) has achieved empirical success in various domains. One core task in DRL is distributional policy evaluation, which involves estimating the return distribution $\eta^\pi$ for a given policy $\pi$. Distributional temporal difference learning has accordingly been proposed as an extension of temporal difference (TD) learning in classic RL. In the tabular case, Rowland et al. (2018) and Rowland et al. (2023) proved the asymptotic convergence of two instances of distributional TD, namely categorical temporal difference learning (CTD) and quantile temporal difference learning (QTD), respectively. In this paper, we go a step further and analyze the finite-sample performance of distributional TD. To facilitate theoretical analysis, we propose non-parametric distributional TD learning (NTD). For a $\gamma$-discounted infinite-horizon tabular Markov decision process, we show that NTD needs $\tilde{O}\left(\frac{1}{\varepsilon^{2p}(1-\gamma)^{2p+1}}\right)$ iterations to achieve an $\varepsilon$-optimal estimator with high probability when the estimation error is measured by the $p$-Wasserstein distance. This sample complexity bound is minimax optimal up to logarithmic factors in the case of the $1$-Wasserstein distance. To achieve this, we establish a novel Freedman's inequality in Hilbert spaces, which may be of independent interest. In addition, we revisit CTD and show that the same non-asymptotic convergence bounds hold for CTD under the $p$-Wasserstein distance for $p\geq 1$.
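
The CTD instance mentioned above can be sketched for a single tabular transition. This is a minimal illustration of the standard categorical update (shift and scale each atom of the next-state distribution to $r + \gamma z$, then project it back onto a fixed, evenly spaced support), not the paper's NTD; the support, the step size `alpha` and the list-of-lists state indexing are assumptions.

```python
import math

def categorical_projection(atoms, probs, reward, gamma):
    """Distribution of reward + gamma * Z, projected back onto the evenly spaced support `atoms`."""
    k = len(atoms)
    v_min, v_max = atoms[0], atoms[-1]
    dz = (v_max - v_min) / (k - 1)
    out = [0.0] * k
    for z, p in zip(atoms, probs):
        tz = min(max(reward + gamma * z, v_min), v_max)   # shift, scale and clip each atom
        b = min((tz - v_min) / dz, k - 1)                  # fractional index on the support
        lo, hi = math.floor(b), math.ceil(b)
        if lo == hi:
            out[lo] += p
        else:                                              # split the mass linearly between neighbors
            out[lo] += p * (hi - b)
            out[hi] += p * (b - lo)
    return out

def ctd_update(eta, atoms, s, r, s_next, gamma, alpha):
    """Tabular CTD step: eta[s] <- (1 - alpha) * eta[s] + alpha * Pi(r + gamma * eta[s_next])."""
    target = categorical_projection(atoms, eta[s_next], r, gamma)
    eta[s] = [(1 - alpha) * p + alpha * t for p, t in zip(eta[s], target)]
```

For example, a point mass at atom 1 on the support $\{0, 1, 2\}$ with $r = 0.25$ and $\gamma = 0.5$ lands at 0.75, so the projection puts mass 0.25 on atom 0 and 0.75 on atom 1.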

Updated: 2024-10-23 07:26:07

标题: 分布式时序差分学习的统计效率

摘要: 分布式强化学习（DRL）在各个领域取得了实证成功。DRL领域的核心任务之一是分布式策略评估，即对给定策略$\pi$的回报分布$\eta^\pi$进行估计。相应地，人们提出了分布式时间差分学习，它是经典RL中时间差分学习（TD）的扩展。在表格情形下，Rowland等人分别证明了两种分布式TD实例，即分类时间差分学习（CTD）和分位数时间差分学习（QTD）的渐近收敛性。本文进一步分析了分布式TD的有限样本性能。为了便于理论分析，我们提出了非参数分布式TD学习（NTD）。对于$\gamma$-折扣的无限时域表格马尔可夫决策过程，我们证明了当估计误差以$p$-Wasserstein距离度量时，NTD需要$\tilde{O}\left(\frac{1}{\varepsilon^{2p}(1-\gamma)^{2p+1}}\right)$次迭代才能以高概率得到$\varepsilon$-最优的估计器。在$1$-Wasserstein距离的情形下，该样本复杂度界在相差对数因子的意义下是极小极大最优的。为此，我们建立了Hilbert空间中一个新的Freedman不等式，该结果本身可能具有独立的意义。此外，我们重新审视了CTD，表明对于$p\geq 1$，在$p$-Wasserstein距离下相同的非渐近收敛界对CTD同样成立。

更新时间: 2024-10-23 07:26:07

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2403.05811v3

Process Supervision-Guided Policy Optimization for Code Generation

Reinforcement Learning (RL) with unit test feedback has enhanced code generation by large language models (LLMs), but it relies on sparse rewards that are provided only after the complete code has been evaluated, which limits learning efficiency and incremental improvement. When the generated code fails all unit tests, no learning signal is received, hindering progress on complex tasks. To address this, we propose a Process Reward Model (PRM) that delivers dense, line-level feedback on code correctness during generation, mimicking human code refinement and providing immediate guidance. We explore various strategies for training PRMs and integrating them into the RL framework, finding that using PRMs both as dense rewards and for value function initialization significantly boosts performance. Our approach increases our in-house LLM's pass rate from 28.2% to 29.8% on LiveCodeBench and from 31.8% to 35.8% on our internal benchmark. Our experimental results highlight the effectiveness of PRMs in enhancing RL-driven code generation, especially in long-horizon scenarios.
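
To illustrate why line-level feedback helps when every unit test fails, here is a minimal sketch of turning per-line process scores into dense rewards and returns. `toy_prm`, the weighting `beta`, and the choice to add the sparse unit-test reward on the last line are hypothetical stand-ins; the paper's actual reward design and RL algorithm are not reproduced here.

```python
def dense_rewards(lines, prm_score, unit_test_reward, beta=0.1):
    """Per-line rewards: beta * PRM score of each code prefix, plus the sparse unit-test reward at the end."""
    if not lines:
        return []
    rewards = [beta * prm_score(lines[: t + 1]) for t in range(len(lines))]
    rewards[-1] += unit_test_reward
    return rewards

def returns_to_go(rewards, gamma=1.0):
    """Discounted return from each line onward, as consumed by a policy-gradient update."""
    out, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        out.append(g)
    return out[::-1]

def toy_prm(prefix):
    """Hypothetical stand-in for a learned PRM: penalize a line that is an unfinished stub."""
    return -1.0 if "TODO" in prefix[-1] else 1.0
```

With the lines `def f(x):`, `# TODO`, `return x` and a unit-test reward of 0, the rewards are 0.1, -0.1, 0.1, so the returns still differ from line to line and the policy receives a nonzero signal.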

Updated: 2024-10-23 07:22:33

标题: 过程监督引导的代码生成策略优化

摘要: 基于单元测试反馈的强化学习（RL）增强了大型语言模型（LLMs）的代码生成能力，但它依赖于仅在完整代码评估后才提供的稀疏奖励，限制了学习效率和增量改进。当生成的代码未通过任何单元测试时，模型得不到任何学习信号，阻碍了在复杂任务上的进展。为了解决这个问题，我们提出了一个过程奖励模型（PRM），在生成过程中提供关于代码正确性的密集、行级反馈，模仿人类逐步完善代码的过程并提供即时指导。我们探讨了训练PRMs以及将其整合到RL框架中的各种策略，发现将PRMs同时用作密集奖励和价值函数初始化可以显著提升性能。我们的方法将内部LLM在LiveCodeBench上的通过率从28.2%提高到29.8%，在内部基准测试上从31.8%提高到35.8%。实验结果突出了PRMs在增强RL驱动的代码生成方面的有效性，尤其是在长程（long-horizon）场景中。

更新时间: 2024-10-23 07:22:33

领域: cs.AI,I.2.7,

下载: http://arxiv.org/abs/2410.17621v1

From PDFs to Structured Data: Utilizing LLM Analysis in Sports Database Management

This study investigates the effectiveness of Large Language Models (LLMs) in processing semi-structured data from PDF documents into structured formats, specifically examining their application in updating the Finnish Sports Clubs Database. Through action research methodology, we developed and evaluated an AI-assisted approach utilizing OpenAI's GPT-4 and Anthropic's Claude 3 Opus models to process data from 72 sports federation membership reports. The system achieved a 90% success rate in automated processing, successfully handling 65 of 72 files without errors and converting over 7,900 rows of data. While the initial development time was comparable to traditional manual processing (three months), the implemented system shows potential for reducing future processing time by approximately 90%. Key challenges included handling multilingual content, processing multi-page datasets, and managing extraneous information. The findings suggest that while LLMs demonstrate significant potential for automating semi-structured data processing tasks, optimal results are achieved through a hybrid approach combining AI automation with selective human oversight. This research contributes to the growing body of literature on practical LLM applications in organizational data management and provides insights into the transformation of traditional data processing workflows.

Updated: 2024-10-23 07:17:31

标题: 从PDF到结构化数据：在体育数据库管理中利用LLM分析

摘要: 这项研究调查了大型语言模型（LLMs）在将PDF文档中的半结构化数据处理为结构化格式方面的有效性，具体考察了它们在更新芬兰体育俱乐部数据库中的应用。通过行动研究方法，我们开发并评估了一种AI辅助方法，利用OpenAI的GPT-4和Anthropic的Claude 3 Opus模型处理72份体育联合会会员报告中的数据。该系统在自动处理方面取得了90%的成功率，无错误地处理了72个文件中的65个，转换了超过7,900行数据。虽然初始开发时间与传统手动处理相当（三个月），但所实施的系统显示出将未来处理时间减少约90%的潜力。主要挑战包括处理多语言内容、处理跨多页的数据集以及管理无关信息。研究结果表明，虽然LLMs在自动化半结构化数据处理任务方面展现出显著潜力，但将AI自动化与有选择的人工监督相结合的混合方法才能取得最佳结果。这项研究为有关LLM在组织数据管理中实际应用的日益增多的文献做出了贡献，并为传统数据处理工作流的转型提供了见解。

更新时间: 2024-10-23 07:17:31

领域: cs.CE,cs.AI

下载: http://arxiv.org/abs/2410.17619v1

CAT: Contrastive Adapter Training for Personalized Image Generation

The emergence of various adapters, including Low-Rank Adaptation (LoRA) borrowed from the field of natural language processing, has allowed diffusion models to personalize image generation at a low cost. However, due to various challenges, including limited datasets and a shortage of regularization and computation resources, adapter training often yields unsatisfactory outcomes that corrupt the backbone model's prior knowledge. One well-known symptom is a loss of diversity in object generation, especially within the same class, which leads to generating almost identical objects with only minor variations. This limits the model's generation capabilities. To solve this issue, we present Contrastive Adapter Training (CAT), a simple yet effective strategy that enhances adapter training through the application of a CAT loss. Our approach helps preserve the base model's original knowledge when the model initializes adapters. Furthermore, we introduce the Knowledge Preservation Score (KPS) to evaluate CAT's ability to retain prior information. We evaluate CAT's improvements both qualitatively and quantitatively. Finally, we discuss the potential of CAT for multi-concept adapters and optimization.

Updated: 2024-10-23 07:16:42

标题: CAT：用于个性化图像生成的对比适配器训练

摘要: 各种适配器的出现，包括从自然语言处理领域引入的低秩适配（LoRA），使得扩散模型能够以较低成本实现个性化图像生成。然而，由于有限的数据集以及正则化和计算资源不足等诸多挑战，适配器训练往往导致不尽人意的结果，使基础模型的先验知识受损。其中一个众所周知的现象是物体生成的多样性丧失，尤其是在同一类别内，导致生成几乎相同、仅有微小差异的物体。这给生成能力带来了挑战。为了解决这个问题，我们提出了对比适配器训练（CAT），这是一种简单而有效的策略，通过应用CAT损失来增强适配器训练。我们的方法有助于在模型初始化适配器时保留基础模型的原始知识。此外，我们引入了知识保留得分（KPS）来评估CAT保留先前信息的能力。我们从定性和定量两方面比较了CAT带来的改进。最后，我们探讨了CAT在多概念适配器和优化方面的可能性。

更新时间: 2024-10-23 07:16:42

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2404.07554v2

Self-Supervised Graph Neural Networks for Enhanced Feature Extraction in Heterogeneous Information Networks

This paper explores the applications and challenges of graph neural networks (GNNs) in processing the complex graph data brought about by the rapid development of the Internet. Because graph data often suffer from heterogeneity and redundancy, traditional GNN methods may depend excessively on the initial structure and attribute information of the graph, which limits their ability to accurately model more complex relationships and patterns in the graph. This study therefore proposes a graph neural network model within a self-supervised learning framework that can flexibly combine different types of additional information from the attribute graph and its nodes, so as to better mine the deep features of graph data. Introducing a self-supervised mechanism is expected to improve the adaptability of existing models to the diversity and complexity of graph data and to improve the overall performance of the model.

Updated: 2024-10-23 07:14:37

标题: 自监督图神经网络：增强异构信息网络中的特征提取

摘要: 这篇论文探讨了图神经网络（GNNs）在处理因互联网的快速发展而带来的复杂图数据时的应用和挑战。鉴于图数据通常存在的异构性和冗余性问题，传统的GNN方法可能过于依赖图的初始结构和属性信息，从而限制了它们准确模拟图中更复杂关系和模式的能力。因此，本研究提出了一个基于自监督学习框架的图神经网络模型，该模型可以灵活地结合不同类型的属性图及其节点的额外信息，以更好地挖掘图数据中的深层特征。通过引入自监督机制，预计可以改善现有模型对图数据的多样性和复杂性的适应性，并提高模型的整体性能。

更新时间: 2024-10-23 07:14:37

领域: cs.LG

下载: http://arxiv.org/abs/2410.17617v1

GPT-SW3: An Autoregressive Language Model for the Nordic Languages

This paper details the process of developing the first native large generative language model for the Nordic languages, GPT-SW3. We cover all parts of the development process, from data collection and processing, training configuration and instruction finetuning, to evaluation and considerations for release strategies. We hope that this paper can serve as a guide and reference for other researchers that undertake the development of large generative models for smaller languages.

Updated: 2024-10-23 07:13:49

标题: GPT-SW3：一种用于北欧语言的自回归语言模型

摘要: 本文详细介绍了首个面向北欧语言的原生大型生成式语言模型GPT-SW3的开发过程。我们涵盖了开发过程的所有部分，从数据收集和处理、训练配置和指令微调，到评估和发布策略的考虑。我们希望本文可以作为其他研究人员开发小语种大型生成模型的指南和参考。

更新时间: 2024-10-23 07:13:49

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2305.12987v3

ImDy: Human Inverse Dynamics from Imitated Observations

Inverse dynamics (ID), which aims to reproduce the driving torques from human kinematic observations, has been a critical tool for gait analysis. However, its limited scalability has hindered wider application to general motion: conventional optimization-based ID requires expensive laboratory setups, restricting its availability. To alleviate this problem, we propose to exploit recent progress in human motion imitation algorithms to learn human inverse dynamics in a data-driven manner. The key insight is that motion imitators implicitly possess knowledge of human ID, even though it is not directly applicable. In light of this, we devise an efficient data collection pipeline with state-of-the-art motion imitation algorithms and physics simulators, resulting in a large-scale human inverse dynamics benchmark named Imitated Dynamics (ImDy). ImDy contains over 150 hours of motion with joint torque and full-body ground reaction force data. With ImDy, we train a data-driven human inverse dynamics solver, ImDyS(olver), in a fully supervised manner; it performs ID and ground reaction force estimation simultaneously. Experiments on ImDy and real-world data demonstrate the strong competency of ImDyS in human inverse dynamics and ground reaction force estimation. Moreover, the potential of ImDy(-S) as a fundamental motion analysis tool is demonstrated through downstream applications. The project page is https://foruck.github.io/ImDy/.
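
For intuition about what an ID solver outputs, the classical single-joint case can be written directly: for a point-mass pendulum with angle $\theta$ measured from the downward vertical, $\tau = m l^2 \ddot\theta + m g l \sin\theta$. The sketch below recovers torques from sampled joint angles using finite differences. It is a textbook illustration under simplifying assumptions (one joint, no ground contact, known mass `m` and length `l`); ImDyS itself is a learned full-body solver and is not shown.

```python
import math

def inverse_dynamics_pendulum(theta, dt, m=1.0, l=1.0, g=9.81):
    """Joint torques tau_t = m l^2 theta_ddot + m g l sin(theta) from sampled angles.

    theta_ddot is estimated with central finite differences, so the first and
    last samples are dropped: taus[k] corresponds to theta[k + 1].
    """
    taus = []
    for t in range(1, len(theta) - 1):
        acc = (theta[t + 1] - 2 * theta[t] + theta[t - 1]) / dt ** 2
        taus.append(m * l ** 2 * acc + m * g * l * math.sin(theta[t]))
    return taus
```

A learned solver replaces this closed form when the body has many joints and unknown contact forces, which is the setting ImDy targets.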

Updated: 2024-10-23 07:06:08

标题: ImDy：通过模仿观察获取的人类逆动力学

摘要: 逆动力学（ID）旨在根据人类运动观察来重现驱动力矩，已成为步态分析的关键工具。然而，由于其有限的可扩展性，它受到了在一般运动中更广泛应用的限制。传统的基于优化的ID需要昂贵的实验室设置，限制了其可用性。为了缓解这一问题，我们提议利用最近进展的人类运动模仿算法以数据驱动的方式学习人类的逆动力学。关键的洞察是，虽然人类的ID知识并不直接适用，但却隐含在运动模仿者中。基于此，我们设计了一个高效的数据采集管道，利用最先进的运动模仿算法和物理模拟器，形成了一个大规模的人类逆动力学基准 ImDy（Imitated Dynamics）。ImDy包含超过150小时的带有关节力矩和全身地面反作用力数据的运动。通过ImDy，我们以完全监督的方式训练了一个数据驱动的人类逆动力学求解器 ImDySolver，它同时进行ID和地面反作用力估计。在ImDy和真实数据上的实验展示了ImDyS在人类逆动力学和地面反作用力估计方面的出色能力。此外，ImDy(-S)作为一种基本的运动分析工具的潜力通过下游应用得到展示。项目页面为https://foruck.github.io/ImDy/。

更新时间: 2024-10-23 07:06:08

领域: cs.AI,cs.CV,cs.GR,cs.RO

下载: http://arxiv.org/abs/2410.17610v1

On the Design and Performance of Machine Learning Based Error Correcting Decoders

This paper analyzes the design and competitiveness of four neural network (NN) architectures recently proposed as decoders for forward error correction (FEC) codes. We first consider the so-called single-label neural network (SLNN) and multi-label neural network (MLNN) decoders, which have been reported to achieve near maximum likelihood (ML) performance. Here, we show analytically that SLNN and MLNN decoders can always achieve ML performance regardless of the code dimensions (albeit at the cost of computational complexity), and that no training is in fact required. We then turn our attention to two transformer-based decoders: the error correction code transformer (ECCT) and the cross-attention message passing transformer (CrossMPT). We compare their performance against traditional decoders and show that ordered statistics decoding outperforms these transformer-based decoders. The results in this paper cast serious doubt on the application of NN-based FEC decoders in the short and medium block length regime.
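
The observation that an SLNN-style decoder needs no training can be illustrated with a small sketch. If each output neuron's weights are one BPSK-modulated codeword, taking the arg-max of the layer output is exactly ML decoding over an AWGN channel: all BPSK codewords have the same norm, so maximal correlation means minimal Euclidean distance. The (7,4) Hamming code and the mapping 0 → +1, 1 → -1 are illustrative choices. Enumerating all $2^k$ codewords is also why the complexity grows quickly with the code dimension.

```python
import itertools

# Systematic generator matrix of the (7,4) Hamming code (illustrative choice).
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

def encode(msg):
    """Codeword c = msg * G over GF(2)."""
    return [sum(m * g for m, g in zip(msg, col)) % 2 for col in zip(*G)]

MESSAGES = [list(bits) for bits in itertools.product([0, 1], repeat=len(G))]
# "Weights" of a single-label layer: one BPSK codeword (0 -> +1, 1 -> -1) per output neuron.
W = [[1 - 2 * c for c in encode(m)] for m in MESSAGES]

def ml_decode(y):
    """ML decoding for BPSK over AWGN: the message whose codeword correlates most with y."""
    scores = [sum(w * yi for w, yi in zip(row, y)) for row in W]
    return MESSAGES[max(range(len(scores)), key=scores.__getitem__)]
```

Because the code has minimum distance 3, a received word with any single strongly flipped symbol is still decoded to the transmitted message.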

Updated: 2024-10-23 07:05:26

标题: 关于基于机器学习的纠错解码器设计和性能的研究

摘要: 本文分析了最近提出的四种用作前向纠错（FEC）码解码器的神经网络（NN）架构的设计及其竞争力。我们首先考虑了所谓的单标签神经网络（SLNN）和多标签神经网络（MLNN）解码器，据报道它们可以实现接近最大似然（ML）的性能。在这里，我们通过分析表明，无论码的维度如何，SLNN和MLNN解码器总能实现ML性能（尽管以计算复杂度为代价），而且实际上并不需要训练。然后我们转向两种基于transformer的解码器：纠错码transformer（ECCT）和交叉注意力消息传递transformer（CrossMPT）。我们将它们的性能与传统解码器进行比较，并表明有序统计解码优于这些基于transformer的解码器。本文的结果对在短、中等码长范围内应用基于NN的FEC解码器提出了严重质疑。

更新时间: 2024-10-23 07:05:26

领域: eess.SP,cs.LG

下载: http://arxiv.org/abs/2410.15899v2

Solving a Stackelberg Game on Transportation Networks in a Dynamic Crime Scenario: A Mixed Approach on Multi-Layer Networks

Interdicting a criminal with limited police resources is a challenging task because the criminal changes location over time. The large size of the transportation network adds further difficulty to this scenario. To tackle this issue, we use a layered graph: at each time stamp, we create a copy of the entire transportation network to track the possible movements of both players, the attacker and the defenders. We consider a Stackelberg game in a dynamic crime scenario in which the attacker changes location over time while the defenders attempt to interdict the attacker on his escape route. Given a set of defender strategies, the optimal attacker strategy is determined by applying Dijkstra's algorithm on the layered networks. Here, the attacker aims to minimize, and the defenders aim to maximize, the probability of interdiction. We develop an approximation algorithm on the layered networks to find near-optimal strategies for the defenders. We compare the developed approach with the adopted MILP approach in terms of computational time and solution quality. The results demonstrate the value of the developed approach, as it effectively solves this complex problem in a short amount of time.
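
The attacker's best response on the layered network can be sketched as Dijkstra's algorithm over (node, time) pairs. The edge weighting is an assumption made for illustration: entering node $v$ at time $t$ costs $-\log(1-q_{v,t})$, where $q_{v,t}$ is the probability that the given defender strategy interdicts there. A shortest path then maximizes the escape probability $\prod (1-q_{v,t})$, provided interdiction events are independent. The function name and the `interdict_prob` callback are hypothetical. The paper's defender approximation algorithm and its MILP are not shown.

```python
import heapq
import math

def best_escape(adj, start, targets, horizon, interdict_prob):
    """Attacker's best response on a time-expanded (layered) network.

    Layer t holds a copy of every node; arcs (u, t) -> (v, t + 1) exist for each
    road u-v and for waiting in place (v = u). Entering (v, t) costs -log(1 - q),
    with q = interdict_prob(v, t). Returns (escape probability, path of (node, time)).
    """
    dist = {(start, 0): 0.0}
    prev = {}
    heap = [(0.0, start, 0)]
    while heap:
        d, u, t = heapq.heappop(heap)
        if d > dist[(u, t)]:
            continue                                # stale heap entry
        if u in targets:                            # first target popped is the safest one
            path = [(u, t)]
            while path[-1] in prev:
                path.append(prev[path[-1]])
            return math.exp(-d), path[::-1]
        if t == horizon:
            continue
        for v in adj[u] + [u]:
            q = interdict_prob(v, t + 1)
            if q >= 1.0:
                continue                            # certain capture: arc unusable
            nd = d - math.log(1.0 - q)
            if nd < dist.get((v, t + 1), math.inf):
                dist[(v, t + 1)] = nd
                prev[(v, t + 1)] = (u, t)
                heapq.heappush(heap, (nd, v, t + 1))
    return 0.0, []
```

Because all weights are non-negative, Dijkstra's correctness carries over unchanged to the layered graph.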

Updated: 2024-10-23 07:05:18

标题: 在动态犯罪情景中解决交通网络上的斯塔克伯格博弈：多层网络上的混合方法

摘要: 用有限的警力干预罪犯是一项具有挑战性的任务，因为罪犯随时间改变位置。庞大的交通网络规模进一步增加了这种情况的难度。为了解决这个问题，我们考虑了分层图的概念。在每个时间戳，我们创建整个交通网络的副本，以跟踪两个玩家，即攻击者和防御者的可能移动。我们考虑在动态犯罪场景中的Stackelberg博弈，攻击者随时间改变位置，而防御者试图在其逃逸路线上干预攻击者。在给定一组防御者策略的情况下，通过在分层网络上应用Dijkstra算法确定最佳的攻击者策略。在这里，攻击者的目标是最小化，而防御者的目标是最大化干预的概率。我们在分层网络上开发了一个近似算法，以找到防御者的近似最优策略。我们将开发的方法的效力与采用的MILP方法进行了比较。我们比较了计算时间和解决方案质量的结果。结果的质量证明了对开发的方法的需求，因为它能够在短时间内有效解决复杂问题。

更新时间: 2024-10-23 07:05:18

领域: cs.AI

下载: http://arxiv.org/abs/2406.14514v2

Non-myopic Generation of Language Model for Reasoning and Planning

Large Language Models have demonstrated remarkable abilities in reasoning and planning by breaking down complex problems into sequential steps. Despite their success in various domains such as mathematical problem-solving and coding, LLMs face challenges in ensuring reliable and optimal planning because of the inherently myopic nature of autoregressive decoding. This paper revisits LLM reasoning from an optimal-control perspective and proposes a novel method, Predictive-Decoding, that leverages Model Predictive Control to enhance planning accuracy. By re-weighting LLM distributions based on foresight trajectories, Predictive-Decoding aims to mitigate early errors and promote non-myopic planning. Our experiments show significant improvements on a wide range of math, coding, and agent tasks. Furthermore, Predictive-Decoding is computationally efficient, outperforming search baselines while using fewer computational resources. This study provides insights into optimizing the planning capabilities of LLMs.
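
The idea of re-weighting the next-step distribution with foresight can be sketched as follows. This is a hypothetical toy, not the paper's implementation. Each candidate step is scored by the average reward of `k` sampled rollouts, and the model's probabilities are re-weighted as $p'(a) \propto p(a)\exp(\bar R(a)/\tau)$. The rollout and reward functions and the temperature `tau` are all assumptions.

```python
import math
import random

def predictive_reweight(prefix, candidates, rollout, reward, k=4, tau=1.0, rng=None):
    """Re-weight next-step probabilities by the mean reward of k foresight rollouts.

    candidates: {step: p(step | prefix)} from the base model.
    Returns p'(step) proportional to p(step) * exp(mean_reward(step) / tau).
    """
    rng = rng or random.Random(0)
    scores = {}
    for step, p in candidates.items():
        values = [reward(rollout(prefix + [step], rng)) for _ in range(k)]
        scores[step] = math.log(p) + (sum(values) / k) / tau
    m = max(scores.values())                        # subtract the max for numerical stability
    z = sum(math.exp(s - m) for s in scores.values())
    return {step: math.exp(s - m) / z for step, s in scores.items()}

# Toy example: the "greedy" step looks likelier, but its continuations fail.
def toy_rollout(traj, rng):
    return traj + (["fail"] if traj[-1] == "greedy" else ["solve"])

def toy_reward(traj):
    return 1.0 if "solve" in traj else 0.0

new_p = predictive_reweight([], {"greedy": 0.7, "careful": 0.3}, toy_rollout, toy_reward, tau=0.1)
```

A small `tau` lets foresight override the myopic preference. A very large `tau` approximately recovers the original distribution.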

Updated: 2024-10-23 07:02:09

标题: 面向推理与规划的语言模型非短视生成

摘要: 大型语言模型已经展示出在推理和规划方面的显著能力，通过将复杂问题分解为顺序步骤。尽管它们在数学问题解决和编码等各个领域取得了成功，但由于其自回归解码的固有短视性，LLM在确保可靠和最佳规划方面面临挑战。本文从最优控制的角度重新审视LLM推理，提出了一种新颖的方法，Predictive-Decoding，利用模型预测控制来增强规划准确性。通过根据预测轨迹重新加权LLM分布，Predictive-Decoding旨在减轻早期错误并促进非短视规划。我们的实验显示，在数学、编码和代理等各种任务中都取得了显著改进。此外，Predictive-Decoding展示了计算效率，在减少计算资源的情况下胜过搜索基线。本研究提供了优化LLM规划能力的见解。

更新时间: 2024-10-23 07:02:09

领域: cs.AI,cs.CL

下载: http://arxiv.org/abs/2410.17195v2

Towards Effective Data-Free Knowledge Distillation via Diverse Diffusion Augmentation

Data-free knowledge distillation (DFKD) has emerged as a pivotal technique in model compression, substantially reducing the dependency on the original training data. Nonetheless, conventional DFKD methods that employ synthesized training data suffer from inadequate diversity and from distribution discrepancies between the synthesized and original datasets. To address these challenges, this paper introduces an innovative approach to DFKD through diverse diffusion augmentation (DDA). Specifically, we recast the common data-synthesis paradigm of DFKD as a composite process: after data synthesis, diffusion models are applied for self-supervised augmentation, generating a spectrum of data samples with similar distributions but controlled variations. Furthermore, to mitigate excessive deviation in the embedding space, we introduce an image filtering technique based on cosine similarity that maintains fidelity during knowledge distillation. Comprehensive experiments on the CIFAR-10, CIFAR-100, and Tiny-ImageNet datasets show the superior performance of our method across various teacher-student network configurations, outperforming contemporary state-of-the-art DFKD methods. Code will be available at: https://github.com/SLGSP/DDA.
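
The cosine-similarity filter can be sketched in a few lines. The sketch assumes each image has already been mapped to an embedding vector (for example by the teacher network's feature extractor, which is an assumption here). It keeps only augmented samples whose embedding stays within a hypothetical similarity `threshold` of the synthesized original.

```python
import math

def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def filter_augmented(orig_emb, aug_embs, threshold=0.8):
    """Indices of augmented samples whose embedding stays close to the synthesized original."""
    return [i for i, e in enumerate(aug_embs) if cosine(orig_emb, e) >= threshold]
```

Raising `threshold` trades diversity for fidelity, which is the balance the filtering step is meant to control.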

Updated: 2024-10-23 07:01:16

标题: 迈向有效的无数据知识蒸馏：基于多样化扩散增强

摘要: 数据无关知识蒸馏（DFKD）已经成为模型压缩领域中的一个关键技术，大大减少对原始训练数据的依赖。然而，采用合成训练数据的传统DFKD方法容易受到多样性不足和合成数据与原始数据集之间分布差异的限制。为了解决这些挑战，本文介绍了一种通过多样扩散增强（DDA）进行DFKD的创新方法。具体来说，我们将DFKD中常见的数据合成范式改为一个复合过程，通过在数据合成后利用扩散模型进行自监督增强，生成具有相似分布但保留可控变化的数据样本。此外，为了减少嵌入空间中的过度偏差，我们引入了一种基于余弦相似性的图像过滤技术，在知识蒸馏过程中保持忠实度。在CIFAR-10、CIFAR-100和Tiny-ImageNet数据集上进行的全面实验展示了我们方法在各种师生网络配置上的卓越性能，优于当代最先进的DFKD方法。代码将在https://github.com/SLGSP/DDA 上提供。

更新时间: 2024-10-23 07:01:16

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2410.17606v1

Integrating Large Language Models for UAV Control in Simulated Environments: A Modular Interaction Approach

The intersection of LLMs (Large Language Models) and UAV (Unoccupied Aerial Vehicle) technology is a promising field of research with the potential to significantly enhance UAV capabilities. This study explores the application of LLMs to UAV control, focusing on opportunities for integrating advanced natural language processing into autonomous aerial systems. By enabling UAVs to interpret and respond to natural language commands, LLMs simplify UAV control and usage, making UAVs accessible to a broader user base and facilitating more intuitive human-machine interaction. The paper discusses several key areas where LLMs can impact UAV technology, including autonomous decision-making, dynamic mission planning, enhanced situational awareness, and improved safety protocols. Through a comprehensive review of current developments and potential future directions, this study aims to highlight how LLMs can transform UAV operations, making them more adaptable, responsive, and efficient in complex environments. A template development framework for integrating LLMs into UAV control is also described, and proof-of-concept results that integrate existing LLMs with popular robotic simulation platforms are presented. The findings suggest that, although substantial technical and ethical challenges remain, integrating LLMs into UAV control holds promise for advancing autonomous aerial systems.

Updated: 2024-10-23 06:56:53

标题: 在模拟环境中集成大型语言模型用于无人机控制：一种模块化交互方法

摘要: 大型语言模型（LLMs）和无人机（UAV）技术的交集代表了一个具有潜力显著增强无人机能力的研究领域。本研究探讨了LLMs在无人机控制中的应用，重点关注将先进的自然语言处理整合到自主飞行系统中的机会。通过使无人机能够解释和回应自然语言命令，LLMs简化了无人机控制和使用，使其更易于广泛用户群体使用，并促进了更直观的人机交互。本文讨论了LLMs可以影响无人机技术的几个关键领域，包括自主决策、动态任务规划、增强情境感知和改进安全协议。通过对当前发展和潜在未来方向的全面审查，本研究旨在突出LLMs如何可以改变无人机操作，使其在复杂环境中更具适应性、响应性和效率。还描述了将LLMs整合到无人机控制中的模板开发框架。展示了集成现有LLM模型和流行机器人仿真平台的概念验证结果。研究结果表明，尽管存在重大技术和伦理挑战，将LLMs整合到无人机控制中对推进自主飞行系统具有潜在的积极影响。

更新时间: 2024-10-23 06:56:53

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2410.17602v1

Graphusion: A RAG Framework for Knowledge Graph Construction with a Global Perspective

Knowledge Graphs (KGs) are crucial in the field of artificial intelligence and are widely used in downstream tasks such as question answering (QA). Constructing KGs typically requires significant effort from domain experts. Large Language Models (LLMs) have recently been used for Knowledge Graph Construction (KGC). However, most existing approaches take a local perspective, extracting knowledge triplets from individual sentences or documents, and lack a fusion process that combines the knowledge into a global KG. This work introduces Graphusion, a zero-shot KGC framework that works from free text. It contains three steps: in Step 1, we extract a list of seed entities using topic modeling to ensure that the final KG includes the most relevant entities; in Step 2, we conduct candidate triplet extraction using LLMs; in Step 3, we design a novel fusion module that provides a global view of the extracted knowledge, incorporating entity merging, conflict resolution, and novel triplet discovery. Results show that Graphusion achieves scores of 2.92 and 2.37 out of 3 for entity extraction and relation recognition, respectively. Moreover, we show how Graphusion can be applied to the Natural Language Processing (NLP) domain and validate it in an educational scenario. Specifically, we introduce TutorQA, a new expert-verified QA benchmark comprising six tasks and a total of 1,200 QA pairs. Using the Graphusion-constructed KG, we achieve a significant improvement on the benchmark, for example, a 9.2% accuracy improvement on sub-graph completion.
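
A much-simplified sketch of a fusion step is shown below. It assumes entity merging by case-folding plus a user-supplied alias table, and conflict resolution by a majority vote over the relations proposed for each entity pair. These rules are illustrative assumptions. The paper's actual fusion module is not reproduced, and novel triplet discovery is not modeled.

```python
from collections import Counter, defaultdict

def normalize(entity, aliases):
    """Merge surface variants: case-fold, then map known aliases to a canonical name."""
    key = entity.strip().lower()
    return aliases.get(key, key)

def fuse_triplets(triplets, aliases=None):
    """Merge synonymous entities, then keep the majority relation for each (head, tail) pair."""
    aliases = aliases or {}
    votes = defaultdict(Counter)
    for head, rel, tail in triplets:
        h, t = normalize(head, aliases), normalize(tail, aliases)
        if h != t:                                  # drop self-loops created by merging
            votes[(h, t)][rel] += 1
    fused = []
    for (h, t), counter in sorted(votes.items()):
        rel, _ = counter.most_common(1)[0]
        fused.append((h, rel, t))
    return fused
```

For example, the triplets ("BERT", "is_a", "Language Model"), ("bert", "is_a", "language model") and ("BERT", "used_for", "Language Model") collapse to a single edge ("bert", "is_a", "language model").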

Updated: 2024-10-23 06:54:03

标题: Graphusion：一个具有全局视角的知识图谱构建RAG框架

摘要: 知识图谱（KGs）在人工智能领域至关重要，并广泛用于问答（QA）等下游任务。构建KGs通常需要领域专家投入大量工作。最近，大型语言模型（LLMs）已被用于知识图谱构建（KGC）。然而，大多数现有方法侧重于局部视角，从单个句子或文档中提取知识三元组，缺乏将知识融合为全局KG的过程。本文介绍了Graphusion，一个从自由文本进行零样本KGC的框架。它包含三个步骤：第一步，我们使用主题建模提取一组种子实体，以确保最终KG包含最相关的实体；第二步，我们使用LLMs进行候选三元组提取；第三步，我们设计了新颖的融合模块，提供对所提取知识的全局视图，包括实体合并、冲突解决和新三元组发现。结果显示，Graphusion在实体提取和关系识别方面分别获得了2.92分和2.37分（满分3分）。此外，我们展示了Graphusion如何应用于自然语言处理（NLP）领域，并在教育场景中进行了验证。具体而言，我们提出了TutorQA，一个经专家验证的新QA基准，包括六个任务，共计1200个QA对。使用Graphusion构建的KG，我们在该基准上取得了显著改进，例如在子图补全任务上准确率提高了9.2%。

更新时间: 2024-10-23 06:54:03

领域: cs.CL,cs.AI,cs.DB

下载: http://arxiv.org/abs/2410.17600v1

Adaptive Spatio-temporal Estimation on the Graph Edges via Line Graph Transformation

Spatio-temporal estimation of signals on graph edges is challenging because most conventional Graph Signal Processing (GSP) techniques are defined on graph nodes. Leveraging the line graph transform, we propose the Line Graph Least Mean Square (LGLMS) algorithm, which performs adaptive estimation of time-varying edge signals by projecting them from the edge space to the node space. LGLMS is an adaptive algorithm analogous to the classical LMS algorithm but applied to graph edges. Unlike edge-specific methods, LGLMS retains all GSP concepts and techniques originally designed for graph nodes, without the need to redefine them on the edges. Experiments on transportation and meteorological graphs, with noisy and missing signal observations, confirm that LGLMS is suitable for the online prediction of time-varying edge signals.
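
The edge-to-node projection can be sketched with the line graph $L(G)$, whose nodes are the edges of $G$. Once edge signals live on $L(G)$, a node-domain adaptive filter applies unchanged. The update $x \leftarrow x + \mu B D (y - x)$ below is the graph-LMS form from the GSP literature, with $D$ the diagonal observation mask and $B$ a band-limiting projector. That LGLMS uses exactly this form is an assumption. For brevity, the demo uses the simplest projector, onto constant signals (the null space of the Laplacian of a connected graph). In practice $B$ would be built from several low-frequency eigenvectors of the line-graph Laplacian returned by `laplacian`.

```python
from itertools import combinations

def line_graph(edges):
    """Adjacency of L(G): one node per edge of G; two are adjacent if the edges share an endpoint."""
    m = len(edges)
    adj = [[0] * m for _ in range(m)]
    for i, j in combinations(range(m), 2):
        if set(edges[i]) & set(edges[j]):
            adj[i][j] = adj[j][i] = 1
    return adj

def laplacian(adj):
    """Combinatorial Laplacian L = Deg - A."""
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0) - adj[i][j] for j in range(n)] for i in range(n)]

def glms_step(x, y, mask, B, mu):
    """Graph-LMS update x <- x + mu * B * D * (y - x); D = diag(mask) marks observed edges."""
    err = [m * (yi - xi) for m, yi, xi in zip(mask, y, x)]
    n = len(x)
    return [x[i] + mu * sum(B[i][j] * err[j] for j in range(n)) for i in range(n)]

# Demo: a square with one diagonal; the edge signal is 3.0 everywhere, but only two edges are observed.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
n = len(edges)
B = [[1.0 / n] * n for _ in range(n)]              # projector onto constant signals (assumed bandwidth 1)
mask = [1, 0, 1, 0, 0]
y = [3.0, 0.0, 3.0, 0.0, 0.0]                      # entries where mask == 0 are ignored
x = [0.0] * n
for _ in range(200):
    x = glms_step(x, y, mask, B, mu=1.0)
```

The unobserved edges are recovered because the projector spreads the observed error across the whole band-limited subspace.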

Updated: 2024-10-23 06:53:57

标题: 通过线图转换在图边上进行自适应时空估计

摘要: 在图的边上进行信号的时空估计具有挑战性，因为大多数传统的图信号处理（GSP）技术是在图节点上定义的。利用线图变换，我们提出了线图最小均方（LGLMS）算法，通过将边信号从边空间投影到节点空间，对时变边信号进行自适应估计。LGLMS是一种类似于经典LMS算法、但应用于图边的自适应算法。与针对边的专用方法不同，LGLMS保留了最初为图节点设计的所有GSP概念和技术，无需在边上重新定义。我们在交通图和气象图上进行了实验，信号观测中含有噪声和缺失值，结果证实LGLMS适用于时变边信号的在线预测。

更新时间: 2024-10-23 06:53:57

领域: eess.SP,cs.LG

下载: http://arxiv.org/abs/2311.00656v3

Understanding Transfer Learning via Mean-field Analysis

We propose a novel framework for exploring the generalization errors of transfer learning through the lens of differential calculus on the space of probability measures. In particular, we consider two main transfer learning scenarios, $\alpha$-ERM and fine-tuning with KL-regularized empirical risk minimization, and establish generic conditions under which we study the generalization error and the convergence rate of the population risk for these scenarios. Based on our theoretical results, we show the benefits of transfer learning with a one-hidden-layer neural network in the mean-field regime under suitable integrability and regularity assumptions on the loss and activation functions.

Updated: 2024-10-23 06:51:54

标题: 通过平均场分析理解迁移学习

摘要: 我们提出了一个新颖的框架，通过概率测度空间上的微分学来研究迁移学习的泛化误差。具体而言，我们考虑了两种主要的迁移学习场景，即$\alpha$-ERM和基于KL正则化经验风险最小化的微调，并建立了一般性条件，在这些条件下研究这些场景的泛化误差和总体风险收敛速率。根据我们的理论结果，我们展示了在损失函数和激活函数满足适当的可积性和正则性假设时，在平均场区域中使用单隐藏层神经网络进行迁移学习的益处。

更新时间: 2024-10-23 06:51:54

领域: stat.ML,cs.LG,math.FA

下载: http://arxiv.org/abs/2410.17128v2

Universal approximation results for neural networks with non-polynomial activation function over non-compact domains

In this paper, we generalize the universal approximation property of single-hidden-layer feed-forward neural networks beyond the classical formulation over compact domains. More precisely, by assuming that the activation function is non-polynomial, we derive universal approximation results for neural networks within function spaces over non-compact subsets of a Euclidean space, e.g., weighted spaces, $L^p$-spaces, and (weighted) Sobolev spaces over unbounded domains, where the latter includes the approximation of the (weak) derivatives. Furthermore, we provide some dimension-independent rates for approximating a function with sufficiently regular and integrable Fourier transform by neural networks with non-polynomial activation function.

Updated: 2024-10-23 06:51:05

标题: 非紧致域上具有非多项式激活函数的神经网络的通用逼近结果

摘要: 在这篇论文中，我们将单隐藏层前馈神经网络的普适逼近性质推广到紧致域之外。更具体地，通过假设激活函数是非多项式的，我们推导出神经网络在欧几里得空间的非紧致子集上的函数空间内的普适逼近结果，例如加权空间、$L^p$-空间和（加权）Sobolev空间在无界域上，后者包括对（弱）导数的逼近。此外，我们为使用非多项式激活函数的神经网络逼近具有足够正则和可积傅立叶变换的函数提供了一些与维度无关的速率。

更新时间: 2024-10-23 06:51:05

领域: stat.ML,cs.LG,cs.NE,math.CA

下载: http://arxiv.org/abs/2410.14759v2

A Kernel Perspective on Distillation-based Collaborative Learning

Over the past decade, there has been growing interest in collaborative learning that can enhance the AI models of multiple parties. However, it remains challenging to improve their performance without sharing the private data and models of individual parties. One recent promising approach is to develop distillation-based algorithms that exploit unlabeled public data, but their results are still unsatisfactory in both theory and practice. To tackle this problem, we rigorously analyze a representative distillation-based algorithm from the viewpoint of kernel regression. This work provides the first theoretical results proving the (nearly) minimax optimality of a nonparametric collaborative learning algorithm that does not directly share local data or models in massively distributed, statistically heterogeneous environments. Inspired by our theoretical results, we also propose a practical distillation-based collaborative learning algorithm based on a neural network architecture. Our algorithm bridges the gap between our theoretical assumptions and practical settings with neural networks through feature kernel matching. We simulate various regression tasks to verify our theory and demonstrate the practical feasibility of the proposed algorithm.
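
A single round of a distillation-based scheme can be sketched with kernel ridge regression. Each party fits a model on its private data and shares only its predictions on public unlabeled inputs. The averaged predictions then serve as distillation targets. The one-round averaging protocol, the RBF kernel width and the regularization `lam` are assumptions for illustration. This is not claimed to be the exact algorithm analyzed in the paper.

```python
import math

def rbf(a, b, width=1.0):
    return math.exp(-((a - b) ** 2) / (2 * width ** 2))

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system A x = b."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def krr_fit(xs, ys, lam=1e-3):
    """Kernel ridge regression: solve (K + n * lam * I) alpha = y, predict sum_i alpha_i k(x, x_i)."""
    n = len(xs)
    K = [[rbf(a, b) + (n * lam if i == j else 0.0) for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    alpha = solve(K, ys)
    return lambda x: sum(a * rbf(x, xi) for a, xi in zip(alpha, xs))

def distill_round(parties, public_x, lam=1e-3):
    """Parties share only predictions on public_x; their average becomes the distillation target."""
    local = [krr_fit(xs, ys, lam) for xs, ys in parties]
    consensus = [sum(f(x) for f in local) / len(local) for x in public_x]
    return krr_fit(public_x, consensus, lam)
```

No party ever exposes its raw data or model parameters; only predictions on the shared public inputs cross party boundaries.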

Updated: 2024-10-23 06:40:13

标题: 从核方法视角看基于蒸馏的协作学习

摘要: 在过去的十年里，人们对能够增强多方人工智能模型的协作学习的兴趣日益增长。然而，要在不共享各方私有数据和模型的情况下提高它们的性能仍然具有挑战性。最近一种有前途的方法是开发利用未标记公共数据的基于蒸馏的算法，但其结果在理论和实践中仍不尽如人意。为了解决这个问题，我们从核回归的视角对一个代表性的基于蒸馏的算法进行了严格分析。这项工作首次从理论上证明了在大规模分布式、统计异质的环境中，不直接共享本地数据或模型的非参数协作学习算法具有（接近）极小极大最优性。受理论结果的启发，我们还提出了一种基于神经网络架构的实用蒸馏协作学习算法。我们的算法通过特征核匹配，成功弥合了理论假设与神经网络实际设置之间的差距。我们模拟了各种回归任务来验证我们的理论，并展示了所提出算法的实际可行性。

更新时间: 2024-10-23 06:40:13

领域: cs.LG

下载: http://arxiv.org/abs/2410.17592v1

Richelieu: Self-Evolving LLM-Based Agents for AI Diplomacy

Diplomacy is one of the most sophisticated activities in human society, involving complex interactions among multiple parties that require skills in social reasoning, negotiation, and long-term strategic planning. Previous AI agents have demonstrated their ability to handle multi-step games and large action spaces in multi-agent tasks. However, diplomacy involves an enormous decision space, especially given the negotiation stage it requires. While recent agents based on large language models (LLMs) have shown potential in various applications, they still struggle with extended planning horizons in complex multi-agent settings. Leveraging recent technologies for LLM-based agents, we explore AI's potential to create a human-like agent capable of executing comprehensive multi-agent missions by integrating three fundamental capabilities: 1) strategic planning with memory and reflection; 2) goal-oriented negotiation with social reasoning; and 3) augmenting memory through self-play games for self-evolution without a human in the loop.

Updated: 2024-10-23 06:39:57

标题: Richelieu：基于LLM的自进化人工智能外交代理

摘要: 外交是人类社会中最复杂的活动之一，涉及多方之间的复杂互动，需要社会推理、谈判和长期战略规划等技能。先前的人工智能代理已经展示了在多代理任务中处理多步博弈和大型行动空间的能力。然而，外交涉及庞大的决策空间，尤其是考虑到其所需的谈判阶段。尽管基于大型语言模型（LLMs）的最新代理在各种应用中显示出潜力，但它们在复杂多代理环境中进行长期规划时仍然遇到困难。利用LLM代理的最新技术，我们旨在探索人工智能创造类人代理的潜力，使其能够通过整合三种基本能力来执行全面的多代理任务：1）具有记忆和反思的战略规划；2）结合社会推理的目标导向谈判；3）通过自我对弈增强记忆，实现无需人类参与的自我进化。

更新时间: 2024-10-23 06:39:57

领域: cs.AI,cs.MA,cs.SI

下载: http://arxiv.org/abs/2407.06813v4

LVBench: An Extreme Long Video Understanding Benchmark

Recent progress in multimodal large language models has markedly enhanced the understanding of short videos (typically under one minute), and several evaluation datasets have emerged accordingly. However, these advancements fall short of meeting the demands of real-world applications such as embodied intelligence for long-term decision-making, in-depth movie reviews and discussions, and live sports commentary, all of which require comprehension of long videos spanning several hours. To address this gap, we introduce LVBench, a benchmark specifically designed for long video understanding. Our dataset comprises publicly sourced videos and encompasses a diverse set of tasks aimed at long video comprehension and information extraction. LVBench is designed to challenge multimodal models to demonstrate long-term memory and extended comprehension capabilities. Our extensive evaluations reveal that current multimodal models still underperform on these demanding long video understanding tasks. Through LVBench, we aim to spur the development of more advanced models capable of tackling the complexities of long video comprehension. Our data and code are publicly available at: https://lvbench.github.io.

Updated: 2024-10-23 06:37:01

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2406.08035v2

Challenge on Sound Scene Synthesis: Evaluating Text-to-Audio Generation

Despite significant advancements in neural text-to-audio generation, challenges persist in controllability and evaluation. This paper addresses these issues through the Sound Scene Synthesis challenge held as part of Detection and Classification of Acoustic Scenes and Events (DCASE) 2024. We present an evaluation protocol that combines an objective metric, Fréchet Audio Distance, with perceptual assessments, and uses a structured prompt format to enable diverse captions and effective evaluation. Our analysis reveals varying performance across sound categories and model architectures, with larger models generally excelling but innovative lightweight approaches also showing promise. The strong correlation between objective metrics and human ratings validates our evaluation approach. We discuss the outcomes in terms of audio quality, controllability, and architectural considerations for text-to-audio synthesizers, providing direction for future research.
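
Fréchet Audio Distance compares Gaussian statistics fitted to two sets of audio embeddings. The following minimal NumPy sketch computes only the distance itself. It assumes the embeddings have already been extracted; in practice they typically come from a pretrained audio model such as VGGish. `frechet_distance` is a hypothetical helper, not the challenge's official evaluation code, and it returns the squared form that FAD reports.

```python
import numpy as np

def _sqrtm_psd(mat):
    """Principal square root of a symmetric positive semidefinite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # drop tiny negative eigenvalues from round-off
    return (vecs * np.sqrt(vals)) @ vecs.T

def frechet_distance(emb_a, emb_b):
    """Squared Fréchet distance between Gaussians fitted to two embedding sets.

    emb_a, emb_b: arrays of shape (n_samples, dim).
    d^2 = ||mu_a - mu_b||^2 + Tr(S_a) + Tr(S_b) - 2 Tr((S_a S_b)^{1/2})
    """
    emb_a = np.asarray(emb_a, dtype=float)
    emb_b = np.asarray(emb_b, dtype=float)
    mu_a, mu_b = emb_a.mean(axis=0), emb_b.mean(axis=0)
    cov_a = np.cov(emb_a, rowvar=False)
    cov_b = np.cov(emb_b, rowvar=False)
    # Tr((S_a S_b)^{1/2}) = Tr((S_a^{1/2} S_b S_a^{1/2})^{1/2}); the inner
    # matrix is symmetric PSD, so a symmetric eigendecomposition suffices.
    root_a = _sqrtm_psd(cov_a)
    inner = root_a @ cov_b @ root_a
    inner = (inner + inner.T) / 2.0  # re-symmetrize against round-off
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b)
                 - 2.0 * np.trace(_sqrtm_psd(inner)))
```

Two identical embedding sets give a distance of numerically zero. Shifting every dimension of a 4-dimensional set by 3 adds only the mean term, 4 × 3² = 36.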

Updated: 2024-10-23 06:35:41

Categories: cs.SD,cs.AI,cs.LG,cs.MM,eess.AS

Download: http://arxiv.org/abs/2410.17589v1

Predicting Company Growth by Econophysics informed Machine Learning

Predicting company growth is crucial for strategic adjustment, operational decision-making, risk assessment, and loan eligibility reviews. Traditional models of company growth often focus too heavily on theory and overlook practical forecasting, or they rely solely on time series forecasting techniques and ignore interpretability and the inherent mechanisms of company growth. In this paper, we propose a machine learning-based prediction framework that incorporates an econophysics model of company growth. Our model captures both the intrinsic growth mechanisms of companies, governed by scaling laws, and the fluctuations driven by random factors and individual decisions, and it outperforms methods that use time series techniques alone. Its advantages are more pronounced in long-range prediction tasks. By explicitly modeling the baseline growth and volatility components, our model is also more interpretable.

Updated: 2024-10-23 06:30:20

Categories: cs.CE,cs.LG,econ.GN,physics.soc-ph,q-fin.EC

Download: http://arxiv.org/abs/2410.17587v1

Generative AI Security: Challenges and Countermeasures

Generative AI's expanding footprint across numerous industries has led to both excitement and increased scrutiny. This paper delves into the unique security challenges posed by Generative AI, and outlines potential research directions for managing these risks.

Updated: 2024-10-23 06:28:19

Categories: cs.CR,cs.AI,cs.CL,cs.CY,cs.LG

Download: http://arxiv.org/abs/2402.12617v2

Empirical investigation of multi-source cross-validation in clinical ECG classification

Traditionally, machine learning-based clinical prediction models have been trained and evaluated on patient data from a single source, such as a hospital. Cross-validation methods can be used to estimate the accuracy of such models on new patients originating from the same source, by repeated random splitting of the data. However, such estimates tend to be highly overoptimistic compared with the accuracy obtained when models are deployed to sources not represented in the dataset, such as a new hospital. The increasing availability of multi-source medical datasets provides new opportunities for obtaining more comprehensive and realistic evaluations of expected accuracy through source-level cross-validation designs. In this study, we present a systematic empirical evaluation of standard K-fold cross-validation and leave-source-out cross-validation methods in a multi-source setting. We consider the task of electrocardiogram-based cardiovascular disease classification, combining and harmonizing the openly available PhysioNet CinC Challenge 2021 and the Shandong Provincial Hospital datasets for our study. Our results show that K-fold cross-validation, on both single-source and multi-source data, systematically overestimates prediction performance when the end goal is to generalize to new sources. Leave-source-out cross-validation provides more reliable performance estimates, with close to zero bias but larger variability. The evaluation highlights the dangers of obtaining misleading cross-validation results on medical data and demonstrates how these issues can be mitigated when multi-source data are available.
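
The two designs differ only in how records are assigned to folds. The following minimal pure-Python sketch assumes each record carries a source label, such as a hospital ID. The function names are hypothetical; scikit-learn provides equivalent splitters (`KFold`, `LeaveOneGroupOut`).

```python
import random

def kfold_splits(n_records, k, seed=0):
    """Standard K-fold: records are shuffled and dealt into k folds,
    so every source usually appears in both train and test."""
    idx = list(range(n_records))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test

def leave_source_out_splits(sources):
    """Leave-source-out: each fold holds out every record of one source,
    mimicking deployment at a hospital never seen during training."""
    for held_out in sorted(set(sources)):
        test = [i for i, s in enumerate(sources) if s == held_out]
        train = [i for i, s in enumerate(sources) if s != held_out]
        yield train, test
```

With labels such as `["A", "A", "A", "B", "B", "C", "C", "C"]`, leave-source-out yields three folds. No source appears on both sides of any split, which is exactly the leakage that random K-fold allows.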

Updated: 2024-10-23 06:27:26

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2403.15012v2

OpenMU: Your Swiss Army Knife for Music Understanding

We present OpenMU-Bench, a large-scale benchmark suite for addressing the data scarcity issue in training multimodal language models to understand music. To construct OpenMU-Bench, we leveraged existing datasets and bootstrapped new annotations. OpenMU-Bench also broadens the scope of music understanding by including lyrics understanding and music tool usage. Using OpenMU-Bench, we trained our music understanding model, OpenMU, with extensive ablations, demonstrating that OpenMU outperforms baseline models such as MU-Llama. Both OpenMU and OpenMU-Bench are open-sourced to facilitate future research in music understanding and to enhance creative music production efficiency.

Updated: 2024-10-23 06:21:09

Categories: cs.SD,cs.AI,cs.CL,cs.MM,eess.AS

Download: http://arxiv.org/abs/2410.15573v2

Exploring Tokenization Methods for Multitrack Sheet Music Generation

This study explores the tokenization of multitrack sheet music in ABC notation and introduces two methods: bar-stream patching and line-stream patching. We compare these methods against existing techniques, including bar patching, byte patching, and Byte Pair Encoding (BPE). Experimental results show that bar-stream patching performs best overall in terms of both computational efficiency and the musicality of the generated compositions, making it a promising tokenization strategy for sheet music generation.
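
For context, two of the existing baselines named above, bar patching and byte patching, can be contrasted in a few lines. This simplified sketch assumes every `|` character marks a bar line; real ABC also has repeat and double-bar symbols. The patch size is a hypothetical parameter, and the sketch does not implement the paper's bar-stream or line-stream patching.

```python
import re

def bar_patches(abc_body):
    """Bar patching: one patch per bar, each ending with its bar line(s)."""
    return re.findall(r"[^|]*\|+|[^|]+$", abc_body)

def byte_patches(abc_body, size):
    """Byte patching: fixed-size chunks of the UTF-8 bytes, ignoring bar lines."""
    data = abc_body.encode("utf-8")
    return [data[i:i + size] for i in range(0, len(data), size)]
```

On `"CDEF|GABc|c4|"`, bar patching yields three bar-aligned patches. Byte patching with size 4 cuts across bar lines (for example `|GAB` and `c|c4`), which illustrates why the choice of patch boundaries matters.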

Updated: 2024-10-23 06:19:48

Categories: cs.SD,cs.AI,eess.AS

Download: http://arxiv.org/abs/2410.17584v1

Physics-informed Neural Networks for Functional Differential Equations: Cylindrical Approximation and Its Convergence Guarantees

We propose the first learning scheme for functional differential equations (FDEs). FDEs play a fundamental role in physics, mathematics, and optimal control. However, the numerical analysis of FDEs has long been hampered by prohibitive computational costs, a problem that has persisted for decades. Numerical approximations of FDEs have therefore been developed, but they often oversimplify the solutions. To tackle these two issues, we propose a hybrid approach combining physics-informed neural networks (PINNs) with the \textit{cylindrical approximation}. The cylindrical approximation expands functions and functional derivatives in an orthonormal basis and transforms FDEs into high-dimensional PDEs. To validate the reliability of the cylindrical approximation for FDE applications, we prove convergence theorems for the approximated functional derivatives and solutions. The derived high-dimensional PDEs are then solved numerically with PINNs. Thanks to the capabilities of PINNs, our approach can handle a broader class of functional derivatives more efficiently than conventional discretization-based methods, improving the scalability of the cylindrical approximation. As a proof of concept, we conduct experiments on two FDEs and demonstrate that our model achieves $L^1$ relative errors on the order of $10^{-3}$, which is typical of PINNs. Overall, our work provides a strong foundation for physicists, mathematicians, and machine learning experts to analyze previously challenging FDEs, thereby democratizing their numerical analysis, which has so far received limited attention. Code is available at \url{https://github.com/TaikiMiyagawa/FunctionalPINN}.
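
For orientation, the cylindrical approximation can be written as follows. This is a sketch of the standard construction in our own notation, which is an assumption: $\theta$ is the input function, $\{\phi_k\}$ the orthonormal basis, $m$ the truncation degree and $f$ the reduced function. The paper's notation and basis choice may differ.

```latex
% Project the input function onto the first m basis functions
a_k = \int \theta(x)\,\phi_k(x)\,dx, \qquad k = 0, \dots, m-1,
% The functional becomes an ordinary function of m variables
F([\theta]) \approx f(a_0, \dots, a_{m-1}),
% and, by the chain rule with \delta a_k / \delta\theta(x) = \phi_k(x),
\frac{\delta F([\theta])}{\delta \theta(x)} \approx \sum_{k=0}^{m-1} \frac{\partial f(a_0, \dots, a_{m-1})}{\partial a_k}\,\phi_k(x).
```

Substituting these expressions into an FDE for $F$ yields a PDE in the $m$ variables $a_0, \dots, a_{m-1}$. That is the high-dimensional PDE handed to the PINN.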

Updated: 2024-10-23 06:16:35

Categories: math.NA,cond-mat.dis-nn,cs.AI,cs.NA,hep-th,stat.ML

Download: http://arxiv.org/abs/2410.18153v1

Bonsai: Gradient-free Graph Distillation for Node Classification

Graph distillation has emerged as a promising avenue for scalable GNN training: it compresses the training dataset while preserving essential graph characteristics. Our study uncovers significant shortcomings in current graph distillation techniques. First, most algorithms paradoxically require training on the full dataset to perform distillation. Second, because they emulate gradients, these methods must redo distillation for any change in hyperparameters or GNN architecture, limiting their flexibility and reusability. Finally, they fail to achieve substantial size reduction because they synthesize fully connected, edge-weighted graphs. To address these challenges, we present Bonsai, a novel graph distillation method motivated by the observation that \textit{computation trees} form the fundamental processing units of message-passing GNNs. Bonsai distills a dataset by encoding a carefully selected set of \textit{exemplar} trees that maximize the representation of all computation trees in the training set. This approach makes Bonsai the first linear-time, model-agnostic graph distillation algorithm for node classification; it outperforms existing baselines in accuracy across $6$ real-world datasets while being $22$ times faster on average. Bonsai is grounded in rigorous mathematical guarantees on its approximation strategies, making it robust to GNN architectures, datasets, and parameters.
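
In an $L$-layer message-passing GNN, a node's embedding depends only on its depth-$L$ computation tree, that is, its neighborhood unrolled $L$ times. Nodes with identical trees and features therefore receive identical embeddings. The sketch below is a deliberately simplified illustration: it assumes featureless nodes and exact tree equality, canonicalizes each node's tree, and keeps one exemplar per tree shape, most frequent shapes first. Bonsai's actual selection criterion, which maximizes how well exemplars represent all computation trees, is not reproduced here. The function names are hypothetical.

```python
from collections import defaultdict

def computation_tree(adj, root, depth):
    """Canonical form of the depth-`depth` computation tree rooted at `root`.

    Children are sorted, so isomorphic trees map to equal nested tuples.
    Node features are ignored (all nodes are assumed identical).
    """
    if depth == 0:
        return ()
    return tuple(sorted(computation_tree(adj, nbr, depth - 1) for nbr in adj[root]))

def select_exemplars(adj, depth, budget):
    """Keep one node per tree shape, most frequent shapes first.

    Returns (exemplar_node, number_of_nodes_it_represents) pairs.
    """
    groups = defaultdict(list)
    for node in sorted(adj):
        groups[computation_tree(adj, node, depth)].append(node)
    ranked = sorted(groups.values(), key=len, reverse=True)
    return [(members[0], len(members)) for members in ranked[:budget]]
```

The recursion revisits nodes along every walk, so its cost grows exponentially with depth. A practical implementation would hash trees level by level, as in Weisfeiler-Lehman refinement, instead.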

Updated: 2024-10-23 06:08:45

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2410.17579v1

P1-KAN an effective Kolmogorov Arnold Network for function approximation

A new Kolmogorov-Arnold network (KAN) is proposed to approximate potentially irregular functions in high dimension. We show that it outperforms multilayer perceptrons in accuracy and converges faster. We also compare it with several previously proposed KAN networks: the original spline-based KAN appears more effective for smooth functions, while P1-KAN is more effective for irregular functions.
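
In a KAN layer, every input-output edge carries its own learnable univariate function, and each output sums these functions over the inputs. The following minimal NumPy sketch builds such a layer with continuous piecewise-linear (P1) edge functions. It assumes a fixed grid whose nodal values are the trainable parameters. The class name and the fixed-grid choice are assumptions, and the paper's P1-KAN construction may differ.

```python
import numpy as np

class P1KANLayer:
    """One KAN layer whose edge functions are continuous piecewise-linear (P1).

    Edge function phi[j, i] is stored by its values at the grid nodes;
    np.interp evaluates it by linear interpolation between nodes and holds
    it constant outside the grid range.
    """

    def __init__(self, n_in, n_out, grid, seed=0):
        self.grid = np.asarray(grid, dtype=float)
        rng = np.random.default_rng(seed)
        # Nodal values are the trainable parameters: shape (n_out, n_in, n_nodes).
        self.values = rng.normal(scale=0.1, size=(n_out, n_in, self.grid.size))

    def __call__(self, x):
        x = np.asarray(x, dtype=float)  # shape (n_in,)
        n_out, n_in, _ = self.values.shape
        out = np.zeros(n_out)
        for j in range(n_out):
            for i in range(n_in):
                # y_j = sum_i phi_{j,i}(x_i)
                out[j] += np.interp(x[i], self.grid, self.values[j, i])
        return out
```

Setting `values[j, i] = grid` turns every edge into the identity on the grid range, which makes a quick sanity check. A full network stacks such layers and fits the nodal values by gradient descent.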

Updated: 2024-10-23 06:05:11

Categories: cs.LG,cs.NE,stat.ML,68T07

Download: http://arxiv.org/abs/2410.03801v2

Real-time Vehicle-to-Vehicle Communication Based Network Cooperative Control System through Distributed Database and Multimodal Perception: Demonstrated in Crossroads

The autonomous driving industry is rapidly advancing, with Vehicle-to-Vehicle (V2V) communication systems emerging as a key component for enhancing road safety and traffic efficiency. This paper introduces a novel Real-time Vehicle-to-Vehicle Communication Based Network Cooperative Control System (VVCCS), designed to revolutionize macro-scope traffic planning and collision avoidance in autonomous driving. Implemented on the Quanser Car (Qcar) hardware platform, our system integrates distributed databases into individual autonomous vehicles and an optional central server. We also developed a comprehensive multi-modal perception system with multi-object tracking and radar sensing. Through a demonstration in a physical crossroad environment, our system showcases its potential for application in congested and complex urban environments.

Updated: 2024-10-23 05:59:55

Categories: cs.RO,cs.AI,cs.SY,eess.SY

Download: http://arxiv.org/abs/2410.17576v1

Masked Clinical Modelling: A Framework for Synthetic and Augmented Survival Data Generation

Access to real clinical data is often restricted due to privacy obligations, creating significant barriers for healthcare research. Synthetic datasets provide a promising solution, enabling secure data sharing and model development. However, most existing approaches focus on data realism rather than utility -- ensuring that models trained on synthetic data yield clinically meaningful insights comparable to those trained on real data. In this paper, we present Masked Clinical Modelling (MCM), a framework inspired by masked language modelling, designed for both data synthesis and conditional data augmentation. We evaluate this prototype on the WHAS500 dataset using Cox Proportional Hazards models, focusing on the preservation of hazard ratios as key clinical metrics. Our results show that data generated using the MCM framework improves both discrimination and calibration in survival analysis, outperforming existing methods. MCM demonstrates strong potential to support survival data analysis and broader healthcare applications.

Updated: 2024-10-23 05:57:12

Categories: cs.LG

Download: http://arxiv.org/abs/2410.16811v2

ConfusedPilot: Confused Deputy Risks in RAG-based LLMs

Retrieval augmented generation (RAG) is a process in which a large language model (LLM) retrieves useful information from a database and then generates responses. It is becoming popular in enterprise settings for daily business operations; for example, Copilot for Microsoft 365 has been adopted by millions of businesses. However, the security implications of adopting such RAG-based systems are unclear. In this paper, we introduce ConfusedPilot, a class of security vulnerabilities in RAG systems that confuse Copilot and cause integrity and confidentiality violations in its responses. First, we investigate a vulnerability that embeds malicious text in the prompt as modified by RAG, corrupting the responses generated by the LLM. Second, we demonstrate a vulnerability that leaks secret data by leveraging the caching mechanism during retrieval. Third, we investigate how both vulnerabilities can be exploited to propagate misinformation within the enterprise and ultimately affect its operations, such as sales and manufacturing. We also discuss the root cause of these attacks by examining the architecture of a RAG-based system. This study highlights the security vulnerabilities in today's RAG-based systems and proposes design guidelines to secure future RAG-based systems.

Updated: 2024-10-23 05:55:31

Categories: cs.CR,cs.AI

Download: http://arxiv.org/abs/2408.04870v5

Adversarial Domain Adaptation for Metal Cutting Sound Detection: Leveraging Abundant Lab Data for Scarce Industry Data

Cutting state monitoring in the milling process is crucial for improving manufacturing efficiency and tool life. Cutting sound detection using machine learning (ML) models, inspired by experienced machinists, can serve as a cost-effective and non-intrusive monitoring method in complex manufacturing environments. However, labeling industry data for training is costly and time-consuming, and industry data is often scarce. In this study, we propose a novel adversarial domain adaptation (DA) approach that leverages abundant lab data to learn from scarce industry data, both labeled, for training a cutting sound detection model. Rather than adapting the features from separate domains directly, we first project them into two separate latent spaces that jointly serve as the feature space for learning domain-independent representations. We also analyze two different adversarial learning mechanisms, in which the discriminator acts as an adversary or as a critic in separate settings, enabling our model to learn expressive domain-invariant and domain-ingrained features, respectively. We collected cutting sound data from multiple sensors in different locations, prepared datasets from the lab and industry domains, and evaluated our learning models on them. Experiments showed that our models outperformed multi-layer perceptron based vanilla domain adaptation models in labeling tasks on the curated datasets, achieving nearly 92%, 82%, and 85% accuracy, respectively, for three different sensors installed in industrial settings.

Updated: 2024-10-23 05:55:21

Categories: cs.LG,cs.SD,eess.AS

Download: http://arxiv.org/abs/2410.17574v1

Securing Federated Learning Against Novel and Classic Backdoor Threats During Foundation Model Integration

Federated learning (FL) enables decentralized model training while preserving privacy. Recently, integrating Foundation Models (FMs) into FL has boosted performance but also introduced a novel backdoor attack mechanism. Attackers can exploit the FM's capabilities to embed backdoors into the synthetic data that FMs generate for model fusion, subsequently infecting all client models through knowledge sharing without participating in the lengthy FL process. These novel attacks render existing FL backdoor defenses ineffective, because those defenses primarily detect anomalies among client updates, which may all appear uniformly malicious under this attack. We propose a novel data-free defense strategy that constrains abnormal activations in the hidden feature space during model aggregation on the server. The activation constraints, optimized using synthetic data alongside FL training, mitigate the attack while barely affecting model performance, as the parameters remain untouched. Extensive experiments demonstrate its effectiveness against both novel and classic backdoor attacks, outperforming existing defenses while maintaining model performance.

Updated: 2024-10-23 05:54:41

Categories: cs.LG,cs.CR

Download: http://arxiv.org/abs/2410.17573v1

Advancing Open-Set Domain Generalization Using Evidential Bi-Level Hardest Domain Scheduler

In Open-Set Domain Generalization (OSDG), the model is exposed to both new variations in data appearance (domains) and open-set conditions, where both known and novel categories are present at test time. The challenges of this task arise from the dual need to generalize across diverse domains and to accurately quantify category novelty, which is critical for applications in dynamic environments. Recently, meta-learning techniques have demonstrated superior results in OSDG, effectively orchestrating the meta-train and meta-test tasks by employing varied random categories and predefined domain partition strategies. These approaches prioritize a well-designed training schedule over traditional methods that focus primarily on data augmentation and enhancing discriminative feature learning. The prevailing meta-learning models in OSDG typically use a predefined sequential domain scheduler to structure data partitions. However, the influence of the domain scheduler's strategy during training remains inadequately explored. In this paper, we observe that an adaptive domain scheduler is more beneficial in OSDG than predefined sequential and random domain schedulers. We propose the Evidential Bi-Level Hardest Domain Scheduler (EBiL-HaDS) to achieve adaptive domain scheduling. This method strategically sequences domains by assessing their reliability with a follower network, which is trained with confidence scores learned in an evidential manner, regularized by max rebiasing discrepancy, and optimized in a bi-level manner. The results show that our method substantially improves OSDG performance and achieves more discriminative embeddings for both seen and unseen categories. The source code is publicly available at https://github.com/KPeng9510/EBiL-HaDS.

Updated: 2024-10-23 05:49:00

Categories: cs.LG,cs.CV

Download: http://arxiv.org/abs/2409.17555v2

Diffusion-Reward Adversarial Imitation Learning

Imitation learning aims to learn a policy from expert demonstrations without access to reward signals from the environment. Generative adversarial imitation learning (GAIL) formulates imitation learning as adversarial learning, employing a generator (the policy) that learns to imitate expert behaviors and a discriminator that learns to distinguish expert demonstrations from agent trajectories. Despite its encouraging results, GAIL training is often brittle and unstable. Inspired by the recent dominance of diffusion models in generative modeling, we propose Diffusion-Reward Adversarial Imitation Learning (DRAIL), which integrates a diffusion model into GAIL to yield more robust and smoother rewards for policy learning. Specifically, we propose a diffusion discriminative classifier to construct an enhanced discriminator, and we design diffusion rewards based on the classifier's output for policy learning. Extensive experiments in navigation, manipulation, and locomotion verify DRAIL's effectiveness compared with prior imitation learning methods. Moreover, additional experimental results demonstrate the generalizability and data efficiency of DRAIL. Visualizations of the reward functions learned by GAIL and DRAIL suggest that DRAIL can produce more robust and smoother rewards. Project page: https://nturobotlearninglab.github.io/DRAIL/
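
DRAIL derives its rewards from the classifier's output. For reference, the sketch below shows two common ways GAIL-style methods turn a discriminator probability D, the estimated probability that a state-action pair came from the expert, into a policy reward. The abstract does not state which transform DRAIL uses, and the function name is hypothetical.

```python
import math

def gail_rewards(d_expert, eps=1e-8):
    """Map a discriminator output D(s, a) = P(expert | s, a) to rewards.

    Returns two common shapes:
      - "neg_log_one_minus": -log(1 - D), always positive
      - "logit": log D - log(1 - D), positive only when D > 0.5
    """
    d = min(max(d_expert, eps), 1.0 - eps)  # clamp so both logs stay finite
    return {
        "neg_log_one_minus": -math.log(1.0 - d),
        "logit": math.log(d) - math.log(1.0 - d),
    }
```

At D = 0.5, where the discriminator cannot tell agent from expert, the logit reward is zero and the other reward is log 2. Brittle discriminators that saturate near 0 or 1 make both shapes swing sharply, which is the instability the abstract attributes to GAIL.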

Updated: 2024-10-23 05:47:21

Categories: cs.LG,cs.AI,cs.RO

Download: http://arxiv.org/abs/2405.16194v3

UCB Exploration for Fixed-Budget Bayesian Best Arm Identification

We study best-arm identification (BAI) in the fixed-budget setting. Adaptive allocations based on upper confidence bounds (UCBs), such as UCBE, are known to work well in BAI. However, it is well known that its optimal regret is theoretically instance-dependent, which we show to be an artifact in many fixed-budget BAI problems. In this paper, we propose a UCB exploration algorithm that is both theoretically and empirically efficient for the fixed-budget BAI problem in a Bayesian setting. The key idea is to learn prior information, which can enhance the performance of UCB-based BAI algorithms, as it does in cumulative regret minimization. We establish bounds on the failure probability and the simple regret for the Bayesian BAI problem, providing upper bounds of order $\tilde{O}(\sqrt{K/n})$, up to logarithmic factors, where $n$ is the budget and $K$ is the number of arms. Furthermore, our empirical results show that our approach consistently outperforms state-of-the-art baselines.
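
To make the mechanism concrete, here is a minimal pure-Python sketch of UCB-style allocation for fixed-budget BAI with a Gaussian prior. It assumes Gaussian rewards with known noise variance, a fixed conjugate prior shared by all arms, and an exploration weight `beta`. These choices and the function name are illustrative assumptions, not the paper's algorithm, which learns the prior.

```python
import math
import random

def bayes_ucb_bai(true_means, budget, prior_mean=0.0, prior_var=1.0,
                  noise_var=1.0, beta=2.0, seed=0):
    """Spend `budget` pulls, then recommend the arm with the best posterior mean.

    `true_means` is only used to simulate rewards.
    """
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k
    sums = [0.0] * k

    def posterior(i):
        # Conjugate Gaussian update with known noise variance.
        precision = 1.0 / prior_var + counts[i] / noise_var
        var = 1.0 / precision
        mean = var * (prior_mean / prior_var + sums[i] / noise_var)
        return mean, var

    def ucb(i):
        mean, var = posterior(i)
        return mean + beta * math.sqrt(var)

    for _ in range(budget):
        arm = max(range(k), key=ucb)  # optimistic arm choice
        counts[arm] += 1
        sums[arm] += rng.gauss(true_means[arm], math.sqrt(noise_var))

    best = max(range(k), key=lambda i: posterior(i)[0])
    return best, counts
```

The index `mean + beta * sqrt(var)` shrinks toward the posterior mean as an arm accumulates pulls. UCB-E uses a comparable optimistic index, but with a budget-dependent exploration constant.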

Updated: 2024-10-23 05:44:21

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2408.04869v3

Learning to Manipulate Anywhere: A Visual Generalizable Framework For Reinforcement Learning

Can we endow visuomotor robots with generalization capabilities to operate in diverse open-world scenarios? In this paper, we propose \textbf{Maniwhere}, a generalizable framework tailored for visual reinforcement learning, enabling the trained robot policies to generalize across a combination of multiple visual disturbance types. Specifically, we introduce a multi-view representation learning approach fused with a Spatial Transformer Network (STN) module to capture shared semantic information and correspondences among different viewpoints. In addition, we employ a curriculum-based randomization and augmentation approach to stabilize the RL training process and strengthen visual generalization. To demonstrate the effectiveness of Maniwhere, we meticulously design 8 tasks encompassing articulated-object, bimanual, and dexterous hand manipulation, demonstrating Maniwhere's strong visual generalization and sim2real transfer abilities across 3 hardware platforms. Our experiments show that Maniwhere significantly outperforms existing state-of-the-art methods. Videos are provided at https://gemcollector.github.io/maniwhere/.

Updated: 2024-10-23 05:32:34

Categories: cs.RO,cs.AI,cs.CV

Download: http://arxiv.org/abs/2407.15815v2

Diffusion Models are Certifiably Robust Classifiers

Generative learning, recognized for its effective modeling of data distributions, offers inherent advantages in handling out-of-distribution instances, especially for enhancing robustness to adversarial attacks. Among generative approaches, diffusion classifiers, which build on powerful diffusion models, have demonstrated superior empirical robustness. However, a comprehensive theoretical understanding of their robustness is still lacking, raising concerns about their vulnerability to stronger future attacks. In this study, we prove that diffusion classifiers possess $O(1)$ Lipschitzness and establish their certified robustness, demonstrating their inherent resilience. To achieve non-constant Lipschitzness, and thereby much tighter certified robustness, we generalize diffusion classifiers to classify Gaussian-corrupted data. This involves deriving the evidence lower bounds (ELBOs) for these distributions, approximating the likelihood with the ELBO, and calculating classification probabilities via Bayes' theorem. Experimental results show the superior certified robustness of these Noised Diffusion Classifiers (NDCs). Notably, using a single off-the-shelf diffusion model without any additional data, we achieve over 80% and 70% certified robustness on CIFAR-10 under adversarial perturbations with $\ell_2$ norms less than 0.25 and 0.5, respectively.
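
The last step in this pipeline turns per-class ELBOs into classification probabilities with Bayes' theorem, and the sketch below shows it in Python. It is a minimal illustration under two assumptions. First, the ELBOs are already computed, one per class, each standing in for log p(x | y). Second, the class prior is uniform unless one is supplied. The function name is ours, and the sketch does not show how the ELBO itself is estimated with a diffusion model.

```python
import math


def diffusion_class_posterior(elbos, log_prior=None):
    """Bayes' theorem with ELBOs standing in for log p(x | y).

    elbos: per-class ELBO values (approximate log-likelihoods).
    log_prior: optional per-class log p(y); uniform if omitted.
    Returns the list of posterior probabilities p(y | x).
    """
    k = len(elbos)
    if log_prior is None:
        log_prior = [-math.log(k)] * k
    scores = [e + lp for e, lp in zip(elbos, log_prior)]
    # Log-sum-exp shift: ELBOs are large negative numbers, so exponentiate
    # differences from the maximum instead of the raw scores.
    m = max(scores)
    z = sum(math.exp(s - m) for s in scores)
    return [math.exp(s - m) / z for s in scores]


probs = diffusion_class_posterior([-1200.0, -1195.0, -1210.0])
predicted = max(range(len(probs)), key=probs.__getitem__)
```

With a uniform prior, the prior terms cancel, and the posterior reduces to a softmax over the ELBOs.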

Updated: 2024-10-23 05:26:10

Fields: cs.LG,cs.CV

Download: http://arxiv.org/abs/2402.02316v3

A Simple Baseline for Predicting Events with Auto-Regressive Tabular Transformers

Many real-world applications of tabular data involve using historical events to predict properties of new ones, for example whether a credit card transaction is fraudulent or what rating a customer will give a product on a retail platform. Existing approaches to event prediction include costly, brittle, and application-dependent techniques such as time-aware positional embeddings, learned row and field encodings, and oversampling methods for addressing class imbalance. Moreover, these approaches often assume specific use cases, for example that the labels of all historical events are known, or that only a pre-specified label is predicted rather than the data's features themselves. In this work, we propose a simple but flexible baseline using standard autoregressive LLM-style transformers with elementary positional embeddings and a causal language modeling objective. Our baseline outperforms existing approaches across popular datasets and can be employed for various use cases. We demonstrate that the same model can predict labels, impute missing values, or model event sequences.
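
The baseline flattens event rows into one token stream and trains it with next-token prediction. A toy serializer illustrates the idea. The `field=value` token format, the `<row>` marker and the helper names are hypothetical choices for this sketch, not the paper's tokenization, and positional embeddings are not modeled.

```python
def serialize_row(row, fields):
    """One event -> a row marker followed by one field=value token per field."""
    return ["<row>"] + [f"{f}={row[f]}" for f in fields]


def training_pairs(rows, fields):
    """Causal LM objective: every prefix of the event stream predicts its next token."""
    tokens = [t for row in rows for t in serialize_row(row, fields)]
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


def prediction_prompt(history, new_row, fields, target):
    """Prefix for predicting `target` of a new event: all past events, then the
    new event's fields that come before `target` in the field order."""
    prompt = [t for row in history for t in serialize_row(row, fields)]
    prompt.append("<row>")
    for f in fields:
        if f == target:
            break
        prompt.append(f"{f}={new_row[f]}")
    return prompt


fields = ["amount", "merchant", "fraud"]  # label field placed last
history = [{"amount": 12, "merchant": "grocer", "fraud": 0}]
new_event = {"amount": 950, "merchant": "jeweler"}
prompt = prediction_prompt(history, new_event, fields, target="fraud")
# A trained model would now decode the next token, e.g. "fraud=0" or "fraud=1".
```

Because the label field comes last, several tasks reduce to decoding the next token after a prefix. These are label prediction, imputation of a field from the fields before it, and continuing the event sequence.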

Updated: 2024-10-23 05:24:23

Fields: cs.LG,cs.CE,stat.ML

Download: http://arxiv.org/abs/2410.10648v2

Differentially Private Learning Needs Better Model Initialization and Self-Distillation

Differentially private SGD (DPSGD) enables privacy-preserving training of language models, but often reduces utility, diversity, and linguistic quality. We introduce DPRefine, a three-phase method that initializes a model using data synthesized by a small pre-trained LM and rigorously filtered, applies DP finetuning on private data, and performs self-distillation to refine outputs. This approach significantly outperforms vanilla DPSGD, with AlpacaEval preferring DPRefine's generations in 78.4% of cases across all datasets. Our analysis reveals that DPRefine reduces linguistic errors in generated text by 84.0%, mitigating the grammar and spelling errors commonly associated with DPSGD. It also reduces inconsistencies found in non-private models, such as hallucinated details and misattributed quotes. We find that small models like GPT-2 can be effective for initialization and distillation, highlighting their potential for enabling scalable and efficient deployment of privacy-preserving language models.
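
For reference, DPSGD, the baseline DPRefine improves on, differs from ordinary SGD in two places. First, each per-example gradient is clipped to a fixed L2 norm. Second, Gaussian noise scaled to that norm is added to the summed gradient. The pure-Python sketch below shows one such update. The parameter names are illustrative, and it omits the privacy accounting that turns `noise_multiplier` into an (epsilon, delta) guarantee.

```python
import math
import random


def clip(grad, clip_norm):
    """Scale one per-example gradient so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dpsgd_step(params, per_example_grads, lr, clip_norm, noise_multiplier, rng=random):
    """One DP-SGD update: clip each example's gradient, sum, add noise, average, step."""
    clipped = [clip(g, clip_norm) for g in per_example_grads]
    batch = len(clipped)
    summed = [sum(col) for col in zip(*clipped)]
    # Noise standard deviation is tied to the clipping bound (the per-example sensitivity).
    noisy = [s + rng.gauss(0.0, noise_multiplier * clip_norm) for s in summed]
    return [p - lr * n / batch for p, n in zip(params, noisy)]
```

Clipping bounds how much any single example can move the update, and that bound is what lets the added noise hide its contribution.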

Updated: 2024-10-23 05:19:51

Fields: cs.LG,cs.AI,cs.CL,cs.CR

Download: http://arxiv.org/abs/2410.17566v1

Deep Learning for Detecting and Early Predicting Chronic Obstructive Pulmonary Disease from Spirogram Time Series

Chronic Obstructive Pulmonary Disease (COPD) is a chronic lung disease that causes airflow obstruction. Current methods can only detect COPD from prominent features in spirograms (Volume-Flow time series) and cannot predict future COPD risk from subtle data patterns. We propose a deep learning-based method, DeepSpiro, for early prediction of future COPD risk. DeepSpiro consists of four key components. SpiroSmoother stabilizes the Volume-Flow curve. SpiroEncoder captures volume evolution through key patches of varying lengths. SpiroExplainer integrates heterogeneous data and explains predictions through volume attention. SpiroPredictor predicts the disease risk of undiagnosed high-risk patients based on key patch concavity, with prediction horizons of 1, 2, 3, 4, or 5 years, or even longer. Evaluated on the UK Biobank dataset, DeepSpiro achieved an AUC of 0.8328 for COPD detection and demonstrated strong predictive performance for future COPD risk (p-value < 0.001). DeepSpiro effectively predicts the long-term progression of the disease.

Updated: 2024-10-23 05:18:11

Fields: cs.LG,cs.AI

Download: http://arxiv.org/abs/2405.03239v2

DisenGCD: A Meta Multigraph-assisted Disentangled Graph Learning Framework for Cognitive Diagnosis

Existing graph learning-based cognitive diagnosis (CD) methods have achieved relatively good results, but they learn and exchange student, exercise, and concept representations in a single implicit unified graph. As a result, the interaction-agnostic exercise and concept representations are learned poorly and lack robustness to noise in students' interactions. Besides, lower-order exercise latent representations obtained in shallow layers are not well explored when learning the student representation. To tackle these issues, this paper proposes a meta multigraph-assisted disentangled graph learning framework for CD (DisenGCD). It learns three types of representations on three disentangled graphs: the student-exercise-concept interaction graph, the exercise-concept relation graph, and the concept dependency graph. Specifically, the latter two graphs are first disentangled from the interaction graph. Then, a devised meta multigraph learning module learns the student representation from the interaction graph. Multiple learnable propagation paths in this module let the current student latent representation access lower-order exercise latent representations, which can lead to more effective and robust student representations. Meanwhile, graph attention modules learn the exercise and concept representations on the relation and dependency graphs. Finally, a novel diagnostic function is devised to combine the three disentangled representations for prediction. Experiments show that DisenGCD achieves better performance and robustness than state-of-the-art CD methods. They also demonstrate the effectiveness of the disentangled learning framework and the meta multigraph module. The source code is available at https://github.com/BIMK/Intelligent-Education/tree/main/DisenGCD.
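
The disentangling step can be illustrated by building the three graphs as separate edge sets. This sketch rests on assumptions about typical CD data, not on the paper's exact construction. The inputs are response logs as (student, exercise, correct) triples, a mapping from exercises to concepts, and concept prerequisite pairs. The sketch also chooses to keep exercise-concept links inside the interaction graph.

```python
def disentangle_graphs(responses, q_matrix, prerequisites):
    """Split CD data into three edge sets: interaction, relation, dependency.

    responses: iterable of (student, exercise, correct) response logs.
    q_matrix: dict mapping each exercise to the set of concepts it tests.
    prerequisites: iterable of (concept_a, concept_b) dependency pairs.
    """
    # Exercise-concept relation graph: built without any student data.
    relation = {(e, c) for e, concepts in q_matrix.items() for c in concepts}
    # Interaction graph: student-exercise responses plus exercise-concept links.
    interaction = {
        "student_exercise": {(s, e, correct) for s, e, correct in responses},
        "exercise_concept": set(relation),
    }
    # Concept dependency graph: concept-concept edges only.
    dependency = set(prerequisites)
    return interaction, relation, dependency


q = {"e1": {"add"}, "e2": {"add", "mul"}}
prereq = [("add", "mul")]
clean = [("s1", "e1", 1), ("s1", "e2", 0)]
noisy = [("s1", "e1", 0), ("s1", "e2", 0)]  # one flipped, i.e. noisy, response
inter_c, rel_c, dep_c = disentangle_graphs(clean, q, prereq)
inter_n, rel_n, dep_n = disentangle_graphs(noisy, q, prereq)
```

Flipping one response changes only the interaction graph. The relation and dependency graphs, on which exercise and concept representations are learned, stay identical. That is the intuition behind the robustness argument.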

Updated: 2024-10-23 05:15:59

Fields: cs.LG,cs.CY

Download: http://arxiv.org/abs/2410.17564v1

Timetable Nodes for Public Transport Network

Faster pathfinding in time-dependent transport networks is an important and challenging problem in navigation systems. There are two main types of transport networks: road networks for car driving and public transport route networks. Solutions that work well in road networks, such as Time-dependent Contraction Hierarchies and other graph-based approaches, do not usually apply to public transport networks. There, non-graph solutions such as CSA and RAPTOR show the best results compared to graph-based techniques. In our work, we propose a method that advances graph-based approaches by using optimization techniques from computational geometry to speed up the search process in transport networks. We apply a new pre-computation step, which we call timetable nodes (TTN). Our inspiration comes from an iterative search problem in computational geometry. We implement two versions of TTN: one uses a Combined Search Tree (TTN-CST), and the other uses Fractional Cascading (TTN-FC). Both approaches decrease the asymptotic complexity of reaching new nodes from $O(k\times \log|C|)$ to $O(k + \log(k) + \log(|C|))$. Here $k$ is the number of outgoing edges from a node and $|C|$ is the size of the timetable information (total outgoing edges). Our solution suits other time-dependent networks as well and can be integrated into other pathfinding algorithms. Our experiments indicate that this pre-computation significantly improves performance on high-density graphs. This study shows how computational geometry can enhance pathfinding in transport networks, enabling faster pathfinding in scenarios involving large numbers of outgoing edges.
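
This is the iterative search problem the abstract refers to: at a node with $k$ outgoing edges, find the earliest departure at or after time $t$ on every edge. The sketch below contrasts the $O(k \log|C|)$ baseline with a simplified precomputation that needs only one binary search plus $O(k)$ lookups. That speed costs $O(k \cdot |C|)$ memory, because every merged time stores $k$ indices. Combined Search Trees and fractional cascading, the structures the abstract names, are ways to keep fast queries without that blow-up, and neither is reproduced here. Function names and the list-of-sorted-departures input are our assumptions.

```python
from bisect import bisect_left


def earliest_departures_naive(edge_departures, t):
    """Baseline: one binary search per outgoing edge, O(k log |C|)."""
    result = []
    for deps in edge_departures:  # each deps list is sorted
        i = bisect_left(deps, t)
        result.append(deps[i] if i < len(deps) else None)
    return result


def build_merged_index(edge_departures):
    """Precompute all distinct departure times at the node, sorted, and for each
    one the index of the next departure >= that time on every edge."""
    merged = sorted({d for deps in edge_departures for d in deps})
    table = [[bisect_left(deps, m) for deps in edge_departures] for m in merged]
    return merged, table


def earliest_departures_merged(edge_departures, merged, table, t):
    """One binary search over the merged times, then O(k) table lookups."""
    p = bisect_left(merged, t)
    if p == len(merged):
        return [None] * len(edge_departures)
    # No departure on any edge lies in [t, merged[p]), so the answers for t
    # equal the precomputed answers for merged[p].
    return [deps[i] if i < len(deps) else None for deps, i in zip(edge_departures, table[p])]


edges = [[5, 10, 20], [7, 8], [1, 30]]
merged, table = build_merged_index(edges)
```

For `t = 9`, both functions return `[10, None, 30]`: the second edge has no departure left.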

Updated: 2024-10-23 05:02:39

Fields: cs.DS,cs.AI,cs.CG

Download: http://arxiv.org/abs/2410.15715v2

CLR-Bench: Evaluating Large Language Models in College-level Reasoning

Large language models (LLMs) have demonstrated remarkable performance across various language understanding tasks. Emerging benchmarks evaluate LLMs in domains such as mathematics and computer science, but they merely measure the accuracy of the final prediction on multiple-choice questions. This is insufficient to verify whether an LLM genuinely understands why its chosen answer is correct. To fill this gap, we present CLR-Bench to comprehensively evaluate LLMs on complex college-level reasoning. Specifically, (i) we prioritize 16 challenging college disciplines in computer science and artificial intelligence. The dataset contains 5 types of questions, and each question is associated with detailed explanations from experts. (ii) To enable a fair, quantitative evaluation of LLMs' reasoning ability, we formalize the criteria with two novel metrics. Q$\rightarrow$A measures the performance of direct answer prediction, while Q$\rightarrow$AR considers the joint ability to answer the question and provide a rationale simultaneously. Extensive experiments are conducted with 40 LLMs over 1,018 discipline-specific questions. The results reveal a key insight: LLMs, even the best closed-source LLM, GPT-4 turbo, tend to "guess" college-level answers. Accuracy drops dramatically from 63.31% under Q$\rightarrow$A to 39.00% under Q$\rightarrow$AR, indicating unsatisfactory reasoning ability.
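
The gap between the two metrics is easiest to see on per-question records. The sketch assumes a hypothetical scoring rule: Q$\rightarrow$AR credits a question only when both the answer and the rationale are judged correct. The benchmark's actual rationale grading may differ, and the record field names are ours.

```python
def clr_metrics(records):
    """Return (Q->A, Q->AR) accuracies.

    records: dicts with boolean 'answer_correct' and 'rationale_correct'.
    Q->AR here requires both to be correct (an assumed scoring rule).
    """
    n = len(records)
    q_a = sum(r["answer_correct"] for r in records) / n
    q_ar = sum(r["answer_correct"] and r["rationale_correct"] for r in records) / n
    return q_a, q_ar


records = [
    {"answer_correct": True, "rationale_correct": True},
    {"answer_correct": True, "rationale_correct": False},  # right answer, wrong reasoning
    {"answer_correct": True, "rationale_correct": False},
    {"answer_correct": False, "rationale_correct": False},
]
q_a, q_ar = clr_metrics(records)
```

Under this rule Q$\rightarrow$AR can never exceed Q$\rightarrow$A. A large gap between them signals answers that are correct without a correct justification, the "guessing" pattern the benchmark targets.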

Updated: 2024-10-23 04:55:08

Fields: cs.AI

Download: http://arxiv.org/abs/2410.17558v1

BlurryScope: a cost-effective and compact scanning microscope for automated HER2 scoring using deep learning on blurry image data

We developed a rapid scanning optical microscope, termed "BlurryScope", that leverages continuous image acquisition and deep learning to provide a cost-effective and compact solution for automated inspection and analysis of tissue sections. BlurryScope integrates specialized hardware with a neural network-based model to quickly process motion-blurred histological images and perform automated pathology classification. This device offers speed comparable to commercial digital pathology scanners, but at a significantly lower price point and smaller size/weight, making it ideal for fast triaging in small clinics and in resource-limited settings. As a proof of concept, we implemented automated classification of human epidermal growth factor receptor 2 (HER2) scores on immunohistochemically (IHC) stained breast tissue sections, achieving results concordant with those obtained from a high-end digital scanning microscope. We evaluated this approach by scanning HER2-stained tissue microarrays (TMAs) at a continuous speed of 5 mm/s, which introduces bidirectional motion blur artifacts. These compromised images were then used to train our network models. On a test set of 284 unique patient cores, we achieved blind testing accuracies of 79.3% and 89.7% for 4-class (0, 1+, 2+, 3+) and 2-class (0/1+, 2+/3+) HER2 score classification, respectively. BlurryScope automates the entire workflow, from image scanning to stitching and cropping of regions of interest, as well as HER2 score classification. We believe BlurryScope has the potential to enhance the current pathology infrastructure in resource-scarce environments, save diagnosticians' time, and bolster cancer identification and classification across various clinical environments.

Updated: 2024-10-23 04:46:36

Fields: eess.IV,cs.CV,cs.LG,physics.med-ph

Download: http://arxiv.org/abs/2410.17557v1

FairDgcl: Fairness-aware Recommendation with Dynamic Graph Contrastive Learning

As trustworthy AI continues to advance, the fairness issue in recommendations has received increasing attention. A recommender system is considered unfair when it produces unequal outcomes for different user groups based on user-sensitive attributes (e.g., age, gender). Some researchers have proposed data augmentation-based methods aimed at alleviating user-level unfairness by altering the skewed distribution of training data among various user groups. Despite yielding promising results, they often rely on fairness-related assumptions that may not align with reality, potentially reducing data quality and negatively affecting model effectiveness. To tackle this issue, in this paper, we study how to implement high-quality data augmentation to improve recommendation fairness. Specifically, we propose FairDgcl, a dynamic graph adversarial contrastive learning framework aimed at improving fairness in recommender systems. First, FairDgcl develops an adversarial contrastive network with a view generator and a view discriminator that learns, adversarially, to generate fair augmentation strategies. Then, we propose two dynamic, learnable models to generate contrastive views within the contrastive learning framework, which automatically fine-tune the augmentation strategies. Meanwhile, we theoretically show that FairDgcl can simultaneously generate enhanced representations that possess both fairness and accuracy. Lastly, comprehensive experiments on four real-world datasets demonstrate the effectiveness of the proposed FairDgcl.
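
Graph contrastive frameworks commonly train with an InfoNCE objective. It pulls together a node's embeddings from two augmented views and pushes apart the embeddings of other nodes. This abstract does not name FairDgcl's exact loss, so treat the sketch as generic background. The temperature `tau = 0.2` is an illustrative choice, and the view generator and discriminator are not shown.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(view1, view2, tau=0.2):
    """Mean InfoNCE loss; view1[i] and view2[i] embed the same user/node in two views.

    For each i, the positive is view2[i] and the negatives are view2[j], j != i.
    """
    n = len(view1)
    loss = 0.0
    for i in range(n):
        logits = [cosine(view1[i], view2[j]) / tau for j in range(n)]
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))  # stable log-sum-exp
        loss += log_z - logits[i]  # cross-entropy with the matching index as target
    return loss / n


aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

Views that agree on each node give a low loss, and mismatched views give a high one.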

Updated: 2024-10-23 04:43:03

Fields: cs.AI

Download: http://arxiv.org/abs/2410.17555v1

RotCAtt-TransUNet++: Novel Deep Neural Network for Sophisticated Cardiac Segmentation

Cardiovascular disease remains a predominant global health concern, responsible for a significant portion of mortality worldwide. Accurate segmentation of cardiac medical imaging data is pivotal in mitigating fatality rates associated with cardiovascular conditions. However, existing state-of-the-art (SOTA) neural networks, including both CNN-based and Transformer-based approaches, exhibit limitations in practical applicability due to their inability to effectively capture inter-slice connections alongside intra-slice information. This deficiency is particularly evident in datasets featuring intricate, long-range details along the z-axis, such as coronary arteries in axial views. Additionally, SOTA methods fail to differentiate non-cardiac components from myocardium in segmentation, leading to the "spraying" phenomenon. To address these challenges, we present RotCAtt-TransUNet++, a novel architecture tailored for robust segmentation of complex cardiac structures. Our approach emphasizes modeling global contexts by aggregating multiscale features with nested skip connections in the encoder. It integrates transformer layers to capture interactions between patches and employs a rotatory attention mechanism to capture connectivity between multiple slices (inter-slice information). Additionally, a channel-wise cross-attention gate guides the fused multi-scale channel-wise information and features from decoder stages to bridge semantic gaps. Experimental results demonstrate that our proposed model outperforms existing SOTA approaches across four cardiac datasets and one abdominal dataset. Importantly, coronary arteries and myocardium are annotated with near-perfect accuracy during inference. An ablation study shows that the rotatory attention mechanism effectively transforms embedded vectorized patches in the semantic dimensional space, enhancing segmentation accuracy.

Updated: 2024-10-23 04:41:51

Fields: cs.CV,cs.AI

Download: http://arxiv.org/abs/2409.05280v2

LEADS: Lightweight Embedded Assisted Driving System

With the rapid development of electric vehicles, formula racing competitions for high school and university students have become more popular than ever, as the threshold for design and manufacturing has been lowered. In many cases, we see teams drawing inspiration from, or directly using, toolkits and technologies inherited from standard commercial vehicles. These architectures are usually overly complicated for amateur applications such as these races. To improve efficiency and simplify the development of instrumentation, control, and analysis systems, we propose LEADS (Lightweight Embedded Assisted Driving System), a dedicated solution for such scenarios.

Updated: 2024-10-23 04:40:45

Fields: cs.SE,cs.LG,cs.SY,eess.SY

Download: http://arxiv.org/abs/2410.17554v1

Utility Theory of Synthetic Data Generation

Synthetic data algorithms are widely employed across industries to generate artificial data for downstream learning tasks. Existing research primarily focuses on empirically evaluating the utility of synthetic data, while its theoretical understanding is largely lacking. This paper bridges the practice-theory gap by establishing a utility theory in a statistical learning framework. It considers two utility metrics: the generalization and the ranking of models trained on synthetic data. The former is defined as the generalization difference between models trained on synthetic data and on real data. We derive analytical bounds for this utility metric. They show that the synthetic feature distribution does not need to be similar to that of real data to ensure comparable generalization of synthetic models, provided the models in downstream learning tasks are properly specified. The latter utility metric studies the relative performance of models trained on synthetic data. In particular, we find that the distribution of synthetic data need not be similar to the real one to ensure consistent model comparison. Interestingly, consistent model comparison is still achievable even when synthetic responses are not well generated, as long as downstream models are separable by a generalization gap. Finally, extensive experiments on non-parametric models and deep neural networks validate these theoretical findings.
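
Both utility metrics can be written down explicitly. The notation below is ours, chosen for illustration, and the paper's exact definitions may differ, for example in taking a signed rather than absolute difference. $\mathcal{R}$ is the population risk under the real data distribution. $\hat f_{\mathrm{syn}}$ and $\hat f_{\mathrm{real}}$ are models fit by the same procedure on synthetic and on real samples. $A$ and $B$ index two competing model classes.

$$
U_{\mathrm{gen}} = \bigl|\,\mathcal{R}(\hat f_{\mathrm{syn}}) - \mathcal{R}(\hat f_{\mathrm{real}})\,\bigr|,
\qquad
\mathcal{R}(f) = \mathbb{E}_{(X,Y)\sim P_{\mathrm{real}}}\bigl[\ell(f(X), Y)\bigr]
$$

$$
\operatorname{sign}\bigl(\mathcal{R}(\hat f^{A}_{\mathrm{syn}}) - \mathcal{R}(\hat f^{B}_{\mathrm{syn}})\bigr)
= \operatorname{sign}\bigl(\mathcal{R}(\hat f^{A}_{\mathrm{real}}) - \mathcal{R}(\hat f^{B}_{\mathrm{real}})\bigr)
$$

The second line is one natural reading of "consistent model comparison": training on synthetic data preserves which of the two model classes generalizes better.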

Updated: 2024-10-23 04:34:35

Fields: stat.ML,cs.LG

Download: http://arxiv.org/abs/2305.10015v3

Multimodal Information Bottleneck for Deep Reinforcement Learning with Multiple Sensors

Reinforcement learning has achieved promising results on robotic control tasks but struggles to effectively leverage information from multiple sensory modalities that differ in many characteristics. Recent works construct auxiliary losses based on reconstruction or mutual information to extract joint representations from multiple sensory inputs, improving the sample efficiency and performance of reinforcement learning algorithms. However, the representations learned by these methods could capture information irrelevant to learning a policy and may degrade performance. We argue that it is helpful to compress the information that the learned joint representations retain about the raw multimodal observations. We therefore propose a multimodal information bottleneck model to learn task-relevant joint representations from egocentric images and proprioception. Our model compresses the multimodal observations while retaining their predictive information, yielding a compressed joint representation. This representation fuses complementary information from visual and proprioceptive feedback while filtering out task-irrelevant information in the raw observations. We propose to minimize an upper bound of our multimodal information bottleneck objective for computationally tractable optimization. Experimental evaluations on several challenging locomotion tasks with egocentric images and proprioception show that our method achieves better sample efficiency and zero-shot robustness to unseen white noise than leading baselines. We also empirically demonstrate that leveraging information from both egocentric images and proprioception is more helpful for learning locomotion policies than using a single modality alone.
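
Upper bounds of this kind are usually made tractable with a variational argument. The compression term $I(Z; O)$ is bounded above by the expected KL divergence between the encoder's posterior $q(z \mid o)$ and a fixed prior $r(z)$. The abstract does not give the paper's exact bound, so the sketch shows only this standard ingredient. It assumes a diagonal-Gaussian encoder and a standard-normal prior.

```python
import math


def gaussian_kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions.

    Averaged over observations, this upper-bounds the mutual information
    I(Z; O) between the joint representation Z and the raw observations O.
    """
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))
```

In a bottleneck objective, this term would typically be weighted by a coefficient, for example a hypothetical `beta`, and added to a term that keeps the predictive information. Larger weights compress more aggressively.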

Updated: 2024-10-23 04:32:37

Fields: cs.LG,cs.RO

Download: http://arxiv.org/abs/2410.17551v1

Quantformer: from attention to profit with a quantitative transformer trading strategy

In traditional quantitative trading practice, navigating the complicated and dynamic financial market presents a persistent challenge. Fully capturing various market variables, including long-term information, as well as essential signals that may lead to profit remains a difficult task for learning algorithms. To tackle this challenge, this paper introduces quantformer, an enhanced neural network architecture based on transformers, to build investment factors. Through transfer learning from sentiment analysis, quantformer exploits the transformer's inherent advantages in capturing long-range dependencies and modeling complex data relationships. It can also solve tasks with numerical inputs and accurately forecast future returns over a given period. This work collects more than 5,000,000 rolling data samples of 4,601 stocks in the Chinese capital market from 2010 to 2019. The results demonstrate the model's superior performance in predicting stock trends compared with 100 other factor-based quantitative strategies. Notably, using a transformer-like model to establish factors, in conjunction with market sentiment information, significantly enhances the accuracy of trading signals. This offers promising implications for the future of quantitative trading strategies.
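
"Rolling data" here most plausibly means samples cut from each stock's history with a sliding window, and the sketch below shows that construction. The window length, the forecast horizon, and the cumulative return over the horizon as target are illustrative assumptions, not the paper's settings.

```python
def rolling_samples(prices, window, horizon):
    """Slide over one stock's price series.

    Each sample pairs `window` consecutive past returns (features) with the
    cumulative return over the following `horizon` periods (target).
    A series of T prices yields T - window - horizon samples.
    """
    returns = [prices[i + 1] / prices[i] - 1.0 for i in range(len(prices) - 1)]
    samples = []
    for start in range(len(returns) - window - horizon + 1):
        features = returns[start:start + window]
        growth = 1.0
        for r in returns[start + window:start + window + horizon]:
            growth *= 1.0 + r  # compound the future period returns
        samples.append((features, growth - 1.0))
    return samples


samples = rolling_samples([1, 2, 4, 4, 8], window=2, horizon=1)
```

Repeating this per stock, and advancing the window one step at a time, is how a few thousand stocks over a decade can produce millions of samples.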

Updated: 2024-10-23 04:27:26

Fields: q-fin.MF,cs.AI,cs.CE,G.3; J.2

Download: http://arxiv.org/abs/2404.00424v2

Spectraformer: A Unified Random Feature Framework for Transformer

Linearization of attention using various kernel approximation and kernel learning techniques has shown promise. Past methods each use only a subset of the possible combinations of component functions and weight matrices within the random features paradigm. We identify the need for a systematic comparison of different combinations of weight matrices and component functions for attention learning in the Transformer. In this work, we introduce Spectraformer, a unified framework for approximating and learning the kernel function in the linearized attention of the Transformer. We experiment with broad classes of component functions and weight matrices on three textual tasks in the LRA benchmark. Our empirical findings indicate that different kernels suit different tasks and that kernel choice is fundamental to performant models. Our code is available at: https://github.com/dukenguyenxyz/spectraformer .
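
Spectraformer varies two ingredients: the component function and the weight matrix. The sketch below shows one such pairing, plugged into linearized attention: positive random features with an i.i.d. Gaussian weight matrix, the Performer-style choice. Swapping `positive_features` or `gaussian_weights` for another function or matrix is the kind of substitution the framework compares. The function names are ours, and inputs are assumed pre-scaled by $d^{-1/4}$ so that the kernel $\exp(q \cdot k)$ matches softmax attention's $\exp(q \cdot k / \sqrt{d})$.

```python
import math
import random


def gaussian_weights(m, d, rng):
    """Weight matrix W (m x d) with i.i.d. N(0, 1) entries: one possible choice."""
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]


def positive_features(x, W):
    """Component function phi(x) = exp(Wx - |x|^2 / 2) / sqrt(m).

    With rows of W drawn from N(0, I), E[phi(q) . phi(k)] = exp(q . k).
    """
    half_sq = sum(v * v for v in x) / 2.0
    scale = 1.0 / math.sqrt(len(W))
    return [scale * math.exp(sum(wi * xi for wi, xi in zip(w, x)) - half_sq) for w in W]


def linear_attention(Q, K, V, W):
    """Row-normalised phi(Q) (phi(K)^T V): linear in sequence length, not quadratic."""
    phi_k = [positive_features(k, W) for k in K]
    m, dv = len(W), len(V[0])
    # Key-side summaries, computed once and shared by every query.
    kv = [[sum(pk[a] * v[c] for pk, v in zip(phi_k, V)) for c in range(dv)] for a in range(m)]
    z = [sum(pk[a] for pk in phi_k) for a in range(m)]
    out = []
    for q in Q:
        pq = positive_features(q, W)
        denom = sum(pq[a] * z[a] for a in range(m))  # positive, since all features are > 0
        out.append([sum(pq[a] * kv[a][c] for a in range(m)) / denom for c in range(dv)])
    return out


rng = random.Random(0)
d, m = 4, 64
Q = [[rng.gauss(0.0, 0.5) for _ in range(d)] for _ in range(3)]
K = [[rng.gauss(0.0, 0.5) for _ in range(d)] for _ in range(5)]
W = gaussian_weights(m, d, rng)
```

Because every feature is positive, each output row is a convex combination of the value rows, just as in softmax attention.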

Updated: 2024-10-23 04:08:23

Fields: cs.LG

Download: http://arxiv.org/abs/2405.15310v3

Set-based Meta-Interpolation for Few-Task Meta-Learning

Meta-learning approaches enable machine learning systems to adapt to new tasks from only a few examples by leveraging knowledge from related tasks. However, generalizing to unseen tasks during meta-testing still requires a large number of meta-training tasks. This is a critical bottleneck for real-world problems that come with only a few tasks, for reasons including the difficulty and cost of constructing tasks. Recently, several task augmentation methods have been proposed to tackle this issue. They use domain-specific knowledge to design augmentation techniques that densify the meta-training task distribution. However, such reliance on domain-specific knowledge renders these methods inapplicable to other domains. While Manifold Mixup-based task augmentation methods are domain-agnostic, we empirically find them ineffective in non-image domains. To tackle these limitations, we propose a novel domain-agnostic task augmentation method, Meta-Interpolation, which utilizes expressive neural set functions to densify the meta-training task distribution using bilevel optimization. We empirically validate the efficacy of Meta-Interpolation on eight datasets spanning various domains such as image classification, molecule property prediction, text classification, and speech recognition. Experimentally, we show that Meta-Interpolation consistently outperforms all the relevant baselines. Theoretically, we prove that task interpolation with the set function regularizes the meta-learner to improve generalization.
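
A set function suits task interpolation because it is permutation invariant: its output on a merged set of examples does not depend on their order. The sketch uses the DeepSets form $\rho(\sum_i \phi(x_i))$ to mix one class's support examples from two tasks. The toy `phi` and `rho` are hypothetical stand-ins for learned networks, and the bilevel optimization that trains them is omitted. This is a simplified illustration, not the paper's architecture.

```python
import math


def phi(x):
    """Per-example encoder (toy stand-in for a learned network)."""
    return [math.tanh(v) for v in x]


def rho(pooled, n):
    """Post-pooling map (toy stand-in): here, a mean over the set."""
    return [v / n for v in pooled]


def set_function(examples):
    """DeepSets form rho(sum_i phi(x_i)): invariant to the order of `examples`."""
    encoded = [phi(x) for x in examples]
    pooled = [sum(col) for col in zip(*encoded)]
    return rho(pooled, len(examples))


def interpolate_class(support_a, support_b):
    """Representation of a new, interpolated class built from the union of two
    tasks' support examples for that class."""
    return set_function(support_a + support_b)


a = [[0.1, 0.2]]
b = [[0.3, -0.4], [0.5, 0.0]]
r1 = interpolate_class(a, b)
r2 = interpolate_class(b, a)
```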

Updated: 2024-10-23 04:00:40

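Title: Set-based Meta-Interpolation for Few-Task Meta-Learning

Abstract: Meta-learning approaches enable machine learning systems to adapt to new tasks from only a few examples by leveraging knowledge from related tasks. However, generalizing to unseen tasks during meta-testing still requires a large number of meta-training tasks. This creates a critical bottleneck for real-world problems that have only a few tasks, for various reasons including the difficulty and cost of constructing tasks. Recently, several task augmentation methods have been proposed to address this problem; they use domain-specific knowledge to design augmentation techniques that make the meta-training task distribution denser. However, the reliance on domain-specific knowledge prevents these methods from being applied to other domains. Although task augmentation methods based on Manifold Mixup are domain-agnostic, we empirically find them ineffective in non-image domains. To address these limitations, we propose Meta-Interpolation, a novel domain-agnostic task augmentation method that uses expressive neural set functions and bilevel optimization to densify the meta-training task distribution. We experimentally confirm the effectiveness of Meta-Interpolation on eight datasets spanning domains such as image classification, molecule property prediction, text classification, and speech recognition. The experimental results show that Meta-Interpolation consistently outperforms all relevant baselines. Theoretically, we prove that task interpolation with set functions regularizes the meta-learner and improves generalization.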

Fields: cs.LG,cs.AI

Download: http://arxiv.org/abs/2205.09990v4

RealignDiff: Boosting Text-to-Image Diffusion Model with Coarse-to-fine Semantic Re-alignment

Recent advances in text-to-image diffusion models have achieved remarkable success in generating high-quality, realistic images from textual descriptions. However, these approaches have faced challenges in precisely aligning the generated visual content with the textual concepts described in the prompts. In this paper, we propose a two-stage coarse-to-fine semantic re-alignment method, named RealignDiff, aimed at improving the alignment between text and images in text-to-image diffusion models. In the coarse semantic re-alignment phase, a novel caption reward, leveraging the BLIP-2 model, is proposed to evaluate the semantic discrepancy between the generated image caption and the given text prompt. Subsequently, the fine semantic re-alignment stage employs a local dense caption generation module and a re-weighting attention modulation module to refine the previously generated images from a local semantic view. Experimental results on the MS-COCO and ViLG-300 datasets demonstrate that the proposed two-stage coarse-to-fine semantic re-alignment method outperforms other baseline re-alignment techniques by a substantial margin in both visual quality and semantic similarity with the input prompt.

Updated: 2024-10-23 03:59:05

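Title: RealignDiff: Boosting Text-to-Image Diffusion Models with Coarse-to-Fine Semantic Re-alignment

Abstract: Text-to-image diffusion models have recently made remarkable progress and can generate high-quality, realistic images from textual descriptions. However, these methods still struggle to precisely align the generated visual content with the textual concepts described in the prompt. In this paper, we propose RealignDiff, a two-stage coarse-to-fine semantic re-alignment method that improves the alignment between text and images in text-to-image diffusion models. For the coarse semantic re-alignment stage, we propose a novel caption reward based on the BLIP-2 model, which measures the semantic discrepancy between the caption of the generated image and the given text prompt. The fine semantic re-alignment stage then uses a local dense caption generation module and a re-weighting attention modulation module to refine the previously generated images from a local semantic perspective. Experimental results on the MS-COCO and ViLG-300 datasets show that the proposed two-stage coarse-to-fine method substantially outperforms other baseline re-alignment techniques, both in visual quality and in semantic similarity to the input prompt.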

Fields: cs.CV,cs.AI

Download: http://arxiv.org/abs/2305.19599v4

Quantifying the Gain in Weak-to-Strong Generalization

Recent advances in large language models have shown capabilities that are extraordinary and near-superhuman. These models operate with such complexity that reliably evaluating and aligning them proves challenging for humans. This leads to the natural question: can guidance from weak models (like humans) adequately direct the capabilities of strong models? In a recent and somewhat surprising work, Burns et al. (2023) empirically demonstrated that when strong models (like GPT-4) are finetuned using labels generated by weak supervisors (like GPT-2), the strong models outperform their weaker counterparts, a phenomenon they term weak-to-strong generalization. In this work, we present a theoretical framework for understanding weak-to-strong generalization. Specifically, we show that the improvement in performance achieved by strong models over their weaker counterparts is quantified by the misfit error incurred by the strong model on labels generated by the weaker model. Our theory reveals several curious algorithmic insights. For instance, we can predict the amount by which the strong model will improve over the weak model. We can also choose among different weak models for training the strong model, based on the strong model's misfit error on each one's labels. We validate our theoretical findings through various empirical assessments.

Updated: 2024-10-23 03:55:34

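Title: Quantifying the Gain in Weak-to-Strong Generalization

Abstract: Recent progress in large language models has demonstrated extraordinary, near-superhuman capabilities. These models operate with such complexity that humans find it difficult to reliably evaluate and align them. This raises a natural question: can guidance from weak models (such as humans) adequately steer the capabilities of strong models? In a recent and somewhat surprising study, Burns et al. (2023) empirically demonstrated that when a strong model (such as GPT-4) is finetuned on labels generated by a weak supervisor (such as GPT-2), the strong model outperforms its weaker counterpart. They call this phenomenon weak-to-strong generalization. In this work, we propose a theoretical framework for understanding weak-to-strong generalization. Specifically, we show that the strong model's performance gain over its weaker counterpart is quantified by the misfit error that the strong model incurs on the labels generated by the weaker model. Our theory reveals several curious algorithmic insights. For example, we can predict how much the strong model will improve over the weak model. We can also choose among different weak models for training the strong model based on its misfit error. We validate our theoretical findings through various empirical assessments.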

Fields: cs.LG,cs.AI

Download: http://arxiv.org/abs/2405.15116v2
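The abstract ties the strong model's gain to its misfit on the weak labels. Below is a minimal numeric sketch of that idea under assumptions chosen here, not taken from the paper:

- the loss is squared error;
- the strong model's hypothesis class is the linear span of fixed features;
- the ground truth lies inside that class.

Under these assumptions, fitting the weak labels is an orthogonal projection. By Pythagoras, the gain (weak error minus strong error) then equals the misfit exactly. For a general convex class, the projection argument gives only an inequality: the error drops by at least the misfit. The function name and data generation are hypothetical.

```python
import numpy as np


def weak_to_strong_demo(n=500, p=5, weak_noise=0.8, seed=0):
    """Toy setup (assumed): squared loss, linear strong model on fixed
    features, and a ground truth that the strong model can represent."""
    rng = np.random.default_rng(seed)
    phi = rng.normal(size=(n, p))                      # strong model's representation
    f_star = phi @ rng.normal(size=p)                  # ground-truth labels
    y_weak = f_star + weak_noise * rng.normal(size=n)  # weak supervisor's labels
    # Strong model trained on weak labels: least-squares fit over span(phi).
    theta_sw, *_ = np.linalg.lstsq(phi, y_weak, rcond=None)
    f_sw = phi @ theta_sw

    def mse(a, b):
        return float(np.mean((a - b) ** 2))

    weak_error = mse(y_weak, f_star)    # error of the weak supervisor
    strong_error = mse(f_sw, f_star)    # error of the weakly supervised strong model
    misfit = mse(f_sw, y_weak)          # strong model's misfit on the weak labels
    return weak_error, strong_error, misfit


weak_error, strong_error, misfit = weak_to_strong_demo()
# Gain = weak_error - strong_error, which in this linear setting equals misfit.
```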

ProtoLens: Advancing Prototype Learning for Fine-Grained Interpretability in Text Classification

Deep neural networks have achieved remarkable performance in various text-based tasks but often lack interpretability, making them less suitable for applications where transparency is critical. To address this, we propose ProtoLens, a novel prototype-based model that provides fine-grained, sub-sentence level interpretability for text classification. ProtoLens uses a Prototype-aware Span Extraction module to identify relevant text spans associated with learned prototypes and a Prototype Alignment mechanism to ensure prototypes are semantically meaningful throughout training. By aligning the prototype embeddings with human-understandable examples, ProtoLens provides interpretable predictions while maintaining competitive accuracy. Extensive experiments demonstrate that ProtoLens outperforms both prototype-based and non-interpretable baselines on multiple text classification benchmarks. Code and data are available at https://anonymous.4open.science/r/ProtoLens-CE0B/.

Updated: 2024-10-23 03:53:46

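Title: ProtoLens: Advancing Prototype Learning for Fine-Grained Interpretability in Text Classification

Abstract: Deep neural networks have achieved remarkable performance on various text-based tasks but often lack interpretability, which makes them less suitable for applications where transparency is critical. To address this, we propose ProtoLens, a prototype-based model that provides fine-grained, sub-sentence-level interpretability for text classification. ProtoLens uses a prototype-aware span extraction module to identify text spans associated with learned prototypes. It also uses a prototype alignment mechanism to ensure that the prototypes remain semantically meaningful throughout training. By aligning prototype embeddings with human-understandable examples, ProtoLens provides interpretable predictions while maintaining competitive accuracy. Extensive experiments show that ProtoLens outperforms both prototype-based and non-interpretable baselines on multiple text classification benchmarks. Code and data are available at https://anonymous.4open.science/r/ProtoLens-CE0B/.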

Fields: cs.CL,cs.AI

Download: http://arxiv.org/abs/2410.17546v1

Predicting 30-Day Hospital Readmission in Medicare Patients: Insights from an LSTM Deep Learning Model

Readmissions among Medicare beneficiaries are a major problem for the US healthcare system, from the perspective of both healthcare operations and patient care outcomes. Our study analyzes Medicare hospital readmissions using LSTM networks with feature engineering to assess feature contributions. We selected variables from admission-level data, inpatient medical history, and patient demographics. The LSTM model is designed to capture temporal dynamics in admission-level and patient-level data. In a case study on the MIMIC dataset, the LSTM model outperformed the logistic regression baseline, accurately leveraging temporal features to predict readmission. The most important features were the Charlson Comorbidity Index, hospital length of stay, and the number of hospital admissions in the past 6 months, while demographic variables were less impactful. This work suggests that LSTM networks offer a more promising approach to improving Medicare readmission prediction. They capture temporal interactions in patient databases, which strengthens the prediction models currently available to healthcare providers. Adopting predictive models in clinical practice may identify at-risk Medicare patients more effectively, so that they can receive early, targeted interventions that improve outcomes.

Updated: 2024-10-23 03:50:32

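Title: Predicting 30-Day Hospital Readmission in Medicare Patients: Insights from an LSTM Deep Learning Model

Abstract: Readmission of US Medicare beneficiaries is a major problem for the US healthcare system, from the perspective of both healthcare operations and patient care outcomes. Our study uses LSTM networks and feature engineering to analyze readmissions of Medicare beneficiaries and to assess feature contributions. We selected variables from admission-level data, inpatient medical history, and patient demographic data. The LSTM model is designed to capture the temporal dynamics of admission-level and patient-level data. In a case study on the MIMIC dataset, the LSTM model outperformed the logistic regression baseline, accurately using temporal features to predict readmission. The main features were the Charlson Comorbidity Index, length of hospital stay, and the number of hospitalizations in the past 6 months, while demographic variables had less impact. This work suggests that LSTM networks offer a more promising approach to improving readmission prediction for Medicare beneficiaries. They capture temporal interactions in patient databases and enhance the current prediction models used by healthcare providers. Applying predictive models in clinical practice may identify Medicare beneficiaries more effectively, so that early and targeted interventions can be provided to improve patient outcomes.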

Fields: cs.LG

Download: http://arxiv.org/abs/2410.17545v1

LLMScan: Causal Scan for LLM Misbehavior Detection

Despite the success of Large Language Models (LLMs) across various fields, their potential to generate untruthful, biased and harmful responses poses significant risks, particularly in critical applications. This highlights the urgent need for systematic methods to detect and prevent such misbehavior. While existing approaches target specific issues such as harmful responses, this work introduces LLMScan, an innovative LLM monitoring technique based on causality analysis, offering a comprehensive solution. LLMScan systematically monitors the inner workings of an LLM through the lens of causal inference, operating on the premise that the LLM's 'brain' behaves differently when misbehaving. By analyzing the causal contributions of the LLM's input tokens and transformer layers, LLMScan effectively detects misbehavior. Extensive experiments across various tasks and models reveal clear distinctions in the causal distributions between normal behavior and misbehavior, enabling the development of accurate, lightweight detectors for a variety of misbehavior detection tasks.

Updated: 2024-10-23 03:41:49

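Title: LLMScan: Causal Scan for LLM Misbehavior Detection

Abstract: Although Large Language Models (LLMs) have succeeded in many fields, their potential to generate untruthful, biased, and harmful responses poses significant risks, especially in critical applications. This highlights the urgent need for systematic methods to detect and prevent such misbehavior. While existing methods target specific problems such as harmful responses, this paper introduces LLMScan, an innovative LLM monitoring technique based on causal analysis that offers a comprehensive solution. LLMScan systematically monitors the inner workings of an LLM through the lens of causal inference. It rests on the premise that the LLM's 'brain' behaves differently when it misbehaves. By analyzing the causal contributions of the LLM's input tokens and transformer layers, LLMScan effectively detects misbehavior. Extensive experiments across various tasks and models show clear differences in causal distributions between normal behavior and misbehavior. These differences enable accurate, lightweight detectors for a variety of misbehavior detection tasks.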

Fields: cs.AI,cs.CL,cs.LG

Download: http://arxiv.org/abs/2410.16638v2
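The abstract does not specify LLMScan's interventions or detector. Below is a minimal sketch of a token-level causal contribution, under assumptions of our own:

- the intervention deletes one input token;
- the effect is the L2 change in a model output vector.

model_fn, toy_model and TOY_EMBEDDINGS are hypothetical stand-ins for a real LLM and its hidden states. Per-token (and, analogously, per-layer) effect profiles like these could then be fed to a lightweight classifier.

```python
from typing import Callable, Sequence

import numpy as np


def token_causal_effects(tokens: Sequence[str],
                         model_fn: Callable[[Sequence[str]], np.ndarray]) -> np.ndarray:
    """For each position i, intervene by deleting token i and report how far
    the model's output moves (L2 distance). Larger value = larger effect."""
    baseline = model_fn(tokens)
    effects = []
    for i in range(len(tokens)):
        intervened = list(tokens[:i]) + list(tokens[i + 1:])
        effects.append(float(np.linalg.norm(baseline - model_fn(intervened))))
    return np.array(effects)


# Hypothetical stand-in for an LLM: sums fixed 2-d token embeddings.
TOY_EMBEDDINGS = {
    "please": np.array([0.1, 0.0]),
    "ignore": np.array([3.0, 1.0]),
    "all": np.array([0.0, 0.1]),
    "rules": np.array([0.5, 0.5]),
}


def toy_model(tokens: Sequence[str]) -> np.ndarray:
    return sum((TOY_EMBEDDINGS[t] for t in tokens), np.zeros(2))


effects = token_causal_effects(["please", "ignore", "all", "rules"], toy_model)
```

In this additive toy model, deleting a token shifts the output by exactly that token's embedding. The effect therefore equals the embedding norm, and "ignore" dominates.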

Real-World Robot Applications of Foundation Models: A Review

Recent developments in foundation models, like Large Language Models (LLMs) and Vision-Language Models (VLMs), trained on extensive data, facilitate flexible application across different tasks and modalities. Their impact spans various fields, including healthcare, education, and robotics. This paper provides an overview of the practical application of foundation models in real-world robotics, with a primary emphasis on the replacement of specific components within existing robot systems. The summary encompasses the perspective of input-output relationships in foundation models, as well as their role in perception, motion planning, and control within the field of robotics. This paper concludes with a discussion of future challenges and implications for practical robot applications.

Updated: 2024-10-23 03:39:00

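Title: Real-World Robot Applications of Foundation Models: A Review

Abstract: Recent developments in foundation models, such as Large Language Models (LLMs) and Vision-Language Models (VLMs) trained on extensive data, enable flexible application across different tasks and modalities. Their impact spans fields such as healthcare, education, and robotics. This paper provides an overview of the practical application of foundation models in real-world robotics, with a primary emphasis on replacing specific components within existing robot systems. The summary covers the perspective of input-output relationships in foundation models, as well as their role in perception, motion planning, and control in robotics. The paper concludes with a discussion of future challenges and implications for practical robot applications.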

Fields: cs.RO,cs.AI,cs.CV,cs.LG

Download: http://arxiv.org/abs/2402.05741v2

Primal-Dual Spectral Representation for Off-policy Evaluation

Off-policy evaluation (OPE) is one of the most fundamental problems in reinforcement learning (RL). Its goal is to estimate the expected long-term payoff of a given target policy using only experience collected by another, potentially unknown, behavior policy. The distribution correction estimation (DICE) family of estimators has advanced the state of the art in OPE by breaking the curse of horizon. However, the major bottleneck in applying DICE estimators is the difficulty of solving the saddle-point optimization they involve, especially with neural network implementations. In this paper, we tackle this challenge with the spectral decomposition of the transition operator. We use it to establish a linear representation of the value function and the stationary distribution correction ratio, i.e., the primal and dual variables in the DICE framework. This primal-dual representation bypasses the non-convex non-concave optimization in vanilla DICE, enabling a computationally efficient algorithm, and it also paves the way for more efficient use of historical data. We highlight that our algorithm, SpectralDICE, is the first to leverage a linear representation of the primal-dual variables that is both computationally and sample efficient. Its performance is supported by a rigorous theoretical sample complexity guarantee and a thorough empirical evaluation on various benchmarks.

Updated: 2024-10-23 03:38:31

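Title: Primal-Dual Spectral Representation for Off-Policy Evaluation

Abstract: Off-policy evaluation (OPE) is one of the most fundamental problems in reinforcement learning (RL). Its goal is to estimate the expected long-term return of a given target policy using only experience from another, potentially unknown, behavior policy. The distribution correction estimation (DICE) family of estimators has advanced the state of the art in OPE by breaking the curse of horizon. However, the main bottleneck in applying DICE estimators is the difficulty of solving the saddle-point optimization involved, especially in neural network implementations. This paper addresses this challenge by using the spectral decomposition of the transition operator to establish a linear representation of the value function and the stationary distribution correction ratio (i.e., the primal and dual variables in the DICE framework). This primal-dual representation bypasses the non-convex non-concave optimization in vanilla DICE, enabling a computationally efficient algorithm. It also paves the way for more efficient use of historical data. We emphasize that our algorithm, SpectralDICE, is the first to leverage a linear representation of the primal-dual variables that is both computationally and sample efficient. Its performance is supported by a rigorous theoretical sample complexity guarantee and a thorough empirical evaluation on various benchmarks.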

Fields: cs.LG,cs.AI,math.OC

Download: http://arxiv.org/abs/2410.17538v1
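Why does a spectral decomposition of the transition operator give a linear representation of the primal variable? Below is a minimal tabular sketch, assuming a known transition matrix P over (state, action) pairs and an exact SVD. Factor P = phi @ mu.T. Then the Bellman equation Q = r + gamma * P @ V is linear in the features [r, phi]. This covers only the primal side with exact features. SpectralDICE itself learns representations from data and also represents the dual correction ratio, which is not shown. The function name is hypothetical.

```python
import numpy as np


def spectral_q_representation(P, r, pi, gamma=0.9, rank=None):
    """P: (S*A, S) transitions, rows ordered state-major over (s, a).
    r: (S*A,) rewards. pi: (S, A) target policy.
    Returns phi (S*A, k), mu (S, k) with P = phi @ mu.T, the exact Q^pi,
    and weights w such that Q^pi = r + phi @ w."""
    n_sa, S = P.shape
    A = n_sa // S
    # Policy-averaging matrix: (Pi @ q)[s] = sum_a pi(a|s) * q(s, a) = V^pi(s).
    Pi = np.zeros((S, n_sa))
    for s in range(S):
        Pi[s, s * A:(s + 1) * A] = pi[s]
    # Exact policy evaluation: Q = r + gamma * P @ Pi @ Q.
    q = np.linalg.solve(np.eye(n_sa) - gamma * P @ Pi, r)
    # Spectral (SVD) factorization of the transition operator.
    U, sing, Vt = np.linalg.svd(P, full_matrices=False)
    k = rank if rank is not None else int(np.sum(sing > 1e-10))
    phi = U[:, :k] * sing[:k]   # (S*A, k) state-action features
    mu = Vt[:k].T               # (S, k) next-state factors
    w = gamma * mu.T @ (Pi @ q) # Q = r + gamma * phi @ mu.T @ V^pi
    return phi, mu, q, w


rng = np.random.default_rng(0)
S, A = 4, 2
P = rng.random((S * A, S))
P /= P.sum(axis=1, keepdims=True)   # row-stochastic transitions
r = rng.random(S * A)
pi = rng.random((S, A))
pi /= pi.sum(axis=1, keepdims=True)
phi, mu, q, w = spectral_q_representation(P, r, pi, gamma=0.9)
```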

Audio-visual cross-modality knowledge transfer for machine learning-based in-situ monitoring in laser additive manufacturing

Various machine learning (ML)-based in-situ monitoring systems have been developed to detect anomalies and defects in laser additive manufacturing (LAM) processes. While multimodal fusion, which integrates data from visual, audio, and other modalities, can improve monitoring performance, it also increases hardware, computational, and operational costs due to the use of multiple sensor types. This paper introduces a cross-modality knowledge transfer (CMKT) methodology for LAM in-situ monitoring, which transfers knowledge from a source modality to a target modality. CMKT enhances the representativeness of the features extracted from the target modality, allowing the removal of source modality sensors during prediction. This paper proposes three CMKT methods: semantic alignment, fully supervised mapping, and semi-supervised mapping. The semantic alignment method establishes a shared encoded space between modalities to facilitate knowledge transfer. It employs a semantic alignment loss to align the distributions of identical groups (e.g., visual and audio defective groups) and a separation loss to distinguish different groups (e.g., visual defective and audio defect-free groups). The two mapping methods transfer knowledge by deriving features from one modality to another using fully supervised and semi-supervised learning approaches. In a case study for LAM in-situ defect detection, the proposed CMKT methods were compared with multimodal audio-visual fusion. The semantic alignment method achieved an accuracy of 98.7% while removing the audio modality during the prediction phase, which is comparable to the 98.2% accuracy obtained through multimodal fusion. Using explainable artificial intelligence, we discovered that semantic alignment CMKT can extract more representative features while reducing noise by leveraging the inherent correlations between modalities.

Updated: 2024-10-23 03:38:00

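Title: Audio-Visual Cross-Modality Knowledge Transfer for Machine Learning-Based In-Situ Monitoring in Laser Additive Manufacturing

Abstract: Various machine learning (ML)-based in-situ monitoring systems have been developed to detect anomalies and defects in laser additive manufacturing (LAM) processes. Multimodal fusion integrates data from visual, audio, and other modalities. It can improve monitoring performance, but because multiple sensor types are needed, it also increases hardware, computational, and operational costs. This paper introduces a cross-modality knowledge transfer (CMKT) methodology for LAM in-situ monitoring that transfers knowledge from a source modality to a target modality. CMKT enhances the representativeness of the features extracted from the target modality, allowing the source-modality sensors to be removed during prediction. This paper proposes three CMKT methods: semantic alignment, fully supervised mapping, and semi-supervised mapping. The semantic alignment method establishes a shared encoded space between modalities to facilitate knowledge transfer. It uses a semantic alignment loss to align the distributions of identical groups (e.g., the visual and audio defective groups). It uses a separation loss to distinguish different groups (e.g., the visual defective and audio defect-free groups). The two mapping methods transfer knowledge by deriving features of one modality from another using fully supervised and semi-supervised learning. In a case study on LAM in-situ defect detection, the proposed CMKT methods were compared with multimodal audio-visual fusion. With the audio modality removed, the semantic alignment method achieved 98.7% accuracy, comparable to the 98.2% accuracy obtained through multimodal fusion. Using explainable artificial intelligence, we found that semantic alignment CMKT can extract more representative features while reducing noise by leveraging the inherent correlations between modalities.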

Fields: cs.CE,cs.LG

Download: http://arxiv.org/abs/2408.05307v2
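The abstract names a semantic alignment loss and a separation loss but does not give their formulas. Below is one hypothetical centroid-based version, assuming both modalities are already embedded in the shared space:

- alignment pulls same-group centroids together across modalities (e.g., visual defective and audio defective);
- separation pushes different-group centroids at least `margin` apart (e.g., visual defective and audio defect-free).

The function names, the squared hinge, and the margin are our assumptions, not the paper's definitions.

```python
import numpy as np


def group_centroids(z, labels):
    """Mean embedding of each group (class) in the shared space."""
    classes = np.unique(labels)
    return classes, np.stack([z[labels == c].mean(axis=0) for c in classes])


def alignment_and_separation(z_a, y_a, z_b, y_b, margin=1.0):
    """Hypothetical losses between modality a (e.g. visual) and modality b
    (e.g. audio) embeddings in a shared space.

    alignment:  mean squared distance between same-group centroids.
    separation: squared hinge penalty when different-group centroids are
                closer than `margin`.
    """
    classes_a, mu_a = group_centroids(z_a, y_a)
    classes_b, mu_b = group_centroids(z_b, y_b)
    if not np.array_equal(classes_a, classes_b):
        raise ValueError("both modalities must contain the same groups")
    dist = np.linalg.norm(mu_a[:, None, :] - mu_b[None, :, :], axis=-1)  # (C, C)
    same = np.eye(len(classes_a), dtype=bool)
    alignment = float(np.mean(dist[same] ** 2))
    separation = float(np.mean(np.maximum(0.0, margin - dist[~same]) ** 2))
    return alignment, separation


z_visual = np.array([[0.0, 0.0], [0.2, 0.0], [2.0, 2.0], [2.2, 2.0]])
labels = np.array([0, 0, 1, 1])                    # 0 = defect-free, 1 = defective
z_audio_shifted = z_visual + np.array([1.0, 0.0])  # misaligned audio embeddings
losses_shifted = alignment_and_separation(z_visual, labels, z_audio_shifted, labels)
```

During training, both terms would be added to the task loss and minimized with respect to the encoders.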

FedGMark: Certifiably Robust Watermarking for Federated Graph Learning

Federated graph learning (FedGL) is an emerging learning paradigm in which multiple clients collaboratively train models on their graph data. However, FedGL models are susceptible to illegal copying and model theft during development and deployment. Backdoor-based watermarking is a well-known method for mitigating these attacks, as it lets the model owner verify ownership. We take the first step toward protecting the ownership of FedGL models via backdoor-based watermarking. Existing techniques face challenges in achieving this goal: 1) they either cannot be applied directly or yield unsatisfactory performance; 2) they are vulnerable to watermark removal attacks; and 3) they lack formal guarantees. To address all of these challenges, we propose FedGMark, the first certifiably robust backdoor-based watermarking method for FedGL. FedGMark leverages the unique graph structure and client information in FedGL to learn customized and diverse watermarks. It also introduces a novel GL architecture that helps defend against both empirical and theoretically worst-case watermark removal attacks. Extensive experiments validate the promising empirical and provable watermarking performance of FedGMark. Source code is available at: https://github.com/Yuxin104/FedGMark.

Updated: 2024-10-23 03:25:55

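Title: FedGMark: Certifiably Robust Watermarking for Federated Graph Learning

Abstract: Federated graph learning (FedGL) is an emerging learning paradigm for collaboratively training on graph data from multiple clients. However, FedGL models are at risk of illegal copying and model theft during development and deployment. Backdoor-based watermarking is a well-known method for mitigating these attacks because it provides ownership verification for the model owner. We take the first step toward protecting the ownership of FedGL models through backdoor-based watermarking. Existing techniques face challenges in achieving this goal: 1) they either cannot be applied directly or yield unsatisfactory performance; 2) they are vulnerable to watermark removal attacks; and 3) they lack formal guarantees. To address all of these challenges, we propose FedGMark, the first certifiably robust backdoor-based watermarking technique for FedGL. FedGMark leverages the unique graph structure and client information in FedGL to learn customized and diverse watermarks. It also designs a novel GL architecture that helps defend against both empirical and theoretically worst-case watermark removal attacks. Extensive experiments confirm the promising empirical and provable watermarking performance of FedGMark. Source code is available at: https://github.com/Yuxin104/FedGMark.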

Fields: cs.CR

Download: http://arxiv.org/abs/2410.17533v1

The Dark Side of Rich Rewards: Understanding and Mitigating Noise in VLM Rewards

Vision-Language Models (VLMs) are increasingly used to generate reward signals for training embodied agents to follow instructions. Yet our research reveals that agents guided by VLM rewards often underperform agents that use only intrinsic (exploration-driven) rewards, contradicting the expectations set by recent work. We hypothesize that false positive rewards, i.e., instances where unintended trajectories are incorrectly rewarded, are more detrimental than false negatives. Our analysis confirms this hypothesis, revealing that the widely used cosine similarity metric is prone to false positive reward estimates. To address this, we introduce BiMI (Binary Mutual Information), a novel reward function designed to mitigate noise. BiMI significantly enhances learning efficiency across diverse and challenging embodied navigation environments. Our findings offer a nuanced understanding of how different types of reward noise affect agent learning. They also highlight the importance of addressing noise in multimodal reward signals when training embodied agents.

Updated: 2024-10-23 03:22:48

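Title: The Dark Side of Rich Rewards: Understanding and Mitigating Noise in VLM Rewards

Abstract: Vision-Language Models (VLMs) are increasingly used to generate reward signals for training embodied agents to follow instructions. However, our research finds that agents guided by VLM rewards often perform worse than agents that use only intrinsic (exploration-driven) rewards, which contradicts the expectations set by recent work. We hypothesize that false positive rewards, where unintended trajectories are incorrectly rewarded, are more harmful than false negatives. Our analysis confirms this hypothesis, showing that the widely used cosine similarity metric is prone to false positive reward estimates. To address this, we introduce BiMI (Binary Mutual Information), a novel reward function designed to reduce noise. BiMI substantially improves learning efficiency in diverse and challenging embodied navigation environments. Our findings provide a nuanced understanding of how different types of reward noise affect agent learning. They also underline the importance of addressing multimodal reward signal noise when training embodied agents.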

Fields: cs.LG,cs.RO

Download: http://arxiv.org/abs/2409.15922v2

Responsible Multilingual Large Language Models: A Survey of Development, Applications, and Societal Impact

Multilingual Large Language Models (MLLMs) represent a pivotal advancement in democratizing artificial intelligence across linguistic boundaries. While theoretical foundations are well-established, practical implementation guidelines remain scattered. This work bridges this gap by providing a comprehensive end-to-end framework for developing and deploying MLLMs in production environments. We make three distinctive contributions: First, we present an actionable pipeline from data pre-processing through deployment, integrating insights from academic research and industrial applications. Second, using Llama2 as a case study, we provide detailed optimization strategies for enhancing multilingual capabilities, including curriculum learning approaches for balancing high-resource and low-resource languages, tokenization strategies, and effective sampling methods. Third, we offer an interdisciplinary analysis that considers technical, linguistic, and cultural perspectives in MLLM development. Our findings reveal critical challenges in supporting linguistic diversity, with 88.38% of world languages categorized as low-resource, affecting over a billion speakers. We examine practical solutions through real-world applications in customer service, search engines, and machine translation. By synthesizing theoretical frameworks with production-ready implementation strategies, this survey provides essential guidance for practitioners and researchers working to develop more inclusive and effective multilingual AI systems.

Updated: 2024-10-23 03:19:15

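Title: Responsible Multilingual Large Language Models: A Survey of Development, Applications, and Societal Impact

Abstract: Multilingual Large Language Models (MLLMs) represent a pivotal advance in democratizing artificial intelligence across linguistic boundaries. Although the theoretical foundations are well established, practical implementation guidelines remain scattered. This work bridges this gap by providing a comprehensive end-to-end framework for developing and deploying MLLMs in production environments. We make three distinctive contributions. First, we present an actionable pipeline from data pre-processing to deployment that integrates insights from academic research and industrial applications. Second, using Llama2 as a case study, we provide detailed optimization strategies for enhancing multilingual capabilities. These include curriculum learning approaches for balancing high-resource and low-resource languages, tokenization strategies, and effective sampling methods. Third, we offer an interdisciplinary analysis that considers technical, linguistic, and cultural perspectives in MLLM development. Our findings reveal critical challenges in supporting linguistic diversity: 88.38% of the world's languages are classified as low-resource, affecting more than a billion speakers. We examine practical solutions through real-world applications in customer service, search engines, and machine translation. By synthesizing theoretical frameworks with production-ready implementation strategies, this survey provides essential guidance for practitioners and researchers working to develop more inclusive and effective multilingual AI systems.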

Fields: cs.CL,cs.AI

Download: http://arxiv.org/abs/2410.17532v1
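The survey mentions sampling methods for balancing high- and low-resource languages but does not give a formula here. A widely used scheme in multilingual pretraining is exponent (temperature) smoothing: each language is sampled with probability proportional to its data share raised to a power alpha < 1. The sketch below assumes that scheme. The function name and the token counts are illustrative and are not taken from the survey.

```python
import numpy as np


def language_sampling_probs(token_counts, alpha=0.3):
    """Exponent-smoothed sampling: p_i proportional to (n_i / N) ** alpha.

    alpha = 1 keeps the raw data proportions; smaller alpha moves probability
    mass toward low-resource languages; alpha = 0 is uniform over languages.
    """
    counts = np.asarray(token_counts, dtype=float)
    raw = counts / counts.sum()
    smoothed = raw ** alpha
    return smoothed / smoothed.sum()


languages = ["en", "hi", "sw"]
counts = [1_000_000_000, 50_000_000, 1_000_000]  # illustrative token counts only
probs = language_sampling_probs(counts, alpha=0.3)
```

Values of alpha between roughly 0.3 and 0.5 are common choices. Lower values give low-resource languages more exposure, at the cost of repeating their limited data more often.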

Music102: A $D_{12}$-equivariant transformer for chord progression accompaniment

We present Music102, an advanced model built upon the Music101 prototype, aimed at enhancing chord progression accompaniment through a D12-equivariant transformer. Inspired by group theory and symbolic music structures, Music102 leverages musical symmetries, such as transposition and reflection operations, and integrates these properties into the transformer architecture. By encoding prior music knowledge, the model maintains equivariance across both melody and chord sequences. The POP909 dataset was used to train and evaluate Music102. The results show significant improvements over Music101 in both weighted loss and exact accuracy metrics, despite Music102 using fewer parameters. This work showcases the adaptability of self-attention mechanisms and layer normalization to the discrete musical domain, addressing challenges in computational music analysis. With its stable and flexible neural framework, Music102 sets the stage for further exploration of equivariant music generation and computational composition tools, bridging mathematical theory with practical music performance.

Updated: 2024-10-23 03:11:01

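Title: Music102: A $D_{12}$-Equivariant Transformer for Chord Progression Accompaniment

Abstract: We present Music102, an advanced model built on the Music101 prototype, aimed at enhancing chord progression accompaniment through a D12-equivariant transformer. Inspired by group theory and symbolic music structures, Music102 leverages musical symmetries, such as transposition and reflection operations, and integrates these properties into the transformer architecture. By encoding prior musical knowledge, the model maintains equivariance across both melody and chord sequences. Music102 was trained and evaluated on the POP909 dataset. The results show significant improvements over Music101 in both weighted loss and exact accuracy metrics, despite using fewer parameters. This work demonstrates the adaptability of self-attention mechanisms and layer normalization to the discrete musical domain, addressing challenges in computational music analysis. With its stable and flexible neural framework, Music102 lays the foundation for further exploration of equivariant music generation and computational composition tools, bridging mathematical theory and practical music performance.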

Fields: cs.SD,cs.LG,cs.MM,eess.AS

Download: http://arxiv.org/abs/2410.18151v1
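D12 here is the dihedral group of order 24 acting on the 12 pitch classes. It has 12 transpositions (rotations) and 12 inversions (reflections, pitch class i to -i mod 12). Below is a minimal sketch of what "equivariance" means for a layer acting on a 12-dimensional pitch-class vector. It uses a circular convolution whose kernel is symmetric under inversion, which commutes with every group element. This is a generic D12-equivariant linear map chosen for illustration, not Music102's attention layers, and the function names are hypothetical.

```python
import numpy as np

N = 12  # pitch classes C, C#, ..., B


def transpose(x, t):
    """Rotation in D12: move the value at pitch class i to pitch class i + t."""
    return np.roll(x, t)


def invert(x):
    """Reflection in D12: pitch class i -> -i (mod 12)."""
    return np.asarray(x)[(-np.arange(N)) % N]


def act(x, t, flip):
    """Apply the group element 'optionally invert, then transpose by t'."""
    return transpose(invert(x) if flip else x, t)


def symmetric_circular_conv(x, kernel):
    """Circular convolution; with kernel[i] == kernel[-i % 12] it commutes
    with all 24 elements of D12 (i.e. it is D12-equivariant)."""
    if not np.allclose(kernel, invert(kernel)):
        raise ValueError("kernel must be symmetric under inversion")
    return np.array([sum(kernel[m] * x[(n - m) % N] for m in range(N))
                     for n in range(N)])


rng = np.random.default_rng(0)
kernel = rng.random(N)
kernel = 0.5 * (kernel + invert(kernel))   # symmetrize the kernel
chroma = rng.random(N)                     # e.g. a melody's pitch-class histogram
```

The check f(g · x) = g · f(x) over all 24 group elements is the equivariance property, stated for this toy layer.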

BrainTransformers: SNN-LLM

This study introduces BrainTransformers, an innovative Large Language Model (LLM) implemented using Spiking Neural Networks (SNN). Our key contributions include: (1) designing SNN-compatible Transformer components such as SNNMatmul, SNNSoftmax, and SNNSiLU; (2) implementing an SNN approximation of the SiLU activation function; and (3) developing a Synapsis module to simulate synaptic plasticity. Our 3-billion parameter model, BrainTransformers-3B-Chat, demonstrates competitive performance across various benchmarks, including MMLU (63.2), BBH (54.1), ARC-C (54.3), and GSM8K (76.3), while potentially offering improved energy efficiency and biological plausibility. The model employs a three-stage training approach, including SNN-specific neuronal synaptic plasticity training. This research opens new avenues for brain-like AI systems in natural language processing and neuromorphic computing. Future work will focus on hardware optimization, developing specialized SNN fine-tuning tools, and exploring practical applications in energy-efficient computing environments.

Updated: 2024-10-23 03:05:37

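Title: BrainTransformers: SNN-LLM

Abstract: This study introduces BrainTransformers, an innovative Large Language Model (LLM) implemented with Spiking Neural Networks (SNN). Our main contributions are: (1) designing SNN-compatible Transformer components such as SNNMatmul, SNNSoftmax, and SNNSiLU; (2) implementing an SNN approximation of the SiLU activation function; and (3) developing a Synapsis module that simulates synaptic plasticity. Our 3-billion-parameter model, BrainTransformers-3B-Chat, shows competitive performance on various benchmarks, including MMLU (63.2), BBH (54.1), ARC-C (54.3), and GSM8K (76.3), while potentially offering better energy efficiency and biological plausibility. The model uses a three-stage training approach, including SNN-specific neuronal synaptic plasticity training. This research opens new avenues for brain-like AI systems in natural language processing and neuromorphic computing. Future work will focus on hardware optimization, developing specialized SNN fine-tuning tools, and exploring practical applications in energy-efficient computing environments.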

Fields: cs.NE,cs.CL,cs.LG

Download: http://arxiv.org/abs/2410.14687v2
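The abstract does not describe how SNNSiLU or the other components turn spikes into activations. Below is a minimal sketch of the most common generic building block, assumed here for illustration: rate coding with integrate-and-fire (IF) neurons.

- Each timestep adds the input to a membrane potential.
- When the potential reaches the threshold, the neuron emits a spike and subtracts the threshold ("soft reset").
- The firing rate then approximates a clipped ReLU, with error below threshold / timesteps.

This is not BrainTransformers' method, and the function name is hypothetical.

```python
import numpy as np


def integrate_and_fire_rate(x, timesteps=32, threshold=1.0):
    """Rate-coded approximation of clip(x, 0, threshold) with IF neurons."""
    x = np.asarray(x, dtype=float)
    v = np.zeros_like(x)        # membrane potentials
    spikes = np.zeros_like(x)   # spike counts
    for _ in range(timesteps):
        v += x                  # integrate the (constant) input current
        fired = v >= threshold  # at most one spike per neuron per step
        spikes += fired
        v -= fired * threshold  # soft reset keeps the residual charge
    return spikes * threshold / timesteps


inputs = np.linspace(-1.0, 2.0, 31)
approx = integrate_and_fire_rate(inputs, timesteps=32)
```

More timesteps give a finer approximation but cost more latency and energy. Approximating smooth functions such as SiLU needs additional machinery beyond this sketch.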

GDDA: Semantic OOD Detection on Graphs under Covariate Shift via Score-Based Diffusion Models

Out-of-distribution (OOD) detection poses a significant challenge for Graph Neural Networks (GNNs), particularly in open-world scenarios with varying distribution shifts. Most existing OOD detection methods on graphs primarily focus on identifying instances in test data domains caused by either semantic shifts (changes in data classes) or covariate shifts (changes in data features), while leaving the simultaneous occurrence of both distribution shifts under-explored. In this work, we address both types of shifts simultaneously and introduce a novel challenge for OOD detection on graphs: graph-level semantic OOD detection under covariate shift. In this scenario, variations between the training and test domains result from the concurrent presence of both covariate and semantic shifts, where only graphs associated with unknown classes are identified as OOD samples (OODs). To tackle this challenge, we propose a novel two-phase framework called Graph Disentangled Diffusion Augmentation (GDDA). The first phase focuses on disentangling graph representations into domain-invariant semantic factors and domain-specific style factors. In the second phase, we introduce a novel distribution-shift-controlled score-based generative diffusion model that generates latent factors outside the training semantic and style spaces. Additionally, auxiliary pseudo-in-distribution (InD) and pseudo-OOD graph representations are employed to enhance the effectiveness of the energy-based semantic OOD detector. Extensive empirical studies on three benchmark datasets demonstrate that our approach outperforms state-of-the-art baselines.

Updated: 2024-10-23 03:05:33

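Title: GDDA: Semantic OOD Detection on Graphs under Covariate Shift via Score-Based Diffusion Models

Abstract: Out-of-distribution (OOD) detection poses a major challenge for Graph Neural Networks (GNNs), especially in open-world scenarios with varying distribution shifts. Most existing OOD detection methods on graphs identify test-domain instances caused by either semantic shift (changes in data classes) or covariate shift (changes in data features). The simultaneous occurrence of both shifts remains under-explored. In this work, we handle both types of shift simultaneously and introduce a new OOD detection challenge on graphs: graph-level semantic OOD detection under covariate shift. In this setting, the differences between the training and test domains stem from the concurrent presence of covariate and semantic shifts, and only graphs associated with unknown classes are identified as OOD samples (OODs). To tackle this challenge, we propose a novel two-phase framework called Graph Disentangled Diffusion Augmentation (GDDA). The first phase disentangles graph representations into domain-invariant semantic factors and domain-specific style factors. In the second phase, we introduce a novel distribution-shift-controlled, score-based generative diffusion model that generates latent factors outside the training semantic and style spaces. In addition, auxiliary pseudo-in-distribution (InD) and pseudo-OOD graph representations are used to enhance the effectiveness of the energy-based semantic OOD detector. Extensive empirical studies on three benchmark datasets show that our approach outperforms state-of-the-art baselines.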

Fields: cs.LG

Download: http://arxiv.org/abs/2410.17526v1
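GDDA feeds its pseudo-InD and pseudo-OOD representations to an energy-based semantic OOD detector. Below is a minimal sketch of the standard energy score that such detectors build on, E(x) = -T * logsumexp(f(x) / T) over a classifier's logits. Confident (in-distribution) inputs get lower energy. The sketch shows only the scoring and thresholding step. GDDA's disentanglement, diffusion augmentation, and training are not reproduced, and the threshold below is a placeholder that would normally be tuned on validation data.

```python
import numpy as np


def energy_score(logits, temperature=1.0):
    """Energy E(x) = -T * logsumexp(f(x) / T) over the last axis.
    In-distribution inputs tend to have lower (more negative) energy."""
    z = np.asarray(logits, dtype=float) / temperature
    z_max = z.max(axis=-1, keepdims=True)   # subtract the max for numerical stability
    lse = np.squeeze(z_max, axis=-1) + np.log(np.exp(z - z_max).sum(axis=-1))
    return -temperature * lse


def flag_ood(logits, threshold, temperature=1.0):
    """Flag samples whose energy exceeds a threshold tuned on validation data."""
    return energy_score(logits, temperature) > threshold


logits = np.array([[8.0, 0.0, 0.0],    # confident prediction over 3 known classes
                   [0.5, 0.4, 0.6]])   # diffuse prediction
scores = energy_score(logits)
```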

TSDS: Data Selection for Task-Specific Model Finetuning

Finetuning foundation models for specific tasks is an emerging paradigm in modern machine learning. The efficacy of task-specific finetuning largely depends on the selection of appropriate training data. We present TSDS (Task-Specific Data Selection), a framework to select data for task-specific model finetuning, guided by a small but representative set of examples from the target task. To do so, we formulate data selection for task-specific finetuning as an optimization problem with a distribution alignment loss based on optimal transport to capture the discrepancy between the selected data and the target distribution. In addition, we add a regularizer to encourage the diversity of the selected data and incorporate kernel density estimation into the regularizer to reduce the negative effects of near-duplicates among the candidate data. We connect our optimization problem to nearest neighbor search and design efficient algorithms to compute the optimal solution based on approximate nearest neighbor search techniques. We evaluate our method on data selection for both continued pretraining and instruction tuning of language models. We show that instruction tuning using data selected by our method with a 1% selection ratio often outperforms using the full dataset and beats the baseline selection methods by 1.5 points in F1 score on average.

Updated: 2024-10-23 03:00:41

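Title: TSDS: Data Selection for Task-Specific Model Finetuning

Abstract: Finetuning foundation models for specific tasks is an emerging paradigm in modern machine learning. The effectiveness of task-specific finetuning depends largely on selecting appropriate training data. We present TSDS (Task-Specific Data Selection), a framework that selects data for task-specific model finetuning, guided by a small but representative set of examples from the target task. To do so, we formulate data selection for task-specific finetuning as an optimization problem. Its distribution alignment loss is based on optimal transport and captures the discrepancy between the selected data and the target distribution. In addition, we add a regularizer that encourages diversity in the selected data. We incorporate kernel density estimation into this regularizer to reduce the negative effects of near-duplicates among the candidate data. We connect our optimization problem to nearest neighbor search and design efficient algorithms that compute the optimal solution using approximate nearest neighbor search techniques. We evaluate our method on data selection for both continued pretraining and instruction tuning of language models. Instruction tuning with data selected by our method at a 1% selection ratio often outperforms using the full dataset, and it beats the baseline selection methods by 1.5 F1 points on average.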

Fields: cs.LG,cs.AI,cs.CL,68T50, 68T01,I.2.6; I.2.7

Download: http://arxiv.org/abs/2410.11303v2

Comparing Quantum Encoding Techniques

As quantum computers become more capable, the range of their possible applications grows. For example, quantum techniques are being integrated with classical neural networks to perform machine learning. For this use, or for other widespread uses such as quantum chemistry simulations or cryptographic applications, classical data must be converted into quantum states through quantum encoding. There are three fundamental encoding methods (basis, amplitude, and rotation) as well as several proposed combinations. This study explores these encoding methods, specifically in the context of hybrid quantum-classical machine learning. The study uses the QuClassi quantum neural network architecture to perform binary classification of the digits '3' and '6' from the MNIST dataset. It measures accuracy, entropy, loss, and resistance to noise, and it also considers resource usage and computational complexity when comparing the three main encoding methods.

Updated: 2024-10-23 02:55:57

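Title: Comparing Quantum Encoding Techniques

Abstract: As quantum computers become more capable, the possibilities for their applications are growing. For example, quantum techniques are being integrated with classical neural networks to perform machine learning. For this use, or for other widespread uses such as quantum chemistry simulation or cryptographic applications, classical data must be converted into quantum states through quantum encoding. There are three fundamental encoding methods (basis, amplitude, and rotation) as well as several proposed combinations. This study explores these encoding methods, specifically in the context of hybrid quantum-classical machine learning. It uses the QuClassi quantum neural network architecture to perform binary classification of the digits '3' and '6' from the MNIST dataset. The study obtains several metrics, such as accuracy, entropy, loss, and resistance to noise, and also considers resource usage and computational complexity to compare the three main encoding methods.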

Fields: quant-ph,cs.ET,cs.LG

Download: http://arxiv.org/abs/2410.09121v2
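The three fundamental encodings differ mainly in how many qubits they consume and what state they prepare. Below is a minimal statevector sketch in NumPy, assuming rotation (angle) encoding uses one RY rotation per feature. Other rotation axes are also used in practice, and QuClassi's exact circuits are not reproduced. Qubit order follows the Kronecker product, with the first feature or bit as the most significant. The function names are hypothetical.

```python
import numpy as np


def basis_encoding(bits):
    """n bits -> computational basis state |b_1 ... b_n> (a 2**n vector)."""
    index = int("".join(str(b) for b in bits), 2)
    state = np.zeros(2 ** len(bits))
    state[index] = 1.0
    return state


def amplitude_encoding(x):
    """N features -> amplitudes of ceil(log2 N) qubits (zero-padded, L2-normalized).
    The input must not be all zeros."""
    x = np.asarray(x, dtype=float)
    n_qubits = max(1, int(np.ceil(np.log2(len(x)))))
    padded = np.zeros(2 ** n_qubits)
    padded[:len(x)] = x
    return padded / np.linalg.norm(padded)


def angle_encoding(x):
    """One qubit per feature: RY(x_i)|0> = cos(x_i/2)|0> + sin(x_i/2)|1>,
    combined with a tensor (Kronecker) product."""
    state = np.array([1.0])
    for xi in x:
        state = np.kron(state, np.array([np.cos(xi / 2), np.sin(xi / 2)]))
    return state


features = [0.3, 1.2, 2.0]
states = {
    "basis": basis_encoding([1, 0, 1]),        # 3 qubits for 3 bits
    "amplitude": amplitude_encoding(features), # 2 qubits for 3 features
    "angle": angle_encoding(features),         # 3 qubits for 3 features
}
```

The resource trade-off is visible in the qubit counts. Amplitude encoding is the most qubit-efficient (log2 N qubits), but preparing an arbitrary amplitude vector generally needs deeper circuits than one rotation per qubit.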

Can visual language models resolve textual ambiguity with visual cues? Let visual puns tell you!

Humans possess multimodal literacy, which allows them to actively integrate information from various modalities when reasoning. When faced with challenges such as lexical ambiguity in text, we supplement the text with other modalities, such as thumbnail images or textbook illustrations. Can machines achieve a similar multimodal understanding capability? In response, we present Understanding Pun with Image Explanations (UNPIE), a novel benchmark designed to assess the impact of multimodal inputs on resolving lexical ambiguities. Puns are an ideal subject for this evaluation because of their intrinsic ambiguity. Our dataset includes 1,000 puns, each accompanied by an image that explains both meanings. Using these annotations, we pose three multimodal challenges that assess different aspects of multimodal literacy: Pun Grounding, Disambiguation, and Reconstruction. The results indicate that various Socratic Models and Visual-Language Models improve over text-only models when given visual context, particularly as task complexity increases.

Updated: 2024-10-23 02:55:20

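Title: Can visual language models resolve textual ambiguity with visual cues? Let visual puns tell you!

Abstract: Humans possess multimodal literacy, which lets them actively integrate information from various modalities when reasoning. Faced with challenges such as lexical ambiguity in text, we supplement the text with other modalities, such as thumbnail images or textbook illustrations. Can machines achieve a similar multimodal understanding capability? In response, we present Understanding Pun with Image Explanations (UNPIE), a novel benchmark designed to assess the impact of multimodal inputs on resolving lexical ambiguity. Puns are an ideal subject for this evaluation because of their inherent ambiguity. Our dataset includes 1,000 puns, each accompanied by an image that explains both meanings. We pose three multimodal challenges to assess different aspects of multimodal literacy: Pun Grounding, Disambiguation, and Reconstruction. The results show that various Socratic Models and Visual-Language Models outperform text-only models when given visual context, especially as task complexity increases.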

Fields: cs.CV,cs.AI

Download: http://arxiv.org/abs/2410.01023v2

No more hard prompts: SoftSRV prompting for synthetic data generation

We present a novel soft prompt based framework, SoftSRV, that leverages a frozen pre-trained large language model (LLM) to generate targeted synthetic text sequences. Given a sample from the target distribution, our proposed framework uses data-driven loss minimization to train a parameterized "contextual" soft prompt. This soft prompt is then used to steer the frozen LLM to generate synthetic sequences that are similar to the target distribution. We argue that SoftSRV provides a practical improvement over common hard-prompting approaches that rely on human-curated prompt-templates, which can be idiosyncratic, labor-intensive to craft, and may need to be specialized per domain. We empirically evaluate SoftSRV and hard-prompting baselines by generating synthetic data to fine-tune a small Gemma model on three different domains (coding, math, reasoning). To stress the generality of SoftSRV, we perform these evaluations without any particular specialization of the framework to each domain. We find that SoftSRV significantly improves upon hard-prompting baselines, generating data with superior fine-tuning performance and that better matches the target distribution according to the MAUVE similarity metric.

Updated: 2024-10-23 02:55:14

标题: 告别硬提示：用于合成数据生成的SoftSRV提示

摘要: 我们提出了一个新颖的基于软提示的框架SoftSRV，利用冻结的预训练大型语言模型（LLM）生成针对性的合成文本序列。给定目标分布中的一个样本，我们提出的框架使用数据驱动的损失最小化来训练一个参数化的“上下文”软提示。然后使用这个软提示来引导冻结的LLM生成与目标分布相似的合成序列。我们认为SoftSRV相对于依赖于人为策划的硬提示模板的常见方法提供了实际改进，这些模板可能是特殊的、费时费力的制作，并且可能需要针对每个领域进行专门化。我们通过生成合成数据来对SoftSRV和硬提示基线进行实证评估，以对三个不同领域（编码、数学、推理）上的小型Gemma模型进行微调。为了强调SoftSRV的通用性，我们在没有将框架专门化到每个领域的情况下进行这些评估。我们发现SoftSRV在生成具有优越微调性能并根据MAUVE相似度度量更好地匹配目标分布的数据方面显著优于硬提示基线。

更新时间: 2024-10-23 02:55:14

领域: cs.LG

下载: http://arxiv.org/abs/2410.16534v2

Feature Homomorphism -- A Cryptographic Scheme For Data Verification Under Ciphertext-Only Conditions

Privacy computing involves the extensive exchange and processing of encrypted data. A key problem in achieving "Data Availability versus Visibility" is how the parties involved in these interactions can determine the consistency of exchanged data without accessing the original data. They must also ensure tamper resistance, non-repudiation, quality traceability, indexing, and retrieval while encrypted data is in use. This paper proposes a new type of homomorphism, Feature Homomorphism, and based on it introduces a cryptographic scheme for data verification under ciphertext-only conditions. The scheme involves designing a group of algorithms that meet the requirements outlined in this paper, including encryption/decryption algorithms and a Feature Homomorphic Algorithm. This group of algorithms not only encrypts and decrypts data but also ensures that a plaintext and its corresponding ciphertext, produced by the specified encryption algorithm, satisfy the following property: the eigenvalue (feature value) of the plaintext obtained with the Feature Homomorphic Algorithm equals the eigenvalue of the ciphertext obtained with the same algorithm. With this group of algorithms, data consistency can be verified directly by comparing the eigenvalues of the plaintext and ciphertext without accessing the original data (i.e., under ciphertext-only conditions). This can be used for tamper resistance, non-repudiation, and quality traceability. Additionally, the eigenvalue can serve as a ciphertext index, enabling searchable encryption. This scheme fills in a missing piece of the homomorphic encryption puzzle. Keywords: Privacy Computing, Data Consistency, Searchable Encryption, Zero-Knowledge Proof, Feature Homomorphism
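To make the verification property concrete, the sketch below uses a deliberately weak toy scheme that satisfies it. The cipher is a keyed transposition that only reorders bytes. The "feature" function hashes the sorted bytes. Because reordering does not change the multiset of bytes, the feature of a plaintext equals the feature of its ciphertext.

This illustrates the property only. It is not the paper's algorithms, and it is not secure, since it leaks byte frequencies. The function names are hypothetical.

```python
import hashlib
import random


def _key_order(length: int, key: int) -> list:
    """Key-seeded permutation of byte positions (toy keyed transposition)."""
    order = list(range(length))
    random.Random(key).shuffle(order)
    return order


def encrypt(plaintext: bytes, key: int) -> bytes:
    order = _key_order(len(plaintext), key)
    return bytes(plaintext[i] for i in order)


def decrypt(ciphertext: bytes, key: int) -> bytes:
    order = _key_order(len(ciphertext), key)
    out = bytearray(len(ciphertext))
    for pos, i in enumerate(order):
        out[i] = ciphertext[pos]  # undo the permutation
    return bytes(out)


def feature(data: bytes) -> str:
    """Toy feature value: SHA-256 of the sorted bytes, invariant under any reordering,
    so feature(plaintext) == feature(encrypt(plaintext, key))."""
    return hashlib.sha256(bytes(sorted(data))).hexdigest()
```

A verifier who holds only `feature(msg)` and the ciphertext can check consistency with `feature(ct) == feature(msg)`. A tampered message generally changes the feature value.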

Updated: 2024-10-23 02:52:58

标题: 特征同态---一种在仅有密文条件下进行数据验证的加密方案

摘要: 隐私计算涉及加密数据的广泛交换和处理。对于参与这些交互的各方来说，如何在不访问原始数据的情况下确定交换数据的一致性，确保抗篡改性、不可否认性、质量可追溯性、索引和在使用加密数据期间检索，是实现“数据可用性与可见性”的关键主题。本文提出了一种新型同态性：特征同态性，并基于这一特征，介绍了一种在仅有密文条件下进行数据验证的加密方案。所提出的方案涉及设计一组算法，满足本文中概述的要求，包括加密/解密算法和特征同态算法。这组算法不仅允许对数据进行加密和解密，还确保使用指定的加密算法加密的明文及其对应的密文满足以下属性：使用特征同态算法获得的明文的特征值等于使用相同算法获得的密文的特征值。通过这组算法，可以直接通过比较明文和密文的特征值来验证数据的一致性，而无需访问原始数据（即，在仅有密文的条件下）。这可以用于抗篡改性、不可否认性和质量可追溯性。此外，特征值可以作为密文索引，实现可搜索加密。该方案补上了同态加密拼图中缺失的一块。关键词：隐私计算、数据一致性、可搜索加密、零知识证明、特征同态性

更新时间: 2024-10-23 02:52:58

领域: cs.CR

下载: http://arxiv.org/abs/2410.17106v2

MobileSafetyBench: Evaluating Safety of Autonomous Agents in Mobile Device Control

Autonomous agents powered by large language models (LLMs) show promising potential in assistive tasks across various domains, including mobile device control. Because these agents interact directly with personal information and device settings, ensuring their safe and reliable behavior is crucial to preventing undesirable outcomes. However, no benchmark exists for the standardized evaluation of the safety of mobile device-control agents. In this work, we introduce MobileSafetyBench, a benchmark designed to evaluate the safety of device-control agents within a realistic mobile environment based on Android emulators. We develop a diverse set of tasks involving interactions with various mobile applications, including messaging and banking applications. To evaluate safety separately from general capabilities, we design distinct tasks that measure safety and tasks that evaluate helpfulness. The safety tasks challenge agents to manage potential risks prevalent in daily life and include tests of robustness against indirect prompt injection. Our experiments demonstrate that while baseline agents based on state-of-the-art LLMs perform well on helpful tasks, they perform poorly on safety tasks. To mitigate these safety concerns, we propose a prompting method that encourages agents to prioritize safety considerations. While this method shows promise in promoting safer behavior, there is still considerable room for improvement before agents can fully earn user trust. This highlights the urgent need for continued research on more robust safety mechanisms in mobile environments. We open-source our benchmark at: https://mobilesafetybench.github.io/.

Updated: 2024-10-23 02:51:43

标题: MobileSafetyBench: 评估移动设备控制中自主代理的安全性

摘要: 由大型语言模型（LLMs）驱动的自主代理在各个领域中展示了协助任务的潜力，包括移动设备控制。由于这些代理直接与个人信息和设备设置进行交互，确保它们的安全和可靠行为对于防止不良结果至关重要。然而，目前没有针对移动设备控制代理安全性的标准化评估基准存在。在这项工作中，我们介绍了MobileSafetyBench，这是一个基于Android模拟器设计的基准，旨在评估设备控制代理在真实移动环境中的安全性。我们开发了一系列涉及与各种移动应用程序进行交互的任务，包括消息传递和银行应用程序。为了清晰评估安全性而非一般能力，我们设计了分开的任务来衡量安全性和评估帮助性。安全任务挑战代理管理日常生活中普遍存在的潜在风险，并包括评估对间接提示注入的鲁棒性的测试。我们的实验表明，基于最先进的LLMs的基线代理在执行有用任务方面表现良好，但在安全任务中表现不佳。为了缓解这些安全顾虑，我们提出了一种提示方法，鼓励代理优先考虑安全因素。虽然这种方法在促进更安全的行为方面表现出潜力，但仍有很大的改进空间来完全赢得用户信任。这凸显了在移动环境中开发更强大的安全机制的迫切需求。我们在https://mobilesafetybench.github.io/上开源我们的基准。

更新时间: 2024-10-23 02:51:43

领域: cs.LG,cs.CL

下载: http://arxiv.org/abs/2410.17520v1

Univariate Conditional Variational Autoencoder for Morphogenic Patterns Design in Frontal Polymerization-Based Manufacturing

Rapid reaction-thermal diffusion during frontal polymerization (FP), combined with variations in initial and boundary conditions, destabilizes the planar mode of front propagation and leads to spatially varying, complex hierarchical patterns in polymeric materials. Modern reaction-diffusion models can predict the patterns resulting from unstable FP. The inverse design of patterns, however, remains an open challenge. Inverse design aims to retrieve process conditions that produce a desired pattern, and it is hard because the mapping between process conditions and patterns is nonunique and nonintuitive. In this work, we propose a novel probabilistic generative model, the univariate conditional variational autoencoder (UcVAE), for the inverse design of hierarchical patterns in FP-based manufacturing. Unlike the cVAE, which encodes both the design space and the design target, the UcVAE encodes only the design space. As a result, the UcVAE encoder has significantly fewer trainable parameters than the cVAE encoder, which shortens training time while maintaining comparable performance. Given desired pattern images, the trained UcVAE can generate multiple process-condition solutions that produce high-fidelity hierarchical patterns.
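The parameter saving can be checked with simple arithmetic. The sketch counts the weights and biases of a fully connected encoder. All dimensions are hypothetical choices, not values from the paper:

- 3 process conditions;
- a 64x64 pattern image flattened to 4096 values;
- hidden widths 256 and 128;
- an 8-dimensional latent, with the encoder assumed to output a mean and a log-variance per latent dimension.

```python
def dense_params(widths):
    """Weights plus biases of a fully connected stack with the given layer widths."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(widths[:-1], widths[1:]))


def encoder_params(design_dim, target_dim, hidden, latent_dim, encode_target):
    """A cVAE encoder sees [design, target]; a UcVAE encoder sees only the design."""
    in_dim = design_dim + (target_dim if encode_target else 0)
    # The last layer emits a mean and a log-variance for each latent dimension.
    return dense_params([in_dim, *hidden, 2 * latent_dim])


cvae = encoder_params(3, 64 * 64, [256, 128], 8, encode_target=True)
ucvae = encoder_params(3, 64 * 64, [256, 128], 8, encode_target=False)
print(cvae, ucvae)  # 1084560 35984
```

With these made-up sizes, almost all cVAE encoder parameters sit in the first layer that reads the target image, which is exactly the part a UcVAE encoder drops. The decoder, which presumably still takes the target pattern as its condition, is not counted here.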

Updated: 2024-10-23 02:50:05

标题: 用于前沿聚合制造中形态图案设计的单变量条件变分自编码器

摘要: 在前沿聚合（FP）过程中，快速反应-热扩散以及初始和边界条件的变化会破坏前沿传播的平面模式，导致高度复杂的空间变化的分层图案在聚合材料中产生。尽管现代反应-扩散模型可以预测不稳定FP产生的图案，但图案的反向设计，即旨在恢复产生期望图案的过程条件，由于过程条件和图案之间的非唯一和非直观映射而仍然是一个挑战。在这项工作中，我们提出了一种新颖的概率生成模型，名为单变量条件变分自动编码器（UcVAE），用于基于FP的制造中的分层图案的反向设计。与cVAE不同，cVAE既编码设计空间又编码设计目标，而UcVAE只编码设计空间。在UcVAE的编码器中，与cVAE相比，训练参数数量大大减少，从而在保持可比性能的同时缩短了训练时间。给定期望的图案图像，经过训练的UcVAE可以生成多个产生高保真分层图案的过程条件解决方案。

更新时间: 2024-10-23 02:50:05

领域: physics.comp-ph,cs.LG

下载: http://arxiv.org/abs/2410.17518v1

Bridging Swarm Intelligence and Reinforcement Learning

Swarm intelligence (SI) explores how large groups of simple individuals (e.g., insects, fish, birds) collaborate to produce complex behaviors, exemplifying that the whole is greater than the sum of its parts. A fundamental task in SI is Collective Decision-Making (CDM), in which a group selects the best option among several alternatives, such as choosing an optimal foraging site. In this work, we demonstrate a theoretical and empirical equivalence between CDM and single-agent reinforcement learning (RL) in multi-armed bandit problems, using concepts from opinion dynamics, evolutionary game theory, and RL. This equivalence bridges the gap between SI and RL and leads us to introduce a novel abstract RL update rule called Maynard-Cross Learning. It also provides a new population-based perspective on common RL practices such as learning-rate adjustment and batching. Our findings enable cross-fertilization between RL and SI, allowing techniques from one field to enhance the understanding and methodologies of the other.
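For reference, the sketch below shows the classic Cross (1973) learning rule for bandits with rewards in [0, 1]. It also gives the rule's expected (mean-field) update, which is the discrete-time replicator dynamics of evolutionary game theory. That link between a learning rule and population dynamics is the kind of bridge the abstract describes. The paper's own Maynard-Cross Learning rule is not reproduced here, and the function names are illustrative.

```python
def cross_update(policy, action, reward, lr):
    """One Cross learning step. Requires reward in [0, 1] and lr in (0, 1]
    so that the policy stays a valid probability vector."""
    return [
        p + lr * reward * (1.0 - p) if a == action else p - lr * reward * p
        for a, p in enumerate(policy)
    ]


def expected_cross_update(policy, mean_rewards, lr):
    """Expected Cross update: p_a += lr * p_a * (r_a - r_bar), i.e. replicator
    dynamics where each action's fitness is its mean reward."""
    r_bar = sum(p * r for p, r in zip(policy, mean_rewards))
    return [p + lr * p * (r - r_bar) for p, r in zip(policy, mean_rewards)]
```

Averaging `cross_update` over the action drawn from `policy` and the random reward gives exactly `expected_cross_update`. Iterating the expected update concentrates probability on the arm with the highest mean reward.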

Updated: 2024-10-23 02:49:37

标题: 连接群体智能和强化学习

摘要: 群体智能（SI）探讨了大量简单个体（如昆虫、鱼类、鸟类）如何协作产生复杂行为，展示整体大于部分之和的特性。SI中的一个基本任务是集体决策（CDM），其中群体在多个选择中选出最佳选项，例如选择最佳的觅食地点。在这项工作中，我们展示了CDM与多臂老虎机问题中单一智能体强化学习（RL）之间的理论和实证等价性，利用了来自意见动态、进化博弈论和RL的概念。这种等价性弥合了SI与RL之间的差距，并引入了一种称为Maynard-Cross Learning的新的抽象RL更新规则。此外，它为常见RL实践（如学习率调整和批处理）提供了新的基于群体的视角。我们的发现促进了RL和SI之间的跨学科交叉，使一领域的技术能够增强对另一领域的理解和方法论。

更新时间: 2024-10-23 02:49:37

领域: cs.MA,cs.AI,cs.GT

下载: http://arxiv.org/abs/2410.17517v1

RegExplainer: Generating Explanations for Graph Neural Networks in Regression Task

Graph regression is a fundamental task that has received increasing attention across a wide range of graph learning applications. However, its inference process is often not interpretable. Most existing explanation techniques are limited to understanding GNN behavior in classification tasks. In this work, we seek explanations that interpret graph regression models (XAIG-R). We show that existing methods overlook the distribution shift and the continuously ordered decision boundary, which prevents them from being applied to regression tasks. To address these challenges, we propose a novel objective based on information bottleneck theory and introduce a new mix-up framework that supports various GNNs in a model-agnostic manner. We further present a contrastive learning strategy to handle the continuously ordered labels in regression tasks. To empirically verify the effectiveness of the proposed method, we introduce three benchmark datasets and a real-life dataset for evaluation. Extensive experiments show the effectiveness of the proposed method in interpreting GNN models in regression tasks.

Updated: 2024-10-23 02:43:03

标题: RegExplainer：在回归任务中为图神经网络生成解释说明

摘要: 图回归是一项基础任务，在各种图学习任务中越来越受到关注。然而，推理过程通常不易解释。大多数现有的解释技术仅限于理解分类任务中的GNN行为。在这项工作中，我们寻求解释图回归模型（XAIG-R）的方法。我们发现现有方法忽视了分布偏移和连续排序的决策边界，这使它们难以应用于回归任务。为了解决这些挑战，我们提出了一种基于信息瓶颈理论的新目标，并引入了一个新的混合框架，可以以模型无关的方式支持各种GNN。我们进一步提出了一种对比学习策略，以应对回归任务中的连续排序标签。为了在实证上验证所提出方法的有效性，我们引入了三个基准数据集和一个真实数据集进行评估。大量实验显示了所提出方法在解释回归任务中的GNN模型方面的有效性。

更新时间: 2024-10-23 02:43:03

领域: cs.LG,cs.AI,I.2.0

下载: http://arxiv.org/abs/2307.07840v3

CV-VAE: A Compatible Video VAE for Latent Generative Video Models

Spatio-temporal compression of videos with networks such as Variational Autoencoders (VAEs) plays a crucial role in OpenAI's SORA and numerous other video generative models. For instance, many LLM-like video models learn the distribution of discrete tokens derived from 3D VAEs within the VQVAE framework, while most diffusion-based video models capture the distribution of continuous latents extracted by 2D VAEs without quantization. In the latter, temporal compression is realized simply by uniform frame sampling, which results in unsmooth motion between consecutive frames. The research community currently lacks a commonly used continuous video (3D) VAE for latent diffusion-based video models. Moreover, current diffusion-based approaches are often implemented using pre-trained text-to-image (T2I) models. Directly training a video VAE without considering compatibility with these T2I models therefore creates a latent space gap between them, and bridging that gap requires huge computational resources even when the T2I models are used for initialization. To address this issue, we propose CV-VAE, a method for training a video VAE for latent video models whose latent space is compatible with that of a given image VAE, e.g., the image VAE of Stable Diffusion (SD). Compatibility is achieved through a novel latent space regularization, which formulates a regularization loss using the image VAE. Benefiting from this latent space compatibility, video models can be trained seamlessly from pre-trained T2I or video models in a truly spatio-temporally compressed latent space, rather than by simply sampling video frames at equal intervals. With CV-VAE, existing video models can generate four times as many frames with minimal finetuning. Extensive experiments demonstrate the effectiveness of the proposed video VAE.
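One way to read "a regularization loss using the image VAE" is sketched below. Each latent produced by the video VAE is pulled toward the image-VAE latent of the frame that uniform sampling would have kept.

The following are all assumptions for illustration:

- this encoder-side form of the loss;
- the stride of 4;
- the helper names.

The abstract does not give the exact loss. The toy `toy_image_encoder` stands in for a real frozen image VAE such as SD's.

```python
import numpy as np


def uniform_sample(frames, stride):
    """Baseline temporal 'compression': keep every stride-th frame."""
    return frames[::stride]


def latent_regularization(video_latents, frames, image_encoder, stride):
    """Hypothetical regulariser: mean squared distance between each video-VAE
    latent and the image-VAE latent of the corresponding sampled frame."""
    targets = np.stack([image_encoder(f) for f in uniform_sample(frames, stride)])
    return float(np.mean((video_latents - targets) ** 2))


def toy_image_encoder(frame):
    """Stand-in for a frozen image VAE encoder (not Stable Diffusion's)."""
    return 0.5 * frame[:2]
```

If the video VAE compresses time by a factor of 4 (again an assumption), a generator that handles a fixed number of latent steps covers four times as many frames. That is consistent with the "four times as many frames" claim.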

Updated: 2024-10-23 02:38:44

标题: CV-VAE：一种用于潜在生成视频模型的兼容视频VAE

摘要: 视频的时空压缩，在OpenAI的SORA和许多其他视频生成模型中发挥着关键作用，利用诸如变分自动编码器（VAE）之类的网络。例如，许多LLM类似的视频模型在VQVAE框架中学习来自3D VAE的离散标记的分布，而大多数基于扩散的视频模型捕获由2D VAE提取的连续潜在变量的分布而不经过量化。时间压缩通过均匀帧采样简单实现，导致连续帧之间运动不平滑。目前，在研究界缺乏一种常用的用于基于扩散的视频模型的连续视频（3D）VAE。此外，由于当前基于扩散的方法通常使用预训练的文本到图像（T2I）模型实现，直接训练视频VAE而不考虑与现有T2I模型兼容性将导致它们之间的潜在空间差距，即使使用T2I模型作为初始化，也需要巨大的计算资源来弥合差距。为了解决这个问题，我们提出了一种用于训练视频模型的视频VAE的方法，即CV-VAE，其潜在空间与给定图像VAE（例如Stable Diffusion（SD）的图像VAE）兼容。该兼容性通过提出的新颖潜在空间正则化实现，其中利用图像VAE制定正则化损失。受益于潜在空间的兼容性，视频模型可以在真正的时空压缩潜在空间中无缝地从预训练的T2I或视频模型中训练，而不仅仅是在等间隔采样视频帧。有了我们的CV-VAE，现有视频模型可以在最小微调的情况下生成四倍更多的帧。进行了大量实验证明了所提出的视频VAE的有效性。

更新时间: 2024-10-23 02:38:44

领域: cs.CV,cs.AI,eess.IV

下载: http://arxiv.org/abs/2405.20279v2

Time and Frequency Synergy for Source-Free Time-Series Domain Adaptations

Source-free time-series domain adaptation has so far received scarce research attention. Moreover, existing approaches rely solely on time-domain features and ignore frequency components, which provide complementary information. This paper proposes Time Frequency Domain Adaptation (TFDA), a method for source-free time-series domain adaptation problems. TFDA uses a dual-branch network structure that fully exploits both time and frequency features to deliver final predictions. It induces pseudo-labels based on a neighborhood concept, in which the predictions of a group of samples are aggregated to generate reliable pseudo-labels. Contrastive learning is carried out in both the time and frequency domains, using pseudo-label information and a negative-pair exclusion strategy so that the neighborhood assumptions remain valid. In addition, a time-frequency consistency technique based on self-distillation is proposed, and an uncertainty reduction strategy is implemented to alleviate the uncertainty caused by domain shift. Finally, a curriculum learning strategy is integrated to combat noisy pseudo-labels. Our experiments demonstrate that our approach outperforms prior art by noticeable margins on benchmark problems.
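Two ingredients can be sketched generically: a frequency view of each series, and neighborhood-aggregated pseudo-labels. The sketch makes these assumptions:

- the frequency view is the FFT magnitude spectrum;
- neighbors are found by cosine similarity in feature space;
- a pseudo-label is the argmax of the neighbors' averaged class probabilities.

TFDA's actual networks, losses, and negative-pair exclusion are not reproduced.

```python
import numpy as np


def time_freq_views(x):
    """Return the raw series (time view) and its FFT magnitude (frequency view)."""
    return x, np.abs(np.fft.rfft(x, axis=-1))


def neighborhood_pseudo_labels(features, probs, k):
    """Pseudo-label each sample by averaging the class probabilities of its
    k nearest neighbours (cosine similarity, excluding the sample itself)."""
    unit = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)            # never count a sample as its own neighbour
    neighbours = np.argsort(-sim, axis=1)[:, :k]
    aggregated = probs[neighbours].mean(axis=1)  # shape (n_samples, n_classes)
    return aggregated.argmax(axis=1)
```

Aggregating over a neighborhood makes each pseudo-label depend on several nearby predictions rather than on a single, possibly noisy, one.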

Updated: 2024-10-23 02:29:50

标题: 时间和频率协同作用在无源时间序列领域适应中的应用

摘要: 无源时间序列领域自适应问题仍然很少受到研究关注。另一方面，现有方法仅依赖于时间域特征，忽略了提供互补信息的频率分量。本文提出了时间频率域自适应（TFDA）方法，用于处理无源时间序列领域自适应问题。TFDA采用双分支网络结构开发，充分利用时间和频率特征以提供最终预测。它基于邻域概念引入伪标签，其中样本组的预测被聚合以生成可靠的伪标签。在时间和频率领域中进行对比学习，利用伪标签信息和负对排除策略进行有效的邻域假设。此外，提出了时间频率一致性技术，使用自蒸馏策略，同时实施不确定性减少策略以减轻由于域转移问题而产生的不确定性。最后，集成课程学习策略以对抗嘈杂的伪标签。我们的实验表明，我们的方法在基准问题中优于以往方法，并取得显著的优势。

更新时间: 2024-10-23 02:29:50

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.17511v1

AskBeacon -- Performing genomic data exchange and analytics with natural language

Removing technological barriers so that clinicians and researchers can interact directly with global genomic data resources is vital for medical genomics. AskBeacon enables Large Language Models to be applied to securely shared cohorts via the GA4GH Beacon protocol. By simply "asking" Beacon, users can obtain actionable insights, analyze them, and make them publication-ready.

Updated: 2024-10-23 02:29:24

标题: AskBeacon-使用自然语言执行基因组数据交换和分析

摘要: 帮助临床医生和研究人员通过消除技术障碍直接与全球基因组数据资源互动对于医学基因组学至关重要。AskBeacon通过GA4GH Beacon协议使大型语言模型能够应用于安全共享的队列。通过简单地“询问”Beacon，可以获得可操作的见解，进行分析并准备出版。

更新时间: 2024-10-23 02:29:24

领域: cs.AI,cs.CY,q-bio.GN

下载: http://arxiv.org/abs/2410.16700v2

RoPINN: Region Optimized Physics-Informed Neural Networks

Physics-informed neural networks (PINNs) have been widely applied to solve partial differential equations (PDEs) by enforcing the outputs and gradients of deep models to satisfy the target equations. Because of the limits of numerical computation, PINNs are conventionally optimized on a finite set of selected points. However, since PDEs are usually defined on continuous domains, optimizing models only on scattered points may be insufficient to obtain an accurate solution over the whole domain. To mitigate this inherent deficiency of default scattered-point optimization, this paper proposes and theoretically studies a new training paradigm called region optimization. Concretely, we extend the optimization of PINNs from isolated points to their continuous neighborhood regions, which can theoretically decrease the generalization error, especially for hidden high-order constraints of PDEs. A practical training algorithm, Region Optimized PINN (RoPINN), is derived directly from this paradigm and implemented with a straightforward but effective Monte Carlo sampling method. By calibrating the sampling process into trust regions, RoPINN finely balances optimization and generalization error. Experimentally, RoPINN consistently boosts the performance of diverse PINNs on a wide range of PDEs without extra backpropagation or gradient calculation. Code is available at this repository: https://github.com/thuml/RoPINN.
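The difference between point and region optimization shows up even without a network. The sketch uses the toy ODE u'(x) = cos(x) and evaluates a candidate solution's derivative directly; a real PINN would obtain it by automatic differentiation. The region loss replaces each collocation point x with Monte Carlo samples drawn uniformly from [x - r, x + r]. RoPINN's trust-region calibration of r is omitted, and the function names are illustrative.

```python
import numpy as np


def residual(u_prime, x):
    """PDE residual for the toy ODE u'(x) = cos(x)."""
    return u_prime(x) - np.cos(x)


def point_loss(u_prime, xs):
    """Conventional PINN loss: mean squared residual at isolated points."""
    return float(np.mean(residual(u_prime, xs) ** 2))


def region_loss(u_prime, xs, radius, n_samples, rng):
    """Monte Carlo estimate of the mean squared residual over the
    neighbourhoods [x - radius, x + radius] of the collocation points."""
    offsets = rng.uniform(-radius, radius, size=(n_samples, len(xs)))
    return float(np.mean(residual(u_prime, xs[None, :] + offsets) ** 2))


def fits_points_only(x):
    # Its residual sin(x) vanishes at every multiple of pi, but not in between.
    return np.cos(x) + np.sin(x)


rng = np.random.default_rng(0)
xs = np.arange(5) * np.pi  # collocation points
print(point_loss(fits_points_only, xs))  # essentially zero
print(region_loss(fits_points_only, xs, radius=1.0, n_samples=200, rng=rng))
```

The second value should be close to 0.5 - sin(2)/4 (about 0.27), the mean of sin² over [-1, 1]. The region loss exposes an error that the point loss cannot see.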

Updated: 2024-10-23 02:26:20

标题: RoPINN: 区域优化的基于物理信息的神经网络

摘要: 物理学知识引导的神经网络（PINNs）已被广泛应用于通过强制深度模型的输出和梯度满足目标方程来解决偏微分方程（PDEs）。由于数值计算的限制，PINNs通常在有限选定点上进行优化。然而，由于PDEs通常在连续域上定义，仅在散点上优化模型可能不足以获得整个域的准确解。为了减轻默认散点优化的固有缺陷，本文提出并理论上研究了一种新的训练范式，即区域优化。具体而言，我们提出将PINNs的优化过程从孤立点扩展到它们的连续邻域区域，这在理论上可以降低泛化误差，特别是对于PDEs的隐藏高阶约束。通过一个简单但有效的蒙特卡洛采样方法，从这个新范式中无缝推导出了一个实用的训练算法，即区域优化的PINN（RoPINN）。通过将采样过程校准到信任区域，RoPINN可以很好地平衡优化和泛化误差。在实验中，RoPINN在各种PDEs上始终提升了不同PINNs的性能，而无需额外的反向传播或梯度计算。代码可在此存储库中找到：https://github.com/thuml/RoPINN。

更新时间: 2024-10-23 02:26:20

领域: cs.LG

下载: http://arxiv.org/abs/2405.14369v3

Congestion Forecast for Trains with Railroad-Graph-based Semi-Supervised Learning using Sparse Passenger Reports

Forecasting rail congestion is crucial for efficient mobility in transport systems. We present rail congestion forecasting using passenger reports collected through a transit application. Although passenger reports have received attention from researchers, ensuring a sufficient volume of reports is challenging because passengers are reluctant to report. The limited number of reports makes the congestion labels sparse, which can hinder building a stable prediction model. To address this issue, we propose SURCONFORT, a semi-supervised method for congestion forecasting for trains. Our key idea is twofold. First, we adopt semi-supervised learning to leverage sparsely labeled data together with abundant unlabeled data. Second, to complement these data with unlabeled data from nearby stations, we design a railway-network-oriented graph and apply it to semi-supervised graph regularization. Empirical experiments with actual reporting data show that, under label sparsity, SURCONFORT improves forecasting performance by 14.9% over state-of-the-art methods.
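Graph regularization of this kind is often written as a Laplacian penalty. The sketch below fits one congestion value per station. It minimizes squared error on the stations that have reports plus `lam * f^T L f`, which has a closed-form solution. It assumes a single railway line on which adjacent stations are linked with weight 1, and it uses a closed-form fit instead of SURCONFORT's learned forecasting model.

```python
import numpy as np


def line_graph_laplacian(n_stations):
    """Graph Laplacian L = D - W for one railway line (adjacent stations linked)."""
    W = np.zeros((n_stations, n_stations))
    for i in range(n_stations - 1):
        W[i, i + 1] = W[i + 1, i] = 1.0
    return np.diag(W.sum(axis=1)) - W


def graph_regularized_fit(y, labeled, L, lam):
    """Minimise sum over labeled i of (f_i - y_i)^2 + lam * f^T L f.

    Setting the gradient to zero gives (M + lam * L) f = M y with
    M = diag(labeled); this is solvable when every connected component
    of the graph contains at least one labeled station.
    """
    M = np.diag(labeled.astype(float))
    y_filled = np.where(labeled, y, 0.0)  # unlabeled entries are ignored via M
    return np.linalg.solve(M + lam * L, M @ y_filled)
```

Unreported stations receive values smoothed from their reported neighbors along the line. This is how sparse labels can be propagated through the network structure.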

Updated: 2024-10-23 02:25:53

标题: 基于铁路图的稀疏乘客报告的半监督学习方法预测火车拥堵情况

摘要: 预测铁路拥堵对于交通系统的有效流动至关重要。我们提出使用乘客通过一款公交应用程序提交的报告来预测铁路拥堵。尽管乘客的报告受到研究人员的关注，但由于乘客不愿意报告，确保足够量的报告是具有挑战性的。报告数量有限导致拥堵标签的稀疏性，这可能会影响构建稳定的预测模型。为了解决这个问题，我们提出了一种用于火车拥堵预测的半监督方法，即SURCONFORT。我们的关键思想是双重的：首先，我们采用半监督学习来利用稀疏标记数据和大量未标记数据。其次，为了补充来自附近车站的未标记数据，我们设计了一个铁路网络导向的图，并将该图应用于半监督图正则化。基于实际报告数据的实证实验表明，在标签稀疏的情况下，SURCONFORT的预测性能比最先进的方法提高了14.9%。

更新时间: 2024-10-23 02:25:53

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.17510v1

WAGLE: Strategic Weight Attribution for Effective and Modular Unlearning in Large Language Models

The need for effective unlearning mechanisms in large language models (LLMs) is increasingly urgent, driven by the necessity to adhere to data regulations and foster ethical generative AI practices. Despite growing interest in LLM unlearning, much of the existing research has focused on varied unlearning method designs to boost effectiveness and efficiency, while the inherent relationship between model weights and LLM unlearning has not been extensively examined. In this paper, we systematically explore how model weights interact with unlearning processes in LLMs. We then design WAGLE, a weight attribution-guided LLM unlearning method that unveils the interconnections between the 'influence' of weights and the 'influence' of data to forget and retain in LLM generation. By strategically guiding LLM unlearning across different types of unlearning methods and tasks, WAGLE can erase undesired content while maintaining performance on the original tasks. Our extensive experiments show that WAGLE boosts unlearning performance across a range of LLM unlearning methods such as gradient difference and (negative) preference optimization, applications such as fictitious unlearning, malicious use prevention, and copyrighted information removal, and models including Zephyr-7b-beta and Llama2-7b. To the best of our knowledge, our work offers the first principled method for attributing and pinpointing the influential weights that enhance LLM unlearning. This contrasts with previous methods, which either lack weight attribution or rely on simpler weight attribution techniques.
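A common first-order way to attribute influence to weights is to score each weight by |w * dL_forget/dw| and to restrict unlearning updates to the top-scoring weights. The sketch below shows that generic masking pattern with gradient ascent on a forget loss. It is not WAGLE's attribution score, which the abstract does not spell out, and the helper names are hypothetical. A gradient-difference method would additionally descend on a retain loss.

```python
import numpy as np


def attribution_mask(weights, forget_grad, keep_fraction):
    """Mark the top keep_fraction of weights by |w * dL_forget/dw| as editable."""
    scores = np.abs(weights * forget_grad).ravel()
    k = max(1, int(round(keep_fraction * scores.size)))
    top = np.argpartition(-scores, k - 1)[:k]  # indices of the k largest scores
    mask = np.zeros(scores.size, dtype=bool)
    mask[top] = True
    return mask.reshape(weights.shape)


def masked_unlearning_step(weights, forget_grad, mask, lr):
    """Gradient ascent on the forget loss, applied only where mask is True."""
    return weights + lr * forget_grad * mask
```

Weights outside the mask stay exactly as they were. This is the modularity argument for attribution-guided unlearning: the edit is confined to the parameters judged most responsible for the data to forget.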

Updated: 2024-10-23 02:22:07

标题: WAGLE：大型语言模型中有效和模块化去学习的战略权重归因

摘要: 越来越迫切地需要在大型语言模型（LLMs）中实现有效的遗忘机制，这是因为需要遵守数据法规并促进道德生成式人工智能实践。尽管LLM遗忘引起了越来越多的关注，但现有研究大多集中在设计各种遗忘方法以提高效果和效率。然而，模型权重与LLM遗忘之间的固有关系尚未得到广泛研究。本文系统地探讨了模型权重如何与LLMs中的遗忘过程互动，并设计了基于权重归因的LLM遗忘方法WAGLE，揭示了权重的“影响力”与在LLM生成中遗忘和保留数据的“影响力”之间的相互关系。通过在不同类型的遗忘方法和任务中战略性地引导LLM遗忘，WAGLE可以消除不需要的内容，同时保持原始任务的性能。我们的广泛实验表明，WAGLE提升了在各种LLM遗忘方法（如梯度差异和（负面）偏好优化）、应用（如虚构遗忘、恶意使用预防和版权信息去除）以及模型（包括Zephyr-7b-beta和Llama2-7b）中的遗忘性能。据我们所知，我们的工作为提升LLM遗忘提供了首个基于原则的权重归因和指向具有影响力的权重的方法。这与先前缺乏权重归因或仅使用较简单权重归因技术的方法形成对比。

更新时间: 2024-10-23 02:22:07

领域: cs.LG

下载: http://arxiv.org/abs/2410.17509v1

Do LLMs Have Political Correctness? Analyzing Ethical Biases and Jailbreak Vulnerabilities in AI Systems

Although large language models (LLMs) demonstrate impressive proficiency in various tasks, they present potential safety risks, such as 'jailbreaks', in which malicious inputs can coerce LLMs into generating harmful content. To address these issues, many LLM developers have implemented various safety measures to align these models. This alignment involves several techniques, including data filtering during pre-training, supervised fine-tuning, reinforcement learning from human feedback, and red-teaming exercises. These methods often introduce deliberate biases similar to Political Correctness (PC) to ensure the ethical behavior of LLMs. In this paper, we delve into the intentional biases injected into LLMs for safety purposes and examine methods to circumvent these safety alignment techniques. Notably, these intentional biases produce jailbreaking success rates in GPT-4o models that differ by 20% between non-binary and cisgender keywords and by 16% between white and black keywords, even when the other parts of the prompts are identical. We introduce the concept of PCJailbreak, highlighting the inherent risks posed by these safety-induced biases. Additionally, we propose an efficient defense method, PCDefense, which prevents jailbreak attempts by injecting defense prompts prior to generation. PCDefense is an appealing alternative to Guard Models such as Llama-Guard, which incur additional inference cost after text generation. Our findings emphasize the urgent need for LLM developers to adopt a more responsible approach when designing and implementing safety measures.
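The controlled comparison behind the reported gaps can be sketched as three steps:

1. Fill the same template with different group keywords, so that only the keyword varies.
2. Optionally prepend a defense prompt before generation, in the spirit of PCDefense.
3. Compare success rates per group.

The template, the defense text, and the 0/1 outcome lists are placeholders. Judging whether a response is a successful jailbreak is outside this sketch.

```python
def build_prompts(template, keywords, defense_prompt=None):
    """Fill one template per keyword; optionally prepend a defense prompt."""
    prompts = {kw: template.format(group=kw) for kw in keywords}
    if defense_prompt is not None:
        prompts = {kw: f"{defense_prompt}\n\n{p}" for kw, p in prompts.items()}
    return prompts


def success_rate(outcomes):
    """Percentage of attempts judged successful (1) out of all attempts."""
    return 100.0 * sum(outcomes) / len(outcomes)


def success_rate_gap(outcomes_a, outcomes_b):
    """Difference in success rate between two keyword groups, in percentage points."""
    return success_rate(outcomes_a) - success_rate(outcomes_b)
```

Because the prompts differ only in the keyword, any gap in success rate can be attributed to how the model treats that keyword rather than to the rest of the prompt.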

Updated: 2024-10-23 02:15:52

标题: LLMs是否具有政治正确性？分析AI系统中的道德偏见和越狱漏洞

摘要: 尽管大型语言模型(LLMs)在各种任务中展示出令人印象深刻的熟练度，但它们存在潜在的安全风险，比如“越狱”，恶意输入可以迫使LLMs生成有害内容。为了解决这些问题，许多LLM开发者已经实施了各种安全措施来对齐这些模型。这种对齐涉及多种技术，包括在预训练期间进行数据过滤、监督微调、从人类反馈中进行强化学习，以及红队演习。这些方法通常会引入类似政治正确性(PC)的故意偏见，以确保LLMs的道德行为。在本文中，我们深入探讨了为了安全目的而注入LLMs的故意偏见，并研究了规避这些安全对齐技术的方法。值得注意的是，这些故意偏见导致在GPT-4o模型中越狱成功率在非二元性别和顺性别关键词之间相差20%，在白人和黑人关键词之间相差16%，即使提示的其他部分是相同的。我们引入了PCJailbreak的概念，强调了这些安全诱发偏见所带来的固有风险。此外，我们提出了一种高效的防御方法PCDefense，通过在生成之前注入防御提示来阻止越狱尝试。PCDefense是一个有吸引力的替代方案，有别于需要在生成文本后付出额外推理成本的守护模型（如Llama-Guard）。我们的发现强调了LLM开发者在设计和实施安全措施时需要采取更负责任的方法的迫切性。

更新时间: 2024-10-23 02:15:52

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.13334v2

Mitigating Graph Covariate Shift via Score-based Out-of-distribution Augmentation

Distribution shifts between training and test datasets significantly impair model performance in graph learning. A common causal view in graph invariant learning holds that stable predictive features of graphs are causally associated with labels, whereas varying environmental features lead to distribution shifts. In particular, covariate shifts caused by unseen environments in test graphs underscore the critical need for out-of-distribution (OOD) generalization. Existing graph augmentation methods designed to address covariate shift often disentangle the stable and environmental features in the input space and selectively perturb or mix up the environmental features. However, such perturbation-based methods rely heavily on an accurate separation of stable and environmental features, and their exploration ability is confined to the environmental features already present in the training distribution. To overcome these limitations, we introduce a novel approach using score-based graph generation strategies that synthesize unseen environmental features while preserving the validity and stable features of overall graph patterns. Our comprehensive empirical evaluations demonstrate that our method is effective at improving graph OOD generalization.

Updated: 2024-10-23 02:09:02

标题: 通过基于分数的分布外增强缓解图协变量偏移

摘要: 在图学习中，训练集和测试集之间的分布转移显著影响模型性能。在图不变学习中，一个常见的因果观点表明，图的稳定预测特征与标签有因果关联，而不同的环境特征导致分布转移。特别是测试图中未见环境引起的协变量转移强调了对超出分布（OOD）泛化的关键需求。现有的用于解决协变量转移的图增强方法通常会将输入空间中的稳定特征和环境特征分离开来，并选择性地扰动或混合环境特征。然而，这种基于扰动的方法严重依赖于对稳定和环境特征的准确分离，而且它们的探索能力仅限于训练分布中的现有环境特征。为了克服这些限制，我们引入了一种使用基于分数的图生成策略的新方法，该方法在保持整体图案的有效性和稳定特征的同时合成未见环境特征。我们的全面实证评估证明了我们的方法在改进图OOD泛化方面的增强有效性。

更新时间: 2024-10-23 02:09:02

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.17506v1

Online Differentially Private Synthetic Data Generation

We present a polynomial-time algorithm for online differentially private synthetic data generation. For a data stream within the hypercube $[0,1]^d$ and an infinite time horizon, we develop an online algorithm that generates a differentially private synthetic dataset at each time $t$. This algorithm achieves a near-optimal accuracy bound of $O(\log(t)t^{-1/d})$ for $d\geq 2$ and $O(\log^{4.5}(t)t^{-1})$ for $d=1$ in the 1-Wasserstein distance. This result extends the previous work on the continual release model for counting queries to Lipschitz queries. Compared to the offline case, where the entire dataset is available at once, our approach requires only an extra polylog factor in the accuracy bound.
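The online setting builds on continual release. For counting queries, the classic tool is the binary (tree) mechanism of Chan, Shi and Song and of Dwork et al., sketched below for a 0/1 stream. Each dyadic block of the stream gets one Laplace-noised partial sum. The running count at time t adds up the O(log t) noisy blocks selected by the binary digits of t.

This is the prior counting-query technique that the abstract extends, not the paper's synthetic-data algorithm for Lipschitz queries. The noise scale `levels / epsilon` is the standard choice, because each stream element contributes to at most `levels` partial sums.

```python
import math

import numpy as np


def binary_mechanism(stream, epsilon, rng):
    """Continually release noisy running counts of a 0/1 stream (binary mechanism)."""
    T = len(stream)
    levels = max(1, math.ceil(math.log2(T + 1)))  # enough bits to write any t <= T
    scale = levels / epsilon
    exact = [0.0] * levels  # exact dyadic partial sums ("p-sums")
    noisy = [0.0] * levels  # their noisy releases
    releases = []
    for t in range(1, T + 1):
        i = (t & -t).bit_length() - 1  # position of the lowest set bit of t
        # The new block at level i absorbs all lower-level blocks plus the new item.
        exact[i] = sum(exact[:i]) + stream[t - 1]
        for j in range(i):
            exact[j] = 0.0
            noisy[j] = 0.0
        noisy[i] = exact[i] + rng.laplace(0.0, scale)
        # Running count = sum of the noisy blocks selected by t's binary digits.
        releases.append(sum(noisy[j] for j in range(levels) if (t >> j) & 1))
    return releases
```

Each release sums at most `levels` noise terms, so the error grows only polylogarithmically in t. The abstract's extra polylog factor over the offline case has the same flavor.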

Updated: 2024-10-23 02:07:09

标题: 在线差分隐私合成数据生成

摘要: 我们提出了一种在线差分隐私合成数据生成的多项式时间算法。针对在超立方体$[0,1]^d$内的数据流和无限时间范围，我们开发了一种在线算法，它可以在每个时间$t$生成一个差分隐私的合成数据集。对于$d\geq 2$，该算法在1-Wasserstein距离下实现了接近最优精度界限$O(\log(t)t^{-1/d})$，对于$d=1$，精度界限为$O(\log^{4.5}(t)t^{-1})$。这个结果将之前针对计数查询的持续发布模型的工作扩展到了利普希茨查询。与整个数据集一次性可用的离线情况相比，我们的方法在精度界限上仅多出一个多对数因子。

更新时间: 2024-10-23 02:07:09

领域: math.ST,cs.DS,cs.LG,math.PR,stat.TH

下载: http://arxiv.org/abs/2402.08012v3

An Ontology-Enabled Approach For User-Centered and Knowledge-Enabled Explanations of AI Systems

Explainable Artificial Intelligence (AI) focuses on helping humans understand how AI systems work or why they make their decisions, and it has been a cornerstone of AI for decades. Recent research in explainability has focused on explaining the workings of AI models, i.e., model explainability. There have also been several position statements and review papers detailing end-users' needs for user-centered explainability, but fewer implementations. Hence, this thesis seeks to bridge some gaps between model and user-centered explainability. We create an explanation ontology (EO) to represent literature-derived explanation types via their supporting components. We implement a knowledge-augmented question-answering (QA) pipeline to support contextual explanations in a clinical setting. Finally, we are implementing a system to combine explanations from different AI methods and data modalities. Within the EO, we can represent fifteen different explanation types, and we have tested these representations in six exemplar use cases. We find that knowledge augmentations improve the performance of base large language models in contextualized QA, and that performance varies across disease groups. In the same setting, clinicians also indicated that they prefer actionability to be one of the main foci of explanations. In our explanation-combination method, we plan to use similarity metrics to determine the similarity of explanations in a chronic disease detection setting. Overall, through this thesis, we design methods that can support knowledge-enabled explanations across different use cases. These methods draw on present-day AI methods that can generate the supporting components of these explanations, and on domain knowledge sources that can enhance them.

Updated: 2024-10-23 02:03:49

标题: 一个启用本体论的方法，用于用户为中心和知识启用的AI系统解释

摘要: 可解释的人工智能（AI）侧重于帮助人类理解AI系统的工作方式或其决策，并且已经是人工智能的基石数十年。最近在可解释性方面的研究集中在解释AI模型的运作或模型可解释性。还有一些关于终端用户对用户中心可解释性需求的立场声明和评论论文，但实施较少。因此，这篇论文旨在弥合模型和用户中心可解释性之间的一些差距。我们创建了一个解释本体论（EO），通过其支持组件代表文献衍生的解释类型。我们实施了一个知识增强问答（QA）流程，以支持临床环境中的情境解释。最后，我们正在实施一个系统，将来自不同AI方法和数据模态的解释进行合并。在EO中，我们可以代表十五种不同的解释类型，并且我们已经在六个示例用例中测试了这些表示。我们发现知识增强改善了基础大型语言模型在情境化问答中的性能，并且在疾病群体之间的表现有所不同。在相同环境中，临床医生也指出他们更喜欢看到可行性作为解释的主要焦点之一。在我们的解释组合方法中，我们计划使用相似性度量来确定慢性疾病检测环境中解释的相似性。总的来说，通过这篇论文，我们设计了可以支持不同用例中的知识启用解释的方法，考虑到当今能够生成这些解释的支持组件和可以增强它们的领域知识来源的方法。

更新时间: 2024-10-23 02:03:49

领域: cs.AI

下载: http://arxiv.org/abs/2410.17504v1

Learning Fair and Preferable Allocations through Neural Network

The fair allocation of indivisible resources is a fundamental problem. Existing research has developed various allocation mechanisms and algorithms to satisfy different fairness notions. For example, round robin (RR) was proposed to meet the fairness criterion known as envy-freeness up to one good (EF1). In real-world resource allocation problems, expert algorithms without mathematical formulations are used to find outcomes that users prefer. We therefore aim to design mechanisms that strictly satisfy good properties while replicating expert knowledge. This problem is challenging, however, because such heuristic rules are often difficult to formalize mathematically, which complicates their integration into theoretical frameworks. Additionally, formal algorithms struggle to find preferable outcomes, and directly replicating these implicit rules can result in unfair allocations because human decision-making can introduce biases. In this paper, we aim to learn implicit allocation mechanisms from examples while strictly satisfying fairness constraints. Specifically, we learn EF1 allocation mechanisms through supervised learning on examples of reported valuations and the corresponding allocation outcomes produced by implicit rules. To this end, we develop neural RR (NRR), a novel neural network that parameterizes RR. NRR is built from a differentiable relaxation of RR and can be trained to learn the agent ordering used by RR. Our experiments on learning EF1 allocation mechanisms from examples demonstrate that our method outperforms baselines in terms of the proximity of predicted allocations and other metrics.
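NRR learns the agent ordering. Given an ordering, the underlying mechanism is plain round robin, and for additive valuations round robin is known to produce EF1 allocations. The sketch shows that base mechanism plus an EF1 checker. Here the ordering is supplied by hand, whereas NRR learns it through a differentiable relaxation that is not reproduced. The function names are illustrative.

```python
def round_robin(valuations, order):
    """Agents take turns in `order`, each picking its most valued remaining item.
    valuations[i][g] is agent i's additive value for item g; ties go to lower g."""
    remaining = set(range(len(valuations[0])))
    bundles = {agent: [] for agent in order}
    turn = 0
    while remaining:
        agent = order[turn % len(order)]
        best = max(remaining, key=lambda g: (valuations[agent][g], -g))
        bundles[agent].append(best)
        remaining.remove(best)
        turn += 1
    return bundles


def is_ef1(valuations, bundles):
    """True if no agent envies another once one item is removed from the other's bundle."""
    for i, own_bundle in bundles.items():
        own = sum(valuations[i][g] for g in own_bundle)
        for j, other_bundle in bundles.items():
            if i == j or not other_bundle:
                continue
            other = sum(valuations[i][g] for g in other_bundle)
            # Removing the item i values most from j's bundle must eliminate envy.
            if own < other - max(valuations[i][g] for g in other_bundle):
                return False
    return True
```

Because the output of `round_robin` is EF1 for any ordering, learning only the ordering (as NRR does) keeps the fairness guarantee while fitting the examples.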

Updated: 2024-10-23 01:47:55

标题: 通过神经网络学习公平且更优的分配

摘要: 不可分割资源的公平分配是一个基本问题。现有研究已经开发了各种分配机制或算法来满足不同的公平性概念。例如，轮转分配（RR）被提出以满足被称为“至多一件物品的无嫉妒性”（EF1）的公平性准则。在现实世界的资源分配问题中使用没有数学公式的专家算法来找到用户的优选结果。因此，我们的目标是设计严格满足良好性质并复制专家知识的机制。然而，这个问题具有挑战性，因为这些启发式规则通常难以在数学上形式化，从而使其集成到理论框架中变得复杂。此外，正式的算法很难找到优选结果，并且直接复制这些隐含规则可能导致不公平的分配，因为人类决策可能引入偏见。在这篇论文中，我们的目标是通过示例学习隐含的分配机制，严格满足公平性约束，特别关注通过监督学习在报告的估值示例和由隐含规则产生的相应分配结果上学习EF1分配机制。为了解决这个问题，我们开发了一个神经RR（NRR），这是一个参数化RR的新型神经网络。NRR是从RR的可微松弛构建的，并可以训练学习用于RR的代理排序。我们进行了实验，从示例中学习EF1分配机制，证明我们的方法在预测分配的接近度和其他指标方面优于基线。

Fields: cs.AI

Download: http://arxiv.org/abs/2410.17500v1

Mechanisms of Symbol Processing for In-Context Learning in Transformer Networks

Large Language Models (LLMs) have demonstrated impressive abilities in symbol processing through in-context learning (ICL). This success flies in the face of decades of predictions that artificial neural networks cannot master abstract symbol manipulation. We seek to understand the mechanisms that can enable robust symbol processing in transformer networks, illuminating both the unanticipated success and the significant limitations of transformers in symbol processing. Borrowing insights from symbolic AI on the power of Production System architectures, we develop a high-level language, PSL, which allows us to write symbolic programs that perform complex, abstract symbol processing. We also create compilers that precisely implement PSL programs in transformer networks that are, by construction, 100% mechanistically interpretable. We demonstrate that PSL is Turing universal, so the work can inform the understanding of transformer ICL in general. The type of transformer architecture that we compile from PSL programs suggests a number of paths for enhancing transformers' capabilities at symbol processing. (Note: The first section of the paper gives an extended synopsis of the entire paper.)
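The Production System idea that PSL borrows from can be illustrated with a generic forward-chaining interpreter. Each rule pairs a condition on working memory with an action that rewrites it, and on every cycle the first matching rule fires. This is a hypothetical Python sketch of that general architecture only. It does not show PSL's syntax or its compilation into transformer weights, and `Rule` and `run` are illustrative names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Memory = Dict[str, list]

@dataclass
class Rule:
    name: str
    condition: Callable[[Memory], bool]
    action: Callable[[Memory], None]

def run(rules: List[Rule], memory: Memory, max_steps: int = 100) -> Memory:
    """Match-fire cycle: fire the first rule whose condition holds; halt when none match."""
    for _ in range(max_steps):
        for rule in rules:
            if rule.condition(memory):
                rule.action(memory)
                break
        else:  # no rule matched on this cycle, so the system halts
            break
    return memory

# Example program: reverse a sequence of symbols by moving the last one each cycle.
reverse_rules = [
    Rule(
        name="move-last-symbol",
        condition=lambda m: len(m["input"]) > 0,
        action=lambda m: m["output"].append(m["input"].pop()),
    ),
]

result = run(reverse_rules, {"input": ["a", "b", "c"], "output": []})
print(result["output"])  # ['c', 'b', 'a']
```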

Updated: 2024-10-23 01:38:10

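Title: Mechanisms of Symbol Processing for In-Context Learning in Transformer Networks

Abstract: Large language models (LLMs) have demonstrated impressive symbol-processing abilities through in-context learning (ICL). This success contradicts decades of predictions that artificial neural networks cannot master abstract symbol manipulation. We seek to understand the mechanisms that can enable robust symbol processing in transformer networks, illuminating both the unexpected success and the significant limitations of transformers in symbol processing. Drawing on insights from symbolic AI about the power of Production System architectures, we develop a high-level language, PSL, that lets us write symbolic programs for complex, abstract symbol processing. We also create compilers that precisely implement PSL programs in transformer networks that are, by construction, 100% mechanistically interpretable. We prove that PSL is Turing universal, so this work can inform the understanding of transformer ICL in general. The type of transformer architecture we compile from PSL programs suggests several paths for enhancing transformers' symbol-processing capabilities. (Note: the first section of the paper gives an extended synopsis of the entire paper.)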

Fields: cs.AI,cs.CL,cs.NE,cs.SC,F.1; I.2

Download: http://arxiv.org/abs/2410.17498v1

Context-Aware Assistant Selection for Improved Inference Acceleration with Large Language Models

Despite their widespread adoption, large language models (LLMs) remain prohibitively expensive to use under resource constraints, and their ever-growing sizes only raise the barrier further. One notable issue is the high latency of auto-regressive generation, which makes the use of large LLMs dependent on advanced computing infrastructure. Assisted decoding, in which a smaller draft model guides a larger target model's generation, has helped alleviate this, but it remains dependent on alignment between the two models. Thus, if the draft model is insufficiently capable in some domain relative to the target model, performance can degrade. Alternatively, one can leverage multiple draft models to better cover the target's expertise. When multiple black-box draft models are available, however, selecting an assistant without details about its construction can be difficult. To better understand this decision-making problem, we frame it as a contextual bandit, in which a policy must choose a draft model based on a context. We show that a policy can speed up decoding across multiple domains, provided the candidates are effective, even without prior knowledge of the draft models. It suffices to build an offline dataset from only the outputs of independent draft and target models and to train the policy on the alignment of these outputs. Further results show that this holds in various settings with multiple assisted-decoding candidates, highlighting its flexibility and the advantageous role that such decision making can play.
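As a simplified picture of the contextual-bandit framing, the sketch below builds a greedy lookup-table policy from an offline log of (context, draft model, alignment score) records. For each context it picks the draft with the highest mean score. The context keys, draft names, and the use of an acceptance-rate-style alignment score are hypothetical. The paper trains a learned policy rather than using a table like this.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Log = Iterable[Tuple[str, str, float]]  # (context key, draft model name, alignment score)

def fit_policy(logs: Log) -> Dict[str, Dict[str, float]]:
    """Average the offline alignment score of each draft model within each context."""
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    counts: Dict[Tuple[str, str], int] = defaultdict(int)
    for ctx, draft, score in logs:
        totals[(ctx, draft)] += score
        counts[(ctx, draft)] += 1
    table: Dict[str, Dict[str, float]] = defaultdict(dict)
    for (ctx, draft), total in totals.items():
        table[ctx][draft] = total / counts[(ctx, draft)]
    return dict(table)

def select_draft(policy: Dict[str, Dict[str, float]], ctx: str, default: str) -> str:
    """Greedy action: the draft with the best mean score, or a fallback for unseen contexts."""
    scores = policy.get(ctx)
    if not scores:
        return default
    return max(scores, key=scores.get)

logs = [
    ("code", "draft-code", 0.8), ("code", "draft-chat", 0.3),
    ("chat", "draft-chat", 0.7), ("chat", "draft-code", 0.2),
    ("code", "draft-code", 0.6),
]
policy = fit_policy(logs)
print(select_draft(policy, "code", default="draft-chat"))  # draft-code
print(select_draft(policy, "math", default="draft-chat"))  # draft-chat (unseen context)
```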

Updated: 2024-10-23 01:36:25

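Title: Context-Aware Assistant Selection for Improved Inference Acceleration with Large Language Models

Abstract: Despite their widespread adoption, large language models (LLMs) remain difficult to use under resource constraints, and their ever-growing size only raises the barrier to use. One notable issue is the high latency of auto-regressive generation, which makes the use of large LLMs dependent on advanced computing infrastructure. Assisted decoding, in which a smaller draft model guides the generation of a larger target model, has helped alleviate this, but it still depends on alignment between the two models. Thus, if the draft model is insufficiently capable in some domain relative to the target model, performance can degrade. Alternatively, multiple draft models can be used to better cover the target's expertise. When several black-box draft models are available, however, selecting an assistant without details about its construction can be difficult. To better understand this decision-making problem, we view it as a contextual bandit, in which a policy must choose a draft model based on a context. We show that a policy can accelerate performance across multiple domains, provided the candidates are effective, even without prior knowledge of the draft models. It suffices to build an offline dataset solely from the outputs of independent draft and target models and to train the policy on the alignment of these outputs. Further results show that this holds across various settings with multiple assisted-decoding candidates, highlighting its flexibility and the beneficial role such decision making can play.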

Fields: cs.LG,cs.AI

Download: http://arxiv.org/abs/2408.08470v2

Regularized Q-learning

Q-learning is a widely used algorithm in the reinforcement learning community. In the lookup-table setting, its convergence is well established. However, its behavior is known to be unstable with linear function approximation. This paper develops a new Q-learning algorithm that converges when linear function approximation is used. We prove that simply adding an appropriate regularization term ensures convergence of the algorithm, and we establish its stability using a recent analysis tool based on switching system models. Moreover, we show experimentally that it converges in environments where Q-learning with linear function approximation is known to diverge. We also provide an error bound on the solution to which the algorithm converges.
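A minimal sketch of the idea follows. It assumes the regularizer is an L2 penalty that adds a -ηθ term to the standard linear Q-learning update: θ ← θ + α[(r + γ max_a' φ(s',a')ᵀθ - φ(s,a)ᵀθ) φ(s,a) - ηθ]. The step size α, the coefficient η, and the feature map are placeholders. The paper's exact update and its conditions on η may differ.

```python
import numpy as np

def regularized_q_update(theta, phi_sa, reward, phi_next, gamma=0.99, alpha=0.1, eta=0.1):
    """One step of linear Q-learning with an assumed L2 regularization term.

    theta:    (d,) weights, so Q(s, a) = phi(s, a) @ theta
    phi_sa:   (d,) features of the visited state-action pair
    phi_next: (n_actions, d) features of every action in the next state
    """
    td_target = reward + gamma * np.max(phi_next @ theta)
    td_error = td_target - phi_sa @ theta
    # Standard semi-gradient step plus a pull of theta toward zero (the -eta * theta term).
    return theta + alpha * (td_error * phi_sa - eta * theta)

phi_sa = np.array([1.0, 0.0])
phi_next = np.array([[0.0, 1.0], [1.0, 0.0]])
theta = regularized_q_update(np.zeros(2), phi_sa, 1.0, phi_next, gamma=0.9, alpha=0.5, eta=0.1)
print(theta)  # [0.5 0. ]
```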

Updated: 2024-10-23 01:23:04

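Title: Regularized Q-learning

Abstract: Q-learning is a widely used algorithm in reinforcement learning. In the lookup-table setting, its convergence is well established. However, its behavior is known to be unstable under linear function approximation. This paper develops a new Q-learning algorithm that converges when linear function approximation is used. We prove that simply adding an appropriate regularization term ensures the algorithm's convergence. We prove its stability using a recent analysis tool based on switching system models. Moreover, we show experimentally that the algorithm converges in environments where Q-learning with linear function approximation is known to diverge. We also provide an error bound on the solution to which the algorithm converges.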

Fields: cs.LG

Download: http://arxiv.org/abs/2202.05404v7

BadFair: Backdoored Fairness Attacks with Group-conditioned Triggers

Studying attacks on fairness is crucial because compromised models can introduce biased outcomes, undermining trust and amplifying inequalities in sensitive applications like hiring, healthcare, and law enforcement. This highlights the urgent need to understand how fairness mechanisms can be exploited and to develop defenses that ensure both fairness and robustness. We introduce BadFair, a novel backdoored fairness attack methodology. BadFair stealthily crafts a model that operates accurately and fairly under regular conditions. When activated by certain triggers, however, the model discriminates against specific groups and produces incorrect results for them. This type of attack is particularly stealthy and dangerous because it circumvents existing fairness detection methods, maintaining an appearance of fairness in normal use. Our findings reveal that BadFair achieves an average attack success rate of more than 85% in attacks aimed at target groups while incurring only a minimal accuracy loss. Moreover, it consistently exhibits a significant discrimination score, distinguishing between predefined target and non-target attacked groups across various datasets and models.

Updated: 2024-10-23 01:14:54

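Title: BadFair: Backdoored Fairness Attacks with Group-conditioned Triggers

Abstract: Studying attacks on fairness is crucial, because compromised models can introduce biased outcomes, undermining trust and exacerbating inequality in sensitive applications such as hiring, healthcare, and law enforcement. This underscores the urgent need to understand how fairness mechanisms can be exploited and to develop defenses that ensure both fairness and robustness. We introduce BadFair, a novel backdoored fairness attack method. BadFair stealthily constructs a model that behaves accurately and fairly under normal conditions. When activated by specific triggers, however, the model discriminates against specific groups and produces incorrect results for them. This type of attack is especially stealthy and dangerous because it evades existing fairness detection methods and maintains an appearance of fairness in normal use. Our results show that BadFair achieves an average attack success rate of more than 85% in attacks on target groups while causing only minimal accuracy loss. Moreover, it consistently exhibits a significant discrimination score, distinguishing between predefined target and non-target attacked groups across various datasets and models.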

Fields: cs.CR,cs.CL,cs.CY,cs.LG

Download: http://arxiv.org/abs/2410.17492v1

Hybrid Spatial Representations for Species Distribution Modeling

We address an important problem in ecology called Species Distribution Modeling (SDM), whose goal is to predict whether a species exists at a certain position on Earth. In particular, we tackle a challenging version of this task, where we learn from presence-only data in a community-sourced dataset, model a large number of species simultaneously, and do not use any additional environmental information. Previous work has used neural implicit representations to construct models that achieve promising results. However, implicit representations often generate predictions of limited spatial precision. We attribute this limitation to their inherently global formulation and inability to effectively capture local feature variations. This issue is especially pronounced with presence-only data and a large number of species. To address this, we propose a hybrid embedding scheme that combines both implicit and explicit embeddings. Specifically, the explicit embedding is implemented with a multiresolution hashgrid, enabling our models to better capture local information. Experiments demonstrate that our results exceed other works by a large margin on various standard benchmarks, and that the hybrid representation is better than both purely implicit and explicit ones. Qualitative visualizations and comprehensive ablation studies reveal that our hybrid representation successfully addresses the two main challenges. Our code is open-sourced at https://github.com/Shiran-Yuan/HSR-SDM.
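To make the hybrid idea concrete, here is a toy NumPy encoder. It concatenates a sinusoidal, implicit-style embedding with an explicit multiresolution hash-grid embedding in the style of Instant-NGP. At each level of the grid, the encoder hashes the four surrounding grid corners into a feature table and bilinearly interpolates between them. Several details are assumptions: the coordinates are taken to be normalized to [0, 1]², and the table sizes, resolutions, and sinusoidal choice are placeholders. In a real model the tables would be trained jointly with the species classifier, which is omitted here.

```python
import numpy as np

class HashGrid2D:
    """Toy multiresolution hash-grid encoder for 2-D inputs in [0, 1]^2 (explicit embedding)."""

    PRIMES = (1, 2654435761)  # spatial-hash multipliers in the style of Instant-NGP

    def __init__(self, n_levels=4, table_size=2**12, n_features=2,
                 base_res=4, growth=2.0, seed=0):
        rng = np.random.default_rng(seed)
        self.resolutions = [int(np.floor(base_res * growth ** lvl)) for lvl in range(n_levels)]
        self.table_size = table_size
        # One feature table per level; in a trained model these entries are learned.
        self.tables = rng.uniform(-1e-4, 1e-4, size=(n_levels, table_size, n_features))

    def _hash(self, corners):
        """Map integer grid corners (n, 2) to table indices (n,)."""
        c = corners.astype(np.uint64)
        h = (c[:, 0] * np.uint64(self.PRIMES[0])) ^ (c[:, 1] * np.uint64(self.PRIMES[1]))
        return (h % np.uint64(self.table_size)).astype(np.int64)

    def encode(self, xy):
        xy = np.asarray(xy, dtype=np.float64)
        per_level = []
        for lvl, res in enumerate(self.resolutions):
            pos = xy * res
            base = np.floor(pos).astype(np.int64)
            frac = pos - base
            feat = np.zeros((len(xy), self.tables.shape[2]))
            for dx in (0, 1):
                for dy in (0, 1):
                    # Bilinear weight of this corner, then look up its hashed feature vector.
                    wx = frac[:, 0] if dx else 1.0 - frac[:, 0]
                    wy = frac[:, 1] if dy else 1.0 - frac[:, 1]
                    idx = self._hash(base + np.array([dx, dy]))
                    feat += (wx * wy)[:, None] * self.tables[lvl, idx]
            per_level.append(feat)
        return np.concatenate(per_level, axis=1)

def sinusoidal(xy, n_freqs=4):
    """Global Fourier-feature embedding (an implicit-style positional encoding)."""
    xy = np.asarray(xy, dtype=np.float64)
    angles = xy[:, :, None] * (np.pi * 2.0 ** np.arange(n_freqs))
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=2).reshape(len(xy), -1)

def hybrid_embedding(xy, grid, n_freqs=4):
    """Concatenate global (implicit-style) and local (hash-grid) features."""
    return np.concatenate([sinusoidal(xy, n_freqs), grid.encode(xy)], axis=1)

grid = HashGrid2D()
coords = np.array([[0.10, 0.20], [0.50, 0.50], [0.90, 0.30]])  # normalized (lon, lat), assumed
print(hybrid_embedding(coords, grid).shape)  # (3, 24)
```

The global sinusoidal features vary smoothly over the whole domain. The hash-grid features, by contrast, depend only on nearby grid corners, which is the kind of local detail the abstract says purely implicit representations miss.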

Updated: 2024-10-23 01:13:24

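Title: Hybrid Spatial Representations for Species Distribution Modeling

Abstract: We address an important problem in ecology called species distribution modeling (SDM), whose goal is to predict whether a species is present at a given location on Earth. In particular, we tackle a challenging version of the task: we learn from presence-only data in a community-sourced dataset, model a large number of species simultaneously, and use no additional environmental information. Previous work has used neural implicit representations to build models with promising results. However, implicit representations often produce predictions with limited spatial precision. We attribute this limitation to their inherently global formulation and their inability to capture local feature variations effectively. The problem is especially pronounced with presence-only data and a large number of species. To address it, we propose a hybrid embedding scheme that combines implicit and explicit embeddings. Specifically, the explicit embedding is implemented with a multiresolution hash grid, enabling our models to better capture local information. Experiments show that our results exceed other work by a large margin on various standard benchmarks, and that the hybrid representation outperforms both purely implicit and purely explicit ones. Qualitative visualizations and comprehensive ablation studies show that our hybrid representation successfully addresses the two main challenges. Our code is open source at https://github.com/Shiran-Yuan/HSR-SDM.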

Fields: cs.LG,cs.CV

Download: http://arxiv.org/abs/2410.10937v2

Harnessing Your DRAM and SSD for Sustainable and Accessible LLM Inference with Mixed-Precision and Multi-level Caching

Although large language models (LLMs) have demonstrated remarkable capabilities, their massive parameter counts and the associated heavy computation make LLM deployment the main source of carbon emissions in today's AI applications. Compared to modern GPUs such as the H100, it would be significantly more carbon-sustainable to serve LLMs on older GPUs such as the M40 (as shown in Figure 1, the M40 has only one third of the H100's carbon emissions). However, the limited high-bandwidth memory (HBM) available on such GPUs often cannot hold an LLM because of the gigantic model size and intermediate activation data, which makes serving these models challenging. For instance, a LLaMA2 model with 70B parameters typically requires 128 GB for inference. This substantially exceeds the 24 GB of HBM in a 3090 GPU and remains infeasible even with an additional 64 GB of DRAM. To address this challenge, this paper proposes M2Cache, which enables LLM inference on outdated, resource-constrained hardware. M2Cache combines a model-modularization algorithm for mixed-precision inference with multi-level caching. Here, precision denotes numerical precision such as FP16, INT8, and INT4. Specifically, M2Cache first modularizes the neurons in an LLM and ranks them by importance. It then adopts a dynamic sparse mixed-precision quantization mechanism in weight space to reduce computational demands and communication overhead at each decoding step. Together, these steps lower the operational carbon emissions of LLM inference. Moreover, M2Cache introduces a three-level cache management system spanning HBM, DRAM, and SSDs that complements the dynamic sparse mixed-precision inference. To enhance communication efficiency, M2Cache maintains a neuron-level mixed-precision LRU cache in HBM, a larger layer-aware cache in DRAM, and the full model on SSD.
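The memory pressure and the HBM-level cache can both be illustrated in a few lines. The sketch below estimates weight footprints for a 70B-parameter model. It counts weights only (no activations or KV cache) and uses a hypothetical split of parameters across precisions, so it is not meant to reproduce the 128 GB figure above. It then shows a plain LRU cache of the kind M2Cache keeps in HBM, with capacity counted in entries rather than bytes for simplicity. This is not M2Cache's implementation.

```python
from collections import OrderedDict
from typing import Dict, Hashable, Optional

BYTES_PER_PARAM = {"FP16": 2.0, "INT8": 1.0, "INT4": 0.5}

def weight_gb(n_params: float, mix: Dict[str, float]) -> float:
    """Weight memory in GB when fractions of the parameters use different precisions."""
    assert abs(sum(mix.values()) - 1.0) < 1e-9, "precision fractions must sum to 1"
    return sum(n_params * frac * BYTES_PER_PARAM[p] for p, frac in mix.items()) / 1e9

print(weight_gb(70e9, {"FP16": 1.0}))                            # 140.0
print(weight_gb(70e9, {"FP16": 0.2, "INT8": 0.3, "INT4": 0.5}))  # about 66.5 (hypothetical split)

class LRUCache:
    """Least-recently-used cache: evicts the entry that was touched longest ago."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries: "OrderedDict[Hashable, object]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[object]:
        if key not in self.entries:
            return None  # miss: a tiered system would fetch from the next level (DRAM, then SSD)
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def put(self, key: Hashable, value: object) -> None:
        if key in self.entries:
            self.entries.move_to_end(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # drop the least recently used entry
```

Even with every weight at INT4, the weights alone take 35 GB, which exceeds a 24 GB GPU. This is why neurons must be streamed from the lower cache tiers.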

Updated: 2024-10-23 01:08:59

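Title: Harnessing Your DRAM and SSD for Sustainable and Accessible LLM Inference with Mixed-Precision and Multi-level Caching

Abstract: Although large language models (LLMs) have demonstrated remarkable capabilities, their huge parameter counts and the associated heavy computation make LLM deployment the main source of carbon emissions in today's AI applications. Compared with modern GPUs such as the H100, it would be significantly more carbon-sustainable to serve LLMs on older GPUs such as the M40 (as shown in Figure 1, the M40's carbon emissions are only one third of the H100's). However, the limited high-bandwidth memory (HBM) on such GPUs often cannot hold an LLM because of the enormous model size and intermediate activation data, which makes serving these models challenging. For example, a LLaMA2 model with 70B (70 billion) parameters typically requires 128 GB for inference. This far exceeds the 24 GB of HBM on a 3090 GPU and remains infeasible even with an additional 64 GB of DRAM. To address this challenge, this paper proposes M2Cache, which enables LLM inference on outdated, resource-constrained hardware. M2Cache combines a mixed-precision model-modularization algorithm with multi-level caching, where precision refers to numerical precision such as FP16, INT8, and INT4. Specifically, M2Cache first modularizes the neurons in an LLM and ranks them by importance. It then applies a dynamic sparse mixed-precision quantization mechanism in weight space to reduce computational demand and communication overhead at each decoding step. Together, these measures lower the operational carbon emissions of LLM inference. In addition, M2Cache introduces a three-level cache management system spanning HBM, DRAM, and SSD that complements dynamic sparse mixed-precision inference. To improve communication efficiency, M2Cache maintains a neuron-level mixed-precision LRU cache in HBM, a larger layer-aware cache in DRAM, and the full model on SSD.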

Fields: cs.LG,cs.DC

Download: http://arxiv.org/abs/2410.14740v2

ODBAE: a high-performance model identifying complex phenotypes in high-dimensional biological datasets

Identifying complex phenotypes from high-dimensional biological data is challenging due to the intricate interdependencies among different physiological indicators. Traditional approaches often focus on detecting outliers in single variables, overlooking the broader network of interactions that contribute to phenotype emergence. Here, we introduce ODBAE (Outlier Detection using Balanced Autoencoders), a machine learning method designed to uncover both subtle and extreme outliers by capturing latent relationships among multiple physiological parameters. ODBAE's revised loss function enhances its ability to detect two key types of outliers. Influential points (IP) disrupt latent correlations between dimensions. High leverage points (HLP) deviate from the norm but go undetected by traditional autoencoder-based methods. Using data from the International Mouse Phenotyping Consortium (IMPC), we show that ODBAE can identify knockout mice with complex, multi-indicator phenotypes: normal in individual traits but abnormal when considered together. In addition, this method reveals novel metabolism-related genes and uncovers coordinated abnormalities across metabolic indicators. Our results highlight the utility of ODBAE in detecting joint abnormalities and advancing our understanding of homeostatic perturbations in biological systems.
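A toy two-indicator example shows why single-variable screening misses such phenotypes. The sketch assumes two standardized indicators with a known correlation of 0.9. It uses the Mahalanobis distance as a simple linear stand-in for a model that captures joint structure; it is not ODBAE's balanced autoencoder or its loss.

```python
import numpy as np

# Two standardized physiological indicators that normally move together (correlation 0.9).
cov = np.array([[1.0, 0.9],
                [0.9, 1.0]])
cov_inv = np.linalg.inv(cov)

def mahalanobis(x: np.ndarray) -> float:
    """Distance from the population mean (0, 0) that accounts for the correlation."""
    return float(np.sqrt(x @ cov_inv @ x))

concordant = np.array([1.0, 1.0])   # both one SD high, consistent with the correlation
discordant = np.array([1.0, -1.0])  # each still only one SD from the mean

for name, x in [("concordant", concordant), ("discordant", discordant)]:
    print(name, "max |z| =", np.abs(x).max(), "Mahalanobis =", round(mahalanobis(x), 2))
```

Both points have |z| = 1 on each indicator, so a per-variable threshold flags neither. The discordant point, however, lies about 4.5 standard units from the mean once the correlation is taken into account, compared with about 1 for the concordant point.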

Updated: 2024-10-23 01:02:38

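Title: ODBAE: a high-performance model identifying complex phenotypes in high-dimensional biological datasets

Abstract: Identifying complex phenotypes from high-dimensional biological data is challenging because of the intricate interdependencies among physiological indicators. Traditional approaches typically focus on detecting outliers in single variables, overlooking the broader network of interactions that gives rise to phenotypes. Here we introduce ODBAE (Outlier Detection using Balanced Autoencoders), a machine learning method designed to uncover both subtle and extreme outliers by capturing latent relationships among multiple physiological parameters. ODBAE's revised loss function improves its ability to detect two key types of outliers. Influential points (IP) disrupt the latent correlations between dimensions. High leverage points (HLP) deviate from the norm but go undetected by traditional autoencoder-based methods. Using data from the International Mouse Phenotyping Consortium (IMPC), we show that ODBAE can identify knockout mice with complex, multi-indicator phenotypes: normal in individual traits but abnormal when considered together. In addition, the method reveals novel metabolism-related genes and uncovers coordinated abnormalities across metabolic indicators. Our results highlight ODBAE's utility in detecting joint abnormalities and in advancing our understanding of homeostatic perturbations in biological systems.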

Fields: stat.ML,cs.LG

Download: http://arxiv.org/abs/2211.03054v2

Unsupervised Domain Adaptation for Action Recognition via Self-Ensembling and Conditional Embedding Alignment

Recent advancements in deep learning-based wearable human action recognition (wHAR) have improved the capture and classification of complex motions. Adoption remains limited, however, due to the lack of expert annotations and to domain discrepancies arising from user variations. Limited annotations hinder the model's ability to generalize to out-of-distribution samples. While data augmentation can improve generalizability, unsupervised augmentation techniques must be applied carefully to avoid introducing noise. Unsupervised domain adaptation (UDA) addresses domain discrepancies by aligning conditional distributions using pseudo-labeled target samples, but vanilla pseudo-labeling can lead to error propagation. To address these challenges, we propose μDAR, a novel joint optimization architecture composed of three components:
(i) a consistency regularizer between augmented samples to improve the model's classification generalizability,
(ii) a temporal ensemble for robust pseudo-label generation, and
(iii) conditional distribution alignment to improve domain generalizability.
The temporal ensemble aggregates predictions from past epochs to smooth out noisy pseudo-label predictions. These pseudo-labels are then used in the conditional distribution alignment module to minimize the kernel-based class-wise conditional maximum mean discrepancy (kCMMD) between the source and target feature spaces and thereby learn a domain-invariant embedding. The consistency-regularized augmentations ensure that multiple augmentations of the same sample share the same label. This results in (a) strong generalization with limited source-domain samples and (b) consistent pseudo-label generation on target samples. The novel integration of these three modules in μDAR yields an average macro-F1 improvement of approximately 4-12% over six state-of-the-art UDA methods on four benchmark wHAR datasets.
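Two of the three ingredients can be sketched compactly in NumPy. The first is an exponential moving average over epochs, shown here as one plausible form of temporal ensembling for pseudo-labels. The second is a class-wise squared MMD with an RBF kernel, averaged over the classes present in both domains, as an approximation of the kCMMD objective. The kernel bandwidth, momentum, and function names are assumptions, and the paper's exact estimator may differ.

```python
import numpy as np

def temporal_ensemble(prev_probs, new_probs, momentum=0.6):
    """Blend this epoch's class probabilities into a running average to smooth pseudo-labels."""
    return momentum * prev_probs + (1.0 - momentum) * new_probs

def rbf_kernel(x, y, sigma=1.0):
    sq_dists = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def mmd2(x, y, sigma=1.0):
    """Biased estimate of the squared maximum mean discrepancy between two samples."""
    return (rbf_kernel(x, x, sigma).mean() + rbf_kernel(y, y, sigma).mean()
            - 2.0 * rbf_kernel(x, y, sigma).mean())

def classwise_mmd2(src_feats, src_labels, tgt_feats, tgt_pseudo, sigma=1.0):
    """Average squared MMD between source and target features of the same (pseudo-)class."""
    shared = np.intersect1d(np.unique(src_labels), np.unique(tgt_pseudo))
    if shared.size == 0:
        return 0.0
    return float(np.mean([mmd2(src_feats[src_labels == c], tgt_feats[tgt_pseudo == c], sigma)
                          for c in shared]))

rng = np.random.default_rng(0)
src = rng.normal(size=(40, 3))
src_y = rng.integers(0, 2, size=40)
# Smooth a uniform prior toward this epoch's one-hot predictions, then take the argmax.
ensembled = temporal_ensemble(np.full((40, 2), 0.5), np.eye(2)[src_y])
tgt_pseudo = ensembled.argmax(axis=1)
print(classwise_mmd2(src, src_y, src, tgt_pseudo))        # 0.0: identical domains
print(classwise_mmd2(src, src_y, src + 2.0, tgt_pseudo))  # positive: shifted target domain
```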

Updated: 2024-10-23 00:59:27

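Title: Unsupervised Domain Adaptation for Action Recognition via Self-Ensembling and Conditional Embedding Alignment

Abstract: Recent advances in deep learning-based wearable human action recognition (wHAR) have improved the capture and classification of complex motions. Adoption remains limited, however, by the lack of expert annotations and by domain discrepancies arising from user variation. Limited annotations restrict the model's ability to generalize to out-of-distribution samples. Although data augmentation can improve generalization, unsupervised augmentation techniques must be applied carefully to avoid introducing noise. Unsupervised domain adaptation (UDA) addresses domain discrepancies by aligning the conditional distributions of pseudo-labeled target samples, but naive pseudo-labeling can lead to error propagation. To address these challenges, we propose μDAR, a novel joint optimization architecture consisting of three components:
(i) a consistency regularizer between augmented samples to improve the model's classification generalization,
(ii) a temporal ensemble for robust pseudo-label generation, and
(iii) conditional distribution alignment to improve domain generalization.
The temporal ensemble aggregates predictions from past epochs to smooth noisy pseudo-label predictions. These are then used in the conditional distribution alignment module to minimize the kernel-based class-wise conditional maximum mean discrepancy (kCMMD) between the source and target feature spaces, thereby learning a domain-invariant embedding. The consistency-regularized augmentations ensure that multiple augmentations of the same sample share the same label. This yields (a) strong generalization with limited source-domain samples and (b) consistent pseudo-label generation on target samples. The novel integration of these three modules in μDAR yields an average macro-F1 improvement of roughly 4-12% over six state-of-the-art UDA methods on four benchmark wHAR datasets.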

Fields: cs.CV,cs.AI

Download: http://arxiv.org/abs/2410.17489v1

GenDP: 3D Semantic Fields for Category-Level Generalizable Diffusion Policy

Diffusion-based policies have shown remarkable capability in executing complex robotic manipulation tasks, but they lack explicit characterization of geometry and semantics, which often limits their ability to generalize to unseen objects and layouts. To enhance the generalization capabilities of Diffusion Policy, we introduce a novel framework that incorporates explicit spatial and semantic information via 3D semantic fields. We generate 3D descriptor fields from multi-view RGBD observations with large vision foundation models, then compare these descriptor fields against reference descriptors to obtain semantic fields. The proposed method explicitly considers geometry and semantics. This enables strong generalization in tasks that require category-level generalization, resolution of geometric ambiguities, and attention to subtle geometric details. We evaluate our method on eight tasks involving articulated objects and instances with varying shapes and textures from multiple object categories. Our method demonstrates its effectiveness by increasing Diffusion Policy's average success rate on unseen instances from 20% to 93%. Additionally, we provide a detailed analysis and visualizations to interpret the sources of the performance gain and explain how our method generalizes to novel instances.
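One simple way to turn a descriptor field into a semantic field is to score each 3D point by the cosine similarity of its descriptor to a set of reference descriptors. The sketch below assumes that similarity measure and uses made-up 2-D descriptors. It does not reproduce the actual foundation-model features, their fusion across views, or the paper's comparison function.

```python
import numpy as np

def semantic_field(point_descriptors: np.ndarray, references: np.ndarray,
                   eps: float = 1e-8) -> np.ndarray:
    """Cosine similarity of every point descriptor (N, D) to every reference (K, D) -> (N, K)."""
    d = point_descriptors / (np.linalg.norm(point_descriptors, axis=1, keepdims=True) + eps)
    r = references / (np.linalg.norm(references, axis=1, keepdims=True) + eps)
    return d @ r.T

points = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # toy per-point descriptors
handle_ref = np.array([[1.0, 0.0]])                       # hypothetical "handle" reference
print(semantic_field(points, handle_ref).round(3))
```

In this assumed setup, each column of the result would be attached to the point cloud as an extra per-point channel before the points are passed to the policy.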

Updated: 2024-10-23 00:51:47

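Title: GenDP: 3D Semantic Fields for Category-Level Generalizable Diffusion Policy

Abstract: Diffusion-based policies have shown remarkable capability in executing complex robotic manipulation tasks, but they lack an explicit characterization of geometry and semantics, which often limits their ability to generalize to unseen objects and layouts. To enhance the generalization of Diffusion Policy, we introduce a new framework that incorporates explicit spatial and semantic information through 3D semantic fields. We generate 3D descriptor fields from multi-view RGBD observations using large vision foundation models, then compare these descriptor fields with reference descriptors to obtain semantic fields. The proposed method explicitly accounts for geometry and semantics. This gives it strong generalization in tasks that require category-level generalization, resolution of geometric ambiguities, and attention to subtle geometric details. We evaluate our method on eight tasks involving articulated objects and instances with varying shapes and textures from multiple object categories. Our method demonstrates its effectiveness by raising Diffusion Policy's average success rate on unseen instances from 20% to 93%. In addition, we provide detailed analysis and visualizations to interpret the sources of the performance gain and to explain how our method generalizes to novel instances.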

Fields: cs.RO,cs.CV,cs.LG

Download: http://arxiv.org/abs/2410.17488v1

On Catastrophic Inheritance of Large Foundation Models

Large foundation models (LFMs) are claiming incredible performance. Yet serious concerns have been raised about their mythic and uninterpreted potential, not only in machine learning but also in various other disciplines. In this position paper, we propose to identify a neglected issue deeply rooted in LFMs: Catastrophic Inheritance. The term describes how the weaknesses and limitations of biased large-scale pre-training data are inherited by LFMs' behavior on downstream tasks, including on samples that are corrupted, long-tailed, noisy, or out-of-distribution, to name a few. Such inheritance can potentially cause catastrophes in downstream applications, such as bias, lack of generalization, deteriorated performance, security vulnerabilities, privacy leakage, and value misalignment. We discuss the challenges behind this issue and propose UIM, a framework with three goals. It aims to Understand the catastrophic inheritance of LFMs from both pre-training and downstream adaptation, Interpret its implications for downstream tasks, and Mitigate it. UIM aims to unite the machine learning and social science communities for more responsible and promising AI development and deployment.

Updated: 2024-10-23 00:40:23

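Title: On Catastrophic Inheritance of Large Foundation Models

Abstract: Large foundation models (LFMs) claim incredible performance. Yet serious concerns have been raised about their mythic and uninterpretable potential, not only in machine learning but in many other disciplines. In this position paper, we propose to identify a neglected issue deeply rooted in LFMs: Catastrophic Inheritance. The term describes how the weaknesses and limitations of biased large-scale pre-training data are inherited by LFMs' behavior on downstream tasks, including on corrupted, long-tailed, noisy, and out-of-distribution samples, to name a few. Such inheritance can potentially cause catastrophes in downstream applications, such as bias, lack of generalization, degraded performance, security vulnerabilities, privacy leakage, and value misalignment. We discuss the challenges behind this issue and propose UIM, a framework with three goals. It aims to Understand the catastrophic inheritance of LFMs from both pre-training and downstream adaptation, Interpret its implications for downstream tasks, and Mitigate it. UIM aims to unite the machine learning and social science communities for more responsible and promising AI development and deployment.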

Fields: cs.LG,cs.AI,cs.CY

Download: http://arxiv.org/abs/2402.01909v2

When "Competency" in Reasoning Opens the Door to Vulnerability: Jailbreaking LLMs via Novel Complex Ciphers

Recent advancements in the safety of large language models (LLMs) have primarily focused on mitigating attacks crafted in natural language or in common encoding techniques such as Base64. However, newer models, which often possess better reasoning capabilities, open the door to new attack vectors that did not exist in older models. This seems counterintuitive at first glance. These advanced models, however, can decipher more complex cryptic queries that previous models could not, making them susceptible to attacks using such prompts. To exploit this vulnerability, we propose Attacks using Custom Encryptions (ACE), a novel method to jailbreak LLMs by leveraging custom encryption schemes. We evaluate the effectiveness of ACE on four state-of-the-art LLMs, achieving attack success rates (ASR) of up to 66% on closed-source models and 88% on open-source models. Building on this, we introduce Layered Attacks using Custom Encryptions (LACE), which applies multiple layers of encryption through our custom ciphers to further increase the ASR. Our findings demonstrate that LACE significantly enhances the ability to jailbreak LLMs, increasing the ASR of GPT-4o from 40% to 78%, a 38-percentage-point improvement. Our results highlight that the advanced capabilities of LLMs introduce unforeseen vulnerabilities to complex attacks. Specifically, complex and layered ciphers increase the chance of a successful jailbreak.

Updated: 2024-10-23 00:38:14

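Title: When "Competency" in Reasoning Opens the Door to Vulnerability: Jailbreaking LLMs via Novel Complex Ciphers

Abstract: Recent advances in the safety of large language models (LLMs) have focused mainly on mitigating attacks crafted in natural language or in common encoding techniques such as Base64. However, newer models, which often have better reasoning capabilities, open the door to new attack vectors that did not exist in older models. This may seem counterintuitive at first. These advanced models, however, can decipher more complex cryptic queries that earlier models could not, making them susceptible to attacks using such prompts. To exploit this vulnerability, we propose Attacks using Custom Encryptions (ACE), a novel method for jailbreaking LLMs using custom encryption schemes. We evaluate ACE on four state-of-the-art LLMs, achieving attack success rates (ASR) of up to 66% on closed-source models and 88% on open-source models. Building on this, we introduce Layered Attacks using Custom Encryptions (LACE), which applies multiple layers of encryption through our custom ciphers to further increase the ASR. Our results show that LACE significantly improves the ability to jailbreak LLMs, raising the ASR of GPT-4o from 40% to 78%, an improvement of 38 percentage points. Our results highlight that the advanced capabilities of LLMs introduce unforeseen vulnerabilities to complex attacks. In particular, complex and layered ciphers increase the chance of a successful jailbreak.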

Fields: cs.CL,cs.AI

Download: http://arxiv.org/abs/2402.10601v2

Which Client is Reliable?: A Reliable and Personalized Prompt-based Federated Learning for Medical Image Question Answering

Conventional medical artificial intelligence (AI) models face barriers to clinical application and raise ethical issues because they cannot handle the privacy-sensitive characteristics of medical data. We present a novel personalized federated learning (pFL) method for medical visual question answering (VQA) models that addresses privacy and reliability challenges in the medical domain. Our method introduces learnable prompts into a Transformer architecture so that it can be trained efficiently on diverse medical datasets without massive computational costs. We then introduce a reliable client VQA model that incorporates Dempster-Shafer evidence theory to quantify uncertainty in predictions, enhancing the model's reliability. Furthermore, we propose a novel inter-client communication mechanism that uses maximum likelihood estimation to balance accuracy and uncertainty, fostering efficient integration of insights across clients.
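Dempster-Shafer-style uncertainty is often computed in the subjective-logic form used by evidential deep learning. In that form, non-negative per-class evidence e_k defines Dirichlet parameters α_k = e_k + 1. With S = Σα_k, the beliefs are b_k = e_k / S and the uncertainty is u = K / S, where K is the number of classes. The sketch assumes this formulation, which may differ in detail from the paper's, and it does not show the MLE-based aggregation across clients.

```python
import numpy as np

def evidential_opinion(evidence):
    """Map non-negative class evidence to (beliefs, uncertainty, expected probabilities)."""
    e = np.asarray(evidence, dtype=float)
    if np.any(e < 0):
        raise ValueError("evidence must be non-negative")
    alpha = e + 1.0                   # Dirichlet concentration parameters
    strength = alpha.sum()            # S
    belief = e / strength             # b_k
    uncertainty = e.size / strength   # u = K / S, so sum(b) + u == 1
    expected_prob = alpha / strength
    return belief, uncertainty, expected_prob

b, u, p = evidential_opinion([8.0, 1.0, 1.0])
print(b.round(3), round(u, 3))  # most belief mass on class 0; u = 3/13
_, u_none, _ = evidential_opinion([0.0, 0.0, 0.0])
print(u_none)  # 1.0: no evidence means total uncertainty
```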

Updated: 2024-10-23 00:31:17

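Title: Which Client is Reliable?: A Reliable and Personalized Prompt-based Federated Learning for Medical Image Question Answering

Abstract: Conventional medical artificial intelligence (AI) models face barriers to clinical application and raise ethical issues because they cannot handle the privacy-sensitive nature of medical data. We present a novel personalized federated learning (pFL) method for medical visual question answering (VQA) models that addresses privacy and reliability challenges in the medical domain. Our method introduces learnable prompts into a Transformer architecture so that it can be trained efficiently on diverse medical datasets without large computational costs. We then introduce a reliable client VQA model that incorporates Dempster-Shafer evidence theory to quantify uncertainty in its predictions, improving the model's reliability. Furthermore, we propose a novel inter-client communication mechanism that uses maximum likelihood estimation to balance accuracy and uncertainty, enabling efficient integration of insights across clients.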

Fields: cs.CV,cs.CL,cs.LG

Download: http://arxiv.org/abs/2410.17484v1

AI, Global Governance, and Digital Sovereignty

This essay examines how artificial intelligence (AI) systems are becoming more integral to international affairs by affecting how global governors exert power and pursue digital sovereignty. We first introduce a taxonomy of multifaceted AI payoffs for governments and corporations related to instrumental, structural, and discursive power in the domains of violence, markets, and rights. We next leverage different institutional and practice perspectives on sovereignty to assess how digital sovereignty is variously implicated in AI-empowered global governance. Under the institutional approach, states seek sovereign control over AI infrastructures; under the practice approach, they establish sovereign competence through AI infrastructures. Overall, we present the digital sovereignty stakes of AI as related to entanglements of public and private power. Rather than foreseeing technology companies replacing states, we argue that AI systems will become embedded in global governance, creating dueling dynamics of public/private cooperation and contestation. We conclude by sketching future directions for international relations (IR) research on AI and global governance.

Updated: 2024-10-23 00:05:33

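Title: AI, Global Governance, and Digital Sovereignty

Abstract: This essay examines how artificial intelligence (AI) systems are becoming increasingly integral to international affairs by shaping how global governors exercise power and pursue digital sovereignty. We first introduce a taxonomy of the multifaceted AI payoffs for governments and corporations related to instrumental, structural, and discursive power in the domains of violence, markets, and rights. Next, we draw on different institutional and practice perspectives on sovereignty to assess how digital sovereignty is variously implicated in AI-empowered global governance. Under the institutional approach, states seek sovereign control over AI infrastructures; under the practice approach, they establish sovereign competence through AI infrastructures. Overall, we relate the digital sovereignty stakes of AI to entanglements of public and private power. We argue that AI systems will not replace states but will become embedded in global governance, producing dueling dynamics of public/private cooperation and contestation. Finally, we outline future directions for international relations (IR) research on AI and global governance.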

Fields: cs.AI

Download: http://arxiv.org/abs/2410.17481v1

Deep Autoencoder with SVD-Like Convergence and Flat Minima

Representation learning for high-dimensional, complex physical systems aims to identify a low-dimensional intrinsic latent space, which is crucial for reduced-order modeling and modal analysis. To overcome the well-known Kolmogorov barrier, deep autoencoders (AEs) have been introduced in recent years, but they often suffer from poor convergence behavior as the rank of the latent space increases. To address this issue, we propose the learnable weighted hybrid autoencoder. This hybrid approach combines the strengths of singular value decomposition (SVD) with deep autoencoders through a learnable weighted framework. We find that the introduction of learnable weighting parameters is essential: without them, the resulting model either collapses into standard proper orthogonal decomposition (POD) or fails to exhibit the desired convergence behavior. Additionally, we empirically find that our trained model's sharpness is thousands of times smaller than that of other models. Our experiments on classical chaotic PDE systems include the 1D Kuramoto-Sivashinsky and forced isotropic turbulence datasets. They demonstrate that our approach significantly improves generalization performance compared to several competing methods, paving the way for robust representation learning of high-dimensional, complex physical systems.
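The "SVD-like convergence" that the hybrid autoencoder targets is the behavior of proper orthogonal decomposition. By the Eckart-Young theorem, the rank-r truncated SVD is the best rank-r linear approximation, and its relative error never increases as r grows. The sketch below illustrates that baseline on random snapshot data. It does not implement the learnable weighted hybrid.

```python
import numpy as np

def pod_errors(snapshots: np.ndarray, ranks):
    """Relative Frobenius error of rank-r POD/SVD reconstructions of a (features, snapshots) matrix."""
    U, s, Vt = np.linalg.svd(snapshots, full_matrices=False)
    total = np.linalg.norm(snapshots)
    errors = []
    for r in ranks:
        approx = (U[:, :r] * s[:r]) @ Vt[:r]  # keep the r leading modes
        errors.append(np.linalg.norm(snapshots - approx) / total)
    return np.array(errors), s

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 20))
errs, sing = pod_errors(X, ranks=range(1, 21))
print(errs.round(3))
# Eckart-Young: the rank-5 error equals the energy in the discarded singular values.
print(np.isclose(errs[4], np.sqrt((sing[5:] ** 2).sum()) / np.linalg.norm(X)))  # True
```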

Updated: 2024-10-23 00:04:26

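Title: Deep Autoencoder with SVD-Like Convergence and Flat Minima

Abstract: Representation learning for high-dimensional, complex physical systems aims to identify a low-dimensional intrinsic latent space, which is crucial for reduced-order modeling and modal analysis. To overcome the well-known Kolmogorov barrier, deep autoencoders (AEs) have been introduced in recent years, but they often exhibit poor convergence as the rank of the latent space increases. To address this, we propose the learnable weighted hybrid autoencoder. This hybrid approach combines the strengths of singular value decomposition (SVD) and deep autoencoders through a learnable weighting framework. We find that introducing learnable weighting parameters is essential: without them, the resulting model either collapses into standard proper orthogonal decomposition (POD) or fails to exhibit the desired convergence behavior. In addition, we find empirically that our trained model's sharpness is thousands of times smaller than that of other models. Our experiments on classical chaotic PDE systems include the 1D Kuramoto-Sivashinsky and forced isotropic turbulence datasets. They show that our approach significantly improves generalization over several competing methods, paving the way for robust representation learning of high-dimensional, complex physical systems.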

Fields: cs.LG,cs.AI,physics.comp-ph,stat.ML,68T07, 76F99

Download: http://arxiv.org/abs/2410.18148v1

Learning Action Embeddings for Off-Policy Evaluation

Off-policy evaluation (OPE) methods allow us to compute the expected reward of a policy using logged data collected by a different policy. OPE is a viable alternative to running expensive online A/B tests: it can speed up the development of new policies and reduce the risk of exposing customers to suboptimal treatments. However, existing estimators based on inverse propensity scoring (IPS) can have high or even infinite variance when the number of actions is large or when the logging policy under-explores certain actions. Saito and Joachims (arXiv:2202.06317v2 [cs.LG]) propose marginalized IPS (MIPS), which uses action embeddings instead and thereby reduces the variance of IPS in large action spaces. MIPS assumes that good action embeddings can be defined by the practitioner, which is difficult to do in many real-world applications. In this work, we explore learning action embeddings from logged data. In particular, we use intermediate outputs of a trained reward model to define action embeddings for MIPS. This approach extends MIPS to more applications. In our experiments, it also improves upon MIPS with predefined embeddings and upon standard baselines, on both synthetic and real-world data. Our method makes no assumptions about the reward model class and supports using additional action information to further improve the estimates. The proposed approach presents an appealing alternative to the doubly robust (DR) estimator for combining the low variance of the direct method (DM) with the low bias of IPS.
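The sketch below shows how MIPS changes the importance weights by implementing vanilla IPS alongside a simplified MIPS for discrete embeddings. It assumes that each action maps deterministically to one embedding cluster, whereas the general estimator allows a stochastic p(e | x, a). With learned embeddings, `action_emb` would be derived from the reward model's intermediate outputs. That step, the toy policies, and all names here are hypothetical.

```python
import numpy as np

def ips(pi_e, pi_0, actions, rewards):
    """Inverse propensity scoring: reweight each logged reward by pi_e(a|x) / pi_0(a|x)."""
    idx = np.arange(len(actions))
    w = pi_e[idx, actions] / pi_0[idx, actions]
    return float(np.mean(w * rewards))

def mips(pi_e, pi_0, actions, rewards, action_emb):
    """Marginalized IPS with a deterministic discrete embedding per action.

    pi_e, pi_0: (n, n_actions) target / logging action probabilities for each logged context
    action_emb: (n_actions,) integer embedding id of each action
    """
    idx = np.arange(len(actions))
    to_emb = np.eye(action_emb.max() + 1)[action_emb]  # (n_actions, n_emb) one-hot map
    p_e, p_0 = pi_e @ to_emb, pi_0 @ to_emb            # marginal embedding probabilities
    e = action_emb[actions]
    w = p_e[idx, e] / p_0[idx, e]  # weights depend only on the embedding, not the exact action
    return float(np.mean(w * rewards))

rng = np.random.default_rng(0)
n, n_actions = 1000, 6
pi_0 = np.full((n, n_actions), 1.0 / n_actions)           # uniform logging policy
pi_e = np.tile([0.5, 0.3, 0.1, 0.05, 0.03, 0.02], (n, 1))  # hypothetical target policy
actions = rng.integers(0, n_actions, size=n)
rewards = rng.binomial(1, 0.3 + 0.1 * (actions < 3)).astype(float)
emb = np.array([0, 0, 0, 1, 1, 1])  # two action clusters
print(ips(pi_e, pi_0, actions, rewards), mips(pi_e, pi_0, actions, rewards, emb))
```

Here the largest IPS weight is 0.5 / (1/6) = 3.0, while MIPS weights are only 0.9 / 0.5 = 1.8 or 0.1 / 0.5 = 0.2. The narrower weight range is the source of the variance reduction. If every action gets its own embedding, MIPS reduces to IPS.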

Updated: 2024-10-23 00:03:41

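Title: Learning Action Embeddings for Off-Policy Evaluation

Abstract: Off-policy evaluation (OPE) methods allow us to compute the expected reward of a policy using logged data collected by a different policy. OPE is a viable alternative to expensive online A/B tests: it can speed up the development of new policies and reduce the risk of exposing customers to suboptimal treatments. However, existing estimators based on inverse propensity scoring (IPS) can have high or even infinite variance when the number of actions is large or when the logging policy under-explores certain actions. Saito and Joachims (arXiv:2202.06317v2 [cs.LG]) propose marginalized IPS (MIPS), which uses action embeddings instead, reducing the variance of IPS in large action spaces. MIPS assumes that the practitioner can define good action embeddings, which is difficult in many real-world applications. In this work, we explore learning action embeddings from logged data. In particular, we use intermediate outputs of a trained reward model to define action embeddings for MIPS. This approach extends MIPS to more applications. In our experiments, it also improves on MIPS with predefined embeddings and on standard baselines, on both synthetic and real-world data. Our method makes no assumptions about the reward model class and supports using additional action information to further improve the estimates. The proposed approach offers an appealing alternative to the doubly robust (DR) estimator for combining the low variance of the direct method (DM) with the low bias of IPS.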

Fields: cs.LG,cs.AI

Download: http://arxiv.org/abs/2305.03954v2