    _              _         ____
   / \   _ ____  _(_)_   __ |  _ \  __ _ _   _ 
  / _ \ | '__\ \/ / \ \ / / | | | |/ _` | | | |
 / ___ \| |   >  <| |\ V /  | |_| | (_| | |_| |
/_/   \_\_|  /_/\_\_| \_/   |____/ \__,_|\__, |
                                         |___/ 

Articles: 34

Last Updated: 2024-07-08 23:47:29 (+00:00)

MUSE: Machine Unlearning Six-Way Evaluation for Language Models

Language models (LMs) are trained on vast amounts of text data, which may include private and copyrighted content. Data owners may request the removal of their data from a trained model due to privacy or copyright concerns. However, exactly unlearning only these datapoints (i.e., retraining with the data removed) is intractable in modern-day models. This has led to the development of many approximate unlearning algorithms. The evaluation of the efficacy of these algorithms has traditionally been narrow in scope, failing to precisely quantify the success and practicality of the algorithm from the perspectives of both the model deployers and the data owners. We address this issue by proposing MUSE, a comprehensive machine unlearning evaluation benchmark that enumerates six diverse desirable properties for unlearned models: (1) no verbatim memorization, (2) no knowledge memorization, (3) no privacy leakage, (4) utility preservation on data not intended for removal, (5) scalability with respect to the size of removal requests, and (6) sustainability over sequential unlearning requests. Using these criteria, we benchmark how effectively eight popular unlearning algorithms on 7B-parameter LMs can unlearn Harry Potter books and news articles. Our results demonstrate that most algorithms can prevent verbatim memorization and knowledge memorization to varying degrees, but only one algorithm does not lead to severe privacy leakage. Furthermore, existing algorithms fail to meet deployers' expectations because they often degrade general model utility and also cannot sustainably accommodate successive unlearning requests or large-scale content removal. Our findings identify key issues with the practicality of existing unlearning algorithms on language models, and we release our benchmark to facilitate further evaluations: muse-bench.github.io
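
A minimal sketch of how criterion (1) can be probed: prompt the model with a prefix from the forget set and measure how much of the true continuation it reproduces verbatim. This is illustrative, not the benchmark's code; `model` and `tok` are assumed to follow the Hugging Face transformers API.

```python
def ngram_overlap(candidate, reference, n=8):
    # Fraction of the reference continuation's n-grams reproduced verbatim.
    grams = lambda toks: {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}
    ref = grams(reference)
    return len(grams(candidate) & ref) / max(len(ref), 1)

def verbatim_score(model, tok, text, prefix_len=128, cont_len=128):
    ids = tok(text, return_tensors="pt").input_ids[0]
    prompt = ids[:prefix_len].unsqueeze(0)
    reference = ids[prefix_len:prefix_len + cont_len].tolist()
    out = model.generate(prompt, max_new_tokens=cont_len, do_sample=False)
    continuation = out[0, prefix_len:].tolist()
    return ngram_overlap(continuation, reference)  # should drop toward 0 after unlearning
```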

Updated: 2024-07-08 23:47:29

Domains: cs.CL,cs.AI

Download: http://arxiv.org/abs/2407.06460v1

How Much Progress Did I Make? An Unexplored Human Feedback Signal for Teaching Robots

Enhancing the expressiveness of human teaching is vital for both improving robots' learning from humans and the human-teaching-robot experience. In this work, we characterize and test a little-used teaching signal: progress, designed to represent the completion percentage of a task. We conducted two online studies with 76 crowd-sourced participants and one public space study with 40 non-expert participants to validate the capability of this progress signal. We find that progress indicates whether the task is successfully performed, reflects the degree of task completion, identifies unproductive but harmless behaviors, and is likely to be more consistent across participants. Furthermore, our results show that giving progress does not require extra workload and time. An additional contribution of our work is a dataset of 40 non-expert demonstrations from the public space study through an ice cream topping-adding task, which we observe to be multi-policy and sub-optimal, with sub-optimality not only from teleoperation errors but also from exploratory actions and attempts. The dataset is available at https://github.com/TeachingwithProgress/Non-Expert_Demonstrations.

Updated: 2024-07-08 23:47:13

Domains: cs.RO,cs.AI,cs.LG

Download: http://arxiv.org/abs/2407.06459v1

SD-BLS: Privacy Preserving Selective Disclosure of Verifiable Credentials with Unlinkable Threshold Revocation

Ensuring privacy and protection from issuer corruption in digital identity systems is crucial. We propose a method for selective disclosure and privacy-preserving revocation of digital credentials using second-order Elliptic Curves and Boneh-Lynn-Shacham (BLS) signatures. We make holders able to present proofs of possession of selected credentials without disclosing them, and we protect their presentations from replay attacks. Revocations may be distributed among multiple revocation issuers using publicly verifiable secret sharing (PVSS) and activated only by configurable consensus, ensuring robust protection against issuer corruption. Our system's unique design enables extremely fast revocation checks, even with large revocation lists, leveraging optimized hash map lookups.

Updated: 2024-07-08 23:37:55

Domains: cs.CR

Download: http://arxiv.org/abs/2406.19035v3

Exploiting Heterogeneity in Timescales for Sparse Recurrent Spiking Neural Networks for Energy-Efficient Edge Computing

Spiking Neural Networks (SNNs) represent the forefront of neuromorphic computing, promising energy-efficient and biologically plausible models for complex tasks. This paper weaves together three groundbreaking studies that revolutionize SNN performance through the introduction of heterogeneity in neuron and synapse dynamics. We explore the transformative impact of Heterogeneous Recurrent Spiking Neural Networks (HRSNNs), supported by rigorous analytical frameworks and novel pruning methods like Lyapunov Noise Pruning (LNP). Our findings reveal how heterogeneity not only enhances classification performance but also reduces spiking activity, leading to more efficient and robust networks. By bridging theoretical insights with practical applications, this comprehensive summary highlights the potential of SNNs to outperform traditional neural networks while maintaining lower computational costs. Join us on a journey through the cutting-edge advancements that pave the way for the future of intelligent, energy-efficient neural computing.
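
As a concrete illustration of timescale heterogeneity, here is a sketch of a leaky integrate-and-fire layer whose membrane time constants vary per neuron. This is an illustrative toy, not the authors' HRSNN implementation, and the log-uniform spread is an assumption.

```python
import math
import torch

class HeterogeneousLIF(torch.nn.Module):
    def __init__(self, n_neurons, tau_min=2.0, tau_max=50.0, v_threshold=1.0):
        super().__init__()
        # Log-uniform spread of membrane time constants across the population.
        u = torch.rand(n_neurons)
        log_tau = math.log(tau_min) + u * (math.log(tau_max) - math.log(tau_min))
        self.tau = torch.exp(log_tau)
        self.v_threshold = v_threshold

    def forward(self, currents):                 # currents: [time, n_neurons]
        v = torch.zeros(currents.shape[1])
        spikes = []
        for i_t in currents:
            v = v + (i_t - v) / self.tau         # per-neuron leak rate
            s = (v >= self.v_threshold).float()
            v = v * (1.0 - s)                    # reset membrane where a spike fired
            spikes.append(s)
        return torch.stack(spikes)               # [time, n_neurons] spike trains
```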

Updated: 2024-07-08 23:33:12

Domains: cs.NE,cs.AI

Download: http://arxiv.org/abs/2407.06452v1

MELT: Mining Effective Lightweight Transformations from Pull Requests

Software developers often struggle to update APIs, leading to manual, time-consuming, and error-prone processes. We introduce MELT, a new approach that generates lightweight API migration rules directly from pull requests in popular library repositories. Our key insight is that pull requests merged into open-source libraries are a rich source of information sufficient to mine API migration rules. By leveraging code examples mined from the library source and automatically generated code examples based on the pull requests, we infer transformation rules in Comby, a language for structural code search and replace. Since inferred rules from single code examples may be too specific, we propose a generalization procedure to make the rules more applicable to client projects. MELT rules are syntax-driven, interpretable, and easily adaptable. Moreover, unlike previous work, our approach enables rule inference to seamlessly integrate into the library workflow, removing the need to wait for client code migrations. We evaluated MELT on pull requests from four popular libraries, successfully mining 461 migration rules from code examples in pull requests and 114 rules from auto-generated code examples. Our generalization procedure increases the number of matches for mined rules by 9x. We applied these rules to client projects and ran their tests, which led to an overall decrease in the number of warnings and fixed some test cases, demonstrating MELT's effectiveness in real-world scenarios.
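
For concreteness, a hypothetical rule in Comby's template syntax (holes are written `:[name]`). This is an illustrative example of the rule format, migrating pandas' deprecated DataFrame.append, not one of the 461 mined rules.

```sh
# Rewrite df.append(...) call sites to the pd.concat equivalent.
comby ':[df].append(:[row], ignore_index=True)' \
      'pd.concat([:[df], :[row]], ignore_index=True)' \
      -matcher .py
```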

Updated: 2024-07-08 23:16:16

Domains: cs.SE,cs.AI

Download: http://arxiv.org/abs/2308.14687v2

Geospatial Trajectory Generation via Efficient Abduction: Deployment for Independent Testing

The ability to generate artificial human movement patterns while meeting location and time constraints is an important problem in the security community, particularly as it enables the study of the analog problem of detecting such patterns while maintaining privacy. We frame this problem as an instance of abduction guided by a novel parsimony function represented as an aggregate truth value over an annotated logic program. This approach has the added benefit of affording explainability to an analyst user. By showing that any subset of such a program can provide a lower bound on this parsimony requirement, we are able to abduce movement trajectories efficiently through an informed (i.e., A*) search. We describe how our implementation was enhanced with the application of multiple techniques in order to be scaled and integrated with a cloud-based software stack that included bottom-up rule learning, geolocated knowledge graph retrieval/management, and interfaces with government systems for independently conducted government-run tests for which we provide results. We also report on our own experiments showing that we not only provide exact results but also scale to very large scenarios and provide realistic agent trajectories that can go undetected by machine learning anomaly detectors.

Updated: 2024-07-08 23:11:47

Domains: cs.LO,cs.AI,cs.LG

Download: http://arxiv.org/abs/2407.06447v1

Deep Learning in Physical Layer: Review on Data Driven End-to-End Communication Systems and their Enabling Semantic Applications

Deep learning (DL) has revolutionized wireless communication systems by introducing data-driven end-to-end (E2E) learning, where the physical layer (PHY) is transformed into DL architectures to achieve peak optimization. Leveraging DL for E2E optimization in PHY significantly enhances its adaptability and performance in complex wireless environments, meeting the demands of advanced network systems such as 5G and beyond. Furthermore, this evolution of data-driven PHY optimization has also enabled advanced semantic applications across various modalities, including text, image, audio, video, and multimodal transmissions. These applications elevate communication from bit-level to semantic-level intelligence, making it capable of discerning context and intent. Although the PHY, as a DL architecture, plays a crucial role in enabling semantic communication (SemCom) systems, comprehensive studies that integrate both E2E communication and SemCom systems remain significantly underexplored. This highlights the novelty and potential of these integrative fields, marking them as a promising research domain. Therefore, this article provides a comprehensive review of the emerging field of data-driven PHY for E2E communication systems, emphasizing their role in enabling semantic applications across various modalities. It also identifies key challenges and potential research directions, serving as a crucial guide for future advancements in DL for E2E communication and SemCom systems.

Updated: 2024-07-08 22:58:11

Domains: cs.NI,cs.LG

Download: http://arxiv.org/abs/2401.12800v2

Exposing Privacy Gaps: Membership Inference Attack on Preference Data for LLM Alignment

Large Language Models (LLMs) have seen widespread adoption due to their remarkable natural language capabilities. However, when deploying them in real-world settings, it is important to align LLMs to generate texts according to acceptable human standards. Methods such as Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO) have made significant progress in refining LLMs using human preference data. However, the privacy concerns inherent in utilizing such preference data have yet to be adequately studied. In this paper, we investigate the vulnerability of LLMs aligned using human preference datasets to membership inference attacks (MIAs), highlighting the shortcomings of previous MIA approaches with respect to preference data. Our study has two main contributions: first, we introduce a novel reference-based attack framework specifically for analyzing preference data called PREMIA (PREference data MIA); second, we provide empirical evidence that DPO models are more vulnerable to MIA compared to PPO models. Our findings highlight gaps in current privacy-preserving practices for LLM alignment.
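
A minimal sketch of the reference-based idea (the exact PREMIA statistic may differ): compare the target model's log-likelihood of the preferred response against a reference model's, and flag pairs the target finds unusually likely. `model`/`tok` follow the Hugging Face transformers API.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def response_logprob(model, tok, prompt, response):
    # Sum of log-probabilities the model assigns to `response` given `prompt`.
    ids = tok(prompt + response, return_tensors="pt").input_ids
    n_prompt = tok(prompt, return_tensors="pt").input_ids.shape[1]
    logps = F.log_softmax(model(ids).logits[0, :-1], dim=-1)
    token_logps = logps.gather(-1, ids[0, 1:].unsqueeze(-1)).squeeze(-1)
    return token_logps[n_prompt - 1:].sum().item()   # response tokens only

def premia_style_score(target, reference, tok, prompt, chosen):
    # Preference pairs seen during alignment should look unusually likely to
    # the aligned target model relative to an untuned reference model.
    return (response_logprob(target, tok, prompt, chosen)
            - response_logprob(reference, tok, prompt, chosen))
```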

Updated: 2024-07-08 22:53:23

Domains: cs.AI

Download: http://arxiv.org/abs/2407.06443v1

A Single Transformer for Scalable Vision-Language Modeling

We present SOLO, a single transformer for Scalable visiOn-Language mOdeling. Current large vision-language models (LVLMs) such as LLaVA mostly employ heterogeneous architectures that connect pre-trained visual encoders with large language models (LLMs) to facilitate visual recognition and complex reasoning. Although achieving remarkable performance with relatively lightweight training, we identify four primary scalability limitations: (1) The visual capacity is constrained by pre-trained visual encoders, which are typically an order of magnitude smaller than LLMs. (2) The heterogeneous architecture complicates the use of established hardware and software infrastructure. (3) Study of scaling laws on such architecture must consider three separate components - visual encoder, connector, and LLMs, which complicates the analysis. (4) The use of existing visual encoders typically requires following a pre-defined specification of image inputs pre-processing, for example, by reshaping inputs to fixed-resolution square images, which presents difficulties in processing and training on high-resolution images or those with unusual aspect ratio. A unified single Transformer architecture, like SOLO, effectively addresses these scalability concerns in LVLMs; however, its limited adoption in the modern context likely stems from the absence of reliable training recipes that balance both modalities and ensure stable training for billion-scale models. In this paper, we introduce the first open-source training recipe for developing SOLO, an open-source 7B LVLM using moderate academic resources. The training recipe involves initializing from LLMs, sequential pre-training on ImageNet and web-scale data, and instruction fine-tuning on our curated high-quality datasets. On extensive evaluation, SOLO demonstrates performance comparable to LLaVA-v1.5-7B, particularly excelling in visual mathematical reasoning.

Updated: 2024-07-08 22:40:15

Domains: cs.CV,cs.CL,cs.LG

Download: http://arxiv.org/abs/2407.06438v1

DebUnc: Mitigating Hallucinations in Large Language Model Agent Communication with Uncertainty Estimations

To enhance Large Language Model (LLM) capabilities, multi-agent debates have been introduced, where multiple LLMs discuss solutions to a problem over several rounds of debate. However, LLMs often produce incorrect responses that appear deceptively confident, which can mislead other agents. This is partly because agents do not express their confidence levels during standard debates. To address this, we introduce DebUnc, a multi-agent debate framework that uses uncertainty metrics to assess agent confidence levels. We adapted the LLM attention mechanism to adjust token weights based on confidence levels and also explored using textual prompts to convey confidence. Our evaluations across various benchmarks show that attention-based methods are particularly effective, and that as uncertainty metrics evolve, performance will continue to increase. The code is available at https://github.com/lukeyoffe/debunc
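
For illustration, one uncertainty metric of the kind such a framework can plug in: mean predictive entropy over an agent's generated tokens. This is a sketch; the paper evaluates several metrics and additionally maps confidence into attention weights.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def mean_token_entropy(logits):
    # logits: [seq_len, vocab_size] for one agent's generated response.
    logp = F.log_softmax(logits, dim=-1)
    entropy = -(logp.exp() * logp).sum(dim=-1)   # per-token predictive entropy
    return entropy.mean().item()                 # higher = less confident
```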

Updated: 2024-07-08 22:15:01

Domains: cs.CL,cs.AI,cs.MA

Download: http://arxiv.org/abs/2407.06426v1

Large Language Models in Finance: A Survey

Recent advances in large language models (LLMs) have opened new possibilities for artificial intelligence applications in finance. In this paper, we provide a practical survey focused on two key aspects of utilizing LLMs for financial tasks: existing solutions and guidance for adoption. First, we review current approaches employing LLMs in finance, including leveraging pretrained models via zero-shot or few-shot learning, fine-tuning on domain-specific data, and training custom LLMs from scratch. We summarize key models and evaluate their performance improvements on financial natural language processing tasks. Second, we propose a decision framework to guide financial professionals in selecting the appropriate LLM solution based on their use case constraints around data, compute, and performance needs. The framework provides a pathway from lightweight experimentation to heavy investment in customized LLMs. Lastly, we discuss limitations and challenges around leveraging LLMs in financial applications. Overall, this survey aims to synthesize the state-of-the-art and provide a roadmap for responsibly applying LLMs to advance financial AI.

Updated: 2024-07-08 22:13:09

Domains: q-fin.GN,cs.AI,cs.CL

Download: http://arxiv.org/abs/2311.10723v2

Online Bilevel Optimization: Regret Analysis of Online Alternating Gradient Methods

This paper introduces online bilevel optimization in which a sequence of time-varying bilevel problems is revealed one after the other. We extend the known regret bounds for online single-level algorithms to the bilevel setting. Specifically, we provide new notions of bilevel regret, develop an online alternating time-averaged gradient method that is capable of leveraging smoothness, and give regret bounds in terms of the path-length of the inner and outer minimizer sequences.
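
One plausible way to instantiate such a bilevel regret, with assumed notation for outer losses f_t and inner losses g_t (the paper's exact definition may differ):

```latex
% Outer regret of iterates (x_t, y_t) against per-round bilevel optima.
\mathrm{BLR}_T \;=\; \sum_{t=1}^{T} f_t(x_t, y_t)
  \;-\; \sum_{t=1}^{T} \min_{x}\, f_t\bigl(x,\, y_t^{*}(x)\bigr),
\qquad
y_t^{*}(x) \in \operatorname*{arg\,min}_{y}\, g_t(x, y).
```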

Updated: 2024-07-08 22:10:33

Domains: math.OC,cs.DS,cs.LG

Download: http://arxiv.org/abs/2207.02829v7

InsightBench: Evaluating Business Analytics Agents Through Multi-Step Insight Generation

Data analytics is essential for extracting valuable insights from data that can assist organizations in making effective decisions. We introduce InsightBench, a benchmark dataset with three key features. First, it consists of 31 datasets representing diverse business use cases such as finance and incident management, each accompanied by a carefully curated set of insights planted in the datasets. Second, unlike existing benchmarks focusing on answering single queries, InsightBench evaluates agents based on their ability to perform end-to-end data analytics, including formulating questions, interpreting answers, and generating a summary of insights and actionable steps. Third, we conducted comprehensive quality assurance to ensure that each dataset in the benchmark had clear goals and included relevant and meaningful questions and analysis. Furthermore, we implement a two-way evaluation mechanism using LLaMA-3-Eval as an effective, open-source evaluator method to assess agents' ability to extract insights. We also propose AgentPoirot, our baseline data analysis agent capable of performing end-to-end data analytics. Our evaluation on InsightBench shows that AgentPoirot outperforms existing approaches (such as Pandas Agent) that focus on resolving single queries. We also compare the performance of open- and closed-source LLMs and various evaluation strategies. Overall, this benchmark serves as a testbed to motivate further development in comprehensive data analytics and can be accessed here: https://github.com/ServiceNow/insight-bench.

Updated: 2024-07-08 22:06:09

Domains: cs.AI

Download: http://arxiv.org/abs/2407.06423v1

Exploring the Capability of ChatGPT to Reproduce Human Labels for Social Computing Tasks (Extended Version)

Harnessing the potential of large language models (LLMs) like ChatGPT can help address social challenges through inclusive, ethical, and sustainable means. In this paper, we investigate the extent to which ChatGPT can annotate data for social computing tasks, aiming to reduce the complexity and cost of undertaking web research. To evaluate ChatGPT's potential, we re-annotate seven datasets using ChatGPT, covering topics related to pressing social issues like COVID-19 misinformation, social bot deception, cyberbullying, clickbait news, and the Russo-Ukrainian War. Our findings demonstrate that ChatGPT exhibits promise in handling these data annotation tasks, albeit with some challenges. Across the seven datasets, ChatGPT achieves an average annotation F1-score of 72.00%. Its performance excels in clickbait news annotation, correctly labeling 89.66% of the data. However, we also observe significant variations in performance across individual labels. Our study reveals predictable patterns in ChatGPT's annotation performance. Thus, we propose GPT-Rater, a tool to predict if ChatGPT can correctly label data for a given annotation task. Researchers can use this to identify where ChatGPT might be suitable for their annotation requirements. We show that GPT-Rater effectively predicts ChatGPT's performance. It performs best on a clickbait headlines dataset by achieving an average F1-score of 95.00%. We believe that this research opens new avenues for analysis and can reduce barriers to engaging in social computing research.

Updated: 2024-07-08 22:04:30

Domains: cs.AI,cs.CL

Download: http://arxiv.org/abs/2407.06422v1

System stabilization with policy optimization on unstable latent manifolds

Stability is a basic requirement when studying the behavior of dynamical systems. However, stabilizing dynamical systems via reinforcement learning is challenging because only little data can be collected over short time horizons before instabilities are triggered and data become meaningless. This work introduces a reinforcement learning approach that is formulated over latent manifolds of unstable dynamics so that stabilizing policies can be trained from few data samples. The unstable manifolds are minimal in the sense that they contain the lowest dimensional dynamics that are necessary for learning policies that guarantee stabilization. This is in stark contrast to generic latent manifolds that aim to approximate all -- stable and unstable -- system dynamics and thus are higher dimensional and often require higher amounts of data. Experiments demonstrate that the proposed approach stabilizes even complex physical systems from few data samples for which other methods that operate either directly in the system state space or on generic latent manifolds fail.

Updated: 2024-07-08 21:57:28

Domains: math.OC,cs.LG,cs.NA,math.DS,math.NA,37N35, 68T07, 90C30, 93C57, 93D15

Download: http://arxiv.org/abs/2407.06418v1

AutoChunk: Automated Activation Chunk for Memory-Efficient Long Sequence Inference

Large deep learning models have achieved impressive performance across a range of applications. However, their large memory requirements, including parameter memory and activation memory, have become a significant challenge for their practical serving. While existing methods mainly address parameter memory, the importance of activation memory has been overlooked. Especially for long input sequences, activation memory is expected to experience a significant exponential growth as the length of sequences increases. To address this, we propose AutoChunk, an automatic and adaptive compiler system that efficiently reduces activation memory for long sequence inference by chunk strategies. The proposed system generates chunk plans by optimizing through multiple stages. In each stage, the chunk search pass explores all possible chunk candidates and the chunk selection pass identifies the optimal one. At runtime, AutoChunk employs code generation to automatically apply chunk strategies. The experiments demonstrate that AutoChunk can reduce over 80% of activation memory while maintaining speed loss within 10%, extend max sequence length by 3.2x to 11.7x, and outperform state-of-the-art methods by a large margin.
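
The core idea is easy to see in miniature: run position-wise blocks chunk by chunk along the sequence so the large intermediate activation never materializes for the full length. A sketch of the idea, not AutoChunk's generated code:

```python
import torch

def chunked_mlp(x, mlp, chunk_size=1024):
    # x: [batch, seq_len, d_model]. The MLP expands to d_ff internally, so
    # running it per chunk caps peak activation memory at chunk_size positions
    # instead of seq_len, trading a little speed for a lot of memory.
    return torch.cat([mlp(chunk) for chunk in x.split(chunk_size, dim=1)], dim=1)
```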

Updated: 2024-07-08 21:52:08

Domains: cs.PF,cs.DC,cs.LG

Download: http://arxiv.org/abs/2401.10652v3

Hybrid Classical-Quantum architecture for vectorised image classification of hand-written sketches

Quantum machine learning (QML) investigates how quantum phenomena can be exploited in order to learn data in an alternative way, e.g. by means of a quantum computer. While recent results evidence that QML models can potentially surpass their classical counterparts' performance in specific tasks, quantum technology hardware is still unready to reach quantum advantage in tasks of significant relevance to the broad scope of the computer science community. Recent advances indicate that hybrid classical-quantum models can readily attain competitive performances at low architecture complexities. Such investigations are often carried out for image-processing tasks, and are notably constrained to modelling raster images, represented as a grid of two-dimensional pixels. Here, we introduce vector-based representation of sketch drawings as a test-bed for QML models. Such a lower-dimensional data structure proves handy for benchmarking a model's performance, particularly in current transition times, where classical simulations of quantum circuits are naturally limited in the number of qubits, and quantum hardware is not readily available to perform large-scale experiments. We report some encouraging results for primitive hybrid classical-quantum architectures, in a canonical sketch recognition problem.

Updated: 2024-07-08 21:51:20

Domains: quant-ph,cs.AI,cs.CV

Download: http://arxiv.org/abs/2407.06416v1

SmartChoices: Augmenting Software with Learned Implementations

In many software systems, heuristics are used to make decisions - such as cache eviction, task scheduling, and information presentation - that have a significant impact on overall system behavior. While machine learning may outperform these heuristics, replacing existing heuristics in a production system safely and reliably can be prohibitively costly. We present SmartChoices, a novel approach that reduces the cost to deploy production-ready ML solutions for contextual bandits problems. SmartChoices' interface cleanly separates problem formulation from implementation details: engineers describe their use case by defining datatypes for the context, arms, and feedback that are passed to SmartChoices APIs, while SmartChoices manages encoding & logging data and training, evaluating & deploying policies. Our implementation codifies best practices, is efficient enough for use in low-level applications, and provides valuable production features off the shelf via a shared library. Overall, SmartChoices enables non-experts to rapidly deploy production-ready ML solutions by eliminating many sources of technical debt common to ML systems. Engineers have independently used SmartChoices to improve a wide range of software including caches, batch processing workloads, and UI layouts, resulting in better latency, throughput, and click-through rates.

Updated: 2024-07-08 21:44:23

Domains: cs.SE,cs.LG

Download: http://arxiv.org/abs/2304.13033v3

If You Don't Understand It, Don't Use It: Eliminating Trojans with Filters Between Layers

Large language models (LLMs) sometimes exhibit dangerous unintended behaviors. Finding and fixing these is challenging because the attack surface is massive -- it is not tractable to exhaustively search for all possible inputs that may elicit such behavior. One specific and particularly challenging case is that of trojans injected through data poisoning, since there is no way to know what to search for. To our knowledge, there is no generally applicable method to unlearn unknown trojans injected during pre-training. This work seeks to provide a general-purpose recipe (filters) and a specific implementation (LoRA filters) that work in practice on small to medium-sized models. The focus is primarily empirical, though some perplexing behavior opens the door to the fundamental question of how LLMs store and process information. Not unexpectedly, we find that our filters work best on the residual stream and the latest layers.

Updated: 2024-07-08 21:40:23

Domains: cs.LG,cs.CL,cs.CR

Download: http://arxiv.org/abs/2407.06411v1

Optimizing Negative Prompts for Enhanced Aesthetics and Fidelity in Text-To-Image Generation

In text-to-image generation, using negative prompts, which describe undesirable image characteristics, can significantly boost image quality. However, producing good negative prompts is manual and tedious. To address this, we propose NegOpt, a novel method for optimizing negative prompt generation toward enhanced image generation, using supervised fine-tuning and reinforcement learning. Our combined approach results in a substantial increase of 25% in Inception Score compared to other approaches and surpasses ground-truth negative prompts from the test set. Furthermore, with NegOpt we can preferentially optimize the metrics most important to us. Finally, we construct Negative Prompts DB (https://github.com/mikeogezi/negopt), a publicly available dataset of negative prompts.
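
For context, this is where a negative prompt enters a standard text-to-image call in Hugging Face diffusers. NegOpt's contribution is generating that string; a hand-written stand-in is used in this sketch.

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt="a watercolor painting of a lighthouse at dawn",
    # NegOpt would produce this negative prompt automatically.
    negative_prompt="blurry, low quality, deformed, watermark, text",
).images[0]
image.save("lighthouse.png")
```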

Updated: 2024-07-08 21:37:03

Domains: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2403.07605v2

AI-driven multi-omics integration for multi-scale predictive modeling of causal genotype-environment-phenotype relationships

Despite the wealth of single-cell multi-omics data, it remains challenging to predict the consequences of novel genetic and chemical perturbations in the human body. It requires knowledge of molecular interactions at all biological levels, encompassing disease models and humans. Current machine learning methods primarily establish statistical correlations between genotypes and phenotypes but struggle to identify physiologically significant causal factors, limiting their predictive power. Key challenges in predictive modeling include scarcity of labeled data, generalization across different domains, and disentangling causation from correlation. In light of recent advances in multi-omics data integration, we propose a new artificial intelligence (AI)-powered biology-inspired multi-scale modeling framework to tackle these issues. This framework will integrate multi-omics data across biological levels, organism hierarchies, and species to predict causal genotype-environment-phenotype relationships under various conditions. AI models inspired by biology may identify novel molecular targets, biomarkers, pharmaceutical agents, and personalized medicines for presently unmet medical needs.

Updated: 2024-07-08 21:23:25

Domains: cs.AI

Download: http://arxiv.org/abs/2407.06405v1

Knowledge Management in the Companion Cognitive Architecture

One of the fundamental aspects of cognitive architectures is their ability to encode and manipulate knowledge. Without a consistent, well-designed, and scalable knowledge management scheme, an architecture will be unable to move past toy problems and tackle the broader problems of cognition. In this paper, we document some of the challenges we have faced in developing the knowledge stack for the Companion cognitive architecture and discuss the tools, representations, and practices we have developed to overcome them. We also lay out a series of potential next steps that will allow Companion agents to play a greater role in managing their own knowledge. It is our hope that these observations will prove useful to other cognitive architecture developers facing similar challenges.

Updated: 2024-07-08 21:20:05

Domains: cs.AI

Download: http://arxiv.org/abs/2407.06401v1

Interactively Diagnosing Errors in a Semantic Parser

Hand-curated natural language systems provide an inspectable, correctable alternative to language systems based on machine learning, but maintaining them requires considerable effort and expertise. Interactive Natural Language Debugging (INLD) aims to lessen this burden by casting debugging as a reasoning problem, asking the user a series of questions to diagnose and correct errors in the system's knowledge. In this paper, we present work in progress on an interactive error diagnosis system for the CNLU semantic parser. We show how the first two stages of the INLD pipeline (symptom identification and error localization) can be cast as a model-based diagnosis problem, demonstrate our system's ability to diagnose semantic errors on synthetic examples, and discuss design challenges and frontiers for future work.

Updated: 2024-07-08 21:16:09

Domains: cs.AI

Download: http://arxiv.org/abs/2407.06400v1

Learning Diffusion Priors from Observations by Expectation Maximization

Diffusion models recently proved to be remarkable priors for Bayesian inverse problems. However, training these models typically requires access to large amounts of clean data, which could prove difficult in some settings. In this work, we present a novel method based on the expectation-maximization algorithm for training diffusion models from incomplete and noisy observations only. Unlike previous works, our method leads to proper diffusion models, which is crucial for downstream tasks. As part of our method, we propose and motivate a new posterior sampling scheme for unconditional diffusion models. We present empirical evidence supporting the effectiveness of our method.

Updated: 2024-07-08 21:12:54

Domains: cs.LG,stat.ML

Download: http://arxiv.org/abs/2405.13712v2

JANET: Joint Adaptive predictioN-region Estimation for Time-series

Conformal prediction provides machine learning models with prediction sets that offer theoretical guarantees, but the underlying assumption of exchangeability limits its applicability to time series data. Furthermore, existing approaches struggle to handle multi-step ahead prediction tasks, where uncertainty estimates across multiple future time points are crucial. We propose JANET (Joint Adaptive predictioN-region Estimation for Time-series), a novel framework for constructing conformal prediction regions that are valid for both univariate and multivariate time series. JANET generalises the inductive conformal framework and efficiently produces joint prediction regions with controlled K-familywise error rates, enabling flexible adaptation to specific application needs. Our empirical evaluation demonstrates JANET's superior performance in multi-step prediction tasks across diverse time series datasets, highlighting its potential for reliable and interpretable uncertainty quantification in sequential data.
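
For orientation, a minimal split-conformal baseline for joint multi-step intervals, using the maximum per-horizon residual as a single joint nonconformity score. This is standard conformal machinery under an exchangeability assumption, not JANET's K-familywise procedure.

```python
import numpy as np

def joint_region(cal_pred, cal_true, test_pred, alpha=0.1):
    # cal_pred, cal_true: [n_cal, H] forecasts and targets on a calibration
    # split; test_pred: [H]. The bounds cover all H steps jointly with
    # probability >= 1 - alpha if calibration and test series are exchangeable.
    scores = np.abs(cal_pred - cal_true).max(axis=1)      # one joint score per series
    n = len(scores)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)  # finite-sample correction
    q = np.quantile(scores, level, method="higher")
    return test_pred - q, test_pred + q
```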

Updated: 2024-07-08 21:03:15

Domains: stat.ML,cs.LG,stat.ME

Download: http://arxiv.org/abs/2407.06390v1

From LLMs to Actions: Latent Codes as Bridges in Hierarchical Robot Control

Hierarchical control for robotics has long been plagued by the need to have a well defined interface layer to communicate between high-level task planners and low-level policies. With the advent of LLMs, language has been emerging as a prospective interface layer. However, this has several limitations. Not all tasks can be decomposed into steps that are easily expressible in natural language (e.g. performing a dance routine). Further, it makes end-to-end finetuning on embodied data challenging due to domain shift and catastrophic forgetting. We introduce our method -- Learnable Latent Codes as Bridges (LCB) -- as an alternate architecture to overcome these limitations. LCB uses a learnable latent code to act as a bridge between LLMs and low-level policies. This enables LLMs to flexibly communicate goals in the task plan without being entirely constrained by language limitations. Additionally, it enables end-to-end finetuning without destroying the embedding space of word tokens learned during pre-training. Through experiments on Language Table and Calvin, two common language based benchmarks for embodied agents, we find that LCB outperforms baselines (including those w/ GPT-4V) that leverage pure language as the interface layer on tasks that require reasoning and multi-step behaviors.

Updated: 2024-07-08 21:02:37

Domains: cs.RO,cs.AI

Download: http://arxiv.org/abs/2405.04798v2

Bucketized Active Sampling for Learning ACOPF

This paper considers optimization proxies for Optimal Power Flow (OPF), i.e., machine-learning models that approximate the input/output relationship of OPF. Recent work has focused on showing that such proxies can be of high fidelity. However, their training requires significant data, each instance necessitating the (offline) solving of an OPF. To meet the requirements of market-clearing applications, this paper proposes Bucketized Active Sampling (BAS), a novel active learning framework that aims at training the best possible OPF proxy within a time limit. BAS partitions the input domain into buckets and uses an acquisition function to determine where to sample next. By applying the same partitioning to the validation set, BAS leverages labeled validation samples in the selection of unlabeled samples. BAS also relies on an adaptive learning rate that increases and decreases over time. Experimental results demonstrate the benefits of BAS.

Updated: 2024-07-08 21:00:14

Domains: cs.LG,cs.AI,cs.SY,eess.SY,math.OC

Download: http://arxiv.org/abs/2208.07497v3

Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks

Despite advances in AI alignment, large language models (LLMs) remain vulnerable to adversarial attacks or jailbreaking, in which adversaries can modify prompts to induce unwanted behavior. While some defenses have been proposed, they have not been adapted to newly proposed attacks and more challenging threat models. To address this, we propose an optimization-based objective for defending LLMs against jailbreaking attacks and an algorithm, Robust Prompt Optimization (RPO) to create robust system-level defenses. Our approach directly incorporates the adversary into the defensive objective and optimizes a lightweight and transferable suffix, enabling RPO to adapt to worst-case adaptive attacks. Our theoretical and experimental results show improved robustness to both jailbreaks seen during optimization and unknown jailbreaks, reducing the attack success rate (ASR) on GPT-4 to 6% and Llama-2 to 0% on JailbreakBench, setting the state-of-the-art. Code can be found at https://github.com/lapisrocks/rpo

Updated: 2024-07-08 20:33:36

Domains: cs.LG,cs.AI,cs.CL,cs.CV

Download: http://arxiv.org/abs/2401.17263v4

Non-Robust Features are Not Always Useful in One-Class Classification

The robustness of machine learning models has been questioned by the existence of adversarial examples. We examine the threat of adversarial examples in practical applications that require lightweight models for one-class classification. Building on Ilyas et al. (2019), we investigate the vulnerability of lightweight one-class classifiers to adversarial attacks and possible reasons for it. Our results show that lightweight one-class classifiers learn features that are not robust (e.g. texture) under stronger attacks. However, unlike in multi-class classification (Ilyas et al., 2019), these non-robust features are not always useful for the one-class task, suggesting that learning these unpredictive and non-robust features is an unwanted consequence of training.

Updated: 2024-07-08 20:32:19

Domains: cs.LG,cs.CV,68T45,I.2.10; I.4.10; I.5.4

Download: http://arxiv.org/abs/2407.06372v1

AUTOLYCUS: Exploiting Explainable AI (XAI) for Model Extraction Attacks against Interpretable Models

Explainable Artificial Intelligence (XAI) aims to uncover the decision-making processes of AI models. However, the data used for such explanations can pose security and privacy risks. Existing literature identifies attacks on machine learning models, including membership inference, model inversion, and model extraction attacks. These attacks target either the model or the training data, depending on the settings and parties involved. XAI tools can increase the vulnerability of model extraction attacks, which is a concern when model owners prefer black-box access, thereby keeping model parameters and architecture private. To exploit this risk, we propose AUTOLYCUS, a novel retraining (learning) based model extraction attack framework against interpretable models under black-box settings. As XAI tools, we exploit Local Interpretable Model-Agnostic Explanations (LIME) and Shapley values (SHAP) to infer decision boundaries and create surrogate models that replicate the functionality of the target model. LIME and SHAP are mainly chosen for their realistic yet information-rich explanations, coupled with their extensive adoption, simplicity, and usability. We evaluate AUTOLYCUS on six machine learning datasets, measuring the accuracy and similarity of the surrogate model to the target model. The results show that AUTOLYCUS is highly effective, requiring significantly fewer queries compared to state-of-the-art attacks, while maintaining comparable accuracy and similarity. We validate its performance and transferability on multiple interpretable ML models, including decision trees, logistic regression, naive bayes, and k-nearest neighbor. Additionally, we show the resilience of AUTOLYCUS against proposed countermeasures.
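
A schematic of the extraction loop (illustrative, not the authors' implementation): query the black-box target, let LIME rank influential features, perturb along them to craft informative queries, and fit a surrogate on the harvested labels. The perturbation scheme and hyperparameters here are assumptions.

```python
import numpy as np
from lime.lime_tabular import LimeTabularExplainer
from sklearn.tree import DecisionTreeClassifier

def extract_surrogate(target_predict_proba, seed_X, n_rounds=200):
    explainer = LimeTabularExplainer(seed_X, discretize_continuous=False)
    X = [np.asarray(x, dtype=float) for x in seed_X]
    y = list(target_predict_proba(seed_X).argmax(axis=1))
    rng = np.random.default_rng(0)
    for _ in range(n_rounds):
        x = X[rng.integers(len(X))]                    # revisit a known query
        exp = explainer.explain_instance(x, target_predict_proba, num_features=3)
        for feat, _weight in exp.as_map()[1]:          # most influential features
            x_new = x.copy()                           # probe near the boundary
            x_new[feat] += rng.normal(scale=seed_X[:, feat].std() + 1e-8)
            X.append(x_new)
            y.append(int(target_predict_proba(x_new[None, :]).argmax(axis=1)[0]))
    return DecisionTreeClassifier(max_depth=5).fit(np.vstack(X), y)
```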

Updated: 2024-07-08 20:17:23

Domains: cs.LG,cs.CR

Download: http://arxiv.org/abs/2302.02162v3

Revisiting Sparse Rewards for Goal-Reaching Reinforcement Learning

Many real-world robot learning problems, such as pick-and-place or arriving at a destination, can be seen as a problem of reaching a goal state as soon as possible. These problems, when formulated as episodic reinforcement learning tasks, can easily be specified to align well with our intended goal: -1 reward every time step with termination upon reaching the goal state, called minimum-time tasks. Despite this simplicity, such formulations are often overlooked in favor of dense rewards due to their perceived difficulty and lack of informativeness. Our studies contrast the two reward paradigms, revealing that the minimum-time task specification not only facilitates learning higher-quality policies but can also surpass dense-reward-based policies on their own performance metrics. Crucially, we also identify the goal-hit rate of the initial policy as a robust early indicator for learning success in such sparse feedback settings. Finally, using four distinct real-robotic platforms, we show that it is possible to learn pixel-based policies from scratch within two to three hours using constant negative rewards.
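
The minimum-time specification is tiny to implement; a sketch as a Gymnasium wrapper, where `goal_fn` is an assumed user-supplied predicate on observations:

```python
import gymnasium as gym

class MinimumTimeWrapper(gym.Wrapper):
    """-1 reward every step; the episode terminates upon reaching the goal."""
    def __init__(self, env, goal_fn):
        super().__init__(env)
        self.goal_fn = goal_fn                 # predicate on observations

    def step(self, action):
        obs, _, terminated, truncated, info = self.env.step(action)
        reached = bool(self.goal_fn(obs))
        return obs, -1.0, reached or terminated, truncated, info
```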

Updated: 2024-07-08 20:15:46

Domains: cs.RO,cs.LG

Download: http://arxiv.org/abs/2407.00324v2

Improving Text-To-Audio Models with Synthetic Captions

It is an open challenge to obtain high quality training data, especially captions, for text-to-audio models. Although prior methods have leveraged text-only language models to augment and improve captions, such methods have limitations related to scale and coherence between audio and captions. In this work, we propose an audio captioning pipeline that uses an audio language model to synthesize accurate and diverse captions for audio at scale. We leverage this pipeline to produce a dataset of synthetic captions for AudioSet, named AF-AudioSet, and then evaluate the benefit of pre-training text-to-audio models on these synthetic captions. Through systematic evaluations on AudioCaps and MusicCaps, we find leveraging our pipeline and synthetic captions leads to significant improvements on audio generation quality, achieving a new state-of-the-art.

Updated: 2024-07-08 20:15:33

Domains: cs.CL,cs.LG,cs.SD,eess.AS

Download: http://arxiv.org/abs/2406.15487v2

Learning Regionalization using Accurate Spatial Cost Gradients within a Differentiable High-Resolution Hydrological Model: Application to the French Mediterranean Region

Estimating spatially distributed hydrological parameters in ungauged catchments poses a challenging regionalization problem and requires imposing spatial constraints given the sparsity of discharge data. A possible approach is to search for a transfer function that quantitatively relates physical descriptors to conceptual model parameters. This paper introduces a Hybrid Data Assimilation and Parameter Regionalization (HDA-PR) approach incorporating learnable regionalization mappings, based on either multi-linear regressions or artificial neural networks (ANNs), into a differentiable hydrological model. This approach demonstrates how two differentiable codes can be linked and their gradients chained, enabling the exploitation of heterogeneous datasets across extensive spatio-temporal computational domains within a high-dimensional regionalization context, using accurate adjoint-based gradients. The inverse problem is tackled with a multi-gauge calibration cost function accounting for information from multiple observation sites. HDA-PR was tested on high-resolution, hourly and kilometric regional modeling of 126 flash-flood-prone catchments in the French Mediterranean region. The results highlight a strong regionalization performance of HDA-PR especially in the most challenging upstream-to-downstream extrapolation scenario with ANN, achieving median Nash-Sutcliffe efficiency (NSE) scores from 0.6 to 0.71 for spatial, temporal, spatio-temporal validations, and improving NSE by up to 30% on average compared to the baseline model calibrated with lumped parameters. ANN enables to learn a non-linear descriptors-to-parameters mapping which provides better model controllability than a linear mapping for complex calibration cases.
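
Since the headline numbers above are Nash-Sutcliffe efficiency (NSE) scores, a short reference implementation of the standard metric may help:

    import numpy as np

    def nash_sutcliffe_efficiency(obs, sim):
        # NSE = 1 - sum((sim - obs)^2) / sum((obs - mean(obs))^2).
        # NSE = 1 is a perfect fit; NSE <= 0 means the model is no
        # better than predicting the observed mean discharge.
        obs, sim = np.asarray(obs, float), np.asarray(sim, float)
        return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)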

Updated: 2024-07-08 20:08:43

标题: 学习区域化:在可微分的高分辨率水文模型中利用精确的空间成本梯度——以法国地中海地区为例

摘要: 在无测站的集水区中估算空间分布的水文参数是一个具有挑战性的区域化问题,鉴于流量数据的稀疏性,需要施加空间约束。一种可能的方法是寻找一个转移函数,定量地将物理描述符与概念模型参数联系起来。本文介绍了一种混合数据同化与参数区域化(HDA-PR)方法,将基于多元线性回归或人工神经网络(ANN)的可学习区域化映射整合到一个可微分水文模型中。该方法展示了如何链接两个可微分代码并串联其梯度,从而能够借助精确的伴随梯度,在高维区域化背景下,在广阔的时空计算域上利用异构数据集。逆问题通过一个考虑多个观测站点信息的多站率定成本函数来求解。HDA-PR在法国地中海地区126个易发山洪的集水区上进行了高分辨率、小时级和公里级的区域建模测试。结果突出了HDA-PR强大的区域化性能,尤其是在最具挑战性的上游到下游外推场景中使用ANN时,在空间、时间和时空验证中取得了0.6到0.71的中位Nash-Sutcliffe效率(NSE)得分,并且与使用集总参数率定的基线模型相比,NSE平均提高最多达30%。ANN能够学习非线性的描述符到参数映射,在复杂率定情形下提供了比线性映射更好的模型可控性。

更新时间: 2024-07-08 20:08:43

领域: cs.LG,eess.SP

下载: http://arxiv.org/abs/2308.02040v2

Calibrating Transformers via Sparse Gaussian Processes

Transformer models have achieved profound success in prediction tasks in a wide range of applications in natural language processing, speech recognition and computer vision. Extending Transformer's success to safety-critical domains requires calibrated uncertainty estimation which remains under-explored. To address this, we propose Sparse Gaussian Process attention (SGPA), which performs Bayesian inference directly in the output space of multi-head attention blocks (MHAs) in transformer to calibrate its uncertainty. It replaces the scaled dot-product operation with a valid symmetric kernel and uses sparse Gaussian processes (SGP) techniques to approximate the posterior processes of MHA outputs. Empirically, on a suite of prediction tasks on text, images and graphs, SGPA-based Transformers achieve competitive predictive accuracy, while noticeably improving both in-distribution calibration and out-of-distribution robustness and detection.
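
A rough sketch of the first ingredient follows: sharing one projection across queries and keys turns the attention matrix into a Gram matrix of a valid symmetric kernel. This is our illustrative reading of the idea, not the paper's full sparse-GP posterior; the exponentiated inner-product kernel is an assumption.

    import torch

    def symmetric_kernel_attention(X, W, V):
        # One shared projection W for both sides makes the attention
        # matrix symmetric positive semi-definite: k(x_i, x_j) = k(x_j, x_i).
        Z = X @ W                                    # shared projection
        K = torch.exp(Z @ Z.T / Z.shape[-1] ** 0.5)  # exp of an inner-product kernel is PSD
        A = K / K.sum(dim=-1, keepdim=True)          # row-normalise like softmax
        return A @ V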

Updated: 2024-07-08 19:56:35

标题: 通过稀疏高斯过程校准Transformer

摘要: Transformer模型在自然语言处理、语音识别和计算机视觉等各种应用中的预测任务上取得了深远的成功。将Transformer的成功延伸到安全关键领域需要校准的不确定性估计,而这一方向仍未得到充分探索。为了解决这个问题,我们提出了稀疏高斯过程注意力(SGPA),它在Transformer的多头注意力块(MHA)的输出空间中直接执行贝叶斯推断,以校准其不确定性。它用一个有效的对称核替换了缩放点积操作,并使用稀疏高斯过程(SGP)技术来近似MHA输出的后验过程。实验上,在文本、图像和图数据上的一系列预测任务中,基于SGPA的Transformer实现了有竞争力的预测准确性,同时显著改善了分布内校准以及分布外稳健性和检测能力。

更新时间: 2024-07-08 19:56:35

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2303.02444v3

Parametric Matrix Models

We present a general class of machine learning algorithms called parametric matrix models. In contrast with most existing machine learning models that imitate the biology of neurons, parametric matrix models use matrix equations that emulate the physics of quantum systems. Similar to how physics problems are usually solved, parametric matrix models learn the governing equations that lead to the desired outputs. Parametric matrix models can be efficiently trained from empirical data, and the equations may use algebraic, differential, or integral relations. While originally designed for scientific computing, we prove that parametric matrix models are universal function approximators that can be applied to general machine learning problems. After introducing the underlying theory, we apply parametric matrix models to a series of different challenges that show their performance for a wide range of problems. For all the challenges tested here, parametric matrix models produce accurate results within an efficient and interpretable computational framework that allows for input feature extrapolation.
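
A minimal sketch of the flavour of such a model, assuming a simple affine matrix dependence and an eigenvalue readout; the paper's actual constructions may differ:

    import numpy as np

    def pmm_forward(x, A0, A1):
        # Prediction = lowest eigenvalue of a symmetric matrix that depends
        # on the input, mimicking how physics problems are posed as matrix
        # equations. A0 and A1 play the role of learned parameters.
        M = A0 + x * A1
        M = 0.5 * (M + M.T)              # symmetrise for a real spectrum
        return np.linalg.eigvalsh(M)[0]  # smallest eigenvalue as the output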

Updated: 2024-07-08 19:55:41

标题: 参数矩阵模型

摘要: 我们提出了一类名为参数矩阵模型的机器学习算法。与大多数现有的模仿神经元生物学的机器学习模型不同,参数矩阵模型使用模拟量子系统物理的矩阵方程。与通常解决物理问题的方式类似,参数矩阵模型学习导致期望输出的控制方程。参数矩阵模型可以高效地从经验数据中训练,并且方程可以使用代数、微分或积分关系。虽然最初设计用于科学计算,我们证明参数矩阵模型是通用的函数逼近器,可应用于一般的机器学习问题。在介绍基础理论后,我们将参数矩阵模型应用于一系列不同的挑战,展示它们在广泛问题范围内的性能。在这里测试的所有挑战中,参数矩阵模型在一个高效且可解释的计算框架内产生准确的结果,允许输入特征的外推。

更新时间: 2024-07-08 19:55:41

领域: cs.LG,cond-mat.dis-nn,nucl-th,physics.comp-ph,quant-ph

下载: http://arxiv.org/abs/2401.11694v3

Combining Neural Networks and Symbolic Regression for Analytical Lyapunov Function Discovery

We propose CoNSAL (Combining Neural networks and Symbolic regression for Analytical Lyapunov function) to construct analytical Lyapunov functions for nonlinear dynamic systems. This framework contains a neural Lyapunov function and a symbolic regression component, where symbolic regression is applied to distill the neural network to precise analytical forms. Our approach utilizes symbolic regression not only as a tool for translation but also as a means to uncover counterexamples. This procedure terminates when no counterexamples are found in the analytical formulation. Compared with previous results, our algorithm directly produces an analytical form of the Lyapunov function with improved interpretability in both the learning process and the final results. We apply our algorithm to 2-D inverted pendulum, path following, Van Der Pol Oscillator, 3-D trig dynamics, 4-D rotating wheel pendulum, 6-D 3-bus power system, and demonstrate that our algorithm successfully finds their valid Lyapunov functions.
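
The termination criterion (no counterexamples for the analytical candidate) can be pictured with a sampling-based check; the paper's verification is more rigorous, so the following is only an assumption-laden sketch:

    import numpy as np

    def find_counterexample(V, grad_V, f, n=100000, r=2.0, seed=0):
        # Sample states in a box and flag any x violating the Lyapunov
        # conditions V(x) > 0 and dV/dt = <grad V(x), f(x)> < 0 for x != 0.
        X = np.random.default_rng(seed).uniform(-r, r, size=(n, 2))
        for x in X:
            if np.allclose(x, 0.0):
                continue
            if V(x) <= 0 or np.dot(grad_V(x), f(x)) >= 0:
                return x     # counterexample: refine the candidate
        return None          # none found on these samples

    # V(x) = ||x||^2 certifies the stable linear system dx/dt = -x:
    print(find_counterexample(lambda x: x @ x,
                              lambda x: 2 * x,
                              lambda x: -x))   # expected: None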

Updated: 2024-07-08 19:54:08

标题: 将神经网络和符号回归相结合用于分析李雅普诺夫函数的发现

摘要: 我们提出了CoNSAL(结合神经网络和符号回归构建非线性动态系统的分析李雅普诺夫函数)来构建非线性动态系统的分析李雅普诺夫函数。该框架包含神经网络李雅普诺夫函数和符号回归组件,其中符号回归被应用于将神经网络提炼为精确的分析形式。我们的方法不仅利用符号回归作为翻译工具,还作为发现反例的手段。当在分析形式中找不到反例时,此过程终止。与先前的结果相比,我们的算法直接产生了李雅普诺夫函数的分析形式,提高了在学习过程和最终结果中的可解释性。我们将算法应用于2-D倒立摆,路径跟踪,范德波尔振荡器,3-D三角动力学,4-D旋转车轮摆,6-D三母线电力系统,并证明我们的算法成功找到了它们的有效李雅普诺夫函数。

更新时间: 2024-07-08 19:54:08

领域: eess.SY,cs.AI,cs.SC,cs.SY

下载: http://arxiv.org/abs/2406.15675v2

P3GNN: A Privacy-Preserving Provenance Graph-Based Model for APT Detection in Software Defined Networking

Software Defined Networking (SDN) has brought significant advancements in network management and programmability. However, this evolution has also heightened vulnerability to Advanced Persistent Threats (APTs), sophisticated and stealthy cyberattacks that traditional detection methods often fail to counter, especially in the face of zero-day exploits. A prevalent issue is the inadequacy of existing strategies to detect novel threats while addressing data privacy concerns in collaborative learning scenarios. This paper presents P3GNN (privacy-preserving provenance graph-based graph neural network model), a novel model that synergizes Federated Learning (FL) with Graph Convolutional Networks (GCN) for effective APT detection in SDN environments. P3GNN utilizes unsupervised learning to analyze operational patterns within provenance graphs, identifying deviations indicative of security breaches. Its core feature is the integration of FL with homomorphic encryption, which fortifies data confidentiality and gradient integrity during collaborative learning. This approach addresses the critical challenge of data privacy in shared learning contexts. Key innovations of P3GNN include its ability to detect anomalies at the node level within provenance graphs, offering a detailed view of attack trajectories and enhancing security analysis. Furthermore, the model's unsupervised learning capability enables it to identify zero-day attacks by learning standard operational patterns. Empirical evaluation using the DARPA TCE3 dataset demonstrates P3GNN's exceptional performance, achieving an accuracy of 0.93 and a low false positive rate of 0.06.

Updated: 2024-07-08 19:50:26

标题: P3GNN:一种用于软件定义网络中APT检测的基于溯源图的隐私保护模型

摘要: 软件定义网络(SDN)在网络管理和可编程性方面带来了重大进步。然而,这种演变也加剧了对高级持续性威胁(APTs)的脆弱性,这些是复杂且隐秘的网络攻击,传统检测方法常常无法应对,特别是在面对零日漏洞时。一个普遍存在的问题是现有策略无法检测新型威胁,同时解决在协作学习场景中的数据隐私问题。本文介绍了P3GNN(基于隐私保护溯源图的图神经网络模型),这是一种新颖的模型,将联邦学习(FL)与图卷积网络(GCN)相结合,用于有效地检测SDN环境中的APT。P3GNN利用无监督学习来分析溯源图中的操作模式,识别出安全漏洞的指标性偏差。其核心特点是将FL与同态加密相结合,以在协作学习过程中加强数据保密性和梯度完整性。这种方法解决了在共享学习环境中数据隐私的关键挑战。P3GNN的关键创新包括其能够在溯源图中的节点级别检测异常,提供攻击轨迹的详细视图,并增强安全分析。此外,该模型的无监督学习能力使其能够通过学习标准操作模式来识别零日攻击。使用DARPA TCE3数据集进行的实证评估显示,P3GNN表现出色,达到了0.93的准确率和0.06的低误报率。

更新时间: 2024-07-08 19:50:26

领域: cs.CR

下载: http://arxiv.org/abs/2406.12003v2

Self-Organising Neural Discrete Representation Learning à la Kohonen

Unsupervised learning of discrete representations in neural networks (NNs) from continuous ones is essential for many modern applications. Vector Quantisation (VQ) has become popular for this, in particular in the context of generative models, such as Variational Auto-Encoders (VAEs), where the exponential moving average-based VQ (EMA-VQ) algorithm is often used. Here, we study an alternative VQ algorithm based on Kohonen's learning rule for the Self-Organising Map (KSOM; 1982). EMA-VQ is a special case of KSOM. KSOM is known to offer two potential benefits: empirically, it converges faster than EMA-VQ, and KSOM-generated discrete representations form a topological structure on the grid whose nodes are the discrete symbols, resulting in an artificial version of the brain's topographic map. We revisit these properties by using KSOM in VQ-VAEs for image processing. In our experiments, the speed-up compared to well-configured EMA-VQ is only observable at the beginning of training, but KSOM is generally much more robust, e.g., w.r.t. the choice of initialisation schemes.
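
For readers unfamiliar with the rule, one KSOM codebook update might look as follows (grid layout, learning rate and neighbourhood width are illustrative assumptions). Shrinking the neighbourhood onto the winner recovers winner-only updates, which is the sense in which EMA-VQ is a special case:

    import numpy as np

    def ksom_update(codebook, grid, x, lr=0.1, sigma=1.0):
        # codebook: (K, d) code vectors; grid: (K, 2) their fixed grid
        # coordinates. Each input x pulls the winner and its grid
        # neighbours toward itself, weighted by a Gaussian neighbourhood.
        winner = np.argmin(np.linalg.norm(codebook - x, axis=1))
        grid_dist2 = np.sum((grid - grid[winner]) ** 2, axis=1)
        h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))    # neighbourhood weights
        codebook += lr * h[:, None] * (x - codebook)    # pull codes toward x
        return codebook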

Updated: 2024-07-08 19:47:40

标题: 自组织神经离散表示学习 à la Kohonen

摘要: 在神经网络中,从连续表示无监督地学习离散表示对许多现代应用至关重要。矢量量化(VQ)已经在这方面变得流行,特别是在生成模型的背景下,比如变分自动编码器(VAEs),其中通常使用基于指数移动平均的VQ(EMA-VQ)算法。在这里,我们研究了一种基于Kohonen自组织映射学习规则的替代VQ算法(KSOM; 1982)。EMA-VQ是KSOM的一个特例。KSOM已知具有两个潜在的好处:经验上,它比EMA-VQ更快地收敛,并且KSOM生成的离散表示在网格上形成了拓扑结构,其节点是离散符号,从而产生了大脑地形图的人工版本。我们通过在图像处理的VQ-VAE中使用KSOM来重新审视这些特性。在我们的实验中,与良好配置的EMA-VQ相比,加速只在训练开始时可观察到,但KSOM通常更加稳健,例如对初始化方案的选择更不敏感。

更新时间: 2024-07-08 19:47:40

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2302.07950v2

Large Language Model Recall Uncertainty is Modulated by the Fan Effect

This paper evaluates whether large language models (LLMs) exhibit cognitive fan effects, similar to those discovered by Anderson in humans, after being pre-trained on human textual data. We conduct two sets of in-context recall experiments designed to elicit fan effects. Consistent with human results, we find that LLM recall uncertainty, measured via token probability, is influenced by the fan effect. Our results show that removing uncertainty disrupts the observed effect. The experiments suggest the fan effect is consistent whether the fan value is induced in-context or in the pre-training data. Finally, these findings provide in-silico evidence that fan effects and typicality are expressions of the same phenomena.

Updated: 2024-07-08 19:40:50

标题: 大型语言模型的回忆不确定性受扇形效应调制

摘要: 本文评估了在人类文本数据上预训练后,大型语言模型(LLMs)是否表现出类似于安德森在人类中发现的认知扇形效应。我们进行了两组旨在引发扇形效应的上下文内回忆实验。与人类结果一致,我们发现通过词元概率测量的LLM回忆不确定性受到扇形效应的影响。我们的结果表明,消除不确定性会破坏观察到的效应。实验表明,无论扇形值是在上下文中诱发还是来自预训练数据,扇形效应都是一致的。最后,这些发现提供了计算机模拟(in-silico)证据,表明扇形效应和典型性效应是同一现象的表现。

更新时间: 2024-07-08 19:40:50

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2407.06349v1

FORAY: Towards Effective Attack Synthesis against Deep Logical Vulnerabilities in DeFi Protocols

Blockchain adoption has surged with the rise of Decentralized Finance (DeFi) applications. However, the significant value of digital assets managed by DeFi protocols makes them prime targets for attacks. Current smart contract vulnerability detection tools struggle with DeFi protocols due to deep logical bugs arising from complex financial interactions between multiple smart contracts. These tools primarily analyze individual contracts and resort to brute-force methods for DeFi protocols crossing numerous smart contracts, leading to inefficiency. We introduce Foray, a highly effective attack synthesis framework against deep logical bugs in DeFi protocols. Foray proposes a novel attack sketch generation and completion framework. Specifically, instead of treating DeFis as regular programs, we design a domain-specific language (DSL) to lift the low-level smart contracts into their high-level financial operations. Based on our DSL, we first compile a given DeFi protocol into a token flow graph, our graphical representation of DeFi protocols. Then, we design an efficient sketch generation method to synthesize attack sketches for a certain attack goal (e.g., price manipulation, arbitrage, etc.). This algorithm strategically identifies candidate sketches by finding reachable paths in TFG, which is much more efficient than random enumeration. For each candidate sketch written in our DSL, Foray designs a domain-specific symbolic compilation to compile it into SMT constraints. Our compilation simplifies the constraints by removing redundant smart contract semantics. It maintains the usability of symbolic compilation, yet scales to problems orders of magnitude larger. Finally, the candidates are completed via existing solvers and are transformed into concrete attacks via direct syntax transformation.

Updated: 2024-07-08 19:35:48

标题: FORAY:面向DeFi协议中深层逻辑漏洞的有效攻击合成

摘要: 随着去中心化金融(DeFi)应用的兴起,区块链的采用率迅速增长。然而,DeFi协议管理的数字资产的巨大价值使它们成为攻击的主要目标。当前的智能合约漏洞检测工具在处理DeFi协议时面临困难,因为这些协议中存在着由多个智能合约之间复杂的金融交互引起的深层逻辑错误。这些工具主要分析单个合约,并采用蛮力方法来处理跨越多个智能合约的DeFi协议,导致效率低下。我们引入了Foray,一个针对DeFi协议中深层逻辑错误的高效攻击合成框架。Foray提出了一种新颖的攻击草图生成和完成框架。具体来说,我们设计了一种领域特定语言(DSL),将低级智能合约提升到其高级金融操作。基于我们的DSL,我们首先将给定的DeFi协议编译成一个令牌流图,即我们对DeFi协议的图形表示。然后,我们设计了一种高效的草图生成方法,用于合成特定攻击目标(如价格操纵、套利等)的攻击草图。该算法通过在TFG中找到可达路径来策略性地识别候选草图,这比随机枚举要高效得多。对于每一个用我们的DSL编写的候选草图,Foray设计了一个领域特定的符号编译,将其编译成SMT约束。我们的编译通过消除冗余的智能合约语义简化了约束。它保持了符号编译的可用性,但可以扩展到数个数量级更大的问题。最后,候选者通过现有的求解器完成,并通过直接语法转换转化为具体攻击。

更新时间: 2024-07-08 19:35:48

领域: cs.CR,cs.PL

下载: http://arxiv.org/abs/2407.06348v1

High-Dimensional Distributed Sparse Classification with Scalable Communication-Efficient Global Updates

As the size of datasets used in statistical learning continues to grow, distributed training of models has attracted increasing attention. These methods partition the data and exploit parallelism to reduce memory and runtime, but suffer increasingly from communication costs as the data size or the number of iterations grows. Recent work on linear models has shown that a surrogate likelihood can be optimized locally to iteratively improve on an initial solution in a communication-efficient manner. However, existing versions of these methods experience multiple shortcomings as the data size becomes massive, including diverging updates and efficiently handling sparsity. In this work we develop solutions to these problems which enable us to learn a communication-efficient distributed logistic regression model even beyond millions of features. In our experiments we demonstrate a large improvement in accuracy over distributed algorithms with only a few distributed update steps needed, and similar or faster runtimes. Our code is available at \url{https://github.com/FutureComputing4AI/ProxCSL}.

Updated: 2024-07-08 19:34:39

标题: 高维分布式稀疏分类与可扩展通信高效全局更新

摘要: 随着在统计学习中使用的数据集规模不断增长,模型的分布式训练引起了越来越多的关注。这些方法将数据进行分区并利用并行性来减少内存和运行时间,但随着数据规模或迭代次数的增长,通信成本的影响也越来越大。最近关于线性模型的研究表明,可以在本地优化一个替代似然,以通信高效的方式迭代改进初始解。然而,随着数据规模变得庞大,这些方法的现有版本存在多个缺点,包括更新发散和难以高效处理稀疏性。在这项工作中,我们开发了解决这些问题的方案,使我们能够学习通信高效的分布式逻辑回归模型,即使特征数超过数百万。在我们的实验中,我们展示了相比其他分布式算法,仅需少量分布式更新步骤即可大幅提高准确性,且运行时间相近或更快。我们的代码可在\url{https://github.com/FutureComputing4AI/ProxCSL}上找到。

更新时间: 2024-07-08 19:34:39

领域: cs.LG,cs.DC,stat.ML

下载: http://arxiv.org/abs/2407.06346v1

ClimSim-Online: A Large Multi-scale Dataset and Framework for Hybrid ML-physics Climate Emulation

Modern climate projections lack adequate spatial and temporal resolution due to computational constraints, leading to inaccuracies in representing critical processes like thunderstorms that occur on the sub-resolution scale. Hybrid methods combining physics with machine learning (ML) offer faster, higher fidelity climate simulations by outsourcing compute-hungry, high-resolution simulations to ML emulators. However, these hybrid ML-physics simulations require domain-specific data and workflows that have been inaccessible to many ML experts. As an extension of the ClimSim dataset (Yu et al., 2024), we present ClimSim-Online, which also includes an end-to-end workflow for developing hybrid ML-physics simulators. The ClimSim dataset includes 5.7 billion pairs of multivariate input/output vectors, capturing the influence of high-resolution, high-fidelity physics on a host climate simulator's macro-scale state. The dataset is global and spans ten years at a high sampling frequency. We provide a cross-platform, containerized pipeline to integrate ML models into operational climate simulators for hybrid testing. We also implement various ML baselines, alongside a hybrid baseline simulator, to highlight the ML challenges of building stable, skillful emulators. The data (https://huggingface.co/datasets/LEAP/ClimSim_high-res) and code (https://leap-stc.github.io/ClimSim and https://github.com/leap-stc/climsim-online) are publicly released to support the development of hybrid ML-physics and high-fidelity climate simulations.

Updated: 2024-07-08 19:33:54

标题: ClimSim-Online: 一个用于混合机器学习-物理气候仿真的大规模多尺度数据集和框架

摘要: 由于计算约束,现代气候预测缺乏足够的空间和时间分辨率,导致对发生在亚分辨率尺度上的雷暴等关键过程的表征不准确。将物理学与机器学习(ML)相结合的混合方法通过将计算密集的高分辨率模拟外包给ML仿真器,提供更快、更高保真度的气候模拟。然而,这些混合ML-物理仿真需要领域特定的数据和工作流程,而许多ML专家难以获得这些资源。作为ClimSim数据集(Yu等人,2024年)的扩展,我们提出了ClimSim-Online,其中还包括一个用于开发混合ML-物理模拟器的端到端工作流程。ClimSim数据集包括57亿对多变量输入/输出向量,捕捉高分辨率、高保真度物理对主机气候模拟器宏观状态的影响。该数据集覆盖全球,时间跨度为十年,采样频率高。我们提供一个跨平台、容器化的管道,将ML模型集成到运行中的气候模拟器中进行混合测试。我们还实现了多种ML基线以及一个混合基线模拟器,以突出构建稳定、高水平仿真器所面临的ML挑战。数据(https://huggingface.co/datasets/LEAP/ClimSim_high-res)和代码(https://leap-stc.github.io/ClimSim和https://github.com/leap-stc/climsim-online)已公开发布,以支持混合ML-物理和高保真度气候模拟的发展。

更新时间: 2024-07-08 19:33:54

领域: cs.LG,physics.ao-ph

下载: http://arxiv.org/abs/2306.08754v6

Novel Models for High-Dimensional Imaging: High-Resolution fMRI Acceleration and Quantification

The goals of functional Magnetic Resonance Imaging (fMRI) include high spatial and temporal resolutions with a high signal-to-noise ratio (SNR). To simultaneously improve spatial and temporal resolutions and maintain the high SNR advantage of OSSI, we present novel pipelines for fast acquisition and high-resolution fMRI reconstruction and physics parameter quantification. We propose a patch-tensor low-rank model, a physics-based manifold model, and a voxel-wise attention network. With novel models for acquisition and reconstruction, we demonstrate that we can improve SNR and resolution simultaneously without compromising scan time. All the proposed models outperform other comparison approaches with higher resolution and more functional information.

Updated: 2024-07-08 19:24:21

标题: 高维成像的新模型:高分辨率fMRI加速和量化

摘要: 功能性磁共振成像(fMRI)的目标包括高空间和时间分辨率以及高信噪比(SNR)。为了同时提高空间和时间分辨率并保持OSSI的高SNR优势,我们提出了用于快速采集、高分辨率fMRI重建以及物理参数量化的新型流程。我们提出了一个图像块张量低秩模型、一个基于物理的流形模型和一个逐体素注意力网络。通过新的采集与重建模型,我们展示了可以在不牺牲扫描时间的情况下同时提高SNR和分辨率。所有提出的模型都以更高的分辨率和更多的功能信息优于其他对比方法。

更新时间: 2024-07-08 19:24:21

领域: eess.IV,cs.LG,eess.SP,physics.med-ph

下载: http://arxiv.org/abs/2407.06343v1

Noise-Free Explanation for Driving Action Prediction

Although attention mechanisms have achieved considerable progress in Transformer-based architectures across various Artificial Intelligence (AI) domains, their inner workings remain to be explored. Existing explainable methods have different emphases but are rather one-sided. They primarily analyse the attention mechanisms or gradient-based attribution while neglecting the magnitudes of input feature values or the skip-connection module. Moreover, they inevitably bring spurious noisy pixel attributions unrelated to the model's decision, hindering humans' trust in the spotted visualization result. Hence, we propose an easy-to-implement but effective way to remedy this flaw: Smooth Noise Norm Attention (SNNA). We weigh the attention by the norm of the transformed value vector and guide the label-specific signal with the attention gradient, then randomly sample the input perturbations and average the corresponding gradients to produce noise-free attribution. Instead of evaluating the explanation method on the binary or multi-class classification tasks like in previous works, we explore the more complex multi-label classification scenario in this work, i.e., the driving action prediction task, and trained a model for it specifically. Both qualitative and quantitative evaluation results show the superiority of SNNA compared to other SOTA attention-based explainable methods in generating a clearer visual explanation map and ranking the input pixel importance.
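
Our hedged single-head reading of the recipe is sketched below; the tensor shapes and the positive-part clamp are assumptions, and the SmoothGrad-style averaging over perturbed inputs is noted in a comment rather than implemented:

    import torch

    def snna_single_head(attn, values, attn_grad):
        # attn, attn_grad: (tokens, tokens); values: (tokens, d).
        # Weight the attention by the norm of the transformed value
        # vectors, then guide it with the label-specific attention
        # gradient. The full method averages this relevance over several
        # randomly perturbed inputs to cancel noisy pixel attributions.
        v_norm = values.norm(dim=-1)                       # ||v_j|| per key token
        guided = attn * v_norm.unsqueeze(0) * attn_grad
        return guided.clamp(min=0).sum(dim=0)              # per-token relevance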

Updated: 2024-07-08 19:21:24

标题: 无噪音的驾驶行为预测解释

摘要: 尽管注意力机制在各种人工智能领域的基于Transformer的架构中取得了相当大的进展,但其内部机制仍有待探索。现有的可解释方法有不同的重点,但都有些片面。它们主要分析注意力机制或基于梯度的属性,却忽视了输入特征值的大小或跳连模块。此外,它们不可避免地带来与模型决策无关的虚假嘈杂像素属性,阻碍了人类对所见可视化结果的信任。因此,我们提出了一种易于实施但有效的方法来纠正这一缺陷:平滑噪声规范注意(SNNA)。我们通过转换值向量的范数对注意力进行加权,并使用注意力梯度引导标签特定的信号,然后随机采样输入扰动并平均相应的梯度以生成无噪声的属性。与以往的工作中在二元或多类分类任务上评估解释方法不同,我们在本文中探讨了更复杂的多标签分类情景,即驾驶动作预测任务,并为其专门训练了一个模型。定性和定量评估结果显示,与其他最先进的基于注意力的可解释方法相比,SNNA在生成更清晰的视觉解释图和排名输入像素重要性方面具有优势。

更新时间: 2024-07-08 19:21:24

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2407.06339v1

Exploring the Latest LLMs for Leaderboard Extraction

The rapid advancements in Large Language Models (LLMs) have opened new avenues for automating complex tasks in AI research. This paper investigates the efficacy of different LLMs (Mistral 7B, Llama-2, GPT-4-Turbo, and GPT-4o) in extracting leaderboard information from empirical AI research articles. We explore three types of contextual inputs to the models: DocTAET (Document Title, Abstract, Experimental Setup, and Tabular Information), DocREC (Results, Experiments, and Conclusions), and DocFULL (entire document). Our comprehensive study evaluates the performance of these models in generating (Task, Dataset, Metric, Score) quadruples from research papers. The findings reveal significant insights into the strengths and limitations of each model and context type, providing valuable guidance for future AI research automation efforts.

Updated: 2024-07-08 19:04:26

标题: 探索用于排行榜提取的最新LLMs

摘要: 大型语言模型(LLMs)的快速发展为自动化AI研究中的复杂任务开辟了新的途径。本文调查了不同LLMs(Mistral 7B、Llama-2、GPT-4-Turbo和GPT-4o)在从实证AI研究文章中提取排行榜信息方面的效力。我们探讨了模型的三种上下文输入类型:DocTAET(文档标题、摘要、实验设置和表格信息)、DocREC(结果、实验和结论)和DocFULL(整个文档)。我们的综合研究评估了这些模型在从研究论文中生成(任务、数据集、指标、分数)四元组方面的性能。研究结果揭示了每个模型和上下文类型的优势和局限性,为未来的AI研究自动化工作提供了有价值的指导。

更新时间: 2024-07-08 19:04:26

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.04383v2

Inductive Link Prediction in Knowledge Graphs using Path-based Neural Networks

Link prediction is a crucial research area in knowledge graphs, with many downstream applications. In many real-world scenarios, inductive link prediction is required, where predictions have to be made among unseen entities. Embedding-based models usually need fine-tuning on new entity embeddings, and hence are difficult to be directly applied to inductive link prediction tasks. Logical rules captured by rule-based models can be directly applied to new entities with the same graph topologies, but the captured rules are discrete and usually lack generality. Graph neural networks (GNNs) can generalize topological information to new graphs taking advantage of deep neural networks, which however may still need fine-tuning on new entity embeddings. In this paper, we propose SiaILP, a path-based model for inductive link prediction using siamese neural networks. Our model only depends on relation and path embeddings, which can be generalized to new entities without fine-tuning. Experiments show that our model achieves several new state-of-the-art performances in link prediction tasks using inductive versions of WN18RR, FB15k-237, and Nell995. Our code is available at \url{https://github.com/canlinzhang/SiaILP}.

Updated: 2024-07-08 19:01:47

标题: 知识图谱中基于路径的神经网络的归纳链接预测

摘要: 链接预测是知识图中一个关键的研究领域,具有许多下游应用。在许多现实场景中,需要归纳式链接预测,即在未见实体之间进行预测。基于嵌入的模型通常需要对新实体嵌入进行微调,因此难以直接应用于归纳式链接预测任务。基于规则的模型捕获的逻辑规则可以直接应用于具有相同图拓扑的新实体,但所捕获的规则是离散的,并且通常缺乏泛化性。图神经网络(GNNs)可以利用深度神经网络将拓扑信息推广到新图,然而可能仍需要对新实体嵌入进行微调。在本文中,我们提出了SiaILP,一个基于路径的模型,使用孪生神经网络进行归纳式链接预测。我们的模型仅依赖于关系和路径嵌入,可以推广到新实体而无需微调。实验表明,我们的模型在使用WN18RR、FB15k-237和Nell995的归纳版本进行链接预测任务时取得了多项新的最先进性能。我们的代码可在\url{https://github.com/canlinzhang/SiaILP}上找到。

更新时间: 2024-07-08 19:01:47

领域: cs.LG

下载: http://arxiv.org/abs/2312.10293v2

Double-Ended Synthesis Planning with Goal-Constrained Bidirectional Search

Computer-aided synthesis planning (CASP) algorithms have demonstrated expert-level abilities in planning retrosynthetic routes to molecules of low to moderate complexity. However, current search methods assume the sufficiency of reaching arbitrary building blocks, failing to address the common real-world constraint where using specific molecules is desired. To this end, we present a formulation of synthesis planning with starting material constraints. Under this formulation, we propose Double-Ended Synthesis Planning (DESP), a novel CASP algorithm under a bidirectional graph search scheme that interleaves expansions from the target and from the goal starting materials to ensure constraint satisfiability. The search algorithm is guided by a goal-conditioned cost network learned offline from a partially observed hypergraph of valid chemical reactions. We demonstrate the utility of DESP in improving solve rates and reducing the number of search expansions by biasing synthesis planning towards expert goals on multiple new benchmarks. DESP can make use of existing one-step retrosynthesis models, and we anticipate its performance to scale as these one-step model capabilities improve.

Updated: 2024-07-08 18:56:00

标题: 双端合成规划与目标约束的双向搜索

摘要: 计算机辅助合成规划(CASP)算法已经展示出在规划低至中等复杂度分子的逆合成路径方面具有专家级能力。然而,当前的搜索方法假设达到任意构建块就足够了,未能解决使用特定分子的常见现实约束。为此,我们提出了一种具有起始物质约束的合成规划形式。在这种形式下,我们提出了双向合成规划(DESP),这是一种新颖的CASP算法,采用双向图搜索方案,交替扩展目标和起始材料,以确保约束可满足性。该搜索算法由一个从部分观察到的有效化学反应的超图中离线学习的目标条件成本网络引导。我们展示了DESP在多个新基准测试中通过偏向专家目标来提高解决率并减少搜索扩展次数的实用性。DESP可以利用现有的一步逆合成模型,并且我们预计随着这些一步模型能力的提高,其性能将会提升。

更新时间: 2024-07-08 18:56:00

领域: cs.AI,q-bio.QM

下载: http://arxiv.org/abs/2407.06334v1

A third-order finite difference weighted essentially non-oscillatory scheme with shallow neural network

In this paper, we introduce the finite difference weighted essentially non-oscillatory (WENO) scheme based on the neural network for hyperbolic conservation laws. We employ supervised learning and design two loss functions, one with the mean squared error and the other with the mean squared logarithmic error, where the WENO3-JS weights are computed as the labels. Each loss function consists of two components where the first component compares the difference between the weights from the neural network and WENO3-JS weights, while the second component matches the output weights of the neural network and the linear weights. The former of the loss function enforces the neural network to follow the WENO properties, implying that there is no need for the post-processing layer. Additionally, the latter leads to better performance around discontinuities. As a neural network structure, we choose the shallow neural network (SNN) for computational efficiency with the Delta layer consisting of the normalized undivided differences. These constructed WENO3-SNN schemes outperform the simulations from WENO3-JS and WENO3-Z in one-dimensional examples and show improved behavior in two-dimensional examples.
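
The labels used by both loss functions are the classical WENO3-JS weights, which can be computed directly from three neighbouring cell values with the standard formulas (epsilon is the usual small regulariser):

    import numpy as np

    def weno3_js_weights(fm1, f0, fp1, eps=1e-6):
        # Nonlinear WENO3-JS weights at the interface x_{i+1/2}, computed
        # from the cell values f_{i-1}, f_i, f_{i+1}; these serve as the
        # training labels for the shallow network described above.
        beta0 = (fp1 - f0) ** 2          # smoothness of stencil {i, i+1}
        beta1 = (f0 - fm1) ** 2          # smoothness of stencil {i-1, i}
        d0, d1 = 2.0 / 3.0, 1.0 / 3.0    # linear (optimal) weights
        a0 = d0 / (eps + beta0) ** 2
        a1 = d1 / (eps + beta1) ** 2
        return np.array([a0, a1]) / (a0 + a1)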

Updated: 2024-07-08 18:55:57

标题: 一种带有浅层神经网络的三阶有限差分加权基本无振动方案

摘要: 在本文中,我们介绍了基于神经网络的有限差分加权基本非振荡(WENO)方案,用于双曲型守恒律。我们采用监督学习并设计了两个损失函数,一个是均方误差,另一个是均方对数误差,其中WENO3-JS权重被计算为标签。每个损失函数由两个组件组成,其中第一个组件比较了神经网络和WENO3-JS权重之间的差异,而第二个组件匹配了神经网络的输出权重和线性权重。损失函数的前者强制神经网络遵循WENO属性,意味着不需要后处理层。此外,后者导致在不连续性周围表现更好。作为神经网络结构,我们选择了浅层神经网络(SNN)以提高计算效率,Delta层由归一化未分割差异组成。这些构建的WENO3-SNN方案在一维示例中显示出优于WENO3-JS和WENO3-Z模拟的结果,并在二维示例中表现出改进的行为。

更新时间: 2024-07-08 18:55:57

领域: cs.LG,cs.NA,cs.NE,math.NA

下载: http://arxiv.org/abs/2407.06333v1

Solving Multi-Model MDPs by Coordinate Ascent and Dynamic Programming

Multi-model Markov decision process (MMDP) is a promising framework for computing policies that are robust to parameter uncertainty in MDPs. MMDPs aim to find a policy that maximizes the expected return over a distribution of MDP models. Because MMDPs are NP-hard to solve, most methods resort to approximations. In this paper, we derive the policy gradient of MMDPs and propose CADP, which combines a coordinate ascent method and a dynamic programming algorithm for solving MMDPs. The main innovation of CADP compared with earlier algorithms is to take the coordinate ascent perspective to adjust model weights iteratively to guarantee monotone policy improvements to a local maximum. A theoretical analysis of CADP proves that it never performs worse than previous dynamic programming algorithms like WSU. Our numerical results indicate that CADP substantially outperforms existing methods on several benchmark problems.

Updated: 2024-07-08 18:47:59

标题: 用坐标上升和动态规划方法解决多模型MDPs

摘要: 多模型马尔可夫决策过程(MMDP)是一个有希望的框架,用于计算在MDPs中对参数不确定性具有鲁棒性的策略。MMDPs旨在找到一个策略,该策略最大化MDP模型分布下的预期回报。由于MMDPs是NP难题,大多数方法采用近似方法。在本文中,我们推导了MMDPs的策略梯度,并提出了CADP,该方法结合了坐标上升方法和动态规划算法来解决MMDPs。与早期算法相比,CADP的主要创新在于采用坐标上升的视角迭代地调整模型权重,以保证对局部最大值的单调策略改进。CADP的理论分析证明它永远不会比之前的动态规划算法(如WSU)表现更差。我们的数值结果表明,CADP在几个基准问题上显著优于现有方法。

更新时间: 2024-07-08 18:47:59

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2407.06329v1

CONGO: Compressive Online Gradient Optimization with Application to Microservices Management

We address the challenge of online convex optimization where the objective function's gradient exhibits sparsity, indicating that only a small number of dimensions possess non-zero gradients. Our aim is to leverage this sparsity to obtain useful estimates of the objective function's gradient even when the only information available is a limited number of function samples. Our motivation stems from distributed queueing systems like microservices-based applications, characterized by request-response workloads. Here, each request type proceeds through a sequence of microservices to produce a response, and the resource allocation across the collection of microservices is controlled to balance end-to-end latency with resource costs. While the number of microservices is substantial, the latency function primarily reacts to resource changes in a few, rendering the gradient sparse. Our proposed method, CONGO (Compressive Online Gradient Optimization), combines simultaneous perturbation with compressive sensing to estimate gradients. We establish analytical bounds on the requisite number of compressive sensing samples per iteration to maintain bounded bias of gradient estimates, ensuring sub-linear regret. By exploiting sparsity, we reduce the samples required per iteration to match the gradient's sparsity, rather than the problem's original dimensionality. Numerical experiments and real-world microservices benchmarks demonstrate CONGO's superiority over multiple stochastic gradient descent approaches, as it quickly converges to performance comparable to policies pre-trained with workload awareness.
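
A sketch of the estimator's spirit, assuming l1-regularised least squares (scikit-learn's Lasso) as the sparse-recovery step and illustrative values for the sample count m, the step delta and the penalty alpha:

    import numpy as np
    from sklearn.linear_model import Lasso

    def compressive_gradient(f, x, m=20, delta=1e-3, alpha=1e-3, seed=0):
        # m << dim(x) simultaneous-perturbation measurements, then sparse
        # recovery of the gradient by l1-regularised regression.
        rng = np.random.default_rng(seed)
        A = rng.choice([-1.0, 1.0], size=(m, x.size))      # perturbation directions
        y = np.array([(f(x + delta * a) - f(x)) / delta    # y_i ~ <a_i, grad f(x)>
                      for a in A])
        model = Lasso(alpha=alpha, fit_intercept=False)
        model.fit(A, y)
        return model.coef_                                  # sparse gradient estimate

    # Toy check: f touches only 2 of 50 coordinates, so 20 samples suffice.
    f = lambda z: 3.0 * z[4] + 0.5 * z[17] ** 2
    g = compressive_gradient(f, np.ones(50))
    # Expect roughly g[4] ~ 3, g[17] ~ 1 and the remaining entries near 0.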

Updated: 2024-07-08 18:42:50

标题: CONGO:应用于微服务管理的压缩在线梯度优化

摘要: 我们解决了在线凸优化的挑战,其中目标函数的梯度表现出稀疏性,表明只有少数维度具有非零梯度。我们的目标是利用这种稀疏性,即使只有有限数量的函数样本可用,也能获得有用的目标函数梯度估计。我们的动机源自分布式排队系统,如基于微服务的应用程序,其特点是请求-响应工作负载。在这里,每种请求类型通过一系列微服务来生成响应,对整个微服务集合的资源分配进行控制,以平衡端到端延迟和资源成本。虽然微服务的数量很大,但延迟函数主要对少数资源变化做出反应,使得梯度稀疏化。我们提出的方法CONGO(压缩在线梯度优化)结合了同时扰动和压缩感知以估计梯度。我们在每次迭代所需的压缩感知样本数量上建立了分析界限,以保持梯度估计的有界偏差,确保次线性遗憾。通过利用稀疏性,我们减少了每次迭代所需的样本数量,以匹配梯度的稀疏性,而不是问题的原始维度。数值实验和真实世界的微服务基准测试表明,CONGO优于多种随机梯度下降方法,因为它能迅速收敛到具有工作负载感知的预训练策略性能相当的水平。

更新时间: 2024-07-08 18:42:50

领域: cs.LG,cs.DC,math.OC

下载: http://arxiv.org/abs/2407.06325v1

B'MOJO: Hybrid State Space Realizations of Foundation Models with Eidetic and Fading Memory

We describe a family of architectures to support transductive inference by allowing memory to grow to a finite but a-priori unknown bound while making efficient use of finite resources for inference. Current architectures use such resources to represent data either eidetically over a finite span ("context" in Transformers), or fading over an infinite span (in State Space Models, or SSMs). Recent hybrid architectures have combined eidetic and fading memory, but with limitations that do not allow the designer or the learning process to seamlessly modulate the two, nor to extend the eidetic memory span. We leverage ideas from Stochastic Realization Theory to develop a class of models called B'MOJO to seamlessly combine eidetic and fading memory within an elementary composable module. The overall architecture can be used to implement models that can access short-term eidetic memory "in-context," permanent structural memory "in-weights," fading memory "in-state," and long-term eidetic memory "in-storage" by natively incorporating retrieval from an asynchronously updated memory. We show that Transformers, existing SSMs such as Mamba, and hybrid architectures such as Jamba are special cases of B'MOJO and describe a basic implementation, to be open sourced, that can be stacked and scaled efficiently in hardware. We test B'MOJO on transductive inference tasks, such as associative recall, where it outperforms existing SSMs and Hybrid models; as a baseline, we test ordinary language modeling where B'MOJO achieves perplexity comparable to similarly-sized Transformers and SSMs up to 1.4B parameters, while being up to 10% faster to train. Finally, we show that B'MOJO's ability to modulate eidetic and fading memory results in better inference on longer sequences tested up to 32K tokens, four-fold the length of the longest sequences seen during training.

Updated: 2024-07-08 18:41:01

标题: B'MOJO:兼具完全记忆与渐消记忆的基础模型混合状态空间实现

摘要: 我们描述了一类支持传导推理的架构,允许内存增长到有限但先验未知的界限,同时有效利用有限资源进行推理。当前的架构使用这些资源,要么在有限跨度内完全保真地表示数据(如Transformer中的“上下文”),要么在无限跨度上使其逐渐消退(如状态空间模型,即SSM)。最近的混合架构结合了完全记忆(eidetic memory)与渐消记忆(fading memory),但存在限制:设计者或学习过程无法无缝地调节两者,也无法扩展完全记忆的跨度。我们利用随机实现理论的思想,开发了一类称为B'MOJO的模型,在一个基本的可组合模块内无缝地结合完全记忆与渐消记忆。整体架构通过原生地纳入对异步更新内存的检索,可用于实现能够访问“上下文中”的短期完全记忆、“权重中”的永久结构记忆、“状态中”的渐消记忆以及“存储中”的长期完全记忆的模型。我们证明Transformer、现有SSM(如Mamba)以及混合架构(如Jamba)都是B'MOJO的特例,并描述了一个即将开源的基本实现,可以在硬件中高效地堆叠和扩展。我们在联想回忆等传导推理任务上测试了B'MOJO,其表现优于现有的SSM和混合模型;作为基线,我们测试了普通语言建模,B'MOJO在最高14亿参数的规模上达到了与同等规模Transformer和SSM相当的困惑度,同时训练速度最多快10%。最后,我们展示了B'MOJO调节完全记忆与渐消记忆的能力带来了更好的长序列推理效果,测试序列长达32K个token,是训练期间所见最长序列长度的四倍。

更新时间: 2024-07-08 18:41:01

领域: cs.LG,cs.CL,cs.NE

下载: http://arxiv.org/abs/2407.06324v1

MagMax: Leveraging Model Merging for Seamless Continual Learning

This paper introduces a continual learning approach named MagMax, which utilizes model merging to enable large pre-trained models to continuously learn from new data without forgetting previously acquired knowledge. Distinct from traditional continual learning methods that aim to reduce forgetting during task training, MagMax combines sequential fine-tuning with a maximum magnitude weight selection for effective knowledge integration across tasks. Our initial contribution is an extensive examination of model merging techniques, revealing that simple approaches like weight averaging and random weight selection surprisingly hold up well in various continual learning contexts. More importantly, we present MagMax, a novel model-merging strategy that enables continual learning of large pre-trained models for successive tasks. Our thorough evaluation demonstrates the superiority of MagMax in various scenarios, including class- and domain-incremental learning settings.
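
The maximum-magnitude selection at the heart of the method can be sketched over PyTorch state dicts; this is our illustrative reconstruction, not the authors' released code:

    import torch

    def magmax_merge(pretrained, finetuned):
        # pretrained: state dict of the base model; finetuned: list of state
        # dicts obtained by (sequential) fine-tuning on successive tasks.
        # Per parameter entry, keep the task-vector value (finetuned minus
        # pretrained) with the largest magnitude, then add it back.
        merged = {}
        for name, w0 in pretrained.items():
            deltas = torch.stack([ft[name] - w0 for ft in finetuned])
            idx = deltas.abs().argmax(dim=0, keepdim=True)   # per-entry winner
            merged[name] = w0 + deltas.gather(0, idx).squeeze(0)
        return merged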

Updated: 2024-07-08 18:38:52

标题: MagMax:利用模型合并实现无缝的持续学习

摘要: 本文介绍了一种名为MagMax的连续学习方法,利用模型合并实现了大型预训练模型在不忘记先前获得知识的情况下持续学习新数据。与传统的连续学习方法不同,传统方法旨在减少任务训练过程中的遗忘,MagMax将顺序微调与最大幅度权重选择相结合,以实现跨任务有效知识整合。我们的初始贡献是对模型合并技术进行了广泛审查,揭示了简单方法如权重平均化和随机权重选择在各种连续学习环境中出人意料地表现良好。更重要的是,我们提出了MagMax,一种新颖的模型合并策略,实现了大型预训练模型对连续任务的持续学习。我们的彻底评估显示了MagMax在各种情景下的优越性,包括类别和领域增量学习设置。

更新时间: 2024-07-08 18:38:52

领域: cs.LG,cs.AI,cs.CV

下载: http://arxiv.org/abs/2407.06322v1

Neural Context Flows for Learning Generalizable Dynamical Systems

Neural Ordinary Differential Equations typically struggle to generalize to new dynamical behaviors created by parameter changes in the underlying system, even when the dynamics are close to previously seen behaviors. The issue gets worse when the changing parameters are unobserved, i.e., their value or influence is not directly measurable when collecting data. We introduce Neural Context Flow (NCF), a framework that encodes said unobserved parameters in a latent context vector as input to a vector field. NCFs leverage differentiability of the vector field with respect to the parameters, along with first-order Taylor expansion to allow any context vector to influence trajectories from other parameters. We validate our method and compare it to established Multi-Task and Meta-Learning alternatives, showing competitive performance in mean squared error for in-domain and out-of-distribution evaluation on the Lotka-Volterra, Glycolytic Oscillator, and Gray-Scott problems. This study holds practical implications for foundational models in science and related areas that benefit from conditional neural ODEs. Our code is openly available at https://github.com/ddrous/ncflow.

Updated: 2024-07-08 18:38:41

标题: 神经上下文流:学习可泛化的动力系统

摘要: 神经常微分方程通常很难泛化到由底层系统参数变化产生的新动态行为,即使这些动态与先前见过的行为很接近。当变化的参数未被观测到时,即在收集数据时无法直接测量其值或影响,问题会变得更加严重。我们引入了神经上下文流(NCF)框架,将这些未观测到的参数编码为潜在上下文向量,作为向量场的输入。NCF利用向量场对参数的可微性以及一阶泰勒展开,使任何上下文向量都能影响其他参数下的轨迹。我们验证了我们的方法,并将其与成熟的多任务和元学习替代方案进行比较,在Lotka-Volterra、糖酵解振荡器和Gray-Scott问题的域内与分布外评估中展现了有竞争力的均方误差性能。本研究对受益于条件神经ODE的科学及相关领域的基础模型具有实际意义。我们的代码开放获取,网址为https://github.com/ddrous/ncflow。

更新时间: 2024-07-08 18:38:41

领域: cs.LG,math.DS

下载: http://arxiv.org/abs/2405.02154v2

Open Problem: Tight Bounds for Kernelized Multi-Armed Bandits with Bernoulli Rewards

We consider Kernelized Bandits (KBs) to optimize a function $f : \mathcal{X} \rightarrow [0,1]$ belonging to the Reproducing Kernel Hilbert Space (RKHS) $\mathcal{H}_k$. Mainstream works on kernelized bandits focus on a subgaussian noise model in which observations of the form $f(\mathbf{x}_t)+\epsilon_t$, being $\epsilon_t$ a subgaussian noise, are available (Chowdhury and Gopalan, 2017). Differently, we focus on the case in which we observe realizations $y_t \sim \text{Ber}(f(\mathbf{x}_t))$ sampled from a Bernoulli distribution with parameter $f(\mathbf{x}_t)$. While the Bernoulli model has been investigated successfully in multi-armed bandits (Garivier and Capp\'e, 2011), logistic bandits (Faury et al., 2022), bandits in metric spaces (Magureanu et al., 2014), it remains an open question whether tight results can be obtained for KBs. This paper aims to draw the attention of the online learning community to this open problem.
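
The observation model that separates this setting from the subgaussian one is a single coin flip per query, as the sketch below emphasises:

    import numpy as np

    def bernoulli_feedback(f, x, rng=np.random.default_rng(0)):
        # The learner never observes f(x) + subgaussian noise, only a
        # single draw y ~ Ber(f(x)) in {0, 1}, with f mapping into [0, 1].
        return rng.binomial(1, f(x))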

Updated: 2024-07-08 18:38:11

标题: 开放问题:具有伯努利奖励的核化多臂老虎机的紧密界限

摘要: 我们考虑核化赌博机(KBs)来优化一个函数$f:\mathcal{X} \rightarrow [0,1]$,该函数属于再生核希尔伯特空间(RKHS)$\mathcal{H}_k$。主流的核化赌博机研究关注于一个次高斯噪声模型,在该模型中形式为$f(\mathbf{x}_t)+\epsilon_t$的观测可用,其中$\epsilon_t$为一个次高斯噪声(Chowdhury和Gopalan,2017)。相反,我们关注的是一种情况,即我们观察到从参数为$f(\mathbf{x}_t)$的伯努利分布中采样的实现$y_t \sim \text{Ber}(f(\mathbf{x}_t))$。虽然伯努利模型在多臂赌博机(Garivier和Cappé,2011)、逻辑回归赌博机(Faury等,2022)、度量空间中的赌博机(Magureanu等,2014)中已成功研究,但对于KBs是否可以获得紧密结果仍然是一个悬而未决的问题。本文旨在引起在线学习社区对这一待解问题的关注。

更新时间: 2024-07-08 18:38:11

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2407.06321v1

Enhanced Safety in Autonomous Driving: Integrating Latent State Diffusion Model for End-to-End Navigation

With the advancement of autonomous driving, ensuring safety during motion planning and navigation is becoming more and more important. However, most end-to-end planning methods suffer from a lack of safety. This research addresses the safety issue in the control optimization problem of autonomous driving, formulated as Constrained Markov Decision Processes (CMDPs). We propose a novel, model-based approach for policy optimization, utilizing a conditional Value-at-Risk based Soft Actor Critic to manage constraints in complex, high-dimensional state spaces effectively. Our method introduces a worst-case actor to guide safe exploration, ensuring rigorous adherence to safety requirements even in unpredictable scenarios. The policy optimization employs the Augmented Lagrangian method and leverages latent diffusion models to predict and simulate future trajectories. This dual approach not only aids in navigating environments safely but also refines the policy's performance by integrating distribution modeling to account for environmental uncertainties. Empirical evaluations conducted in both simulated and real environment demonstrate that our approach outperforms existing methods in terms of safety, efficiency, and decision-making capabilities.

Updated: 2024-07-08 18:32:40

标题: 自动驾驶中的增强安全性:将潜在状态扩散模型集成到端到端导航中

摘要: 随着自动驾驶技术的进步,确保在运动规划和导航过程中的安全性变得越来越重要。然而,大多数端到端规划方法存在安全性不足的问题。本研究解决了自动驾驶控制优化问题中的安全性问题,将其构建为约束马尔可夫决策过程(CMDPs)。我们提出了一种新颖的基于模型的策略优化方法,利用基于条件风险价值的软演员评论家来有效管理复杂、高维状态空间中的约束。我们的方法引入了最坏情况演员来引导安全探索,确保即使在不可预测的情况下也严格遵守安全要求。策略优化采用增广拉格朗日方法,并利用潜在扩散模型来预测和模拟未来轨迹。这种双重方法不仅有助于安全地导航环境,还通过整合分布建模以考虑环境不确定性来改进策略的性能。在模拟和实际环境中进行的实证评估表明,我们的方法在安全性、效率和决策能力方面优于现有方法。

更新时间: 2024-07-08 18:32:40

领域: cs.AI,cs.CV,cs.RO

下载: http://arxiv.org/abs/2407.06317v1

Shedding More Light on Robust Classifiers under the lens of Energy-based Models

By reinterpreting a robust discriminative classifier as Energy-based Model (EBM), we offer a new take on the dynamics of adversarial training (AT). Our analysis of the energy landscape during AT reveals that untargeted attacks generate adversarial images much more in-distribution (lower energy) than the original data from the point of view of the model. Conversely, we observe the opposite for targeted attacks. On the ground of our thorough analysis, we present new theoretical and practical results that show how interpreting AT energy dynamics unlocks a better understanding: (1) AT dynamic is governed by three phases and robust overfitting occurs in the third phase with a drastic divergence between natural and adversarial energies (2) by rewriting the loss of TRadeoff-inspired Adversarial DEfense via Surrogate-loss minimization (TRADES) in terms of energies, we show that TRADES implicitly alleviates overfitting by means of aligning the natural energy with the adversarial one (3) we empirically show that all recent state-of-the-art robust classifiers are smoothing the energy landscape and we reconcile a variety of studies about understanding AT and weighting the loss function under the umbrella of EBMs. Motivated by rigorous evidence, we propose Weighted Energy Adversarial Training (WEAT), a novel sample weighting scheme that yields robust accuracy matching the state-of-the-art on multiple benchmarks such as CIFAR-10 and SVHN and going beyond in CIFAR-100 and Tiny-ImageNet. We further show that robust classifiers vary in the intensity and quality of their generative capabilities, and offer a simple method to push this capability, reaching a remarkable Inception Score (IS) and FID using a robust classifier without training for generative modeling. The code to reproduce our results is available at http://github.com/OmnAI-Lab/Robust-Classifiers-under-the-lens-of-EBM/ .
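
The reinterpretation rests on reading a classifier's logits as negative energies; the marginal energy commonly used in this line of work, and assumed here to match the paper's usage, is:

    import torch

    def classifier_energy(logits):
        # E(x) = -logsumexp_y f(x)[y]: lower energy means the classifier
        # treats x as more in-distribution, which is the quantity used to
        # compare natural, untargeted and targeted adversarial images.
        return -torch.logsumexp(logits, dim=-1)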

Updated: 2024-07-08 18:31:19

标题: 从基于能量的模型视角进一步审视鲁棒分类器

摘要: 通过将强大的判别分类器重新解释为基于能量的模型(EBM),我们对对抗训练(AT)动态提出了新的看法。我们对AT期间能量景观的分析揭示,从模型的角度来看,无目标攻击比原始数据生成的对抗图像更加分布在内部(能量更低)。相反,我们观察到有针对性的攻击的情况相反。在我们彻底的分析基础上,我们提出了新的理论和实践结果,展示了如何解释AT能量动态可以带来更好的理解:(1)AT动态受三个阶段的控制,在第三阶段发生了鲁棒过拟合,自然能量和对抗能量之间出现了明显的分歧(2)通过将TRadeoff-inspired Adversarial DEfense via Surrogate-loss minimization(TRADES)的损失重写为能量的形式,我们展示了TRADES通过将自然能量与对抗能量对齐的方式隐含地减轻了过拟合(3)我们在实验中展示了所有最近的最先进鲁棒分类器都在平滑能量景观,我们通过EBMs的统一框架调和了关于理解AT和加权损失函数的各种研究。受到严格证据的激励,我们提出了加权能量对抗训练(WEAT),这是一种新颖的样本加权方案,可以在诸如CIFAR-10和SVHN等多个基准测试中达到与最先进技术相匹配的鲁棒准确性,同时在CIFAR-100和Tiny-ImageNet中取得更大进展。我们进一步展示了鲁棒分类器在其生成能力的强度和质量上存在差异,并提供了一种简单的方法来提高这种能力,使用一个经过训练的鲁棒分类器达到了显著的Inception Score(IS)和FID,而无需进行生成建模训练。我们的结果的代码可在http://github.com/OmnAI-Lab/Robust-Classifiers-under-the-lens-of-EBM/ 上找到。

更新时间: 2024-07-08 18:31:19

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2407.06315v1

BeHonest: Benchmarking Honesty in Large Language Models

Previous works on Large Language Models (LLMs) have mainly focused on evaluating their helpfulness or harmlessness. However, honesty, another crucial alignment criterion, has received relatively less attention. Dishonest behaviors in LLMs, such as spreading misinformation and defrauding users, present severe risks that intensify as these models approach superintelligent levels. Enhancing honesty in LLMs addresses critical limitations and helps uncover latent capabilities that are not readily expressed. This underscores the urgent need for reliable methods and benchmarks to effectively ensure and evaluate the honesty of LLMs. In this paper, we introduce BeHonest, a pioneering benchmark specifically designed to assess honesty in LLMs comprehensively. BeHonest evaluates three essential aspects of honesty: awareness of knowledge boundaries, avoidance of deceit, and consistency in responses. Building on this foundation, we designed 10 scenarios to evaluate and analyze 9 popular LLMs on the market, including both closed-source and open-source models from different model families with varied model sizes. Our findings indicate that there is still significant room for improvement in the honesty of LLMs. We encourage the AI community to prioritize honesty alignment in these models, which can harness their full potential to benefit society while preventing them from causing harm through deception or inconsistency. Our benchmark and code can be found at: \url{https://github.com/GAIR-NLP/BeHonest}.

Updated: 2024-07-08 18:29:58

标题: BeHonest:大型语言模型诚实性的基准测试

摘要: 以前关于大型语言模型(LLMs)的工作主要集中在评估它们的帮助性或无害性上。然而,诚实性,另一个关键的对齐标准,受到相对较少的关注。LLMs中的不诚实行为,如传播错误信息和欺诈用户,带来严重风险,随着这些模型接近超级智能水平,这些风险加剧。增强LLMs的诚实性解决了关键限制,并有助于揭示不容易表达的潜在能力。这凸显了迫切需要可靠的方法和基准,以有效确保和评估LLMs的诚实性。 在本文中,我们介绍了BeHonest,一个专门设计用于全面评估LLMs诚实性的开创性基准。BeHonest评估诚实性的三个关键方面:对知识边界的认识,避免欺骗,以及回应的一致性。在此基础上,我们设计了10个场景来评估和分析市场上的9种流行LLMs,包括来自不同模型系列的闭源和开源模型,具有不同的模型大小。我们的研究结果表明LLMs的诚实性仍有显著改进空间。我们鼓励人工智能社区将诚实性对齐置于这些模型的首要位置,这可以利用它们的全部潜力造福社会,同时防止它们通过欺骗或不一致性造成伤害。我们的基准和代码可以在以下网址找到:\url{https://github.com/GAIR-NLP/BeHonest}。

更新时间: 2024-07-08 18:29:58

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.13261v3

Limits and Powers of Koopman Learning

Dynamical systems provide a comprehensive way to study complex and changing behaviors across various sciences. Many modern systems are too complicated to analyze directly or we do not have access to models, driving significant interest in learning methods. Koopman operators have emerged as a dominant approach because they allow the study of nonlinear dynamics using linear techniques by solving an infinite-dimensional spectral problem. However, current algorithms face challenges such as lack of convergence, hindering practical progress. This paper addresses a fundamental open question: \textit{When can we robustly learn the spectral properties of Koopman operators from trajectory data of dynamical systems, and when can we not?} Understanding these boundaries is crucial for analysis, applications, and designing algorithms. We establish a foundational approach that combines computational analysis and ergodic theory, revealing the first fundamental barriers -- universal for any algorithm -- associated with system geometry and complexity, regardless of data quality and quantity. For instance, we demonstrate well-behaved smooth dynamical systems on tori where non-trivial eigenfunctions of the Koopman operator cannot be determined by any sequence of (even randomized) algorithms, even with unlimited training data. Additionally, we identify when learning is possible and introduce optimal algorithms with verification that overcome issues in standard methods. These results pave the way for a sharp classification theory of data-driven dynamical systems based on how many limits are needed to solve a problem. These limits characterize all previous methods, presenting a unified view. Our framework systematically determines when and how Koopman spectral properties can be learned.
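
For orientation, the kind of data-driven procedure such barriers apply to is exemplified by extended DMD (EDMD), the standard least-squares approximation of the Koopman operator from snapshot data:

    import numpy as np

    def edmd_eigenvalues(X, Y, psi):
        # X, Y: arrays of snapshot pairs (x_t, x_{t+1}); psi: a dictionary
        # of observables mapping a state to an N-vector. Solve K = G^+ A.
        PX = np.array([psi(x) for x in X])
        PY = np.array([psi(y) for y in Y])
        G = PX.T @ PX                       # Gram matrix of the dictionary
        A = PX.T @ PY
        K = np.linalg.pinv(G) @ A           # least-squares Koopman matrix
        return np.linalg.eigvals(K)         # candidate Koopman eigenvalues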

Updated: 2024-07-08 18:24:48

标题: Koopman学习的限制和能力

摘要: 动力系统提供了研究复杂和变化行为的全面方法,涵盖了各种科学领域。许多现代系统过于复杂,无法直接分析,或者我们无法获得模型,这促使人们对学习方法产生了浓厚兴趣。Koopman算子已经成为主导方法,因为它允许通过求解一个无限维谱问题,使用线性技术来研究非线性动力学。然而,当前算法面临缺乏收敛性等挑战,阻碍了实际进展。本文解决了一个基本的开放问题:\textit{我们何时可以从动力系统的轨迹数据中稳健地学习Koopman算子的谱性质,何时不行?}了解这些界限对于分析、应用和算法设计至关重要。我们建立了一种结合计算分析和遍历理论的基础方法,揭示了与系统几何和复杂性相关的第一批基本障碍(对任何算法都普遍成立),无论数据质量和数量如何。例如,我们展示了环面上性质良好的光滑动力系统,其Koopman算子的非平凡特征函数无法由任何(甚至是随机化的)算法序列确定,即使拥有无限的训练数据。此外,我们确定了何时学习是可能的,并引入了带验证的最优算法,克服了标准方法中的问题。这些结果为基于解决问题所需极限次数的数据驱动动力系统的精确分类理论铺平了道路。这些极限刻画了所有先前的方法,呈现出统一的视角。我们的框架系统地确定了何时以及如何学习Koopman谱性质。

更新时间: 2024-07-08 18:24:48

领域: math.DS,cs.LG,cs.NA,math.NA,math.OC,math.SP

下载: http://arxiv.org/abs/2407.06312v1

Homogeneous Speaker Features for On-the-Fly Dysarthric and Elderly Speaker Adaptation

The application of data-intensive automatic speech recognition (ASR) technologies to dysarthric and elderly adult speech is confronted by their mismatch against healthy and nonaged voices, data scarcity and large speaker-level variability. To this end, this paper proposes two novel data-efficient methods to learn homogeneous dysarthric and elderly speaker-level features for rapid, on-the-fly test-time adaptation of DNN/TDNN and Conformer ASR models. These include: 1) speaker-level variance-regularized spectral basis embedding (VR-SBE) features that exploit a special regularization term to enforce homogeneity of speaker features in adaptation; and 2) feature-based learning hidden unit contributions (f-LHUC) transforms that are conditioned on VR-SBE features. Experiments are conducted on four tasks across two languages: the English UASpeech and TORGO dysarthric speech datasets, the English DementiaBank Pitt and Cantonese JCCOCC MoCA elderly speech corpora. The proposed on-the-fly speaker adaptation techniques consistently outperform baseline iVector and xVector adaptation by statistically significant word or character error rate reductions up to 5.32% absolute (18.57% relative) and batch-mode LHUC speaker adaptation by 2.24% absolute (9.20% relative), while operating with real-time factors speeding up to 33.6 times against xVectors during adaptation. The efficacy of the proposed adaptation techniques is demonstrated in a comparison against current ASR technologies including SSL pre-trained systems on UASpeech, where our best system produces a state-of-the-art WER of 23.33%. Analyses show VR-SBE features and f-LHUC transforms are insensitive to speaker-level data quantity in test-time adaptation. T-SNE visualization reveals they have stronger speaker-level homogeneity than baseline iVectors, xVectors and batch-mode LHUC transforms.

Updated: 2024-07-08 18:20:24

标题: 用于构音障碍与老年说话人即时自适应的同质说话人特征

摘要: 数据密集型自动语音识别(ASR)技术在运用于运动障碍和老年成年人的语音时,面临着与健康和非老年声音的不匹配、数据稀缺和大量说话者级别变异的挑战。为此,本文提出了两种新颖的数据高效方法,用于学习同质性的运动障碍和老年说话者级别特征,以便在DNN/TDNN和Conformer ASR模型的测试时快速进行自适应。这些方法包括:1)说话者级别方差正则化的谱基础嵌入(VR-SBE)特征,利用特殊的正则化项来强制在自适应中使说话者特征同质化;和2)基于特征学习隐藏单元贡献(f-LHUC)变换,条件是VR-SBE特征。实验在两种语言的四个任务上进行,包括英语UASpeech和TORGO运动障碍语音数据集,英语DementiaBank Pitt和粤语JCCOCC MoCA老年语音语料库。提出的实时说话者自适应技术在统计上明显优于基线iVector和xVector自适应,绝对减少了高达5.32%的字或字符错误率(相对18.57%),批处理模式下的LHUC说话者自适应也降低了2.24%的绝对值(相对9.20%),同时在自适应过程中实时因子加快了高达33.6倍,而xVectors在自适应过程中。提出的自适应技术的有效性在与当前ASR技术(包括UASpeech上的SSL预训练系统)进行比较时得到了证实,我们的最佳系统产生了23.33%的最先进的WER。分析显示,在测试时自适应过程中,VR-SBE特征和f-LHUC转换对说话者级别数据数量不敏感。T-SNE可视化显示它们比基线iVectors、xVectors和批处理模式下的LHUC转换具有更强的说话者级别同质性。

更新时间: 2024-07-08 18:20:24

领域: cs.SD,cs.AI,cs.HC,cs.LG,eess.AS

下载: http://arxiv.org/abs/2407.06310v1

Multimodal Chain-of-Thought Reasoning via ChatGPT to Protect Children from Age-Inappropriate Apps

Mobile applications (Apps) could expose children to inappropriate themes such as sexual content, violence, and drug use. Maturity rating offers a quick and effective method for potential users, particularly guardians, to assess the maturity levels of apps. Determining accurate maturity ratings for mobile apps is essential to protect children's health in today's saturated digital marketplace. Existing approaches to maturity rating are either inaccurate (e.g., self-reported rating by developers) or costly (e.g., manual examination). In the literature, there are few text-mining-based approaches to maturity rating. However, each app typically involves multiple modalities, namely app description in the text, and screenshots in the image. In this paper, we present a framework for determining app maturity levels that utilize multimodal large language models (MLLMs), specifically ChatGPT-4 Vision. Powered by Chain-of-Thought (CoT) reasoning, our framework systematically leverages ChatGPT-4 to process multimodal app data (i.e., textual descriptions and screenshots) and guide the MLLM model through a step-by-step reasoning pathway from initial content analysis to final maturity rating determination. As a result, through explicitly incorporating CoT reasoning, our framework enables ChatGPT to understand better and apply maturity policies to facilitate maturity rating. Experimental results indicate that the proposed method outperforms all baseline models and other fusion strategies.

Updated: 2024-07-08 18:20:10

标题: 通过ChatGPT进行多模态思维链推理,以保护儿童免受年龄不当的应用程序的影响

摘要: 移动应用程序(Apps)可能会让儿童接触到不当主题,如性内容、暴力和药物使用。成熟度评级为潜在用户,特别是监护人,提供了一种快速有效的方法来评估应用程序的成熟水平。确定移动应用程序的准确成熟度评级对于保护儿童在今天饱和的数字市场中的健康至关重要。现有的成熟度评级方法要么不准确(例如,开发人员自报评级),要么成本高昂(例如,手动检查)。在文献中,基于文本挖掘的成熟度评级方法很少。然而,每个应用程序通常涉及多种形式,即文本中的应用程序描述和图像中的截图。在本文中,我们提出了一个利用多模态大型语言模型(MLLMs),特别是ChatGPT-4 Vision,来确定应用程序成熟水平的框架。通过Chain-of-Thought(CoT)推理,我们的框架系统地利用ChatGPT-4来处理多模态应用程序数据(即文本描述和截图),并通过一步步的推理路径指导MLLM模型从初始内容分析到最终成熟度评级确定。因此,通过明确地整合CoT推理,我们的框架使ChatGPT能更好地理解并应用成熟度政策,以促进成熟度评级。实验结果表明,所提出的方法优于所有基线模型和其他融合策略。

更新时间: 2024-07-08 18:20:10

领域: cs.CY,cs.AI

下载: http://arxiv.org/abs/2407.06309v1

Machine Psychology: Investigating Emergent Capabilities and Behavior in Large Language Models Using Psychological Methods

Large language models (LLMs) are currently at the forefront of intertwining AI systems with human communication and everyday life. Due to rapid technological advances and their extreme versatility, LLMs nowadays have millions of users and are at the cusp of being the main go-to technology for information retrieval, content generation, problem-solving, etc. Therefore, it is of great importance to thoroughly assess and scrutinize their capabilities. Due to increasingly complex and novel behavioral patterns in current LLMs, this can be done by treating them as participants in psychology experiments that were originally designed to test humans. For this purpose, the paper introduces a new field of research called "machine psychology". The paper outlines how different subfields of psychology can inform behavioral tests for LLMs. It defines methodological standards for machine psychology research, especially by focusing on policies for prompt designs. Additionally, it describes how behavioral patterns discovered in LLMs are to be interpreted. In sum, machine psychology aims to discover emergent abilities in LLMs that cannot be detected by most traditional natural language processing benchmarks.

Updated: 2024-07-08 18:15:13

标题: 机器心理学:使用心理学方法调查大型语言模型中的新兴能力和行为

摘要: 大型语言模型(LLMs)目前处于将人工智能系统与人类沟通和日常生活紧密联系在一起的前沿。由于技术的快速发展和它们极高的多功能性,LLMs现在拥有数百万用户,并且正在成为信息检索、内容生成、问题解决等主要技术。因此,深入评估和审查它们的能力是非常重要的。由于当前LLMs中出现越来越复杂和新颖的行为模式,可以将它们视为原本设计用于测试人类的心理实验的参与者来进行评估。为此,本文介绍了一个名为“机器心理学”的新研究领域。本文概述了心理学不同子领域如何为LLMs的行为测试提供信息。它定义了机器心理学研究的方法论标准,特别关注提示设计的政策。此外,它描述了如何解释LLMs中发现的行为模式。总的来说,机器心理学旨在发现LLMs中的新兴能力,这些能力大多数传统自然语言处理基准无法检测到。

更新时间: 2024-07-08 18:15:13

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2303.13988v5

VIMI: Grounding Video Generation through Multi-modal Instruction

Existing text-to-video diffusion models rely solely on text-only encoders for their pretraining. This limitation stems from the absence of large-scale multimodal prompt video datasets, resulting in a lack of visual grounding and restricting their versatility and application in multimodal integration. To address this, we construct a large-scale multimodal prompt dataset by employing retrieval methods to pair in-context examples with the given text prompts and then utilize a two-stage training strategy to enable diverse video generation tasks within the same model. In the first stage, we propose a multimodal conditional video generation framework for pretraining on these augmented datasets, establishing a foundational model for grounded video generation. In the second stage, we finetune the model from the first stage on three video generation tasks, incorporating multi-modal instructions. This process further refines the model's ability to handle diverse inputs and tasks, ensuring seamless integration of multi-modal information. After this two-stage training process, VIMI demonstrates multimodal understanding capabilities, producing contextually rich and personalized videos grounded in the provided inputs, as shown in Figure 1. Compared to previous visually grounded video generation methods, VIMI can synthesize consistent and temporally coherent videos with large motion while retaining semantic control. Lastly, VIMI also achieves state-of-the-art text-to-video generation results on the UCF101 benchmark.

Updated: 2024-07-08 18:12:49

标题: VIMI: 通过多模态指导实现视频生成的基础

摘要: 现有的文本到视频扩散模型仅依赖于仅用于预训练的文本编码器。这一限制源于缺乏大规模多模态提示视频数据集,导致缺乏视觉基础并限制了它们在多模态集成中的多功能性和应用。为了解决这一问题,我们通过利用检索方法将上下文示例与给定的文本提示配对,构建了一个大规模的多模态提示数据集,然后利用两阶段训练策略在同一模型内实现多样化视频生成任务。在第一阶段,我们提出了一个多模态条件视频生成框架,用于在这些增强数据集上进行预训练,为基于基础的视频生成模型建立基础。其次,我们在第一阶段对模型进行微调,涵盖多模态指令的三个视频生成任务。该过程进一步提高了模型处理多样化输入和任务的能力,确保了多模态信息的无缝集成。经过这两阶段的训练过程,VIMI展示了多模态理解能力,产生了基于提供的输入的丰富上下文和个性化视频,如图1所示。与以往的视觉基础视频生成方法相比,VIMI能够合成具有大运动的一致和时间上连贯的视频,同时保留语义控制。最后,VIMI还在UCF101基准测试上取得了最先进的文本到视频生成结果。

更新时间: 2024-07-08 18:12:49

领域: cs.CV,cs.AI,cs.CL

下载: http://arxiv.org/abs/2407.06304v1

Unsupervised Fault Detection using SAM with a Moving Window Approach

Automated fault detection and monitoring in engineering are critical but frequently difficult owing to the necessity of collecting and labeling large amounts of defective samples. We present an unsupervised method that uses the high-end Segment Anything Model (SAM) and a moving-window approach. SAM has gained recognition in AI image segmentation communities for its accuracy and versatility. However, its performance can be inconsistent when dealing with certain unexpected shapes, such as shadows and subtle surface irregularities. This limitation raises concerns about its applicability for fault detection in real-world scenarios. We aim to overcome these challenges without requiring fine-tuning or labeled data. Our technique divides pictures into smaller windows, which are subsequently processed using SAM. This increases the accuracy of fault identification by focusing on localized details. We compute the sizes of the segmented sections and then use a clustering technique to discover consistent fault areas while filtering out noise. To further improve the method's robustness, we propose adding the Exponentially Weighted Moving Average (EWMA) technique for continuous monitoring in industrial settings, which would improve the method's capacity to trace faults over time. We compare our method to various well-established methods using a real case study, where our model achieves 0.96 accuracy compared to 0.85 for the second-best method. We also compare our method using two open-source datasets, where our model attains a consistent 0.86 accuracy across the datasets compared to 0.53 and 0.54 for the second-best models.
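
A minimal sketch of that pipeline is given below, with SAM abstracted behind a hypothetical segment_with_sam() wrapper that returns boolean masks for a window; the window size, DBSCAN parameters, and EWMA smoothing factor are illustrative choices, not the paper's tuned values.

```python
# Moving-window segmentation, size-based clustering, and EWMA monitoring.
import numpy as np
from sklearn.cluster import DBSCAN

def sliding_windows(image: np.ndarray, size: int, stride: int):
    h, w = image.shape[:2]
    for y in range(0, h - size + 1, stride):
        for x in range(0, w - size + 1, stride):
            yield (y, x), image[y:y + size, x:x + size]

def detect_faults(image, segment_with_sam, size=256, stride=128):
    areas, origins = [], []
    for (y, x), window in sliding_windows(image, size, stride):
        for mask in segment_with_sam(window):      # list of boolean masks
            areas.append(mask.sum())
            origins.append((y, x))
    # Cluster segment sizes; consistent clusters are fault candidates,
    # while unclustered points (label -1) are treated as noise and dropped.
    labels = DBSCAN(eps=50.0, min_samples=3).fit_predict(
        np.array(areas, dtype=float).reshape(-1, 1))
    return [(o, a) for o, a, l in zip(origins, areas, labels) if l != -1]

def ewma(signal, alpha=0.2):
    """Exponentially weighted moving average of fault area over time."""
    smoothed, s = [], None
    for x in signal:
        s = x if s is None else alpha * x + (1 - alpha) * s
        smoothed.append(s)
    return smoothed
```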

Updated: 2024-07-08 18:12:29

标题: 无监督的故障检测:使用SAM和移动窗口方法

摘要: 工程中的自动故障检测和监测是至关重要的,但通常难以实现,因为需要收集和标记大量有缺陷的样本。我们提出了一种无监督方法,使用高级的Segment Anything Model (SAM)和移动窗口方法。SAM在AI图像分割社区中因其准确性和多功能性而得到认可。然而,当处理某些意外形状(如阴影和微妙的表面不规则性)时,其性能可能不稳定。这种限制引发了对其在实际场景中故障检测适用性的担忧。我们旨在克服这些挑战,而不需要精细调整或标记数据。我们的技术将图片分成较小的窗口,随后使用SAM进行处理。通过专注于局部细节,这增加了故障识别的准确性。我们计算分段部分的大小,然后使用聚类技术发现一致的故障区域,同时过滤出噪音。为了进一步提高方法的鲁棒性,我们建议在工业环境中添加指数加权移动平均(EWMA)技术进行持续监测,这将提高方法随时间追踪故障的能力。我们将我们的方法与各种已建立的方法进行比较,使用一个真实案例研究,我们的模型达到了0.96的准确率,而第二好的方法为0.85。我们还使用两个开源数据集比较我们的方法,在这些数据集上,我们的模型在不同数据集上均达到了稳定的0.86准确率,而第二好的模型分别为0.53和0.54。

更新时间: 2024-07-08 18:12:29

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2407.06303v1

Enhancing Software Supply Chain Resilience: Strategy For Mitigating Software Supply Chain Security Risks And Ensuring Security Continuity In Development Lifecycle

This article delves into the strategic approaches and preventive measures necessary to safeguard the software supply chain against evolving threats. It aims to foster an understanding of the challenges and vulnerabilities inherent in software supply chain resilience and to promote transparency and trust in the digital infrastructure that underpins contemporary society. By examining the concept of software supply chain resilience and assessing the current state of supply chain security, the article provides a foundation for discussing strategies and practices that can mitigate security risks and ensure security continuity throughout the development lifecycle. Through this comprehensive analysis, the article contributes to the ongoing effort to strengthen the security posture of software supply chains, thereby ensuring the reliable and secure operation of digital systems in a connected world.

Updated: 2024-07-08 18:10:47

标题: 加强软件供应链韧性:减轻软件供应链安全风险并确保开发生命周期安全持续性的策略

摘要: 这篇文章深入探讨了保护软件供应链免受不断演变的威胁所需的战略方法和预防措施。其目的是促进对软件供应链弹性中存在的挑战和脆弱性的理解,并在支撑当代社会的数字基础设施中促进透明度和信任。通过审视软件供应链弹性的概念并评估供应链安全的当前状况,该文章建立了讨论可以减轻安全风险并确保整个开发生命周期中安全连续性的策略和实践的基础。通过这种全面分析,该文章为加强软件供应链安全姿态的持续努力做出了贡献,从而确保数字系统在连接的世界中可靠且安全地运行。

更新时间: 2024-07-08 18:10:47

领域: cs.CR

下载: http://arxiv.org/abs/2407.13785v1

Decoding Human Activities: Analyzing Wearable Accelerometer and Gyroscope Data for Activity Recognition

A person's movement or relative positioning can be effectively captured by different types of sensors and corresponding sensor output can be utilized in various manipulative techniques for the classification of different human activities. This letter proposes an effective scheme for human activity recognition, which introduces two unique approaches within a multi-structural architecture, named FusionActNet. The first approach aims to capture the static and dynamic behavior of a particular action by using two dedicated residual networks and the second approach facilitates the final decision-making process by introducing a guidance module. A two-stage training process is designed where at the first stage, residual networks are pre-trained separately by using static (where the human body is immobile) and dynamic (involving movement of the human body) data. In the next stage, the guidance module along with the pre-trained static or dynamic models are used to train the given sensor data. Here the guidance module learns to emphasize the most relevant prediction vector obtained from the static or dynamic models, which helps to effectively classify different human activities. The proposed scheme is evaluated using two benchmark datasets and compared with state-of-the-art methods. The results clearly demonstrate that our method outperforms existing approaches in terms of accuracy, precision, recall, and F1 score, achieving 97.35% and 95.35% accuracy on the UCI HAR and Motion-Sense datasets, respectively which highlights both the effectiveness and stability of the proposed scheme.

Updated: 2024-07-08 18:09:11

标题: 解码人类活动:分析可穿戴加速度计和陀螺仪数据以进行活动识别

摘要: 一个人的移动或相对定位可以被不同类型的传感器有效捕捉,并且相应的传感器输出可以被用于各种分类不同人类活动的操纵技术。本信函提出了一种有效的人类活动识别方案,该方案在一个多结构架构内引入了两种独特的方法,名为FusionActNet。第一种方法旨在通过使用两个专门的残差网络来捕捉特定动作的静态和动态行为,第二种方法通过引入一个指导模块来促进最终的决策过程。设计了一个两阶段训练过程,在第一阶段,通过使用静态(人体静止不动)和动态(涉及人体运动)数据分别对残差网络进行预训练。在下一个阶段,指导模块以及预训练的静态或动态模型被用来训练给定的传感器数据。在这里,指导模块学习强调从静态或动态模型中获得的最相关的预测向量,有助于有效分类不同的人类活动。所提出的方案使用两个基准数据集进行评估,并与现有的方法进行比较。结果清楚地表明,我们的方法在准确性、精确度、召回率和F1得分方面优于现有方法,在UCI HAR和Motion-Sense数据集上分别达到了97.35%和95.35%的准确率,这突显了该方案的有效性和稳定性。

更新时间: 2024-07-08 18:09:11

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2310.02011v3

Multi-Label Plant Species Classification with Self-Supervised Vision Transformers

We present a transfer learning approach using a self-supervised Vision Transformer (DINOv2) for the PlantCLEF 2024 competition, focusing on the multi-label plant species classification. Our method leverages both base and fine-tuned DINOv2 models to extract generalized feature embeddings. We train classifiers to predict multiple plant species within a single image using these rich embeddings. To address the computational challenges of the large-scale dataset, we employ Spark for distributed data processing, ensuring efficient memory management and processing across a cluster of workers. Our data processing pipeline transforms images into grids of tiles, classifying each tile, and aggregating these predictions into a consolidated set of probabilities. Our results demonstrate the efficacy of combining transfer learning with advanced data processing techniques for multi-label image classification tasks. Our code is available at https://github.com/dsgt-kaggle-clef/plantclef-2024.
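
A condensed sketch of the core idea follows: DINOv2 embeddings feeding a multi-label linear classifier, with per-tile predictions aggregated into one probability vector. Loading DINOv2 through torch.hub follows the public facebookresearch/dinov2 repo; the label count, aggregation rule, and omission of the Spark plumbing are simplifications.

```python
# DINOv2 feature extraction plus a multi-label linear head.
import torch
import torch.nn as nn

backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vitb14")
backbone.eval()

num_species = 100                          # placeholder for the PlantCLEF label space
classifier = nn.Linear(768, num_species)   # ViT-B/14 has a 768-dim embedding
# classifier would be trained with nn.BCEWithLogitsLoss on precomputed embeddings

def predict_tile(tile: torch.Tensor) -> torch.Tensor:
    """tile: (1, 3, 224, 224) normalized image -> per-species probabilities."""
    with torch.no_grad():
        emb = backbone(tile)               # (1, 768) CLS embedding
    return torch.sigmoid(classifier(emb))  # independent sigmoid per label

def aggregate(tile_probs: list[torch.Tensor]) -> torch.Tensor:
    """Consolidate per-tile predictions, e.g. by max-pooling across tiles."""
    return torch.stack(tile_probs).amax(dim=0)
```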

Updated: 2024-07-08 18:07:33

标题: 使用自监督视觉转换器进行多标签植物物种分类

摘要: 我们提出了一种使用自监督Vision Transformer(DINOv2)的迁移学习方法,针对PlantCLEF 2024竞赛,重点是多标签植物物种分类。我们的方法利用基础和微调的DINOv2模型提取广义特征嵌入。我们训练分类器,使用这些丰富的嵌入来预测单个图像中的多个植物物种。为了解决大规模数据集的计算挑战,我们使用Spark进行分布式数据处理,确保在工作节点集群中进行高效的内存管理和处理。我们的数据处理管道将图像转换为瓦片网格,对每个瓦片进行分类,并将这些预测聚合成一组综合概率。我们的结果表明,将迁移学习与先进的数据处理技术相结合,对于多标签图像分类任务非常有效。我们的代码可在https://github.com/dsgt-kaggle-clef/plantclef-2024找到。

更新时间: 2024-07-08 18:07:33

领域: cs.CV,cs.IR,cs.LG

下载: http://arxiv.org/abs/2407.06298v1

Engineering morphogenesis of cell clusters with differentiable programming

Understanding the rules underlying organismal development is a major unsolved problem in biology. Each cell in a developing organism responds to signals in its local environment by dividing, excreting, consuming, or reorganizing, yet how these individual actions coordinate over a macroscopic number of cells to grow complex structures with exquisite functionality is unknown. Here we use recent advances in automatic differentiation to discover local interaction rules and genetic networks that yield emergent, systems-level characteristics in a model of development. We consider a growing tissue in which cellular interactions are mediated by morphogen diffusion, differential cell adhesion, and mechanical stress. Each cell has an internal genetic network that it uses to make decisions based on its local environment. We show that one can simultaneously learn the parameters governing the cell interactions and the genetic network for complex developmental scenarios, including the symmetry breaking of an embryo from an initial cell, the creation of emergent chemical gradients, homogenization of growth via mechanical stress, programmed growth into a prespecified shape, and the ability to repair from damage. When combined with recent experimental advances measuring the spatio-temporal dynamics and gene expression of cells in a growing tissue, the methodology outlined here offers a promising path to unravelling the cellular basis of development.

Updated: 2024-07-08 18:05:11

标题: 使用可区分编程工程细胞簇的形态发生

摘要: 理解有机体发育背后的规则是生物学中一个尚未解决的重大问题。发育中的每个细胞都会对其局部环境中的信号做出反应,通过分裂、排泄、消耗或重组来响应,然而如何协调这些个体行为,使宏观数量的细胞共同生长出具有精妙功能的复杂结构仍然不为人知。在这里,我们利用自动微分的最新进展,发现模型发展中产生的局部相互作用规则和遗传网络,从而形成系统级特征。我们考虑一个由形态因子扩散、细胞粘附差异和机械应力介导的细胞相互作用的生长组织。每个细胞都有一个内部遗传网络,根据其局部环境做出决策。我们展示了可以同时学习控制细胞相互作用和遗传网络的参数,以模拟复杂的发育情况,包括从一个初始细胞中产生胚胎的对称破坏、产生新的化学梯度、通过机械应力均匀发育、按照预定形状有序发育以及从损伤中修复的能力。当与最近的实验进展结合,测量生长组织中细胞的时空动态和基因表达时,这里概述的方法提供了一个有希望的途径来揭示发展的细胞基础。

更新时间: 2024-07-08 18:05:11

领域: q-bio.CB,cs.LG

下载: http://arxiv.org/abs/2407.06295v1

Hybrid X-Linker: Automated Data Generation and Extreme Multi-label Ranking for Biomedical Entity Linking

State-of-the-art deep learning entity linking methods rely on extensive human-labelled data, which is costly to acquire. Current datasets are limited in size, leading to inadequate coverage of biomedical concepts and diminished performance when applied to new data. In this work, we propose to automatically generate data to create large-scale training datasets, which allows the exploration of approaches originally developed for extreme multi-label ranking in the biomedical entity linking task. We propose the hybrid X-Linker pipeline, which includes different modules to link disease and chemical entity mentions to concepts in the MEDIC and CTD-Chemical vocabularies, respectively. X-Linker was evaluated on several biomedical datasets: BC5CDR-Disease, BioRED-Disease, NCBI-Disease, BC5CDR-Chemical, BioRED-Chemical, and NLM-Chem, achieving top-1 accuracies of 0.8307, 0.7969, 0.8271, 0.9511, 0.9248, and 0.7895, respectively. X-Linker demonstrated superior performance on three datasets: BC5CDR-Disease, NCBI-Disease, and BioRED-Chemical. In contrast, SapBERT outperformed X-Linker on the remaining three datasets. Both models rely only on the mention string for their operations. The source code of X-Linker and its associated data are publicly available for performing biomedical entity linking without requiring pre-labelled entities with identifiers from specific knowledge organization systems.

Updated: 2024-07-08 18:04:22

标题: 混合X-链接器:用于生物医学实体链接的自动化数据生成和极端多标签排名

摘要: 目前最先进的深度学习实体链接方法依赖于大量人工标注的数据,这种获取成本高昂。当前数据集规模有限,导致对生物医学概念覆盖不足,并在应用于新数据时性能不佳。在本研究中,我们提出自动生成数据以创建大规模训练数据集,这允许探索最初为生物医学实体链接任务开发的极端多标签排名方法。我们提出了混合 X-Linker 流水线,包括不同模块将疾病和化学实体提及链接到 MEDIC 和 CTD-Chemical 词汇表中的概念。X-Linker 在多个生物医学数据集上进行了评估:BC5CDR-Disease、BioRED-Disease、NCBI-Disease、BC5CDR-Chemical、BioRED-Chemical 和 NLM-Chem,在这些数据集中分别实现了0.8307、0.7969、0.8271、0.9511、0.9248 和 0.7895 的 top-1 准确率。X-Linker 在三个数据集中表现出卓越的性能:BC5CDR-Disease、NCBI-Disease 和 BioRED-Chemical。相比之下,SapBERT 在其余三个数据集中胜过 X-Linker。这两个模型仅依赖于提及字符串进行操作。X-Linker 的源代码及其相关数据可公开获取,可用于进行生物医学实体链接,无需预先标记具有特定知识组织系统标识符的实体。

更新时间: 2024-07-08 18:04:22

领域: cs.CL,cs.AI,cs.DL

下载: http://arxiv.org/abs/2407.06292v1

Characterization of topological structures in different neural network architectures

One of the most crucial tasks in the future will be to understand what is going on in neural networks, as they will become even more powerful and widely deployed. This work aims to use TDA methods to analyze neural representations. We develop methods for analyzing representations from different architectures and check how one should use them to obtain valid results. Our findings indicate that removing outliers does not have much impact on the results and that we should compare representations with the same number of elements. We applied these methods for ResNet, VGG19, and ViT architectures and found substantial differences along with some similarities. Additionally, we determined that models with similar architecture tend to have a similar topology of representations and models with a larger number of layers change their topology more smoothly. Furthermore, we found that the topology of pre-trained and finetuned models starts to differ in the middle and final layers while remaining quite similar in the initial layers. These findings demonstrate the efficacy of TDA in the analysis of neural network behavior.
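
For concreteness, a sketch of the basic workflow is given below: compute persistence diagrams for a layer's representations, subsampling to a common point count since, as noted above, compared representations should have the same number of elements. It assumes the ripser package; the sample size and homology dimension are illustrative.

```python
# Persistent homology of hidden representations with ripser.
import numpy as np
from ripser import ripser

def persistence_diagrams(activations: np.ndarray, n_points: int = 512, maxdim: int = 1):
    """activations: (n_samples, n_features) representations for one layer."""
    idx = np.random.choice(len(activations), size=n_points, replace=False)
    return ripser(activations[idx], maxdim=maxdim)["dgms"]

# e.g. dgms_resnet = persistence_diagrams(resnet_layer4_feats)
#      dgms_vit    = persistence_diagrams(vit_block11_feats)
# Diagrams can then be compared with a bottleneck or Wasserstein distance
# (e.g. persim.bottleneck from the companion persim package).
```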

Updated: 2024-07-08 18:02:18

标题: 不同神经网络结构中拓扑结构的表征

摘要: 未来最关键的任务之一将是理解神经网络中正在发生的事情,因为它们将变得更加强大和广泛应用。本研究旨在使用TDA方法分析神经表示。我们开发了用于分析不同架构表示的方法,并检查如何使用它们获得有效结果。我们的发现表明,去除异常值对结果影响不大,应该比较具有相同元素数量的表示。我们应用了这些方法来分析ResNet、VGG19和ViT架构,并发现了一些重大差异以及一些相似之处。此外,我们确定了具有相似架构的模型倾向于具有类似的表示拓扑结构,而具有更多层的模型会更平滑地改变其拓扑结构。此外,我们发现预训练和微调模型的拓扑结构在中间和最终层开始有所不同,而在初始层中保持相似。这些发现表明TDA在分析神经网络行为方面的有效性。

更新时间: 2024-07-08 18:02:18

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2407.06286v1

Multi-Object Hallucination in Vision-Language Models

Large vision language models (LVLMs) often suffer from object hallucination, producing objects not present in the given images. While current benchmarks for object hallucination primarily concentrate on the presence of a single object class rather than individual entities, this work systematically investigates multi-object hallucination, examining how models misperceive (e.g., invent nonexistent objects or become distracted) when tasked with focusing on multiple objects simultaneously. We introduce Recognition-based Object Probing Evaluation (ROPE), an automated evaluation protocol that considers the distribution of object classes within a single image during testing and uses visual referring prompts to eliminate ambiguity. With comprehensive empirical studies and analysis of potential factors leading to multi-object hallucination, we found that (1) LVLMs suffer more hallucinations when focusing on multiple objects compared to a single object. (2) The tested object class distribution affects hallucination behaviors, indicating that LVLMs may follow shortcuts and spurious correlations. (3) Hallucinatory behaviors are influenced by data-specific factors, salience and frequency, and model intrinsic behaviors. We hope to enable LVLMs to recognize and reason about multiple objects that often occur in realistic visual scenes, provide insights, and quantify our progress towards mitigating the issues.

Updated: 2024-07-08 17:59:57

标题: 视觉-语言模型中的多对象幻觉

摘要: 大视觉语言模型(LVLMs)经常受到物体幻觉的困扰,会产生不在给定图像中的物体。虽然当前关于物体幻觉的基准主要集中在单个对象类的存在上,而不是个体实体,但这项工作系统地调查了多对象幻觉,研究了模型在同时关注多个对象时如何错误地理解(例如,发明不存在的物体或分散注意力)。我们引入了基于识别的对象探测评估(ROPE),这是一个自动化评估协议,考虑了在测试过程中单个图像中对象类的分布,并使用视觉指代提示来消除歧义。通过全面的实证研究和分析导致多对象幻觉的潜在因素,我们发现(1)与关注单个对象相比,LVLMs在关注多个对象时更容易产生幻觉。 (2)测试的对象类分布影响幻觉行为,表明LVLMs可能会遵循捷径和虚假相关性。(3) 幻觉行为受数据特定因素、显著性和频率以及模型内在行为的影响。我们希望使LVLMs能够识别和推理出现在现实视觉场景中的多个对象,提供洞见,并量化我们在减轻问题方面的进展。

更新时间: 2024-07-08 17:59:57

领域: cs.CV,cs.AI,cs.CL

下载: http://arxiv.org/abs/2407.06192v1

4D Contrastive Superflows are Dense 3D Representation Learners

In the realm of autonomous driving, accurate 3D perception is the foundation. However, developing such models relies on extensive human annotations -- a process that is both costly and labor-intensive. To address this challenge from a data representation learning perspective, we introduce SuperFlow, a novel framework designed to harness consecutive LiDAR-camera pairs for establishing spatiotemporal pretraining objectives. SuperFlow stands out by integrating two key designs: 1) a dense-to-sparse consistency regularization, which promotes insensitivity to point cloud density variations during feature learning, and 2) a flow-based contrastive learning module, carefully crafted to extract meaningful temporal cues from readily available sensor calibrations. To further boost learning efficiency, we incorporate a plug-and-play view consistency module that enhances the alignment of the knowledge distilled from camera views. Extensive comparative and ablation studies across 11 heterogeneous LiDAR datasets validate our effectiveness and superiority. Additionally, we observe several interesting emerging properties by scaling up the 2D and 3D backbones during pretraining, shedding light on the future research of 3D foundation models for LiDAR-based perception.

Updated: 2024-07-08 17:59:54

标题: 4D对比超流是密集的3D表示学习者

摘要: 在自动驾驶领域,准确的3D感知是基础。然而,开发这样的模型依赖于广泛的人工标注 —— 这是一种既昂贵又劳动密集的过程。为了从数据表示学习的角度解决这一挑战,我们引入了SuperFlow,这是一个新颖的框架,旨在利用连续的LiDAR-摄像头对建立时空预训练目标。SuperFlow的独特之处在于整合了两个关键设计:1)密集到稀疏一致性正则化,促进在特征学习过程中对点云密度变化的不敏感性;2)基于流的对比学习模块,精心设计以从现有的传感器校准中提取有意义的时间线索。为了进一步提高学习效率,我们还加入了一个即插即用的视图一致性模块,增强了从摄像头视图中提炼的知识的对齐性。对11个异构LiDAR数据集进行了广泛的比较和消融研究,验证了我们的有效性和优越性。此外,我们观察到通过扩大2D和3D骨干在预训练期间的规模,出现了一些有趣的新兴特性,为基于LiDAR的感知的3D基础模型的未来研究投下了光芒。

更新时间: 2024-07-08 17:59:54

领域: cs.CV,cs.LG,cs.RO

下载: http://arxiv.org/abs/2407.06190v1

Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision

The performance of Large Vision Language Models (LVLMs) is dependent on the size and quality of their training datasets. Existing video instruction tuning datasets lack diversity as they are derived by prompting large language models with video captions to generate question-answer pairs, and are therefore mostly descriptive. Meanwhile, many labeled video datasets with diverse labels and supervision exist; however, we find that their integration into LVLMs is non-trivial. Herein, we present Video Self-Training with augmented Reasoning (Video-STaR), the first video self-training approach. Video-STaR allows the utilization of any labeled video dataset for video instruction tuning. In Video-STaR, an LVLM cycles between instruction generation and finetuning, which we show (I) improves general video understanding and (II) adapts LVLMs to novel downstream tasks with existing supervision. During generation, an LVLM is prompted to propose an answer. The answers are then filtered to only those that contain the original video labels, and the LVLM is then re-trained on the generated dataset. By only training on generated answers that contain the correct video labels, Video-STaR utilizes these existing video labels as weak supervision for video instruction tuning. Our results demonstrate that Video-STaR-enhanced LVLMs exhibit improved performance in (I) general video QA, where TempCompass performance improved by 10%, and (II) on downstream tasks, where Video-STaR improved Kinetics700-QA accuracy by 20% and action quality assessment on FineDiving by 15%.
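
A schematic sketch of the generate-filter-retrain cycle follows. The generate_answer(), finetune(), and substring-matching helpers are hypothetical placeholders standing in for the released implementation.

```python
# Video-STaR-style self-training loop (schematic).
def video_star_cycle(lvlm, labeled_videos, rounds=3):
    for _ in range(rounds):
        kept = []
        for video, label in labeled_videos:
            answer = lvlm.generate_answer(video)      # instruction generation
            if label.lower() in answer.lower():       # keep only answers that
                kept.append((video, answer))          # contain the true label
        lvlm = lvlm.finetune(kept)                    # re-train on filtered data
    return lvlm
```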

Updated: 2024-07-08 17:59:42

标题: Video-STaR: 自我训练使视频指导调整与任何监督相匹配

摘要: 大型视觉语言模型(LVLMs)的性能取决于其训练数据集的大小和质量。现有的视频指导调整数据集缺乏多样性,因为它们是通过提示大型语言模型使用视频字幕生成问题-答案对而衍生的,因此大多数是描述性的。同时,许多具有不同标签和监督的标记视频数据集存在 - 但是,我们发现它们集成到LVLMs中是非平凡的。在这里,我们介绍了增强推理的视频自训练(Video-STaR),这是第一个视频自训练方法。Video-STaR允许利用任何标记的视频数据集进行视频指导调整。在Video-STaR中,LVLM在指导生成和微调之间循环,我们展示(I)改进了一般视频理解和(II)将LVLMs适应了具有现有监督的新任务。在生成过程中,LVLM被提示提出一个答案。然后仅对包含原始视频标签的答案进行筛选,然后LVLM在生成的数据集上重新训练。通过仅在生成的包含正确视频标签的答案上进行训练,Video-STaR利用这些现有视频标签作为视频指导调整的弱监督。我们的结果表明,经过Video-STaR增强的LVLM在(I)一般视频问答中表现出10%的改进,(II)在下游任务中,Video-STaR将Kinetics700-QA准确度提高了20%,并且FineDiving上的动作质量评估提高了15%。

更新时间: 2024-07-08 17:59:42

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2407.06189v1

Stepping on the Edge: Curvature Aware Learning Rate Tuners

Curvature information -- particularly, the largest eigenvalue of the loss Hessian, known as the sharpness -- often forms the basis for learning rate tuners. However, recent work has shown that the curvature information undergoes complex dynamics during training, going from a phase of increasing sharpness to eventual stabilization. We analyze the closed-loop feedback effect between learning rate tuning and curvature. We find that classical learning rate tuners may yield greater one-step loss reduction, yet they ultimately underperform in the long term when compared to constant learning rates in the full batch regime. These models break the stabilization of the sharpness, which we explain using a simplified model of the joint dynamics of the learning rate and the curvature. To further investigate these effects, we introduce a new learning rate tuning method, Curvature Dynamics Aware Tuning (CDAT), which prioritizes long term curvature stabilization over instantaneous progress on the objective. In the full batch regime, CDAT shows behavior akin to prefixed warm-up schedules on deep learning objectives, outperforming tuned constant learning rates. In the mini batch regime, we observe that stochasticity introduces confounding effects that explain the previous success of some learning rate tuners at appropriate batch sizes. Our findings highlight the critical role of understanding the joint dynamics of the learning rate and curvature, beyond greedy minimization, to diagnose failures and design effective adaptive learning rate tuners.
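
As a concrete illustration of the curvature signal such tuners track (not of the CDAT rule itself), the sketch below estimates the sharpness via Hessian-vector products and power iteration in PyTorch; the iteration count is illustrative.

```python
# Largest Hessian eigenvalue (sharpness) by power iteration on HVPs.
import torch

def sharpness(loss, params, iters=20):
    """Estimate the top eigenvalue of the Hessian of `loss` w.r.t. `params`."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    v = [torch.randn_like(p) for p in params]
    eig = 0.0
    for _ in range(iters):
        norm = torch.sqrt(sum((u * u).sum() for u in v))
        v = [u / norm for u in v]
        # Hessian-vector product via a second backward pass
        hv = torch.autograd.grad(
            sum((g * u).sum() for g, u in zip(grads, v)),
            params, retain_graph=True)
        eig = sum((h * u).sum() for h, u in zip(hv, v)).item()  # Rayleigh quotient
        v = [h.detach() for h in hv]
    return eig
```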

Updated: 2024-07-08 17:56:00

标题: 踩在边缘上:曲率感知学习率调节器

摘要: 曲率信息,特别是损失Hessian的最大特征值,即被称为锐度的信息,通常构成学习率调节器的基础。然而,最近的研究表明,在训练过程中,曲率信息经历复杂动态,从增加锐度的阶段到最终稳定。我们分析了学习率调整和曲率之间的闭环反馈效应。我们发现,经典的学习率调节器可能会产生更大的一步损失减少,但与全批次制度下的恒定学习率相比,它们在长期内表现不佳。这些模型打破了锐度的稳定性,我们使用学习率和曲率的联合动态的简化模型来解释这一现象。为了进一步研究这些效应,我们引入了一种新的学习率调整方法,Curvature Dynamics Aware Tuning(CDAT),该方法优先考虑目标上的长期曲率稳定性,而不是瞬时进展。在全批次制度下,CDAT表现出类似于深度学习目标上的预设热身计划的行为,优于调整后的恒定学习率。在小批次制度下,我们观察到随机性引入了混淆效应,解释了一些学习率调节器在适当批次大小下之前成功的原因。我们的发现突出了理解学习率和曲率的联合动态的关键作用,超越贪婪最小化,以诊断失败并设计有效的自适应学习率调节器。

更新时间: 2024-07-08 17:56:00

领域: cs.LG

下载: http://arxiv.org/abs/2407.06183v1

SimPO: Simple Preference Optimization with a Reference-Free Reward

Direct Preference Optimization (DPO) is a widely used offline preference optimization algorithm that reparameterizes reward functions in reinforcement learning from human feedback (RLHF) to enhance simplicity and training stability. In this work, we propose SimPO, a simpler yet more effective approach. The effectiveness of SimPO is attributed to a key design: using the average log probability of a sequence as the implicit reward. This reward formulation better aligns with model generation and eliminates the need for a reference model, making it more compute and memory efficient. Additionally, we introduce a target reward margin to the Bradley-Terry objective to encourage a larger margin between the winning and losing responses, further enhancing the algorithm's performance. We compare SimPO to DPO and its latest variants across various state-of-the-art training setups, including both base and instruction-tuned models like Mistral and Llama3. We evaluated on extensive instruction-following benchmarks, including AlpacaEval 2, MT-Bench, and the recent challenging Arena-Hard benchmark. Our results demonstrate that SimPO consistently and significantly outperforms existing approaches without substantially increasing response length. Specifically, SimPO outperforms DPO by up to 6.4 points on AlpacaEval 2 and by up to 7.5 points on Arena-Hard. Our top-performing model, built on Llama3-8B-Instruct, achieves a remarkable 53.7 length-controlled win rate on AlpacaEval 2 -- surpassing Claude 3 Opus on the leaderboard, and a 36.5 win rate on Arena-Hard -- making it the strongest 8B open-source model.
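
The objective as described reduces to a few lines; the sketch below uses the length-averaged log probability as the implicit reward and adds the target margin gamma to the Bradley-Terry objective. Tensor shapes and hyperparameter values are illustrative.

```python
# Minimal SimPO loss: reference-free, length-normalized reward with a margin.
import torch
import torch.nn.functional as F

def simpo_loss(logp_chosen, len_chosen, logp_rejected, len_rejected,
               beta=2.0, gamma=1.0):
    """logp_*: summed token log-probs of each response; len_*: response lengths."""
    r_chosen = beta * logp_chosen / len_chosen        # average log-prob reward
    r_rejected = beta * logp_rejected / len_rejected  # no reference model needed
    return -F.logsigmoid(r_chosen - r_rejected - gamma).mean()
```

Note the contrast with DPO, which would additionally require log-probabilities from a frozen reference model; dropping that term is what makes the reward reference-free.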

Updated: 2024-07-08 17:55:24

标题: SimPO:简单偏好优化与无参考奖励

摘要: 直接偏好优化(DPO)是一种广泛使用的离线偏好优化算法,它重新参数化了来自人类反馈的强化学习中的奖励函数,以增强简单性和训练稳定性。在这项工作中,我们提出了SimPO,这是一种更简单但更有效的方法。SimPO的有效性归因于一个关键设计:使用序列的平均对数概率作为隐式奖励。这种奖励制定更符合模型生成,并消除了对参考模型的需求,使其在计算和内存效率上更高。此外,我们在Bradley-Terry目标中引入目标奖励边际,以鼓励在获胜和失败响应之间有更大的边际,进一步提升算法的性能。我们将SimPO与DPO及其最新变体进行比较,包括各种最先进的训练设置,包括基本模型和指导调整模型,如Mistral和Llama3。我们在广泛的指令跟随基准测试中进行了评估,包括AlpacaEval 2、MT-Bench和最近具有挑战性的Arena-Hard基准测试。我们的结果表明,SimPO在不显著增加响应长度的情况下,始终明显优于现有方法。具体而言,在AlpacaEval 2上,SimPO的表现比DPO高出最多6.4个点,在Arena-Hard上高出最多7.5个点。我们的表现最佳模型,建立在Llama3-8B-Instruct上,实现了在AlpacaEval 2上惊人的53.7长度控制胜率,超过了排行榜上的Claude 3 Opus,以及在Arena-Hard上的36.5胜率,使其成为最强大的8B开源模型。

更新时间: 2024-07-08 17:55:24

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2405.14734v2

Transfer Learning with Self-Supervised Vision Transformers for Snake Identification

We present our approach for the SnakeCLEF 2024 competition to predict snake species from images. We explore and use Meta's DINOv2 vision transformer model for feature extraction to tackle species' high variability and visual similarity in a dataset of 182,261 images. We perform exploratory analysis on embeddings to understand their structure, and train a linear classifier on the embeddings to predict species. Despite achieving a score of 39.69, our results show promise for DINOv2 embeddings in snake identification. All code for this project is available at https://github.com/dsgt-kaggle-clef/snakeclef-2024.

Updated: 2024-07-08 17:52:23

标题: 使用自监督视觉Transformer进行蛇类识别的迁移学习

摘要: 我们提出了我们在SnakeCLEF 2024比赛中用于从图像中预测蛇种的方法。我们探索并使用Meta的DINOv2视觉变压器模型进行特征提取,以应对数据集中182,261张图像中物种高变异性和视觉相似性的挑战。我们对嵌入进行了探索性分析,以了解它们的结构,并在嵌入上训练线性分类器来预测物种。尽管取得了39.69的分数,但我们的结果显示了DINOv2嵌入在蛇种识别中的潜力。此项目的所有代码都可以在https://github.com/dsgt-kaggle-clef/snakeclef-2024上找到。

更新时间: 2024-07-08 17:52:23

领域: cs.CV,cs.IR,cs.LG

下载: http://arxiv.org/abs/2407.06178v1

Vision-Language Models under Cultural and Inclusive Considerations

Large vision-language models (VLMs) can assist visually impaired people by describing images from their daily lives. Current evaluation datasets may not reflect diverse cultural user backgrounds or the situational context of this use case. To address this problem, we create a survey to determine caption preferences and propose a culture-centric evaluation benchmark by filtering VizWiz, an existing dataset with images taken by people who are blind. We then evaluate several VLMs, investigating their reliability as visual assistants in a culturally diverse setting. While our results for state-of-the-art models are promising, we identify challenges such as hallucination and misalignment of automatic evaluation metrics with human judgment. We make our survey, data, code, and model outputs publicly available.

Updated: 2024-07-08 17:50:00

标题: 跨文化和包容性考虑下的视觉语言模型

摘要: 大型视觉语言模型(VLMs)可以通过描述盲人日常生活中的图像来帮助视力受损的人。目前的评估数据集可能不能反映出不同文化用户背景或这种使用情境的多样性。为了解决这个问题,我们创建了一项调查来确定字幕偏好,并通过过滤由盲人拍摄的图像组成的现有数据集VizWiz,提出了一个以文化为中心的评估基准。然后我们评估了几个VLMs,探讨它们在文化多样性环境中作为视觉助手的可靠性。虽然我们对最先进的模型的结果令人鼓舞,但我们发现了一些挑战,比如幻觉和自动评估指标与人类判断的不一致。我们将我们的调查、数据、代码和模型输出公开提供。

更新时间: 2024-07-08 17:50:00

领域: cs.CV,cs.AI,cs.CL,cs.CY

下载: http://arxiv.org/abs/2407.06177v1

Byzantine-Resilient Secure Aggregation for Federated Learning Without Privacy Compromises

Federated learning (FL) shows great promise in large scale machine learning, but brings new risks in terms of privacy and security. We propose ByITFL, a novel scheme for FL that provides resilience against Byzantine users while keeping the users' data private from the federator and private from other users. The scheme builds on the preexisting non-private FLTrust scheme, which tolerates malicious users through trust scores (TS) that attenuate or amplify the users' gradients. The trust scores are based on the ReLU function, which we approximate by a polynomial. The distributed and privacy-preserving computation in ByITFL is designed using a combination of Lagrange coded computing, verifiable secret sharing and re-randomization steps. ByITFL is the first Byzantine resilient scheme for FL with full information-theoretic privacy.

Updated: 2024-07-08 17:48:43

标题: 拜占庭弹性安全聚合:无隐私妥协的联邦学习

摘要: 联邦学习(FL)在大规模机器学习中表现出巨大的潜力,但也带来了新的隐私和安全风险。我们提出了ByITFL,这是一种新颖的FL方案,可以在抵御拜占庭用户的同时保护用户数据不被联合学习器和其他用户访问。该方案基于现有的非私有FLTrust方案,通过信任分数(TS)来容忍恶意用户,这些分数可以衰减或放大用户的梯度。信任分数基于ReLU函数,我们通过多项式来近似该函数。ByITFL的分布式和隐私保护计算使用拉格朗日编码计算、可验证秘密共享和重新随机化步骤相结合设计。ByITFL是首个具有完整信息理论隐私性的拜占庭容错FL方案。

更新时间: 2024-07-08 17:48:43

领域: cs.IT,cs.CR,cs.DC,cs.LG,math.IT

下载: http://arxiv.org/abs/2405.08698v2

On Speeding Up Language Model Evaluation

Large language models (LLMs) currently dominate the field of natural language processing (NLP), representing the state-of-the-art across a diverse array of tasks. Developing a model of this nature, from training to inference, requires making numerous decisions which define a combinatorial search problem. For example, selecting the optimal pre-trained LLM, prompt, or hyperparameters to attain the best performance for a task often requires evaluating multiple candidates on an entire test set. This exhaustive evaluation can be time-consuming and costly, as both inference and metric computation with LLMs are resource-intensive. In this paper, we address the challenge of identifying the best method within a limited budget for evaluating methods on test examples. By leveraging the well-studied multi-armed bandit framework, which sequentially selects the next method-example pair to evaluate, our approach, combining multi-armed bandit algorithms with low-rank factorization, significantly reduces the required resources. Experiments show that our algorithms can identify the top-performing method using only 5-15% of the typically needed resources, resulting in an 85-95% reduction in cost.
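
A toy sketch of the budgeted-evaluation idea is given below: a UCB-style bandit picks which (method, example) pair to score next, and a low-rank model of the partially observed score matrix could replace the running means used here. All names and constants are illustrative, not the paper's algorithm.

```python
# UCB1 over methods, spending one evaluation per round of the budget.
import numpy as np

def ucb_evaluate(score_fn, n_methods, n_examples, budget):
    counts = np.zeros(n_methods)
    means = np.zeros(n_methods)
    for t in range(1, budget + 1):
        ucb = means + np.sqrt(2 * np.log(t) / np.maximum(counts, 1))
        ucb[counts == 0] = np.inf            # evaluate each method at least once
        m = int(np.argmax(ucb))
        x = np.random.randint(n_examples)    # sample an example for this method
        s = score_fn(m, x)                   # one inference + metric computation
        counts[m] += 1
        means[m] += (s - means[m]) / counts[m]
    return int(np.argmax(means))             # best method under the budget
```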

Updated: 2024-07-08 17:48:42

标题: 加速语言模型评估

摘要: 大型语言模型(LLMs)目前主导着自然语言处理(NLP)领域,代表了在各种任务中的最新技术。开发这种类型的模型,从训练到推理,需要做出许多定义组合搜索问题的决策。例如,选择最佳的预训练LLM、提示或超参数以获得任务的最佳性能通常需要在整个测试集上评估多个候选项。这种详尽的评估可能耗时且昂贵,因为LLMs的推理和度量计算都需要大量资源。在本文中,我们解决了在有限预算内确定在测试示例上评估方法的最佳方法的挑战。通过利用广泛研究的多臂老虎机框架,该框架顺序选择下一个方法-示例对进行评估,我们的方法结合了多臂老虎机算法和低秩因子分解,显著减少了所需的资源。实验证明,我们的算法只需使用通常所需资源的5-15\%,就能识别出表现最优秀的方法,从而降低成本85-95%。

更新时间: 2024-07-08 17:48:42

领域: cs.AI,cs.CL

下载: http://arxiv.org/abs/2407.06172v1

Potential Based Diffusion Motion Planning

Effective motion planning in high-dimensional spaces is a long-standing open problem in robotics. One class of traditional motion planning algorithms corresponds to potential-based motion planning. An advantage of potential-based motion planning is composability: different motion constraints can be easily combined by adding their corresponding potentials. However, constructing motion paths from potentials requires solving a global optimization across the configuration-space potential landscape, which is often prone to local minima. We propose a new approach to learned potential-based motion planning, in which we train a neural network to capture and learn easily optimizable potentials over motion planning trajectories. We illustrate the effectiveness of this approach, significantly outperforming both classical and recent learned motion planning approaches and avoiding issues with local minima. We further illustrate its inherent composability, enabling us to generalize to a multitude of different motion constraints.
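
The composability property is easy to see in a toy example: two potentials (goal attraction, obstacle repulsion) are summed, and a path follows the negative gradient of the composite. The learned potentials in the paper replace these hand-written ones; everything below is a stand-in to illustrate composition by addition.

```python
# Composing potentials by addition and planning by gradient descent.
import numpy as np

def goal_potential(q, goal):
    return 0.5 * np.sum((q - goal) ** 2)

def obstacle_potential(q, center, radius, weight=10.0):
    d = np.linalg.norm(q - center)
    return weight * max(0.0, radius - d) ** 2

def composite_grad(q, goal, obstacles, eps=1e-4):
    def U(p):
        return goal_potential(p, goal) + sum(
            obstacle_potential(p, c, r) for c, r in obstacles)
    g = np.zeros_like(q)
    for i in range(len(q)):                  # finite-difference gradient
        dq = np.zeros_like(q); dq[i] = eps
        g[i] = (U(q + dq) - U(q - dq)) / (2 * eps)
    return g

q = np.array([0.0, 0.0])
goal = np.array([5.0, 5.0])
obstacles = [(np.array([2.5, 2.5]), 1.0)]
for _ in range(500):                         # descend the composite landscape
    q = q - 0.01 * composite_grad(q, goal, obstacles)
```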

Updated: 2024-07-08 17:48:39

标题: 基于潜力的扩散运动规划

摘要: 在高维空间中有效的运动规划是机器人领域长期存在的一个开放性问题。一类传统的运动规划算法是基于潜力的运动规划。基于潜力的运动规划的一个优点是可组合性 - 不同的运动约束可以通过添加相应的潜力轻松地结合在一起。然而,从潜力中构建运动路径需要解决配置空间潜力景观中的全局优化问题,这往往容易陷入局部最小值。我们提出了一种新的学习基于潜力的运动规划方法,通过训练神经网络来捕捉和学习运动规划轨迹上易于优化的潜力。我们证明了这种方法的有效性,明显优于传统和最近学习的运动规划方法,并避免了局部最小值的问题。我们进一步说明了其固有的可组合性,使我们能够推广到多种不同的运动约束。

更新时间: 2024-07-08 17:48:39

领域: cs.RO,cs.CV,cs.LG

下载: http://arxiv.org/abs/2407.06169v1

DεpS: Delayed ε-Shrinking for Faster Once-For-All Training

CNNs are increasingly deployed across different hardware, dynamic environments, and low-power embedded devices. This has led to the design and training of CNN architectures with the goal of maximizing accuracy subject to such variable deployment constraints. As the number of deployment scenarios grows, there is a need to find scalable solutions to design and train specialized CNNs. Once-for-all training has emerged as a scalable approach that jointly co-trains many models (subnets) at once with a constant training cost and finds specialized CNNs later. The scalability is achieved by training the full model and simultaneously reducing it to smaller subnets that share model weights (weight-shared shrinking). However, existing once-for-all training approaches incur huge training costs, reaching 1200 GPU hours. We argue this is because they either start the process of shrinking the full model too early or too late. Hence, we propose Delayed ε-Shrinking (DεpS), which starts the process of shrinking the full model when it is partially trained (~50%), leading to improved training cost and better in-place knowledge distillation to smaller models. The proposed approach also consists of novel heuristics that dynamically adjust subnet learning rates incrementally (E), leading to improved weight-shared knowledge distillation from larger to smaller subnets as well. As a result, DεpS outperforms state-of-the-art once-for-all training techniques across different datasets, including CIFAR10/100, ImageNet-100, and ImageNet-1k, on accuracy and cost. It achieves 1.83% higher ImageNet-1k top-1 accuracy, or the same accuracy with a 1.3x reduction in FLOPs and a 2.5x drop in training cost (GPU*hrs).

Updated: 2024-07-08 17:45:40

标题: DεpS:延迟ε-收缩以加快一次性全面训练

摘要: 卷积神经网络(CNNs)越来越广泛地部署在不同的硬件、动态环境和低功耗嵌入式设备上。这导致设计和训练CNN架构的目标是最大化准确性,同时考虑这些不同的部署约束。随着部署场景数量的增加,有必要找到可扩展的解决方案来设计和训练专门的CNN。一次性训练已经成为一种可扩展的方法,同时以恒定的训练成本一次性联合训练许多模型(子网络),然后找到专门的CNN。通过训练完整模型并同时将其缩小为共享模型权重的较小子网络来实现可扩展性(权重共享收缩)。然而,现有的一次性训练方法会导致巨大的训练成本,达到1200个GPU小时。我们认为这是因为它们要么在太早开始缩小完整模型的过程,要么在太晚开始。因此,我们提出了延迟 $\epsilon$-收缩(D$\epsilon$pS)方法,当完整模型部分训练(约50%)时开始缩小完整模型的过程,从而实现训练成本的改善,并且更好地将知识蒸馏到更小的模型中。所提出的方法还包括新颖的启发式方法,逐步动态调整子网络的学习率(E),从而实现从更大到更小子网络的改进的共享权重知识蒸馏。结果,DEpS在准确性和成本方面优于各种不同数据集上的最先进的一次性训练技术,包括CIFAR10/100、ImageNet-100和ImageNet-1k。它在ImageNet-1k top1准确率上达到1.83%更高,或者在FLOPs减少了1.3倍和训练成本(GPU*小时)减少了2.5倍的情况下实现相同的准确性。

更新时间: 2024-07-08 17:45:40

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2407.06167v1

Improving Alignment and Robustness with Circuit Breakers

AI systems can take harmful actions and are highly vulnerable to adversarial attacks. We present an approach, inspired by recent advances in representation engineering, that interrupts the models as they respond with harmful outputs with "circuit breakers." Existing techniques aimed at improving alignment, such as refusal training, are often bypassed. Techniques such as adversarial training try to plug these holes by countering specific attacks. As an alternative to refusal training and adversarial training, circuit-breaking directly controls the representations that are responsible for harmful outputs in the first place. Our technique can be applied to both text-only and multimodal language models to prevent the generation of harmful outputs without sacrificing utility -- even in the presence of powerful unseen attacks. Notably, while adversarial robustness in standalone image recognition remains an open challenge, circuit breakers allow the larger multimodal system to reliably withstand image "hijacks" that aim to produce harmful content. Finally, we extend our approach to AI agents, demonstrating considerable reductions in the rate of harmful actions when they are under attack. Our approach represents a significant step forward in the development of reliable safeguards to harmful behavior and adversarial attacks.
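
As a conceptual sketch only (not the paper's trained procedure), the idea of interrupting harmful representations can be pictured as a forward hook that damps activity along a precomputed "harmful" direction in a hidden layer. The direction, layer choice, and threshold below are assumptions, and the hooked module is assumed to return a plain activation tensor rather than a tuple.

```python
# Illustrative representation-level intervention via a PyTorch forward hook.
import torch

def make_circuit_breaker(harm_direction: torch.Tensor, threshold: float = 0.5):
    d = harm_direction / harm_direction.norm()
    def hook(module, inputs, output):
        proj = (output * d).sum(dim=-1, keepdim=True)   # component along d
        mask = (proj.abs() > threshold).float()
        return output - mask * proj * d                 # remove it when triggered
    return hook

# e.g. layer.register_forward_hook(make_circuit_breaker(direction))
```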

Updated: 2024-07-08 17:42:41

标题: 使用断路器提升对齐性和鲁棒性

摘要: 人工智能系统可能采取有害行动,并且极易受到对抗性攻击的影响。我们提出了一种方法,受到最近在表示工程方面的进展的启发,通过“断路器”干预模型在产生有害输出时的响应。现有的旨在改善对齐的技术,如拒绝训练,经常被绕过。对抗性训练等技术试图通过对抗特定攻击来填补这些漏洞。作为拒绝训练和对抗性训练的替代,断路器直接控制首次产生有害输出的表示。我们的技术可应用于纯文本和多模态语言模型,以防止产生有害输出而不损害效用--即使在强大的未知攻击的情况下。值得注意的是,尽管独立图像识别中的对抗性鲁棒性仍然是一个未解决的挑战,但断路器使更大的多模态系统能够可靠地抵抗旨在产生有害内容的图像“劫持”。最后,我们将我们的方法扩展到AI代理,当它们受到攻击时,显示出有害行动率的显著降低。我们的方法代表了在开发可靠的防范有害行为和对抗性攻击方面迈出的重要一步。

更新时间: 2024-07-08 17:42:41

领域: cs.LG,cs.AI,cs.CL,cs.CV,cs.CY

下载: http://arxiv.org/abs/2406.04313v3

Cross-lingual QA: A Key to Unlocking In-context Cross-lingual Performance

Multilingual large language models (MLLMs) have demonstrated significant cross-lingual capabilities through in-context learning. Existing approaches typically construct monolingual few-shot examples, either in the source or target language. However, translating entire in-context examples into the target language might compromise contextual integrity and be costly in the case of long-context passages. To address this, we introduce Cross-lingual QA, a cross-lingual prompting method that translates only the question and answer parts, thus reducing translation costs. Experiments on four typologically diverse multilingual benchmarks show that Cross-lingual QA prompting effectively stimulates models to elicit their cross-lingual knowledge, outperforming prior monolingual few-shot prompting approaches. Furthermore, we show that prompting open-source MLLMs with cross-lingual few-shot examples enhances performance as the model scale increases.

Updated: 2024-07-08 17:34:02

标题: 跨语言问答:解锁上下文跨语言性能的关键

摘要: 多语言大型语言模型(MLLMs)通过上下文学习展示出显著的跨语言能力。现有方法通常构建单语言的少样本示例,无论是在源语言还是目标语言中。然而,将整个上下文示例翻译成目标语言可能会损害上下文的完整性,并且在长篇章节的情况下成本很高。为了解决这个问题,我们引入了跨语言问答(Cross-lingual QA),一种跨语言提示方法,只翻译问题和答案部分,从而降低了翻译成本。在四个不同类型的多语言基准测试上进行的实验表明,跨语言QA提示有效地激发了模型引出其跨语言知识,优于先前的单语少样本提示方法。此外,我们展示了用跨语言少样本示例提示开源MLLMs可以提高性能,随着模型规模的增加。

更新时间: 2024-07-08 17:34:02

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2305.15233v2

Uni-ELF: A Multi-Level Representation Learning Framework for Electrolyte Formulation Design

Advancements in lithium battery technology heavily rely on the design and engineering of electrolytes. However, current schemes for molecular design and recipe optimization of electrolytes lack an effective computational-experimental closed loop and often fall short in accurately predicting diverse electrolyte formulation properties. In this work, we introduce Uni-ELF, a novel multi-level representation learning framework to advance electrolyte design. Our approach involves two-stage pretraining: reconstructing three-dimensional molecular structures at the molecular level using the Uni-Mol model, and predicting statistical structural properties (e.g., radial distribution functions) from molecular dynamics simulations at the mixture level. Through this comprehensive pretraining, Uni-ELF is able to capture intricate molecular and mixture-level information, which significantly enhances its predictive capability. As a result, Uni-ELF substantially outperforms state-of-the-art methods in predicting both molecular properties (e.g., melting point, boiling point, synthesizability) and formulation properties (e.g., conductivity, Coulombic efficiency). Moreover, Uni-ELF can be seamlessly integrated into an automatic experimental design workflow. We believe this innovative framework will pave the way for automated AI-based electrolyte design and engineering.

Updated: 2024-07-08 17:26:49

标题: Uni-ELF:一种用于电解质配方设计的多层表示学习框架

摘要: 锂电池技术的进步在很大程度上依赖于电解质的设计和工程。然而,当前的电解质分子设计和配方优化方案缺乏有效的计算-实验闭环,往往无法准确预测不同电解质配方的性质。在这项工作中,我们介绍了Uni-ELF,这是一个新颖的多层表示学习框架,用于推进电解质设计。我们的方法涉及两阶段预训练:使用Uni-Mol模型在分子级别重建三维分子结构,并从分子动力学模拟中预测统计结构性质(例如径向分布函数)在混合物级别。通过这种全面的预训练,Uni-ELF能够捕捉复杂的分子和混合物级别信息,显著提高其预测能力。结果,Uni-ELF在预测分子性质(例如熔点、沸点、可合成性)和配方性质(例如电导率、库仑效率)方面显著优于现有方法。此外,Uni-ELF可以无缝集成到自动实验设计工作流程中。我们相信这一创新框架将为基于人工智能的自动电解质设计和工程铺平道路。

更新时间: 2024-07-08 17:26:49

领域: physics.chem-ph,cs.AI

下载: http://arxiv.org/abs/2407.06152v1

Accelerating Phase Field Simulations Through a Hybrid Adaptive Fourier Neural Operator with U-Net Backbone

Prolonged contact between a corrosive liquid and metal alloys can cause progressive dealloying. For such liquid-metal dealloying (LMD) processes, phase field models have been developed. However, the governing equations often involve coupled non-linear partial differential equations (PDEs), which are challenging to solve numerically. In particular, stiffness in the PDEs requires extremely small time steps (e.g. $10^{-12}$ or smaller). This computational bottleneck is especially problematic when running an LMD simulation until a late time horizon is required. This motivates the development of surrogate models capable of leaping forward in time by skipping several consecutive time steps at once. In this paper, we propose U-Shaped Adaptive Fourier Neural Operators (U-AFNO), a machine learning (ML) model inspired by recent advances in neural operator learning. U-AFNO employs U-Nets for extracting and reconstructing local features within the physical fields, and passes the latent space through a vision transformer (ViT) implemented in the Fourier space (AFNO). We use U-AFNOs to learn the dynamics mapping the field at a current time step into a later time step. We also identify global quantities of interest (QoIs) describing the corrosion process (e.g. the deformation of the liquid-metal interface) and show that our proposed U-AFNO model is able to accurately predict the field dynamics, in spite of the chaotic nature of LMD. Our model reproduces the key micro-structure statistics and QoIs with a level of accuracy on par with the high-fidelity numerical solver. We also investigate the opportunity of using hybrid simulations, in which we alternate forward leaps in time using the U-AFNO with high-fidelity time stepping. We demonstrate that, while advantageous for some surrogate model design choices, our proposed U-AFNO model in fully auto-regressive settings consistently outperforms hybrid schemes.
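
The hybrid rollout discussed above has a simple schematic form: alternate k surrogate leaps with one high-fidelity correction step. Here u_afno_leap() and fine_solver_step() are hypothetical stand-ins for the trained U-AFNO and the numerical solver, and k is an illustrative choice.

```python
# Alternating surrogate leaps with high-fidelity corrections (schematic).
def hybrid_rollout(field, u_afno_leap, fine_solver_step, total_steps, k=4):
    for step in range(total_steps):
        if (step + 1) % (k + 1) == 0:
            field = fine_solver_step(field)   # periodic high-fidelity correction
        else:
            field = u_afno_leap(field)        # skip many PDE time steps at once
    return field
```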

Updated: 2024-07-08 17:23:22

标题: 通过具有U-Net骨干的混合自适应傅立叶神经算子加速相场模拟

摘要: 长时间的腐蚀液体与金属合金的接触可能导致逐渐脱合。为了这种液体-金属脱合(LMD)过程,已经开发了相场模型。然而,控制方程通常涉及耦合非线性偏微分方程(PDE),这些方程在数值求解时具有挑战性。特别是,PDE中的刚度需要非常小的时间步长(例如$10^{-12}$或更小)。当需要运行LMD模拟直到较迟的时间范围时,这种计算瓶颈尤为棘手。这促使开发能够通过一次跳过多个连续时间步长来快速向前推进时间的代理模型。在本文中,我们提出了U形自适应傅里叶神经操作符(U-AFNO),这是一种受到神经操作符学习最新进展启发的机器学习(ML)模型。U-AFNO利用U型网络来提取和重建物理场中的局部特征,并通过在傅里叶空间中实现的视觉变换器(ViT)传递潜在空间(AFNO)。我们使用U-AFNO来学习将当前时间步的场映射到较晚时间步的动态。我们还确定描述腐蚀过程的全局感兴趣量(QoI)(例如液体-金属界面的变形),并展示我们提出的U-AFNO模型能够准确预测场的动态,尽管LMD具有混沌性质。我们的模型与高保真度数值求解器具有相当水平的准确性,复制了关键的微观结构统计和QoIs。我们还研究了使用混合模拟的机会,其中我们交替使用U-AFNO进行时间向前跃进和高保真度的时间步进。我们证明,虽然对于一些代理模型设计选择是有利的,但我们提出的U-AFNO模型在完全自回归设置中始终优于混合方案。

更新时间: 2024-07-08 17:23:22

领域: cs.CE,cs.CV,cs.LG,cs.NA,math.NA

下载: http://arxiv.org/abs/2406.17119v2

A Universal Growth Rate for Learning with Smooth Surrogate Losses

This paper presents a comprehensive analysis of the growth rate of $H$-consistency bounds (and excess error bounds) for various surrogate losses used in classification. We prove a square-root growth rate near zero for smooth margin-based surrogate losses in binary classification, providing both upper and lower bounds under mild assumptions. This result also translates to excess error bounds. Our lower bound requires weaker conditions than those in previous work for excess error bounds, and our upper bound is entirely novel. Moreover, we extend this analysis to multi-class classification with a series of novel results, demonstrating a universal square-root growth rate for smooth comp-sum and constrained losses, covering common choices for training neural networks in multi-class classification. Given this universal rate, we turn to the question of choosing among different surrogate losses. We first examine how $H$-consistency bounds vary across surrogates based on the number of classes. Next, ignoring constants and focusing on behavior near zero, we identify minimizability gaps as the key differentiating factor in these bounds. Thus, we thoroughly analyze these gaps, to guide surrogate loss selection, covering: comparisons across different comp-sum losses, conditions where gaps become zero, and general conditions leading to small gaps. Additionally, we demonstrate the key role of minimizability gaps in comparing excess error bounds and $H$-consistency bounds.

Updated: 2024-07-08 17:20:19

标题: 一种适用于使用平滑替代损失函数学习的通用增长率

摘要: 本文对用于分类的各种代理损失的$H$-一致性界限(和过量误差界限)的增长率进行了全面分析。我们证明了在二元分类中平滑边界损失的平方根增长率接近零,提供了在温和假设下的上限和下限。这一结果也适用于过量误差界限。我们的下限要求的条件比先前工作中对过量误差界限的要求更弱,而我们的上限完全是新颖的。此外,我们将这一分析扩展到多类分类,得出了一系列新颖结果,展示了平滑comp-sum和受限损失的通用平方根增长率,涵盖了多类分类中用于训练神经网络的常见选择。鉴于这一通用率,我们转向选择不同代理损失的问题。我们首先研究了基于类别数量的代理损失之间的$H$-一致性界限的变化。接下来,忽略常数,关注接近零点的行为,我们确定了最小化差距作为这些界限的关键区别因素。因此,我们彻底分析了这些差距,以指导代理损失的选择,包括:跨不同comp-sum损失的比较,差距变为零的条件,以及导致小差距的一般条件。此外,我们展示了最小化差距在比较过量误差界限和$H$-一致性界限中的关键作用。

更新时间: 2024-07-08 17:20:19

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2405.05968v2

Using Grammar Masking to Ensure Syntactic Validity in LLM-based Modeling Tasks

We present and evaluate a method called grammar masking, which is used to guide large language models (LLMs) toward producing syntactically correct models for a given context-free grammar. Prompt engineering methods such as few-shot learning or priming can be used to improve the chances of an LLM producing correct syntax, but the more complex the grammar, the more time-consuming and less promising these methods become. Previous work is focused primarily on the usage of either language model training or prompt engineering. In this work, a method is presented that restricts the output to a given grammar using constrained decoding to ensure the output adheres to a valid syntax. We use several DSLs built with MontiCore and task multiple LLMs to produce models with and without constrained decoding. A corresponding parser is used to confirm the syntactic correctness of each model. We show that grammar masking can dramatically improve the modeling capabilities of several LLMs, reducing the need for well-refined prompting while increasing the chance of producing correct models.
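
Constrained decoding of this kind can be sketched in a few lines: at each step, logits for tokens the grammar cannot accept are set to -inf before the next token is chosen. The grammar_state object and its allowed_tokens()/advance()/is_complete() methods below are hypothetical stand-ins for an incremental parser over a MontiCore grammar; the model and tokenizer are assumed to follow the Hugging Face causal-LM interface.

```python
# Grammar masking as a logit-masking loop over greedy decoding.
import torch

def grammar_masked_decode(model, tokenizer, grammar_state, prompt, max_new=128):
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    for _ in range(max_new):
        logits = model(ids).logits[0, -1]                 # next-token logits
        mask = torch.full_like(logits, float("-inf"))
        allowed = grammar_state.allowed_tokens()          # grammar-legal token ids
        mask[list(allowed)] = 0.0
        next_id = int(torch.argmax(logits + mask))        # greedy over legal tokens
        grammar_state.advance(next_id)                    # move the parser forward
        ids = torch.cat([ids, torch.tensor([[next_id]])], dim=1)
        if grammar_state.is_complete():
            break
    return tokenizer.decode(ids[0])
```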

Updated: 2024-07-08 17:19:59

标题: 使用语法遮罩确保在基于LLM的建模任务中的句法有效性

摘要: 我们提出并评估了一种称为语法屏蔽的方法,该方法用于引导大型语言模型(LLMs)朝着产生给定上下文无关语法的句法正确模型。提示工程方法,如少样本学习或引导,可用于提高LLM产生正确句法的机会,但是语法越复杂,这些方法就变得越耗时且前景越不明朗。先前的工作主要集中在对语言模型训练或提示工程的使用上。在这项工作中,提出了一种利用受限解码将输出限定为给定语法的方法,以确保输出符合有效的句法。我们使用使用MontiCore构建的多个DSL,要求多个LLMs生成有和没有受限解码的模型。使用相应的解析器来确认每个模型的句法正确性。我们展示了语法屏蔽可以显著提高多个LLMs的建模能力,减少对精心设计提示的需求,同时增加产生正确模型的机会。

更新时间: 2024-07-08 17:19:59

领域: cs.CL,cs.AI,cs.SE

下载: http://arxiv.org/abs/2407.06146v1

Fast Neighborhood Search Heuristics for the Colored Bin Packing Problem

The Colored Bin Packing Problem (CBPP) is a generalization of the Bin Packing Problem (BPP). The CBPP consists of packing a set of items, each with a weight and a color, in bins of limited capacity, minimizing the number of used bins and satisfying the constraint that two items of the same color cannot be packed side by side in the same bin. In this article, we propose adaptations of BPP heuristics and new heuristics for the CBPP. Moreover, we propose a set of fast neighborhood search algorithms for the CBPP. These neighborhoods are applied in a meta-heuristic approach based on Variable Neighborhood Search (VNS) and in a matheuristic approach that combines linear programming with the meta-heuristics VNS and Greedy Randomized Adaptive Search (GRASP). The results indicate that our matheuristic is superior to VNS and that both approaches can find near-optimal solutions for a large number of instances, even those with many items.
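
To make the color constraint concrete, below is a minimal first-fit adaptation for the CBPP: an item may enter a bin only if it fits and its color differs from that of the item currently on top of the bin. This is a baseline-style heuristic for illustration, not one of the paper's proposed neighborhoods.

```python
# First-fit for the CBPP with the same-color adjacency constraint.
def colored_first_fit(items, capacity):
    """items: list of (weight, color); returns bins as stacked item lists."""
    bins = []   # each bin: {"load": total weight, "stack": [(weight, color), ...]}
    for weight, color in items:
        for b in bins:
            top_color = b["stack"][-1][1]
            if b["load"] + weight <= capacity and top_color != color:
                b["load"] += weight
                b["stack"].append((weight, color))
                break
        else:
            bins.append({"load": weight, "stack": [(weight, color)]})
    return bins

# e.g. colored_first_fit([(3, "red"), (4, "red"), (2, "blue")], capacity=10)
# uses 2 bins: the second red item cannot sit on top of the first.
```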

Updated: 2024-07-08 17:09:19

标题: 快速的邻域搜索启发式算法用于彩色装箱问题

摘要: The Colored Bin Packing Problem (CBPP)是Bin Packing Problem (BPP)的一个泛化问题。CBPP包括将一组带有重量和颜色的物品装箱到有限容量的箱中,最小化使用的箱数,并满足相同颜色的两个物品不能相邻放置在同一个箱子中的约束。在本文中,我们提出了一种BPP启发式算法的改进和CBPP的新启发式算法。此外,我们提出了一组用于CBPP的快速邻域搜索算法。这些邻域被应用于基于Variable Neighborhood Search (VNS)的元启发式方法和结合了线性规划与元启发式VNS和Greedy Randomized Adaptive Search (GRASP)的matheuristic方法。结果表明,我们的matheuristic优于VNS,而且这两种方法都可以在大量实例中找到接近最优解,即使是具有许多物品的实例也是如此。

更新时间: 2024-07-08 17:09:19

领域: cs.AI,math.OC,68T20, 90C59

下载: http://arxiv.org/abs/2310.04471v2

ANOLE: An Open, Autoregressive, Native Large Multimodal Models for Interleaved Image-Text Generation

Previous open-source large multimodal models (LMMs) have faced several limitations: (1) they often lack native integration, requiring adapters to align visual representations with pre-trained large language models (LLMs); (2) many are restricted to single-modal generation; (3) while some support multimodal generation, they rely on separate diffusion models for visual modeling and generation. To mitigate these limitations, we present Anole, an open, autoregressive, native large multimodal model for interleaved image-text generation. We build Anole from Meta AI's Chameleon, adopting an innovative fine-tuning strategy that is both data-efficient and parameter-efficient. Anole demonstrates high-quality, coherent multimodal generation capabilities. We have open-sourced our model, training framework, and instruction tuning data.

Updated: 2024-07-08 17:08:02

标题: ANOLE:一种开放的、自回归的、本地的大型多模态模型,用于交替生成图像文本

摘要: 先前的开源大型多模态模型(LMMs)面临几个限制:(1)它们通常缺乏本地集成,需要适配器来将视觉表示与预训练的大型语言模型(LLMs)对齐;(2)许多受限于单模态生成;(3)虽然一些支持多模态生成,但它们依赖于用于视觉建模和生成的独立扩散模型。为了减轻这些限制,我们提出了Anole,这是一个开放的、自回归的、本地的大型多模态模型,用于交错的图像-文本生成。我们从Meta AI的Chameleon构建了Anole,采用了一种既节省数据又节省参数的创新微调策略。Anole展示了高质量、连贯的多模态生成能力。我们已经开源了我们的模型、训练框架和指导微调数据。

更新时间: 2024-07-08 17:08:02

领域: cs.CL,cs.AI,cs.CV

下载: http://arxiv.org/abs/2407.06135v1

Evaluating the Semantic Profiling Abilities of LLMs for Natural Language Utterances in Data Visualization

Automatically generating data visualizations in response to human utterances on datasets necessitates a deep semantic understanding of the data utterance, including implicit and explicit references to data attributes, visualization tasks, and necessary data preparation steps. Natural Language Interfaces (NLIs) for data visualization have explored ways to infer such information, yet challenges persist due to inherent uncertainty in human speech. Recent advances in Large Language Models (LLMs) provide an avenue to address these challenges, but their ability to extract the relevant semantic information remains unexplored. In this study, we evaluate four publicly available LLMs (GPT-4, Gemini-Pro, Llama3, and Mixtral), investigating their ability to comprehend utterances even in the presence of uncertainty and identify the relevant data context and visual tasks. Our findings reveal that LLMs are sensitive to uncertainties in utterances. Despite this sensitivity, they are able to extract the relevant data context. However, LLMs struggle with inferring visualization tasks. Based on these results, we highlight future research directions on using LLMs for visualization generation.

Updated: 2024-07-08 17:04:31

标题: 评估LLMs在数据可视化中对自然语言话语的语义建模能力

摘要: 回应人类对数据集的话语自动生成数据可视化需要对数据话语进行深入的语义理解,包括对数据属性、可视化任务和必要的数据准备步骤的隐含和显式引用。数据可视化的自然语言接口(NLIs)已经探索了推断这些信息的方式,然而由于人类言语中固有的不确定性,挑战依然存在。最近大型语言模型(LLMs)的进展为解决这些挑战提供了途径,但它们提取相关语义信息的能力尚未被探索。在这项研究中,我们评估了四个公开可用的LLMs(GPT-4、Gemini-Pro、Llama3和Mixtral),研究它们在面对不确定性时理解话语并识别相关数据背景和可视化任务的能力。我们的发现显示LLMs对话语中的不确定性敏感。尽管存在这种敏感性,它们能够提取相关的数据背景。然而,LLMs在推断可视化任务方面存在困难。基于这些结果,我们强调了未来利用LLMs进行可视化生成的研究方向。

更新时间: 2024-07-08 17:04:31

领域: cs.AI,cs.HC

下载: http://arxiv.org/abs/2407.06129v1

Depression Detection and Analysis using Large Language Models on Textual and Audio-Visual Modalities

Depression has proven to be a significant public health issue, profoundly affecting the psychological well-being of individuals. If it remains undiagnosed, depression can lead to severe health issues, which can manifest physically and even lead to suicide. Generally, diagnosing depression or any other mental disorder involves conducting semi-structured interviews alongside supplementary questionnaires, including variants of the Patient Health Questionnaire (PHQ), administered by clinicians and mental health professionals. This approach places significant reliance on the experience and judgment of trained physicians, making the diagnosis susceptible to personal biases. Given that the underlying mechanisms causing depression are still being actively researched, physicians often face challenges in diagnosing and treating the condition, particularly in its early stages of clinical presentation. Recently, significant strides have been made in artificial neural computing to solve problems involving text, image, and speech in various domains. Our analysis aims to leverage these state-of-the-art (SOTA) models in our experiments to achieve optimal outcomes across multiple modalities. The experiments were performed on the Extended Distress Analysis Interview Corpus Wizard-of-Oz (E-DAIC) dataset presented in the Audio/Visual Emotion Challenge (AVEC) 2019. The proposed solutions demonstrate the better results achieved by proprietary and open-source Large Language Models (LLMs), which achieved a Root Mean Square Error (RMSE) score of 3.98 on the textual modality, beating the AVEC 2019 challenge baseline results and current SOTA regression analysis architectures. Additionally, the proposed solution achieved an accuracy of 71.43% in the classification task. The paper also includes a novel audio-visual multi-modal network that predicts PHQ-8 scores with an RMSE of 6.51.

Updated: 2024-07-08 17:00:51

标题: 利用大型语言模型对文本和音频-视觉模式的抑郁症检测和分析

摘要: 抑郁症已被证明是一个重要的公共卫生问题,深刻影响个体的心理健康。如果抑郁症未被诊断出来,可能会导致严重的健康问题,甚至会表现为身体上的问题,甚至导致自杀。一般来说,诊断抑郁症或任何其他心理障碍涉及进行半结构化访谈以及补充问卷调查,包括临床医生和心理健康专业人员使用的患者健康问卷(PHQ)的变种。这种方法在很大程度上依赖于经验和受过训练的医生的判断,使诊断容易受到个人偏见的影响。鉴于导致抑郁症的基本机制仍在积极研究中,医生在诊断和治疗该病症时常常面临挑战,特别是在其临床表现的早期阶段。最近,在人工神经计算领域取得了重大进展,以解决涉及文本、图像和语音的各个领域的问题。我们的分析旨在利用这些最先进的模型在我们的实验中实现最佳结果,利用多种模式。实验是在音频/视频情感挑战(AVEC)2019挑战中提供的扩展苦恼分析访谈语料库(E-DAIC)上进行的。所提出的解决方案展示了专有和开源大语言模型(LLMs)取得的更好结果,在文本模式下取得了均方根误差(RMSE)得分为3.98,超过了AVEC 2019挑战的基线结果和当前的SOTA回归分析架构。此外,所提出的解决方案在分类任务中实现了71.43%的准确率。该论文还包括一个新颖的音频-视觉多模态网络,预测PHQ-8得分的RMSE为6.51。

更新时间: 2024-07-08 17:00:51

领域: cs.HC,cs.AI

下载: http://arxiv.org/abs/2407.06125v1

Structured Generations: Using Hierarchical Clusters to guide Diffusion Models

This paper introduces Diffuse-TreeVAE, a deep generative model that integrates hierarchical clustering into the framework of Denoising Diffusion Probabilistic Models (DDPMs). The proposed approach generates new images by sampling from a root embedding of a learned latent-tree VAE-based structure, propagating through hierarchical paths, and utilizing a second-stage DDPM to refine and generate distinct, high-quality images for each data cluster. The result is a model that not only improves image clarity but also ensures that the generated samples are representative of their respective clusters, addressing the limitations of previous VAE-based methods and advancing the state of clustering-based generative modeling.

Updated: 2024-07-08 17:00:28

标题: 结构化生成:使用分层聚类指导扩散模型

摘要: 本文介绍了Diffuse-TreeVAE,这是一个将分层聚类集成到去噪扩散概率模型(DDPMs)框架中的深度生成模型。所提出的方法通过从学习的潜在树VAE结构的根嵌入中进行采样,然后沿着分层路径传播,并利用第二阶段的DDPM来精炼和生成每个数据集群的独特、高质量的图像。结果是一个模型,不仅提高了图像的清晰度,还确保生成的样本代表其各自的集群,解决了以往基于VAE的方法的局限性,并推动了基于聚类的生成建模的发展。

更新时间: 2024-07-08 17:00:28

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2407.06124v1

Characterizing Data Point Vulnerability via Average-Case Robustness

Studying the robustness of machine learning models is important to ensure consistent model behaviour across real-world settings. To this end, adversarial robustness is a standard framework, which views robustness of predictions through a binary lens: either a worst-case adversarial misclassification exists in the local region around an input, or it does not. However, this binary perspective does not account for degrees of vulnerability: data points with a larger number of misclassified examples in their neighborhoods are more vulnerable. In this work, we consider a complementary framework for robustness, called average-case robustness, which measures the fraction of points in a local region that receive consistent predictions. Computing this quantity is hard, however, as standard Monte Carlo approaches are inefficient, especially for high-dimensional inputs. We propose the first analytical estimators of average-case robustness for multi-class classifiers. We show empirically that our estimators are accurate and efficient for standard deep learning models and demonstrate their usefulness for identifying vulnerable data points, as well as for quantifying the robustness bias of models. Overall, our tools provide a complementary view of robustness, improving our ability to characterize model behaviour.
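
For reference, the naive Monte Carlo baseline that these analytical estimators are designed to replace can be written in a few lines. The sketch below is illustrative only: the Gaussian neighborhood, its scale sigma, and the toy linear classifier are assumptions, not choices taken from the paper.

import numpy as np

def monte_carlo_robustness(predict, x, sigma=0.1, n_samples=1000, seed=0):
    # Estimate the fraction of points near x that keep the model's prediction.
    # predict: maps a batch of inputs to class labels.
    # sigma:   scale of the (assumed) isotropic Gaussian neighborhood.
    rng = np.random.default_rng(seed)
    base = predict(x[None, :])[0]                    # label at the clean input
    noise = rng.normal(0.0, sigma, size=(n_samples, x.size))
    labels = predict(x[None, :] + noise)             # labels on perturbed inputs
    return float(np.mean(labels == base))            # consistent-prediction rate

# Toy usage with a linear two-class model.
w, b = np.array([1.0, -2.0]), 0.1
predict = lambda X: (X @ w + b > 0).astype(int)
print(monte_carlo_robustness(predict, np.array([0.5, 0.2])))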

Updated: 2024-07-08 17:00:16

Domains: cs.LG

Download: http://arxiv.org/abs/2307.13885v6

Fast TRAC: A Parameter-Free Optimizer for Lifelong Reinforcement Learning

A key challenge in lifelong reinforcement learning (RL) is the loss of plasticity, where previous learning progress hinders an agent's adaptation to new tasks. While regularization and resetting can help, they require precise hyperparameter selection at the outset and environment-dependent adjustments. Building on the principled theory of online convex optimization, we present a parameter-free optimizer for lifelong RL, called TRAC, which requires no tuning or prior knowledge about the distribution shifts. Extensive experiments on Procgen, Atari, and Gym Control environments show that TRAC works surprisingly well, mitigating loss of plasticity and rapidly adapting to challenging distribution shifts, despite the underlying optimization problem being nonconvex and nonstationary.

Updated: 2024-07-08 17:00:07

Domains: cs.LG, cs.AI

Download: http://arxiv.org/abs/2405.16642v2

Open-TeleVision: Teleoperation with Immersive Active Visual Feedback

Teleoperation serves as a powerful method for collecting on-robot data essential for robot learning from demonstrations. The intuitiveness and ease of use of the teleoperation system are crucial for ensuring high-quality, diverse, and scalable data. To achieve this, we propose an immersive teleoperation system, Open-TeleVision, that allows operators to actively perceive the robot's surroundings in a stereoscopic manner. Additionally, the system mirrors the operator's arm and hand movements on the robot, creating an immersive experience as if the operator's mind were transmitted to a robot embodiment. We validate the effectiveness of our system by collecting data and training imitation learning policies on four long-horizon, precise tasks (Can Sorting, Can Insertion, Folding, and Unloading) for two different humanoid robots and deploying them in the real world. The system is open-sourced at: https://robot-tv.github.io/

Updated: 2024-07-08 16:59:38

Domains: cs.RO, cs.HC, cs.LG

Download: http://arxiv.org/abs/2407.01512v2

Periodic agent-state based Q-learning for POMDPs

The standard approach for Partially Observable Markov Decision Processes (POMDPs) is to convert them to a fully observed belief-state MDP. However, the belief state depends on the system model and is therefore not viable in reinforcement learning (RL) settings. A widely used alternative is to use an agent state, which is a model-free, recursively updateable function of the observation history. Examples include frame stacking and recurrent neural networks. Since the agent state is model-free, it is used to adapt standard RL algorithms to POMDPs. However, standard RL algorithms like Q-learning learn a stationary policy. Our main thesis, which we illustrate via examples, is that because the agent state does not satisfy the Markov property, non-stationary agent-state-based policies can outperform stationary ones. To leverage this feature, we propose PASQL (periodic agent-state-based Q-learning), a variant of agent-state-based Q-learning that learns periodic policies. By combining ideas from periodic Markov chains and stochastic approximation, we rigorously establish that PASQL converges to a cyclic limit and characterize the approximation error of the converged periodic policy. Finally, we present a numerical experiment to highlight the salient features of PASQL and demonstrate the benefit of learning periodic policies over stationary ones.
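
A minimal sketch of the core idea, under assumptions not taken from the paper: tabular agent states, an env_step(s, a) -> (next_state, reward) interface, and epsilon-greedy exploration. One Q-table is kept per phase t mod period, and the TD target bootstraps from the next phase's table, so the learned greedy policy can vary periodically in time.

import numpy as np

def pasql(env_step, n_states, n_actions, period=2, episodes=500,
          horizon=100, alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    # One Q-table per phase t mod `period`; env_step(s, a) -> (s_next, r)
    # is an assumed interface, and episodes restart from state 0.
    rng = np.random.default_rng(seed)
    Q = np.zeros((period, n_states, n_actions))
    for _ in range(episodes):
        s = 0
        for t in range(horizon):
            p, p_next = t % period, (t + 1) % period
            if rng.random() < eps:                  # epsilon-greedy exploration
                a = int(rng.integers(n_actions))
            else:
                a = int(np.argmax(Q[p, s]))
            s_next, r = env_step(s, a)
            # The TD target bootstraps from the *next phase's* Q-table,
            # which is what lets the greedy policy vary periodically in time.
            target = r + gamma * np.max(Q[p_next, s_next])
            Q[p, s, a] += alpha * (target - Q[p, s, a])
            s = s_next
    return Q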

Updated: 2024-07-08 16:58:57

Domains: cs.LG

Download: http://arxiv.org/abs/2407.06121v1

Sketchy Moment Matching: Toward Fast and Provable Data Selection for Finetuning

We revisit data selection in a modern context of finetuning from a fundamental perspective. Extending the classical wisdom of variance minimization in low dimensions to high-dimensional finetuning, our generalization analysis unveils the importance of additionally reducing bias induced by low-rank approximation. Inspired by the variance-bias tradeoff in high dimensions from the theory, we introduce Sketchy Moment Matching (SkMM), a scalable data selection scheme with two stages. (i) First, the bias is controlled using gradient sketching that explores the finetuning parameter space for an informative low-dimensional subspace $\mathcal{S}$; (ii) then the variance is reduced over $\mathcal{S}$ via moment matching between the original and selected datasets. Theoretically, we show that gradient sketching is fast and provably accurate: selecting $n$ samples by reducing variance over $\mathcal{S}$ preserves the fast-rate generalization $O(\dim(\mathcal{S})/n)$, independent of the parameter dimension. Empirically, we concretize the variance-bias balance via synthetic experiments and demonstrate the effectiveness of SkMM for finetuning in real vision tasks.
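
To make the two-stage structure concrete, here is a toy numpy sketch. The random Johnson-Lindenstrauss-style projection and the greedy first-moment-matching selection are illustrative stand-ins for the paper's sketching and variance-reduction steps, not its exact algorithm.

import numpy as np

def sketchy_moment_matching(grads, n_select, sketch_dim=32, seed=0):
    # (i) Project per-sample gradients into a random low-dimensional
    #     subspace S (bias control via sketching).
    # (ii) Greedily pick samples whose running mean in S tracks the
    #      full-data mean (a first-moment-matching heuristic).
    rng = np.random.default_rng(seed)
    N, d = grads.shape
    S = rng.normal(size=(d, sketch_dim)) / np.sqrt(sketch_dim)
    Z = grads @ S                       # sketched gradients, shape (N, sketch_dim)
    target = Z.mean(axis=0)             # moment to match over the selection
    chosen, running = [], np.zeros(sketch_dim)
    avail = set(range(N))
    for k in range(1, n_select + 1):
        # Pick the point that brings the selected mean closest to the target.
        idx = min(avail, key=lambda i: np.linalg.norm((running + Z[i]) / k - target))
        chosen.append(idx); avail.remove(idx); running += Z[idx]
    return np.array(chosen)

# Toy usage on random "gradients".
g = np.random.default_rng(1).normal(size=(200, 128))
print(sketchy_moment_matching(g, n_select=10)[:5])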

Updated: 2024-07-08 16:57:26

Domains: cs.LG, stat.ML

Download: http://arxiv.org/abs/2407.06120v1

Research on Autonomous Robots Navigation based on Reinforcement Learning

Reinforcement learning continuously optimizes decision-making based on real-time feedback reward signals through continuous interaction with the environment, demonstrating strong adaptive and self-learning capabilities. In recent years, it has become one of the key methods for achieving autonomous navigation of robots. In this work, an autonomous robot navigation method based on reinforcement learning is introduced. We use Deep Q-Network (DQN) and Proximal Policy Optimization (PPO) models to optimize the path planning and decision-making process through continuous interaction between the robot and the environment, guided by reward signals with real-time feedback. By combining the Q-value function with a deep neural network, the DQN can handle high-dimensional state spaces and thereby realize path planning in complex environments. PPO is a policy-gradient-based method that enables robots to explore and utilize environmental information more efficiently by optimizing the policy function. These methods not only improve the robot's navigation ability in unknown environments but also enhance its adaptive and self-learning capabilities. Through multiple training and simulation experiments, we have verified the effectiveness and robustness of these models in various complex scenarios.
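
As a reference point for the DQN component, a minimal PyTorch sketch of the temporal-difference update follows. The network sizes, hyperparameters, and use of a target network are generic assumptions rather than details from the paper.

import torch
import torch.nn as nn

# Q-network mapping a state vector to one Q-value per action; the TD target
# is y = r + gamma * max_a' Q_target(s', a'). All dimensions are illustrative.
state_dim, n_actions, gamma = 8, 4, 0.99
q_net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net.load_state_dict(q_net.state_dict())
opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def dqn_update(s, a, r, s_next, done):
    # One gradient step on a batch of transitions; `a` is a LongTensor of
    # action indices and `done` a float mask that zeroes the bootstrap term.
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)        # Q(s, a)
    with torch.no_grad():
        target = r + gamma * (1 - done) * target_net(s_next).max(dim=1).values
    loss = nn.functional.mse_loss(q_sa, target)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()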

Updated: 2024-07-08 16:50:48

Domains: cs.RO, cs.AI, cs.LG, stat.ML

Download: http://arxiv.org/abs/2407.02539v2

A Fine-tuning Dataset and Benchmark for Large Language Models for Protein Understanding

The parallels between protein sequences and natural language in their sequential structures have inspired the application of large language models (LLMs) to protein understanding. Despite the success of LLMs in NLP, their effectiveness in comprehending protein sequences remains an open question, largely due to the absence of datasets linking protein sequences to descriptive text. Researchers have then attempted to adapt LLMs for protein understanding by integrating a protein sequence encoder with a pre-trained LLM. However, this adaptation raises a fundamental question: "Can LLMs, originally designed for NLP, effectively comprehend protein sequences as a form of language?" Current datasets fall short in addressing this question due to the lack of a direct correlation between protein sequences and corresponding text descriptions, limiting the ability to train and evaluate LLMs for protein understanding effectively. To bridge this gap, we introduce ProteinLMDataset, a dataset specifically designed for further self-supervised pretraining and supervised fine-tuning (SFT) of LLMs to enhance their capability for protein sequence comprehension. Specifically, ProteinLMDataset includes 17.46 billion tokens for pretraining and 893,000 instructions for SFT. Additionally, we present ProteinLMBench, the first benchmark dataset consisting of 944 manually verified multiple-choice questions for assessing the protein understanding capabilities of LLMs. ProteinLMBench incorporates protein-related details and sequences in multiple languages, establishing a new standard for evaluating LLMs' abilities in protein comprehension. The large language model InternLM2-7B, pretrained and fine-tuned on the ProteinLMDataset, outperforms GPT-4 on ProteinLMBench, achieving the highest accuracy score.

Updated: 2024-07-08 16:39:35

Domains: q-bio.QM, cs.AI, cs.CL, cs.LG

Download: http://arxiv.org/abs/2406.05540v2

Leveraging data-driven weather models for improving numerical weather prediction skill through large-scale spectral nudging

Operational meteorological forecasting has long relied on physics-based numerical weather prediction (NWP) models. Recently, this landscape has been disrupted by the advent of data-driven artificial intelligence (AI)-based weather models, which offer tremendous computational performance and competitive forecasting skill. However, data-driven models for medium-range forecasting generally suffer from major limitations, including low effective resolution and a narrow range of predicted variables. This study illustrates the relative strengths and weaknesses of these competing paradigms using the GEM (Global Environmental Multiscale) and GraphCast models to represent physics-based and AI-based approaches, respectively. By analyzing global predictions from these two models against observations and analyses in both physical and spectral spaces, this study demonstrates that GraphCast-predicted large scales outperform GEM, particularly for longer lead times. Building on this insight, a hybrid NWP-AI system is proposed, wherein GEM-predicted large-scale state variables are spectrally nudged toward GraphCast predictions, while allowing GEM to freely generate fine-scale details critical for weather extremes. Results indicate that this hybrid approach is capable of leveraging the strengths of GraphCast to enhance the prediction skill of the GEM model. Importantly, trajectories of tropical cyclones are predicted with enhanced accuracy without significant changes in intensity. Furthermore, this new hybrid system ensures that meteorologists have access to a complete set of forecast variables, including those relevant for high-impact weather events.
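
The nudging step itself can be sketched on a toy doubly periodic 2D field: low-pass the difference between the physics state and the AI prediction in spectral space, and relax the physics state toward the AI state at those large scales only. The FFT-based filter, cutoff wavenumber, and relaxation strength below are illustrative assumptions, not the paper's configuration.

import numpy as np

def spectral_nudge(field, ai_field, cutoff=8, strength=0.5):
    # Keep only wavenumbers below `cutoff` of the (AI - physics) difference
    # and relax the physics state toward the AI state at those scales only,
    # leaving the fine scales free to evolve.
    diff_hat = np.fft.fft2(ai_field - field)
    nx, ny = field.shape
    kx = np.fft.fftfreq(nx) * nx
    ky = np.fft.fftfreq(ny) * ny
    large_scale = (np.abs(kx)[:, None] < cutoff) & (np.abs(ky)[None, :] < cutoff)
    diff_hat[~large_scale] = 0.0
    return field + strength * np.real(np.fft.ifft2(diff_hat))

# Toy usage on random 64x64 fields standing in for a model state and an AI forecast.
rng = np.random.default_rng(0)
gem_state, ai_forecast = rng.normal(size=(64, 64)), rng.normal(size=(64, 64))
nudged = spectral_nudge(gem_state, ai_forecast)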

Updated: 2024-07-08 16:39:25

Domains: physics.ao-ph, cs.LG

Download: http://arxiv.org/abs/2407.06100v1

Verifiably Following Complex Robot Instructions with Foundation Models

Enabling mobile robots to follow complex natural language instructions is an important yet challenging problem. People want to flexibly express constraints, refer to arbitrary landmarks and verify behavior when instructing robots. Conversely, robots must disambiguate human instructions into specifications and ground instruction referents in the real world. We propose Language Instruction grounding for Motion Planning (LIMP), an approach that enables robots to verifiably follow expressive and complex open-ended instructions in real-world environments without prebuilt semantic maps. LIMP constructs a symbolic instruction representation that reveals the robot's alignment with an instructor's intended motives and affords the synthesis of robot behaviors that are correct-by-construction. We perform a large-scale evaluation and demonstrate our approach on 150 instructions in five real-world environments, showing the generality of our approach and the ease of deployment in novel unstructured domains. In our experiments, LIMP performs comparably with state-of-the-art LLM task planners and LLM code-writing planners on standard open-vocabulary tasks and additionally achieves a 79% success rate on complex spatiotemporal instructions, while the LLM and code-writing planners both achieve 38%. See supplementary materials and demo videos at https://robotlimp.github.io

Updated: 2024-07-08 16:38:57

Domains: cs.RO, cs.AI

Download: http://arxiv.org/abs/2402.11498v2

Physics-Informed Machine Learning Towards A Real-Time Spacecraft Thermal Simulator

Modeling thermal states for complex space missions, such as the surface exploration of airless bodies, requires high computation, whether used in ground-based analysis for spacecraft design or during onboard reasoning for autonomous operations. For example, a finite-element thermal model with hundreds of elements can take significant time to simulate, which makes it unsuitable for onboard reasoning during time-sensitive scenarios such as descent and landing, proximity operations, or in-space assembly. Further, the lack of fast and accurate thermal modeling drives thermal designs to be more conservative and leads to spacecraft with larger mass and higher power budgets. The emerging paradigm of physics-informed machine learning (PIML) presents a class of hybrid modeling architectures that address this challenge by combining simplified physics models with machine learning (ML) models resulting in models which maintain both interpretability and robustness. Such techniques enable designs with reduced mass and power through onboard thermal-state estimation and control and may lead to improved onboard handling of off-nominal states, including unplanned down-time. The PIML model or hybrid model presented here consists of a neural network which predicts reduced nodalizations (distribution and size of coarse mesh) given on-orbit thermal load conditions, and subsequently a (relatively coarse) finite-difference model operates on this mesh to predict thermal states. We compare the computational performance and accuracy of the hybrid model to a data-driven neural net model, and a high-fidelity finite-difference model of a prototype Earth-orbiting small spacecraft. The PIML based active nodalization approach provides significantly better generalization than the neural net model and coarse mesh model, while reducing computing cost by up to 1.7x compared to the high-fidelity model.

Updated: 2024-07-08 16:38:52

Domains: cs.LG, cs.CE

Download: http://arxiv.org/abs/2407.06099v1

Epistemological Bias As a Means for the Automated Detection of Injustices in Text

Injustice occurs when someone experiences unfair treatment or their rights are violated and is often due to the presence of implicit biases and prejudice such as stereotypes. The automated identification of injustice in text has received little attention, due in part to the fact that underlying implicit biases or stereotypes are rarely explicitly stated and that instances often occur unconsciously due to the pervasive nature of prejudice in society. Here, we describe a novel framework that combines the use of a fine-tuned BERT-based bias detection model, two stereotype detection models, and a lexicon-based approach to show that epistemological biases (i.e., words, which presupposes, entails, asserts, hedges, or boosts text to erode or assert a person's capacity as a knower) can assist with the automatic detection of injustice in text. The news media has many instances of injustice (i.e. discriminatory narratives), thus it is our use case here. We conduct and discuss an empirical qualitative research study which shows how the framework can be applied to detect injustices, even at higher volumes of data.

Updated: 2024-07-08 16:38:31

Domains: cs.CL, cs.AI

Download: http://arxiv.org/abs/2407.06098v1

Tiny Models are the Computational Saver for Large Models

This paper introduces TinySaver, an early-exit-like dynamic model compression approach which employs tiny models to substitute large models adaptively. Distinct from traditional compression techniques, dynamic methods like TinySaver can exploit differences in input difficulty, allowing certain inputs to complete their inference processes early and thereby conserving computational resources. Most existing early-exit designs are implemented by attaching additional network branches to the model's backbone. Our study, however, reveals that completely independent tiny models can take over a substantial portion of a larger model's job with minimal impact on performance. Employing them as the first exit can remarkably enhance computational efficiency. By searching for and employing the most appropriate tiny model as the computational saver for a given large model, the proposed approach works as a novel and generic method for model compression. This finding will help the research community explore new compression methods to address the escalating computational demands posed by rapidly evolving AI models. Our evaluation of this approach on ImageNet-1k classification demonstrates its potential to reduce the number of compute operations by up to 90%, with only negligible losses in performance, across various modern vision models.
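
The routing idea reduces to a few lines: answer with the independent tiny model when it is confident, and pay for the large model only on the remaining inputs. A minimal PyTorch sketch, where the softmax-confidence threshold is an assumed tuning knob rather than the paper's exit criterion:

import torch

def tiny_saver_infer(x, tiny_model, large_model, threshold=0.9):
    # Route each input: keep the tiny model's prediction when its softmax
    # confidence clears the threshold; run the large model only on the rest.
    with torch.no_grad():
        probs = torch.softmax(tiny_model(x), dim=-1)
        conf, pred = probs.max(dim=-1)
        easy = conf >= threshold
        out = pred.clone()
        if (~easy).any():
            out[~easy] = large_model(x[~easy]).argmax(dim=-1)
    return out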

Updated: 2024-07-08 16:38:20

Domains: cs.AI

Download: http://arxiv.org/abs/2403.17726v2

Artificial Intuition: Efficient Classification of Scientific Abstracts

It is desirable to coarsely classify short scientific texts, such as grant or publication abstracts, for strategic insight or research portfolio management. These texts efficiently transmit dense information to experts possessing a rich body of knowledge to aid interpretation. Yet this task is remarkably difficult to automate because of brevity and the absence of context. To address this gap, we have developed a novel approach to generate and appropriately assign coarse domain-specific labels. We show that a Large Language Model (LLM) can provide metadata essential to the task, in a process akin to the augmentation of supplemental knowledge representing human intuition, and propose a workflow. As a pilot study, we use a corpus of award abstracts from the National Aeronautics and Space Administration (NASA). We develop new assessment tools in concert with established performance metrics.

Updated: 2024-07-08 16:34:47

Domains: cs.AI

Download: http://arxiv.org/abs/2407.06093v1

Assessing Cardiomegaly in Dogs Using a Simple CNN Model

This paper introduces DogHeart, a dataset comprising 1400 training, 200 validation, and 400 test images categorized as small, normal, and large based on VHS scores. A custom CNN model is developed, featuring a straightforward architecture with four convolutional layers and four fully connected layers. Despite the absence of data augmentation, the model achieves 72% accuracy in classifying cardiomegaly severity. The study contributes to the automated assessment of cardiac conditions in dogs, highlighting the potential for early detection and intervention in veterinary care.
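
A plausible PyTorch instantiation of the described architecture (four convolutional layers followed by four fully connected layers with a three-way output). The channel widths and the 224x224 input size are assumptions, not values reported in the paper.

import torch.nn as nn

class DogHeartCNN(nn.Module):
    def __init__(self, n_classes=3):
        super().__init__()
        chans = [3, 16, 32, 64, 128]
        # Four conv/ReLU/pool stages; each pool halves the spatial size.
        self.features = nn.Sequential(*[
            layer
            for c_in, c_out in zip(chans, chans[1:])
            for layer in (nn.Conv2d(c_in, c_out, 3, padding=1),
                          nn.ReLU(), nn.MaxPool2d(2))
        ])
        # Four fully connected layers ending in the 3-way severity output.
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 14 * 14, 512), nn.ReLU(),
            nn.Linear(512, 128), nn.ReLU(),
            nn.Linear(128, 32), nn.ReLU(),
            nn.Linear(32, n_classes),
        )

    def forward(self, x):  # x: (batch, 3, 224, 224) -> (batch, n_classes)
        return self.classifier(self.features(x))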

Updated: 2024-07-08 16:31:49

Domains: cs.CV, cs.LG, eess.IV

Download: http://arxiv.org/abs/2407.06092v1

Qualitative Event Perception: Leveraging Spatiotemporal Episodic Memory for Learning Combat in a Strategy Game

Event perception refers to people's ability to carve up continuous experience into meaningful discrete events. We speak of finishing our morning coffee, mowing the lawn, leaving work, etc. as singular occurrences that are localized in time and space. In this work, we analyze how spatiotemporal representations can be used to automatically segment continuous experience into structured episodes, and how these descriptions can be used for analogical learning. These representations are based on Hayes' notion of histories and build upon existing work on qualitative episodic memory. Our agent automatically generates event descriptions of military battles in a strategy game and improves its gameplay by learning from this experience. Episodes are segmented based on changing properties in the world and we show evidence that they facilitate learning because they capture event descriptions at a useful spatiotemporal grain size. This is evaluated through our agent's performance in the game. We also show empirical evidence that the perception of spatial extent of episodes affects both their temporal duration as well as the number of overall cases generated.

Updated: 2024-07-08 16:28:38

Domains: cs.AI

Download: http://arxiv.org/abs/2407.06088v1

Information limits and Thouless-Anderson-Palmer equations for spiked matrix models with structured noise

We consider a prototypical problem of Bayesian inference for a structured spiked model: a low-rank signal is corrupted by additive noise. While both information-theoretic and algorithmic limits are well understood when the noise is a Gaussian Wigner matrix, the more realistic case of structured noise still proves to be challenging. To capture the structure while maintaining mathematical tractability, a line of work has focused on rotationally invariant noise. However, existing studies either provide sub-optimal algorithms or are limited to special cases of noise ensembles. In this paper, using tools from statistical physics (replica method) and random matrix theory (generalized spherical integrals) we establish the first characterization of the information-theoretic limits for a noise matrix drawn from a general trace ensemble. Remarkably, our analysis unveils the asymptotic equivalence between the rotationally invariant model and a surrogate Gaussian one. Finally, we show how to saturate the predicted statistical limits using an efficient algorithm inspired by the theory of adaptive Thouless-Anderson-Palmer (TAP) equations.

Updated: 2024-07-08 16:26:03

Domains: cs.IT, cond-mat.dis-nn, cs.LG, math.IT, math.ST, stat.TH, 62F15, 82B44

Download: http://arxiv.org/abs/2405.20993v2

Layered Diffusion Model for One-Shot High Resolution Text-to-Image Synthesis

We present a one-shot text-to-image diffusion model that can generate high-resolution images from natural language descriptions. Our model employs a layered U-Net architecture that simultaneously synthesizes images at multiple resolution scales. We show that this method outperforms the baseline of synthesizing images only at the target resolution, while reducing the computational cost per step. We demonstrate that higher resolution synthesis can be achieved by layering convolutions at additional resolution scales, in contrast to other methods which require additional models for super-resolution synthesis.

Updated: 2024-07-08 16:25:34

Domains: cs.CV, cs.AI

Download: http://arxiv.org/abs/2407.06079v1

Object-Oriented Material Classification and 3D Clustering for Improved Semantic Perception and Mapping in Mobile Robots

Classification of different object surface material types can play a significant role in the decision-making algorithms for mobile robots and autonomous vehicles. RGB-based scene-level semantic segmentation has been well-addressed in the literature. However, improving material recognition using the depth modality and its integration with SLAM algorithms for 3D semantic mapping could unlock new potential benefits in the robotics perception pipeline. To this end, we propose a complementarity-aware deep learning approach for RGB-D-based material classification built on top of an object-oriented pipeline. The approach further integrates the ORB-SLAM2 method for 3D scene mapping with multiscale clustering of the detected material semantics in the point cloud map generated by the visual SLAM algorithm. Extensive experimental results with existing public datasets and newly contributed real-world robot datasets demonstrate a significant improvement in material classification and 3D clustering accuracy compared to state-of-the-art approaches for 3D semantic scene mapping.

Updated: 2024-07-08 16:25:01

Domains: cs.RO, cs.AI, cs.CV

Download: http://arxiv.org/abs/2407.06077v1

Understanding Visual Feature Reliance through the Lens of Complexity

Recent studies suggest that deep learning models' inductive bias towards favoring simpler features may be one of the sources of shortcut learning. Yet, there has been limited focus on understanding the complexity of the myriad features that models learn. In this work, we introduce a new metric for quantifying feature complexity, based on $\mathscr{V}$-information, which captures whether a feature requires complex computational transformations to be extracted. Using this $\mathscr{V}$-information metric, we analyze the complexities of 10,000 features, represented as directions in the penultimate layer, that were extracted from a standard ImageNet-trained vision model. Our study addresses four key questions: First, we ask what features look like as a function of complexity and find a spectrum of simple to complex features present within the model. Second, we ask when features are learned during training. We find that simpler features dominate early in training, and more complex features emerge gradually. Third, we investigate where within the network simple and complex features flow, and find that simpler features tend to bypass the visual hierarchy via residual connections. Fourth, we explore the connection between feature complexity and importance in driving the network's decisions. We find that complex features tend to be less important. Surprisingly, important features become accessible at earlier layers during training, like a sedimentation process, allowing the model to build upon these foundational elements.

Updated: 2024-07-08 16:21:53

Domains: cs.CV, cs.AI

Download: http://arxiv.org/abs/2407.06076v1

Multi-step Inference over Unstructured Data

The advent of Large Language Models (LLMs) and Generative AI has revolutionized natural language applications across various domains. However, high-stakes decision-making tasks in fields such as medical, legal and finance require a level of precision, comprehensiveness, and logical consistency that pure LLM or Retrieval-Augmented-Generation (RAG) approaches often fail to deliver. At Elemental Cognition (EC), we have developed a neuro-symbolic AI platform to tackle these problems. The platform integrates fine-tuned LLMs for knowledge extraction and alignment with a robust symbolic reasoning engine for logical inference, planning and interactive constraint solving. We describe Cora, a Collaborative Research Assistant built on this platform, that is designed to perform complex research and discovery tasks in high-stakes domains. This paper discusses the multi-step inference challenges inherent in such domains, critiques the limitations of existing LLM-based methods, and demonstrates how Cora's neuro-symbolic approach effectively addresses these issues. We provide an overview of the system architecture, key algorithms for knowledge extraction and formal reasoning, and present preliminary evaluation results that highlight Cora's superior performance compared to well-known LLM and RAG baselines.

Updated: 2024-07-08 16:16:20

Domains: cs.CL, cs.AI

Download: http://arxiv.org/abs/2406.17987v2

A Tutorial on Doubly Robust Learning for Causal Inference

Doubly robust learning offers a robust framework for causal inference from observational data by integrating propensity score and outcome modeling. Despite its theoretical appeal, practical adoption remains limited due to perceived complexity and inaccessible software. This tutorial aims to demystify doubly robust methods and demonstrate their application using the EconML package. We provide an introduction to causal inference, discuss the principles of outcome modeling and propensity scores, and illustrate the doubly robust approach through simulated case studies. By simplifying the methodology and offering practical coding examples, we intend to make doubly robust learning accessible to researchers and practitioners in data science and statistics.
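
The doubly robust point estimate itself is compact. A minimal scikit-learn sketch of the AIPW form of the average treatment effect follows; the tutorial itself works with the EconML package, and plain in-sample fitting is used here for brevity where cross-fitting would be preferable.

import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression

def aipw_ate(X, T, Y):
    # Doubly robust (AIPW) estimate of the average treatment effect,
    # combining outcome regressions m1, m0 with a propensity model e(X).
    e = LogisticRegression(max_iter=1000).fit(X, T).predict_proba(X)[:, 1]
    m1 = LinearRegression().fit(X[T == 1], Y[T == 1]).predict(X)
    m0 = LinearRegression().fit(X[T == 0], Y[T == 0]).predict(X)
    psi = m1 - m0 + T * (Y - m1) / e - (1 - T) * (Y - m0) / (1 - e)
    return psi.mean()

# Toy confounded data with a true effect of 2.0.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))
T = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))
Y = 2.0 * T + X @ np.array([1.0, -1.0, 0.5]) + rng.normal(size=2000)
print(aipw_ate(X, T, Y))   # should be close to 2.0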

Updated: 2024-07-08 16:15:08

Domains: stat.ML, cs.LG, math.ST, stat.ME, stat.TH

Download: http://arxiv.org/abs/2406.00853v2

From Loops to Oops: Fallback Behaviors of Language Models Under Uncertainty

Large language models (LLMs) often exhibit undesirable behaviors, such as hallucinations and sequence repetitions. We propose to view these behaviors as fallbacks that models exhibit under uncertainty, and investigate the connection between them. We categorize fallback behaviors -- sequence repetitions, degenerate text, and hallucinations -- and extensively analyze them in models from the same family that differ by the amount of pretraining tokens, parameter count, or the inclusion of instruction-following training. Our experiments reveal a clear and consistent ordering of fallback behaviors, across all these axes: the more advanced an LLM is (i.e., trained on more tokens, has more parameters, or instruction-tuned), its fallback behavior shifts from sequence repetitions, to degenerate text, and then to hallucinations. Moreover, the same ordering is observed throughout a single generation, even for the best-performing models; as uncertainty increases, models shift from generating hallucinations to producing degenerate text and then sequence repetitions. Lastly, we demonstrate that while common decoding techniques, such as random sampling, might alleviate some unwanted behaviors like sequence repetitions, they increase harder-to-detect hallucinations.

Updated: 2024-07-08 16:13:42

Domains: cs.CL, cs.AI

Download: http://arxiv.org/abs/2407.06071v1

Mixture of insighTful Experts (MoTE): The Synergy of Thought Chains and Expert Mixtures in Self-Alignment

As the capabilities of large language models (LLMs) have expanded dramatically, aligning these models with human values presents a significant challenge. Traditional alignment strategies rely heavily on human intervention, such as Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF), or on the self-alignment capacities of LLMs, which usually require a strong LLM's emergent ability to improve its original bad answer. To address these challenges, we propose a novel self-alignment method that utilizes a Chain of Thought (CoT) approach, termed AlignCoT. This method encompasses stages of Question Analysis, Answer Guidance, and Safe Answer production. It is designed to enable LLMs to generate high-quality, safe responses throughout various stages of their development. Furthermore, we introduce the Mixture of insighTful Experts (MoTE) architecture, which applies mixture of experts to enhance each component of the AlignCoT process, markedly increasing alignment efficiency. The MoTE approach not only outperforms existing methods in aligning LLMs with human values but also highlights the benefits of using self-generated data, revealing the dual benefits of improved alignment and training efficiency.

Updated: 2024-07-08 16:02:18

Domains: cs.CL, cs.AI

Download: http://arxiv.org/abs/2405.00557v3

MERGE -- A Bimodal Dataset for Static Music Emotion Recognition

The Music Emotion Recognition (MER) field has seen steady developments in recent years, with contributions from feature engineering, machine learning, and deep learning. The landscape has also shifted from audio-centric systems to bimodal ensembles that combine audio and lyrics. However, a severe lack of public and sizeable bimodal databases has hampered the development and improvement of bimodal audio-lyrics systems. This article proposes three new audio, lyrics, and bimodal MER research datasets, collectively called MERGE, created using a semi-automatic approach. To comprehensively assess the proposed datasets and establish a baseline for benchmarking, we conducted several experiments for each modality, using feature engineering, machine learning, and deep learning methodologies. In addition, we propose and validate fixed train-validate-test splits. The obtained results confirm the viability of the proposed datasets, achieving the best overall result of 79.21% F1-score for bimodal classification using a deep neural network.

Updated: 2024-07-08 16:01:04

Domains: cs.SD, cs.IR, cs.LG, cs.MM, eess.AS

Download: http://arxiv.org/abs/2407.06060v1

Variational Best-of-N Alignment

Best-of-N (BoN) is a popular and effective algorithm for aligning language models to human preferences. The algorithm works as follows: at inference time, N samples are drawn from the language model, and the sample with the highest reward, as judged by a reward model, is returned as the output. Despite its effectiveness, BoN is computationally expensive; it reduces sampling throughput by a factor of N. To make BoN more efficient at inference time, one strategy is to fine-tune the language model to mimic what BoN does during inference. To achieve this, we derive the distribution induced by the BoN algorithm. We then propose to fine-tune the language model to minimize backward KL divergence to the BoN distribution. Our approach is analogous to mean-field variational inference and, thus, we term it variational BoN (vBoN). To the extent this fine-tuning is successful and we end up with a good approximation, we have reduced the inference cost by a factor of N. Our experiments on a controlled generation task suggest that while variational BoN is not as effective as BoN in aligning language models, it is close to BoN performance as vBoN appears more often on the Pareto frontier of reward and KL divergence compared to models trained with KL-constrained RL objective.
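
For reference, the BoN procedure that variational BoN seeks to amortize is only a few lines at inference time; lm_sample and reward_model below are assumed callables standing in for real model APIs.

def best_of_n(prompt, lm_sample, reward_model, n=16):
    # Draw N candidates from the language model and return the one the
    # reward model scores highest, exactly as described above.
    candidates = [lm_sample(prompt) for _ in range(n)]
    return max(candidates, key=lambda y: reward_model(prompt, y))

Fine-tuning the language model to approximate the distribution this procedure induces is what removes the factor-of-N sampling cost at deployment.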

Updated: 2024-07-08 15:59:44

Domains: cs.CL, cs.AI, cs.LG

Download: http://arxiv.org/abs/2407.06057v1

Stranger Danger! Identifying and Avoiding Unpredictable Pedestrians in RL-based Social Robot Navigation

Reinforcement learning (RL) methods for social robot navigation show great success navigating robots through large crowds of people, but the performance of these learning-based methods tends to degrade in particularly challenging or unfamiliar situations due to the models' dependency on representative training data. To ensure human safety and comfort, it is critical that these algorithms handle uncommon cases appropriately, but the low frequency and wide diversity of such situations present a significant challenge for these data-driven methods. To overcome this challenge, we propose modifications to the learning process that encourage these RL policies to maintain additional caution in unfamiliar situations. Specifically, we improve the Socially Attentive Reinforcement Learning (SARL) policy by (1) modifying the training process to systematically introduce deviations into a pedestrian model, (2) updating the value network to estimate and utilize pedestrian-unpredictability features, and (3) implementing a reward function to learn an effective response to pedestrian unpredictability. Compared to the original SARL policy, our modified policy maintains similar navigation times and path lengths, while reducing the number of collisions by 82% and reducing the proportion of time spent in the pedestrians' personal space by up to 19 percentage points for the most difficult cases. We also describe how to apply these modifications to other RL policies and demonstrate that some key high-level behaviors of our approach transfer to a physical robot.

Updated: 2024-07-08 15:58:33

Domains: cs.RO, cs.AI

Download: http://arxiv.org/abs/2407.06056v1

Simple Opinion Dynamics for No-Regret Learning

We study a cooperative multi-agent bandit setting in the distributed GOSSIP model: in every round, each of $n$ agents chooses an action from a common set, observes the action's corresponding reward, and subsequently exchanges information with a single randomly chosen neighbor, which may inform its choice in the next round. We introduce and analyze families of memoryless and time-independent protocols for this setting, inspired by opinion dynamics that are well-studied for other algorithmic tasks in the GOSSIP model. For stationary reward settings, we prove for the first time that these simple protocols exhibit best-of-both-worlds behavior, simultaneously obtaining constant cumulative regret scaling like $R(T)/T = \widetilde O(1/T)$, and also reaching consensus on the highest-mean action within $\widetilde O(\sqrt{n})$ rounds. We obtain these results by showing a new connection between the global evolution of these decentralized protocols and a class of zero-sum multiplicative weights update processes. Using this connection, we establish a general framework for analyzing the population-level regret and other properties of our protocols. Finally, we show our protocols are also surprisingly robust to adversarial rewards, and in this regime we obtain sublinear regret scaling like $R(T)/T = \widetilde O(1/\sqrt{T})$ as long as the number of rounds does not grow too fast as a function of $n$.
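
A toy simulation in the spirit of this setting is easy to write. The specific rule below, keep your action after a reward of 1 and otherwise copy a uniformly random agent, is one illustrative memoryless, time-independent protocol, not necessarily the paper's exact dynamic.

import numpy as np

def gossip_bandit(means, n_agents=100, rounds=2000, seed=0):
    # Each agent holds one action ("opinion"), pulls it, and on a Bernoulli
    # reward of 0 adopts the action of a uniformly random gossip partner.
    rng = np.random.default_rng(seed)
    means = np.asarray(means)
    actions = rng.integers(len(means), size=n_agents)     # initial opinions
    for _ in range(rounds):
        rewards = rng.random(n_agents) < means[actions]   # Bernoulli pulls
        partners = rng.integers(n_agents, size=n_agents)  # random partners
        actions = np.where(rewards, actions, actions[partners])
    return np.bincount(actions, minlength=len(means)) / n_agents

print(gossip_bandit([0.2, 0.5, 0.8]))   # mass should concentrate on the 0.8 arm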

Updated: 2024-07-08 15:56:39

Domains: cs.LG, cs.DC, cs.DS

Download: http://arxiv.org/abs/2306.08670v4

Learning local equivariant representations for quantum operators

Predicting quantum operator matrices such as Hamiltonian, overlap, and density matrices in the density functional theory (DFT) framework is crucial for understanding material properties. Current methods often focus on individual operators and struggle with efficiency and scalability for large systems. Here we introduce a novel deep learning model, SLEM (Strictly Localized Equivariant Message-passing) for predicting multiple quantum operators, that achieves state-of-the-art accuracy while dramatically improving computational efficiency. SLEM's key innovation is its strict locality-based design, constructing local, equivariant representations for quantum tensors while preserving physical symmetries. This enables complex many-body dependence without expanding the effective receptive field, leading to superior data efficiency and transferability. Using an innovative SO(2) convolution technique, SLEM reduces the computational complexity of high-order tensor products and is therefore capable of handling systems requiring the $f$ and $g$ orbitals in their basis sets. We demonstrate SLEM's capabilities across diverse 2D and 3D materials, achieving high accuracy even with limited training data. SLEM's design facilitates efficient parallelization, potentially extending DFT simulations to systems with device-level sizes, opening new possibilities for large-scale quantum simulations and high-throughput materials discovery.

Updated: 2024-07-08 15:55:12

Domains: cond-mat.mtrl-sci, cs.LG, quant-ph

Download: http://arxiv.org/abs/2407.06053v1

EndoUIC: Promptable Diffusion Transformer for Unified Illumination Correction in Capsule Endoscopy

Wireless Capsule Endoscopy (WCE) is highly valued for its non-invasive and painless approach, though its effectiveness is compromised by uneven illumination from hardware constraints and complex internal dynamics, leading to overexposed or underexposed images. While researchers have discussed the challenges of low-light enhancement in WCE, the issue of correcting for different exposure levels remains underexplored. To tackle this, we introduce EndoUIC, a WCE unified illumination correction solution using an end-to-end promptable diffusion transformer (DiT) model. In our work, the illumination prompt module navigates the model to adapt to different exposure levels and perform targeted image enhancement, while the Adaptive Prompt Integration (API) and Global Prompt Scanner (GPS) modules further boost the concurrent representation learning between the prompt parameters and features. In addition, the U-shaped restoration DiT model captures the long-range dependencies and contextual information for unified illumination restoration. Moreover, we present a novel Capsule-endoscopy Exposure Correction (CEC) dataset, including ground-truth and corrupted image pairs annotated by expert photographers. Extensive experiments against a variety of state-of-the-art (SOTA) methods on four datasets showcase the effectiveness of our proposed method and components in WCE illumination restoration, and additional downstream experiments further demonstrate its utility for clinical diagnosis and surgical assistance.

Updated: 2024-07-08 15:51:29

Domains: eess.IV, cs.AI, cs.CV

Download: http://arxiv.org/abs/2406.13705v2

UCCA: A Verified Architecture for Compartmentalization of Untrusted Code Sections in Resource-Constrained Devices

Micro-controller units (MCUs) implement the de facto interface between the physical and digital worlds. As a consequence, they appear in a variety of sensing/actuation applications, from smart personal spaces to complex industrial control systems and safety-critical medical equipment. While many of these devices perform safety- and time-critical tasks, they often lack support for security features compatible with their importance to overall system functions. This lack of architectural support leaves them vulnerable to run-time attacks that can remotely alter their intended behavior, with potentially catastrophic consequences. In particular, we note that MCU software often includes untrusted third-party libraries (some of them closed-source) that are blindly used within MCU programs, without proper isolation from the rest of the system. In turn, a single vulnerability (or intentional backdoor) in one such third-party software can often compromise the entire MCU software state. In this paper, we tackle this problem by proposing, demonstrating security, and formally verifying the implementation of UCCA: an Untrusted Code Compartment Architecture. UCCA provides flexible hardware-enforced isolation of untrusted code sections (e.g., third-party software modules) in resource-constrained and time-critical MCUs. To demonstrate UCCA's practicality, we implement an open-source version of the design on a real resource-constrained MCU: the well-known TI MSP430. Our evaluation shows that UCCA incurs little overhead and is affordable even to lowest-end MCUs, requiring significantly less overhead and assumptions than prior related work.

Updated: 2024-07-08 15:49:18

Domains: cs.CR

Download: http://arxiv.org/abs/2312.02348v2

A Planning Ontology to Represent and Exploit Planning Knowledge for Performance Efficiency

Ontologies are known for their ability to organize rich metadata, support the identification of novel insights via semantic queries, and promote reuse. In this paper, we consider the problem of automated planning, where the objective is to find a sequence of actions that will move an agent from an initial state of the world to a desired goal state. We hypothesize that given a large number of available planners and diverse planning domains; they carry essential information that can be leveraged to identify suitable planners and improve their performance for a domain. We use data on planning domains and planners from the International Planning Competition (IPC) to construct a planning ontology and demonstrate via experiments in two use cases that the ontology can lead to the selection of promising planners and improving their performance using macros - a form of action ordering constraints extracted from planning ontology. We also make the planning ontology and associated resources available to the community to promote further research.

Updated: 2024-07-08 15:44:34

标题: 一个规划本体用于表示和利用规划知识以提高性能效率

摘要: 本体以其组织丰富元数据、支持通过语义查询发现新见解以及促进重用的能力而著称。本文考虑自动规划问题,其目标是找到一系列动作,将一个智能体从世界的初始状态移动到期望的目标状态。我们假设,鉴于可用规划器数量庞大且规划领域多样,它们携带着可以利用的重要信息,以识别适合的规划器并改进其在某一领域中的性能。我们利用国际规划竞赛(IPC)中的规划领域和规划器数据构建了一个规划本体,并通过两个用例的实验证明,该本体可以帮助选择有前途的规划器,并使用宏(一种从规划本体中提取的动作排序约束)来改进其性能。我们还将规划本体和相关资源提供给社区,以促进进一步的研究。

更新时间: 2024-07-08 15:44:34

领域: cs.AI

下载: http://arxiv.org/abs/2307.13549v2

Minimum discrepancy principle strategy for choosing $k$ in $k$-NN regression

We present a novel data-driven strategy to choose the hyperparameter $k$ in the $k$-NN regression estimator without using any hold-out data. We treat the problem of choosing the hyperparameter as an iterative procedure (over $k$) and propose a strategy that is easily implemented in practice, based on the ideas of early stopping and the minimum discrepancy principle. This model selection strategy is proven to be minimax-optimal over some smoothness function classes, for instance, the Lipschitz functions class on a bounded domain. The novel method often improves statistical performance on artificial and real-world data sets in comparison to other model selection strategies, such as the Hold-out method, 5-fold cross-validation, and the AIC criterion. The novelty of the strategy comes from reducing the computational time of the model selection procedure while preserving the statistical (minimax) optimality of the resulting estimator. More precisely, given a sample of size $n$, if one should choose $k$ among $\left\{ 1, \ldots, n \right\}$, and $\left\{ f^1, \ldots, f^n \right\}$ are the estimators of the regression function, the minimum discrepancy principle requires the calculation of only a fraction of the estimators, while this is not the case for generalized cross-validation, Akaike's AIC criterion, or the Lepskii principle.
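
To make the procedure concrete, here is a minimal sketch of the early-stopping rule, assuming i.i.d. noise with a known (or pre-estimated) variance sigma^2; the use of scikit-learn's KNeighborsRegressor and of the in-sample residual as the discrepancy statistic are illustrative assumptions, not the paper's exact implementation.

    import numpy as np
    from sklearn.neighbors import KNeighborsRegressor

    def choose_k_discrepancy(X, y, sigma2, k_max=None):
        """Pick k for k-NN regression via the minimum discrepancy principle:
        increase k (which smooths the fit) and stop as soon as the empirical
        in-sample residual reaches the noise level sigma^2. Only a fraction
        of the n estimators f^1, ..., f^n is ever computed."""
        n = len(y)
        k_max = k_max or n
        for k in range(1, k_max + 1):
            f_k = KNeighborsRegressor(n_neighbors=k).fit(X, y).predict(X)
            if np.mean((y - f_k) ** 2) >= sigma2:   # discrepancy reached
                return k
        return k_max

    rng = np.random.default_rng(0)
    X = rng.uniform(0, 1, (200, 1))
    y = np.sin(4 * X[:, 0]) + rng.normal(0, 0.3, 200)
    print(choose_k_discrepancy(X, y, sigma2=0.3 ** 2))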

Updated: 2024-07-08 15:43:59

标题: 选择$k$的$k$-NN回归中的最小差异原则策略

摘要: 我们提出了一种新的数据驱动策略,用于选择$k$-NN回归估计器中的超参数$k$,而无需使用任何保留数据。我们将选择超参数的问题视为一个(关于$k$的)迭代过程,并提出了一种基于早停和最小差异原则、在实践中易于实现的策略。该模型选择策略被证明在某些光滑函数类上是极小极大(minimax)最优的,例如在有界域上的Lipschitz函数类。与其他模型选择策略(如保留法、5折交叉验证和AIC准则)相比,这种新方法通常在人工和真实数据集上提高了统计性能。该策略的新颖之处在于减少了模型选择过程的计算时间,同时保持了所得估计量的统计(极小极大)最优性。更具体地说,给定大小为$n$的样本,如果要在$\left\{ 1, \ldots, n \right\}$中选择$k$,并且$\left\{ f^1, \ldots, f^n \right\}$是回归函数的估计量,则最小差异原则只需要计算其中一部分估计量,而广义交叉验证、Akaike的AIC准则或Lepskii原则则并非如此。

更新时间: 2024-07-08 15:43:59

领域: stat.ML,cs.LG,math.ST,stat.TH

下载: http://arxiv.org/abs/2008.08718v7

Enabling Performant and Secure EDA as a Service in Public Clouds Using Confidential Containers

Increasingly, business opportunities available to fabless design teams in the semiconductor industry far exceed those addressable with on-prem compute resources. An attractive option to capture these electronic design automation (EDA) design opportunities is through public cloud bursting. However, security concerns with public cloud bursting arise from having to protect process design kits, third party intellectual property, and new design data for semiconductor devices and chips. One way to address security concerns for public cloud bursting is to leverage confidential containers for EDA workloads. Confidential containers add zero trust computing elements to significantly reduce the probability of intellectual property escapes. A key concern that often follows security discussions is whether EDA workload performance will suffer with confidential computing. In this work we demonstrate a full set of EDA confidential containers and their deployment and characterize performance impacts of confidential elements of the flow including storage and networking. A complete end-to-end confidential container-based EDA workload exhibits 7.13% and 2.05% performance overheads over bare-metal container and VM based solutions, respectively.

Updated: 2024-07-08 15:36:30

标题: 在公共云中使用保密容器实现高性能和安全的EDA服务

摘要: 在半导体行业中,无晶圆厂设计团队可获得的商业机会已远远超出本地计算资源所能支撑的范围。通过公有云突发(cloud bursting)来把握这些电子设计自动化(EDA)设计机会是一个有吸引力的选择。然而,公有云突发带来的安全问题源于需要保护工艺设计套件(PDK)、第三方知识产权以及半导体器件和芯片的新设计数据。解决公有云突发安全问题的一种方法是为EDA工作负载使用保密容器。保密容器加入了零信任计算元素,可以显著降低知识产权泄露的可能性。安全讨论之后经常出现的一个关键问题是,EDA工作负载的性能在保密计算下是否会受损。在这项工作中,我们展示了一整套EDA保密容器及其部署,并对流程中包括存储和网络在内的保密元素的性能影响进行了表征。完整的端到端基于保密容器的EDA工作负载相对于裸金属容器和虚拟机方案分别仅有7.13%和2.05%的性能开销。

更新时间: 2024-07-08 15:36:30

领域: cs.CR

下载: http://arxiv.org/abs/2407.06040v1

UDPM: Upsampling Diffusion Probabilistic Models

Denoising Diffusion Probabilistic Models (DDPM) have recently gained significant attention. DDPMs compose a Markovian process that begins in the data domain and gradually adds noise until reaching pure white noise. DDPMs generate high-quality samples from complex data distributions by defining an inverse process and training a deep neural network to learn this mapping. However, these models are inefficient because they require many diffusion steps to produce aesthetically pleasing samples. Additionally, unlike generative adversarial networks (GANs), the latent space of diffusion models is less interpretable. In this work, we propose to generalize the denoising diffusion process into an Upsampling Diffusion Probabilistic Model (UDPM). In the forward process, we reduce the latent variable dimension through downsampling, followed by the traditional noise perturbation. As a result, the reverse process gradually denoises and upsamples the latent variable to produce a sample from the data distribution. We formalize the Markovian diffusion processes of UDPM and demonstrate its generation capabilities on the popular FFHQ, AFHQv2, and CIFAR10 datasets. UDPM generates images with as few as three network evaluations, whose overall computational cost is less than a single DDPM or EDM step, while achieving an FID score of 6.86. This surpasses current state-of-the-art efficient diffusion models that use a single denoising step for sampling. Additionally, UDPM offers an interpretable and interpolable latent space, which gives it an advantage over traditional DDPMs. Our code is available online: https://github.com/shadyabh/UDPM/
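
As a rough illustration of the forward process described above, the sketch below performs one UDPM-style step: downsample the latent, then apply the usual Gaussian perturbation. The 2x average pooling and the variance-preserving noise form are assumptions made for illustration, not UDPM's exact schedule.

    import torch
    import torch.nn.functional as F

    def udpm_forward_step(x, alpha_t):
        """One UDPM-style forward step: reduce the latent dimension by
        downsampling, then apply a Gaussian perturbation."""
        x_down = F.avg_pool2d(x, kernel_size=2)            # spatial downsampling
        noise = torch.randn_like(x_down)
        return alpha_t.sqrt() * x_down + (1 - alpha_t).sqrt() * noise

    x0 = torch.randn(4, 3, 32, 32)                         # a batch of images
    x1 = udpm_forward_step(x0, torch.tensor(0.98))
    print(x1.shape)                                        # torch.Size([4, 3, 16, 16])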

Updated: 2024-07-08 15:32:52

标题: UDPM:上采样扩散概率模型

摘要: 去噪扩散概率模型(DDPM)最近引起了人们的极大关注。DDPM组成一个马尔可夫过程,从数据领域开始,逐渐添加噪声,直到达到纯白噪声。通过定义一个反向过程并训练深度神经网络学习这种映射,DDPM从复杂数据分布中生成高质量样本。然而,这些模型效率低下,因为它们需要许多扩散步骤才能产生审美上令人满意的样本。此外,与生成对抗网络(GANs)不同,扩散模型的潜在空间不太可解释。在这项工作中,我们提出将去噪扩散过程概括为一种上采样扩散概率模型(UDPM)。在正向过程中,我们通过下采样减少潜变量维度,然后进行传统的噪声扰动。因此,反向过程逐渐去噪和上采样潜变量,以产生来自数据分布的样本。我们形式化了UDPM的马尔科夫扩散过程,并展示了它在流行的FFHQ、AFHQv2和CIFAR10数据集上的生成能力。UDPM生成的图像只需三次网络评估,整体计算成本低于单个DDPM或EDM步骤,同时实现了6.86的FID得分。这超过了当前最先进的高效扩散模型,这些模型仅使用单个去噪步骤进行采样。此外,UDPM提供一个可解释和可插值的潜在空间,这使其比传统的DDPM具有优势。我们的代码可以在线获取:\url{https://github.com/shadyabh/UDPM/}

更新时间: 2024-07-08 15:32:52

领域: cs.CV,cs.LG,eess.IV

下载: http://arxiv.org/abs/2305.16269v3

iLLM-TSC: Integration reinforcement learning and large language model for traffic signal control policy improvement

Urban congestion remains a critical challenge, with traffic signal control (TSC) emerging as a potent solution. TSC is often modeled as a Markov Decision Process problem and then solved using reinforcement learning (RL), which has proven effective. However, existing RL-based TSC systems often overlook imperfect observations caused by degraded communication, such as packet loss, delays, and noise, as well as rare real-life events not included in the reward function, such as unconsidered emergency vehicles. To address these limitations, we introduce a novel integration framework that combines a large language model (LLM) with RL. This framework is designed to manage overlooked elements in the reward function and gaps in state information, thereby enhancing the policies of RL agents. In our approach, RL initially makes decisions based on observed data. Subsequently, LLMs evaluate these decisions to verify their reasonableness. If a decision is found to be unreasonable, it is adjusted accordingly. Additionally, the framework can be seamlessly integrated with existing RL-based TSC systems without necessitating modifications. Extensive testing confirms that our approach reduces the average waiting time by $17.5\%$ in degraded communication conditions as compared to traditional RL methods, underscoring its potential to advance practical RL applications in intelligent transportation systems. The related code can be found at https://github.com/Traffic-Alpha/iLLM-TSC

Updated: 2024-07-08 15:22:49

标题: iLLM-TSC: 整合强化学习和大型语言模型以改进交通信号控制策略

摘要: 城市拥堵仍然是一个关键挑战,交通信号控制(TSC)被认为是一个有效的解决方案。TSC通常被建模为马尔可夫决策过程问题,然后使用强化学习(RL)来解决,这已被证明是有效的。然而,现有基于RL的TSC系统往往忽视由通信降级引起的不完美观测,如数据包丢失、延迟和噪声,以及奖励函数中未包含的罕见现实事件,如未考虑的紧急车辆。为了解决这些限制,我们引入了一个新颖的集成框架,将大型语言模型(LLM)与RL结合起来。该框架旨在管理奖励函数中被忽视的元素和状态信息中的空白,从而增强RL代理的策略。在我们的方法中,RL最初基于观测数据做出决策。随后,LLMs评估这些决策以验证它们的合理性。如果发现某个决策是不合理的,将相应地进行调整。此外,这种集成方法可以无缝集成到现有的基于RL的TSC系统中,而无需进行修改。广泛的测试证实,与传统RL方法相比,在通信降级条件下,我们的方法将平均等待时间缩短了17.5%,突显了它在智能交通系统中推进实际RL应用的潜力。相关代码可以在\url{https://github.com/Traffic-Alpha/iLLM-TSC}找到。

更新时间: 2024-07-08 15:22:49

领域: cs.AI

下载: http://arxiv.org/abs/2407.06025v1

Distilling System 2 into System 1

Large language models (LLMs) can spend extra compute during inference to generate intermediate thoughts, which helps to produce better final responses. Since Chain-of-Thought (Wei et al., 2022), many such System 2 techniques have been proposed such as Rephrase and Respond (Deng et al., 2023a), System 2 Attention (Weston and Sukhbaatar, 2023) and Branch-Solve-Merge (Saha et al., 2023). In this work we investigate self-supervised methods to ``compile'' (distill) higher quality outputs from System 2 techniques back into LLM generations without intermediate reasoning token sequences, as this reasoning has been distilled into System 1. We show that several such techniques can be successfully distilled, resulting in improved results compared to the original System 1 performance, and with less inference cost than System 2. We posit that such System 2 distillation will be an important feature of future continually learning AI systems, enabling them to focus System 2 capabilities on the reasoning tasks that they cannot yet do well.
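
A minimal sketch of the self-supervised curation step this kind of distillation relies on: sample several System 2 outputs per input, keep only inputs whose final answers are self-consistent, and use the majority answer as the fine-tuning target for System 1. The generate callable, sample count, and agreement threshold are placeholders, not the paper's exact recipe.

    from collections import Counter

    def distillation_targets(inputs, generate, n_samples=8, min_agree=0.75):
        """Build (input, answer) pairs for fine-tuning System 1.
        `generate(x)` is assumed to run a System 2 method (e.g., chain of
        thought) and return only the final answer; the intermediate
        reasoning tokens are discarded."""
        pairs = []
        for x in inputs:
            answers = [generate(x) for _ in range(n_samples)]
            best, count = Counter(answers).most_common(1)[0]
            if count / n_samples >= min_agree:    # self-consistency filter
                pairs.append((x, best))           # fine-tune System 1 on these
        return pairs

    # toy usage with a deterministic stand-in for a System 2 pipeline
    print(distillation_targets(["2+2?"], generate=lambda x: "4"))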

Updated: 2024-07-08 15:17:46

标题: 将System 2提炼为System 1

摘要: 大型语言模型(LLMs)可以在推理过程中花费额外的计算资源来生成中间思维,从而帮助产生更好的最终响应。自从Chain-of-Thought(Wei等人,2022)提出以来,许多类似的System 2技术已经被提出,例如Rephrase and Respond(Deng等人,2023a)、System 2 Attention(Weston和Sukhbaatar,2023)和Branch-Solve-Merge(Saha等人,2023)。在这项工作中,我们研究了自监督方法,将System 2技术中更高质量的输出“编译”(蒸馏)回LLM生成中,而无需中间推理令牌序列,因为这种推理已经被蒸馏为System 1。我们展示了几种这样的技术可以成功地蒸馏,与原始System 1性能相比,结果有所改善,并且推理成本比System 2更低。我们认为,这种System 2蒸馏将成为未来不断学习的人工智能系统的重要特性,使它们能够将System 2能力集中在它们目前尚无法很好执行的推理任务上。

更新时间: 2024-07-08 15:17:46

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2407.06023v1

Leveraging Transformers for Weakly Supervised Object Localization in Unconstrained Videos

Weakly-Supervised Video Object Localization (WSVOL) involves localizing an object in videos using only video-level labels, also referred to as tags. State-of-the-art WSVOL methods like Temporal CAM (TCAM) rely on class activation mapping (CAM) and typically require a pre-trained CNN classifier. However, their localization accuracy is affected by their tendency to minimize the mutual information between different instances of a class and exploit temporal information during training for downstream tasks, e.g., detection and tracking. In the absence of bounding box annotation, it is challenging to exploit precise information about objects from temporal cues because the model struggles to locate objects over time. To address these issues, a novel method called transformer based CAM for videos (TrCAM-V), is proposed for WSVOL. It consists of a DeiT backbone with two heads for classification and localization. The classification head is trained using standard classification loss (CL), while the localization head is trained using pseudo-labels that are extracted using a pre-trained CLIP model. From these pseudo-labels, the high and low activation values are considered to be foreground and background regions, respectively. Our TrCAM-V method allows training a localization network by sampling pseudo-pixels on the fly from these regions. Additionally, a conditional random field (CRF) loss is employed to align the object boundaries with the foreground map. During inference, the model can process individual frames for real-time localization applications. Extensive experiments on challenging YouTube-Objects unconstrained video datasets show that our TrCAM-V method achieves new state-of-the-art performance in terms of classification and localization accuracy.
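
A minimal sketch of the on-the-fly pseudo-pixel sampling described above, where high activations in a (CLIP-derived) map are treated as foreground and low activations as background; the percentile thresholds and sample counts are illustrative assumptions.

    import numpy as np

    def sample_pseudo_pixels(act_map, n_per_class=50, fg_pct=90, bg_pct=10, rng=None):
        """Treat high activations as foreground and low activations as
        background, then sample pixel coordinates from each region to use
        as pseudo-labels for the localization head."""
        rng = rng or np.random.default_rng()
        fg = np.argwhere(act_map >= np.percentile(act_map, fg_pct))
        bg = np.argwhere(act_map <= np.percentile(act_map, bg_pct))
        fg_idx = fg[rng.choice(len(fg), size=min(n_per_class, len(fg)), replace=False)]
        bg_idx = bg[rng.choice(len(bg), size=min(n_per_class, len(bg)), replace=False)]
        return fg_idx, bg_idx

    cam = np.random.default_rng(0).random((56, 56))   # stand-in activation map
    fg, bg = sample_pseudo_pixels(cam)
    print(fg.shape, bg.shape)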

Updated: 2024-07-08 15:08:41

标题: 利用变压器在无约束视频中进行弱监督目标定位

摘要: 弱监督视频对象定位(WSVOL)旨在仅使用视频级标签(tag)在视频中定位对象。像Temporal CAM(TCAM)这样的最先进WSVOL方法依赖于类激活映射(CAM),通常需要预训练的CNN分类器。然而,它们的定位准确性受到影响,因为它们倾向于最小化同一类不同实例之间的互信息,并在训练期间利用时间信息服务于下游任务(例如检测和跟踪)。在缺乏边界框标注的情况下,很难从时间线索中获取关于对象的精确信息,因为模型难以随时间定位对象。为了解决这些问题,我们提出了一种用于WSVOL的新方法:基于Transformer的视频CAM(TrCAM-V)。它由一个DeiT骨干网络和分类、定位两个头部组成。分类头使用标准分类损失(CL)进行训练,而定位头则使用由预训练CLIP模型提取的伪标签进行训练。在这些伪标签中,高激活值和低激活值分别被视为前景区域和背景区域。我们的TrCAM-V方法允许通过从这些区域中动态采样伪像素来训练定位网络。此外,使用条件随机场(CRF)损失将对象边界与前景图对齐。在推理过程中,模型可以逐帧处理,以用于实时定位应用。在具有挑战性的YouTube-Objects无约束视频数据集上进行的大量实验表明,我们的TrCAM-V方法在分类和定位准确性方面达到了新的最先进水平。

更新时间: 2024-07-08 15:08:41

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2407.06018v1

Physics-Regulated Deep Reinforcement Learning: Invariant Embeddings

This paper proposes the Phy-DRL: a physics-regulated deep reinforcement learning (DRL) framework for safety-critical autonomous systems. The Phy-DRL has three distinguished invariant-embedding designs: i) residual action policy (i.e., integrating data-driven-DRL action policy and physics-model-based action policy), ii) automatically constructed safety-embedded reward, and iii) physics-model-guided neural network (NN) editing, including link editing and activation editing. Theoretically, the Phy-DRL exhibits 1) a mathematically provable safety guarantee and 2) strict compliance of critic and actor networks with physics knowledge about the action-value function and action policy. Finally, we evaluate the Phy-DRL on a cart-pole system and a quadruped robot. The experiments validate our theoretical results and demonstrate that Phy-DRL features guaranteed safety compared to purely data-driven DRL and solely model-based design while offering remarkably fewer learning parameters and fast training towards safety guarantee.
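
A minimal sketch of the residual action policy (design i), assuming a linear physics model stabilized by an LQR-style gain; the cart-pole state layout, the gain values, and the tanh stand-in for the DRL network are assumptions for illustration, not the paper's trained components.

    import numpy as np

    def residual_action(x, policy_net, K):
        """Phy-DRL-style residual action: a physics-model-based feedback
        term plus a learned data-driven correction from the DRL policy."""
        a_phys = -K @ x                 # model-based feedback (e.g., LQR)
        a_drl = policy_net(x)           # learned residual
        return a_phys + a_drl

    K = np.array([[1.0, 1.5, 18.0, 3.0]])          # illustrative cart-pole gain
    x = np.array([0.1, 0.0, 0.05, 0.0])            # [pos, vel, angle, ang. vel]
    policy_net = lambda s: np.tanh(s[:1])          # stand-in for the DRL network
    print(residual_action(x, policy_net, K))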

Updated: 2024-07-08 15:08:16

标题: 物理学调控的深度强化学习:不变嵌入

摘要: 本文提出了Phy-DRL:一种用于安全关键自主系统的物理调节深度强化学习(DRL)框架。 Phy-DRL具有三种独特的不变嵌入设计:i)残差动作策略(即,集成数据驱动的DRL动作策略和基于物理模型的动作策略),ii)自动构建的安全嵌入奖励,以及iii)物理模型引导的神经网络(NN)编辑,包括链接编辑和激活编辑。 从理论上讲,Phy-DRL展示了1)可以数学证明的安全保证和2)评论家和演员网络严格遵守关于动作值函数和动作策略的物理知识。 最后,我们在一个平衡杆系统和一个四足机器人上评估了Phy-DRL。 实验证实了我们的理论结果,并表明Phy-DRL相对于纯数据驱动的DRL和仅基于模型的设计具有保证的安全性,同时提供了明显较少的学习参数和快速训练以确保安全。

更新时间: 2024-07-08 15:08:16

领域: cs.AI,cs.LG

下载: http://arxiv.org/abs/2305.16614v2

Simulation-based Benchmarking for Causal Structure Learning in Gene Perturbation Experiments

Causal structure learning (CSL) refers to the task of learning causal relationships from data. Advances in CSL now allow learning of causal graphs in diverse application domains, which has the potential to facilitate data-driven causal decision-making. Real-world CSL performance depends on a number of $\textit{context-specific}$ factors, including context-specific data distributions and non-linear dependencies, that are important in practical use-cases. However, our understanding of how to assess and select CSL methods in specific contexts remains limited. To address this gap, we present $\textit{CausalRegNet}$, a multiplicative effect structural causal model that allows for generating observational and interventional data incorporating context-specific properties, with a focus on the setting of gene perturbation experiments. Using real-world gene perturbation data, we show that CausalRegNet generates accurate distributions and scales far better than current simulation frameworks. We illustrate the use of CausalRegNet in assessing CSL methods in the context of interventional experiments in biology.

Updated: 2024-07-08 15:06:03

标题: 基因扰动实验中因果结构学习的基于模拟的基准测试

摘要: 因果结构学习(CSL)指的是从数据中学习因果关系的任务。CSL的进展如今使得在各种应用领域学习因果图成为可能,这有助于促进数据驱动的因果决策。现实世界中的CSL性能取决于许多$\textit{特定上下文}$的因素,包括特定上下文的数据分布和非线性依赖关系,它们在实际应用中非常重要。然而,我们对如何在特定上下文中评估和选择CSL方法的理解仍然有限。为了弥补这一差距,我们提出了$\textit{CausalRegNet}$,这是一种乘法效应结构因果模型,可以生成包含特定上下文属性的观测和干预数据,重点关注基因扰动实验的设置。使用真实世界的基因扰动数据,我们展示了CausalRegNet能够生成准确的分布,并且比当前的模拟框架具有好得多的扩展性。我们演示了在生物学干预实验背景下使用CausalRegNet评估CSL方法的用途。

更新时间: 2024-07-08 15:06:03

领域: stat.ML,cs.LG,stat.AP

下载: http://arxiv.org/abs/2407.06015v1

Evaluating Predictive Models in Cybersecurity: A Comparative Analysis of Machine and Deep Learning Techniques for Threat Detection

As cyberattacks become increasingly difficult to detect, the need for advanced models that can identify them is undeniable. This paper examines and compares various machine learning and deep learning models to choose the most suitable ones for detecting and fighting cybersecurity risks. Two datasets are used in the study to assess models like Naive Bayes, SVM, Random Forest, and deep learning architectures, i.e., VGG16, in terms of accuracy, precision, recall, and F1-score. The analysis shows that Random Forest and Extra Trees do better in terms of accuracy, though results vary with dataset characteristics and threat types. This research not only emphasizes the strengths and weaknesses of each predictive model but also addresses the difficulties associated with deploying such technologies in real-world environments, such as data dependency and computational demands. The findings are targeted at cybersecurity professionals to help them select appropriate predictive models and configure them to strengthen security measures against cyber threats.

Updated: 2024-07-08 15:05:59

标题: 评估网络安全中的预测模型:机器学习和深度学习技术在威胁检测中的比较分析

摘要: 随着这些攻击变得越来越难以察觉,检测它们的高科技模型的需求变得不可否认。本文检验并比较了各种机器学习以及深度学习模型,以选择最适合用于检测和对抗网络安全风险的模型。研究中使用了两个数据集来评估像朴素贝叶斯、支持向量机、随机森林以及深度学习架构(如VGG16)等模型在准确率、精确度、召回率和F1分数方面的表现。分析结果显示,随机森林和额外树在准确率方面表现更好,尽管在数据集特征和威胁类型方面有所不同。这项研究不仅突出了每个预测模型的优势和劣势,还解决了在现实环境中部署此类技术所面临的困难,如数据依赖性和计算需求。研究结果旨在帮助网络安全专业人士选择适当的预测模型并配置它们以完全加强对抗网络威胁的安全措施。

更新时间: 2024-07-08 15:05:59

领域: cs.CR

下载: http://arxiv.org/abs/2407.06014v1

Grid Cell-Inspired Fragmentation and Recall for Efficient Map Building

Animals and robots navigate through environments by building and refining maps of space. These maps enable functions including navigation back to home, planning, search and foraging. Here, we use observations from neuroscience, specifically the observed fragmentation of grid cell map in compartmentalized spaces, to propose and apply the concept of Fragmentation-and-Recall (FARMap) in the mapping of large spaces. Agents solve the mapping problem by building local maps via a surprisal-based clustering of space, which they use to set subgoals for spatial exploration. Agents build and use a local map to predict their observations; high surprisal leads to a "fragmentation event" that truncates the local map. At these events, the recent local map is placed into long-term memory (LTM) and a different local map is initialized. If observations at a fracture point match observations in one of the stored local maps, that map is recalled (and thus reused) from LTM. The fragmentation points induce a natural online clustering of the larger space, forming a set of intrinsic potential subgoals that are stored in LTM as a topological graph. Agents choose their next subgoal from the set of near and far potential subgoals from within the current local map or LTM, respectively. Thus, local maps guide exploration locally, while LTM promotes global exploration. We demonstrate that FARMap replicates the fragmentation points observed in animal studies. We evaluate FARMap on complex procedurally-generated spatial environments and realistic simulations to demonstrate that this mapping strategy much more rapidly covers the environment (number of agent steps and wall clock time) and is more efficient in active memory usage, without loss of performance. https://jd730.github.io/projects/FARMap/
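
A toy sketch of the fragmentation-and-recall loop, with a stand-in surprisal function; the real method predicts observations with a learned local map, and the matching rule here is deliberately simplistic.

    class FARMap:
        """Toy fragmentation-and-recall loop: keep a local map, fragment on
        high surprisal, store fragments in long-term memory (LTM), and
        recall a stored fragment when its observations match."""
        def __init__(self, threshold=4.0):
            self.local, self.ltm, self.threshold = {}, [], threshold

        def surprisal(self, obs):
            # stand-in: observations absent from the local map are surprising
            return 0.0 if obs in self.local else self.threshold + 1.0

        def step(self, pos, obs):
            if self.surprisal(obs) > self.threshold:      # fragmentation event
                if self.local and self.local not in self.ltm:
                    self.ltm.append(self.local)           # archive local map
                # recall a stored map containing this observation, else start fresh
                self.local = next((m for m in self.ltm if obs in m), {})
            self.local[obs] = pos                          # update local map

    agent = FARMap()
    for pos, obs in enumerate(["room_A", "room_A", "door", "room_B", "door", "room_A"]):
        agent.step(pos, obs)
    print(len(agent.ltm), agent.local)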

Updated: 2024-07-08 15:04:55

标题: 受格网细胞启发的片段化和回忆用于高效地地图构建

摘要: 动物和机器人通过建立和完善空间地图来在环境中导航。这些地图使得导航回家、规划、搜索和觅食等功能成为可能。在这里,我们利用来自神经科学的观察,特别是在分隔空间中观察到的网格细胞地图的碎片化,提出并应用了“碎片化与回忆”(FARMap)的概念来映射大空间。代理通过基于惊异度(surprisal)的空间聚类来构建本地地图,从而解决了映射问题,并利用这些地图来设定空间探索的子目标。代理构建并使用本地地图来预测其观察结果;高惊异度导致“碎片化事件”,即截断本地地图。在这些事件中,最近的本地地图被放入长期记忆(LTM)中,并初始化一个不同的本地地图。如果在断裂点的观察与某个已存储本地地图中的观察相匹配,则从LTM中召回(并因此重复使用)该地图。碎片化点引发更大空间的自然在线聚类,形成一组内在的潜在子目标,以拓扑图的形式存储在LTM中。代理分别从当前本地地图内的近距离潜在子目标或LTM中的远距离潜在子目标中选择下一个子目标。因此,本地地图在本地引导探索,而LTM促进全局探索。我们证明FARMap复现了动物研究中观察到的碎片化点。我们在复杂的程序生成空间环境和逼真的模拟中评估了FARMap,结果表明这种映射策略能更快地覆盖环境(以代理步数和挂钟时间计),并且在活动内存使用上更高效,而不会损失性能。

更新时间: 2024-07-08 15:04:55

领域: cs.AI,cs.RO

下载: http://arxiv.org/abs/2307.05793v3

Igea: a Decoder-Only Language Model for Biomedical Text Generation in Italian

The development of domain-specific language models has significantly advanced natural language processing applications in various specialized fields, particularly in biomedicine. However, the focus has largely been on English-language models, leaving a gap for less-resourced languages such as Italian. This paper introduces Igea, the first decoder-only language model designed explicitly for biomedical text generation in Italian. Built on the Minerva model and continually pretrained on a diverse corpus of Italian medical texts, Igea is available in three model sizes: 350 million, 1 billion, and 3 billion parameters. The models aim to balance computational efficiency and performance, addressing the challenges of managing the peculiarities of medical terminology in Italian. We evaluate Igea using a mix of in-domain biomedical corpora and general-purpose benchmarks, highlighting its efficacy and retention of general knowledge even after the domain-specific training. This paper discusses the model's development and evaluation, providing a foundation for future advancements in Italian biomedical NLP.

Updated: 2024-07-08 15:04:21

标题: Igea:一种仅解码器的语言模型,用于意大利语生物医学文本生成

摘要: 领域特定语言模型的发展显著推动了各种专业领域中的自然语言处理应用,特别是在生物医学领域。然而,目前主要集中在英语语言模型上,留下了对意大利语等资源较少的语言的空白。本文介绍了Igea,这是第一个专门设计用于意大利语生物医学文本生成的仅解码器语言模型。基于Minerva模型构建,并在各种意大利医学文本语料库上进行持续预训练,Igea提供三种模型大小:3.5亿、10亿和30亿个参数。这些模型旨在平衡计算效率和性能,解决管理意大利语医学术语的特殊挑战。我们利用混合领域生物医学语料库和通用基准进行Igea的评估,突出其有效性,并在领域特定训练后仍保留通用知识。本文讨论了模型的发展和评估,为未来意大利生物医学自然语言处理的进展奠定了基础。

更新时间: 2024-07-08 15:04:21

领域: cs.CL,cs.AI,I.2.7; J.3

下载: http://arxiv.org/abs/2407.06011v1

Surprising gender biases in GPT

We present seven experiments exploring gender biases in GPT. Initially, GPT was asked to generate demographics of a potential writer of twenty phrases containing feminine stereotypes and twenty with masculine stereotypes. Results show a strong asymmetry, with stereotypically masculine sentences attributed to a female more often than vice versa. For example, the sentence "I love playing fotbal! Im practicing with my cosin Michael" was constantly assigned by ChatGPT to a female writer. This phenomenon likely reflects that while initiatives to integrate women in traditionally masculine roles have gained momentum, the reverse movement remains relatively underdeveloped. Subsequent experiments investigate the same issue in high-stakes moral dilemmas. GPT-4 finds it more appropriate to abuse a man to prevent a nuclear apocalypse than to abuse a woman. This bias extends to other forms of violence central to the gender parity debate (abuse), but not to those less central (torture). Moreover, this bias increases in cases of mixed-sex violence for the greater good: GPT-4 agrees with a woman using violence against a man to prevent a nuclear apocalypse but disagrees with a man using violence against a woman for the same purpose. Finally, these biases are implicit, as they do not emerge when GPT-4 is directly asked to rank moral violations. These results highlight the necessity of carefully managing inclusivity efforts to prevent unintended discrimination.

Updated: 2024-07-08 14:57:02

标题: 《GPT中令人惊讶的性别偏见》

摘要: 我们展示了七个实验,探讨GPT中的性别偏见。最初,我们要求GPT为二十个包含女性刻板印象的短语和二十个包含男性刻板印象的短语生成潜在作者的人口统计信息。结果显示出明显的不对称性:刻板印象上男性化的句子被归因于女性作者的频率,高于女性化句子被归因于男性作者的频率。例如,“我喜欢踢足球!我正在和我的堂兄迈克尔练习”这句话总是被ChatGPT归属于女性作者。这种现象可能反映出:虽然将女性融入传统男性角色的倡议已获得动力,但反向的运动仍相对欠发展。随后的实验在高风险道德困境中探讨了同一问题。GPT-4认为,为防止核灾难而虐待一名男性比虐待一名女性更可接受。这种偏见延伸到性别平等辩论中居于核心的其他暴力形式(虐待),但不包括相对次要的形式(酷刑)。此外,在为了更大利益而实施的两性间暴力情境中,这种偏见会加剧:GPT-4同意女性对男性使用暴力以阻止核灾难,但反对男性出于同样目的对女性使用暴力。最后,这些偏见是内隐的:当直接要求GPT-4对道德违规行为进行排序时,它们并不显现。这些结果强调了需要谨慎管理包容性举措,以防止意外的歧视。

更新时间: 2024-07-08 14:57:02

领域: cs.CY,cs.AI

下载: http://arxiv.org/abs/2407.06003v1

Optimizing Data-driven Causal Discovery Using Knowledge-guided Search

Learning causal relationships solely from observational data often fails to reveal the underlying causal mechanisms due to the vast search space of possible causal graphs, which can grow exponentially, especially for greedy algorithms using score-based approaches. Leveraging prior causal information, such as the presence or absence of causal edges, can help restrict and guide the score-based discovery process, leading to a more accurate search. In the healthcare domain, prior knowledge is abundant from sources like medical journals, electronic health records (EHRs), and clinical intervention outcomes. This study introduces a knowledge-guided causal structure search (KGS) approach that utilizes observational data and structural priors (such as causal edges) as constraints to learn the causal graph. KGS leverages prior edge information between variables, including the presence of a directed edge, the absence of an edge, and the presence of an undirected edge. We extensively evaluate KGS in multiple settings using synthetic and benchmark real-world datasets, as well as in a real-life healthcare application related to oxygen therapy treatment. To obtain causal priors, we use GPT-4 to retrieve relevant literature information. Our results show that structural priors of any type and amount enhance the search process, improving performance and optimizing causal discovery. This guided strategy ensures that the discovered edges align with established causal knowledge, enhancing the trustworthiness of findings while expediting the search process. It also enables a more focused exploration of causal mechanisms, potentially leading to more effective and personalized healthcare solutions.
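
A minimal sketch of how structural priors can constrain a greedy score-based search: required edges are fixed up front and forbidden edges are never proposed. The toy score stands in for a data-driven criterion such as BIC, and acyclicity checks are omitted for brevity.

    from itertools import permutations

    def knowledge_guided_edges(variables, score, required, forbidden):
        """Greedy score-based edge search constrained by structural priors:
        required edges are fixed in advance; forbidden edges are never tried.
        (A real implementation would also reject cycle-creating edges.)"""
        graph = set(required)                         # prior: known causal edges
        candidates = [e for e in permutations(variables, 2)
                      if e not in graph and e not in forbidden]
        improved = True
        while improved:
            improved = False
            for edge in list(candidates):
                if score(graph | {edge}) > score(graph):   # e.g., BIC gain
                    graph.add(edge)
                    candidates.remove(edge)
                    improved = True
        return graph

    # toy score preferring a small known edge set (stand-in for a data score)
    truth = {("smoking", "cancer"), ("gene", "cancer")}
    score = lambda g: len(g & truth) - 0.1 * len(g)
    print(knowledge_guided_edges(["smoking", "gene", "cancer"], score,
                                 required={("smoking", "cancer")},
                                 forbidden={("cancer", "smoking")}))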

Updated: 2024-07-08 14:54:48

标题: 优化数据驱动的因果发现:使用知识引导搜索

摘要: 从仅观测数据中学习因果关系通常无法揭示潜在的因果机制,这是因为可能的因果图搜索空间庞大,尤其是对于使用基于评分的贪婪算法,可能呈指数增长。利用先前的因果信息,如因果边的存在或不存在,可以帮助限制和引导基于评分的发现过程,从而实现更准确的搜索。在医疗保健领域,先前的知识来源丰富,包括医学期刊、电子健康记录(EHRs)和临床干预结果。本研究介绍了一种知识引导的因果结构搜索(KGS)方法,利用观测数据和结构先验(如因果边)作为约束来学习因果图。KGS利用变量之间的先前边信息,包括有向边的存在、边的缺失和无向边的存在。我们在多种设置中广泛评估了KGS,包括使用合成和基准真实世界数据集,以及与氧疗治疗相关的真实医疗应用。为了获得因果先验,我们使用GPT-4来检索相关文献信息。我们的结果表明,任何类型和数量的结构先验都可以增强搜索过程,提高性能并优化因果发现。这种引导策略确保发现的边与已建立的因果知识一致,增强了发现结果的可信度,同时加快了搜索过程。它还能够更加专注地探索因果机制,可能导致更有效和个性化的医疗解决方案。

更新时间: 2024-07-08 14:54:48

领域: cs.AI

下载: http://arxiv.org/abs/2304.05493v2

EventChat: Implementation and user-centric evaluation of a large language model-driven conversational recommender system for exploring leisure events in an SME context

Large language models (LLMs) present an enormous evolution in the strategic potential of conversational recommender systems (CRS). Yet to date, research has predominantly focused upon technical frameworks to implement LLM-driven CRS, rather than end-user evaluations or strategic implications for firms, particularly from the perspective of a small to medium enterprises (SME) that makeup the bedrock of the global economy. In the current paper, we detail the design of an LLM-driven CRS in an SME setting, and its subsequent performance in the field using both objective system metrics and subjective user evaluations. While doing so, we additionally outline a short-form revised ResQue model for evaluating LLM-driven CRS, enabling replicability in a rapidly evolving field. Our results reveal good system performance from a user experience perspective (85.5% recommendation accuracy) but underscore latency, cost, and quality issues challenging business viability. Notably, with a median cost of $0.04 per interaction and a latency of 5.7s, cost-effectiveness and response time emerge as crucial areas for achieving a more user-friendly and economically viable LLM-driven CRS for SME settings. One major driver of these costs is the use of an advanced LLM as a ranker within the retrieval-augmented generation (RAG) technique. Our results additionally indicate that relying solely on approaches such as Prompt-based learning with ChatGPT as the underlying LLM makes it challenging to achieve satisfying quality in a production environment. Strategic considerations for SMEs deploying an LLM-driven CRS are outlined, particularly considering trade-offs in the current technical landscape.

Updated: 2024-07-08 14:50:49

标题: EventChat:在SME环境中探索休闲活动的大型语言模型驱动的对话推荐系统的实施和用户中心评估

摘要: 大型语言模型(LLMs)对话式推荐系统(CRS)的战略潜力产生了巨大的发展。然而,迄今为止,研究主要集中在实施LLM驱动的CRS的技术框架,而不是终端用户评估或对企业的战略影响,特别是从中小型企业(SME)的角度来看,这些企业构成了全球经济的基石。在本文中,我们详细介绍了在SME环境中设计LLM驱动的CRS的过程,以及在野外使用客观系统度量和主观用户评估的性能。在此过程中,我们还概述了一个用于评估LLM驱动的CRS的短形修订ResQue模型,从而使其在快速发展的领域中可复制。我们的结果显示,从用户体验角度来看,系统表现良好(85.5%的推荐准确率),但突出了延迟、成本和质量问题,挑战了业务的可行性。值得注意的是,每次互动的中位成本为0.04美元,延迟为5.7秒,成本效益和响应时间成为实现更具用户友好性和经济可行性的SME环境中LLM驱动的CRS的关键领域。这些成本的一个主要驱动因素是在检索增强生成(RAG)技术中将先进的LLM用作排名器。我们的结果还表明,仅依靠Prompt-based learning和ChatGPT等方法作为基础LLM在生产环境中实现令人满意的质量是具有挑战性的。为部署LLM驱动的CRS的SME提出了战略考虑,特别是考虑了当前技术格局中的权衡。

更新时间: 2024-07-08 14:50:49

领域: cs.IR,cs.AI,cs.CL,cs.LG,68T50,I.2.7; H.5.2

下载: http://arxiv.org/abs/2407.04472v2

On the Topology Awareness and Generalization Performance of Graph Neural Networks

Many computer vision and machine learning problems are modelled as learning tasks on graphs, where graph neural networks (GNNs) have emerged as a dominant tool for learning representations of graph-structured data. A key feature of GNNs is their use of graph structures as input, enabling them to exploit the graphs' inherent topological properties, known as the topology awareness of GNNs. Despite the empirical successes of GNNs, the influence of topology awareness on generalization performance remains unexplored, particularly for node-level tasks that diverge from the assumption of data being independent and identically distributed (IID). The precise definition and characterization of the topology awareness of GNNs, especially concerning different topological features, are still unclear. This paper introduces a comprehensive framework to characterize the topology awareness of GNNs across any topological feature. Using this framework, we investigate the effects of topology awareness on GNN generalization performance. Contrary to the prevailing belief that enhancing the topology awareness of GNNs is always advantageous, our analysis reveals a critical insight: improving the topology awareness of GNNs may inadvertently lead to unfair generalization across structural groups, which might not be desired in some scenarios. Additionally, we conduct a case study using an intrinsic graph metric, the shortest-path distance, on various benchmark datasets. The empirical results of this case study confirm our theoretical insights. Moreover, we demonstrate the practical applicability of our framework by using it to tackle the cold-start problem in graph active learning.

Updated: 2024-07-08 14:49:14

标题: 关于图神经网络的拓扑感知和泛化性能

摘要: 许多计算机视觉和机器学习问题被建模为在图上学习的任务,图神经网络(GNNs)已经成为学习图结构数据表示的主要工具。GNNs的一个关键特点是它们将图结构作为输入,使其能够利用图的固有拓扑特性,即GNNs的拓扑感知性。尽管GNNs在经验上取得了成功,但拓扑感知对泛化性能的影响尚未被探究,特别是对于与数据独立同分布假设不符的节点级任务。关于GNNs的拓扑感知的精确定义和特征化,尤其是关于不同拓扑特征的情况仍然不清楚。本文介绍了一个全面的框架,用于表征GNNs对任何拓扑特征的拓扑感知性。利用这个框架,我们研究了拓扑感知对GNN泛化性能的影响。与普遍认为增强GNNs的拓扑感知总是有利的观点相反,我们的分析揭示了一个关键的见解,即改善GNNs的拓扑感知可能会无意中导致在结构群体之间不公平的泛化,这在某些场景下可能是不希望的。此外,我们在各种基准数据集上进行了一个案例研究,使用内在图度量——最短路径距离。这个案例研究的经验结果验证了我们的理论见解。此外,我们通过使用该框架来解决图主动学习中的冷启动问题,展示了我们框架的实际适用性。

更新时间: 2024-07-08 14:49:14

领域: cs.LG

下载: http://arxiv.org/abs/2403.04482v2

Adaptive and robust watermark against model extraction attack

Large language models (LLMs) demonstrate general intelligence across a variety of machine learning tasks, thereby enhancing the commercial value of their intellectual property (IP). To protect this IP, model owners typically allow user access only in a black-box manner, however, adversaries can still utilize model extraction attacks to steal the model intelligence encoded in model generation. Watermarking technology offers a promising solution for defending against such attacks by embedding unique identifiers into the model-generated content. However, existing watermarking methods often compromise the quality of generated content due to heuristic alterations and lack robust mechanisms to counteract adversarial strategies, thus limiting their practicality in real-world scenarios. In this paper, we introduce an adaptive and robust watermarking method (named ModelShield) to protect the IP of LLMs. Our method incorporates a self-watermarking mechanism that allows LLMs to autonomously insert watermarks into their generated content to avoid the degradation of model content. We also propose a robust watermark detection mechanism capable of effectively identifying watermark signals under the interference of varying adversarial strategies. Besides, ModelShield is a plug-and-play method that does not require additional model training, enhancing its applicability in LLM deployments. Extensive evaluations on two real-world datasets and three LLMs demonstrate that our method surpasses existing methods in terms of defense effectiveness and robustness while significantly reducing the degradation of watermarking on the model-generated content.

Updated: 2024-07-08 14:47:42

标题: 自适应和鲁棒的水印抵抗模型提取攻击

摘要: 大语言模型(LLMs)展示了在各种机器学习任务中的普遍智能,从而增强了它们的知识产权(IP)的商业价值。为了保护这些知识产权,模型所有者通常只允许用户以黑匣子的方式访问,然而,对手仍然可以利用模型提取攻击来窃取模型生成中编码的智能。数字水印技术为抵御这种攻击提供了一个有前途的解决方案,通过将唯一标识符嵌入到模型生成的内容中。然而,现有的数字水印方法通常由于启发式改变而损害了生成内容的质量,并且缺乏抵制对手策略的强大机制,从而限制了它们在现实场景中的实用性。本文介绍了一种自适应和强大的水印方法(名为ModelShield)来保护LLMs的知识产权。我们的方法包含了一种自水印机制,允许LLMs自主地将水印插入到它们生成的内容中,以避免模型内容的降级。我们还提出了一种鲁棒的水印检测机制,能够有效地在各种对手策略的干扰下识别水印信号。此外,ModelShield是一种即插即用的方法,不需要额外的模型训练,增强了它在LLMs部署中的适用性。在两个真实世界数据集和三个LLMs上进行的广泛评估表明,我们的方法在防御效果和鲁棒性方面超越了现有方法,同时显著减少了水印对模型生成内容的降级。

更新时间: 2024-07-08 14:47:42

领域: cs.CR

下载: http://arxiv.org/abs/2405.02365v2

PDiscoFormer: Relaxing Part Discovery Constraints with Vision Transformers

Computer vision methods that explicitly detect object parts and reason on them are a step towards inherently interpretable models. Existing approaches that perform part discovery driven by a fine-grained classification task make very restrictive assumptions on the geometric properties of the discovered parts; they should be small and compact. Although this prior is useful in some cases, in this paper we show that pre-trained transformer-based vision models, such as self-supervised DINOv2 ViT, enable the relaxation of these constraints. In particular, we find that a total variation (TV) prior, which allows for multiple connected components of any size, substantially outperforms previous work. We test our approach on three fine-grained classification benchmarks: CUB, PartImageNet and Oxford Flowers, and compare our results to previously published methods as well as a re-implementation of the state-of-the-art method PDiscoNet with a transformer-based backbone. We consistently obtain substantial improvements across the board, both on part discovery metrics and the downstream classification task, showing that the strong inductive biases in self-supervised ViT models require to rethink the geometric priors that can be used for unsupervised part discovery.
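
A minimal sketch of a total variation prior on soft part-assignment maps: the penalty grows with the total boundary length but, unlike compactness priors, does not restrict parts to a single small connected component. The anisotropic form and the mean reduction are illustrative choices.

    import torch

    def tv_prior(part_maps):
        """Anisotropic total variation over per-part soft assignment maps of
        shape (batch, parts, H, W): the penalty grows with boundary length
        but not with the number or size of connected components."""
        dh = (part_maps[..., 1:, :] - part_maps[..., :-1, :]).abs().mean()
        dw = (part_maps[..., :, 1:] - part_maps[..., :, :-1]).abs().mean()
        return dh + dw

    maps = torch.softmax(torch.randn(2, 8, 28, 28), dim=1)   # 8 candidate parts
    print(tv_prior(maps))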

Updated: 2024-07-08 14:44:06

标题: PDiscoFormer:使用视觉Transformer放松部分发现约束

摘要: 计算机视觉方法明确检测对象部分并对其进行推理是朝着固有可解释模型迈出的一步。现有的方法通过一个细粒度分类任务驱动部分发现,对所发现部分的几何属性做出了非常严格的假设;它们应该是小型且紧凑的。尽管这种先验在某些情况下很有用,但本文展示了预训练的基于Transformer的视觉模型,比如自监督的DINOv2 ViT,使得这些约束得以放宽。特别是,我们发现允许任意大小的多个连接组件的总变差(TV)先验,大大优于先前的工作。我们在三个细粒度分类基准数据集上测试我们的方法:CUB,PartImageNet和Oxford Flowers,并将我们的结果与先前发布的方法以及基于Transformer骨干的现有最先进方法PDiscoNet的重新实现进行比较。我们在部分发现指标和下游分类任务上持续获得实质性改进,表明自监督ViT模型中的强归纳偏差要求重新考虑可用于无监督部分发现的几何先验。

更新时间: 2024-07-08 14:44:06

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2407.04538v2

Multi-Texture Synthesis through Signal Responsive Neural Cellular Automata

Neural Cellular Automata (NCA) have proven to be effective in a variety of fields, with numerous biologically inspired applications. One of the fields in which NCAs perform well is the generation of textures, modelling global patterns from local interactions governed by uniform and coherent rules. This paper aims to enhance the usability of NCAs in texture synthesis by addressing a shortcoming of current NCA architectures for texture generation, which require a separately trained NCA for each individual texture. In this work, we train a single NCA for the evolution of multiple textures, based on individual examples. Our solution provides texture information in the state of each cell, in the form of an internally coded genomic signal, which enables the NCA to generate the expected texture. Such a neural cellular automaton not only maintains its regenerative capability but also allows for interpolation between learned textures and supports grafting techniques. This demonstrates the ability to edit generated textures and the potential for them to merge and coexist within the same automaton. We also address questions related to the influence of the genomic information and the cost function on the evolution of the NCA.

Updated: 2024-07-08 14:36:20

标题: 通过信号响应神经元细胞自动机进行多纹理合成

摘要: 神经细胞自动机(NCA)已被证明在各个领域具有有效性,并有许多受生物启发的应用。其中之一是NCAs在纹理生成方面表现良好,通过由统一和一致规则控制的局部相互作用建模全局模式。本文旨在通过解决当前NCA体系结构在纹理生成方面的一个缺点,即需要为每个单独的纹理单独训练NCA,以增强NCAs在纹理合成中的可用性。在这项工作中,我们基于各个示例训练一个单一的NCA以进化多个纹理。我们的解决方案通过在每个细胞状态中提供纹理信息,以内部编码的基因组信号的形式,使NCA能够生成预期的纹理。这样的神经细胞自动机不仅保持其再生能力,还允许在学习纹理之间进行插值并支持嫁接技术。这展示了编辑生成的纹理的能力以及它们在同一自动机中融合和共存的潜力。我们还讨论了基因组信息和成本函数对NCA进化的影响相关的问题。

更新时间: 2024-07-08 14:36:20

领域: cs.NE,cs.AI

下载: http://arxiv.org/abs/2407.05991v1

Towards LLM-based Autograding for Short Textual Answers

Grading exams is an important, labor-intensive, subjective, repetitive, and frequently challenging task. The feasibility of autograding textual responses has greatly increased thanks to the availability of large language models (LLMs) such as ChatGPT and the substantial influx of data brought about by digitalization. However, entrusting AI models with decision-making roles raises ethical considerations, mainly stemming from potential biases and issues related to generating false information. Thus, in this manuscript, we provide an evaluation of a large language model for the purpose of autograding, while also highlighting how LLMs can support educators in validating their grading procedures. Our evaluation is targeted towards automatic short textual answers grading (ASAG), spanning various languages and examinations from two distinct courses. Our findings suggest that while "out-of-the-box" LLMs provide a valuable tool to provide a complementary perspective, their readiness for independent automated grading remains a work in progress, necessitating human oversight.

Updated: 2024-07-08 14:28:41

标题: 朝向基于LLM的短文本答案自动评分

摘要: 批阅考试是一项重要且劳动密集、主观、重复且常常具有挑战性的任务。由于大型语言模型(LLMs)如ChatGPT的可用性以及数字化带来的大量数据的涌入,自动评分文本回答的可行性大大增加。然而,将决策角色委托给人工智能模型引发了伦理考虑,主要源自潜在偏见和有关生成虚假信息的问题。因此,在本文中,我们评估了一个大型语言模型用于自动评分的目的,同时强调LLMs如何支持教育工作者验证他们的评分程序。我们的评估针对自动短文本答案评分(ASAG),涵盖了来自两门不同课程的各种语言和考试。我们的研究结果表明,“开箱即用”的LLMs提供了一种有价值的工具,可以提供一个补充性的视角,但它们独立自动评分的准备工作仍然在进行中,需要人类监督。

更新时间: 2024-07-08 14:28:41

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2309.11508v2

KidSat: satellite imagery to map childhood poverty dataset and benchmark

Satellite imagery has emerged as an important tool to analyse demographic, health, and development indicators. While various deep learning models have been built for these tasks, each is specific to a particular problem, with few standard benchmarks available. We propose a new dataset pairing satellite imagery and high-quality survey data on child poverty to benchmark satellite feature representations. Our dataset consists of 33,608 images, each 10 km $\times$ 10 km, from 19 countries in Eastern and Southern Africa in the time period 1997-2022. As defined by UNICEF, multidimensional child poverty covers six dimensions and it can be calculated from the face-to-face Demographic and Health Surveys (DHS) Program. As part of the benchmark, we test spatial as well as temporal generalization, by testing on unseen locations, and on data after the training years. Using our dataset we benchmark multiple models, from low-level satellite imagery models such as MOSAIKS, to deep learning foundation models, which include both generic vision models such as Self-Distillation with no Labels (DINOv2) models and specific satellite imagery models such as SatMAE. We provide open source code for building the satellite dataset, obtaining ground truth data from DHS and running various models assessed in our work.

Updated: 2024-07-08 14:26:30

标题: KidSat:利用卫星影像绘制儿童贫困数据集和基准。

摘要: 卫星图像已经成为分析人口、健康和发展指标的重要工具。虽然已经建立了各种深度学习模型来处理这些任务,但每个模型都针对特定问题,可用的标准基准较少。我们提出了一个新的数据集,将卫星图像与关于儿童贫困的高质量调查数据配对,以用于评估卫星特征表示。我们的数据集包括来自东部和南部非洲19个国家的33,608张图像,每张图像为10公里×10公里,时间跨度为1997年至2022年。根据联合国儿童基金会的定义,多维儿童贫困涵盖六个维度,可以从面对面的人口与健康调查(DHS)计划中计算得出。作为基准测试的一部分,我们测试空间和时间的泛化能力,通过在未知位置上进行测试,并在训练年份之后的数据上进行测试。使用我们的数据集,我们对多个模型进行基准测试,从低级卫星图像模型(如MOSAIKS)到包括通用视觉模型(如没有标签的自我蒸馏(DINOv2)模型)和特定卫星图像模型(如SatMAE)在内的深度学习基础模型。我们提供了用于构建卫星数据集、从DHS获取地面真实数据以及运行各种在我们的工作中评估过的模型的开源代码。

更新时间: 2024-07-08 14:26:30

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2407.05986v1

The Multilingual Alignment Prism: Aligning Global and Local Preferences to Reduce Harm

A key concern with the concept of "alignment" is the implicit question of "alignment to what?". AI systems are increasingly used across the world, yet safety alignment is often focused on homogeneous monolingual settings. Additionally, preference training and safety measures often overfit to harms common in Western-centric datasets. Here, we explore the viability of different alignment approaches when balancing dual objectives: addressing and optimizing for a non-homogeneous set of languages and cultural preferences while minimizing both global and local harms. We collect the first set of human annotated red-teaming prompts in different languages distinguishing between global and local harm, which serve as a laboratory for understanding the reliability of alignment techniques when faced with preference distributions that are non-stationary across geographies and languages. While this setting is seldom covered by the literature to date, which primarily centers on English harm mitigation, it captures real-world interactions with AI systems around the world. We establish a new precedent for state-of-the-art alignment techniques across 6 languages with minimal degradation in general performance. Our work provides important insights into cross-lingual transfer and novel optimization approaches to safeguard AI systems designed to serve global populations.

Updated: 2024-07-08 14:26:16

标题: 多语言对齐棱镜:将全球和本地偏好对齐以减少伤害

摘要: “对齐”概念中的一个关键问题是“对齐到什么?”人工智能系统在全球范围内的使用日益增多,然而安全对齐往往集中在同质单语环境中。此外,偏好训练和安全措施往往过度拟合于西方中心数据集中常见的危害。在这里,我们探讨了在平衡双重目标时不同对齐方法的可行性:处理和优化非同质语言和文化偏好集合,同时最大程度地减少全球和本地危害。我们收集了第一组以不同语言标注的红队提示,区分全球和本地危害,这些提示作为了解当面对地理和语言跨区域的偏好分布时对齐技术可靠性的实验室。尽管这种设置在文献中鲜有涉及,大部分主要集中在英文危害缓解上,但它捕捉了全球范围内与人工智能系统的真实互动。我们在6种语言中建立了一种新的最先进的对齐技术的先例,整体性能几乎没有降级。我们的工作为跨语言转移和新颖的优化方法提供了重要见解,以保护旨在服务全球人口的人工智能系统。

更新时间: 2024-07-08 14:26:16

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2406.18682v2

Towards A Comprehensive Visual Saliency Explanation Framework for AI-based Face Recognition Systems

Over recent years, deep convolutional neural networks have significantly advanced the field of face recognition techniques for both verification and identification purposes. Despite the impressive accuracy, these neural networks are often criticized for lacking explainability. There is a growing demand for understanding the decision-making process of AI-based face recognition systems. Some studies have investigated the use of visual saliency maps as explanations, but they have predominantly focused on the specific face verification case. The discussion on more general face recognition scenarios and the corresponding evaluation methodology for these explanations has long been absent in current research. Therefore, this manuscript conceives a comprehensive explanation framework for face recognition tasks. Firstly, an exhaustive definition of visual saliency map-based explanations for AI-based face recognition systems is provided, taking into account the two most common recognition situations individually, i.e., face verification and identification. Secondly, a new model-agnostic explanation method named CorrRISE is proposed to produce saliency maps, which reveal both the similar and dissimilar regions between any given face images. Subsequently, the explanation framework conceives a new evaluation methodology that offers quantitative measurement and comparison of the performance of general visual saliency explanation methods in face recognition. Consequently, extensive experiments are carried out on multiple verification and identification scenarios. The results showcase that CorrRISE generates insightful saliency maps and demonstrates superior performance, particularly in similarity maps in comparison with the state-of-the-art explanation approaches.

Updated: 2024-07-08 14:25:46

标题: 朝向基于人工智能的人脸识别系统的综合视觉显著性解释框架

摘要: 在最近几年,深度卷积神经网络显著推动了人脸识别技术领域的发展,用于验证和识别目的。尽管准确性令人印象深刻,这些神经网络经常被批评缺乏可解释性。人们越来越希望了解基于人工智能的人脸识别系统的决策过程。一些研究已经调查了使用视觉显著性图作为解释,但它们主要集中在特定的人脸验证案例上。当前研究长期缺乏对更一般的人脸识别场景及其解释的评估方法的讨论。因此,本手稿构想了一个全面的人脸识别任务解释框架。首先,提供了基于视觉显著性图的解释对于基于人工智能的人脸识别系统的详尽定义,分别考虑了两种最常见的识别情况,即人脸验证和识别。其次,提出了一种名为CorrRISE的新的与模型无关的解释方法,用于生成显著性图,揭示任何给定人脸图像之间相似和不同的区域。随后,解释框架构想了一种新的评估方法,提供了对人脸识别中一般视觉显著性解释方法性能的定量测量和比较。因此,在多个验证和识别场景上进行了广泛的实验。结果显示,CorrRISE生成了富有洞见的显著性图,并在相似性图中表现出卓越的性能,特别是与最先进的解释方法相比。

更新时间: 2024-07-08 14:25:46

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2407.05983v1

MTL-Split: Multi-Task Learning for Edge Devices using Split Computing

Split Computing (SC), where a Deep Neural Network (DNN) is intelligently split, with a part of it deployed on an edge device and the rest on a remote server, is emerging as a promising approach. It allows the power of DNNs to be leveraged for latency-sensitive applications that do not allow the entire DNN to be deployed remotely, while not having sufficient computation bandwidth available locally. In many such embedded systems scenarios, such as those in the automotive domain, computational resource constraints also necessitate Multi-Task Learning (MTL), where the same DNN is used for multiple inference tasks instead of having dedicated DNNs for each task, which would need more computing bandwidth. However, how to partition such a multi-tasking DNN to be deployed within a SC framework has not been sufficiently studied. This paper studies this problem, and MTL-Split, our novel proposed architecture, shows encouraging results on both synthetic and real-world data. The source code is available at https://github.com/intelligolabs/MTL-Split.
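
A minimal sketch of the basic split-computing setup, assuming a simple sequential backbone: the layers before the cut run on the edge device, and the remainder, plus one head per task for MTL, runs on the server. The cut point and head sizes are illustrative, not MTL-Split's learned partitioning.

    import torch
    import torch.nn as nn

    def split_model(layers, cut):
        """Split a sequential backbone for split computing: the head runs on
        the edge device, the tail (plus task-specific heads) on the server."""
        return nn.Sequential(*layers[:cut]), nn.Sequential(*layers[cut:])

    backbone = [nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten()]
    edge_part, server_part = split_model(backbone, cut=2)

    x = torch.randn(1, 3, 64, 64)
    z = edge_part(x)                    # computed on-device, then transmitted
    features = server_part(z)           # shared trunk on the server
    task_heads = [nn.Linear(32, n) for n in (10, 5)]   # MTL: one head per task
    print([head(features).shape for head in task_heads])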

Updated: 2024-07-08 14:25:39

标题: MTL-Split:使用分布式计算的边缘设备多任务学习

摘要: 分割计算(SC)是一种新兴且有前途的方法,其中深度神经网络(DNN)被智能地分割,一部分部署在边缘设备上,其余部分部署在远程服务器上。它允许将DNN的强大能力用于延迟敏感的应用:这些应用不允许整个DNN被远程部署,同时本地又没有足够的计算带宽可用。在许多这样的嵌入式系统场景中,例如汽车领域,计算资源限制还需要多任务学习(MTL),即同一个DNN用于多个推断任务,而不是为每个任务使用专用的DNN,否则将需要更多的计算带宽。然而,如何对这种多任务DNN进行划分以部署在SC框架内尚未得到充分研究。本文研究了这个问题,我们提出的新颖架构MTL-Split在合成和真实数据上都展示了令人鼓舞的结果。源代码可在https://github.com/intelligolabs/MTL-Split找到。

更新时间: 2024-07-08 14:25:39

领域: cs.LG,cs.CV,cs.DC

下载: http://arxiv.org/abs/2407.05982v1

Learning Dynamics from Multicellular Graphs with Deep Neural Networks

Multicellular self-assembly into functional structures is a dynamic process that is critical in development and disease, including embryo development, organ formation, and tumor invasion. Being able to infer collective cell migratory dynamics from their static configuration is valuable for both understanding and predicting these complex processes. However, the identification of structural features that can indicate multicellular motion has been difficult, and existing metrics largely rely on physical instincts. Here we show that using a graph neural network (GNN), the motion of multicellular collectives can be inferred from a static snapshot of cell positions, in both experimental and synthetic datasets.
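
A minimal sketch of the graph-construction step such a pipeline starts from: turning a static snapshot of cell positions into a k-nearest-neighbor edge list that a GNN can consume. The neighbor count is an illustrative choice, not necessarily the paper's graph definition.

    import numpy as np
    from scipy.spatial import cKDTree

    def cell_graph(positions, k=6):
        """Build a k-nearest-neighbor graph over cell centers; returns a
        (2, n_edges) array of directed edges (source row, target row)."""
        tree = cKDTree(positions)
        _, nbrs = tree.query(positions, k=k + 1)   # first hit is the cell itself
        src = np.repeat(np.arange(len(positions)), k)
        dst = nbrs[:, 1:].ravel()
        return np.stack([src, dst])

    cells = np.random.default_rng(0).uniform(0, 100, size=(50, 2))  # cell centers
    edges = cell_graph(cells)
    print(edges.shape)   # (2, 300): edge index for a GNN predicting migration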

Updated: 2024-07-08 14:24:40

标题: 利用深度神经网络从多细胞图中学习动态特征

摘要: 多细胞自组装成功能结构是发育和疾病中至关重要的动态过程,包括胚胎发育、器官形成、肿瘤侵袭等。能够从静态配置推断集体细胞迁移动态对于理解和预测这些复杂过程都具有价值。然而,识别能够指示多细胞运动的结构特征一直很困难,现有的度量主要依赖于物理直觉。在这里,我们展示了使用图神经网络(GNN),可以从细胞位置的静态快照中推断出多细胞集体的运动,无论是在实验数据集还是合成数据集中。

更新时间: 2024-07-08 14:24:40

领域: physics.bio-ph,cond-mat.soft,cs.LG

下载: http://arxiv.org/abs/2401.12196v2

HyperMAML: Few-Shot Adaptation of Deep Models with Hypernetworks

The aim of Few-Shot learning methods is to train models which can easily adapt to previously unseen tasks, based on small amounts of data. One of the most popular and elegant Few-Shot learning approaches is Model-Agnostic Meta-Learning (MAML). The main idea behind this method is to learn the general weights of the meta-model, which are further adapted to specific problems in a small number of gradient steps. However, the model's main limitation lies in the fact that the update procedure is realized by gradient-based optimisation. In consequence, MAML cannot always modify weights to the essential level in one or even a few gradient iterations. On the other hand, using many gradient steps results in a complex and time-consuming optimization procedure, which is hard to train in practice, and may lead to overfitting. In this paper, we propose HyperMAML, a novel generalization of MAML, where the training of the update procedure is also part of the model. Namely, in HyperMAML, instead of updating the weights with gradient descent, we use for this purpose a trainable Hypernetwork. Consequently, in this framework, the model can generate significant updates whose range is not limited to a fixed number of gradient steps. Experiments show that HyperMAML consistently outperforms MAML and performs comparably to other state-of-the-art techniques in a number of standard Few-Shot learning benchmarks.
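
A minimal sketch of the core idea, assuming a linear classifier head adapted per task: a trainable hypernetwork maps a support-set embedding directly to a weight update, replacing MAML's inner-loop gradient steps. The architecture sizes and the pooled-embedding conditioning are illustrative assumptions, not the paper's exact design.

    import torch
    import torch.nn as nn

    class HyperUpdater(nn.Module):
        """Hypernetwork that maps a support-set embedding to a weight update
        for a linear classifier, in place of gradient-based adaptation."""
        def __init__(self, emb_dim, n_weights):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(emb_dim, 64), nn.ReLU(),
                                     nn.Linear(64, n_weights))

        def forward(self, base_w, support_emb):
            # one-shot update: no inner-loop gradient steps required
            return base_w + self.net(support_emb).view_as(base_w)

    feat_dim, n_classes = 16, 5
    base_w = torch.randn(n_classes, feat_dim)      # meta-learned base weights
    support_emb = torch.randn(feat_dim)            # pooled support-set features
    updater = HyperUpdater(feat_dim, n_classes * feat_dim)
    adapted_w = updater(base_w, support_emb)
    query = torch.randn(3, feat_dim)
    print((query @ adapted_w.T).shape)             # task-adapted logits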

Updated: 2024-07-08 14:21:59

标题: HyperMAML:使用超网络对深度模型进行少样本适应

摘要: Few-Shot学习方法的目标是训练模型,可以基于少量数据轻松适应之前未见任务。其中最流行和优雅的Few-Shot学习方法之一是Model-Agnostic Meta-Learning(MAML)。该方法的主要思想是学习元模型的通用权重,然后在少量梯度步骤中进一步适应特定问题。然而,该模型的主要局限在于更新过程是通过基于梯度的优化实现的。因此,MAML并不总能在一个或几个梯度迭代中将权重修改到必要的水平。另一方面,使用许多梯度步骤会导致复杂且耗时的优化过程,很难在实践中训练,并可能导致过拟合。在本文中,我们提出了HyperMAML,这是MAML的一种新的泛化方法,其中更新过程的训练也是模型的一部分。在HyperMAML中,我们使用可训练的Hypernetwork来代替使用梯度下降更新权重。因此,在这个框架中,模型可以生成重要的更新,其范围不仅限于固定数量的梯度步骤。实验证明,HyperMAML始终优于MAML,并在许多标准Few-Shot学习基准测试中表现出与其他最先进技术相当的性能。

更新时间: 2024-07-08 14:21:59

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2205.15745v3

Exploring Human-LLM Conversations: Mental Models and the Originator of Toxicity

This study explores real-world human interactions with large language models (LLMs) in diverse, unconstrained settings, in contrast to most prior research focusing on ethically trimmed models like ChatGPT for specific tasks. We aim to understand the originator of toxicity. Our findings show that although LLMs are rightfully accused of providing toxic content, it is mostly demanded, or at least provoked, by humans who actively seek such content. Our manual analysis of hundreds of conversations judged as toxic by commercial vendors' APIs also raises questions about current practices regarding which user requests are refused. Furthermore, we conjecture based on multiple empirical indicators that humans exhibit a change of their mental model, switching from the mindset of interacting with a machine more towards interacting with a human.

Updated: 2024-07-08 14:20:05

标题: 探索人类与大型语言模型的对话:心理模型与毒性的起源

摘要: 本研究探讨了在多样化、不受限制的环境中人类与大型语言模型(LLMs)的真实世界互动,与大多数先前研究侧重于像ChatGPT这样专门用于特定任务的伦理修剪模型形成对比。我们的目标是理解毒性内容的产生者。我们的研究结果表明,尽管LLMs被正当地指责为提供有毒内容,但这种内容大多是由积极寻求此类内容的人类需求或至少被挑衅而产生的。我们对API商业供应商判断为有毒的数百次对话进行了手动分析,还对当前拒绝回答哪些用户请求的做法提出了疑问。此外,我们根据多个经验指标推测,人类表现出其心智模型的改变,从更多地与机器互动的思维方式转向更多地与人类互动的方式。

更新时间: 2024-07-08 14:20:05

领域: cs.HC,cs.AI

下载: http://arxiv.org/abs/2407.05977v1

Change-Point Detection in Industrial Data Streams based on Online Dynamic Mode Decomposition with Control

We propose a novel change-point detection method based on online Dynamic Mode Decomposition with control (ODMDwC). Leveraging ODMDwC's ability to find and track linear approximation of a non-linear system while incorporating control effects, the proposed method dynamically adapts to its changing behavior due to aging and seasonality. This approach enables the detection of changes in spatial, temporal, and spectral patterns, providing a robust solution that preserves correspondence between the score and the extent of change in the system dynamics. We formulate a truncated version of ODMDwC and utilize higher-order time-delay embeddings to mitigate noise and extract broad-band features. Our method addresses the challenges faced in industrial settings where safety-critical systems generate non-uniform data streams while requiring timely and accurate change-point detection to protect profit and life. Our results demonstrate that this method yields intuitive and improved detection results compared to the Singular-Value-Decomposition-based method. We validate our approach using synthetic and real-world data, showing its competitiveness to other approaches on complex systems' benchmark datasets. Provided guidelines for hyperparameters selection enhance our method's practical applicability.
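
A minimal sketch of the underlying DMD-with-control fit and a residual-based change score, without the truncation and higher-order time-delay embeddings used in the paper; the synthetic linear system is only for demonstration.

    import numpy as np

    def dmdc_fit(X, U):
        """Fit x_{t+1} ~ A x_t + B u_t by least squares over a window of
        state snapshots X (n_state, T) and control inputs U (n_ctrl, T)."""
        X0, X1, U0 = X[:, :-1], X[:, 1:], U[:, :-1]
        G = X1 @ np.linalg.pinv(np.vstack([X0, U0]))
        return G[:, :X.shape[0]], G[:, X.shape[0]:]     # A, B

    def change_score(A, B, x_t, u_t, x_next):
        """One-step prediction residual used as the change-point score."""
        return np.linalg.norm(x_next - (A @ x_t + B @ u_t))

    rng = np.random.default_rng(0)
    A_true, B_true = np.array([[0.9, 0.1], [0.0, 0.8]]), np.array([[0.0], [1.0]])
    X = np.zeros((2, 100)); U = rng.normal(size=(1, 100))
    for t in range(99):
        X[:, t + 1] = A_true @ X[:, t] + B_true @ U[:, t]
    A, B = dmdc_fit(X, U)
    print(change_score(A, B, X[:, 50], U[:, 50], X[:, 51]))   # ~0 before a change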

Updated: 2024-07-08 14:18:33

标题: 基于在线动态模态分解与控制的工业数据流变点检测

摘要: 我们提出了一种基于在线动态模态分解控制(ODMDwC)的新型变点检测方法。利用ODMDwC寻找和跟踪非线性系统的线性近似能力,并结合控制效果,所提出的方法能够动态适应由于老化和季节性而发生的系统行为变化。这种方法使得能够检测到空间、时间和频谱模式的变化,提供了一种能够保持得分与系统动力学变化程度对应的稳健解决方案。我们制定了ODMDwC的截断版本,并利用高阶时滞嵌入来减少噪音并提取宽带特征。我们的方法解决了工业环境中面临的挑战,即安全关键系统生成非均匀数据流,同时需要及时准确地检测变点以保护利润和生命。我们的结果表明,与基于奇异值分解的方法相比,此方法产生直观且改进的检测结果。我们使用合成和真实数据验证了我们的方法,展示了其在复杂系统基准数据集上与其他方法竞争的能力。提供了用于选择超参数的指导方针,增强了我们方法的实际适用性。

更新时间: 2024-07-08 14:18:33

领域: cs.AI

下载: http://arxiv.org/abs/2407.05976v1

LLaMAX: Scaling Linguistic Horizons of LLM by Enhancing Translation Capabilities Beyond 100 Languages

Large Language Models (LLMs) demonstrate remarkable translation capabilities in high-resource language tasks, yet their performance in low-resource languages is hindered by insufficient multilingual data during pre-training. To address this, we dedicate 35,000 A100-SXM4-80GB GPU hours to conducting extensive multilingual continual pre-training on the LLaMA series models, enabling translation support across more than 100 languages. Through a comprehensive analysis of training strategies, such as vocabulary expansion and data augmentation, we develop LLaMAX. Remarkably, without sacrificing its generalization ability, LLaMAX achieves significantly higher translation performance compared to existing open-source LLMs (by more than 10 spBLEU points) and performs on par with the specialized translation model M2M-100-12B on the Flores-101 benchmark. Extensive experiments indicate that LLaMAX can serve as a robust multilingual foundation model. The code (https://github.com/CONE-MT/LLaMAX/) and models (https://huggingface.co/LLaMAX/) are publicly available.

Updated: 2024-07-08 14:18:28

标题: LLaMAX:通过增强超过100种语言的翻译能力来扩大LLM的语言范围

摘要: 大型语言模型(LLMs)在高资源语言任务中展示了出色的翻译能力,然而它们在低资源语言中的表现受到预训练过程中多语言数据不足的阻碍。为解决这一问题,我们投入了35,000个A100-SXM4-80GB GPU小时,对LLaMA系列模型进行广泛的多语言持续预训练,使其能够支持超过100种语言的翻译。通过对词汇扩展和数据增强等训练策略的全面分析,我们开发了LLaMAX。值得注意的是,LLaMAX在不牺牲泛化能力的情况下,相比现有开源LLMs实现了显著更高的翻译性能(超过10个spBLEU点),并在Flores-101基准测试中与专门的翻译模型M2M-100-12B表现相当。广泛的实验表明,LLaMAX可以作为一个强大的多语言基础模型。代码(https://github.com/CONE-MT/LLaMAX/)和模型(https://huggingface.co/LLaMAX/)均已公开发布。

更新时间: 2024-07-08 14:18:28

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2407.05975v1

Multi-View Black-Box Physical Attacks on Infrared Pedestrian Detectors Using Adversarial Infrared Grid

While extensive research exists on physical adversarial attacks within the visible spectrum, studies on such techniques in the infrared spectrum are limited. Infrared object detectors are vital in modern technological applications but are susceptible to adversarial attacks, posing significant security threats. Previous studies using physical perturbations like light bulb arrays and aerogels for white-box attacks, or hot and cold patches for black-box attacks, have proven impractical or limited in multi-view support. To address these issues, we propose the Adversarial Infrared Grid (AdvGrid), which models perturbations in a grid format and uses a genetic algorithm for black-box optimization. These perturbations are cyclically applied to various parts of a pedestrian's clothing to facilitate multi-view black-box physical attacks on infrared pedestrian detectors. Extensive experiments validate AdvGrid's effectiveness, stealthiness, and robustness. The method achieves attack success rates of 80.00% in digital environments and 91.86% in physical environments, outperforming baseline methods. Additionally, the average attack success rate exceeds 50% against mainstream detectors, demonstrating AdvGrid's robustness. Our analyses include ablation studies, transfer attacks, and adversarial defenses, confirming the method's superiority.
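
A toy sketch of the black-box genetic search over binary grid perturbations; the stand-in fitness rewards covering a target region, whereas in the actual attack it would be the drop in the detector's confidence under the rendered infrared grid. Population size, mutation rate, and crossover scheme are illustrative assumptions.

    import numpy as np

    def evolve_grid(fitness, shape=(6, 6), pop=20, gens=30, p_mut=0.05, rng=None):
        """Black-box genetic search over binary grid perturbations: each cell
        marks a patch to perturb; fitness is queried as a black box."""
        rng = rng or np.random.default_rng(0)
        population = rng.integers(0, 2, size=(pop, *shape))
        for _ in range(gens):
            scores = np.array([fitness(g) for g in population])
            parents = population[np.argsort(scores)[-pop // 2:]]   # selection
            cut = shape[1] // 2
            children = parents.copy()
            children[:, :, cut:] = parents[::-1, :, cut:]           # crossover
            flip = rng.random(children.shape) < p_mut               # mutation
            population = np.concatenate(
                [parents, np.where(flip, 1 - children, children)])
        return population[np.argmax([fitness(g) for g in population])]

    # stand-in objective: confidence drops as more of a target region is covered
    target = np.zeros((6, 6)); target[2:4, 1:5] = 1
    best = evolve_grid(lambda g: (g * target).sum() - 0.1 * g.sum())
    print(best)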

Updated: 2024-07-08 14:17:26

标题: 使用对抗性红外网格对红外行人探测器进行多视角黑盒物理攻击

摘要: 尽管针对可见光谱的物理对抗攻击已有大量研究,红外光谱中此类技术的研究仍然有限。红外目标探测器在现代技术应用中至关重要,但容易受到对抗攻击,构成重大安全威胁。先前的研究使用灯泡阵列和气凝胶等物理扰动进行白盒攻击,或使用热贴片和冷贴片进行黑盒攻击,但已被证明不切实际或在多视角支持方面存在局限。为解决这些问题,我们提出了对抗性红外网格(AdvGrid),该方法以网格形式建模扰动,并使用遗传算法进行黑盒优化。这些扰动被循环应用于行人服装的各个部位,以实现对红外行人探测器的多视角黑盒物理攻击。大量实验证实了AdvGrid的有效性、隐蔽性和稳健性。该方法在数字环境中的攻击成功率达到80.00%,在物理环境中达到91.86%,优于基线方法。此外,针对主流探测器的平均攻击成功率超过50%,展示了AdvGrid的稳健性。我们的分析包括消融研究、迁移攻击和对抗性防御,验证了该方法的优越性。

更新时间: 2024-07-08 14:17:26

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2407.01168v2

Active Label Refinement for Robust Training of Imbalanced Medical Image Classification Tasks in the Presence of High Label Noise

The robustness of supervised deep learning-based medical image classification is significantly undermined by label noise. Although several methods have been proposed to enhance classification performance in the presence of noisy labels, they face some challenges: 1) a struggle with class-imbalanced datasets, leading to the frequent overlooking of minority classes as noisy samples; 2) a singular focus on maximizing performance using noisy datasets, without incorporating experts-in-the-loop for actively cleaning the noisy labels. To mitigate these challenges, we propose a two-phase approach that combines Learning with Noisy Labels (LNL) and active learning. This approach not only improves the robustness of medical image classification in the presence of noisy labels, but also iteratively improves the quality of the dataset by relabeling the important incorrect labels, under a limited annotation budget. Furthermore, we introduce a novel Variance of Gradients approach in the LNL phase, which complements the loss-based sample selection by also sampling under-represented samples. Using two imbalanced noisy medical classification datasets, we demonstrate that our proposed technique is superior to its predecessors at handling class imbalance by not misidentifying clean samples from minority classes as mostly noisy samples.
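
The Variance-of-Gradients idea mentioned above can be sketched as follows: record per-sample gradient snapshots at several checkpoints, score each sample by how much its gradients fluctuate, and select high-variance (under-represented) examples alongside the loss-based criterion. The exact formulation in the paper may differ; this is a hedged illustration:

```python
import numpy as np

# Sketch of a Variance-of-Gradients (VoG) style score: for each sample,
# record a gradient snapshot at several training checkpoints and measure
# how much it fluctuates. High variance suggests the sample is atypical /
# under-represented. Purely illustrative; the paper's exact formulation
# may differ.

def vog_scores(grad_snapshots):
    """grad_snapshots: array (n_checkpoints, n_samples, dim) of
    per-sample gradients collected during training."""
    mean_grad = grad_snapshots.mean(axis=0)                    # (n, d)
    var = ((grad_snapshots - mean_grad) ** 2).mean(axis=0)     # (n, d)
    return var.mean(axis=1)                                    # (n,)

# Select under-represented samples alongside a loss-based criterion.
snapshots = np.random.randn(5, 1000, 64)   # fake data for illustration
scores = vog_scores(snapshots)
selected = np.argsort(scores)[-100:]       # top-100 high-variance samples
```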

Updated: 2024-07-08 14:16:05

标题: 在存在高标签噪声的情况下,针对不平衡的医学图像分类任务进行稳健训练的主动标签细化

摘要: 标签噪声会显著削弱基于监督深度学习的医学图像分类的鲁棒性。尽管已经提出了多种方法来增强存在噪声标签情况下的分类性能,但它们面临一些挑战:1)难以处理类别不平衡的数据集,常常将少数类样本误当作噪声样本而忽视;2)只专注于利用噪声数据集最大化性能,而没有引入专家参与来主动清理噪声标签。为缓解这些挑战,我们提出了一个结合带噪标签学习(LNL)和主动学习的两阶段方法。该方法不仅提高了存在噪声标签时医学图像分类的鲁棒性,还在有限的标注预算下,通过重新标注重要的错误标签逐步提高数据集质量。此外,我们在LNL阶段引入了一种新颖的梯度方差(Variance of Gradients)方法,在基于损失的样本选择之外,同时对代表性不足的样本进行采样。在两个不平衡的带噪医学分类数据集上,我们证明了所提出的技术在处理类别不平衡方面优于先前的方法,不会将少数类别的干净样本大量误判为噪声样本。

更新时间: 2024-07-08 14:16:05

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2407.05973v1

Gaussian Grouping: Segment and Edit Anything in 3D Scenes

The recent Gaussian Splatting achieves high-quality and real-time novel-view synthesis of the 3D scenes. However, it is solely concentrated on the appearance and geometry modeling, while lacking in fine-grained object-level scene understanding. To address this issue, we propose Gaussian Grouping, which extends Gaussian Splatting to jointly reconstruct and segment anything in open-world 3D scenes. We augment each Gaussian with a compact Identity Encoding, allowing the Gaussians to be grouped according to their object instance or stuff membership in the 3D scene. Instead of resorting to expensive 3D labels, we supervise the Identity Encodings during the differentiable rendering by leveraging the 2D mask predictions by Segment Anything Model (SAM), along with introduced 3D spatial consistency regularization. Compared to the implicit NeRF representation, we show that the discrete and grouped 3D Gaussians can reconstruct, segment and edit anything in 3D with high visual quality, fine granularity and efficiency. Based on Gaussian Grouping, we further propose a local Gaussian Editing scheme, which shows efficacy in versatile scene editing applications, including 3D object removal, inpainting, colorization, style transfer and scene recomposition. Our code and models are at https://github.com/lkeab/gaussian-grouping.

Updated: 2024-07-08 14:11:51

标题: 高斯分组:在3D场景中分割和编辑任何物体

摘要: 最近的高斯喷溅技术实现了对3D场景的高质量、实时新视角合成。然而,它仅关注外观和几何建模,缺乏细粒度的对象级场景理解。为了解决这个问题,我们提出了高斯分组(Gaussian Grouping),将高斯喷溅扩展到联合重建和分割开放世界3D场景中的任何物体。我们为每个高斯附加一个紧凑的身份编码(Identity Encoding),使高斯可以按照它们在3D场景中的对象实例或stuff(背景类)归属进行分组。我们不依赖昂贵的3D标签,而是在可微分渲染过程中,利用Segment Anything Model (SAM)的2D掩码预测以及引入的3D空间一致性正则化来监督身份编码。与隐式NeRF表示相比,我们展示了离散且分组的3D高斯可以以高视觉质量、细粒度和高效率重建、分割和编辑3D中的任何物体。基于高斯分组,我们进一步提出了一种局部高斯编辑方案,在多种场景编辑应用中表现出有效性,包括3D物体移除、修补、着色、风格迁移和场景重组。我们的代码和模型位于https://github.com/lkeab/gaussian-grouping。

更新时间: 2024-07-08 14:11:51

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2312.00732v2

Malicious Agent Detection for Robust Multi-Agent Collaborative Perception

Recently, multi-agent collaborative (MAC) perception has been proposed and outperformed the traditional single-agent perception in many applications, such as autonomous driving. However, MAC perception is more vulnerable to adversarial attacks than single-agent perception due to the information exchange. The attacker can easily degrade the performance of a victim agent by sending harmful information from a malicious agent nearby. In this paper, we extend adversarial attacks to an important perception task -- MAC object detection, where generic defenses such as adversarial training are no longer effective against these attacks. More importantly, we propose Malicious Agent Detection (MADE), a reactive defense specific to MAC perception that can be deployed by each agent to accurately detect and then remove any potential malicious agent in its local collaboration network. In particular, MADE inspects each agent in the network independently using a semi-supervised anomaly detector based on a double-hypothesis test with the Benjamini-Hochberg procedure to control the false positive rate of the inference. For the two hypothesis tests, we propose a match loss statistic and a collaborative reconstruction loss statistic, respectively, both based on the consistency between the agent to be inspected and the ego agent where our detector is deployed. We conduct comprehensive evaluations on a benchmark 3D dataset V2X-sim and a real-road dataset DAIR-V2X and show that with the protection of MADE, the drops in the average precision compared with the best-case "oracle" defender against our attack are merely 1.28% and 0.34%, respectively, much lower than 8.92% and 10.00% for adversarial training, respectively.
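
The Benjamini-Hochberg step used by MADE to control the false positive rate is a standard procedure; a self-contained sketch is given below. Only the BH step is shown; the p-values would come from the paper's match-loss and collaborative-reconstruction-loss statistics, which are not reproduced here.

```python
import numpy as np

# Standard Benjamini-Hochberg procedure: given p-values (one per agent
# under inspection), flag agents while controlling the false discovery
# rate at level q. This snippet shows only the BH step, not MADE's
# test statistics.

def benjamini_hochberg(pvals, q=0.05):
    pvals = np.asarray(pvals)
    m = len(pvals)
    order = np.argsort(pvals)
    thresholds = q * (np.arange(1, m + 1) / m)
    below = pvals[order] <= thresholds
    # Largest rank k whose sorted p-value is under its threshold.
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    rejected = np.zeros(m, dtype=bool)
    rejected[order[:k]] = True           # flagged as potentially malicious
    return rejected

print(benjamini_hochberg([0.001, 0.04, 0.3, 0.8], q=0.05))
```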

Updated: 2024-07-08 14:08:51

标题: 面向鲁棒多智能体协作感知的恶意智能体检测

摘要: 最近,多智能体协作(MAC)感知被提出,并在自动驾驶等许多应用中表现出优于传统单智能体感知的性能。然而,由于存在信息交换,MAC感知比单智能体感知更容易受到对抗攻击。攻击者可以通过附近的恶意智能体发送有害信息,轻易降低受害智能体的性能。在本文中,我们将对抗攻击扩展到一个重要的感知任务——MAC目标检测,在该任务中,对抗训练等通用防御措施对这些攻击不再有效。更重要的是,我们提出了恶意智能体检测(MADE),这是一种针对MAC感知的反应式防御,每个智能体都可以部署它,以准确检测并移除其本地协作网络中任何潜在的恶意智能体。具体而言,MADE使用基于双假设检验的半监督异常检测器独立检查网络中的每个智能体,并通过Benjamini-Hochberg程序控制推断的误报率。对于这两个假设检验,我们分别提出了匹配损失统计量和协作重构损失统计量,二者都基于被检查智能体与部署检测器的自身智能体之间的一致性。我们在基准3D数据集V2X-sim和真实道路数据集DAIR-V2X上进行了全面评估,结果表明在MADE的保护下,相对于针对我们攻击的最佳情况"oracle"防御者,平均精度的下降仅分别为1.28%和0.34%,远低于对抗训练的8.92%和10.00%。

更新时间: 2024-07-08 14:08:51

领域: cs.CR

下载: http://arxiv.org/abs/2310.11901v2

On Bellman equations for continuous-time policy evaluation I: discretization and approximation

We study the problem of computing the value function from a discretely-observed trajectory of a continuous-time diffusion process. We develop a new class of algorithms based on easily implementable numerical schemes that are compatible with discrete-time reinforcement learning (RL) with function approximation. We establish high-order numerical accuracy as well as the approximation error guarantees for the proposed approach. In contrast to discrete-time RL problems where the approximation factor depends on the effective horizon, we obtain a bounded approximation factor using the underlying elliptic structures, even if the effective horizon diverges to infinity.
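
For readers unfamiliar with the continuous-time setting, the standard objects involved are the following (a generic textbook formulation, not quoted from the paper): for a diffusion $dX_t = b(X_t)\,dt + \sigma(X_t)\,dW_t$ with discount rate $\beta > 0$ and reward $r$,

```latex
V(x) = \mathbb{E}\!\left[ \int_0^\infty e^{-\beta t}\, r(X_t)\, dt \;\middle|\; X_0 = x \right],
\qquad
\beta V = r + b \cdot \nabla V + \tfrac{1}{2}\,\mathrm{Tr}\!\left( \sigma \sigma^\top \nabla^2 V \right).
```

The paper's schemes discretize equations of this elliptic type from discretely-observed trajectory data, and the bounded approximation factor mentioned above comes from this underlying elliptic structure.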

Updated: 2024-07-08 14:05:03

标题: 关于连续时间策略评估的贝尔曼方程I:离散化与逼近

摘要: 我们研究了从连续时间扩散过程的离散观测轨迹中计算值函数的问题。我们基于易于实现的数值方案开发了一类新算法,这些算法与带函数逼近的离散时间强化学习(RL)兼容。我们为所提出的方法建立了高阶数值精度以及逼近误差保证。与逼近因子取决于有效时间跨度的离散时间RL问题不同,我们利用底层的椭圆结构获得了有界的逼近因子,即使有效时间跨度趋于无穷大也是如此。

更新时间: 2024-07-08 14:05:03

领域: cs.LG,cs.NA,math.NA,math.OC,math.PR

下载: http://arxiv.org/abs/2407.05966v1

T2VSafetyBench: Evaluating the Safety of Text-to-Video Generative Models

The recent development of Sora leads to a new era in text-to-video (T2V) generation. Along with this comes the rising concern about its security risks. The generated videos may contain illegal or unethical content, and there is a lack of comprehensive quantitative understanding of their safety, posing a challenge to their reliability and practical deployment. Previous evaluations primarily focus on the quality of video generation. While some evaluations of text-to-image models have considered safety, they cover fewer aspects and do not address the unique temporal risk inherent in video generation. To bridge this research gap, we introduce T2VSafetyBench, a new benchmark designed for conducting safety-critical assessments of text-to-video models. We define 12 critical aspects of video generation safety and construct a malicious prompt dataset using LLMs and jailbreaking prompt attacks. Based on our evaluation results, we draw several important findings, including: 1) no single model excels in all aspects, with different models showing various strengths; 2) the correlation between GPT-4 assessments and manual reviews is generally high; 3) there is a trade-off between the usability and safety of text-to-video generative models. This indicates that as the field of video generation rapidly advances, safety risks are set to surge, highlighting the urgency of prioritizing video safety. We hope that T2VSafetyBench can provide insights for better understanding the safety of video generation in the era of generative AI.

Updated: 2024-07-08 14:04:58

标题: T2VSafetyBench:评估文本到视频生成模型的安全性

摘要: 最近Sora的发展开创了文本到视频(T2V)生成的新时代。随之而来的是对其安全风险的日益关注。生成的视频可能包含非法或不道德的内容,而对其安全性缺乏全面的量化理解,这给它们的可靠性和实际部署带来了挑战。先前的评估主要集中在视频生成的质量上。虽然一些对文本到图像模型的评估考虑了安全性,但涵盖的方面较少,并未解决视频生成中固有的独特时间风险。为填补这一研究空白,我们引入了T2VSafetyBench,这是一个专门设计用于进行文本到视频模型安全关键评估的新基准。我们定义了视频生成安全的12个关键方面,并利用LLMs和越狱提示攻击构建了一个恶意提示数据集。根据我们的评估结果,我们得出了一些重要发现,包括:1)没有单一模型在所有方面表现出色,不同模型展现出不同的优势;2)GPT-4评估与手动审核之间的相关性通常较高;3)文本到视频生成模型的可用性和安全性之间存在权衡。这表明随着视频生成领域的快速发展,安全风险将激增,突显了优先考虑视频安全的紧迫性。我们希望T2VSafetyBench能为更好地理解生成AI时代视频生成的安全性提供见解。

更新时间: 2024-07-08 14:04:58

领域: cs.CV,cs.AI,cs.CL,cs.CR,cs.LG

下载: http://arxiv.org/abs/2407.05965v1

6GSoft: Software for Edge-to-Cloud Continuum

In the era of 6G, developing and managing software requires cutting-edge software engineering (SE) theories and practices tailored for such complexity across a vast number of connected edge devices. Our project aims to lead the development of sustainable methods and energy-efficient orchestration models specifically for edge environments, enhancing architectural support driven by AI for contemporary edge-to-cloud continuum computing. This initiative seeks to position Finland at the forefront of the 6G landscape, focusing on sophisticated edge orchestration and robust software architectures to optimize the performance and scalability of edge networks. Collaborating with leading Finnish universities and companies, the project emphasizes deep industry-academia collaboration and international expertise to address critical challenges in edge orchestration and software architecture, aiming to drive significant advancements in software productivity and market impact.

Updated: 2024-07-08 14:03:17

标题: 6GSoft:边缘到云端连续体的软件

摘要: 在6G时代,开发和管理软件需要为大量连接的边缘设备量身定制的尖端软件工程(SE)理论和实践。我们的项目旨在引领可持续方法和能效高的编排模型的发展,专门针对边缘环境,增强由人工智能驱动的当代边缘到云端连续计算的架构支持。该倡议旨在将芬兰置于6G领域的前沿,聚焦于复杂的边缘编排和健壮的软件架构,以优化边缘网络的性能和可扩展性。与芬兰领先的大学和公司合作,该项目强调深度产学合作和国际专业知识,以解决边缘编排和软件架构中的关键挑战,旨在推动软件生产力和市场影响的重大进步。

更新时间: 2024-07-08 14:03:17

领域: cs.SE,cs.AI,cs.NI,cs.SI

下载: http://arxiv.org/abs/2407.05963v1

Large language models in healthcare and medical domain: A review

The deployment of large language models (LLMs) within the healthcare sector has sparked both enthusiasm and apprehension. These models exhibit the remarkable capability to provide proficient responses to free-text queries, demonstrating a nuanced understanding of professional medical knowledge. This comprehensive survey delves into the functionalities of existing LLMs designed for healthcare applications, elucidating the trajectory of their development, starting from traditional Pretrained Language Models (PLMs) to the present state of LLMs in healthcare sector. First, we explore the potential of LLMs to amplify the efficiency and effectiveness of diverse healthcare applications, particularly focusing on clinical language understanding tasks. These tasks encompass a wide spectrum, ranging from named entity recognition and relation extraction to natural language inference, multi-modal medical applications, document classification, and question-answering. Additionally, we conduct an extensive comparison of the most recent state-of-the-art LLMs in the healthcare domain, while also assessing the utilization of various open-source LLMs and highlighting their significance in healthcare applications. Furthermore, we present the essential performance metrics employed to evaluate LLMs in the biomedical domain, shedding light on their effectiveness and limitations. Finally, we summarize the prominent challenges and constraints faced by large language models in the healthcare sector, offering a holistic perspective on their potential benefits and shortcomings. This review provides a comprehensive exploration of the current landscape of LLMs in healthcare, addressing their role in transforming medical applications and the areas that warrant further research and development.

Updated: 2024-07-08 14:01:20

标题: 大型语言模型在医疗保健和医学领域的应用:综述

摘要: 在医疗领域部署大型语言模型(LLMs)既引发了热情,也引发了担忧。这些模型展现出对自由文本查询给出专业回应的显著能力,体现了对专业医学知识细致入微的理解。这项综合综述深入探讨了为医疗应用设计的现有LLMs的功能,阐明了它们的发展轨迹:从传统的预训练语言模型(PLMs)到医疗领域LLMs的当前状态。首先,我们探讨了LLMs提高各类医疗应用效率和有效性的潜力,特别关注临床语言理解任务。这些任务范围广泛,从命名实体识别和关系抽取,到自然语言推理、多模态医疗应用、文档分类和问答。此外,我们对医疗领域最新的最先进LLMs进行了广泛比较,同时评估了各种开源LLMs的使用情况并强调了它们在医疗应用中的重要性。我们还介绍了在生物医学领域评估LLMs所用的关键性能指标,揭示了它们的有效性和局限性。最后,我们总结了大型语言模型在医疗领域面临的突出挑战和限制,提供了关于其潜在益处和不足的整体视角。这篇综述全面考察了LLMs在医疗领域的现状,讨论了它们在变革医疗应用中的作用以及值得进一步研究和发展的方向。

更新时间: 2024-07-08 14:01:20

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2401.06775v2

On the differential and Walsh spectra of $x^{2q+1}$ over $\mathbb{F}_{q^2}$

Let $q$ be an odd prime power and let $\mathbb{F}_{q^2}$ be the finite field with $q^2$ elements. In this paper, we determine the differential spectrum of the power function $F(x)=x^{2q+1}$ over $\mathbb{F}_{q^2}$. When the characteristic of $\mathbb{F}_{q^2}$ is $3$, we also determine the value distribution of the Walsh spectrum of $F$, showing that it is $4$-valued, and use the obtained result to determine the weight distribution of a $4$-weight cyclic code.
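
For context, the standard definitions behind these notions are as follows (generic definitions, not specific to this paper's results): the differential count of $F$ at $(a,b)$, and the multiplicities collected by the differential spectrum,

```latex
\delta_F(a,b) \;=\; \#\{\, x \in \mathbb{F}_{q^2} \;:\; F(x+a) - F(x) = b \,\},
\qquad a \in \mathbb{F}_{q^2}^{*},\; b \in \mathbb{F}_{q^2},

\omega_i \;=\; \#\{\, (a,b) \;:\; a \in \mathbb{F}_{q^2}^{*},\; b \in \mathbb{F}_{q^2},\; \delta_F(a,b) = i \,\}.
```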

Updated: 2024-07-08 14:01:06

标题: 关于$x^{2q+1}$在$\mathbb{F}_{q^2}$上的差分和Walsh谱

摘要: 设$q$为奇素数幂,$\mathbb{F}_{q^2}$为含$q^2$个元素的有限域。在本文中,我们确定了幂函数$F(x)=x^{2q+1}$在$\mathbb{F}_{q^2}$上的差分谱。当$\mathbb{F}_{q^2}$的特征为$3$时,我们还确定了$F$的Walsh谱的取值分布,表明它是$4$值的,并利用所得结果确定了一个$4$权重循环码的权重分布。

更新时间: 2024-07-08 14:01:06

领域: cs.CR,cs.IT,math.IT,math.NT

下载: http://arxiv.org/abs/2407.07710v1

What Do We Know About the Psychology of Insider Threats?

Insider threats refer to threats originating from people inside organizations. Although such threats are a classical research topic, the systematization of existing knowledge is still limited particularly with respect to non-technical research approaches. To this end, this paper presents a systematic literature review on the psychology of insider threats. According to the review results, the literature has operated with multiple distinct theories but there is still a lack of robust theorization with respect to psychology. The literature has also considered characteristics of a person, his or her personal situation, and other more or less objective facts about the person. These are seen to correlate with psychological concepts such as personality traits and psychological states of a person. In addition, the review discusses gaps and limitations in the existing research, thus opening the door for further psychology research.

Updated: 2024-07-08 13:46:20

标题: 我们对内部威胁心理学了解多少?

摘要: 内部威胁是指源自组织内部人员的威胁。尽管这样的威胁是一个经典的研究课题,但对现有知识的系统化仍然有限,特别是在非技术研究方法方面。为此,本文提出了一项关于内部威胁心理学的系统文献综述。根据综述结果,文献运用了多个不同的理论,但在心理学方面仍然缺乏强大的理论化。文献还考虑了一个人的特征,他或她的个人情况,以及关于这个人的更多或更少客观的事实。这些被认为与心理概念如人格特征和一个人的心理状态相关。此外,综述讨论了现有研究中的差距和局限性,从而为进一步的心理学研究打开了大门。

更新时间: 2024-07-08 13:46:20

领域: cs.CR,cs.CY

下载: http://arxiv.org/abs/2407.05943v1

Deep Neural Networks and Brain Alignment: Brain Encoding and Decoding (Survey)

Can we obtain insights about the brain using AI models? How is the information in deep learning models related to brain recordings? Can we improve AI models with the help of brain recordings? Such questions can be tackled by studying brain recordings like functional magnetic resonance imaging (fMRI). As a first step, the neuroscience community has contributed several large cognitive neuroscience datasets related to passive reading/listening/viewing of concept words, narratives, pictures, and movies. Encoding and decoding models using these datasets have also been proposed in the past two decades. These models serve as additional tools for basic cognitive science and neuroscience research. Encoding models aim at generating fMRI brain representations given a stimulus automatically. They have several practical applications in evaluating and diagnosing neurological conditions and thus may also help design therapies for brain damage. Decoding models solve the inverse problem of reconstructing the stimuli given the fMRI. They are useful for designing brain-machine or brain-computer interfaces. Inspired by the effectiveness of deep learning models for natural language processing, computer vision, and speech, several neural encoding and decoding models have been recently proposed. In this survey, we will first discuss popular representations of language, vision and speech stimuli, and present a summary of neuroscience datasets. Further, we will review popular deep learning based encoding and decoding architectures and note their benefits and limitations. Finally, we will conclude with a summary and discussion about future trends. Given the large amount of recently published work in the computational cognitive neuroscience (CCN) community, we believe that this survey enables an entry point for DNN researchers to diversify into CCN research.

Updated: 2024-07-08 13:44:56

标题: 深度神经网络与脑对齐:脑编码与解码(调查)

摘要: 我们能否利用AI模型获得有关大脑的见解?深度学习模型中的信息与大脑记录有何关联?我们能否借助大脑记录改进AI模型?这些问题可以通过研究功能性磁共振成像(fMRI)等大脑记录来解决。作为第一步,神经科学界已贡献了若干与被动阅读/聆听/观看概念词、叙事、图片和电影相关的大型认知神经科学数据集。在过去二十年中,也有人提出了使用这些数据集的编码和解码模型。这些模型是基础认知科学和神经科学研究的额外工具。编码模型旨在给定刺激的情况下自动生成fMRI大脑表示,它们在评估和诊断神经系统疾病方面有若干实际应用,因此也可能有助于设计针对脑损伤的治疗方案。解码模型解决逆问题,即根据fMRI重建刺激,它们对于设计脑机接口(brain-machine/brain-computer interface)很有用。受深度学习模型在自然语言处理、计算机视觉和语音方面有效性的启发,最近提出了多种神经编码和解码模型。在这项综述中,我们首先讨论语言、视觉和语音刺激的流行表示,并概述神经科学数据集。随后,我们回顾基于深度学习的流行编码和解码架构,并指出它们的优点和局限。最后,我们以总结和对未来趋势的讨论作结。鉴于计算认知神经科学(CCN)社区最近发表的大量工作,我们相信这项综述能为DNN研究人员转向CCN研究提供一个切入点。

更新时间: 2024-07-08 13:44:56

领域: q-bio.NC,cs.AI,cs.CL,cs.CV,cs.HC,cs.LG

下载: http://arxiv.org/abs/2307.10246v2

Graph Anomaly Detection with Noisy Labels by Reinforcement Learning

Graph anomaly detection (GAD) has been widely applied in many areas, e.g., fraud detection in finance and robot accounts in social networks. Existing methods are dedicated to identifying the outlier nodes that deviate from normal ones. Yet they heavily rely on high-quality annotation, which is hard to obtain in real-world scenarios, and noisy labels can severely degrade performance. Thus, we are motivated to cut the edges of suspicious nodes to alleviate the impact of noise. However, it remains difficult to precisely identify the nodes with noisy labels. Moreover, it is hard to quantitatively evaluate the regret of cutting the edges, which may have either positive or negative influences. To this end, we propose a novel framework REGAD, i.e., REinforced Graph Anomaly Detector. Specifically, we aim to maximize the performance improvement (AUC) of a base detector by cutting noisy edges approximated through the nodes with high-confidence labels. (i) We design a tailored action and search space to train a policy network to carefully prune edges step by step, where only a few suspicious edges are prioritized in each step. (ii) We design a policy-in-the-loop mechanism to iteratively optimize the policy based on the feedback from the base detector. The overall performance is evaluated by the cumulative rewards. Extensive experiments are conducted on three datasets under different anomaly ratios. The results indicate the superior performance of our proposed REGAD.

Updated: 2024-07-08 13:41:21

标题: 用强化学习进行带有噪声标签的图异常检测

摘要: 图异常检测(GAD)已被广泛应用于许多领域,例如金融领域的欺诈检测和社交网络中的机器人账户识别。现有方法致力于识别偏离正常节点的离群节点。它们严重依赖高质量标注,而这在现实场景中很难获得,因此在噪声标签下性能可能严重退化。为此,我们考虑剪除可疑节点的边来减轻噪声的影响。然而,精确识别带有噪声标签的节点仍然困难;此外,剪边的影响可能是积极的也可能是消极的,难以定量评估。为此,我们提出了一个新颖的框架REGAD,即REinforced Graph Anomaly Detector。具体而言,我们旨在通过剪除由高置信度标签节点近似出的噪声边,最大化基础检测器的性能提升(AUC)。(i)我们设计了定制的动作和搜索空间来训练策略网络,逐步谨慎地修剪边,每一步只优先处理少量可疑边。(ii)我们设计了一种策略在环(policy-in-the-loop)机制,根据基础检测器的反馈迭代优化策略。整体性能通过累积奖励进行评估。我们在三个数据集、不同异常比例下进行了大量实验,结果表明我们提出的REGAD具有卓越的性能。

更新时间: 2024-07-08 13:41:21

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2407.05934v1

The Interplay of Learning, Analytics, and Artificial Intelligence in Education: A Vision for Hybrid Intelligence

This paper presents a multi-dimensional view of AI's role in learning and education, emphasizing the intricate interplay between AI, analytics, and the learning processes. Here, I challenge the prevalent narrow conceptualisation of AI as tools, as exemplified in generative AI tools, and argue for the importance of alternative conceptualisations of AI for achieving human-AI hybrid intelligence. I highlight the differences between human intelligence and artificial information processing, the importance of hybrid human-AI systems to extend human cognition, and posit that AI can also serve as an instrument for understanding human learning. Early learning sciences and AI in Education research (AIED), which saw AI as an analogy for human intelligence, have diverged from this perspective, prompting a need to rekindle this connection. The paper presents three unique conceptualisations of AI: the externalization of human cognition, the internalization of AI models to influence human mental models, and the extension of human cognition via tightly coupled human-AI hybrid intelligence systems. Examples from current research and practice are examined as instances of the three conceptualisations in education, highlighting the potential value and limitations of each conceptualisation for education, as well as the perils of overemphasis on externalising human cognition. The paper concludes with advocacy for a broader approach to AIED that goes beyond considerations on the design and development of AI, but also includes educating people about AI and innovating educational systems to remain relevant in an AI-ubiquitous world.

Updated: 2024-07-08 13:38:27

标题: 教育中学习、分析和人工智能的相互作用:混合智能的愿景

摘要: 这篇论文提出了人工智能在学习和教育中的多维视角,强调人工智能、分析和学习过程之间错综复杂的相互作用。在这里,我挑战了人工智能被普遍狭隘地理解为工具的观念,如生成式人工智能工具所体现的那样,并主张采用替代的人工智能概念来实现人工智能与人类混合智能的重要性。我强调了人类智能和人工智能信息处理之间的差异,混合人工智能系统扩展人类认知的重要性,并提出人工智能也可以作为理解人类学习的工具。早期的学习科学和教育中的人工智能研究(AIED)将人工智能视为人类智能的类比,已经偏离了这种观点,促使我们重新建立这种联系。该论文提出了三种独特的人工智能概念:人类认知的外部化,内部化人工智能模型以影响人类心智模型,以及通过紧密耦合的人工智能混合智能系统扩展人类认知。通过教育领域当前研究和实践中的例子,探讨了这三种概念在教育中的潜在价值和局限性,以及过分强调人类认知外部化的危险。论文最后主张采取更广泛的教育中的人工智能方法,不仅仅考虑人工智能的设计和开发,还包括教育人们关于人工智能的知识,并创新教育系统以在一个普遍存在人工智能的世界中保持相关性。

更新时间: 2024-07-08 13:38:27

领域: cs.CY,cs.AI

下载: http://arxiv.org/abs/2403.16081v4

Fault Detection for agents on power grid topology optimization: A Comprehensive analysis

The topology optimization of transmission networks using Deep Reinforcement Learning (DRL) has increasingly come into focus. Various researchers have proposed different DRL agents, which are often benchmarked on the Grid2Op environment from the Learning to Run a Power Network (L2RPN) challenges. The environments have many advantages with their realistic chronics and underlying power flow backends. However, the interpretation of agent survival or failure is not always clear, as there are a variety of potential causes. In this work, we focus on the failures of the power grid to identify patterns and detect them a priori. We collect the failed chronics of three different agents on the WCCI 2022 L2RPN environment, totaling about 40k data points. By clustering, we are able to detect five distinct clusters, identifying different failure types. Further, we propose a multi-class prediction approach to detect failures beforehand and evaluate five different models. Here, the Light Gradient-Boosting Machine (LightGBM) shows the best performance, with an accuracy of 86%. It also correctly identifies in 91% of the time failure and survival observations. Finally, we provide a detailed feature importance analysis that identifies critical features and regions in the grid.
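
Since LightGBM is singled out as the best-performing failure predictor, a minimal sketch of such a multi-class classifier with the lightgbm scikit-learn API is shown below. The feature matrix, labels, and split are placeholders, not the authors' pipeline:

```python
# Minimal LightGBM multi-class sketch in the spirit of the paper's failure
# predictor (features, labels and splits are placeholders).
import numpy as np
from lightgbm import LGBMClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X = np.random.rand(5000, 30)              # grid features per observation
y = np.random.randint(0, 5, 5000)         # 5 clusters = failure types

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = LGBMClassifier(n_estimators=300)
clf.fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))

# Feature importance analysis, mirroring the paper's final step.
top = np.argsort(clf.feature_importances_)[::-1][:5]
print("top features:", top)
```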

Updated: 2024-07-08 13:35:12

标题: 电网拓扑优化中代理的故障检测:综合分析

摘要: 使用深度强化学习(DRL)进行输电网络拓扑优化日益受到关注。研究人员提出了多种DRL智能体,通常在"学习运行电力网络"(L2RPN)挑战赛的Grid2Op环境上进行基准测试。这些环境具有许多优势,包括逼真的运行场景(chronics)和底层潮流计算后端。然而,智能体存活或失败的原因并不总是清晰的,因为存在多种潜在成因。在这项工作中,我们聚焦于电网故障,以识别其模式并提前检测。我们收集了WCCI 2022 L2RPN环境中三个不同智能体的失败场景,总计约40k个数据点。通过聚类,我们检测出五个不同的簇,对应不同的故障类型。进一步地,我们提出了一种多类别预测方法来提前检测故障,并评估了五种不同的模型。其中,轻量梯度提升机(LightGBM)表现最佳,准确率达86%,并在91%的情况下正确区分故障与存活的观测。最后,我们提供了详细的特征重要性分析,识别出电网中的关键特征和区域。

更新时间: 2024-07-08 13:35:12

领域: cs.LG,cs.AI,cs.SY,eess.SY

下载: http://arxiv.org/abs/2406.16426v2

Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems

Ensuring that AI systems reliably and robustly avoid harmful or dangerous behaviours is a crucial challenge, especially for AI systems with a high degree of autonomy and general intelligence, or systems used in safety-critical contexts. In this paper, we will introduce and define a family of approaches to AI safety, which we will refer to as guaranteed safe (GS) AI. The core feature of these approaches is that they aim to produce AI systems which are equipped with high-assurance quantitative safety guarantees. This is achieved by the interplay of three core components: a world model (which provides a mathematical description of how the AI system affects the outside world), a safety specification (which is a mathematical description of what effects are acceptable), and a verifier (which provides an auditable proof certificate that the AI satisfies the safety specification relative to the world model). We outline a number of approaches for creating each of these three core components, describe the main technical challenges, and suggest a number of potential solutions to them. We also argue for the necessity of this approach to AI safety, and for the inadequacy of the main alternative approaches.

Updated: 2024-07-08 13:35:00

标题: 朝着确保安全的人工智能:确保强大可靠的人工智能系统的框架

摘要: 确保人工智能系统可靠且稳健地避免有害或危险行为是一个至关重要的挑战,特别是对于具有高度自主性和通用智能的人工智能系统,或用于安全关键环境的系统。在本文中,我们将介绍并定义一系列人工智能安全方法,我们称之为保证安全(GS)人工智能。这些方法的核心特点是,它们旨在产生带有高保证的定量安全保证的人工智能系统。这通过三个核心组件的相互作用实现:一个世界模型(对人工智能系统如何影响外部世界的数学描述)、一个安全规范(对哪些影响是可接受的的数学描述)和一个验证器(提供可审计的证明证书,证明人工智能相对于世界模型满足安全规范)。我们概述了构建这三个核心组件的若干方法,描述了主要的技术挑战,并提出了一些潜在的解决方案。我们还论证了这种人工智能安全方法的必要性,以及主要替代方法的不足之处。

更新时间: 2024-07-08 13:35:00

领域: cs.AI

下载: http://arxiv.org/abs/2405.06624v3

Towards Optimizing and Evaluating a Retrieval Augmented QA Chatbot using LLMs with Human in the Loop

Large Language Models have found application in various mundane and repetitive tasks including Human Resource (HR) support. We worked with the domain experts of SAP SE to develop an HR support chatbot as an efficient and effective tool for addressing employee inquiries. We inserted a human-in-the-loop in various parts of the development cycles such as dataset collection, prompt optimization, and evaluation of generated output. By enhancing the LLM-driven chatbot's response quality and exploring alternative retrieval methods, we have created an efficient, scalable, and flexible tool for HR professionals to address employee inquiries effectively. Our experiments and evaluation conclude that GPT-4 outperforms other models and can overcome inconsistencies in data through internal reasoning capabilities. Additionally, through expert analysis, we infer that reference-free evaluation metrics such as G-Eval and Prometheus demonstrate reliability closely aligned with that of human evaluation.

Updated: 2024-07-08 13:32:14

标题: 朝着利用LLMs与人在回路优化和评估检索增强型问答聊天机器人

摘要: 大型语言模型已在各种乏味和重复性任务中得到应用,包括人力资源(HR)支持。我们与SAP SE的领域专家合作,开发了一个HR支持聊天机器人,作为解决员工咨询的高效工具。我们在开发周期的多个环节(如数据集收集、提示优化和生成输出的评估)中引入了人在回路。通过提升LLM驱动的聊天机器人的响应质量并探索替代检索方法,我们创建了一个高效、可扩展且灵活的工具,帮助HR专业人员有效解决员工咨询。我们的实验和评估表明,GPT-4的表现优于其他模型,并且能够通过内部推理能力克服数据中的不一致。此外,通过专家分析,我们推断G-Eval和Prometheus等无参考评估指标展现出与人工评估高度一致的可靠性。

更新时间: 2024-07-08 13:32:14

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2407.05925v1

TAPVid-3D: A Benchmark for Tracking Any Point in 3D

We introduce a new benchmark, TAPVid-3D, for evaluating the task of long-range Tracking Any Point in 3D (TAP-3D). While point tracking in two dimensions (TAP) has many benchmarks measuring performance on real-world videos, such as TAPVid-DAVIS, three-dimensional point tracking has none. To this end, leveraging existing footage, we build a new benchmark for 3D point tracking featuring 4,000+ real-world videos, composed of three different data sources spanning a variety of object types, motion patterns, and indoor and outdoor environments. To measure performance on the TAP-3D task, we formulate a collection of metrics that extend the Jaccard-based metric used in TAP to handle the complexities of ambiguous depth scales across models, occlusions, and multi-track spatio-temporal smoothness. We manually verify a large sample of trajectories to ensure correct video annotations, and assess the current state of the TAP-3D task by constructing competitive baselines using existing tracking models. We anticipate this benchmark will serve as a guidepost to improve our ability to understand precise 3D motion and surface deformation from monocular video. Code for dataset download, generation, and model evaluation is available at https://tapvid3d.github.io

Updated: 2024-07-08 13:28:47

标题: TAPVid-3D:3D空间中任意点跟踪的基准测试

摘要: 我们引入了一个新的基准TAPVid-3D,用于评估在3D中长程追踪任意点的任务(TAP-3D)。虽然二维点追踪(TAP)有许多衡量真实世界视频性能的基准,例如TAPVid-DAVIS,但三维点追踪尚无此类基准。为此,我们利用现有视频素材,构建了一个新的3D点追踪基准,包含4,000多个真实世界视频,由三种不同的数据源组成,涵盖多种对象类型、运动模式以及室内外环境。为了衡量TAP-3D任务的性能,我们制定了一组度量标准,扩展了TAP中使用的基于Jaccard的度量,以处理模型间深度尺度不确定、遮挡以及多轨迹时空平滑性等复杂问题。我们手动验证了大量轨迹以确保视频标注正确,并通过使用现有跟踪模型构建有竞争力的基线来评估TAP-3D任务的当前水平。我们期望这一基准能成为一个路标,提升我们从单目视频中理解精确3D运动和表面形变的能力。数据集下载、生成和模型评估的代码可在https://tapvid3d.github.io上获取。

更新时间: 2024-07-08 13:28:47

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2407.05921v1

LPGD: A General Framework for Backpropagation through Embedded Optimization Layers

Embedding parameterized optimization problems as layers into machine learning architectures serves as a powerful inductive bias. Training such architectures with stochastic gradient descent requires care, as degenerate derivatives of the embedded optimization problem often render the gradients uninformative. We propose Lagrangian Proximal Gradient Descent (LPGD), a flexible framework for training architectures with embedded optimization layers that seamlessly integrates into automatic differentiation libraries. LPGD efficiently computes meaningful replacements of the degenerate optimization layer derivatives by re-running the forward solver oracle on a perturbed input. LPGD captures various previously proposed methods as special cases, while fostering deep links to traditional optimization methods. We theoretically analyze our method and demonstrate on historical and synthetic data that LPGD converges faster than gradient descent even in a differentiable setup.
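
The core mechanism, replacing a degenerate derivative by re-running the forward solver on a perturbed input, can be sketched in a few lines. The sign convention, the perturbation form, and the toy `softmax` "solver" below are our simplifying assumptions; the paper's Lagrangian proximal formulation is more general:

```python
import numpy as np

# Sketch of the LPGD idea: gradients through an embedded optimization layer
# y*(theta) = solver(theta) are replaced by a finite difference of the
# solver run on a loss-perturbed input. Generic illustration under
# simplifying assumptions, not the paper's exact formulation.

def lpgd_grad(solver, theta, dL_dy, tau=1e-1):
    """Surrogate gradient dL/dtheta for y = solver(theta).

    solver: maps parameters theta -> solution y (same shape as theta here).
    dL_dy:  incoming gradient of the outer loss at y = solver(theta).
    tau:    perturbation strength (temperature-like hyperparameter).
    """
    y = solver(theta)
    y_perturbed = solver(theta - tau * dL_dy)  # re-run oracle on perturbed input
    return (y - y_perturbed) / tau

# Toy example: the "solver" is a softmax projection, standing in for a
# combinatorial solver whose true derivatives would be degenerate.
softmax = lambda z: np.exp(z - z.max()) / np.exp(z - z.max()).sum()
g = lpgd_grad(softmax, np.array([1.0, 2.0, 0.5]), dL_dy=np.array([1.0, 0.0, 0.0]))
print(g)
```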

Updated: 2024-07-08 13:27:41

标题: LPGD:嵌入式优化层反向传播的通用框架

摘要: 将参数化优化问题作为层嵌入机器学习架构,是一种强大的归纳偏置。使用随机梯度下降训练此类架构需要谨慎,因为嵌入的优化问题的退化导数常常使梯度不具信息性。我们提出拉格朗日近端梯度下降(Lagrangian Proximal Gradient Descent, LPGD),这是一个用于训练带嵌入式优化层架构的灵活框架,可以无缝集成到自动微分库中。LPGD通过在扰动后的输入上重新运行前向求解器,高效计算退化优化层导数的有意义替代。LPGD将多种先前提出的方法涵盖为特例,同时与传统优化方法建立了深刻联系。我们在理论上分析了该方法,并在历史数据和合成数据上证明,即使在可微的设定下,LPGD也比梯度下降收敛更快。

更新时间: 2024-07-08 13:27:41

领域: cs.LG

下载: http://arxiv.org/abs/2407.05920v1

Fostering Trust and Quantifying Value of AI and ML

Artificial Intelligence (AI) and Machine Learning (ML) providers have a responsibility to develop valid and reliable systems. Much has been discussed about trusting AI and ML inferences (the process of running live data through a trained AI model to make a prediction or solve a task), but little has been done to define what that means. Those in the space of ML-based products are familiar with topics such as transparency, explainability, safety, bias, and so forth. Yet, there are no frameworks to quantify and measure those. Producing ever more trustworthy machine learning inferences is a path to increase the value of products (i.e., increased trust in the results) and to engage in conversations with users to gather feedback to improve products. In this paper, we begin by examining the dynamic of trust between a provider (Trustor) and users (Trustees). Trustors are required to be trusting and trustworthy, whereas trustees need not be trusting nor trustworthy. The challenge for trustors is to provide results that are good enough to make a trustee increase their level of trust above a minimum threshold for: 1- doing business together; 2- continuation of service. We conclude by defining and proposing a framework, and a set of viable metrics, to be used for computing a trust score and objectively understand how trustworthy a machine learning system can claim to be, plus their behavior over time.

Updated: 2024-07-08 13:25:28

标题: 培养信任和量化人工智能和机器学习的价值

摘要: 人工智能(AI)和机器学习(ML)提供商有责任开发有效且可靠的系统。关于信任AI和ML推断(通过训练后的AI模型对实时数据进行处理以进行预测或解决任务的过程)已经讨论了很多,但对此并没有明确定义。那些从事基于ML产品领域的人熟悉透明度、可解释性、安全性、偏见等主题。然而,目前还没有框架来量化和衡量这些主题。生产更可信赖的机器学习推断是提高产品价值的一条途径(即增加对结果的信任),并与用户进行对话以收集反馈来改进产品。在本文中,我们首先探讨了提供者(信任者)与用户(受托人)之间信任动态。信任者需要具备信任和值得信赖的品质,而受托人则不需要具备信任和值得信赖的品质。对于信任者的挑战在于提供足够好的结果,使受托人将其信任水平提高到超过最低阈值,以实现:1- 共同开展业务;2- 继续提供服务。最后,我们通过定义和提出一个框架以及一组可行的指标,用于计算信任评分并客观了解一个机器学习系统可以宣称自己有多值得信赖,以及其随时间变化的行为。

更新时间: 2024-07-08 13:25:28

领域: cs.LG,cs.CY,91A80,I.2.0; D.2.8; H.1.2

下载: http://arxiv.org/abs/2407.05919v1

GaussianImage: 1000 FPS Image Representation and Compression by 2D Gaussian Splatting

Implicit neural representations (INRs) recently achieved great success in image representation and compression, offering high visual quality and fast rendering speeds with 10-1000 FPS, assuming sufficient GPU resources are available. However, this requirement often hinders their use on low-end devices with limited memory. In response, we propose a groundbreaking paradigm of image representation and compression by 2D Gaussian Splatting, named GaussianImage. We first introduce 2D Gaussian to represent the image, where each Gaussian has 8 parameters including position, covariance and color. Subsequently, we unveil a novel rendering algorithm based on accumulated summation. Remarkably, our method with a minimum of 3$\times$ lower GPU memory usage and 5$\times$ faster fitting time not only rivals INRs (e.g., WIRE, I-NGP) in representation performance, but also delivers a faster rendering speed of 1500-2000 FPS regardless of parameter size. Furthermore, we integrate existing vector quantization technique to build an image codec. Experimental results demonstrate that our codec attains rate-distortion performance comparable to compression-based INRs such as COIN and COIN++, while facilitating decoding speeds of approximately 2000 FPS. Additionally, preliminary proof of concept shows that our codec surpasses COIN and COIN++ in performance when using partial bits-back coding. Code is available at https://github.com/Xinjie-Q/GaussianImage.
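
A hedged NumPy sketch of the accumulated-summation renderer follows: each of N Gaussians carries 8 parameters (position 2, covariance via a Cholesky factor 3, RGB color 3; this layout is our assumption), and pixel colors are plain sums of Gaussian-weighted colors with no depth sorting or alpha blending:

```python
import numpy as np

# Accumulated-summation rendering of N 2D Gaussians, each with 8 parameters:
# position (2), covariance Cholesky factors (3), RGB color (3). The exact
# parameter layout is an assumption for illustration.

def render(params, H, W):
    ys, xs = np.mgrid[0:H, 0:W]
    img = np.zeros((H, W, 3))
    for mu_x, mu_y, l11, l21, l22, r, g, b in params:
        L = np.array([[l11, 0.0], [l21, l22]])      # Sigma = L @ L.T (PSD)
        Sigma_inv = np.linalg.inv(L @ L.T + 1e-6 * np.eye(2))
        dx, dy = xs - mu_x, ys - mu_y
        # Quadratic form d^T Sigma^{-1} d, evaluated per pixel.
        q = (Sigma_inv[0, 0] * dx**2 + 2 * Sigma_inv[0, 1] * dx * dy
             + Sigma_inv[1, 1] * dy**2)
        w = np.exp(-0.5 * q)                         # Gaussian weight
        img += w[..., None] * np.array([r, g, b])    # accumulate (no sorting)
    return np.clip(img, 0.0, 1.0)

gaussians = np.random.rand(100, 8) * [64, 64, 8, 4, 8, 1, 1, 1]
image = render(gaussians, H=64, W=64)
```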

Updated: 2024-07-08 13:22:14

标题: 高斯图像:通过2D高斯喷溅实现的每秒1000帧图像表示和压缩

摘要: 隐式神经表示(INRs)最近在图像表示和压缩方面取得了巨大成功,在GPU资源充足的前提下,可提供高视觉质量和10-1000 FPS的快速渲染速度。然而,这一要求常常妨碍它们在内存有限的低端设备上的使用。为此,我们提出了一种通过二维高斯喷溅(2D Gaussian Splatting)进行图像表示和压缩的开创性范式,命名为GaussianImage。我们首先引入二维高斯来表示图像,每个高斯具有8个参数,包括位置、协方差和颜色。随后,我们提出了一种基于累积求和的新型渲染算法。值得注意的是,我们的方法GPU内存占用至少降低3倍、拟合时间至少加快5倍,不仅在表示性能上与INRs(例如WIRE、I-NGP)相媲美,而且无论参数量大小,都能提供1500-2000 FPS的更快渲染速度。此外,我们集成了现有的矢量量化技术来构建图像编解码器。实验结果表明,我们的编解码器在率失真性能上与COIN、COIN++等基于压缩的INRs相当,同时可实现约2000 FPS的解码速度。此外,初步的概念验证表明,使用部分bits-back编码时,我们的编解码器在性能上超越了COIN和COIN++。代码可在https://github.com/Xinjie-Q/GaussianImage 公开获取。

更新时间: 2024-07-08 13:22:14

领域: eess.IV,cs.AI,cs.CV,cs.MM

下载: http://arxiv.org/abs/2403.08551v4

UniFIDES: Universal Fractional Integro-Differential Equation Solvers

The development of data-driven approaches for solving differential equations has been followed by a plethora of applications in science and engineering across a multitude of disciplines and remains a central focus of active scientific inquiry. However, a large body of natural phenomena incorporates memory effects that are best described via fractional integro-differential equations (FIDEs), in which the integral or differential operators accept non-integer orders. Addressing the challenges posed by nonlinear FIDEs is a recognized difficulty, necessitating the application of generic methods with immediate practical relevance. This work introduces the Universal Fractional Integro-Differential Equation Solvers (UniFIDES), a comprehensive machine learning platform designed to expeditiously solve a variety of FIDEs in both forward and inverse directions, without the need for ad hoc manipulation of the equations. The effectiveness of UniFIDES is demonstrated through a collection of integer-order and fractional problems in science and engineering. Our results highlight UniFIDES' ability to accurately solve a wide spectrum of integro-differential equations and offer the prospect of using machine learning platforms universally for discovering and describing dynamical and complex systems.

Updated: 2024-07-08 13:18:17

标题: UniFIDES:通用分数积分微分方程求解器

摘要: 用于求解微分方程的数据驱动方法的发展,带来了跨众多学科的科学与工程应用,并且仍是活跃科学探索的中心议题。然而,大量自然现象包含记忆效应,这类现象最好通过分数阶积分微分方程(FIDEs)来描述,其中积分或微分算子的阶数可以是非整数。应对非线性FIDEs带来的挑战是公认的难题,需要具有直接实用价值的通用方法。本研究介绍了通用分数阶积分微分方程求解器(UniFIDES),这是一个全面的机器学习平台,旨在快速求解正向与反向的各类FIDEs问题,而无需对方程进行特殊处理。UniFIDES的有效性通过科学和工程中的一系列整数阶与分数阶问题得到了验证。我们的结果突显了UniFIDES准确求解各类积分微分方程的能力,并展现了普遍使用机器学习平台来发现和描述动态与复杂系统的前景。

更新时间: 2024-07-08 13:18:17

领域: cs.LG,cs.CE

下载: http://arxiv.org/abs/2407.01848v2

Enhancing Vision-Language Models with Scene Graphs for Traffic Accident Understanding

Recognizing a traffic accident is an essential part of any autonomous driving or road monitoring system. An accident can appear in a wide variety of forms, and understanding what type of accident is taking place may be useful to prevent it from reoccurring. The task of being able to classify a traffic scene as a specific type of accident is the focus of this work. We approach the problem by likening a traffic scene to a graph, where objects such as cars can be represented as nodes, and relative distances and directions between them as edges. This representation of an accident can be referred to as a scene graph, and is used as input for an accident classifier. Better results can be obtained with a classifier that fuses the scene graph input with representations from vision and language. This work introduces a multi-stage, multimodal pipeline to pre-process videos of traffic accidents, encode them as scene graphs, and align this representation with vision and language modalities for accident classification. When trained on 4 classes, our method achieves a balanced accuracy score of 57.77% on an (unbalanced) subset of the popular Detection of Traffic Anomaly (DoTA) benchmark, representing an increase of close to 5 percentage points from the case where scene graph information is not taken into account.
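
The graph representation described above can be made concrete with a toy networkx sketch: detected objects become nodes and pairwise relative distance/direction become edge attributes. The object list and attribute names are invented for illustration:

```python
import math
import networkx as nx

# Toy scene-graph construction in the spirit of the abstract: detected
# objects become nodes, pairwise relative distance and heading become edge
# attributes. Objects and attribute names are invented for illustration.

objects = {
    "car_0": (0.0, 0.0),
    "car_1": (5.0, 2.0),
    "pedestrian_0": (3.0, -1.0),
}

G = nx.Graph()
for name, (x, y) in objects.items():
    G.add_node(name, x=x, y=y)

names = list(objects)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        xa, ya = objects[a]
        xb, yb = objects[b]
        G.add_edge(a, b,
                   distance=math.hypot(xb - xa, yb - ya),
                   direction=math.degrees(math.atan2(yb - ya, xb - xa)))

print(G.edges(data=True))
```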

Updated: 2024-07-08 13:15:11

标题: 使用场景图增强视觉语言模型以便理解交通事故

摘要: 识别交通事故是任何自动驾驶或道路监控系统的重要组成部分。事故可能以多种形式出现,理解正在发生的事故类型可能有助于防止其再次发生。本文的重点是将交通场景分类为特定类型的事故。我们将交通场景类比为一张图来处理:汽车等对象表示为节点,它们之间的相对距离和方向表示为边。这种对事故的表示可称为场景图,并被用作事故分类器的输入。将场景图输入与来自视觉和语言的表示相融合的分类器可以获得更好的结果。本文介绍了一个多阶段、多模态的流水线,用于预处理交通事故视频,将其编码为场景图,并将该表示与视觉和语言模态对齐以进行事故分类。在4个类别上训练时,我们的方法在流行的Detection of Traffic Anomaly (DoTA)基准的一个(不平衡)子集上取得了57.77%的平衡准确率,比不考虑场景图信息的情况提高了近5个百分点。

更新时间: 2024-07-08 13:15:11

领域: cs.CV,cs.AI,cs.RO

下载: http://arxiv.org/abs/2407.05910v1

MVGT: A Multi-view Graph Transformer Based on Spatial Relations for EEG Emotion Recognition

Electroencephalography (EEG), a medical imaging technique that captures scalp electrical activity of brain structures via electrodes, has been widely used in affective computing. The spatial domain of EEG is rich in affective information. However, few of the existing studies have simultaneously analyzed EEG signals from multiple perspectives of geometric and anatomical structures in spatial domain. In this paper, we propose a multi-view Graph Transformer (MVGT) based on spatial relations, which integrates information from the temporal, frequency and spatial domains, including geometric and anatomical structures, so as to enhance the expressive power of the model comprehensively. We incorporate the spatial information of EEG channels into the model as encoding, thereby improving its ability to perceive the spatial structure of the channels. Meanwhile, experimental results based on publicly available datasets demonstrate that our proposed model outperforms state-of-the-art methods in recent years. In addition, the results also show that the MVGT could extract information from multiple domains and capture inter-channel relationships in EEG emotion recognition tasks effectively.

Updated: 2024-07-08 13:11:53

标题: MVGT:基于空间关系的多视图图形变换器用于EEG情绪识别

摘要: 脑电图(EEG)是一种医学成像技术,通过电极捕捉头皮脑结构的电活动,已被广泛应用于情感计算。EEG的空间域富含情感信息。然而,现有研究中很少同时分析空间域中几何和解剖结构的多个视角的EEG信号。本文提出了一种基于空间关系的多视图图形转换器(MVGT),整合了来自时间、频率和空间域的信息,包括几何和解剖结构,从而全面增强模型的表达能力。我们将EEG通道的空间信息整合到模型中作为编码,从而提高其感知通道的空间结构的能力。与此同时,基于公开可用数据集的实验结果表明,我们提出的模型在近年来优于最先进的方法。此外,结果还显示MVGT能够有效提取多个域中的信息,并有效捕捉EEG情绪识别任务中通道间的关系。

更新时间: 2024-07-08 13:11:53

领域: cs.NE,cs.AI,eess.SP

下载: http://arxiv.org/abs/2407.03131v2

Uplifting Lower-Income Data: Strategies for Socioeconomic Perspective Shifts in Vision-Language Models

Unequal representation across cultures and socioeconomic groups in AI is a significant and challenging problem, often leading to uneven model performance. As a step toward addressing this issue, we formulate translated non-English, geographic, and socioeconomic integrated prompts and evaluate their impact on VL model performance for data from different countries and income groups. Our findings show that geographic and socioeconomic integrated prompts improve VL performance on lower-income data and favor the retrieval of topic appearances commonly found in data from low-income households. From our analyses, we identify and highlight contexts where these strategies yield the most improvements. Our model analysis code is publicly available at https://github.com/Anniejoan/Uplifting-Lower-income-data .

Updated: 2024-07-08 13:09:39

标题: 提升低收入数据:视觉-语言模型中实现经济视角转变的策略

摘要: 跨文化和社会经济群体在人工智能中的不均衡代表是一个重要且具有挑战性的问题,通常会导致模型表现不均衡。为了解决这个问题,我们制定了翻译成非英语、地理和社会经济整合提示,并评估它们对来自不同国家和收入群体的数据的VL模型性能的影响。我们的研究结果显示,地理和社会经济整合提示可以提高低收入数据的VL性能,并有利于检索通常在低收入家庭数据中发现的主题出现。通过我们的分析,我们确定并突出显示这些策略产生最多改进的情境。我们的模型分析代码可在https://github.com/Anniejoan/Uplifting-Lower-income-data 公开获取。

更新时间: 2024-07-08 13:09:39

领域: cs.CY,cs.AI,cs.CL,cs.CV,K.4; I.2.7; I.2.8

下载: http://arxiv.org/abs/2407.02623v2

ORAN-Bench-13K: An Open Source Benchmark for Assessing LLMs in Open Radio Access Networks

Large Language Models (LLMs) can revolutionize how we deploy and operate Open Radio Access Networks (O-RAN) by enhancing network analytics, anomaly detection, and code generation and significantly increasing the efficiency and reliability of a plethora of O-RAN tasks. In this paper, we present ORAN-Bench-13K, the first comprehensive benchmark designed to evaluate the performance of Large Language Models (LLMs) within the context of O-RAN. Our benchmark consists of 13,952 meticulously curated multiple-choice questions generated from 116 O-RAN specification documents. We leverage a novel three-stage LLM framework, and the questions are categorized into three distinct difficulties to cover a wide spectrum of ORAN-related knowledge. We thoroughly evaluate the performance of several state-of-the-art LLMs, including Gemini, Chat-GPT, and Mistral. Additionally, we propose ORANSight, a Retrieval-Augmented Generation (RAG)-based pipeline that demonstrates superior performance on ORAN-Bench-13K compared to other tested closed-source models. Our findings indicate that current popular LLM models are not proficient in O-RAN, highlighting the need for specialized models. We observed a noticeable performance improvement when incorporating the RAG-based ORANSight pipeline, with a Macro Accuracy of 0.784 and a Weighted Accuracy of 0.776, which was on average 21.55% and 22.59% better than the other tested LLMs.

Updated: 2024-07-08 13:07:50

标题: ORAN-Bench-13K:用于评估开放式无线接入网络中LLMs的开源基准

摘要: 大型语言模型(LLMs)可以通过增强网络分析、异常检测和代码生成来彻底改变我们部署和运营开放式无线接入网络(O-RAN)的方式,并显著提高众多O-RAN任务的效率和可靠性。在本文中,我们提出了ORAN-Bench-13K,这是第一个旨在评估大型语言模型(LLMs)在O-RAN环境中性能的全面基准。我们的基准由13,952个精心策划的多项选择题组成,这些题目是从116份O-RAN规范文件中生成的。我们利用了一种新颖的三阶段LLM框架,将问题分为三个不同难度级别,以涵盖广泛的ORAN相关知识。我们彻底评估了几种最先进的LLM模型的性能,包括Gemini、Chat-GPT和Mistral。此外,我们提出了ORANSight,这是一个基于检索增强生成(RAG)的管道,在ORAN-Bench-13K上表现优于其他经过测试的闭源模型。我们的研究结果表明,目前流行的LLM模型在O-RAN方面表现不佳,突显了专门模型的需求。当引入基于RAG的ORANSight管道时,我们观察到明显的性能提升,宏观准确率为0.784,加权准确率为0.776,平均比其他经过测试的LLM模型提高了21.55%和22.59%。

更新时间: 2024-07-08 13:07:50

领域: cs.NI,cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2407.06245v1

Contrastive Learning of Preferences with a Contextual InfoNCE Loss

A common problem in contextual preference ranking is that a single preferred action is compared against several choices, thereby blowing up the complexity and skewing the preference distribution. In this work, we show how one can solve this problem via a suitable adaptation of the CLIP framework. This adaptation is not entirely straight-forward, because although the InfoNCE loss used by CLIP has achieved great success in computer vision and multi-modal domains, its batch-construction technique requires the ability to compare arbitrary items, and is not well-defined if one item has multiple positive associations in the same batch. We empirically demonstrate the utility of our adapted version of the InfoNCE loss in the domain of collectable card games, where we aim to learn an embedding space that captures the associations between single cards and whole card pools based on human selections. Such selection data only exists for restricted choices, thus generating concrete preferences of one item over a set of other items rather than a perfect fit between the card and the pool. Our results show that vanilla CLIP does not perform well due to the aforementioned intuitive issues. However, by adapting CLIP to the problem, we receive a model outperforming previous work trained with the triplet loss, while also alleviating problems associated with mining triplets.
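
One common way to make InfoNCE well-defined with multiple positives per anchor, and a plausible reading of the adaptation described above, is to average the per-positive log-likelihood within each anchor's row. The PyTorch sketch below illustrates this; the paper's contextual loss may differ in detail:

```python
import torch
import torch.nn.functional as F

# Sketch of an InfoNCE variant that tolerates multiple positives per anchor
# in one batch: for each anchor, average the standard InfoNCE term over all
# of its positives. One common adaptation, not necessarily the paper's.

def multi_positive_info_nce(anchors, candidates, pos_mask, temperature=0.07):
    """anchors: (B, d) pool embeddings; candidates: (M, d) card embeddings;
    pos_mask: (B, M) boolean, True where a candidate is a positive."""
    a = F.normalize(anchors, dim=-1)
    c = F.normalize(candidates, dim=-1)
    logits = a @ c.T / temperature                   # (B, M) similarities
    log_probs = F.log_softmax(logits, dim=-1)        # contrast over candidates
    # Mean log-likelihood of the positives, per anchor, then batch mean.
    pos_counts = pos_mask.sum(dim=-1).clamp(min=1)
    loss = -(log_probs * pos_mask).sum(dim=-1) / pos_counts
    return loss.mean()

loss = multi_positive_info_nce(torch.randn(4, 32), torch.randn(10, 32),
                               torch.rand(4, 10) > 0.7)
```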

Updated: 2024-07-08 13:05:08

标题: 具有上下文信息NCE损失的偏好对比学习

摘要: 情境偏好排序中的一个常见问题是,单个被偏好的动作要与多个备选项进行比较,从而使复杂度激增并使偏好分布产生偏斜。在这项工作中,我们展示了如何通过对CLIP框架的适当改造来解决这个问题。这种改造并非完全直接:尽管CLIP使用的InfoNCE损失在计算机视觉和多模态领域取得了巨大成功,但其批次构建技术要求能够比较任意项,并且当一个项在同一批次中有多个正关联时,该损失并无良好定义。我们在集换式卡牌游戏领域实证展示了改造后的InfoNCE损失的实用性:我们的目标是基于人类选择,学习一个能捕捉单张卡牌与整个卡池之间关联的嵌入空间。这类选择数据仅存在于受限的选项之间,因此产生的是一个项目相对一组其他项目的具体偏好,而非卡牌与卡池之间的完美匹配。我们的结果显示,由于上述问题,原始CLIP表现不佳;而通过将CLIP适配到该问题,我们得到的模型优于先前使用三元组损失训练的工作,同时也缓解了与挖掘三元组相关的问题。

更新时间: 2024-07-08 13:05:08

领域: cs.AI

下载: http://arxiv.org/abs/2407.05898v1

Link Representation Learning for Probabilistic Travel Time Estimation

Travel time estimation is a crucial application in navigation apps and web mapping services. Current deterministic and probabilistic methods primarily focus on modeling individual trips, assuming independence among trips. However, in real-world scenarios, we often observe strong inter-trip correlations due to factors such as weather conditions, traffic management, and road works. In this paper, we propose to model trip-level link travel time using a Gaussian hierarchical model, which can characterize both inter-trip and intra-trip correlations. The joint distribution of travel time of multiple trips becomes a multivariate Gaussian parameterized by learnable link representations. To effectively use the sparse GPS trajectories, we also propose a data augmentation method based on trip sub-sampling, which allows for fine-grained gradient backpropagation in learning link representations. During inference, we estimate the probability distribution of the travel time of a queried trip conditional on the completed trips that are spatiotemporally adjacent. We refer to the overall framework as ProbTTE. We evaluate ProbTTE on two real-world GPS trajectory datasets, and the results demonstrate its superior performance compared to state-of-the-art deterministic and probabilistic baselines. Additionally, we find that the learned link representations align well with the physical geometry of the network, making them suitable as input for other applications.

Updated: 2024-07-08 13:01:53

标题: 链接表示学习用于概率旅行时间估计

摘要: 旅行时间估计是导航应用和网络地图服务中的关键应用。现有的确定性和概率方法主要关注对单个行程建模,并假设行程之间相互独立。然而,在现实场景中,由于天气条件、交通管理和道路施工等因素,我们经常观察到行程之间存在很强的相关性。在本文中,我们提出使用高斯分层模型对行程级别的路段旅行时间建模,该模型能同时刻画行程间和行程内的相关性。多个行程的旅行时间的联合分布成为一个由可学习的路段表示参数化的多元高斯分布。为了有效利用稀疏的GPS轨迹,我们还提出了一种基于行程子采样的数据增强方法,使学习路段表示时能够进行细粒度的梯度反向传播。在推断阶段,我们以时空相邻的已完成行程为条件,估计查询行程旅行时间的概率分布。我们将整体框架称为ProbTTE。我们在两个真实世界GPS轨迹数据集上评估了ProbTTE,结果显示其性能优于最先进的确定性和概率基线。此外,我们发现学习到的路段表示与路网的物理几何结构吻合良好,适合作为其他应用的输入。

更新时间: 2024-07-08 13:01:53

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2407.05895v1

An efficient method to automate tooth identification and 3D bounding box extraction from Cone Beam CT Images

Accurate identification, localization, and segregation of teeth from Cone Beam Computed Tomography (CBCT) images are essential for analyzing dental pathologies. Modeling an individual tooth can be challenging and intricate to accomplish, especially when fillings and other restorations introduce artifacts. This paper proposes a method for automatically detecting, identifying, and extracting teeth from CBCT images. Our approach involves dividing the three-dimensional images into axial slices for image detection. Teeth are pinpointed and labeled using a single-stage object detector. Subsequently, bounding boxes are delineated and identified to create three-dimensional representations of each tooth. The proposed solution has been successfully integrated into the dental analysis tool Dentomo.

Updated: 2024-07-08 12:59:28

标题: 一种从锥束CT图像中自动识别牙齿并提取三维边界框的高效方法

摘要: 准确识别、定位和分离锥束计算机断层摄影(CBCT)图像中的牙齿对于分析牙科病理至关重要。建模单个牙齿可能具有挑战性和复杂性,特别是当充填物和其他修复引入伪影时。本文提出了一种自动检测、识别和提取CBCT图像中牙齿的方法。我们的方法涉及将三维图像分成轴向切片进行图像检测。牙齿被定位并使用单阶段对象检测器标记。随后,边界框被描绘并识别,以创建每颗牙齿的三维表示。所提出的解决方案已成功整合到牙科分析工具Dentomo中。

更新时间: 2024-07-08 12:59:28

领域: eess.IV,cs.AI,cs.CV

下载: http://arxiv.org/abs/2407.05892v1

Properties of Discrete Sliced Wasserstein Losses

The Sliced Wasserstein (SW) distance has become a popular alternative to the Wasserstein distance for comparing probability measures. Widespread applications include image processing, domain adaptation and generative modelling, where it is common to optimise some parameters in order to minimise SW, which serves as a loss function between discrete probability measures (since measures admitting densities are numerically unattainable). All these optimisation problems bear the same sub-problem, which is minimising the Sliced Wasserstein energy. In this paper we study the properties of $\mathcal{E}: Y \longmapsto \mathrm{SW}_2^2(\gamma_Y, \gamma_Z)$, i.e. the SW distance between two uniform discrete measures with the same amount of points as a function of the support $Y \in \mathbb{R}^{n \times d}$ of one of the measures. We investigate the regularity and optimisation properties of this energy, as well as its Monte-Carlo approximation $\mathcal{E}_p$ (estimating the expectation in SW using only $p$ samples) and show convergence results on the critical points of $\mathcal{E}_p$ to those of $\mathcal{E}$, as well as an almost-sure uniform convergence and a uniform Central Limit result on the process $\mathcal{E}_p(Y)$. Finally, we show that in a certain sense, Stochastic Gradient Descent methods minimising $\mathcal{E}$ and $\mathcal{E}_p$ converge towards (Clarke) critical points of these energies.
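
The Monte-Carlo approximation $\mathcal{E}_p$ discussed above is, concretely, the standard p-projection estimator of $\mathrm{SW}_2^2$ between two uniform discrete measures with the same number of points; a NumPy sketch:

```python
import numpy as np

# Monte-Carlo estimator of SW_2^2 between two uniform discrete measures
# with the same number of points (the quantity E_p(Y) in the abstract):
# draw p random directions, project both point clouds, sort, and average
# the squared 1D Wasserstein distances.

def sliced_w2_squared(Y, Z, p=64, rng=None):
    rng = rng or np.random.default_rng()
    n, d = Y.shape
    theta = rng.normal(size=(p, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)  # uniform directions
    proj_Y = np.sort(Y @ theta.T, axis=0)                  # (n, p)
    proj_Z = np.sort(Z @ theta.T, axis=0)
    # 1D W2^2 between sorted projections, averaged over directions.
    return np.mean(np.sum((proj_Y - proj_Z) ** 2, axis=0) / n)

Y, Z = np.random.randn(100, 3), np.random.randn(100, 3)
print(sliced_w2_squared(Y, Z, p=128))
```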

Updated: 2024-07-08 12:52:12

标题: 离散切片Wasserstein损失的特性

摘要: 切片Wasserstein(SW)距离已成为Wasserstein距离的一种流行替代,用于比较概率测度。其广泛应用包括图像处理、领域自适应和生成建模,在这些应用中通常需要优化某些参数以最小化SW,它充当离散概率测度之间的损失函数(因为具有密度的测度在数值上无法实现)。所有这些优化问题都包含同一个子问题,即最小化切片Wasserstein能量。本文研究了$\mathcal{E}: Y \longmapsto \mathrm{SW}_2^2(\gamma_Y, \gamma_Z)$的性质,即两个点数相同的均匀离散测度之间的SW距离,作为其中一个测度的支撑$Y \in \mathbb{R}^{n \times d}$的函数。我们研究了该能量的正则性和优化性质,以及它的蒙特卡洛近似$\mathcal{E}_p$(仅使用$p$个样本估计SW中的期望),证明了$\mathcal{E}_p$的临界点向$\mathcal{E}$的临界点收敛,以及过程$\mathcal{E}_p(Y)$的几乎必然一致收敛和一致中心极限结果。最后,我们证明了在某种意义上,最小化$\mathcal{E}$和$\mathcal{E}_p$的随机梯度下降方法会收敛到这些能量的(Clarke)临界点。

更新时间: 2024-07-08 12:52:12

领域: stat.ML,cs.LG,math.OC,math.PR

下载: http://arxiv.org/abs/2307.10352v6

Generation and De-Identification of Indian Clinical Discharge Summaries using LLMs

The consequences of a healthcare data breach can be devastating for the patients, providers, and payers. The average financial impact of a data breach in recent months has been estimated to be close to USD 10 million. This is especially significant for healthcare organizations in India that are managing rapid digitization while still establishing data governance procedures that align with the letter and spirit of the law. Computer-based systems for de-identification of personal information are vulnerable to data drift, often rendering them ineffective in cross-institution settings. Therefore, a rigorous assessment of existing de-identification against local health datasets is imperative to support the safe adoption of digital health initiatives in India. Using a small set of de-identified patient discharge summaries provided by an Indian healthcare institution, in this paper, we report the nominal performance of de-identification algorithms (based on language models) trained on publicly available non-Indian datasets, pointing towards a lack of cross-institutional generalization. Similarly, experimentation with off-the-shelf de-identification systems reveals potential risks associated with the approach. To overcome data scarcity, we explore generating synthetic clinical reports (using publicly available and Indian summaries) by performing in-context learning over Large Language Models (LLMs). Our experiments demonstrate the use of generated reports as an effective strategy for creating high-performing de-identification systems with good generalization capabilities.

Updated: 2024-07-08 12:47:03

标题: 使用LLMs生成印度临床出院摘要并进行去标识化

摘要: 医疗数据泄露的后果对患者、医疗提供方和支付方而言可能是毁灭性的。近几个月数据泄露的平均财务影响估计接近1000万美元。这对印度的医疗机构尤为重要:它们在快速推进数字化的同时,仍在建立符合法律条文与精神的数据治理流程。基于计算机的个人信息去标识化系统容易受到数据漂移的影响,往往在跨机构环境中失效。因此,针对本地健康数据集对现有去标识化系统进行严格评估,对于支持印度数字健康倡议的安全落地是必不可少的。本文利用一家印度医疗机构提供的一小组去标识化患者出院摘要,报告了在公开的非印度数据集上训练的(基于语言模型的)去标识化算法的一般性能,指出其缺乏跨机构泛化能力。类似地,对现成去标识化系统的实验也揭示了该方案的潜在风险。为克服数据稀缺问题,我们探索通过在大型语言模型(LLMs)上进行上下文学习,利用公开可用摘要和印度摘要生成合成临床报告。我们的实验表明,利用生成的报告是一种有效策略,可用于构建具有良好泛化能力的高性能去标识化系统。

更新时间: 2024-07-08 12:47:03

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2407.05887v1

One system for learning and remembering episodes and rules

Humans can learn individual episodes and generalizable rules and also successfully retain both kinds of acquired knowledge over time. In the cognitive science literature, (1) learning individual episodes and rules and (2) learning and remembering are often both conceptualized as competing processes that necessitate separate, complementary learning systems. Inspired by recent research in statistical learning, we challenge these trade-offs, hypothesizing that they arise from capacity limitations rather than from the inherent incompatibility of the underlying cognitive processes. Using an associative learning task, we show that one system with excess representational capacity can learn and remember both episodes and rules.

Updated: 2024-07-08 12:44:18

标题: 一个用于学习和记忆情节和规则的系统

摘要: 人类既能学习个别情节,也能学习可泛化的规则,并且能够长期保留这两类习得的知识。在认知科学文献中,(1)学习个别情节与规则,以及(2)学习与记忆,常常都被概念化为相互竞争的过程,需要各自独立、互补的学习系统。受统计学习最新研究的启发,我们对这些权衡提出质疑,假设它们源于容量限制,而非底层认知过程的固有不兼容。通过一个联想学习任务,我们展示了一个具有冗余表征容量的单一系统可以同时学习并记住情节和规则。

更新时间: 2024-07-08 12:44:18

领域: cs.AI,cs.NE

下载: http://arxiv.org/abs/2407.05884v1

Learning With Generalised Card Representations for "Magic: The Gathering"

A defining feature of collectable card games is the deck building process prior to actual gameplay, in which players form their decks according to some restrictions. Learning to build decks is difficult for players and models alike due to the large card variety and highly complex semantics, as well as requiring meaningful card and deck representations when aiming to utilise AI. In addition, regular releases of new card sets lead to unforeseeable fluctuations in the available card pool, thus affecting possible deck configurations and requiring continuous updates. Previous Game AI approaches to building decks have often been limited to fixed sets of possible cards, which greatly limits their utility in practice. In this work, we explore possible card representations that generalise to unseen cards, thus greatly extending the real-world utility of AI-based deck building for the game "Magic: The Gathering". We study such representations based on numerical, nominal, and text-based features of cards, card images, and meta information about card usage from third-party services. Our results show that while the particular choice of generalised input representation has little effect on learning to predict human card selections among known cards, the performance on new, unseen cards can be greatly improved. Our generalised model is able to predict 55\% of human choices on completely unseen cards, thus showing a deep understanding of card quality and strategy.

Updated: 2024-07-08 12:42:44

标题: 针对《万智牌》(Magic: The Gathering) 的广义卡牌表示学习

摘要: 集换式卡牌游戏的一个显著特征是在实际游戏之前进行的卡组建设过程,玩家根据一些限制条件来组建他们的卡组。学习如何构建卡组对于玩家和模型来说都很困难,这是因为卡片种类繁多,语义高度复杂,而且在利用人工智能时需要有意义的卡片和卡组表示。此外,定期发布新的卡牌套装会导致可用卡片池的不可预见波动,从而影响可能的卡组配置,并需要持续更新。先前的游戏人工智能方法通常局限于固定的可能卡片集,这在实践中极大地限制了它们的实用性。在这项工作中,我们探讨了可能通用到未知卡片的卡片表示,从而极大地扩展了基于人工智能的卡组构建在“魔术风云”游戏中的实用性。我们研究了基于卡片的数值、名义和文本特征、卡片图像以及来自第三方服务的有关卡片使用情况的元信息的表示。我们的结果表明,虽然通用输入表示的具体选择对于学习预测已知卡片中的人类选择几乎没有影响,但在新的未知卡片上的表现可以得到极大的改善。我们的通用模型能够预测完全未知卡片中55%的人类选择,从而显示了对卡片质量和战略的深刻理解。

更新时间: 2024-07-08 12:42:44

领域: cs.AI

下载: http://arxiv.org/abs/2407.05879v1

Efficiently Training Neural Networks for Imperfect Information Games by Sampling Information Sets

In imperfect information games, the evaluation of a game state not only depends on the observable world but also relies on hidden parts of the environment. As accessing the obstructed information trivialises state evaluations, one approach to tackle such problems is to estimate the value of the imperfect state as a combination of all states in the information set, i.e., all possible states that are consistent with the current imperfect information. In this work, the goal is to learn a function that maps from the imperfect game information state to its expected value. However, constructing a perfect training set, i.e. an enumeration of the whole information set for numerous imperfect states, is often infeasible. To compute the expected values for an imperfect information game like Reconnaissance Blind Chess, one would need to evaluate thousands of chess positions just to obtain the training target for a single state. Still, the expected value of a state can already be approximated with appropriate accuracy from a much smaller set of evaluations. Thus, in this paper, we empirically investigate how a budget of perfect information game evaluations should be distributed among training samples to maximise the return. Our results show that sampling a small number of states, in our experiments roughly 3, for a larger number of separate positions is preferable to drawing many samples for a smaller number of positions. We therefore find that in our case, the quantity of different samples seems to matter more than higher target quality.
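
To make the budget trade-off concrete, here is a small numerical sketch (ours, not the authors' code): a noisy `evaluate` stands in for an expensive perfect-information engine call, and a fixed evaluation budget is split between the number of positions and the samples per position.

```python
import numpy as np

rng = np.random.default_rng(0)

def evaluate(true_value):      # placeholder for one costly perfect-information evaluation
    return true_value + rng.normal(scale=1.0)

def make_targets(n_positions, samples_per_position):
    """Approximate each position's expected value from a few sampled states."""
    truth = rng.uniform(-1, 1, size=n_positions)          # hidden value of each info set
    targets = np.array([np.mean([evaluate(v) for _ in range(samples_per_position)])
                        for v in truth])
    return truth, targets

budget = 3000                  # total evaluations we can afford
for k in (1, 3, 10, 100):      # samples per position; positions * k == budget
    truth, targets = make_targets(budget // k, k)
    print(f"k={k:>3}: {budget // k:>4} positions, "
          f"target MSE={np.mean((targets - truth) ** 2):.3f}")
# Larger k yields cleaner targets but far fewer positions; the paper's finding is
# that for training, many positions with k around 3 beats few positions with many samples.
```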

Updated: 2024-07-08 12:37:07

标题: 通过对信息集进行抽样,高效训练神经网络以应对不完全信息博弈

摘要: 在信息不完全的游戏中,对游戏状态的评估不仅取决于可观察的世界,还依赖于环境中隐藏的部分。访问被遮挡的信息使状态评估变得琐碎,解决这类问题的一种方法是估计不完全状态的价值,将其视为信息集中所有状态的组合,即与当前不完全信息一致的所有可能状态。在这项工作中,目标是学习一个函数,将不完全的游戏信息状态映射到其期望值。然而,构建一个完美的训练集,即列举大量不完全状态的整个信息集,通常是不可行的。要计算像“侦察盲棋”这样的不完全信息游戏的期望值,需要评估成千上万个棋局位置,才能获得单个状态的训练目标。然而,一个状态的期望值已经可以从一个更小的评估集合中以适当的准确性近似得出。因此,在本文中,我们经验性地研究了如何将完美信息游戏评估的预算分配给训练样本,以最大化回报。我们的结果表明,在我们的实验中,对较大数量的不同位置抽取少量状态,大约是3个,要优于重复对少量状态进行抽样。因此,我们发现,在我们的情况下,不同样本的数量似乎比更高的目标质量更重要。

更新时间: 2024-07-08 12:37:07

领域: cs.AI,cs.GT,cs.LG

下载: http://arxiv.org/abs/2407.05876v1

Scaling Exponents Across Parameterizations and Optimizers

Robust and effective scaling of models from small to large width typically requires the precise adjustment of many algorithmic and architectural details, such as parameterization and optimizer choices. In this work, we propose a new perspective on parameterization by investigating a key assumption in prior work about the alignment between parameters and data and derive new theoretical results under weaker assumptions and a broader set of optimizers. Our extensive empirical investigation includes tens of thousands of models trained with all combinations of three optimizers, four parameterizations, several alignment assumptions, more than a dozen learning rates, and fourteen model sizes up to 26.8B parameters. We find that the best learning rate scaling prescription would often have been excluded by the assumptions in prior work. Our results show that all parameterizations, not just maximal update parameterization (muP), can achieve hyperparameter transfer; moreover, our novel per-layer learning rate prescription for standard parameterization outperforms muP. Finally, we demonstrate that an overlooked aspect of parameterization, the epsilon parameter in Adam, must be scaled correctly to avoid gradient underflow and propose Adam-atan2, a new numerically stable, scale-invariant version of Adam that eliminates the epsilon hyperparameter entirely.
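
The abstract's description of Adam-atan2 suggests replacing Adam's division by sqrt(v_hat) + eps with an atan2 form, which is bounded and needs no epsilon. A minimal sketch of that idea follows; the learning rate, bias correction, and toy problem are our illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def adam_atan2_step(param, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999):
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)                 # bias correction, as in standard Adam
    v_hat = v / (1 - beta2 ** t)
    update = np.arctan2(m_hat, np.sqrt(v_hat))   # bounded, well-defined even at v_hat = 0
    return param - lr * update, m, v

# a toy run on f(x) = x^2, whose gradient is 2x
x, m, v = 5.0, 0.0, 0.0
for t in range(1, 201):
    x, m, v = adam_atan2_step(x, 2 * x, m, v, t)
print(x)  # x has moved from 5.0 to near the minimum at 0
```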

Updated: 2024-07-08 12:32:51

标题: 在参数化和优化器之间的尺度指数

摘要: 在从小到大宽度的模型的稳健和有效缩放通常需要精确调整许多算法和架构细节,如参数化和优化器选择。在这项工作中,我们提出了一种新的参数化视角,通过研究先前工作中关于参数和数据之间对齐的关键假设,并在更弱的假设和更广泛的优化器集合下推导出新的理论结果。我们进行了大量的实证研究,包括数万个使用三种优化器、四种参数化、几种对齐假设、十多个学习率和最多26.8B参数的十四种模型尺寸的组合训练的模型。我们发现,在先前工作的假设中往往会排除最佳的学习率缩放方案。我们的结果显示,所有参数化(不仅仅是最大更新参数化(muP))都可以实现超参数传递;此外,我们针对标准参数化提出的新的逐层学习率处方优于muP。最后,我们展示了参数化中被忽视的一个方面,即Adam中的epsilon参数必须正确缩放以避免梯度下溢,并提出了Adam-atan2,这是一个新的数值稳定、尺度不变的Adam版本,完全消除了epsilon超参数。

更新时间: 2024-07-08 12:32:51

领域: cs.LG

下载: http://arxiv.org/abs/2407.05872v1

PORCA: Root Cause Analysis with Partially Observed Data

Root Cause Analysis (RCA) aims at identifying the underlying causes of system faults by uncovering and analyzing the causal structure from complex systems. It has been widely used in many application domains. Reliable diagnostic conclusions are of great importance in mitigating system failures and financial losses. However, previous studies implicitly assume a full observation of the system, neglecting the effect of partial observation (i.e., missing nodes and latent malfunction). As a result, they fail to derive reliable RCA results. In this paper, we unveil the issues of unobserved confounders and heterogeneity under partial observation and formulate a new problem of root cause analysis with partially observed data. To achieve this, we propose PORCA, a novel RCA framework which can explore reliable root causes under both unobserved confounders and unobserved heterogeneity. PORCA leverages magnified score-based causal discovery to efficiently optimize an acyclic directed mixed graph under unobserved confounders. In addition, we develop a heterogeneity-aware scheduling strategy to provide adaptive sample weights. Extensive experimental results on one synthetic and two real-world datasets demonstrate the effectiveness and superiority of the proposed framework.

Updated: 2024-07-08 12:31:12

标题: PORCA:带有部分信息的根本原因分析

摘要: 根本原因分析(RCA)旨在通过揭示和分析复杂系统的因果结构,识别系统故障的潜在原因。它已被广泛应用于许多应用领域。可靠的诊断结论对于减轻系统故障和财务损失至关重要。然而,先前的研究隐含地假设对系统进行了完全观察,忽略了部分观察的影响(即,缺失节点和潜在故障)。因此,它们未能得出可靠的RCA结果。在本文中,我们揭示了在部分观察中未观察到的混杂因素和异质性问题,并提出了一个新的部分观察数据下的根本原因分析问题。为实现这一目标,我们提出了PORCA,一个能够在未观察到的混杂因素和未观察到的异质性下探索可靠根本原因的新颖RCA框架。PORCA利用放大的基于分数的因果发现,有效优化无观察到的混杂因素下的无环有向混合图。此外,我们还开发了一个异质性感知的调度策略,提供自适应的样本权重。对一个合成数据集和两个真实世界数据集的广泛实验结果证明了所提出框架的有效性和优越性。

更新时间: 2024-07-08 12:31:12

领域: cs.AI

下载: http://arxiv.org/abs/2407.05869v1

KG-FPQ: Evaluating Factuality Hallucination in LLMs with Knowledge Graph-based False Premise Questions

Recent studies have demonstrated that large language models (LLMs) are susceptible to being misled by false premise questions (FPQs), leading to errors in factual knowledge, known as factuality hallucination. Existing benchmarks that assess this vulnerability primarily rely on manual construction, resulting in limited scale and lack of scalability. In this work, we introduce an automated, scalable pipeline to create FPQs based on knowledge graphs (KGs). The first step is modifying true triplets extracted from KGs to create false premises. Subsequently, utilizing the state-of-the-art capabilities of GPTs, we generate semantically rich FPQs. Based on the proposed method, we present a comprehensive benchmark, the Knowledge Graph-based False Premise Questions (KG-FPQ), which contains approximately 178k FPQs across three knowledge domains, at six levels of confusability, and in two task formats. Using KG-FPQ, we conduct extensive evaluations on several representative LLMs and provide valuable insights. The KG-FPQ dataset and code are available at https://github.com/yanxuzhu/KG-FPQ.
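
A toy sketch of the triplet-editing step described above: swap the object of a true KG triplet for a same-type distractor, then template the result into a question whose premise is false by construction. The entities, relations, and templates here are invented for illustration and are not from the KG-FPQ pipeline.

```python
import random

random.seed(0)
true_triplets = [
    ("Marie Curie", "won", "Nobel Prize in Physics"),
    ("Alan Turing", "born_in", "London"),
]
distractors = {"won": ["Fields Medal", "Turing Award"],
               "born_in": ["Paris", "Vienna"]}

def make_false_premise(triplet):
    """Corrupt a true triplet by replacing its object with a plausible distractor."""
    s, r, o = triplet
    fake_o = random.choice([d for d in distractors[r] if d != o])
    return (s, r, fake_o)

def to_question(triplet):
    """Embed the (false) triplet as an unstated premise of a question."""
    s, r, o = triplet
    templates = {"won": f"In which year did {s} win the {o}?",
                 "born_in": f"What did {s} do after growing up in {o}?"}
    return templates[r]

for t in true_triplets:
    print(to_question(make_false_premise(t)))  # questions with false premises by construction
```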

Updated: 2024-07-08 12:31:03

标题: KG-FPQ:基于知识图谱的错误前提问题评估LLM中的事实性幻觉

摘要: 最近的研究表明,大型语言模型(LLMs)易受虚假前提问题(FPQs)的误导,导致事实知识错误,即事实幻觉。评估这种脆弱性的现有基准主要依赖于手工构建,导致规模有限且缺乏可扩展性。在这项工作中,我们引入了一种基于知识图(KGs)创建FPQs的自动化、可伸缩的流程。第一步是修改从KGs中提取的真实三元组以创建虚假前提。随后,利用GPTs的最新能力,我们生成语义丰富的FPQs。基于所提出的方法,我们提出了一个综合基准,即基于知识图的虚假前提问题(KG-FPQ),该基准包含约178k个FPQs,涵盖三个知识领域,在六个混淆级别和两种任务格式中。利用KG-FPQ,我们对几个代表性LLMs进行了广泛评估,并提供了有价值的见解。KG-FPQ数据集和代码可在https://github.com/yanxuzhu/KG-FPQ 上获得。

更新时间: 2024-07-08 12:31:03

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2407.05868v1

Neural Network-based Information Set Weighting for Playing Reconnaissance Blind Chess

In imperfect information games, the game state is generally not fully observable to players. Therefore, good gameplay requires policies that deal with the different information that is hidden from each player. To combat this, effective algorithms often reason about information sets; the sets of all possible game states that are consistent with a player's observations. While there is no way to distinguish between the states within an information set, this property does not imply that all states are equally likely to occur in play. We extend previous research on assigning weights to the states in an information set in order to facilitate better gameplay in the imperfect information game of Reconnaissance Blind Chess. For this, we train two different neural networks which estimate the likelihood of each state in an information set from historical game data. Experimentally, we find that a Siamese neural network is able to achieve higher accuracy and is more efficient than a classical convolutional neural network for the given domain. Finally, we evaluate an RBC-playing agent that is based on the generated weightings and compare different parameter settings that influence how strongly it should rely on them. The resulting best player is ranked 5th on the public leaderboard.

Updated: 2024-07-08 12:29:29

标题: 基于神经网络的信息集加权方法在侦察盲棋对局中的应用

摘要: 在信息不完全的游戏中,游戏状态通常对玩家不是完全可观察的。因此,良好的游戏玩法需要处理不同信息,这些信息对每个玩家都是隐藏的。为了应对这种情况,有效的算法通常会考虑信息集;即与玩家观察一致的所有可能游戏状态的集合。虽然无法区分信息集内的状态,但这并不意味着所有状态在游戏中出现的可能性相等。我们扩展了以前关于在信息不完全的游戏中为信息集中的状态分配权重的研究,以促进在《侦察盲棋》中的更好游戏玩法。为此,我们训练了两种不同的神经网络,用于根据历史游戏数据估计信息集中每个状态发生的可能性。实验结果表明,一种连体神经网络能够实现更高的准确性,并且比给定领域的传统卷积神经网络更有效。最后,我们评估了基于生成权重的RBC对战代理,并比较了影响它应该如何依赖这些权重的不同参数设置。最终,最佳玩家在公共排行榜上排名第五。

更新时间: 2024-07-08 12:29:29

领域: cs.AI,cs.LG

下载: http://arxiv.org/abs/2407.05864v1

Empowering 1000 tokens/second on-device LLM prefilling with mllm-NPU

On-device large language models (LLMs) are catalyzing novel mobile applications such as UI task automation and personalized email auto-reply, without giving away users' private data. However, on-device LLMs still suffer from unacceptably long inference latency, especially the time to first token (prefill stage) due to the need of long context for accurate, personalized content generation, as well as the lack of parallel computing capacity of mobile CPU/GPU. To enable practical on-device LLM, we present mllm-NPU, the first-of-its-kind LLM inference system that efficiently leverages on-device Neural Processing Unit (NPU) offloading. Essentially, mllm-NPU is an algorithm-system co-design that tackles a few semantic gaps between the LLM architecture and contemporary NPU design. Specifically, it re-constructs the prompt and model in three levels: (1) At prompt level, it divides variable-length prompts into multiple fixed-sized chunks while maintaining data dependencies; (2) At tensor level, it identifies and extracts significant outliers to run on the CPU/GPU in parallel with minimal overhead; (3) At block level, it schedules Transformer blocks in an out-of-order manner to the CPU/GPU and NPU based on their hardware affinity and sensitivity to accuracy. Compared to competitive baselines, mllm-NPU achieves 22.4x faster prefill speed and 30.7x energy savings on average, and up to 32.8x speedup in an end-to-end real-world application. For the first time, mllm-NPU achieves more than 1,000 tokens/sec prefilling for a billion-sized model (Qwen1.5-1.8B), paving the way towards practical on-device LLM.
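
The prompt-level chunking idea can be illustrated in a few lines; the chunk size and pad token below are arbitrary assumptions, and the real system additionally manages KV-cache dependencies and the tensor- and block-level scheduling described above.

```python
# Toy illustration: pad a variable-length prompt and split it into fixed-size
# chunks so the NPU always sees static shapes.
def chunk_prompt(token_ids, chunk_size=256, pad_id=0):
    padded_len = -(-len(token_ids) // chunk_size) * chunk_size  # ceil to a multiple
    padded = token_ids + [pad_id] * (padded_len - len(token_ids))
    return [padded[i:i + chunk_size] for i in range(0, padded_len, chunk_size)]

chunks = chunk_prompt(list(range(700)), chunk_size=256)
print(len(chunks), [len(c) for c in chunks])  # 3 chunks, each exactly 256 tokens
# Causal attention lets chunk k attend to the KV cache of chunks 0..k-1,
# which preserves the data dependencies the abstract mentions.
```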

Updated: 2024-07-08 12:20:45

标题: 在设备上使用MLLM-NPU实现每秒1000个令牌的LLM预填充功能

摘要: 在设备上使用大型语言模型(LLMs)正在催生新颖的移动应用程序,例如UI任务自动化和个性化电子邮件自动回复,而不会泄露用户的私人数据。然而,在设备上使用的LLMs仍然遭受无法接受的推断延迟,特别是第一个标记的时间(预填阶段),因为需要长上下文以进行准确、个性化的内容生成,以及移动CPU/GPU缺乏并行计算能力。 为了实现实用的设备上LLM,我们提出了mllm-NPU,这是第一种能够有效利用设备上神经处理单元(NPU)卸载的LLM推断系统。实质上,mllm-NPU是一种算法系统共同设计,解决了LLM架构和当代NPU设计之间的一些语义差距。具体而言,在三个级别上重构提示和模型:(1)在提示级别上,将可变长度提示分成多个固定大小的块,同时保持数据依赖性;(2)在张量级别上,识别并提取重要的异常值以在CPU/GPU上以最小的开销并行运行;(3)在块级别上,根据其硬件亲和性和对准确性的敏感性,以无序的方式将Transformer块调度到CPU/GPU和NPU上。与竞争基线相比,mllm-NPU在平均预填速度上实现了22.4倍的加速和30.7倍的节能,并在端到端的实际应用程序中实现了高达32.8倍的加速。mllm-NPU首次实现了对于十亿级模型(Qwen1.5-1.8B)每秒超过1,000个标记的预填,为实现实用的设备上LLM铺平了道路。

更新时间: 2024-07-08 12:20:45

领域: cs.AI

下载: http://arxiv.org/abs/2407.05858v1

The Disagreement Problem in Explainable Machine Learning: A Practitioner's Perspective

As various post hoc explanation methods are increasingly being leveraged to explain complex models in high-stakes settings, it becomes critical to develop a deeper understanding of if and when the explanations output by these methods disagree with each other, and how such disagreements are resolved in practice. However, there is little to no research that provides answers to these critical questions. In this work, we introduce and study the disagreement problem in explainable machine learning. More specifically, we formalize the notion of disagreement between explanations, analyze how often such disagreements occur in practice, and how practitioners resolve these disagreements. We first conduct interviews with data scientists to understand what constitutes disagreement between explanations generated by different methods for the same model prediction and introduce a novel quantitative framework to formalize this understanding. We then leverage this framework to carry out a rigorous empirical analysis with four real-world datasets, six state-of-the-art post hoc explanation methods, and six different predictive models, to measure the extent of disagreement between the explanations generated by various popular explanation methods. In addition, we carry out an online user study with data scientists to understand how they resolve the aforementioned disagreements. Our results indicate that (1) state-of-the-art explanation methods often disagree in terms of the explanations they output, and (2) machine learning practitioners often employ ad hoc heuristics when resolving such disagreements. These findings suggest that practitioners may be relying on misleading explanations when making consequential decisions. They also underscore the importance of developing principled frameworks for effectively evaluating and comparing explanations output by various explanation techniques.
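
Two plausible ways to quantify such disagreement, in the spirit of the paper's framework (the metric definitions and the attribution vectors below are our illustrative assumptions, not the paper's exact formalization):

```python
import numpy as np
from scipy.stats import spearmanr

def top_k_agreement(attr_a, attr_b, k=3):
    """Fraction of the top-k most important features shared by two explanations."""
    top_a = set(np.argsort(-np.abs(attr_a))[:k])
    top_b = set(np.argsort(-np.abs(attr_b))[:k])
    return len(top_a & top_b) / k

attr_method_1 = np.array([0.9, -0.4, 0.05, 0.3, -0.02])   # e.g. LIME-style attributions
attr_method_2 = np.array([0.7, 0.35, -0.1, -0.45, 0.01])  # e.g. SHAP-style attributions
print(top_k_agreement(attr_method_1, attr_method_2))      # overlap of top features
print(spearmanr(attr_method_1, attr_method_2)[0])         # rank correlation over all features
```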

Updated: 2024-07-08 12:11:38

标题: 可解释机器学习中的分歧问题:从从业者的角度看

摘要: 随着各种事后解释方法越来越多地被利用来解释高风险环境中的复杂模型,发展对这些方法输出的解释是否存在分歧的更深入理解以及这种分歧如何在实践中解决变得至关重要。然而,目前很少有研究提供对这些关键问题的答案。在这项工作中,我们介绍并研究了可解释机器学习中的分歧问题。更具体地,我们形式化了不同解释之间的分歧概念,分析了这种分歧在实践中发生的频率以及从业者如何解决这些分歧。我们首先与数据科学家进行访谈,以了解由不同方法生成的相同模型预测的解释之间的分歧构成什么,并引入了一个新颖的定量框架来形式化这种理解。然后,我们利用这个框架对四个真实数据集、六种最先进的事后解释方法和六种不同的预测模型进行了严格的实证分析,以衡量各种流行解释方法生成的解释之间的分歧程度。此外,我们还进行了一项在线用户研究,以了解数据科学家在解决上述分歧时的方法。我们的结果表明,(1)最先进的解释方法在输出的解释方面经常存在分歧,(2)机器学习从业者在解决这种分歧时经常采用临时启发式方法。这些发现表明,从业者在做出重要决策时可能依赖于误导性的解释。它们也强调了开发基于原则的框架以有效评估和比较各种解释技术输出的重要性。

更新时间: 2024-07-08 12:11:38

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2202.01602v4

Exploring the Adversarial Capabilities of Large Language Models

The proliferation of large language models (LLMs) has sparked widespread and general interest due to their strong language generation capabilities, offering great potential for both industry and research. While previous research delved into the security and privacy issues of LLMs, the extent to which these models can exhibit adversarial behavior remains largely unexplored. Addressing this gap, we investigate whether common publicly available LLMs have inherent capabilities to perturb text samples to fool safety measures, so-called adversarial examples resp.~attacks. More specifically, we investigate whether LLMs are inherently able to craft adversarial examples out of benign samples to fool existing safe rails. Our experiments, which focus on hate speech detection, reveal that LLMs succeed in finding adversarial perturbations, effectively undermining hate speech detection systems. Our findings carry significant implications for (semi-)autonomous systems relying on LLMs, highlighting potential challenges in their interaction with existing systems and safety measures.

Updated: 2024-07-08 12:10:58

标题: 探索大型语言模型的对抗能力

摘要: 大型语言模型(LLMs)的大量增长引发了广泛和普遍的兴趣,因为它们具有强大的语言生成能力,为行业和研究提供了巨大潜力。尽管先前的研究深入探讨了LLMs的安全和隐私问题,但这些模型可能表现出对抗行为的程度仍然大部分未被探索。为了填补这一空白,我们调查了常见的公开可用LLMs是否具有固有的能力扭曲文本样本以欺骗安全措施,即所谓的对抗性示例攻击。更具体地说,我们调查了LLMs是否天生能够将良性样本制成对抗性示例以欺骗现有的安全措施。我们的实验,重点关注仇恨言论检测,揭示了LLMs成功地找到对抗性扰动,有效地破坏了仇恨言论检测系统。我们的发现对依赖LLMs的(半)自主系统具有重要意义,突显了它们与现有系统和安全措施互动的潜在挑战。

更新时间: 2024-07-08 12:10:58

领域: cs.AI,cs.LG

下载: http://arxiv.org/abs/2402.09132v4

Be Careful What You Smooth For: Label Smoothing Can Be a Privacy Shield but Also a Catalyst for Model Inversion Attacks

Label smoothing -- using softened labels instead of hard ones -- is a widely adopted regularization method for deep learning, showing diverse benefits such as enhanced generalization and calibration. Its implications for preserving model privacy, however, have remained unexplored. To fill this gap, we investigate the impact of label smoothing on model inversion attacks (MIAs), which aim to generate class-representative samples by exploiting the knowledge encoded in a classifier, thereby inferring sensitive information about its training data. Through extensive analyses, we uncover that traditional label smoothing fosters MIAs, thereby increasing a model's privacy leakage. Even more, we reveal that smoothing with negative factors counters this trend, impeding the extraction of class-related information and leading to privacy preservation, beating state-of-the-art defenses. This establishes a practical and powerful novel way for enhancing model resilience against MIAs.
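
The mechanics are easy to see in code: a smoothing factor alpha mixes the one-hot label with the uniform distribution, and a negative alpha sharpens rather than softens the target. A minimal sketch (ours; the paper's defense setup around it is not reproduced here):

```python
import numpy as np

def smooth_labels(one_hot, alpha):
    """Standard label smoothing; negative alpha pushes past the one-hot target."""
    k = one_hot.shape[-1]
    return (1.0 - alpha) * one_hot + alpha / k

y = np.eye(4)[1]                 # hard label for class 1 of 4
print(smooth_labels(y, 0.1))     # positive smoothing: [0.025, 0.925, 0.025, 0.025]
print(smooth_labels(y, -0.05))   # negative smoothing: true class above 1, others below 0
```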

Updated: 2024-07-08 12:05:50

标题: 当心你所追求的:标签平滑可以是隐私屏障,也可能是模型逆向攻击的催化剂

摘要: 标签平滑——使用软化的标签而不是硬性标签——是深度学习中广泛采用的正则化方法,表现出增强的泛化和校准等多种益处。然而,它对于保护模型隐私的影响尚未被探索。为了填补这一空白,我们调查了标签平滑对模型反演攻击(MIAs)的影响,这些攻击旨在利用分类器中编码的知识生成代表性类别样本,从而推断其训练数据的敏感信息。通过广泛的分析,我们发现传统的标签平滑促进了MIAs,从而增加了模型的隐私泄漏。更重要的是,我们揭示了使用负因子进行平滑可以抵消这种趋势,阻碍了类别相关信息的提取,实现了隐私保护,超越了最先进的防御手段。这建立了一种实用且强大的新方法,可以增强模型对MIAs的抵抗力。

更新时间: 2024-07-08 12:05:50

领域: cs.LG,cs.CR,cs.CV

下载: http://arxiv.org/abs/2310.06549v5

Improving Adversarial Transferability of Vision-Language Pre-training Models through Collaborative Multimodal Interaction

Despite the substantial advancements in Vision-Language Pre-training (VLP) models, their susceptibility to adversarial attacks poses a significant challenge. Existing work rarely studies the transferability of attacks on VLP models, resulting in a substantial performance gap from white-box attacks. We observe that prior work overlooks the interaction mechanisms between modalities, which plays a crucial role in understanding the intricacies of VLP models. In response, we propose a novel attack, called Collaborative Multimodal Interaction Attack (CMI-Attack), leveraging modality interaction through embedding guidance and interaction enhancement. Specifically, attacking text at the embedding level while preserving semantics, as well as utilizing interaction image gradients to enhance constraints on perturbations of texts and images. Significantly, in the image-text retrieval task on Flickr30K dataset, CMI-Attack raises the transfer success rates from ALBEF to TCL, $\text{CLIP}_{\text{ViT}}$ and $\text{CLIP}_{\text{CNN}}$ by 8.11%-16.75% over state-of-the-art methods. Moreover, CMI-Attack also demonstrates superior performance in cross-task generalization scenarios. Our work addresses the underexplored realm of transfer attacks on VLP models, shedding light on the importance of modality interaction for enhanced adversarial robustness.

Updated: 2024-07-08 12:03:14

标题: 通过协作多模态交互提高视觉语言预训练模型的对抗迁移性

摘要: 尽管视觉语言预训练(VLP)模型取得了显著进展,但它们对对抗攻击的敏感性带来了重大挑战。现有研究很少研究对VLP模型的攻击的可转移性,导致了白盒攻击的显著性能差距。我们观察到先前的工作忽视了模态之间的互动机制,这在理解VLP模型的复杂性中起着至关重要的作用。为此,我们提出了一种新颖的攻击,称为协作多模态交互攻击(CMI-Attack),通过嵌入引导和交互增强利用模态之间的互动。具体来说,在嵌入级别攻击文本同时保留语义,并利用图像互动梯度增强对文本和图像扰动的约束。显著地,在Flickr30K数据集上的图像文本检索任务中,CMI-Attack将从ALBEF到TCL,CLIP_ViT和CLIP_CNN的转移成功率提高了8.11%-16.75%,超过了最先进的方法。此外,CMI-Attack还在跨任务泛化场景中展示了出色的性能。我们的工作解决了VLP模型上转移攻击的未经充分探索的领域,为增强对抗鲁棒性的模态交互的重要性提供了启示。

更新时间: 2024-07-08 12:03:14

领域: cs.CV,cs.CR,cs.MM

下载: http://arxiv.org/abs/2403.10883v2

Vulnerability Detection in Smart Contracts: A Comprehensive Survey

In the growing field of blockchain technology, smart contracts exist as transformative digital agreements that execute transactions autonomously in decentralised networks. However, these contracts face challenges in the form of security vulnerabilities, posing significant financial and operational risks. While traditional methods to detect and mitigate vulnerabilities in smart contracts are limited due to a lack of comprehensiveness and effectiveness, integrating advanced machine learning technologies presents an attractive approach to developing more effective vulnerability countermeasures. We endeavour to fill an important gap in the existing literature by conducting a rigorous systematic review, exploring the intersection between machine learning and smart contracts. Specifically, the study examines the potential of machine learning techniques to improve the detection and mitigation of vulnerabilities in smart contracts. We analysed 88 articles published between 2018 and 2023 from the following databases: IEEE, ACM, ScienceDirect, Scopus, and Google Scholar. The findings reveal that classical machine learning techniques, including KNN, RF, DT, XG-Boost, and SVM, outperform static tools in vulnerability detection. Moreover, multi-model approaches integrating deep learning and classical machine learning show significant improvements in precision and recall, while hybrid models employing various techniques achieve near-perfect performance in vulnerability detection accuracy. By integrating state-of-the-art solutions, this work synthesises current methods, thoroughly investigates research gaps, and suggests directions for future studies. The insights gathered from this study are intended to serve as a seminal reference for academics, industry experts, and bodies interested in leveraging machine learning to enhance smart contract security.

Updated: 2024-07-08 11:51:15

标题: 智能合约中的漏洞检测:一项全面调查

摘要: 在不断发展的区块链技术领域,智能合约作为一种具有变革性的数字协议,在去中心化网络中自主执行交易。然而,这些合约面临着安全漏洞的挑战,可能带来重大的财务和运营风险。传统的检测和缓解智能合约漏洞的方法受限于缺乏全面性和有效性,整合先进的机器学习技术提供了一个吸引人的途径来增加有效的漏洞对策。我们致力于填补现有文献中的一个重要空白,通过进行严格的系统性回顾,探索机器学习和智能合约之间的交叉点。具体来说,该研究考察了机器学习技术改进智能合约漏洞检测和缓解的潜力。我们分析了从2018年到2023年发表的88篇文章,这些文章来自以下数据库:IEEE、ACM、ScienceDirect、Scopus和谷歌学术。研究结果显示,经典的机器学习技术,包括KNN、RF、DT、XG-Boost和SVM,在漏洞检测方面优于静态工具。此外,将深度学习和经典机器学习结合的多模型方法在精度和召回率上显示出显著的改善,而采用各种技术的混合模型在漏洞检测准确性方面表现接近完美。 通过整合最先进的解决方案,本研究综合了当前方法,深入调查了研究空白,并提出了未来研究的方向。从这项研究中获得的见解旨在为学术界、行业专家和有兴趣利用机器学习增强智能合约安全性的机构提供一个重要的参考。

更新时间: 2024-07-08 11:51:15

领域: cs.CR,cs.LG

下载: http://arxiv.org/abs/2407.07922v1

An Empirical Comparison of Vocabulary Expansion and Initialization Approaches for Language Models

Language Models (LMs) excel in natural language processing tasks for English but show reduced performance in most other languages. This problem is commonly tackled by continually pre-training and fine-tuning these models for said languages. A significant issue in this process is the limited vocabulary coverage in the original model's tokenizer, leading to inadequate representation of new languages and necessitating an expansion of the tokenizer. The initialization of the embeddings corresponding to new vocabulary items presents a further challenge. Current strategies require cross-lingual embeddings and lack a solid theoretical foundation as well as comparisons with strong baselines. In this paper, we first establish theoretically that initializing within the convex hull of existing embeddings is a good initialization, followed by a novel but simple approach, Constrained Word2Vec (CW2V), which does not require cross-lingual embeddings. Our study evaluates different initialization methods for expanding RoBERTa and LLaMA 2 across four languages and five tasks. The results show that CW2V performs equally well or even better than more advanced techniques. Additionally, simpler approaches like multivariate initialization perform on par with these advanced methods indicating that efficient large-scale multilingual continued pretraining can be achieved even with simpler initialization methods.
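
A minimal sketch of the convex-hull condition (ours, covering only the initialization step, not the full CW2V training procedure): each new embedding is built as a convex combination of a few existing ones, so it provably lies inside their convex hull.

```python
import numpy as np

rng = np.random.default_rng(0)
old_embeddings = rng.normal(size=(32000, 768))   # existing vocabulary embeddings

def init_new_tokens(n_new, old_emb, support=16):
    """Initialize new-token embeddings inside the convex hull of existing ones."""
    new = np.empty((n_new, old_emb.shape[1]))
    for i in range(n_new):
        idx = rng.choice(old_emb.shape[0], size=support, replace=False)
        w = rng.dirichlet(np.ones(support))      # nonnegative weights summing to 1
        new[i] = w @ old_emb[idx]                # convex combination of existing rows
    return new

print(init_new_tokens(4, old_embeddings).shape)  # (4, 768)
```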

Updated: 2024-07-08 11:38:49

标题: 语言模型中词汇扩展和初始化方法的实证比较

摘要: 语言模型(LMs)在英语的自然语言处理任务中表现出色,但在大多数其他语言中表现不佳。这一问题通常通过不断为这些模型进行预训练和微调来解决。这一过程中的一个重要问题是原始模型的标记器中词汇覆盖有限,导致新语言的表达不足,需要扩展标记器。新词汇对应的嵌入的初始化也带来进一步的挑战。当前的策略需要跨语言嵌入,并且缺乏坚实的理论基础以及与强基线的比较。本文首先在理论上建立了在现有嵌入的凸包内初始化是一个良好的初始化的观点,接着提出了一种新颖而简单的方法,受限制的Word2Vec(CW2V),它不需要跨语言嵌入。我们的研究评估了扩展RoBERTa和LLaMA 2的不同初始化方法在四种语言和五项任务中的表现。结果显示,CW2V的表现与更先进的技术一样好甚至更好。此外,像多变量初始化这样更简单的方法表现与这些先进方法相当,表明即使使用简单的初始化方法也可以实现高效的大规模多语言持续预训练。

更新时间: 2024-07-08 11:38:49

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2407.05841v1

A Data-Driven Machine Learning Approach for Detecting Albedo Anomalies on the Lunar Surface

This study introduces a data-driven approach using machine learning (ML) techniques to explore and predict albedo anomalies on the Moon's surface. The research leverages diverse planetary datasets, including high-spatial-resolution albedo maps and element maps (LPFe, LPK, LPTh, LPTi) derived from laser and gamma-ray measurements. The primary objective is to identify relationships between chemical elements and albedo, thereby expanding our understanding of planetary surfaces and offering predictive capabilities for areas with incomplete datasets. To bridge the gap in resolution between the albedo and element maps, we employ Gaussian blurring techniques, including an innovative adaptive Gaussian blur. Our methodology culminates in the deployment of an Extreme Gradient Boosting Regression Model, optimized to predict full albedo based on elemental composition. Furthermore, we present an interactive analytical tool to visualize prediction errors, delineating their spatial and chemical characteristics. The findings not only pave the way for a more comprehensive understanding of the Moon's surface but also provide a framework for similar studies on other celestial bodies.

Updated: 2024-07-08 11:25:30

标题: 一个数据驱动的机器学习方法用于检测月表反照率异常

摘要: 这项研究引入了一种利用机器学习技术的数据驱动方法,探索和预测月球表面的反照率异常。该研究利用多样化的行星数据集,包括从激光和γ射线测量中得出的高空间分辨率反照率地图和元素地图(LPFe、LPK、LPTh、LPTi)。主要目标是识别化学元素与反照率之间的关系,从而扩展我们对行星表面的了解,并为具有不完整数据集的区域提供预测能力。为了弥合反照率和元素地图之间的分辨率差距,我们采用了高斯模糊技术,包括一种创新的自适应高斯模糊。我们的方法最终部署了一个经过优化的极端梯度增强回归模型,用于基于元素组成预测完整的反照率。此外,我们提供了一个交互式分析工具,可视化预测误差,描绘它们的空间和化学特征。这些发现不仅为更全面地了解月球表面铺平了道路,还为其他天体的类似研究提供了框架。

更新时间: 2024-07-08 11:25:30

领域: astro-ph.EP,cs.LG

下载: http://arxiv.org/abs/2407.05832v1

SynthesizRR: Generating Diverse Datasets with Retrieval Augmentation

It is often desirable to distill the capabilities of large language models (LLMs) into smaller student models due to compute and memory constraints. One way to do this for classification tasks is via dataset synthesis, which can be accomplished by generating examples of each label from the LLM. Prior approaches to synthesis use few-shot prompting, which relies on the LLM's parametric knowledge to generate usable examples. However, this leads to issues of repetition, bias towards popular entities, and stylistic differences from human text. In this work, we propose Synthesize by Retrieval and Refinement (SynthesizRR), which uses retrieval augmentation to introduce variety into the dataset synthesis process: as retrieved passages vary, the LLM is seeded with different content to generate its examples. We empirically study the synthesis of six datasets, covering topic classification, sentiment analysis, tone detection, and humor, requiring complex synthesis strategies. We find that SynthesizRR greatly improves lexical and semantic diversity, similarity to human-written text, and distillation performance, when compared to 32-shot prompting and four prior approaches. We release our extensive codebase at https://github.com/amazon-science/synthesizrr
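
Schematically, the retrieval-augmented synthesis loop looks like the sketch below, where `retrieve` and `llm_generate` are placeholders we invented for a real retriever and an LLM API call; the prompt wording and corpus are likewise illustrative.

```python
def retrieve(label, k=3):
    """Stand-in retriever: return up to k passages relevant to the label."""
    corpus = {"sports": ["The striker scored twice in the derby...",
                         "A late field goal sealed the playoff spot..."],
              "politics": ["The senate vote was postponed after..."]}
    return corpus.get(label, [])[:k]

def llm_generate(prompt):
    """Stand-in for an LLM call; a real system would query a teacher model here."""
    return f"<synthetic example conditioned on: {prompt[:60]}...>"

def synthesize(label, n_per_passage=2):
    examples = []
    for passage in retrieve(label):   # varied retrieved content seeds varied generations
        prompt = (f"Using the news passage below as inspiration, write a new "
                  f"'{label}' article in a different style.\n\nPassage: {passage}")
        examples += [llm_generate(prompt) for _ in range(n_per_passage)]
    return examples

print(len(synthesize("sports")))  # 4 examples seeded by 2 distinct passages
```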

Updated: 2024-07-08 11:20:42

标题: SynthesizRR:利用检索增强技术生成多样化数据集

摘要: 通常希望将大型语言模型(LLMs)的能力提炼为较小的学生模型,这是由于计算和内存限制。对于分类任务,一种方法是通过数据集合成来实现,可以通过从LLM生成每个标签的示例来实现。先前的合成方法使用少量提示,依赖于LLM的参数化知识来生成可用的示例。然而,这会导致重复、偏向于流行实体以及与人类文本的风格差异等问题。在这项工作中,我们提出了检索和改进合成(SynthesizRR),它使用检索增强来将多样性引入数据集合成过程中:随着检索到的段落的变化,LLM被种子化不同的内容来生成其示例。我们对涵盖主题分类、情感分析、语调检测和幽默等复杂合成策略的六个数据集进行了实证研究。我们发现,与32次提示和四种先前方法相比,SynthesizRR在词汇和语义多样性、与人类编写文本的相似性以及提炼性能方面都有很大的改进。我们在https://github.com/amazon-science/synthesizrr上发布了我们的大量代码库。

更新时间: 2024-07-08 11:20:42

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2405.10040v2

Prominent Roles of Conditionally Invariant Components in Domain Adaptation: Theory and Algorithms

Domain adaptation (DA) is a statistical learning problem that arises when the distribution of the source data used to train a model differs from that of the target data used to evaluate the model. While many DA algorithms have demonstrated considerable empirical success, blindly applying these algorithms can often lead to worse performance on new datasets. To address this, it is crucial to clarify the assumptions under which a DA algorithm has good target performance. In this work, we focus on the assumption of the presence of conditionally invariant components (CICs), which are relevant for prediction and remain conditionally invariant across the source and target data. We demonstrate that CICs, which can be estimated through conditional invariant penalty (CIP), play three prominent roles in providing target risk guarantees in DA. First, we propose a new algorithm based on CICs, importance-weighted conditional invariant penalty (IW-CIP), which has target risk guarantees beyond simple settings such as covariate shift and label shift. Second, we show that CICs help identify large discrepancies between source and target risks of other DA algorithms. Finally, we demonstrate that incorporating CICs into the domain invariant projection (DIP) algorithm can address its failure scenario caused by label-flipping features. We support our new algorithms and theoretical findings via numerical experiments on synthetic data, MNIST, CelebA, Camelyon17, and DomainNet datasets.

Updated: 2024-07-08 11:11:51

标题: 领域适应中有条件不变组件的重要作用:理论与算法

摘要: 域自适应(DA)是一个统计学习问题,当用于训练模型的源数据的分布与用于评估模型的目标数据的分布不同时出现。虽然许多DA算法已经展示出相当大的经验成功,但是盲目应用这些算法通常会导致在新数据集上的性能下降。为了解决这个问题,关键是澄清一个DA算法具有良好目标性能的假设。在这项工作中,我们关注存在条件不变组件(CICs)的假设,这些组件对于预测是相关的,并且在源数据和目标数据之间保持条件不变。我们证明了通过条件不变惩罚(CIP)估计的CICs在提供DA中目标风险保证方面发挥了三个突出作用。首先,我们提出了一个基于CICs的新算法,即基于重要性加权的条件不变惩罚(IW-CIP),它在超越简单设置(如协变量偏移和标签偏移)的目标风险保证方面。其次,我们展示CICs有助于识别其他DA算法的源和目标风险之间的巨大差异。最后,我们证明将CICs纳入域不变投影(DIP)算法中可以解决由标签翻转特征引起的失败情形。通过对合成数据、MNIST、CelebA、Camelyon17和DomainNet数据集的数值实验,我们支持了我们的新算法和理论发现。

更新时间: 2024-07-08 11:11:51

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2309.10301v2

An explainable three dimension framework to uncover learning patterns: A unified look in variable sulci recognition

The significant features identified in a representative subset of the dataset during the learning process of an artificial intelligence model are referred to as a 'global' explanation. Three-dimensional (3D) global explanations are crucial in neuroimaging where a complex representational space demands more than basic two-dimensional interpretations. Currently, studies in the literature lack accurate, low-complexity, and 3D global explanations in neuroimaging and beyond. To fill this gap, we develop a novel explainable artificial intelligence (XAI) 3D-Framework that provides robust, faithful, and low-complexity global explanations. We evaluated our framework on various 3D deep learning networks trained, validated, and tested on a well-annotated cohort of 596 MRI images. The focus of detection was on the presence or absence of the paracingulate sulcus, a highly variable feature of brain topology associated with symptoms of psychosis. Our proposed 3D-Framework outperformed traditional XAI methods in terms of faithfulness for global explanations. As a result, these explanations uncovered new patterns that not only enhance the credibility and reliability of the training process but also reveal the broader developmental landscape of the human cortex. Our XAI 3D-Framework proposes, for the first time, a way to utilize global explanations to discover the context in which detection of specific features is embedded, opening our understanding of normative brain development and atypical trajectories that can lead to the emergence of mental illness.

Updated: 2024-07-08 11:07:55

标题: 一个可解释的三维框架用于揭示学习模式:在变量沟识别中统一的视角

摘要: 在人工智能模型学习过程中识别的数据集代表性子集中确定的重要特征被称为“全局”解释。在神经影像学中,三维(3D)全局解释至关重要,因为复杂的表征空间需要比基本的二维解释更多。目前,文献中对神经影像学及其他领域缺乏准确、低复杂度和3D全局解释的研究。为了填补这一空白,我们开发了一种新颖的可解释人工智能(XAI)3D框架,提供强大、忠实和低复杂度的全局解释。我们在一个包含596张MRI图像的经过良好标注的队列上对我们的框架进行了评估,这些图像训练、验证和测试了各种3D深度学习网络。检测的重点是前纹状回裂的存在或缺失,这是与精神病症状相关的大脑拓扑的高度可变特征。我们提出的3D框架在全局解释的忠实性方面胜过传统XAI方法。因此,这些解释揭示了新的模式,不仅增强了训练过程的可信度和可靠性,还揭示了人类皮层更广泛的发展格局。我们的XAI 3D框架首次提出了一种利用全局解释来发现特定特征检测所处上下文的方法,从而拓展了我们对正常大脑发育和导致心理疾病出现的非典型轨迹的理解。

更新时间: 2024-07-08 11:07:55

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2309.00903v4

Post-training Quantization for Text-to-Image Diffusion Models with Progressive Calibration and Activation Relaxing

High computational overhead is a troublesome problem for diffusion models. Recent studies have leveraged post-training quantization (PTQ) to compress diffusion models. However, most of them only focus on unconditional models, leaving the quantization of widely-used pretrained text-to-image models, e.g., Stable Diffusion, largely unexplored. In this paper, we propose a novel post-training quantization method PCR (Progressive Calibration and Relaxing) for text-to-image diffusion models, which consists of a progressive calibration strategy that considers the accumulated quantization error across timesteps, and an activation relaxing strategy that improves the performance with negligible cost. Additionally, we demonstrate the previous metrics for text-to-image diffusion model quantization are not accurate due to the distribution gap. To tackle the problem, we propose a novel QDiffBench benchmark, which utilizes data in the same domain for more accurate evaluation. Besides, QDiffBench also considers the generalization performance of the quantized model outside the calibration dataset. Extensive experiments on Stable Diffusion and Stable Diffusion XL demonstrate the superiority of our method and benchmark. Moreover, we are the first to achieve quantization for Stable Diffusion XL while maintaining the performance.

Updated: 2024-07-08 11:02:47

标题: 文本到图像扩散模型的训练后量化:渐进校准与激活放松方法

摘要: 扩散模型的高计算开销是一个令人困扰的问题。最近的研究利用了后训练量化(PTQ)来压缩扩散模型。然而,大多数研究仅关注无条件模型,对广泛使用的预训练文本到图像模型,例如Stable Diffusion的量化几乎未被探索。在本文中,我们提出了一种新颖的后训练量化方法PCR(渐进校准和放松)用于文本到图像扩散模型,该方法包括一个考虑跨时间步累积量化误差的渐进校准策略,以及一个通过微不足道的成本改善性能的激活放松策略。此外,我们证明了以前用于文本到图像扩散模型量化的指标由于分布差异而不准确。为了解决这个问题,我们提出了一个新颖的QDiffBench基准,该基准利用相同领域的数据进行更准确的评估。此外,QDiffBench还考虑了校准数据集外的量化模型的泛化性能。对Stable Diffusion和Stable Diffusion XL的大量实验证明了我们方法和基准的优越性。此外,我们是第一个在保持性能的同时实现Stable Diffusion XL的量化。

更新时间: 2024-07-08 11:02:47

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2311.06322v3

CAM-Based Methods Can See through Walls

CAM-based methods are widely used post-hoc interpretability methods that produce a saliency map to explain the decision of an image classification model. The saliency map highlights the important areas of the image relevant to the prediction. In this paper, we show that most of these methods can incorrectly attribute an important score to parts of the image that the model cannot see. We show that this phenomenon occurs both theoretically and experimentally. On the theory side, we analyze the behavior of GradCAM on a simple masked CNN model at initialization. Experimentally, we train a VGG-like model constrained to not use the lower part of the image and nevertheless observe positive scores in the unseen part of the image. This behavior is evaluated quantitatively on two new datasets. We believe that this is problematic, potentially leading to mis-interpretation of the model's behavior.
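
A toy 1D analogue (ours, not the paper's construction) shows how this can happen: even when half of the input is always zeroed, overlapping convolutional receptive fields yield nonzero feature activations at masked positions, and a CAM built from those activations assigns them positive importance.

```python
import numpy as np

x = np.concatenate([np.ones(8), np.zeros(8)])   # the right half is always masked out
kernel = np.ones(5)                             # one conv filter, 'same' padding
feature = np.convolve(x, kernel, mode="same")   # receptive fields straddle the boundary
w = 1.0                                         # CAM weight for this filter
cam = w * feature                               # class activation at each spatial position
print(cam[8:11])  # -> [2. 1. 0.]: masked positions 8 and 9 receive positive importance
```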

Updated: 2024-07-08 11:00:51

标题: 基于CAM的方法可以穿墙看见

摘要: CAM-based方法是广泛使用的后续可解释性方法,它生成一个显著性地图来解释图像分类模型的决策。显著性地图突出显示与预测相关的图像重要区域。在本文中,我们展示了大多数这些方法可能会错误地将重要评分归因于模型无法看到的图像部分。我们展示了这种现象在理论和实验上都发生。在理论方面,我们分析了GradCAM在简单遮罩CNN模型初始化时的行为。在实验上,我们训练了一个类似VGG的模型,限制其不使用图像的下半部分,但仍然观察到未见部分图像中出现正分数。这种行为在两个新数据集上定量评估。我们认为这是一个问题,可能导致对模型行为的错误解释。

更新时间: 2024-07-08 11:00:51

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2404.01964v2

Improved Global Guarantees for the Nonconvex Burer--Monteiro Factorization via Rank Overparameterization

We consider minimizing a twice-differentiable, $L$-smooth, and $\mu$-strongly convex objective $\phi$ over an $n\times n$ positive semidefinite matrix $M\succeq0$, under the assumption that the minimizer $M^{\star}$ has low rank $r^{\star}\ll n$. Following the Burer--Monteiro approach, we instead minimize the nonconvex objective $f(X)=\phi(XX^{T})$ over a factor matrix $X$ of size $n\times r$. This substantially reduces the number of variables from $O(n^{2})$ to as few as $O(n)$ and also enforces positive semidefiniteness for free, but at the cost of giving up the convexity of the original problem. In this paper, we prove that if the search rank $r\ge r^{\star}$ is overparameterized by a \emph{constant factor} with respect to the true rank $r^{\star}$, namely as in $r>\frac{1}{4}(L/\mu-1)^{2}r^{\star}$, then despite nonconvexity, local optimization is guaranteed to globally converge from any initial point to the global optimum. This significantly improves upon a previous rank overparameterization threshold of $r\ge n$, which we show is sharp in the absence of smoothness and strong convexity, but would increase the number of variables back up to $O(n^{2})$. Conversely, without rank overparameterization, we prove that such a global guarantee is possible if and only if $\phi$ is almost perfectly conditioned, with a condition number of $L/\mu<3$. Therefore, we conclude that a small amount of overparameterization can lead to large improvements in theoretical guarantees for the nonconvex Burer--Monteiro factorization.
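
To make the threshold concrete, a short worked instance (ours, using only the inequality quoted above):

```latex
% Worked instance (illustrative) of the quoted threshold
%   r > (1/4) (L/\mu - 1)^2 r^{\star}.
% For a condition number L/\mu = 5:
r \;>\; \tfrac{1}{4}\,(5-1)^{2}\,r^{\star} \;=\; 4\,r^{\star},
% so a rank-(4 r^{\star} + 1) factorization already enjoys the global guarantee
% with O(n r^{\star}) variables, far below the O(n^2) implied by the older
% r >= n threshold. Without overparameterization (r = r^{\star}), the guarantee
% instead requires L/\mu < 3.
```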

Updated: 2024-07-08 10:58:33

标题: 通过秩超参数化改进非凸Burer-Monteiro分解的全局保证

摘要: 我们考虑在一个$n\times n$正半定矩阵$M\succeq0$上最小化一个两次可微、$L$-光滑和$\mu$-强凸的目标函数$\phi$,假设最小化者$M^{\star}$的秩很低$r^{\star}\ll n$。按照Burer--Monteiro方法,我们改为在一个大小为$n\times r$的因子矩阵$X$上最小化非凸目标函数$f(X)=\phi(XX^{T})$。这大大减少了变量的数量,从$O(n^{2})$减少到最少的$O(n)$,同时免费强制实现正半定性,但代价是放弃原问题的凸性。在本文中,我们证明了如果搜索秩$r\ge r^{\star}$相对于真实秩$r^{\star}$被\emph{常数因子}过度参数化,即$r>\frac{1}{4}(L/\mu-1)^{2}r^{\star}$,那么尽管非凸性,局部优化保证从任意初始点全局收敛到全局最优解。这明显改进了先前的秩过度参数化阈值$r\ge n$,我们展示在没有光滑和强凸性的情况下是尖锐的,但会把变量数量增加到$O(n^{2})$。相反,没有秩过度参数化,我们证明只有当$\phi$几乎完全条件良好,条件数为$L/\mu<3$时,这样的全局保证才是可能的。因此,我们得出结论,一点点过度参数化可以为非凸Burer--Monteiro因子分解提供大幅度的理论保证改进。

更新时间: 2024-07-08 10:58:33

领域: math.OC,cs.LG,stat.ML

下载: http://arxiv.org/abs/2207.01789v2

Cyber Physical Games

We describe a formulation of multiple agents operating within a Cyber-Physical System, resulting in collaborative or adversarial games. We show that the non-determinism inherent in the communication medium between agents and the underlying physical environment gives rise to environment evolution that is a probabilistic function of agents' strategies. We name these emergent games Cyber Physical Games and study their properties. We present an algorithmic model that determines the most likely system evolution, approximating Cyber Physical Games through Probabilistic Finite State Automata, and evaluate it on collaborative and adversarial versions of the Iterated Boolean Game, comparing theoretical results with simulated ones. Results support the validity of the proposed model, and suggest several required research directions to continue evolving our understanding of Cyber-Physical Systems, as well as how to best design agents that must operate within such environments.

Updated: 2024-07-08 10:54:14

标题: 网络物理游戏

摘要: 我们描述了在一个网络物理系统内运行的多智能体的公式,导致合作或对抗性游戏。我们展示了在智能体之间通信媒介和基础物理环境中固有的非确定性导致了一个环境演变,这是智能体策略的概率函数。我们将这些新出现的属性命名为网络物理游戏,并研究其属性。我们提出了一个算法模型,确定最有可能的系统演变,通过概率有限状态自动机近似网络物理游戏,并在迭代布尔游戏的合作和对抗版本上进行评估,将理论结果与模拟结果进行比较。结果支持所提出模型的有效性,并提出了几个必要的研究方向,以继续发展我们对网络物理系统的理解,以及如何最好地设计必须在这样的环境中运作的智能体。

更新时间: 2024-07-08 10:54:14

领域: cs.DC,cs.AI

下载: http://arxiv.org/abs/2407.05817v1

Graph Reasoning Networks

Graph neural networks (GNNs) are the predominant approach for graph-based machine learning. While neural networks have shown great performance at learning useful representations, they are often criticized for their limited high-level reasoning abilities. In this work, we present Graph Reasoning Networks (GRNs), a novel approach to combine the strengths of fixed and learned graph representations and a reasoning module based on a differentiable satisfiability solver. While results on real-world datasets show comparable performance to GNNs, experiments on synthetic datasets demonstrate the potential of the newly proposed method.

Updated: 2024-07-08 10:53:49

标题: 图推理网络

摘要: 图神经网络(GNNs)是基于图的机器学习的主要方法。虽然神经网络在学习有用的表示方面表现出色,但它们经常因其有限的高级推理能力而受到批评。在这项工作中,我们提出了图推理网络(GRNs),这是一种结合固定和学习图表示的优势以及基于可微分可满足性求解器的推理模块的新方法。虽然对真实数据集的结果显示与GNN相当的性能,对合成数据集的实验显示了这种新方法的潜力。

更新时间: 2024-07-08 10:53:49

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2407.05816v1

Cross-domain Few-shot In-context Learning for Enhancing Traffic Sign Recognition

Recent multimodal large language models (MLLM) such as GPT-4o and GPT-4v have shown great potential in autonomous driving. In this paper, we propose a cross-domain few-shot in-context learning method based on the MLLM for enhancing traffic sign recognition (TSR). We first construct a traffic sign detection network based on Vision Transformer Adapter and an extraction module to extract traffic signs from the original road images. To reduce the dependence on training data and improve the performance stability of cross-country TSR, we introduce a cross-domain few-shot in-context learning method based on the MLLM. To enhance MLLM's fine-grained recognition ability of traffic signs, the proposed method generates corresponding description texts using template traffic signs. These description texts contain key information about the shape, color, and composition of traffic signs, which can stimulate the ability of MLLM to perceive fine-grained traffic sign categories. By using the description texts, our method reduces the cross-domain differences between template and real traffic signs. Our approach requires only simple and uniform textual indications, without the need for large-scale traffic sign images and labels. We perform comprehensive evaluations on the German traffic sign recognition benchmark dataset, the Belgium traffic sign dataset, and two real-world datasets taken from Japan. The experimental results show that our method significantly enhances the TSR performance.

Updated: 2024-07-08 10:51:03

标题: 跨领域少样本情境学习以增强交通标志识别

摘要: 最近的多模态大型语言模型(MLLM)如GPT-4o和GPT-4v在自动驾驶方面展现出了巨大潜力。在本文中,我们提出了一种基于MLLM的跨领域少样本情境学习方法,用于增强交通标志识别(TSR)能力。我们首先基于Vision Transformer Adapter和一个提取模块构建了一个交通标志检测网络,用于从原始道路图像中提取交通标志。为了减少对训练数据的依赖并提高跨国TSR的性能稳定性,我们引入了一种基于MLLM的跨领域少样本情境学习方法。为了增强MLLM对交通标志的细粒度识别能力,所提出的方法使用模板交通标志生成相应的描述文本。这些描述文本包含有关交通标志形状、颜色和构成的关键信息,可以激发MLLM感知细粒度交通标志类别的能力。通过使用描述文本,我们的方法减少了模板和真实交通标志之间的跨领域差异。我们的方法仅需要简单统一的文本指示,无需大规模交通标志图像和标签。我们对德国交通标志识别基准数据集、比利时交通标志数据集和两个来自日本的真实世界数据集进行了全面评估。实验结果表明,我们的方法显著提升了TSR性能。

更新时间: 2024-07-08 10:51:03

领域: cs.CV,cs.AI,cs.MM

下载: http://arxiv.org/abs/2407.05814v1

Towards Multimodal Prediction of Spontaneous Humour: A Novel Dataset and First Results

Humor is a substantial element of human social behavior, affect, and cognition. Its automatic understanding can facilitate a more naturalistic human-AI interaction. Current methods of humor detection have been exclusively based on staged data, making them inadequate for "real-world" applications. We contribute to addressing this deficiency by introducing the novel Passau-Spontaneous Football Coach Humor (Passau-SFCH) dataset, comprising about 11 hours of recordings. The Passau-SFCH dataset is annotated for the presence of humor and its dimensions (sentiment and direction) as proposed in Martin's Humor Style Questionnaire. We conduct a series of experiments employing pretrained Transformers, convolutional neural networks, and expert-designed features. The performance of each modality (text, audio, video) for spontaneous humor recognition is analyzed and their complementarity is investigated. Our findings suggest that for the automatic analysis of humor and its sentiment, facial expressions are most promising, while humor direction can be best modeled via text-based features. Further, we experiment with different multimodal approaches to humor recognition, including decision-level fusion and MulT, a multimodal Transformer approach. In this context, we propose a novel multimodal architecture that yields the best overall results. Finally, we make our code publicly available at https://www.github.com/lc0197/passau-sfch. The Passau-SFCH dataset is available upon request.

Updated: 2024-07-08 10:50:56

标题: 朝着自发幽默的多模态预测:一个新颖的数据集和初步结果

摘要: 幽默是人类社会行为、情感和认知的重要元素。其自动理解可以促进更自然的人工智能-人类互动。目前的幽默检测方法完全基于预先安排的数据,使它们不适用于“现实世界”应用。我们通过引入新颖的Passau-Spontaneous Football Coach Humor (Passau-SFCH)数据集来解决这一不足,该数据集包括约11小时的录音。Passau-SFCH数据集标注了幽默及其维度(情感和方向),如Martin的幽默风格问卷中所提出的。我们进行了一系列实验,使用了预训练的Transformer、卷积神经网络和专家设计的特征。分析了文本、音频、视频等各种模态在自发幽默识别中的表现,并对它们的互补性进行了调查。我们的研究结果表明,对于幽默及其情感的自动分析,面部表情是最有前途的,而幽默方向最好通过基于文本的特征建模。此外,我们尝试了不同的多模态方法来识别幽默,包括决策级融合和MulT,一种多模态Transformer方法。在这个背景下,我们提出了一种新颖的多模态架构,取得了最佳的整体结果。最后,我们在https://www.github.com/lc0197/passau-sfch 上公开了我们的代码。Passau-SFCH数据集可根据请求提供。

更新时间: 2024-07-08 10:50:56

领域: cs.LG,cs.CL,cs.CV,cs.MM,cs.SD,eess.AS

下载: http://arxiv.org/abs/2209.14272v3

Integrating AI in College Education: Positive yet Mixed Experiences with ChatGPT

The integration of artificial intelligence (AI) chatbots into higher education marks a shift towards a new generation of pedagogical tools, mirroring the arrival of milestones like the internet. With the launch of ChatGPT-4 Turbo in November 2023, we developed a ChatGPT-based teaching application (https://chat.openai.com/g/g-1imx1py4K-chatge-medical-imaging) and integrated it into our undergraduate medical imaging course in the Spring 2024 semester. This study investigates the use of ChatGPT throughout a semester-long trial, providing insights into students' engagement, perception, and the overall educational effectiveness of the technology. We systematically collected and analyzed data concerning students' interaction with ChatGPT, focusing on their attitudes, concerns, and usage patterns. The findings indicate that ChatGPT offers significant advantages such as improved information access and increased interactivity, but its adoption is accompanied by concerns about the accuracy of the information provided and the necessity for well-defined guidelines to optimize its use.

Updated: 2024-07-08 10:44:34

标题: 将AI整合到大学教育中:与ChatGPT的积极但复杂的经验

摘要: 人工智能(AI)聊天机器人(Chatbots)的整合进高等教育标志着向新一代教学工具的转变,与互联网的到来类似。随着ChatGPT-4 Turbo在2023年11月的推出,我们开发了基于ChatGPT的教学应用程序,并将其整合到2024年春季学期的本科医学影像课程中。本研究调查了ChatGPT在整个学期试用期间的使用情况,提供了关于学生参与、感知以及技术整体教育效果的见解。我们系统地收集并分析了有关学生与ChatGPT互动的数据,重点关注他们的态度、关注点和使用模式。研究结果表明,ChatGPT提供了诸如改善信息获取和增加互动性等显著优势,但其采用伴随着对所提供信息准确性的担忧以及优化其使用的明确定义指南的必要性。

更新时间: 2024-07-08 10:44:34

领域: cs.AI,cs.HC

下载: http://arxiv.org/abs/2407.05810v1

FedMRL: Data Heterogeneity Aware Federated Multi-agent Deep Reinforcement Learning for Medical Imaging

Despite recent advancements in federated learning (FL) for medical image diagnosis, addressing data heterogeneity among clients remains a significant challenge for practical implementation. A primary hurdle in FL arises from the non-IID nature of data samples across clients, which typically results in a decline in the performance of the aggregated global model. In this study, we introduce FedMRL, a novel federated multi-agent deep reinforcement learning framework designed to address data heterogeneity. FedMRL incorporates a novel loss function to facilitate fairness among clients, preventing bias in the final global model. Additionally, it employs a multi-agent reinforcement learning (MARL) approach to calculate the proximal term $(\mu)$ for the personalized local objective function, ensuring convergence to the global optimum. Furthermore, FedMRL integrates an adaptive weight adjustment method using a Self-organizing map (SOM) on the server side to counteract distribution shifts among clients' local data distributions. We assess our approach using two publicly available real-world medical datasets, and the results demonstrate that FedMRL significantly outperforms state-of-the-art techniques, showing its efficacy in addressing data heterogeneity in federated learning. The code can be found here~{\url{https://github.com/Pranabiitp/FedMRL}}.
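
The proximal-term mechanics can be sketched in a few lines (a FedProx-style illustration under our own assumptions; in FedMRL the per-client mu would be selected by the MARL agents rather than hard-coded as below):

```python
import numpy as np

def local_update(w_global, grad_fn, mu_k, lr=0.1, steps=50):
    """Client k minimizes: local loss + (mu_k / 2) * ||w - w_global||^2."""
    w = w_global.copy()
    for _ in range(steps):
        w -= lr * (grad_fn(w) + mu_k * (w - w_global))  # proximal pull toward global
    return w

grad_fn = lambda w: 2 * (w - np.array([3.0, -1.0]))   # toy client objective
w_global = np.zeros(2)
for mu_k in (0.0, 1.0, 10.0):
    print(mu_k, local_update(w_global, grad_fn, mu_k))
# Larger mu_k keeps the personalized model closer to the global one, which is
# the knob a per-client policy can tune under data heterogeneity.
```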

Updated: 2024-07-08 10:10:07

标题: FedMRL:面向数据异质性的医学影像联邦多智能体深度强化学习

摘要: 尽管最近在医学图像诊断的联邦学习(FL)方面取得了进展,但解决客户端之间的数据异质性仍然是实际实施的重要挑战。FL中的一个主要障碍来自于客户端之间数据样本的非IID性质,这通常导致聚合全局模型性能下降。在本研究中,我们介绍了FedMRL,这是一个旨在解决数据异质性的新型联邦多智能体深度强化学习框架。FedMRL包含一个新颖的损失函数,以促进客户端之间的公平性,防止最终全局模型中的偏见。此外,它采用多智能体强化学习(MARL)方法来计算个性化本地目标函数的近似项$(\mu)$,确保收敛到全局最优解。此外,FedMRL在服务器端集成了一种自组织图(SOM)上的自适应权重调整方法,以抵消客户端本地数据分布之间的分布变化。我们使用两个公开可用的真实医学数据集评估我们的方法,结果表明FedMRL明显优于最新的技术,显示了其在联邦学习中解决数据异质性的有效性。代码可以在这里找到~{\url{https://github.com/Pranabiitp/FedMRL}}。

更新时间: 2024-07-08 10:10:07

领域: cs.LG,cs.AI,cs.CV,cs.DC

下载: http://arxiv.org/abs/2407.05800v1

A Primal-Dual Online Learning Approach for Dynamic Pricing of Sequentially Displayed Complementary Items under Sale Constraints

We address the challenging problem of dynamically pricing complementary items that are sequentially displayed to customers. An illustrative example is the online sale of flight tickets, where customers navigate through multiple web pages. Initially, they view the ticket cost, followed by ancillary expenses such as insurance and additional luggage fees. Coherent pricing policies for complementary items are essential because optimizing the pricing of each item individually is ineffective. Our scenario also involves a sales constraint, which specifies a minimum number of items to sell, and uncertainty regarding customer demand curves. To tackle this problem, we originally formulate it as a Markov Decision Process with constraints. Leveraging online learning tools, we design a primal-dual online optimization algorithm. We empirically evaluate our approach using synthetic settings randomly generated from real-world data, covering various configurations from stationary to non-stationary, and compare its performance in terms of constraints violation and regret against well-known baselines optimizing each state singularly.
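
A stylized primal-dual step for the sales constraint (our toy demand model and constants; the learner's estimates are replaced by oracle scores for brevity): the Lagrange multiplier rises whenever cumulative sales fall behind the required pace, tilting the policy toward cheaper, higher-volume prices.

```python
import numpy as np

rng = np.random.default_rng(0)
prices = np.array([5.0, 8.0, 12.0])
buy_prob = np.array([0.8, 0.5, 0.2])        # unknown to the learner in the real problem

lam, eta, min_sale_rate = 0.0, 0.05, 0.6    # dual variable, dual step size, items/round
sales = 0
for t in range(1, 2001):
    # Lagrangian value of each price: expected revenue + lam * expected sale
    scores = prices * buy_prob + lam * buy_prob   # oracle scores; a learner would estimate these
    p = int(np.argmax(scores))
    sale = rng.random() < buy_prob[p]
    sales += sale
    lam = max(0.0, lam + eta * (min_sale_rate - sale))  # dual ascent on the violation
print(f"sale rate {sales / 2000:.2f} vs required {min_sale_rate}")
```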

Updated: 2024-07-08 09:55:31

标题: 销售约束下顺序展示互补商品动态定价的原始-对偶在线学习方法

摘要: 我们解决了一个具有挑战性的问题,即动态定价顺序展示给客户的互补产品。一个示例是在线销售机票,客户通过多个网页浏览。起初,他们查看机票价格,然后是附加费用,如保险和额外行李费用。对互补产品的一致定价政策是必不可少的,因为单独优化每个项目的定价是无效的。我们的情景还涉及销售约束,指定了必须销售的最低数量,以及客户需求曲线的不确定性。为了解决这个问题,我们最初将其构建为一个带有约束的马尔可夫决策过程。利用在线学习工具,我们设计了一个原始-对偶在线优化算法。我们通过从真实数据随机生成的合成设置对我们的方法进行了实证评估,涵盖了从稳态到非稳态的各种配置,并在违反约束和后悔方面与优化每个状态的已知基线进行了比较。

更新时间: 2024-07-08 09:55:31

领域: cs.LG,math.OC

下载: http://arxiv.org/abs/2407.05793v1

FUTURE-AI: International consensus guideline for trustworthy and deployable artificial intelligence in healthcare

Despite major advances in artificial intelligence (AI) for medicine and healthcare, the deployment and adoption of AI technologies remain limited in real-world clinical practice. In recent years, concerns have been raised about the technical, clinical, ethical and legal risks associated with medical AI. To increase real world adoption, it is essential that medical AI tools are trusted and accepted by patients, clinicians, health organisations and authorities. This work describes the FUTURE-AI guideline as the first international consensus framework for guiding the development and deployment of trustworthy AI tools in healthcare. The FUTURE-AI consortium was founded in 2021 and currently comprises 118 inter-disciplinary experts from 51 countries representing all continents, including AI scientists, clinicians, ethicists, and social scientists. Over a two-year period, the consortium defined guiding principles and best practices for trustworthy AI through an iterative process comprising an in-depth literature review, a modified Delphi survey, and online consensus meetings. The FUTURE-AI framework was established based on 6 guiding principles for trustworthy AI in healthcare, i.e. Fairness, Universality, Traceability, Usability, Robustness and Explainability. Through consensus, a set of 28 best practices were defined, addressing technical, clinical, legal and socio-ethical dimensions. The recommendations cover the entire lifecycle of medical AI, from design, development and validation to regulation, deployment, and monitoring. FUTURE-AI is a risk-informed, assumption-free guideline which provides a structured approach for constructing medical AI tools that will be trusted, deployed and adopted in real-world practice. Researchers are encouraged to take the recommendations into account in proof-of-concept stages to facilitate future translation towards clinical practice of medical AI.

Updated: 2024-07-08 09:54:09

标题: 未来人工智能:医疗保健中值得信赖和可部署人工智能的国际共识指南

摘要: 尽管在医学和医疗领域的人工智能(AI)取得了重大进展,但AI技术的部署和采用在真实世界的临床实践中仍然有限。近年来,人们对医疗AI所涉及的技术、临床、伦理和法律风险提出了关注。为了增加真实世界的采用,医疗AI工具必须得到患者、临床医生、卫生组织和管理机构的信任和接受。本研究描述了FUTURE-AI指南作为第一份国际共识框架,用于指导医疗保健中值得信赖的AI工具的开发和部署。FUTURE-AI联盟成立于2021年,目前由来自51个国家的118位跨学科专家组成,代表了所有大洲,包括AI科学家、临床医生、伦理学家和社会科学家。在两年的时间里,该联盟通过深入文献回顾、修改后的Delphi调查和在线共识会议的迭代过程,定义了值得信赖的AI的指导原则和最佳实践。FUTURE-AI框架建立在医疗保健中值得信赖的AI的六个指导原则上,即公平性、普适性、可追溯性、可用性、稳健性和可解释性。通过共识,制定了一套包括技术、临床、法律和社会伦理维度的28项最佳实践,涵盖了医疗AI的整个生命周期,从设计、开发和验证到监管、部署和监测。FUTURE-AI是一个风险感知、无假设的指导方针,为构建将在真实世界实践中受到信任、部署和采用的医疗AI工具提供了结构化方法。研究人员被鼓励在概念验证阶段考虑这些建议,以促进未来将医疗AI转化为临床实践。

更新时间: 2024-07-08 09:54:09

领域: cs.CY,cs.AI,cs.CV,cs.LG,I.2.0; I.4.0; I.5.0

下载: http://arxiv.org/abs/2309.12325v3

CANDID DAC: Leveraging Coupled Action Dimensions with Importance Differences in DAC

High-dimensional action spaces remain a challenge for dynamic algorithm configuration (DAC). Interdependencies and varying importance between action dimensions are further known key characteristics of DAC problems. We argue that these Coupled Action Dimensions with Importance Differences (CANDID) represent aspects of the DAC problem that are not yet fully explored. To address this gap, we introduce a new white-box benchmark within the DACBench suite that simulates the properties of CANDID. Further, we propose sequential policies as an effective strategy for managing these properties. Such policies factorize the action space and mitigate exponential growth by learning a policy per action dimension. At the same time, these policies accommodate the interdependence of action dimensions by fostering implicit coordination. We show this in an experimental study of value-based policies on our new benchmark. This study demonstrates that sequential policies significantly outperform independent learning of factorized policies in CANDID action spaces. In addition, they overcome the scalability limitations associated with learning a single policy across all action dimensions. The code used for our experiments is available under https://github.com/PhilippBordne/candidDAC.
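
The factorization idea behind sequential policies can be sketched directly (our toy linear policies below; the benchmark's actual policies are learned value-based ones): one small policy per action dimension, each conditioned on the state plus the dimensions already chosen, so the joint action space is never enumerated.

```python
import numpy as np

rng = np.random.default_rng(0)
n_dims, n_choices, state_dim = 3, 4, 5
# One linear "Q-table" per dimension; its input is the state plus one-hot
# encodings of the picks made by earlier dimensions.
weights = [rng.normal(size=(state_dim + d * n_choices, n_choices)) for d in range(n_dims)]

def act(state):
    chosen, feats = [], state
    for d in range(n_dims):
        q = feats @ weights[d]
        a = int(np.argmax(q))                      # greedy pick for dimension d
        chosen.append(a)
        feats = np.concatenate([feats, np.eye(n_choices)[a]])  # later dims see earlier picks
    return chosen

print(act(rng.normal(size=state_dim)))  # one choice per dimension, e.g. [2, 0, 3]
```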

Updated: 2024-07-08 09:51:02

标题: CANDID DAC:利用DAC中的耦合动作维度和重要性差异

摘要: 高维动作空间仍然是动态算法配置(DAC)的一个挑战。动作维度之间的相互依赖性和重要性的变化是DAC问题的另一个关键特征。我们认为这些带有重要性差异的耦合动作维度(CANDID)代表了尚未完全探索的DAC问题的一些方面。为了解决这一差距,我们在DACBench套件中引入了一个模拟CANDID属性的新的白盒基准。此外,我们提出了连续策略作为管理这些属性的有效策略。这些策略将动作空间分解,并通过学习每个动作维度的策略来减少指数增长。同时,这些策略通过促进隐式协调来适应动作维度之间的相互依赖性。我们在我们的新基准测试中对基于价值的策略进行了实验研究。这项研究表明,连续策略在CANDID动作空间中明显优于分解策略的独立学习。此外,它们克服了学习单个策略跨越所有动作维度的可伸缩性限制。我们在实验中使用的代码可在https://github.com/PhilippBordne/candidDAC上找到。

更新时间: 2024-07-08 09:51:02

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2407.05789v1

Automated Computational Energy Minimization of ML Algorithms using Constrained Bayesian Optimization

Bayesian optimization (BO) is an efficient framework for optimization of black-box objectives when function evaluations are costly and gradient information is not easily accessible. BO has been successfully applied to automate the task of hyperparameter optimization (HPO) in machine learning (ML) models with the primary objective of optimizing predictive performance on held-out data. In recent years, however, with ever-growing model sizes, the energy cost associated with model training has become an important factor for ML applications. Here we evaluate Constrained Bayesian Optimization (CBO) with the primary objective of minimizing energy consumption and subject to the constraint that the generalization performance is above some threshold. We evaluate our approach on regression and classification tasks and demonstrate that CBO achieves lower energy consumption without compromising the predictive performance of ML models.
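
A compact sketch of the constrained acquisition this describes (our synthetic objective and constraint): one GP models energy, another models accuracy, and candidates are ranked by expected improvement weighted by the probability of clearing the accuracy threshold.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)
energy = lambda x: (x - 0.3) ** 2          # black-box objective to minimize
accuracy = lambda x: 1 - (x - 0.7) ** 2    # black-box constraint: accuracy >= 0.8

X = rng.uniform(0, 1, 5).reshape(-1, 1)    # initial hyperparameter configurations
for _ in range(15):
    e, a = energy(X).ravel(), accuracy(X).ravel()
    gp_e = GaussianProcessRegressor(normalize_y=True).fit(X, e)
    gp_a = GaussianProcessRegressor(normalize_y=True).fit(X, a)
    cand = np.linspace(0, 1, 200).reshape(-1, 1)
    mu_e, sd_e = gp_e.predict(cand, return_std=True)
    mu_a, sd_a = gp_a.predict(cand, return_std=True)
    feasible = a >= 0.8
    best = e[feasible].min() if feasible.any() else e.min()
    z = (best - mu_e) / np.maximum(sd_e, 1e-9)
    ei = (best - mu_e) * norm.cdf(z) + sd_e * norm.pdf(z)          # expected improvement
    p_feas = 1 - norm.cdf((0.8 - mu_a) / np.maximum(sd_a, 1e-9))   # P(accuracy >= 0.8)
    X = np.vstack([X, cand[np.argmax(ei * p_feas)]])               # constrained acquisition
print(cand[np.argmax(ei * p_feas)])  # low predicted energy, likely-feasible accuracy
```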

Updated: 2024-07-08 09:49:38

标题: 基于约束贝叶斯优化的机器学习算法自动计算能量最小化

摘要: 贝叶斯优化(BO)是一种高效的优化框架,适用于函数评估代价高昂且梯度信息难以获取的黑盒目标优化。BO已成功应用于机器学习(ML)模型中超参数优化(HPO)任务的自动化,其主要目标是优化在留出数据上的预测性能。然而,近年来,随着模型规模不断增长,与模型训练相关的能耗已成为机器学习应用中的一个重要因素。在这里,我们评估了约束贝叶斯优化(CBO),其主要目标是最小化能耗,并以泛化性能高于某一阈值作为约束条件。我们在回归和分类任务上评估了我们的方法,并证明CBO在不损害ML模型预测性能的情况下实现了更低的能耗。

更新时间: 2024-07-08 09:49:38

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2407.05788v1

Large Language Models for Judicial Entity Extraction: A Comparative Study

Domain-specific Entity Recognition holds significant importance in legal contexts, serving as a fundamental task that supports various applications such as question-answering systems, text summarization, machine translation, sentiment analysis, and information retrieval specifically within case law documents. Recent advancements have highlighted the efficacy of Large Language Models in natural language processing tasks, demonstrating their capability to accurately detect and classify domain-specific facts (entities) from specialized texts like clinical and financial documents. This research investigates the application of Large Language Models in identifying domain-specific entities (e.g., courts, petitioner, judge, lawyer, respondents, FIR nos.) within case law documents, with a specific focus on their aptitude for handling domain-specific language complexity and contextual variations. The study evaluates the performance of state-of-the-art Large Language Model architectures, including Large Language Model Meta AI 3, Mistral, and Gemma, in the context of extracting judicial facts tailored to Indian judicial texts. Mistral and Gemma emerged as the top-performing models, showcasing balanced precision and recall crucial for accurate entity identification. These findings confirm the value of Large Language Models in judicial documents and demonstrate how they can facilitate and quicken scientific research by producing precise, organised data outputs that are appropriate for in-depth examination.

Updated: 2024-07-08 09:49:03

标题: 大型语言模型用于司法实体提取:一项比较研究

摘要: 领域特定实体识别在法律背景下具有重要意义,是支持各种应用程序的基本任务,如问答系统、文本摘要、机器翻译、情感分析和信息检索,特别是在案例法文件中。最近的进展突显了大型语言模型在自然语言处理任务中的有效性,展示了它们能够准确检测和分类来自专业文本(如临床和财务文件)的领域特定事实(实体)。本研究调查了大型语言模型在识别案例法文件中的领域特定实体(例如法院、申诉人、法官、律师、被告、FIR编号)方面的应用,重点关注它们处理领域特定语言复杂性和上下文变化的能力。该研究评估了最先进的大型语言模型架构,包括大型语言模型Meta AI 3、Mistral和Gemma,在提取适合印度司法文本的司法事实方面的表现。Mistral和Gemma被认为是表现最佳的模型,展示了平衡的精确度和召回率,对于准确的实体识别至关重要。这些发现证实了大型语言模型在司法文件中的价值,并展示了它们如何通过产生精确、有条理的数据输出来促进和加快科学研究,这些数据输出适合深入研究。

更新时间: 2024-07-08 09:49:03

领域: cs.CL,cs.AI,I.2.1

下载: http://arxiv.org/abs/2407.05786v1

Self-Labeling the Job Shop Scheduling Problem

In this work, we propose a Self-Supervised training strategy specifically designed for combinatorial problems. One of the main obstacles in applying supervised paradigms to such problems is the requirement of expensive target solutions as ground-truth, often produced with costly exact solvers. Inspired by Semi- and Self-Supervised learning, we show that it is possible to easily train generative models by sampling multiple solutions and using the best one according to the problem objective as a pseudo-label. In this way, we iteratively improve the model generation capability by relying only on its self-supervision, completely removing the need for optimality information. We prove the effectiveness of this Self-Labeling strategy on the Job Shop Scheduling (JSP), a complex combinatorial problem that is receiving much attention from the Reinforcement Learning community. We propose a generative model based on the well-known Pointer Network and train it with our strategy. Experiments on popular benchmarks demonstrate the potential of this approach as the resulting models outperform constructive heuristics and current state-of-the-art learning proposals for the JSP.
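
A minimal sketch of one self-labeling step under assumed interfaces (sample_fn, objective_fn, and nll_fn are hypothetical stand-ins for the policy's sampler, the makespan objective, and the pseudo-label likelihood), followed by a toy demonstration:

```python
import torch

def self_labeling_step(sample_fn, objective_fn, nll_fn, optimizer, n=16):
    """One step: sample n solutions, pseudo-label = best by the objective."""
    with torch.no_grad():
        candidates = [sample_fn() for _ in range(n)]
    best = min(candidates, key=objective_fn)
    optimizer.zero_grad()
    loss = nll_fn(best)        # push the policy toward its own best sample
    loss.backward()
    optimizer.step()
    return objective_fn(best)

# Toy demo: a factorized distribution over 2-step "schedules" with 2 choices
# each; the objective prefers the schedule [1, 0].
logits = torch.zeros(2, 2, requires_grad=True)          # (step, action)
opt = torch.optim.Adam([logits], lr=0.1)
target = torch.tensor([1, 0])

sample = lambda: torch.distributions.Categorical(logits=logits).sample()
objective = lambda s: float((s != target).sum())        # lower is better
nll = lambda s: -torch.distributions.Categorical(logits=logits).log_prob(s).sum()

for _ in range(100):
    self_labeling_step(sample, objective, nll, opt, n=8)
print(logits.argmax(dim=-1))                            # tends to tensor([1, 0])
```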

Updated: 2024-07-08 09:47:59

标题: 自我标记的作业车间调度问题

摘要: 在这项工作中,我们提出了一种专门针对组合问题设计的自监督训练策略。将监督范式应用于这类问题的主要障碍之一,是需要昂贵的目标解作为真实标签(ground truth),而这些解通常由代价高昂的精确求解器产生。受半监督和自监督学习的启发,我们表明,可以通过采样多个解并根据问题目标选取最优解作为伪标签,来轻松训练生成模型。通过这种方式,我们仅依靠模型的自我监督来迭代改进其生成能力,完全消除了对最优性信息的需求。我们在作业车间调度(JSP)问题上证明了这种自标记策略的有效性,这是一个受到强化学习社区高度关注的复杂组合问题。我们提出了一个基于著名的指针网络(Pointer Network)的生成模型,并用我们的策略对其进行训练。在流行基准上的实验显示了这种方法的潜力:所得模型优于构造式启发式方法和当前针对JSP的最新学习方法。

更新时间: 2024-07-08 09:47:59

领域: cs.LG,cs.AI,math.CO,I.2; G.2

下载: http://arxiv.org/abs/2401.11849v2

A Survey of Fragile Model Watermarking

Model fragile watermarking, inspired by both the field of adversarial attacks on neural networks and traditional multimedia fragile watermarking, has gradually emerged as a potent tool for detecting tampering, and has witnessed rapid development in recent years. Unlike robust watermarks, which are widely used for identifying model copyrights, fragile watermarks for models are designed to identify whether models have been subjected to unexpected alterations such as backdoors, poisoning, compression, among others. These alterations can pose unknown risks to model users, such as misidentifying stop signs as speed limit signs in classic autonomous driving scenarios. This paper provides an overview of the relevant work in the field of model fragile watermarking since its inception, categorizing them and revealing the developmental trajectory of the field, thus offering a comprehensive survey for future endeavors in model fragile watermarking.

Updated: 2024-07-08 09:47:01

标题: 脆弱模型水印技术综述

摘要: 模型脆弱水印技术受到神经网络对抗攻击领域和传统多媒体脆弱水印技术的启发,逐渐成为检测篡改的有效工具,并在近年来迅速发展。与广泛用于识别模型版权的鲁棒水印不同,模型脆弱水印旨在识别模型是否经历了意外的修改,如后门、投毒、压缩等。这些修改可能给模型用户带来未知风险,例如在经典的自动驾驶场景中将停止标志误识别为限速标志。本文概述了模型脆弱水印技术自出现以来该领域的相关工作,对其进行分类并揭示了该领域的发展轨迹,为未来在模型脆弱水印技术方面的工作提供了全面的综述。

更新时间: 2024-07-08 09:47:01

领域: cs.CR,cs.AI

下载: http://arxiv.org/abs/2406.04809v4

Sequential Contrastive Audio-Visual Learning

Contrastive learning has emerged as a powerful technique in audio-visual representation learning, leveraging the natural co-occurrence of audio and visual modalities in extensive web-scale video datasets to achieve significant advancements. However, conventional contrastive audio-visual learning methodologies often rely on aggregated representations derived through temporal aggregation, which neglects the intrinsic sequential nature of the data. This oversight raises concerns regarding the ability of standard approaches to capture and utilize fine-grained information within sequences, information that is vital for distinguishing between semantically similar yet distinct examples. In response to this limitation, we propose sequential contrastive audio-visual learning (SCAV), which contrasts examples based on their non-aggregated representation space using sequential distances. Retrieval experiments with the VGGSound and Music datasets demonstrate the effectiveness of SCAV, showing 2-3x relative improvements against traditional aggregation-based contrastive learning and other methods from the literature. We also show that models trained with SCAV exhibit a high degree of flexibility regarding the metric employed for retrieval, allowing them to operate on a spectrum of efficiency-accuracy trade-offs, potentially making them applicable in multiple scenarios, from small- to large-scale retrieval.
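
One plausible reading of contrasting on sequential distances (a sketch under our own assumptions, not necessarily the paper's exact formulation) is to score clip pairs by the mean framewise squared distance over time, instead of a distance between pooled vectors, and feed the negated distances into a symmetric InfoNCE loss:

```python
import torch
import torch.nn.functional as F

def sequential_contrastive_loss(audio, video, tau=0.07):
    """audio, video: (B, T, D) aligned sequence embeddings."""
    # Pairwise sequential distance: mean framewise squared L2 over time,
    # computed on the non-aggregated (per-frame) representations.
    diff = audio[:, None] - video[None, :]          # (B, B, T, D)
    dist = diff.pow(2).mean(dim=(-1, -2))           # (B, B)
    logits = -dist / tau                            # low distance = high score
    targets = torch.arange(audio.shape[0])          # matched pairs on diagonal
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

a, v = torch.randn(8, 20, 64), torch.randn(8, 20, 64)
print(sequential_contrastive_loss(a, v))
```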

Updated: 2024-07-08 09:45:20

标题: 顺序对比音视频学习

摘要: 对比学习已经成为音视频表示学习中的一种强大技术,它利用大规模网络视频数据集中音频和视觉模态的自然共现来取得显著进展。然而,传统的对比音视频学习方法通常依赖通过时间聚合得到的聚合表示,这忽略了数据固有的序列性质。这一疏忽引发了对标准方法能否捕获和利用序列内细粒度信息的担忧,而这些信息对于区分语义上相似但实际不同的示例至关重要。为了应对这一局限,我们提出了顺序对比音视频学习(SCAV),该方法基于示例的非聚合表示空间、利用序列距离来进行对比。利用VGGSound和Music数据集进行的检索实验证明了SCAV的有效性,相对于传统的基于聚合的对比学习及文献中的其他方法,取得了2-3倍的相对改进。我们还表明,使用SCAV训练的模型在检索所用度量方面表现出高度的灵活性,可以在效率-准确性权衡的谱系上运行,使其有望适用于从小规模到大规模检索的多种场景。

更新时间: 2024-07-08 09:45:20

领域: cs.SD,cs.CV,cs.LG,cs.MM,eess.AS

下载: http://arxiv.org/abs/2407.05782v1

Regret Analysis of Multi-task Representation Learning for Linear-Quadratic Adaptive Control

Representation learning is a powerful tool that enables learning over large multitudes of agents or domains by enforcing that all agents operate on a shared set of learned features. However, many robotics or controls applications that would benefit from collaboration operate in settings with changing environments and goals, whereas most guarantees for representation learning are stated for static settings. Toward rigorously establishing the benefit of representation learning in dynamic settings, we analyze the regret of multi-task representation learning for linear-quadratic control. This setting introduces unique challenges. Firstly, we must account for and balance the $\textit{misspecification}$ introduced by an approximate representation. Secondly, we cannot rely on the parameter update schemes of single-task online LQR, for which least-squares often suffices, and must devise a novel scheme to ensure sufficient improvement. We demonstrate that for settings where exploration is "benign", the regret of any agent after $T$ timesteps scales as $\tilde O(\sqrt{T/H})$, where $H$ is the number of agents. In settings with "difficult" exploration, the regret scales as $\tilde{\mathcal O}(\sqrt{d_u d_\theta} \sqrt{T} + T^{3/4}/H^{1/5})$, where $d_x$ is the state-space dimension, $d_u$ is the input dimension, and $d_\theta$ is the task-specific parameter count. In both cases, by comparing to the minimax single-task regret $\tilde{\mathcal O}(\sqrt{d_x d_u^2}\sqrt{T})$, we see a benefit of a large number of agents. Notably, in the difficult exploration case, by sharing a representation across tasks, the effective task-specific parameter count can often be small $d_\theta < d_x d_u$. Lastly, we provide numerical validation of the trends we predict.

Updated: 2024-07-08 09:41:42

标题: 多任务表示学习在线性二次自适应控制中的后悔分析

摘要: 表征学习是一种强大的工具,它通过强制所有代理在一组共同学习的特征上操作,使得可以在大量代理或领域上进行学习。然而,许多可以从协作中受益的机器人或控制应用运行在环境和目标不断变化的设定中,而关于表征学习的大多数理论保证都是针对静态设定给出的。为了严格确立表征学习在动态设定中的益处,我们分析了线性二次控制的多任务表征学习的遗憾。这种设定引入了独特的挑战。首先,我们必须考虑并平衡由近似表征引入的模型失配(misspecification)。其次,我们不能依赖单任务在线LQR的参数更新方案(对其而言最小二乘法通常已经足够),而必须设计一种新颖的方案以确保足够的改进。我们证明,在探索是"良性"的设定中,任何代理在$T$个时间步之后的遗憾按$\tilde O(\sqrt{T/H})$的规律增长,其中$H$是代理的数量。在探索"困难"的设定中,遗憾按$\tilde{\mathcal O}(\sqrt{d_u d_\theta} \sqrt{T} + T^{3/4}/H^{1/5})$的规律增长,其中$d_x$是状态空间维度,$d_u$是输入维度,$d_\theta$是任务特定参数数量。在这两种情况下,通过与极小极大单任务遗憾$\tilde{\mathcal O}(\sqrt{d_x d_u^2}\sqrt{T})$进行比较,我们可以看到大量代理带来的益处。值得注意的是,在困难探索情况下,通过在任务之间共享表征,有效的任务特定参数数量通常可以很小,即$d_\theta < d_x d_u$。最后,我们对所预测的趋势进行了数值验证。

更新时间: 2024-07-08 09:41:42

领域: cs.LG,cs.SY,eess.SY

下载: http://arxiv.org/abs/2407.05781v1

TbExplain: A Text-based Explanation Method for Scene Classification Models with the Statistical Prediction Correction

The field of Explainable Artificial Intelligence (XAI) aims to improve the interpretability of black-box machine learning models. Building a heatmap based on the importance value of input features is a popular method for explaining the underlying functions of such models in producing their predictions. Heatmaps are almost understandable to humans, yet they are not without flaws. Non-expert users, for example, may not fully understand the logic of heatmaps (the logic in which relevant pixels to the model's prediction are highlighted with different intensities or colors). Additionally, objects and regions of the input image that are relevant to the model prediction are frequently not entirely differentiated by heatmaps. In this paper, we propose a framework called TbExplain that employs XAI techniques and a pre-trained object detector to present text-based explanations of scene classification models. Moreover, TbExplain incorporates a novel method to correct predictions and textually explain them based on the statistics of objects in the input image when the initial prediction is unreliable. To assess the trustworthiness and validity of the text-based explanations, we conducted a qualitative experiment, and the findings indicated that these explanations are sufficiently reliable. Furthermore, our quantitative and qualitative experiments on TbExplain with scene classification datasets reveal an improvement in classification accuracy over ResNet variants.

Updated: 2024-07-08 09:40:03

标题: TbExplain:一种基于文本的场景分类模型解释方法,具有统计预测校正功能

摘要: 可解释人工智能(XAI)领域旨在提高黑盒机器学习模型的可解释性。基于输入特征的重要性值构建热图,是解释此类模型如何产生预测的一种流行方法。热图对人类而言大体上是可理解的,但并非没有缺陷。例如,非专业用户可能无法完全理解热图的逻辑(即以不同强度或颜色突出显示与模型预测相关像素的逻辑)。此外,热图经常不能完全区分输入图像中与模型预测相关的对象和区域。在本文中,我们提出了一个名为TbExplain的框架,该框架利用XAI技术和预训练的目标检测器来给出场景分类模型的基于文本的解释。此外,TbExplain结合了一种新颖的方法:当初始预测不可靠时,根据输入图像中对象的统计信息来纠正预测并对其进行文本解释。为了评估基于文本的解释的可信度和有效性,我们进行了定性实验,结果表明这些解释是足够可靠的。此外,我们在场景分类数据集上对TbExplain进行的定量和定性实验显示,与ResNet变体相比,分类准确率有所提高。

更新时间: 2024-07-08 09:40:03

领域: cs.CV,cs.AI,cs.LG,cs.MM

下载: http://arxiv.org/abs/2307.10003v2

When is the consistent prediction likely to be a correct prediction?

Self-consistency (Wang et al., 2023) suggests that the most consistent answer obtained through large language models (LLMs) is more likely to be correct. In this paper, we challenge this argument and propose a nuanced correction. Our observations indicate that consistent answers derived through more computation i.e. longer reasoning texts, rather than simply the most consistent answer across all outputs, are more likely to be correct. This is predominantly because we demonstrate that LLMs can autonomously produce chain-of-thought (CoT) style reasoning with no custom prompts merely while generating longer responses, which lead to consistent predictions that are more accurate. In the zero-shot setting, by sampling Mixtral-8x7B model multiple times and considering longer responses, we achieve 86% of its self-consistency performance obtained through zero-shot CoT prompting on the GSM8K and MultiArith datasets. Finally, we demonstrate that the probability of LLMs generating a longer response is quite low, highlighting the need for decoding strategies conditioned on output length.
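
A minimal sketch of the selection rule this suggests (generate and extract_answer are hypothetical stand-ins for a sampling LLM call and an answer parser): sample many completions, keep only the longest ones, and take the majority answer among those:

```python
from collections import Counter

def length_filtered_consistency(generate, extract_answer, prompt,
                                n_samples=40, keep_frac=0.25):
    """Majority vote restricted to the longest (more CoT-like) responses."""
    responses = [generate(prompt) for _ in range(n_samples)]
    responses.sort(key=len, reverse=True)            # longer = more reasoning
    kept = responses[:max(1, int(keep_frac * n_samples))]
    answers = [extract_answer(r) for r in kept]
    return Counter(answers).most_common(1)[0][0]
```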

Updated: 2024-07-08 09:37:27

标题: 何时一致的预测可能是正确的预测?

摘要: 自洽性(王等人,2023年)表明,通过大型语言模型(LLMs)获得的最一致答案更有可能是正确的。在本文中,我们对这一观点提出了挑战,并提出了一个微妙的修正。我们的观察表明,通过更多计算即更长的推理文本得出的一致答案,而不仅仅是所有输出中最一致的答案,更有可能是正确的。这主要是因为我们证明了LLMs可以在生成更长的响应时自主产生思维链式(CoT)风格的推理,而无需定制提示,从而导致更准确的一致预测。在零样本设置中,通过多次采样Mixtral-8x7B模型并考虑更长的响应,我们在GSM8K和MultiArith数据集上实现了其零样本CoT提示的自洽性性能的86%。最后,我们证明LLMs生成更长响应的概率非常低,突显出需要基于输出长度的解码策略。

更新时间: 2024-07-08 09:37:27

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2407.05778v1

Structural Generalization in Autonomous Cyber Incident Response with Message-Passing Neural Networks and Reinforcement Learning

We believe that agents for automated incident response based on machine learning need to handle changes in network structure. Computer networks are dynamic, and can naturally change in structure over time. Retraining agents for small network changes costs time and energy. We attempt to address this issue with an existing method of relational agent learning, where the relations between objects are assumed to remain consistent across problem instances. The state of the computer network is represented as a relational graph and encoded through a message passing neural network. The message passing neural network and an agent policy using the encoding are optimized end-to-end using reinforcement learning. We evaluate the approach on the second instance of the Cyber Autonomy Gym for Experimentation (CAGE~2), a cyber incident simulator that simulates attacks on an enterprise network. We create variants of the original network with different numbers of hosts and agents are tested without additional training on them. Our results show that agents using relational information are able to find solutions despite changes to the network, and can perform optimally in some instances. Agents using the default vector state representation perform better, but need to be specially trained on each network variant, demonstrating a trade-off between specialization and generalization.

Updated: 2024-07-08 09:34:22

标题: 自主网络事件响应中的结构泛化:基于消息传递神经网络和强化学习

摘要: 我们认为,基于机器学习的自动化事件响应代理需要处理网络结构的变化。计算机网络是动态的,随着时间的推移,结构自然会发生变化。为了处理小范围的网络变化,重新训练代理需要耗费时间和精力。我们试图通过现有的关系代理学习方法来解决这个问题,其中假定对象之间的关系在问题实例之间保持一致。计算机网络的状态被表示为一个关系图,并通过消息传递神经网络进行编码。消息传递神经网络和使用编码的代理策略通过强化学习进行端到端优化。我们在Cyber Autonomy Gym for Experimentation(CAGE~2)的第二个实例上评估了这种方法,这是一个模拟对企业网络发动攻击的网络事件模拟器。我们创建了原始网络的变体,其中包含不同数量的主机,代理在没有额外训练的情况下进行测试。我们的结果显示,使用关系信息的代理能够在网络发生变化时找到解决方案,并在某些情况下表现最佳。使用默认向量状态表示的代理表现更好,但需要在每个网络变体上进行专门训练,展示了专业化和泛化之间的权衡。

更新时间: 2024-07-08 09:34:22

领域: cs.AI,cs.CR

下载: http://arxiv.org/abs/2407.05775v1

Building Call Graph of WebAssembly Programs via Abstract Semantics

WebAssembly is a binary format for code that is gaining popularity thanks to its focus on portability and performance. Currently, the most common use case for WebAssembly is execution in a browser. It is also being increasingly adopted as a stand-alone application due to its portability. The binary format of WebAssembly, however, makes it prone to being used as a vehicle for malicious software. For instance, one could embed a cryptocurrency miner in code executed by a browser. As a result, there is substantial interest in developing tools for WebAssembly security verification, information flow control, and, more generally, for verifying behavioral properties such as correct API usage. In this document, we address the issue of building call graphs for WebAssembly code. This is important because having or computing a call graph is a prerequisite for most inter-procedural verification tasks. In this paper, we propose a formal solution based on the theory of Abstract Interpretation. We compare our approach to the state-of-the-art by predicting how it would perform against a set of specifically crafted benchmark programs.

Updated: 2024-07-08 09:32:47

标题: 通过抽象语义构建WebAssembly程序的调用图

摘要: WebAssembly是一种二进制代码格式,因其便携性和性能而日益受到欢迎。目前,WebAssembly最常见的用例是在浏览器中执行。由于其便携性,它也越来越多地被采用为独立应用程序。然而,WebAssembly的二进制格式使其容易被用作恶意软件的载体。例如,可以在浏览器中执行的代码中嵌入加密货币挖矿程序。因此,人们对开发用于WebAssembly安全验证、信息流控制以及验证行为属性(如正确的API使用)的工具表现出了极大的兴趣。在本文中,我们解决了为WebAssembly代码构建调用图的问题。这一点很重要,因为拥有或计算调用图是大多数过程间验证任务的先决条件。在本文中,我们提出了一种基于抽象解释理论的正式解决方案。我们通过预测其在一组专门设计的基准程序上的表现来将我们的方法与最新技术进行比较。

更新时间: 2024-07-08 09:32:47

领域: cs.SE,cs.CR

下载: http://arxiv.org/abs/2407.14527v1

Conditional computation in neural networks: principles and research trends

This article summarizes principles and ideas from the emerging area of applying \textit{conditional computation} methods to the design of neural networks. In particular, we focus on neural networks that can dynamically activate or de-activate parts of their computational graph conditionally on their input. Examples include the dynamic selection of, e.g., input tokens, layers (or sets of layers), and sub-modules inside each layer (e.g., channels in a convolutional filter). We first provide a general formalism to describe these techniques in a uniform way. Then, we introduce three notable implementations of these principles: mixture-of-experts (MoEs) networks, token selection mechanisms, and early-exit neural networks. The paper aims to provide a tutorial-like introduction to this growing field. To this end, we analyze the benefits of these modular designs in terms of efficiency, explainability, and transfer learning, with a focus on emerging applicative areas ranging from automated scientific discovery to semantic communication.
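
To make one of the three implementations concrete, below is a minimal early-exit sketch (illustrative layer sizes and threshold; the batch-wise exit check only makes sense for batch size 1 here): an intermediate classifier returns early whenever it is confident enough, so the second block is executed conditionally on the input:

```python
import torch
import torch.nn as nn

class EarlyExitNet(nn.Module):
    def __init__(self, dim=32, n_classes=10, threshold=0.9):
        super().__init__()
        self.block1 = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())
        self.block2 = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())
        self.exit1 = nn.Linear(dim, n_classes)   # intermediate classifier
        self.exit2 = nn.Linear(dim, n_classes)   # final classifier
        self.threshold = threshold

    @torch.no_grad()
    def forward(self, x):
        h = self.block1(x)
        p1 = self.exit1(h).softmax(dim=-1)
        if p1.max() >= self.threshold:            # confident: stop early,
            return p1                             # block2 is never executed
        return self.exit2(self.block2(h)).softmax(dim=-1)

net = EarlyExitNet()
print(net(torch.randn(1, 32)).shape)   # torch.Size([1, 10])
```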

Updated: 2024-07-08 09:21:00

标题: 神经网络中的条件计算:原理和研究趋势

摘要: 本文总结了将\textit{条件计算}方法应用于神经网络设计这一新兴领域的原则和思想。我们特别关注能够以输入为条件动态激活或停用其计算图部分的神经网络。示例包括动态选择输入标记、层(或层的集合)以及每层内部的子模块(例如,卷积滤波器中的通道)。我们首先提供了一种以统一方式描述这些技术的一般形式化方法。然后,我们介绍这些原则的三个著名实现:专家混合(MoE)网络、标记选择机制和提前退出神经网络。本文旨在为这一不断发展的领域提供类似教程的介绍。为此,我们分析了这些模块化设计在效率、可解释性和迁移学习方面的优势,并重点关注从自动科学发现到语义通信等新兴应用领域。

更新时间: 2024-07-08 09:21:00

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2403.07965v2

Multi-agent Reinforcement Learning-based Network Intrusion Detection System

Intrusion Detection Systems (IDS) play a crucial role in ensuring the security of computer networks. Machine learning has emerged as a popular approach for intrusion detection due to its ability to analyze and detect patterns in large volumes of data. However, current ML-based IDS solutions often struggle to keep pace with the ever-changing nature of attack patterns and the emergence of new attack types. Additionally, these solutions face challenges related to class imbalance, where the number of instances belonging to different classes (normal and intrusions) is significantly imbalanced, which hinders their ability to effectively detect minor classes. In this paper, we propose a novel multi-agent reinforcement learning (RL) architecture, enabling automatic, efficient, and robust network intrusion detection. To enhance the capabilities of the proposed model, we have improved the DQN algorithm by implementing the weighted mean square loss function and employing cost-sensitive learning techniques. Our solution introduces a resilient architecture designed to accommodate the addition of new attacks and effectively adapt to changes in existing attack patterns. Experimental results realized using CIC-IDS-2017 dataset, demonstrate that our approach can effectively handle the class imbalance problem and provide a fine grained classification of attacks with a very low false positive rate. In comparison to the current state-of-the-art works, our solution demonstrates a significant superiority in both detection rate and false positive rate.
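
A minimal sketch of the cost-sensitive ingredient mentioned above, assuming the weighted mean-square loss takes the form of a per-sample weighted TD error with inverse-frequency class weights (our reading, not the paper's verbatim definition):

```python
import torch

def weighted_mse_td_loss(q_values, actions, targets, class_weights):
    """q_values: (B, n_actions); actions: (B,) taken actions;
    targets: (B,) TD targets; class_weights: (B,) per-sample weights,
    e.g. inverse frequency of each sample's traffic class."""
    q_taken = q_values.gather(1, actions.unsqueeze(1)).squeeze(1)
    return (class_weights * (q_taken - targets) ** 2).mean()

q = torch.randn(4, 5)
loss = weighted_mse_td_loss(q, torch.tensor([0, 2, 1, 4]), torch.randn(4),
                            torch.tensor([1.0, 8.0, 2.0, 8.0]))  # rare classes up-weighted
print(loss)
```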

Updated: 2024-07-08 09:18:59

标题: 多智能体强化学习网络入侵检测系统

摘要: 入侵检测系统(IDS)在确保计算机网络安全方面起着至关重要的作用。机器学习因其能够分析和检测大量数据中的模式,已经成为入侵检测的一种流行方法。然而,当前基于机器学习的IDS解决方案往往难以跟上攻击模式的不断变化和新攻击类型的出现。此外,这些解决方案还面临类别不平衡带来的挑战:属于不同类别(正常与入侵)的实例数量明显失衡,这阻碍了它们有效检测少数类别的能力。在本文中,我们提出了一种新颖的多智能体强化学习(RL)架构,实现自动、高效且鲁棒的网络入侵检测。为了增强所提出模型的能力,我们通过采用加权均方损失函数和成本敏感学习技术改进了DQN算法。我们的解决方案引入了一种弹性架构,旨在适应新攻击的加入,并有效适应现有攻击模式的变化。使用CIC-IDS-2017数据集获得的实验结果表明,我们的方法可以有效解决类别不平衡问题,并以非常低的误报率提供对攻击的细粒度分类。与当前最先进的工作相比,我们的解决方案在检测率和误报率方面均表现出显著优势。

更新时间: 2024-07-08 09:18:59

领域: cs.CR,cs.AI

下载: http://arxiv.org/abs/2407.05766v1

Potential of Multimodal Large Language Models for Data Mining of Medical Images and Free-text Reports

Medical images and radiology reports are crucial for diagnosing medical conditions, highlighting the importance of quantitative analysis for clinical decision-making. However, the diversity and cross-source heterogeneity of these data challenge the generalizability of current data-mining methods. Multimodal large language models (MLLMs) have recently transformed many domains, significantly affecting the medical field. Notably, Gemini-Vision-series (Gemini) and GPT-4-series (GPT-4) models have epitomized a paradigm shift in Artificial General Intelligence (AGI) for computer vision, showcasing their potential in the biomedical domain. In this study, we exhaustively evaluated the performance of the Gemini, GPT-4, and 4 popular large models across 14 medical imaging datasets, spanning 5 medical imaging categories (dermatology, radiology, dentistry, ophthalmology, and endoscopy), and 3 radiology report datasets. The investigated tasks encompass disease classification, lesion segmentation, anatomical localization, disease diagnosis, report generation, and lesion detection. Our experimental results demonstrated that Gemini-series models excelled in report generation and lesion detection but faced challenges in disease classification and anatomical localization. Conversely, GPT-series models exhibited proficiency in lesion segmentation and anatomical localization but encountered difficulties in disease diagnosis and lesion detection. Additionally, both the Gemini series and the GPT series contain models that have demonstrated commendable generation efficiency. While both model families hold promise in reducing physician workload, alleviating pressure on limited healthcare resources, and fostering collaboration between clinical practitioners and artificial intelligence technologies, substantial enhancements and comprehensive validations remain imperative before clinical deployment.

Updated: 2024-07-08 09:08:42

标题: 多模式大型语言模型在医学图像和自由文本报告数据挖掘中的潜力

摘要: 医学影像和放射学报告对于诊断医疗状况至关重要,凸显了定量分析对临床决策的重要性。然而,这些数据的多样性和跨来源的异质性挑战了当前数据挖掘方法的推广性。多模态大语言模型(MLLMs)最近已经改变了许多领域,显著影响了医学领域。值得注意的是,Gemini-Vision-Series(Gemini)和GPT-4-Series(GPT-4)模型在计算机视觉的人工通用智能(AGI)中体现了一种范式转变,展示了它们在生物医学领域的潜力。在本研究中,我们评估了Gemini、GPT-4和4个流行的大型模型在14个医学影像数据集(包括5个医学影像类别:皮肤科、放射科、牙科、眼科和内窥镜)和3个放射学报告数据集中的表现。研究任务涵盖了疾病分类、病变分割、解剖定位、疾病诊断、报告生成和病变检测。我们的实验结果表明,Gemini系列模型在报告生成和病变检测方面表现出色,但在疾病分类和解剖定位方面面临挑战。相反,GPT系列模型在病变分割和解剖定位方面表现出色,但在疾病诊断和病变检测方面遇到困难。此外,Gemini系列和GPT系列都包含已经证明具有可观生成效率的模型。虽然这两种模型都有望减轻医生的工作量,减轻有限医疗资源的压力,并促进临床从业者与人工智能技术之间的合作,但在临床部署之前,仍然需要进行实质性的改进和全面的验证。

更新时间: 2024-07-08 09:08:42

领域: eess.IV,cs.AI,cs.CV

下载: http://arxiv.org/abs/2407.05758v1

Lithium-Ion Battery System Health Monitoring and Fault Analysis from Field Data Using Gaussian Processes

Health monitoring, fault analysis, and detection are critical for the safe and sustainable operation of battery systems. We apply Gaussian process resistance models on lithium iron phosphate battery field data to effectively separate the time-dependent and operating point-dependent resistance. The data set contains 29 battery systems returned to the manufacturer for warranty, each with eight cells in series, totaling 232 cells and 131 million data rows. We develop probabilistic fault detection rules using recursive spatiotemporal Gaussian processes. These processes allow the quick processing of over a million data points, enabling advanced online monitoring and furthering the understanding of battery pack failure in the field. The analysis underlines that often, only a single cell shows abnormal behavior or a knee point, consistent with weakest-link failure for cells connected in series, amplified by local resistive heating. The results further the understanding of how batteries degrade and fail in the field and demonstrate the potential of efficient online monitoring based on data. We open-source the code and publish the large data set upon completion of the review of this article.
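
A minimal sketch of the separation idea on synthetic data (our own assumed form, with fixed kernel hyperparameters for simplicity): model resistance as an additive GP, R(t, x) = f_time(t) + f_op(x), using one near-one-dimensional RBF per component:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(1)
n = 200
t = rng.uniform(0.0, 1.0, n)                 # normalized time in the field
soc = rng.uniform(0.1, 0.9, n)               # operating point: state of charge
r = 1.0 + 0.5 * t + 0.3 * (soc - 0.5) ** 2 + 0.01 * rng.normal(size=n)

X = np.column_stack([t, soc])
# One RBF varies (almost) only along time, the other (almost) only along the
# operating point; their sum approximates the additive split R = f_time + f_op.
kernel = (RBF(length_scale=[0.3, 1e5]) + RBF(length_scale=[1e5, 0.2])
          + WhiteKernel(noise_level=1e-4))
gp = GaussianProcessRegressor(kernel=kernel, optimizer=None,  # keep kernel fixed
                              normalize_y=True).fit(X, r)
print(gp.predict(np.array([[0.5, 0.5]])))    # resistance at mid-life, 50% SoC
```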

Updated: 2024-07-08 09:07:51

标题: 基于高斯过程的锂离子电池系统现场数据健康监测与故障分析

摘要: 健康监测、故障分析和故障检测对于电池系统的安全和可持续运行至关重要。我们将高斯过程电阻模型应用于磷酸铁锂电池的现场数据,以有效分离随时间变化的电阻和随工作点变化的电阻。数据集包含29个因保修退回制造商的电池系统,每个系统由8个电池单体串联组成,总计232个电池单体和1.31亿行数据。我们利用递归时空高斯过程开发了概率故障检测规则。这些过程能够快速处理超过一百万个数据点,实现先进的在线监测,并加深对电池组现场故障的理解。分析强调,通常只有单个电池单体表现出异常行为或拐点,这与串联电池的"最弱环节"失效模式一致,并因局部电阻发热而被放大。这些结果加深了对电池在现场如何退化和失效的理解,并展示了基于数据的高效在线监测的潜力。我们将开源代码,并在本文审稿完成后发布该大型数据集。

更新时间: 2024-07-08 09:07:51

领域: cs.LG,cs.AI,cs.SY,eess.SY,stat.AP,I.2.6

下载: http://arxiv.org/abs/2406.19015v2

On the Completeness of Invariant Geometric Deep Learning Models

Invariant models, one important class of geometric deep learning models, are capable of generating meaningful geometric representations by leveraging informative geometric features in point clouds. These models are characterized by their simplicity, good experimental results and computational efficiency. However, their theoretical expressive power still remains unclear, restricting a deeper understanding of the potential of such models. In this work, we concentrate on characterizing the theoretical expressiveness of a wide range of invariant models. We first rigorously bound the expressiveness of the most classic invariant model, message-passing neural networks incorporating distance (DisGNN), restricting its unidentifiable cases to be only highly symmetric point clouds. We then show that GeoNGNN, the geometric counterpart of one of the simplest subgraph graph neural networks (subgraph GNNs), can effectively break these corner cases' symmetry and thus achieve E(3)-completeness. By leveraging GeoNGNN as a theoretical tool, we further prove that: 1) most subgraph GNNs developed in traditional graph learning can be seamlessly extended to geometric scenarios with E(3)-completeness; 2) DimeNet, GemNet and SphereNet, three well-established invariant models, are also all capable of achieving E(3)-completeness. Our theoretical results fill the gap in the theoretical power of invariant models, contributing to a rigorous and comprehensive understanding of their capabilities. We also empirically evaluated GeoNGNN, the simplest model within the large E(3)-complete family we established, which achieves competitive results to models relying on high-order invariant/equivariant representations on molecule-relevant tasks.

Updated: 2024-07-08 08:57:35

标题: 关于不变几何深度学习模型完备性的研究

摘要: 不变模型是几何深度学习模型中的一个重要类别,它通过利用点云中信息丰富的几何特征,能够生成有意义的几何表示。这类模型以其简单性、良好的实验结果和计算效率而著称。然而,它们的理论表达能力仍不清楚,限制了对这类模型潜力的深入理解。在这项工作中,我们专注于刻画一大类不变模型的理论表达能力。我们首先严格界定了最经典的不变模型,即融合距离信息的消息传递神经网络(DisGNN)的表达能力,将其无法识别的情形限定为高度对称的点云。然后,我们展示了GeoNGNN,即最简单的子图图神经网络(subgraph GNNs)之一的几何对应物,能够有效打破这些极端情形的对称性,从而实现E(3)-完备性。通过将GeoNGNN作为理论工具,我们进一步证明:1)传统图学习中开发的大多数子图GNN都可以无缝扩展到几何场景并具有E(3)-完备性;2)DimeNet、GemNet和SphereNet这三个成熟的不变模型也都能够实现E(3)-完备性。我们的理论结果填补了不变模型理论能力方面的空白,有助于对其能力形成严格而全面的理解。我们还对GeoNGNN进行了实证评估,它是我们所建立的庞大E(3)-完备家族中最简单的模型,在分子相关任务上取得了与依赖高阶不变/等变表示的模型相当的结果。

更新时间: 2024-07-08 08:57:35

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2402.04836v2

LDGCN: An Edge-End Lightweight Dual GCN Based on Single-Channel EEG for Driver Drowsiness Monitoring

Driver drowsiness electroencephalography (EEG) signal monitoring can timely alert drivers of their drowsiness status, thereby reducing the probability of traffic accidents. Graph convolutional networks (GCNs) have shown significant advancements in processing the non-stationary, time-varying, and non-Euclidean nature of EEG signals. However, the existing single-channel EEG adjacency graph construction process lacks interpretability, which hinders the ability of GCNs to effectively extract adjacency graph features, thus affecting the performance of drowsiness monitoring. To address this issue, we propose an edge-end lightweight dual graph convolutional network (LDGCN). Specifically, we are the first to incorporate neurophysiological knowledge to design a Baseline Drowsiness Status Adjacency Graph (BDSAG), which characterizes driver drowsiness status. Additionally, to express more features within limited EEG data, we introduce the Augmented Graph-level Module (AGM). This module captures global and local information at the graph level, ensuring that BDSAG features remain intact while enhancing effective feature expression capability. Furthermore, to deploy our method on the fourth-generation Raspberry Pi, we utilize Adaptive Pruning Optimization (APO) on both channels and neurons, reducing inference latency by almost half. Experiments on benchmark datasets demonstrate that LDGCN offers the best trade-off between monitoring performance and hardware resource utilization compared to existing state-of-the-art algorithms. All our source code can be found at https://github.com/BryantDom/Driver-Drowsiness-Monitoring.

Updated: 2024-07-08 08:55:25

标题: LDGCN:基于单通道脑电图的驾驶员疲劳监测的边缘端轻量级双GCN

摘要: 司机疲劳脑电图(EEG)信号监测可以及时提醒司机其疲劳状态,从而降低交通事故发生的概率。图卷积网络(GCN)在处理EEG信号的非平稳、时变和非欧几里得特性方面取得了显著进展。然而,现有的单通道EEG邻接图构建过程缺乏可解释性,这妨碍了GCN有效提取邻接图特征的能力,进而影响疲劳监测的性能。为了解决这个问题,我们提出了一种边缘端轻量级双图卷积网络(LDGCN)。具体来说,我们首次结合神经生理学知识设计了基线疲劳状态邻接图(BDSAG),用以表征司机的疲劳状态。此外,为了在有限的EEG数据中表达更多特征,我们引入了增强图级模块(AGM)。该模块在图级别捕获全局和局部信息,在保持BDSAG特征完整的同时增强了有效特征表达能力。此外,为了在第四代树莓派上部署我们的方法,我们对通道和神经元两者应用自适应剪枝优化(APO),将推理延迟降低了近一半。在基准数据集上的实验表明,与现有最先进的算法相比,LDGCN在监测性能和硬件资源利用之间提供了最佳折衷。我们的所有源代码都可以在https://github.com/BryantDom/Driver-Drowsiness-Monitoring找到。

更新时间: 2024-07-08 08:55:25

领域: eess.SP,cs.HC,cs.LG

下载: http://arxiv.org/abs/2407.05749v1

The Impact of Quantization on Retrieval-Augmented Generation: An Analysis of Small LLMs

Post-training quantization reduces the computational demand of Large Language Models (LLMs) but can weaken some of their capabilities. Since LLM abilities emerge with scale, smaller LLMs are more sensitive to quantization. In this paper, we explore how quantization affects smaller LLMs' ability to perform retrieval-augmented generation (RAG), specifically in longer contexts. We chose personalization for evaluation because it is a challenging domain to perform using RAG as it requires long-context reasoning over multiple documents. We compare the original FP16 and the quantized INT4 performance of multiple 7B and 8B LLMs on two tasks while progressively increasing the number of retrieved documents to test how quantized models fare against longer contexts. To better understand the effect of retrieval, we evaluate three retrieval models in our experiments. Our findings reveal that if a 7B LLM performs the task well, quantization does not impair its performance and long-context reasoning capabilities. We conclude that it is possible to utilize RAG with quantized smaller LLMs.

Updated: 2024-07-08 08:52:56

标题: 量化对检索增强生成的影响:小型LLM的分析

摘要: 训练后的量化降低了大型语言模型(LLMs)的计算需求,但可能会削弱它们的某些能力。由于LLM的能力随着规模的增大而出现,较小的LLM对量化更为敏感。在本文中,我们探讨了量化如何影响较小LLM在长文本情境下执行检索增强生成(RAG)的能力。我们选择了个性化进行评估,因为在RAG中执行个性化是一个具有挑战性的领域,需要对多个文档进行长文本推理。我们比较了多个7B和8B LLM的原始FP16和量化INT4性能在两个任务中的表现,同时逐渐增加检索文档的数量,以测试量化模型在更长情境下的表现。为了更好地理解检索的影响,我们在实验中评估了三种检索模型。我们的研究结果表明,如果一个7B LLM能够很好地执行任务,量化不会损害其性能和长文本推理能力。我们得出结论,可以利用量化的较小LLM进行RAG。

更新时间: 2024-07-08 08:52:56

领域: cs.CL,cs.AI,cs.IR

下载: http://arxiv.org/abs/2406.10251v2

MSP-Podcast SER Challenge 2024: L'antenne du Ventoux Multimodal Self-Supervised Learning for Speech Emotion Recognition

In this work, we detail our submission to the 2024 edition of the MSP-Podcast Speech Emotion Recognition (SER) Challenge. This challenge is divided into two distinct tasks: Categorical Emotion Recognition and Emotional Attribute Prediction. We concentrated our efforts on Task 1, which involves the categorical classification of eight emotional states using data from the MSP-Podcast dataset. Our approach employs an ensemble of models, each trained independently and then fused at the score level using a Support Vector Machine (SVM) classifier. The models were trained using various strategies, including Self-Supervised Learning (SSL) fine-tuning across different modalities: speech alone, text alone, and a combined speech and text approach. This joint training methodology aims to enhance the system's ability to accurately classify emotional states. With this approach, the system obtained an F1-macro score of 0.35% on the development set.

Updated: 2024-07-08 08:52:06

标题: MSP-Podcast SER挑战赛2024:L'antenne du Ventoux的多模态自监督学习语音情感识别

摘要: 在这项工作中,我们详细介绍了我们参加2024年MSP-Podcast Speech Emotion Recognition(SER)挑战赛的提交内容。该挑战赛分为两个不同的任务:分类情感识别和情感属性预测。我们集中精力在任务1上,该任务涉及使用MSP-Podcast数据集的数据对八种情感状态进行分类。我们的方法采用了一组模型的集成,每个模型都经过独立训练,然后在得分级别使用支持向量机(SVM)分类器进行融合。模型使用各种策略进行训练,包括跨不同模态的自监督学习(SSL)微调:仅语音、仅文本和结合语音和文本的方法。这种联合训练方法旨在增强系统准确分类情感状态的能力。因此,系统在开发集上获得了0.35%的F1-macro。

更新时间: 2024-07-08 08:52:06

领域: cs.AI,cs.SD,eess.AS

下载: http://arxiv.org/abs/2407.05746v1

Multi-Bit Mechanism: A Novel Information Transmission Paradigm for Spiking Neural Networks

Since proposed, spiking neural networks (SNNs) gain recognition for their high performance, low power consumption and enhanced biological interpretability. However, while bringing these advantages, the binary nature of spikes also leads to considerable information loss in SNNs, ultimately causing performance degradation. We claim that the limited expressiveness of current binary spikes, resulting in substantial information loss, is the fundamental issue behind these challenges. To alleviate this, our research introduces a multi-bit information transmission mechanism for SNNs. This mechanism expands the output of spiking neurons from the original single bit to multiple bits, enhancing the expressiveness of the spikes and reducing information loss during the forward process, while still maintaining the low energy consumption advantage of SNNs. For SNNs, this represents a new paradigm of information transmission. Moreover, to further utilize the limited spikes, we extract effective signals from the previous layer to re-stimulate the neurons, thus encouraging full spikes emission across various bit levels. We conducted extensive experiments with our proposed method using both direct training method and ANN-SNN conversion method, and the results show consistent performance improvements.
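
A minimal sketch of what a multi-bit spike could look like in a LIF-style update (an assumed form for illustration, not the authors' exact neuron model): the neuron emits a small integer spike level instead of a single bit, reducing information loss per timestep:

```python
import torch

def multi_bit_lif_step(v, x, threshold=1.0, decay=0.9, bits=2):
    """v: membrane potential; x: input current; returns (new_v, spike_level)."""
    v = decay * v + x
    max_level = 2 ** bits - 1                       # e.g. levels 0..3 for 2 bits
    level = torch.clamp((v / threshold).floor(), 0, max_level)
    v = v - level * threshold                       # soft reset per emitted level
    return v, level

v = torch.zeros(4)
for _ in range(3):
    v, s = multi_bit_lif_step(v, torch.tensor([0.3, 0.9, 1.7, 3.9]))
    print(s)    # spike levels in {0, 1, 2, 3} instead of binary {0, 1}
```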

Updated: 2024-07-08 08:46:31

标题: 多位机制:一种新的脉冲神经网络信息传输范式

摘要: 自提出以来,尖峰神经网络(SNNs)因其高性能、低功耗和增强的生物解释性而得到认可。然而,虽然带来这些优势,尖峰的二进制特性也导致了SNNs中相当大的信息丢失,最终导致了性能下降。我们认为,当前二进制尖峰的有限表达能力导致了信息丢失,这是这些挑战背后的根本问题。为了缓解这一问题,我们的研究引入了一种多位信息传输机制用于SNNs。该机制将尖峰神经元的输出从原始的单个位扩展到多个位,增强了尖峰的表达能力,并在前向过程中减少了信息丢失,同时仍保持了SNNs的低能耗优势。对于SNNs,这代表着一种新的信息传输范式。此外,为了进一步利用有限的尖峰,我们从前一层中提取有效信号来重新激励神经元,从而鼓励在各种位级别上发射完整的尖峰。我们使用我们提出的方法进行了大量实验,使用直接训练方法和ANN-SNN转换方法,结果显示了一致的性能改进。

更新时间: 2024-07-08 08:46:31

领域: cs.NE,cs.AI

下载: http://arxiv.org/abs/2407.05739v1

Between Lines of Code: Unraveling the Distinct Patterns of Machine and Human Programmers

Large language models have catalyzed an unprecedented wave in code generation. While achieving significant advances, they blur the distinctions between machine- and human-authored source code, causing integrity and authenticity issues of software artifacts. Previous methods such as DetectGPT have proven effective in discerning machine-generated texts, but they do not identify and harness the unique patterns of machine-generated code. Thus, its applicability falters when applied to code. In this paper, we carefully study the specific patterns that characterize machine- and human-authored code. Through a rigorous analysis of code attributes such as lexical diversity, conciseness, and naturalness, we expose unique patterns inherent to each source. We particularly notice that the syntactic segmentation of code is a critical factor in identifying its provenance. Based on our findings, we propose DetectCodeGPT, a novel method for detecting machine-generated code, which improves DetectGPT by capturing the distinct stylized patterns of code. Diverging from conventional techniques that depend on external LLMs for perturbations, DetectCodeGPT perturbs the code corpus by strategically inserting spaces and newlines, ensuring both efficacy and efficiency. Experiment results show that our approach significantly outperforms state-of-the-art techniques in detecting machine-generated code.
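
A minimal sketch of the whitespace-perturbation test this describes (log_prob is a hypothetical scorer, e.g. the mean token log-likelihood under a code LM; the paper's exact scoring may differ): machine-generated code tends to sit at a sharper likelihood maximum, so random space/newline insertions hurt its score more:

```python
import random

def perturb_whitespace(code, n_edits=8, newline_prob=0.3):
    """Strategically insert spaces and newlines at random positions."""
    chars = list(code)
    for _ in range(n_edits):
        pos = random.randrange(len(chars) + 1)
        chars.insert(pos, "\n" if random.random() < newline_prob else " ")
    return "".join(chars)

def detection_score(code, log_prob, n_perturbations=20):
    """Higher score suggests machine-generated code: its likelihood drops
    more sharply under whitespace perturbations."""
    base = log_prob(code)
    perturbed = [log_prob(perturb_whitespace(code))
                 for _ in range(n_perturbations)]
    return base - sum(perturbed) / len(perturbed)
```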

Updated: 2024-07-08 08:45:55

标题: 代码之间:揭示机器和人类程序员的独特模式

摘要: 大型语言模型已经催生了代码生成的前所未有的浪潮。虽然取得了显著进展,但它们模糊了机器生成和人类编写源代码之间的区别,导致软件构件的完整性和真实性问题。先前的方法,如DetectGPT,在区分机器生成的文本方面已被证明是有效的,但它们并没有识别和利用机器生成的代码的独特模式。因此,当应用于代码时,其适用性会受到影响。在本文中,我们仔细研究了表征机器生成和人类编写代码的特定模式。通过对代码属性(如词汇多样性、简洁性和自然性)进行严格分析,我们揭示了每个源的独特模式。我们特别注意到代码的句法分割是识别其来源的关键因素。基于我们的研究结果,我们提出了DetectCodeGPT,一种用于检测机器生成代码的新方法,通过捕捉代码的独特样式化模式,改进了DetectGPT。与依赖外部LLMs进行扰动的传统技术不同,DetectCodeGPT通过策略性地插入空格和换行符来扰动代码语料库,确保了有效性和效率。实验结果表明,我们的方法在检测机器生成代码方面明显优于最先进的技术。

更新时间: 2024-07-08 08:45:55

领域: cs.SE,cs.AI,cs.CL

下载: http://arxiv.org/abs/2401.06461v4

TransMA: an explainable multi-modal deep learning model for predicting properties of ionizable lipid nanoparticles in mRNA delivery

As the primary mRNA delivery vehicles, ionizable lipid nanoparticles (LNPs) exhibit excellent safety, high transfection efficiency, and strong immune response induction. However, the screening process for LNPs is time-consuming and costly. To expedite the identification of high-transfection-efficiency mRNA drug delivery systems, we propose an explainable LNPs transfection efficiency prediction model, called TransMA. TransMA employs a multi-modal molecular structure fusion architecture, wherein the fine-grained atomic spatial relationship extractor named molecule 3D Transformer captures three-dimensional spatial features of the molecule, and the coarse-grained atomic sequence extractor named molecule Mamba captures one-dimensional molecular features. We design the mol-attention mechanism block, enabling it to align coarse and fine-grained atomic features and captures relationships between atomic spatial and sequential structures. TransMA achieves state-of-the-art performance in predicting transfection efficiency using the scaffold and cliff data splitting methods on the current largest LNPs dataset, including Hela and RAW cell lines. Moreover, we find that TransMA captures the relationship between subtle structural changes and significant transfection efficiency variations, providing valuable insights for LNPs design. Additionally, TransMA's predictions on external transfection efficiency data maintain a consistent order with actual transfection efficiencies, demonstrating its robust generalization capability. The code, model and data are made publicly available at https://github.com/wklix/TransMA/tree/master. We hope that high-accuracy transfection prediction models in the future can aid in LNPs design and initial screening, thereby assisting in accelerating the mRNA design process.

Updated: 2024-07-08 08:43:32

标题: TransMA:一种可解释的用于预测mRNA传递中离子脂质纳米粒子性质的多模态深度学习模型

摘要: 作为主要的mRNA传递载体,离子性脂质纳米颗粒(LNPs)展现出极佳的安全性、高转染效率和强烈的免疫应答诱导作用。然而,LNPs的筛选过程耗时且成本高昂。为加速高转染效率mRNA药物传递系统的识别,我们提出了一种可解释的LNPs转染效率预测模型,称为TransMA。TransMA采用多模态分子结构融合架构,其中细粒度原子空间关系提取器命名为分子3D Transformer捕获分子的三维空间特征,而粗粒度原子序列提取器命名为分子Mamba捕获一维分子特征。我们设计了mol-attention机制块,使其能够对齐粗粒度和细粒度的原子特征,并捕获原子空间和序列结构之间的关系。TransMA在当前最大的LNPs数据集上使用脚手架和悬崖数据拆分方法实现了预测转染效率的最新性能,包括Hela和RAW细胞系。此外,我们发现TransMA捕获了细微结构变化与显著转染效率变化之间的关系,为LNPs设计提供了有价值的见解。此外,TransMA对外部转染效率数据的预测与实际转染效率维持一致的顺序,展示了其强大的泛化能力。代码、模型和数据可在https://github.com/wklix/TransMA/tree/master上公开获得。我们希望未来高精度的转染预测模型能够帮助LNPs设计和初步筛选,从而加速mRNA设计过程。

更新时间: 2024-07-08 08:43:32

领域: cs.AI,cs.CV

下载: http://arxiv.org/abs/2407.05736v1

FairPFN: Transformers Can do Counterfactual Fairness

Machine Learning systems are increasingly prevalent across healthcare, law enforcement, and finance but often operate on historical data, which may carry biases against certain demographic groups. Causal and counterfactual fairness provides an intuitive way to define fairness that closely aligns with legal standards. Despite its theoretical benefits, counterfactual fairness comes with several practical limitations, largely related to the reliance on domain knowledge and approximate causal discovery techniques in constructing a causal model. In this study, we take a fresh perspective on counterfactually fair prediction, building upon recent work in in-context learning (ICL) and prior fitted networks (PFNs) to learn a transformer called FairPFN. This model is pretrained using synthetic fairness data to eliminate the causal effects of protected attributes directly from observational data, removing the requirement of access to the correct causal model in practice. In our experiments, we thoroughly assess the effectiveness of FairPFN in eliminating the causal impact of protected attributes on a series of synthetic case studies and real world datasets. Our findings pave the way for a new and promising research area: transformers for causal and counterfactual fairness.

Updated: 2024-07-08 08:36:44

标题: FairPFN:Transformer可以实现反事实公平

摘要: 机器学习系统在医疗、执法和金融领域日益普遍,但它们通常基于历史数据运行,而这些数据可能对某些人群存在偏见。因果公平性和反事实公平性提供了一种与法律标准高度契合的、直观的公平性定义方式。尽管反事实公平性在理论上具有优势,但在实践中存在若干限制,这些限制主要与构建因果模型时对领域知识和近似因果发现技术的依赖有关。在这项研究中,我们以全新的视角看待反事实公平预测,借鉴上下文学习(ICL)和先验拟合网络(PFN)方面的最新工作,训练了一个称为FairPFN的Transformer模型。该模型使用合成公平性数据进行预训练,以便直接从观测数据中消除受保护属性的因果效应,从而在实践中不再需要获得正确的因果模型。在实验中,我们在一系列合成案例研究和真实世界数据集上全面评估了FairPFN消除受保护属性因果影响的有效性。我们的研究结果为一个新的、有前景的研究方向铺平了道路:用于因果公平性和反事实公平性的Transformer。

更新时间: 2024-07-08 08:36:44

领域: cs.LG,cs.AI,cs.CY

下载: http://arxiv.org/abs/2407.05732v1

Function+Data Flow: A Framework to Specify Machine Learning Pipelines for Digital Twinning

The development of digital twins (DTs) for physical systems increasingly leverages artificial intelligence (AI), particularly for combining data from different sources or for creating computationally efficient, reduced-dimension models. Indeed, even in very different application domains, twinning employs common techniques such as model order reduction and modelization with hybrid data (that is, data sourced from both physics-based models and sensors). Despite this apparent generality, current development practices are ad-hoc, making the design of AI pipelines for digital twinning complex and time-consuming. Here we propose Function+Data Flow (FDF), a domain-specific language (DSL) to describe AI pipelines within DTs. FDF aims to facilitate the design and validation of digital twins. Specifically, FDF treats functions as first-class citizens, enabling effective manipulation of models learned with AI. We illustrate the benefits of FDF on two concrete use cases from different domains: predicting the plastic strain of a structure and modeling the electromagnetic behavior of a bearing.

Updated: 2024-07-08 08:28:34

标题: 功能+数据流:一种用于指定数字孪生机器学习管道的框架

摘要: 数字孪生体(DTs)的发展越来越多地利用人工智能(AI),特别是用于将来自不同来源的数据进行组合或创建计算效率高、维度减少的模型。事实上,即使在非常不同的应用领域,数字孪生体也采用常见的技术,比如模型降阶和混合数据建模(即来自基于物理模型和传感器的数据)。尽管如此,当前的开发实践是临时性的,使得为数字孪生体设计人工智能管道变得复杂且耗时。在这里,我们提出了一个称为Function+Data Flow(FDF)的特定领域语言(DSL)来描述数字孪生体中的人工智能管道。FDF旨在促进数字孪生体的设计和验证。具体而言,FDF将函数视为一等公民,使得对使用人工智能学习的模型进行有效操作成为可能。我们通过两个不同领域的具体用例展示了FDF的好处:预测结构的塑性应变和建模轴承的电磁行为。

更新时间: 2024-07-08 08:28:34

领域: cs.SE,cs.AI,cs.LG

下载: http://arxiv.org/abs/2406.19670v2

Implementing a hybrid approach in a knowledge engineering process to manage technical advice relating to feedback from the operation of complex sensitive equipment

How can technical advice on operating experience feedback be managed efficiently in an organization that has never used knowledge engineering techniques and methods? This article explains how an industrial company in the nuclear and defense sectors adopted such an approach, adapted to its "TA KM" organizational context and falling within the ISO30401 framework, to build a complete system with a "SARBACANES" application to support its business processes and perpetuate its know-how and expertise in a knowledge base. Over and above the classic transfer of knowledge between experts and business specialists, SARBACANES also reveals the ability of this type of engineering to deliver multi-functional operation. Modeling was accelerated by the use of a tool adapted to this type of operation: the Ardans Knowledge Maker platform.

Updated: 2024-07-08 08:17:10

标题: 在知识工程过程中实施混合方法,以管理与复杂敏感设备运行反馈相关的技术建议

摘要: 对于一个从未使用过知识工程技术和方法的组织,如何高效地管理与运行经验反馈相关的技术建议?本文介绍了一家核能与国防领域的工业公司如何采用这样一种方法:该方法适应其"TA KM"组织背景并符合ISO30401框架,构建了一个以"SARBACANES"应用为核心的完整系统,用以支持其业务流程,并将其专有知识和专业技能沉淀到知识库中长期传承。除了专家与业务专家之间经典的知识传递之外,SARBACANES还展示了这类工程方法提供多功能运作的能力。借助适合此类工作的工具(Ardans Knowledge Maker平台),建模过程得以加速。

更新时间: 2024-07-08 08:17:10

领域: cs.AI

下载: http://arxiv.org/abs/2407.05714v1

Short-term Object Interaction Anticipation with Disentangled Object Detection @ Ego4D Short Term Object Interaction Anticipation Challenge

Short-term object interaction anticipation is an important task in egocentric video analysis, including precise predictions of future interactions and their timings as well as the categories and positions of the involved active objects. To alleviate the complexity of this task, our proposed method, SOIA-DOD, effectively decomposes it into 1) detecting active objects and 2) classifying interactions and predicting their timing. Our method first detects all potential active objects in the last frame of egocentric video by fine-tuning a pre-trained YOLOv9. Then, we combine these potential active objects as queries with a transformer encoder, thereby identifying the most promising next active object and predicting its future interaction and time-to-contact. Experimental results demonstrate that our method outperforms state-of-the-art models on the challenge test set, achieving the best performance in predicting next active objects and their interactions. Finally, our proposed method ranked third in overall top-5 mAP when including time-to-contact predictions. The source code is available at https://github.com/KeenyJin/SOIA-DOD.

Updated: 2024-07-08 08:13:16

标题: 短期目标交互预测与解耦目标检测@Ego4D短期目标交互预测挑战

摘要: 短期对象交互预测是第一人称(egocentric)视频分析中的一项重要任务,包括精确预测未来交互及其发生时间,以及所涉及活动对象的类别和位置。为了降低这一任务的复杂性,我们提出的方法SOIA-DOD将其有效分解为:1)检测活动对象;2)对交互进行分类并预测其时间。我们的方法首先通过微调预训练的YOLOv9,检测第一人称视频最后一帧中所有潜在的活动对象。然后,我们将这些潜在活动对象作为查询与Transformer编码器相结合,从而识别最有希望的下一个活动对象,并预测其未来交互和接触时间。实验结果表明,我们的方法在挑战赛测试集上优于最先进的模型,在预测下一个活动对象及其交互方面表现最佳。最后,在包含接触时间预测的整体top-5 mAP指标上,我们提出的方法排名第三。源代码可在https://github.com/KeenyJin/SOIA-DOD获取。

更新时间: 2024-07-08 08:13:16

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2407.05713v1

MobilePortrait: Real-Time One-Shot Neural Head Avatars on Mobile Devices

Existing neural head avatars methods have achieved significant progress in the image quality and motion range of portrait animation. However, these methods neglect the computational overhead, and to the best of our knowledge, none is designed to run on mobile devices. This paper presents MobilePortrait, a lightweight one-shot neural head avatars method that reduces learning complexity by integrating external knowledge into both the motion modeling and image synthesis, enabling real-time inference on mobile devices. Specifically, we introduce a mixed representation of explicit and implicit keypoints for precise motion modeling and precomputed visual features for enhanced foreground and background synthesis. With these two key designs and using simple U-Nets as backbones, our method achieves state-of-the-art performance with less than one-tenth the computational demand. It has been validated to reach speeds of over 100 FPS on mobile devices and support both video and audio-driven inputs.

Updated: 2024-07-08 08:12:57

标题: MobilePortrait:移动设备上实时一次性神经头像

摘要: 现有的神经头像方法在肖像动画的图像质量和运动范围方面取得了显著进展。然而,这些方法忽视了计算开销,并据我们所知,没有一个是设计用于在移动设备上运行的。本文介绍了MobilePortrait,一种轻量级的一次性神经头像方法,通过将外部知识整合到动作建模和图像合成中,降低了学习复杂性,实现了移动设备上的实时推断。具体而言,我们引入了显式和隐式关键点的混合表示,用于精确的动作建模,以及预先计算的视觉特征,用于增强前景和背景合成。通过这两个关键设计,并使用简单的U-Nets作为骨干,我们的方法以不到十分之一的计算需求实现了最先进的性能。它已经验证可以在移动设备上达到超过100 FPS的速度,并支持视频和音频驱动的输入。

更新时间: 2024-07-08 08:12:57

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2407.05712v1

Can machine learning solve the challenge of adaptive learning and the individualization of learning paths? A field experiment in an online learning platform

The individualization of learning contents based on digital technologies promises large individual and social benefits. However, it remains an open question how this individualization can be implemented. To tackle this question we conduct a randomized controlled trial on a large digital self-learning platform. We develop an algorithm based on two convolutional neural networks that assigns tasks to $4,365$ learners according to their learning paths. Learners are randomized into three groups: two treatment groups -- a group-based adaptive treatment group and an individual adaptive treatment group -- and one control group. We analyze the difference between the three groups with respect to effort learners provide and their performance on the platform. Our null results shed light on the multiple challenges associated with the individualization of learning paths.

Updated: 2024-07-08 08:07:35

标题: 机器学习能解决自适应学习和个性化学习路径的挑战吗?在线学习平台上的一项实地实验

摘要: 基于数字技术的学习内容个性化有望为个人和社会带来巨大收益。然而,如何实现这种个性化仍是一个悬而未决的问题。为了解决这个问题,我们在一个大型数字自主学习平台上进行了一项随机对照试验。我们开发了一个基于两个卷积神经网络的算法,根据学习路径为4365名学习者分配任务。学习者被随机分为三组:两个处理组(基于群体的自适应处理组和基于个体的自适应处理组)以及一个对照组。我们分析了三组学习者在平台上投入的努力和表现之间的差异。我们的零结果(null results)揭示了学习路径个性化所面临的多重挑战。

更新时间: 2024-07-08 08:07:35

领域: cs.LG

下载: http://arxiv.org/abs/2407.03118v2

Fast and Continual Knowledge Graph Embedding via Incremental LoRA

Continual Knowledge Graph Embedding (CKGE) aims to efficiently learn new knowledge and simultaneously preserve old knowledge. Dominant approaches primarily focus on alleviating catastrophic forgetting of old knowledge but neglect efficient learning for the emergence of new knowledge. However, in real-world scenarios, knowledge graphs (KGs) are continuously growing, which brings a significant challenge to fine-tuning KGE models efficiently. To address this issue, we propose a fast CKGE framework incorporating an incremental low-rank adapter mechanism to efficiently acquire new knowledge while preserving old knowledge. Specifically, to mitigate catastrophic forgetting, the framework isolates and allocates new knowledge to specific layers based on the fine-grained influence between old and new KGs. Subsequently, to accelerate fine-tuning, it devises an efficient incremental low-rank adapter mechanism, which embeds the specific layers into incremental low-rank adapters with fewer training parameters. Moreover, the mechanism introduces adaptive rank allocation, which makes the LoRA aware of the importance of entities and adjusts its rank scale adaptively. We conduct experiments on four public datasets and two new datasets with a larger initial scale. Experimental results demonstrate that the framework can reduce training time by 34%-49% while still achieving competitive link prediction performance against state-of-the-art models on four public datasets (average MRR score of 21.0% vs. 21.1%). Meanwhile, on two newly constructed datasets, it saves 51%-68% training time and improves link prediction performance by 1.5%.
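
A minimal sketch of an incremental low-rank adapter over an entity-embedding table (an assumed form for illustration; the framework's actual layer allocation and adaptive rank scheme are more involved): the base table is frozen and only a small low-rank delta is trained for the new snapshot:

```python
import torch
import torch.nn as nn

class IncrementalLoRAEmbedding(nn.Module):
    def __init__(self, n_entities, dim, rank=8):
        super().__init__()
        self.base = nn.Embedding(n_entities, dim)
        self.base.weight.requires_grad_(False)     # old knowledge stays frozen
        # LoRA-style factors: delta = A @ B starts at zero (A is zero-init).
        self.A = nn.Parameter(torch.zeros(n_entities, rank))
        self.B = nn.Parameter(torch.randn(rank, dim) * 0.01)

    def forward(self, ids):
        # Only A and B receive gradients when fine-tuning on the new snapshot.
        return self.base(ids) + self.A[ids] @ self.B

emb = IncrementalLoRAEmbedding(n_entities=1000, dim=128, rank=8)
print(emb(torch.tensor([3, 42])).shape)   # torch.Size([2, 128])
```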

Updated: 2024-07-08 08:07:13

标题: 通过增量LoRA实现快速且持续的知识图嵌入

摘要: 持续知识图嵌入(CKGE)旨在高效学习新知识并同时保留旧知识。主流方法主要集中在减轻旧知识的灾难性遗忘上,却忽视了对新出现知识的高效学习。然而,在现实世界场景中,知识图(KG)在不断增长,这给高效微调KGE模型带来了重大挑战。为了解决这个问题,我们提出了一个快速CKGE框架,其中结合了增量低秩适配器机制,以便在保留旧知识的同时高效获取新知识。具体而言,为了减轻灾难性遗忘,该框架根据新旧KG之间的细粒度影响,将新知识隔离并分配到特定层。随后,为了加速微调,该框架设计了一种高效的增量低秩适配器机制,将这些特定层嵌入到训练参数更少的增量低秩适配器中。此外,该机制引入了自适应秩分配,使LoRA能够感知实体的重要性并自适应地调整其秩的规模。我们在四个公共数据集和两个初始规模更大的新数据集上进行了实验。实验结果表明,该框架可以将训练时间减少34%-49%,同时在四个公共数据集上仍取得与最先进模型相当的链接预测性能(平均MRR得分21.0%对21.1%)。同时,在两个新构建的数据集上,该框架节省了51%-68%的训练时间,并将链接预测性能提高了1.5%。

更新时间: 2024-07-08 08:07:13

领域: cs.AI

下载: http://arxiv.org/abs/2407.05705v1

Narrowing the Gap between Adversarial and Stochastic MDPs via Policy Optimization

In this paper, we consider the problem of learning in adversarial Markov decision processes (MDPs) with an oblivious adversary in a full-information setting. The agent interacts with an environment during $T$ episodes, each of which consists of $H$ stages, and each episode is evaluated with respect to a reward function that will be revealed only at the end of the episode. We propose an algorithm, called APO-MVP, that achieves a regret bound of order $\tilde{\mathcal{O}}(\mathrm{poly}(H)\sqrt{SAT})$, where $S$ and $A$ are sizes of the state and action spaces, respectively. This result improves upon the best-known regret bound by a factor of $\sqrt{S}$, bridging the gap between adversarial and stochastic MDPs, and matching the minimax lower bound $\Omega(\sqrt{H^3SAT})$ as far as the dependencies in $S,A,T$ are concerned. The proposed algorithm and analysis completely avoid the typical tool given by occupancy measures; instead, it performs policy optimization based only on dynamic programming and on a black-box online linear optimization strategy run over estimated advantage functions, making it easy to implement. The analysis leverages two recent techniques: policy optimization based on online linear optimization strategies (Jonckheere et al., 2023) and a refined martingale analysis of the impact on values of estimating transitions kernels (Zhang et al., 2023).

Updated: 2024-07-08 08:06:45

标题: 通过策略优化缩小对抗性和随机MDPs之间的差距

摘要: 在本文中,我们考虑在全信息设定下、面对无意识(oblivious)对手的对抗性马尔可夫决策过程(MDP)中的学习问题。智能体与环境交互$T$个回合,每个回合包含$H$个阶段,并且每个回合依据一个仅在回合结束时才揭示的奖励函数进行评估。我们提出了一种称为APO-MVP的算法,它实现了$\tilde{\mathcal{O}}(\mathrm{poly}(H)\sqrt{SAT})$阶的遗憾上界,其中$S$和$A$分别是状态空间和动作空间的大小。这一结果将已知最佳遗憾界改进了$\sqrt{S}$倍,缩小了对抗性MDP与随机MDP之间的差距,并且就对$S,A,T$的依赖关系而言与极小极大下界$\Omega(\sqrt{H^3SAT})$相匹配。所提出的算法和分析完全避免了占用度量(occupancy measures)这一常用工具;取而代之的是,它仅基于动态规划以及在估计的优势函数上运行的黑盒在线线性优化策略来进行策略优化,因而易于实现。该分析利用了两种最新技术:基于在线线性优化策略的策略优化(Jonckheere等人,2023年)以及对估计转移核如何影响值函数的精细鞅分析(Zhang等人,2023年)。

更新时间: 2024-07-08 08:06:45

领域: cs.LG

下载: http://arxiv.org/abs/2407.05704v1

Faster Convergence on Heterogeneous Federated Edge Learning: An Adaptive Clustered Data Sharing Approach

Federated Edge Learning (FEEL) emerges as a pioneering distributed machine learning paradigm for the 6G Hyper-Connectivity, harnessing data from the Internet of Things (IoT) devices while upholding data privacy. However, current FEEL algorithms struggle with non-independent and non-identically distributed (non-IID) data, leading to elevated communication costs and compromised model accuracy. To address these statistical imbalances within FEEL, we introduce a clustered data sharing framework, mitigating data heterogeneity by selectively sharing partial data from cluster heads to trusted associates through sidelink-aided multicasting. The collective communication pattern is integral to FEEL training, where both cluster formation and the efficiency of communication and computation impact training latency and accuracy simultaneously. To tackle the strictly coupled data sharing and resource optimization, we decompose the overall optimization problem into the clients clustering and effective data sharing subproblems. Specifically, a distribution-based adaptive clustering algorithm (DACA) is devised basing on three deductive cluster forming conditions, which ensures the maximum sharing yield. Meanwhile, we design a stochastic optimization based joint computed frequency and shared data volume optimization (JFVO) algorithm, determining the optimal resource allocation with an uncertain objective function. The experiments show that the proposed framework facilitates FEEL on non-IID datasets with faster convergence rate and higher model accuracy in a limited communication environment.

Updated: 2024-07-08 08:06:00

Categories: cs.LG

Download: http://arxiv.org/abs/2406.09776v2

InverseCoder: Unleashing the Power of Instruction-Tuned Code LLMs with Inverse-Instruct

Recent advancements in open-source code large language models (LLMs) have demonstrated remarkable coding abilities through instruction tuning, i.e., fine-tuning on data generated by powerful closed-source LLMs such as GPT-3.5 and GPT-4. This paper explores how to further improve an instruction-tuned code LLM by generating data from itself rather than querying closed-source LLMs. Our key observation is the asymmetry between translating formal and informal languages: translating formal language (i.e., code) to informal language (i.e., natural language) is more straightforward than the reverse. Based on this observation, we propose INVERSE-INSTRUCT, which summarizes instructions from code snippets instead of the reverse. Specifically, given an instruction tuning corpus for code and the resulting instruction-tuned code LLM, we ask the code LLM to generate additional high-quality instructions for the original corpus through code summarization and self-evaluation. Then, we fine-tune the base LLM on the combination of the original corpus and the self-generated one, which yields a stronger instruction-tuned LLM. We present a series of code LLMs named InverseCoder, which surpasses the performance of the original code LLMs on a wide range of benchmarks, including Python text-to-code generation, multilingual coding, and data-science code generation.

Updated: 2024-07-08 08:00:05

Categories: cs.CL,cs.AI,cs.SE

Download: http://arxiv.org/abs/2407.05700v1

Testing AI on language comprehension tasks reveals insensitivity to underlying meaning

Large Language Models (LLMs) are recruited in applications that span from clinical assistance and legal support to question answering and education. Their success in specialized tasks has led to the claim that they possess human-like linguistic capabilities related to compositional understanding and reasoning. Yet, reverse-engineering is bound by Moravec's Paradox, according to which easy skills are hard. We systematically assess 7 state-of-the-art models on a novel benchmark. Models answered a series of comprehension questions, each prompted multiple times in two settings, permitting one-word or open-length replies. Each question targets a short text featuring high-frequency linguistic constructions. To establish a baseline for achieving human-like performance, we tested 400 humans on the same prompts. Based on a dataset of n=26,680 datapoints, we discovered that LLMs perform at chance accuracy and waver considerably in their answers. Quantitatively, the tested models are outperformed by humans, and qualitatively their answers showcase distinctly non-human errors in language understanding. We interpret this evidence as suggesting that, despite their usefulness in various tasks, current AI models fall short of understanding language in a way that matches humans, and we argue that this may be due to their lack of a compositional operator for regulating grammatical and semantic information.

Updated: 2024-07-08 07:58:34

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2302.12313v3

Decrypting Nonlinearity: Koopman Interpretation and Analysis of Cryptosystems

Public-key cryptosystems rely on computationally difficult problems for security, traditionally analyzed using number theory methods. In this paper, we introduce a novel perspective on cryptosystems by viewing the Diffie-Hellman key exchange and the Rivest-Shamir-Adleman cryptosystem as nonlinear dynamical systems. By applying Koopman theory, we transform these dynamical systems into higher-dimensional spaces and analytically derive equivalent purely linear systems. This formulation allows us to reconstruct the secret integers of the cryptosystems through straightforward manipulations, leveraging the tools available for linear systems analysis. Additionally, we establish an upper bound on the minimum lifting dimension required to achieve perfect accuracy. Our results on the required lifting dimension are in line with the intractability of brute-force attacks. To showcase the potential of our approach, we establish connections between our findings and existing results on algorithmic complexity. Furthermore, we extend this methodology to a data-driven context, where the Koopman representation is learned from data samples of the cryptosystems.
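
The paper derives its linear systems analytically, but the flavor of Koopman lifting can be conveyed with a generic extended-DMD sketch in Python/NumPy (a data-driven approximation under an assumed monomial dictionary, not the paper's construction):

    import numpy as np

    def edmd(X, Y, lift):
        # X, Y: (n, d) snapshot pairs with Y[i] = F(X[i])
        # lift: maps (n, d) states to (n, D) observables
        PX, PY = lift(X), lift(Y)
        K, *_ = np.linalg.lstsq(PX, PY, rcond=None)  # PX @ K ≈ PY
        return K  # (D, D) approximate linear Koopman operator

    # toy nonlinear map x' = x**2, lifted to the first four monomials
    lift = lambda X: np.hstack([X ** k for k in range(1, 5)])
    X = np.random.rand(200, 1)
    K = edmd(X, X ** 2, lift)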

Updated: 2024-07-08 07:56:35

Categories: eess.SY,cs.CR,cs.SY,math.DS

Download: http://arxiv.org/abs/2311.12714v2

On the Limitations of Compute Thresholds as a Governance Strategy

At face value, this essay is about understanding a fairly esoteric governance tool called compute thresholds. However, in order to grapple with whether these thresholds will achieve anything, we must first understand how they came to be. This requires engaging with a decades-old debate at the heart of computer science progress, namely, is bigger always better? Hence, this essay may be of interest not only to policymakers and the wider public but also to computer scientists interested in understanding the role of compute in unlocking breakthroughs. Does a certain inflection point of compute result in changes to the risk profile of a model? This discussion is increasingly urgent given the wide adoption of governance approaches that suggest greater compute equates with higher propensity for harm. Several leading frontier AI companies have released responsible scaling policies. Both the White House Executive Orders on AI Safety (EO) and the EU AI Act encode the use of FLOP or floating-point operations as a way to identify more powerful systems. What is striking about the choice of compute thresholds to date is that no models currently deployed in the wild fulfill the current criteria set by the EO. This implies that the emphasis is often not on auditing the risks and harms incurred by currently deployed models - but rather is based upon the belief that future levels of compute will introduce unforeseen new risks. A key conclusion of this essay is that compute thresholds as currently implemented are shortsighted and likely to fail to mitigate risk. Governance that is overly reliant on compute fails to understand that the relationship between compute and risk is highly uncertain and rapidly changing. It also overestimates our ability to predict what abilities emerge at different scales. This essay ends with recommendations for a better way forward.
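
For orientation, compute thresholds are usually stated against an estimate of total training FLOP; a common back-of-the-envelope estimate is $C \approx 6ND$ for $N$ parameters and $D$ training tokens. The snippet below (with an illustrative threshold value, not a quotation of the EO) shows how coarse such a check is:

    def training_flops(n_params, n_tokens):
        # rough dense-transformer estimate: ~6 FLOP per parameter per token
        return 6 * n_params * n_tokens

    THRESHOLD = 1e26  # illustrative reporting threshold
    c = training_flops(7e10, 1.5e13)  # e.g. a 70B-parameter model on 15T tokens
    print(f"{c:.1e} FLOP, above threshold: {c >= THRESHOLD}")  # 6.3e+24, False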

Updated: 2024-07-08 07:53:06

Categories: cs.AI,cs.CL,cs.ET,cs.LG

Download: http://arxiv.org/abs/2407.05694v1

An active learning method for solving competitive multi-agent decision-making and control problems

To identify a stationary action profile for a population of competitive agents, each executing private strategies, we introduce a novel active-learning scheme where a centralized external observer (or entity) can probe the agents' reactions and recursively update simple local parametric estimates of the action-reaction mappings. Under very general working assumptions (not even assuming that a stationary profile exists), sufficient conditions are established to assess the asymptotic properties of the proposed active learning methodology so that, if the parameters characterizing the action-reaction mappings converge, a stationary action profile is achieved. Such conditions hence act also as certificates for the existence of such a profile. Extensive numerical simulations involving typical competitive multi-agent control and decision-making problems illustrate the practical effectiveness of the proposed learning-based approach.
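
As a minimal sketch of the recursive local estimation (assuming hypothetical affine action-reaction maps, not the paper's general setting), the observer can probe an agent and take a gradient step on the squared prediction error:

    import numpy as np

    def probe_and_update(A, b, x, y_obs, lr=0.1):
        # current estimate of the agent's reaction: y ≈ A @ x + b
        err = (A @ x + b) - y_obs       # residual on the probed action x
        A = A - lr * np.outer(err, x)   # gradient step on 0.5 * ||err||**2
        b = b - lr * err
        return A, b

Iterating such probes across agents until the parameters stop moving is, in the spirit of the paper, the certificate that a stationary action profile has been reached.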

Updated: 2024-07-08 07:52:09

Categories: eess.SY,cs.LG,cs.MA,cs.SY,math.OC

Download: http://arxiv.org/abs/2212.12561v4

Sub-SA: Strengthen In-context Learning via Submodular Selective Annotation

In-context learning (ICL) leverages in-context examples as prompts for the predictions of Large Language Models (LLMs). These prompts play a crucial role in achieving strong performance. However, the selection of suitable prompts from a large pool of labeled examples often entails significant annotation costs. To address this challenge, we propose \textbf{Sub-SA} (\textbf{Sub}modular \textbf{S}elective \textbf{A}nnotation), a submodularity-based selective annotation method. The aim of Sub-SA is to reduce annotation costs while improving the quality of in-context examples and minimizing the time consumption of the selection process. In Sub-SA, we design a submodular function that facilitates effective subset selection for annotation and that we show, from a theoretical perspective, to be monotone and submodular. Specifically, we propose \textbf{RPR} (\textbf{R}eward and \textbf{P}enalty \textbf{R}egularization) to better balance the diversity and representativeness of the unlabeled dataset, attributed to a reward term and a penalty term, respectively. Consequently, the selection of annotations can be effectively addressed with a simple yet effective greedy search algorithm based on the submodular function. Finally, we apply similarity prompt retrieval to get the examples for ICL.
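
The exact RPR formulation is in the paper; the sketch below shows only the generic pattern of greedy selection under a reward (coverage) term minus a penalty (redundancy) term, with hypothetical names and a plain similarity matrix:

    import numpy as np

    def greedy_select(sim, k, lam=0.5):
        # sim: (n, n) pairwise similarities of unlabeled examples; k: budget
        selected = []
        covered = np.zeros(sim.shape[0])  # best similarity to the selected set
        for _ in range(k):
            best, best_gain = None, -np.inf
            for j in range(sim.shape[0]):
                if j in selected:
                    continue
                reward = np.maximum(covered, sim[j]).sum()             # representativeness
                penalty = sim[j, selected].sum() if selected else 0.0  # redundancy
                gain = reward - lam * penalty
                if gain > best_gain:
                    best, best_gain = j, gain
            selected.append(best)
            covered = np.maximum(covered, sim[best])
        return selected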

Updated: 2024-07-08 07:47:30

Categories: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2407.05693v1

Pruning Large Language Models to Intra-module Low-rank Architecture with Transitional Activations

Structured pruning fundamentally reduces the computational and memory overheads of large language models (LLMs) and offers a feasible solution for end-side LLM deployment. Structurally pruned models remain dense and high-precision, highly compatible with further tuning and compression. However, as coarse-grained structured pruning inflicts large damage on the highly interconnected model, achieving a high compression ratio for scaled-up LLMs remains a challenge. In this paper, we introduce a task-agnostic structured pruning approach coupled with a compact Transformer architecture design. The proposed approach, named TransAct, reduces transitional activations inside multi-head attention (MHA) and multi-layer perceptron (MLP) modules, while preserving the inter-module activations that are sensitive to perturbations. Hence, the LLM is pruned into an intra-module low-rank architecture, significantly reducing weights, KV Cache and attention computation. TransAct is implemented on the LLaMA model and evaluated on downstream benchmarks. Results verify the optimality of our approach at high compression with respect to both efficiency and performance. Further, ablation studies reveal the strength of activation-guided iterative pruning and provide experimental analysis on the redundancy of MHA and MLP modules.

Updated: 2024-07-08 07:45:38

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2407.05690v1

Ten Years of Teaching Empirical Software Engineering in the context of Energy-efficient Software

In this chapter we share our experience in running ten editions of the Green Lab course at the Vrije Universiteit Amsterdam, the Netherlands. The course is given in the Software Engineering and Green IT track of the Computer Science Master program of the VU. The course takes place every year over a 2-month period and teaches Computer Science students the fundamentals of Empirical Software Engineering in the context of energy-efficient software. The peculiarity of the course is its research orientation: at the beginning of the course the instructor presents a catalog of scientifically relevant goals, and each team of students signs up for one of them and works together for 2 months on their own experiment for achieving the goal. Each team goes over the classic steps of an empirical study, starting from a precise formulation of the goal and research questions to context definition, selection of experimental subjects and objects, definition of experimental variables, experiment execution, data analysis, and reporting. Over the years, the course became well-known within the Software Engineering community since it led to several scientific studies that have been published at various scientific conferences and journals. Also, students execute their experiments using \textit{open-source tools}, which are developed and maintained by researchers and other students within the program, thus creating a virtuous community of learners where students exchange ideas, help each other, and learn how to collaboratively contribute to open-source projects in a safe environment.

Updated: 2024-07-08 07:44:49

Categories: cs.SE,cs.AI,cs.DC

Download: http://arxiv.org/abs/2407.05689v1

Learning with Alignments: Tackling the Inter- and Intra-domain Shifts for Cross-multidomain Facial Expression Recognition

Facial Expression Recognition (FER) holds significant importance in human-computer interactions. Existing cross-domain FER methods often transfer knowledge solely from a single labeled source domain to an unlabeled target domain, neglecting the comprehensive information across multiple sources. Nevertheless, cross-multidomain FER (CMFER) is very challenging due to (i) the inherent inter-domain shifts across multiple domains and (ii) the intra-domain shifts stemming from ambiguous expressions and low inter-class distinctions. In this paper, we propose a novel Learning with Alignments CMFER framework, named LA-CMFER, to handle both inter- and intra-domain shifts. Specifically, LA-CMFER is constructed with a global branch and a local branch to extract features from the full images and local subtle expressions, respectively. Based on this, LA-CMFER presents a dual-level inter-domain alignment method to force the model to prioritize hard-to-align samples in knowledge transfer at a sample level while gradually generating a well-clustered feature space with the guidance of class attributes at a cluster level, thus narrowing the inter-domain shifts. To address the intra-domain shifts, LA-CMFER introduces a multi-view intra-domain alignment method with a multi-view clustering consistency constraint where a prediction similarity matrix is built to pursue consistency between the global and local views, thus refining pseudo labels and eliminating latent noise. Extensive experiments on six benchmark datasets have validated the superiority of our LA-CMFER.

Updated: 2024-07-08 07:43:06

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2407.05688v1

Multi-Fidelity Bayesian Neural Network for Uncertainty Quantification in Transonic Aerodynamic Loads

Multi-fidelity models are becoming more prevalent in engineering, particularly in aerospace, as they combine the computational efficiency of low-fidelity models with the high accuracy of higher-fidelity simulations. Various state-of-the-art techniques exist for fusing data from different fidelity sources, including Co-Kriging and transfer learning in neural networks. This paper aims to implement a multi-fidelity Bayesian neural network model that applies transfer learning to fuse data generated by models at different fidelities. Bayesian neural networks use probability distributions over network weights, enabling them to provide predictions along with estimates of their confidence. This approach harnesses the predictive and data fusion capabilities of neural networks while also quantifying uncertainty. The results demonstrate that the multi-fidelity Bayesian model outperforms the state-of-the-art Co-Kriging in terms of overall accuracy and robustness on unseen data.
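
A minimal transfer-learning analogue (deliberately non-Bayesian, so it illustrates only the data-fusion part, with made-up data) can be written with scikit-learn's warm_start: pre-train on abundant low-fidelity samples, then continue training on scarce high-fidelity ones. The paper's model additionally places distributions over the weights to obtain uncertainty estimates.

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    x_lo = np.random.rand(500, 1); y_lo = np.sin(6 * x_lo).ravel()  # cheap model
    x_hi = np.random.rand(40, 1);  y_hi = np.sin(6 * x_hi).ravel() + 0.3 * x_hi.ravel()

    net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500, warm_start=True)
    net.fit(x_lo, y_lo)   # stage 1: pre-train on low-fidelity data
    net.max_iter = 200
    net.fit(x_hi, y_hi)   # stage 2: transfer to high-fidelity data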

Updated: 2024-07-08 07:34:35

Categories: cs.CE,cs.LG

Download: http://arxiv.org/abs/2407.05684v1

RadiomicsFill-Mammo: Synthetic Mammogram Mass Manipulation with Radiomics Features

Motivated by the question, "Can we generate tumors with desired attributes?'' this study leverages radiomics features to explore the feasibility of generating synthetic tumor images. Characterized by its low-dimensional yet biologically meaningful markers, radiomics bridges the gap between complex medical imaging data and actionable clinical insights. We present RadiomicsFill-Mammo, the first of the RadiomicsFill series, an innovative technique that generates realistic mammogram mass images mirroring specific radiomics attributes using masked images and opposite breast images, leveraging a recent stable diffusion model. This approach also allows for the incorporation of essential clinical variables, such as BI-RADS and breast density, alongside radiomics features as conditions for mass generation. Results indicate that RadiomicsFill-Mammo effectively generates diverse and realistic tumor images based on various radiomics conditions. Results also demonstrate a significant improvement in mass detection capabilities, leveraging RadiomicsFill-Mammo as a strategy to generate simulated samples. Furthermore, RadiomicsFill-Mammo not only advances medical imaging research but also opens new avenues for enhancing treatment planning and tumor simulation. Our code is available at https://github.com/nainye/RadiomicsFill.

Updated: 2024-07-08 07:33:52

Categories: eess.IV,cs.AI,cs.CV

Download: http://arxiv.org/abs/2407.05683v1

SemEval-2024 Task 3: Multimodal Emotion Cause Analysis in Conversations

The ability to understand emotions is an essential component of human-like artificial intelligence, as emotions greatly influence human cognition, decision making, and social interactions. In addition to emotion recognition in conversations, the task of identifying the potential causes behind an individual's emotional state in conversations, is of great importance in many application scenarios. We organize SemEval-2024 Task 3, named Multimodal Emotion Cause Analysis in Conversations, which aims at extracting all pairs of emotions and their corresponding causes from conversations. Under different modality settings, it consists of two subtasks: Textual Emotion-Cause Pair Extraction in Conversations (TECPE) and Multimodal Emotion-Cause Pair Extraction in Conversations (MECPE). The shared task has attracted 143 registrations and 216 successful submissions. In this paper, we introduce the task, dataset and evaluation settings, summarize the systems of the top teams, and discuss the findings of the participants.

Updated: 2024-07-08 07:32:28

Categories: cs.CL,cs.AI,cs.MM

Download: http://arxiv.org/abs/2405.13049v3

Fine-Grained Multi-View Hand Reconstruction Using Inverse Rendering

Reconstructing high-fidelity hand models with intricate textures plays a crucial role in enhancing human-object interaction and advancing real-world applications. Despite state-of-the-art methods excelling in texture generation and image rendering, they often face challenges in accurately capturing geometric details. Learning-based approaches usually offer better robustness and faster inference, but they tend to produce smoother results and require substantial amounts of training data. To address these issues, we present a novel fine-grained multi-view hand mesh reconstruction method that leverages inverse rendering to restore hand poses and intricate details. Firstly, our approach predicts a parametric hand mesh model through a Graph Convolutional Network (GCN) based method from multi-view images. We further introduce a novel Hand Albedo and Mesh (HAM) optimization module to refine both the hand mesh and textures, which is capable of preserving the mesh topology. In addition, we suggest an effective mesh-based neural rendering scheme to simultaneously generate photo-realistic images and optimize mesh geometry by fusing the pre-trained rendering network with vertex features. We conduct comprehensive experiments on InterHand2.6M, DeepHandMesh and a dataset collected by ourselves; the promising results show that our proposed approach outperforms state-of-the-art methods on both reconstruction accuracy and rendering quality. Code and dataset are publicly available at https://github.com/agnJason/FMHR.

Updated: 2024-07-08 07:28:24

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2407.05680v1

BEVWorld: A Multimodal World Model for Autonomous Driving via Unified BEV Latent Space

World models are receiving increasing attention in autonomous driving for their ability to predict potential future scenarios. In this paper, we present BEVWorld, a novel approach that tokenizes multimodal sensor inputs into a unified and compact Bird's Eye View (BEV) latent space for environment modeling. The world model consists of two parts: the multi-modal tokenizer and the latent BEV sequence diffusion model. The multi-modal tokenizer first encodes multi-modality information and the decoder is able to reconstruct the latent BEV tokens into LiDAR and image observations by ray-casting rendering in a self-supervised manner. Then the latent BEV sequence diffusion model predicts future scenarios given action tokens as conditions. Experiments demonstrate the effectiveness of BEVWorld in autonomous driving tasks, showcasing its capability in generating future scenes and benefiting downstream tasks such as perception and motion prediction. Code will be available at https://github.com/zympsyche/BevWorld.

Updated: 2024-07-08 07:26:08

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2407.05679v1

Machine unlearning through fine-grained model parameters perturbation

Machine unlearning techniques, which involve retracting data records and reducing the influence of said data on trained models, help with the user privacy protection objective but incur significant computational costs. Weight perturbation-based unlearning is a general approach, but it typically involves globally modifying the parameters. We propose fine-grained Top-K and Random-k parameter-perturbed inexact machine unlearning strategies that address the privacy needs while keeping the computational costs tractable. In order to demonstrate the efficacy of our strategies, we also tackle the challenge of evaluating the effectiveness of machine unlearning by considering the model's generalization performance across both unlearning and remaining data. To better assess the unlearning effect and model generalization, we propose novel metrics, namely, the forgetting rate and memory retention rate. However, for inexact machine unlearning, current metrics are inadequate in quantifying the degree of forgetting that occurs after unlearning strategies are applied. To address this, we introduce SPD-GAN, which subtly perturbs the distribution of data targeted for unlearning. Then, we evaluate the degree of unlearning by measuring the performance difference of the models on the perturbed unlearning data before and after the unlearning process. By implementing these innovative techniques and metrics, we achieve computationally efficient privacy protection in machine learning applications without significant sacrifice of model performance. Furthermore, this approach provides a novel method for evaluating the degree of unlearning.
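
A minimal PyTorch sketch of the Top-K idea (using gradient magnitude on the forget set as the influence proxy; all names are illustrative assumptions, not the authors' code):

    import torch

    def topk_perturb(model, forget_loss, k_frac=0.01, sigma=0.01):
        # perturb only the k_frac most influential parameters per tensor
        model.zero_grad()
        forget_loss.backward()
        with torch.no_grad():
            for p in model.parameters():
                if p.grad is None:
                    continue
                g = p.grad.abs().view(-1)
                k = max(1, int(k_frac * g.numel()))
                idx = torch.topk(g, k).indices  # top-k coordinates by |grad|
                p.view(-1)[idx] += sigma * torch.randn(k, device=p.device)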

Updated: 2024-07-08 07:24:49

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2401.04385v3

LLM-Based Open-Domain Integrated Task and Knowledge Assistants with Programmable Policies

Programming LLM-based knowledge and task assistants that faithfully conform to developer-provided policies is challenging. These agents must retrieve and provide consistent, accurate, and relevant information to address user's queries and needs. Yet such agents generate unfounded responses ("hallucinate"). Traditional dialogue trees can only handle a limited number of conversation flows, making them inherently brittle. To this end, we present KITA - a programmable framework for creating task-oriented conversational agents that are designed to handle complex user interactions. Unlike LLMs, KITA provides reliable grounded responses, with controllable agent policies through its expressive specification, KITA Worksheet. In contrast to dialog trees, it is resilient to diverse user queries, helpful with knowledge sources, and offers ease of programming policies through its declarative paradigm. Through a real-user study involving 62 participants, we show that KITA beats the GPT-4 with function calling baseline by 26.1, 22.5, and 52.4 points on execution accuracy, dialogue act accuracy, and goal completion rate, respectively. We also release 22 real-user conversations with KITA manually corrected to ensure accuracy.

Updated: 2024-07-08 07:17:40

Categories: cs.AI,cs.CL,cs.PL

Download: http://arxiv.org/abs/2407.05674v1

MSTF: Multiscale Transformer for Incomplete Trajectory Prediction

Motion forecasting plays a pivotal role in autonomous driving systems, enabling vehicles to execute collision warnings and rational local-path planning based on predictions of the surrounding vehicles. However, prevalent methods often assume completely observed trajectories, neglecting the potential impact of missing values induced by object occlusion, scope limitation, and sensor failures. Such oversights inevitably compromise the accuracy of trajectory predictions. To tackle this challenge, we propose an end-to-end framework, termed Multiscale Transformer (MSTF), meticulously crafted for incomplete trajectory prediction. MSTF integrates a Multiscale Attention Head (MAH) and an Information Increment-based Pattern Adaptive (IIPA) module. Specifically, the MAH component concurrently captures multiscale motion representation of trajectory sequence from various temporal granularities, utilizing a multi-head attention mechanism. This approach facilitates the modeling of global dependencies in motion across different scales, thereby mitigating the adverse effects of missing values. Additionally, the IIPA module adaptively extracts continuity representation of motion across time steps by analyzing missing patterns in the data. The continuity representation delineates motion trend at a higher level, guiding MSTF to generate predictions consistent with motion continuity. We evaluate our proposed MSTF model using two large-scale real-world datasets. Experimental results demonstrate that MSTF surpasses state-of-the-art (SOTA) models in the task of incomplete trajectory prediction, showcasing its efficacy in addressing the challenges posed by missing values in motion forecasting for autonomous driving systems.

Updated: 2024-07-08 07:10:17

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2407.05671v1

Fractional Budget Allocation for Influence Maximization under General Marketing Strategies

We consider the fractional influence maximization problem, i.e., identifying users on a social network to be incentivized with potentially partial discounts to maximize the influence on the network. The larger the discount given to a user, the higher the likelihood that it activates (adopts a new product or innovation) and then attempts to activate its neighboring users, causing a cascade effect of influence through the network. Our goal is to devise efficient algorithms that assign initial discounts to the network's users to maximize the total number of activated users at the end of the cascade, subject to a constraint on the total sum of discounts given. In general, the activation likelihood could be any non-decreasing function of the discount, whereas our focus lies on the case when the activation likelihood is an affine function of the discount, potentially varying across different users. As this problem is shown to be NP-hard, we propose and analyze an efficient (1-1/e)-approximation algorithm. Furthermore, we run experiments on real-world social networks to show the performance and scalability of our method.
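
A greedy, incremental heuristic (not the paper's (1-1/e)-approximation algorithm, whose analysis is more involved) conveys the fractional allocation with affine activation likelihoods; w[i] is a hypothetical estimate of the expected cascade size seeded by user i:

    def fractional_greedy(a, b, w, budget, step=0.05):
        # activation likelihood of user i at discount c: min(1, a[i]*c + b[i])
        n = len(a)
        c, spent = [0.0] * n, 0.0
        prob = lambda i, ci: min(1.0, a[i] * ci + b[i])
        while spent + step <= budget + 1e-9:
            # marginal gain of one more increment of discount to each user
            gains = [(prob(i, c[i] + step) - prob(i, c[i])) * w[i] for i in range(n)]
            i = max(range(n), key=gains.__getitem__)
            c[i] += step
            spent += step
        return c

    print(fractional_greedy(a=[1.0, 0.5], b=[0.1, 0.4], w=[3.0, 2.0], budget=1.0))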

Updated: 2024-07-08 07:09:11

Categories: cs.SI,cs.AI,cs.DS,stat.ML,05C85, 60J60, 68R05, 68R10, 68T01, 90C27, 90C35,F.2.2; G.1.2; G.1.6; G.2.1; G.2.2; G.3; I.2.0; J.4

Download: http://arxiv.org/abs/2407.05669v1

How DNNs break the Curse of Dimensionality: Compositionality and Symmetry Learning

We show that deep neural networks (DNNs) can efficiently learn any composition of functions with bounded $F_{1}$-norm, which allows DNNs to break the curse of dimensionality in ways that shallow networks cannot. More specifically, we derive a generalization bound that combines a covering number argument for compositionality, and the $F_{1}$-norm (or the related Barron norm) for large width adaptivity. We show that the global minimizer of the regularized loss of DNNs can fit for example the composition of two functions $f^{*}=h\circ g$ from a small number of observations, assuming $g$ is smooth/regular and reduces the dimensionality (e.g. $g$ could be the modulo map of the symmetries of $f^{*}$), so that $h$ can be learned in spite of its low regularity. The measure of regularity we consider is the Sobolev norm with different levels of differentiability, which is well adapted to the $F_{1}$ norm. We compute scaling laws empirically and observe phase transitions depending on whether $g$ or $h$ is harder to learn, as predicted by our theory.

Updated: 2024-07-08 06:59:29

Categories: stat.ML,cs.AI,cs.LG

Download: http://arxiv.org/abs/2407.05664v1

Revisiting Graph-Based Fraud Detection in Sight of Heterophily and Spectrum

Graph-based fraud detection (GFD) can be regarded as a challenging semi-supervised node binary classification task. In recent years, Graph Neural Networks (GNN) have been widely applied to GFD, characterizing the anomalous possibility of a node by aggregating neighbor information. However, fraud graphs are inherently heterophilic, thus most GNNs perform poorly due to their assumption of homophily. In addition, due to the existence of heterophily and the class imbalance problem, the existing models do not fully utilize the precious node label information. To address the above issues, this paper proposes a semi-supervised GNN-based fraud detector, SEC-GFD. This detector includes a hybrid filtering module and a local environmental constraint module; the two modules are utilized to solve the heterophily and label utilization problems, respectively. The first module starts from the perspective of the spectral domain, and solves the heterophily problem to a certain extent. Specifically, it divides the spectrum into various mixed-frequency bands based on the correlation between spectrum energy distribution and heterophily. Then, in order to make full use of the node label information, a local environmental constraint module is adaptively designed. The comprehensive experimental results on four real-world fraud detection datasets show that SEC-GFD outperforms other competitive graph-based fraud detectors. We release our code at https://github.com/Sunxkissed/SEC-GFD.

Updated: 2024-07-08 06:54:37

Categories: cs.LG,cs.AI,cs.SI

Download: http://arxiv.org/abs/2312.06441v3

Random Features Hopfield Networks generalize retrieval to previously unseen examples

It has been recently shown that a learning transition happens when a Hopfield Network stores examples generated as superpositions of random features, where new attractors corresponding to such features appear in the model. In this work we reveal that the network also develops attractors corresponding to previously unseen examples generated with the same set of features. We explain this surprising behaviour in terms of spurious states of the learned features: we argue that, increasing the number of stored examples beyond the learning transition, the model also learns to mix the features to represent both stored and previously unseen examples. We support this claim with the computation of the phase diagram of the model.
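
A toy numerical check in the spirit of the model (arbitrary sizes, not the paper's phase-diagram computation) stores sign-superpositions of random features with a Hebbian rule and then runs retrieval from a corrupted, previously unseen superposition:

    import numpy as np

    rng = np.random.default_rng(0)
    N, F, P = 500, 21, 200                      # neurons, features, stored examples
    feats = rng.choice([-1, 1], size=(F, N))    # random binary features
    coeffs = rng.choice([-1, 1], size=(P, F))
    examples = np.sign(coeffs @ feats)          # stored examples: superpositions

    J = examples.T @ examples / N               # Hebbian couplings
    np.fill_diagonal(J, 0)

    unseen = np.sign(rng.choice([-1, 1], size=F) @ feats)  # new superposition
    x = np.where(rng.random(N) < 0.1, -unseen, unseen)     # flip 10% of spins
    for _ in range(20):
        x = np.sign(J @ x + 1e-9)
    print("overlap with unseen example:", x @ unseen / N)  # near 1 if retrieved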

Updated: 2024-07-08 06:35:13

Categories: cond-mat.dis-nn,cs.LG,cs.NE

Download: http://arxiv.org/abs/2407.05658v1

Multi-label Learning with Random Circular Vectors

The extreme multi-label classification~(XMC) task involves learning a classifier that can predict from a large label set the most relevant subset of labels for a data instance. While deep neural networks~(DNNs) have demonstrated remarkable success in XMC problems, the task is still challenging because it must deal with a large number of output labels, which make the DNN training computationally expensive. This paper addresses the issue by exploring the use of random circular vectors, where each vector component is represented as a complex amplitude. In our framework, we can develop an output layer and loss function of DNNs for XMC by representing the final output layer as a fully connected layer that directly predicts a low-dimensional circular vector encoding a set of labels for a data instance. We conducted experiments on synthetic datasets to verify that circular vectors have better label encoding capacity and retrieval ability than normal real-valued vectors. Then, we conducted experiments on actual XMC datasets and found that these appealing properties of circular vectors contribute to significant improvements in task performance compared with a previous model using random real-valued vectors, while reducing the size of the output layers by up to 99%.
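
A bare-bones sketch of the encoding idea (random phases as unit complex amplitudes; hypothetical sizes, and none of the paper's training machinery):

    import numpy as np

    rng = np.random.default_rng(0)
    n_labels, dim = 10_000, 256
    # one random circular vector per label: unit complex amplitudes e^{i*phase}
    label_vecs = np.exp(1j * rng.uniform(0, 2 * np.pi, size=(n_labels, dim)))

    def encode(labels):
        # a label set is the superposition of its circular vectors
        return label_vecs[labels].sum(axis=0)

    def decode(v, k):
        # rank labels by the real part of the inner product with v
        scores = (label_vecs.conj() @ v).real
        return np.argsort(scores)[::-1][:k]

    print(decode(encode([3, 42, 777]), 3))  # recovers the encoded label set w.h.p.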

Updated: 2024-07-08 06:29:46

Categories: cs.LG,cs.CL

Download: http://arxiv.org/abs/2407.05656v1

Problem-Solving in Language Model Networks

To improve the reasoning and question-answering capabilities of Large Language Models (LLMs), several multi-agent approaches have been introduced. While these methods enhance performance, the application of collective intelligence-based approaches to complex network structures and the dynamics of agent interactions remain underexplored. This work extends the concept of multi-agent debate to more general network topologies, measuring the question-answering accuracy, influence, consensus, and the effects of bias on the collective. The results show that random networks perform similarly to fully connected networks despite using significantly fewer tokens. Furthermore, a strong consensus among agents correlates with correct answers, whereas divided responses typically indicate incorrect answers. Analysing the influence of the agents reveals a balance between self-reflection and interconnectedness; self-reflection aids when local interactions are incorrect, and local interactions aid when the agent itself is incorrect. Additionally, bias plays a strong role in system performance, with correctly biased hub nodes boosting performance. These insights suggest that using random networks or scale-free networks with knowledgeable agents placed in central positions can enhance the overall performance of multi-agent systems.

Updated: 2024-07-08 06:23:18

Categories: cs.AI,cs.SI

Download: http://arxiv.org/abs/2406.12374v2

The Dynamic Net Architecture: Learning Robust and Holistic Visual Representations Through Self-Organizing Networks

We present a novel intelligent-system architecture called "Dynamic Net Architecture" (DNA) that relies on recurrence-stabilized networks and discuss it in application to vision. Our architecture models a (cerebral cortical) area wherein elementary feature neurons encode details of visual structures, and coherent nets of such neurons model holistic object structures. By interpreting smaller or larger coherent pieces of an area network as complex features, our model encodes hierarchical feature representations essentially different than artificial neural networks (ANNs). DNA models operate on a dynamic connectionism principle, wherein neural activations stemming from initial afferent signals undergo stabilization through a self-organizing mechanism facilitated by Hebbian plasticity alongside periodically tightening inhibition. In contrast to ANNs, which rely on feed-forward connections and backpropagation of error, we posit that this processing paradigm leads to highly robust representations, as by employing dynamic lateral connections, irrelevant details in neural activations are filtered out, freeing further processing steps from distracting noise and premature decisions. We empirically demonstrate the viability of the DNA by composing line fragments into longer lines and show that the construction of nets representing lines remains robust even with the introduction of up to $59\%$ noise at each spatial location. Furthermore, we demonstrate the model's capability to reconstruct anticipated features from partially obscured inputs and that it can generalize to patterns not observed during training. In this work, we limit the DNA to one cortical area and focus on its internals while providing insights into a standalone area's strengths and shortcomings. Additionally, we provide an outlook on how future work can implement invariant object recognition by combining multiple areas.

Updated: 2024-07-08 06:22:10

Categories: cs.CV,cs.AI,cs.LG,cs.NE

Download: http://arxiv.org/abs/2407.05650v1

Graph Attention with Random Rewiring

Graph Neural Networks (GNNs) have become fundamental in graph-structured deep learning. Key paradigms of modern GNNs include message passing, graph rewiring, and Graph Transformers. This paper introduces Graph-Rewiring Attention with Stochastic Structures (GRASS), a novel GNN architecture that combines the advantages of these three paradigms. GRASS rewires the input graph by superimposing a random regular graph, enhancing long-range information propagation while preserving structural features of the input graph. It also employs a unique additive attention mechanism tailored for graph-structured data, providing a graph inductive bias while remaining computationally efficient. Our empirical evaluations demonstrate that GRASS achieves state-of-the-art performance on multiple benchmark datasets, confirming its practical efficacy.
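
The rewiring step alone is easy to sketch with networkx (the attention mechanism is not shown; names are illustrative):

    import networkx as nx

    def rewire_with_random_regular(g, degree=3, seed=0):
        # superimpose a random regular graph on the input graph's node set
        nodes = list(g.nodes)
        rr = nx.random_regular_graph(degree, len(nodes), seed=seed)
        out = g.copy()
        for u, v in rr.edges:
            out.add_edge(nodes[u], nodes[v], rewired=True)  # tag added edges
        return out

    g = nx.path_graph(8)                    # a long path: poor long-range flow
    h = rewire_with_random_regular(g)
    print(nx.diameter(g), nx.diameter(h))   # diameter typically shrinks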

Updated: 2024-07-08 06:21:56

Categories: cs.LG,cs.AI,cs.NE

Download: http://arxiv.org/abs/2407.05649v1

Deep Learning-based Anomaly Detection and Log Analysis for Computer Networks

Computer network anomaly detection and log analysis, as an important topic in the field of network security, has been a key task to ensure network security and system reliability. Existing network anomaly detection and log analysis methods are often challenged by high-dimensional data and complex network topologies, resulting in unstable performance and high false-positive rates. In addition, traditional methods usually struggle with time-series data, which is crucial for anomaly detection and log analysis. Therefore, a more efficient and accurate method is needed to cope with these problems. To compensate for the shortcomings of current methods, we propose an innovative fusion model that integrates Isolation Forest, GAN (Generative Adversarial Network), and Transformer, each of which plays a unique role. Isolation Forest is used to quickly identify anomalous data points, GAN is used to generate synthetic data that matches the distribution characteristics of the real data to augment the training dataset, and the Transformer is used for modeling and context extraction on time-series data. The synergy of these three components makes our model more accurate and robust in anomaly detection and log analysis tasks. We validate the effectiveness of this fusion model in an extensive experimental evaluation. Experimental results show that our model significantly improves the accuracy of anomaly detection while reducing the false alarm rate, which helps to detect potential network problems in advance. The model also performs well in the log analysis task and is able to quickly identify anomalous behaviors, which helps to improve the stability of the system. The significance of this study is that it introduces advanced deep learning techniques that jointly address anomaly detection and log analysis.
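
Only the first stage of the fusion is sketched below (scikit-learn's Isolation Forest on made-up flow features); the GAN augmentation and Transformer stages are omitted:

    import numpy as np
    from sklearn.ensemble import IsolationForest

    rng = np.random.default_rng(0)
    normal = rng.normal(0, 1, size=(1000, 3))   # (bytes, packets, duration), say
    attacks = rng.normal(6, 1, size=(20, 3))
    X = np.vstack([normal, attacks])

    clf = IsolationForest(contamination=0.02, random_state=0).fit(X)
    labels = clf.predict(X)                     # -1 = anomalous, +1 = normal
    print("attacks flagged:", (labels[-20:] == -1).mean())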

Updated: 2024-07-08 06:07:51

Categories: cs.LG,cs.CR

Download: http://arxiv.org/abs/2407.05639v1

Analysis and Predictive Modeling of Solar Coronal Holes Using Computer Vision and LSTM Networks

In the era of space exploration, coronal holes on the sun play a significant role due to their impact on satellites and aircraft through their open magnetic fields and increased solar wind emissions. This study employs computer vision techniques to detect coronal hole regions and estimate their sizes using imagery from the Solar Dynamics Observatory (SDO). Additionally, we utilize deep learning methods, specifically Long Short-Term Memory (LSTM) networks, to analyze trends in the area of coronal holes and predict their areas across various solar regions over a span of seven days. By examining time series data, we aim to identify patterns in coronal hole behavior and understand their potential effects on space weather. This research enhances our ability to anticipate and prepare for space weather events that could affect Earth's technological systems.

Updated: 2024-07-08 06:06:34

Categories: astro-ph.SR,astro-ph.EP,cs.AI,cs.LG

Download: http://arxiv.org/abs/2405.09802v2

AdaPI: Facilitating DNN Model Adaptivity for Efficient Private Inference in Edge Computing

Private inference (PI) has emerged as a promising solution to execute computations on encrypted data, safeguarding user privacy and model parameters in edge computing. However, existing PI methods are predominantly developed considering constant resource constraints, overlooking the varied and dynamic resource constraints in diverse edge devices, like energy budgets. Consequently, model providers have to design specialized models for different devices, all of which have to be stored on the edge server, resulting in inefficient deployment. To fill this gap, this work presents AdaPI, a novel approach that achieves adaptive PI by allowing a model to perform well across edge devices with diverse energy budgets. AdaPI employs a PI-aware training strategy that optimizes the model weights alongside weight-level and feature-level soft masks. These soft masks are subsequently transformed into multiple binary masks to enable adjustments in communication and computation workloads. Through sequentially training the model with increasingly dense binary masks, AdaPI attains optimal accuracy for each energy budget, which outperforms the state-of-the-art PI methods by 7.3\% in terms of test accuracy on CIFAR-100. The code of AdaPI can be accessed via https://github.com/jiahuiiiiii/AdaPI.
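
A minimal PyTorch sketch of turning one learned soft mask into several binary masks, one per (hypothetical) energy budget; thresholds are chosen by keep fraction:

    import torch

    def binarize(soft_mask, keep_fracs=(0.25, 0.5, 0.75, 1.0)):
        flat = soft_mask.view(-1)
        masks = []
        for f in keep_fracs:
            k = max(1, int(f * flat.numel()))
            thresh = torch.topk(flat, k).values.min()   # k-th largest score
            masks.append((soft_mask >= thresh).float())
        return masks

    soft = torch.rand(64, 64)          # stand-in for a learned soft mask
    for m in binarize(soft):
        print(m.mean().item())         # ≈ 0.25, 0.5, 0.75, 1.0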

Updated: 2024-07-08 05:58:49

Categories: cs.LG,cs.CR

Download: http://arxiv.org/abs/2407.05633v1

Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents

We investigate the challenge of task planning for multi-task embodied agents in open-world environments. Two main difficulties are identified: 1) executing plans in an open-world environment (e.g., Minecraft) necessitates accurate and multi-step reasoning due to the long-term nature of tasks, and 2) as vanilla planners do not consider how easy the current agent can achieve a given sub-task when ordering parallel sub-goals within a complicated plan, the resulting plan could be inefficient or even infeasible. To this end, we propose "$\underline{D}$escribe, $\underline{E}$xplain, $\underline{P}$lan and $\underline{S}$elect" ($\textbf{DEPS}$), an interactive planning approach based on Large Language Models (LLMs). DEPS facilitates better error correction on initial LLM-generated $\textit{plan}$ by integrating $\textit{description}$ of the plan execution process and providing self-$\textit{explanation}$ of feedback when encountering failures during the extended planning phases. Furthermore, it includes a goal $\textit{selector}$, which is a trainable module that ranks parallel candidate sub-goals based on the estimated steps of completion, consequently refining the initial plan. Our experiments mark the milestone of the first zero-shot multi-task agent that can robustly accomplish 70+ Minecraft tasks and nearly double the overall performances. Further testing reveals our method's general effectiveness in popularly adopted non-open-ended domains as well (i.e., ALFWorld and tabletop manipulation). The ablation and exploratory studies detail how our design beats the counterparts and provide a promising update on the $\texttt{ObtainDiamond}$ grand challenge with our approach. The code is released at https://github.com/CraftJarvis/MC-Planner.

Updated: 2024-07-08 05:56:47

领域: cs.AI

下载: http://arxiv.org/abs/2302.01560v3

Accelerating Communication in Deep Learning Recommendation Model Training with Dual-Level Adaptive Lossy Compression

DLRM is a state-of-the-art recommendation system model that has gained widespread adoption across various industry applications. The large size of DLRM models, however, necessitates the use of multiple devices/GPUs for efficient training. A significant bottleneck in this process is the time-consuming all-to-all communication required to collect embedding data from all devices. To mitigate this, we introduce a method that employs error-bounded lossy compression to reduce the communication data size and accelerate DLRM training. We develop a novel error-bounded lossy compression algorithm, informed by an in-depth analysis of embedding data features, to achieve high compression ratios. Moreover, we introduce a dual-level adaptive strategy for error-bound adjustment, spanning both table-wise and iteration-wise aspects, to balance the compression benefits with the potential impacts on accuracy. We further optimize our compressor for PyTorch tensors on GPUs, minimizing compression overhead. Evaluation shows that our method achieves a 1.38$\times$ training speedup with a minimal accuracy impact.
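
The core idea of error-bounded lossy compression can be illustrated with a toy uniform quantizer; the paper's compressor is more elaborate and adapts the bound per table and per iteration, so this Python sketch only conveys the guarantee that reconstruction error never exceeds the bound.

import numpy as np

def compress(x: np.ndarray, error_bound: float) -> np.ndarray:
    return np.round(x / (2 * error_bound)).astype(np.int32)   # small-int codes

def decompress(codes: np.ndarray, error_bound: float) -> np.ndarray:
    return codes.astype(np.float32) * (2 * error_bound)

emb = np.random.randn(4, 8).astype(np.float32)     # toy embedding data
codes = compress(emb, error_bound=1e-2)
rec = decompress(codes, error_bound=1e-2)
assert np.max(np.abs(rec - emb)) <= 1e-2 + 1e-7    # the error bound holds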

Updated: 2024-07-08 05:53:10

标题: 使用双层自适应损失压缩加速深度学习推荐模型训练中的通信

摘要: DLRM是一种最先进的推荐系统模型,在各种行业应用中得到了广泛采用。然而,DLRM模型的庞大规模需要使用多个设备/GPU进行高效训练。在这个过程中的一个重要瓶颈是需要耗时的全对全(all-to-all)通信,以收集来自所有设备的嵌入数据。为了缓解这一问题,我们引入了一种方法,利用误差有界的有损压缩来减小通信数据大小,加速DLRM训练。我们开发了一种新颖的误差有界的有损压缩算法,通过深入分析嵌入数据特征,实现了高压缩比。此外,我们引入了一个双层自适应策略,用于误差界限的调整,涵盖表级和迭代级两个方面,以平衡压缩带来的好处与对准确性的潜在影响。我们进一步优化了我们的压缩器,使其适用于GPU上的PyTorch张量,最大限度地减少了压缩开销。评估结果显示,我们的方法实现了1.38倍的训练加速,对准确性的影响极小。


更新时间: 2024-07-08 05:53:10

领域: cs.LG,cs.DC

下载: http://arxiv.org/abs/2407.04272v2

New Directions in Text Classification Research: Maximizing The Performance of Sentiment Classification from Limited Data

Stakeholders in sentiment analysis, whatever the issue and whether the sentiment is positive or negative, need both speed and accuracy. One new challenge in sentiment analysis tasks is the limited training data, which often leads to suboptimal machine learning models and poor performance on test data. This paper discusses the problem of text classification based on limited training data (300 to 600 samples) into three classes: positive, negative, and neutral. A benchmark dataset is provided for training and testing data on the issue of Kaesang Pangarep's appointment as Chairman of PSI. External data for aggregation and augmentation purposes are provided, consisting of two datasets: the topic of Covid Vaccination sentiment and an open topic. The official score used is the F1-score, which balances precision and recall among the three classes, positive, negative, and neutral. A baseline score is provided as a reference for researchers for unoptimized classification methods. The optimized score is provided as a reference for the target score to be achieved by any proposed method. Both scores (baseline and optimized) are obtained with the SVM method, which is widely reported as the state of the art among conventional machine learning methods. The F1-scores achieved by the baseline and optimized methods are 40.83% and 51.28%, respectively.
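
A baseline of this kind can be reproduced in a few lines; the sketch below assumes a TF-IDF representation and macro-averaged F1, and the toy texts and labels are illustrative only.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.metrics import f1_score

texts = ["great decision", "bad appointment", "no opinion", "strongly support"]
labels = ["positive", "negative", "neutral", "positive"]

X = TfidfVectorizer().fit_transform(texts)       # bag-of-words features
clf = LinearSVC().fit(X, labels)                 # linear SVM classifier
pred = clf.predict(X)
print(f1_score(labels, pred, average="macro"))   # macro-F1 balances the 3 classes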

Updated: 2024-07-08 05:42:29

标题: 文本分类研究的新方向:最大化有限数据情感分类的性能

摘要: 情感分析中各种问题的利益相关者的需求,无论是积极的还是消极的,都是速度和准确性。情感分析任务中的一个新挑战是有限的训练数据,这经常导致次优的机器学习模型和测试数据的表现不佳。本文讨论了基于有限训练数据(300至600个样本)进行文本分类的问题,分为三类:积极、消极和中性。提供了一个基准数据集,用于Kaesang Pangarep任命为PSI主席的问题的训练和测试数据。提供了用于聚合和增强的外部数据,包括两个数据集:关于Covid疫苗接种情感的话题和一个开放话题。官方评分使用F1分数,平衡了三个类别(积极、消极和中性)之间的精确度和召回率。提供了一个基准分数作为未优化分类方法的参考。优化分数作为任何提议方法应达到的目标分数的参考。基准和优化方法均使用SVM方法进行评分,这在传统机器学习方法中被广泛报道为最先进的技术。基准和优化方法实现的F1分数分别为40.83%和51.28%。

更新时间: 2024-07-08 05:42:29

领域: cs.CL,cs.IR,cs.IT,cs.LG,cs.SI,math.IT

下载: http://arxiv.org/abs/2407.05627v1

LLM4DyG: Can Large Language Models Solve Spatial-Temporal Problems on Dynamic Graphs?

In an era marked by the increasing adoption of Large Language Models (LLMs) for various tasks, there is a growing focus on exploring LLMs' capabilities in handling web data, particularly graph data. Dynamic graphs, which capture temporal network evolution patterns, are ubiquitous in real-world web data. Evaluating LLMs' competence in understanding spatial-temporal information on dynamic graphs is essential for their adoption in web applications, which remains unexplored in the literature. In this paper, we bridge the gap via proposing to evaluate LLMs' spatial-temporal understanding abilities on dynamic graphs, to the best of our knowledge, for the first time. Specifically, we propose the LLM4DyG benchmark, which includes nine specially designed tasks considering the capability evaluation of LLMs from both temporal and spatial dimensions. Then, we conduct extensive experiments to analyze the impacts of different data generators, data statistics, prompting techniques, and LLMs on the model performance. Finally, we propose Disentangled Spatial-Temporal Thoughts (DST2) for LLMs on dynamic graphs to enhance LLMs' spatial-temporal understanding abilities. Our main observations are: 1) LLMs have preliminary spatial-temporal understanding abilities on dynamic graphs, 2) Dynamic graph tasks show increasing difficulties for LLMs as the graph size and density increase, while not sensitive to the time span and data generation mechanism, 3) the proposed DST2 prompting method can help to improve LLMs' spatial-temporal understanding abilities on dynamic graphs for most tasks. The data and codes are publicly available at Github.

Updated: 2024-07-08 05:39:38

标题: LLM4DyG:大型语言模型能否解决动态图上的时空问题?

摘要: 在一个以越来越多地采用大型语言模型(LLMs)为特征的时代,人们越来越关注探索LLMs在处理网络数据,特别是图数据方面的能力。动态图在现实世界的网络数据中普遍存在,它们捕捉了网络随时间演化的模式。评估LLMs理解动态图上时空信息的能力对于它们在网络应用中的采用至关重要,而这在文献中尚未被探讨。本文通过提出评估LLMs在动态图上的时空理解能力,填补了这一空白,据我们所知,这是首次。具体来说,我们提出了LLM4DyG基准,其中包括从时间和空间两个维度评估LLMs能力的九个特别设计的任务。然后,我们进行了大量实验,分析了不同数据生成器、数据统计、提示技术和LLMs对模型性能的影响。最后,我们提出了用于动态图上LLMs的解耦空间-时间思维(DST2)来增强LLMs的时空理解能力。我们的主要观察结果是:1)LLMs在动态图上具有初步的时空理解能力,2)随着图的规模和密度增大,动态图任务对LLMs的难度不断上升,但对时间跨度和数据生成机制不敏感,3)所提出的DST2提示方法可以帮助改善LLMs在动态图上的时空理解能力。数据和代码可以在Github上公开获取。

更新时间: 2024-07-08 05:39:38

领域: cs.LG

下载: http://arxiv.org/abs/2310.17110v3

New User Event Prediction Through the Lens of Causal Inference

Modeling and analysis for event series generated by heterogeneous users of various behavioral patterns are closely involved in our daily lives, including credit card fraud detection, online platform user recommendation, and social network analysis. The most commonly adopted approach to this task is to classify users into behavior-based categories and analyze each of them separately. However, this approach requires extensive data to fully understand user behavior, presenting challenges in modeling newcomers without historical knowledge. In this paper, we propose a novel discrete event prediction framework for new users through the lens of causal inference. Our method offers an unbiased prediction for new users without needing to know their categories. We treat the user event history as the ''treatment'' for future events and the user category as the key confounder. Thus, the prediction problem can be framed as counterfactual outcome estimation, with the new user model trained on an adjusted dataset where each event is re-weighted by its inverse propensity score. We demonstrate the superior performance of the proposed framework with a numerical simulation study and two real-world applications, including Netflix rating prediction and seller contact prediction for customer support at Amazon.
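
The re-weighting step can be sketched as follows; the synthetic arrays, the logistic propensity model, and the clipping constant are assumptions for illustration, not the paper's exact setup.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
history = rng.normal(size=(500, 4))               # event-history features ("treatment")
category = (history[:, 0] > 0).astype(int)        # confounding user category

# Propensity model: predict the confounder from the observed history.
prop_model = LogisticRegression().fit(history, category)
propensity = prop_model.predict_proba(history)[np.arange(500), category]
weights = 1.0 / np.clip(propensity, 1e-3, None)   # inverse propensity scores

# Train the new-user model on the re-weighted events (toy binary outcome).
outcome = (history[:, 1] + 0.5 * category > 0).astype(int)
new_user_model = LogisticRegression().fit(history, outcome, sample_weight=weights)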

Updated: 2024-07-08 05:35:54

标题: 通过因果推断视角预测新用户事件

摘要: 建模和分析由不同行为模式的异质用户生成的事件序列密切涉及我们的日常生活,包括信用卡欺诈检测、在线平台用户推荐和社交网络分析。这项任务最常采用的方法是将用户分类为基于行为的类别,并分别分析每个用户。然而,这种方法需要大量的数据来充分理解用户行为,同时在建模没有历史知识的新用户时面临挑战。在本文中,我们提出了一个新颖的离散事件预测框架,通过因果推断的视角为新用户提供无偏预测,而无需了解他们的类别。我们将用户事件历史视为未来事件的“处理”,用户类别视为关键混淆因素。因此,预测问题可以被构建为对事实结果估计,新用户模型在调整后的数据集上进行训练,其中每个事件都通过其逆倾向得分重新加权。我们通过数值模拟研究和两个实际应用,包括Netflix评分预测和亚马逊客户支持中卖家联系预测,展示了所提出框架的卓越性能。

更新时间: 2024-07-08 05:35:54

领域: stat.ME,cs.LG

下载: http://arxiv.org/abs/2407.05625v1

On the Complexity of Learning Sparse Functions with Statistical and Gradient Queries

The goal of this paper is to investigate the complexity of gradient algorithms when learning sparse functions (juntas). We introduce a type of Statistical Queries ($\mathsf{SQ}$), which we call Differentiable Learning Queries ($\mathsf{DLQ}$), to model gradient queries on a specified loss with respect to an arbitrary model. We provide a tight characterization of the query complexity of $\mathsf{DLQ}$ for learning the support of a sparse function over generic product distributions. This complexity crucially depends on the loss function. For the squared loss, $\mathsf{DLQ}$ matches the complexity of Correlation Statistical Queries $(\mathsf{CSQ})$--potentially much worse than $\mathsf{SQ}$. But for other simple loss functions, including the $\ell_1$ loss, $\mathsf{DLQ}$ always achieves the same complexity as $\mathsf{SQ}$. We also provide evidence that $\mathsf{DLQ}$ can indeed capture learning with (stochastic) gradient descent by showing it correctly describes the complexity of learning with a two-layer neural network in the mean field regime and linear scaling.

Updated: 2024-07-08 05:30:34

标题: 关于使用统计查询与梯度查询学习稀疏函数的复杂性

摘要: 本文的目标是研究学习稀疏函数(juntas)时梯度算法的复杂性。我们引入了一类统计查询($\mathsf{SQ}$),称为可微学习查询($\mathsf{DLQ}$),用于建模在任意模型上针对指定损失的梯度查询。我们对在一般乘积分布上学习稀疏函数支撑集的$\mathsf{DLQ}$查询复杂度给出了紧致刻画。这种复杂度关键取决于损失函数。对于平方损失,$\mathsf{DLQ}$与相关统计查询($\mathsf{CSQ}$)的复杂度相匹配,可能比$\mathsf{SQ}$糟糕得多。但对于其他简单损失函数,包括$\ell_1$损失,$\mathsf{DLQ}$总是达到与$\mathsf{SQ}$相同的复杂度。我们还提供证据表明$\mathsf{DLQ}$确实可以刻画使用(随机)梯度下降进行的学习:它正确描述了在均场机制和线性缩放下使用两层神经网络进行学习的复杂度。

更新时间: 2024-07-08 05:30:34

领域: cs.LG,cs.DS

下载: http://arxiv.org/abs/2407.05622v1

Enhancing Class Fairness in Classification with A Two-Player Game Approach

Data augmentation is widely applied and has shown its benefits in different machine learning tasks. However, as recently observed in some downstream tasks, data augmentation may introduce an unfair impact on classifications. While it can improve the performance of some classes, it can actually be detrimental for other classes, which can be problematic in some application domains. In this paper, to counteract this phenomenon, we propose a FAir Classification approach with a Two-player game (FACT). We first formulate the training of a classifier with data augmentation as a fair optimization problem, which can be further written as an adversarial two-player game. Following this formulation, we propose a novel multiplicative weight optimization algorithm, for which we theoretically prove that it can converge to a solution that is fair over classes. Interestingly, our formulation also reveals that this fairness issue over classes is not due to data augmentation only, but is in fact a general phenomenon. Our empirical experiments demonstrate that the performance of our learned classifiers is indeed more fairly distributed over classes in five datasets, with only limited impact on the average accuracy.
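
The adversary's side of the game is essentially a multiplicative-weight update over class weights, sketched below; the step size and the toy per-class losses are illustrative assumptions.

import numpy as np

num_classes, eta = 5, 0.5
w = np.ones(num_classes) / num_classes            # adversary's class weights

for t in range(10):
    per_class_loss = np.random.default_rng(t).uniform(size=num_classes)
    # (here the classifier would be trained to minimize w @ per_class_loss)
    w *= np.exp(eta * per_class_loss)             # up-weight poorly served classes
    w /= w.sum()                                  # renormalize to a distribution

print(np.round(w, 3))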

Updated: 2024-07-08 05:21:59

标题: 用双人博弈方法增强分类中的类别公平性

摘要: 数据增强被广泛应用,并在不同的机器学习任务中显示出其益处。然而,正如最近在一些下游任务中观察到的,数据增强可能会对分类产生不公平的影响。虽然它可以提高某些类别的性能,但实际上可能对其他类别有害,这在一些应用领域可能会成为问题。在本文中,为了抵消这种现象,我们提出了一种基于双人博弈的公平分类方法(FACT)。我们首先将带有数据增强的分类器训练形式化为一个公平优化问题,进而将其表示为一种对抗性的双人博弈。根据这种形式化,我们提出了一种新颖的乘法权重优化算法,理论上证明它可以收敛到一个在类别之间公平的解决方案。有趣的是,我们的形式化还揭示了这种类别之间的公平问题并不仅仅是由于数据增强,而实际上是一个普遍现象。我们的实证实验表明,在五个数据集中,我们学习到的分类器的性能确实更加公平地分布在类别之间,并且对平均准确率只有有限的影响。

更新时间: 2024-07-08 05:21:59

领域: cs.CY,cs.AI,cs.CV,cs.GT,cs.LG

下载: http://arxiv.org/abs/2407.03146v2

OSN: Infinite Representations of Dynamic 3D Scenes from Monocular Videos

It has long been challenging to recover the underlying dynamic 3D scene representations from a monocular RGB video. Existing works formulate this problem into finding a single most plausible solution by adding various constraints such as depth priors and strong geometry constraints, ignoring the fact that there could be infinitely many 3D scene representations corresponding to a single dynamic video. In this paper, we aim to learn all plausible 3D scene configurations that match the input video, instead of just inferring a specific one. To achieve this ambitious goal, we introduce a new framework, called OSN. The key to our approach is a simple yet innovative object scale network together with a joint optimization module to learn an accurate scale range for every dynamic 3D object. This allows us to sample as many faithful 3D scene configurations as possible. Extensive experiments show that our method surpasses all baselines and achieves superior accuracy in dynamic novel view synthesis on multiple synthetic and real-world datasets. Most notably, our method demonstrates a clear advantage in learning fine-grained 3D scene geometry. Our code and data are available at https://github.com/vLAR-group/OSN

Updated: 2024-07-08 05:03:46

标题: OSN:来自单目视频的动态3D场景的无限表示

摘要: 长期以来,从单目RGB视频中恢复基础动态3D场景表示一直是一个挑战。现有研究将这个问题转化为通过添加各种约束(如深度先验和强几何约束)来寻找单一最合理的解决方案,忽略了一个事实,即对应于单个动态视频的3D场景表示可能有无限多种。在本文中,我们的目标是学习所有与输入视频匹配的可能的3D场景配置,而不仅仅是推断一个特定的配置。为了实现这个雄心勃勃的目标,我们提出了一个新的框架,称为OSN。我们方法的关键在于一个简单但创新的物体尺度网络,结合一个联合优化模块,学习每个动态3D物体的准确尺度范围。这使我们能够尽可能多地采样忠实的3D场景配置。大量实验证明,我们的方法超越了所有基线,在多个合成和现实世界数据集上实现了动态新视图合成的卓越准确性。最值得注意的是,我们的方法在学习细粒度的3D场景几何方面具有明显优势。我们的代码和数据可在https://github.com/vLAR-group/OSN中找到。

更新时间: 2024-07-08 05:03:46

领域: cs.CV,cs.GR,cs.LG,cs.RO

下载: http://arxiv.org/abs/2407.05615v1

LayerMerge: Neural Network Depth Compression through Layer Pruning and Merging

Recent works show that reducing the number of layers in a convolutional neural network can enhance efficiency while maintaining the performance of the network. Existing depth compression methods remove redundant non-linear activation functions and merge the consecutive convolution layers into a single layer. However, these methods suffer from a critical drawback; the kernel size of the merged layers becomes larger, significantly undermining the latency reduction gained from reducing the depth of the network. We show that this problem can be addressed by jointly pruning convolution layers and activation functions. To this end, we propose LayerMerge, a novel depth compression method that selects which activation layers and convolution layers to remove, to achieve a desired inference speed-up while minimizing performance loss. Since the corresponding selection problem involves an exponential search space, we formulate a novel surrogate optimization problem and efficiently solve it via dynamic programming. Empirical results demonstrate that our method consistently outperforms existing depth compression and layer pruning methods on various network architectures, both on image classification and generation tasks. We release the code at https://github.com/snu-mllab/LayerMerge.
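
While the paper's surrogate problem jointly selects activation and convolution layers, the role of dynamic programming can be conveyed with a generic 0/1-knapsack toy, sketched below: choose layers whose total latency fits a budget while maximizing retained importance, without enumerating the exponential set of subsets. The importance and latency values are made up for illustration.

def best_subset(importance, latency, budget):
    n = len(importance)
    dp = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for b in range(budget + 1):
            dp[i][b] = dp[i - 1][b]                        # prune layer i-1
            if latency[i - 1] <= b:                        # or keep it
                dp[i][b] = max(dp[i][b],
                               dp[i - 1][b - latency[i - 1]] + importance[i - 1])
    return dp[n][budget]

print(best_subset([0.9, 0.2, 0.7, 0.4], [3, 1, 2, 2], budget=5))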

Updated: 2024-07-08 04:55:34

标题: LayerMerge:通过层修剪和合并进行神经网络深度压缩

摘要: 最近的研究表明,在卷积神经网络中减少层数可以提高效率,同时保持网络的性能。现有的深度压缩方法通过删除冗余的非线性激活函数,并将连续的卷积层合并成一个单独的层来实现。然而,这些方法存在一个关键缺点;合并层的核大小变大,严重削弱了通过减少网络深度获得的延迟减少。我们表明,这个问题可以通过联合修剪卷积层和激活函数来解决。为此,我们提出了LayerMerge,一种新颖的深度压缩方法,选择要删除的激活层和卷积层,以实现所需的推断加速,同时最小化性能损失。由于相应的选择问题涉及指数搜索空间,我们制定了一个新颖的替代优化问题,并通过动态规划有效地解决了它。实证结果表明,我们的方法在各种网络架构上始终优于现有的深度压缩和层修剪方法,无论是在图像分类还是生成任务中。我们将代码发布在https://github.com/snu-mllab/LayerMerge。

更新时间: 2024-07-08 04:55:34

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2406.12837v3

GenFollower: Enhancing Car-Following Prediction with Large Language Models

Accurate modeling of car-following behaviors is essential for various applications in traffic management and autonomous driving systems. However, current approaches often suffer from limitations like high sensitivity to data quality and lack of interpretability. In this study, we propose GenFollower, a novel zero-shot prompting approach that leverages large language models (LLMs) to address these challenges. We reframe car-following behavior as a language modeling problem and integrate heterogeneous inputs into structured prompts for LLMs. This approach achieves improved prediction performance and interpretability compared to traditional baseline models. Experiments on the Waymo Open datasets demonstrate GenFollower's superior performance and ability to provide interpretable insights into factors influencing car-following behavior. This work contributes to advancing the understanding and prediction of car-following behaviors, paving the way for enhanced traffic management and autonomous driving systems.

Updated: 2024-07-08 04:54:42

标题: GenFollower:利用大型语言模型增强汽车跟随预测

摘要: 汽车跟随行为的准确建模对于交通管理和自动驾驶系统中的各种应用至关重要。然而,当前的方法往往存在诸如对数据质量高度敏感和缺乏可解释性等限制。在本研究中,我们提出了GenFollower,这是一种新颖的零样本提示方法,利用大型语言模型(LLMs)来解决这些挑战。我们将汽车跟随行为重新构建为一个语言建模问题,并将异构输入整合到LLMs的结构化提示中。与传统基线模型相比,这种方法实现了更好的预测性能和可解释性。对Waymo Open数据集的实验表明,GenFollower具有卓越的性能和能力,能够提供对影响汽车跟随行为的因素的可解释见解。这项工作有助于推动对汽车跟随行为的理解和预测,为提升交通管理和自动驾驶系统铺平道路。

更新时间: 2024-07-08 04:54:42

领域: cs.AI

下载: http://arxiv.org/abs/2407.05611v1

WSI-VQA: Interpreting Whole Slide Images by Generative Visual Question Answering

Whole slide imaging is routinely adopted for carcinoma diagnosis and prognosis. Abundant experience is required for pathologists to achieve accurate and reliable diagnostic results of whole slide images (WSI). The huge size and heterogeneous features of WSIs make the workflow of pathological reading extremely time-consuming. In this paper, we propose a novel framework (WSI-VQA) to interpret WSIs by generative visual question answering. WSI-VQA shows universality by reframing various kinds of slide-level tasks in a question-answering pattern, in which pathologists can achieve immunohistochemical grading, survival prediction, and tumor subtyping following human-machine interaction. Furthermore, we establish a WSI-VQA dataset which contains 8672 slide-level question-answering pairs with 977 WSIs. Besides the ability to deal with different slide-level tasks, our generative model which is named Wsi2Text Transformer (W2T) outperforms existing discriminative models in medical correctness, which reveals the potential of our model to be applied in the clinical scenario. Additionally, we also visualize the co-attention mapping between word embeddings and WSIs as an intuitive explanation for diagnostic results. The dataset and related code are available at https://github.com/cpystan/WSI-VQA.

Updated: 2024-07-08 04:37:32

标题: WSI-VQA:通过生成式视觉问答解读全切片图像

摘要: 全切片成像常规用于癌症诊断和预后。病理医生需要丰富的经验才能准确可靠地诊断全切片图像(WSI)。WSI的巨大尺寸和异质性特征使病理阅片工作流程极为耗时。本文提出了一个新的框架(WSI-VQA),通过生成式视觉问答来解读WSI。WSI-VQA将各种切片级任务重新构建为问答模式,显示出普适性,病理医生可以通过人机交互实现免疫组织化学分级、生存预测和肿瘤亚型分类。此外,我们建立了一个包含8672个切片级问答对和977个WSI的WSI-VQA数据集。除了能够处理不同的切片级任务外,我们的生成模型Wsi2Text Transformer(W2T)在医学正确性方面胜过现有的判别模型,揭示了我们的模型在临床场景中应用的潜力。此外,我们还将词嵌入和WSI之间的共同关注映射可视化,作为诊断结果的直观解释。数据集和相关代码可在https://github.com/cpystan/WSI-VQA 获取。

更新时间: 2024-07-08 04:37:32

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2407.05603v1

SkySenseGPT: A Fine-Grained Instruction Tuning Dataset and Model for Remote Sensing Vision-Language Understanding

Remote Sensing Large Multi-Modal Models (RSLMMs) are developing rapidly and showcase significant capabilities in remote sensing imagery (RSI) comprehension. However, due to the limitations of existing datasets, RSLMMs have shortcomings in understanding the rich semantic relations among objects in complex remote sensing scenes. To unlock RSLMMs' complex comprehension ability, we propose a large-scale instruction tuning dataset FIT-RS, containing 1,800,851 instruction samples. FIT-RS covers common interpretation tasks and innovatively introduces several complex comprehension tasks of escalating difficulty, ranging from relation reasoning to image-level scene graph generation. Based on FIT-RS, we build the FIT-RSFG benchmark. Furthermore, we establish a new benchmark to evaluate the fine-grained relation comprehension capabilities of LMMs, named FIT-RSRC. Based on combined instruction data, we propose SkySenseGPT, which achieves outstanding performance on both public datasets and FIT-RSFG, surpassing existing RSLMMs. We hope the FIT-RS dataset can enhance the relation comprehension capability of RSLMMs and provide a large-scale fine-grained data source for the remote sensing community. The dataset will be available at https://github.com/Luo-Z13/SkySenseGPT

Updated: 2024-07-08 04:33:37

标题: SkySenseGPT:用于遥感视觉语言理解的细粒度指令调整数据集和模型

摘要: 遥感大型多模态模型(RSLMMs)正在迅速发展,并展示出在遥感图像(RSI)理解方面的显著能力。然而,由于现有数据集的限制,RSLMMs在理解复杂遥感场景中物体之间丰富语义关系方面存在不足。为了解锁RSLMMs的复杂理解能力,我们提出了一个大规模的指导调整数据集FIT-RS,包含1,800,851个指导样本。FIT-RS涵盖常见的解释任务,并创新地引入了几个逐渐增加难度的复杂理解任务,从关系推理到图像级场景图生成。基于FIT-RS,我们构建了FIT-RSFG基准。此外,我们建立了一个新的基准来评估LMMs的细粒度关系理解能力,命名为FIT-RSRC。基于结合的指导数据,我们提出了SkySenseGPT,它在公共数据集和FIT-RSFG上取得了出色的表现,超过了现有的RSLMMs。我们希望FIT-RS数据集可以增强RSLMMs的关系理解能力,并为遥感社区提供一个大规模的细粒度数据来源。该数据集将在https://github.com/Luo-Z13/SkySenseGPT 上提供。

更新时间: 2024-07-08 04:33:37

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2406.10100v2

Unmasking Trees for Tabular Data

We herein describe UnmaskingTrees, a method and open-source software package for tabular data generation and, especially, imputation. Our experiments suggest that training gradient-boosted trees to incrementally unmask features offers a simple, strong baseline for imputation.
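
The flavor of such a baseline can be sketched with per-feature gradient-boosted regressors that fill in masked entries; the package's actual incremental unmasking schedule differs, and the data below is synthetic.

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[rng.random(X.shape) < 0.2] = np.nan              # 20% missingness

X_filled = np.where(np.isnan(X), np.nanmean(X, axis=0), X)   # crude init
for j in range(X.shape[1]):
    miss = np.isnan(X[:, j])
    if miss.any():
        other = np.delete(X_filled, j, axis=1)     # predict feature j from the rest
        model = GradientBoostingRegressor().fit(other[~miss], X[~miss, j])
        X_filled[miss, j] = model.predict(other[miss])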

Updated: 2024-07-08 04:15:43

标题: 揭掩树(Unmasking Trees):用于表格数据

摘要: 我们在这里描述了UnmaskingTrees,这是一种用于生成表格数据和特别是插补的方法和开源软件包。我们的实验表明,训练梯度提升树逐步揭示特征为插补提供了一个简单、强大的基线。

更新时间: 2024-07-08 04:15:43

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2407.05593v1

A Trustworthy AIoT-enabled Localization System via Federated Learning and Blockchain

There is a significant demand for indoor localization technology in smart buildings, and the most promising solution in this field is using RF sensors and fingerprinting-based methods that employ machine learning models trained on crowd-sourced user data gathered from IoT devices. However, this raises security and privacy issues in practice. Some researchers propose to use federated learning to partially overcome privacy problems, but there still remain security concerns, e.g., single-point failure and malicious attacks. In this paper, we propose a framework named DFLoc to achieve precise 3D localization tasks while considering the following two security concerns. Particularly, we design a specialized blockchain to decentralize the framework by distributing the tasks such as model distribution and aggregation which are handled by a central server to all clients in most previous works, to address the issue of the single-point failure for a reliable and accurate indoor localization system. Moreover, we introduce an updated model verification mechanism within the blockchain to alleviate the concern of malicious node attacks. Experimental results substantiate the framework's capacity to deliver accurate 3D location predictions and its superior resistance to the impacts of single-point failure and malicious attacks when compared to conventional centralized federated learning systems.

Updated: 2024-07-08 04:14:19

标题: 一个基于联邦学习和区块链的可信AIoT定位系统

摘要: 在智能建筑中,室内定位技术的需求非常大,而在这个领域中最有前途的解决方案是使用RF传感器和基于指纹的方法,这些方法使用机器学习模型,该模型是根据从物联网设备收集的众包用户数据训练而成。然而,这在实践中会引发安全和隐私问题。一些研究人员建议使用联邦学习来部分克服隐私问题,但仍然存在安全问题,例如单点故障和恶意攻击。在本文中,我们提出了一个名为DFLoc的框架,以实现精确的3D定位任务,同时考虑以下两个安全问题。特别是,我们设计了一个专门的区块链来实现框架的去中心化,将在大多数先前工作中由中央服务器处理的模型分发和聚合等任务分配给所有客户端,以解决单点故障问题,从而建立一个可靠和准确的室内定位系统。此外,我们在区块链中引入了一个更新的模型验证机制,以减轻对恶意节点攻击的担忧。实验结果证实了该框架能够提供准确的3D位置预测,并且与传统的集中式联邦学习系统相比,在单点故障和恶意攻击的影响方面具有更强的抵抗力。

更新时间: 2024-07-08 04:14:19

领域: cs.CR,cs.AI,cs.LG,eess.SP

下载: http://arxiv.org/abs/2407.07921v1

On the Power of Convolution Augmented Transformer

The transformer architecture has catalyzed revolutionary advances in language modeling. However, recent architectural recipes, such as state-space models, have bridged the performance gap. Motivated by this, we examine the benefits of Convolution-Augmented Transformer (CAT) for recall, copying, and length generalization tasks. CAT incorporates convolutional filters in the K/Q/V embeddings of an attention layer. Through CAT, we show that the locality of the convolution synergizes with the global view of the attention. Unlike comparable architectures, such as Mamba or transformer, CAT can provably solve the associative recall (AR) and copying tasks using a single layer while also enjoying guaranteed length generalization. We also establish computational tradeoffs between convolution and attention by characterizing how convolution can mitigate the need for full attention by summarizing the context window and creating salient summary tokens to attend. Evaluations on real datasets corroborate our findings and demonstrate that CAT and its variations indeed enhance the language modeling performance.
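
One plausible reading of the architecture is a short depthwise causal convolution applied to the Q/K/V projections before standard attention, as in the PyTorch sketch below; kernel size, dimensions, and the exact placement of the filters are assumptions.

import torch
import torch.nn.functional as F

B, T, D, K = 2, 16, 32, 4
x = torch.randn(B, T, D)
Wq, Wk, Wv = (torch.nn.Linear(D, D) for _ in range(3))
conv = torch.nn.Conv1d(D, D, K, groups=D, padding=K - 1)     # depthwise, causal

def causal_conv(h):                                # (B, T, D) -> (B, T, D)
    return conv(h.transpose(1, 2))[..., :h.size(1)].transpose(1, 2)

q, k, v = (causal_conv(W(x)) for W in (Wq, Wk, Wv))
attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(attn.shape)    # torch.Size([2, 16, 32])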

Updated: 2024-07-08 04:08:35

标题: 关于卷积增强变压器的能力

摘要: 变压器架构在语言建模方面催生了革命性的进展。然而,最近的架构配方,如状态空间模型,已经弥合了性能差距。受此启发,我们研究了卷积增强变压器(CAT)在召回、复制和长度泛化任务中的益处。CAT将卷积滤波器整合到注意力层的K/Q/V嵌入中。通过CAT,我们展示了卷积的局部性与注意力的全局视图相辅相成。与可比较的架构(如Mamba或变压器)不同,CAT可以证明使用单层解决联想召回(AR)和复制任务,同时也享有保证的长度泛化能力。我们还刻画了卷积如何通过概括上下文窗口并生成显著的摘要标记来缓解对全注意力的需求,从而建立了卷积与注意力之间的计算权衡。在真实数据集上的评估证实了我们的发现,并表明CAT及其变体确实提升了语言建模性能。

更新时间: 2024-07-08 04:08:35

领域: cs.LG,cs.CL,cs.NE

下载: http://arxiv.org/abs/2407.05591v1

Fill in the Blank: Exploring and Enhancing LLM Capabilities for Backward Reasoning in Math Word Problems

While forward reasoning (i.e., find the answer given the question) has been explored extensively in recent literature, backward reasoning is relatively unexplored. We examine the backward reasoning capabilities of LLMs on Math Word Problems (MWPs): given a mathematical question and its answer, with some details omitted from the question, can LLMs effectively retrieve the missing information? To evaluate this task, we modify three benchmark datasets (GSM8k, SVAMP, and MultiArith) and find a significant drop in the accuracy of models on this task compared to forward reasoning across SOTA LLMs (GPT4, GPT3.5, PaLM-2, and LLaMa). Motivated by the fact that backward reasoning can be seen as the ''inverse'' of forward reasoning, we propose variations of three different forward reasoning strategies to improve performance. Rephrase reformulates the given problem into a forward reasoning problem, PAL-Tools combines the idea of Program-Aided LLMs to produce a set of equations that can be solved by an external solver, and Check your Work exploits the availability of a natural verifier of high accuracy in the forward direction, interleaving solving and verification steps. Finally, realizing that each of our base methods correctly solves a different set of problems, we propose a novel Bayesian formulation for creating an ensemble over the base methods to further boost the accuracy. Extensive experimentation demonstrates successive improvement in the performance of LLMs on the backward reasoning task, using our strategies, with our ensemble-based method resulting in significant performance gains compared to the SOTA forward reasoning strategies we adapt.

Updated: 2024-07-08 03:33:43

标题: 填空题:探索和增强LLM在数学题中向后推理的能力

摘要: 尽管近年来前向推理(即,给定问题求出答案)在文献中得到了广泛探讨,但后向推理相对较少被探索。我们研究了LLMs在数学文字问题(MWPs)上的后向推理能力:在给定一个数学问题及其答案的情况下,如果问题中有一些细节被省略,LLMs是否能有效地检索到缺失的信息?通过修改三个基准数据集(GSM8k、SVAMP和MultiArith)来评估这个任务,我们发现相比前向推理,各种LLMs在这个任务上的准确性明显下降,包括GPT4、GPT3.5、PaLM-2和LLaMa在内。由于后向推理可以被看作是前向推理的“逆向”,我们提出了三种不同的前向推理策略的变体以提高性能。Rephrase将给定的问题重新表述为一个前向推理问题,PAL-Tools将程序辅助LLMs的思想与外部求解器产生一组方程相结合,Check your Work利用前向方向的高准确性自然验证器的可用性,交替进行求解和验证步骤。最后,我们意识到我们的每种基本方法都可以正确解决不同的问题集,因此我们提出了一种新颖的贝叶斯公式来创建一个基于基本方法的集成,进一步提高准确性。广泛的实验表明,使用我们的策略,LLMs在后向推理任务上的性能得到了逐步改进,我们的基于集成的方法相比我们采用的SOTA前向推理策略获得了显著的性能增益。

更新时间: 2024-07-08 03:33:43

领域: cs.CL,cs.AI,cs.LG,I.2.3

下载: http://arxiv.org/abs/2310.01991v2

$\mathrm{E^{2}CFD}$: Towards Effective and Efficient Cost Function Design for Safe Reinforcement Learning via Large Language Model

Different classes of safe reinforcement learning algorithms have shown satisfactory performance in various types of safety requirement scenarios. However, the existing methods mainly address one or several classes of specific safety requirement scenario problems and cannot be applied to arbitrary safety requirement scenarios. In addition, the optimization objectives of existing reinforcement learning algorithms are misaligned with the task requirements. Based on the need to address these issues, we propose $\mathrm{E^{2}CFD}$, an effective and efficient cost function design framework. $\mathrm{E^{2}CFD}$ leverages the capabilities of a large language model (LLM) to comprehend various safety scenarios and generate corresponding cost functions. It incorporates the \textit{fast performance evaluation (FPE)} method to facilitate rapid and iterative updates to the generated cost function. Through this iterative process, $\mathrm{E^{2}CFD}$ aims to obtain the most suitable cost function for policy training, tailored to the specific tasks within the safety scenario. Experiments have proven that the performance of policies trained using this framework is superior to traditional safe reinforcement learning algorithms and policies trained with carefully designed cost functions.

Updated: 2024-07-08 03:30:25

标题: $\mathrm{E^{2}CFD}$:通过大型语言模型实现安全强化学习的有效和高效成本函数设计

摘要: 不同类别的安全强化学习算法在各种安全需求场景中表现出令人满意的性能。然而,现有方法主要解决特定安全需求场景问题的一个或几个类别,并不能应用于任意安全需求场景。此外,现有强化学习算法的优化目标与任务要求不一致。基于解决这些问题的需要,我们提出了一种有效且高效的成本函数设计框架$\mathrm{E^{2}CFD}$。$\mathrm{E^{2}CFD}$利用大型语言模型(LLM)的能力来理解各种安全场景并生成相应的成本函数。它结合了\textit{快速性能评估(FPE)}方法,以促进对生成的成本函数的快速和迭代更新。通过这个迭代过程,$\mathrm{E^{2}CFD}$旨在获得最适合策略训练的成本函数,定制给安全场景中特定任务。实验证明,使用这种框架训练的策略性能优于传统的安全强化学习算法和使用精心设计的成本函数训练的策略。

更新时间: 2024-07-08 03:30:25

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2407.05580v1

Beyond Adapting SAM: Towards End-to-End Ultrasound Image Segmentation via Auto Prompting

End-to-end medical image segmentation is of great value for computer-aided diagnosis dominated by task-specific models, usually suffering from poor generalization. With recent breakthroughs brought by the segment anything model (SAM) for universal image segmentation, extensive efforts have been made to adapt SAM for medical imaging but still encounter two major issues: 1) severe performance degradation and limited generalization without proper adaptation, and 2) semi-automatic segmentation relying on accurate manual prompts for interaction. In this work, we propose SAMUS as a universal model tailored for ultrasound image segmentation and further enable it to work in an end-to-end manner denoted as AutoSAMUS. Specifically, in SAMUS, a parallel CNN branch is introduced to supplement local information through cross-branch attention, and a feature adapter and a position adapter are jointly used to adapt SAM from natural to ultrasound domains while reducing training complexity. AutoSAMUS is realized by introducing an auto prompt generator (APG) to replace the manual prompt encoder of SAMUS to automatically generate prompt embeddings. A comprehensive ultrasound dataset, comprising about 30k images and 69k masks and covering six object categories, is collected for verification. Extensive comparison experiments demonstrate the superiority of SAMUS and AutoSAMUS against the state-of-the-art task-specific and SAM-based foundation models. We believe the auto-prompted SAM-based model has the potential to become a new paradigm for end-to-end medical image segmentation and deserves more exploration. Code and data are available at https://github.com/xianlin7/SAMUS.

Updated: 2024-07-08 03:24:35

标题: 超越适应SAM:通过自动提示实现端到端超声图像分割

摘要: 端到端的医学图像分割对于由特定任务模型主导的计算机辅助诊断具有巨大价值,后者通常受到泛化能力差的困扰。随着分割一切模型(SAM)在通用图像分割上带来的最新突破,人们为使SAM适应医学成像付出了大量努力,但仍然遇到两个主要问题:1)在没有适当适应的情况下,性能严重下降且泛化能力有限,2)半自动分割依赖准确的手动提示进行交互。在这项工作中,我们提出了SAMUS,一个为超声图像分割定制的通用模型,并进一步使其能够以端到端方式工作,称为AutoSAMUS。具体来说,在SAMUS中,引入了一个并行CNN分支通过交叉分支注意力来补充局部信息,并且同时使用特征适配器和位置适配器来将SAM从自然领域适应到超声领域,同时减少训练复杂性。通过引入自动提示生成器(APG)来替换SAMUS的手动提示编码器,实现了AutoSAMUS。收集了一个包含约30k张图像和69k个掩模的全面超声数据集,涵盖了六个对象类别,用于验证。广泛的比较实验表明,SAMUS和AutoSAMUS比最先进的特定任务模型和基于SAM的基础模型更具优势。我们相信基于自动提示的SAM模型有潜力成为端到端医学图像分割的新范式,并值得进一步探索。代码和数据可在https://github.com/xianlin7/SAMUS上找到。

更新时间: 2024-07-08 03:24:35

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2309.06824v2

Synergistic Formulaic Alpha Generation for Quantitative Trading based on Reinforcement Learning

Mining of formulaic alpha factors refers to the process of discovering and developing specific factors or indicators (referred to as alpha factors) for quantitative trading in stock market. To efficiently discover alpha factors in vast search space, reinforcement learning (RL) is commonly employed. This paper proposes a method to enhance existing alpha factor mining approaches by expanding a search space and utilizing pretrained formulaic alpha set as initial seed values to generate synergistic formulaic alpha. We employ information coefficient (IC) and rank information coefficient (Rank IC) as performance evaluation metrics for the model. Using CSI300 market data, we conducted real investment simulations and observed significant performance improvement compared to existing techniques.
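
The two evaluation metrics are standard correlations, computed as in the sketch below: IC is the Pearson correlation between an alpha factor's values and next-period returns, and Rank IC is the Spearman (rank) correlation; the random arrays stand in for real CSI300 data.

import numpy as np
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(0)
factor = rng.normal(size=300)                      # factor values across constituents
returns = 0.1 * factor + rng.normal(size=300)      # toy predictive signal

ic = pearsonr(factor, returns)[0]                  # information coefficient
rank_ic = spearmanr(factor, returns)[0]            # rank information coefficient
print(round(ic, 3), round(rank_ic, 3))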

Updated: 2024-07-08 02:59:56

标题: 基于强化学习的量化交易协同式公式α生成

摘要: 挖掘公式化的Alpha因子是指在股票市场量化交易中发现和开发特定因子或指标(称为Alpha因子)的过程。为了在庞大的搜索空间中高效发现Alpha因子,通常采用强化学习(RL)。本文提出了一种方法,通过扩展搜索空间并利用预训练的公式化Alpha集作为初始种子值来生成协同的公式化Alpha,以增强现有的Alpha因子挖掘方法。我们将信息系数(IC)和排名信息系数(Rank IC)作为模型的性能评估指标。利用CSI300市场数据,我们进行了真实的投资模拟,并观察到与现有技术相比显著的性能提升。

更新时间: 2024-07-08 02:59:56

领域: cs.CE,cs.AI

下载: http://arxiv.org/abs/2401.02710v2

An Experimental Design Framework for Label-Efficient Supervised Finetuning of Large Language Models

Supervised finetuning (SFT) on instruction datasets has played a crucial role in achieving the remarkable zero-shot generalization capabilities observed in modern large language models (LLMs). However, the annotation efforts required to produce high quality responses for instructions are becoming prohibitively expensive, especially as the number of tasks spanned by instruction datasets continues to increase. Active learning is effective in identifying useful subsets of samples to annotate from an unlabeled pool, but its high computational cost remains a barrier to its widespread applicability in the context of LLMs. To mitigate the annotation cost of SFT and circumvent the computational bottlenecks of active learning, we propose using experimental design. Experimental design techniques select the most informative samples to label, and typically maximize some notion of uncertainty and/or diversity. In our work, we implement a framework that evaluates several existing and novel experimental design techniques and find that these methods consistently yield significant gains in label efficiency with little computational overhead. On generative tasks, our methods achieve the same generalization performance with only $50\%$ of annotation cost required by random sampling.
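
One criterion in this family is maximum-entropy (uncertainty) selection, sketched below; `probs` is a placeholder for model-predicted distributions over responses, and the budget is an illustrative assumption, with diversity-based criteria handled analogously.

import numpy as np

rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(5), size=1000)       # toy predictive distributions

entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
budget = 100
selected = np.argsort(-entropy)[:budget]           # label the most uncertain prompts
print(selected[:5])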

Updated: 2024-07-08 02:52:05

标题: 一个用于大型语言模型标签高效监督微调的实验设计框架

摘要: 在指令数据集上进行监督微调(SFT)在实现现代大型语言模型(LLMs)中观察到的显著零样本泛化能力方面发挥了关键作用。然而,为指令生成高质量响应所需的标注工作正变得极其昂贵,特别是随着指令数据集涵盖的任务数量继续增加。主动学习在从未标记的样本池中识别有用的子集进行标注方面非常有效,但其高计算成本仍然是在LLMs环境中广泛应用的一个障碍。为了减少SFT的标注成本并规避主动学习的计算瓶颈,我们建议使用实验设计。实验设计技术选择最具信息量的样本进行标记,通常最大化某种不确定性和/或多样性的概念。在我们的工作中,我们实现了一个评估几种现有和新颖实验设计技术的框架,并发现这些方法在标签效率方面始终会产生显著的收益,而计算开销很小。在生成任务中,我们的方法仅需随机抽样50%的标注成本即可达到相同的泛化性能。

更新时间: 2024-07-08 02:52:05

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2401.06692v3

Speed-accuracy trade-off for the diffusion models: Wisdom from nonequilibrium thermodynamics and optimal transport

We discuss a connection between a generative model, called the diffusion model, and nonequilibrium thermodynamics for the Fokker-Planck equation, called stochastic thermodynamics. Based on the techniques of stochastic thermodynamics, we derive the speed-accuracy trade-off for the diffusion models, which is a trade-off relationship between the speed and accuracy of data generation in diffusion models. Our result implies that the entropy production rate in the forward process affects the errors in data generation. From a stochastic thermodynamic perspective, our results provide quantitative insight into how best to generate data in diffusion models. The optimal learning protocol is introduced by the conservative force in stochastic thermodynamics and the geodesic of space by the 2-Wasserstein distance in optimal transport theory. We numerically illustrate the validity of the speed-accuracy trade-off for the diffusion models with different noise schedules such as the cosine schedule, the conditional optimal transport, and the optimal transport.

Updated: 2024-07-08 02:48:15

标题: 扩散模型的速度-准确性权衡:来自非平衡热力学和最优输运的智慧

摘要: 我们讨论了一个称为扩散模型的生成模型与福克-普朗克方程的非平衡热力学之间的联系,称为随机热力学。基于随机热力学的技术,我们推导了扩散模型的速度-准确性权衡,这是扩散模型中数据生成速度和准确性之间的权衡关系。我们的结果表明,正向过程中的熵产生速率影响数据生成中的错误。从随机热力学的角度来看,我们的结果提供了如何在扩散模型中生成数据的量化见解。最佳学习协议是由随机热力学中的保守力和最优输运理论中的2-Wasserstein距离空间的测地线引入的。我们通过数值方法展示了具有不同噪声时间表的扩散模型的速度-准确性权衡的有效性,例如余弦时间表、条件最优输运和最优输运。

更新时间: 2024-07-08 02:48:15

领域: cond-mat.stat-mech,cs.LG,stat.ML

下载: http://arxiv.org/abs/2407.04495v2

SSP-GNN: Learning to Track via Bilevel Optimization

We propose a graph-based tracking formulation for multi-object tracking (MOT) where target detections contain kinematic information and re-identification features (attributes). Our method applies a successive shortest paths (SSP) algorithm to a tracking graph defined over a batch of frames. The edge costs in this tracking graph are computed via a message-passing network, a graph neural network (GNN) variant. The parameters of the GNN, and hence, the tracker, are learned end-to-end on a training set of example ground-truth tracks and detections. Specifically, learning takes the form of bilevel optimization guided by our novel loss function. We evaluate our algorithm on simulated scenarios to understand its sensitivity to scenario aspects and model hyperparameters. Across varied scenario complexities, our method compares favorably to a strong baseline.

Updated: 2024-07-08 02:37:44

标题: SSP-GNN: 通过双层优化学习跟踪

摘要: 我们提出了一种基于图的多目标跟踪(MOT)公式,其中目标检测包含运动信息和再识别特征(属性)。我们的方法将连续最短路径(SSP)算法应用于一个在一批帧上定义的跟踪图。在这个跟踪图中,边的成本是通过一个信息传递网络计算的,即图神经网络(GNN)变种。GNN的参数,因此也就是跟踪器,是在一个训练集上端到端学习的,该训练集包含真实标注轨迹和检测的示例。具体来说,学习采用我们的新型损失函数引导的双层优化形式。我们通过模拟场景评估我们的算法,以了解其对场景因素和模型超参数的敏感性。在各种场景复杂性中,我们的方法与一个强基线相比表现出色。

更新时间: 2024-07-08 02:37:44

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2407.04308v2

$R^2$-Guard: Robust Reasoning Enabled LLM Guardrail via Knowledge-Enhanced Logical Reasoning

As LLMs become increasingly prevalent across various applications, it is critical to establish safety guardrails to moderate input/output content of LLMs. Existing guardrail models treat various safety categories independently and fail to explicitly capture the intercorrelations among them. This has led to limitations such as ineffectiveness due to inadequate training on long-tail data from correlated safety categories, susceptibility to jailbreaking attacks, and inflexibility regarding new safety categories. To address these limitations, we propose $R^2$-Guard, a robust reasoning enabled LLM guardrail via knowledge-enhanced logical reasoning. Specifically, $R^2$-Guard comprises two parts: data-driven category-specific learning and reasoning components. The data-driven guardrail models provide unsafety probabilities of moderated content on different safety categories. We then encode safety knowledge among different categories as first-order logical rules and embed them into a probabilistic graphic model (PGM) based reasoning component. The unsafety probabilities of different categories from data-driven guardrail models are sent to the reasoning component for final inference. We employ two types of PGMs: Markov logic networks (MLNs) and probabilistic circuits (PCs), and optimize PCs to achieve precision-efficiency balance via improved graph structure. To further perform stress tests for guardrail models, we employ a pairwise construction method to construct a new safety benchmark TwinSafety, which features principled categories. We demonstrate the effectiveness of $R^2$-Guard by comparisons with eight strong guardrail models on six safety benchmarks, and demonstrate the robustness of $R^2$-Guard against four SOTA jailbreaking attacks. $R^2$-Guard significantly surpasses SOTA method LlamaGuard by 30.2% on ToxicChat and by 59.5% against jailbreaking attacks.

Updated: 2024-07-08 02:15:29

标题: $R^2$-Guard:通过知识增强的逻辑推理实现的鲁棒推理启用的LLM防护栏

摘要: 随着大规模语言模型在各种应用中日益普及,建立安全防护措施以调节LLMs的输入/输出内容至关重要。现有的防护模型将各种安全类别独立处理,并未明确捕捉它们之间的相互关系。这导致了诸如由于对相关安全类别的长尾数据进行不足训练而导致的无效性、易受越狱攻击影响以及对新的安全类别缺乏灵活性等限制。为了解决这些限制,我们提出了一种通过知识增强的逻辑推理实现的强大推理型LLM防护措施$R^2$-Guard。具体而言,$R^2$-Guard包括两部分:基于数据的特定类别学习和推理组件。基于数据驱动的防护模型提供了不同安全类别上调节内容的不安全概率。然后我们将不同类别之间的安全知识编码为一阶逻辑规则,并将其嵌入到基于概率图模型(PGM)的推理组件中。来自数据驱动防护模型的不同类别的不安全概率被发送到推理组件进行最终推理。我们采用两种类型的PGMs:马尔可夫逻辑网络(MLNs)和概率电路(PCs),并通过改进的图结构优化PCs以实现精确性和效率的平衡。为了进一步对防护模型进行压力测试,我们采用成对构造方法构建了一个新的安全基准TwinSafety,该基准具有基本的类别。通过与六个安全基准上的八个强大的防护模型进行比较,我们展示了$R^2$-Guard的有效性,并展示了$R^2$-Guard对四种SOTA越狱攻击的稳健性。$R^2$-Guard在ToxicChat上比SOTA方法LlamaGuard提高了30.2%,并且对抗越狱攻击提高了59.5%。

更新时间: 2024-07-08 02:15:29

领域: cs.AI

下载: http://arxiv.org/abs/2407.05557v1

GIFT: Generative Interpretable Fine-Tuning

We present Generative Interpretable Fine-Tuning (GIFT) for parameter-efficient fine-tuning of pretrained Transformer backbones, which can be formulated as a simple factorized matrix multiplication in the parameter space or equivalently in the activation/representation space, and thus embraces built-in interpretability. For a layer with weights $\omega\in \mathbb{R}^{d_{out}\times d_{in}}$, our proposed GIFT learns the fine-tuned weights $\hat{\omega}$ directly from $\omega$ as $\hat{\omega}=\omega\cdot (\mathbb{I}+\phi_{d_{in}\times r}\cdot\psi_{r\times d_{in}})$. $\Theta=(\phi, \psi)$ are the learnable parameters of the two linear layers. $\Theta$ can be shared by all layers selected for fine-tuning (e.g., all the Query and Value layers), or can be layer-type specific (e.g., different $\Theta$'s used for Query and Value), resulting in significantly fewer trainable parameters compared to layer-specific Low-Rank Adaptation (LoRA). We perform comprehensive evaluations on natural language tasks (commonsense and arithmetic reasoning, instruction tuning, and sequence classification), and fine-grained visual classification tasks. We obtain the best performance and parameter efficiency among baselines on commonsense reasoning, instruction tuning and visual recognition benchmarks. Compared to LoRA, we obtain 5.9% absolute increase in average accuracy with 53.8 times reduction of parameters on Commonsense170k using Llama-3 (8B), and 5.4% absolute increase in the win rate with 4 times reduction of parameters using Llama-2 (7B) during instruction tuning. Our GIFT also obtains a slightly higher win rate on instruction tuning than GPT 3.5 (Turbo 1106). We show the output of the first linear layer (i.e., $\omega\cdot \phi$) is surprisingly interpretable, which can play the role of a token-clustering head as a by-product to localize meaningful objects/parts in images for computer vision tasks.
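
The update rule reads directly as code; the PyTorch sketch below uses toy dimensions and random parameters only to check the shapes of the factorized product.

import torch

# w_hat = w @ (I + phi @ psi), with Theta = (phi, psi) the learnable parameters
# that can be shared across the fine-tuned layers. Dimensions are illustrative.
d_out, d_in, r = 64, 32, 4
w = torch.randn(d_out, d_in)                       # frozen pretrained weights
phi = torch.randn(d_in, r, requires_grad=True)     # learnable, shape (d_in, r)
psi = torch.randn(r, d_in, requires_grad=True)     # learnable, shape (r, d_in)

w_hat = w @ (torch.eye(d_in) + phi @ psi)          # fine-tuned weights
print(w_hat.shape)   # torch.Size([64, 32])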

Updated: 2024-07-08 01:59:10

标题: GIFT: 生成式可解释微调

摘要: 我们提出了一种用于参数高效微调预训练Transformer骨干模型的生成可解释微调(GIFT)方法,可以在参数空间或激活/表示空间中简单地表示为一个分解矩阵乘法,从而具有内置解释性。对于具有权重$\omega\in\mathbb{R}^{d_{out}\times d_{in}}$的层,我们提出的GIFT直接从$\omega$中学习微调权重$\hat{\omega}$,即$\hat{\omega}=\omega\cdot(\mathbb{I}+\phi_{d_{in}\times r}\cdot\psi_{r\times d_{in}})$。$\Theta=(\phi,\psi)$是两个线性层的可学习参数。$\Theta$可以被所有选择进行微调的层共享(例如,所有的Query和Value层),也可以是特定于层类型的(例如,Query和Value使用不同的$\Theta$),相较于层特定的低秩适应(LoRA),可显著减少可训练参数。我们在自然语言任务(常识和算术推理、指令微调和序列分类)以及细粒度视觉分类任务上进行了全面评估。在常识推理、指令微调和视觉识别基准测试中,我们获得了最佳性能和参数效率。与LoRA相比,在Commonsense170k上使用Llama-3(8B),平均准确率增加5.9%,参数减少53.8倍;在指令微调过程中使用Llama-2(7B),胜率增加5.4%,参数减少4倍。我们的GIFT在指令微调上的胜率也略高于GPT 3.5(Turbo 1106)。我们展示了第一个线性层的输出(即$\omega\cdot\phi$)是令人惊讶的可解释的,可以作为一个副产品扮演一个标记聚类头的角色,以定位图像中的有意义的对象/部分,用于计算机视觉任务。

更新时间: 2024-07-08 01:59:10

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2312.00700v3

MEEG and AT-DGNN: Advancing EEG Emotion Recognition with Music and Graph Learning

Recent advances in neuroscience have elucidated the crucial role of coordinated brain region activities during cognitive tasks. To explore the complexity, we introduce the MEEG dataset, a comprehensive multi-modal music-induced electroencephalogram (EEG) dataset and the Attention-based Temporal Learner with Dynamic Graph Neural Network (AT-DGNN), a novel framework for EEG-based emotion recognition. The MEEG dataset captures a wide range of emotional responses to music, enabling an in-depth analysis of brainwave patterns in musical contexts. The AT-DGNN combines an attention-based temporal learner with a dynamic graph neural network (DGNN) to accurately model the local and global graph dynamics of EEG data across varying brain network topology. Our evaluations show that AT-DGNN achieves superior performance, with an accuracy (ACC) of 83.06\% in arousal and 85.31\% in valence, outperforming state-of-the-art (SOTA) methods on the MEEG dataset. Comparative analyses with traditional datasets like DEAP highlight the effectiveness of our approach and underscore the potential of music as a powerful medium for emotion induction. This study not only advances our understanding of the brain emotional processing, but also enhances the accuracy of emotion recognition technologies in brain-computer interfaces (BCI), leveraging both graph-based learning and the emotional impact of music. The source code and dataset are available at \textit{https://github.com/xmh1011/AT-DGNN}.

Updated: 2024-07-08 01:58:48

标题: MEEG和AT-DGNN:利用音乐和图学习推进EEG情绪识别

摘要: 最近神经科学的进展揭示了在认知任务期间协调脑区活动的关键作用。为了探索这种复杂性,我们引入了MEEG数据集,这是一个全面的多模态音乐诱导脑电图(EEG)数据集,以及基于注意力的时间学习器和动态图神经网络(AT-DGNN),这是一个用于基于EEG的情绪识别的新框架。MEEG数据集捕捉了对音乐的广泛情绪反应,从而能够深入分析音乐背景下的脑波模式。AT-DGNN将基于注意力的时间学习器与动态图神经网络(DGNN)结合起来,以准确地建模EEG数据在不同脑网络拓扑中的局部和全局图动态。我们的评估结果显示,AT-DGNN在唤醒中的准确率(ACC)为83.06%,在愉悦度方面为85.31%,优于MEEG数据集上的最新方法。与传统数据集(如DEAP)的比较分析突显了我们方法的有效性,并强调了音乐作为情绪诱导强大媒介的潜力。这项研究不仅推动了我们对大脑情绪处理的理解,还提高了大脑-计算机界面(BCI)中情绪识别技术的准确性,利用了基于图的学习和音乐的情绪影响。源代码和数据集可从\textit{https://github.com/xmh1011/AT-DGNN}获取。

更新时间: 2024-07-08 01:58:48

领域: cs.HC,cs.AI

下载: http://arxiv.org/abs/2407.05550v1

Foundations and Frontiers of Graph Learning Theory

Recent advancements in graph learning have revolutionized the way to understand and analyze data with complex structures. Notably, Graph Neural Networks (GNNs), i.e. neural network architectures designed for learning graph representations, have become a popular paradigm. With these models being usually characterized by intuition-driven design or highly intricate components, placing them within the theoretical analysis framework to distill the core concepts, helps understand the key principles that drive the functionality better and guide further development. Given this surge in interest, this article provides a comprehensive summary of the theoretical foundations and breakthroughs concerning the approximation and learning behaviors intrinsic to prevalent graph learning models. Encompassing discussions on fundamental aspects such as expressiveness power, generalization, optimization, and unique phenomena such as over-smoothing and over-squashing, this piece delves into the theoretical foundations and frontier driving the evolution of graph learning. In addition, this article also presents several challenges and further initiates discussions on possible solutions.

Updated: 2024-07-08 01:22:37

标题: 图学习理论的基础和前沿

摘要: 最近图学习的进展彻底改变了理解和分析具有复杂结构的数据的方式。值得注意的是,图神经网络(GNNs),即为学习图表示而设计的神经网络架构,已成为一种流行的范例。这些模型通常以直觉驱动的设计或高度复杂的组件为特征,将它们置于理论分析框架内以澄清核心概念有助于更好地理解推动功能的关键原则并指导进一步发展。鉴于对此的兴趣激增,本文提供了关于主流图学习模型内在逼近和学习行为的理论基础和突破的全面总结。涵盖了关于表现力、泛化、优化以及过度平滑和过度压缩等独特现象的讨论,本文深入探讨了推动图学习演变的理论基础和前沿。此外,本文还提出了几个挑战,并进一步展开可能解决方案的讨论。

更新时间: 2024-07-08 01:22:37

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2407.03125v2

On the Equivalence between Logic Programming and SETAF

A framework with sets of attacking arguments (SETAF) is an extension of the well-known Dung's Abstract Argumentation Frameworks (AAFs) that allows joint attacks on arguments. In this paper, we provide a translation from Normal Logic Programs (NLPs) to SETAFs and vice versa, from SETAFs to NLPs. We show that there is pairwise equivalence between their semantics, including the equivalence between L-stable and semi-stable semantics. Furthermore, for a class of NLPs called Redundancy-Free Atomic Logic Programs (RFALPs), there is also a structural equivalence as these back-and-forth translations are each other's inverse. Then, we show that RFALPs are as expressive as NLPs by transforming any NLP into an equivalent RFALP through a series of program transformations already known in the literature. We also show that these program transformations are confluent, meaning that every NLP will be transformed into a unique RFALP. The results presented in this paper enhance our understanding that NLPs and SETAFs are essentially the same formalism. Under consideration in Theory and Practice of Logic Programming (TPLP).

Updated: 2024-07-08 01:03:53

标题: 关于逻辑编程和SETAF之间的等价性

摘要: SETAF是Dung的抽象论证框架(AAFs)的一个扩展,允许对论点进行联合攻击。本文提供了从Normal Logic Programs(NLPs)到SETAFs和反之亦然的翻译。我们展示了它们的语义之间存在成对的等价性,包括L-稳定和半稳定语义之间的等价性。此外,对于一类称为无冗余原子逻辑程序(RFALPs)的NLPs,这些来回翻译也具有结构等价性。然后,我们展示了RFALPs与NLPs的表达能力相同,通过一系列在文献中已知的程序转换将任何NLP转换为等效的RFALP。我们还展示了这些程序转换是可汇聚的,意味着每个NLP将被转换为唯一的RFALP。本文提出的结果增进了我们对NLPs和SETAFs实质上是相同形式化的理解。正在考虑发表在《逻辑编程的理论与实践》(TPLP)中。

更新时间: 2024-07-08 01:03:53

领域: cs.AI,I.2.3

下载: http://arxiv.org/abs/2407.05538v1

Synthetically Enhanced: Unveiling Synthetic Data's Potential in Medical Imaging Research

Chest X-rays (CXR) are essential for diagnosing a variety of conditions, but when used on new populations, model generalizability issues limit their efficacy. Generative AI, particularly denoising diffusion probabilistic models (DDPMs), offers a promising approach to generating synthetic images, enhancing dataset diversity. This study investigates the impact of synthetic data supplementation on the performance and generalizability of medical imaging research. The study employed DDPMs to create synthetic CXRs conditioned on demographic and pathological characteristics from the CheXpert dataset. These synthetic images were used to supplement training datasets for pathology classifiers, with the aim of improving their performance. The evaluation involved three datasets (CheXpert, MIMIC-CXR, and Emory Chest X-ray) and various experiments, including supplementing real data with synthetic data, training with purely synthetic data, and mixing synthetic data with external datasets. Performance was assessed using the area under the receiver operating curve (AUROC). Adding synthetic data to real datasets resulted in a notable increase in AUROC values (up to 0.02 in internal and external test sets with 1000% supplementation, p-value less than 0.01 in all instances). When classifiers were trained exclusively on synthetic data, they achieved performance levels comparable to those trained on real data with 200%-300% data supplementation. The combination of real and synthetic data from different sources demonstrated enhanced model generalizability, increasing model AUROC from 0.76 to 0.80 on the internal test set (p-value less than 0.01). In conclusion, synthetic data supplementation significantly improves the performance and generalizability of pathology classifiers in medical imaging.

Updated: 2024-07-08 00:56:36

标题: 合成增强:揭示合成数据在医学影像研究中的潜力

摘要: 胸部X光片(CXR)对于诊断各种疾病至关重要,但当应用于新人群时,模型的泛化能力问题限制了其有效性。生成式人工智能,特别是去噪扩散概率模型(DDPMs),提供了一种有前途的方法来生成合成图像,增强数据集的多样性。本研究调查了合成数据补充对医学影像研究性能和泛化能力的影响。该研究利用DDPMs基于CheXpert数据集的人口统计和病理特征创建了条件化的合成CXR。这些合成图像用于补充病理分类器的训练数据集,旨在提高其性能。评估涉及三个数据集(CheXpert,MIMIC-CXR和Emory Chest X-ray)和各种实验,包括用合成数据补充真实数据,仅使用合成数据进行训练,以及将合成数据与外部数据集混合。性能使用接收器工作特征曲线下面积(AUROC)进行评估。将合成数据添加到真实数据集中使AUROC值显著增加(在1000%补充时,内部和外部测试集的AUROC最高提升0.02,所有情况下p值均小于0.01)。当分类器仅使用合成数据训练时,在200%-300%数据补充下,其性能可与使用真实数据训练的分类器相当。来自不同来源的真实和合成数据的组合展示了增强的模型泛化能力,在内部测试集上将模型AUROC从0.76提高到0.80(p值小于0.01)。总之,合成数据补充显著提高了医学影像中病理分类器的性能和泛化能力。

更新时间: 2024-07-08 00:56:36

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2311.09402v2

What Matters in Transformers? Not All Attention is Needed

Scaling Transformer-based large language models (LLMs) has demonstrated promising performance across various tasks. However, this scaling also introduces redundant structures, posing challenges for real-world deployment. Despite some recognition of redundancy in LLMs, the variability of redundancy across different structures, such as MLP and Attention layers, is under-explored. In this work, we investigate the varying redundancy across different modules within Transformers, including Blocks, MLP, and Attention layers, using a similarity-based metric. This metric operates on the premise that redundant structures produce outputs highly similar to their inputs. Surprisingly, while attention layers are essential for transformers and distinguish them from other mainstream architectures, we found that a large proportion of attention layers exhibit excessively high similarity and can be safely pruned without degrading performance, leading to reduced memory and computation costs. Additionally, we further propose a method that jointly drops Attention and MLP layers, achieving improved performance and dropping ratios. Extensive experiments demonstrate the effectiveness of our methods, e.g., Llama-3-70B maintains comparable performance even after pruning half of the attention layers. Our findings provide valuable insights for future network architecture design. The code will be released at: \url{https://github.com/Shwai-He/LLM-Drop}.
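
The similarity metric can be sketched in a few lines: a module whose output is nearly identical to its input scores close to 1 and becomes a pruning candidate; the 0.99 threshold below is an illustrative assumption, not the paper's tuned value.

import torch

def redundancy(x_in: torch.Tensor, x_out: torch.Tensor) -> float:
    # mean cosine similarity between a module's input and output
    sim = torch.nn.functional.cosine_similarity(
        x_in.flatten(1), x_out.flatten(1), dim=1)
    return sim.mean().item()

x = torch.randn(8, 512)
near_identity = x + 0.01 * torch.randn_like(x)     # an almost pass-through layer
print(redundancy(x, near_identity) > 0.99)         # True: a pruning candidate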

Updated: 2024-07-08 00:28:52

标题: 变压器中重要的是什么?并非所有的注意力都是必要的

摘要: 基于Transformer的大型语言模型(LLMs)的扩展已经在各种任务中展示出了令人期待的性能。然而,这种扩展也引入了冗余结构,给实际部署带来了挑战。尽管在LLMs中有一些冗余性的认识,但对于不同结构(如MLP和注意力层)之间的冗余性变化的研究尚未充分探讨。在这项工作中,我们利用基于相似度的度量方法研究了Transformer中不同模块(包括块、MLP和注意力层)之间的冗余性变化。这个度量方法的基本假设是冗余结构会生成与它们的输入高度相似的输出。令人惊讶的是,虽然注意力层对于transformers至关重要并将其区分于其他主流架构,我们发现大部分注意力层表现出过高的相似度,并且可以安全地剪枝而不会降低性能,从而降低内存和计算成本。此外,我们进一步提出了一种同时删除注意力和MLP层的方法,实现了性能和删除比率的提升。大量实验证明了我们方法的有效性,例如,即使剪去一半的注意力层,Llama-3-70B仍然保持了可比较的性能。我们的发现为未来网络架构设计提供了宝贵的见解。代码将发布在:\url{https://github.com/Shwai-He/LLM-Drop}。

更新时间: 2024-07-08 00:28:52

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2406.15786v2

This&That: Language-Gesture Controlled Video Generation for Robot Planning

We propose a robot learning method for communicating, planning, and executing a wide range of tasks, dubbed This&That. We achieve robot planning for general tasks by leveraging the power of video generative models trained on internet-scale data containing rich physical and semantic context. In this work, we tackle three fundamental challenges in video-based planning: 1) unambiguous task communication with simple human instructions, 2) controllable video generation that respects user intents, and 3) translating visual planning into robot actions. We propose language-gesture conditioning to generate videos, which is both simpler and clearer than existing language-only methods, especially in complex and uncertain environments. We then suggest a behavioral cloning design that seamlessly incorporates the video plans. This&That demonstrates state-of-the-art effectiveness in addressing the above three challenges, and justifies the use of video generation as an intermediate representation for generalizable task planning and execution. Project website: https://cfeng16.github.io/this-and-that/.

Updated: 2024-07-08 00:28:41

标题: This&That:语言-手势控制的视频生成用于机器人规划

摘要: 我们提出了一种用于沟通、规划和执行各种任务的机器人学习方法,称为This&That。我们通过利用在包含丰富物理和语义上下文的互联网规模数据上训练的视频生成模型的能力,实现了机器人对于一般任务的规划。在这项工作中,我们解决了基于视频的规划中的三个基本挑战:1) 使用简单的人类指令进行明确的任务沟通,2) 尊重用户意图的可控视频生成,以及3) 将视觉规划转化为机器人动作。我们提出了语言手势调节来生成视频,这比现有的仅使用语言的方法更简单且更清晰,尤其是在复杂和不确定的环境中。然后,我们建议采用行为克隆设计,无缝地整合视频计划。This&That展示了在解决上述三个挑战方面的最新有效性,并证明了将视频生成作为通用任务规划和执行的中间表示的合理性。项目网站:https://cfeng16.github.io/this-and-that/。

更新时间: 2024-07-08 00:28:41

领域: cs.RO,cs.AI,cs.CV

下载: http://arxiv.org/abs/2407.05530v1

Nonparametric Density Estimation via Variance-Reduced Sketching

Nonparametric density models are of great interest in various scientific and engineering disciplines. Classical density kernel methods, while numerically robust and statistically sound in low-dimensional settings, become inadequate even in moderate higher-dimensional settings due to the curse of dimensionality. In this paper, we introduce a new framework called Variance-Reduced Sketching (VRS), specifically designed to estimate multivariable density functions with a reduced curse of dimensionality. Our framework conceptualizes multivariable functions as infinite-size matrices, and facilitates a new sketching technique motivated by numerical linear algebra literature to reduce the variance in density estimation problems. We demonstrate the robust numerical performance of VRS through a series of simulated experiments and real-world data applications. Notably, VRS shows remarkable improvement over existing neural network estimators and classical kernel methods in numerous density models. Additionally, we offer theoretical justifications for VRS to support its ability to deliver nonparametric density estimation with a reduced curse of dimensionality.

Updated: 2024-07-08 00:27:04

标题: 通过方差降低草图进行非参数密度估计

摘要: 非参数密度模型在各种科学和工程学科中备受关注。经典的密度核方法在低维情况下具有数值稳健性和统计合理性,但在中等高维情况下甚至会因维度诅咒而变得不足够。在本文中,我们引入了一个名为方差降低草图(VRS)的新框架,专门设计用于估计具有降低维度诅咒的多变量密度函数。我们的框架将多变量函数概念化为无限大小的矩阵,并借鉴数值线性代数文献提出了一种新的草图技术,以减少密度估计问题中的方差。我们通过一系列模拟实验和真实数据应用展示了VRS的稳健数值性能。值得注意的是,在许多密度模型中,VRS相对于现有的神经网络估计器和经典核方法表现出了显著的改进。此外,我们提供了VRS的理论理由,以支持其能够提供具有降低维度诅咒的非参数密度估计。

更新时间: 2024-07-08 00:27:04

领域: stat.ML,cs.LG,cs.NA,math.NA,stat.ME

下载: http://arxiv.org/abs/2401.11646v2

Rethinking Image Skip Connections in StyleGAN2

Various models based on StyleGAN have gained significant traction in the field of image synthesis, attributed to their robust training stability and superior performances. Within the StyleGAN framework, the adoption of image skip connection is favored over the traditional residual connection. However, this preference is just based on empirical observations; there has not been any in-depth mathematical analysis on it yet. To rectify this situation, this brief aims to elucidate the mathematical meaning of the image skip connection and introduce a groundbreaking methodology, termed the image squeeze connection, which significantly improves the quality of image synthesis. Specifically, we analyze the image skip connection technique to reveal its problem and introduce the proposed method which not only effectively boosts the GAN performance but also reduces the required number of network parameters. Extensive experiments on various datasets demonstrate that the proposed method consistently enhances the performance of state-of-the-art models based on StyleGAN. We believe that our findings represent a vital advancement in the field of image synthesis, suggesting a novel direction for future research and applications.

Updated: 2024-07-08 00:21:17

标题: 重新思考StyleGAN2中的图像跳连接

摘要: 基于StyleGAN的各种模型在图像合成领域取得了显著的进展,这归功于它们稳定的训练性能和优越的表现。在StyleGAN框架内,采用图像跳跃连接取代传统的残差连接更受青睐。然而,这种偏好仅仅基于经验观察,尚未进行深入的数学分析。为了纠正这种情况,本文旨在阐明图像跳跃连接的数学含义,并引入一种开创性的方法,称为图像压缩连接,显著提高图像合成的质量。具体而言,我们分析了图像跳跃连接技术,揭示了其问题,并引入了提出的方法,不仅有效提升了GAN性能,还减少了网络参数的数量。对各种数据集进行的大量实验表明,所提出的方法持续增强了基于StyleGAN的最新模型的性能。我们相信我们的发现代表了图像合成领域的重要进步,为未来研究和应用提供了新的方向。

更新时间: 2024-07-08 00:21:17

领域: cs.CV,cs.LG,eess.IV

下载: http://arxiv.org/abs/2407.05527v1

Can Machines Learn the True Probabilities?

When there exists uncertainty, AI machines are designed to make decisions so as to reach the best expected outcomes. Expectations are based on true facts about the objective environment the machines interact with, and those facts can be encoded into AI models in the form of true objective probability functions. Accordingly, AI models involve probabilistic machine learning in which the probabilities should be objectively interpreted. We prove under some basic assumptions when machines can learn the true objective probabilities, if any, and when machines cannot learn them.

Updated: 2024-07-08 00:19:43

标题: 机器能学习真实概率吗?

摘要: 在存在不确定性的情况下,人工智能机器被设计为做出决策,以达到最佳预期结果。预期结果基于机器与之交互的客观环境的真实事实,并且这些事实可以以真实客观概率函数的形式编码到人工智能模型中。因此,人工智能模型涉及概率机器学习,其中概率应该被客观解释。我们在一些基本假设下证明了机器何时能学习到真实客观概率,以及何时无法学习到它们。

更新时间: 2024-07-08 00:19:43

领域: cs.LG,cs.AI,stat.ML

下载: http://arxiv.org/abs/2407.05526v1

By Xinhai (Sean) Zou.