    _              _         ____              
   / \   _ ____  _(_)_   __ |  _ \  __ _ _   _ 
  / _ \ | '__\ \/ / \ \ / / | | | |/ _` | | | |
 / ___ \| |   >  <| |\ V /  | |_| | (_| | |_| |
/_/   \_\_|  /_/\_\_| \_/   |____/ \__,_|\__, |
                                         |___/ 
        

Articles: 34

Last Updated: 2024-05-16 17:59:07 (+00:00)

TRANSIC: Sim-to-Real Policy Transfer by Learning from Online Correction

Learning in simulation and transferring the learned policy to the real world has the potential to enable generalist robots. The key challenge of this approach is to address simulation-to-reality (sim-to-real) gaps. Previous methods often require domain-specific knowledge a priori. We argue that a straightforward way to obtain such knowledge is by asking humans to observe and assist robot policy execution in the real world. The robots can then learn from humans to close various sim-to-real gaps. We propose TRANSIC, a data-driven approach to enable successful sim-to-real transfer based on a human-in-the-loop framework. TRANSIC allows humans to augment simulation policies to overcome various unmodeled sim-to-real gaps holistically through intervention and online correction. Residual policies can be learned from human corrections and integrated with simulation policies for autonomous execution. We show that our approach can achieve successful sim-to-real transfer in complex and contact-rich manipulation tasks such as furniture assembly. Through synergistic integration of policies learned in simulation and from humans, TRANSIC is effective as a holistic approach to addressing various, often coexisting sim-to-real gaps. It displays attractive properties such as scaling with human effort. Videos and code are available at https://transic-robot.github.io/
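
As a rough sketch of how a residual policy learned from human corrections could be combined with a frozen simulation policy at deployment (illustrative only; the names, dimensions, and residual scale below are our assumptions, not the authors' code):

import numpy as np

def base_policy(obs):
    # stand-in for the policy trained in simulation
    return np.tanh(obs @ W_base)

def residual_policy(obs, base_action):
    # stand-in for the residual learned from human online corrections
    return 0.1 * np.tanh(np.concatenate([obs, base_action]) @ W_res)

rng = np.random.default_rng(0)
obs_dim, act_dim = 8, 4
W_base = rng.normal(size=(obs_dim, act_dim))
W_res = rng.normal(size=(obs_dim + act_dim, act_dim))

obs = rng.normal(size=obs_dim)
a_sim = base_policy(obs)
action = a_sim + residual_policy(obs, a_sim)  # autonomous execution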

Updated: 2024-05-16 17:59:07

Categories: cs.RO,cs.AI,cs.LG

Download: http://arxiv.org/abs/2405.10315v1

How Far Are We From AGI

The evolution of artificial intelligence (AI) has profoundly impacted human society, driving significant advancements in multiple sectors. Yet, the escalating demands on AI have highlighted the limitations of AI's current offerings, catalyzing a movement towards Artificial General Intelligence (AGI). AGI, distinguished by its ability to execute diverse real-world tasks with efficiency and effectiveness comparable to human intelligence, reflects a paramount milestone in AI evolution. While existing works have summarized specific recent advancements of AI, they lack a comprehensive discussion of AGI's definitions, goals, and developmental trajectories. Different from existing survey papers, this paper delves into the pivotal questions of our proximity to AGI and the strategies necessary for its realization through extensive surveys, discussions, and original perspectives. We start by articulating the requisite capability frameworks for AGI, integrating the internal, interface, and system dimensions. As the realization of AGI requires more advanced capabilities and adherence to stringent constraints, we further discuss necessary AGI alignment technologies to harmonize these factors. Notably, we emphasize the importance of approaching AGI responsibly by first defining the key levels of AGI progression, followed by the evaluation framework that situates the status-quo, and finally giving our roadmap of how to reach the pinnacle of AGI. Moreover, to give tangible insights into the ubiquitous impact of the integration of AI, we outline existing challenges and potential pathways toward AGI in multiple domains. In sum, serving as a pioneering exploration into the current state and future trajectory of AGI, this paper aims to foster a collective comprehension and catalyze broader public discussions among researchers and practitioners on AGI.

Updated: 2024-05-16 17:59:02

Categories: cs.AI,cs.CL,cs.CY,cs.LG

Download: http://arxiv.org/abs/2405.10313v1

Stochastic Q-learning for Large Discrete Action Spaces

In complex environments with large discrete action spaces, effective decision-making is critical in reinforcement learning (RL). Despite the widespread use of value-based RL approaches like Q-learning, they come with a computational burden, necessitating the maximization of a value function over all actions in each iteration. This burden becomes particularly challenging when addressing large-scale problems and using deep neural networks as function approximators. In this paper, we present stochastic value-based RL approaches which, in each iteration, as opposed to optimizing over the entire set of $n$ actions, only consider a variable stochastic set of a sublinear number of actions, possibly as small as $\mathcal{O}(\log(n))$. The presented stochastic value-based RL methods include, among others, Stochastic Q-learning, StochDQN, and StochDDQN, all of which integrate this stochastic approach for both value-function updates and action selection. The theoretical convergence of Stochastic Q-learning is established, while an analysis of stochastic maximization is provided. Moreover, through empirical validation, we illustrate that the various proposed approaches outperform the baseline methods across diverse environments, including different control problems, achieving near-optimal average returns in significantly reduced time.
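
The core replacement for the full argmax can be sketched in a few lines (illustrative; keeping the previous argmax in the candidate set is a common memory trick and an assumption here, not necessarily the paper's exact rule):

import numpy as np

def stoch_max(q_values, rng, prev_best=None):
    n = len(q_values)
    k = max(1, int(np.ceil(np.log2(n))))          # O(log n) candidates
    candidates = rng.choice(n, size=k, replace=False)
    if prev_best is not None:                     # optionally keep last argmax
        candidates = np.append(candidates, prev_best)
    best = candidates[np.argmax(q_values[candidates])]
    return best, q_values[best]

rng = np.random.default_rng(0)
q = rng.normal(size=1000)
a, v = stoch_max(q, rng)   # used in place of max_a Q(s,a) in updates/selection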

Updated: 2024-05-16 17:58:44

Categories: cs.LG,cs.AI,cs.PF,cs.RO,stat.ML

Download: http://arxiv.org/abs/2405.10310v1

4D Panoptic Scene Graph Generation

We are living in a three-dimensional space while moving forward through a fourth dimension: time. To allow artificial intelligence to develop a comprehensive understanding of such a 4D environment, we introduce 4D Panoptic Scene Graph (PSG-4D), a new representation that bridges the raw visual data perceived in a dynamic 4D world and high-level visual understanding. Specifically, PSG-4D abstracts rich 4D sensory data into nodes, which represent entities with precise location and status information, and edges, which capture the temporal relations. To facilitate research in this new area, we build a richly annotated PSG-4D dataset consisting of 3K RGB-D videos with a total of 1M frames, each of which is labeled with 4D panoptic segmentation masks as well as fine-grained, dynamic scene graphs. To solve PSG-4D, we propose PSG4DFormer, a Transformer-based model that can predict panoptic segmentation masks, track masks along the time axis, and generate the corresponding scene graphs via a relation component. Extensive experiments on the new dataset show that our method can serve as a strong baseline for future research on PSG-4D. In the end, we provide a real-world application example to demonstrate how we can achieve dynamic scene understanding by integrating a large language model into our PSG-4D system.

Updated: 2024-05-16 17:56:55

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2405.10305v1

Optimal Aggregation of Prediction Intervals under Unsupervised Domain Shift

As machine learning models are increasingly deployed in dynamic environments, it becomes paramount to assess and quantify uncertainties associated with distribution shifts. A distribution shift occurs when the underlying data-generating process changes, leading to a deviation in the model's performance. The prediction interval, which captures the range of likely outcomes for a given prediction, serves as a crucial tool for characterizing uncertainties induced by the underlying distribution. In this paper, we propose methodologies for aggregating prediction intervals to obtain one with minimal width and adequate coverage on the target domain under unsupervised domain shift, under which we have labeled samples from a related source domain and unlabeled covariates from the target domain. Our analysis encompasses scenarios where the source and the target domain are related via i) a bounded density ratio, and ii) a measure-preserving transformation. Our proposed methodologies are computationally efficient and easy to implement. Beyond illustrating the performance of our method through a real-world dataset, we also delve into the theoretical details. This includes establishing rigorous theoretical guarantees, coupled with finite sample bounds, regarding the coverage and width of our prediction intervals. Our approach excels in practical applications and is underpinned by a solid theoretical framework, ensuring its reliability and effectiveness across diverse contexts.

Updated: 2024-05-16 17:55:42

Categories: stat.ME,cs.LG,math.ST,stat.ML,stat.TH

Download: http://arxiv.org/abs/2405.10302v1

Conformal Alignment: Knowing When to Trust Foundation Models with Guarantees

Before deploying outputs from foundation models in high-stakes tasks, it is imperative to ensure that they align with human values. For instance, in radiology report generation, reports generated by a vision-language model must align with human evaluations before their use in medical decision-making. This paper presents Conformal Alignment, a general framework for identifying units whose outputs meet a user-specified alignment criterion. It is guaranteed that on average, a prescribed fraction of selected units indeed meet the alignment criterion, regardless of the foundation model or the data distribution. Given any pre-trained model and new units with model-generated outputs, Conformal Alignment leverages a set of reference data with ground-truth alignment status to train an alignment predictor. It then selects new units whose predicted alignment scores surpass a data-dependent threshold, certifying their corresponding outputs as trustworthy. Through applications to question answering and radiology report generation, we demonstrate that our method is able to accurately identify units with trustworthy outputs via lightweight training over a moderate amount of reference data. En route, we investigate the informativeness of various features in alignment prediction and combine them with standard models to construct the alignment predictor.
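
A simplified sketch of the selection rule (the threshold below is a naive empirical choice; the paper's data-dependent threshold comes with formal conformal guarantees):

import numpy as np

rng = np.random.default_rng(0)
# reference data: predicted alignment scores + ground-truth alignment labels
ref_scores = rng.uniform(size=500)
ref_aligned = rng.uniform(size=500) < ref_scores   # toy ground truth
alpha = 0.1                                        # tolerated error fraction

# pick the smallest threshold whose selected reference units are
# (1 - alpha)-aligned on average; conformal calibration refines this step
grid = np.sort(ref_scores)
ok = [t for t in grid if ref_aligned[ref_scores >= t].mean() >= 1 - alpha]
tau = min(ok) if ok else grid[-1]

new_scores = rng.uniform(size=100)
selected = new_scores >= tau   # outputs certified as trustworthy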

Updated: 2024-05-16 17:55:24

Categories: stat.ML,cs.AI,cs.LG

Download: http://arxiv.org/abs/2405.10301v1

HW-GPT-Bench: Hardware-Aware Architecture Benchmark for Language Models

The expanding size of language models has created the necessity for a comprehensive examination across various dimensions that reflect the desiderata with respect to the tradeoffs between various hardware metrics, such as latency, energy consumption, GPU memory usage, and performance. There is a growing interest in establishing Pareto frontiers for different language model configurations to identify optimal models with specified hardware constraints. Notably, architectures that excel in latency on one device may not perform optimally on another. However, exhaustive training and evaluation of numerous architectures across diverse hardware configurations is computationally prohibitive. To this end, we propose HW-GPT-Bench, a hardware-aware language model surrogate benchmark, where we leverage weight-sharing techniques from Neural Architecture Search (NAS) to efficiently train a supernet proxy, encompassing language models of varying scales in a single model. We conduct profiling of these models across 13 devices, considering 5 hardware metrics and 3 distinct model scales. Finally, we showcase the usability of HW-GPT-Bench using 8 different multi-objective NAS algorithms and evaluate the quality of the resultant Pareto fronts. Through this benchmark, our objective is to propel and expedite research in the advancement of multi-objective methods for NAS and structural pruning in large language models.

Updated: 2024-05-16 17:53:32

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2405.10299v1

Societal Adaptation to Advanced AI

Existing strategies for managing risks from advanced AI systems often focus on affecting what AI systems are developed and how they diffuse. However, this approach becomes less feasible as the number of developers of advanced AI grows, and impedes beneficial use-cases as well as harmful ones. In response, we urge a complementary approach: increasing societal adaptation to advanced AI, that is, reducing the expected negative impacts from a given level of diffusion of a given AI capability. We introduce a conceptual framework which helps identify adaptive interventions that avoid, defend against and remedy potentially harmful uses of AI systems, illustrated with examples in election manipulation, cyberterrorism, and loss of control to AI decision-makers. We discuss a three-step cycle that society can implement to adapt to AI. Increasing society's ability to implement this cycle builds its resilience to advanced AI. We conclude with concrete recommendations for governments, industry, and third-parties.

Updated: 2024-05-16 17:52:12

Categories: cs.CY,cs.AI,cs.HC

Download: http://arxiv.org/abs/2405.10295v1

Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning

Large vision-language models (VLMs) fine-tuned on specialized visual instruction-following data have exhibited impressive language reasoning capabilities across various scenarios. However, this fine-tuning paradigm may not be able to efficiently learn optimal decision-making agents in multi-step goal-directed tasks from interactive environments. To address this challenge, we propose an algorithmic framework that fine-tunes VLMs with reinforcement learning (RL). Specifically, our framework provides a task description and then prompts the VLM to generate chain-of-thought (CoT) reasoning, enabling the VLM to efficiently explore intermediate reasoning steps that lead to the final text-based action. Next, the open-ended text output is parsed into an executable action to interact with the environment to obtain goal-directed task rewards. Finally, our framework uses these task rewards to fine-tune the entire VLM with RL. Empirically, we demonstrate that our proposed framework enhances the decision-making capabilities of VLM agents across various tasks, enabling 7b models to outperform commercial models such as GPT4-V or Gemini. Furthermore, we find that CoT reasoning is a crucial component for performance improvement, as removing the CoT reasoning results in a significant decrease in the overall performance of our method.
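
Schematically, the fine-tuning loop reduces to generate, parse, act, reward; the runnable stub below uses toy stand-ins for the VLM, the environment, and the RL update (every name here is hypothetical, not the authors' implementation):

import random

class ToyEnv:
    def reset(self): self.t = 0; return "goal: pick the larger number"
    def step(self, action):
        self.t += 1
        reward = 1.0 if action == "right" else 0.0
        return "obs", reward, self.t >= 3, {}

class ToyVLM:
    def generate(self, prompt):
        # stand-in for CoT generation ending in a text action
        return "thought: compare. action: " + random.choice(["left", "right"])
    def update(self, prompt, text, reward):
        pass  # placeholder for the RL (e.g. policy-gradient) update

def parse_action(text):
    return text.rsplit("action:", 1)[-1].strip()

env, vlm = ToyEnv(), ToyVLM()
obs, done = env.reset(), False
while not done:
    text = vlm.generate(obs)                     # CoT + final text action
    obs, reward, done, _ = env.step(parse_action(text))
    vlm.update(obs, text, reward)                # task reward drives tuning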

Updated: 2024-05-16 17:50:19

Categories: cs.AI,cs.CL,cs.CV,cs.LG

Download: http://arxiv.org/abs/2405.10292v1

Timeline-based Sentence Decomposition with In-Context Learning for Temporal Fact Extraction

Fact extraction is pivotal for constructing knowledge graphs. Recently, the increasing demand for temporal facts in downstream tasks has led to the emergence of the task of temporal fact extraction. In this paper, we specifically address the extraction of temporal facts from natural language text. Previous studies fail to handle the challenge of establishing time-to-fact correspondences in complex sentences. To overcome this hurdle, we propose a timeline-based sentence decomposition strategy using large language models (LLMs) with in-context learning, ensuring a fine-grained understanding of the timeline associated with various facts. In addition, we evaluate the performance of LLMs for direct temporal fact extraction and get unsatisfactory results. To this end, we introduce TSDRE, a method that incorporates the decomposition capabilities of LLMs into the traditional fine-tuning of smaller pre-trained language models (PLMs). To support the evaluation, we construct ComplexTRED, a complex temporal fact extraction dataset. Our experiments show that TSDRE achieves state-of-the-art results on both HyperRED-Temporal and ComplexTRED datasets.

Updated: 2024-05-16 17:48:21

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2405.10288v1

FFF: Fixing Flawed Foundations in contrastive pre-training results in very strong Vision-Language models

Despite noise and caption quality having been acknowledged as important factors impacting vision-language contrastive pre-training, in this paper, we show that the full potential of improving the training process by addressing such issues is yet to be realized. Specifically, we firstly study and analyze two issues affecting training: incorrect assignment of negative pairs, and low caption quality and diversity. Then, we devise effective solutions for addressing both problems, which essentially require training with multiple true positive pairs. Finally, we propose training with sigmoid loss to address such a requirement. We show very large gains over the current state-of-the-art for both image recognition ($\sim +6\%$ on average over 11 datasets) and image retrieval ($\sim +19\%$ on Flickr30k and $\sim +15\%$ on MSCOCO).
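
Training with a sigmoid loss scores every image-text pair independently, so several true positive pairs per image are handled naturally; a minimal numpy sketch (temperature and bias values are assumptions in the style of prior sigmoid-loss work, not the paper's settings):

import numpy as np

def sigmoid_contrastive_loss(img_emb, txt_emb, pos_mask, t=10.0, b=-10.0):
    """img_emb: (n,d), txt_emb: (m,d), pos_mask: (n,m) with 1 for true pairs."""
    logits = t * img_emb @ txt_emb.T + b           # pairwise similarities
    labels = 2.0 * pos_mask - 1.0                  # +1 positives, -1 negatives
    # log-sigmoid loss over every pair; multiple positives per row are fine
    return np.mean(np.log1p(np.exp(-labels * logits)))

rng = np.random.default_rng(0)
img = rng.normal(size=(4, 16)); img /= np.linalg.norm(img, axis=1, keepdims=True)
txt = rng.normal(size=(8, 16)); txt /= np.linalg.norm(txt, axis=1, keepdims=True)
mask = np.zeros((4, 8))
mask[np.arange(4), np.arange(4)] = 1
mask[0, 4] = 1                     # a second true positive for image 0
loss = sigmoid_contrastive_loss(img, txt, mask)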

Updated: 2024-05-16 17:46:54

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2405.10286v1

Quantum Vision Transformers for Quark-Gluon Classification

We introduce a hybrid quantum-classical vision transformer architecture, notable for its integration of variational quantum circuits within both the attention mechanism and the multi-layer perceptrons. The research addresses the critical challenge of computational efficiency and resource constraints in analyzing data from the upcoming High Luminosity Large Hadron Collider, presenting the architecture as a potential solution. In particular, we evaluate our method by applying the model to multi-detector jet images from CMS Open Data. The goal is to distinguish quark-initiated from gluon-initiated jets. We successfully train the quantum model and evaluate it via numerical simulations. Using this approach, we achieve classification performance almost on par with the one obtained with the completely classical architecture, considering a similar number of parameters.

Updated: 2024-05-16 17:45:54

Categories: quant-ph,cs.LG,hep-ph,68Q12 (Primary) 81P68, 68T07 (Secondary)

Download: http://arxiv.org/abs/2405.10284v1

VREN: Volleyball Rally Dataset with Expression Notation Language

This research is intended to accomplish two goals: The first goal is to curate a large and information rich dataset that contains crucial and succinct summaries on the players' actions and positions and the back-and-forth travel patterns of the volleyball in professional and NCAA Div-I indoor volleyball games. While several prior studies have aimed to create similar datasets for other sports (e.g. badminton and soccer), creating such a dataset for indoor volleyball is not yet realized. The second goal is to introduce a volleyball descriptive language to fully describe the rally processes in the games and apply the language to our dataset. Based on the curated dataset and our descriptive sports language, we introduce three tasks for automated volleyball action and tactic analysis using our dataset: (1) Volleyball Rally Prediction, aimed at predicting the outcome of a rally and helping players and coaches improve decision-making in practice, (2) Setting Type and Hitting Type Prediction, to help coaches and players prepare more effectively for the game, and (3) Volleyball Tactics and Attacking Zone Statistics, to provide advanced volleyball statistics and help coaches understand the game and opponent's tactics better. We conducted case studies to show how experimental results can provide insights to the volleyball analysis community. Furthermore, experimental evaluation based on real-world data establishes a baseline for future studies and applications of our dataset and language. This study bridges the gap between the indoor volleyball field and computer science. The dataset is available at: https://github.com/haotianxia/VREN.

Updated: 2024-05-16 17:38:36

Categories: cs.LG

Download: http://arxiv.org/abs/2209.13846v2

Simultaneous Haar Indistinguishability with Applications to Unclonable Cryptography

Unclonable cryptography is concerned with leveraging the no-cloning principle to build cryptographic primitives that are otherwise impossible to achieve classically. Understanding the feasibility of unclonable encryption, one of the key unclonable primitives, satisfying indistinguishability security in the plain model has been a major open question in the area. So far, the existing constructions of unclonable encryption are either in the quantum random oracle model or are based on new conjectures. We present a new approach to unclonable encryption via a reduction to a novel question about nonlocal quantum state discrimination: how well can non-communicating -- but entangled -- players distinguish between different distributions over quantum states? We call this task simultaneous state indistinguishability. Our main technical result is showing that the players cannot distinguish between each player receiving independently-chosen Haar random states versus all players receiving the same Haar random state. We leverage this result to present the first construction of unclonable encryption satisfying indistinguishability security, with quantum decryption keys, in the plain model. We also show other implications to single-decryptor encryption and leakage-resilient secret sharing.

Updated: 2024-05-16 17:30:55

Categories: quant-ph,cs.CR

Download: http://arxiv.org/abs/2405.10274v1

Faces that Speak: Jointly Synthesising Talking Face and Speech from Text

The goal of this work is to simultaneously generate natural talking faces and speech outputs from text. We achieve this by integrating Talking Face Generation (TFG) and Text-to-Speech (TTS) systems into a unified framework. We address the main challenges of each task: (1) generating a range of head poses representative of real-world scenarios, and (2) ensuring voice consistency despite variations in facial motion for the same identity. To tackle these issues, we introduce a motion sampler based on conditional flow matching, which is capable of high-quality motion code generation in an efficient way. Moreover, we introduce a novel conditioning method for the TTS system, which utilises motion-removed features from the TFG model to yield uniform speech outputs. Our extensive experiments demonstrate that our method effectively creates natural-looking talking faces and speech that accurately match the input text. To our knowledge, this is the first effort to build a multimodal synthesis system that can generalise to unseen identities.
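
As one way to picture the motion sampler's training signal, a minimal conditional flow matching objective with a linear interpolation path looks as follows (toy linear network; the dimensions and parameterization are assumptions, not the paper's architecture):

import numpy as np

rng = np.random.default_rng(0)
d = 32                                   # motion-code dimension (assumption)
W = rng.normal(size=(d + 1, d)) * 0.01   # toy velocity network v(x_t, t)

def v_theta(x_t, t):
    return np.concatenate([x_t, [t]]) @ W

x0 = rng.normal(size=d)                  # noise sample
x1 = rng.normal(size=d)                  # target motion code
t = rng.uniform()
x_t = (1 - t) * x0 + t * x1              # point on the interpolation path
target = x1 - x0                         # conditional target velocity
loss = np.mean((v_theta(x_t, t) - target) ** 2)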

Updated: 2024-05-16 17:29:37

Categories: cs.CV,cs.AI,cs.SD,eess.AS,eess.IV

Download: http://arxiv.org/abs/2405.10272v1

Automated Federated Learning via Informed Pruning

Federated learning (FL) represents a pivotal shift in machine learning (ML) as it enables collaborative training of local ML models coordinated by a central aggregator, all without the need to exchange local data. However, its application on edge devices is hindered by limited computational capabilities and data communication challenges, compounded by the inherent complexity of Deep Learning (DL) models. Model pruning is identified as a key technique for compressing DL models on devices with limited resources. Nonetheless, conventional pruning techniques typically rely on manually crafted heuristics and demand human expertise to achieve a balance between model size, speed, and accuracy, often resulting in sub-optimal solutions. In this study, we introduce an automated federated learning approach utilizing informed pruning, called AutoFLIP, which dynamically prunes and compresses DL models within both the local clients and the global server. It leverages a federated loss exploration phase to investigate model gradient behavior across diverse datasets and losses, providing insights into parameter significance. Our experiments showcase notable enhancements in scenarios with strong non-IID data, underscoring AutoFLIP's capacity to tackle computational constraints and achieve superior global convergence.
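
A hedged sketch of informed pruning from gradient statistics (the exact importance score used by AutoFLIP is not reproduced here; mean absolute gradient across the exploration phase is our stand-in):

import numpy as np

def prune_by_importance(weights, grad_history, sparsity=0.5):
    # importance ~ average |gradient| seen across datasets/losses
    importance = np.mean(np.abs(grad_history), axis=0)
    k = int(sparsity * weights.size)
    threshold = np.partition(importance.ravel(), k)[k]
    mask = importance >= threshold
    return weights * mask, mask

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))
grads = rng.normal(size=(10, 64, 64))   # gradients from the exploration phase
w_pruned, mask = prune_by_importance(w, grads)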

Updated: 2024-05-16 17:27:41

Categories: cs.LG,cs.AI,cs.DC,cs.ET

Download: http://arxiv.org/abs/2405.10271v1

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference through significantly compressing the Key-Value (KV) cache into a latent vector, while DeepSeekMoE enables training strong models at an economical cost through sparse computation. Compared with DeepSeek 67B, DeepSeek-V2 achieves significantly stronger performance, and meanwhile saves 42.5% of training costs, reduces the KV cache by 93.3%, and boosts the maximum generation throughput to 5.76 times. We pretrain DeepSeek-V2 on a high-quality and multi-source corpus consisting of 8.1T tokens, and further perform Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) to fully unlock its potential. Evaluation results show that, even with only 21B activated parameters, DeepSeek-V2 and its chat versions still achieve top-tier performance among open-source models.
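
The KV-cache compression in MLA can be pictured as caching one low-dimensional latent per token and re-expanding it into per-head keys and values at attention time; the dimensions below are illustrative, not DeepSeek-V2's actual configuration:

import numpy as np

rng = np.random.default_rng(0)
d_model, d_latent, n_heads, d_head = 512, 64, 8, 64
W_down = rng.normal(size=(d_model, d_latent)) * 0.02         # compression
W_uk = rng.normal(size=(d_latent, n_heads * d_head)) * 0.02  # key up-proj
W_uv = rng.normal(size=(d_latent, n_heads * d_head)) * 0.02  # value up-proj

h = rng.normal(size=(10, d_model))       # hidden states for 10 tokens
kv_cache = h @ W_down                    # (10, 64): all that is cached
K = (kv_cache @ W_uk).reshape(10, n_heads, d_head)   # reconstructed keys
V = (kv_cache @ W_uv).reshape(10, n_heads, d_head)   # reconstructed values
# here the cache is d_latent / (2 * n_heads * d_head) = 1/16 the usual size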

Updated: 2024-05-16 17:25:01

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2405.04434v3

PhilHumans: Benchmarking Machine Learning for Personal Health

The use of machine learning in Healthcare has the potential to improve patient outcomes as well as broaden the reach and affordability of Healthcare. The history of other application areas indicates that strong benchmarks are essential for the development of intelligent systems. We present Personal Health Interfaces Leveraging HUman-MAchine Natural interactions (PhilHumans), a holistic suite of benchmarks for machine learning across different Healthcare settings - talk therapy, diet coaching, emergency care, intensive care, obstetric sonography - as well as different learning settings, such as action anticipation, timeseries modeling, insight mining, language modeling, computer vision, reinforcement learning and program synthesis.

Updated: 2024-05-16 17:24:01

Categories: cs.LG

Download: http://arxiv.org/abs/2405.02770v2

An invitation to the sample complexity of quantum hypothesis testing

Quantum hypothesis testing (QHT) has been traditionally studied from the information-theoretic perspective, wherein one is interested in the optimal decay rate of error probabilities as a function of the number of samples of an unknown state. In this paper, we study the sample complexity of QHT, wherein the goal is to determine the minimum number of samples needed to reach a desired error probability. By making use of the wealth of knowledge that already exists in the literature on QHT, we characterize the sample complexity of binary QHT in the symmetric and asymmetric settings, and we provide bounds on the sample complexity of multiple QHT. In more detail, we prove that the sample complexity of symmetric binary QHT depends logarithmically on the inverse error probability and inversely on the negative logarithm of the fidelity. As a counterpart of the quantum Stein's lemma, we also find that the sample complexity of asymmetric binary QHT depends logarithmically on the inverse type II error probability and inversely on the quantum relative entropy, provided that the type II error probability is sufficiently small. We then provide lower and upper bounds on the sample complexity of multiple QHT, with it remaining an intriguing open question to improve these bounds. The final part of our paper outlines and reviews how sample complexity of QHT is relevant to a broad swathe of research areas and can enhance understanding of many fundamental concepts, including quantum algorithms for simulation and search, quantum learning and classification, and foundations of quantum mechanics. As such, we view our paper as an invitation to researchers coming from different communities to study and contribute to the problem of sample complexity of QHT, and we outline a number of open directions for future research.
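
Schematically, the stated dependencies can be written as (a compact restatement, with constants and exact regularity conditions omitted; $\varepsilon$ denotes the error probability and $\delta$ the type II error probability):

$n_{\mathrm{sym}} = \Theta\!\left(\frac{\log(1/\varepsilon)}{-\log F(\rho,\sigma)}\right), \qquad n_{\mathrm{asym}} = \Theta\!\left(\frac{\log(1/\delta)}{D(\rho\,\|\,\sigma)}\right) \text{ for sufficiently small } \delta.$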

Updated: 2024-05-16 17:20:50

Categories: quant-ph,cs.IT,cs.LG,math.IT,math.ST,stat.TH

Download: http://arxiv.org/abs/2403.17868v3

Sharpness-Aware Minimization in Genetic Programming

Sharpness-Aware Minimization (SAM) was recently introduced as a regularization procedure for training deep neural networks. It simultaneously minimizes the fitness (or loss) function and the so-called fitness sharpness. The latter serves as a measure of the nonlinear behavior of a solution, and does so by finding solutions that lie in neighborhoods having uniformly similar loss values across all fitness cases. In this contribution, we adapt SAM for tree Genetic Programming (TGP) by exploring the semantic neighborhoods of solutions using two simple approaches. By capitalizing upon perturbing the input and output of program trees, sharpness can be estimated and used as a second optimization criterion during the evolution. To better understand the impact of this variant of SAM on TGP, we collect numerous indicators of the evolutionary process, including generalization ability, complexity, diversity, and a recently proposed genotype-phenotype mapping to study the amount of redundancy in trees. The experimental results demonstrate that using either of the two proposed SAM adaptations in TGP allows (i) a significant reduction of tree sizes in the population and (ii) a decrease in redundancy of the trees. When assessed on real-world benchmarks, the generalization ability of the elite solutions does not deteriorate.
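
A toy version of estimating sharpness for a program tree via input/output perturbations (the concrete perturbation scheme, neighborhood size, and trial count are our assumptions, not the paper's exact procedure):

import numpy as np

def loss(program, X, y):
    return np.mean((program(X) - y) ** 2)

def sharpness(program, X, y, rng, eps=0.05, trials=8):
    base = loss(program, X, y)
    worst = 0.0
    for _ in range(trials):
        Xp = X + eps * rng.normal(size=X.shape)               # perturb inputs
        yp_hat = program(Xp) + eps * rng.normal(size=len(X))  # perturb outputs
        worst = max(worst, abs(np.mean((yp_hat - y) ** 2) - base))
    return worst  # second optimization criterion during the evolution

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3)); y = X[:, 0] * X[:, 1]
tree = lambda X: X[:, 0] * X[:, 1] + 0.1 * X[:, 2]   # stand-in program tree
fit, sharp = loss(tree, X, y), sharpness(tree, X, y, rng)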

Updated: 2024-05-16 17:19:58

Categories: cs.NE,cs.LG

Download: http://arxiv.org/abs/2405.10267v1

Architectures and random properties of symplectic quantum circuits

Parametrized and random unitary (or orthogonal) $n$-qubit circuits play a central role in quantum information. As such, one could naturally assume that circuits implementing symplectic transformation would attract similar attention. However, this is not the case, as $\mathbb{SP}(d/2)$ -- the group of $d\times d$ unitary symplectic matrices -- has thus far been overlooked. In this work, we aim at starting to right this wrong. We begin by presenting a universal set of generators $\mathcal{G}$ for the symplectic algebra $i\mathfrak{sp}(d/2)$, consisting of one- and two-qubit Pauli operators acting on neighboring sites in a one-dimensional lattice. Here, we uncover two critical differences between such set, and equivalent ones for unitary and orthogonal circuits. Namely, we find that the operators in $\mathcal{G}$ cannot generate arbitrary local symplectic unitaries and that they are not translationally invariant. We then review the Schur-Weyl duality between the symplectic group and the Brauer algebra, and use tools from Weingarten calculus to prove that Pauli measurements at the output of Haar random symplectic circuits can converge to Gaussian processes. As a by-product, such analysis provides us with concentration bounds for Pauli measurements in circuits that form $t$-designs over $\mathbb{SP}(d/2)$. To finish, we present tensor-network tools to analyze shallow random symplectic circuits, and we use these to numerically show that computational-basis measurements anti-concentrate at logarithmic depth.

Updated: 2024-05-16 17:15:39

Categories: quant-ph,cs.LG

Download: http://arxiv.org/abs/2405.10264v1

On Partially Unitary Learning

The problem of an optimal mapping between Hilbert spaces $IN$ of $\left|\psi\right\rangle$ and $OUT$ of $\left|\phi\right\rangle$ based on a set of wavefunction measurements (within a phase) $\psi_l \to \phi_l$, $l=1\dots M$, is formulated as an optimization problem maximizing the total fidelity $\sum_{l=1}^{M} \omega^{(l)} \left|\langle\phi_l|\mathcal{U}|\psi_l\rangle\right|^2$ subject to probability preservation constraints on $\mathcal{U}$ (partial unitarity). Constructed operator $\mathcal{U}$ can be considered as a $IN$ to $OUT$ quantum channel; it is a partially unitary rectangular matrix of the dimension $\dim(OUT) \times \dim(IN)$ transforming operators as $A^{OUT}=\mathcal{U} A^{IN} \mathcal{U}^{\dagger}$. An iteration algorithm finding the global maximum of this optimization problem is developed and it's application to a number of problems is demonstrated. A software product implementing the algorithm is available from the authors.
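
The objective itself is straightforward to evaluate; the snippet below scores a random partially unitary $\mathcal{U}$ (orthonormal rows via QR) on toy data, while the iteration algorithm that maximizes it is the paper's contribution and is not reproduced here:

import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, M = 6, 3, 20
psi = rng.normal(size=(M, d_in)) + 1j * rng.normal(size=(M, d_in))
phi = rng.normal(size=(M, d_out)) + 1j * rng.normal(size=(M, d_out))
psi /= np.linalg.norm(psi, axis=1, keepdims=True)
phi /= np.linalg.norm(phi, axis=1, keepdims=True)
w = np.full(M, 1.0 / M)                       # weights omega_l

# a partially unitary dim(OUT) x dim(IN) matrix: orthonormal rows via QR
Q, _ = np.linalg.qr(rng.normal(size=(d_in, d_out)))
U = Q.T.conj()                                # (d_out, d_in), U @ U^dag = I

# total fidelity: sum_l w_l |<phi_l| U |psi_l>|^2
total_fidelity = np.sum(w * np.abs(
    np.einsum('li,ij,lj->l', phi.conj(), U, psi)) ** 2)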

Updated: 2024-05-16 17:13:55

Categories: cs.LG,cs.NA,math.NA,quant-ph,stat.ML

Download: http://arxiv.org/abs/2405.10263v1

Two-Phase Dynamics of Interactions Explains the Starting Point of a DNN Learning Over-Fitted Features

This paper investigates the dynamics of a deep neural network (DNN) learning interactions. Previous studies have discovered and mathematically proven that given each input sample, a well-trained DNN usually only encodes a small number of interactions (non-linear relationships) between input variables in the sample. A series of theorems have been derived to prove that we can consider the DNN's inference equivalent to using these interactions as primitive patterns for inference. In this paper, we discover the DNN learns interactions in two phases. The first phase mainly penalizes interactions of medium and high orders, and the second phase mainly learns interactions of gradually increasing orders. We can consider the two-phase phenomenon as the starting point of a DNN learning over-fitted features. Such a phenomenon has been widely shared by DNNs with various architectures trained for different tasks. Therefore, the discovery of the two-phase dynamics provides a detailed mechanism for how a DNN gradually learns different inference patterns (interactions). In particular, we have also verified the claim that high-order interactions have weaker generalization power than low-order interactions. Thus, the discovered two-phase dynamics also explains how the generalization power of a DNN changes during the training process.

Updated: 2024-05-16 17:13:25

Categories: cs.LG,cs.AI,cs.CV

Download: http://arxiv.org/abs/2405.10262v1

Keep It Private: Unsupervised Privatization of Online Text

Authorship obfuscation techniques hold the promise of helping people protect their privacy in online communications by automatically rewriting text to hide the identity of the original author. However, obfuscation has been evaluated in narrow settings in the NLP literature and has primarily been addressed with superficial edit operations that can lead to unnatural outputs. In this work, we introduce an automatic text privatization framework that fine-tunes a large language model via reinforcement learning to produce rewrites that balance soundness, sense, and privacy. We evaluate it extensively on a large-scale test set of English Reddit posts by 68k authors composed of short-medium length texts. We study how the performance changes among evaluative conditions including authorial profile length and authorship detection strategy. Our method maintains high text quality according to both automated metrics and human evaluation, and successfully evades several automated authorship attacks.

Updated: 2024-05-16 17:12:18

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2405.10260v1

Goal-conditioned Offline Reinforcement Learning through State Space Partitioning

Offline reinforcement learning (RL) aims to infer sequential decision policies using only offline datasets. This is a particularly difficult setup, especially when learning to achieve multiple different goals or outcomes under a given scenario with only sparse rewards. For offline learning of goal-conditioned policies via supervised learning, previous work has shown that an advantage weighted log-likelihood loss guarantees monotonic policy improvement. In this work we argue that, despite its benefits, this approach is still insufficient to fully address the distribution shift and multi-modality problems. The latter is particularly severe in long-horizon tasks where finding a unique and optimal policy that goes from a state to the desired goal is challenging as there may be multiple and potentially conflicting solutions. To tackle these challenges, we propose a complementary advantage-based weighting scheme that introduces an additional source of inductive bias: given a value-based partitioning of the state space, the contribution of actions expected to lead to target regions that are easier to reach, compared to the final goal, is further increased. Empirically, we demonstrate that the proposed approach, Dual-Advantage Weighted Offline Goal-conditioned RL (DAWOG), outperforms several competing offline algorithms in commonly used benchmarks. Analytically, we offer a guarantee that the learnt policy is never worse than the underlying behaviour policy.
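
The dual weighting can be sketched as an advantage-weighted regression loss with two exponential factors, one per advantage; the exponential form and temperatures are assumptions modeled on standard advantage weighting, not the paper's exact equations:

import numpy as np

def dawog_weight(adv_goal, adv_partition, beta1=1.0, beta2=1.0):
    return np.exp(adv_goal / beta1) * np.exp(adv_partition / beta2)

def weighted_nll(log_pi, adv_goal, adv_partition):
    w = dawog_weight(adv_goal, adv_partition)
    return -np.mean(w * log_pi)   # supervised, advantage-weighted objective

rng = np.random.default_rng(0)
log_pi = rng.normal(size=256) - 2.0   # log pi(a | s, g) on dataset actions
a_goal = rng.normal(size=256)         # advantage w.r.t. the final goal
a_part = rng.normal(size=256)         # advantage w.r.t. the next partition
loss = weighted_nll(log_pi, a_goal, a_part)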

Updated: 2024-05-16 17:07:44

Categories: cs.LG

Download: http://arxiv.org/abs/2303.09367v2

PRISM: A Multi-Modal Generative Foundation Model for Slide-Level Histopathology

Foundation models in computational pathology promise to unlock the development of new clinical decision support systems and models for precision medicine. However, there is a mismatch between most clinical analysis, which is defined at the level of one or more whole slide images, and foundation models to date, which process the thousands of image tiles contained in a whole slide image separately. The requirement to train a network to aggregate information across a large number of tiles in multiple whole slide images limits these models' impact. In this work, we present a slide-level foundation model for H&E-stained histopathology, PRISM, that builds on Virchow tile embeddings and leverages clinical report text for pre-training. Using the tile embeddings, PRISM produces slide-level embeddings with the ability to generate clinical reports, resulting in several modes of use. Using text prompts, PRISM achieves zero-shot cancer detection and sub-typing performance approaching and surpassing that of a supervised aggregator model. Using the slide embeddings with linear classifiers, PRISM surpasses supervised aggregator models. Furthermore, we demonstrate that fine-tuning of the PRISM slide encoder yields label-efficient training for biomarker prediction, a task that typically suffers from low availability of training data; an aggregator initialized with PRISM and trained on as little as 10% of the training data can outperform a supervised baseline that uses all of the data.

Updated: 2024-05-16 16:59:12

Categories: eess.IV,cs.CV,cs.LG

Download: http://arxiv.org/abs/2405.10254v1

DocuMint: Docstring Generation for Python using Small Language Models

Effective communication, specifically through documentation, is the beating heart of collaboration among contributors in software development. Recent advancements in language models (LMs) have enabled the introduction of a new type of actor in that ecosystem: LM-powered assistants capable of code generation, optimization, and maintenance. Our study investigates the efficacy of small language models (SLMs) for generating high-quality docstrings by assessing accuracy, conciseness, and clarity, benchmarking performance quantitatively through mathematical formulas and qualitatively through human evaluation using a Likert scale. Further, we introduce DocuMint, as a large-scale supervised fine-tuning dataset with 100,000 samples. In quantitative experiments, Llama 3 8B achieved the best performance across all metrics, with conciseness and clarity scores of 0.605 and 64.88, respectively. However, under human evaluation, CodeGemma 7B achieved the highest overall score with an average of 8.3 out of 10 across all metrics. Fine-tuning the CodeGemma 2B model using the DocuMint dataset led to significant improvements in performance across all metrics, with gains of up to 22.5% in conciseness. The fine-tuned model and the dataset can be found in HuggingFace and the code can be found in the repository.

Updated: 2024-05-16 16:46:46

Categories: cs.SE,cs.LG

Download: http://arxiv.org/abs/2405.10243v1

Lookbehind-SAM: k steps back, 1 step forward

Sharpness-aware minimization (SAM) methods have gained increasing popularity by formulating the problem of minimizing both loss value and loss sharpness as a minimax objective. In this work, we increase the efficiency of the maximization and minimization parts of SAM's objective to achieve a better loss-sharpness trade-off. By taking inspiration from the Lookahead optimizer, which uses multiple descent steps ahead, we propose Lookbehind, which performs multiple ascent steps behind to enhance the maximization step of SAM and find a worst-case perturbation with higher loss. Then, to mitigate the variance in the descent step arising from the gathered gradients across the multiple ascent steps, we employ linear interpolation to refine the minimization step. Lookbehind leads to a myriad of benefits across a variety of tasks. Particularly, we show increased generalization performance, greater robustness against noisy weights, as well as improved learning and less catastrophic forgetting in lifelong learning settings. Our code is available at https://github.com/chandar-lab/Lookbehind-SAM.
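
On a toy quadratic, the k-ascent-steps-plus-interpolation idea can be sketched as follows (step sizes, the interpolation coefficient, and the exact way descent steps are merged are assumptions; see the paper for the precise scheme):

import numpy as np

def grad(w):
    return w   # gradient of the toy quadratic loss 0.5 * ||w||^2

def lookbehind_step(w, k=3, rho=0.05, lr=0.1, alpha=0.5):
    e, slow, fast = np.zeros_like(w), w.copy(), w.copy()
    for _ in range(k):                       # k ascent steps "behind"
        g = grad(w + e)
        e += rho * g / (np.linalg.norm(g) + 1e-12)
        fast -= lr * grad(w + e)             # descent with perturbed gradient
    return slow + alpha * (fast - slow)      # linear interpolation of steps

w = np.array([1.0, -2.0, 3.0])
for _ in range(20):
    w = lookbehind_step(w)                   # converges toward the minimum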

Updated: 2024-05-16 16:44:11

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2307.16704v3

Invariant Risk Minimization Is A Total Variation Model

Invariant risk minimization (IRM) is an arising approach to generalize invariant features to different environments in machine learning. While most related works focus on new IRM settings or new application scenarios, the mathematical essence of IRM remains to be properly explained. We verify that IRM is essentially a total variation model based on the $L^2$ norm (TV-$\ell_2$) of the learning risk with respect to the classifier variable. Moreover, we propose a novel IRM framework based on the TV-$\ell_1$ model. It not only expands the classes of functions that can be used as the learning risk, but also has robust performance in denoising and invariant feature preservation based on the coarea formula. We also illustrate some requirements for IRM-TV-$\ell_1$ to achieve out-of-distribution generalization. Experimental results show that the proposed framework achieves competitive performance in several benchmark machine learning scenarios.

Updated: 2024-05-16 16:37:01

Categories: cs.LG

Download: http://arxiv.org/abs/2405.01389v4

Results about sets of desirable gamble sets

Coherent sets of desirable gamble sets are used as a model for representing an agent's opinions and choice preferences under uncertainty. In this paper we provide some results about the axioms required for coherence and the natural extension of a given set of desirable gamble sets. We also show that coherent sets of desirable gamble sets can be represented by a proper filter of coherent sets of desirable gambles.

Updated: 2024-05-16 16:35:01

Categories: cs.AI,cs.GT

Download: http://arxiv.org/abs/2404.17924v2

Influencer Cartels

Social media influencers account for a growing share of marketing worldwide. We demonstrate the existence of a novel form of market failure in this advertising market: influencer cartels, where groups of influencers collude to increase their advertising revenue by inflating their engagement. Our theoretical model shows that influencer cartels can improve consumer welfare if they expand social media engagement to the target audience, or reduce welfare if they divert engagement to less relevant audiences. We validate the model empirically using novel data on influencer cartels combined with machine learning tools, and derive policy implications for how to maximize consumer welfare.

Updated: 2024-05-16 16:29:49

Categories: econ.GN,cs.CY,cs.LG,q-fin.EC

Download: http://arxiv.org/abs/2405.10231v1

Random ReLU Neural Networks as Non-Gaussian Processes

We consider a large class of shallow neural networks with randomly initialized parameters and rectified linear unit activation functions. We prove that these random neural networks are well-defined non-Gaussian processes. As a by-product, we demonstrate that these networks are solutions to stochastic differential equations driven by impulsive white noise (combinations of random Dirac measures). These processes are parameterized by the law of the weights and biases as well as the density of activation thresholds in each bounded region of the input domain. We prove that these processes are isotropic and wide-sense self-similar with Hurst exponent $3/2$. We also derive a remarkably simple closed-form expression for their autocovariance function. Our results are fundamentally different from prior work in that we consider a non-asymptotic viewpoint: The number of neurons in each bounded region of the input domain (i.e., the width) is itself a random variable with a Poisson law with mean proportional to the density parameter. Finally, we show that, under suitable hypotheses, as the expected width tends to infinity, these processes can converge in law not only to Gaussian processes, but also to non-Gaussian processes depending on the law of the weights. Our asymptotic results provide a new take on several classical results (wide networks converge to Gaussian processes) as well as some new ones (wide networks can converge to non-Gaussian processes).
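
One realization of such a process can be sampled in a few lines: the width over a bounded region is Poisson with mean proportional to the density parameter (the specific weight and bias laws below are our assumptions for illustration):

import numpy as np

rng = np.random.default_rng(0)
density, region = 50.0, (-1.0, 1.0)
n = rng.poisson(density * (region[1] - region[0]))   # random width
biases = rng.uniform(*region, size=n)                # activation thresholds
signs = rng.choice([-1.0, 1.0], size=n)              # random input weights
v = rng.normal(size=n) / np.sqrt(max(n, 1))          # random output weights

def f(x):
    return np.sum(v * np.maximum(signs * x - biases, 0.0))

xs = np.linspace(*region, 5)
samples = [f(x) for x in xs]    # one realization of the random process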

Updated: 2024-05-16 16:28:11

Domains: stat.ML,cs.LG,math.PR

Download: http://arxiv.org/abs/2405.10229v1

VulLibGen: Identifying Vulnerable Third-Party Libraries via Generative Pre-Trained Model

Security practitioners maintain vulnerability reports (e.g., GitHub Advisory) to help developers mitigate security risks. An important task for these databases is automatically extracting structured information mentioned in the report, e.g., the affected software packages, to accelerate the defense of the vulnerability ecosystem. However, it is challenging for existing work on affected package identification to achieve high accuracy. One reason is that all existing work focuses on relatively small models, so it cannot harness the knowledge and semantic capabilities of large language models. To address this limitation, we propose VulLibGen, the first method to use an LLM for affected package identification. In contrast to existing work, VulLibGen proposes the novel idea of directly generating the affected package. To improve accuracy, VulLibGen employs supervised fine-tuning (SFT), retrieval-augmented generation (RAG) and a local search algorithm. The local search algorithm is a novel postprocessing algorithm we introduce to reduce hallucination in the generated packages. Our evaluation results show that VulLibGen achieves an average accuracy of 0.806 for identifying vulnerable packages in the four most popular ecosystems in GitHub Advisory (Java, JS, Python, Go), while the best average accuracy in previous work is 0.721. Additionally, VulLibGen is of high value to security practice: we submitted 60 <vulnerability, affected package> pairs to GitHub Advisory (covering four ecosystems); 34 of them have been accepted and merged and 20 are pending approval. Our code and dataset can be found in the attachments.
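
To make the role of the local search concrete, here is a minimal stand-in for such a postprocessing step: it snaps a generated (possibly hallucinated) package name to the closest name in a known registry by string similarity. The function and registry contents are hypothetical, and the paper's actual local search algorithm may differ:

import difflib

def snap_to_known_package(generated, known_packages, cutoff=0.6):
    # Map a possibly-hallucinated package name onto the closest real one;
    # known_packages would come from the ecosystem's registry (Maven, PyPI, ...).
    matches = difflib.get_close_matches(generated, known_packages, n=1, cutoff=cutoff)
    return matches[0] if matches else generated

registry = ["org.apache.logging.log4j:log4j-core", "org.apache.commons:commons-text"]
print(snap_to_known_package("org.apache.log4j:log4j-core", registry))
# -> "org.apache.logging.log4j:log4j-core"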

Updated: 2024-05-16 16:15:12

Domains: cs.CR

Download: http://arxiv.org/abs/2308.04662v2

Scalarisation-based risk concepts for robust multi-objective optimisation

Robust optimisation is a well-established framework for optimising functions in the presence of uncertainty. The inherent goal of this problem is to identify a collection of inputs whose outputs are both desirable for the decision maker and robust to the underlying uncertainties in the problem. In this work, we study the multi-objective extension of this problem from a computational standpoint. We identify that the majority of robust multi-objective algorithms rely on two key operations: robustification and scalarisation. Robustification refers to the strategy used to marginalise over the uncertainty in the problem, whilst scalarisation refers to the procedure used to encode the relative importance of each objective. As these operations are not necessarily commutative, the order in which they are performed affects the solutions that are identified and the final decisions that are made. This work aims to give an exposition of the philosophical differences between these two operations and to highlight when one should opt for one ordering over the other. As part of our analysis, we showcase how many existing risk concepts can be easily integrated into the specification and solution of a robust multi-objective optimisation problem. Besides this, we also demonstrate how one can define, in a principled way, the notion of a robust Pareto front and a robust performance metric based on our robustify-and-scalarise methodology. To illustrate the efficacy of these new ideas, we present two insightful numerical case studies based on real-world data sets.
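
A toy numerical example (ours, not from the paper) of why the two operations need not commute, taking robustification as a worst case over finitely many scenarios and scalarisation as a weighted sum of two objectives to be minimised:

import numpy as np

# Two objectives (columns) evaluated under three uncertainty scenarios (rows).
F = np.array([[1.0, 4.0],
              [3.0, 1.0],
              [2.0, 2.0]])
w = np.array([0.5, 0.5])   # scalarisation weights

scalarise_then_robustify = np.max(F @ w)        # worst case of the weighted sum -> 2.5
robustify_then_scalarise = w @ F.max(axis=0)    # weighted sum of per-objective worst cases -> 3.5

print(scalarise_then_robustify, robustify_then_scalarise)   # the two orderings disagree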

Updated: 2024-05-16 16:11:00

Domains: math.OC,cs.LG,stat.ML

Download: http://arxiv.org/abs/2405.10221v1

SoK: Prudent Evaluation Practices for Fuzzing

Fuzzing has proven to be a highly effective approach to uncover software bugs over the past decade. After AFL popularized the groundbreaking concept of lightweight coverage feedback, the field of fuzzing has seen a vast amount of scientific work proposing new techniques, improving methodological aspects of existing strategies, or porting existing methods to new domains. All such work must demonstrate its merit by showing its applicability to a problem, measuring its performance, and often showing its superiority over existing works in a thorough, empirical evaluation. Yet, fuzzing is highly sensitive to its target, environment, and circumstances, e.g., randomness in the testing process. After all, relying on randomness is one of the core principles of fuzzing, governing many aspects of a fuzzer's behavior. Combined with the often highly difficult to control environment, the reproducibility of experiments is a crucial concern and requires a prudent evaluation setup. To address these threats to validity, several works, most notably Evaluating Fuzz Testing by Klees et al., have outlined how a carefully designed evaluation setup should be implemented, but it remains unknown to what extent their recommendations have been adopted in practice. In this work, we systematically analyze the evaluation of 150 fuzzing papers published at the top venues between 2018 and 2023. We study how existing guidelines are implemented and observe potential shortcomings and pitfalls. We find a surprising disregard of the existing guidelines regarding statistical tests and systematic errors in fuzzing evaluations. For example, when investigating reported bugs, ...
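
For reference, the kind of statistical test the guidelines call for is straightforward to apply; a minimal sketch comparing two fuzzers over repeated independent trials with a Mann-Whitney U test (the coverage numbers are made up for illustration):

import numpy as np
from scipy.stats import mannwhitneyu

# Branch coverage after a fixed time budget, over 10 independent trials each.
fuzzer_a = np.array([1510, 1498, 1533, 1479, 1502, 1520, 1495, 1511, 1508, 1489])
fuzzer_b = np.array([1466, 1481, 1459, 1490, 1473, 1468, 1485, 1470, 1477, 1462])

stat, p = mannwhitneyu(fuzzer_a, fuzzer_b, alternative="two-sided")
print(f"U = {stat}, p = {p:.4f}")   # a small p-value indicates the gap is unlikely to be noise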

Updated: 2024-05-16 16:10:41

Domains: cs.SE,cs.CR

Download: http://arxiv.org/abs/2405.10220v1

ENADPool: The Edge-Node Attention-based Differentiable Pooling for Graph Neural Networks

Graph Neural Networks (GNNs) are powerful tools for graph classification. One important operation for GNNs is the downsampling or pooling that can learn effective embeddings from the node representations. In this paper, we propose a new hierarchical pooling operation, namely the Edge-Node Attention-based Differentiable Pooling (ENADPool), for GNNs to learn effective graph representations. Unlike the classical hierarchical pooling operation, which is based on ambiguous node assignments and simply computes the averaged feature over the nodes of each cluster, the proposed ENADPool not only employs a hard clustering strategy to assign each node to a unique cluster, but also compresses the node features as well as their edge connectivity strengths into the resulting hierarchical structure through an attention mechanism after each pooling step. As a result, the proposed ENADPool simultaneously identifies the importance of the different nodes within each cluster and of the edges between corresponding clusters, which significantly addresses the shortcomings of the uniform edge-node structure-information aggregation arising in the classical hierarchical pooling operation. Moreover, to mitigate the over-smoothing problem arising in existing GNNs, we propose a Multi-distance GNN (MD-GNN) model associated with the proposed ENADPool operation, allowing the nodes to actively and directly receive feature information from neighbors at different random walk steps. Experiments demonstrate the effectiveness of the MD-GNN associated with the proposed ENADPool.
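
A rough sketch of what one hard-assignment, attention-weighted pooling step could look like is given below; this is an illustrative reconstruction from the description above, not the paper's exact operator:

import torch

def hard_attention_pooling(X, A, S_logits):
    # X: (n, d) node features; A: (n, n) adjacency; S_logits: (n, c) cluster scores.
    n, c = S_logits.shape
    assign = torch.zeros(n, c)
    assign[torch.arange(n), S_logits.argmax(dim=1)] = 1.0   # hard, unique cluster assignment
    att = assign * torch.softmax(S_logits, dim=0)           # attention over the nodes of each cluster
    att = att / att.sum(dim=0, keepdim=True).clamp_min(1e-9)
    X_pooled = att.T @ X               # (c, d) attention-weighted cluster features, not a plain average
    A_pooled = assign.T @ A @ assign   # (c, c) aggregated edge connectivity strengths
    return X_pooled, A_pooled

X_p, A_p = hard_attention_pooling(torch.randn(6, 16), torch.rand(6, 6), torch.randn(6, 2))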

Updated: 2024-05-16 16:08:49

Domains: cs.LG,cs.AI

Download: http://arxiv.org/abs/2405.10218v1

A Framework for Improving the Reliability of Black-box Variational Inference

Black-box variational inference (BBVI) now sees widespread use in machine learning and statistics as a fast yet flexible alternative to Markov chain Monte Carlo methods for approximate Bayesian inference. However, stochastic optimization methods for BBVI remain unreliable and require substantial expertise and hand-tuning to apply effectively. In this paper, we propose Robust and Automated Black-box VI (RABVI), a framework for improving the reliability of BBVI optimization. RABVI is based on rigorously justified automation techniques, includes just a small number of intuitive tuning parameters, and detects inaccurate estimates of the optimal variational approximation. RABVI adaptively decreases the learning rate by detecting convergence of the fixed-learning-rate iterates, then estimates the symmetrized Kullback-Leibler (KL) divergence between the current variational approximation and the optimal one. It also employs a novel optimization termination criterion that enables the user to balance desired accuracy against computational cost by comparing (i) the predicted relative decrease in the symmetrized KL divergence if a smaller learning rate were used and (ii) the predicted computation required to converge with the smaller learning rate. We validate the robustness and accuracy of RABVI through carefully designed simulation studies and on a diverse set of real-world model and data examples.
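
As one concrete ingredient, the symmetrized KL divergence between the current and optimal approximations has a closed form when both are Gaussian; a minimal sketch for the univariate case (the Gaussian family is our simplifying assumption):

import numpy as np

def kl_gauss(mu1, s1, mu2, s2):
    # KL(N(mu1, s1^2) || N(mu2, s2^2)), closed form
    return np.log(s2 / s1) + (s1**2 + (mu1 - mu2) ** 2) / (2 * s2**2) - 0.5

def symmetrized_kl(mu1, s1, mu2, s2):
    return kl_gauss(mu1, s1, mu2, s2) + kl_gauss(mu2, s2, mu1, s1)

print(symmetrized_kl(0.0, 1.0, 0.3, 1.2))   # divergence between current and optimal approximation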

Updated: 2024-05-16 16:06:36

Domains: stat.ML,cs.LG,stat.ME

Download: http://arxiv.org/abs/2203.15945v2

Low-Rank Adaptation of Time Series Foundational Models for Out-of-Domain Modality Forecasting

Low-Rank Adaptation (LoRA) is a widely used technique for fine-tuning large pre-trained or foundational models across different modalities and tasks. However, its application to time series data, particularly within foundational models, remains underexplored. This paper examines the impact of LoRA on contemporary time series foundational models: Lag-Llama, MOIRAI, and Chronos. We demonstrate LoRA's fine-tuning potential for forecasting the vital signs of sepsis patients in intensive care units (ICUs), emphasizing the models' adaptability to previously unseen, out-of-domain modalities. Integrating LoRA aims to enhance forecasting performance while reducing inefficiencies associated with fine-tuning large models on limited domain-specific data. Our experiments show that LoRA fine-tuning of time series foundational models significantly improves forecasting, achieving results comparable to state-of-the-art models trained from scratch on similar modalities. We conduct comprehensive ablation studies to demonstrate the trade-offs between the number of tunable parameters and forecasting performance and assess the impact of varying LoRA matrix ranks on model performance.
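
For readers unfamiliar with the mechanism, LoRA freezes the pre-trained weights W and learns a low-rank update, effectively replacing W with W + (alpha/r)BA. A minimal PyTorch sketch of one such adapted layer (the rank and scaling values are illustrative; the paper's exact configuration may differ):

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False   # foundation-model weights stay frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: starts identical to base
        self.scale = alpha / r

    def forward(self, x):
        # x W^T + (alpha/r) x A^T B^T, i.e. W is replaced by W + (alpha/r) B A
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(64, 64), r=4)
out = layer(torch.randn(2, 64))   # only A and B receive gradients during fine-tuning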

Updated: 2024-05-16 16:05:33

Domains: cs.LG,cs.AI,eess.SP

Download: http://arxiv.org/abs/2405.10216v1

SMLP: Symbolic Machine Learning Prover (User Manual)

SMLP: Symbolic Machine Learning Prover is an open-source tool for the exploration and optimization of systems represented by machine learning models. SMLP uses symbolic reasoning for ML model exploration and optimization under verification and stability constraints, based on SMT, constraint and NN solvers. In addition, its exploration methods are guided by probabilistic and statistical methods. SMLP is a general-purpose tool that requires only data suitable for ML modelling in the csv format (usually samples of the system's input/output). SMLP has been applied at Intel for analyzing and optimizing hardware designs at the analog level. Currently SMLP supports NNs, polynomial and tree models, and uses SMT solvers for reasoning and optimization at the backend; integration of specialized NN solvers is in progress.

Updated: 2024-05-16 16:05:21

Domains: cs.LG,cs.AI,cs.LO,cs.SC,math.OC

Download: http://arxiv.org/abs/2405.10215v1

GPT Store Mining and Analysis

As a pivotal extension of the renowned ChatGPT, the GPT Store serves as a dynamic marketplace for various Generative Pre-trained Transformer (GPT) models, shaping the frontier of conversational AI. This paper presents an in-depth measurement study of the GPT Store, with a focus on the categorization of GPTs by topic, factors influencing GPT popularity, and the potential security risks. Our investigation starts with assessing the categorization of GPTs in the GPT Store, analyzing how they are organized by topics, and evaluating the effectiveness of the classification system. We then examine the factors that affect the popularity of specific GPTs, looking into user preferences, algorithmic influences, and market trends. Finally, the study delves into the security risks of the GPT Store, identifying potential threats and evaluating the robustness of existing security measures. This study offers a detailed overview of the GPT Store's current state, shedding light on its operational dynamics and user interaction patterns. Our findings aim to enhance understanding of the GPT ecosystem, providing valuable insights for future research, development, and policy-making in generative AI.

Updated: 2024-05-16 16:00:35

Domains: cs.LG,cs.SE

Download: http://arxiv.org/abs/2405.10210v1

Machine Learning Infused Distributed Optimization for Coordinating Virtual Power Plant Assets

Amid the increasing interest in the deployment of Distributed Energy Resources (DERs), the Virtual Power Plant (VPP) has emerged as a pivotal tool for aggregating diverse DERs and facilitating their participation in wholesale energy markets. These VPP deployments have been fueled by the Federal Energy Regulatory Commission's Order 2222, which makes DERs and VPPs competitive across market segments. However, the diversity and decentralized nature of DERs present significant challenges to the scalable coordination of VPP assets. To address efficiency and speed bottlenecks, this paper presents a novel machine learning-assisted distributed optimization to coordinate VPP assets. Our method, named LOOP-MAC (Learning to Optimize the Optimization Process for Multi-agent Coordination), adopts a multi-agent coordination perspective where each VPP agent manages multiple DERs and utilizes neural network approximators to expedite the solution search. The LOOP-MAC method employs a gauge map to guarantee strict compliance with local constraints, effectively reducing the need for additional post-processing steps. Our results highlight the advantages of LOOP-MAC, showcasing accelerated solution times per iteration and significantly reduced convergence times. The LOOP-MAC method outperforms conventional centralized and distributed optimization methods in optimization tasks that require repetitive and sequential execution.

Updated: 2024-05-16 15:43:30

Domains: cs.LG,cs.SY,eess.SY

Download: http://arxiv.org/abs/2310.17882v2

ShennongAlpha: an AI-driven sharing and collaboration platform for intelligent curation, acquisition, and translation of natural medicinal material knowledge

Natural Medicinal Materials (NMMs) have a long history of global clinical applications and a wealth of records and knowledge. Although NMMs are a major source for drug discovery and clinical application, the utilization and sharing of NMM knowledge face crucial challenges, including the standardized description of critical information, efficient curation and acquisition, and language barriers. To address these, we developed ShennongAlpha, an AI-driven sharing and collaboration platform for intelligent knowledge curation, acquisition, and translation. For standardized knowledge curation, the platform introduced a Systematic Nomenclature to enable accurate differentiation and identification of NMMs. More than fourteen thousand Chinese NMMs have been curated into the platform along with their knowledge. Furthermore, the platform pioneered chat-based knowledge acquisition, standardized machine translation, and collaborative knowledge updating. Together, our study represents the first major advance in leveraging AI to empower NMM knowledge sharing, which not only marks a novel application of AI for Science, but also will significantly benefit the global biomedical, pharmaceutical, physician, and patient communities.

Updated: 2024-05-16 15:38:21

Domains: cs.AI,cs.DB,cs.IR

Download: http://arxiv.org/abs/2401.00020v2

The NFLikelihood: an unsupervised DNNLikelihood from Normalizing Flows

We propose the NFLikelihood, an unsupervised version, based on Normalizing Flows, of the DNNLikelihood proposed in Ref.[1]. We show, through realistic examples, how Autoregressive Flows, based on affine and rational quadratic spline bijectors, are able to learn complicated high-dimensional likelihoods arising in High Energy Physics (HEP) analyses. We focus on a toy LHC analysis example already considered in the literature and on two Effective Field Theory fits of flavor and electroweak observables, whose samples have been obtained through the HEPFit code. We discuss advantages and disadvantages of the unsupervised approach with respect to the supervised one and discuss possible interplays of the two.

Updated: 2024-05-16 15:05:14

Domains: hep-ph,cs.LG,hep-ex

Download: http://arxiv.org/abs/2309.09743v3

The WHY in Business Processes: Discovery of Causal Execution Dependencies

Unraveling the causal relationships among the execution of process activities is a crucial element in predicting the consequences of process interventions and making informed decisions regarding process improvements. Process discovery algorithms exploit time precedence as their main source of model derivation. Hence, a causal view can supplement process discovery, offering a new perspective in which relations reflect genuine cause-effect dependencies among the tasks. This calls for faithful new techniques to discover the causal execution dependencies among the tasks in the process. To this end, our work offers a systematic approach to unveiling the causal business process by leveraging an existing causal discovery algorithm over activity timing. In addition, this work delves into a set of conditions under which process mining discovery algorithms generate a model that is incongruent with the causal business process model, and shows how the latter model can be methodologically employed for a sound analysis of the process. Our methodology searches for such discrepancies between the two models in the context of three causal patterns, and derives a new view in which these inconsistencies are annotated over the mined process model. We demonstrate our methodology employing two open process mining algorithms, the IBM Process Mining tool, and the LiNGAM causal discovery technique. We apply it on a synthesized dataset and on two open benchmark data sets.

Updated: 2024-05-16 14:56:37

Domains: cs.AI

Download: http://arxiv.org/abs/2310.14975v2

EiG-Search: Generating Edge-Induced Subgraphs for GNN Explanation in Linear Time

Understanding and explaining the predictions of Graph Neural Networks (GNNs) is crucial for enhancing their safety and trustworthiness. Subgraph-level explanations are gaining attention for their intuitive appeal. However, most existing subgraph-level explainers face efficiency challenges in explaining GNNs due to complex search processes. The key challenge is to find a balance between intuitiveness and efficiency while ensuring transparency. Additionally, these explainers usually induce subgraphs by nodes, which may introduce less-intuitive disconnected nodes in the subgraph-level explanations or omit many important subgraph structures. In this paper, we reveal that inducing subgraph explanations by edges is more comprehensive than other subgraph-inducing techniques. We also emphasize the need to determine the subgraph explanation size for each data instance, as different data instances may involve different important substructures. Building upon these considerations, we introduce a training-free approach, named EiG-Search. We employ an efficient linear-time search algorithm over the edge-induced subgraphs, where the edges are ranked by an enhanced gradient-based importance. We conduct extensive experiments on a total of seven datasets, demonstrating its superior performance and efficiency both quantitatively and qualitatively over the leading baselines.
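
To sketch the overall procedure, the snippet below ranks edges by a gradient-times-input importance on a differentiable stand-in for a GNN and then runs a single linear sweep over the nested edge-induced subgraphs; the importance measure here is a generic simplification of the paper's enhanced gradient-based ranking:

import torch

def rank_edges(model, edge_weights):
    # model: differentiable map from a (num_edges,) weight vector to a scalar class score.
    w = edge_weights.clone().requires_grad_(True)
    model(w).backward()
    importance = (w.grad * w).abs()   # gradient-times-input edge importance (simplified)
    return torch.argsort(importance, descending=True)

def best_edge_induced_subgraph(model, edge_weights, order):
    # Linear sweep over subgraph sizes: keep the prefix of ranked edges scoring highest.
    best_k, best_score = 1, -float("inf")
    for k in range(1, len(order) + 1):
        mask = torch.zeros_like(edge_weights)
        mask[order[:k]] = 1.0
        score = model(edge_weights * mask).item()
        if score > best_score:
            best_k, best_score = k, score
    return order[:best_k]

toy_model = lambda w: (w * torch.tensor([0.9, -0.1, 0.7, -0.05, 0.3, -0.02])).sum()
order = rank_edges(toy_model, torch.ones(6))
print(best_edge_induced_subgraph(toy_model, torch.ones(6), order))   # keeps edges 0, 2, 4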

Updated: 2024-05-16 14:55:47

Domains: cs.LG

Download: http://arxiv.org/abs/2405.01762v2

PIR: Remote Sensing Image-Text Retrieval with Prior Instruction Representation Learning

Remote sensing image-text retrieval constitutes a foundational aspect of remote sensing interpretation tasks, facilitating the alignment of vision and language representations. This paper introduces a prior instruction representation (PIR) learning paradigm that draws on prior knowledge to instruct adaptive learning of vision and text representations. Based on PIR, a domain-adapted remote sensing image-text retrieval framework PIR-ITR is designed to address semantic noise issues in vision-language understanding tasks. However, with massive additional data for pre-training the vision-language foundation model, remote sensing image-text retrieval is further developed into an open-domain retrieval task. Continuing with the above, we propose PIR-CLIP, a domain-specific CLIP-based framework for remote sensing image-text retrieval, to address semantic noise in remote sensing vision-language representations and further improve open-domain retrieval performance. In vision representation, Vision Instruction Representation (VIR) based on Spatial-PAE utilizes the prior-guided knowledge of the remote sensing scene recognition by building a belief matrix to select key features for reducing the impact of semantic noise. In text representation, Language Cycle Attention (LCA) based on Temporal-PAE uses the previous time step to cyclically activate the current time step to enhance text representation capability. A cluster-wise Affiliation Loss (AL) is proposed to constrain the inter-classes and to reduce the semantic confusion zones in the common subspace. Comprehensive experiments demonstrate that PIR could enhance vision and text representations and outperform the state-of-the-art methods of closed-domain and open-domain retrieval on two benchmark datasets, RSICD and RSITMD.

Updated: 2024-05-16 14:53:45

Domains: cs.CV,cs.AI

Download: http://arxiv.org/abs/2405.10160v1

$f$-Divergence Based Classification: Beyond the Use of Cross-Entropy

In deep learning, classification tasks are formalized as optimization problems often solved via the minimization of the cross-entropy. However, recent advancements in the design of objective functions allow the usage of the $f$-divergence to generalize the formulation of the optimization problem for classification. We adopt a Bayesian perspective and formulate the classification task as a maximum a posteriori probability problem. We propose a class of objective functions based on the variational representation of the $f$-divergence. Furthermore, driven by the challenge of improving the state-of-the-art approach, we propose a bottom-up method that leads us to the formulation of an objective function corresponding to a novel $f$-divergence referred to as shifted log (SL). We theoretically analyze the objective functions proposed and numerically test them in three application scenarios: toy examples, image datasets, and signal detection/decoding problems. The analyzed scenarios demonstrate the effectiveness of the proposed approach and that the SL divergence achieves the highest classification accuracy in almost all the considered cases.
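
For context, the variational (Fenchel) representation underlying such objectives can be checked numerically; below is a minimal demonstration for the KL divergence, a classical $f$-divergence (this is not the paper's shifted-log divergence, whose form we do not reproduce here):

import numpy as np

rng = np.random.default_rng(0)

# Variational representation: KL(P||Q) = sup_T E_P[T(x)] - E_Q[exp(T(x) - 1)],
# with the supremum attained at T*(x) = 1 + log p(x)/q(x).
mu_p, mu_q, s = 0.0, 1.0, 1.0
xp = rng.normal(mu_p, s, 100_000)   # samples from P
xq = rng.normal(mu_q, s, 100_000)   # samples from Q

log_ratio = lambda x: ((x - mu_q) ** 2 - (x - mu_p) ** 2) / (2 * s**2)  # log p(x)/q(x)
T = lambda x: 1.0 + log_ratio(x)

bound = np.mean(T(xp)) - np.mean(np.exp(T(xq) - 1.0))
exact = (mu_p - mu_q) ** 2 / (2 * s**2)   # closed-form KL for equal-variance Gaussians
print(bound, exact)   # both close to 0.5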

Updated: 2024-05-16 14:46:49

Domains: cs.LG,eess.SP

Download: http://arxiv.org/abs/2401.01268v2

Neurosymbolic AI for Reasoning over Knowledge Graphs: A Survey

Neurosymbolic AI is an increasingly active area of research that combines symbolic reasoning methods with deep learning to leverage their complementary benefits. As knowledge graphs are becoming a popular way to represent heterogeneous and multi-relational data, methods for reasoning on graph structures have attempted to follow this neurosymbolic paradigm. Traditionally, such approaches have utilized either rule-based inference or generated representative numerical embeddings from which patterns could be extracted. However, several recent studies have attempted to bridge this dichotomy to generate models that facilitate interpretability, maintain competitive performance, and integrate expert knowledge. Therefore, we survey methods that perform neurosymbolic reasoning tasks on knowledge graphs and propose a novel taxonomy by which we can classify them. Specifically, we propose three major categories: (1) logically-informed embedding approaches, (2) embedding approaches with logical constraints, and (3) rule learning approaches. Alongside the taxonomy, we provide a tabular overview of the approaches and links to their source code, if available, for more direct comparison. Finally, we discuss the unique characteristics and limitations of these methods, then propose several prospective directions toward which this field of research could evolve.

Updated: 2024-05-16 14:46:08

Domains: cs.AI,cs.LO,stat.ML

Download: http://arxiv.org/abs/2302.07200v3

Relational DNN Verification With Cross Executional Bound Refinement

We focus on verifying relational properties defined over deep neural networks (DNNs), such as robustness against universal adversarial perturbations (UAPs), certified worst-case Hamming distance for binary string classification, etc. Precise verification of these properties requires reasoning about multiple executions of the same DNN. However, most existing work in DNN verification only handles properties defined over single executions and, as a result, is imprecise for relational properties. Though a few recent works on relational DNN verification capture linear dependencies between the inputs of multiple executions, they do not leverage dependencies between the outputs of hidden layers, producing imprecise results. We develop a scalable relational verifier, RACoon, that utilizes cross-execution dependencies at all layers of the DNN, gaining substantial precision over SOTA baselines on a wide range of datasets, networks, and relational properties.

Updated: 2024-05-16 14:35:50

Domains: cs.LG

Download: http://arxiv.org/abs/2405.10143v1

TRABSA: Interpretable Sentiment Analysis of Tweets using Attention-based BiLSTM and Twitter-RoBERTa

Sentiment analysis is crucial for understanding public opinion and consumer behavior. Existing models face challenges with linguistic diversity, generalizability, and explainability. We propose TRABSA, a hybrid framework integrating transformer-based architectures, attention mechanisms, and BiLSTM networks to address this. Leveraging RoBERTa-trained on 124M tweets, we bridge gaps in sentiment analysis benchmarks, ensuring state-of-the-art accuracy. Augmenting datasets with tweets from 32 countries and US states, we compare six word-embedding techniques and three lexicon-based labeling techniques, selecting the best for optimal sentiment analysis. TRABSA outperforms traditional ML and deep learning models with 94% accuracy and significant precision, recall, and F1-score gains. Evaluation across diverse datasets demonstrates consistent superiority and generalizability. SHAP and LIME analyses enhance interpretability, improving confidence in predictions. Our study facilitates pandemic resource management, aiding resource planning, policy formation, and vaccination tactics.
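
As a rough sketch of the attention-based BiLSTM stage (our generic reconstruction; the token embeddings would come from Twitter-RoBERTa upstream, and the actual TRABSA architecture may differ):

import torch
import torch.nn as nn

class AttentiveBiLSTMHead(nn.Module):
    def __init__(self, emb_dim=768, hidden=128, n_classes=3):
        super().__init__()
        self.bilstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.att = nn.Linear(2 * hidden, 1)          # additive attention scorer
        self.cls = nn.Linear(2 * hidden, n_classes)

    def forward(self, embeddings):                   # (batch, seq, emb_dim) token embeddings
        h, _ = self.bilstm(embeddings)               # (batch, seq, 2 * hidden)
        alpha = torch.softmax(self.att(h).squeeze(-1), dim=1)   # attention over tokens
        pooled = torch.einsum("bs,bsh->bh", alpha, h)           # attention-weighted sentence vector
        return self.cls(pooled)                      # sentiment logits

logits = AttentiveBiLSTMHead()(torch.randn(4, 32, 768))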

Updated: 2024-05-16 14:35:36

Domains: cs.CL,cs.LG

Download: http://arxiv.org/abs/2404.00297v2

Scaling the weight parameters in Markov logic networks and relational logistic regression models

We consider Markov logic networks and relational logistic regression as two fundamental representation formalisms in statistical relational artificial intelligence that use weighted formulas in their specification. However, Markov logic networks are based on undirected graphs, while relational logistic regression is based on directed acyclic graphs. We show that when scaling the weight parameters with the domain size, the asymptotic behaviour of a relational logistic regression model is transparently controlled by the parameters, and we supply an algorithm to compute asymptotic probabilities. We also show, using two examples, that this is not true for Markov logic networks. We further discuss, using several examples mainly from the literature, how the application context can help the user decide when such scaling is appropriate and when using the raw unscaled parameters might be preferable. We highlight random sampling as a particularly promising area of application for scaled models and expound possible avenues for further research.
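
To make the effect of scaling concrete, consider a schematic single-formula example (ours, for illustration): suppose a weighted formula attaches weight $w$ to each domain element satisfying $R(x)$, so that a query $Q$ receives conditional probability $\sigma(w_0 + w k_n)$, where $k_n$ counts the elements satisfying $R$ in a domain of size $n$ and $\sigma$ is the sigmoid. With raw weights, $w k_n$ typically grows without bound as $n$ increases, so the probability saturates at $0$ or $1$; after scaling $w \mapsto w/n$, the argument becomes $w_0 + w (k_n/n)$, a function of the relative frequency $k_n/n$, which is the kind of transparently parameter-controlled asymptotic behaviour referred to above.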

Updated: 2024-05-16 14:32:16

Domains: cs.AI,cs.LG,cs.LO

Download: http://arxiv.org/abs/2103.15140v3

Towards Consistent and Explainable Motion Prediction using Heterogeneous Graph Attention

In autonomous driving, accurately interpreting the movements of other road users and leveraging this knowledge to forecast future trajectories is crucial. This is typically achieved through the integration of map data and tracked trajectories of various agents. Numerous methodologies combine this information into a singular embedding for each agent, which is then utilized to predict future behavior. However, these approaches have a notable drawback in that they may lose exact location information during the encoding process. Although the encoding still includes general map information, the generation of valid and consistent trajectories is not guaranteed, which can cause the predicted trajectories to stray from the actual lanes. This paper introduces a new refinement module designed to project the predicted trajectories back onto the actual map, rectifying these discrepancies and leading towards more consistent predictions. This versatile module can be readily incorporated into a wide range of architectures. Additionally, we propose a novel scene encoder that handles all relations between agents and their environment in a single unified heterogeneous graph attention network. By analyzing the attention values on the different edges in this graph, we can gain unique insights into the neural network's inner workings, leading towards a more explainable prediction.

Updated: 2024-05-16 14:31:15

Domains: cs.RO,cs.AI

Download: http://arxiv.org/abs/2405.10134v1

Trusting the Cloud-Native Edge: Remotely Attested Kubernetes Workers

A Kubernetes cluster typically consists of trusted nodes, running within the confines of a physically secure datacenter. With recent advances in edge orchestration, this is no longer the case. This poses a new challenge: how can we trust a device that an attacker has physical access to? This paper presents an architecture and open-source implementation that securely enrolls edge devices as trusted Kubernetes worker nodes. By providing boot attestation rooted in a hardware Trusted Platform Module, a strong base of trust is provided. A new custom controller directs a modified version of Keylime to cross the cloud-edge gap and securely deliver unique cluster credentials required to enroll an edge worker. The controller dynamically grants and revokes these credentials based on attestation events, preventing a possibly compromised node from accessing sensitive cluster resources. We provide both a qualitative and a quantitative evaluation of the architecture. The qualitative scenarios prove its ability to attest and enroll an edge device with role-based access control (RBAC) permissions that dynamically adjust to attestation events. The quantitative evaluation reflects an average of 10.28 seconds delay incurred on the startup time of the edge node due to attestation for a total average enrollment time of 20.91 seconds. The presented architecture thus provides a strong base of trust, securing a physically exposed edge device and paving the way for a robust and resilient edge computing ecosystem.

Updated: 2024-05-16 14:29:28

Domains: cs.CR

Download: http://arxiv.org/abs/2405.10131v1

StyloAI: Distinguishing AI-Generated Content with Stylometric Analysis

The emergence of large language models (LLMs) capable of generating realistic texts and images has sparked ethical concerns across various sectors. In response, researchers in academia and industry are actively exploring methods to distinguish AI-generated content from human-authored material. However, a crucial question remains: What are the unique characteristics of AI-generated text? Addressing this gap, this study proposes StyloAI, a data-driven model that uses 31 stylometric features to identify AI-generated texts by applying a Random Forest classifier on two multi-domain datasets. StyloAI achieves accuracy rates of 81% and 98% on the test set of the AuTextification dataset and the Education dataset, respectively. This approach surpasses the performance of existing state-of-the-art models and provides valuable insights into the differences between AI-generated and human-authored texts.
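
A minimal sketch of the overall recipe (stylometric features fed to a Random Forest); the five features below are illustrative stand-ins, whereas the paper defines 31, and the two training rows are placeholders for AuTextification-style data:

import numpy as np
from sklearn.ensemble import RandomForestClassifier

def stylometric_features(text):
    # A handful of illustrative stylometric features.
    words = text.split()
    sentences = [s for s in text.replace("!", ".").replace("?", ".").split(".") if s.strip()]
    return [
        len(words),                                          # length in words
        np.mean([len(w) for w in words]) if words else 0.0,  # mean word length
        len(set(words)) / len(words) if words else 0.0,      # type-token ratio
        len(words) / max(len(sentences), 1),                 # mean sentence length
        text.count(",") / max(len(words), 1),                # comma rate
    ]

X = [stylometric_features(t) for t in ["A short human-written note.", "A generated sample text."]]
y = [0, 1]   # 1 = AI-generated, 0 = human-authored
clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)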

Updated: 2024-05-16 14:28:01

Domains: cs.CL,cs.AI,I.2.7

Download: http://arxiv.org/abs/2405.10129v1

Red Teaming Language Models for Contradictory Dialogues

Most language models currently available are prone to self-contradiction during dialogues. To mitigate this issue, this study explores a novel contradictory dialogue processing task that aims to detect and modify contradictory statements in a conversation. This task is inspired by research on context faithfulness and dialogue comprehension, which have demonstrated that the detection and understanding of contradictions often necessitate detailed explanations. We develop a dataset comprising contradictory dialogues, in which one side of the conversation contradicts itself. Each dialogue is accompanied by an explanatory label that highlights the location and details of the contradiction. With this dataset, we present a Red Teaming framework for contradictory dialogue processing. The framework detects and attempts to explain the dialogue, then modifies the existing contradictory content using the explanation. Our experiments demonstrate that the framework improves the ability to detect contradictory dialogues and provides valid explanations. Additionally, it showcases distinct capabilities for modifying such dialogues. Our study highlights the importance of the logical inconsistency problem in conversational AI.

Updated: 2024-05-16 14:27:32

Domains: cs.CL,cs.AI

Download: http://arxiv.org/abs/2405.10128v1

Estimating a Function and Its Derivatives Under a Smoothness Condition

We consider the problem of estimating an unknown function f* and its partial derivatives from a noisy data set of n observations, where we make no assumptions about f* except that it is smooth in the sense that it has square integrable partial derivatives of order m. A natural candidate for the estimator of f* in such a case is the best fit to the data set that satisfies a certain smoothness condition. This estimator can be seen as a least squares estimator subject to an upper bound on some measure of smoothness. Another useful estimator is the one that minimizes the degree of smoothness subject to an upper bound on the average of squared errors. We prove that these two estimators are computable as solutions to quadratic programs, establish the consistency of these estimators and their partial derivatives, and study the convergence rate as n increases to infinity. The effectiveness of the estimators is illustrated numerically in a setting where the value of a stock option and its second derivative are estimated as functions of the underlying stock price.
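
A discrete analogue of the constrained least-squares estimator is easy to prototype as a quadratic program; the sketch below (using cvxpy, with a regular grid, an L2 bound on the discrete second derivative, and a budget S as our assumptions) fits noisy samples and then reads off a plug-in estimate of the first derivative:

import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n = 60
x = np.linspace(0.0, 1.0, n)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(n)

h = x[1] - x[0]
f = cp.Variable(n)                             # fitted function values on the grid
d2 = (f[2:] - 2 * f[1:-1] + f[:-2]) / h**2     # discrete second derivative
S = 20.0                                       # smoothness budget (tuning parameter)
prob = cp.Problem(cp.Minimize(cp.sum_squares(f - y)),
                  [h * cp.sum_squares(d2) <= S**2])
prob.solve()

f_hat = f.value
df_hat = np.gradient(f_hat, x)                 # plug-in estimate of the first derivative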

Updated: 2024-05-16 14:24:44

Domains: stat.ML,cs.LG,math.ST,stat.TH,62G08, 62G20

Download: http://arxiv.org/abs/2405.10126v1

Ensuring UAV Safety: A Vision-only and Real-time Framework for Collision Avoidance Through Object Detection, Tracking, and Distance Estimation

In the last twenty years, unmanned aerial vehicles (UAVs) have garnered growing interest due to their expanding applications in both military and civilian domains. Detecting non-cooperative aerial vehicles efficiently and estimating collisions accurately are pivotal for achieving fully autonomous aircraft and facilitating Advanced Air Mobility (AAM). This paper presents a deep-learning framework that utilizes optical sensors for the detection, tracking, and distance estimation of non-cooperative aerial vehicles. In implementing this comprehensive sensing framework, the availability of depth information is essential for enabling autonomous aerial vehicles to perceive and navigate around obstacles. In this work, we propose a method for estimating the distance information of a detected aerial object in real time using only the input of a monocular camera. In order to train our deep learning components for the object detection, tracking and depth estimation tasks, we utilize the Amazon Airborne Object Tracking (AOT) Dataset. In contrast to previous approaches that integrate the depth estimation module into the object detector, our method formulates the problem as image-to-image translation. We employ a separate lightweight encoder-decoder network for efficient and robust depth estimation. In a nutshell, the object detection module identifies and localizes obstacles, conveying this information both to the tracking module for monitoring obstacle movement and to the depth estimation module for calculating distances. Our approach is evaluated on the Airborne Object Tracking (AOT) dataset, which is, to the best of our knowledge, the largest air-to-air airborne object dataset.

Updated: 2024-05-16 14:24:37

Domains: cs.CV,cs.LG

Download: http://arxiv.org/abs/2405.06749v2

Bridging the Gap: Protocol Towards Fair and Consistent Affect Analysis

The increasing integration of machine learning algorithms in daily life underscores the critical need for fairness and equity in their deployment. As these technologies play a pivotal role in decision-making, addressing biases across diverse subpopulation groups, including age, gender, and race, becomes paramount. Automatic affect analysis, at the intersection of physiology, psychology, and machine learning, has seen significant development. However, existing databases and methodologies lack uniformity, leading to biased evaluations. This work addresses these issues by analyzing six affective databases, annotating demographic attributes, and proposing a common protocol for database partitioning. Emphasis is placed on fairness in evaluations. Extensive experiments with baseline and state-of-the-art methods demonstrate the impact of these changes, revealing the inadequacy of prior assessments. The findings underscore the importance of considering demographic attributes in affect analysis research and provide a foundation for more equitable methodologies. Our annotations, code and pre-trained models are available at: https://github.com/dkollias/Fair-Consistent-Affect-Analysis

Updated: 2024-05-16 14:23:23

Domains: cs.CV,cs.LG

Download: http://arxiv.org/abs/2405.06841v2

Asynchronous Federated Stochastic Optimization with Exact Averaging for Heterogeneous Local Objectives

Federated learning (FL) was recently proposed to securely train models with data held over multiple locations ("clients") under the coordination of a central server. Two major challenges hindering the performance of FL algorithms are long training times caused by straggling clients and a decrease in training accuracy induced by non-iid local distributions ("client drift"). In this work we propose and analyze AREA, a new stochastic (sub)gradient algorithm that is robust to client drift and utilizes asynchronous communication to speed up convergence in the presence of stragglers. Moreover, AREA is, to the best of our knowledge, the first method that is both guaranteed to converge under arbitrarily long delays, and converges to an error neighborhood whose size depends only on the variance of the stochastic (sub)gradients used and thus is independent of both the heterogeneity between the local datasets and the length of client delays, without the use of delay-adaptive stepsizes. Our numerical results confirm our theoretical analysis and suggest that AREA outperforms state-of-the-art methods when local data are highly non-iid.

Updated: 2024-05-16 14:22:49

Domains: cs.LG,cs.DC

Download: http://arxiv.org/abs/2405.10123v1

The generalised distribution semantics and projective families of distributions

We generalise the distribution semantics underpinning probabilistic logic programming by distilling its essential concept, the separation of a free random component and a deterministic part. This abstracts the core ideas beyond logic programming as such to encompass frameworks from probabilistic databases, probabilistic finite model theory and discrete lifted Bayesian networks. To demonstrate the usefulness of such a general approach, we completely characterise the projective families of distributions representable in the generalised distribution semantics, and we demonstrate both that large classes of interesting projective families cannot be represented in a generalised distribution semantics and that already a very limited fragment of logic programming (acyclic determinate logic programs) in the deterministic part suffices to represent all those projective families that are representable in the generalised distribution semantics at all.

Updated: 2024-05-16 14:22:29

Domains: cs.AI,cs.DB,cs.PL

Download: http://arxiv.org/abs/2211.06751v2

Probabilities of the third type: Statistical Relational Learning and Reasoning with Relative Frequencies

Dependencies on the relative frequency of a state in the domain are common when modelling probabilistic dependencies on relational data. For instance, the likelihood of a school closure during an epidemic might depend on the proportion of infected pupils exceeding a threshold. Often, rather than depending on discrete thresholds, dependencies are continuous: for instance, the likelihood of any one mosquito bite transmitting an illness depends on the proportion of carrier mosquitoes. Current approaches usually only consider probabilities over possible worlds rather than over domain elements themselves. An exception are the recently introduced Lifted Bayesian Networks for Conditional Probability Logic, which express discrete dependencies on probabilistic data. We introduce functional lifted Bayesian networks, a formalism that explicitly incorporates continuous dependencies on relative frequencies into statistical relational artificial intelligence, and compare and contrast them with Lifted Bayesian Networks for Conditional Probability Logic. Incorporating relative frequencies is not only beneficial to modelling; it also provides a more rigorous approach to learning problems where training and test or application domains have different sizes. To this end, we provide a representation of the asymptotic probability distributions induced by functional lifted Bayesian networks on domains of increasing sizes. Since that representation has well-understood scaling behaviour across domain sizes, it can be used to estimate parameters for a large domain consistently from randomly sampled subpopulations. Furthermore, we show that in parametric families of FLBN, convergence is uniform in the parameters, which ensures a meaningful dependence of the asymptotic probabilities on the parameters of the model.

Updated: 2024-05-16 14:22:08

Domains: cs.AI,cs.LG,cs.LO,I.2.4; I.2.6

Download: http://arxiv.org/abs/2202.10367v3

NID-SLAM: Neural Implicit Representation-based RGB-D SLAM in dynamic environments

Neural implicit representations have been explored to enhance visual SLAM algorithms, especially in providing high-fidelity dense map. Existing methods operate robustly in static scenes but struggle with the disruption caused by moving objects. In this paper we present NID-SLAM, which significantly improves the performance of neural SLAM in dynamic environments. We propose a new approach to enhance inaccurate regions in semantic masks, particularly in marginal areas. Utilizing the geometric information present in depth images, this method enables accurate removal of dynamic objects, thereby reducing the probability of camera drift. Additionally, we introduce a keyframe selection strategy for dynamic scenes, which enhances camera tracking robustness against large-scale objects and improves the efficiency of mapping. Experiments on publicly available RGB-D datasets demonstrate that our method outperforms competitive neural SLAM approaches in tracking accuracy and mapping quality in dynamic environments.

Updated: 2024-05-16 14:19:52

标题: NID-SLAM:基于神经隐式表示的动态环境中RGB-D SLAM

摘要: 神经隐式表示已被探索用来增强视觉SLAM算法,特别是提供高保真密集地图。现有方法在静态场景中运行稳健,但在移动物体引起的干扰下表现不佳。本文介绍了NID-SLAM,显著改善了神经SLAM在动态环境中的性能。我们提出了一种新方法,以增强语义掩膜中的不准确区域,特别是在边缘区域。利用深度图像中的几何信息,该方法能够精确地去除动态物体,从而降低相机漂移的概率。此外,我们引入了一种针对动态场景的关键帧选择策略,增强了相机跟踪对大规模物体的稳健性,并提高了建图的效率。对公开可用的RGB-D数据集进行的实验表明,我们的方法在动态环境中的跟踪准确性和建图质量方面优于竞争性的神经SLAM方法。

更新时间: 2024-05-16 14:19:52

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2401.01189v2

Semi-supervised Anomaly Detection via Adaptive Reinforcement Learning-Enabled Method with Causal Inference for Sensor Signals

Semi-supervised anomaly detection for sensor signals is critical in ensuring system reliability in smart manufacturing. However, existing methods rely heavily on data correlation, neglecting causality and leading to potential misinterpretations due to confounding factors. Moreover, while current reinforcement learning-based methods can effectively identify known and unknown anomalies with limited labeled samples, these methods still face several challenges, such as under-utilization of prior knowledge, lack of model flexibility, and deficient reward feedback during environmental interactions. To address the above problems, this paper innovatively constructs a counterfactual causal reinforcement learning model, termed Triple-Assisted Causal Reinforcement Learning Anomaly Detector (Tri-CRLAD). The model leverages causal inference to extract the intrinsic causal feature in data, enhancing the agent's utilization of prior knowledge and improving its generalization capability. In addition, Tri-CRLAD features a triple decision support mechanism, including a sampling strategy based on historical similarity, an adaptive threshold smoothing adjustment strategy, and an adaptive decision reward mechanism. These mechanisms further enhance the flexibility and generalization ability of the model, enabling it to effectively respond to various complex and dynamically changing environments. Experimental results across seven diverse sensor signal datasets demonstrate that Tri-CRLAD outperforms nine state-of-the-art baseline methods. Notably, Tri-CRLAD achieves up to a 23% improvement in anomaly detection stability with minimal known anomaly samples, highlighting its potential in semi-supervised anomaly detection scenarios. Our code is available at https://github.com/Aoudsung/Tri-CRLAD.
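
The abstract does not spell out the adaptive threshold smoothing rule; the following is a minimal sketch of one plausible version, assuming an exponential-moving-average update toward a batch quantile. The function smooth_threshold and the choices of q and alpha are illustrative, not Tri-CRLAD's actual mechanism.

import numpy as np

rng = np.random.default_rng(7)

def smooth_threshold(score_batches, q=0.95, alpha=0.05):
    """Track a q-quantile anomaly threshold with EMA smoothing."""
    thr, history = None, []
    for batch in score_batches:
        target = np.quantile(batch, q)               # batch-level quantile
        thr = target if thr is None else (1 - alpha) * thr + alpha * target
        history.append(thr)
    return np.array(history)

# Simulated sensor anomaly scores that drift upward over time.
batches = [rng.normal(0.1 + 0.02 * t, 0.05, size=256) for t in range(50)]
thresholds = smooth_threshold(batches)
print("first thresholds:", thresholds[:3].round(3))
print("final threshold: ", round(float(thresholds[-1]), 3))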

Updated: 2024-05-16 14:17:10

标题: 半监督异常检测:基于因果推断的自适应强化学习启用方法用于传感器信号

摘要: 传感器信号的半监督异常检测在智能制造中确保系统可靠性至关重要。然而,现有方法主要依赖数据相关性,忽视因果关系,导致由于混淆因素可能产生潜在的错误解释。此外,虽然当前基于强化学习的方法可以有效识别已知和未知异常,但这些方法仍面临一些挑战,如先验知识的利用不足、模型灵活性不足以及在环境交互过程中缺乏奖励反馈。为了解决上述问题,本文创新地构建了一个因果反事实强化学习模型,称为三重辅助因果强化学习异常检测器(Tri-CRLAD)。该模型利用因果推断来提取数据中的内在因果特征,增强了代理的先验知识利用和提高了其泛化能力。此外,Tri-CRLAD具有三重决策支持机制,包括基于历史相似性的采样策略、自适应阈值平滑调整策略和自适应决策奖励机制。这些机制进一步增强了模型的灵活性和泛化能力,使其能够有效应对各种复杂和动态变化的环境。对七个不同的传感器信号数据集的实验结果表明,Tri-CRLAD优于九种最先进的基线方法。值得注意的是,Tri-CRLAD在最少已知异常样本的情况下,异常检测稳定性提高了高达23%,突显了其在半监督异常检测场景中的潜力。我们的代码可在https://github.com/Aoudsung/Tri-CRLAD 上找到。

更新时间: 2024-05-16 14:17:10

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.06925v2

Disguised Copyright Infringement of Latent Diffusion Models

Copyright infringement may occur when a generative model produces samples substantially similar to some copyrighted data that it had access to during the training phase. The notion of access usually refers to including copyrighted samples directly in the training dataset, which one may inspect to identify an infringement. We argue that such visual auditing largely overlooks a concealed copyright infringement, where one constructs a disguise that looks drastically different from the copyrighted sample yet still induces the effect of training Latent Diffusion Models on it. Such disguises only require indirect access to the copyrighted material and cannot be visually distinguished, thus easily circumventing the current auditing tools. In this paper, we provide a better understanding of such disguised copyright infringement by uncovering the disguises generation algorithm, the revelation of the disguises, and importantly, how to detect them to augment the existing toolbox. Additionally, we introduce a broader notion of acknowledgment for comprehending such indirect access.

Updated: 2024-05-16 14:12:28

标题: 潜在扩散模型的伪装版权侵权

摘要: 侵犯版权可能发生在生成模型产生与其在训练阶段访问过的某些受版权保护的数据非常相似的样本时。访问的概念通常指的是将受版权保护的样本直接包含在训练数据集中,可以检查以识别侵权行为。我们认为,这种视觉审计很大程度上忽视了隐藏的版权侵权行为,即制作出看起来与受版权保护的样本截然不同但仍会导致在其上训练潜在扩散模型的效果的伪装。这种伪装只需要间接访问受版权保护的材料,无法在视觉上区分,因此很容易规避当前的审计工具。在本文中,我们通过揭示伪装生成算法、伪装的揭示以及重要的是如何检测它们来更好地理解这种伪装的版权侵权行为,以增强现有工具箱。此外,我们引入了一个更广泛的承认概念,以理解这种间接访问。

更新时间: 2024-05-16 14:12:28

领域: cs.LG,cs.CR

下载: http://arxiv.org/abs/2404.06737v3

UCB-driven Utility Function Search for Multi-objective Reinforcement Learning

In Multi-objective Reinforcement Learning (MORL) agents are tasked with optimising decision-making behaviours that trade-off between multiple, possibly conflicting, objectives. MORL based on decomposition is a family of solution methods that employ a number of utility functions to decompose the multi-objective problem into individual single-objective problems solved simultaneously in order to approximate a Pareto front of policies. We focus on the case of linear utility functions parameterised by weight vectors w. We introduce a method based on Upper Confidence Bound to efficiently search for the most promising weight vectors during different stages of the learning process, with the aim of maximising the hypervolume of the resulting Pareto front. The proposed method is shown to outperform various MORL baselines on Mujoco benchmark problems across different random seeds. The code is online at: https://github.com/SYCAMORE-1/ucb-MOPPO.
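
A compact sketch of the selection loop, assuming a scalar noisy payoff per stage stands in for the hypervolume improvement of training with a given weight vector. The payoff model hypervolume_gain is a hypothetical stand-in; the method's actual MORL policy training is omitted.

import numpy as np

rng = np.random.default_rng(0)
K, T = 8, 200                                    # candidate vectors, stages

weights = rng.dirichlet(np.ones(3), size=K)      # candidate w on the 3-simplex
counts, means = np.zeros(K), np.zeros(K)

def hypervolume_gain(w):
    """Hypothetical noisy payoff of training with weights w (a stand-in)."""
    return float(w @ np.array([0.2, 0.5, 0.3])) + rng.normal(0, 0.05)

for t in range(1, T + 1):
    ucb = means + np.sqrt(2 * np.log(t) / np.maximum(counts, 1e-9))
    ucb[counts == 0] = np.inf                    # play each arm at least once
    k = int(np.argmax(ucb))
    g = hypervolume_gain(weights[k])
    counts[k] += 1
    means[k] += (g - means[k]) / counts[k]       # incremental mean update

print("most promising weight vector:", weights[int(np.argmax(means))].round(3))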

Updated: 2024-05-16 14:11:46

标题: UCB驱动的多目标强化学习中的效用函数搜索

摘要: 在多目标强化学习(MORL)中,代理被赋予优化决策行为的任务,这些行为在多个可能冲突的目标之间进行权衡。基于分解的MORL是一类解决方法,它采用多个效用函数将多目标问题分解为单个单目标问题,同时解决以便近似帕累托前沿。我们关注线性效用函数的情况,这些函数由权重向量w参数化。我们提出了一种基于上置信界的方法,用于在学习过程的不同阶段高效搜索最有希望的权重向量,旨在最大化所得帕累托前沿的超体积。所提出的方法在不同随机种子下的Mujoco基准问题上表现优于各种MORL基线。代码位于以下链接:https://github.com/SYCAMORE-1/ucb-MOPPO。

更新时间: 2024-05-16 14:11:46

领域: cs.LG

下载: http://arxiv.org/abs/2405.00410v2

GOPlan: Goal-conditioned Offline Reinforcement Learning by Planning with Learned Models

Offline Goal-Conditioned RL (GCRL) offers a feasible paradigm for learning general-purpose policies from diverse and multi-task offline datasets. Despite notable recent progress, the predominant offline GCRL methods, mainly model-free, face constraints in handling limited data and generalizing to unseen goals. In this work, we propose Goal-conditioned Offline Planning (GOPlan), a novel model-based framework that contains two key phases: (1) pretraining a prior policy capable of capturing multi-modal action distribution within the multi-goal dataset; (2) employing the reanalysis method with planning to generate imagined trajectories for funetuning policies. Specifically, we base the prior policy on an advantage-weighted conditioned generative adversarial network, which facilitates distinct mode separation, mitigating the pitfalls of out-of-distribution (OOD) actions. For further policy optimization, the reanalysis method generates high-quality imaginary data by planning with learned models for both intra-trajectory and inter-trajectory goals. With thorough experimental evaluations, we demonstrate that GOPlan achieves state-of-the-art performance on various offline multi-goal navigation and manipulation tasks. Moreover, our results highlight the superior ability of GOPlan to handle small data budgets and generalize to OOD goals.

Updated: 2024-05-16 14:08:55

标题: GOPlan:使用学习模型进行规划的目标导向离线强化学习

摘要: 离线目标条件强化学习(GCRL)提供了一种可行的范例,可以从多样化和多任务的离线数据集中学习通用策略。尽管最近取得了显著进展,但主要的离线GCRL方法,主要是无模型的,面临处理有限数据和泛化到未见目标的限制。在这项工作中,我们提出了一种名为Goal-conditioned Offline Planning(GOPlan)的新颖的基于模型的框架,其中包含两个关键阶段:(1)预训练一个能够捕捉多目标数据集中多模态动作分布的先验策略;(2)利用重新分析方法与规划来生成想象轨迹以进行策略的微调。具体来说,我们基于一个优势加权的条件生成对抗网络构建先验策略,这有助于明确模式分离,减轻了超出分布(OOD)动作的缺陷。为了进一步优化策略,重新分析方法通过利用学习模型进行规划生成高质量的虚拟数据,用于处理轨迹内和轨迹间的目标。通过彻底的实验评估,我们证明了GOPlan在各种离线多目标导航和操作任务上实现了最先进的性能。此外,我们的结果突出了GOPlan处理小数据预算和泛化到OOD目标的卓越能力。

更新时间: 2024-05-16 14:08:55

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2310.20025v3

A novel Reservoir Architecture for Periodic Time Series Prediction

This paper introduces a novel approach to predicting periodic time series using reservoir computing. The model is tailored to deliver precise forecasts of rhythms, a crucial aspect for tasks such as generating musical rhythm. Leveraging reservoir computing, our proposed method is ultimately oriented towards predicting human perception of rhythm. Our network accurately predicts rhythmic signals within the human frequency perception range. The model architecture incorporates primary and intermediate neurons tasked with capturing and transmitting rhythmic information. Two parameter matrices, denoted as $c$ and $k$, regulate the reservoir's overall dynamics. We propose a loss function to adapt $c$ post-training and introduce a dynamic selection (DS) mechanism that adjusts $k$ to focus on areas with outstanding contributions. Experimental results on a diverse test set showcase accurate predictions, further improved through real-time tuning of the reservoir via $c$ and $k$. Comparative assessments highlight its superior performance compared to conventional models.
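
Since the abstract describes the $c$/$k$ architecture only at a high level, here is a generic echo state network sketch instead, illustrating standard reservoir computing on a periodic signal. The reservoir size, spectral radius, washout length, and ridge penalty are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(1)
n_res = 200                                      # reservoir size (illustrative)

W_in = rng.uniform(-0.5, 0.5, n_res)             # input weights
W = rng.normal(0, 1, (n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # scale spectral radius to 0.9

def run_reservoir(u):
    x, states = np.zeros(n_res), []
    for u_t in u:
        x = np.tanh(W @ x + W_in * u_t)          # standard tanh state update
        states.append(x)
    return np.array(states)

t = 0.1 * np.arange(1200)
signal = np.sin(t) + 0.5 * np.sin(3 * t)         # a periodic target rhythm
X, y = run_reservoir(signal[:-1]), signal[1:]    # one-step-ahead targets

A, b = X[100:], y[100:]                          # drop a 100-step washout
W_out = np.linalg.solve(A.T @ A + 1e-6 * np.eye(n_res), A.T @ b)
print("one-step MSE:", float(np.mean((A @ W_out - b) ** 2)))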

Updated: 2024-05-16 13:55:53

标题: 一种用于周期性时间序列预测的新型储存体结构

摘要: 本文介绍了一种利用储层计算预测周期性时间序列的新方法。该模型旨在提供精确的节奏预测,这对于任务如生成音乐节奏而言是至关重要的。通过利用储层计算,我们提出的方法最终旨在预测人类对节奏的感知。我们的网络准确预测了人类频率感知范围内的节奏信号。该模型架构包括主要和中间神经元,负责捕捉和传输节奏信息。两个参数矩阵,标记为c和k,调节了储层的整体动态。我们提出了一个损失函数来在训练后调整c,并引入了一个动态选择(DS)机制,调整k以侧重于具有杰出贡献的区域。对多样化测试集的实验结果展示了准确的预测,通过实时调整储层通过c和k进一步改进。比较评估突显了与传统模型相比其优越的性能。

更新时间: 2024-05-16 13:55:53

领域: cs.NE,cs.AI,cs.LG,eess.AS

下载: http://arxiv.org/abs/2405.10102v1

The Effect of Quantization in Federated Learning: A Rényi Differential Privacy Perspective

Federated Learning (FL) is an emerging paradigm that holds great promise for privacy-preserving machine learning using distributed data. To enhance privacy, FL can be combined with Differential Privacy (DP), which involves adding Gaussian noise to the model weights. However, FL faces a significant challenge in terms of large communication overhead when transmitting these model weights. To address this issue, quantization is commonly employed. Nevertheless, the presence of quantized Gaussian noise introduces complexities in understanding privacy protection. This research paper investigates the impact of quantization on privacy in FL systems. We examine the privacy guarantees of quantized Gaussian mechanisms using R\'enyi Differential Privacy (RDP). By deriving the privacy budget of quantized Gaussian mechanisms, we demonstrate that lower quantization bit levels provide improved privacy protection. To validate our theoretical findings, we employ Membership Inference Attacks (MIA), which gauge the accuracy of privacy leakage. The numerical results align with our theoretical analysis, confirming that quantization can indeed enhance privacy protection. This study not only enhances our understanding of the correlation between privacy and communication in FL but also underscores the advantages of quantization in preserving privacy.
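
A minimal sketch of a quantized Gaussian mechanism as commonly defined: clip the update, add Gaussian noise, then uniformly quantize to b bits. The clipping bound, noise scale, quantization range, and bit widths are illustrative assumptions; the paper's RDP accounting is not reproduced here.

import numpy as np

rng = np.random.default_rng(42)

def quantized_gaussian(update, clip=1.0, sigma=0.8, bits=4):
    # 1) Clip the update to bound per-client sensitivity.
    norm = np.linalg.norm(update)
    update = update * min(1.0, clip / (norm + 1e-12))
    # 2) Add Gaussian noise for differential privacy.
    noisy = update + rng.normal(0.0, sigma * clip, size=update.shape)
    # 3) Uniform quantization to 2**bits levels on a fixed range.
    lo, hi = -3 * clip, 3 * clip
    levels = 2 ** bits - 1
    q = np.round((np.clip(noisy, lo, hi) - lo) / (hi - lo) * levels)
    return lo + q * (hi - lo) / levels           # dequantized transmission

w = rng.normal(size=1000)                        # a fake model update
for b in (2, 4, 8):
    out = quantized_gaussian(w, bits=b)
    print(f"{b}-bit: distortion = {np.mean((out - w) ** 2):.4f}")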

Updated: 2024-05-16 13:50:46

标题: 《联邦学习中量化的影响:从Rényi差分隐私的角度看》

摘要: 联邦学习(FL)是一种新兴范式,具有利用分布式数据进行隐私保护机器学习的巨大潜力。为了增强隐私保护,FL可以与差分隐私(DP)结合,其中涉及向模型权重添加高斯噪声。然而,FL在传输这些模型权重时面临着大量的通信开销方面的重大挑战。为了解决这个问题,通常采用量化方法。然而,量化高斯噪声的存在增加了理解隐私保护的复杂性。本研究论文调查了量化对FL系统隐私的影响。我们通过使用Rényi差分隐私(RDP)来检验量化高斯机制的隐私保证。通过推导量化高斯机制的隐私预算,我们证明较低的量化比特级别提供了更好的隐私保护。为了验证我们的理论发现,我们使用成员推断攻击(MIA),评估隐私泄漏的准确性。数值结果与我们的理论分析一致,证实了量化确实可以增强隐私保护。这项研究不仅增进了我们对FL中隐私与通信之间相关性的理解,还强调了量化在保护隐私方面的优势。

更新时间: 2024-05-16 13:50:46

领域: cs.LG,cs.CR,cs.DC

下载: http://arxiv.org/abs/2405.10096v1

LaT-PFN: A Joint Embedding Predictive Architecture for In-context Time-series Forecasting

We introduce LatentTimePFN (LaT-PFN), a foundational Time Series model with a strong embedding space that enables zero-shot forecasting. To achieve this, we perform in-context learning in latent space utilizing a novel integration of the Prior-data Fitted Networks (PFN) and Joint Embedding Predictive Architecture (JEPA) frameworks. We leverage the JEPA framework to create a prediction-optimized latent representation of the underlying stochastic process that generates time series and combines it with contextual learning, using a PFN. Furthermore, we improve on preceding works by utilizing related time series as a context and introducing an abstract time axis. This drastically reduces training time and increases the versatility of the model by allowing any time granularity and forecast horizon. We show that this results in superior zero-shot predictions compared to established baselines. We also demonstrate our latent space produces informative embeddings of both individual time steps and fixed-length summaries of entire series. Finally, we observe the emergence of multi-step patch embeddings without explicit training, suggesting the model actively learns discrete tokens that encode local structures in the data, analogous to vision transformers.

Updated: 2024-05-16 13:44:56

标题: LaT-PFN:一种用于上下文时间序列预测的联合嵌入预测架构

摘要: 我们介绍了LatentTimePFN(LaT-PFN),这是一个具有强大嵌入空间的基础时间序列模型,可以实现零样本预测。为了实现这一目标,我们在潜在空间中进行上下文学习,利用了Prior-data Fitted Networks(PFN)和Joint Embedding Predictive Architecture(JEPA)框架的新颖集成。我们利用JEPA框架创建了一个预测优化的潜在表示,用于生成时间序列的基础随机过程,并将其与PFN进行上下文学习相结合。此外,我们通过利用相关时间序列作为上下文,并引入一个抽象时间轴,改进了之前的工作。这大大减少了训练时间,并通过允许任何时间粒度和预测时间范围,增加了模型的多功能性。我们展示了与已建立的基线相比,这导致了优越的零样本预测结果。我们还展示了我们的潜在空间产生了有关个别时间步骤和整个系列的固定长度摘要的信息性嵌入。最后,我们观察到多步补丁嵌入的出现,而无需显式训练,这表明模型积极学习编码数据中的局部结构的离散标记,类似于视觉变换器。

更新时间: 2024-05-16 13:44:56

领域: cs.LG,cs.AI,stat.ML,62, 68,I.2.6

下载: http://arxiv.org/abs/2405.10093v1

Revisiting Deep Audio-Text Retrieval Through the Lens of Transportation

The Learning-to-match (LTM) framework proves to be an effective inverse optimal transport approach for learning the underlying ground metric between two sources of data, facilitating subsequent matching. However, the conventional LTM framework faces scalability challenges, necessitating the use of the entire dataset each time the parameters of the ground metric are updated. In adapting LTM to the deep learning context, we introduce the mini-batch Learning-to-match (m-LTM) framework for audio-text retrieval problems. This framework leverages mini-batch subsampling and Mahalanobis-enhanced family of ground metrics. Moreover, to cope with misaligned training data in practice, we propose a variant using partial optimal transport to mitigate the harm of misaligned data pairs in training data. We conduct extensive experiments on audio-text matching problems using three datasets: AudioCaps, Clotho, and ESC-50. Results demonstrate that our proposed method is capable of learning rich and expressive joint embedding space, which achieves SOTA performance. Beyond this, the proposed m-LTM framework is able to close the modality gap across audio and text embedding, which surpasses both triplet and contrastive loss in the zero-shot sound event detection task on the ESC-50 dataset. Notably, our strategy of employing partial optimal transport with m-LTM demonstrates greater noise tolerance than contrastive loss, especially under varying noise ratios in training data on the AudioCaps dataset. Our code is available at https://github.com/v-manhlt3/m-LTM-Audio-Text-Retrieval
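
One ingredient, sketched from scratch under assumptions: entropic optimal transport (Sinkhorn) on a mini-batch with a Mahalanobis-style ground cost. The embeddings and the metric M below are random stand-ins; in the actual method both are learned end to end.

import numpy as np

rng = np.random.default_rng(3)
B, d = 8, 16                                     # mini-batch size, embedding dim

audio = rng.normal(size=(B, d))                  # stand-in audio embeddings
text = rng.normal(size=(B, d))                   # stand-in text embeddings
L = 0.1 * rng.normal(size=(d, d))
M = np.eye(d) + L @ L.T                          # PSD Mahalanobis metric (fixed here)

diff = audio[:, None, :] - text[None, :, :]      # (B, B, d) pairwise differences
C = np.einsum("ijd,de,ije->ij", diff, M, diff)   # Mahalanobis ground cost
C = C / C.max()                                  # normalize scale for stability

def sinkhorn(C, eps=0.05, iters=300):
    """Entropic OT plan between uniform mini-batch marginals."""
    K = np.exp(-C / eps)
    a = np.full(C.shape[0], 1.0 / C.shape[0])
    b = np.full(C.shape[1], 1.0 / C.shape[1])
    v = np.ones(C.shape[1])
    for _ in range(iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]           # transport plan

P = sinkhorn(C)
print("row sums (should be 1/B):", P.sum(axis=1).round(3))
print("soft matching (argmax per row):", P.argmax(axis=1))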

Updated: 2024-05-16 13:28:10

标题: 通过传输理论的视角重新审视深度音频-文本检索

摘要: 学习匹配(LTM)框架证明是一种有效的逆优化传输方法,用于学习两个数据源之间的基础地面度量,促进后续匹配。然而,传统的LTM框架面临着可扩展性挑战,需要每次更新地面度量的参数时使用整个数据集。在将LTM调整到深度学习环境中时,我们引入了用于音频文本检索问题的小批量学习匹配(m-LTM)框架。该框架利用小批量子采样和马哈拉诺比斯增强的地面度量系列。此外,为了应对实践中的不对齐训练数据,我们提出了一种使用部分最优传输来减轻训练数据中不对齐数据对的危害的变体。我们在三个数据集上进行了大量实验,包括AudioCaps,Clotho和ESC-50。结果表明,我们提出的方法能够学习丰富和表达丰富的联合嵌入空间,实现了SOTA性能。此外,提出的m-LTM框架能够消除音频和文本嵌入之间的模态差距,在ESC-50数据集上的零样本声音事件检测任务中超越了三元组和对比损失。值得注意的是,我们采用m-LTM和部分最优传输策略表现出比对比损失更高的噪声容忍度,特别是在AudioCaps数据集的训练数据中噪声比率变化的情况下。我们的代码可在https://github.com/v-manhlt3/m-LTM-Audio-Text-Retrieval 上找到。

更新时间: 2024-05-16 13:28:10

领域: eess.AS,cs.AI,cs.SD

下载: http://arxiv.org/abs/2405.10084v1

Protecting Your LLMs with Information Bottleneck

The advent of large language models (LLMs) has revolutionized the field of natural language processing, yet they might be attacked to produce harmful content. Despite efforts to ethically align LLMs, these are often fragile and can be circumvented by jailbreaking attacks through optimized or manual adversarial prompts. To address this, we introduce the Information Bottleneck Protector (IBProtector), a defense mechanism grounded in the information bottleneck principle, and we modify the objective to avoid trivial solutions. The IBProtector selectively compresses and perturbs prompts, facilitated by a lightweight and trainable extractor, preserving only essential information for the target LLMs to respond with the expected answer. Moreover, we further consider the situation where gradients are not visible, so that the defense remains compatible with any LLM. Our empirical evaluations show that IBProtector outperforms current defense methods in mitigating jailbreak attempts, without overly affecting response quality or inference speed. Its effectiveness and adaptability across various attack methods and target LLMs underscore the potential of IBProtector as a novel, transferable defense that bolsters the security of LLMs without requiring modifications to the underlying models.

Updated: 2024-05-16 13:26:57

标题: 用信息瓶颈保护您的LLMs

摘要: 大型语言模型(LLMs)的出现彻底改变了自然语言处理领域,但它们可能会受到攻击而产生有害内容。尽管已经努力在伦理上对齐LLMs,但这些模型往往很脆弱,可以通过优化或手动对抗性提示来规避。为了解决这个问题,我们引入了信息瓶颈保护器(IBProtector),这是一种基于信息瓶颈原则的防御机制,并且我们修改了目标以避免平凡解决方案。IBProtector通过轻量级和可训练的提取器有选择性地压缩和扰动提示,仅保留目标LLMs回应预期答案所必要的信息。此外,我们进一步考虑了在梯度不可见时要与任何LLM兼容的情况。我们的实证评估表明,IBProtector在减轻越狱尝试方面优于当前的防御方法,而不会过度影响响应质量或推理速度。其在各种攻击方法和目标LLMs上的有效性和适应性突显了IBProtector作为一种新颖、可转移的防御手段,增强了LLMs的安全性,而不需要对底层模型进行修改。

更新时间: 2024-05-16 13:26:57

领域: cs.CL,cs.AI,cs.CR

下载: http://arxiv.org/abs/2404.13968v2

An Integrated Framework for Multi-Granular Explanation of Video Summarization

In this paper, we propose an integrated framework for multi-granular explanation of video summarization. This framework integrates methods for producing explanations both at the fragment level (indicating which video fragments influenced the most the decisions of the summarizer) and the more fine-grained visual object level (highlighting which visual objects were the most influential for the summarizer). To build this framework, we extend our previous work in this field, by investigating the use of a model-agnostic, perturbation-based approach for fragment-level explanation of the video summarization results, and introducing a new method that combines the results of video panoptic segmentation with an adaptation of a perturbation-based explanation approach to produce object-level explanations. The performance of the developed framework is evaluated using a state-of-the-art summarization method and two datasets for benchmarking video summarization. The findings of the conducted quantitative and qualitative evaluations demonstrate the ability of our framework to spot the most and least influential fragments and visual objects of the video for the summarizer, and to provide a comprehensive set of visual-based explanations about the output of the summarization process.
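
A generic perturbation-based, fragment-level attribution sketch: summarizer_score is a hypothetical stand-in for comparing the summarizer's output with and without fragments, and the random-occlusion credit assignment is one simple variant, not necessarily the paper's exact procedure.

import numpy as np

rng = np.random.default_rng(5)
n_fragments = 12
features = rng.normal(size=(n_fragments, 32))     # fake fragment features

def summarizer_score(mask):
    """Hypothetical scalar summary quality when only `mask` fragments remain."""
    kept = features[mask]
    return float(kept.mean()) if kept.size else 0.0

base = summarizer_score(np.ones(n_fragments, dtype=bool))
influence = np.zeros(n_fragments)
n_trials = 500
for _ in range(n_trials):                         # random occlusion trials
    mask = rng.random(n_fragments) > 0.5          # True = fragment kept
    delta = base - summarizer_score(mask)         # quality drop under occlusion
    influence[~mask] += delta / n_trials          # credit occluded fragments

order = np.argsort(-influence)
print("most influential fragments: ", order[:3])
print("least influential fragments:", order[-3:])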

Updated: 2024-05-16 13:25:36

标题: 一个用于视频摘要的多粒度解释的集成框架

摘要: 在本文中,我们提出了一个用于多粒度视频摘要解释的集成框架。该框架整合了在片段级别(指示哪些视频片段对摘要生成器的决策影响最大)和更精细的视觉对象级别(突出显示哪些视觉对象对摘要生成器最具影响力)产生解释的方法。为了构建这个框架,我们扩展了我们在这一领域的先前工作,通过研究在视频摘要结果的片段级别解释中使用一种与模型无关的扰动方法,并引入了一种结合视频全景分割结果和一种扰动方法改编产生对象级别解释的新方法。使用一种最先进的摘要方法和两个用于基准测试视频摘要的数据集来评估开发的框架的性能。进行的定量和定性评估的结果表明,我们的框架能够识别视频中对摘要生成器最具影响力和最不具影响力的片段和视觉对象,并提供有关摘要过程输出的一套全面的基于视觉的解释。

更新时间: 2024-05-16 13:25:36

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2405.10082v1

Dihedral Quantum Codes

We establish dihedral quantum codes of short block length, a class of CSS codes obtained by the lifted product construction. We present the code construction and give a formula for the code dimension, depending on the two classical codes that the CSS code is based on. We also give a lower bound on the code distance and construct an example of a short dihedral quantum code.

Updated: 2024-05-16 13:22:11

标题: 二面角量子码

摘要: 我们建立了短块长度的二面角量子码,这是一类通过提升乘积构造得到的CSS码。我们提出了编码构造,并给出了编码维度的公式,取决于CSS码所基于的两个经典码。我们还给出了编码距离的下界,并构造了短二面角量子码的一个示例。

更新时间: 2024-05-16 13:22:11

领域: quant-ph,cs.CR,cs.IT,math.IT

下载: http://arxiv.org/abs/2310.15092v2

GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators

Recent advances in large language models (LLMs) have advanced the development of multilingual speech and machine translation through reduced representation errors and the incorporation of external knowledge. However, both translation tasks typically utilize beam search decoding and top-1 hypothesis selection for inference. These techniques struggle to fully exploit the rich information in the diverse N-best hypotheses, making them less optimal for translation tasks that require a single, high-quality output sequence. In this paper, we propose a new generative paradigm for translation tasks, namely "GenTranslate", which builds upon LLMs to generate better results from the diverse translation versions in the N-best list. Leveraging the rich linguistic knowledge and strong reasoning abilities of LLMs, our new paradigm can integrate the rich information in N-best candidates to generate a higher-quality translation result. Furthermore, to support LLM finetuning, we build and release a HypoTranslate dataset that contains over 592K hypotheses-translation pairs in 11 languages. Experiments on various speech and machine translation benchmarks (e.g., FLEURS, CoVoST-2, WMT) demonstrate that our GenTranslate significantly outperforms the state-of-the-art model.

Updated: 2024-05-16 13:17:05

标题: GenTranslate:大型语言模型是生成式多语言语音和机器翻译器

摘要: 最近的大型语言模型(LLMs)的进展推动了多语言语音和机器翻译的发展,通过减少表示错误并整合外部知识。然而,翻译任务通常利用波束搜索解码和顶级假设选择进行推断。这些技术往往无法充分利用多样的N个最佳假设中的丰富信息,使它们在需要单一高质量输出序列的翻译任务中不够理想。在本文中,我们提出了一种新的翻译任务生成范式,即"GenTranslate",它建立在LLMs的基础上,从N个最佳列表中生成更好的结果。利用LLMs的丰富语言知识和强大推理能力,我们的新范式可以整合N个最佳候选者中的丰富信息,生成更高质量的翻译结果。此外,为了支持LLM微调,我们构建并发布了一个包含11种语言中超过592K假设-翻译对的HypoTranslate数据集。在各种语音和机器翻译基准测试中的实验证明,我们的GenTranslate明显优于当前最先进的模型。

更新时间: 2024-05-16 13:17:05

领域: cs.CL,cs.AI,cs.LG,cs.SD,eess.AS

下载: http://arxiv.org/abs/2402.06894v2

HecVL: Hierarchical Video-Language Pretraining for Zero-shot Surgical Phase Recognition

Natural language could play an important role in developing generalist surgical models by providing a broad source of supervision from raw texts. This flexible form of supervision can enable the model's transferability across datasets and tasks as natural language can be used to reference learned visual concepts or describe new ones. In this work, we present HecVL, a novel hierarchical video-language pretraining approach for building a generalist surgical model. Specifically, we construct a hierarchical video-text paired dataset by pairing the surgical lecture video with three hierarchical levels of texts: at clip-level, atomic actions using transcribed audio texts; at phase-level, conceptual text summaries; and at video-level, overall abstract text of the surgical procedure. Then, we propose a novel fine-to-coarse contrastive learning framework that learns separate embedding spaces for the three video-text hierarchies using a single model. By disentangling embedding spaces of different hierarchical levels, the learned multi-modal representations encode short-term and long-term surgical concepts in the same model. Thanks to the injected textual semantics, we demonstrate that the HecVL approach can enable zero-shot surgical phase recognition without any human annotation. Furthermore, we show that the same HecVL model for surgical phase recognition can be transferred across different surgical procedures and medical centers.

Updated: 2024-05-16 13:14:43

标题: HecVL:用于零样本手术阶段识别的分层视频-语言预训练

摘要: 自然语言可以通过提供广泛的监督来源,从原始文本中发挥重要作用,从而在开发全科手术模型中发挥重要作用。这种灵活的监督形式可以使模型在数据集和任务之间具有可传递性,因为自然语言可以用来引用学到的视觉概念或描述新的概念。在这项工作中,我们提出了HecVL,这是一种用于构建全科手术模型的新型分层视频语言预训练方法。具体地,我们通过将手术讲座视频与三个层次的文本配对来构建分层视频文本配对数据集:在剪辑级别,使用转录的音频文本进行原子动作;在阶段级别,进行概念性文本总结;在视频级别,进行手术程序的整体抽象文本。然后,我们提出了一种新颖的由精细到粗糙的对比学习框架,该框架使用单一模型为三个视频文本层次学习单独的嵌入空间。通过解耦不同层次的嵌入空间,学习到的多模式表示将短期和长期手术概念编码在同一个模型中。由于注入了文本语义,我们展示了HecVL方法可以实现零标注的手术阶段识别。此外,我们展示了相同的HecVL模型可以在不同的手术程序和医疗中心之间进行转移,用于手术阶段识别。

更新时间: 2024-05-16 13:14:43

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2405.10075v1

A blind spot for large language models: Supradiegetic linguistic information

Large Language Models (LLMs) like ChatGPT reflect profound changes in the field of Artificial Intelligence, achieving a linguistic fluency that is impressively, even shockingly, human-like. The extent of their current and potential capabilities is an active area of investigation by no means limited to scientific researchers. It is common for people to frame the training data for LLMs as "text" or even "language". We examine the details of this framing using ideas from several areas, including linguistics, embodied cognition, cognitive science, mathematics, and history. We propose that considering what it is like to be an LLM like ChatGPT, as Nagel might have put it, can help us gain insight into its capabilities in general, and in particular, that its exposure to linguistic training data can be productively reframed as exposure to the diegetic information encoded in language, and its deficits can be reframed as ignorance of extradiegetic information, including supradiegetic linguistic information. Supradiegetic linguistic information consists of those arbitrary aspects of the physical form of language that are not derivable from the one-dimensional relations of context -- frequency, adjacency, proximity, co-occurrence -- that LLMs like ChatGPT have access to. Roughly speaking, the diegetic portion of a word can be thought of as its function, its meaning, as the information in a theoretical vector in a word embedding, while the supradiegetic portion of the word can be thought of as its form, like the shapes of its letters or the sounds of its syllables. We use these concepts to investigate why LLMs like ChatGPT have trouble handling palindromes, the visual characteristics of symbols, translating Sumerian cuneiform, and continuing integer sequences.

Updated: 2024-05-16 13:06:42

标题: 大型语言模型的盲点:超叙事语言信息

摘要: 大型语言模型(LLMs)如ChatGPT反映了人工智能领域的深刻变化,实现了令人印象深刻甚至令人震惊的类人语言流畅度。它们当前和潜在能力的广度是一个积极研究的领域,绝不仅限于科学研究人员。人们通常将LLMs的训练数据框架为“文本”甚至“语言”。我们使用来自语言学、具身认知、认知科学、数学和历史等多个领域的思想来审视这种框架的细节。我们提出,考虑到像ChatGPT这样的LLM是什么感觉,正如纳格尔可能会说的那样,可以帮助我们更好地了解其一般能力,特别是其接触到的语言训练数据可以被有益地重新构想为对语言中编码的叙事信息的接触,而其缺陷可以被重新构想为对超叙事信息的无知,包括超叙事语言信息。超叙事语言信息包括语言物理形式的那些任意方面,这些方面无法从LLMs像ChatGPT这样可以访问的上下文的一维关系(频率、邻近性、接近性、共现)中推导出。粗略地说,一个词的叙事部分可以被看作是其功能,其含义,就像在词嵌入中的理论向量中的信息一样,而该词的超叙事部分可以被看作是其形式,比如其字母的形状或音节的声音。我们使用这些概念来调查为什么像ChatGPT这样的LLMs难以处理回文,符号的视觉特征,翻译苏美尔楔形文字和继续整数序列。

更新时间: 2024-05-16 13:06:42

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2306.06794v3

A finite-sample generalization bound for stable LPV systems

One of the main theoretical challenges in learning dynamical systems from data is providing upper bounds on the generalization error, that is, the difference between the expected prediction error and the empirical prediction error measured on some finite sample. In machine learning, a popular class of such bounds are the so-called Probably Approximately Correct (PAC) bounds. In this paper, we derive a PAC bound for stable continuous-time linear parameter-varying (LPV) systems. Our bound depends on the H2 norm of the chosen class of the LPV systems, but does not depend on the time interval for which the signals are considered.

Updated: 2024-05-16 12:42:36

标题: 稳定LPV系统的有限样本一般化界限

摘要: 学习动态系统的主要理论挑战之一是提供广义化误差的上限,即预期预测误差与在某个有限样本上测量的经验预测误差之间的差异。在机器学习中,一类流行的上限称为可能近似正确(PAC)上限。本文推导了稳定连续时间线性参数变化(LPV)系统的PAC上限。我们的上限取决于所选LPV系统类的H2范数,但不取决于信号考虑的时间间隔。

更新时间: 2024-05-16 12:42:36

领域: cs.LG,cs.SY,eess.SY,68,I.2.0

下载: http://arxiv.org/abs/2405.10054v1

How should I compute my candidates? A taxonomy and classification of diagnosis computation algorithms

This work proposes a taxonomy for diagnosis computation methods which allows their standardized assessment, classification and comparison. The aim is to (i) give researchers and practitioners an impression of the diverse landscape of available diagnostic techniques, (ii) allow them to easily retrieve the main features as well as pros and cons of the approaches, (iii) enable an easy and clear comparison of the techniques based on their characteristics with respect to a list of important and well-defined properties, and (iv) facilitate the selection of the "right" algorithm to adopt for a particular problem case, e.g., in practical diagnostic settings, for comparison in experimental evaluations, or for reuse, modification, extension, or improvement in the course of research.

Updated: 2024-05-16 12:41:13

标题: 我应该如何计算我的候选者?一种诊断计算算法的分类和分类法

摘要: 这项工作提出了一个诊断计算方法的分类学,可以对其进行标准化评估、分类和比较。其目的是(i)让研究人员和实践者了解可用诊断技术的多样性景观,(ii)使他们能够轻松地获取各种方法的主要特点以及优缺点,(iii)基于一系列重要且明确定义的属性,便于对技术进行简单明了的比较,(iv)促进选择“适合”的算法用于特定问题案例,例如在实际诊断环境中,用于实验评估中的比较,或者在研究过程中进行重用、修改、扩展或改进。

更新时间: 2024-05-16 12:41:13

领域: cs.AI,cs.LO

下载: http://arxiv.org/abs/2207.12583v2

MarkLLM: An Open-Source Toolkit for LLM Watermarking

LLM watermarking, which embeds imperceptible yet algorithmically detectable signals in model outputs to identify LLM-generated text, has become crucial in mitigating the potential misuse of large language models. However, the abundance of LLM watermarking algorithms, their intricate mechanisms, and the complex evaluation procedures and perspectives pose challenges for researchers and the community to easily experiment with, understand, and assess the latest advancements. To address these issues, we introduce MarkLLM, an open-source toolkit for LLM watermarking. MarkLLM offers a unified and extensible framework for implementing LLM watermarking algorithms, while providing user-friendly interfaces to ensure ease of access. Furthermore, it enhances understanding by supporting automatic visualization of the underlying mechanisms of these algorithms. For evaluation, MarkLLM offers a comprehensive suite of 12 tools spanning three perspectives, along with two types of automated evaluation pipelines. Through MarkLLM, we aim to support researchers while improving the comprehension and involvement of the general public in LLM watermarking technology, fostering consensus and driving further advancements in research and application. Our code is available at https://github.com/THU-BPM/MarkLLM.
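
For flavor, a self-contained sketch of green-list watermark detection in the style of schemes MarkLLM implements; this is a generic illustration, not the toolkit's API. Each token's green list is pseudorandomly seeded by the previous token, and detection z-tests the observed green fraction.

import hashlib
import math

GAMMA = 0.5                                       # fraction of vocab marked green

def is_green(prev_token, token):
    """Pseudorandomly assign `token` to a green list seeded by its context."""
    h = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
    return h[0] < 256 * GAMMA

def detect(tokens):
    """z-score of the observed green-token fraction against chance."""
    hits = sum(is_green(p, t) for p, t in zip(tokens, tokens[1:]))
    n = len(tokens) - 1
    return (hits - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))

text = "the quick brown fox jumps over the lazy dog".split()
print(f"z = {detect(text):.2f}  (large positive z would suggest a watermark)")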

Updated: 2024-05-16 12:40:01

标题: MarkLLM:用于LLM水印技术的开源工具包

摘要: LLM水印技术将不可察觉但可通过算法检测的信号嵌入到模型输出中,用于识别LLM生成的文本,这在减轻大型语言模型潜在滥用方面变得至关重要。然而,LLM水印算法的丰富性、复杂的机制以及复杂的评估程序和观点给研究人员和社区带来了挑战,使得他们难以轻松地实验、理解和评估最新的进展。为了解决这些问题,我们推出了MarkLLM,一个开源的LLM水印工具包。MarkLLM提供了一个统一且可扩展的框架,用于实现LLM水印算法,同时提供用户友好的界面以确保易于访问。此外,它通过支持这些算法的基本机制的自动可视化来增强理解。在评估方面,MarkLLM提供了一个包含三个角度的综合套件,以及两种类型的自动评估管道。通过MarkLLM,我们旨在支持研究人员,同时提高普通公众对LLM水印技术的理解和参与,促进共识并推动研究和应用的进一步发展。我们的代码可在https://github.com/THU-BPM/MarkLLM上找到。

更新时间: 2024-05-16 12:40:01

领域: cs.CR,cs.CL,68T50,I.2.7

下载: http://arxiv.org/abs/2405.10051v1

Generalization or Memorization: Data Contamination and Trustworthy Evaluation for Large Language Models

Recent statements about the impressive capabilities of large language models (LLMs) are usually supported by evaluating on open-access benchmarks. Considering the vast size and wide-ranging sources of LLMs' training data, it could explicitly or implicitly include test data, leading to LLMs being more susceptible to data contamination. However, due to the opacity of training data, the black-box access of models, and the rapid growth of synthetic training data, detecting and mitigating data contamination for LLMs faces significant challenges. In this paper, we propose CDD, which stands for Contamination Detection via output Distribution for LLMs. CDD necessitates only the sampled texts to detect data contamination, by identifying the peakedness of LLM's output distribution. To mitigate the impact of data contamination in evaluation, we also present TED: Trustworthy Evaluation via output Distribution, based on the correction of LLM's output distribution. To facilitate this study, we introduce two benchmarks, i.e., DetCon and ComiEval, for data contamination detection and contamination mitigation evaluation tasks. Extensive experimental results show that CDD achieves the average relative improvements of 21.8%-30.2% over other contamination detection approaches in terms of Accuracy, F1 Score, and AUC metrics, and can effectively detect contamination caused by the variants of test data. TED significantly mitigates performance improvements up to 66.9% attributed to data contamination across 24 settings and 21 contamination degrees. In real-world applications, we reveal that ChatGPT exhibits a high potential to suffer from data contamination on HumanEval benchmark.
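
A minimal sketch of the peakedness intuition behind CDD: if a model has memorized a benchmark item, repeated sampling concentrates on one output. The sampled texts are stand-ins, and the two statistics shown (max-mass and entropy) are illustrative, not the exact CDD statistic.

from collections import Counter
import math

def peakedness(samples):
    """Probability mass of the most frequent sampled output."""
    counts = Counter(samples)
    return counts.most_common(1)[0][1] / len(samples)

def entropy(samples):
    """Shannon entropy (bits) of the empirical output distribution."""
    n = len(samples)
    return -sum(c / n * math.log2(c / n) for c in Counter(samples).values())

clean = ["answer A", "answer B", "answer A", "answer C", "answer D", "answer B"]
contaminated = ["answer A"] * 5 + ["answer B"]

for name, s in [("clean", clean), ("contaminated", contaminated)]:
    print(f"{name:12s} peakedness={peakedness(s):.2f}  entropy={entropy(s):.2f} bits")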

Updated: 2024-05-16 12:34:24

标题: 泛化还是记忆:数据污染与大型语言模型的可信评估

摘要: 最近关于大型语言模型(LLMs)印象深刻能力的声明通常是通过在公开访问的基准上进行评估来支持的。考虑到LLMs训练数据的巨大规模和广泛来源,它可能明确或隐含地包含测试数据,导致LLMs更容易受到数据污染的影响。然而,由于训练数据的不透明性、模型的黑盒访问以及合成训练数据的快速增长,检测和减轻LLMs的数据污染面临重大挑战。在本文中,我们提出了CDD,即通过LLMs输出分布进行数据污染检测。CDD仅需要采样文本来检测数据污染,通过识别LLMs输出分布的尖峰度。为了减轻评估中数据污染的影响,我们还提出了TED:通过输出分布进行可信评估,基于修正LLMs输出分布。为了促进这项研究,我们引入了两个基准,即DetCon和ComiEval,用于数据污染检测和污染缓解评估任务。广泛的实验结果显示,CDD在准确度、F1分数和AUC指标方面相对其他污染检测方法平均提高了21.8%-30.2%,并且可以有效地检测由测试数据变体引起的污染。TED在24个设置和21个污染程度中显着减轻了由数据污染导致的高达66.9%的性能改进。在实际应用中,我们发现ChatGPT在HumanEval基准上存在受到数据污染的高潜力。

更新时间: 2024-05-16 12:34:24

领域: cs.CL,cs.AI,cs.CR,cs.LG,cs.SE

下载: http://arxiv.org/abs/2402.15938v2

Mesh Neural Cellular Automata

Texture modeling and synthesis are essential for enhancing the realism of virtual environments. Methods that directly synthesize textures in 3D offer distinct advantages to the UV-mapping-based methods as they can create seamless textures and align more closely with the ways textures form in nature. We propose Mesh Neural Cellular Automata (MeshNCA), a method that directly synthesizes dynamic textures on 3D meshes without requiring any UV maps. MeshNCA is a generalized type of cellular automata that can operate on a set of cells arranged on non-grid structures such as the vertices of a 3D mesh. MeshNCA accommodates multi-modal supervision and can be trained using different targets such as images, text prompts, and motion vector fields. Only trained on an Icosphere mesh, MeshNCA shows remarkable test-time generalization and can synthesize textures on unseen meshes in real time. We conduct qualitative and quantitative comparisons to demonstrate that MeshNCA outperforms other 3D texture synthesis methods in terms of generalization and producing high-quality textures. Moreover, we introduce a way of grafting trained MeshNCA instances, enabling interpolation between textures. MeshNCA allows several user interactions including texture density/orientation controls, grafting/regenerate brushes, and motion speed/direction controls. Finally, we implement the forward pass of our MeshNCA model using the WebGL shading language and showcase our trained models in an online interactive demo, which is accessible on personal computers and smartphones and is available at https://meshnca.github.io.

Updated: 2024-05-16 12:32:28

标题: 网格神经细胞自动机

摘要: 纹理建模和合成对于增强虚拟环境的逼真度至关重要。直接在3D中合成纹理的方法与基于UV映射的方法相比具有明显优势,因为它们可以创建无缝纹理,并且更紧密地与自然中的纹理形成方式相吻合。我们提出了Mesh Neural Cellular Automata(MeshNCA)方法,该方法可以在3D网格上直接合成动态纹理,而无需任何UV映射。MeshNCA是一种广义的细胞自动机类型,可以在非网格结构上排列的一组单元上操作,例如3D网格的顶点。MeshNCA可以容纳多模态监督,并且可以使用不同的目标进行训练,例如图像、文本提示和运动矢量场。仅在Icosphere网格上训练,在测试时MeshNCA表现出令人惊讶的泛化能力,并且可以实时在未见过的网格上合成纹理。我们进行定性和定量比较,以证明MeshNCA在泛化和生成高质量纹理方面优于其他3D纹理合成方法。此外,我们介绍了一种将经过训练的MeshNCA实例嫁接的方法,实现纹理之间的插值。MeshNCA允许多种用户交互,包括纹理密度/方向控制、嫁接/再生笔刷以及运动速度/方向控制。最后,我们使用WebGL着色语言实现了MeshNCA模型的前向传递,并在在线交互式演示中展示了我们训练过的模型,可在个人计算机和智能手机上访问,地址为https://meshnca.github.io。

更新时间: 2024-05-16 12:32:28

领域: cs.CV,cs.AI,cs.GR

下载: http://arxiv.org/abs/2311.02820v2

Global Benchmark Database

This paper presents Global Benchmark Database (GBD), a comprehensive suite of tools for provisioning and sustainably maintaining benchmark instances and their metadata. The availability of benchmark metadata is essential for many tasks in empirical research, e.g., for the data-driven compilation of benchmarks, the domain-specific analysis of runtime experiments, or the instance-specific selection of solvers. In this paper, we introduce the data model of GBD as well as its interfaces and provide examples of how to interact with them. We also demonstrate the integration of custom data sources and explain how to extend GBD with additional problem domains, instance formats and feature extractors.

Updated: 2024-05-16 12:29:12

标题: 全球基准数据库

摘要: 本文介绍了全球基准数据库(GBD),这是一个全面的工具套件,用于提供和可持续维护基准实例及其元数据。基准元数据的可用性对于实证研究中的许多任务至关重要,例如基于数据的基准编制、运行时实验的领域特定分析,或求解器的特定实例选择。在本文中,我们介绍了GBD的数据模型及其接口,并提供了与其交互的示例。我们还展示了如何集成自定义数据源,并解释如何通过增加问题领域、实例格式和特征提取器来扩展GBD。

更新时间: 2024-05-16 12:29:12

领域: cs.DB,cs.AI,cs.LO

下载: http://arxiv.org/abs/2405.10045v1

SynthesizRR: Generating Diverse Datasets with Retrieval Augmentation

Large language models (LLMs) are versatile and can address many tasks, but for computational efficiency, it is often desirable to distill their capabilities into smaller student models. One way to do this for classification tasks is via dataset synthesis, which can be accomplished by generating examples of each label from the LLM. Prior approaches to synthesis use few-shot prompting, which relies on the LLM's parametric knowledge to generate usable examples. However, this leads to issues of repetition, bias towards popular entities, and stylistic differences from human text. In this work, we propose Synthesize by Retrieval and Refinement (SynthesizRR), which uses retrieval augmentation to introduce variety into the dataset synthesis process: as retrieved passages vary, the LLM is "seeded" with different content to generate its examples. We empirically study the synthesis of six datasets, covering topic classification, sentiment analysis, tone detection, and humor, requiring complex synthesis strategies. We find SynthesizRR greatly improves lexical and semantic diversity, similarity to human-written text, and distillation performance, when compared to standard 32-shot prompting and six baseline approaches.
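
A schematic retrieve-then-generate loop in the spirit of the method; the corpus, the word-overlap retriever, and generate_example (a placeholder for an LLM call) are all assumptions used purely to show the control flow.

import random

random.seed(0)

corpus = [
    "The council approved a new cycling lane downtown.",
    "A local bakery won a national pastry award.",
    "Heavy rain caused flooding along the river trail.",
    "The library extended its weekend opening hours.",
]

def retrieve(query, k=2):
    """Toy retriever: rank passages by word overlap with the query."""
    q = set(query.lower().split())
    scored = sorted(corpus, key=lambda p: -len(q & set(p.lower().split())))
    return scored[:k]

def generate_example(label, passage):
    """Placeholder for an LLM call that rewrites `passage` as a `label` item."""
    return f"[{label}] inspired by: {passage}"

dataset = []
for label in ("local news", "weather"):
    for passage in retrieve(label):          # varied retrieved seeds -> variety
        dataset.append(generate_example(label, passage))

for row in dataset:
    print(row)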

Updated: 2024-05-16 12:22:41

标题: SynthesizRR: 利用检索增强生成多样化数据集

摘要: 大型语言模型(LLMs)功能强大,可以处理许多任务,但为了提高计算效率,通常希望将它们的能力提炼成更小的学生模型。对于分类任务,一种方法是通过数据集合成来实现,这可以通过从LLM生成每个标签的示例来实现。先前的综合方法使用少量提示,依赖于LLM的参数知识来生成可用示例。然而,这会导致重复、偏向流行实体和与人类文本的风格差异等问题。在这项工作中,我们提出了Synthesize by Retrieval and Refinement(SynthesizRR)合成方法,它利用检索增强来引入数据集合成过程中的多样性:由于检索到的段落不同,LLM被"种植"不同的内容来生成示例。我们在涵盖主题分类、情感分析、语调检测和幽默等需要复杂合成策略的六个数据集上进行了实证研究。与标准的32-shot提示和六种基准方法相比,我们发现SynthesizRR显著改善了词汇和语义多样性、与人类撰写文本的相似性以及提炼性能。

更新时间: 2024-05-16 12:22:41

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2405.10040v1

The Real Price of Bandit Information in Multiclass Classification

We revisit the classical problem of multiclass classification with bandit feedback (Kakade, Shalev-Shwartz and Tewari, 2008), where each input classifies to one of $K$ possible labels and feedback is restricted to whether the predicted label is correct or not. Our primary inquiry is with regard to the dependency on the number of labels $K$, and whether $T$-step regret bounds in this setting can be improved beyond the $\smash{\sqrt{KT}}$ dependence exhibited by existing algorithms. Our main contribution is in showing that the minimax regret of bandit multiclass is in fact more nuanced, and is of the form $\smash{\widetilde{\Theta}\left(\min \left\{|\mathcal{H}| + \sqrt{T}, \sqrt{KT \log |{\mathcal{H}|}} \right\} \right) }$, where $\mathcal{H}$ is the underlying (finite) hypothesis class. In particular, we present a new bandit classification algorithm that guarantees regret $\smash{\widetilde{O}(|\mathcal{H}|+\sqrt{T})}$, improving over classical algorithms for moderately-sized hypothesis classes, and give a matching lower bound establishing tightness of the upper bounds (up to log-factors) in all parameter regimes.

Updated: 2024-05-16 12:11:09

标题: 多类别分类中强盗信息的真实代价

摘要: 我们重新审视了具有bandit反馈的多类分类问题(Kakade, Shalev-Shwartz和Tewari,2008),在这里,每个输入都被分类为$K$个可能的标签之一,并且反馈仅限于预测的标签是否正确。我们主要关注标签数量$K$的依赖性,以及在这种情况下$T$步后悔界是否可以超越现有算法所展示的$\smash{\sqrt{KT}}$依赖性。我们的主要贡献在于展示bandit多类分类的极小极大后悔实际上更为微妙,形式为$\smash{\widetilde{\Theta}\left(\min \left\{|\mathcal{H}| + \sqrt{T}, \sqrt{KT \log |{\mathcal{H}|}} \right\} \right) }$,其中$\mathcal{H}$是基础(有限)假设类。特别地,我们提出了一种新的bandit分类算法,保证后悔为$\smash{\widetilde{O}(|\mathcal{H}|+\sqrt{T})}$,提高了对于中等规模假设类的经典算法,并给出一个匹配的下界,建立了在所有参数范围内上界的紧密性(直到对数因子)。

更新时间: 2024-05-16 12:11:09

领域: cs.LG,cs.AI,stat.ML

下载: http://arxiv.org/abs/2405.10027v1

AI-Cybersecurity Education Through Designing AI-based Cyberharassment Detection Lab

Cyberharassment is a critical, socially relevant cybersecurity problem because of the adverse effects it can have on targeted groups or individuals. While progress has been made in understanding cyberharassment, its detection, attacks on artificial intelligence (AI) based cyberharassment systems, and the social problems in cyberharassment detectors, little has been done in designing experiential learning materials that engage students with this emerging social cybersecurity problem in the era of AI. Experiential learning opportunities are usually provided through capstone projects and engineering design courses in STEM programs such as computer science. While capstone projects are an excellent example of experiential learning, given the interdisciplinary nature of this emerging social cybersecurity problem, it can be challenging to use them to engage non-computing students without prior knowledge of AI. This motivated us to develop a hands-on lab platform that provides experiential learning to non-computing students with little or no background in AI, and to discuss the lessons learned in developing it. In this lab, used by social science students at North Carolina A&T State University across two semesters (spring and fall) of 2022, students are given a detailed lab manual and complete a set of well-defined tasks. Through this process, students learn AI concepts and the application of AI to cyberharassment detection. Using pre- and post-surveys, we asked students to rate their knowledge and skills in AI and their understanding of the concepts learned. The results revealed that the students moderately understood the concepts of AI and cyberharassment.

Updated: 2024-05-16 12:10:58

标题: 通过设计基于人工智能的网络骚扰检测实验室进行人工智能网络安全教育

摘要: 网络骚扰是一个关键的、具有社会相关性的网络安全问题,因为它可能对被针对的群体或个人造成不利影响。虽然在理解网络骚扰方面已经取得了进展,但对基于人工智能(AI)的网络骚扰系统的检测、攻击以及网络骚扰检测器中的社会问题方面还很少有研究,而在人工智能时代设计吸引学生参与这一新兴社会网络安全领域的体验式学习教育材料更是少之又少。体验式学习机会通常通过诸如计算机科学等STEM项目中的终极项目和工程设计课程来提供。尽管终极项目是体验式学习的一个很好的例子,但由于这一新兴社会网络安全问题的跨学科性质,对于没有人工智能先验知识的非计算机专业学生来说,使用它们来参与可能会有挑战。因此,我们受到激励开发了一个实践实验平台,为没有或几乎没有人工智能背景知识的非计算机专业学生提供实践学习经验,并讨论了在开发这个实验室中所学到的经验。在2022年春季和秋季两个学期内由北卡罗来纳A&T州立大学的社会科学学生使用的这个实验室中,学生会收到详细的实验室手册,并完成一系列详细的任务。通过这个过程,学生学习人工智能概念以及人工智能在网络骚扰检测中的应用。通过使用前后调查问卷,我们要求学生评价他们对人工智能知识或技能以及所学概念的理解。结果显示学生对人工智能和网络骚扰的概念有中等程度的理解。

更新时间: 2024-05-16 12:10:58

领域: cs.CY,cs.AI,cs.LG

下载: http://arxiv.org/abs/2405.08125v2

Neural Collapse Meets Differential Privacy: Curious Behaviors of NoisyGD with Near-perfect Representation Learning

A recent study by De et al. (2022) has reported that large-scale representation learning through pre-training on a public dataset significantly enhances differentially private (DP) learning in downstream tasks, despite the high dimensionality of the feature space. To theoretically explain this phenomenon, we consider the setting of a layer-peeled model in representation learning, which results in interesting phenomena related to learned features in deep learning and transfer learning, known as Neural Collapse (NC). Within the framework of NC, we establish an error bound indicating that the misclassification error is independent of dimension when the distance between actual features and the ideal ones is smaller than a threshold. Additionally, the quality of the features in the last layer is empirically evaluated under different pre-trained models within the framework of NC, showing that a more powerful transformer leads to a better feature representation. Furthermore, we reveal that DP fine-tuning is less robust compared to fine-tuning without DP, particularly in the presence of perturbations. These observations are supported by both theoretical analyses and experimental evaluation. Moreover, to enhance the robustness of DP fine-tuning, we suggest several strategies, such as feature normalization or employing dimension reduction methods like Principal Component Analysis (PCA). Empirically, we demonstrate a significant improvement in testing accuracy by conducting PCA on the last-layer features.
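
A small sketch of the suggested mitigation: reduce last-layer features with PCA before DP fine-tuning. The features below are synthetic stand-ins for the output of a frozen pre-trained backbone, and the dimensions are illustrative.

import numpy as np

rng = np.random.default_rng(0)

n, d, k = 512, 768, 32                      # samples, feature dim, PCA dim
features = rng.normal(size=(n, d))          # stand-in last-layer features

# PCA via SVD on centered features.
mu = features.mean(axis=0)
U, S, Vt = np.linalg.svd(features - mu, full_matrices=False)
components = Vt[:k]                         # top-k principal directions
reduced = (features - mu) @ components.T    # (n, k) inputs for DP fine-tuning

explained = (S[:k] ** 2).sum() / (S ** 2).sum()
print(f"kept {k}/{d} dims, explained variance = {explained:.2%}")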

Updated: 2024-05-16 12:06:03

标题: 神经崩溃遇上差分隐私:带有近乎完美表示学习的NoisyGD的奇特行为

摘要: 最近De等人(2022)的一项研究报告称,通过在公共数据集上进行预训练的大规模表示学习显著提高了差分隐私(DP)学习在下游任务中的表现,尽管特征空间的维度很高。为了从理论上解释这一现象,我们考虑了表示学习中一种称为“层叠模型”的设置,这导致了与深度学习和迁移学习中学习特征相关的有趣现象,即神经坍塌(NC)。 在NC框架内,我们建立了一个错误界限,表明当实际特征与理想特征之间的距离小于阈值时,误分类误差与维度无关。此外,在NC框架内对最后一层的特征质量进行了实证评估,结果显示更强大的转换器会导致更好的特征表示。此外,我们发现与不使用DP的微调相比,DP微调在存在扰动时更不稳健。这些观察结果得到了理论分析和实验评估的支持。此外,为了增强DP微调的稳健性,我们提出了几种策略,例如特征归一化或使用主成分分析(PCA)等降维方法。在实证上,我们通过对最后一层特征进行PCA,测试精度得到了显著提高。

更新时间: 2024-05-16 12:06:03

领域: cs.LG,cs.CR,cs.CV,stat.ML

下载: http://arxiv.org/abs/2405.08920v2

Listen Again and Choose the Right Answer: A New Paradigm for Automatic Speech Recognition with Large Language Models

Recent advances in large language models (LLMs) have promoted generative error correction (GER) for automatic speech recognition (ASR), which aims to predict the ground-truth transcription from the decoded N-best hypotheses. Thanks to the strong language generation ability of LLMs and rich information in the N-best list, GER shows great effectiveness in enhancing ASR results. However, it still suffers from two limitations: 1) LLMs are unaware of the source speech during GER, which may lead to results that are grammatically correct but violate the source speech content, 2) N-best hypotheses usually only vary in a few tokens, making it redundant to send all of them for GER, which could confuse LLM about which tokens to focus on and thus lead to increased miscorrection. In this paper, we propose ClozeGER, a new paradigm for ASR generative error correction. First, we introduce a multimodal LLM (i.e., SpeechGPT) to receive source speech as extra input to improve the fidelity of correction output. Then, we reformat GER as a cloze test with logits calibration to remove the input information redundancy and simplify GER with clear instructions. Experiments show that ClozeGER achieves a new breakthrough over vanilla GER on 9 popular ASR datasets.

Updated: 2024-05-16 12:05:45

标题: 再次聆听并选择正确答案:基于大型语言模型的自动语音识别的新范式

摘要: 最近大语言模型(LLMs)的进展推动了生成式错误校正(GER)在自动语音识别(ASR)中的应用,其目标是从解码的N个最佳假设中预测地面真实转录。由于LLMs强大的语言生成能力和N个最佳列表中丰富的信息,GER在提升ASR结果方面表现出极佳的效果。然而,它仍然存在两个限制:1)LLMs在GER过程中不知道源语音内容,这可能导致结果在语法上正确但与源语音内容不符,2)N个最佳假设通常只有少数标记不同,导致将它们全部发送给GER是多余的,这可能会使LLMs困惑于应该关注哪些标记,从而导致错误更正增加。在本文中,我们提出了ClozeGER,一种新的ASR生成式错误校正范式。首先,我们引入了一种多模态LLM(即SpeechGPT),以接收源语音作为额外输入,以提高校正输出的保真度。然后,我们将GER重新格式化为一个带有对数校准的填空测试,以消除输入信息冗余并简化GER,并提供清晰的指导。实验表明,ClozeGER在9个流行的ASR数据集上取得了与原始GER相比的新突破。

更新时间: 2024-05-16 12:05:45

领域: cs.CL,cs.AI,cs.LG,cs.SD,eess.AS

下载: http://arxiv.org/abs/2405.10025v1

$\Delta\text{-}{\rm OPE}$: Off-Policy Estimation with Pairs of Policies

The off-policy paradigm casts recommendation as a counterfactual decision-making task, allowing practitioners to unbiasedly estimate online metrics using offline data. This leads to effective evaluation metrics, as well as learning procedures that directly optimise online success. Nevertheless, the high variance that comes with unbiasedness is typically the crux that complicates practical applications. An important insight is that the difference between policy values can often be estimated with significantly reduced variance, if said policies have positive covariance. This allows us to formulate a pairwise off-policy estimation task: $\Delta\text{-}{\rm OPE}$. $\Delta\text{-}{\rm OPE}$ subsumes the common use-case of estimating improvements of a learnt policy over a production policy, using data collected by a stochastic logging policy. We introduce $\Delta\text{-}{\rm OPE}$ methods based on the widely used Inverse Propensity Scoring estimator and its extensions. Moreover, we characterise a variance-optimal additive control variate that further enhances efficiency. Simulated, offline, and online experiments show that our methods significantly improve performance for both evaluation and learning tasks.
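
A numerical sketch of the covariance insight, assuming a simulated uniform logging policy and two nearby softmax target policies: estimating the difference on shared logs (the paired estimate) has a smaller standard error than combining two independently estimated values.

import numpy as np

rng = np.random.default_rng(11)
n, n_actions = 100_000, 10

logits = rng.normal(size=n_actions)
pi_A = np.exp(logits) / np.exp(logits).sum()
logits_B = logits + rng.normal(0, 0.2, n_actions)    # policy B is close to A
pi_B = np.exp(logits_B) / np.exp(logits_B).sum()
pi_0 = np.full(n_actions, 1 / n_actions)             # uniform logging policy

actions = rng.integers(0, n_actions, size=n)         # logged actions
rewards = rng.binomial(1, 0.1 + 0.05 * (actions % 3))

w_A = pi_A[actions] / pi_0[actions]                  # importance weights
w_B = pi_B[actions] / pi_0[actions]
ips_A, ips_B = w_A * rewards, w_B * rewards
delta = (w_A - w_B) * rewards                        # pairwise IPS estimate

print(f"estimated V(A) - V(B): {delta.mean():.5f}")
se_indep = np.sqrt((ips_A.var() + ips_B.var()) / n)  # separate log datasets
se_paired = np.sqrt(delta.var() / n)                 # shared logs, positive cov
print(f"s.e. independent: {se_indep:.5f}   s.e. paired: {se_paired:.5f}")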

Updated: 2024-05-16 12:04:55

标题: $\Delta\text{-}{\rm OPE}$: 使用一对策略进行离线策略估计

摘要: 离策略范式将推荐视为一项反事实的决策任务,允许从业者使用离线数据无偏估计在线指标。这导致了有效的评估指标,以及直接优化在线成功的学习程序。然而,与无偏性相伴随的高方差通常是复杂化实际应用的关键。一个重要的见解是,如果这些策略具有正协方差,那么策略值之间的差异通常可以用显著降低的方差来估计。这使我们能够制定一种成对的离策略估计任务:Δ-OPE。Δ-OPE包含了估计已学习策略相对于生产策略的改进的常见用例,使用由随机记录策略收集的数据。我们引入了基于广泛使用的逆倾向评分(Inverse Propensity Scoring)估计器及其扩展的Δ-OPE方法。此外,我们描述了一个方差最优的附加控制变量,进一步提高了效率。模拟、离线和在线实验表明,我们的方法显著提高了评估和学习任务的性能。

更新时间: 2024-05-16 12:04:55

领域: cs.LG,cs.IR

下载: http://arxiv.org/abs/2405.10024v1

Foundation Model-oriented Robustness: Robust Image Model Evaluation with Pretrained Models

Machine learning has demonstrated remarkable performance over finite datasets, yet whether the scores over the fixed benchmarks can sufficiently indicate the model's performance in the real world is still under discussion. In reality, an ideal robust model would probably behave similarly to the oracle (e.g., the human users), so a good evaluation protocol should evaluate the models' behaviors in comparison to the oracle. In this paper, we introduce a new robustness measurement that directly measures the image classification model's performance compared with a surrogate oracle (i.e., a foundation model). Besides, we design a simple method that can accomplish the evaluation beyond the scope of the benchmarks. Our method extends the image datasets with new samples that are sufficiently perturbed to be distinct from the ones in the original sets, but are still bounded within the same image-label structure the original test image represents, constrained by a foundation model pretrained with a large amount of samples. As a result, our new method will offer us a new way to evaluate the models' robustness performance, free of limitations of fixed benchmarks or constrained perturbations, although scoped by the power of the oracle. In addition to the evaluation results, we also leverage our generated data to understand the behaviors of the model and our new evaluation strategies.

Updated: 2024-05-16 12:02:45

标题: 基于基础模型的鲁棒性:通过预训练模型进行鲁棒图像模型评估

摘要: 机器学习在有限数据集上展现出了显著的性能,然而固定基准上的得分是否足以表明模型在现实世界中的表现仍在讨论中。事实上,一个理想的鲁棒模型可能会类似于预言者(例如,人类用户),因此一个良好的评估协议可能是将模型的行为与预言者进行比较。在本文中,我们介绍了一种新的鲁棒性测量方法,直接衡量图像分类模型与代理预言者(即,基础模型)的性能。此外,我们设计了一种简单的方法,可以超越基准范围进行评估。我们的方法通过对图像数据集进行扩展,引入足够扰动的新样本,使其与原始集中的样本有所不同,但仍受到原始测试图像表示的相同图像-标签结构的限制,受基础模型预训练的大量样本的约束。因此,我们的新方法将为我们提供一个评估模型鲁棒性性能的新途径,摆脱了固定基准或受限扰动的限制,尽管受到预言者的影响。除了评估结果外,我们还利用生成的数据来了解模型的行为和我们的新评估策略。

更新时间: 2024-05-16 12:02:45

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2308.10632v3

Natural Language Can Help Bridge the Sim2Real Gap

The main challenge in learning image-conditioned robotic policies is acquiring a visual representation conducive to low-level control. Due to the high dimensionality of the image space, learning a good visual representation requires a considerable amount of visual data. However, when learning in the real world, data is expensive. Sim2Real is a promising paradigm for overcoming data scarcity in the real-world target domain by using a simulator to collect large amounts of cheap data closely related to the target task. However, it is difficult to transfer an image-conditioned policy from sim to real when the domains are very visually dissimilar. To bridge the sim2real visual gap, we propose using natural language descriptions of images as a unifying signal across domains that captures the underlying task-relevant semantics. Our key insight is that if two image observations from different domains are labeled with similar language, the policy should predict similar action distributions for both images. We demonstrate that training the image encoder to predict the language description or the distance between descriptions of a sim or real image serves as a useful, data-efficient pretraining step that helps learn a domain-invariant image representation. We can then use this image encoder as the backbone of an IL policy trained simultaneously on a large amount of simulated and a handful of real demonstrations. Our approach outperforms widely used prior sim2real methods and strong vision-language pretraining baselines like CLIP and R3M by 25 to 40%.

Updated: 2024-05-16 12:02:02

标题: 自然语言可以帮助弥合模拟到真实环境之间的差距

摘要: 学习基于图像条件的机器人策略的主要挑战在于获得促进低级控制的视觉表示。由于图像空间的高维度,学习一个好的视觉表示需要大量的视觉数据。然而,在现实世界中学习时,数据是昂贵的。Sim2Real是一种有希望的范例,通过使用模拟器收集大量与目标任务密切相关的廉价数据,来克服现实世界目标领域中数据稀缺的问题。然而,当领域在视觉上非常相异时,从模拟到真实的转移图像条件策略是困难的。为了弥合sim2real视觉差距,我们提出使用图像的自然语言描述作为跨领域的统一信号,捕捉潜在的任务相关语义。我们的关键见解是,如果来自不同领域的两个图像观察被标记为相似的语言,那么策略应该为这两个图像预测相似的动作分布。我们展示了训练图像编码器来预测语言描述或模拟或真实图像之间描述的距离,作为一个有用的、高效的数据预训练步骤,有助于学习一个领域不变的图像表示。然后,我们可以将这个图像编码器作为IL策略的主干,同时在大量模拟和少量真实演示上进行训练。我们的方法胜过了广泛使用的先前sim2real方法和强有力的视觉语言预训练基线,如CLIP和R3M,提高了25%到40%。

更新时间: 2024-05-16 12:02:02

领域: cs.RO,cs.CL,cs.CV,cs.LG,I.2.9; I.2.7; I.2.6

下载: http://arxiv.org/abs/2405.10020v1

A Survey on Deep Learning and State-of-the-art Applications

Deep learning, a branch of artificial intelligence, is a computational model that uses multiple layers of interconnected units (neurons) to learn intricate patterns and representations directly from raw input data. Empowered by this learning capability, it has become a powerful tool for solving complex problems and is the core driver of many groundbreaking technologies and innovations. Building a deep learning model is a challenging task due to the algorithm's complexity and the dynamic nature of real-world problems. Several studies have reviewed deep learning concepts and applications. However, the studies mostly focused on the types of deep learning models and convolutional neural network architectures, offering limited coverage of the state-of-the-art of deep learning models and their applications in solving complex problems across different domains. Therefore, motivated by the limitations, this study aims to comprehensively review the state-of-the-art deep learning models in computer vision, natural language processing, time series analysis and pervasive computing. We highlight the key features of the models and their effectiveness in solving the problems within each domain. Furthermore, this study presents the fundamentals of deep learning, various deep learning model types and prominent convolutional neural network architectures. Finally, challenges and future directions in deep learning research are discussed to offer a broader perspective for future researchers.

Updated: 2024-05-16 12:00:29

标题: 一份关于深度学习和最新应用的调查

摘要: 深度学习是人工智能的一个分支,是一种利用多层相互连接的单元(神经元)从原始输入数据中直接学习复杂模式和表示的计算模型。凭借这种学习能力,它已成为解决复杂问题的强大工具,并是许多突破性技术和创新的核心驱动力。构建深度学习模型是一项具有挑战性的任务,因为算法的复杂性和现实世界问题的动态性。有几项研究已经回顾了深度学习的概念和应用。然而,这些研究主要关注深度学习模型的类型和卷积神经网络架构,对深度学习模型及其在解决不同领域复杂问题中的应用的最新发展覆盖有限。因此,受到这些局限的启发,本研究旨在全面审视计算机视觉、自然语言处理、时间序列分析和普适计算领域的最新深度学习模型。我们重点介绍了这些模型的关键特点以及它们在每个领域内解决问题的有效性。此外,本研究介绍了深度学习的基础知识、各种深度学习模型类型和著名的卷积神经网络架构。最后,讨论了深度学习研究中的挑战和未来方向,为未来研究人员提供更广泛的视角。

更新时间: 2024-05-16 12:00:29

领域: cs.LG

下载: http://arxiv.org/abs/2403.17561v3

Machine Learning-Based Path Loss Modeling with Simplified Features

Propagation modeling is a crucial tool for successful wireless deployments and spectrum planning with the demand for high modeling accuracy continuing to grow. Recognizing that detailed knowledge of the physical environment (terrain and clutter) is essential, we propose a novel approach that uses environmental information for predictions. Instead of relying on complex, detail-intensive models, we explore the use of simplified scalar features involving the total obstruction depth along the direct path from transmitter to receiver. Obstacle depth offers a streamlined, yet surprisingly accurate, method for predicting wireless signal propagation, providing a practical solution for efficient and effective wireless network planning.
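
A toy version of the obstruction-depth feature, assuming a synthetic raster clutter map and a fabricated ground-truth loss model: sum the obstructed cells along the straight transmitter-receiver path and feed the result, together with log-distance, into a linear fit.

import numpy as np

rng = np.random.default_rng(2)
clutter = (rng.random((100, 100)) < 0.2).astype(float)   # 1 = obstructed cell

def obstruction_depth(p0, p1, grid, steps=200):
    """Approximate total obstructed length along the straight tx->rx path."""
    t = np.linspace(0.0, 1.0, steps)
    xs = (p0[0] + t * (p1[0] - p0[0])).astype(int)
    ys = (p0[1] + t * (p1[1] - p0[1])).astype(int)
    seg = np.hypot(p1[0] - p0[0], p1[1] - p0[1]) / steps
    return grid[xs, ys].sum() * seg

X, y = [], []
for _ in range(300):                                      # synthetic links
    tx, rx = rng.integers(0, 100, 2), rng.integers(0, 100, 2)
    d = max(float(np.hypot(*(rx - tx))), 1.0)
    depth = obstruction_depth(tx, rx, clutter)
    X.append([1.0, np.log10(d), depth])                   # intercept + features
    y.append(30 + 22 * np.log10(d) + 0.4 * depth + rng.normal(0, 2))

coef, *_ = np.linalg.lstsq(np.array(X), np.array(y), rcond=None)
print("fit [intercept, log10(d) slope, depth slope]:", coef.round(2))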

Updated: 2024-05-16 11:46:39

标题: 基于简化特征的基于机器学习的路径损耗建模

摘要: 传播建模是成功的无线部署和频谱规划的关键工具,对于高精度建模的需求不断增长。我们认识到对于物理环境(地形和杂乱环境)的详细了解是至关重要的,因此提出了一种利用环境信息进行预测的新方法。我们不再依赖复杂、细节密集的模型,而是探索使用简化的标量特征,涉及从发射机到接收机的直接路径上的总遮挡深度。障碍物深度提供了一种简化而令人惊讶地准确的方法,用于预测无线信号的传播,为高效和有效的无线网络规划提供了实用解决方案。

更新时间: 2024-05-16 11:46:39

领域: cs.LG,cs.NI,cs.SY,eess.SY

下载: http://arxiv.org/abs/2405.10006v1

ROCOv2: Radiology Objects in COntext Version 2, an Updated Multimodal Image Dataset

Automated medical image analysis systems often require large amounts of training data with high quality labels, which are difficult and time consuming to generate. This paper introduces Radiology Object in COntext version 2 (ROCOv2), a multimodal dataset consisting of radiological images and associated medical concepts and captions extracted from the PMC Open Access subset. It is an updated version of the ROCO dataset published in 2018, and includes 35,705 new images added to PMC since 2018. It further provides manually curated concepts for imaging modalities with additional anatomical and directional concepts for X-rays. The dataset consists of 79,789 images and has been used, with minor modifications, in the concept detection and caption prediction tasks of ImageCLEFmedical Caption 2023. The dataset is suitable for training image annotation models based on image-caption pairs, or for multi-label image classification using Unified Medical Language System (UMLS) concepts provided with each image. In addition, it can serve for pre-training of medical domain models, and evaluation of deep learning models for multi-task learning.

Updated: 2024-05-16 11:44:35

标题: ROCOv2:上下文中的放射学对象版本2,更新的多模态图像数据集

摘要: 自动化医学图像分析系统通常需要大量具有高质量标签的训练数据,这些数据生成起来费时费力。本文介绍了Radiology Object in COntext version 2(ROCOv2),这是一个多模态数据集,包含从PMC开放获取子集中提取的放射影像和相关医学概念和标题。这是2018年发布的ROCO数据集的更新版本,新增了自2018年以来PMC中添加的35,705张新图像。它进一步为X射线提供了手动策划的成像模态的概念,包括额外的解剖和方向概念。该数据集包含79,789张图像,并已在ImageCLEFmedical Caption 2023的概念检测和标题预测任务中使用,仅进行了轻微修改。该数据集适用于基于图像-标题对训练图像注释模型,或使用每个图像提供的统一医学语言系统(UMLS)概念进行多标签图像分类。此外,它可用于医学领域模型的预训练,并评估用于多任务学习的深度学习模型。

更新时间: 2024-05-16 11:44:35

领域: eess.IV,cs.CV,cs.LG

下载: http://arxiv.org/abs/2405.10004v1

Rate-Optimal Policy Optimization for Linear Markov Decision Processes

We study regret minimization in online episodic linear Markov Decision Processes, and obtain rate-optimal $\widetilde O (\sqrt K)$ regret where $K$ denotes the number of episodes. Our work is the first to establish the optimal rate of convergence (with respect to $K$) in the stochastic setting with bandit feedback using a policy optimization based approach, and the first to establish the optimal rate (with respect to $K$) in the adversarial setup with full information feedback, for which no algorithm with an optimal rate guarantee was previously known.

Updated: 2024-05-16 11:37:06

标题: 线性马尔可夫决策过程的速率最优策略优化

摘要: 我们研究在线情节线性马尔可夫决策过程中的遗憾最小化,并获得速率最优的$\widetilde O (\sqrt K)$遗憾,其中$K$表示情节数。我们的工作是第一个在随机设置中利用基于策略优化的方法建立了最优(关于$K$)收敛速率,并且是第一个在具有全信息反馈的对抗设置中建立了最优(关于$K$)速率的工作,目前尚未知道具有最优速率保证的算法。

更新时间: 2024-05-16 11:37:06

领域: cs.LG

下载: http://arxiv.org/abs/2308.14642v3

Reward Centering

We show that discounted methods for solving continuing reinforcement learning problems can perform significantly better if they center their rewards by subtracting out the rewards' empirical average. The improvement is substantial at commonly used discount factors and increases further as the discount factor approaches one. In addition, we show that if a problem's rewards are shifted by a constant, then standard methods perform much worse, whereas methods with reward centering are unaffected. Estimating the average reward is straightforward in the on-policy setting; we propose a slightly more sophisticated method for the off-policy setting. Reward centering is a general idea, so we expect almost every reinforcement-learning algorithm to benefit from the addition of reward centering.
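
A minimal sketch of the on-policy idea in a tabular TD(0) update (all names are illustrative; the paper's off-policy estimator is more sophisticated):

    import numpy as np

    n_states = 10                  # illustrative tabular state space
    alpha, beta, gamma = 0.1, 0.01, 0.99
    V = np.zeros(n_states)
    r_bar = 0.0                    # running estimate of the average reward

    def td_update(s, r, s_next):
        global r_bar
        r_bar += beta * (r - r_bar)   # track the rewards' empirical average
        delta = (r - r_bar) + gamma * V[s_next] - V[s]  # centered TD error
        V[s] += alpha * delta

Because the update subtracts r_bar, shifting every reward by a constant leaves the learned values unchanged.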

Updated: 2024-05-16 11:33:49

标题: 奖励中心

摘要: 我们发现,在解决持续强化学习问题时,折扣方法如果通过减去奖励的经验平均值来使奖励集中,其性能可以显著提高。在常用的折扣因子下,这种改进是显著的,并且随着折扣因子接近1,改进进一步增加。此外,我们发现如果问题的奖励被一个常数移动,标准方法表现得更糟,而奖励中心化的方法则不受影响。在在线策略设置中估计平均奖励是直观的;我们提出了一种稍微更复杂的方法来解决离线策略设置中的问题。奖励中心化是一个通用的思想,因此我们预计几乎每个强化学习算法都会通过添加奖励中心化来受益。

更新时间: 2024-05-16 11:33:49

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.09999v1

Generative Design through Quality-Diversity Data Synthesis and Language Models

Two fundamental challenges face generative models in engineering applications: the acquisition of high-performing, diverse datasets, and the adherence to precise constraints in generated designs. We propose a novel approach combining optimization, constraint satisfaction, and language models to tackle these challenges in architectural design. Our method uses Quality-Diversity (QD) to generate a diverse, high-performing dataset. We then fine-tune a language model with this dataset to generate high-level designs. These designs are then refined into detailed, constraint-compliant layouts using the Wave Function Collapse algorithm. Our system demonstrates reliable adherence to textual guidance, enabling the generation of layouts with targeted architectural and performance features. Crucially, our results indicate that data synthesized through the evolutionary search of QD not only improves overall model performance but is essential for the model's ability to closely adhere to textual guidance. This improvement underscores the pivotal role evolutionary computation can play in creating the datasets key to training generative models for design. Web article at https://tilegpt.github.io
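
A minimal sketch of a Quality-Diversity loop in the MAP-Elites style (the objective and behavior descriptor are toys; the paper's actual QD setup for architectural layouts is assumed, not shown):

    import random

    archive = {}  # behavior-descriptor cell -> (fitness, solution)

    def evaluate(x):
        fitness = -sum(v * v for v in x)               # toy objective
        descriptor = (round(x[0], 1), round(x[1], 1))  # toy behavior cell
        return fitness, descriptor

    for _ in range(10000):
        if archive:
            parent = random.choice(list(archive.values()))[1]
            child = [v + random.gauss(0, 0.1) for v in parent]
        else:
            child = [random.uniform(-1, 1) for _ in range(2)]
        f, d = evaluate(child)
        if d not in archive or f > archive[d][0]:
            archive[d] = (f, child)  # keep the best solution per niche

    # The archive is then a diverse, high-performing dataset that can be
    # used to fine-tune a generative language model.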

Updated: 2024-05-16 11:30:08

标题: 通过质量多样性数据综合和语言模型进行生成设计

摘要: 工程应用中生成模型面临两个基本挑战:获取高性能、多样化的数据集,以及生成设计时遵守精确的约束条件。我们提出了一种新颖的方法,结合优化、约束满足和语言模型来应对建筑设计中的这些挑战。我们的方法使用质量多样性(QD)生成多样化、高性能的数据集。然后,我们使用这个数据集对语言模型进行微调,生成高水平的设计。这些设计随后通过波函数坍缩算法精细化为详细的、符合约束条件的布局。我们的系统展示了可靠地遵循文本指导,实现生成具有目标建筑和性能特征的布局。至关重要的是,我们的结果表明,通过QD的进化搜索合成的数据不仅改善了整体模型性能,而且对模型密切遵循文本指导的能力至关重要。这一改进强调了进化计算在创建训练生成模型所需关键数据集方面的关键作用。请参考网页文章https://tilegpt.github.io。

更新时间: 2024-05-16 11:30:08

领域: cs.NE,cs.LG

下载: http://arxiv.org/abs/2405.09997v1

Histopathology Foundation Models Enable Accurate Ovarian Cancer Subtype Classification

Large pretrained transformers are increasingly being developed as generalised foundation models which can underpin powerful task-specific artificial intelligence models. Histopathology foundation models show promise across many tasks, but analyses have been limited by arbitrary hyperparameters that were not tuned to the specific task/dataset. We report the most rigorous single-task validation conducted to date of a histopathology foundation model, and the first performed in ovarian cancer subtyping. Attention-based multiple instance learning classifiers were compared using vision transformer and ResNet features generated through varied preprocessing and pretraining procedures. The training set consisted of 1864 whole slide images from 434 ovarian carcinoma cases at Leeds Hospitals. Five-class classification performance was evaluated through five-fold cross-validation, and these cross-validation models were ensembled for evaluation on a hold-out test set and an external set from the Transcanadian study. Reporting followed the TRIPOD+AI checklist. The vision transformer-based histopathology foundation model, UNI, performed best in every evaluation, with five-class balanced accuracies of 88% and 93% in hold-out internal and external testing, compared to the best ResNet model scores of 68% and 81%, respectively. Normalisations and augmentations aided the generalisability of ResNet-based models, but these still did not match the performance of UNI, which gave the best external performance in any ovarian cancer subtyping study to date. Histopathology foundation models offer a clear benefit to subtyping, improving classification performance to a degree where clinical utility is tangible, albeit with an increased computational burden. Such models could provide a second opinion in challenging cases and may improve the accuracy, objectivity, and efficiency of pathological diagnoses overall.
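
A minimal sketch of attention-based multiple instance learning over patch features (PyTorch assumed; dimensions are illustrative), in the spirit of the classifiers compared here:

    import torch
    import torch.nn as nn

    class AttentionMIL(nn.Module):
        def __init__(self, feat_dim=1024, hidden=128, n_classes=5):
            super().__init__()
            self.attn = nn.Sequential(nn.Linear(feat_dim, hidden), nn.Tanh(),
                                      nn.Linear(hidden, 1))
            self.head = nn.Linear(feat_dim, n_classes)

        def forward(self, patches):                       # (n_patches, feat_dim)
            w = torch.softmax(self.attn(patches), dim=0)  # weight per patch
            slide_repr = (w * patches).sum(dim=0)         # weighted slide embedding
            return self.head(slide_repr)                  # five-class logits

    # e.g. foundation-model patch features extracted from one whole slide image
    logits = AttentionMIL()(torch.randn(500, 1024))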

Updated: 2024-05-16 11:21:02

标题: 组织病理学基础模型实现准确的卵巢癌亚型分类

摘要: 大型预训练变压器越来越被开发为通用基础模型,可以支撑强大的任务特定人工智能模型。组织病理学基础模型在许多任务中显示出潜力,但分析受到未调整为特定任务/数据集的任意超参数的限制。我们报告了迄今为止对组织病理学基础模型进行的最严格的单一任务验证,也是首次在卵巢癌分型中进行的。通过使用通过不同预处理和预训练程序生成的视觉变压器和ResNet特征进行比较的基于注意力的多实例学习分类器。训练集由来自利兹医院的434例卵巢癌病例的1864张全切片图像组成。通过五折交叉验证评估了五类分类性能,并且这些交叉验证模型被集成用于在保留测试集和Transcanadian研究中的外部集上进行评估。报告遵循了TRIPOD+AI检查表。基于视觉变压器的组织病理学基础模型UNI在每次评估中表现最佳,五类平衡精度分别为88%和93%,而最佳ResNet模型得分分别为68%和81%。规范化和增强有助于ResNet模型的泛化能力,但这些仍不及UNI的表现,UNI在迄今为止任何卵巢癌分型研究中均提供了最佳的外部表现。组织病理学基础模型为分型提供了明显的好处,将分类性能提高到使临床效用具体可见,尽管计算负担增加。这种模型可以在具有挑战性的病例中提供第二意见,并可能提高病理诊断的准确性、客观性和效率。

更新时间: 2024-05-16 11:21:02

领域: eess.IV,cs.AI,cs.CV

下载: http://arxiv.org/abs/2405.09990v1

Synthpop++: A Hybrid Framework for Generating A Country-scale Synthetic Population

Population censuses are vital to public policy decision-making. They provide insight into human resources, demography, culture, and economic structure at local, regional, and national levels. However, such surveys are very expensive (especially for low and middle-income countries with high populations, such as India), time-consuming, and may also raise privacy concerns, depending upon the kinds of data collected. In light of these issues, we introduce SynthPop++, a novel hybrid framework, which can combine data from multiple real-world surveys (with different, partially overlapping sets of attributes) to produce a real-scale synthetic population of humans. Critically, our population maintains family structures comprising individuals with demographic, socioeconomic, health, and geolocation attributes: this means that our "fake" people live in realistic locations, have realistic families, etc. Such data can be used for a variety of purposes: we explore one such use case, Agent-based modelling of infectious disease in India. To gauge the quality of our synthetic population, we use both machine learning and statistical metrics. Our experimental results show that the synthetic population can realistically simulate the population for various administrative units of India, producing real-scale, detailed data at the desired level of zoom -- from cities, to districts, to states, eventually combining to form a country-scale synthetic population.

Updated: 2024-05-16 11:03:49

标题: Synthpop++: 一个用于生成国家规模合成人口的混合框架

摘要: 人口普查对公共政策决策至关重要。它们提供了对本地、区域和国家层面的人力资源、人口统计、文化和经济结构的深入了解。然而,这些调查非常昂贵(尤其对于印度等人口众多的低收入和中等收入国家而言),耗时,并且可能会引发隐私问题,取决于收集的数据类型。 鉴于这些问题,我们引入了SynthPop++,这是一个新颖的混合框架,可以结合来自多个现实调查(具有不同且部分重叠的属性集)的数据,生成一个真实规模的合成人口。关键是,我们的人口保持了包括人口统计、社会经济、健康和地理位置属性在内的家庭结构:这意味着我们的“虚假”人口生活在现实位置、有现实家庭等。这些数据可用于各种目的:我们探讨了一个这样的用例,即在印度进行传染病的基于代理的建模。 为了衡量我们合成人口的质量,我们使用机器学习和统计指标。我们的实验结果表明,合成人口可以逼真地模拟印度各行政单位的人口,产生所需层级的真实规模、详细数据——从城市、到区、再到州,最终组合成一个国家规模的合成人口。

更新时间: 2024-05-16 11:03:49

领域: cs.LG,cs.CY

下载: http://arxiv.org/abs/2304.12284v2

A Privacy Preserving System for Movie Recommendations Using Federated Learning

Recommender systems have become ubiquitous in the past years. They solve the tyranny of choice problem faced by many users, and are utilized by many online businesses to drive engagement and sales. Besides other criticisms, like creating filter bubbles within social networks, recommender systems are often reproved for collecting considerable amounts of personal data. However, to personalize recommendations, personal information is fundamentally required. A recent distributed learning scheme called federated learning has made it possible to learn from personal user data without its central collection. Consequently, we present a recommender system for movie recommendations, which provides privacy and thus trustworthiness on multiple levels: First and foremost, it is trained using federated learning and thus, by its very nature, privacy-preserving, while still enabling users to benefit from global insights. Furthermore, a novel federated learning scheme, called FedQ, is employed, which not only addresses the problem of non-i.i.d.-ness and small local datasets, but also prevents input data reconstruction attacks by aggregating client updates early. Finally, to reduce the communication overhead, compression is applied, which significantly compresses the exchanged neural network parametrizations to a fraction of their original size. We conjecture that this may also improve data privacy through its lossy quantization stage.
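
A minimal sketch of the federated-averaging pattern such a system builds on (NumPy stands in for model weights; FedQ's group-wise details are only summarized in the comment, not implemented):

    import numpy as np

    def client_update(global_w, local_data, lr=0.01):
        w = global_w.copy()
        for x, y in local_data:                 # local SGD on private data
            grad = 2 * x * (np.dot(w, x) - y)   # toy linear-model gradient
            w -= lr * grad
        return w

    def aggregate(client_weights):
        # Averaging updates (early, over small client groups in FedQ) means
        # raw per-user updates never reach the server individually.
        return np.mean(client_weights, axis=0)

    global_w = np.zeros(4)
    clients = [[(np.random.randn(4), 1.0) for _ in range(5)] for _ in range(8)]
    for _ in range(10):
        global_w = aggregate([client_update(global_w, d) for d in clients])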

Updated: 2024-05-16 11:03:49

标题: 一个使用联邦学习的电影推荐隐私保护系统

摘要: 推荐系统在过去几年已经变得无处不在。它们解决了许多用户面临的选择困难问题,并被许多在线企业用来推动用户参与和销售。除了其他批评,比如在社交网络中创建过滤气泡,推荐系统经常因收集大量个人数据而受到谴责。然而,要个性化推荐,基本上需要个人信息。最近出现了一种名为联邦学习的分布式学习方案,使我们能够从个人用户数据中学习而不需要集中收集。因此,我们提出了一个电影推荐的推荐系统,它在多个层面上提供隐私性和可信度:首先,它是使用联邦学习进行训练的,因此从其本质上来说是保护隐私的,同时仍使用户能够受益于全局见解。此外,采用了一种新颖的联邦学习方案,称为FedQ,它不仅解决了非i.i.d.和小型本地数据集的问题,还通过尽早聚合客户更新来防止输入数据重建攻击。最后,为了减少通信开销,采用了压缩技术,显著压缩了交换的神经网络参数到其原始大小的一小部分。我们推测这也可能通过其有损量化阶段改善数据隐私。

更新时间: 2024-05-16 11:03:49

领域: cs.IR,cs.CR,cs.LG

下载: http://arxiv.org/abs/2303.04689v4

Zero-Shot Hierarchical Classification on the Common Procurement Vocabulary Taxonomy

Classifying public tenders is a useful task for both companies that are invited to participate and for inspecting fraudulent activities. To facilitate the task for both participants and public administrations, the European Union presented a common taxonomy (Common Procurement Vocabulary, CPV) which is mandatory for tenders of certain importance; however, the contracts in which a CPV label is mandatory are the minority compared to all the Public Administrations activities. Classifying over a real-world taxonomy introduces some difficulties that cannot be ignored. First of all, some fine-grained classes have an insufficient (if any) number of observations in the training set, while other classes are far more frequent (even thousands of times) than the average. To overcome those difficulties, we present a zero-shot approach, based on a pre-trained language model that relies only on label description and respects the label taxonomy. To train our proposed model, we used industrial data from contrattipubblici.org, a service by SpazioDati s.r.l. (https://spaziodati.eu) that collects public contracts stipulated in Italy in the last 25 years. Results show that the proposed model achieves better performance in classifying low-frequent classes compared to three different baselines, and is also able to predict never-seen classes.
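
A minimal sketch of the zero-shot idea (sentence-transformers assumed; the model name, example labels, and codes are illustrative): embed tender texts and CPV label descriptions in a shared space and assign the nearest description, so a never-seen class only needs a textual description.

    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative choice

    cpv = {"45262800": "Building extension work",
           "30213100": "Portable computers"}
    tender = "Supply of laptops for the municipal administration"

    label_emb = model.encode(list(cpv.values()))
    scores = util.cos_sim(model.encode([tender]), label_emb)[0]
    print(list(cpv)[int(scores.argmax())])  # predicted CPV code, no training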

Updated: 2024-05-16 11:01:09

标题: 通用采购词汇(CPV)分类法上的零样本层次分类

摘要: 分类公共招标对被邀请参与的公司以及检查欺诈活动都是有用的任务。为了为参与者和公共行政部门简化任务,欧盟提出了一个通用的分类法(Common Procurement Vocabulary,CPV),对于某些重要的招标是强制性的;然而,强制使用CPV标签的合同在所有公共行政活动中只占少数。在真实世界的分类法上进行分类会引入一些难以忽视的困难。首先,一些细粒度的类别在训练集中的观测数量不足(甚至没有),而其他类别则比平均值频繁得多(甚至高出数千倍)。为了克服这些困难,我们提出了一种零样本方法,基于一个预先训练的语言模型,仅依赖于标签描述并遵守标签分类法。为了训练我们提出的模型,我们使用了来自contrattipubblici.org的工业数据,这是由SpazioDati s.r.l提供的服务,该服务收集了意大利过去25年签订的公共合同。结果显示,所提出的模型在分类低频类别方面比三种不同的基线模型表现更好,并且还能预测从未见过的类别。

更新时间: 2024-05-16 11:01:09

领域: cs.LG,cs.CL

下载: http://arxiv.org/abs/2405.09983v1

BEIR-PL: Zero Shot Information Retrieval Benchmark for the Polish Language

The BEIR dataset is a large, heterogeneous benchmark for Information Retrieval (IR) in zero-shot settings, garnering considerable attention within the research community. However, BEIR and analogous datasets are predominantly restricted to the English language. Our objective is to establish extensive large-scale resources for IR in the Polish language, thereby advancing the research in this NLP area. In this work, inspired by the mMARCO and Mr. TyDi datasets, we translated all accessible open IR datasets into Polish, and we introduced the BEIR-PL benchmark -- a new benchmark which comprises 13 datasets, facilitating further development, training and evaluation of modern Polish language models for IR tasks. We executed an evaluation and comparison of numerous IR models on the newly introduced BEIR-PL benchmark. Furthermore, we publish pre-trained open IR models for the Polish language, marking a pioneering development in this field. Additionally, the evaluation revealed that BM25 achieved significantly lower scores for Polish than for English, which can be attributed to the high inflection and intricate morphological structure of the Polish language. Finally, we trained various re-ranking models to enhance the BM25 retrieval, and we compared their performance to identify their unique characteristic features. To ensure accurate model comparisons, it is necessary to scrutinise individual results rather than to average across the entire benchmark. Thus, we thoroughly analysed the outcomes of IR models in relation to each individual data subset encompassed by the BEIR benchmark. The benchmark data is available at https://huggingface.co/clarin-knext.
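
A minimal sketch of the BM25 baseline (rank_bm25 assumed; toy Polish documents): plain lexical matching is exactly where Polish inflection hurts, since surface forms of the same lemma fail to match.

    from rank_bm25 import BM25Okapi

    docs = ["kot siedzi na macie", "pies biega po parku"]
    bm25 = BM25Okapi([d.split() for d in docs])
    # The inflected query form "kota" does not lexically match "kot",
    # illustrating why BM25 scores drop for morphologically rich languages.
    print(bm25.get_scores("kota na macie".split()))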

Updated: 2024-05-16 10:59:27

标题: BEIR-PL:波兰语零样本信息检索基准测试

摘要: BEIR数据集是一个在零样本设置下用于信息检索(IR)的大型、异构的基准,引起了研究界的广泛关注。然而,BEIR和类似的数据集主要局限于英语语言。我们的目标是建立广泛的大规模资源,用于波兰语的IR,从而推动这一NLP领域的研究。在这项工作中,受mMARCO和Mr. TyDi数据集的启发,我们将所有可访问的开放IR数据集翻译成波兰语,并推出了BEIR-PL基准 -- 一个包括13个数据集的新基准,促进了对现代波兰语语言模型进行IR任务的进一步发展、训练和评估。我们在新推出的BEIR-PL基准上对多个IR模型进行了评估和比较。此外,我们发布了用于波兰语的预训练开放IR模型,标志着这一领域的开创性发展。此外,评估结果显示,BM25在波兰语上的得分显著低于英语,这可以归因于波兰语的高屈折和复杂的形态结构。最后,我们训练了各种重新排名模型来增强BM25的检索,并比较它们的性能以确定其独特的特征。为了确保准确的模型比较,有必要仔细审查各个数据子集的个别结果,而不是对整个基准进行平均。因此,我们对IR模型的结果进行了彻底分析,涵盖了BEIR基准包含的每个单独数据子集。基准数据可在 https://huggingface.co/clarin-knext 上找到。

更新时间: 2024-05-16 10:59:27

领域: cs.IR,cs.AI,cs.CL

下载: http://arxiv.org/abs/2305.19840v2

Data-Driven Physics-Informed Neural Networks: A Digital Twin Perspective

This study explores the potential of physics-informed neural networks (PINNs) for the realization of digital twins (DT) from various perspectives. First, various adaptive sampling approaches for collocation points are investigated to verify their effectiveness in the mesh-free framework of PINNs, which allows automated construction of virtual representation without manual mesh generation. Then, the overall performance of the data-driven PINNs (DD-PINNs) framework is examined, which can utilize the acquired datasets in DT scenarios. Its scalability to more general physics is validated within parametric Navier-Stokes equations, where PINNs do not need to be retrained as the Reynolds number varies. In addition, since datasets can be often collected from different fidelity/sparsity in practice, multi-fidelity DD-PINNs are also proposed and evaluated. They show remarkable prediction performance even in the extrapolation tasks, with $42\sim62\%$ improvement over the single-fidelity approach. Finally, the uncertainty quantification performance of multi-fidelity DD-PINNs is investigated by the ensemble method to verify their potential in DT, where an accurate measure of predictive uncertainty is critical. The DD-PINN frameworks explored in this study are found to be more suitable for DT scenarios than traditional PINNs from the above perspectives, bringing engineers one step closer to seamless DT realization.
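
A minimal sketch of the loss structure of a data-driven PINN (PyTorch assumed; a 1D toy ODE u' = -u stands in for the parametric Navier-Stokes setting): a physics residual at collocation points is combined with a data term over acquired measurements.

    import torch
    import torch.nn as nn

    net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))

    def dd_pinn_loss(t_colloc, t_data, u_data):
        t = t_colloc.requires_grad_(True)
        u = net(t)
        du = torch.autograd.grad(u.sum(), t, create_graph=True)[0]
        physics = ((du + u) ** 2).mean()             # residual of u' = -u
        data = ((net(t_data) - u_data) ** 2).mean()  # data-driven term
        return physics + data

    t_c = torch.rand(100, 1)                                  # collocation points
    t_d = torch.tensor([[0.0]]); u_d = torch.tensor([[1.0]])  # measurement u(0)=1
    dd_pinn_loss(t_c, t_d, u_d).backward()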

Updated: 2024-05-16 10:55:23

标题: 基于数据驱动的物理信息神经网络:数字孪生的视角

摘要: 本研究从多个角度探讨了物理信息神经网络(PINNs)在数字孪生(DT)实现中的潜力。首先,研究了用于配置点(collocation points)的各种自适应采样方法,以验证它们在PINNs的无网格框架中的有效性,该框架允许自动构建虚拟表示而无需手动生成网格。然后,研究了数据驱动的PINNs(DD-PINNs)框架的整体性能,该框架可以在DT场景中利用获取的数据集。其对更一般物理问题的可扩展性在参数化Navier-Stokes方程中得到了验证,在该设置下PINNs无需随雷诺数的变化而重新训练。此外,由于在实践中数据集通常具有不同的保真度/稀疏性,因此还提出并评估了多保真度DD-PINNs。它们在外推任务中表现出卓越的预测性能,比单一保真度方法提高了42%~62%。最后,通过集成方法研究了多保真度DD-PINNs的不确定性量化性能,以验证它们在DT中的潜力,其中准确衡量预测不确定性至关重要。本研究探讨的DD-PINN框架在上述各方面被发现更适合于DT场景,使工程师更接近实现无缝的DT。

更新时间: 2024-05-16 10:55:23

领域: physics.flu-dyn,cs.CE,cs.LG

下载: http://arxiv.org/abs/2401.08667v2

Graph Attention-Based Symmetry Constraint Extraction for Analog Circuits

In recent years, analog circuits have received extensive attention and are widely used in many emerging applications. The high demand for analog circuits necessitates shorter circuit design cycles. To achieve the desired performance and specifications, various geometrical symmetry constraints must be carefully considered during the analog layout process. However, the manual labeling of these constraints by experienced analog engineers is a laborious and time-consuming process. To handle the costly runtime issue, we propose a graph-based learning framework to automatically extract symmetric constraints in analog circuit layout. The proposed framework leverages the connection characteristics of circuits and the devices' information to learn the general rules of symmetric constraints, which effectively facilitates the extraction of device-level constraints on circuit netlists. The experimental results demonstrate that compared to state-of-the-art symmetric constraint detection approaches, our framework achieves higher accuracy and F1-score.

Updated: 2024-05-16 10:53:52

标题: 基于图注意力的模拟电路对称约束提取

摘要: 近年来,模拟电路受到了广泛关注,并在许多新兴应用中被广泛使用。对模拟电路的高需求需要缩短电路设计周期。为了实现所需的性能和规格,各种几何对称约束在模拟布局过程中必须仔细考虑。然而,由经验丰富的模拟工程师手动标记这些约束是一个费时费力的过程。为了解决昂贵的运行时间问题,我们提出了一种基于图形的学习框架,用于自动提取模拟电路布局中的对称约束。所提出的框架利用电路的连接特性和器件信息来学习对称约束的一般规则,有效地促进了在电路网表上提取器件级约束。实验结果表明,与最先进的对称约束检测方法相比,我们的框架实现了更高的准确性和F1分数。

更新时间: 2024-05-16 10:53:52

领域: cs.LG

下载: http://arxiv.org/abs/2312.14405v2

FinTextQA: A Dataset for Long-form Financial Question Answering

Accurate evaluation of financial question answering (QA) systems necessitates a comprehensive dataset encompassing diverse question types and contexts. However, current financial QA datasets lack scope diversity and question complexity. This work introduces FinTextQA, a novel dataset for long-form question answering (LFQA) in finance. FinTextQA comprises 1,262 high-quality, source-attributed QA pairs extracted and selected from finance textbooks and government agency websites. Moreover, we developed a Retrieval-Augmented Generation (RAG)-based LFQA system, comprising an embedder, retriever, reranker, and generator. A multi-faceted evaluation approach, including human ranking, automatic metrics, and GPT-4 scoring, was employed to benchmark the performance of different LFQA system configurations under heightened noisy conditions. The results indicate that: (1) Among all compared generators, Baichuan2-7B competes closely with GPT-3.5-turbo in accuracy score; (2) The most effective system configuration on our dataset involved setting the embedder, retriever, reranker, and generator as Ada2, Automated Merged Retrieval, Bge-Reranker-Base, and Baichuan2-7B, respectively; (3) models become less susceptible to noise once the context length reaches a specific threshold.
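
A minimal sketch of the retrieve-rerank-generate structure of such an LFQA system (every component below is a placeholder stub; the paper's actual choices, e.g. Ada2 and Bge-Reranker-Base, are only named in the abstract):

    import numpy as np

    def embed(text):                  # stub embedder
        rng = np.random.default_rng(abs(hash(text)) % 2**32)
        return rng.normal(size=64)

    def rerank(question, passage):    # stub cross-encoder reranker
        return float(np.dot(embed(question), embed(passage)))

    def generate(question, context):  # stub generator LLM
        return f"Answer to {question!r} grounded in {len(context)} passages."

    def rag_answer(question, corpus, k=20, top=5):
        q = embed(question)
        retrieved = sorted(corpus, key=lambda p: -np.dot(q, embed(p)))[:k]
        context = sorted(retrieved, key=lambda p: -rerank(question, p))[:top]
        return generate(question, context)

    corpus = [f"finance passage {i}" for i in range(100)]
    print(rag_answer("What is duration risk?", corpus))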

Updated: 2024-05-16 10:53:31

标题: FinTextQA:用于长篇金融问题回答的数据集

摘要: 财务问答(QA)系统的准确评估需要一个涵盖多种问题类型和背景的全面数据集。然而,当前的财务QA数据集在范围多样性和问题复杂性方面存在不足。本文介绍了FinTextQA,这是一个用于金融领域长篇问答(LFQA)的新数据集。FinTextQA包括1,262个高质量、来源可追溯的QA对,从金融教材和政府机构网站中提取和选择。此外,我们开发了基于检索增强生成(RAG)的LFQA系统,包括嵌入器、检索器、重新排序器和生成器。我们采用多方面的评估方法,包括人工排名、自动评估指标和GPT-4评分,来对不同LFQA系统配置在增加噪声条件下的表现进行基准测试。结果表明:(1)在所有比较的生成器中,百川2-7B在准确度得分上与GPT-3.5-turbo竞争激烈;(2)在我们的数据集上,最有效的系统配置涉及将嵌入器、检索器、重新排序器和生成器分别设置为Ada2、自动合并检索、Bge-Reranker-Base和百川2-7B;(3)在上下文长度达到特定阈值后,模型对噪声的敏感性降低。

更新时间: 2024-05-16 10:53:31

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2405.09980v1

Predicting Solar Heat Production to Optimize Renewable Energy Usage

Utilizing solar energy to meet space heating and domestic hot water demand is very efficient (in terms of environmental footprint as well as cost), but to ensure that user demand is entirely covered throughout the year, solar thermal systems need to be complemented with auxiliary heating systems, typically boilers and heat pumps. Naturally, the optimal control of such a system depends on an accurate prediction of solar thermal production. Experimental testing and physics-based numerical models are used to find a collector's performance curve - the mapping from solar radiation and other external conditions to heat production - but this curve changes over time once the collector is exposed to outdoor conditions. In order to deploy advanced control strategies in small domestic installations, we present an approach that uses machine learning to automatically construct and continuously adapt a model that predicts heat production. Our design is driven by the need to (a) construct and adapt models using supervision that can be extracted from low-cost instrumentation, avoiding extreme accuracy and reliability requirements; and (b) at inference time, use inputs that are typically provided in publicly available weather forecasts. Recent developments in attention-based machine learning, as well as careful adaptation of the training setup to the specifics of the task, have allowed us to design a machine learning-based solution that covers our requirements. We present positive empirical results for the predictive accuracy of our solution, and discuss the impact of these results on the end-to-end system.

Updated: 2024-05-16 10:32:39

标题: 预测太阳热生产以优化可再生能源利用

摘要: 利用太阳能满足空间供暖和家用热水需求非常高效(从环境足迹和成本的角度来看),但为了确保用户需求在整年都得到满足,需要辅助供热系统,通常是锅炉和热泵。当然,这样一个系统的最佳控制取决于对太阳热生产的准确预测。 实验测试和基于物理的数值模型被用来找到集热器的性能曲线 - 从太阳辐射和其他外部条件到热生产的映射 - 但一旦集热器暴露在户外条件下,这个曲线会随时间变化。为了在小型家庭安装中部署先进的控制策略,我们提出了一个方法,利用机器学习自动构建并持续调整一个预测热生产的模型。我们的设计受到以下需求驱动:(a)构建和调整模型时,利用可以从低成本仪器中提取的监督,避免极端精确度和可靠性要求;(b)在推断时间,使用通常提供在公开天气预报中的输入。 最近注意力机制机器学习的发展,以及对训练设置根据任务的特定性进行细致调整,使我们能够设计出一个满足我们需求的基于机器学习的解决方案。我们展示了我们解决方案预测准确性的积极实证结果,并讨论了这些结果对端到端系统的影响。

更新时间: 2024-05-16 10:32:39

领域: cs.LG,cs.AI,cs.SY,eess.SY

下载: http://arxiv.org/abs/2405.09972v1

Uniform Pessimistic Risk and Optimal Portfolio

The optimal allocation of assets has been widely discussed with the theoretical analysis of risk measures, and pessimism is one of the most attractive approaches beyond the conventional optimal portfolio model. The $\alpha$-risk plays a crucial role in deriving a broad class of pessimistic optimal portfolios. However, estimating an optimal portfolio assessed by a pessimistic risk is still challenging due to the absence of a computationally tractable model. In this study, we propose an integral of $\alpha$-risk called the uniform pessimistic risk and the computational algorithm to obtain an optimal portfolio based on the risk. Further, we investigate the theoretical properties of the proposed risk in view of three different approaches: multiple quantile regression, the proper scoring rule, and distributionally robust optimization. Real data analysis of three stock datasets (S&P500, CSI500, KOSPI200) demonstrates the usefulness of the proposed risk and portfolio model.
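
A minimal numerical sketch of the construction (NumPy; toy return sample): reading the $\alpha$-risk as a pessimistic expected-shortfall-type functional and approximating its integral over $\alpha$ on a grid. This is an illustrative interpretation of the abstract, not the authors' exact estimator.

    import numpy as np

    rng = np.random.default_rng(0)
    returns = rng.normal(0.05, 0.2, 10000)   # toy portfolio returns

    def alpha_risk(x, alpha):
        # Pessimistic alpha-risk: negative mean return over the worst
        # alpha-fraction of outcomes (expected-shortfall form).
        q = np.quantile(x, alpha)
        return -x[x <= q].mean()

    alphas = np.linspace(0.01, 0.99, 99)
    upr = np.mean([alpha_risk(returns, a) for a in alphas])
    print(upr)  # approximate integral of the alpha-risk over alpha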

Updated: 2024-05-16 10:15:20

标题: 统一悲观风险与最优组合投资

摘要: 资产的最优配置与风险度量的理论分析一起已被广泛讨论,而悲观主义是超越传统最优投资组合模型的最具吸引力的方法之一。α风险在推导广泛类的悲观最优投资组合中起着至关重要的作用。然而,由悲观风险评估的最优投资组合的估计仍然具有挑战性,因为缺乏一个可计算的模型。在这项研究中,我们提出了一个称为"均匀悲观风险"的α风险的积分,以及用于基于该风险获得最优投资组合的计算算法。此外,我们通过多分位数回归、适当的评分规则和分布鲁棒优化三种不同的方法,研究了所提出风险的理论特性。对三个股票数据集(标普500、中证500、韩国综合指数200)的实际数据分析展示了所提出的风险和投资组合模型的实用性。

更新时间: 2024-05-16 10:15:20

领域: q-fin.PM,cs.LG,stat.CO,stat.ML

下载: http://arxiv.org/abs/2303.07158v2

A Unified Deep Transfer Learning Model for Accurate IoT Localization in Diverse Environments

Internet of Things (IoT) is an ever-evolving technological paradigm that is reshaping industries and societies globally. Real-time data collection, analysis, and decision-making facilitated by localization solutions form the foundation for location-based services, enabling them to support critical functions within diverse IoT ecosystems. However, most existing works on localization focus on a single environment, resulting in the development of multiple models to support multiple environments. In the context of smart cities, maintaining multiple models raises costs and complexity due to the dynamic nature of such environments. To address these challenges, this paper presents a unified indoor-outdoor localization solution that leverages transfer learning (TL) schemes to build a single deep learning model. The model accurately predicts the localization of IoT devices in diverse environments. The performance evaluation shows that by adopting an encoder-based TL scheme, we can improve the baseline model by about 17.18% in indoor environments and 9.79% in outdoor environments.
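
A minimal sketch of the encoder-based transfer-learning scheme (PyTorch assumed; layer sizes and features are illustrative): reuse an encoder trained in one environment and fine-tune only a small head for the other.

    import torch
    import torch.nn as nn

    encoder = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 32))
    # encoder.load_state_dict(...)  # weights pretrained on the source environment
    for p in encoder.parameters():
        p.requires_grad = False     # transfer: freeze the shared representation

    head = nn.Linear(32, 2)         # per-environment (x, y) location head
    opt = torch.optim.Adam(head.parameters(), lr=1e-3)

    rss = torch.randn(16, 128)      # e.g. received-signal-strength features
    target = torch.randn(16, 2)     # ground-truth device positions
    loss = nn.functional.mse_loss(head(encoder(rss)), target)
    loss.backward(); opt.step()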

Updated: 2024-05-16 10:07:59

标题: 一个统一的深度迁移学习模型,用于准确的物联网在不同环境中的定位

摘要: 物联网(IoT)是一个不断发展的技术范式,正在全球重塑行业和社会。由定位解决方案促进的实时数据收集、分析和决策构成了基于位置的服务的基础,使它们能够支持各种物联网生态系统中的关键功能。然而,大多数现有的定位工作都集中在单一环境上,导致开发了多个模型来支持多个环境。在智能城市的背景下,这些模型因环境的动态性而增加了成本和复杂性。为了解决这些挑战,本文提出了一个统一的室内外定位解决方案,利用迁移学习(TL)方案构建一个单一的深度学习模型。该模型可以准确预测物联网设备在各种环境中的定位。性能评估表明,通过采用基于编码器的TL方案,我们可以在室内环境中将基线模型提高约17.18%,在室外环境中提高9.79%。

更新时间: 2024-05-16 10:07:59

领域: eess.SP,cs.LG,cs.NI

下载: http://arxiv.org/abs/2405.09960v1

Should agentic conversational AI change how we think about ethics? Characterising an interactional ethics centred on respect

With the growing popularity of conversational agents based on large language models (LLMs), we need to ensure their behaviour is ethical and appropriate. Work in this area largely centres around the 'HHH' criteria: making outputs more helpful and honest, and avoiding harmful (biased, toxic, or inaccurate) statements. Whilst this semantic focus is useful when viewing LLM agents as mere mediums or output-generating systems, it fails to account for pragmatic factors that can make the same speech act seem more or less tactless or inconsiderate in different social situations. With the push towards agentic AI, wherein systems become increasingly proactive in chasing goals and performing actions in the world, considering the pragmatics of interaction becomes essential. We propose an interactional approach to ethics that is centred on relational and situational factors. We explore what it means for a system, as a social actor, to treat an individual respectfully in a (series of) interaction(s). Our work anticipates a set of largely unexplored risks at the level of situated social interaction, and offers practical suggestions to help agentic LLM technologies treat people well.

Updated: 2024-05-16 09:53:45

标题: 机器人对话AI是否应改变我们对伦理道德的看法?以尊重为中心的互动伦理特征描述

摘要: 随着基于大型语言模型(LLMs)的会话代理的日益普及,我们需要确保它们的行为是道德和适当的。这一领域的工作主要围绕着“HHH”标准展开:使输出更有帮助和诚实,避免有害(偏见、有毒或不准确)的陈述。虽然这种语义焦点在将LLM代理视为纯粹的媒介或输出生成系统时是有用的,但它未能考虑到在不同社会情境下相同言语行为可能显得更或更不得体或不考虑他人感受的实用因素。随着向有能动性的AI推进,系统越来越积极主动地追求目标并在世界中执行行动,考虑互动的语用学因素变得至关重要。我们提出了一种以关系和情境因素为中心的道德互动方法。我们探讨了一个系统作为社会行为者如何在(一系列)互动中尊重地对待个体。我们的工作预见了在情境社会互动层面上一系列尚未探索的风险,并提供了实用建议,帮助具有能动性的LLM技术善待人们。

更新时间: 2024-05-16 09:53:45

领域: cs.CL,cs.AI,cs.HC,68T42,H.5.2; I.2; I.2.1; J.4; J.5

下载: http://arxiv.org/abs/2401.09082v2

SciQAG: A Framework for Auto-Generated Scientific Question Answering Dataset with Fine-grained Evaluation

The use of question-answer (QA) pairs for training and evaluating large language models (LLMs) has attracted considerable attention. Yet few available QA datasets are based on knowledge from the scientific literature. Here we bridge this gap by presenting Automatic Generation of Scientific Question Answers (SciQAG), a framework for automatic generation and evaluation of scientific QA pairs sourced from published scientific literature. We fine-tune an open-source LLM to generate 960,000 scientific QA pairs from full-text scientific papers and propose a five-dimensional metric to evaluate the quality of the generated QA pairs. We show via LLM-based evaluation that the generated QA pairs consistently achieve an average score of 2.5 out of 3 across five dimensions, indicating that our framework can distill key knowledge from papers into high-quality QA pairs at scale. We make the dataset, models, and evaluation codes publicly available.

Updated: 2024-05-16 09:42:37

标题: SciQAG:一个用于自动生成科学问题回答数据集并进行细粒度评估的框架

摘要: 使用问题-回答(QA)对来训练和评估大型语言模型(LLMs)已经引起了广泛关注。然而,很少有可用的QA数据集基于科学文献的知识。在这里,我们通过提出科学问题回答的自动生成(SciQAG)框架来填补这一空白,这是一个从已发表的科学文献中自动生成和评估科学QA对的框架。我们通过微调开源的LLM从全文科学论文中生成了960,000个科学QA对,并提出了一个五维度指标来评估生成的QA对的质量。基于LLM的评估显示,生成的QA对在五个维度上的平均得分稳定在2.5分(满分3分),表明我们的框架能够大规模地将论文中的关键知识提炼为高质量的QA对。我们公开提供数据集、模型和评估代码。

更新时间: 2024-05-16 09:42:37

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2405.09939v1

DEBATE: Devil's Advocate-Based Assessment and Text Evaluation

As natural language generation (NLG) models have become prevalent, systematically assessing the quality of machine-generated texts has become increasingly important. Recent studies introduce LLM-based evaluators that operate as reference-free metrics, demonstrating their capability to adeptly handle novel tasks. However, these models generally rely on a single-agent approach, which, we argue, introduces an inherent limit to their performance. This is because there exist biases in an LLM agent's responses, including preferences for certain text structures or content. In this work, we propose DEBATE, an NLG evaluation framework based on a multi-agent scoring system augmented with the concept of a Devil's Advocate. Within the framework, one agent is instructed to criticize the other agents' arguments, potentially resolving the bias in LLM agents' answers. DEBATE substantially outperforms the previous state-of-the-art methods on two meta-evaluation benchmarks in NLG evaluation, SummEval and TopicalChat. We also show that the extensiveness of debates among agents and the persona of an agent can influence the performance of evaluators.

Updated: 2024-05-16 09:41:12

标题: 辩论:以魔鬼的代言人为基础的评估和文本评价

摘要: 随着自然语言生成(NLG)模型的普及,系统评估机器生成文本的质量变得越来越重要。最近的研究引入了基于LLM的评估器,作为无参考度量的指标,展示了它们处理新任务的能力。然而,这些模型通常依赖于单一代理方法,我们认为这会限制它们的性能。这是因为LLM代理的响应中存在偏见,包括对特定文本结构或内容的偏好。在这项工作中,我们提出了DEBATE,一个基于多代理评分系统的NLG评估框架,增加了魔鬼倡导者的概念。在这个框架内,一个代理被指示批评其他代理的论点,可能解决LLM代理答案中的偏见。DEBATE在NLG评估的两个元评估基准SummEval和TopicalChat中明显优于先前的最先进方法。我们还展示了代理之间辩论的广泛程度和一个代理的个性可以影响评估器的表现。

更新时间: 2024-05-16 09:41:12

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2405.09935v1

Detecting Domain Shift in Multiple Instance Learning for Digital Pathology Using Fréchet Domain Distance

Multiple-instance learning (MIL) is an attractive approach for digital pathology applications as it reduces the costs related to data collection and labelling. However, it is not clear how sensitive MIL is to clinically realistic domain shifts, i.e., differences in data distribution that could negatively affect performance, and if already existing metrics for detecting domain shifts work well with these algorithms. We trained an attention-based MIL algorithm to classify whether a whole-slide image of a lymph node contains breast tumour metastases. The algorithm was evaluated on data from a hospital in a different country and various subsets of this data that correspond to different levels of domain shift. Our contributions include showing that MIL for digital pathology is affected by clinically realistic differences in data, evaluating which features from a MIL model are most suitable for detecting changes in performance, and proposing an unsupervised metric named Fréchet Domain Distance (FDD) for quantification of domain shifts. Shift measure performance was evaluated through the mean Pearson correlation to change in classification performance, where FDD achieved 0.70 on 10-fold cross-validation models. The baselines included Deep ensemble, Difference of Confidence, and Representation shift, which resulted in 0.45, -0.29, and 0.56 mean Pearson correlation, respectively. FDD could be a valuable tool for care providers and vendors who need to verify if a MIL system is likely to perform reliably when implemented at a new site, without requiring any additional annotations from pathologists.
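
The name suggests the Fréchet distance between Gaussian fits of feature distributions, as in FID; a minimal sketch under that assumption (NumPy/SciPy):

    import numpy as np
    from scipy.linalg import sqrtm

    def frechet_distance(feats_a, feats_b):
        # Fit a Gaussian to each feature set and compare the two fits.
        mu_a, mu_b = feats_a.mean(0), feats_b.mean(0)
        cov_a = np.cov(feats_a, rowvar=False)
        cov_b = np.cov(feats_b, rowvar=False)
        covmean = sqrtm(cov_a @ cov_b).real
        return float(((mu_a - mu_b) ** 2).sum()
                     + np.trace(cov_a + cov_b - 2 * covmean))

    src = np.random.randn(500, 64)        # MIL features from the training site
    tgt = np.random.randn(500, 64) + 0.5  # shifted features from a new site
    print(frechet_distance(src, tgt))     # grows with the domain shift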

Updated: 2024-05-16 09:37:57

标题: 使用Fréchet域距离在数字病理学中检测多实例学习中的域转移

摘要: 多实例学习(MIL)是数字病理学应用中一种吸引人的方法,因为它减少了与数据收集和标记相关的成本。然而,目前还不清楚MIL对临床现实领域转移的敏感性,即数据分布上的差异可能会对性能产生负面影响,以及已有的用于检测领域转移的指标是否与这些算法很好地配合。我们训练了一种基于注意力的MIL算法,用于分类淋巴结全切片图像是否含有乳腺肿瘤转移。该算法在来自不同国家的医院数据以及对应于不同领域转移水平的各种子集上进行了评估。我们的贡献包括展示数字病理学中的MIL受临床现实数据差异影响,评估MIL模型中哪些特征最适合检测性能变化,并提出一个名为Frechet Domain Distance(FDD)的无监督度量,用于量化领域转移。通过与分类性能变化的平均皮尔逊相关性来评估转移度量性能,其中FDD 在10折交叉验证模型上实现了0.70。基线包括深度集成、置信度差异和表示转移,分别导致0.45、-0.29和0.56的平均皮尔逊相关性。FDD 可能是护理提供者和需要验证MIL系统在新场所实施时是否能可靠运行的供应商的宝贵工具,而无需来自病理学家的任何额外注释。

更新时间: 2024-05-16 09:37:57

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2405.09934v1

MiniMaxAD: A Lightweight Autoencoder for Feature-Rich Anomaly Detection

Previous unsupervised anomaly detection (UAD) methods often struggle with significant intra-class diversity; i.e., a class in a dataset contains multiple subclasses, which we categorize as Feature-Rich Anomaly Detection Datasets (FRADs). This is evident in applications such as unified setting and unmanned supermarket scenarios. To address this challenge, we developed MiniMaxAD: a lightweight autoencoder designed to efficiently compress and memorize extensive information from normal images. Our model utilizes a large kernel convolutional network equipped with a Global Response Normalization (GRN) unit and employs a multi-scale feature reconstruction strategy. The GRN unit significantly increases the upper limit of the network's capacity, while the large kernel convolution facilitates the extraction of highly abstract patterns, leading to compact normal feature modeling. Additionally, we introduce an Adaptive Contraction Loss (ADCLoss), tailored to FRADs to overcome the limitations of global cosine distance loss. MiniMaxAD was comprehensively tested across six challenging UAD benchmarks, achieving state-of-the-art results in four and highly competitive outcomes in the remaining two. Notably, our model achieved a detection AUROC of up to 97.0% in ViSA under the unified setting. Moreover, it not only achieved state-of-the-art performance in unmanned supermarket tasks but also exhibited an inference speed 37 times faster than the previous best method, demonstrating its effectiveness in complex UAD tasks.
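
A minimal sketch of a Global Response Normalization unit (PyTorch; following the commonly used ConvNeXt-V2-style formulation, which may differ in detail from this paper's variant):

    import torch
    import torch.nn as nn

    class GRN(nn.Module):
        """Global Response Normalization over channels-last features."""
        def __init__(self, dim, eps=1e-6):
            super().__init__()
            self.gamma = nn.Parameter(torch.zeros(1, 1, 1, dim))
            self.beta = nn.Parameter(torch.zeros(1, 1, 1, dim))
            self.eps = eps

        def forward(self, x):                    # x: (N, H, W, C)
            gx = torch.norm(x, p=2, dim=(1, 2), keepdim=True)  # channel energy
            nx = gx / (gx.mean(dim=-1, keepdim=True) + self.eps)
            return self.gamma * (x * nx) + self.beta + x       # residual form

    y = GRN(64)(torch.randn(2, 32, 32, 64))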

Updated: 2024-05-16 09:37:54

标题: MiniMaxAD:用于特征丰富异常检测的轻量级自编码器

摘要: 以往的无监督异常检测(UAD)方法通常难以处理类内差异显著的情况;即数据集中的一个类包含多个子类,我们将其归类为特征丰富的异常检测数据集(FRADs)。这在统一设置和无人超市场景等应用中很明显。为了解决这一挑战,我们开发了MiniMaxAD:一种轻量级自编码器,设计用于高效压缩和记忆正常图像中的大量信息。我们的模型利用一个装备有全局响应归一化(GRN)单元的大核卷积网络,并采用多尺度特征重建策略。GRN单元显著增加了网络容量的上限,而大核卷积有助于提取高度抽象的模式,从而实现紧凑的正常特征建模。此外,我们引入了一种适应性收缩损失(ADCLoss),专门针对FRADs,以克服全局余弦距离损失的局限性。MiniMaxAD在六个具有挑战性的UAD基准测试中进行了全面测试,在其中四个中取得了最先进的结果,在剩下的两个中取得了高度竞争力的结果。值得注意的是,我们的模型在统一设置下的ViSA中实现了高达97.0%的检测AUROC。此外,它不仅在无人超市任务中取得了最先进的性能,而且展现出比之前最佳方法快37倍的推理速度,证明了其在复杂UAD任务中的有效性。

更新时间: 2024-05-16 09:37:54

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2405.09933v1

Moreau Envelope for Nonconvex Bi-Level Optimization: A Single-loop and Hessian-free Solution Strategy

This work focuses on addressing two major challenges in the context of large-scale nonconvex Bi-Level Optimization (BLO) problems, which are increasingly applied in machine learning due to their ability to model nested structures. These challenges involve ensuring computational efficiency and providing theoretical guarantees. While recent advances in scalable BLO algorithms have primarily relied on lower-level convexity simplification, our work specifically tackles large-scale BLO problems involving nonconvexity in both the upper and lower levels. We simultaneously address computational and theoretical challenges by introducing an innovative single-loop gradient-based algorithm, utilizing the Moreau envelope-based reformulation, and providing non-asymptotic convergence analysis for general nonconvex BLO problems. Notably, our algorithm relies solely on first-order gradient information, enhancing its practicality and efficiency, especially for large-scale BLO learning tasks. We validate our approach's effectiveness through experiments on various synthetic problems, two typical hyper-parameter learning tasks, and a real-world neural architecture search application, collectively demonstrating its superior performance.
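
For reference, the Moreau envelope of the lower-level objective $g(x,\cdot)$ on which such reformulations rest has the standard form (a textbook sketch; the paper's exact construction may add problem-specific terms):

$$ v_\gamma(x, \theta) \;=\; \min_{\theta'} \Big\{ g(x, \theta') + \frac{1}{2\gamma}\,\|\theta' - \theta\|^2 \Big\}, \qquad \gamma > 0. $$

Lower-level optimality can then be encoded through the smooth value-function constraint $g(x, \theta) - v_\gamma(x, \theta) \le 0$, which is amenable to single-loop, first-order (Hessian-free) updates.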

Updated: 2024-05-16 09:33:28

标题: 莫罗包络(Moreau Envelope)用于非凸双层优化:一种单循环和无Hessian解决策略

摘要: 这项工作着重解决大规模非凸Bi-Level Optimization(BLO)问题领域中的两个主要挑战,这些问题由于能够建模嵌套结构,越来越多地应用于机器学习。这些挑战涉及确保计算效率和提供理论保证。尽管最近可扩展BLO算法的进展主要依赖于对较低级别的凸性简化,我们的工作专门处理涉及上下两个级别的非凸性的大规模BLO问题。我们通过引入一种创新的基于梯度的单循环算法,利用Moreau包络的重构,并为一般非凸BLO问题提供非渐进收敛分析,同时解决了计算和理论挑战。值得注意的是,我们的算法仅依赖于一阶梯度信息,增强了其实用性和效率,特别适用于大规模BLO学习任务。通过在各种合成问题、两个典型的超参数学习任务以及一个真实世界的神经架构搜索应用上进行实验证实了我们方法的有效性,共同展示了其卓越的性能。

更新时间: 2024-05-16 09:33:28

领域: math.OC,cs.LG

下载: http://arxiv.org/abs/2405.09927v1

End-to-End Integration of Speech Separation and Voice Activity Detection for Low-Latency Diarization of Telephone Conversations

Recent works show that speech separation guided diarization (SSGD) is an increasingly promising direction, mainly thanks to the recent progress in speech separation. It performs diarization by first separating the speakers and then applying voice activity detection (VAD) on each separated stream. In this work we conduct an in-depth study of SSGD in the conversational telephone speech (CTS) domain, focusing mainly on low-latency streaming diarization applications. We consider three state-of-the-art speech separation (SSep) algorithms and study their performance both in online and offline scenarios, considering non-causal and causal implementations as well as continuous SSep (CSS) windowed inference. We compare different SSGD algorithms on two widely used CTS datasets: CALLHOME and Fisher Corpus (Part 1 and 2) and evaluate both separation and diarization performance. To improve performance, a novel, causal and computationally efficient leakage removal algorithm is proposed, which significantly decreases false alarms. We also explore, for the first time, fully end-to-end SSGD integration between SSep and VAD modules. Crucially, this enables fine-tuning on real-world data for which oracle speakers sources are not available. In particular, our best model achieves 8.8% DER on CALLHOME, which outperforms the current state-of-the-art end-to-end neural diarization model, despite being trained on an order of magnitude less data and having significantly lower latency, i.e., 0.1 vs. 1 seconds. Finally, we also show that the separated signals can be readily used also for automatic speech recognition, reaching performance close to using oracle sources in some configurations.

Updated: 2024-05-16 09:28:36

标题: 电话会话低延迟的端到端语音分离和语音活动检测集成

摘要: 最近的研究表明,语音分离引导的话语划分(SSGD)是一个越来越有希望的方向,主要归功于语音分离的最新进展。它通过首先分离说话者,然后在每个分离的流上应用语音活动检测(VAD)来执行话语划分。在本研究中,我们深入研究了SSGD在电话对话语音(CTS)领域的应用,主要关注低延迟流式划分应用。我们考虑了三种最先进的语音分离(SSep)算法,并研究了它们在在线和离线场景中的性能,考虑了非因果和因果实现以及连续SSep(CSS)窗口推断。我们在两个广泛使用的CTS数据集上比较了不同的SSGD算法:CALLHOME和Fisher语料库(第1部分和第2部分),并评估了分离和话语划分的性能。为了提高性能,提出了一种新颖的、因果且计算效率高的泄漏去除算法,可以显著减少误报。我们还首次探索了完全端到端的SSGD在SSep和VAD模块之间的集成。关键是,这使得可以对实际数据进行微调,而其中的oracle说话者来源是不可用的。特别是,我们的最佳模型在CALLHOME上实现了8.8%的DER,这优于当前最先进的端到端神经话语划分模型,尽管它是在数量级较少的数据上进行训练,并且具有显著较低的延迟,即0.1秒与1秒相比。最后,我们还表明分离的信号也可以轻松用于自动语音识别,在某些配置中的性能接近使用oracle来源。

更新时间: 2024-05-16 09:28:36

领域: eess.AS,cs.LG,cs.SD

下载: http://arxiv.org/abs/2303.12002v2

Cell Maps Representation For Lung Adenocarcinoma Growth Patterns Classification In Whole Slide Images

Lung adenocarcinoma is a morphologically heterogeneous disease, characterized by five primary histologic growth patterns. The quantity of these patterns can be related to tumor behavior and has a significant impact on patient prognosis. In this work, we propose a novel machine learning pipeline capable of classifying tissue tiles into one of the five patterns or as non-tumor, with an Area Under the Receiver Operating Characteristic Curve (AUCROC) score of 0.97. Our model's strength lies in its comprehensive consideration of cellular spatial patterns, where it first generates cell maps from Hematoxylin and Eosin (H&E) whole slide images (WSIs), which are then fed into a convolutional neural network classification model. Exploiting these cell maps provides the model with robust generalizability to new data, achieving approximately 30% higher accuracy on unseen test-sets compared to current state of the art approaches. The insights derived from our model can be used to predict prognosis, enhancing patient outcomes.

Updated: 2024-05-16 09:19:05

标题: 细胞地图表示法在肺腺癌生长模式分类中的应用:基于全切片图像

摘要: 肺腺癌是一种形态学异质性的疾病,其特征是五种主要的组织学生长模式。这些模式的数量可以与肿瘤行为相关联,对患者的预后有显著影响。在这项工作中,我们提出了一种新颖的机器学习流程,能够将组织切片分类为五种模式之一或非肿瘤,其接收操作特征曲线下面积(AUCROC)得分为0.97。我们模型的优势在于全面考虑细胞空间模式,在此过程中首先从苏木精和伊红(H&E)染色的全切片图像中生成细胞图,然后将其输入到卷积神经网络分类模型中。利用这些细胞图使模型对新数据具有强大的泛化能力,在未见测试集上的准确率比当前的最先进方法高出约30%。我们模型得出的见解可用于预测预后,从而改善患者结局。

更新时间: 2024-05-16 09:19:05

领域: eess.IV,cs.CV,cs.LG

下载: http://arxiv.org/abs/2311.15847v2

DIMSIM -- Device Integrity Monitoring through iSIM Applets and Distributed Ledger Technology

In the context of industrial environments, devices such as robots and drones are vulnerable to malicious activities such as device tampering (e.g., hardware and software changes). The problem becomes even worse in a multi-stakeholder environment where multiple players contribute to an ecosystem. In such scenarios, particularly when devices are deployed in remote settings, ensuring device integrity so that all stakeholders can trust them is challenging. Existing methods often depend on additional hardware like the Trusted Platform Module (TPM), which may not be universally provided by all vendors. In this study, we introduce a distributed ledger technology-oriented architecture to monitor the remote devices' integrity using eUICC technology, a feature commonly found in industrial devices for cellular connectivity. We propose that, using secure applets in eUICC, devices' integrity can be monitored and managed without installing any additional hardware. To this end, we present an end-to-end architecture to monitor device integrity, thereby enabling all the stakeholders in the system to trust the devices. Additionally, we leverage the properties of immutable databases to provide robustness and efficiency to our model. In our primary evaluations, we measure the overhead caused by hashing our proposed data packets and the performance of integrating an immutable database into our system. Our results show that performing hashing on our data packets takes on the order of microseconds, while reading and writing to an immutable database requires only milliseconds.
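
A minimal sketch of the kind of hashing-overhead measurement described (Python's hashlib; the packet contents are illustrative):

    import hashlib
    import time

    packet = b"device-id|firmware-hash|timestamp|sensor-payload" * 8

    start = time.perf_counter_ns()
    digest = hashlib.sha256(packet).hexdigest()
    elapsed_ns = time.perf_counter_ns() - start
    print(f"sha256 over {len(packet)} bytes: {elapsed_ns / 1000:.1f} us")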

Updated: 2024-05-16 09:13:54

标题: DIMSIM -- 设备完整性监控通过iSIM应用程序和分布式账本技术

摘要: 在工业环境中,诸如机器人和无人机等设备容易遭受恶意活动,例如设备篡改(例如,硬件和软件更改)。在多利益相关者环境中,问题变得更加严重,因为多个参与者为生态系统做出贡献。 在这种情况下,特别是当设备部署在偏远环境中时,确保设备的完整性以便所有利益相关者都能信任它们是具有挑战性的。现有方法通常依赖于像可信平台模块(TPM)这样的额外硬件,但并非所有供应商都能普遍提供。在这项研究中,我们引入了一种基于分布式账本技术的架构,使用eUICC技术监控远程设备的完整性,这是工业设备中用于蜂窝连接的常见功能。我们提出通过在eUICC中使用安全应用程序,可以监控和管理设备的完整性,而无需安装任何额外硬件。 为此,我们提出了一个端到端的架构来监控设备的完整性,从而使系统中的所有利益相关者都能信任这些设备。此外,我们利用不可变数据库的属性为我们的模型提供鲁棒性和效率。在我们的初步评估中,我们测量了对我们提出的数据包进行哈希处理所引起的开销,以及将不可变数据库集成到我们系统中的性能。我们的结果显示,对我们的数据包进行哈希处理需要微秒级的时间,而读写不可变数据库也只需要毫秒级的时间。

更新时间: 2024-05-16 09:13:54

领域: cs.CR

下载: http://arxiv.org/abs/2405.09916v1

Incremental Learning of Humanoid Robot Behavior from Natural Interaction and Large Language Models

Natural-language dialog is key for intuitive human-robot interaction. It can be used not only to express humans' intents, but also to communicate instructions for improvement if a robot does not understand a command correctly. Of great importance is to endow robots with the ability to learn from such interaction experience in an incremental way to allow them to improve their behaviors or avoid mistakes in the future. In this paper, we propose a system to achieve incremental learning of complex behavior from natural interaction, and demonstrate its implementation on a humanoid robot. Building on recent advances, we present a system that deploys Large Language Models (LLMs) for high-level orchestration of the robot's behavior, based on the idea of enabling the LLM to generate Python statements in an interactive console to invoke both robot perception and action. The interaction loop is closed by feeding back human instructions, environment observations, and execution results to the LLM, thus informing the generation of the next statement. Specifically, we introduce incremental prompt learning, which enables the system to interactively learn from its mistakes. For that purpose, the LLM can call another LLM responsible for code-level improvements of the current interaction based on human feedback. The improved interaction is then saved in the robot's memory, and thus retrieved on similar requests. We integrate the system in the robot cognitive architecture of the humanoid robot ARMAR-6 and evaluate our methods both quantitatively (in simulation) and qualitatively (in simulation and real-world) by demonstrating generalized incrementally-learned knowledge.
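
A minimal sketch of the orchestration loop described (llm() and the robot API are placeholder stubs; the real system exposes its perception and action functions to the model):

    def llm(prompt):                     # placeholder for the language model
        return 'robot.say("Hello!")'     # it emits Python statements

    robot_api = {"robot": type("Robot", (), {"say": staticmethod(print)})}
    history = "You control a humanoid robot via Python statements.\n"

    for human in ["", "Greet the visitor by name.", ""]:  # canned feedback
        statement = llm(history)
        try:
            result = eval(statement, robot_api)   # invoke perception/action
            feedback = f"OK: {result}"
        except Exception as e:                    # execution errors fed back
            feedback = f"ERROR: {e}"
        # Instructions, observations, and results inform the next statement;
        # improved interactions would be stored and retrieved on similar requests.
        history += f"{statement}\n{feedback}\n{human}\n"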

Updated: 2024-05-16 09:07:42

标题: 使用自然互动和大型语言模型逐步学习人形机器人行为

摘要: 自然语言对话是直观的人机交互的关键。它不仅可以用来表达人类的意图,还可以用来传达指令,以改进机器人对命令的理解。重要的是赋予机器人从这种交互经验中以增量方式学习的能力,以使它们能够改进其行为或在未来避免错误。本文提出了一个系统,实现了从自然交互中对复杂行为的增量学习,并在一个人形机器人上展示了其实现。基于最新的进展,我们提出了一个系统,利用大型语言模型(LLM)对机器人的行为进行高级编排,基于使LLM能够在交互控制台中生成Python语句的思想来调用机器人的感知和行动。交互循环通过将人类指令、环境观察和执行结果反馈给LLM来闭合,从而通知下一个语句的生成。具体地,我们引入了增量提示学习,使系统能够从其错误中进行交互式学习。为此,LLM可以调用另一个负责基于人类反馈对当前交互进行代码级改进的LLM。改进后的交互然后保存在机器人的记忆中,并在类似请求时检索。我们将该系统集成到人形机器人ARMAR-6的机器人认知架构中,并通过在模拟环境中定量地评估我们的方法,并通过在模拟环境和现实世界中展示广义增量学习知识来定性地评估。

更新时间: 2024-05-16 09:07:42

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2309.04316v3

Scaling convolutional neural networks achieves expert-level seizure detection in neonatal EEG

Background: Neonatal seizures are a neurological emergency that require urgent treatment. They are hard to diagnose clinically and can go undetected if EEG monitoring is unavailable. EEG interpretation requires specialised expertise which is not widely available. Algorithms to detect EEG seizures can address this limitation but have yet to reach widespread clinical adoption. Methods: Retrospective EEG data from 332 neonates was used to develop and validate a seizure-detection model. The model was trained and tested with a development dataset ($n=202$) that was annotated with over 12k seizure events on a per-channel basis. This dataset was used to develop a convolutional neural network (CNN) using a modern architecture and training methods. The final model was then validated on two independent multi-reviewer datasets ($n=51$ and $n=79$). Results: Increasing dataset and model size improved model performance: Matthews correlation coefficient (MCC) and Pearson's correlation ($r$) increased by up to 50% with data scaling and up to 15% with model scaling. Over 50k hours of annotated single-channel EEG was used for training a model with 21 million parameters. State-of-the-art was achieved on an open-access dataset (MCC=0.764, $r=0.824$, and AUC=0.982). The CNN attains expert-level performance on both held-out validation sets, with no significant difference in inter-rater agreement among the experts and among experts and algorithm ($\Delta \kappa < -0.095$, $p>0.05$). Conclusion: With orders of magnitude increases in data and model scale we have produced a new state-of-the-art model for neonatal seizure detection. Expert-level equivalence on completely unseen data, a first in this field, provides a strong indication that the model is ready for further clinical validation.

Updated: 2024-05-16 08:59:20

标题: 将卷积神经网络进行缩放,实现新生儿脑电图中专家水平的癫痫检测

摘要: 背景:新生儿癫痫是一种需要紧急治疗的神经紧急情况。临床上很难诊断,如果没有脑电图监测,可能会被忽视。脑电图解释需要专业知识,这种专业知识并不普遍可得。用于检测脑电图癫痫的算法可以解决这一限制,但尚未得到广泛的临床采用。 方法:回顾性分析了332名新生儿的脑电图数据,用于开发和验证一种癫痫检测模型。该模型在一个开发数据集(n=202)上进行训练和测试,该数据集按通道标注了超过12,000次癫痫事件。使用这个数据集开发了一个使用现代架构和训练方法的卷积神经网络(CNN)。最终模型随后在两个独立的多审阅者数据集(n=51和n=79)上进行了验证。 结果:增加数据集和模型规模可以提高模型性能:马修斯相关系数(MCC)和皮尔逊相关系数(r)随数据缩放增加高达50%,随模型缩放增加高达15%。训练使用了超过50,000小时的标注单通道脑电图,训练了一个拥有2100万参数的模型。在一个开放获取的数据集上实现了最新技术水平(MCC=0.764,r=0.824,AUC=0.982)。CNN在两个保留验证集上达到了专家级别的性能,在专家之间和专家与算法之间的一致性协议中没有显著差异(Δkappa<-0.095,p>0.05)。 结论:通过数据和模型规模的数量级增加,我们为新生儿癫痫检测制定了一种新的最先进模型。对于完全未见数据的专家级等效性,这是该领域的首次,这表明该模型已准备好进行进一步的临床验证。

更新时间: 2024-05-16 08:59:20

领域: cs.LG,eess.SP,physics.med-ph

下载: http://arxiv.org/abs/2405.09911v1

A Machine Learning Approach for Simultaneous Demapping of QAM and APSK Constellations

As telecommunication systems evolve to meet increasing demands, integrating deep neural networks (DNNs) has shown promise in enhancing performance. However, the trade-off between accuracy and flexibility remains challenging when replacing traditional receivers with DNNs. This paper introduces a novel probabilistic framework that allows a single DNN demapper to demap multiple QAM and APSK constellations simultaneously. We also demonstrate that our framework allows exploiting hierarchical relationships in families of constellations. The consequence is that we need fewer neural network outputs to encode the same function without an increase in Bit Error Rate (BER). Our simulation results confirm that our method approaches the optimal demodulation error bound under an Additive White Gaussian Noise (AWGN) channel for multiple constellations. Thereby, we address multiple important issues in making DNNs flexible enough for practical use as receivers.

Updated: 2024-05-16 08:57:34

标题: 一种用于同时解调QAM和APSK星座图的机器学习方法

摘要: 随着电信系统不断发展以满足日益增长的需求,整合深度神经网络(DNNs)已显示出提高性能的潜力。然而,当用DNNs替换传统接收器时,精度和灵活性之间的权衡仍然具有挑战性。本文介绍了一种新颖的概率框架,允许单个DNN解调器同时解调多个QAM和APSK星座图。我们还展示了我们的框架允许利用星座族中的层次关系。其结果是我们需要较少的神经网络输出来编码相同的功能,而不会增加误比特率(BER)。我们的仿真结果证实,我们的方法在多星座下接近于加性白高斯噪声(AWGN)信道下的最佳解调误差界。因此,我们解决了使DNNs足够灵活以实际用作接收器的多个重要问题。

更新时间: 2024-05-16 08:57:34

领域: cs.LG,cs.AI,cs.IT,math.IT

下载: http://arxiv.org/abs/2405.09909v1

Federated Learning for Misbehaviour Detection with Variational Autoencoders and Gaussian Mixture Models

Federated Learning (FL) has become an attractive approach to collaboratively train Machine Learning (ML) models while data sources' privacy is still preserved. However, most existing FL approaches are based on supervised techniques, which could require resource-intensive activities and human intervention to obtain labelled datasets. Furthermore, in the scope of cyberattack detection, such techniques are not able to identify previously unknown threats. In this direction, this work proposes a novel unsupervised FL approach for the identification of potential misbehavior in vehicular environments. We leverage the computing capabilities of public cloud services for model aggregation purposes, and also as a central repository of misbehavior events, enabling cross-vehicle learning and collective defense strategies. Our solution integrates the use of Gaussian Mixture Models (GMM) and Variational Autoencoders (VAE) on the VeReMi dataset in a federated environment, where each vehicle is intended to train only with its own data. Furthermore, we use Restricted Boltzmann Machines (RBM) for pre-training purposes, and Fedplus as the aggregation function to enhance the model's convergence. Our approach provides better performance (more than 80 percent) compared to recent proposals, which are usually based on supervised techniques and artificial divisions of the VeReMi dataset.

Updated: 2024-05-16 08:49:50

标题: 使用变分自动编码器和高斯混合模型的联邦学习在异常检测中的应用

摘要: 联邦学习(FL)已经成为一种吸引人的方法,可以在保护数据源隐私的同时协作训练机器学习(ML)模型。然而,大多数现有的FL方法都基于监督技术,这可能需要资源密集型的活动和人为干预来获取标记的数据集。此外,在网络攻击检测范围内,这些技术无法识别先前未知的威胁。在这个方向上,这项工作提出了一种新颖的无监督FL方法,用于识别车辆环境中潜在的不当行为。我们利用公共云服务的计算能力用于模型聚合目的,同时作为不当行为事件的中央存储库,实现跨车辆学习和集体防御策略。我们的解决方案在联邦环境中使用高斯混合模型(GMM)和变分自动编码器(VAE)对VeReMi数据集进行训练,其中每辆车都只用自己的数据进行训练。此外,我们使用受限玻尔兹曼机(RBM)进行预训练,并使用Fedplus作为聚合函数来增强模型的收敛性。与通常基于监督技术和对VeReMi数据集进行人为划分的最近提议相比,我们的方法提供了更好的性能(超过80%)。

更新时间: 2024-05-16 08:49:50

领域: cs.LG,cs.DC

下载: http://arxiv.org/abs/2405.09903v1

Unveiling the Potential: Harnessing Deep Metric Learning to Circumvent Video Streaming Encryption

Encryption on the internet with the shift to HTTPS has been an important step to improve the privacy of internet users. However, there is an increasing body of work on extracting information from encrypted internet traffic without having to decrypt it. Such attacks bypass security guarantees assumed to be given by HTTPS and thus need to be understood. Prior works showed that the variable bitrates of video streams are sufficient to identify which video someone is watching. These works generally have to make trade-offs in aspects such as accuracy, scalability, robustness, etc. These trade-offs complicate the practical use of these attacks. To address this, we propose a deep metric learning framework based on the triplet loss method. Through this framework, we achieve robust, generalisable, scalable and transferable encrypted video stream detection. First, the triplet loss is better able to deal with video streams not seen during training. Second, our approach can accurately classify videos not seen during training. Third, we show that our method scales well to a dataset of over 1000 videos. Finally, we show that a model trained on video streams over Chrome can also classify streams over Firefox. Our results suggest that this side-channel attack is more broadly applicable than originally thought. We provide our code alongside a diverse and up-to-date dataset for future research.
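
A minimal sketch of triplet-loss training on traffic features (PyTorch assumed; the bitrate featurization is a toy stand-in): streams of the same video are pulled together in embedding space and different videos pushed apart, which is what allows classifying videos never seen during training by nearest-neighbour lookup.

    import torch
    import torch.nn as nn

    embed = nn.Sequential(nn.Linear(100, 64), nn.ReLU(), nn.Linear(64, 32))
    triplet = nn.TripletMarginLoss(margin=1.0)

    # Toy bitrate traces: anchor/positive from the same video, negative from
    # a different one.
    anchor, positive, negative = (torch.randn(8, 100) for _ in range(3))
    loss = triplet(embed(anchor), embed(positive), embed(negative))
    loss.backward()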

Updated: 2024-05-16 08:49:05

标题: 揭示潜力:利用深度度量学习规避视频流加密

摘要: 随着转向HTTPS,互联网上的加密已经成为改善互联网用户隐私的重要一步。然而,有越来越多的关于从加密互联网流量中提取信息而无需解密的研究成果。这些攻击绕过了HTTPS假定的安全保证,因此需要被理解。先前的研究表明,视频流的可变比特率足以识别某人正在观看的视频。这些研究通常需要在准确性、可扩展性、稳健性等方面做出权衡。这些权衡使得这些攻击的实际应用变得复杂。为此,我们提出了基于三元损失方法的深度度量学习框架。通过这个框架,我们实现了稳健、通用、可扩展和可转移的加密视频流检测。首先,三元损失更能够处理在训练期间未见过的视频流。其次,我们的方法可以准确分类在训练期间未见过的视频。第三,我们展示了我们的方法在超过1000个视频的数据集上的良好扩展性。最后,我们展示了在Chrome上训练的模型也可以对Firefox上的视频流进行分类。我们的结果表明,这种侧信道攻击比最初想象的更广泛适用。我们提供我们的代码以及一个多样化和最新的数据集供未来研究使用。

更新时间: 2024-05-16 08:49:05

领域: cs.CV,cs.AI,cs.CR

下载: http://arxiv.org/abs/2405.09902v1

Whole-Song Hierarchical Generation of Symbolic Music Using Cascaded Diffusion Models

Recent deep music generation studies have put much emphasis on long-term generation with structures. However, we are yet to see high-quality, well-structured whole-song generation. In this paper, we make the first attempt to model a full music piece under the realization of compositional hierarchy. With a focus on symbolic representations of pop songs, we define a hierarchical language, in which each level of hierarchy focuses on the semantics and context dependency at a certain music scope. The high-level languages reveal whole-song form, phrase, and cadence, whereas the low-level languages focus on notes, chords, and their local patterns. A cascaded diffusion model is trained to model the hierarchical language, where each level is conditioned on its upper levels. Experiments and analysis show that our model is capable of generating full-piece music with recognizable global verse-chorus structure and cadences, and the music quality is higher than the baselines. Additionally, we show that the proposed model is controllable in a flexible way. By sampling from the interpretable hierarchical languages or adjusting pre-trained external representations, users can control the music flow via various features such as phrase harmonic structures, rhythmic patterns, and accompaniment texture.

Updated: 2024-05-16 08:48:23

标题: 使用级联扩散模型进行符号音乐的整首歌层次生成

摘要: 最近的深度音乐生成研究非常注重具有结构的长期生成。然而,我们尚未看到高质量、结构完整的整首歌曲生成。在本文中,我们首次尝试在组合层次实现下建模完整的音乐作品。专注于流行歌曲的符号表示,我们定义了一个分层语言,其中每个层次的重点放在某个音乐范围的语义和上下文依赖上。高层语言揭示了整首歌曲的形式、乐句和终止音,而低层语言则专注于音符、和弦及其局部模式。训练了一个级联扩散模型来建模层次语言,其中每个层次都受其上层的条件约束。实验和分析表明,我们的模型能够生成具有可识别的全局副歌结构和终止音的完整音乐作品,音乐质量高于基线。此外,我们展示了所提出的模型以灵活方式可控。通过从可解释的分层语言中抽样或调整预训练的外部表示,用户可以通过各种功能控制音乐流,如乐句和谐结构、节奏模式和伴奏质地。

更新时间: 2024-05-16 08:48:23

领域: cs.SD,cs.AI,cs.LG,eess.AS,68Txx

下载: http://arxiv.org/abs/2405.09901v1

Towards Optimal Sobolev Norm Rates for the Vector-Valued Regularized Least-Squares Algorithm

We present the first optimal rates for infinite-dimensional vector-valued ridge regression on a continuous scale of norms that interpolate between $L_2$ and the hypothesis space, which we consider as a vector-valued reproducing kernel Hilbert space. These rates allow us to treat the misspecified case in which the true regression function is not contained in the hypothesis space. We combine standard assumptions on the capacity of the hypothesis space with a novel tensor product construction of vector-valued interpolation spaces in order to characterize the smoothness of the regression function. Our upper bound not only attains the same rate as real-valued kernel ridge regression, but also removes the assumption that the target regression function is bounded. For the lower bound, we reduce the problem to the scalar setting using a projection argument. We show that these rates are optimal in most cases and independent of the dimension of the output space. We illustrate our results for the special case of vector-valued Sobolev spaces.
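
For readers less familiar with the estimator being analysed, the sketch below implements vector-valued kernel ridge regression in its closed form with a shared scalar kernel. It illustrates the object whose rates are studied, not the paper's proofs; the kernel, bandwidth, and regularisation values are arbitrary placeholders.

```python
import numpy as np

def gaussian_kernel(X, Y, gamma=1.0):
    # k(x, y) = exp(-gamma * ||x - y||^2)
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def krr_fit_predict(X, Y, X_test, lam=1e-2, gamma=1.0):
    """Vector-valued kernel ridge regression with a scalar kernel: each
    output coordinate shares the same kernel, so the closed form
    f(x) = k(x, X) (K + n*lam*I)^{-1} Y applies column-wise."""
    n = X.shape[0]
    K = gaussian_kernel(X, X, gamma)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), Y)  # (n, output_dim)
    return gaussian_kernel(X_test, X, gamma) @ alpha

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (100, 2))
Y = np.c_[np.sin(X[:, 0]), np.cos(X[:, 1])]   # 2-dimensional output
print(krr_fit_predict(X, Y, X[:5]))
```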

Updated: 2024-05-16 08:41:32

标题: 朝向矢量值正则化最小二乘算法的最优Sobolev范数速率

摘要: 我们提出了第一个针对无限维向量值岭回归的最优速率,这些速率处于一个连续的范数尺度上,介于$L_2$和假设空间之间,我们将假设空间视为一个向量值再生核希尔伯特空间。这些速率允许处理真实回归函数不包含在假设空间中的误差情况。我们将对假设空间的容量的标准假设与一种新颖的向量值插值空间的张量积构造相结合,以描述回归函数的平滑性。我们的上界不仅达到与实值核岭回归相同的速率,还消除了目标回归函数有界的假设。对于下界,我们使用投影论证将问题缩减到标量设置。我们展示这些速率在大多数情况下是最优的,并且与输出空间的维度无关。我们用向量值Sobolev空间的特殊情况说明了我们的结果。

更新时间: 2024-05-16 08:41:32

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2312.07186v4

Image Clustering with External Guidance

The core of clustering is incorporating prior knowledge to construct supervision signals. From classic k-means based on data compactness to recent contrastive clustering guided by self-supervision, the evolution of clustering methods intrinsically corresponds to the progression of supervision signals. At present, substantial efforts have been devoted to mining internal supervision signals from data. Nevertheless, the abundant external knowledge such as semantic descriptions, which naturally conduces to clustering, is regrettably overlooked. In this work, we propose leveraging external knowledge as a new supervision signal to guide clustering, even though it seems irrelevant to the given data. To implement and validate our idea, we design an externally guided clustering method (Text-Aided Clustering, TAC), which leverages the textual semantics of WordNet to facilitate image clustering. Specifically, TAC first selects and retrieves WordNet nouns that best distinguish images to enhance the feature discriminability. Then, to improve image clustering performance, TAC collaborates text and image modalities by mutually distilling cross-modal neighborhood information. Experiments demonstrate that TAC achieves state-of-the-art performance on five widely used and three more challenging image clustering benchmarks, including the full ImageNet-1K dataset.

Updated: 2024-05-16 08:41:14

标题: 带有外部指导的图像聚类

摘要: 聚类的核心是整合先验知识来构建监督信号。从基于数据紧凑性的经典k均值到最近由自我监督引导的对比聚类,聚类方法的演变从根本上对应着监督信号的进展。目前,大量工作已致力于从数据中挖掘内部监督信号。然而,丰富的外部知识,如语义描述,自然有助于聚类,却遗憾地被忽视。在这项工作中,我们提出利用外部知识作为新的监督信号来指导聚类,即使它看似与给定数据无关。为了实现和验证我们的想法,我们设计了一种外部引导聚类方法(Text-Aided Clustering,TAC),利用WordNet的文本语义来促进图像聚类。具体而言,TAC首先选择并检索最能区分图像的WordNet名词以增强特征可辨识性。然后,为了提高图像聚类性能,TAC通过相互提炼跨模态邻域信息来协同文本和图像模态。实验证明,TAC在包括完整ImageNet-1K数据集在内的五个广泛使用和三个更具挑战性的图像聚类基准上取得了最先进的性能。

更新时间: 2024-05-16 08:41:14

领域: cs.LG

下载: http://arxiv.org/abs/2310.11989v2

Querying Easily Flip-flopped Samples for Deep Active Learning

Active learning is a machine learning paradigm that aims to improve the performance of a model by strategically selecting and querying unlabeled data. One effective selection strategy is to base it on the model's predictive uncertainty, which can be interpreted as a measure of how informative a sample is. The sample's distance to the decision boundary is a natural measure of predictive uncertainty, but it is often intractable to compute, especially for complex decision boundaries formed in multiclass classification tasks. To address this issue, this paper proposes the least disagree metric (LDM), defined as the smallest probability of disagreement of the predicted label, and an estimator for LDM proven to be asymptotically consistent under mild assumptions. The estimator is computationally efficient and can be easily implemented for deep learning models using parameter perturbation. The LDM-based active learning is performed by querying unlabeled data with the smallest LDM. Experimental results show that our LDM-based active learning algorithm obtains state-of-the-art overall performance on all considered datasets and deep architectures.
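
The sketch below conveys the parameter-perturbation idea with a logistic-regression stand-in: perturb the weights with Gaussian noise and count how often the predicted label flips. The noise scale, number of perturbations, and the selection rule here are loose illustrative proxies, not the paper's exact LDM estimator.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.3 * rng.normal(size=200) > 0).astype(int)
model = LogisticRegression().fit(X, y)

def disagreement_rate(model, X_pool, sigma, n_samples=50):
    """Fraction of perturbed models whose predicted label flips, per sample."""
    base = model.predict(X_pool)
    flips = np.zeros(len(X_pool))
    for _ in range(n_samples):
        W = model.coef_ + sigma * rng.normal(size=model.coef_.shape)
        b = model.intercept_ + sigma * rng.normal(size=model.intercept_.shape)
        pred = ((X_pool @ W.T + b).ravel() > 0).astype(int)
        flips += pred != base
    return flips / n_samples

# Samples whose predicted label flips most easily under small parameter
# noise lie close to the decision boundary; query those first. (This is a
# loose proxy for the paper's LDM estimator, not its exact definition.)
X_pool = rng.normal(size=(500, 5))
rates = disagreement_rate(model, X_pool, sigma=0.1)
query_idx = np.argsort(-rates)[:10]
print(query_idx)
```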

Updated: 2024-05-16 08:36:30

标题: 轻松查询深度主动学习中易翻转的样本

摘要: 主动学习是一种机器学习范式,旨在通过策略性地选择和查询未标记数据来改善模型的性能。一种有效的选择策略是基于模型的预测不确定性,这可以解释为样本信息量的度量。样本到决策边界的距离是预测不确定性的自然度量,但通常难以计算,特别是对于多类分类任务中形成的复杂决策边界。为了解决这个问题,本文提出了“最小不同意度度量”(LDM),定义为预测标签的不同意度的最小概率,并证明了LDM的估计器在温和假设下渐近一致。该估计器计算效率高,可以通过参数扰动轻松实现深度学习模型。基于LDM的主动学习通过查询具有最小LDM的未标记数据来执行。实验结果表明,我们基于LDM的主动学习算法在所有考虑的数据集和深度架构上均获得了最先进的整体性能。

更新时间: 2024-05-16 08:36:30

领域: cs.LG,cs.AI,stat.ML

下载: http://arxiv.org/abs/2401.09787v2

AnglE-optimized Text Embeddings

High-quality text embedding is pivotal in improving semantic textual similarity (STS) tasks, which are crucial components in Large Language Model (LLM) applications. However, a common challenge existing text embedding models face is the problem of vanishing gradients, primarily due to their reliance on the cosine function in the optimization objective, which has saturation zones. To address this issue, this paper proposes a novel angle-optimized text embedding model called AnglE. The core idea of AnglE is to introduce angle optimization in a complex space. This novel approach effectively mitigates the adverse effects of the saturation zone in the cosine function, which can impede gradient flow and hinder the optimization process. To set up a comprehensive STS evaluation, we experimented on existing short-text STS datasets and a newly collected long-text STS dataset from GitHub Issues. Furthermore, we examine domain-specific STS scenarios with limited labeled data and explore how AnglE works with LLM-annotated data. Extensive experiments were conducted on various tasks including short-text STS, long-text STS, and domain-specific STS tasks. The results show that AnglE outperforms the state-of-the-art (SOTA) STS models that ignore the cosine saturation zone. These findings demonstrate the ability of AnglE to generate high-quality text embeddings and the usefulness of angle optimization in STS.

Updated: 2024-05-16 08:21:54

标题: 角度优化的文本嵌入

摘要: 高质量的文本嵌入在改进语义文本相似性(STS)任务中至关重要,这些任务是大型语言模型(LLM)应用的关键组成部分。然而,现存的文本嵌入模型面临的一个普遍挑战是梯度消失的问题,主要是由于它们依赖于在优化目标中的余弦函数,该函数具有饱和区域。为了解决这个问题,本文提出了一种新颖的角度优化文本嵌入模型,称为AnglE。AnglE的核心思想是在复杂空间中引入角度优化。这种新颖方法有效地缓解了余弦函数中饱和区域的不利影响,这可能会阻碍梯度并阻碍优化过程。为了建立一个全面的STS评估,我们在现有的短文本STS数据集和从GitHub Issues收集的新的长文本STS数据集上进行了实验。此外,我们检查了具有有限标记数据的特定领域STS场景,并探讨了AnglE如何与LLM注释数据一起使用。我们在各种任务上进行了广泛的实验,包括短文本STS、长文本STS和特定领域的STS任务。结果表明,AnglE优于忽略余弦饱和区域的最先进(SOTA)STS模型。这些发现证明了AnglE生成高质量文本嵌入的能力以及角度优化在STS中的有用性。

更新时间: 2024-05-16 08:21:54

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2309.12871v7

Multi-Scale Protein Language Model for Unified Molecular Modeling

Protein language models have demonstrated significant potential in the field of protein engineering. However, current protein language models primarily operate at the residue scale, which limits their ability to provide information at the atom level. This limitation prevents us from fully exploiting the capabilities of protein language models for applications involving both proteins and small molecules. In this paper, we propose ESM-AA (ESM All-Atom), a novel approach that enables atom-scale and residue-scale unified molecular modeling. ESM-AA achieves this by pre-training on multi-scale code-switch protein sequences and utilizing a multi-scale position encoding to capture relationships among residues and atoms. Experimental results indicate that ESM-AA surpasses previous methods in protein-molecule tasks, demonstrating the full utilization of protein language models. Further investigations reveal that through unified molecular modeling, ESM-AA not only gains molecular knowledge but also retains its understanding of proteins.

Updated: 2024-05-16 08:21:11

标题: 多尺度蛋白质语言模型用于统一分子建模

摘要: 蛋白质语言模型在蛋白工程领域展现出显著潜力。然而,目前的蛋白质语言模型主要在残基尺度上运行,这限制了它们在原子水平提供信息的能力。这种限制阻碍了我们充分利用蛋白质语言模型在涉及蛋白质和小分子的应用中的能力。在本文中,我们提出了ESM-AA(ESM全原子),这是一种新颖的方法,可以实现原子尺度和残基尺度统一的分子建模。ESM-AA通过在多尺度编码切换蛋白质序列上进行预训练,并利用多尺度位置编码来捕捉残基和原子之间的关系来实现这一目标。实验结果表明,ESM-AA在蛋白质-分子任务中超越了先前的方法,展示了蛋白质语言模型的充分利用。进一步的研究揭示,通过统一的分子建模,ESM-AA不仅获得了分子知识,而且保留了对蛋白质的理解。

更新时间: 2024-05-16 08:21:11

领域: q-bio.BM,cs.CE,cs.LG

下载: http://arxiv.org/abs/2403.12995v2

"Hunt Takes Hare": Theming Games Through Game-Word Vector Translation

A game's theme is an important part of its design -- it conveys narrative information, rhetorical messages, helps the player intuit strategies, aids in tutorialisation and more. Thematic elements of games are notoriously difficult for AI systems to understand and manipulate, however, and often rely on large amounts of hand-written interpretations and knowledge. In this paper we present a technique which connects game embeddings, a recent method for modelling game dynamics from log data, and word embeddings, which models semantic information about language. We explain two different approaches for using game embeddings in this way, and show evidence that game embeddings enhance the linguistic translations of game concepts from one theme to another, opening up exciting new possibilities for reasoning about the thematic elements of games in the future.
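
The title's "hunt takes hare" hints at analogy-style translation between themes on the word-vector side. The toy sketch below shows that side of the mapping with made-up four-dimensional vectors; the paper's actual contribution couples such word embeddings with game embeddings, which are omitted here.

```python
import numpy as np

# Toy "word embeddings" (a real system would load pretrained vectors such
# as word2vec or GloVe; these numbers are illustrative only).
emb = {
    "wolf":   np.array([0.9, 0.1, 0.2, 0.0]),
    "sheep":  np.array([0.1, 0.9, 0.2, 0.0]),
    "hunter": np.array([0.9, 0.1, 0.0, 0.8]),
    "hare":   np.array([0.1, 0.8, 0.1, 0.7]),
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Analogy: wolf is to sheep (predator/prey in one theme) as hunter is to ?
target = emb["hunter"] - emb["wolf"] + emb["sheep"]
best = max((w for w in emb if w != "hunter"),
           key=lambda w: cosine(emb[w], target))
print(best)  # -> "hare" with these toy vectors
```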

Updated: 2024-05-16 08:19:11

标题: "Hunt Takes Hare": 通过游戏词向量翻译对游戏进行主题化

摘要: 一个游戏的主题是其设计的重要部分——它传达叙事信息、修辞信息,帮助玩家直观地理解策略,辅助教程等。游戏的主题元素对于人工智能系统来说常常难以理解和操作,并且通常依赖大量手工编写的解释和知识。在本文中,我们提出了一种将游戏嵌入(一种最近用于建模游戏动态的方法)和单词嵌入(用于建模语言的语义信息)相连接的技术。我们解释了两种不同的使用游戏嵌入的方式,并展示证据表明游戏嵌入增强了从一个主题到另一个主题的游戏概念的语言翻译,为未来关于游戏主题元素推理的激动人心的新可能性打开了大门。

更新时间: 2024-05-16 08:19:11

领域: cs.AI,cs.CL

下载: http://arxiv.org/abs/2405.09893v1

Balancing Similarity and Complementarity for Federated Learning

In mobile and IoT systems, Federated Learning (FL) is increasingly important for effectively using data while maintaining user privacy. One key challenge in FL is managing statistical heterogeneity, such as non-i.i.d. data, arising from numerous clients and diverse data sources. This requires strategic cooperation, often with clients having similar characteristics. However, we are interested in a fundamental question: does achieving optimal cooperation necessarily entail cooperating with the most similar clients? Typically, significant model performance improvements are often realized not by partnering with the most similar models, but through leveraging complementary data. Our theoretical and empirical analyses suggest that optimal cooperation is achieved by enhancing complementarity in feature distribution while restricting the disparity in the correlation between features and targets. Accordingly, we introduce a novel framework, FedSaC, which balances similarity and complementarity in FL cooperation. Our framework aims to approximate an optimal cooperation network for each client by optimizing a weighted sum of model similarity and feature complementarity. The strength of FedSaC lies in its adaptability to various levels of data heterogeneity and multimodal scenarios. Our comprehensive unimodal and multimodal experiments demonstrate that FedSaC markedly surpasses other state-of-the-art FL methods.
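
The weighted-sum scoring idea is easy to state in code. In the sketch below, the similarity and complementarity metrics (cosine of parameter vectors, one minus cosine of feature means) and the weight alpha are illustrative stand-ins; the paper optimizes a full cooperation network rather than scoring clients independently.

```python
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def cooperation_scores(my_params, my_feat_mean, others, alpha=0.5):
    """Score each candidate client by
    alpha * model similarity + (1 - alpha) * feature complementarity."""
    scores = {}
    for cid, (params, feat_mean) in others.items():
        similarity = cosine(my_params, params)                   # model side
        complementarity = 1.0 - cosine(my_feat_mean, feat_mean)  # feature side
        scores[cid] = alpha * similarity + (1 - alpha) * complementarity
    return scores

rng = np.random.default_rng(1)
me = rng.normal(size=100), rng.normal(size=16)
others = {f"client_{i}": (rng.normal(size=100), rng.normal(size=16))
          for i in range(5)}
print(cooperation_scores(*me, others))
```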

Updated: 2024-05-16 08:16:19

标题: 在联邦学习中平衡相似性和互补性

摘要: 在移动和物联网系统中,联邦学习(FL)在有效利用数据同时保护用户隐私方面变得越来越重要。FL中的一个关键挑战是管理统计异质性,例如来自众多客户和不同数据源的非独立同分布数据。这需要战略性合作,通常与具有相似特征的客户合作。然而,我们对一个基本问题感兴趣:实现最佳合作是否必然意味着与最相似的客户合作?通常,实现显著的模型性能改进并不是通过与最相似的模型合作,而是通过利用互补数据。我们的理论和实证分析表明,通过增强特征分布的互补性并限制特征与目标之间的相关性差异,可以实现最佳合作。因此,我们引入了一个新的框架\texttt{FedSaC},在FL合作中平衡相似性和互补性。我们的框架旨在通过优化模型相似性和特征互补性的加权和来近似为每个客户构建一个最佳合作网络。\texttt{FedSaC}的优势在于其适应各种数据异质性和多模态场景。我们的全面的单模态和多模态实验表明,\texttt{FedSaC}明显优于其他最先进的FL方法。

更新时间: 2024-05-16 08:16:19

领域: cs.LG,cs.DC

下载: http://arxiv.org/abs/2405.09892v1

Deep Regression Representation Learning with Topology

Most works studying representation learning focus only on classification and neglect regression. Yet, the learning objectives and, therefore, the representation topologies of the two tasks are fundamentally different: classification targets class separation, leading to disconnected representations, whereas regression requires ordinality with respect to the target, leading to continuous representations. We thus wonder how the effectiveness of a regression representation is influenced by its topology, with evaluation based on the Information Bottleneck (IB) principle. The IB principle is an important framework that provides principles for learning effective representations. We establish two connections between it and the topology of regression representations. The first connection reveals that a lower intrinsic dimension of the feature space implies a reduced complexity of the representation Z. This complexity can be quantified as the conditional entropy of Z on the target Y, and serves as an upper bound on the generalization error. The second connection suggests a feature space that is topologically similar to the target space will better align with the IB principle. Based on these two connections, we introduce PH-Reg, a regularizer specific to regression that matches the intrinsic dimension and topology of the feature space with the target space. Experiments on synthetic and real-world regression tasks demonstrate the benefits of PH-Reg. Code: https://github.com/needylove/PH-Reg.

Updated: 2024-05-16 08:16:04

标题: 使用拓扑学进行深度回归表示学习

摘要: 大多数研究代表学习的工作只关注分类,而忽视了回归。然而,两个任务的学习目标和因此表示拓扑是根本不同的:分类目标是类别分离,导致不连通的表示,而回归则要求相对于目标的有序性,导致连续表示。因此,我们想知道回归表示的有效性如何受其拓扑结构的影响,评估基于信息瓶颈(IB)原则。信息瓶颈原则是一个重要框架,为学习有效的表示提供原则。我们建立了它与回归表示拓扑之间的两种连接。第一个连接揭示了特征空间的较低固有维度意味着表示Z的复杂性降低。这种复杂性可以量化为Z在目标Y上的条件熵,并作为泛化误差的上界。第二个连接表明,与目标空间拓扑相似的特征空间将更好地与信息瓶颈原则相一致。基于这两种连接,我们引入了PH-Reg,一种特定于回归的正则化器,将特征空间的固有维度和拓扑与目标空间匹配。对合成和真实回归任务的实验表明了PH-Reg的好处。代码:https://github.com/needylove/PH-Reg。

更新时间: 2024-05-16 08:16:04

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2404.13904v4

MTLComb: multi-task learning combining regression and classification tasks for joint feature selection

Multi-task learning (MTL) is a learning paradigm that enables the simultaneous training of multiple communicating algorithms. Although MTL has been successfully applied to either regression or classification tasks alone, incorporating mixed types of tasks into a unified MTL framework remains challenging, primarily due to variations in the magnitudes of losses associated with different tasks. This challenge, particularly evident in MTL applications with joint feature selection, often results in biased selections. To overcome this obstacle, we propose a provable loss weighting scheme that analytically determines the optimal weights for balancing regression and classification tasks. This scheme significantly mitigates the otherwise biased feature selection. Building upon this scheme, we introduce MTLComb, an MTL algorithm and software package encompassing optimization procedures, training protocols, and hyperparameter estimation procedures. MTLComb is designed for learning shared predictors among tasks of mixed types. To showcase the efficacy of MTLComb, we conduct tests on both simulated data and biomedical studies pertaining to sepsis and schizophrenia.
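
A minimal sketch of the balancing problem is shown below. MTLComb derives the weights analytically; here a simple normalize-by-initial-magnitude heuristic stands in for the paper's provable scheme, just to illustrate why unweighted mixed losses bias joint feature selection.

```python
import torch
import torch.nn.functional as F

def balanced_multitask_loss(reg_pred, reg_y, clf_logit, clf_y, w_reg, w_clf):
    """Weighted sum of a regression and a classification loss. Without
    weighting, the loss with the larger magnitude dominates the shared
    (feature-selecting) parameters."""
    reg_loss = F.mse_loss(reg_pred, reg_y)
    clf_loss = F.binary_cross_entropy_with_logits(clf_logit, clf_y)
    return w_reg * reg_loss + w_clf * clf_loss

# Stand-in heuristic: weights from initial loss magnitudes (an assumption;
# the paper computes optimal weights analytically).
reg_pred, reg_y = torch.randn(64), torch.randn(64)
clf_logit, clf_y = torch.randn(64), torch.randint(0, 2, (64,)).float()
with torch.no_grad():
    l_r = F.mse_loss(reg_pred, reg_y)
    l_c = F.binary_cross_entropy_with_logits(clf_logit, clf_y)
w_reg, w_clf = 1.0 / l_r.item(), 1.0 / l_c.item()
loss = balanced_multitask_loss(reg_pred, reg_y, clf_logit, clf_y, w_reg, w_clf)
print(loss)
```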

Updated: 2024-05-16 08:07:25

标题: MTLComb:将回归和分类任务结合的多任务学习,用于联合特征选择

摘要: 多任务学习(MTL)是一种学习范式,它使多个相互通信的算法能够同时训练。尽管MTL已成功应用于回归或分类任务中的任一种,但将混合类型的任务纳入统一的MTL框架仍然具有挑战性,主要是因为不同任务相关损失的幅度变化。这种挑战在具有联合特征选择的MTL应用中尤为明显,通常导致选择偏向。为了克服这一障碍,我们提出了一种可证明的损失加权方案,可以分析确定平衡回归和分类任务的最佳权重。该方案显著减轻了否则会出现的偏向特征选择。基于这一方案,我们引入了MTLComb,这是一种MTL算法和软件包,包括优化过程、训练协议和超参数估计过程。MTLComb旨在学习混合类型任务之间的共享预测器。为展示MTLComb的有效性,我们在模拟数据和有关败血症和精神分裂症的生物医学研究中进行了测试。

更新时间: 2024-05-16 08:07:25

领域: cs.LG,cs.AI,q-bio.BM,J.3; I.2.6

下载: http://arxiv.org/abs/2405.09886v1

Testing the Segment Anything Model on radiology data

Deep learning models trained with large amounts of data have become a recent and effective approach to predictive problem solving -- these have become known as "foundation models" as they can be used as fundamental tools for other applications. While the paramount examples of image classification (earlier) and large language models (more recently) led the way, the Segment Anything Model (SAM) was recently proposed and stands as the first foundation model for image segmentation, trained on over 10 million images and with recourse to over 1 billion masks. However, the question remains -- what are the limits of this foundation? Given that magnetic resonance imaging (MRI) stands as an important method of diagnosis, we sought to understand whether SAM could be used for a few tasks of zero-shot segmentation using MRI data. Particularly, we wanted to know if selecting masks from the pool of SAM predictions could lead to good segmentations. Here, we provide a critical assessment of the performance of SAM on magnetic resonance imaging data. We show that, while acceptable in a very limited set of cases, the overall trend implies that these models are insufficient for MRI segmentation across the whole volume, but can provide good segmentations in a few, specific slices. More importantly, we note that while foundation models trained on natural images are set to become key aspects of predictive modelling, they may prove ineffective when used on other imaging modalities.
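
The "select masks from the pool" evaluation can be sketched with the public segment-anything package, as below. The checkpoint path is a placeholder, the slice must be converted to an HxWx3 uint8 array, and Dice against a reference annotation is one plausible selection criterion; treat this as an assumed setup, not the paper's exact protocol.

```python
import numpy as np
# pip install segment-anything; the checkpoint path below is a placeholder.
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")  # hypothetical path
mask_generator = SamAutomaticMaskGenerator(sam)

def dice(a, b):
    inter = np.logical_and(a, b).sum()
    return 2.0 * inter / (a.sum() + b.sum() + 1e-8)

def best_mask_for_slice(mri_slice_rgb, reference_mask):
    """Generate SAM's full pool of candidate masks for one MRI slice
    (uint8 HxWx3) and pick the candidate with the highest Dice overlap
    with a reference annotation."""
    candidates = mask_generator.generate(mri_slice_rgb)  # list of dicts
    return max((c["segmentation"] for c in candidates),
               key=lambda m: dice(m, reference_mask))
```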

Updated: 2024-05-16 08:06:44

标题: 在放射学数据上测试Segment Anything模型

摘要: 训练了大量数据的深度学习模型已成为最近一种有效的预测问题解决方法--这些模型被称为“基础模型”,因为它们可以作为其他应用的基本工具。虽然图像分类(早期)和大型语言模型(最近)是主要的例子,但最近提出了分割任意模型(SAM),它是第一个基础模型,用于图像分割,训练了超过1000万张图像并使用了超过10亿个蒙版。然而,问题仍然存在--这个基础的限制是什么?鉴于磁共振成像(MRI)作为一种重要的诊断方法,我们试图了解SAM是否可以用于零样本分割MRI数据的一些任务。特别是,我们想知道从SAM预测的蒙版池中选择蒙版是否能够导致良好的分割。 在这里,我们对SAM在磁共振成像数据上的性能进行了批判性评估。我们表明,尽管在非常有限的情况下是可接受的,但总体趋势暗示这些模型对于整个体积的MRI分割是不足够的,但可以在一些特定的切片中提供良好的分割。更重要的是,我们注意到,虽然在自然图像上训练的基础模型将成为预测建模的关键方面,但当用于其他成像模式时,它们可能会证明无效。

更新时间: 2024-05-16 08:06:44

领域: eess.IV,cs.CV,cs.LG

下载: http://arxiv.org/abs/2312.12880v2

DiffAM: Diffusion-based Adversarial Makeup Transfer for Facial Privacy Protection

With the rapid development of face recognition (FR) systems, the privacy of face images on social media is facing severe challenges due to the abuse of unauthorized FR systems. Some studies utilize adversarial attack techniques to defend against malicious FR systems by generating adversarial examples. However, the generated adversarial examples, i.e., the protected face images, tend to suffer from subpar visual quality and low transferability. In this paper, we propose a novel face protection approach, dubbed DiffAM, which leverages the powerful generative ability of diffusion models to generate high-quality protected face images with adversarial makeup transferred from reference images. To be specific, we first introduce a makeup removal module to generate non-makeup images utilizing a fine-tuned diffusion model with guidance of textual prompts in CLIP space. As the inverse process of makeup transfer, makeup removal can make it easier to establish the deterministic relationship between makeup domain and non-makeup domain regardless of elaborate text prompts. Then, with this relationship, a CLIP-based makeup loss along with an ensemble attack strategy is introduced to jointly guide the direction of adversarial makeup domain, achieving the generation of protected face images with natural-looking makeup and high black-box transferability. Extensive experiments demonstrate that DiffAM achieves higher visual quality and attack success rates with a gain of 12.98% under black-box setting compared with the state of the arts. The code will be available at https://github.com/HansSunY/DiffAM.

Updated: 2024-05-16 08:05:36

标题: DiffAM:基于扩散的对抗性化妆迁移技术用于面部隐私保护

摘要: 随着人脸识别系统的快速发展,社交媒体上的人脸图像隐私面临着严重挑战,原因是未经授权的人脸识别系统的滥用。一些研究利用对抗攻击技术来抵御恶意人脸识别系统,生成对抗性示例。然而,生成的对抗性示例,即受保护的人脸图像,往往存在视觉质量不佳和低可传递性的问题。在本文中,我们提出了一种新颖的人脸保护方法,称为DiffAM,利用扩散模型强大的生成能力,从参考图像转移生成高质量的受保护人脸图像。具体来说,我们首先引入了一个化妆去除模块,利用在CLIP空间中的文本提示指导下使用经过微调的扩散模型生成非化妆图像。作为化妆转移的逆过程,化妆去除可以更容易地建立化妆领域和非化妆领域之间的确定性关系,而不受精心设计的文本提示的影响。然后,利用这种关系,引入了基于CLIP的化妆损失以及一个集成攻击策略,共同指导对抗性化妆领域的方向,实现生成具有自然妆容和高黑盒可传递性的受保护人脸图像。大量实验表明,DiffAM在黑盒设置下相比现有技术取得了12.98%的攻击成功率和更高的视觉质量。代码可在https://github.com/HansSunY/DiffAM上找到。

更新时间: 2024-05-16 08:05:36

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2405.09882v1

Generative Unlearning for Any Identity

Recent advances in generative models trained on large-scale datasets have made it possible to synthesize high-quality samples across various domains. Moreover, the emergence of strong inversion networks enables not only a reconstruction of real-world images but also the modification of attributes through various editing methods. However, in certain domains related to privacy issues, e.g., human faces, advanced generative models along with strong inversion methods can lead to potential misuses. In this paper, we propose an essential yet under-explored task called generative identity unlearning, which steers the model not to generate an image of a specific identity. In the generative identity unlearning, we target the following objectives: (i) preventing the generation of images with a certain identity, and (ii) preserving the overall quality of the generative model. To satisfy these goals, we propose a novel framework, Generative Unlearning for Any Identity (GUIDE), which prevents the reconstruction of a specific identity by unlearning the generator with only a single image. GUIDE consists of two parts: (i) finding a target point for optimization that un-identifies the source latent code and (ii) novel loss functions that facilitate the unlearning procedure while less affecting the learned distribution. Our extensive experiments demonstrate that our proposed method achieves state-of-the-art performance in the generative machine unlearning task. The code is available at https://github.com/KHU-AGI/GUIDE.

Updated: 2024-05-16 08:00:55

标题: 任何身份的生成式遗忘

摘要: 最近在大规模数据集上训练的生成模型的进展使得在各个领域中合成高质量样本成为可能。此外,强大的反演网络的出现不仅使得对真实世界图像的重建成为可能,还可以通过各种编辑方法修改属性。然而,在涉及隐私问题的某些领域,例如人脸,先进的生成模型以及强大的反演方法可能导致潜在的滥用。在本文中,我们提出了一个重要但尚未深入研究的任务,称为生成身份遗忘,该任务引导模型不生成特定身份的图像。在生成身份遗忘中,我们的目标是:(i) 防止生成具有某种身份的图像,以及 (ii) 保留生成模型的整体质量。为了实现这些目标,我们提出了一个新的框架,名为任何身份的生成遗忘(GUIDE),通过仅使用一张图片就可以遗忘生成器,从而阻止特定身份的重建。GUIDE由两部分组成:(i) 找到一个优化的目标点,使源潜在代码不可识别,以及 (ii) 新颖的损失函数,促进遗忘过程,同时不太影响学习到的分布。我们的广泛实验证明,我们提出的方法在生成机器遗忘任务中实现了最先进的性能。代码可在https://github.com/KHU-AGI/GUIDE 获取。

更新时间: 2024-05-16 08:00:55

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2405.09879v1

Hyperplane Arrangements and Fixed Points in Iterated PWL Neural Networks

We leverage the framework of hyperplane arrangements to analyze potential regions of (stable) fixed points. We provide an upper bound on the number of fixed points for multi-layer neural networks equipped with piecewise linear (PWL) activation functions with arbitrarily many linear pieces. We show that the exponential growth of this bound in the number of layers is theoretically optimal. Specifically, we also derive a sharper upper bound on the number of stable fixed points for one-hidden-layer networks with hard tanh activation.
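
As a numerical companion to the combinatorial bounds, the sketch below locates stable fixed points of a random one-hidden-layer hard-tanh network by iterating the map from many starting points and deduplicating; the network sizes, weight scales, and tolerances are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 2)), rng.normal(size=8)          # hidden layer
W2, b2 = rng.normal(size=(2, 8)) * 0.3, rng.normal(size=2) * 0.1

def hard_tanh(x):
    return np.clip(x, -1.0, 1.0)  # a PWL activation with three linear pieces

def f(x):
    return W2 @ hard_tanh(W1 @ x + b1) + b2

# The hyperplanes w_i^T x + b_i = +/-1 partition the input space into cells
# on which f is affine; each cell can hold at most one fixed point. Here we
# simply iterate from many starts (converging only to stable fixed points).
fixed_points = []
for _ in range(300):
    x = rng.uniform(-3, 3, size=2)
    for _ in range(1000):
        x = f(x)
    if np.linalg.norm(f(x) - x) < 1e-8:
        if not any(np.linalg.norm(x - p) < 1e-6 for p in fixed_points):
            fixed_points.append(x.copy())
print(len(fixed_points), "stable fixed point(s) found numerically")
```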

Updated: 2024-05-16 07:57:31

标题: 超平面排列和迭代PWL神经网络中的固定点

摘要: 我们利用超平面排列的框架来分析(稳定)固定点的潜在区域。我们为多层神经网络配备具有任意数量线性片段的分段线性(PWL)激活函数提供了固定点数量的上限。后一限制中层数量的指数增长的理论最优性已被证明。具体地,我们还推导了具有硬tanh激活函数的单隐藏层网络稳定固定点数量的更尖锐的上限。

更新时间: 2024-05-16 07:57:31

领域: cs.LG,cs.AI,stat.ML,68T07,G.0

下载: http://arxiv.org/abs/2405.09878v1

Risk Management for Medical Devices via the Riskman Ontology & Shapes

We introduce the Riskman ontology & shapes for representing and analysing information about risk management for medical devices. Risk management is concerned with taking necessary precautions so a medical device does not cause harm to users or the environment. To date, risk management documentation is submitted to notified bodies (for certification) in the form of semi-structured natural language text. We propose to use classes from the Riskman ontology to logically model risk management documentation and to use the included SHACL constraints to check for syntactic completeness and conformity to relevant standards. In particular, the ontology is modelled after ISO 14971 and the recently published VDE Spec 90025. Our proposed methodology has the potential to save many person-hours for both manufacturers (when creating risk management documentation) as well as notified bodies (when assessing submitted applications for certification), and thus offers considerable benefits for healthcare and, by extension, society as a whole.
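
The general SHACL-checking pattern can be demonstrated with rdflib and pySHACL, as below. The IRIs are invented stand-ins, not terms from the actual Riskman vocabulary; the toy shape merely requires every risk to carry a documented mitigation.

```python
from rdflib import Graph
from pyshacl import validate

# Toy vocabulary for illustration: the IRIs below are NOT the actual
# Riskman ontology terms, just stand-ins showing the validation pattern.
shapes = Graph().parse(data="""
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix ex: <http://example.org/riskman#> .
ex:RiskShape a sh:NodeShape ;
    sh:targetClass ex:Risk ;
    sh:property [ sh:path ex:mitigatedBy ; sh:minCount 1 ] .
""", format="turtle")

data = Graph().parse(data="""
@prefix ex: <http://example.org/riskman#> .
ex:sharpEdge a ex:Risk .   # no ex:mitigatedBy -> incomplete documentation
""", format="turtle")

conforms, _, report = validate(data, shacl_graph=shapes)
print(conforms)   # False: the risk lacks a documented mitigation
print(report)
```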

Updated: 2024-05-16 07:53:07

标题: 医疗器械风险管理:通过Riskman本体论和形状

摘要: 我们引入了Riskman本体和形状,用于表示和分析有关医疗设备风险管理的信息。风险管理涉及采取必要预防措施,以确保医疗设备不会对用户或环境造成伤害。到目前为止,风险管理文档以半结构化自然语言文本的形式提交给通知机构(用于认证)。我们建议使用Riskman本体中的类来逻辑建模风险管理文档,并使用包含的SHACL约束来检查语法完整性和符合相关标准。特别是,该本体是根据ISO 14971和最近发布的VDE Spec 90025进行建模的。我们提出的方法论有潜力为制造商(创建风险管理文档时)以及通知机构(评估提交的认证申请时)节省大量人力,并因此为医疗保健以及整个社会带来可观的好处。

更新时间: 2024-05-16 07:53:07

领域: cs.AI,cs.CY

下载: http://arxiv.org/abs/2405.09875v1

FSL-Rectifier: Rectify Outliers in Few-Shot Learning via Test-Time Augmentation

Few-shot learning (FSL) commonly requires a model to identify images (queries) that belong to classes unseen during training, based on a few labelled samples of the new classes (support set) as reference. As the test classes are novel, FSL is challenging with high generalization error with respect to the novel classes, where outlier query or support images during inference exacerbate the error further. So far, many algorithms involve training data augmentation to improve the generalization capability of FSL models. In contrast, inspired by the fact that test samples are more relevant to the target domain, we believe that test-time augmentation may be more useful than training augmentation for FSL. In this work, to reduce the bias caused by unconventional test samples, we generate new test samples through combining them with similar train-class samples. Averaged representations of the test-time augmentation are then considered for few-shot classification. According to our experiments, by augmenting the support set and query with a few additional generated samples, we can achieve improvements for trained FSL models. Importantly, our method is universally compatible with different off-the-shelf FSL models, whose performance can be improved without extra datasets or further training of the models themselves. Codes are available at https://github.com/WendyBaiYunwei/FSL-Rectifier.
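
The averaging step can be sketched as below, with the image-combining generator and the encoder abstracted into stubs (a pixel blend and a random projection); the real method uses trained networks for both, so treat this purely as a shape-of-the-computation illustration.

```python
import numpy as np

def augment_and_average(embed_fn, combine_fn, query_img, train_imgs, n_aug=3):
    """Test-time augmentation for FSL: combine the query with its most
    similar train-class images via a generator (stubbed as `combine_fn`),
    then average all embeddings to rectify an outlier query."""
    q = embed_fn(query_img)
    sims = [float(q @ embed_fn(t)) for t in train_imgs]
    neighbors = [train_imgs[i] for i in np.argsort(sims)[-n_aug:]]
    generated = [combine_fn(query_img, nb) for nb in neighbors]
    embs = [q] + [embed_fn(g) for g in generated]
    return np.mean(embs, axis=0)  # rectified query representation

# Stand-ins: a random-projection "encoder" and a pixel-blending "generator".
rng = np.random.default_rng(0)
P = rng.normal(size=(64, 32 * 32))
embed_fn = lambda img: P @ img.ravel() / np.linalg.norm(P @ img.ravel())
combine_fn = lambda a, b: 0.5 * (a + b)   # real method uses an image translator
rep = augment_and_average(embed_fn, combine_fn,
                          rng.normal(size=(32, 32)),
                          [rng.normal(size=(32, 32)) for _ in range(10)])
```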

Updated: 2024-05-16 07:44:48

标题: FSL-Rectifier: 通过测试时增强纠正少样本学习中的异常值

摘要: Few-shot learning(FSL)通常需要一个模型根据一些新类别的少量标记样本(支持集)作为参考,来识别属于训练期间未见过的类别的图像(查询)。由于测试类别是新颖的,相对于新颖类别,FSL 在推理过程中存在高泛化误差的挑战,异常查询或支持图像会进一步加剧错误。到目前为止,许多算法包括训练数据增强以提高 FSL 模型的泛化能力。相反,受到测试样本与目标域更相关这一事实的启发,我们认为测试时的数据增强可能比训练增强对 FSL 更有用。在这项工作中,为了减少由于不寻常的测试样本导致的偏差,我们通过将它们与相似的训练类别样本组合来生成新的测试样本。然后考虑测试时增强的平均表示用于少样本分类。根据我们的实验,通过为支持集和查询增加一些额外生成的样本,我们可以改善经过训练的 FSL 模型。重要的是,我们的方法与不同的现成 FSL 模型通用兼容,其性能可以在不需要额外数据集或进一步训练模型本身的情况下得到改善。代码可以在https://github.com/WendyBaiYunwei/FSL-Rectifier 上找到。

更新时间: 2024-05-16 07:44:48

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2402.18292v2

Rethinking Multi-User Semantic Communications with Deep Generative Models

In recent years, novel communication strategies have emerged to face the challenges that the increased number of connected devices and the higher quality of transmitted information are posing. Among them, semantic communication obtained promising results especially when combined with state-of-the-art deep generative models, such as large language or diffusion models, able to regenerate content from extremely compressed semantic information. However, most of these approaches focus on single-user scenarios processing the received content at the receiver on top of conventional communication systems. In this paper, we propose to go beyond these methods by developing a novel generative semantic communication framework tailored for multi-user scenarios. This system assigns the channel to users knowing that the lost information can be filled in with a diffusion model at the receivers. Under this innovative perspective, OFDMA systems should not aim to transmit the largest part of information, but solely the bits necessary for the generative model to semantically regenerate the missing ones. The thorough experimental evaluation shows the capabilities of the novel diffusion model and the effectiveness of the proposed framework, leading towards a GenAI-based next generation of communications.

Updated: 2024-05-16 07:43:15

标题: 重新思考多用户语义通信:深度生成模型

摘要: 近年来,新型通信策略已经出现,以应对连接设备数量增加和传输信息质量提高所带来的挑战。其中,语义通信在与最先进的深度生成模型结合时取得了令人期待的成果,例如大型语言或扩散模型,能够从极度压缩的语义信息中重新生成内容。然而,大多数这些方法都集中在单用户场景上,接收端在传统通信系统之上处理接收到的内容。本文提出通过开发一种专为多用户场景量身定制的新型生成语义通信框架来超越这些方法。该系统根据用户分配信道,知道丢失的信息可以通过扩散模型在接收方填充。在这种创新的视角下,OFDMA系统不应该旨在传输信息的大部分,而仅仅传输生成模型重新生成缺失信息所需的位。彻底的实验评估显示了新型扩散模型的能力和所提出框架的有效性,引领向基于GenAI的下一代通信。

更新时间: 2024-05-16 07:43:15

领域: eess.SP,cs.LG

下载: http://arxiv.org/abs/2405.09866v1

Box-Free Model Watermarks Are Prone to Black-Box Removal Attacks

Box-free model watermarking is an emerging technique to safeguard the intellectual property of deep learning models, particularly those for low-level image processing tasks. Existing works have verified and improved its effectiveness in several aspects. However, in this paper, we reveal that box-free model watermarking is prone to removal attacks, even under the real-world threat model such that the protected model and the watermark extractor are in black boxes. Under this setting, we carry out three studies. 1) We develop an extractor-gradient-guided (EGG) remover and show its effectiveness when the extractor uses ReLU activation only. 2) More generally, for an unknown extractor, we leverage adversarial attacks and design the EGG remover based on the estimated gradients. 3) Under the most stringent condition that the extractor is inaccessible, we design a transferable remover based on a set of private proxy models. In all cases, the proposed removers can successfully remove embedded watermarks while preserving the quality of the processed images, and we also demonstrate that the EGG remover can even replace the watermarks. Extensive experimental results verify the effectiveness and generalizability of the proposed attacks, revealing the vulnerabilities of the existing box-free methods and calling for further research.
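
Study (1), the white-box extractor-gradient-guided case, can be sketched as a budgeted perturbation that drives the extractor's output toward blank, as below. The target, step sizes, budget, and the stand-in extractor are assumptions for illustration, not the paper's exact remover.

```python
import torch

def egg_remove(image, extractor, steps=100, lr=0.01, eps=8 / 255):
    """Extractor-gradient-guided removal sketch: perturb the watermarked
    image to minimize the energy of the extracted watermark, under an
    L-infinity budget eps to preserve visual quality. Assumes white-box
    access to the extractor, as in study (1) above."""
    delta = torch.zeros_like(image, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        wm = extractor(torch.clamp(image + delta, 0, 1))
        loss = wm.pow(2).mean()   # drive the extracted watermark to blank
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)
    return torch.clamp(image + delta, 0, 1).detach()

# Stand-in ReLU extractor for demonstration only.
extractor = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3, padding=1),
                                torch.nn.ReLU(),
                                torch.nn.Conv2d(8, 1, 3, padding=1))
cleaned = egg_remove(torch.rand(1, 3, 32, 32), extractor)
```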

Updated: 2024-05-16 07:41:54

标题: 无盒模型水印易受黑盒去除攻击的影响

摘要: 无框模型水印技术是一种新兴技术,用于保护深度学习模型的知识产权,特别是用于低级别图像处理任务。现有研究已经验证并改进了其在几个方面的有效性。然而,在本文中,我们揭示无框模型水印技术容易受到移除攻击,即使在真实世界的威胁模型下,受保护的模型和水印提取器都是黑盒。在这种设置下,我们进行了三项研究。1)我们开发了一个提取器梯度引导(EGG)去除器,并展示了当提取器仅使用ReLU激活时的有效性。2)更普遍地,对于未知的提取器,我们利用对抗攻击并基于估计的梯度设计了EGG去除器。3)在最严格的条件下,即提取器不可访问的情况下,我们设计了一个基于一组私有代理模型的可转移去除器。在所有情况下,提出的去除器可以成功去除嵌入的水印,同时保持处理图像的质量,并且我们还证明了EGG去除器甚至可以替换水印。大量实验结果验证了所提出攻击的有效性和普适性,揭示了现有无框方法的漏洞,并呼吁进一步研究。

更新时间: 2024-05-16 07:41:54

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2405.09863v1

Towards Realistic Incremental Scenario in Class Incremental Semantic Segmentation

This paper addresses the unrealistic aspect of the commonly adopted Continuous Incremental Semantic Segmentation (CISS) scenario, termed overlapped. We point out that overlapped allows the same image to reappear in future tasks with different pixel labels, which is far from practical incremental learning scenarios. Moreover, we identified that this flawed scenario may lead to biased results for two commonly used techniques in CISS, pseudo-labeling and exemplar memory, resulting in unintended advantages or disadvantages for certain techniques. To mitigate this, a practical scenario called partitioned is proposed, in which the dataset is first divided into distinct subsets representing each class, and then the subsets are assigned to each corresponding task. This efficiently addresses the issue above while meeting the requirement of CISS scenario, such as capturing the background shifts. Furthermore, we identify and address the code implementation issues related to retrieving data from the exemplar memory, which were ignored in previous works. Lastly, we introduce a simple yet competitive memory-based baseline, MiB-AugM, that handles background shifts of current tasks in the exemplar memory. This baseline achieves state-of-the-art results across multiple tasks involving learning numerous new classes.
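
The defining property of the partitioned scenario, that no image reappears in a later task with different pixel labels, can be sketched as a one-image-one-task assignment. The majority-overlap rule below is one simple assumption; the paper's split goes through per-class subsets.

```python
from collections import defaultdict

def partitioned_split(samples, task_classes):
    """samples: list of (image_id, set_of_present_classes).
    task_classes: list of disjoint class subsets, one per incremental task.
    Each image is assigned to exactly one task (here: the task owning most
    of its classes), so it never reappears later with different labels."""
    tasks = defaultdict(list)
    for image_id, classes in samples:
        overlaps = [len(classes & set(tc)) for tc in task_classes]
        best = max(range(len(task_classes)), key=lambda i: overlaps[i])
        tasks[best].append(image_id)
    return tasks

samples = [("img0", {"person", "car"}), ("img1", {"dog"}), ("img2", {"car"})]
print(partitioned_split(samples, [["person", "dog"], ["car"]]))
# -> {0: ['img0', 'img1'], 1: ['img2']}
```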

Updated: 2024-05-16 07:25:15

标题: 朝向实际的类增量语义分割场景

摘要: 本文讨论了常用的连续增量语义分割(CISS)场景中不切实际的方面,称为重叠。我们指出,重叠允许同一图像在未来任务中以不同的像素标签重新出现,这与实际的增量学习场景相去甚远。此外,我们发现这种有缺陷的场景可能导致CISS中两种常用技术——伪标签和样本记忆的偏见结果,从而为某些技术带来意外的优势或劣势。为了缓解这一问题,提出了一个实际的场景称为分区,其中数据集首先被分成代表每个类的不同子集,然后这些子集被分配给每个相应的任务。这有效地解决了上述问题,同时满足了CISS场景的要求,例如捕捉背景转移。此外,我们识别并解决了与从样本记忆中检索数据相关的代码实现问题,这在先前的工作中被忽略了。最后,我们介绍了一个简单而具有竞争力的基于记忆的基线,MiB-AugM,它处理了样本记忆中当前任务的背景转移。这一基线在涉及学习大量新类的多个任务中取得了最新的结果。

更新时间: 2024-05-16 07:25:15

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2405.09858v1

IGOT: Information Gain Optimized Tokenizer on Domain Adaptive Pretraining

Pretrained Large Language Models (LLM) such as ChatGPT, Claude, etc. have demonstrated strong capabilities in various fields of natural language generation. However, there are still many problems when using LLM in specialized domain-specific fields. When using generative AI to process downstream tasks, a common approach is to add new knowledge (e.g., private domain knowledge, cutting-edge information) to a pretrained model through continued training or fine-tuning. However, whether there is a universal paradigm for domain adaptation training is still an open question. In this article, we proposed Information Gain Optimized Tokenizer (IGOT), which analyzes the special token set of downstream tasks, constructs a new subset using a heuristic function $\phi$ of each special token and its information gain, builds a new domain-specific tokenizer, and continues pretraining on the downstream task data. We explored the many positive effects of this method's customized tokenizer on domain-adaptive pretraining and verified that this method can perform better than the ordinary method of just collecting data and fine-tuning. Based on our experiments, the continued pretraining process of IGOT with LLaMA-7B achieved 11.9% token saving, 12.2% training time saving, and 5.8% maximum GPU VRAM usage saving; combined with the T5 model, we can even reach a 31.5% training time saving, making porting general generative AI to specific domains more effective than before. In domain-specific tasks, supervised $IGOT_\tau$ shows great performance in reducing both the convergence radius and convergence point during continued pretraining.
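
A toy version of scoring candidate tokens by information gain is shown below, framed as how well "document contains token" separates domain text from general text. This is one concrete instance of such a heuristic; the paper's actual $\phi$ may differ.

```python
import math
from collections import Counter

def token_information_gain(domain_docs, general_docs, candidates):
    """For each candidate token, compute the information gain of the
    binary feature 'document contains token' for separating domain text
    from general text (an illustrative stand-in for the paper's phi)."""
    def entropy(p):
        return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    n_d, n_g = len(domain_docs), len(general_docs)
    base = entropy(n_d / (n_d + n_g))
    gains = {}
    for tok in candidates:
        d_has = sum(tok in doc.split() for doc in domain_docs)
        g_has = sum(tok in doc.split() for doc in general_docs)
        has, absent = d_has + g_has, (n_d + n_g) - (d_has + g_has)
        cond = 0.0
        if has:
            cond += has / (n_d + n_g) * entropy(d_has / has)
        if absent:
            cond += absent / (n_d + n_g) * entropy((n_d - d_has) / absent)
        gains[tok] = base - cond
    return gains

domain = ["tokenizer pretraining corpus", "domain tokenizer vocab"]
general = ["the cat sat", "a general sentence"]
print(token_information_gain(domain, general, ["tokenizer", "the"]))
```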

Updated: 2024-05-16 07:25:10

标题: IGOT:领域自适应预训练中信息增益优化的分词器

摘要: 预训练大型语言模型(LLM)如ChatGPT、Claude等已经展示出在自然语言生成的各个领域具有强大的能力。然而,在专门领域中使用LLM时仍然存在许多问题。在使用生成式人工智能处理下游任务时,一种常见的方法是通过持续训练或微调向预训练模型添加新知识(例如私有领域知识、最新信息)。然而,是否存在一种通用的领域适应训练范式仍然是一个未解之谜。在本文中,我们提出了信息增益优化分词器(IGOT),该分析下游任务的特殊token集合,使用启发式函数$\phi$构建一个新的子集,其中包括特殊token及其信息增益,以构建新的领域特定分词器,并在下游任务数据上继续预训练。我们探讨了这种方法定制分词器在领域自适应预训练中的许多积极影响,并验证了这种方法可以比仅收集数据和微调的普通方法表现更好。根据我们的实验,IGOT与LLaMA-7B的持续预训练过程实现了11.9\%的token节省、12.2\%的训练时间节省和5.8\%的最大GPU VRAM使用节省,结合T5模型,甚至可以实现31.5\%的训练时间节省,使将通用生成式人工智能移植到特定领域比以往更加有效。在特定领域任务中,监督$IGOT_\tau$在保持预训练过程中减少了收敛半径和收敛点,表现出很好的性能。

更新时间: 2024-05-16 07:25:10

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2405.09857v1

Training-Free Consistent Text-to-Image Generation

Text-to-image models offer a new level of creative flexibility by allowing users to guide the image generation process through natural language. However, using these models to consistently portray the same subject across diverse prompts remains challenging. Existing approaches fine-tune the model to teach it new words that describe specific user-provided subjects or add image conditioning to the model. These methods require lengthy per-subject optimization or large-scale pre-training. Moreover, they struggle to align generated images with text prompts and face difficulties in portraying multiple subjects. Here, we present ConsiStory, a training-free approach that enables consistent subject generation by sharing the internal activations of the pretrained model. We introduce a subject-driven shared attention block and correspondence-based feature injection to promote subject consistency between images. Additionally, we develop strategies to encourage layout diversity while maintaining subject consistency. We compare ConsiStory to a range of baselines, and demonstrate state-of-the-art performance on subject consistency and text alignment, without requiring a single optimization step. Finally, ConsiStory can naturally extend to multi-subject scenarios, and even enable training-free personalization for common objects.

Updated: 2024-05-16 07:17:55

标题: 无需训练的一致文本到图像生成

摘要: 文本到图像模型通过允许用户通过自然语言引导图像生成过程,提供了新的创造性灵活性水平。然而,使用这些模型来在不同提示下一致地描绘相同主题仍然具有挑战性。现有方法通过微调模型来教授描述特定用户提供主题的新单词,或向模型添加图像调节。这些方法需要针对每个主题进行漫长优化或大规模预训练。此外,它们在将生成的图像与文本提示对齐以及描绘多个主题方面遇到困难。在这里,我们提出了一种名为ConsiStory的无需训练的方法,通过共享预训练模型的内部激活来实现一致的主题生成。我们引入了一个主题驱动的共享注意力块和基于对应关系的特征注入,以促进图像之间的主题一致性。此外,我们发展了一些策略来鼓励布局多样性同时保持主题一致性。我们将ConsiStory与一系列基线进行比较,并展示了在主题一致性和文本对齐方面的最新表现,而无需进行任何优化步骤。最后,ConsiStory可以自然地扩展到多主题场景,甚至实现无需训练的常见对象个性化。

更新时间: 2024-05-16 07:17:55

领域: cs.CV,cs.AI,cs.GR,cs.LG

下载: http://arxiv.org/abs/2402.03286v2

ALBA: Adaptive Language-based Assessments for Mental Health

Mental health issues differ widely among individuals, with varied signs and symptoms. Recently, language-based assessments have shown promise in capturing this diversity, but they require a substantial sample of words per person for accuracy. This work introduces the task of Adaptive Language-Based Assessment (ALBA), which involves adaptively ordering questions while also scoring an individual's latent psychological trait using limited language responses to previous questions. To this end, we develop adaptive testing methods under two psychometric measurement theories: Classical Test Theory and Item Response Theory. We empirically evaluate ordering and scoring strategies, organizing them into two new methods: a semi-supervised item-response-theory-based method, ALIRT, and a supervised Actor-Critic model. While both methods improve over non-adaptive baselines, we found ALIRT to be the most accurate and scalable, achieving the highest accuracy with fewer questions (e.g., Pearson r ~ 0.93 after only 3 questions, compared to typically needing at least 7 questions). In general, adaptive language-based assessments of depression and anxiety were able to utilize a smaller sample of language without compromising validity or incurring large computational costs.
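
The generic IRT-style adaptive-ordering step can be sketched as Fisher-information item selection under a 2PL model, as below. ALIRT additionally scores free-text language responses, which is abstracted away here; the item parameters and loop are illustrative.

```python
import numpy as np

def p_correct(theta, a, b):
    """2PL item response model: P(response | trait theta)."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def fisher_information(theta, a, b):
    p = p_correct(theta, a, b)
    return a ** 2 * p * (1 - p)

def next_item(theta_hat, items, asked):
    """Adaptive ordering: ask the unasked item that is most informative
    at the current trait estimate."""
    return max((i for i in range(len(items)) if i not in asked),
               key=lambda i: fisher_information(theta_hat, *items[i]))

items = [(1.2, -1.0), (0.8, 0.0), (1.5, 0.5), (1.0, 1.5)]  # (a, b) per item
asked, theta_hat = set(), 0.0
for _ in range(3):
    i = next_item(theta_hat, items, asked)
    asked.add(i)
    # ... administer item i, then re-estimate theta_hat from responses ...
print(asked)
```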

Updated: 2024-05-16 07:13:10

标题: ALBA:针对心理健康的自适应基于语言的评估

摘要: 心理健康问题在个体之间存在广泛差异,表现出不同的迹象和症状。最近,基于语言的评估显示出在捕捉这种多样性方面有一定的潜力,但为了准确性,需要每个人有大量的词汇样本。本文介绍了自适应语言评估任务ALBA,该任务涉及在对问题进行自适应排序的同时,还使用先前问题的有限语言回应对个体的潜在心理特质进行评分。为此,我们基于两种心理测量理论(经典测验理论和项目反应理论)开发了自适应测试方法。我们对排序和评分策略进行了实证评估,将其组织成两种新方法:基于半监督项目反应理论的方法ALIRT和监督的演员-评论家模型。虽然我们发现这两种方法相对于非自适应基线有所改善,但我们发现ALIRT是最准确和可扩展的,仅需较少的问题就能实现最高准确性(例如,与通常需要至少7个问题相比,仅3个问题后Pearson r约为0.93)。总的来说,基于语言的自适应评估抑郁和焦虑能够利用更小的语言样本,而不会影响有效性或增加大量的计算成本。

更新时间: 2024-05-16 07:13:10

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2311.06467v2

Interpreting Key Mechanisms of Factual Recall in Transformer-Based Language Models

In this paper, we deeply explore several mechanisms employed by Transformer-based language models in factual recall tasks. In zero-shot scenarios, given a prompt like "The capital of France is," task-specific attention heads extract the topic entity, such as "France," from the context and pass it to subsequent MLPs to recall the required answer such as "Paris." We introduce a novel analysis method aimed at decomposing the outputs of the MLP into components understandable by humans. Through this method, we quantify the function of the MLP layer following these task-specific heads. In the residual stream, it either erases or amplifies the information originating from individual heads. Moreover, it generates a component that redirects the residual stream towards the direction of its expected answer. These zero-shot mechanisms are also employed in few-shot scenarios. Additionally, we observed a widely existent anti-overconfidence mechanism in the final layer of models, which suppresses correct predictions. We mitigate this suppression by leveraging our interpretation to improve factual recall confidence. Our interpretations have been evaluated across various language models, including the GPT-2 families, 1.3B OPT, and 7B Llama-2, encompassing diverse tasks spanning various domains of factual knowledge.

Updated: 2024-05-16 07:04:12

标题: 使用基于Transformer的语言模型解释事实回忆的关键机制

摘要: 在本文中,我们深入探讨了基于Transformer的语言模型在事实回忆任务中采用的几种机制。在零-shot场景中,给定类似“法国的首都是”的提示,任务特定的注意力头从上下文中提取主题实体,如“法国”,并将其传递给后续的MLPs,以回忆所需的答案,如“巴黎”。我们引入了一种新颖的分析方法,旨在将MLP的输出分解为人类可理解的组件。通过这种方法,我们量化了MLP层在这些任务特定头部之后的功能。在残余流中,它要么擦除,要么放大来自各个头部的信息。此外,它生成一个组件,将残余流重定向到其预期答案的方向。这些零-shot机制也应用于少量-shot场景。此外,我们观察到模型最终层存在一个广泛存在的抑制正确预测的反过度自信机制。我们通过利用我们的解释来提高事实回忆的信心,来减轻这种抑制。我们的解释已在各种语言模型中进行了评估,包括GPT-2系列,1.3B OPT和7B Llama-2,涵盖了跨越各种领域的各种事实知识任务。

更新时间: 2024-05-16 07:04:12

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2403.19521v3

Enhancing Semantics in Multimodal Chain of Thought via Soft Negative Sampling

Chain of thought (CoT) has proven useful for problems requiring complex reasoning. Many of these problems are both textual and multimodal. Given the inputs in different modalities, a model generates a rationale and then uses it to answer a question. Because of the hallucination issue, the generated soft negative rationales with high textual quality but illogical semantics do not always help improve answer accuracy. This study proposes a rationale generation method using soft negative sampling (SNSE-CoT) to mitigate hallucinations in multimodal CoT. Five methods were applied to generate soft negative samples that shared highly similar text but had different semantics from the original. Bidirectional margin loss (BML) was applied to introduce them into the traditional contrastive learning framework that involves only positive and negative samples. Extensive experiments on the ScienceQA dataset demonstrated the effectiveness of the proposed method. Code and data are released at https://github.com/zgMin/SNSE-CoT.
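
One plausible reading of a bidirectional margin loss is keeping the similarity gap between the positive pair and the soft-negative pair inside a band, so soft negatives are treated as neither full negatives nor positives. The sketch below encodes that reading; the band endpoints and the exact formulation are assumptions, not the paper's definition.

```python
import torch
import torch.nn.functional as F

def bidirectional_margin_loss(sim_pos, sim_soft_neg, m_low=0.1, m_high=0.4):
    """Penalize the gap delta = sim_pos - sim_soft_neg when it leaves
    [m_low, m_high]: too small means the soft negative is mistaken for a
    positive, too large means it is pushed away like a full negative."""
    delta = sim_pos - sim_soft_neg
    return (F.relu(m_low - delta) + F.relu(delta - m_high)).mean()

anchor = F.normalize(torch.randn(8, 128, requires_grad=True), dim=-1)
pos = F.normalize(torch.randn(8, 128), dim=-1)
soft_neg = F.normalize(torch.randn(8, 128), dim=-1)
loss = bidirectional_margin_loss((anchor * pos).sum(-1),
                                 (anchor * soft_neg).sum(-1))
loss.backward()
```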

Updated: 2024-05-16 06:55:11

标题: 通过软负采样增强多模态思维链中的语义

摘要: 思维链(CoT)已被证明在需要复杂推理的问题中很有用。许多这些问题既是文本的又是多模态的。鉴于不同模态的输入,模型生成一个理由,然后使用它来回答一个问题。由于幻觉问题,生成的具有高文本质量但语义不合逻辑的软负理由并不总是有助于提高答案准确性。本研究提出了一种使用软负采样(SNSE-CoT)来减轻多模态CoT中幻觉的理由生成方法。应用了五种方法来生成共享高度相似文本但与原始文本语义不同的软负样本。双向边界损失(BML)被应用于将它们引入传统的只涉及正负样本的对比学习框架中。对ScienceQA数据集进行了大量实验证明了所提出方法的有效性。代码和数据已发布在https://github.com/zgMin/SNSE-CoT。

更新时间: 2024-05-16 06:55:11

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2405.09848v1

Exploring Correlations of Self-Supervised Tasks for Graphs

Graph self-supervised learning has sparked a research surge in training informative representations without accessing any labeled data. However, our understanding of graph self-supervised learning remains limited, and the inherent relationships between various self-supervised tasks are still unexplored. Our paper aims to provide a fresh understanding of graph self-supervised learning based on task correlations. Specifically, we evaluate the performance of the representations trained by one specific task on other tasks and define correlation values to quantify task correlations. Through this process, we unveil the task correlations between various self-supervised tasks and can measure their expressive capabilities, which are closely related to downstream performance. By analyzing the correlation values between tasks across various datasets, we reveal the complexity of task correlations and the limitations of existing multi-task learning methods. To obtain more capable representations, we propose Graph Task Correlation Modeling (GraphTCM) to illustrate the task correlations and utilize it to enhance graph self-supervised training. The experimental results indicate that our method significantly outperforms existing methods across various downstream tasks.

Updated: 2024-05-16 06:51:23

标题: 探索图自监督任务之间的相关性

摘要: 图自监督学习在训练具有信息性表示而无需访问任何标记数据方面引发了一波研究热潮。然而,我们对图自监督学习的理解仍然有限,各种自监督任务之间的固有关系仍未被探索。本文旨在基于任务相关性提供对图自监督学习的新理解。具体来说,我们评估了由一个特定任务训练的表示在其他任务上的性能,并定义相关值以量化任务之间的相关性。通过这一过程,我们揭示了各种自监督任务之间的任务相关性,并可以衡量它们的表达能力,这与下游性能密切相关。通过分析跨各种数据集的任务之间的相关值,我们揭示了任务相关性的复杂性以及现有多任务学习方法的局限性。为了获得更有能力的表示,我们提出了图任务相关性建模(GraphTCM)来说明任务相关性,并利用它来增强图自监督训练。实验结果表明,我们的方法在各种下游任务上明显优于现有方法。

更新时间: 2024-05-16 06:51:23

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.04245v2

Simultaneous Identification of Sparse Structures and Communities in Heterogeneous Graphical Models

Exploring and detecting community structures hold significant importance in genetics, social sciences, neuroscience, and finance. Especially in graphical models, community detection can encourage the exploration of sets of variables with group-like properties. In this paper, within the framework of Gaussian graphical models, we introduce a novel decomposition of the underlying graphical structure into a sparse part and low-rank diagonal blocks (non-overlapped communities). We illustrate the significance of this decomposition through two modeling perspectives and propose a three-stage estimation procedure with a fast and efficient algorithm for the identification of the sparse structure and communities. Also on the theoretical front, we establish conditions for local identifiability and extend the traditional irrepresentability condition to an adaptive form by constructing an effective norm, which ensures the consistency of model selection for the adaptive $\ell_1$ penalized estimator in the second stage. Moreover, we also provide the clustering error bound for the K-means procedure in the third stage. Extensive numerical experiments are conducted to demonstrate the superiority of the proposed method over existing approaches in estimating graph structures. Furthermore, we apply our method to the stock return data, revealing its capability to accurately identify non-overlapped community structures.

Updated: 2024-05-16 06:38:28

标题: 在异质图模型中同时识别稀疏结构和社区

摘要: 在遗传学、社会科学、神经科学和金融领域,探索和检测社区结构具有重要意义。特别是在图模型中,社区检测可以促进具有类似组属性的变量集的探索。在本文中,在高斯图模型的框架内,我们介绍了对底层图结构的一种新的分解,将其分为稀疏部分和低秩对角块(非重叠社区)。我们通过两种建模视角说明了这种分解的重要性,并提出了一个三阶段估计程序,配以快速高效的算法,用于识别稀疏结构和社区。此外,在理论方面,我们建立了局部可辨识性条件,并通过构建有效的范数将传统的不可辨识条件扩展为自适应形式,从而确保自适应$\ell_1$惩罚估计器在第二阶段的模型选择的一致性。此外,我们还为第三阶段的K均值过程提供了聚类误差界限。我们进行了大量数值实验,以展示所提出的方法在估计图结构方面优于现有方法。此外,我们将我们的方法应用于股票回报数据,揭示了其准确识别非重叠社区结构的能力。

更新时间: 2024-05-16 06:38:28

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2405.09841v1

Advances in Robust Federated Learning: Heterogeneity Considerations

In the field of heterogeneous federated learning (FL), the key challenge is to efficiently and collaboratively train models across multiple clients with different data distributions, model structures, task objectives, computational capabilities, and communication resources. This diversity leads to significant heterogeneity, which increases the complexity of model training. In this paper, we first outline the basic concepts of heterogeneous federated learning and summarize the research challenges in federated learning in terms of five aspects: data, model, task, device, and communication. In addition, we explore how existing state-of-the-art approaches cope with the heterogeneity of federated learning, and categorize and review these approaches at three different levels: data-level, model-level, and architecture-level. Subsequently, the paper extensively discusses privacy-preserving strategies in heterogeneous federated learning environments. Finally, the paper discusses current open issues and directions for future research, aiming to promote the further development of heterogeneous federated learning.

Updated: 2024-05-16 06:35:42

标题: 强大的联邦学习进展:异构性考虑

摘要: 在异构联合学习(FL)领域,关键挑战是如何有效地协作训练跨多个具有不同数据分布、模型结构、任务目标、计算能力和通信资源的客户端的模型。这种多样性导致了显著的异质性,增加了模型训练的复杂性。本文首先概述了异构联合学习的基本概念,并从数据、模型、任务、设备和通信五个方面总结了联邦学习中的研究挑战。此外,我们探讨了现有最先进的方法如何应对联合学习的异质性,并将这些方法分为数据级、模型级和架构级三个不同层次进行分类和审查。随后,本文广泛讨论了在异构联合学习环境中的隐私保护策略。最后,本文讨论了当前的开放问题和未来研究方向,旨在促进异构联合学习的进一步发展。

更新时间: 2024-05-16 06:35:42

领域: cs.LG

下载: http://arxiv.org/abs/2405.09839v1

Unsupervised Work Behavior Pattern Extraction Based on Hierarchical Probabilistic Model

Evolving consumer demands and market trends have led to businesses increasingly embracing a production approach that prioritizes flexibility and customization. Consequently, factory workers must engage in tasks that are more complex than before. Thus, productivity depends on each worker's skills in assembling products. Therefore, analyzing the behavior of a worker is crucial for work improvement. However, manual analysis is time consuming and does not provide quick and accurate feedback. Machine learning methods have been applied to automate the analyses; however, most of these methods need several labels for training. To this end, we extend the Gaussian process hidden semi-Markov model (GP-HSMM) to enable the rapid and automated analysis of worker behavior without pre-training. The model does not require labeled data and can automatically and accurately segment continuous motions into motion classes. The proposed model is a probabilistic model that hierarchically connects GP-HSMM and HSMM, enabling the extraction of behavioral patterns with different granularities. Furthermore, it mutually infers the parameters between the GP-HSMM and HSMM, resulting in accurate motion pattern extraction. We applied the proposed method to motion data in which workers assembled products at an actual production site. The accuracy of behavior pattern extraction was evaluated using the normalized Levenshtein distance (NLD); the smaller the value of NLD, the more accurate the pattern extraction. The NLD of motion patterns captured by the GP-HSMM and HSMM layers in our proposed method was 0.50 and 0.33, respectively, the smallest among the compared baseline methods.
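
The evaluation metric used above is straightforward to compute; here is a short sketch of the normalized Levenshtein distance between a predicted and a ground-truth motion-class sequence.

```python
def levenshtein(a, b):
    # Classic one-row dynamic-programming edit distance between sequences.
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                     prev + (ca != cb))
    return dp[len(b)]

def nld(predicted, truth):
    """Normalized Levenshtein distance between motion-class sequences;
    0 means a perfect match, and smaller is better."""
    return levenshtein(predicted, truth) / max(len(predicted), len(truth))

print(nld(list("AABBC"), list("AABCC")))  # -> 0.2
```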

Updated: 2024-05-16 06:31:02

标题: 基于分层概率模型的无监督工作行为模式提取

摘要: 不断变化的消费者需求和市场趋势导致企业越来越倾向于采用注重灵活性和定制化的生产方式。因此,工厂工人必须从事比以往更复杂的任务。因此,生产效率取决于每个工人在组装产品方面的技能。因此,分析工人的行为对于工作改进至关重要。然而,手动分析耗时且无法提供快速准确的反馈。机器学习已尝试自动化分析;然而,大多数方法需要多个标签进行训练。为此,我们扩展了高斯过程隐藏半马尔可夫模型(GP-HSMM),以实现工人行为的快速自动化分析而无需预先训练。该模型不需要有标签的数据,可以自动准确地将连续动作分段为动作类别。所提出的模型是一个概率模型,层次连接了GP-HSMM和HSMM,实现了不同粒度行为模式的提取。此外,它在GP-HSMM和HSMM之间相互推断参数,从而实现准确的运动模式提取。我们将所提出的方法应用于工人在实际生产现场组装产品的运动数据。使用标准化的Levenshtein距离(NLD)评估了行为模式提取的准确性。NLD值越小,模式提取越准确。在我们提出的方法中,GP-HSMM和HSMM层捕获的运动模式的NLD分别为0.50和0.33,与基准方法相比是最小的。

更新时间: 2024-05-16 06:31:02

领域: cs.LG

下载: http://arxiv.org/abs/2405.09838v1

Improving Transformers using Faithful Positional Encoding

We propose a new positional encoding method for a neural network architecture called the Transformer. Unlike the standard sinusoidal positional encoding, our approach is based on solid mathematical grounds and has a guarantee of not losing information about the positional order of the input sequence. We show that the new encoding approach systematically improves the prediction performance in the time-series classification task.

Updated: 2024-05-16 06:26:43

标题: 利用忠实的位置编码改进Transformer

摘要: 我们提出了一种新的位置编码方法,用于名为Transformer的神经网络架构。与标准的正弦位置编码不同,我们的方法基于坚实的数学基础,并保证不会丢失关于输入序列位置顺序的信息。我们展示了新的编码方法系统地改进了时间序列分类任务中的预测性能。

更新时间: 2024-05-16 06:26:43

领域: cs.LG

下载: http://arxiv.org/abs/2405.09061v2

OpenBox: A Python Toolkit for Generalized Black-box Optimization

Black-box optimization (BBO) has a broad range of applications, including automatic machine learning, experimental design, and database knob tuning. However, users still face challenges when applying BBO methods to their problems at hand with existing software packages in terms of applicability, performance, and efficiency. This paper presents OpenBox, an open-source BBO toolkit with improved usability. It implements user-friendly interfaces and visualization for users to define and manage their tasks. The modular design behind OpenBox facilitates its flexible deployment in existing systems. Experimental results demonstrate the effectiveness and efficiency of OpenBox over existing systems. The source code of OpenBox is available at https://github.com/PKU-DAIR/open-box.
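
A quickstart-style usage sketch follows. The API names reflect the OpenBox documentation at the time of writing and may drift; consult the linked repository for the current interface.

```python
import math
# pip install openbox; API names per the project docs, subject to change.
from openbox import Optimizer, space as sp

def branin(config):
    # Standard 2-D Branin test function; OpenBox expects a dict of results.
    x1, x2 = config["x1"], config["x2"]
    y = (x2 - 5.1 / (4 * math.pi ** 2) * x1 ** 2 + 5 / math.pi * x1 - 6) ** 2 \
        + 10 * (1 - 1 / (8 * math.pi)) * math.cos(x1) + 10
    return {"objectives": [y]}

space = sp.Space()
space.add_variables([sp.Real("x1", -5, 10), sp.Real("x2", 0, 15)])

opt = Optimizer(branin, space, max_runs=50, task_id="quickstart")
history = opt.run()
print(history)
```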

Updated: 2024-05-16 06:17:53

标题: OpenBox:用于广义黑盒优化的Python工具包

摘要: 黑盒优化(BBO)具有广泛的应用范围,包括自动机器学习、实验设计和数据库旋钮调整。然而,用户在将BBO方法应用于现有软件包解决问题时仍面临挑战,包括适用性、性能和效率方面。本文介绍了OpenBox,一个具有改进可用性的开源BBO工具包。它实现了用户友好的界面和可视化,供用户定义和管理任务。OpenBox背后的模块化设计促进了其在现有系统中的灵活部署。实验结果证明了OpenBox相对于现有系统的有效性和效率。OpenBox的源代码可在https://github.com/PKU-DAIR/open-box获取。

更新时间: 2024-05-16 06:17:53

领域: cs.LG

下载: http://arxiv.org/abs/2304.13339v3

Nearly Minimax Optimal Regret for Multinomial Logistic Bandit

In this paper, we investigate the contextual multinomial logit (MNL) bandit problem in which a learning agent sequentially selects an assortment based on contextual information, and user feedback follows an MNL choice model. There has been a significant discrepancy between lower and upper regret bounds, particularly regarding the feature dimension $d$ and the maximum assortment size $K$. Additionally, the variation in reward structures between these bounds complicates the quest for optimality. Under uniform rewards, where all items have the same expected reward, we establish a regret lower bound of $\Omega(d\sqrt{T/K})$ and propose a constant-time algorithm, OFU-MNL+, that achieves a matching upper bound of $\tilde{\mathcal{O}}(d\sqrt{T/K})$. Under non-uniform rewards, we prove a lower bound of $\Omega(d\sqrt{T})$ and an upper bound of $\tilde{\mathcal{O}}(d\sqrt{T})$, also achievable by OFU-MNL+. Our empirical studies support these theoretical findings. To the best of our knowledge, this is the first work in the MNL contextual bandit literature to prove minimax optimality -- for either uniform or non-uniform reward setting -- and to propose a computationally efficient algorithm that achieves this optimality up to logarithmic factors.
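
For readers unfamiliar with the feedback model, the snippet below computes MNL choice probabilities over an assortment, including the outside (no-purchase) option whose utility is normalized to one; dimensions and parameters are arbitrary.

```python
import numpy as np

def mnl_choice_probs(theta, X, assortment):
    """MNL choice model: given item features X (one row per item) and
    parameter theta, the user picks item i in the assortment with
    probability exp(x_i . theta) / (1 + sum_j exp(x_j . theta)),
    where the constant 1 is the utility of choosing nothing."""
    utilities = np.exp(X[assortment] @ theta)
    denom = 1.0 + utilities.sum()
    return utilities / denom, 1.0 / denom   # per-item probs, outside option

rng = np.random.default_rng(0)
d, n_items = 5, 20
theta = rng.normal(size=d) / np.sqrt(d)
X = rng.normal(size=(n_items, d))
probs, p_none = mnl_choice_probs(theta, X, assortment=[3, 7, 11, 19])
print(probs, p_none)
```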

Updated: 2024-05-16 06:07:31

标题: 多项式逻辑老虎机的几乎极小化最优遗憾

摘要: 在本文中,我们研究了上下文多项式Logit(MNL)赌博问题,其中学习代理根据上下文信息顺序选择一组商品,用户反馈遵循MNL选择模型。特别是在特征维度$d$和最大商品数量$K$方面,下界和上界之间存在显著的差异。此外,这些边界之间奖励结构的变化使得追求最优性变得复杂。在统一奖励下,所有商品具有相同的预期奖励,我们建立了一个$\Omega(d\sqrt{\smash[b]{T/K}})$的遗憾下界,并提出了一个常数时间算法OFU-MNL+,可以实现$\tilde{\mathcal{O}}(d\sqrt{\smash[b]{T/K}})$的匹配上界。在非均匀奖励情况下,我们证明了一个$\Omega(d\sqrt{T})$的下界和一个$\tilde{\mathcal{O}}(d\sqrt{T})$的上界,OFU-MNL+也能达到这个上界。我们的实证研究支持了这些理论发现。据我们所知,这是MNL上下文赌博文献中第一篇证明极小化最优性的工作,无论是在统一还是非统一奖励设置下,并且提出了一个计算效率高的算法,可以在对数因子内实现这种最优性。

更新时间: 2024-05-16 06:07:31

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2405.09831v1

Parallel Backpropagation for Shared-Feature Visualization

High-level visual brain regions contain subareas in which neurons appear to respond more strongly to examples of a particular semantic category, like faces or bodies, rather than objects. However, recent work has shown that while this finding holds on average, some out-of-category stimuli also activate neurons in these regions. This may be due to visual features common among the preferred class also being present in other images. Here, we propose a deep-learning-based approach for visualizing these features. For each neuron, we identify relevant visual features driving its selectivity by modelling responses to images based on latent activations of a deep neural network. Given an out-of-category image which strongly activates the neuron, our method first identifies a reference image from the preferred category yielding a similar feature activation pattern. We then backpropagate latent activations of both images to the pixel level, while enhancing the identified shared dimensions and attenuating non-shared features. The procedure highlights image regions containing shared features driving responses of the model neuron. We apply the algorithm to novel recordings from body-selective regions in macaque IT cortex in order to understand why some images of objects excite these neurons. Visualizations reveal object parts which resemble parts of a macaque body, shedding light on neural preference of these objects.

Updated: 2024-05-16 05:56:03

标题: 并行反向传播用于共享特征可视化

摘要: 高级视觉脑区包含亚区,其中神经元对特定语义类别的示例(如面部或身体)的响应似乎比对对象更强烈。然而,最近的研究表明,虽然这一发现平均成立,但一些类别外的刺激也会激活这些区域的神经元。这可能是因为首选类别中常见的视觉特征也存在于其他图像中。在这里,我们提出了一种基于深度学习的方法来可视化这些特征。对于每个神经元,我们通过建模基于深度神经网络的潜在激活的图像响应,识别驱动其选择性的相关视觉特征。对于强烈激活该神经元的类别外图像,我们的方法首先识别一个从属于首选类别的参考图像,产生类似的特征激活模式。然后,我们将两个图像的潜在激活向后传播到像素级别,同时增强识别的共享维度并减弱非共享特征。该过程突出显示包含驱动模型神经元响应的共享特征的图像区域。我们将该算法应用于猴IT皮层中的身体选择性区域的新记录,以了解为什么一些对象的图像会激发这些神经元。可视化揭示了类似猴体部位的对象部分,揭示了这些对象的神经偏好。

更新时间: 2024-05-16 05:56:03

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2405.09827v1

Enhancing Small Medical Learners with Privacy-preserving Contextual Prompting

Large language models (LLMs) demonstrate remarkable medical expertise, but data privacy concerns impede their direct use in healthcare environments. Although offering improved data privacy protection, domain-specific small language models (SLMs) often underperform LLMs, emphasizing the need for methods that reduce this performance gap while alleviating privacy concerns. In this paper, we present a simple yet effective method that harnesses LLMs' medical proficiency to boost SLM performance in medical tasks under privacy-restricted scenarios. Specifically, we mitigate patient privacy issues by extracting keywords from medical data and prompting the LLM to generate a medical knowledge-intensive context by simulating clinicians' thought processes. This context serves as additional input for SLMs, augmenting their decision-making capabilities. Our method significantly enhances performance in both few-shot and full training settings across three medical knowledge-intensive tasks, achieving up to a 22.57% increase in absolute accuracy compared to SLM fine-tuning without context, and sets new state-of-the-art results in two medical tasks within privacy-restricted scenarios. Further out-of-domain testing and experiments in two general domain datasets showcase its generalizability and broad applicability. Our code can be found at https://github.com/XZhang97666/PrivacyBoost-SLM.
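
The privacy step, sharing only extracted medical keywords with the LLM rather than the raw note, can be sketched as below. The prompt wording and the vocabulary are invented for illustration; the paper's keyword extraction and prompting details differ.

```python
import re

def build_privacy_preserving_prompt(clinical_note, vocab):
    """Share only medical keywords (matched against a fixed vocabulary)
    with the LLM, never the raw note, and ask it to generate
    clinician-style context for the small local model. `vocab` stands in
    for a curated medical term list."""
    tokens = set(re.findall(r"[a-zA-Z-]+", clinical_note.lower()))
    keywords = sorted(tokens & vocab)
    return ("Acting as a clinician, write the background knowledge and "
            "reasoning relevant to a case involving: " + ", ".join(keywords))

vocab = {"dyspnea", "hypertension", "metformin", "tachycardia"}
note = "Pt with hypertension on metformin presents with dyspnea at rest."
print(build_privacy_preserving_prompt(note, vocab))
```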

Updated: 2024-05-16 05:53:55

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2305.12723v2

Differentiable Pareto-Smoothed Weighting for High-Dimensional Heterogeneous Treatment Effect Estimation

There is a growing interest in estimating heterogeneous treatment effects across individuals using their high-dimensional feature attributes. Achieving high performance in such high-dimensional heterogeneous treatment effect estimation is challenging because in this setup, it is usual that some features induce sample selection bias while others do not but are predictive of potential outcomes. To avoid losing such predictive feature information, existing methods learn separate feature representations using inverse probability weighting (IPW). However, due to their numerically unstable IPW weights, these methods suffer from estimation bias under a finite sample setup. To develop a numerically robust estimator by weighted representation learning, we propose a differentiable Pareto-smoothed weighting framework that replaces extreme weight values in an end-to-end fashion. Our experimental results show that by effectively correcting the weight values, our proposed method outperforms the existing ones, including traditional weighting schemes. Our code is available at https://github.com/ychika/DPSW.
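
The instability the abstract describes comes from near-zero propensities blowing up the IPW weights. The sketch below shows the general shape of a differentiable fix; it uses a logarithmic soft cap on the weight tail as a simple stand-in, since the paper's actual replacement values are drawn from a fitted generalized Pareto distribution.

# Differentiable taming of extreme IPW weights. The paper fits a generalized
# Pareto distribution to the tail; here a logarithmic soft cap stands in as a
# simple differentiable surrogate.
import torch

def soft_capped_ipw(propensity, treated, quantile=0.95):
    w = treated / propensity.clamp_min(1e-6) \
        + (1 - treated) / (1 - propensity).clamp_min(1e-6)
    tau = torch.quantile(w.detach(), quantile)  # tail threshold from the weight distribution
    excess = (w - tau).clamp_min(0.0)           # only the tail is modified
    # Below tau: unchanged; above tau: grows only logarithmically.
    return torch.minimum(w, tau) + torch.log1p(excess)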

Updated: 2024-05-16 05:38:03

Categories: stat.ML,cs.LG,stat.ME

Download: http://arxiv.org/abs/2404.17483v4

Evaluating Algorithmic Bias in Models for Predicting Academic Performance of Filipino Students

Algorithmic bias is a major issue in machine learning models in educational contexts. However, it has not yet been studied thoroughly in Asian learning contexts, and only limited work has considered algorithmic bias based on regional (sub-national) background. As a step towards addressing this gap, this paper examines the population of 5,986 students at a large university in the Philippines, investigating algorithmic bias based on students' regional background. The university used the Canvas learning management system (LMS) in its online courses across a broad range of domains. Over the period of three semesters, we collected 48.7 million log records of the students' activity in Canvas. We used these logs to train binary classification models that predict student grades from LMS activity. The best-performing model reached an AUC of 0.75 and a weighted F1-score of 0.79. Subsequently, we examined the data for bias based on students' region. Evaluation using three metrics (AUC, weighted F1-score, and MADD) showed consistent results across all demographic groups. Thus, no unfairness was observed against a particular student group in the grade predictions.
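
Per-group evaluation of this kind is straightforward to reproduce in outline. In the sketch below, the AUC and weighted F1 parts are standard scikit-learn calls; the MADD term is a simplified histogram-based distance between the predicted-probability distributions of two groups, which approximates rather than reproduces the published metric.

# Sketch of per-group fairness evaluation (group labels and the simple
# histogram-based MADD here are illustrative, not the authors' exact code;
# assumes both classes occur within every group).
import numpy as np
from sklearn.metrics import roc_auc_score, f1_score

def group_report(y_true, y_prob, groups):
    report = {}
    for g in np.unique(groups):
        m = groups == g
        report[g] = {
            "AUC": roc_auc_score(y_true[m], y_prob[m]),
            "F1": f1_score(y_true[m], (y_prob[m] >= 0.5).astype(int), average="weighted"),
        }
    return report

def madd(y_prob, groups, g0, g1, bins=20):
    # Distance between the predicted-probability densities of two groups.
    h0, _ = np.histogram(y_prob[groups == g0], bins=bins, range=(0, 1))
    h1, _ = np.histogram(y_prob[groups == g1], bins=bins, range=(0, 1))
    return np.abs(h0 / h0.sum() - h1 / h1.sum()).sum()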

Updated: 2024-05-16 05:37:50

Categories: cs.LG,cs.CY,K.3

Download: http://arxiv.org/abs/2405.09821v1

Densely Distilling Cumulative Knowledge for Continual Learning

Continual learning, involving sequential training on diverse tasks, often faces catastrophic forgetting. While knowledge distillation-based approaches exhibit notable success in preventing forgetting, we pinpoint a limitation in their ability to distill the cumulative knowledge of all the previous tasks. To remedy this, we propose Dense Knowledge Distillation (DKD). DKD uses a task pool to track the model's capabilities. It partitions the output logits of the model into dense groups, each corresponding to a task in the task pool. It then distills all tasks' knowledge using all groups. However, since using all the groups can be computationally expensive, we also suggest random group selection in each optimization step. Moreover, we propose an adaptive weighting scheme, which balances the learning of new classes and the retention of old classes, based on the count and similarity of the classes. Our DKD outperforms recent state-of-the-art baselines across diverse benchmarks and scenarios. Empirical analysis underscores DKD's ability to enhance model stability, promote flatter minima for improved generalization, and remain robust across various memory budgets and task orders. Moreover, it seamlessly integrates with other CL methods to boost performance and proves versatile in offline scenarios like model compression.
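
A minimal version of the group-wise distillation objective, assuming the task pool is represented as a list of logit-index groups and ignoring the adaptive weighting scheme:

# Sketch of dense group-wise distillation with random group selection
# (the task pool and grouping are simplified; temperature T is illustrative).
import random
import torch
import torch.nn.functional as F

def dkd_loss(student_logits, teacher_logits, task_groups, n_sampled=2, T=2.0):
    """task_groups: list of index tensors, one logit group per past task."""
    sampled = random.sample(task_groups, min(n_sampled, len(task_groups)))
    loss = 0.0
    for idx in sampled:
        s = student_logits[:, idx] / T
        t = teacher_logits[:, idx] / T
        loss = loss + F.kl_div(F.log_softmax(s, dim=1),
                               F.softmax(t, dim=1),
                               reduction="batchmean") * T * T
    return loss / len(sampled)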

Updated: 2024-05-16 05:37:06

Categories: cs.LG,cs.CV

Download: http://arxiv.org/abs/2405.09820v1

Automating the Training and Deployment of Models in MLOps by Integrating Systems with Machine Learning

This article introduces the importance of machine learning in real-world applications and explores the rise of MLOps (Machine Learning Operations) and its importance for solving challenges such as model deployment and performance monitoring. By reviewing the evolution of MLOps and its relationship to traditional software development methods, the paper proposes ways to integrate systems with machine learning to solve the problems faced by existing MLOps practice and improve productivity. This paper focuses on the importance of automated model training, and on methods to ensure the transparency and repeatability of the training process through version control systems. In addition, the challenges of integrating machine learning components into traditional CI/CD pipelines are discussed, and solutions such as versioned environments and containerization are proposed. Finally, the paper emphasizes the importance of continuous monitoring and feedback loops after model deployment to maintain model performance and reliability. Using case studies and best practices from Netflix, the article presents key strategies and lessons learned for successful implementation of MLOps practices, providing valuable references for other organizations to build and optimize their own MLOps practices.

Updated: 2024-05-16 05:36:28

Categories: cs.SE,cs.LG

Download: http://arxiv.org/abs/2405.09819v1

Rectified Gaussian kernel multi-view k-means clustering

In this paper, we present two new variants of the multi-view k-means (MVKM) algorithm for multi-view data. The general idea is to define the distance between the $h$-th view data points $x_i^h$ and the $h$-th view cluster centers $a_k^h$ in a manner different from the usual centroid-based approach. Unlike other methods, our proposed methods learn the multi-view data by computing similarity with the Euclidean norm in the Gaussian-kernel space, yielding multi-view k-means with exponent distance (MVKM-ED). By simultaneously tuning the stabilizer parameter $p$ and the kernel coefficients $\beta^h$, the compression of the Gaussian-kernel-based weighted distance in the Euclidean norm reduces the sensitivity of MVKM-ED. We refer to the resulting method as the Gaussian-kernel multi-view k-means (GKMVKM) clustering algorithm. Numerical evaluation on five real-world multi-view datasets demonstrates the robustness and efficiency of our proposed MVKM-ED and GKMVKM approaches.
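
One concrete reading of the exponent distance: measure the squared distance between points after the Gaussian-kernel feature map, which reduces to 2 - 2K(x, a), then raise it to an exponent. The sketch below follows that reading and should be taken as an interpretation of the abstract, not the authors' exact formulation.

# Kernel-induced distance used in place of the plain Euclidean distance:
# in the Gaussian-kernel feature space, ||phi(x) - phi(a)||^2 = 2 - 2*K(x, a).
# The per-view beta and exponent p follow the abstract's notation but the
# exact role of p here is an assumption.
import numpy as np

def gaussian_kernel_distance(X, centers, beta, p=1.0):
    """X: (n, d) one view's data; centers: (k, d); returns (n, k) distances."""
    sq = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    K = np.exp(-beta * sq)          # Gaussian-kernel similarity
    return (2.0 - 2.0 * K) ** p     # exponent-distance variant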

Updated: 2024-05-16 05:31:45

Categories: cs.LG,cs.CV

Download: http://arxiv.org/abs/2405.05619v3

Active Learning with Fully Bayesian Neural Networks for Discontinuous and Nonstationary Data

Active learning optimizes the exploration of large parameter spaces by strategically selecting which experiments or simulations to conduct, thus reducing resource consumption and potentially accelerating scientific discovery. A key component of this approach is a probabilistic surrogate model, typically a Gaussian Process (GP), which approximates an unknown functional relationship between control parameters and a target property. However, conventional GPs often struggle when applied to systems with discontinuities and non-stationarities, prompting the exploration of alternative models. This limitation becomes particularly relevant in physical science problems, which are often characterized by abrupt transitions between different system states and rapid changes in physical property behavior. Fully Bayesian Neural Networks (FBNNs) serve as a promising substitute, treating all neural network weights probabilistically and leveraging advanced Markov Chain Monte Carlo techniques for direct sampling from the posterior distribution. This approach enables FBNNs to provide reliable predictive distributions, crucial for making informed decisions under uncertainty in the active learning setting. Although traditionally considered too computationally expensive for 'big data' applications, many physical sciences problems involve small amounts of data in relatively low-dimensional parameter spaces. Here, we assess the suitability and performance of FBNNs with the No-U-Turn Sampler for active learning tasks in the 'small data' regime, highlighting their potential to enhance predictive accuracy and reliability on test functions relevant to problems in physical sciences.
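
In the small-data regime this workflow fits in a few lines with NumPyro. The sketch below is a generic single-hidden-layer FBNN with NUTS and a maximum-predictive-variance acquisition rule; the architecture, priors, and sampler settings are illustrative rather than the paper's.

# Fully Bayesian neural network with NUTS plus a max-variance acquisition
# step (a generic sketch, not the paper's code).
import jax.numpy as jnp
import jax.random as jr
import numpyro
import numpyro.distributions as dist
from numpyro.infer import MCMC, NUTS, Predictive

def bnn(x, y=None, hidden=16):
    d = x.shape[1]
    w1 = numpyro.sample("w1", dist.Normal(0, 1).expand([d, hidden]).to_event(2))
    b1 = numpyro.sample("b1", dist.Normal(0, 1).expand([hidden]).to_event(1))
    w2 = numpyro.sample("w2", dist.Normal(0, 1).expand([hidden, 1]).to_event(2))
    sigma = numpyro.sample("sigma", dist.HalfNormal(1.0))
    mu = (jnp.tanh(x @ w1 + b1) @ w2).squeeze(-1)
    numpyro.sample("obs", dist.Normal(mu, sigma), obs=y)

def next_query(x_train, y_train, x_pool, key=jr.PRNGKey(0)):
    mcmc = MCMC(NUTS(bnn), num_warmup=500, num_samples=500)
    mcmc.run(key, x_train, y_train)
    pred = Predictive(bnn, mcmc.get_samples())(key, x_pool)["obs"]
    return int(jnp.argmax(pred.var(axis=0)))  # most uncertain pool point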

Updated: 2024-05-16 05:20:47

Categories: cs.LG,physics.data-an

Download: http://arxiv.org/abs/2405.09817v1

BrepGen: A B-rep Generative Diffusion Model with Structured Latent Geometry

This paper presents BrepGen, a diffusion-based generative approach that directly outputs a Boundary representation (B-rep) Computer-Aided Design (CAD) model. BrepGen represents a B-rep model as a novel structured latent geometry in a hierarchical tree. With the root node representing a whole CAD solid, each element of a B-rep model (i.e., a face, an edge, or a vertex) progressively turns into a child-node from top to bottom. B-rep geometry information goes into the nodes as the global bounding box of each primitive along with a latent code describing the local geometric shape. The B-rep topology information is implicitly represented by node duplication. When two faces share an edge, the edge curve will appear twice in the tree, and a T-junction vertex with three incident edges appears six times in the tree with identical node features. Starting from the root and progressing to the leaf, BrepGen employs Transformer-based diffusion models to sequentially denoise node features while duplicated nodes are detected and merged, recovering the B-Rep topology information. Extensive experiments show that BrepGen advances the task of CAD B-rep generation, surpassing existing methods on various benchmarks. Results on our newly collected furniture dataset further showcase its exceptional capability in generating complicated geometry. While previous methods were limited to generating simple prismatic shapes, BrepGen incorporates free-form and doubly-curved surfaces for the first time. Additional applications of BrepGen include CAD autocomplete and design interpolation. The code, pretrained models, and dataset are available at https://github.com/samxuxiang/BrepGen.

Updated: 2024-05-16 05:20:19

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2401.15563v2

Generalized Cauchy-Schwarz Divergence and Its Deep Learning Applications

Divergence measures play a central role in machine learning and become increasingly essential in deep learning. However, valid and computationally efficient divergence measures for multiple (more than two) distributions are scarcely investigated. This becomes particularly crucial in areas where the simultaneous management of multiple distributions is both unavoidable and essential. Examples include clustering, multi-source domain adaptation or generalization, and multi-view learning, among others. Although calculating the mean of pairwise distances between any two distributions serves as a common way to quantify the total divergence among multiple distributions, it is crucial to acknowledge that this approach is not straightforward and requires significant computational resources. In this study, we introduce a new divergence measure for multiple distributions named the generalized Cauchy-Schwarz divergence (GCSD), which is inspired by the classic Cauchy-Schwarz divergence. Additionally, we provide a closed-form sample estimator based on kernel density estimation, making it convenient and straightforward to use in various machine-learning applications. Finally, we apply the proposed GCSD to two challenging machine learning tasks, namely deep learning-based clustering and the problem of multi-source domain adaptation. The experimental results showcase the impressive performance of GCSD in both tasks, highlighting its potential application in machine-learning areas that involve quantifying multiple distributions.

Updated: 2024-05-16 05:11:43

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2405.04061v2

BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences

Effective attention modules have played a crucial role in the success of Transformer-based large language models (LLMs), but the quadratic time and memory complexities of these attention modules also pose a challenge when processing long sequences. One potential solution for the long sequence problem is to utilize distributed clusters to parallelize the computation of attention modules across multiple devices (e.g., GPUs). However, adopting a distributed approach inevitably introduces extra memory overheads to store local attention results and incurs additional communication costs to aggregate local results into global ones. In this paper, we propose a distributed attention framework named "BurstAttention" to optimize memory access and communication operations at both the global cluster and local device levels. In our experiments, we compare BurstAttention with other competitive distributed attention solutions for long sequence processing. The experimental results under different length settings demonstrate that BurstAttention offers significant advantages for processing long sequences compared with these competitive baselines, reducing communication overheads by 40% and achieving a 1.37x speedup when training on 128K-token sequences across 32 A100 GPUs.
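
The aggregation problem the abstract describes, combining local attention results into an exact global result, is typically solved with an online-softmax merge. The single-process simulation below illustrates that merge over key/value shards; the real framework additionally overlaps this computation with communication across GPUs, which is not modeled here.

# Single-process simulation of aggregating sharded attention results with
# an online-softmax merge (illustrative of the idea, not BurstAttention's
# actual implementation).
import torch

def sharded_attention(q, k_shards, v_shards):
    """q: (n, d); k_shards/v_shards: lists of (m_i, d) tensors on 'devices'."""
    n, d = q.shape
    out = torch.zeros(n, d)
    running_max = torch.full((n, 1), float("-inf"))
    denom = torch.zeros(n, 1)
    for k, v in zip(k_shards, v_shards):           # one step per device
        scores = q @ k.T / d ** 0.5                # local attention scores
        local_max = scores.max(dim=1, keepdim=True).values
        new_max = torch.maximum(running_max, local_max)
        scale = torch.exp(running_max - new_max)   # rescale what we have so far
        p = torch.exp(scores - new_max)
        out = out * scale + p @ v
        denom = denom * scale + p.sum(dim=1, keepdim=True)
        running_max = new_max
    return out / denom  # exact global softmax attention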

Updated: 2024-05-16 05:08:13

Categories: cs.DC,cs.LG

Download: http://arxiv.org/abs/2403.09347v3

MediSyn: Text-Guided Diffusion Models for Broad Medical 2D and 3D Image Synthesis

Diffusion models have recently gained significant traction due to their ability to generate high-fidelity and diverse images and videos conditioned on text prompts. In medicine, this application promises to address the critical challenge of data scarcity, a consequence of barriers in data sharing, stringent patient privacy regulations, and disparities in patient population and demographics. By generating realistic and varying medical 2D and 3D images, these models offer a rich, privacy-respecting resource for algorithmic training and research. To this end, we introduce MediSyn, a pair of instruction-tuned text-guided latent diffusion models with the ability to generate high-fidelity and diverse medical 2D and 3D images across specialties and modalities. Through established metrics, we show significant improvement in broad medical image and video synthesis guided by text prompts.

Updated: 2024-05-16 04:28:44

Categories: cs.CV,cs.AI,cs.CL,cs.LG

Download: http://arxiv.org/abs/2405.09806v1

SecureLLM: Using Compositionality to Build Provably Secure Language Models for Private, Sensitive, and Secret Data

Traditional security mechanisms isolate resources from users who should not access them. We reflect the compositional nature of such security mechanisms back into the structure of LLMs to build a provably secure LLM; that we term SecureLLM. Other approaches to LLM safety attempt to protect against bad actors or bad outcomes, but can only do so to an extent making them inappropriate for sensitive data. SecureLLM blends access security with fine-tuning methods. Each data silo has associated with it a separate fine-tuning and a user has access only to the collection of fine-tunings that they have permission for. The model must then perform on compositional tasks at the intersection of those data silos with the combination of those individual fine-tunings. While applicable to any task like document QA or making API calls, in this work we concern ourselves with models that learn the layouts of new SQL databases to provide natural-language-to-SQL translation capabilities. Existing fine-tuning composition methods fail in this challenging environment, as they are not well-equipped for handling compositional tasks. Compositionality remains a challenge for LLMs. We contribute both a difficult new compositional natural-language-to-SQL translation task and a new perspective on LLM security that allows models to be deployed to secure environments today.

Updated: 2024-05-16 04:25:53

Categories: cs.CL,cs.CR

Download: http://arxiv.org/abs/2405.09805v1

Testing learning-enabled cyber-physical systems with Large-Language Models: A Formal Approach

The integration of machine learning (ML) into cyber-physical systems (CPS) offers significant benefits, including enhanced efficiency, predictive capabilities, real-time responsiveness, and the enabling of autonomous operations. This convergence has accelerated the development and deployment of a range of real-world applications, such as autonomous vehicles, delivery drones, service robots, and telemedicine procedures. However, the software development life cycle (SDLC) for AI-infused CPS diverges significantly from traditional approaches, featuring data and learning as two critical components. Existing verification and validation techniques are often inadequate for these new paradigms. In this study, we pinpoint the main challenges in ensuring formal safety for learning-enabled CPS. We begin by examining testing as the most pragmatic method for verification and validation, summarizing the current state-of-the-art methodologies. Recognizing the limitations in current testing approaches to provide formal safety guarantees, we propose a roadmap to transition from foundational probabilistic testing to a more rigorous approach capable of delivering formal assurance.

Updated: 2024-05-16 04:25:13

Categories: cs.SE,cs.AI,cs.DC,cs.RO

Download: http://arxiv.org/abs/2311.07377v3

Analysis and Predictive Modeling of Solar Coronal Holes Using Computer Vision and LSTM Networks

In the era of space exploration, coronal holes on the sun play a significant role due to their impact on satellites and aircraft through their open magnetic fields and increased solar wind emissions. This study employs computer vision techniques to detect coronal hole regions and estimate their sizes using imagery from the Solar Dynamics Observatory (SDO). Additionally, we utilize deep learning methods, specifically Long Short-Term Memory (LSTM) networks, to analyze trends in the area of coronal holes and predict their areas across various solar regions over a span of seven days. By examining time series data, we aim to identify patterns in coronal hole behavior and understand their potential effects on space weather. This research enhances our ability to anticipate and prepare for space weather events that could affect Earth's technological systems.

Updated: 2024-05-16 04:21:09

Categories: astro-ph.SR,astro-ph.EP,cs.AI,cs.LG

Download: http://arxiv.org/abs/2405.09802v1

Manifold Integrated Gradients: Riemannian Geometry for Feature Attribution

In this paper, we dive into the reliability concerns of Integrated Gradients (IG), a prevalent feature attribution method for black-box deep learning models. We particularly address two predominant challenges associated with IG: the generation of noisy feature visualizations for vision models and the vulnerability to adversarial attributional attacks. Our approach involves an adaptation of path-based feature attribution, aligning the path of attribution more closely to the intrinsic geometry of the data manifold. Our experiments utilise deep generative models applied to several real-world image datasets. They demonstrate that IG along the geodesics conforms to the curved geometry of the Riemannian data manifold, generating more perceptually intuitive explanations and, subsequently, substantially increasing robustness to targeted attributional attacks.
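
Integrated Gradients accumulates input gradients along a path from a baseline to the input; the paper's change is the path itself. The sketch below implements path-based IG with a pluggable path function, defaulting to the vanilla straight line; a geodesic path (e.g., decoded from a generative model's latent space) can be passed in its place.

# Path-based Integrated Gradients with a pluggable path (a generic sketch;
# the geodesic-path construction itself is not reproduced here).
import torch

def path_integrated_gradients(model, x, baseline, target, path=None, steps=64):
    if path is None:  # vanilla IG: straight line from baseline to input
        path = lambda a: baseline + a * (x - baseline)
    alphas = torch.linspace(0.0, 1.0, steps)
    total = torch.zeros_like(x)
    prev = path(alphas[0]).detach()
    for a in alphas[1:]:
        point = path(a).detach().requires_grad_(True)
        model(point.unsqueeze(0))[0, target].backward()
        total += point.grad * (point.detach() - prev)  # grad * path increment
        prev = point.detach()
    return total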

Updated: 2024-05-16 04:13:17

Categories: cs.LG,cs.HC,math.DG

Download: http://arxiv.org/abs/2405.09800v1

Many-Shot In-Context Learning in Multimodal Foundation Models

Large language models are well-known to be effective at few-shot in-context learning (ICL). Recent advancements in multimodal foundation models have enabled unprecedentedly long context windows, presenting an opportunity to explore their capability to perform ICL with many more demonstrating examples. In this work, we evaluate the performance of multimodal foundation models scaling from few-shot to many-shot ICL. We benchmark GPT-4o and Gemini 1.5 Pro across 10 datasets spanning multiple domains (natural imagery, medical imagery, remote sensing, and molecular imagery) and tasks (multi-class, multi-label, and fine-grained classification). We observe that many-shot ICL, including up to almost 2,000 multimodal demonstrating examples, leads to substantial improvements compared to few-shot (<100 examples) ICL across all of the datasets. Further, Gemini 1.5 Pro performance continues to improve log-linearly up to the maximum number of tested examples on many datasets. Given the high inference costs associated with the long prompts required for many-shot ICL, we also explore the impact of batching multiple queries in a single API call. We show that batching up to 50 queries can lead to performance improvements under zero-shot and many-shot ICL, with substantial gains in the zero-shot setting on multiple datasets, while drastically reducing per-query cost and latency. Finally, we measure ICL data efficiency of the models, or the rate at which the models learn from more demonstrating examples. We find that while GPT-4o and Gemini 1.5 Pro achieve similar zero-shot performance across the datasets, Gemini 1.5 Pro exhibits higher ICL data efficiency than GPT-4o on most datasets. Our results suggest that many-shot ICL could enable users to efficiently adapt multimodal foundation models to new applications and domains. Our codebase is publicly available at https://github.com/stanfordmlgroup/ManyICL .
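
Query batching amortizes the long many-shot prefix over several answers per call. A schematic of the prompt construction, with the message format and the eventual API call left as placeholders:

# Sketch of a many-shot prompt with query batching (the message schema and
# the send step are placeholders for whatever multimodal API is used).
def build_batched_prompt(demos, queries):
    """demos: list of (image, label) pairs; queries: list of images."""
    content = []
    for i, (img, label) in enumerate(demos):    # many-shot demonstration block
        content += [{"type": "image", "image": img},
                    {"type": "text", "text": f"Demo {i}: label = {label}"}]
    for j, img in enumerate(queries):           # batched queries, one answer each
        content += [{"type": "image", "image": img},
                    {"type": "text", "text": f"Query {j}: answer as 'Query {j}: <label>'"}]
    return [{"role": "user", "content": content}]

# One call then yields one answer line per query, amortizing the long
# many-shot prefix over up to ~50 queries as described above.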

Updated: 2024-05-16 04:02:43

Categories: cs.LG,cs.AI,cs.CL,cs.CV

Download: http://arxiv.org/abs/2405.09798v1

Human-AI Safety: A Descendant of Generative AI and Control Systems Safety

Generative artificial intelligence (AI) is interacting with people at an unprecedented scale, offering new avenues for immense positive impact, but also raising widespread concerns around the potential for individual and societal harm. Today, the predominant paradigm for human-AI safety focuses on fine-tuning the generative model's outputs to better agree with human-provided examples or feedback. In reality, however, the consequences of an AI model's outputs cannot be determined in an isolated context: they are tightly entangled with the responses and behavior of human users over time. In this position paper, we argue that meaningful safety assurances for these AI technologies can only be achieved by reasoning about how the feedback loop formed by the AI's outputs and human behavior may drive the interaction towards different outcomes. To this end, we envision a high-value window of opportunity to bridge the rapidly growing capabilities of generative AI and the dynamical safety frameworks from control theory, laying a new foundation for human-centered AI safety in the coming decades.

Updated: 2024-05-16 03:52:00

Categories: cs.AI,cs.CY,cs.SY,eess.SY,I.2

Download: http://arxiv.org/abs/2405.09794v1

From Matching to Generation: A Survey on Generative Information Retrieval

Information Retrieval (IR) systems are crucial tools for users to access information, widely applied in scenarios like search engines, question answering, and recommendation systems. Traditional IR methods, based on similarity matching to return ranked lists of documents, have been reliable means of information acquisition, dominating the IR field for years. With the advancement of pre-trained language models, generative information retrieval (GenIR) has emerged as a novel paradigm, gaining increasing attention in recent years. Currently, research in GenIR can be categorized into two aspects: generative document retrieval (GR) and reliable response generation. GR leverages the generative model's parameters for memorizing documents, enabling retrieval by directly generating relevant document identifiers without explicit indexing. Reliable response generation, on the other hand, employs language models to directly generate the information users seek, breaking the limitations of traditional IR in terms of document granularity and relevance matching, offering more flexibility, efficiency, and creativity, thus better meeting practical needs. This paper aims to systematically review the latest research progress in GenIR. We will summarize the advancements in GR regarding model training, document identifier, incremental learning, downstream tasks adaptation, multi-modal GR and generative recommendation, as well as progress in reliable response generation in aspects of internal knowledge memorization, external knowledge augmentation, generating response with citations and personal information assistant. We also review the evaluation, challenges and future prospects in GenIR systems. This review aims to offer a comprehensive reference for researchers in the GenIR field, encouraging further development in this area.

Updated: 2024-05-16 03:28:28

Categories: cs.IR,cs.AI,cs.CL

Download: http://arxiv.org/abs/2404.14851v3

Analysis of the BraTS 2023 Intracranial Meningioma Segmentation Challenge

We describe the design and results from the BraTS 2023 Intracranial Meningioma Segmentation Challenge. The BraTS Meningioma Challenge differed from prior BraTS Glioma challenges in that it focused on meningiomas, which are typically benign extra-axial tumors with diverse radiologic and anatomical presentation and a propensity for multiplicity. Nine participating teams each developed deep-learning automated segmentation models using image data from the largest multi-institutional systematically expert annotated multilabel multi-sequence meningioma MRI dataset to date, which included 1000 training set cases, 141 validation set cases, and 283 hidden test set cases. Each case included T2, T2/FLAIR, T1, and T1Gd brain MRI sequences with associated tumor compartment labels delineating enhancing tumor, non-enhancing tumor, and surrounding non-enhancing T2/FLAIR hyperintensity. Participant automated segmentation models were evaluated and ranked based on a scoring system evaluating lesion-wise metrics including dice similarity coefficient (DSC) and 95% Hausdorff Distance. The top ranked team had a lesion-wise median dice similarity coefficient (DSC) of 0.976, 0.976, and 0.964 for enhancing tumor, tumor core, and whole tumor, respectively and a corresponding average DSC of 0.899, 0.904, and 0.871, respectively. These results serve as state-of-the-art benchmarks for future pre-operative meningioma automated segmentation algorithms. Additionally, we found that 1286 of 1424 cases (90.3%) had at least 1 compartment voxel abutting the edge of the skull-stripped image edge, which requires further investigation into optimal pre-processing face anonymization steps.
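
For reference, the headline metric can be computed as below. The compartment label IDs are illustrative; the compartments are unions of the underlying labels described in the abstract (enhancing tumor, non-enhancing tumor, surrounding FLAIR hyperintensity).

# Dice similarity coefficient per tumor compartment (a standard
# implementation of the metric reported above; label IDs are assumptions).
import numpy as np

def dice(pred, truth):
    inter = np.logical_and(pred, truth).sum()
    denom = pred.sum() + truth.sum()
    return 1.0 if denom == 0 else 2.0 * inter / denom

def compartment_dice(pred_labels, true_labels, compartments=None):
    # Illustrative convention: 1 = non-enhancing tumor, 2 = surrounding
    # FLAIR hyperintensity, 3 = enhancing tumor.
    compartments = compartments or {"ET": (3,), "TC": (1, 3), "WT": (1, 2, 3)}
    return {name: dice(np.isin(pred_labels, ids), np.isin(true_labels, ids))
            for name, ids in compartments.items()}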

Updated: 2024-05-16 03:23:57

Categories: eess.IV,cs.CV,cs.LG

Download: http://arxiv.org/abs/2405.09787v1

IBD-PSC: Input-level Backdoor Detection via Parameter-oriented Scaling Consistency

Deep neural networks (DNNs) are vulnerable to backdoor attacks, where adversaries can maliciously trigger model misclassifications by implanting a hidden backdoor during model training. This paper proposes a simple yet effective input-level backdoor detection (dubbed IBD-PSC) as a 'firewall' to filter out malicious testing images. Our method is motivated by an intriguing phenomenon, i.e., parameter-oriented scaling consistency (PSC), where the prediction confidences of poisoned samples are significantly more consistent than those of benign ones when amplifying model parameters. In particular, we provide theoretical analysis to safeguard the foundations of the PSC phenomenon. We also design an adaptive method to select BN layers to scale up for effective detection. Extensive experiments are conducted on benchmark datasets, verifying the effectiveness and efficiency of our IBD-PSC method and its resistance to adaptive attacks.
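
The detection idea can be sketched directly: amplify the BatchNorm affine parameters by several factors and score each input by how consistently the scaled models keep assigning high confidence to the original prediction. The snippet below simplifies the paper's adaptive layer selection by scaling all BatchNorm layers, and the scale factors are illustrative.

# Sketch of parameter-oriented scaling consistency (scaling every
# BatchNorm2d layer stands in for the paper's adaptive layer selection).
import copy
import torch

def psc_score(model, x, scales=(1.2, 1.4, 1.6, 1.8)):
    with torch.no_grad():
        base_pred = model(x).argmax(dim=1)
        agree = torch.zeros(x.size(0))
        for s in scales:
            m = copy.deepcopy(model)
            for mod in m.modules():
                if isinstance(mod, torch.nn.BatchNorm2d):
                    mod.weight.mul_(s)   # amplify the affine parameters
                    mod.bias.mul_(s)
            probs = torch.softmax(m(x), dim=1)
            agree += probs.gather(1, base_pred.unsqueeze(1)).squeeze(1)
    return agree / len(scales)  # high consistency => likely poisoned input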

Updated: 2024-05-16 03:19:52

Categories: cs.LG,cs.CR

Download: http://arxiv.org/abs/2405.09786v1

Online bipartite matching with imperfect advice

We study the problem of online unweighted bipartite matching with $n$ offline vertices and $n$ online vertices where one wishes to be competitive against the optimal offline algorithm. While the classic RANKING algorithm of Karp et al. [1990] provably attains competitive ratio of $1-1/e > 1/2$, we show that no learning-augmented method can be both 1-consistent and strictly better than $1/2$-robust under the adversarial arrival model. Meanwhile, under the random arrival model, we show how one can utilize methods from distribution testing to design an algorithm that takes in external advice about the online vertices and provably achieves competitive ratio interpolating between any ratio attainable by advice-free methods and the optimal ratio of 1, depending on the advice quality.
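
For context, the RANKING baseline referenced above is simple to state in code: draw one random permutation of the offline side up front, then greedily match each arriving online vertex to its highest-ranked free neighbor.

# The classic RANKING algorithm of Karp et al. for online bipartite matching.
import random

def ranking(n_offline, online_neighbors, rng=random):
    rank = list(range(n_offline))
    rng.shuffle(rank)                   # rank[u] = priority of offline vertex u
    matched = [None] * n_offline
    matching = []
    for v, neighbors in enumerate(online_neighbors):  # online arrivals, in order
        free = [u for u in neighbors if matched[u] is None]
        if free:
            u = min(free, key=lambda u: rank[u])      # highest-ranked free neighbor
            matched[u] = v
            matching.append((u, v))
    return matching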

Updated: 2024-05-16 03:04:33

Categories: cs.LG,cs.AI,cs.DS,stat.ML

Download: http://arxiv.org/abs/2405.09784v1

LLM and Simulation as Bilevel Optimizers: A New Paradigm to Advance Physical Scientific Discovery

Large Language Models have recently gained significant attention in scientific discovery for their extensive knowledge and advanced reasoning capabilities. However, they encounter challenges in effectively simulating observational feedback and grounding it with language to propel advancements in physical scientific discovery. Conversely, human scientists undertake scientific discovery by formulating hypotheses, conducting experiments, and revising theories through observational analysis. Inspired by this, we propose to enhance the knowledge-driven, abstract reasoning abilities of LLMs with the computational strength of simulations. We introduce Scientific Generative Agent (SGA), a bilevel optimization framework: LLMs act as knowledgeable and versatile thinkers, proposing scientific hypotheses and reason about discrete components, such as physics equations or molecule structures; meanwhile, simulations function as experimental platforms, providing observational feedback and optimizing via differentiability for continuous parts, such as physical parameters. We conduct extensive experiments to demonstrate our framework's efficacy in constitutive law discovery and molecular design, unveiling novel solutions that differ from conventional human expectations yet remain coherent upon analysis.

Updated: 2024-05-16 03:04:10

Categories: cs.LG,cs.AI,cs.CE

Download: http://arxiv.org/abs/2405.09783v1

MIMIC: Masked Image Modeling with Image Correspondences

Dense pixel-specific representation learning at scale has been bottlenecked due to the unavailability of large-scale multi-view datasets. Current methods for building effective pretraining datasets heavily rely on annotated 3D meshes, point clouds, and camera parameters from simulated environments, preventing them from building datasets from real-world data sources where such metadata is lacking. We propose a pretraining dataset-curation approach that does not require any additional annotations. Our method allows us to generate multi-view datasets from both real-world videos and simulated environments at scale. Specifically, we experiment with two scales: MIMIC-1M with 1.3M and MIMIC-3M with 3.1M multi-view image pairs. We train multiple models with different masked image modeling objectives to showcase the following findings: Representations trained on our automatically generated MIMIC-3M outperform those learned from expensive crowdsourced datasets (ImageNet-1K) and those learned from synthetic environments (MULTIVIEW-HABITAT) on two dense geometric tasks: depth estimation on NYUv2 (1.7%), and surface normals estimation on Taskonomy (2.05%). For dense tasks which also require object understanding, we outperform MULTIVIEW-HABITAT, on semantic segmentation on ADE20K (3.89%), pose estimation on MSCOCO (9.4%), and reduce the gap with models pre-trained on the object-centric expensive ImageNet-1K. We outperform even when the representations are frozen, and when downstream training data is limited to few-shot. Larger dataset (MIMIC-3M) significantly improves performance, which is promising since our curation method can arbitrarily scale to produce even larger datasets. MIMIC code, dataset, and pretrained models are open-sourced at https://github.com/RAIVNLab/MIMIC.

Updated: 2024-05-16 03:03:37

Categories: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2306.15128v4

An Independent Implementation of Quantum Machine Learning Algorithms in Qiskit for Genomic Data

In this paper, we explore the power of Quantum Machine Learning as we extend, implement and evaluate algorithms like Quantum Support Vector Classifier (QSVC), Pegasos-QSVC, Variational Quantum Circuits (VQC), and Quantum Neural Networks (QNN) in Qiskit with diverse feature mapping techniques for genomic sequence classification.
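
As a flavor of the setup, a QSVC with a fidelity-based quantum kernel is typically assembled as below in Qiskit Machine Learning. Exact module paths vary across library versions, and the encoding of genomic sequences into fixed-length numeric vectors (e.g., k-mer counts) is assumed to happen upstream.

# QSVC with a ZZ feature map and fidelity quantum kernel (a common Qiskit
# Machine Learning setup; not necessarily the authors' exact configuration).
from qiskit.circuit.library import ZZFeatureMap
from qiskit_machine_learning.algorithms import QSVC
from qiskit_machine_learning.kernels import FidelityQuantumKernel

feature_map = ZZFeatureMap(feature_dimension=4, reps=2)   # 4 features per sample
kernel = FidelityQuantumKernel(feature_map=feature_map)
qsvc = QSVC(quantum_kernel=kernel)
# qsvc.fit(X_train, y_train); qsvc.score(X_test, y_test)  # X_*: (n, 4) arrays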

Updated: 2024-05-16 03:00:41

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2405.09781v1

Large Language Model-Enhanced Algorithm Selection: Towards Comprehensive Algorithm Representation

Algorithm selection, a critical process of automated machine learning, aims to identify the most suitable algorithm for solving a specific problem prior to execution. Mainstream algorithm selection techniques heavily rely on problem features, while the role of algorithm features remains largely unexplored. Due to the intrinsic complexity of algorithms, effective methods for universally extracting algorithm information are lacking. This paper takes a significant step towards bridging this gap by introducing Large Language Models (LLMs) into algorithm selection for the first time. By comprehending the code text, LLM not only captures the structural and semantic aspects of the algorithm, but also demonstrates contextual awareness and library function understanding. The high-dimensional algorithm representation extracted by LLM, after undergoing a feature selection module, is combined with the problem representation and passed to the similarity calculation module. The selected algorithm is determined by the matching degree between a given problem and different algorithms. Extensive experiments validate the performance superiority of the proposed model and the efficacy of each key module. Furthermore, we present a theoretical upper bound on model complexity, showcasing the influence of algorithm representation and feature selection modules. This provides valuable theoretical guidance for the practical implementation of our method.

Updated: 2024-05-16 02:54:25

Categories: cs.LG,cs.CL

Download: http://arxiv.org/abs/2311.13184v3

A Theoretical Computer Science Perspective on Free Will

We consider the paradoxical concept of free will from the perspective of Theoretical Computer Science (TCS), a branch of mathematics concerned with understanding the underlying principles of computation and complexity, including the implications and surprising consequences of resource limitations.

Updated: 2024-05-16 02:39:49

Categories: cs.CC,cs.AI,68-02,I.2; F.0

Download: http://arxiv.org/abs/2206.13942v5

Learning Reward for Robot Skills Using Large Language Models via Self-Alignment

Learning reward functions remains the bottleneck to equip a robot with a broad repertoire of skills. Large Language Models (LLM) contain valuable task-related knowledge that can potentially aid in the learning of reward functions. However, the proposed reward function can be imprecise and thus ineffective, requiring further grounding with environment information. We propose a method to learn rewards more efficiently in the absence of humans. Our approach consists of two components: we first use the LLM to propose features and parameterization of the reward, then update the parameters through an iterative self-alignment process. In particular, the process minimizes the ranking inconsistency between the LLM and the learnt reward functions based on the execution feedback. The method was validated on 9 tasks across 2 simulation environments. It demonstrates consistent improvements in training efficacy and efficiency, while consuming significantly fewer GPT tokens than the alternative mutation-based method.

Updated: 2024-05-16 02:37:29

Categories: cs.RO,cs.AI

Download: http://arxiv.org/abs/2405.07162v3

PVF (Parameter Vulnerability Factor): A Quantitative Metric Measuring AI Vulnerability Against Parameter Corruptions

Reliability of AI systems is a fundamental concern for the successful deployment and widespread adoption of AI technologies. Unfortunately, the escalating complexity and heterogeneity of AI hardware systems make them increasingly susceptible to hardware faults (e.g., bit flips) that can potentially corrupt model parameters. When this occurs during AI inference/servicing, it can potentially lead to incorrect or degraded model output for users, ultimately affecting the quality and reliability of AI services. In light of the escalating threat, it is crucial to address key questions: How vulnerable are AI models to parameter corruptions, and how do different components (such as modules, layers) of the models exhibit varying vulnerabilities to parameter corruptions? To systematically address this question, we propose a novel quantitative metric, Parameter Vulnerability Factor (PVF), inspired by architectural vulnerability factor (AVF) in computer architecture community, aiming to standardize the quantification of AI model vulnerability against parameter corruptions. We define a model parameter's PVF as the probability that a corruption in that particular model parameter will result in an incorrect output. In this paper, we present several use cases on applying PVF to three types of tasks/models during inference -- recommendation (DLRM), vision classification (CNN), and text classification (BERT), while presenting an in-depth vulnerability analysis on DLRM. PVF can provide pivotal insights to AI hardware designers in balancing the tradeoff between fault protection and performance/efficiency such as mapping vulnerable AI parameter components to well-protected hardware modules. PVF metric is applicable to any AI model and has a potential to help unify and standardize AI vulnerability/resilience evaluation practice.
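
The definition translates directly into a Monte Carlo estimate: corrupt one parameter with random bit flips and count how often the output changes. A toy sketch over float32 parameters, independent of any particular framework:

# Toy PVF estimate for one parameter: flip random bits in its float32
# encoding and measure how often the model output changes (a sketch of the
# metric's definition, not the authors' tooling).
import random
import struct

def flip_bit(value, bit):
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))
    return flipped

def pvf(run_model, params, index, trials=1000, rng=random):
    """Estimate P(incorrect output | corruption in params[index])."""
    clean_out = run_model(params)
    wrong = 0
    for _ in range(trials):
        corrupted = list(params)
        corrupted[index] = flip_bit(corrupted[index], rng.randrange(32))
        if run_model(corrupted) != clean_out:
            wrong += 1
    return wrong / trials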

Updated: 2024-05-16 02:30:58

Categories: cs.CR,cs.AI,cs.AR,cs.LG

Download: http://arxiv.org/abs/2405.01741v2

ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models

Reinforcement Learning from Human Feedback (RLHF) is key to aligning Large Language Models (LLMs), typically paired with the Proximal Policy Optimization (PPO) algorithm. While PPO is a powerful method designed for general reinforcement learning tasks, it is overly sophisticated for LLMs, leading to laborious hyper-parameter tuning and significant computation burdens. To make RLHF efficient, we present ReMax, which leverages 3 properties of RLHF: fast simulation, deterministic transitions, and trajectory-level rewards. These properties are not exploited in PPO, making it less suitable for RLHF. Building on the renowned REINFORCE algorithm, ReMax does not require training an additional value model as in PPO and is further enhanced with a new variance reduction technique. ReMax offers several benefits over PPO: it is simpler to implement, eliminates more than 4 hyper-parameters in PPO, reduces GPU memory usage, and shortens training time. ReMax can save about 46% GPU memory than PPO when training a 7B model and enables training on A800-80GB GPUs without the memory-saving offloading technique needed by PPO. Applying ReMax to a Mistral-7B model resulted in a 94.78% win rate on the AlpacaEval leaderboard and a 7.739 score on MT-bench, setting a new SOTA for open-source 7B models. These results show the effectiveness of ReMax while addressing the limitations of PPO in LLMs.
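
The estimator itself is compact: a REINFORCE gradient whose per-prompt baseline is the reward of the greedy decode, exploiting the deterministic-transition and trajectory-level-reward properties above. In the schematic below, sample(), greedy(), log_prob(), and the reward-model call are placeholders for the actual LLM plumbing.

# Schematic ReMax update: REINFORCE with the greedy response's reward as a
# per-prompt variance-reduction baseline.
import torch

def remax_loss(policy, reward_model, prompts):
    losses = []
    for prompt in prompts:
        sampled = policy.sample(prompt)                          # stochastic decode
        baseline = reward_model(prompt, policy.greedy(prompt))   # greedy-decode reward
        advantage = reward_model(prompt, sampled) - baseline     # variance reduction
        logp = policy.log_prob(prompt, sampled)                  # sum of token log-probs
        losses.append(-advantage * logp)                         # REINFORCE objective
    return torch.stack(losses).mean()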

Updated: 2024-05-16 02:22:23

Categories: cs.LG

Download: http://arxiv.org/abs/2310.10505v4

Harmonizing Generalization and Personalization in Federated Prompt Learning

Federated Prompt Learning (FPL) incorporates large pre-trained Vision-Language models (VLM) into federated learning through prompt tuning. The transferable representations and remarkable generalization capacity of VLM make them highly compatible with the integration of federated learning. Addressing data heterogeneity in federated learning requires personalization, but excessive focus on it across clients could compromise the model's ability to generalize effectively. To preserve the impressive generalization capability of VLM, it is crucial to strike a balance between personalization and generalization in FPL. To tackle this challenge, we proposed Federated Prompt Learning with CLIP Generalization and low-rank Personalization (FedPGP), which employs pre-trained CLIP to provide knowledge-guidance on the global prompt for improved generalization and incorporates a low-rank adaptation term to personalize the global prompt. Further, FedPGP integrates a prompt-wise contrastive loss to achieve knowledge guidance and personalized adaptation simultaneously, enabling a harmonious balance between personalization and generalization in FPL. We conduct extensive experiments on various datasets to explore base-to-novel generalization in both category-level and domain-level scenarios with heterogeneous data, showing the superiority of FedPGP in balancing generalization and personalization.

Updated: 2024-05-16 02:22:09

Categories: cs.LG

Download: http://arxiv.org/abs/2405.09771v1

Optimization Techniques for Sentiment Analysis Based on LLM (GPT-3)

With the rapid development of natural language processing (NLP) technology, large-scale pre-trained language models such as GPT-3 have become a popular research focus in the NLP field. This paper aims to explore sentiment analysis optimization techniques based on large pre-trained language models such as GPT-3 to improve model performance and further promote the development of natural language processing (NLP). After introducing the importance of sentiment analysis and the limitations of traditional methods, GPT-3 and fine-tuning techniques are introduced in this paper, and their applications in sentiment analysis are explained in detail. The experimental results show that the fine-tuning technique can optimize the GPT-3 model and obtain good performance on the sentiment analysis task. This study provides an important reference for future sentiment analysis using large-scale language models.

Updated: 2024-05-16 02:21:13

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2405.09770v1

Unsupervised Extractive Dialogue Summarization in Hyperdimensional Space

We present HyperSum, an extractive summarization framework that captures both the efficiency of traditional lexical summarization and the accuracy of contemporary neural approaches. HyperSum exploits the pseudo-orthogonality that emerges when randomly initializing vectors at extremely high dimensions ("blessing of dimensionality") to construct representative and efficient sentence embeddings. Simply clustering the obtained embeddings and extracting their medoids yields competitive summaries. HyperSum often outperforms state-of-the-art summarizers -- in terms of both summary accuracy and faithfulness -- while being 10 to 100 times faster. We open-source HyperSum as a strong baseline for unsupervised extractive summarization.
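
The whole pipeline is short enough to sketch end to end: quasi-orthogonal random token vectors at very high dimension, summed into sentence embeddings, clustered, and summarized by medoids. Dimensionality, tokenization, and the k-means loop below are simplified relative to the paper.

# Minimal HyperSum-style pipeline (simplified sketch, not the released code).
import numpy as np

def hypersum(sentences, k=3, dim=10000, seed=0):
    rng = np.random.default_rng(seed)
    vocab = {}
    def embed(sent):  # sum of quasi-orthogonal random token vectors
        vec = np.zeros(dim)
        for tok in sent.lower().split():
            if tok not in vocab:
                vocab[tok] = rng.choice([-1.0, 1.0], size=dim)
            vec += vocab[tok]
        return vec / (np.linalg.norm(vec) + 1e-9)
    E = np.stack([embed(s) for s in sentences])
    # Plain k-means on the embeddings, then one medoid per cluster.
    centers = E[rng.choice(len(E), size=k, replace=False)]
    for _ in range(20):
        assign = np.argmax(E @ centers.T, axis=1)
        centers = np.stack([E[assign == c].mean(0) if (assign == c).any()
                            else centers[c] for c in range(k)])
    medoids = [int(np.argmax(np.where(assign == c, E @ centers[c], -np.inf)))
               for c in range(k)]
    return [sentences[i] for i in sorted(set(medoids))]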

Updated: 2024-05-16 02:11:03

Domain: cs.CL,cs.AI

Download: http://arxiv.org/abs/2405.09765v1

Fusion Intelligence: Confluence of Natural and Artificial Intelligence for Enhanced Problem-Solving Efficiency

This paper introduces Fusion Intelligence (FI), a bio-inspired intelligent system in which the innate sensing, intelligence, and unique actuation abilities of biological organisms such as bees and ants are integrated with the computational power of Artificial Intelligence (AI). This interdisciplinary field seeks to create systems that are not only smart but also adaptive and responsive in ways that mimic nature. As FI evolves, it holds the promise of revolutionizing the way we approach complex problems, leveraging the best of both the biological and digital worlds to create solutions that are more effective, sustainable, and harmonious with the environment. We demonstrate FI's potential to enhance agricultural IoT system performance through a simulated case study on improving insect pollination efficacy (entomophily).

Updated: 2024-05-16 02:10:30

Domain: cs.AI,cs.ET,cs.MA,cs.SY,eess.SY

Download: http://arxiv.org/abs/2405.09763v1

Approximate Nearest Neighbour Search on Dynamic Datasets: An Investigation

Approximate k-Nearest Neighbour (ANN) methods are often used for mining information and aiding machine learning on large-scale high-dimensional datasets. ANN methods typically differ in the index structure used for accelerating searches, resulting in various recall/runtime trade-off points. For applications with static datasets, runtime constraints and dataset properties can be used to empirically select an ANN method with suitable operating characteristics. However, for applications with dynamic datasets, which are subject to frequent online changes (like the addition of new samples), there is currently no consensus as to which ANN methods are most suitable. Traditional evaluation approaches do not consider the computational costs of updating the index structure, or the frequency and size of index updates. To address this, we empirically evaluate 5 popular ANN methods on two main applications (online data collection and online feature learning) while taking these considerations into account. Two dynamic datasets are used, derived from the SIFT1M dataset with 1 million samples and the DEEP1B dataset with 1 billion samples. The results indicate that the often-used k-d trees method is not suitable for dynamic datasets, as it is slower than a straightforward baseline exhaustive search method. For online data collection, the Hierarchical Navigable Small World Graphs method achieves a consistent speedup over the baseline across a wide range of recall rates. For online feature learning, the Scalable Nearest Neighbours method is faster than the baseline for recall rates below 75%.
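
To make the evaluation concern concrete, a toy harness below charges the index-update cost to the ANN method (here HNSW via the hnswlib package) before comparing against exhaustive search; the sizes, parameters, and recall proxy are arbitrary choices for illustration, not the paper's protocol.

    import time
    import numpy as np
    import hnswlib

    dim, n0, n_new, k = 128, 100_000, 10_000, 10
    rng = np.random.default_rng(0)
    base = rng.random((n0, dim), dtype=np.float32)
    new = rng.random((n_new, dim), dtype=np.float32)
    queries = rng.random((100, dim), dtype=np.float32)

    index = hnswlib.Index(space="l2", dim=dim)
    index.init_index(max_elements=n0 + n_new, ef_construction=200, M=16)
    index.add_items(base, np.arange(n0))

    t0 = time.perf_counter()
    index.add_items(new, np.arange(n0, n0 + n_new))  # online updates: count this cost
    approx, _ = index.knn_query(queries, k=k)
    t_hnsw = time.perf_counter() - t0

    t0 = time.perf_counter()  # exhaustive-search baseline on the updated data
    data = np.vstack([base, new])
    d = (queries ** 2).sum(1)[:, None] - 2 * queries @ data.T + (data ** 2).sum(1)[None, :]
    exact = np.argsort(d, axis=1)[:, :k]
    t_brute = time.perf_counter() - t0

    recall = np.mean([len(set(map(int, a)) & set(map(int, e))) / k
                      for a, e in zip(approx, exact)])
    print(f"HNSW {t_hnsw:.2f}s, brute force {t_brute:.2f}s, recall@{k} = {recall:.2f}")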

Updated: 2024-05-16 01:57:23

Domain: cs.LG

Download: http://arxiv.org/abs/2404.19284v2

Robust Point Matching with Distance Profiles

While matching procedures based on pairwise distances are conceptually appealing and thus favored in practice, theoretical guarantees for such procedures are rarely found in the literature. We propose and analyze matching procedures based on distance profiles that are easily implementable in practice, showing these procedures are robust to outliers and noise. We demonstrate the performance of the proposed method using a real data example and provide simulation studies to complement the theoretical findings.
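
As a toy illustration of the idea (assuming equal-size point sets and a clean one-to-one correspondence, which the paper's robust analysis does not require), each point is described by its sorted vector of distances to all other points, and points are matched by comparing these profiles:

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def distance_profiles(X):
        # each point's profile: its distances to all other points, sorted
        D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        return np.sort(D, axis=1)[:, 1:]  # drop the zero self-distance

    def match_by_profiles(X, Y):
        # profiles are invariant to rotations and translations, so clouds
        # differing by a rigid motion (plus noise) match profile-to-profile
        PX, PY = distance_profiles(X), distance_profiles(Y)
        cost = np.linalg.norm(PX[:, None, :] - PY[None, :, :], axis=-1)
        rows, cols = linear_sum_assignment(cost)  # optimal one-to-one matching
        return list(zip(rows, cols))

    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 2))
    theta = rng.uniform(0, 2 * np.pi)
    R = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
    Y = X @ R.T + 0.01 * rng.normal(size=X.shape)  # rotated, noisy copy
    print(match_by_profiles(X, Y)[:5])             # mostly recovers i -> i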

Updated: 2024-05-16 01:53:58

Domain: stat.ME,cs.LG,math.ST,stat.ML,stat.TH

Download: http://arxiv.org/abs/2312.12641v2

Give and Take: An End-To-End Investigation of Giveaway Scam Conversion Rates

Scams -- fraudulent schemes designed to swindle money from victims -- have existed for as long as recorded history. However, the Internet's combination of low communication cost, global reach, and functional anonymity has allowed scam volumes to reach new heights. Designing effective interventions requires first understanding the context: how scammers reach potential victims, the earnings they make, and any potential bottlenecks for durable interventions. In this short paper, we focus on these questions in the context of cryptocurrency giveaway scams, where victims are tricked into irreversibly transferring funds to scammers under the pretense of even greater returns. Combining data from Twitter, YouTube and Twitch livestreams, landing pages, and cryptocurrency blockchains, we measure how giveaway scams operate at scale. We find that 1 in 1000 scam tweets, and 4 in 100,000 livestream views, net a victim, and that scammers managed to extract nearly $4.62 million from just hundreds of victims during our measurement window.

Updated: 2024-05-16 01:50:50

Domain: cs.CR

Download: http://arxiv.org/abs/2405.09757v1

An Autoencoder and Generative Adversarial Networks Approach for Multi-Omics Data Imbalanced Class Handling and Classification

Amid relentless efforts to enhance medical diagnostics, the integration of state-of-the-art machine learning methodologies has emerged as a promising research area. In molecular biology, there has been an explosion of data generated from multi-omics sequencing. Modern sequencing equipment can provide a large number of complicated measurements in a single experiment, so traditional statistical methods face challenges when dealing with such high-dimensional data. However, most of the information contained in these datasets is redundant or unrelated and can be effectively reduced to significantly fewer variables without losing much information. Dimensionality reduction techniques are mathematical procedures that allow for this reduction; they have largely been developed in the statistics and machine learning disciplines. Another challenge in medical datasets is an imbalanced number of samples across classes, which leads to biased results in machine learning models. This study focuses on tackling these challenges with a neural network that incorporates an autoencoder to extract a latent-space representation of the features and a Generative Adversarial Network (GAN) to generate synthetic samples. The latent space is the reduced-dimensional space that captures the meaningful features of the original data. Our model starts with feature selection, choosing the discriminative features before feeding them to the neural network. The model then predicts cancer outcomes on different datasets. The proposed model outperformed other existing models, achieving an accuracy of 95.09% on the bladder cancer dataset and 88.82% on the breast cancer dataset.
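
A minimal PyTorch sketch of one plausible wiring of this pipeline: an autoencoder learns the latent space, and a small GAN over that latent space synthesizes minority-class samples via the frozen decoder. The layer sizes, latent dimension, and the choice to run the GAN in latent rather than feature space are our assumptions, not the paper's exact architecture.

    import torch
    import torch.nn as nn

    d_in, d_lat, d_noise = 2000, 32, 16  # illustrative sizes for omics features

    class AutoEncoder(nn.Module):
        def __init__(self):
            super().__init__()
            self.enc = nn.Sequential(nn.Linear(d_in, 256), nn.ReLU(),
                                     nn.Linear(256, d_lat))
            self.dec = nn.Sequential(nn.Linear(d_lat, 256), nn.ReLU(),
                                     nn.Linear(256, d_in))
        def forward(self, x):
            z = self.enc(x)          # latent representation for the classifier
            return self.dec(z), z    # reconstruction drives the AE loss

    # GAN over the latent space: G learns the minority-class latent
    # distribution, D tries to tell real minority codes from generated ones
    G = nn.Sequential(nn.Linear(d_noise, 64), nn.ReLU(), nn.Linear(64, d_lat))
    D = nn.Sequential(nn.Linear(d_lat, 64), nn.LeakyReLU(0.2), nn.Linear(64, 1))

    def synthesize_minority(ae, n):
        # decode generated latent codes into synthetic minority-class samples
        with torch.no_grad():
            return ae.dec(G(torch.randn(n, d_noise)))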

Updated: 2024-05-16 01:45:55

Domain: cs.LG,cs.NE,q-bio.GN

Download: http://arxiv.org/abs/2405.09756v1

Efficient Data-Driven MPC for Demand Response of Commercial Buildings

Model predictive control (MPC) has been shown to significantly improve the energy efficiency of buildings while maintaining thermal comfort. Data-driven approaches based on neural networks have been proposed to facilitate system modelling. However, such approaches are generally nonconvex and result in computationally intractable optimization problems. In this work, we design a readily implementable energy management method for small commercial buildings. We then leverage our approach to formulate a real-time demand bidding strategy. We propose a data-driven and mixed-integer convex MPC which is solved via derivative-free optimization given a limited computational time of 5 minutes to respect operational constraints. We consider rooftop unit heating, ventilation, and air conditioning systems with discrete controls to accurately model the operation of most commercial buildings. Our approach uses an input convex recurrent neural network to model the thermal dynamics. We apply our approach in several demand response (DR) settings, including a demand bidding, a time-of-use, and a critical peak rebate program. Controller performance is evaluated on a state-of-the-art building simulation. The proposed approach improves thermal comfort while reducing energy consumption and cost through DR participation, when compared to other data-driven approaches or a set-point controller.
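
The derivative-free, discrete-control flavour of the controller can be sketched as a budgeted random search over on/off rooftop-unit schedules rolled out through a learned model; the toy dynamics, cost weights, and search strategy below are placeholders for the paper's input-convex RNN and mixed-integer convex solver.

    import numpy as np

    def rollout_cost(step, x0, u_seq, comfort, price):
        # simulate the learned thermal model and accumulate energy cost
        # plus a penalty for leaving the comfort band
        x, cost = x0, 0.0
        for t, u in enumerate(u_seq):
            x = step(x, u)
            cost += price[t] * u + 10.0 * max(0.0, abs(x - comfort[t]) - 1.0)
        return cost

    def random_search_mpc(step, x0, horizon, comfort, price, budget=2000, seed=0):
        # derivative-free optimization over discrete controls under a budget
        rng = np.random.default_rng(seed)
        best_u, best_c = None, np.inf
        for _ in range(budget):
            u = rng.integers(0, 2, size=horizon)  # on/off RTU stages
            c = rollout_cost(step, x0, u, comfort, price)
            if c < best_c:
                best_u, best_c = u, c
        return best_u[0]  # apply the first action, then re-plan (receding horizon)

    # toy linear stand-in for the learned input-convex RNN dynamics
    toy_step = lambda x, u: 0.9 * x + 2.0 * u + 0.5
    action = random_search_mpc(toy_step, x0=18.0, horizon=12,
                               comfort=[21.0] * 12, price=[0.1] * 12)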

Updated: 2024-05-16 01:11:20

Domain: eess.SY,cs.LG,cs.SY

Download: http://arxiv.org/abs/2401.15742v2

NIFTY Financial News Headlines Dataset

We introduce and make publicly available the NIFTY Financial News Headlines dataset, designed to facilitate and advance research in financial market forecasting using large language models (LLMs). This dataset comprises two distinct versions tailored for different modeling approaches: (i) NIFTY-LM, which targets supervised fine-tuning (SFT) of LLMs with an auto-regressive, causal language-modeling objective, and (ii) NIFTY-RL, formatted specifically for alignment methods (like reinforcement learning from human feedback (RLHF)) to align LLMs via rejection sampling and reward modeling. Each dataset version provides curated, high-quality data incorporating comprehensive metadata, market indices, and deduplicated financial news headlines systematically filtered and ranked to suit modern LLM frameworks. We also include experiments demonstrating applications of the dataset in tasks such as stock price movement prediction and studying the role of LLM embeddings in information acquisition/richness. The NIFTY dataset, along with utilities (like systematically truncating a prompt's context length), is available on Hugging Face at https://huggingface.co/datasets/raeidsaqur/NIFTY.
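
A quick way to inspect the data; the repository identifier comes from the URL above, but the available configurations, splits, and field names should be checked against the dataset card rather than assumed from this sketch.

    from datasets import load_dataset

    nifty = load_dataset("raeidsaqur/NIFTY")  # default configuration
    print(nifty)                   # available splits and row counts
    example = nifty["train"][0]    # assumes a "train" split exists
    print(sorted(example.keys()))  # headline, metadata, and index fields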

Updated: 2024-05-16 01:09:33

Domain: q-fin.CP,cs.LG

Download: http://arxiv.org/abs/2405.09747v1

Many Hands Make Light Work: Task-Oriented Dialogue System with Module-Based Mixture-of-Experts

Task-oriented dialogue systems are broadly used in virtual assistants and other automated services, providing interfaces between users and machines to facilitate specific tasks. Nowadays, task-oriented dialogue systems have greatly benefited from pre-trained language models (PLMs). However, their task-solving performance is constrained by the inherent capacities of PLMs, and scaling these models up is expensive and complex as their size grows. To address these challenges, we propose the Soft Mixture-of-Expert Task-Oriented Dialogue system (SMETOD), which leverages an ensemble of Mixture-of-Experts (MoEs) to excel at subproblems and generate specialized outputs for task-oriented dialogues. SMETOD also scales up a task-oriented dialogue system with simplicity and flexibility while maintaining inference efficiency. We extensively evaluate our model on three benchmark functionalities: intent prediction, dialogue state tracking, and dialogue response generation. Experimental results demonstrate that SMETOD achieves state-of-the-art performance on most evaluated metrics. Moreover, comparisons against existing strong baselines show that SMETOD has a clear advantage in inference cost and problem-solving correctness.
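
The core building block can be pictured as a soft mixture of bottleneck adapters attached to a frozen PLM layer; the sizes, gating granularity, and residual wiring below are illustrative assumptions rather than SMETOD's exact architecture.

    import torch
    import torch.nn as nn

    class SoftMoEAdapter(nn.Module):
        # soft mixture of bottleneck adapters: every expert processes the
        # input and a learned gate mixes the outputs, so the PLM backbone
        # itself can stay frozen
        def __init__(self, d_model=512, d_bottleneck=64, n_experts=4):
            super().__init__()
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(d_model, d_bottleneck), nn.ReLU(),
                              nn.Linear(d_bottleneck, d_model))
                for _ in range(n_experts))
            self.gate = nn.Linear(d_model, n_experts)

        def forward(self, h):                  # h: (batch, seq, d_model)
            w = torch.softmax(self.gate(h), dim=-1)               # (B, S, E)
            outs = torch.stack([e(h) for e in self.experts], -1)  # (B, S, D, E)
            return h + (outs * w.unsqueeze(-2)).sum(-1)           # residual mix

    layer = SoftMoEAdapter()
    h = torch.randn(2, 10, 512)  # hidden states from a frozen PLM layer
    print(layer(h).shape)        # torch.Size([2, 10, 512])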

Updated: 2024-05-16 01:02:09

Domain: cs.CL,cs.AI

Download: http://arxiv.org/abs/2405.09744v1

Random Scaling and Momentum for Non-smooth Non-convex Optimization

Training neural networks requires optimizing a loss function that may be highly irregular, and in particular neither convex nor smooth. Popular training algorithms are based on stochastic gradient descent with momentum (SGDM), for which classical analysis applies only if the loss is either convex or smooth. We show that a very small modification to SGDM closes this gap: simply scale the update at each time point by an exponentially distributed random scalar. The resulting algorithm achieves optimal convergence guarantees. Intriguingly, this result is not derived by a specific analysis of SGDM: instead, it falls naturally out of a more general framework for converting online convex optimization algorithms to non-convex optimization algorithms.
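
The modification is small enough to show in full; below is a PyTorch sketch of our reading, in which a single exponentially distributed random scalar (mean 1) is sampled once per step and multiplies the momentum update. How the scale interacts with the momentum recursion in the paper's exact scheme may differ.

    import torch

    class RandomScaledSGDM(torch.optim.Optimizer):
        # SGD with momentum whose update is scaled, once per step, by an
        # exponentially distributed random scalar with mean 1
        def __init__(self, params, lr=1e-2, momentum=0.9):
            super().__init__(params, dict(lr=lr, momentum=momentum))

        @torch.no_grad()
        def step(self):
            s = torch.empty(1).exponential_().item()  # one shared random scale
            for group in self.param_groups:
                for p in group["params"]:
                    if p.grad is None:
                        continue
                    buf = self.state[p].setdefault("momentum_buffer",
                                                   torch.zeros_like(p))
                    buf.mul_(group["momentum"]).add_(p.grad)
                    p.add_(buf, alpha=-group["lr"] * s)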

Updated: 2024-05-16 00:52:03

Domain: cs.LG,math.OC

Download: http://arxiv.org/abs/2405.09742v1

Remembering Transformer for Continual Learning

Neural networks encounter the challenge of catastrophic forgetting (CF) in continual learning, where learning a new task interferes with previously learned knowledge. Existing data fine-tuning and regularization methods necessitate task-identity information during inference and cannot eliminate interference among different tasks, while soft parameter-sharing approaches suffer from growing model parameter counts. To tackle these challenges, we propose the Remembering Transformer, inspired by the brain's Complementary Learning Systems (CLS). The Remembering Transformer employs a mixture-of-adapters architecture and a generative-model-based novelty detection mechanism in a pretrained Transformer to alleviate CF. It dynamically routes task data to the most relevant adapter, with enhanced parameter efficiency based on knowledge distillation. We conducted extensive experiments, including ablation studies on the novelty detection mechanism and the capacity of the mixture-of-adapters, across a broad range of class-incremental split tasks and permutation tasks. Our approach demonstrated state-of-the-art performance, surpassing the second-best method by 15.90% on the split tasks and reducing the memory footprint from 11.18M to 0.22M on the five-split CIFAR10 task.
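
One way to picture the routing: each adapter owns a small generative model (here a toy autoencoder) and inputs go to the adapter whose model reconstructs them best. The detector form and sizes are our illustrative stand-ins for the paper's novelty detection mechanism.

    import torch
    import torch.nn as nn

    class AdapterRouter(nn.Module):
        # one tiny autoencoder per adapter acts as a generative novelty
        # detector: inputs are routed to the adapter whose autoencoder
        # reconstructs them with the lowest error
        def __init__(self, d_model=256, n_adapters=5, d_code=32):
            super().__init__()
            self.detectors = nn.ModuleList(
                nn.Sequential(nn.Linear(d_model, d_code), nn.ReLU(),
                              nn.Linear(d_code, d_model))
                for _ in range(n_adapters))

        def route(self, h):                    # h: (batch, d_model)
            errs = torch.stack(
                [((ae(h) - h) ** 2).mean(dim=-1) for ae in self.detectors],
                dim=-1)
            return errs.argmin(dim=-1)         # adapter index per example

    router = AdapterRouter()
    h = torch.randn(8, 256)   # pooled features for a batch
    print(router.route(h))    # chosen adapter per example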

Updated: 2024-05-16 00:12:11

Domain: cs.LG,cs.CV

Download: http://arxiv.org/abs/2404.07518v3

By Xinhai (Sean) Zou.