Class-Prototype Conditional Diffusion Model with Gradient Projection for Continual Learning
Mitigating catastrophic forgetting is a key hurdle in continual learning. Deep Generative Replay (GR) generates samples from prior tasks to reinforce the model's memory, using generative models ranging from Generative Adversarial Networks (GANs) to the more recent Diffusion Models (DMs). A major issue is that the quality of generated data deteriorates relative to the original, because the generator continually learns from its own outputs; this degradation raises the risk of catastrophic forgetting (CF) in the classifier. To address this, we propose the Gradient Projection Class-Prototype Conditional Diffusion Model (GPPDM), a GR-based approach for continual learning that improves image quality in the generator and thereby reduces CF in the classifier. The cornerstone of GPPDM is a learnable class prototype that captures the core characteristics of images in a given class. Integrated into the diffusion model's denoising process, this prototype ensures the generation of high-quality images of old tasks, reducing the risk of CF in the classifier. Moreover, to further mitigate CF in the diffusion model itself, we propose a gradient projection technique tailored to its cross-attention layers that keeps the representations of old-task data in the current task as close as possible to their representations when they first arrived. Our empirical studies on diverse datasets demonstrate that the proposed method significantly outperforms existing state-of-the-art models, highlighting its ability to preserve image quality and enhance memory retention.
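To make the gradient-projection idea concrete, here is a minimal sketch (not the authors' exact construction) of projecting a cross-attention weight gradient onto the subspace orthogonal to stored old-task representations, so current-task updates minimally perturb how old-task inputs are mapped; the function name, the `eps` threshold, and the usage line are illustrative assumptions:

```python
import torch

def project_gradient(grad, old_feats, eps=1e-4):
    """Project a weight gradient onto the subspace orthogonal to the span of
    old-task input representations. grad: (d_out, d_in); old_feats: (n, d_in),
    rows are stored representations of old-task data at this layer."""
    # Orthonormal basis of the old-task feature subspace (rows of Vh span it).
    U, S, Vh = torch.linalg.svd(old_feats, full_matrices=False)
    basis = Vh[S > eps * S.max()]          # (k, d_in), significant directions
    # Remove the gradient component acting on the old-task subspace:
    # g <- g - (g B^T) B, with B the orthonormal basis.
    return grad - (grad @ basis.T) @ basis

# Hypothetical usage inside a training step, for a cross-attention projection:
# W_k.grad = project_gradient(W_k.grad, old_task_inputs_at_this_layer)
```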
Updated: 2024-03-21 23:57:13
Subjects: cs.LG
AutoRE: Document-Level Relation Extraction with Large Language Models
Large Language Models (LLMs) have demonstrated exceptional abilities in comprehending and generating text, motivating numerous researchers to utilize them for Information Extraction (IE) purposes, including Relation Extraction (RE). Nonetheless, most existing methods are predominantly designed for Sentence-level Relation Extraction (SentRE) tasks, which typically encompass a restricted set of relations and triplet facts within a single sentence. Furthermore, certain approaches treat relations as candidate choices integrated into prompt templates, leading to inefficient processing and suboptimal performance on Document-Level Relation Extraction (DocRE) tasks, which entail handling multiple relations and triplet facts distributed across a given document and pose distinct challenges. To overcome these limitations, we introduce AutoRE, an end-to-end DocRE model that adopts a novel RE extraction paradigm named RHF (Relation-Head-Facts). Unlike existing approaches, AutoRE does not rely on the assumption of known relation options, making it more reflective of real-world scenarios. Additionally, we have developed an easily extensible RE framework using a Parameter-Efficient Fine-Tuning (PEFT) algorithm (QLoRA). Our experiments on the RE-DocRED dataset showcase AutoRE's effectiveness, achieving state-of-the-art results and surpassing TAG by 10.03% and 9.03% on the dev and test sets, respectively.
Updated: 2024-03-21 23:48:21
Subjects: cs.CL,cs.AI
Establishing a leader in a pairwise comparisons method
Like electoral systems, decision-making methods are also vulnerable to manipulation by decision-makers. The ability to effectively defend against such threats can only come from thoroughly understanding the manipulation mechanisms. In this article, we present two algorithms that can be used to launch a manipulation attack. They make it possible to equate the weights of two selected alternatives in the pairwise comparison method and, consequently, to choose a leader. The theoretical considerations are accompanied by a Monte Carlo simulation showing the relationship between the size of the PC matrix, the degree of inconsistency, and the ease of manipulation. This work is a continuation of our previous research (Szybowski et al., 2023).
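For background, a minimal sketch of the standard eigenvector weighting that such manipulation targets: the priority vector of a pairwise comparison (PC) matrix is its normalized principal eigenvector, and Saaty's consistency index $CI = (\lambda_{max} - n)/(n - 1)$ is one common measure of inconsistency. This is a generic illustration, not the paper's manipulation algorithms themselves:

```python
import numpy as np

def priority_weights(A):
    """Principal-eigenvector weights of a reciprocal pairwise comparison
    matrix A, plus Saaty's consistency index CI = (lambda_max - n)/(n - 1)."""
    n = A.shape[0]
    eigvals, eigvecs = np.linalg.eig(A)
    k = np.argmax(eigvals.real)              # Perron root of a positive matrix
    w = np.abs(eigvecs[:, k].real)
    w /= w.sum()                             # normalized priority vector
    ci = (eigvals[k].real - n) / (n - 1)     # degree of inconsistency
    return w, ci

# Example 3x3 reciprocal matrix: alt 1 vs 2 -> 2, 1 vs 3 -> 4, 2 vs 3 -> 2.
A = np.array([[1.0, 2.0, 4.0],
              [0.5, 1.0, 2.0],
              [0.25, 0.5, 1.0]])
w, ci = priority_weights(A)   # consistent matrix here, so ci is ~0
```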
Updated: 2024-03-21 23:42:00
Subjects: cs.AI,cs.CR,cs.CY,cs.DM
Noisy Interpolation Learning with Shallow Univariate ReLU Networks
Understanding how overparameterized neural networks generalize despite perfect interpolation of noisy training data is a fundamental question. Mallinar et al. (2022) noted that neural networks seem to often exhibit ``tempered overfitting'', wherein the population risk does not converge to the Bayes optimal error, but neither does it approach infinity, yielding non-trivial generalization. However, this has not been studied rigorously. We provide the first rigorous analysis of the overfitting behavior of regression with minimum norm ($\ell_2$ of weights), focusing on univariate two-layer ReLU networks. We show overfitting is tempered (with high probability) when measured with respect to the $L_1$ loss, but also show that the situation is more complex than suggested by Mallinar et al., and overfitting is catastrophic with respect to the $L_2$ loss, or when taking an expectation over the training set.
Updated: 2024-03-21 23:38:52
Subjects: cs.LG
Stabilizing reinforcement learning control: A modular framework for optimizing over all stable behavior
We propose a framework for the design of feedback controllers that combines the optimization-driven and model-free advantages of deep reinforcement learning with the stability guarantees provided by using the Youla-Kucera parameterization to define the search domain. Recent advances in behavioral systems allow us to construct a data-driven internal model; this enables an alternative realization of the Youla-Kucera parameterization based entirely on input-output exploration data. Perhaps of independent interest, we formulate and analyze the stability of such data-driven models in the presence of noise. The Youla-Kucera approach requires a stable "parameter" for controller design. For the training of reinforcement learning agents, the set of all stable linear operators is given explicitly through a matrix factorization approach. Moreover, a nonlinear extension is given using a neural network to express a parameterized set of stable operators, which enables seamless integration with standard deep learning libraries. Finally, we show how these ideas can also be applied to tune fixed-structure controllers.
Updated: 2024-03-21 22:49:40
Subjects: cs.LG,cs.AI,cs.SY,eess.SY,math.OC
WeatherProof: Leveraging Language Guidance for Semantic Segmentation in Adverse Weather
We propose a method to infer semantic segmentation maps from images captured under adverse weather conditions. We begin by examining existing models on images degraded by weather conditions such as rain, fog, or snow, and find that they exhibit a large performance drop compared to those captured under clear weather. To control for changes in scene structure, we propose WeatherProof, the first semantic segmentation dataset with accurate clear and adverse weather image pairs that share an underlying scene. Through this dataset, we analyze the error modes in existing models and find that they are sensitive to the highly complex combination of different weather effects induced on the image during capture. To improve robustness, we propose a way to use language as guidance by identifying contributions of adverse weather conditions and injecting that as "side information". Models trained using our language guidance exhibit performance gains of up to 10.2% in mIoU on WeatherProof and up to 8.44% in mIoU on the widely used ACDC dataset compared to standard training techniques, and up to 6.21% in mIoU on ACDC compared to previous SOTA methods.
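A minimal sketch of the general idea of language-derived side information, assuming a CLIP-style model from the `transformers` library; the checkpoint name, the weather prompts, and how the resulting vector is injected into the segmentation network are illustrative assumptions, not the paper's exact pipeline:

```python
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

weather_prompts = ["a photo taken in heavy rain", "a photo taken in dense fog",
                   "a photo taken in snow", "a photo taken in clear weather"]

def weather_side_information(image):
    """Return a soft distribution over weather conditions for one image;
    a vector like this could be injected into a segmentation model as
    'side information'."""
    inputs = processor(text=weather_prompts, images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image   # (1, num_prompts)
    return logits.softmax(dim=-1).squeeze(0)
```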
Updated: 2024-03-21 22:46:27
Subjects: cs.CV,cs.LG
Structuring the Chaos: Enabling Small Business Cyber-Security Risks & Assets Modelling with a UML Class Model
Small businesses are increasingly adopting IT, and consequently becoming more vulnerable to cyber-incidents. Whilst small businesses are aware of the cyber-security risks, many struggle with implementing mitigations. Some of these struggles can be traced to fundamental differences in the characteristics of small businesses versus large enterprises, where modern cyber-security solutions are widely deployed. Small-business-specific cyber-security tools are needed. Currently available cyber-security tools and standards assume technical expertise and time resources that are often not practical for small businesses. Cyber-security competes with the other roles that small business owners take on, e.g. cleaning, sales, etc. A small business model, salient and implementable at scale, with simplified non-specialist terminology and presentation, is needed to encourage sustained participation of all stakeholders, not just technical ones. We propose a new UML class (Small IT Data (SITD)) model to support the often chaotic information-gathering phase of a small business' first foray into cyber-security. The SITD model is designed in the UML format to help small businesses implement technical solutions. The SITD model structure stays relevant by using generic classes and structures that evolve with technology and environmental changes. The SITD model keeps security decisions proportionate to the business by highlighting relationships between business strategy tasks and IT infrastructure. We construct a set of design principles to address small business cyber-security needs. Model components are designed in response to these needs. The uses of the SITD model are then demonstrated and the design principles validated by examining a case study of a real small business's operational and IT information. The SITD model's ability to illustrate breach information is also demonstrated using the NotPetya incident.
Updated: 2024-03-21 22:41:28
Subjects: cs.CR
VidLA: Video-Language Alignment at Scale
In this paper, we propose VidLA, an approach for video-language alignment at scale. There are two major limitations of previous video-language alignment approaches. First, they do not capture both short-range and long-range temporal dependencies and typically employ complex hierarchical deep network architectures that are hard to integrate with existing pretrained image-text foundation models. To effectively address this limitation, we instead keep the network architecture simple and use a set of data tokens that operate at different temporal resolutions in a hierarchical manner, accounting for the temporally hierarchical nature of videos. By employing a simple two-tower architecture, we are able to initialize our video-language model with pretrained image-text foundation models, thereby boosting the final performance. Second, existing video-language alignment works struggle due to the lack of semantically aligned large-scale training data. To overcome it, we leverage recent LLMs to curate the largest video-language dataset to date with better visual grounding. Furthermore, unlike existing video-text datasets which only contain short clips, our dataset is enriched with video clips of varying durations to aid our temporally hierarchical data tokens in extracting better representations at varying temporal scales. Overall, empirical results show that our proposed approach surpasses state-of-the-art methods on multiple retrieval benchmarks, especially on longer videos, and performs competitively on classification benchmarks.
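A minimal sketch of temporally hierarchical tokens under simplifying assumptions (average pooling of per-frame features at a few fixed strides); the paper's actual token design and two-tower integration are richer than this:

```python
import torch
import torch.nn.functional as F

def hierarchical_temporal_tokens(frame_feats, scales=(1, 4, 16)):
    """Pool per-frame features (T, D) at several temporal resolutions and
    concatenate the results into one token sequence: fine tokens keep
    short-range dependencies, coarse tokens summarize long-range context."""
    x = frame_feats.T.unsqueeze(0)                       # (1, D, T)
    tokens = []
    for s in scales:
        pooled = F.avg_pool1d(x, kernel_size=s, stride=s)  # (1, D, T // s)
        tokens.append(pooled.squeeze(0).T)               # (T // s, D)
    return torch.cat(tokens, dim=0)                      # multi-resolution tokens

feats = torch.randn(64, 512)                 # 64 frames, 512-dim features
tokens = hierarchical_temporal_tokens(feats)  # 64 + 16 + 4 = 84 tokens
```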
Updated: 2024-03-21 22:36:24
Subjects: cs.CV,cs.CL,cs.LG
Sequential Decision-Making for Inline Text Autocomplete
Autocomplete suggestions are fundamental to modern text entry systems, with applications in domains such as messaging and email composition. Typically, autocomplete suggestions are generated from a language model with a confidence threshold. However, this threshold does not directly take into account the cognitive load imposed on the user by surfacing suggestions, such as the effort to switch contexts from typing to reading the suggestion, and the time to decide whether to accept the suggestion. In this paper, we study the problem of improving inline autocomplete suggestions in text entry systems via a sequential decision-making formulation, and use reinforcement learning to learn suggestion policies through repeated interactions with a target user over time. This formulation allows us to factor cognitive load into the objective of training an autocomplete model, through a reward function based on text entry speed. We provide theoretical and experimental evidence that, under certain objectives, the sequential decision-making formulation of the autocomplete problem yields a better suggestion policy than myopic single-step reasoning. However, aligning these objectives with real users requires further exploration. In particular, we hypothesize that the objectives under which sequential decision-making can improve autocomplete systems are not tailored solely to text entry speed, but more broadly to metrics such as user satisfaction and convenience.
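As a hedged illustration of a text-entry-speed reward of the kind the formulation suggests (constants and accounting are placeholders, not values from the paper):

```python
def entry_speed_reward(chars_finalized, seconds_elapsed, suggestion_shown,
                       read_cost=0.3):
    """Reward for one suggestion decision: committed characters per second,
    minus a fixed cognitive cost whenever a suggestion was surfaced
    (context-switching and reading effort). All constants are illustrative
    placeholders, not values from the paper."""
    reward = chars_finalized / max(seconds_elapsed, 1e-6)
    if suggestion_shown:
        reward -= read_cost
    return reward
```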
Updated: 2024-03-21 22:33:16
Subjects: cs.CL,cs.HC,cs.LG
Distribution-informed and wavelength-flexible data-driven photoacoustic oximetry
Significance: Photoacoustic imaging (PAI) promises to measure spatially-resolved blood oxygen saturation, but suffers from a lack of accurate and robust spectral unmixing methods to deliver on this promise. Accurate blood oxygenation estimation could have important clinical applications, from cancer detection to quantifying inflammation.
Aim: This study addresses the inflexibility of existing data-driven methods for estimating blood oxygenation in PAI by introducing a recurrent neural network architecture.
Approach: We created 25 simulated training dataset variations to assess neural network performance. We used a long short-term memory network to implement a wavelength-flexible network architecture and proposed the Jensen-Shannon divergence to predict the most suitable training dataset.
Results: The network architecture can handle arbitrary input wavelengths and outperforms linear unmixing and the previously proposed learned spectral decolouring method. Small changes in the training data significantly affect the accuracy of our method, but we find that the Jensen-Shannon divergence correlates with the estimation error and is thus suitable for predicting the most appropriate training datasets for any given application.
Conclusions: A flexible data-driven network architecture combined with the Jensen-Shannon divergence to predict the best training dataset provides a promising direction that might enable robust data-driven photoacoustic oximetry for clinical use cases.
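A minimal sketch of using the Jensen-Shannon divergence to choose among candidate training datasets, assuming measurements are summarized as histograms over shared bins; note that `scipy.spatial.distance.jensenshannon` returns the square root of the JS divergence, which preserves the ranking:

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def pick_training_set(target_hist, candidate_hists):
    """Rank candidate (simulated) training datasets by the Jensen-Shannon
    distance between a histogram of the target measurements and each
    candidate's histogram; return the index of the closest candidate."""
    dists = [jensenshannon(target_hist, h) for h in candidate_hists]
    return int(np.argmin(dists)), dists

# Histograms must share bin edges and each be normalized to sum to 1.
```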
Updated: 2024-03-21 22:18:25
Subjects: physics.med-ph,cs.CV,cs.LG,F.2.1
Robust Model Based Reinforcement Learning Using $\mathcal{L}_1$ Adaptive Control
We introduce $\mathcal{L}_1$-MBRL, a control-theoretic augmentation scheme for Model-Based Reinforcement Learning (MBRL) algorithms. Unlike model-free approaches, MBRL algorithms learn a model of the transition function using data and use it to design a control input. Our approach generates a series of approximate control-affine models of the learned transition function according to the proposed switching law. Using the approximate model, control input produced by the underlying MBRL is perturbed by the $\mathcal{L}_1$ adaptive control, which is designed to enhance the robustness of the system against uncertainties. Importantly, this approach is agnostic to the choice of MBRL algorithm, enabling the use of the scheme with various MBRL algorithms. MBRL algorithms with $\mathcal{L}_1$ augmentation exhibit enhanced performance and sample efficiency across multiple MuJoCo environments, outperforming the original MBRL algorithms, both with and without system noise.
Updated: 2024-03-21 22:15:09
Subjects: eess.SY,cs.LG,cs.SY
Comparing Plausibility Estimates in Base and Instruction-Tuned Large Language Models
Instruction-tuned LLMs can respond to explicit queries formulated as prompts, which greatly facilitates interaction with human users. However, prompt-based approaches might not always be able to tap into the wealth of implicit knowledge acquired by LLMs during pre-training. This paper presents a comprehensive study of ways to evaluate semantic plausibility in LLMs. We compare base and instruction-tuned LLM performance on an English sentence plausibility task via (a) explicit prompting and (b) implicit estimation via direct readout of the probabilities models assign to strings. Experiment 1 shows that, across model architectures and plausibility datasets, (i) log likelihood ($\textit{LL}$) scores are the most reliable indicator of sentence plausibility, with zero-shot prompting yielding inconsistent and typically poor results; (ii) $\textit{LL}$-based performance is still inferior to human performance; (iii) instruction-tuned models have worse $\textit{LL}$-based performance than base models. In Experiment 2, we show that $\textit{LL}$ scores across models are modulated by context in the expected way, showing high performance on three metrics of context-sensitive plausibility and providing a direct match to explicit human plausibility judgments. Overall, $\textit{LL}$ estimates remain a more reliable measure of plausibility in LLMs than direct prompting.
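A minimal sketch of the $\textit{LL}$ readout with an off-the-shelf causal LM from `transformers` (GPT-2 here is an illustrative choice, not necessarily a model evaluated in the paper):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def sentence_log_likelihood(sentence):
    """Sum of token log-probabilities under a causal LM. The model's `loss`
    is the mean cross-entropy over predicted tokens, so LL = -loss * n."""
    ids = tok(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss
    return -loss.item() * (ids.shape[1] - 1)   # n-1 next-token predictions

# Plausibility comparison: the more plausible sentence should score higher.
ll_a = sentence_log_likelihood("The teacher bought a rocking chair.")
ll_b = sentence_log_likelihood("The rocking chair bought a teacher.")
```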
Updated: 2024-03-21 22:08:44
Subjects: cs.CL,cs.AI
Tur[k]ingBench: A Challenge Benchmark for Web Agents
Recent chatbots have demonstrated an impressive ability to understand and communicate in raw-text form. However, there is more to the world than raw text. For example, humans spend long hours of their time on web pages, where text is intertwined with other modalities and tasks are accomplished in the form of various complex interactions. Can state-of-the-art multi-modal models generalize to such complex domains? To address this question, we introduce TurkingBench, a benchmark of tasks formulated as web pages containing textual instructions with multi-modal context. Unlike existing work which employs artificially synthesized web pages, here we use natural HTML pages that were originally designed for crowdsourcing workers for various annotation purposes. The HTML instructions of each task are also instantiated with various values (obtained from the crowdsourcing tasks) to form new instances of the task. This benchmark contains 32.2K instances distributed across 158 tasks. Additionally, to facilitate the evaluation on TurkingBench, we develop an evaluation framework that connects the responses of chatbots to modifications on web pages (modifying a text box, checking a radio button, etc.). We evaluate the performance of state-of-the-art models, including language-only, vision-only, and layout-only models, and their combinations, on this benchmark. Our findings reveal that these models perform significantly better than random chance, yet considerable room exists for improvement. We hope this benchmark will help facilitate the evaluation and development of web-based agents.
Updated: 2024-03-21 21:57:13
Subjects: cs.AI,cs.CL,cs.CV,cs.HC
iSpLib: A Library for Accelerating Graph Neural Networks using Auto-tuned Sparse Operations
Core computations in Graph Neural Network (GNN) training and inference are often mapped to sparse matrix operations such as sparse-dense matrix multiplication (SpMM). These sparse operations are harder to optimize by manual tuning because their performance depends significantly on the sparsity of input graphs, GNN models, and computing platforms. To address this challenge, we present iSpLib, a PyTorch-based C++ library equipped with auto-tuned sparse operations. iSpLib expedites GNN training with a cache-enabled backpropagation that stores intermediate matrices in local caches. The library offers a user-friendly Python plug-in that allows users to take advantage of our optimized PyTorch operations out-of-the-box for any existing linear algebra-based PyTorch implementation of popular GNNs (Graph Convolution Network, GraphSAGE, Graph Inference Network, etc.) with only two lines of additional code. We demonstrate that iSpLib obtains up to 27x overall training speedup compared to the equivalent PyTorch 2.1.0 and PyTorch Geometric 2.4.0 implementations on the CPU. Our library is publicly available at https://github.com/HipGraph/iSpLib (https://doi.org/10.5281/zenodo.10806511).
Updated: 2024-03-21 21:56:44
Subjects: cs.LG,cs.DC,cs.PF
Output-Constrained Lossy Source Coding With Application to Rate-Distortion-Perception Theory
The distortion-rate function of output-constrained lossy source coding with limited common randomness is analyzed for the special case of squared error distortion measure. An explicit expression is obtained when both source and reconstruction distributions are Gaussian. This further leads to a partial characterization of the information-theoretic limit of quadratic Gaussian rate-distortion-perception coding with the perception measure given by Kullback-Leibler divergence or squared quadratic Wasserstein distance.
Updated: 2024-03-21 21:51:36
Subjects: cs.IT,cs.LG,math.IT
Learning WENO for entropy stable schemes to solve conservation laws
Entropy conditions play a crucial role in the extraction of a physically relevant solution for a system of conservation laws, thus motivating the construction of entropy stable schemes that satisfy a discrete analogue of such conditions. TeCNO schemes (Fjordholm et al. 2012) form a class of arbitrary high-order entropy stable finite difference solvers, which require specialized reconstruction algorithms satisfying the sign property at each cell interface. Recently, third-order WENO schemes called SP-WENO (Fjordholm and Ray, 2016) and SP-WENOc (Ray, 2018) have been designed to satisfy the sign property. However, these WENO algorithms can perform poorly near shocks, with the numerical solutions exhibiting large spurious oscillations. In the present work, we propose a variant of the SP-WENO, termed as Deep Sign-Preserving WENO (DSP-WENO), where a neural network is trained to learn the WENO weighting strategy. The sign property and third-order accuracy are strongly imposed in the algorithm, which constrains the WENO weight selection region to a convex polygon. Thereafter, a neural network is trained to select the WENO weights from this convex region with the goal of improving the shock-capturing capabilities without sacrificing the rate of convergence in smooth regions. The proposed synergistic approach retains the mathematical framework of the TeCNO scheme while integrating deep learning to remedy the computational issues of the WENO-based reconstruction. We present several numerical experiments to demonstrate the significant improvement with DSP-WENO over the existing variants of WENO satisfying the sign property.
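For reference, the sign property mentioned above requires that, at every cell interface, the jump in the reconstructed interface values has the same sign as the jump in the underlying cell values; in the usual notation (with $w_{i+1/2}^{\pm}$ the reconstructions from the right and left of interface $i+1/2$, as we recall the condition from Fjordholm and Ray's work), $$\mathrm{sign}\left(w_{i+1/2}^{+}-w_{i+1/2}^{-}\right)=\mathrm{sign}\left(w_{i+1}-w_{i}\right).$$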
Updated: 2024-03-21 21:39:05
Subjects: math.NA,cs.LG,cs.NA,65M06, 65M12, 35L65, 68T07
Short-Form Videos and Mental Health: A Knowledge-Guided Neural Topic Model
While short-form videos are reshaping the entire social media landscape, experts are exceedingly worried about their depressive impacts on viewers, as evidenced by medical studies. To prevent widespread consequences, platforms are eager to predict these videos' impact on viewers' mental health. Subsequently, they can take intervention measures, such as revising recommendation algorithms and displaying viewer discretion advisories. Nevertheless, applicable predictive methods lack relevance to well-established medical knowledge, which outlines clinically proven external and environmental factors of depression. To account for such medical knowledge, we resort to an emergent methodological discipline, seeded Neural Topic Models (NTMs). However, existing seeded NTMs suffer from the limitations of single-origin topics, unknown topic sources, unclear seed supervision, and suboptimal convergence. To address those challenges, we develop a novel Knowledge-guided Multimodal NTM to predict a short-form video's depressive impact on viewers. Extensive empirical analyses using TikTok and Douyin datasets prove that our method outperforms state-of-the-art benchmarks. Our method also discovers medically relevant topics from videos that are linked to depressive impact. We contribute to IS with a novel video analytics method that is generalizable to other video classification problems. Practically, our method can help platforms understand videos' mental impacts, thus adjusting recommendations and video topic disclosure.
Updated: 2024-03-21 21:37:50
Subjects: cs.CV,cs.LG
Gene Regulatory Network Inference in the Presence of Dropouts: a Causal View
Gene regulatory network inference (GRNI) is a challenging problem, particularly owing to the presence of zeros in single-cell RNA sequencing data: some are biological zeros representing no gene expression, while some others are technical zeros arising from the sequencing procedure (aka dropouts), which may bias GRNI by distorting the joint distribution of the measured gene expressions. Existing approaches typically handle dropout error via imputation, which may introduce spurious relations as the true joint distribution is generally unidentifiable. To tackle this issue, we introduce a causal graphical model to characterize the dropout mechanism, namely, Causal Dropout Model. We provide a simple yet effective theoretical result: interestingly, the conditional independence (CI) relations in the data with dropouts, after deleting the samples with zero values (regardless if technical or not) for the conditioned variables, are asymptotically identical to the CI relations in the original data without dropouts. This particular test-wise deletion procedure, in which we perform CI tests on the samples without zeros for the conditioned variables, can be seamlessly integrated with existing structure learning approaches including constraint-based and greedy score-based methods, thus giving rise to a principled framework for GRNI in the presence of dropouts. We further show that the causal dropout model can be validated from data, and many existing statistical models to handle dropouts fit into our model as specific parametric instances. Empirical evaluation on synthetic, curated, and real-world experimental transcriptomic data comprehensively demonstrate the efficacy of our method.
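A minimal sketch of the test-wise deletion idea, assuming a Fisher-z partial-correlation CI test (one common choice; the paper's framework integrates with constraint-based and score-based learners more generally). Function and variable names are illustrative:

```python
import numpy as np
from scipy import stats

def ci_test_with_deletion(data, x, y, cond, alpha=0.05):
    """Fisher-z partial-correlation test of X _||_ Y | Z, run only on the
    samples where every conditioning variable is nonzero (test-wise deletion
    of potential dropouts). data: (n_samples, n_genes); x, y: column indices;
    cond: list of column indices for Z. Returns True if CI is accepted."""
    keep = np.all(data[:, cond] != 0, axis=1) if cond else np.ones(len(data), bool)
    sub = data[keep][:, [x, y] + list(cond)]
    n = sub.shape[0]
    prec = np.linalg.pinv(np.corrcoef(sub, rowvar=False))   # precision matrix
    r = -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])      # partial correlation
    z = 0.5 * np.log((1 + r) / (1 - r)) * np.sqrt(n - len(cond) - 3)
    p = 2 * (1 - stats.norm.cdf(abs(z)))
    return p > alpha
```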
Updated: 2024-03-21 21:27:43
Subjects: q-bio.QM,cs.LG,q-bio.MN
Local Causal Discovery with Linear non-Gaussian Cyclic Models
Local causal discovery is of great practical significance, as there are often situations where the discovery of the global causal structure is unnecessary, and the interest lies solely on a single target variable. Most existing local methods utilize conditional independence relations, providing only a partially directed graph, and assume acyclicity for the ground-truth structure, even though real-world scenarios often involve cycles like feedback mechanisms. In this work, we present a general, unified local causal discovery method with linear non-Gaussian models, whether they are cyclic or acyclic. We extend the application of independent component analysis from the global context to independent subspace analysis, enabling the exact identification of the equivalent local directed structures and causal strengths from the Markov blanket of the target variable. We also propose an alternative regression-based method in the particular acyclic scenarios. Our identifiability results are empirically validated using both synthetic and real-world datasets.
Updated: 2024-03-21 21:27:39
Subjects: cs.LG,cs.AI
Model order reduction of deep structured state-space models: A system-theoretic approach
With a specific emphasis on control design objectives, achieving accurate system modeling with limited complexity is crucial in parametric system identification. The recently introduced deep structured state-space models (SSM), which feature linear dynamical blocks as key constituent components, offer high predictive performance. However, the learned representations often suffer from excessively large model orders, which render them unsuitable for control design purposes. The current paper addresses this challenge by means of system-theoretic model order reduction techniques that target the linear dynamical blocks of SSMs. We introduce two regularization terms which can be incorporated into the training loss for improved model order reduction. In particular, we consider modal $\ell_1$ and Hankel nuclear norm regularization to promote sparsity, allowing one to retain only the relevant states without sacrificing accuracy. The presented regularizers lead to advantages in terms of parsimonious representations and faster inference resulting from the reduced order models. The effectiveness of the proposed methodology is demonstrated using real-world ground vibration data from an aircraft.
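A minimal sketch of a Hankel nuclear norm penalty for one linear block $(A, B, C)$, assuming a finite truncation horizon; this illustrates the regularization idea only, not the paper's exact formulation (which also includes the modal $\ell_1$ term):

```python
import torch

def hankel_nuclear_norm(A, B, C, horizon=20):
    """Nuclear norm of a truncated block-Hankel matrix built from the Markov
    parameters h_k = C A^(k-1) B of a linear block; adding this to the
    training loss encourages low Hankel rank, i.e., a low effective order."""
    markov, Ak = [], torch.eye(A.shape[0], dtype=A.dtype)
    for _ in range(2 * horizon - 1):
        markov.append(C @ Ak @ B)        # h_1, h_2, ... (each p x m)
        Ak = Ak @ A
    rows = [torch.cat(markov[i:i + horizon], dim=1) for i in range(horizon)]
    H = torch.cat(rows, dim=0)           # block Hankel, (horizon*p) x (horizon*m)
    return torch.linalg.matrix_norm(H, ord="nuc")

# Hypothetical usage in the training objective:
# loss = prediction_loss + lam * sum(hankel_nuclear_norm(*blk) for blk in blocks)
```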
Updated: 2024-03-21 21:05:59
Subjects: cs.LG,cs.SY,eess.SY
Gaussian Cooling and Dikin Walks: The Interior-Point Method for Logconcave Sampling
The connections between (convex) optimization and (logconcave) sampling have been considerably enriched in the past decade with many conceptual and mathematical analogies. For instance, the Langevin algorithm can be viewed as a sampling analogue of gradient descent and has condition-number-dependent guarantees on its performance. In the early 1990s, Nesterov and Nemirovski developed the Interior-Point Method (IPM) for convex optimization based on self-concordant barriers, providing efficient algorithms for structured convex optimization, often faster than the general method. This raises the following question: can we develop an analogous IPM for structured sampling problems? In 2012, Kannan and Narayanan proposed the Dikin walk for uniformly sampling polytopes, and an improved analysis was given in 2020 by Laddha-Lee-Vempala. The Dikin walk uses a local metric defined by a self-concordant barrier for linear constraints. Here we generalize this approach by developing and adapting IPM machinery together with the Dikin walk for poly-time sampling algorithms. Our IPM-based sampling framework provides an efficient warm start and goes beyond uniform distributions and linear constraints. We illustrate the approach on important special cases, in particular giving the fastest algorithms to sample uniform, exponential, or Gaussian distributions on a truncated PSD cone. The framework is general and can be applied to other sampling algorithms.
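For concreteness, a minimal sketch of the classical Dikin walk for a polytope $\{x : Ax \le b\}$, which the paper's IPM-based framework generalizes; the Gaussian proposal in the log-barrier metric and the Metropolis filter follow the standard construction:

```python
import numpy as np

def dikin_walk(A, b, x0, steps=1000, r=0.5):
    """Dikin walk for (approximately) uniform sampling of {x : Ax <= b}. The
    proposal is Gaussian in the local metric given by the Hessian of the
    log-barrier, H(x) = A^T diag(b - Ax)^{-2} A; the Metropolis filter
    corrects for the state-dependent proposal."""
    n = len(x0)
    def hessian(x):
        s = b - A @ x                      # slacks, positive inside the polytope
        return (A / s[:, None] ** 2).T @ A
    def logdet(H):
        return np.linalg.slogdet(H)[1]
    x, Hx = x0.copy(), hessian(x0)
    for _ in range(steps):
        # Propose y ~ N(x, (r^2 / n) * H(x)^{-1}) via the Cholesky factor of H.
        y = x + (r / np.sqrt(n)) * np.linalg.solve(np.linalg.cholesky(Hx).T,
                                                   np.random.randn(n))
        if np.any(A @ y >= b):             # outside the polytope: reject
            continue
        Hy = hessian(y)
        # Log-densities of the two Gaussian proposals (constants cancel).
        q_xy = 0.5 * logdet(Hx) - 0.5 * (n / r**2) * (y - x) @ Hx @ (y - x)
        q_yx = 0.5 * logdet(Hy) - 0.5 * (n / r**2) * (x - y) @ Hy @ (x - y)
        if np.log(np.random.rand()) < q_yx - q_xy:
            x, Hx = y, Hy
    return x
```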
Updated: 2024-03-21 20:59:59
Subjects: cs.DS,cs.LG,math.OC
Deep Clustering Evaluation: How to Validate Internal Clustering Validation Measures
Deep clustering, a method for partitioning complex, high-dimensional data using deep neural networks, presents unique evaluation challenges. Traditional clustering validation measures, designed for low-dimensional spaces, are problematic for deep clustering, which involves projecting data into lower-dimensional embeddings before partitioning. Two key issues are identified: 1) the curse of dimensionality when applying these measures to raw data, and 2) the unreliable comparison of clustering results across different embedding spaces stemming from variations in training procedures and parameter settings in different clustering models. This paper addresses these challenges in evaluating clustering quality in deep learning. We present a theoretical framework to highlight ineffectiveness arising from using internal validation measures on raw and embedded data and propose a systematic approach to applying clustering validity indices in deep clustering contexts. Experiments show that this framework aligns better with external validation measures, effectively reducing the misguidance from the improper use of clustering validity indices in deep learning.
Updated: 2024-03-21 20:43:44
Subjects: stat.ML,cs.LG
Hyperbolic Secant representation of the logistic function: Application to probabilistic Multiple Instance Learning for CT intracranial hemorrhage detection
Multiple Instance Learning (MIL) is a weakly supervised paradigm that has been successfully applied to many different scientific areas and is particularly well suited to medical imaging. Probabilistic MIL methods, and more specifically Gaussian Processes (GPs), have achieved excellent results due to their high expressiveness and uncertainty quantification capabilities. One of the most successful GP-based MIL methods, VGPMIL, resorts to a variational bound to handle the intractability of the logistic function. Here, we formulate VGPMIL using Pólya-Gamma random variables. This approach yields the same variational posterior approximations as the original VGPMIL, which is a consequence of the two representations that the Hyperbolic Secant distribution admits. This leads us to propose a general GP-based MIL method that takes different forms by simply leveraging distributions other than the Hyperbolic Secant one. Using the Gamma distribution we arrive at a new approach that obtains competitive or superior predictive performance and efficiency. This is validated in a comprehensive experimental study including one synthetic MIL dataset, two well-known MIL benchmarks, and a real-world medical problem. We expect that this work provides useful ideas beyond MIL that can foster further research in the field.
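For reference, the two representations alluded to above can be stated explicitly (as we recall them; see Polson, Scott and Windle, 2013, for the second). The logistic function admits the exact hyperbolic-secant form $$\sigma(\psi)=\frac{1}{1+e^{-\psi}}=\frac{e^{\psi/2}}{2\cosh(\psi/2)},$$ and the Pólya-Gamma scale-mixture identity $$\frac{(e^{\psi})^{a}}{(1+e^{\psi})^{b}}=2^{-b}e^{\kappa\psi}\int_{0}^{\infty}e^{-\omega\psi^{2}/2}\,p(\omega\mid b,0)\,d\omega,\qquad \kappa=a-\frac{b}{2},$$ where $p(\omega\mid b,0)$ is the PG$(b,0)$ density; taking $a=1$, $b=1$ recovers the mixture used for the logistic likelihood.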
Updated: 2024-03-21 20:43:34
Subjects: cs.LG,stat.ML
Non-Convex Robust Hypothesis Testing using Sinkhorn Uncertainty Sets
We present a new framework to address the non-convex robust hypothesis testing problem, wherein the goal is to seek the optimal detector that minimizes the maximum of worst-case type-I and type-II risk functions. The distributional uncertainty sets are constructed to center around the empirical distribution derived from samples based on Sinkhorn discrepancy. Given that the objective involves non-convex, non-smooth probabilistic functions that are often intractable to optimize, existing methods resort to approximations rather than exact solutions. To tackle the challenge, we introduce an exact mixed-integer exponential conic reformulation of the problem, which can be solved into a global optimum with a moderate amount of input data. Subsequently, we propose a convex approximation, demonstrating its superiority over current state-of-the-art methodologies in literature. Furthermore, we establish connections between robust hypothesis testing and regularized formulations of non-robust risk functions, offering insightful interpretations. Our numerical study highlights the satisfactory testing performance and computational efficiency of the proposed framework.
Updated: 2024-03-21 20:29:43
Subjects: stat.ML,cs.LG,math.OC
TTPXHunter: Actionable Threat Intelligence Extraction as TTPs from Finished Cyber Threat Reports
Understanding the modus operandi of adversaries aids organizations in employing efficient defensive strategies and sharing intelligence in the community. This knowledge is often present in unstructured natural language text within threat analysis reports. A translation tool is needed to interpret the modus operandi explained in the sentences of the threat report and translate it into a structured format. This research introduces a methodology named TTPXHunter for the automated extraction of threat intelligence in terms of Tactics, Techniques, and Procedures (TTPs) from finished cyber threat reports. It leverages cyber domain-specific state-of-the-art natural language processing (NLP) to augment sentences for minority class TTPs and refine pinpointing the TTPs in threat analysis reports significantly. The knowledge of threat intelligence in terms of TTPs is essential for comprehensively understanding cyber threats and enhancing detection and mitigation strategies. We create two datasets: an augmented sentence-TTP dataset of 39,296 samples and a 149 real-world cyber threat intelligence report-to-TTP dataset. Further, we evaluate TTPXHunter on the augmented sentence dataset and the cyber threat reports. The TTPXHunter achieves the highest performance of 92.42% f1-score on the augmented dataset, and it also outperforms existing state-of-the-art solutions in TTP extraction by achieving an f1-score of 97.09% when evaluated over the report dataset. TTPXHunter significantly improves cybersecurity threat intelligence by offering quick, actionable insights into attacker behaviors. This advancement automates threat intelligence analysis, providing a crucial tool for cybersecurity professionals fighting cyber threats.
Updated: 2024-03-21 20:23:49
Subjects: cs.CR
Crowdsourced Multilingual Speech Intelligibility Testing
With the advent of generative audio features, there is an increasing need for rapid evaluation of their impact on speech intelligibility. Beyond the existing laboratory measures, which are expensive and do not scale well, there has been comparatively little work on crowdsourced assessment of intelligibility. Standards and recommendations are yet to be defined, and publicly available multilingual test materials are lacking. In response to this challenge, we propose an approach for a crowdsourced intelligibility assessment. We detail the test design, the collection and public release of the multilingual speech data, and the results of our early experiments.
Updated: 2024-03-21 20:14:53
Subjects: eess.AS,cs.AI,eess.SP
Curvature Augmented Manifold Embedding and Learning
A new dimensional reduction (DR) and data visualization method, Curvature-Augmented Manifold Embedding and Learning (CAMEL), is proposed. The key novel contribution is to formulate the DR problem as a mechanistic/physics model, where the force field among nodes (data points) is used to find an n-dimensional manifold representation of the data sets. Compared with many existing attractive-repulsive force-based methods, one unique contribution of the proposed method is to include a non-pairwise force. A new force field model is introduced and discussed, inspired by the multi-body potential in lattice-particle physics and Riemann curvature in topology. A curvature-augmented force is included in CAMEL. Following this, CAMEL formulations for unsupervised learning, supervised learning, semi-supervised learning/metric learning, and inverse learning are provided. Next, CAMEL is applied to many benchmark datasets and compared with existing models, such as tSNE, UMAP, TRIMAP, and PacMap. Both visual comparison and metrics-based evaluation are performed, employing fourteen metrics drawn from the open literature or newly proposed for a comprehensive comparison. Conclusions and future work are suggested based on the current investigation. Related code and demonstrations are available at https://github.com/ymlasu/CAMEL for interested readers to reproduce the results and explore other applications.
Updated: 2024-03-21 19:59:07
Subjects: stat.ML,cs.HC,cs.LG
TeleMoMa: A Modular and Versatile Teleoperation System for Mobile Manipulation
A critical bottleneck limiting imitation learning in robotics is the lack of data. This problem is more severe in mobile manipulation, where collecting demonstrations is harder than in stationary manipulation due to the lack of available and easy-to-use teleoperation interfaces. In this work, we demonstrate TeleMoMa, a general and modular interface for whole-body teleoperation of mobile manipulators. TeleMoMa unifies multiple human interfaces including RGB and depth cameras, virtual reality controllers, keyboard, joysticks, etc., and any combination thereof. In its more accessible version, TeleMoMa works using simply vision (e.g., an RGB-D camera), lowering the entry bar for humans to provide mobile manipulation demonstrations. We demonstrate the versatility of TeleMoMa by teleoperating several existing mobile manipulators - PAL Tiago++, Toyota HSR, and Fetch - in simulation and the real world. We demonstrate the quality of the demonstrations collected with TeleMoMa by training imitation learning policies for mobile manipulation tasks involving synchronized whole-body motion. Finally, we also show that TeleMoMa's teleoperation channel enables teleoperation on site, looking at the robot, or remote, sending commands and observations through a computer network, and perform user studies to evaluate how easy it is for novice users to learn to collect demonstrations with different combinations of human interfaces enabled by our system. We hope TeleMoMa becomes a helpful tool for the community enabling researchers to collect whole-body mobile manipulation demonstrations. For more information and video results, https://robin-lab.cs.utexas.edu/telemoma-web.
Updated: 2024-03-21 19:57:46
Subjects: cs.RO,cs.AI,cs.LG
Deep Active Learning: A Reality Check
We conduct a comprehensive evaluation of state-of-the-art deep active learning methods. Surprisingly, under general settings, no single-model method decisively outperforms entropy-based active learning, and some even fall short of random sampling. We delve into overlooked aspects like starting budget, budget step, and pretraining's impact, revealing their significance in achieving superior results. Additionally, we extend our evaluation to other tasks, exploring the active learning effectiveness in combination with semi-supervised learning, and object detection. Our experiments provide valuable insights and concrete recommendations for future active learning studies. By uncovering the limitations of current methods and understanding the impact of different experimental settings, we aim to inspire more efficient training of deep learning models in real-world scenarios with limited annotation budgets. This work contributes to advancing active learning's efficacy in deep learning and empowers researchers to make informed decisions when applying active learning to their tasks.
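For reference, a minimal sketch of the entropy-based acquisition that the evaluation found hard to beat; names are illustrative:

```python
import numpy as np

def entropy_query(probs, k):
    """Entropy-based acquisition: pick the k unlabeled samples whose
    predictive distributions have the highest Shannon entropy.
    probs: (n, num_classes) softmax outputs for the unlabeled pool."""
    eps = 1e-12
    H = -(probs * np.log(probs + eps)).sum(axis=1)
    return np.argsort(-H)[:k]              # indices to send for annotation

# Hypothetical usage with any probabilistic classifier:
# pool_probs = model.predict_proba(unlabeled_pool)
# to_label = entropy_query(pool_probs, k=256)
```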
Updated: 2024-03-21 19:28:17
Subjects: cs.LG,cs.AI,cs.CV
Preventing Catastrophic Forgetting through Memory Networks in Continuous Detection
Modern pre-trained architectures struggle to retain previous information while undergoing continuous fine-tuning on new tasks. Despite notable progress in continual classification, systems designed for complex vision tasks such as detection or segmentation still struggle to attain satisfactory performance. In this work, we introduce a memory-based detection transformer architecture to adapt a pre-trained DETR-style detector to new tasks while preserving knowledge from previous tasks. We propose a novel localized query function for efficient information retrieval from memory units, aiming to minimize forgetting. Furthermore, we identify a fundamental challenge in continual detection referred to as background relegation. This arises when object categories from earlier tasks reappear in future tasks, potentially without labels, leading them to be implicitly treated as background. This is an inevitable issue in continual detection or segmentation. The introduced continual optimization technique effectively tackles this challenge. Finally, we assess the performance of our proposed system on continual detection benchmarks and demonstrate that our approach surpasses the performance of existing state-of-the-art resulting in 5-7% improvements on MS-COCO and PASCAL-VOC on the task of continual detection.
Updated: 2024-03-21 19:20:29
Subjects: cs.CV,cs.LG
Planning and Acting While the Clock Ticks
Standard temporal planning assumes that planning takes place offline and then execution starts at time 0. Recently, situated temporal planning was introduced, where planning starts at time 0 and execution occurs after planning terminates. Situated temporal planning reflects a more realistic scenario where time passes during planning. However, in situated temporal planning a complete plan must be generated before any action is executed. In some problems with time pressure, timing is too tight to complete planning before the first action must be executed. For example, an autonomous car that has a truck backing towards it should probably move out of the way now and plan how to get to its destination later. In this paper, we propose a new problem setting: concurrent planning and execution, in which actions can be dispatched (executed) before planning terminates. Unlike previous work on planning and execution, we must handle wall clock deadlines that affect action applicability and goal achievement (as in situated planning) while also supporting dispatching actions before a complete plan has been found. We extend previous work on metareasoning for situated temporal planning to develop an algorithm for this new setting. Our empirical evaluation shows that when there is strong time pressure, our approach outperforms situated temporal planning.
Updated: 2024-03-21 19:18:47
标题: 时钟滴答声中的规划与行动
摘要: 标准的时间规划假定规划是在离线环境下进行的,然后在时间0开始执行。最近,引入了情境化时间规划,其中规划从时间0开始,执行发生在规划终止后。情境化时间规划反映了一个更加现实的场景:在规划过程中时间在流逝。然而,在情境化时间规划中,必须在执行任何操作之前生成完整的计划。在一些时间压力较大的问题中,时间过于紧张,以至于无法在必须执行第一个动作之前完成规划。例如,一辆自动驾驶汽车遇到一辆向其倒车的卡车时,应该现在就移开,之后再规划如何到达目的地。在本文中,我们提出了一个新的问题设置:并发规划与执行,其中动作可以在规划终止之前被派遣(执行)。与之前关于规划和执行的工作不同,我们必须处理影响动作适用性和目标达成的挂钟截止时间(如同情境化规划),同时还支持在找到完整计划之前派遣动作。我们扩展了之前关于情境化时间规划的元推理工作,为这一新设置开发了一种算法。我们的实证评估表明,在时间压力很大时,我们的方法优于情境化时间规划。
更新时间: 2024-03-21 19:18:47
领域: cs.AI
FERGI: Automatic Annotation of User Preferences for Text-to-Image Generation from Spontaneous Facial Expression Reaction
Researchers have proposed to use data of human preference feedback to fine-tune text-to-image generative models. However, the scalability of human feedback collection has been limited by its reliance on manual annotation. Therefore, we develop and test a method to automatically annotate user preferences from their spontaneous facial expression reaction to the generated images. We collect a dataset of Facial Expression Reaction to Generated Images (FERGI) and show that the activations of multiple facial action units (AUs) are highly correlated with user evaluations of the generated images. Specifically, AU4 (brow lowerer) is reflective of negative evaluations of the generated image whereas AU12 (lip corner puller) is reflective of positive evaluations. These can be useful in two ways. Firstly, we can automatically annotate user preferences between image pairs with substantial difference in these AU responses with an accuracy significantly outperforming state-of-the-art scoring models. Secondly, directly integrating the AU responses with the scoring models improves their consistency with human preferences. Finally, this method of automatic annotation with facial expression analysis can be potentially generalized to other generation tasks. The code is available at https://github.com/ShuangquanFeng/FERGI, and the dataset is also available at the same link for research purposes.
Updated: 2024-03-21 19:14:04
标题: FERGI:来自自发面部表情反应的文本到图像生成用户偏好的自动注释
摘要: 研究人员提出利用人类偏好反馈数据来微调文本到图像生成模型。然而,人类反馈收集的可扩展性受到依赖人工标注的限制。因此,我们开发并测试了一种方法,可以从用户对生成图像的自发面部表情反应中自动标注用户偏好。我们收集了一个对生成图像的面部表情反应(FERGI)数据集,并展示了多个面部动作单元(AU)的激活与用户对生成图像的评价高度相关。具体来说,AU4(降眉)反映了对生成图像的负面评价,而AU12(嘴角上扬)反映了正面评价。这些可以在两个方面发挥作用。首先,我们可以在这些AU反应存在显著差异的图像对之间自动标注用户偏好,其准确性显著优于最先进的评分模型。其次,将AU反应直接与评分模型整合可以提高其与人类偏好的一致性。最后,这种利用面部表情分析进行自动标注的方法有望推广到其他生成任务。代码可在https://github.com/ShuangquanFeng/FERGI获得,数据集也可在同一链接上用于研究目的。
更新时间: 2024-03-21 19:14:04
领域: cs.CV,cs.AI,cs.HC,cs.LG
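To make the annotation rule described in the FERGI abstract concrete, below is a minimal sketch in Python. The valence score, the AU names used as dictionary keys, and the decision threshold are our illustrative assumptions; the paper's actual annotation models are learned from data.

```python
def annotate_preference(au_a, au_b, threshold=0.1):
    """Annotate which of two generated images a user prefers from facial
    action unit (AU) responses, following the correlations reported in the
    abstract: AU4 (brow lowerer) tracks negative evaluations and AU12
    (lip corner puller) tracks positive ones.

    au_a, au_b: dicts mapping AU names to activation intensities for the
    reactions to images A and B. The simple difference score and the
    threshold for a "substantial difference" are assumptions for illustration.
    """
    def valence(au):
        return au.get("AU12", 0.0) - au.get("AU4", 0.0)

    va, vb = valence(au_a), valence(au_b)
    if abs(va - vb) < threshold:
        return None  # responses too similar; leave the pair unannotated
    return "A" if va > vb else "B"

# Strong frown for image A, strong smile for image B -> prefer B.
print(annotate_preference({"AU4": 0.8, "AU12": 0.1}, {"AU4": 0.1, "AU12": 0.9}))
```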
Particip-AI: A Democratic Surveying Framework for Anticipating Future AI Use Cases, Harms and Benefits
General purpose AI, such as ChatGPT, seems to have lowered the barriers for the public to use AI and harness its power. However, the governance and development of AI still remain in the hands of a few, and the pace of development is accelerating without proper assessment of risks. As a first step towards democratic governance and risk assessment of AI, we introduce Particip-AI, a framework to gather current and future AI use cases and their harms and benefits from non-expert public. Our framework allows us to study more nuanced and detailed public opinions on AI through collecting use cases, surfacing diverse harms through risk assessment under alternate scenarios (i.e., developing and not developing a use case), and illuminating tensions over AI development through making a concluding choice on its development. To showcase the promise of our framework towards guiding democratic AI, we gather responses from 295 demographically diverse participants. We find that participants' responses emphasize applications for personal life and society, contrasting with most current AI development's business focus. This shows the value of surfacing diverse harms that are complementary to expert assessments. Furthermore, we found that perceived impact of not developing use cases predicted participants' judgements of whether AI use cases should be developed, and highlighted lay users' concerns of techno-solutionism. We conclude with a discussion on how frameworks like Particip-AI can further guide democratic AI governance and regulation.
Updated: 2024-03-21 19:12:37
标题: 参与-AI:一种民主调查框架,用于预测未来AI使用案例、危害和好处
摘要: 通用人工智能,如ChatGPT,似乎已经降低了公众使用人工智能和利用其力量的门槛。然而,人工智能的治理和发展仍然掌握在少数人手中,发展速度不断加快,却缺乏对风险的适当评估。作为迈向人工智能民主治理和风险评估的第一步,我们介绍了Particip-AI,一个从非专家公众那里收集当前和未来人工智能使用案例及其危害和益处的框架。我们的框架使我们能够通过收集使用案例、在备选情景(即开发与不开发某一使用案例)下进行风险评估以揭示多样化的危害,以及通过对其开发作出结论性选择来阐明围绕人工智能发展的张力,从而研究公众对人工智能更加细致和详细的看法。为了展示我们的框架在引导民主人工智能方面的前景,我们收集了295名人口统计学上多样化的参与者的回应。我们发现参与者的回应强调面向个人生活和社会的应用,与当前大多数人工智能开发以商业为重点形成对比。这显示了揭示与专家评估互补的多样化危害的价值。此外,我们发现,对不开发使用案例的感知影响预测了参与者关于是否应开发人工智能使用案例的判断,并突出了普通用户对技术解决方案主义的担忧。最后,我们讨论了像Particip-AI这样的框架如何进一步指导人工智能的民主治理和监管。
更新时间: 2024-03-21 19:12:37
领域: cs.CY,cs.AI
Multi-Player Zero-Sum Markov Games with Networked Separable Interactions
We study a new class of Markov games, \emph{(multi-player) zero-sum Markov Games} with \emph{Networked separable interactions} (zero-sum NMGs), to model the local interaction structure in non-cooperative multi-agent sequential decision-making. We define a zero-sum NMG as a model where the payoffs of the auxiliary games associated with each state are zero-sum and have some separable (i.e., polymatrix) structure across the neighbors over some interaction network. We first identify the necessary and sufficient conditions under which an MG can be presented as a zero-sum NMG, and show that the set of Markov coarse correlated equilibrium (CCE) collapses to the set of Markov Nash equilibrium (NE) in these games, in that the product of per-state marginalization of the former for all players yields the latter. Furthermore, we show that finding approximate Markov \emph{stationary} CCE in infinite-horizon discounted zero-sum NMGs is \texttt{PPAD}-hard, unless the underlying network has a ``star topology''. Then, we propose fictitious-play-type dynamics, the classical learning dynamics in normal-form games, for zero-sum NMGs, and establish convergence guarantees to Markov stationary NE under a star-shaped network structure. Finally, in light of the hardness result, we focus on computing a Markov \emph{non-stationary} NE and provide finite-iteration guarantees for a series of value-iteration-based algorithms. We also provide numerical experiments to corroborate our theoretical results.
Updated: 2024-03-21 19:12:08
标题: 具有网络化可分离交互的多人零和马尔可夫博弈
摘要: 我们研究了一类新的马尔可夫博弈,即具有网络化可分离交互的(多人)零和马尔可夫博弈(zero-sum NMGs),用于建模非合作多智能体序贯决策中的局部交互结构。我们将零和NMG定义为这样一种模型:与每个状态相关联的辅助博弈的收益是零和的,并且在某个交互网络上跨邻居具有可分离(即多矩阵,polymatrix)结构。我们首先给出了一个马尔可夫博弈可以表示为零和NMG的充分必要条件,并证明在这类博弈中,马尔可夫粗相关均衡(CCE)集合坍缩为马尔可夫纳什均衡(NE)集合,即前者对所有玩家逐状态边际化的乘积即为后者。此外,我们证明,在无限时域折扣零和NMG中,寻找近似的马尔可夫稳态CCE是PPAD-难的,除非底层网络具有"星形拓扑"。随后,我们为零和NMG提出了虚拟对策(fictitious play)类型的动力学,即标准式博弈中的经典学习动态,并在星形网络结构下建立了收敛到马尔可夫稳态NE的保证。最后,鉴于上述难度结果,我们专注于计算马尔可夫非稳态NE,并为一系列基于值迭代的算法提供有限迭代保证。我们还提供了数值实验来证实我们的理论结果。
更新时间: 2024-03-21 19:12:08
领域: cs.GT,cs.LG
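For readers skimming the zero-sum NMG abstract above, the networked separable (polymatrix) payoff structure it describes can be written out as follows. The notation is our reconstruction from the abstract, not necessarily the paper's own.

```latex
% In a zero-sum NMG over an interaction network G = (V, E), agent i's reward
% decomposes over its neighbors N_i, and the per-state auxiliary game is zero-sum:
\[
  r_i(s, a) \;=\; \sum_{j \in \mathcal{N}_i} r_{ij}\bigl(s, a_i, a_j\bigr),
  \qquad
  \sum_{i \in V} r_i(s, a) \;=\; 0 \quad \text{for all } (s, a).
\]
```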
Latent Diffusion Models for Attribute-Preserving Image Anonymization
Generative techniques for image anonymization have great potential to generate datasets that protect the privacy of those depicted in the images, while achieving high data fidelity and utility. Existing methods have focused extensively on preserving facial attributes, but failed to embrace a more comprehensive perspective that brings the scene and background into the anonymization process. This paper presents, to the best of our knowledge, the first approach to image anonymization based on Latent Diffusion Models (LDMs). Every element of a scene is maintained to convey the same meaning, yet manipulated in a way that makes re-identification difficult. We propose two LDMs for this purpose: CAMOUFLaGE-Base exploits a combination of pre-trained ControlNets, and a new controlling mechanism designed to increase the distance between the real and anonymized images. CAMOUFLaGE-Light is based on the Adapter technique, coupled with an encoding designed to efficiently represent the attributes of different persons in a scene. The former solution achieves superior performance on most metrics and benchmarks, while the latter cuts the inference time in half at the cost of fine-tuning a lightweight module. We show through extensive experimental comparison that the proposed method is competitive with the state-of-the-art concerning identity obfuscation whilst better preserving the original content of the image and tackling unresolved challenges that current solutions fail to address.
Updated: 2024-03-21 19:09:21
标题: 潜在扩散模型用于保留属性的图像匿名化
摘要: 图像匿名化的生成技术具有很大潜力,可以生成保护图像中人物隐私的数据集,同时实现高数据保真度和实用性。现有方法主要关注保留面部属性,但未能采用更全面的视角,将场景和背景纳入匿名化过程。据我们所知,本文首次提出基于潜在扩散模型(LDM)的图像匿名化方法。场景的每个元素都得以保留以传达相同的含义,但经过处理使得重新识别变得困难。我们为此提出了两种LDM:CAMOUFLaGE-Base利用预训练ControlNet的组合,以及一种旨在增大真实图像与匿名化图像之间距离的新控制机制;CAMOUFLaGE-Light基于Adapter技术,并结合一种旨在高效表示场景中不同人物属性的编码。前一种方案在大多数指标和基准上表现更优,而后者以微调一个轻量级模块为代价,将推理时间减半。通过广泛的实验比较,我们表明所提出的方法在身份混淆方面可与最先进技术相媲美,同时更好地保留了图像的原始内容,并应对了当前方案未能解决的挑战。
更新时间: 2024-03-21 19:09:21
领域: cs.CV,cs.AI
Multi-Agent VQA: Exploring Multi-Agent Foundation Models in Zero-Shot Visual Question Answering
This work explores the zero-shot capabilities of foundation models in Visual Question Answering (VQA) tasks. We propose an adaptive multi-agent system, named Multi-Agent VQA, to overcome the limitations of foundation models in object detection and counting by using specialized agents as tools. Unlike existing approaches, our study focuses on the system's performance without fine-tuning it on specific VQA datasets, making it more practical and robust in the open world. We present preliminary experimental results under zero-shot scenarios and highlight some failure cases, offering new directions for future research.
Updated: 2024-03-21 18:57:25
标题: 多智能体VQA:在零样本视觉问答中探索多智能体基础模型
摘要: 这项工作探讨了基础模型在视觉问答(VQA)任务中的零样本能力。我们提出了一个自适应多智能体系统,命名为Multi-Agent VQA,通过使用专门的智能体作为工具,克服基础模型在目标检测和计数方面的局限。与现有方法不同,我们的研究侧重于系统在未针对特定VQA数据集微调情况下的性能,使其在开放世界中更加实用和稳健。我们给出了零样本场景下的初步实验结果,并着重分析了一些失败案例,为未来研究提供了新的方向。
更新时间: 2024-03-21 18:57:25
领域: cs.CV,cs.AI,cs.CL,cs.LG,cs.MA
A Causal Analysis of CO2 Reduction Strategies in Electricity Markets Through Machine Learning-Driven Metalearners
This study employs the Causal Machine Learning (CausalML) statistical method to analyze the influence of electricity pricing policies on carbon dioxide (CO2) levels in the household sector. Investigating the causality between potential outcomes and treatment effects, where changes in pricing policies are the treatment, our analysis challenges the conventional wisdom surrounding incentive-based electricity pricing. The study's findings suggest that adopting such policies may inadvertently increase CO2 intensity. Additionally, we integrate a machine learning-based meta-algorithm, reflecting a contemporary statistical approach, to enhance the depth of our causal analysis. The study conducts a comparative analysis of learners X, T, S, and R to ascertain the optimal methods based on the defined question's specified goals and contextual nuances. This research contributes valuable insights to the ongoing dialogue on sustainable development practices, emphasizing the importance of considering unintended consequences in policy formulation.
Updated: 2024-03-21 18:55:05
标题: 通过机器学习驱动的元学习器对电力市场中减少CO2排放策略的因果分析
摘要: 这项研究采用因果机器学习(CausalML)统计方法来分析电力定价政策对家庭部门二氧化碳(CO2)水平的影响。我们研究潜在结果与处理效应之间的因果关系,其中定价政策的变化即为处理(treatment);我们的分析挑战了围绕基于激励的电力定价的传统认知。研究结果表明,采用此类政策可能会无意中增加CO2强度。此外,我们整合了一种基于机器学习的元算法,体现了当代统计方法,以增强因果分析的深度。该研究对X、T、S和R元学习器进行了比较分析,以根据所定义问题的具体目标和情境细节确定最佳方法。这项研究为可持续发展实践的持续对话提供了宝贵的见解,强调在政策制定中考虑意外后果的重要性。
更新时间: 2024-03-21 18:55:05
领域: cs.LG,cs.AI,cs.CY,stat.ME
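The X, T, S, and R metalearners compared in the abstract above share a common shape: fit outcome models, then contrast predictions under treatment and control. As one concrete instance, here is a minimal T-learner sketch on synthetic data; the variable names, base learner, and data-generating process are our illustrative assumptions, not the paper's setup.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def t_learner_cate(X, treatment, y):
    """T-learner: fit separate outcome models for treated and control units,
    then estimate the conditional average treatment effect (CATE) as the
    difference of their predictions. Here X stands for household covariates,
    treatment flags an incentive-based pricing policy, y is CO2 intensity.
    """
    m1 = GradientBoostingRegressor().fit(X[treatment == 1], y[treatment == 1])
    m0 = GradientBoostingRegressor().fit(X[treatment == 0], y[treatment == 0])
    return m1.predict(X) - m0.predict(X)

# Synthetic demo: true CATE is 0.5 * (1 + X[:, 1]), so the average effect is 0.5.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
t = rng.integers(0, 2, size=500)
y = X[:, 0] + 0.5 * t * (1 + X[:, 1]) + rng.normal(scale=0.1, size=500)
print(t_learner_cate(X, t, y).mean())  # roughly 0.5
```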
Emergent World Models and Latent Variable Estimation in Chess-Playing Language Models
Language models have shown unprecedented capabilities, sparking debate over the source of their performance. Is it merely the outcome of learning syntactic patterns and surface level statistics, or do they extract semantics and a world model from the text? Prior work by Li et al. investigated this by training a GPT model on synthetic, randomly generated Othello games and found that the model learned an internal representation of the board state. We extend this work into the more complex domain of chess, training on real games and investigating our model's internal representations using linear probes and contrastive activations. The model is given no a priori knowledge of the game and is solely trained on next character prediction, yet we find evidence of internal representations of board state. We validate these internal representations by using them to make interventions on the model's activations and edit its internal board state. Unlike Li et al's prior synthetic dataset approach, our analysis finds that the model also learns to estimate latent variables like player skill to better predict the next character. We derive a player skill vector and add it to the model, improving the model's win rate by up to 2.6 times.
Updated: 2024-03-21 18:53:23
标题: 国际象棋语言模型中的涌现世界模型与潜变量估计
摘要: 语言模型展示了前所未有的能力,引发了关于其性能来源的辩论。它们的性能仅仅是学习句法模式和表面层统计的结果,还是从文本中提取了语义和世界模型?Li等人之前的研究通过在合成的、随机生成的黑白棋(Othello)对局上训练一个GPT模型来调查这一问题,发现该模型学到了棋盘状态的内部表示。我们将这项工作扩展到更复杂的国际象棋领域,在真实对局上进行训练,并使用线性探针和对比激活来考察模型的内部表示。模型没有任何关于该游戏的先验知识,仅通过下一个字符预测进行训练,但我们发现了棋盘状态内部表示的证据。我们通过使用这些内部表示对模型的激活进行干预并编辑其内部棋盘状态来验证它们。与Li等人之前的合成数据集方法不同,我们的分析发现模型还学会了估计潜在变量(如玩家水平)以更好地预测下一个字符。我们推导出一个玩家水平向量并将其加入模型,使模型的胜率提高了至多2.6倍。
更新时间: 2024-03-21 18:53:23
领域: cs.LG,cs.CL
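As a sketch of the linear-probing setup the chess abstract describes, one can fit an independent linear classifier per board square on stored hidden activations. The array shapes and the three-way label encoding below are our assumptions for illustration, not the paper's exact configuration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_board_probes(activations, square_states):
    """Fit one linear probe per board square on a language model's hidden
    activations. activations: (n_positions, d_model) residual-stream vectors;
    square_states: (n_positions, 64) integer labels per square
    (e.g. 0 = empty, 1 = white piece, 2 = black piece).
    """
    return [LogisticRegression(max_iter=1000).fit(activations, square_states[:, sq])
            for sq in range(square_states.shape[1])]

# Synthetic demo with random activations and labels, just to show the shapes.
rng = np.random.default_rng(0)
acts = rng.normal(size=(200, 32))
labels = rng.integers(0, 3, size=(200, 64))
probes = fit_board_probes(acts, labels)
print(len(probes), probes[0].predict(acts[:3]))  # 64 probes, per-square predictions
```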
AltGraph: Redesigning Quantum Circuits Using Generative Graph Models for Efficient Optimization
Quantum circuit transformation aims to produce equivalent circuits while optimizing for various aspects such as circuit depth, gate count, and compatibility with modern Noisy Intermediate Scale Quantum (NISQ) devices. There are two techniques for circuit transformation. The first is a rule-based approach that greedily cancels out pairs of gates that equate to the identity unitary operation. Rule-based approaches are used in quantum compilers such as Qiskit, tket, and Quilc. The second is a search-based approach that tries to find an equivalent quantum circuit by exploring the quantum circuits search space. Search-based approaches typically rely on machine learning techniques such as generative models and Reinforcement Learning (RL). In this work, we propose AltGraph, a novel search-based circuit transformation approach that generates equivalent quantum circuits using existing generative graph models. We use three main graph models: DAG Variational Autoencoder (D-VAE) with two variants: Gated Recurrent Unit (GRU) and Graph Convolutional Network (GCN), and Deep Generative Model for Graphs (DeepGMG) that take a Direct Acyclic Graph (DAG) of the quantum circuit as input and output a new DAG from which we reconstruct the equivalent quantum circuit. Next, we perturb the latent space to generate equivalent quantum circuits some of which may be more compatible with the hardware coupling map and/or enable better optimization leading to reduced gate count and circuit depth. AltGraph achieves on average a 37.55% reduction in the number of gates and a 37.75% reduction in the circuit depth post-transpiling compared to the original transpiled circuit with only 0.0074 Mean Squared Error (MSE) in the density matrix.
Updated: 2024-03-21 18:52:20
标题: AltGraph:使用生成图模型重新设计量子电路以进行高效优化
摘要: 量子电路变换旨在生成等效电路,同时针对电路深度、门数量以及与现代含噪中等规模量子(NISQ)设备的兼容性等方面进行优化。电路变换有两类技术。第一类是基于规则的方法,贪心地消去合成为恒等酉操作的门对;基于规则的方法被用于Qiskit、tket和Quilc等量子编译器中。第二类是基于搜索的方法,通过探索量子电路搜索空间来寻找等效量子电路;基于搜索的方法通常依赖生成模型和强化学习(RL)等机器学习技术。在这项工作中,我们提出了AltGraph,一种新颖的基于搜索的电路变换方法,利用现有的生成图模型生成等效量子电路。我们使用三种主要的图模型:具有两个变体(门控循环单元(GRU)和图卷积网络(GCN))的DAG变分自编码器(D-VAE),以及用于图的深度生成模型(DeepGMG);它们以量子电路的有向无环图(DAG)为输入,输出一个新的DAG,我们再从中重建等效量子电路。接下来,我们扰动潜空间以生成等效量子电路,其中一些可能更契合硬件耦合图,或带来更好的优化,从而减少门数量和电路深度。与原始转译电路相比,AltGraph在转译后平均减少了37.55%的门数量和37.75%的电路深度,而密度矩阵的均方误差(MSE)仅为0.0074。
更新时间: 2024-03-21 18:52:20
领域: quant-ph,cs.ET,cs.LG
On the Detection of Anomalous or Out-Of-Distribution Data in Vision Models Using Statistical Techniques
Out-of-distribution data and anomalous inputs are vulnerabilities of machine learning systems today, often causing systems to make incorrect predictions. The diverse range of data on which these models are used makes detecting atypical inputs a difficult and important task. We assess a tool, Benford's law, as a method used to quantify the difference between real and corrupted inputs. We believe that in many settings, it could function as a filter for anomalous data points and for signalling out-of-distribution data. We hope to open a discussion on these applications and further areas where this technique is underexplored.
Updated: 2024-03-21 18:31:47
标题: 关于使用统计技术检测视觉模型中异常或超出分布数据的研究
摘要: 分布外数据和异常输入是当今机器学习系统的薄弱环节,经常导致系统做出错误的预测。这些模型所应用的数据种类繁多,使得检测非典型输入成为一项困难且重要的任务。我们评估了本福特定律(Benford's law)作为量化真实输入与受损输入之间差异的一种工具。我们相信,在许多情境中,它可以作为异常数据点的过滤器,并用于提示分布外数据。我们希望就这些应用以及这一技术尚未被充分探索的其他领域展开讨论。
更新时间: 2024-03-21 18:31:47
领域: cs.CV,cs.LG
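A minimal sketch of the Benford-based filter the abstract above proposes. We assume first digits are taken from the magnitudes of some input statistic (e.g., pixel gradients or DCT coefficients) and compared to Benford's law with a chi-square statistic; both choices are ours for illustration, not necessarily the paper's.

```python
import numpy as np
from scipy.stats import chisquare

BENFORD = np.log10(1 + 1 / np.arange(1, 10))  # P(first digit = d), d = 1..9

def first_digit_hist(x):
    """Histogram of leading significant digits (1..9) of |x|, zeros dropped."""
    x = np.abs(np.asarray(x, dtype=float))
    x = x[x > 0]
    digits = (x / 10 ** np.floor(np.log10(x))).astype(int)
    return np.bincount(digits, minlength=10)[1:10]

def benford_anomaly_score(x):
    """Chi-square distance between the empirical first-digit distribution of x
    and Benford's law; larger values suggest corrupted or out-of-distribution
    inputs under this heuristic."""
    counts = first_digit_hist(x)
    expected = BENFORD * counts.sum()
    return chisquare(counts, expected).statistic

rng = np.random.default_rng(0)
print(benford_anomaly_score(rng.lognormal(0, 2, 10000)))  # spans decades: low score
print(benford_anomaly_score(rng.uniform(1, 9, 10000)))    # uniform digits: high score
```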
Few-Shot Adversarial Prompt Learning on Vision-Language Models
The vulnerability of deep neural networks to imperceptible adversarial perturbations has attracted widespread attention. Inspired by the success of vision-language foundation models, previous efforts achieved zero-shot adversarial robustness by aligning adversarial visual features with text supervision. However, in practice, they are still unsatisfactory due to several issues, including heavy adaptation cost, suboptimal text supervision, and uncontrolled natural generalization capacity. In this paper, to address these issues, we propose a few-shot adversarial prompt framework where adapting input sequences with limited data yields significant adversarial robustness improvements. Specifically, we achieve this by providing adversarially correlated text supervision that is end-to-end learned from adversarial examples. We also propose a novel training objective that enhances the consistency of multi-modal features while encouraging differentiated uni-modal features between natural and adversarial examples. The proposed framework makes it possible to learn adversarial text supervision, which provides superior cross-modal adversarial alignment and matches state-of-the-art zero-shot adversarial robustness with only 1% training data.
Updated: 2024-03-21 18:28:43
标题: 视觉-语言模型上的少样本对抗性提示学习
摘要: 深度神经网络对于不可察觉的对抗性扰动的脆弱性引起了广泛关注。受视觉-语言基础模型成功的启发,先前的努力通过将对抗性视觉特征与文本监督进行对齐,实现了零样本对抗性鲁棒性。然而,在实践中,它们仍然存在一些问题,包括高昂的适应成本、次优的文本监督和无法控制的自然泛化能力。本文针对这些问题,提出了一种少样本对抗提示框架,通过有限数据对输入序列进行调整,显著提高了对抗性鲁棒性。具体来说,我们通过提供从对抗性示例中端到端学习的对抗性相关文本监督来实现这一点。我们还提出了一种增强多模态特征一致性的训练目标,同时鼓励自然和对抗性示例之间的差异化单模态特征。所提出的框架使得学习对抗性文本监督成为可能,从而提供了优越的跨模态对抗性对齐,并仅使用1%的训练数据即可达到最先进的零样本对抗性鲁棒性。
更新时间: 2024-03-21 18:28:43
领域: cs.CV,cs.CL,cs.CR,cs.LG
StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text
Text-to-video diffusion models enable the generation of high-quality videos that follow text instructions, making it easy to create diverse and individual content. However, existing approaches mostly focus on high-quality short video generation (typically 16 or 24 frames), ending up with hard-cuts when naively extended to the case of long video synthesis. To overcome these limitations, we introduce StreamingT2V, an autoregressive approach for long video generation of 80, 240, 600, 1200 or more frames with smooth transitions. The key components are: (i) a short-term memory block called conditional attention module (CAM), which conditions the current generation on the features extracted from the previous chunk via an attentional mechanism, leading to consistent chunk transitions, (ii) a long-term memory block called appearance preservation module, which extracts high-level scene and object features from the first video chunk to prevent the model from forgetting the initial scene, and (iii) a randomized blending approach that enables applying a video enhancer autoregressively for infinitely long videos without inconsistencies between chunks. Experiments show that StreamingT2V generates videos with a high amount of motion. In contrast, all competing image-to-video methods are prone to video stagnation when applied naively in an autoregressive manner. Thus, we propose with StreamingT2V a high-quality seamless text-to-long video generator that outperforms competitors with consistency and motion. Our code will be available at: https://github.com/Picsart-AI-Research/StreamingT2V
Updated: 2024-03-21 18:27:29
标题: StreamingT2V:一致、动态和可扩展的基于文本的长视频生成
摘要: 文本到视频扩散模型使得生成高质量视频成为可能,这些视频遵循文本说明,轻松创建多样化和个性化内容。然而,现有方法主要集中在生成高质量的短视频(通常为16或24帧),当简单地扩展到长视频合成时,会出现硬切换的问题。为了克服这些限制,我们引入了StreamingT2V,这是一种用于生成80、240、600、1200或更多帧长视频的自回归方法,具有平滑的过渡效果。关键组件包括:(i)称为条件注意模块(CAM)的短期记忆块,通过注意机制将当前生成的内容与先前块中提取的特征进行条件化,从而实现一致的块转换;(ii)称为外观保持模块的长期记忆块,从第一个视频块中提取高级场景和对象特征,以防止模型遗忘初始场景;(iii)一种随机混合方法,使得可以对无限长的视频自回归地应用视频增强器,而不会出现块之间的不一致。实验表明,StreamingT2V能生成具有高运动量的视频。相比之下,所有竞争的图像到视频方法在以自回归方式简单应用时都容易出现视频停滞问题。因此,我们通过StreamingT2V提出了一个高质量的无缝文本到长视频生成器,以其一致性和运动量胜过竞争对手。我们的代码将在以下链接提供:https://github.com/Picsart-AI-Research/StreamingT2V。
更新时间: 2024-03-21 18:27:29
领域: cs.CV,cs.AI,cs.CL,cs.LG,cs.MM,eess.IV
Improving Robustness to Model Inversion Attacks via Sparse Coding Architectures
Recent model inversion attack algorithms permit adversaries to reconstruct a neural network's private training data just by repeatedly querying the network and inspecting its outputs. In this work, we develop a novel network architecture that leverages sparse-coding layers to obtain superior robustness to this class of attacks. Three decades of computer science research has studied sparse coding in the context of image denoising, object recognition, and adversarial misclassification settings, but to the best of our knowledge, its connection to state-of-the-art privacy vulnerabilities remains unstudied. However, sparse coding architectures suggest an advantageous means to defend against model inversion attacks because they allow us to control the amount of irrelevant private information encoded in a network's intermediate representations in a manner that can be computed efficiently during training and that is known to have little effect on classification accuracy. Specifically, compared to networks trained with a variety of state-of-the-art defenses, our sparse-coding architectures maintain comparable or higher classification accuracy while degrading state-of-the-art training data reconstructions by factors of 1.1 to 18.3 across a variety of reconstruction quality metrics (PSNR, SSIM, FID). This performance advantage holds across 5 datasets ranging from CelebA faces to medical images and CIFAR-10, and across various state-of-the-art SGD-based and GAN-based inversion attacks, including Plug-&-Play attacks. We provide a cluster-ready PyTorch codebase to promote research and standardize defense evaluations.
Updated: 2024-03-21 18:26:23
标题: 通过稀疏编码架构提高对模型反演攻击的鲁棒性
摘要: 最近的模型反演攻击算法允许对手通过反复查询网络并检查其输出来重建神经网络的私人训练数据。在这项工作中,我们开发了一种新颖的网络架构,利用稀疏编码层来获得对这类攻击的卓越稳健性。三十年来,计算机科学研究一直在研究稀疏编码在图像去噪、目标识别和对抗性误分类设置中的应用,但据我们所知,它与最新的隐私漏洞的关联尚未被研究。然而,稀疏编码架构表明一种有利的方式来抵御模型反演攻击,因为它们允许我们控制编码在网络中间表示中的无关私人信息的数量,这种方式可以在训练期间高效计算,并且已知对分类准确性几乎没有影响。具体来说,与使用各种最先进的防御方法训练的网络相比,我们的稀疏编码架构在维持相当或更高的分类准确性的同时,将最先进的训练数据重建降低了1.1到18.3倍,跨多种重建质量指标(PSNR、SSIM、FID)。这种性能优势在包括CelebA人脸、医学图像和CIFAR-10在内的5个数据集上持续存在,跨各种最先进的基于SGD和GAN的反演攻击,包括Plug-&-Play攻击。我们提供了一个集群就绪的PyTorch代码库,以促进研究和标准化防御评估。
更新时间: 2024-03-21 18:26:23
领域: cs.CV,cs.AI,cs.CR,cs.LG
Gravitational Duals from Equations of State
Holography relates gravitational theories in five dimensions to four-dimensional quantum field theories in flat space. Under this map, the equation of state of the field theory is encoded in the black hole solutions of the gravitational theory. Solving the five-dimensional Einstein's equations to determine the equation of state is an algorithmic, direct problem. Determining the gravitational theory that gives rise to a prescribed equation of state is a much more challenging, inverse problem. We present a novel approach to solve this problem based on physics-informed neural networks. The resulting algorithm is not only data-driven but also informed by the physics of the Einstein's equations. We successfully apply it to theories with crossovers, first- and second-order phase transitions.
Updated: 2024-03-21 18:07:32
标题: 从状态方程得到引力对偶
摘要: 全息对偶将五维引力理论与平坦空间中的四维量子场论联系起来。在这种映射下,场论的状态方程被编码在引力理论的黑洞解中。求解五维爱因斯坦方程以确定状态方程是一个算法化的正问题。确定产生给定状态方程的引力理论则是一个更具挑战性的逆问题。我们提出了一种基于物理信息神经网络的新方法来解决这个问题。所得算法不仅是数据驱动的,还融入了爱因斯坦方程的物理。我们成功将其应用于具有平滑过渡(crossover)、一级和二级相变的理论。
更新时间: 2024-03-21 18:07:32
领域: hep-th,astro-ph.CO,cs.AI,cs.LG,gr-qc
The Model Openness Framework: Promoting Completeness and Openness for Reproducibility, Transparency and Usability in AI
Generative AI (GAI) offers unprecedented possibilities but its commercialization has raised concerns about transparency, reproducibility, bias, and safety. Many "open-source" GAI models lack the necessary components for full understanding and reproduction, and some use restrictive licenses, a practice known as "openwashing." We propose the Model Openness Framework (MOF), a ranked classification system that rates machine learning models based on their completeness and openness, following principles of open science, open source, open data, and open access. The MOF requires specific components of the model development lifecycle to be included and released under appropriate open licenses. This framework aims to prevent misrepresentation of models claiming to be open, guide researchers and developers in providing all model components under permissive licenses, and help companies, academia, and hobbyists identify models that can be safely adopted without restrictions. Wide adoption of the MOF will foster a more open AI ecosystem, accelerating research, innovation, and adoption.
Updated: 2024-03-21 18:03:46
标题: 模型开放性框架:促进完整性与开放性,以实现人工智能的可复现性、透明度和可用性
摘要: 生成式人工智能(GAI)提供了前所未有的可能性,但其商业化引发了关于透明度、可复制性、偏见和安全性的担忧。许多“开源”GAI模型缺乏完全理解和复制所需的组件,而一些使用限制性许可证,这种做法被称为“openwashing”。我们提出了模型开放性框架(MOF),这是一个排名分类系统,根据其完整性和开放性对机器学习模型进行评级,遵循开放科学、开源、开放数据和开放获取原则。MOF要求模型开发生命周期的特定组件必须包含并以适当的开放许可证发布。该框架旨在防止声称为开放的模型被误传,引导研究人员和开发人员以宽松许可证提供所有模型组件,并帮助公司、学术界和爱好者识别可安全采用而无限制的模型。MOF的广泛采用将促进更加开放的人工智能生态系统,加速研究、创新和采纳。
更新时间: 2024-03-21 18:03:46
领域: cs.LG,cs.AI,cs.CY,cs.SE
Learning with SASQuaTCh: a Novel Variational Quantum Transformer Architecture with Kernel-Based Self-Attention
The widely popular transformer network popularized by the generative pre-trained transformer (GPT) has a large field of applicability, including predicting text and images, classification, and even predicting solutions to the dynamics of physical systems. In the latter context, the continuous analog of the self-attention mechanism at the heart of transformer networks has been applied to learning the solutions of partial differential equations and reveals a convolution kernel nature that can be exploited by the Fourier transform. It is well known that many quantum algorithms that have provably demonstrated a speedup over classical algorithms utilize the quantum Fourier transform. In this work, we explore quantum circuits that can efficiently express a self-attention mechanism through the perspective of kernel-based operator learning. In this perspective, we are able to represent deep layers of a vision transformer network using simple gate operations and a set of multi-dimensional quantum Fourier transforms. We analyze the computational and parameter complexity of our novel variational quantum circuit, which we call Self-Attention Sequential Quantum Transformer Channel (SASQuaTCh), and demonstrate its utility on simplified classification problems.
Updated: 2024-03-21 18:00:04
标题: 学习与SASQuaTCh:一种新颖的基于核的自注意力变分量子变压器架构
摘要: 由生成式预训练变压器(GPT)推广的广泛流行的变压器网络具有广泛的应用领域,包括预测文本和图像、分类,甚至预测物理系统动力学的解决方案。在后一种情境中,变压器网络核心的连续自注意机制的类比已被应用于学习偏微分方程的解,并揭示了可以通过傅里叶变换利用的卷积核性质。众所周知,许多已被证明比经典算法具有加速效果的量子算法利用了量子傅里叶变换。在这项工作中,我们通过基于核的算子学习的角度探索可以有效表达自注意机制的量子电路。在这个角度上,我们能够使用简单的门操作和一组多维量子傅里叶变换来表示视觉变压器网络的深层。我们分析了我们称之为自注意顺序量子变压器通道(SASQuaTCh)的新型变分量子电路的计算和参数复杂性,并展示了它在简化分类问题上的实用性。
更新时间: 2024-03-21 18:00:04
领域: quant-ph,cs.LG
A Classifier-Based Approach to Multi-Class Anomaly Detection for Astronomical Transients
Automating real-time anomaly detection is essential for identifying rare transients in the era of large-scale astronomical surveys. Modern survey telescopes are generating tens of thousands of alerts per night, and future telescopes, such as the Vera C. Rubin Observatory, are projected to increase this number dramatically. Currently, most anomaly detection algorithms for astronomical transients rely either on hand-crafted features extracted from light curves or on features generated through unsupervised representation learning, which are then coupled with standard machine learning anomaly detection algorithms. In this work, we introduce an alternative approach to detecting anomalies: using the penultimate layer of a neural network classifier as the latent space for anomaly detection. We then propose a novel method, named Multi-Class Isolation Forests (MCIF), which trains separate isolation forests for each class to derive an anomaly score for a light curve from the latent space representation given by the classifier. This approach significantly outperforms a standard isolation forest. We also use a simpler input method for real-time transient classifiers which circumvents the need for interpolation in light curves and helps the neural network model inter-passband relationships and handle irregular sampling. Our anomaly detection pipeline identifies rare classes including kilonovae, pair-instability supernovae, and intermediate luminosity transients shortly after trigger on simulated Zwicky Transient Facility light curves. Using a sample of our simulations that matched the population of anomalies expected in nature (54 anomalies and 12,040 common transients), our method was able to discover $41\pm3$ anomalies (~75% recall) after following up the top 2000 (~15%) ranked transients. Our novel method shows that classifiers can be effectively repurposed for real-time anomaly detection.
Updated: 2024-03-21 18:00:00
标题: 一个基于分类器的方法用于天文瞬变的多类异常检测
摘要: 自动化的实时异常检测对于在大规模天文巡天时代识别罕见瞬变现象至关重要。现代巡天望远镜每晚产生数万个警报,而未来的望远镜,如维拉·C·鲁宾天文台,预计将大幅增加这一数字。目前,用于天文瞬变的大多数异常检测算法要么依赖于从光变曲线中提取的手工特征,要么依赖于通过无监督表示学习生成的特征,再与标准机器学习异常检测算法相结合。在这项工作中,我们引入了一种检测异常的替代方法:使用神经网络分类器的倒数第二层作为异常检测的潜在空间。然后,我们提出了一种新方法,名为多类孤立森林(MCIF),为每个类别训练单独的孤立森林,从分类器给出的潜在空间表示中推导出光变曲线的异常分数。这种方法显著优于标准孤立森林。我们还为实时瞬变分类器采用了一种更简单的输入方式,避免了对光变曲线插值的需求,并帮助神经网络建模不同波段之间的关系和处理不规则采样。我们的异常检测管道在模拟的兹威基瞬变设施(ZTF)光变曲线上,在触发后不久即识别出包括千新星、对不稳定性超新星和中等光度瞬变在内的罕见类别。使用与自然界预期异常数量相匹配的模拟样本(54个异常和12,040个常见瞬变),我们的方法在跟进排名前2000个(约15%)的瞬变后,能够发现$41\pm3$个异常(约75%的召回率)。我们的新方法表明,分类器可以被有效地重新用于实时异常检测。
更新时间: 2024-03-21 18:00:00
领域: astro-ph.IM,astro-ph.HE,cs.LG
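The MCIF idea in the abstract above maps naturally onto off-the-shelf tooling. Below is a minimal sketch with scikit-learn; the hyperparameters and the min-over-classes aggregation are our reading of the abstract, not the authors' code.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

class MultiClassIsolationForest:
    """One isolation forest per known class, fit on that class's latent
    (penultimate-layer) features; a sample's anomaly score is taken from its
    most "normal" class, so a point is flagged only if it is anomalous with
    respect to every class."""

    def fit(self, Z, y):
        self.forests_ = {c: IsolationForest(random_state=0).fit(Z[y == c])
                         for c in np.unique(y)}
        return self

    def anomaly_score(self, Z):
        # score_samples is higher for inliers; negate so larger = more anomalous.
        scores = np.stack([-f.score_samples(Z) for f in self.forests_.values()])
        return scores.min(axis=0)

# Demo: two latent clusters standing in for two transient classes.
rng = np.random.default_rng(0)
Z = np.vstack([rng.normal(0, 1, (100, 8)), rng.normal(6, 1, (100, 8))])
y = np.repeat([0, 1], 100)
mcif = MultiClassIsolationForest().fit(Z, y)
test = np.vstack([Z[:1], np.full((1, 8), 30.0)])
print(mcif.anomaly_score(test))  # the second, far-away point scores much higher
```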
MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
The remarkable progress of Multi-modal Large Language Models (MLLMs) has garnered unparalleled attention, due to their superior performance in visual contexts. However, their capabilities in visual math problem-solving remain insufficiently evaluated and understood. We observe that current benchmarks incorporate excessive visual content within textual questions, which potentially assists MLLMs in deducing answers without truly interpreting the input diagrams. To this end, we introduce MathVerse, an all-around visual math benchmark designed for an equitable and in-depth evaluation of MLLMs. We meticulously collect 2,612 high-quality, multi-subject math problems with diagrams from publicly available sources. Each problem is then transformed by human annotators into six distinct versions, each offering varying degrees of information content in multi-modality, contributing to 15K test samples in total. This approach allows MathVerse to comprehensively assess whether and how much MLLMs can truly understand the visual diagrams for mathematical reasoning. In addition, we propose a Chain-of-Thought (CoT) evaluation strategy for a fine-grained assessment of the output answers. Rather than naively judging True or False, we employ GPT-4(V) to adaptively extract crucial reasoning steps, and then score each step with detailed error analysis, which can reveal the intermediate CoT reasoning quality by MLLMs. We hope the MathVerse benchmark may provide unique insights to guide the future development of MLLMs. Project page: https://mathverse-cuhk.github.io
Updated: 2024-03-21 17:59:50
标题: MathVerse:您的多模式LLM真的能看懂视觉数学问题中的图表吗?
摘要: 多模式大型语言模型(MLLMs)的显著进展引起了空前的关注,因为它们在视觉环境中表现出优越的性能。然而,它们在视觉数学问题求解方面的能力尚未得到充分评估和理解。我们观察到,当前的基准测试在文本问题中包含过多的视觉内容,这可能帮助MLLMs在并未真正理解输入图表的情况下推断出答案。为此,我们引入了MathVerse,一个全面的视觉数学基准,旨在公平而深入地评估MLLMs。我们精心收集了2,612个带有图表的高质量、多学科数学问题,均来自公开来源。然后,每个问题都由人类标注员转换为六个不同版本,每个版本在多模态中提供不同程度的信息内容,总共构成15K个测试样本。这种方法使得MathVerse能够全面评估MLLMs是否以及在多大程度上能够真正理解用于数学推理的视觉图表。此外,我们提出了一种思维链(Chain-of-Thought, CoT)评估策略,用于对输出答案进行细粒度评估。我们不是简单地判断对错,而是利用GPT-4(V)自适应地提取关键推理步骤,然后结合详细的错误分析对每个步骤打分,从而揭示MLLMs的中间CoT推理质量。我们希望MathVerse基准能为指导MLLMs的未来发展提供独特的见解。项目页面:https://mathverse-cuhk.github.io
更新时间: 2024-03-21 17:59:50
领域: cs.CV,cs.AI,cs.CL,cs.LG
DreamReward: Text-to-3D Generation with Human Preference
3D content creation from text prompts has shown remarkable success recently. However, current text-to-3D methods often generate 3D results that do not align well with human preferences. In this paper, we present a comprehensive framework, coined DreamReward, to learn and improve text-to-3D models from human preference feedback. To begin with, we collect 25k expert comparisons based on a systematic annotation pipeline including rating and ranking. Then, we build Reward3D -- the first general-purpose text-to-3D human preference reward model to effectively encode human preferences. Building upon the 3D reward model, we finally perform theoretical analysis and present the Reward3D Feedback Learning (DreamFL), a direct tuning algorithm to optimize the multi-view diffusion models with a redefined scorer. Grounded by theoretical proof and extensive experiment comparisons, our DreamReward successfully generates high-fidelity and 3D consistent results with significant boosts in prompt alignment with human intention. Our results demonstrate the great potential for learning from human feedback to improve text-to-3D models.
Updated: 2024-03-21 17:58:04
标题: DreamReward:使用人类偏好进行文本到3D生成
摘要: 最近,从文本提示创建3D内容取得了显著成功。然而,当前的文本到3D方法经常生成与人类偏好不一致的3D结果。在本文中,我们提出了一个全面的框架,称为DreamReward,用于从人类偏好反馈中学习和改进文本到3D模型。首先,我们基于一个包括评分和排名的系统化标注流程,收集了25k个专家比较。然后,我们构建了Reward3D,即第一个通用的文本到3D人类偏好奖励模型,用以有效地编码人类偏好。在3D奖励模型的基础上,我们进行了理论分析,并提出了Reward3D反馈学习(DreamFL),一种利用重新定义的评分器来优化多视角扩散模型的直接调优算法。基于理论证明和广泛的实验比较,我们的DreamReward成功生成了高保真且3D一致的结果,并显著提升了生成结果与人类意图提示的对齐程度。我们的结果表明,从人类反馈中学习以改进文本到3D模型具有巨大潜力。
更新时间: 2024-03-21 17:58:04
领域: cs.CV,cs.CL,cs.LG
TD-MPC2: Scalable, Robust World Models for Continuous Control
TD-MPC is a model-based reinforcement learning (RL) algorithm that performs local trajectory optimization in the latent space of a learned implicit (decoder-free) world model. In this work, we present TD-MPC2: a series of improvements upon the TD-MPC algorithm. We demonstrate that TD-MPC2 improves significantly over baselines across 104 online RL tasks spanning 4 diverse task domains, achieving consistently strong results with a single set of hyperparameters. We further show that agent capabilities increase with model and data size, and successfully train a single 317M parameter agent to perform 80 tasks across multiple task domains, embodiments, and action spaces. We conclude with an account of lessons, opportunities, and risks associated with large TD-MPC2 agents. Explore videos, models, data, code, and more at https://tdmpc2.com
Updated: 2024-03-21 17:56:19
标题: TD-MPC2:可扩展、鲁棒的连续控制世界模型
摘要: TD-MPC是一种基于模型的强化学习(RL)算法,它在学习到的隐式(无解码器)世界模型的潜空间中执行局部轨迹优化。在这项工作中,我们提出了TD-MPC2:对TD-MPC算法的一系列改进。我们展示了TD-MPC2在横跨4个不同任务领域的104个在线RL任务中显著优于基线,并且使用同一组超参数就能持续取得强劲的结果。我们进一步展示了代理能力随模型和数据规模的增长而提升,并成功训练了一个317M参数的单一代理来执行跨多个任务领域、具身形态和动作空间的80个任务。最后,我们总结了与大型TD-MPC2代理相关的经验教训、机会和风险。请在https://tdmpc2.com探索视频、模型、数据、代码等。
更新时间: 2024-03-21 17:56:19
领域: cs.LG,cs.AI,cs.CV,cs.RO
The Elements of Differentiable Programming
Artificial intelligence has recently experienced remarkable advances, fueled by large models, vast datasets, accelerated hardware, and, last but not least, the transformative power of differentiable programming. This new programming paradigm enables end-to-end differentiation of complex computer programs (including those with control flows and data structures), making gradient-based optimization of program parameters possible. As an emerging paradigm, differentiable programming builds upon several areas of computer science and applied mathematics, including automatic differentiation, graphical models, optimization and statistics. This book presents a comprehensive review of the fundamental concepts useful for differentiable programming. We adopt two main perspectives, that of optimization and that of probability, with clear analogies between the two. Differentiable programming is not merely the differentiation of programs, but also the thoughtful design of programs intended for differentiation. By making programs differentiable, we inherently introduce probability distributions over their execution, providing a means to quantify the uncertainty associated with program outputs.
Updated: 2024-03-21 17:55:16
标题: 《可微分编程的要素》
摘要: 人工智能最近取得了显著进展,得益于大型模型、海量数据集、加速硬件,以及可微分编程的变革性力量。这种新的编程范式使复杂计算机程序(包括具有控制流和数据结构的程序)实现端到端的微分,从而实现了对程序参数的基于梯度的优化。作为一种新兴范式,可微分编程建立在计算机科学和应用数学的几个领域之上,包括自动微分、图形模型、优化和统计学。本书提供了对可微分编程有用的基本概念的全面审视。我们采用了两种主要视角,即优化和概率,二者之间有明显的类比。可微分编程不仅仅是程序的微分,还包括旨在进行微分的程序的深思熟虑设计。通过使程序可微分,我们从根本上引入了概率分布,从而提供了一种量化与程序输出相关的不确定性的方法。
更新时间: 2024-03-21 17:55:16
领域: cs.LG,cs.AI,cs.PL
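To illustrate the paradigm the book surveys, here is a small end-to-end differentiable program with control flow, written with PyTorch autodiff. It is our illustration, not an excerpt from the book.

```python
import torch

def program(x, n=10):
    """A small program with a loop and a branch that remains end-to-end
    differentiable: reverse-mode autodiff records whichever path actually ran
    and propagates gradients through it."""
    y = x
    for _ in range(n):
        y = torch.sin(y) + 0.1 * y if y.sum() > 0 else torch.cos(y)
    return (y ** 2).sum()

x = torch.randn(5, requires_grad=True)
program(x).backward()
print(x.grad)  # d(program)/dx, computed by reverse-mode automatic differentiation
```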
CryCeleb: A Speaker Verification Dataset Based on Infant Cry Sounds
This paper describes the Ubenwa CryCeleb dataset - a labeled collection of infant cries - and the accompanying CryCeleb 2023 task, which is a public speaker verification challenge based on cry sounds. We released more than 6 hours of manually segmented cry sounds from 786 newborns for academic use, aiming to encourage research in infant cry analysis. The inaugural public competition attracted 59 participants, 11 of whom improved the baseline performance. The top-performing system achieved a significant improvement scoring 25.8% equal error rate, which is still far from the performance of state-of-the-art adult speaker verification systems. Therefore, we believe there is room for further research on this dataset, potentially extending beyond the verification task.
Updated: 2024-03-21 17:52:22
标题: CryCeleb:基于婴儿哭声的说话人验证数据集
摘要: 本文介绍了Ubenwa CryCeleb数据集(一个带标注的婴儿哭声集合)以及随附的CryCeleb 2023任务,这是一个基于哭声的公开说话人验证挑战。我们发布了来自786名新生儿的超过6小时手动分段的哭声供学术使用,旨在鼓励婴儿哭声分析研究。首届公开比赛吸引了59名参与者,其中11名超过了基线表现。表现最佳的系统取得了显著改进,等错误率达到25.8%,但仍远落后于最先进的成人说话人验证系统的性能。因此,我们相信在这个数据集上还有进一步研究的空间,并可能超出验证任务本身。
更新时间: 2024-03-21 17:52:22
领域: cs.SD,cs.AI,cs.CL,eess.AS
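The 25.8% figure in the CryCeleb abstract is an equal error rate (EER), the standard verification metric. Below is a minimal sketch of how EER is computed from trial scores; the simple threshold sweep is our implementation choice, not taken from the challenge code.

```python
import numpy as np

def equal_error_rate(scores, labels):
    """EER for a verification system: the operating point where the false
    acceptance rate (FAR) and false rejection rate (FRR) cross.
    scores: similarity scores per trial; labels: 1 for same-speaker
    (here: same infant) trials, 0 for different-speaker trials."""
    scores, labels = np.asarray(scores), np.asarray(labels)
    order = np.argsort(scores)[::-1]          # sweep the threshold high -> low
    labels = labels[order]
    far = np.cumsum(labels == 0) / max((labels == 0).sum(), 1)      # false accepts
    frr = 1 - np.cumsum(labels == 1) / max((labels == 1).sum(), 1)  # false rejects
    i = np.argmin(np.abs(far - frr))
    return (far[i] + frr[i]) / 2

# Demo: noisy scores with moderate class separation.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 2000)
s = y + rng.normal(scale=0.8, size=2000)
print(equal_error_rate(s, y))  # roughly 0.27 for this separation
```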
ReNoise: Real Image Inversion Through Iterative Noising
Recent advancements in text-guided diffusion models have unlocked powerful image manipulation capabilities. However, applying these methods to real images necessitates the inversion of the images into the domain of the pretrained diffusion model. Achieving faithful inversion remains a challenge, particularly for more recent models trained to generate images with a small number of denoising steps. In this work, we introduce an inversion method with a high quality-to-operation ratio, enhancing reconstruction accuracy without increasing the number of operations. Building on reversing the diffusion sampling process, our method employs an iterative renoising mechanism at each inversion sampling step. This mechanism refines the approximation of a predicted point along the forward diffusion trajectory, by iteratively applying the pretrained diffusion model, and averaging these predictions. We evaluate the performance of our ReNoise technique using various sampling algorithms and models, including recent accelerated diffusion models. Through comprehensive evaluations and comparisons, we show its effectiveness in terms of both accuracy and speed. Furthermore, we confirm that our method preserves editability by demonstrating text-driven image editing on real images.
Updated: 2024-03-21 17:52:08
标题: ReNoise:通过迭代加噪实现真实图像反转
摘要: 文本引导扩散模型的最新进展解锁了强大的图像编辑能力。然而,将这些方法应用于真实图像需要先将图像反演到预训练扩散模型的域中。实现忠实的反演仍然是一个挑战,对于以少量去噪步骤生成图像而训练的较新模型尤其如此。在这项工作中,我们引入了一种具有高质量-操作比的反演方法,在不增加操作数量的情况下提高重建精度。我们的方法建立在反转扩散采样过程的基础上,在每个反演采样步骤中采用迭代重加噪机制:通过迭代应用预训练的扩散模型并对这些预测取平均,来细化正向扩散轨迹上预测点的近似。我们使用多种采样算法和模型(包括最近的加速扩散模型)评估了ReNoise技术的性能。通过全面的评估和比较,我们展示了它在精度和速度两方面的有效性。此外,我们通过在真实图像上演示文本驱动的图像编辑,证实了我们的方法保留了可编辑性。
更新时间: 2024-03-21 17:52:08
领域: cs.CV,cs.GR,cs.LG,eess.IV
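A schematic reconstruction of the inversion loop the ReNoise abstract describes: a DDIM-style inversion step in which the noise estimate at the candidate next point is refined several times and the refined estimates are averaged before committing. The update equations, the schedule, and the averaging choice are our assumptions from the abstract, not the authors' released code.

```python
import torch

def renoise_invert(eps_model, z0, timesteps, alphas, n_renoise=4):
    """Walk the sampling process backwards (clean image -> noise).
    eps_model(z, t) predicts noise; alphas[t] is the cumulative noise schedule.
    At each step, repeatedly re-estimate the noise at the candidate next point
    ("renoising"), then average the refined estimates and commit."""
    z = z0
    for t_prev, t in zip(timesteps[:-1], timesteps[1:]):  # low noise -> high noise
        a_p, a_t = alphas[t_prev], alphas[t]

        def ddim_step(eps):  # deterministic update from t_prev to t, given eps
            z0_hat = (z - (1 - a_p) ** 0.5 * eps) / a_p ** 0.5
            return a_t ** 0.5 * z0_hat + (1 - a_t) ** 0.5 * eps

        eps, estimates = eps_model(z, t), []
        for _ in range(n_renoise):
            eps = eps_model(ddim_step(eps), t)  # refine at the candidate point
            estimates.append(eps)
        z = ddim_step(torch.stack(estimates).mean(0))  # average, then commit
    return z

# Toy demo with a stub noise predictor and a linear schedule.
eps_model = lambda z, t: 0.05 * z
alphas = {t: 1 - 0.001 * t for t in range(0, 1001)}
z = renoise_invert(eps_model, torch.randn(1, 4, 8, 8), list(range(0, 1000, 250)), alphas)
print(z.shape)
```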
Extended Reality for Enhanced Human-Robot Collaboration: a Human-in-the-Loop Approach
The rise of automation has provided an opportunity to achieve higher efficiency in manufacturing processes, yet it often compromises the flexibility required to promptly respond to evolving market needs and meet the demand for customization. Human-robot collaboration attempts to tackle these challenges by combining the strength and precision of machines with human ingenuity and perceptual understanding. In this paper, we conceptualize and propose an implementation framework for an autonomous, machine learning-based manipulator that incorporates human-in-the-loop principles and leverages Extended Reality (XR) to facilitate intuitive communication and programming between humans and robots. Furthermore, the conceptual framework foresees human involvement directly in the robot learning process, resulting in higher adaptability and task generalization. The paper highlights key technologies enabling the proposed framework, emphasizing the importance of developing the digital ecosystem as a whole. Additionally, we review the existent implementation approaches of XR in human-robot collaboration, showcasing diverse perspectives and methodologies. The challenges and future outlooks are discussed, delving into the major obstacles and potential research avenues of XR for more natural human-robot interaction and integration in the industrial landscape.
Updated: 2024-03-21 17:50:22
标题: 扩展现实技术用于增强人机协作:一种人在环路中的方法
摘要: 自动化的兴起为提高制造过程的效率提供了机会,但往往会牺牲灵活性,难以及时响应不断变化的市场需求并满足定制化要求。人机协作试图通过将机器的力量和精度与人类的智慧和感知理解相结合来应对这些挑战。在本文中,我们构想并提出了一个基于机器学习的自主机械臂的实现框架,该框架融合了人在环路原则,并利用扩展现实(XR)促进人与机器人之间的直观沟通和编程。此外,该概念框架预见了人类直接参与机器人学习过程,从而实现更高的适应性和任务泛化能力。本文重点介绍了支撑所提框架的关键技术,强调了将数字生态系统作为一个整体来发展的重要性。此外,我们回顾了XR在人机协作中的现有实现方法,展示了多样的视角和方法论。文中还讨论了挑战和未来展望,深入探讨了XR在工业领域实现更自然的人机交互与集成所面临的主要障碍和潜在研究方向。
更新时间: 2024-03-21 17:50:22
领域: cs.RO,cs.HC,cs.LG
Rethinking Adversarial Inverse Reinforcement Learning: From the Angles of Policy Imitation and Transferable Reward Recovery
Adversarial inverse reinforcement learning (AIRL) stands as a cornerstone approach in imitation learning. This paper rethinks the two different angles of AIRL: policy imitation and transferable reward recovery. We begin with substituting the built-in algorithm in AIRL with soft actor-critic (SAC) during the policy optimization process to enhance sample efficiency, thanks to the off-policy formulation of SAC and identifiable Markov decision process (MDP) models with respect to AIRL. It indeed exhibits a significant improvement in policy imitation but accidentally brings drawbacks to transferable reward recovery. To address this issue, we illustrate that the SAC algorithm itself is unable to comprehensively disentangle the reward function during the AIRL training process, and propose a hybrid framework, PPO-AIRL + SAC, for a satisfactory transfer effect. Additionally, we analyze the capability of environments to extract disentangled rewards from an algebraic theory perspective.
Updated: 2024-03-21 17:48:38
标题: 重新思考对抗性逆强化学习:从策略模仿和可转移奖励恢复的角度进行研究
摘要: 对抗性逆强化学习(AIRL)是模仿学习中的基石方法。本文从两个不同角度重新审视AIRL:策略模仿和可转移奖励恢复。我们首先在策略优化过程中用软演员-评论家(SAC)替换AIRL的内置算法,以提升样本效率,这得益于SAC的异策略(off-policy)形式以及AIRL框架下可辨识的马尔可夫决策过程(MDP)模型。它确实在策略模仿方面带来了显著改进,却意外地损害了可转移奖励恢复。针对这一问题,我们说明SAC算法本身无法在AIRL训练过程中全面地解耦奖励函数,并提出了一个混合框架PPO-AIRL + SAC,以获得令人满意的迁移效果。此外,我们从代数理论的角度分析了环境提取解耦奖励的能力。
更新时间: 2024-03-21 17:48:38
领域: cs.LG,stat.ML
Envisioning the Next-Generation AI Coding Assistants: Insights & Proposals
As a research-product hybrid group in AI for Software Engineering (AI4SE), we present four key takeaways from our experience developing in-IDE AI coding assistants. AI coding assistants should set clear expectations for usage, integrate with advanced IDE capabilities and existing extensions, use extendable backend designs, and collect app data responsibly for downstream analyses. We propose open questions and challenges that academia and industry should address to realize the vision of next-generation AI coding assistants.
Updated: 2024-03-21 17:47:28
标题: 构想下一代AI编程助手:见解与建议
摘要: 作为人工智能软件工程(AI4SE)领域的研究与产品混合团队,我们从开发IDE内(in-IDE)AI编码助手的经验中总结出四个关键要点。AI编码助手应当为使用设定明确的期望,与高级IDE功能和现有扩展集成,采用可扩展的后端设计,并负责任地收集应用数据以用于下游分析。我们提出了学术界和工业界应当解决的开放性问题和挑战,以实现下一代AI编码助手的愿景。
更新时间: 2024-03-21 17:47:28
领域: cs.SE,cs.AI,cs.HC
Co-Optimization of Environment and Policies for Decentralized Multi-Agent Navigation
This work views the multi-agent system and its surrounding environment as a co-evolving system, where the behavior of one affects the other. The goal is to take both agent actions and environment configurations as decision variables, and optimize these two components in a coordinated manner to improve some measure of interest. Towards this end, we consider the problem of decentralized multi-agent navigation in cluttered environments. By introducing two sub-objectives of multi-agent navigation and environment optimization, we propose an $\textit{agent-environment co-optimization}$ problem and develop a $\textit{coordinated algorithm}$ that alternates between these sub-objectives to search for an optimal synthesis of agent actions and obstacle configurations in the environment; ultimately, improving the navigation performance. Due to the challenge of explicitly modeling the relation between agents, environment and performance, we leverage policy gradient to formulate a model-free learning mechanism within the coordinated framework. A formal convergence analysis shows that our coordinated algorithm tracks the local minimum trajectory of an associated time-varying non-convex optimization problem. Extensive numerical results corroborate theoretical findings and show the benefits of co-optimization over baselines. Interestingly, the results also indicate that optimized environment configurations are able to offer structural guidance that is key to de-conflicting agents in motion.
Updated: 2024-03-21 17:37:43
标题: 面向去中心化多智能体导航的环境与策略协同优化
摘要: 这项工作将多智能体系统及其周围环境视为一个共同进化的系统,其中一个的行为会影响另一个。目标是将智能体的行动和环境配置都作为决策变量,并以协调的方式优化这两个部分,以改进某些感兴趣的度量。为了实现这一目标,我们考虑了在拥挤环境中的分散式多智能体导航问题。通过引入多智能体导航和环境优化的两个子目标,我们提出了一个“智能体-环境协同优化”问题,并开发了一个“协调算法”,该算法在这两个子目标之间交替,以搜索智能体行动和环境障碍物配置的最佳综合,最终改进导航性能。由于明确建模智能体、环境和性能之间关系的挑战,我们利用策略梯度在协调框架内制定了一个无模型学习机制。正式的收敛分析表明,我们的协调算法跟踪一个相关的时变非凸优化问题的局部最小轨迹。大量的数值结果证实了理论发现,并显示了协同优化相对于基线的好处。有趣的是,结果还表明,优化的环境配置能够提供结构性指导,这对解决运动中的智能体冲突至关重要。
更新时间: 2024-03-21 17:37:43
领域: cs.RO,cs.LG,cs.MA
Multiple and Gyro-Free Inertial Datasets
An inertial navigation system (INS) utilizes three orthogonal accelerometers and gyroscopes to determine platform position, velocity, and orientation. There are countless applications for INS, including robotics, autonomous platforms, and the internet of things. Recent research explores the integration of data-driven methods with INS, highlighting significant innovations, improving accuracy and efficiency. Despite the growing interest in this field and the availability of INS datasets, no datasets are available for gyro-free INS (GFINS) and multiple inertial measurement unit (MIMU) architectures. To fill this gap and to stimulate further research in this field, we designed and recorded GFINS and MIMU datasets using 54 inertial sensors grouped in nine inertial measurement units. These sensors can be used to define and evaluate different types of MIMU and GFINS architectures. The inertial sensors were arranged in three different sensor configurations and mounted on a mobile robot and a passenger car. In total, the dataset contains 35 hours of inertial data and corresponding ground truth trajectories. The data and code are freely accessible through our GitHub repository.
Updated: 2024-03-21 17:36:53
标题: 多惯性测量单元与无陀螺惯性数据集
摘要: 惯性导航系统(INS)利用三个正交的加速度计和陀螺仪来确定平台的位置、速度和姿态。INS的应用不胜枚举,包括机器人、自主平台和物联网。最近的研究探索了数据驱动方法与INS的结合,带来了显著创新,提高了精度和效率。尽管该领域的关注度日益增长且已有INS数据集,但目前尚无面向无陀螺INS(GFINS)和多惯性测量单元(MIMU)架构的数据集。为填补这一空白并促进该领域的进一步研究,我们使用分为九个惯性测量单元的54个惯性传感器,设计并录制了GFINS和MIMU数据集。这些传感器可用于定义和评估不同类型的MIMU和GFINS架构。惯性传感器以三种不同的传感器配置排列,并安装在移动机器人和乘用车上。该数据集总共包含35小时的惯性数据及相应的真实轨迹。数据和代码可通过我们的GitHub仓库免费获取。
更新时间: 2024-03-21 17:36:53
领域: eess.SP,cs.LG,cs.RO
Large Language Models for Multi-Choice Question Classification of Medical Subjects
The aim of this paper is to evaluate whether large language models trained on multi-choice question data can be used to discriminate between medical subjects. This is an important and challenging task for automatic question answering. To achieve this goal, we train deep neural networks for multi-class classification of questions into the inferred medical subjects. Using our Multi-Question (MQ) Sequence-BERT method, we outperform the state-of-the-art results on the MedMCQA dataset with an accuracy of 0.68 and 0.60 on their development and test sets, respectively. In this sense, we show the capability of AI and LLMs in particular for multi-classification tasks in the Healthcare domain.
Updated: 2024-03-21 17:36:08
标题: 大型语言模型用于医学科目多选题分类
摘要: 本文旨在评估在多选题数据上训练的大型语言模型是否可用于区分医学科目。这对于自动问答是一项重要且具有挑战性的任务。为实现这一目标,我们训练深度神经网络,将问题多类别分类到推断出的医学科目。使用我们的Multi-Question(MQ)Sequence-BERT方法,我们超越了MedMCQA数据集上的最新结果,在其开发集和测试集上的准确率分别为0.68和0.60。由此,我们展示了AI,特别是LLM,在医疗保健领域多分类任务中的能力。
更新时间: 2024-03-21 17:36:08
领域: cs.CL,cs.AI
Global, robust and comparable digital carbon assets
Carbon credits purchased in the voluntary carbon market allow unavoidable emissions, such as from international flights for essential travel, to be offset by an equivalent climate benefit, such as avoiding emissions from tropical deforestation. However, many concerns regarding the credibility of these offsetting claims have been raised. Moreover, the credit market is manual, therefore inefficient and unscalable, and non-fungible, therefore illiquid. To address these issues, we propose an efficient digital methodology that combines remote sensing data, modern econometric techniques, and on-chain certification and trading to create a new digital carbon asset (the PACT stablecoin) against which carbon offsetting claims can be transparently verified. PACT stablecoins are produced as outputs from a reproducible computational pipeline for estimating the climate benefits of carbon offset projects that not only quantifies the CO2 emissions involved, but also allows for similar credits to be pooled based on their co-benefits such as biodiversity and jurisdictional attributes, increasing liquidity through fungibility within pools. We implement and evaluate the PACT carbon stablecoin on the Tezos blockchain, which is designed to facilitate low-cost transactions while minimizing environmental impact. Our implementation includes a contract for a registry for tracking issuance, ownership, and retirement of credits, and a custodian contract to bridge on-chain and off-chain transactions. Our work brings scale and trust to the voluntary carbon market by providing a transparent, scalable, and efficient framework for high integrity carbon credit transactions.
Updated: 2024-03-21 17:35:07
标题: 全球化、稳健且可比较的数字碳资产
摘要: 在自愿碳市场购买的碳信用额允许用等量的气候效益(例如避免热带毁林产生的排放)来抵消不可避免的排放(例如必要出行的国际航班排放)。然而,这些抵消声明的可信度引发了许多担忧。此外,该信用市场依赖人工操作,因此低效且难以扩展;信用不可互换,因此缺乏流动性。为了解决这些问题,我们提出了一种高效的数字方法,结合遥感数据、现代计量经济学技术以及链上认证和交易,创建一种新的数字碳资产(PACT稳定币),碳抵消声明可以据此得到透明验证。PACT稳定币由一个可复现的计算管道产出,该管道用于估计碳抵消项目的气候效益,不仅量化所涉及的CO2排放,还允许基于生物多样性和司法管辖属性等协同效益对类似信用进行归池,通过池内的可互换性提高流动性。我们在Tezos区块链上实现并评估了PACT碳稳定币,该区块链旨在促进低成本交易,同时尽量减少环境影响。我们的实现包括一个用于跟踪信用发行、所有权和注销的登记合约,以及一个衔接链上与链下交易的托管合约。我们的工作通过提供一个透明、可扩展且高效的高诚信碳信用交易框架,为自愿碳市场带来规模与信任。
更新时间: 2024-03-21 17:35:07
领域: cs.CR
RAmBLA: A Framework for Evaluating the Reliability of LLMs as Assistants in the Biomedical Domain
Large Language Models (LLMs) increasingly support applications in a wide range of domains, some with potential high societal impact such as biomedicine, yet their reliability in realistic use cases is under-researched. In this work we introduce the Reliability AssessMent for Biomedical LLM Assistants (RAmBLA) framework and evaluate whether four state-of-the-art foundation LLMs can serve as reliable assistants in the biomedical domain. We identify prompt robustness, high recall, and a lack of hallucinations as necessary criteria for this use case. We design shortform tasks and tasks requiring LLM freeform responses mimicking real-world user interactions. We evaluate LLM performance using semantic similarity with a ground truth response, through an evaluator LLM.
Updated: 2024-03-21 17:30:59
标题: RAmBLA:评估LLM在生物医学领域作为助手的可靠性的框架
摘要: 大型语言模型(LLM)日益支持各领域的应用,其中一些领域(如生物医学)具有潜在的重大社会影响,然而它们在现实用例中的可靠性尚未得到充分研究。在这项工作中,我们介绍了面向生物医学LLM助手的可靠性评估(RAmBLA)框架,并评估了四种最先进的基础LLM能否作为生物医学领域的可靠助手。我们将提示鲁棒性、高召回率和无幻觉确定为这一用例的必要标准。我们设计了简短任务以及需要LLM自由形式回答、模拟真实用户交互的任务。我们通过一个评估者LLM,以与标准参考回答的语义相似度来评估LLM的表现。
更新时间: 2024-03-21 17:30:59
领域: cs.LG,cs.AI
Emergent Dominance Hierarchies in Reinforcement Learning Agents
Modern Reinforcement Learning (RL) algorithms are able to outperform humans in a wide variety of tasks. Multi-agent reinforcement learning (MARL) settings present additional challenges, and successful cooperation in mixed-motive groups of agents depends on a delicate balancing act between individual and group objectives. Social conventions and norms, often inspired by human institutions, are used as tools for striking this balance. In this paper, we examine a fundamental, well-studied social convention that underlies cooperation in both animal and human societies: dominance hierarchies. We adapt the ethological theory of dominance hierarchies to artificial agents, borrowing the established terminology and definitions with as few amendments as possible. We demonstrate that populations of RL agents, operating without explicit programming or intrinsic rewards, can invent, learn, enforce, and transmit a dominance hierarchy to new populations. The dominance hierarchies that emerge have a similar structure to those studied in chickens, mice, fish, and other species.
Updated: 2024-03-21 17:29:37
标题: 强化学习智能体中涌现的支配等级制度
摘要: 现代强化学习(RL)算法能够在各种任务中胜过人类。多智能体强化学习(MARL)设置带来额外挑战,在混合动机智能体群体中成功合作取决于个体和群体目标之间微妙的平衡。社会惯例和规范,通常受到人类机构的启发,被用作实现这种平衡的工具。 在本文中,我们研究了一种根植于动物和人类社会合作的基本、深入研究的社会惯例:支配等级制度。 我们将动物行为学中的支配等级制度理论应用于人工智能体,尽量少地修改已有术语和定义。我们证明,操作没有明确编程或内在奖励的RL智能体群体可以发明、学习、执行和传播支配等级制度给新的群体。出现的支配等级制度与研究过的鸡、老鼠、鱼类和其他物种的结构相似。
更新时间: 2024-03-21 17:29:37
领域: cs.MA,cs.AI,cs.GT,cs.LG
Improving Galileo OSNMA Time To First Authenticated Fix
Galileo is the first global navigation satellite system to authenticate their civilian signals through the Open Service Galileo Message Authentication (OSNMA) protocol. However, OSNMA delays the time to obtain a first position and time fix, the so-called Time To First Authentication Fix (TTFAF). Reducing the TTFAF as much as possible is crucial to integrate the technology seamlessly into the current products. In the cases where the receiver already has cryptographic data available, the so-called hot start mode and focus of this article, the currently available implementations achieve an average TTFAF of around 100 seconds in ideal environments. In this work, we dissect the TTFAF process, propose two main optimizations to reduce the TTFAF, and benchmark them in three distinct scenarios (open-sky, soft urban, and hard urban) with recorded real data. Moreover, we evaluate the optimizations using the synthetic scenario from the official OSNMA test vectors. The first block of optimizations centers on extracting as much information as possible from broken sub-frames by processing them at page level and combining redundant data from multiple satellites. The second block of optimizations aims to reconstruct missed navigation data by using fields in the authentication tags belonging to the same sub-frame as the authentication key. Combining both optimizations improves the TTFAF substantially for all considered scenarios. We obtain an average TTFAF of 60.9 and 68.8 seconds for the test vectors and the open-sky scenario, respectively, with a best-case of 44.0 seconds in both. Likewise, the urban scenarios see a drastic reduction of the average TTFAF between the non-optimized and optimized cases, from 127.5 to 87.5 seconds in the soft urban scenario and from 266.1 to 146.1 seconds in the hard urban scenario. These optimizations are available as part of the open-source OSNMAlib library on GitHub.
Updated: 2024-03-21 17:28:35
Domains: cs.CR,eess.SP
Unraveling the Mystery of Scaling Laws: Part I
Scaling law principles indicate a power-law correlation between loss and variables such as model size, dataset size, and computational resources utilized during training. These principles play a vital role in optimizing various aspects of model pre-training, ultimately contributing to the success of large language models such as GPT-4, Llama and Gemini. However, the original scaling law paper by OpenAI did not disclose the complete details necessary to derive the precise scaling law formulas, and their conclusions are only based on models containing up to 1.5 billion parameters. Though some subsequent works attempt to unveil these details and scale to larger models, they often neglect the training dependency of important factors such as the learning rate, context length and batch size, leading to their failure to establish a reliable formula for predicting the test loss trajectory. In this technical report, we confirm that the scaling law formulations proposed in the original OpenAI paper remain valid when scaling the model size up to 33 billion, but the constant coefficients in these formulas vary significantly with the experiment setup. We meticulously identify influential factors and provide transparent, step-by-step instructions to estimate all constant terms in scaling-law formulas by training on models with only 1M~60M parameters. Using these estimated formulas, we showcase the capability to accurately predict various attributes for models with up to 33B parameters before their training, including (1) the minimum possible test loss; (2) the minimum required training steps and processed tokens to achieve a specific loss; (3) the critical batch size with an optimal time/computation trade-off at any loss value; and (4) the complete test loss trajectory with arbitrary batch size.
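As a concrete illustration of the workflow described above, the sketch below fits a power-law curve L(N) = (N_c/N)^alpha + L_inf to small-model (size, loss) pairs and extrapolates to a larger model; the functional form is the standard one from the scaling-law literature, and all data points and fitted constants are hypothetical.

```python
# A minimal sketch of fitting a scaling-law curve L(N) = (n_c / N)**alpha + l_inf
# to (model size, test loss) pairs and extrapolating to a larger model.
# The data points and constants below are purely hypothetical.
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(n_params, n_c, alpha, l_inf):
    """Power-law loss curve with an irreducible-loss offset."""
    return (n_c / n_params) ** alpha + l_inf

# Hypothetical (model size, converged test loss) measurements from small runs.
sizes = np.array([1e6, 5e6, 20e6, 60e6])
losses = np.array([4.8, 4.1, 3.6, 3.3])

popt, _ = curve_fit(scaling_law, sizes, losses, p0=[1e9, 0.07, 1.5], maxfev=20000)
n_c, alpha, l_inf = popt

# Extrapolate the minimum achievable test loss for a 33B-parameter model.
predicted = scaling_law(33e9, n_c, alpha, l_inf)
print(f"fitted alpha={alpha:.3f}, predicted loss at 33B params: {predicted:.3f}")
```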
Updated: 2024-03-21 17:08:43
Domains: cs.LG,cs.CL
A Geospatial Approach to Predicting Desert Locust Breeding Grounds in Africa
Desert locust swarms present a major threat to agriculture and food security. Addressing this challenge, our study develops an operationally-ready model for predicting locust breeding grounds, which has the potential to enhance early warning systems and targeted control measures. We curated a dataset from the United Nations Food and Agriculture Organization's (UN-FAO) locust observation records and analyzed it using two types of spatio-temporal input features: remotely-sensed environmental and climate data as well as multi-spectral earth observation images. Our approach employed custom deep learning models (three-dimensional and LSTM-based recurrent convolutional networks), along with the geospatial foundational model Prithvi recently released by Jakubik et al., 2023. These models notably outperformed existing baselines, with the Prithvi-based model, fine-tuned on multi-spectral images from NASA's Harmonized Landsat and Sentinel-2 (HLS) dataset, achieving the highest accuracy, F1 and ROC-AUC scores (83.03%, 81.53% and 87.69%, respectively). A significant finding from our research is that multi-spectral earth observation images alone are sufficient for effective locust breeding ground prediction without the need to explicitly incorporate climatic or environmental features.
Updated: 2024-03-21 17:06:49
Domains: cs.LG,cs.CV
The Era of Semantic Decoding
Recent work demonstrated great promise in the idea of orchestrating collaborations between LLMs, human input, and various tools to address the inherent limitations of LLMs. We propose a novel perspective called semantic decoding, which frames these collaborative processes as optimization procedures in semantic space. Specifically, we conceptualize LLMs as semantic processors that manipulate meaningful pieces of information that we call semantic tokens (known thoughts). LLMs are among a large pool of other semantic processors, including humans and tools, such as search engines or code executors. Collectively, semantic processors engage in dynamic exchanges of semantic tokens to progressively construct high-utility outputs. We refer to these orchestrated interactions among semantic processors, optimizing and searching in semantic space, as semantic decoding algorithms. This concept draws a direct parallel to the well-studied problem of syntactic decoding, which involves crafting algorithms to best exploit auto-regressive language models for extracting high-utility sequences of syntactic tokens. By focusing on the semantic level and disregarding syntactic details, we gain a fresh perspective on the engineering of AI systems, enabling us to imagine systems with much greater complexity and capabilities. In this position paper, we formalize the transition from syntactic to semantic tokens as well as the analogy between syntactic and semantic decoding. Subsequently, we explore the possibilities of optimizing within the space of semantic tokens via semantic decoding algorithms. We conclude with a list of research opportunities and questions arising from this fresh perspective. The semantic decoding perspective offers a powerful abstraction for search and optimization directly in the space of meaningful concepts, with semantic tokens as the fundamental units of a new type of computation.
Updated: 2024-03-21 17:06:17
Domains: cs.CL,cs.AI,cs.HC,cs.MA
Exploring the Market Dynamics of Liquid Staking Derivatives (LSDs)
Staking has emerged as a crucial concept following Ethereum's transition to Proof-of-Stake consensus. The introduction of Liquid Staking Derivatives (LSDs) has effectively addressed the illiquidity issue associated with solo staking, gaining significant market attention. This paper analyzes the LSD market dynamics from the perspectives of both liquidity takers (LTs) and liquidity providers (LPs). We first quantify the price discrepancy between the LSD primary and secondary markets. Then we investigate and empirically measure how LTs can leverage such discrepancy to exploit arbitrage opportunities, unveiling the potential barriers to LSD arbitrages. In addition, we evaluate the financial profit and losses experienced by LPs who supply LSDs for liquidity provision. Our results show that 66% of LSD liquidity positions generate returns lower than those from simply holding the corresponding LSDs.
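A minimal sketch of the arbitrage logic studied above, assuming a simplified redemption mechanism: compare the primary-market exchange rate with the secondary-market price and check whether the discrepancy survives DEX fees and gas. All rates, fees, and gas figures are hypothetical placeholders, not measurements from the paper.

```python
# A minimal sketch of the price-discrepancy measurement described above:
# compare an LSD's primary-market (protocol) exchange rate with its
# secondary-market (DEX) price, and check whether the gap survives costs.
# All rates, fees, and gas figures are hypothetical placeholders.

def arbitrage_profit(primary_rate, secondary_price, amount_eth,
                     dex_fee=0.003, gas_cost_eth=0.01):
    """Profit (in ETH) from buying the LSD on the cheaper venue and
    redeeming/selling on the other, net of DEX fees and gas."""
    if secondary_price < primary_rate:
        # Buy the discounted LSD on the DEX, redeem at the primary rate.
        lsd_bought = amount_eth * (1 - dex_fee) / secondary_price
        proceeds = lsd_bought * primary_rate
    else:
        # Mint at the primary rate, sell at the secondary premium.
        lsd_minted = amount_eth / primary_rate
        proceeds = lsd_minted * secondary_price * (1 - dex_fee)
    return proceeds - amount_eth - gas_cost_eth

discrepancy = 1.02 - 1.01  # hypothetical primary vs secondary quote
print(f"discrepancy: {discrepancy:.4f} ETH per LSD")
print(f"net profit on 10 ETH: {arbitrage_profit(1.02, 1.01, 10.0):.4f} ETH")
```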
Updated: 2024-03-21 17:03:28
Domains: cs.CR
Lexicon-Level Contrastive Visual-Grounding Improves Language Modeling
Today's most accurate language models are trained on orders of magnitude more language data than human language learners receive - but with no supervision from other sensory modalities that play a crucial role in human learning. Can we make LMs' representations and predictions more accurate (and more human-like) with more ecologically plausible supervision? This paper describes LexiContrastive Grounding (LCG), a grounded language learning procedure that leverages visual supervision to improve textual representations. LexiContrastive Grounding combines a next token prediction strategy with a contrastive visual grounding objective, focusing on early-layer representations that encode lexical information. Across multiple word-learning and sentence-understanding benchmarks, LexiContrastive Grounding not only outperforms standard language-only models in learning efficiency, but also improves upon vision-and-language learning procedures including CLIP, GIT, Flamingo, and Vokenization. Moreover, LexiContrastive Grounding improves perplexity by around 5% on multiple language modeling tasks. This work underscores the potential of incorporating visual grounding into language models, aligning more closely with the multimodal nature of human language acquisition.
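A minimal PyTorch sketch of the combined objective, under the assumption that the grounding term is an InfoNCE-style loss over mean-pooled early-layer token states; the layer choice, pooling, temperature, and weighting `lam` are illustrative, not the paper's exact configuration.

```python
# A minimal PyTorch sketch of the combined objective described above:
# a standard next-token prediction loss plus an InfoNCE-style contrastive
# loss aligning early-layer token representations with image embeddings.
import torch
import torch.nn.functional as F

def lexicontrastive_loss(lm_logits, targets, early_token_repr, image_embed,
                         temperature=0.07, lam=1.0):
    # Next-token prediction over the vocabulary.
    lm_loss = F.cross_entropy(lm_logits.view(-1, lm_logits.size(-1)),
                              targets.view(-1))
    # Pool early-layer token states into one vector per example and
    # contrast each text vector against all images in the batch.
    text_vec = F.normalize(early_token_repr.mean(dim=1), dim=-1)  # (B, D)
    img_vec = F.normalize(image_embed, dim=-1)                    # (B, D)
    logits = text_vec @ img_vec.t() / temperature                 # (B, B)
    labels = torch.arange(logits.size(0), device=logits.device)
    contrastive = F.cross_entropy(logits, labels)
    return lm_loss + lam * contrastive

# Shapes: batch of 4, sequence length 8, vocab 100, hidden size 16.
loss = lexicontrastive_loss(torch.randn(4, 8, 100),
                            torch.randint(0, 100, (4, 8)),
                            torch.randn(4, 8, 16),
                            torch.randn(4, 16))
print(loss.item())
```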
Updated: 2024-03-21 16:52:01
Domains: cs.CL,cs.AI,cs.LG
Dynamic Explanation Emphasis in Human-XAI Interaction with Communication Robot
Communication robots have the potential to contribute to effective human-XAI interaction as an interface that goes beyond textual or graphical explanations. One of their strengths is that they can use physical and vocal expressions to add detailed nuances to explanations. However, it is not clear how a robot can apply such expressions, or in particular, how we can develop a strategy to adaptively use such expressions depending on the task and user in dynamic interactions. To address this question, this paper proposes DynEmph, a method for a communication robot to decide where to emphasize XAI-generated explanations with physical expressions. It predicts the effect of emphasizing certain points on a user and aims to minimize the expected difference between predicted user decisions and AI-suggested ones. DynEmph features a strategy for deciding where to emphasize in a data-driven manner, relieving engineers from the need to manually design a strategy. We further conducted experiments to investigate how emphasis selection strategies affect the performance of user decisions. The results suggest that, while a naive strategy (emphasizing explanations for an AI's most probable class) does not necessarily work better, DynEmph effectively guides users to better decisions under the condition that the performance of the AI suggestion is high.
Updated: 2024-03-21 16:50:12
Domains: cs.HC,cs.AI
Estimating Physical Information Consistency of Channel Data Augmentation for Remote Sensing Images
The application of data augmentation for deep learning (DL) methods plays an important role in achieving state-of-the-art results in supervised, semi-supervised, and self-supervised image classification. In particular, channel transformations (e.g., solarize, grayscale, brightness adjustments) are integrated into data augmentation pipelines for remote sensing (RS) image classification tasks. However, contradicting beliefs exist about their proper applications to RS images. A common point of critique is that the application of channel augmentation techniques may lead to physically inconsistent spectral data (i.e., pixel signatures). To shed light on the open debate, we propose an approach to estimate whether a channel augmentation technique affects the physical information of RS images. To this end, the proposed approach estimates a score that measures the alignment of a pixel signature within a time series that can be naturally subject to deviations caused by factors such as acquisition conditions or phenological states of vegetation. We compare the scores associated with original and augmented pixel signatures to evaluate the physical consistency. Experimental results on a multi-label image classification task show that channel augmentations yielding a score that exceeds the expected deviation of original pixel signatures can not improve the performance of a baseline model trained without augmentation.
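A minimal sketch of the proposed consistency test, under the assumption that it can be summarized as comparing the deviation an augmentation induces in a pixel signature against that pixel's natural deviation across a time series of acquisitions; the distance metric, threshold interpretation, and data are illustrative.

```python
# A minimal sketch of the consistency test described above: an augmented
# pixel signature is flagged as physically implausible when its distance
# to the original exceeds the natural variation of that pixel across a
# time series of acquisitions.
import numpy as np

def consistency_score(original_sig, augmented_sig, time_series_sigs):
    """Ratio of augmentation-induced deviation to the pixel's natural
    temporal deviation; values > 1 suggest physically implausible spectra."""
    natural_dev = np.mean([np.linalg.norm(s - original_sig)
                           for s in time_series_sigs])
    aug_dev = np.linalg.norm(augmented_sig - original_sig)
    return aug_dev / (natural_dev + 1e-8)

rng = np.random.default_rng(0)
sig = rng.uniform(0.1, 0.5, size=10)                 # 10 spectral bands
series = [sig + rng.normal(0, 0.02, 10) for _ in range(6)]
brightness_aug = sig * 1.8                           # hypothetical channel transform

print(f"score: {consistency_score(sig, brightness_aug, series):.2f}")
```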
Updated: 2024-03-21 16:48:45
Domains: cs.CV,cs.LG
Intelligent Canvas: Enabling Design-Like Exploratory Visual Data Analysis with Generative AI through Rapid Prototyping, Iteration and Curation
Complex data analysis inherently seeks unexpected insights through exploratory visual analysis methods, transcending logical, step-by-step processing. However, existing interfaces such as notebooks and dashboards have limitations in exploration and comparison for visual data analysis. Addressing these limitations, we introduce a "design-like" intelligent canvas environment integrating generative AI into data analysis, offering rapid prototyping, iteration, and comparative visualization management. Our dual contributions include the integration of generative AI components into a canvas interface, and empirical findings from a user study (N=10) evaluating the effectiveness of the canvas interface.
Updated: 2024-03-21 16:44:41
Domains: cs.HC,cs.AI
Object-Centric Domain Randomization for 3D Shape Reconstruction in the Wild
One of the biggest challenges in single-view 3D shape reconstruction in the wild is the scarcity of <3D shape, 2D image>-paired data from real-world environments. Inspired by remarkable achievements via domain randomization, we propose ObjectDR which synthesizes such paired data via a random simulation of visual variations in object appearances and backgrounds. Our data synthesis framework exploits a conditional generative model (e.g., ControlNet) to generate images conforming to spatial conditions such as 2.5D sketches, which are obtainable through a rendering process of 3D shapes from object collections (e.g., Objaverse-XL). To simulate diverse variations while preserving object silhouettes embedded in spatial conditions, we also introduce a disentangled framework which leverages an initial object guidance. After synthesizing a wide range of data, we pre-train a model on them so that it learns to capture a domain-invariant geometry prior which is consistent across various domains. We validate its effectiveness by substantially improving 3D shape reconstruction models on a real-world benchmark. In a scale-up evaluation, our pre-training achieves 23.6% superior results compared with the pre-training on high-quality computer graphics renderings.
Updated: 2024-03-21 16:40:10
Domains: cs.CV,cs.AI,cs.LG
Click to Grasp: Zero-Shot Precise Manipulation via Visual Diffusion Descriptors
Precise manipulation that is generalizable across scenes and objects remains a persistent challenge in robotics. Current approaches for this task heavily depend on having a significant number of training instances to handle objects with pronounced visual and/or geometric part ambiguities. Our work explores the grounding of fine-grained part descriptors for precise manipulation in a zero-shot setting by utilizing web-trained text-to-image diffusion-based generative models. We tackle the problem by framing it as a dense semantic part correspondence task. Our model returns a gripper pose for manipulating a specific part, using as reference a user-defined click from a source image of a visually different instance of the same object. We require no manual grasping demonstrations as we leverage the intrinsic object geometry and features. Practical experiments in a real-world tabletop scenario validate the efficacy of our approach, demonstrating its potential for advancing semantic-aware robotics manipulation. Web page: https://tsagkas.github.io/click2grasp
Updated: 2024-03-21 16:26:19
Domains: cs.RO,cs.AI,cs.CV
Let's do the time-warp-attend: Learning topological invariants of dynamical systems
Dynamical systems across the sciences, from electrical circuits to ecological networks, undergo qualitative and often catastrophic changes in behavior, called bifurcations, when their underlying parameters cross a threshold. Existing methods predict oncoming catastrophes in individual systems but are primarily time-series-based and struggle both to categorize qualitative dynamical regimes across diverse systems and to generalize to real data. To address this challenge, we propose a data-driven, physically-informed deep-learning framework for classifying dynamical regimes and characterizing bifurcation boundaries based on the extraction of topologically invariant features. We focus on the paradigmatic case of the supercritical Hopf bifurcation, which is used to model periodic dynamics across a wide range of applications. Our convolutional attention method is trained with data augmentations that encourage the learning of topological invariants which can be used to detect bifurcation boundaries in unseen systems and to design models of biological systems like oscillatory gene regulatory networks. We further demonstrate our method's use in analyzing real data by recovering distinct proliferation and differentiation dynamics along pancreatic endocrinogenesis trajectory in gene expression space based on single-cell data. Our method provides valuable insights into the qualitative, long-term behavior of a wide range of dynamical systems, and can detect bifurcations or catastrophic transitions in large-scale physical and biological systems.
Updated: 2024-03-21 16:26:09
Domains: cs.LG,math.DS,stat.ML
Dodging DeepFake Detection via Implicit Spatial-Domain Notch Filtering
The current high-fidelity generation and high-precision detection of DeepFake images are at an arms race. We believe that producing DeepFakes that are highly realistic and 'detection evasive' can serve the ultimate goal of improving future generation DeepFake detection capabilities. In this paper, we propose a simple yet powerful pipeline to reduce the artifact patterns of fake images without hurting image quality by performing implicit spatial-domain notch filtering. We first demonstrate that frequency-domain notch filtering, although famously shown to be effective in removing periodic noise in the spatial domain, is infeasible for our task at hand due to the manual designs required for the notch filters. We, therefore, resort to a learning-based approach to reproduce the notch filtering effects, but solely in the spatial domain. We adopt a combination of adding overwhelming spatial noise for breaking the periodic noise pattern and deep image filtering to reconstruct the noise-free fake images, and we name our method DeepNotch. Deep image filtering provides a specialized filter for each pixel in the noisy image, producing filtered images with high fidelity compared to their DeepFake counterparts. Moreover, we also use the semantic information of the image to generate an adversarial guidance map to add noise intelligently. Our large-scale evaluation on 3 representative state-of-the-art DeepFake detection methods (tested on 16 types of DeepFakes) has demonstrated that our technique significantly reduces the accuracy of these 3 fake image detection methods, 36.79% on average and up to 97.02% in the best case.
Updated: 2024-03-21 16:24:05
Domains: cs.CV,cs.CR,cs.LG
QuATON: Quantization Aware Training of Optical Neurons
Optical processors, built with "optical neurons", can efficiently perform high-dimensional linear operations at the speed of light. Thus they are a promising avenue to accelerate large-scale linear computations. With the current advances in micro-fabrication, such optical processors can now be 3D fabricated, but with a limited precision. This limitation translates to quantization of learnable parameters in optical neurons, and should be handled during the design of the optical processor in order to avoid a model mismatch. Specifically, optical neurons should be trained or designed within the physical constraints at a predefined quantized precision level. To address this critical issue, we propose a physics-informed quantization-aware training framework. Our approach accounts for physical constraints during the training process, leading to robust designs. We demonstrate that our approach can design state-of-the-art optical processors using diffractive networks for multiple physics-based tasks despite quantized learnable parameters. We thus lay the foundation upon which improved optical processors may be 3D fabricated in the future.
Updated: 2024-03-21 16:21:45
Domains: cs.LG,eess.IV,physics.optics
Collaborative Distributed Machine Learning
Various collaborative distributed machine learning (CDML) systems, including federated learning systems and swarm learning systems, with different key traits were developed to leverage resources for development and use of machine learning (ML) models in a confidentiality-preserving way. To meet use case requirements, suitable CDML systems need to be selected. However, comparison between CDML systems regarding their suitability for use cases is often difficult. This work presents a CDML system conceptualization and CDML archetypes to support comparison of CDML systems and introduce scientific and practical audiences to the principal functioning and key traits of CDML systems.
Updated: 2024-03-21 16:21:23
Domains: cs.MA,cs.ET,cs.LG,cs.SE
Don't Explain Noise: Robust Counterfactuals for Randomized Ensembles
Counterfactual explanations describe how to modify a feature vector in order to flip the outcome of a trained classifier. Obtaining robust counterfactual explanations is essential to provide valid algorithmic recourse and meaningful explanations. We study the robustness of explanations of randomized ensembles, which are always subject to algorithmic uncertainty even when the training data is fixed. We formalize the generation of robust counterfactual explanations as a probabilistic problem and show the link between the robustness of ensemble models and the robustness of base learners. We develop a practical method with good empirical performance and support it with theoretical guarantees for ensembles of convex base learners. Our results show that existing methods give surprisingly low robustness: the validity of naive counterfactuals is below $50\%$ on most data sets and can fall to $20\%$ on problems with many features. In contrast, our method achieves high robustness with only a small increase in the distance from counterfactual explanations to their initial observations.
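A minimal sketch of the probabilistic notion of robustness at work here: a counterfactual's validity is estimated as the fraction of ensembles, retrained with different random seeds on the same data, whose prediction it flips. This reproduces only the evaluation idea, not the paper's generation method; the perturbation below is hypothetical.

```python
# A minimal sketch of robustness under algorithmic uncertainty: a
# counterfactual is robust if it flips the prediction for most ensembles
# drawn with different random seeds on the same fixed training data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
x = X[0]                       # factual point
x_cf = x.copy()
x_cf[0] += 1.5                 # hypothetical counterfactual perturbation

flips = 0
n_trials = 20
for seed in range(n_trials):   # same data, different algorithmic randomness
    clf = RandomForestClassifier(n_estimators=50, random_state=seed).fit(X, y)
    if clf.predict([x_cf])[0] != clf.predict([x])[0]:
        flips += 1

print(f"counterfactual validity across random ensembles: {flips / n_trials:.0%}")
```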
Updated: 2024-03-21 16:14:01
Domains: cs.LG,math.OC
Assessing the Causal Impact of Humanitarian Aid on Food Security
In the face of climate change-induced droughts, vulnerable regions encounter severe threats to food security, demanding urgent humanitarian assistance. This paper introduces a causal inference framework for the Horn of Africa, aiming to assess the impact of cash-based interventions on food crises. Our contributions include identifying causal relationships within the food security system, harmonizing a comprehensive database including socio-economic, weather and remote sensing data, and estimating the causal effect of humanitarian interventions on malnutrition. On a country level, our results revealed no significant effects, likely due to limited sample size, suboptimal data quality, and an imperfect causal graph resulting from our limited understanding of multidisciplinary systems like food security. Instead, on a district level, results revealed significant effects, further implying the context-specific nature of the system. This underscores the need to enhance data collection and refine causal models with domain experts for more effective future interventions and policies, improving transparency and accountability in humanitarian aid.
Updated: 2024-03-21 16:11:17
Domains: cs.LG
Machine-learning invariant foliations in forced systems for reduced order modelling
We identify reduced order models (ROM) of forced systems from data using invariant foliations. The forcing can be external, parametric, periodic or quasi-periodic. The process has four steps: 1. identify an approximate invariant torus and the linear dynamics about the torus; 2. identify a globally defined invariant foliation about the torus; 3. identify a local foliation about an invariant manifold that complements the global foliation; 4. extract the invariant manifold as the leaf going through the torus and interpret the result. We combine steps 2 and 3, so that we can track the location of the invariant torus and scale the invariance equations appropriately. We highlight some fundamental limitations of invariant manifolds and foliations when fitting them to data, that require further mathematics to resolve.
Updated: 2024-03-21 16:10:42
Domains: math.DS,cs.LG
Learning a Depth Covariance Function
We propose learning a depth covariance function with applications to geometric vision tasks. Given RGB images as input, the covariance function can be flexibly used to define priors over depth functions, predictive distributions given observations, and methods for active point selection. We leverage these techniques for a selection of downstream tasks: depth completion, bundle adjustment, and monocular dense visual odometry.
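A minimal NumPy sketch of how a depth covariance function yields priors, predictions, and uncertainty, assuming a plain RBF kernel over pixel coordinates in place of the learned, image-conditioned covariance; all hyperparameters and observations are illustrative.

```python
# A minimal sketch of a covariance (kernel) function over image locations
# defining a prior on depth and a predictive distribution given sparse
# depth observations. The RBF kernel stands in for the learned covariance.
import numpy as np

def rbf_kernel(a, b, lengthscale=20.0, variance=1.0):
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

obs_xy = np.array([[10.0, 10.0], [40.0, 12.0], [25.0, 30.0]])  # observed pixels
obs_depth = np.array([1.2, 2.0, 1.6])
query_xy = np.array([[20.0, 20.0], [60.0, 60.0]])              # pixels to predict

K = rbf_kernel(obs_xy, obs_xy) + 1e-4 * np.eye(len(obs_xy))    # noise jitter
K_star = rbf_kernel(query_xy, obs_xy)

mean = K_star @ np.linalg.solve(K, obs_depth)
cov = rbf_kernel(query_xy, query_xy) - K_star @ np.linalg.solve(K, K_star.T)

# The predictive variance is exactly what drives active point selection.
print("predicted depth:", mean.round(3), "stddev:", np.sqrt(np.diag(cov)).round(3))
```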
Updated: 2024-03-21 16:09:57
Domains: cs.CV,cs.LG,cs.RO
Constrained Reinforcement Learning with Smoothed Log Barrier Function
Reinforcement Learning (RL) has been widely applied to many control tasks and substantially improved the performances compared to conventional control methods in many domains where the reward function is well defined. However, for many real-world problems, it is often more convenient to formulate optimization problems in terms of rewards and constraints simultaneously. Optimizing such constrained problems via reward shaping can be difficult as it requires tedious manual tuning of reward functions with several interacting terms. Recent formulations which include constraints mostly require a pre-training phase, which often needs human expertise to collect data or assumes having a sub-optimal policy readily available. We propose a new constrained RL method called CSAC-LB (Constrained Soft Actor-Critic with Log Barrier Function), which achieves competitive performance without any pre-training by applying a linear smoothed log barrier function to an additional safety critic. It implements an adaptive penalty for policy learning and alleviates the numerical issues that are known to complicate the application of the log barrier function method. As a result, we show that with CSAC-LB, we achieve state-of-the-art performance on several constrained control tasks with different levels of difficulty and evaluate our methods in a locomotion task on a real quadruped robot platform.
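A minimal sketch of a linearly smoothed log barrier for a constraint g(x) <= 0: the exact log barrier when the constraint is safely satisfied, extended linearly near and beyond the boundary so gradients stay finite. The parameterization follows a common formulation from the literature and may differ in detail from the paper's.

```python
# A minimal sketch of a linearly smoothed log barrier for a constraint
# value z = g(x), feasible when z <= 0. The linear extension keeps the
# penalty finite and differentiable for infeasible values, avoiding the
# numerical issues of the plain log barrier.
import numpy as np

def smoothed_log_barrier(z, t=5.0):
    """Penalty for constraint value z (feasible when z <= 0)."""
    threshold = -1.0 / t**2
    if z <= threshold:
        return -np.log(-z) / t                      # ordinary log barrier
    # First-order (linear) extension, continuous and differentiable at the
    # switch point, so infeasible values get a finite, informative gradient.
    return t * z - np.log(1.0 / t**2) / t + 1.0 / t

for z in (-1.0, -0.1, -0.01, 0.2):
    print(f"g(x) = {z:+.2f} -> penalty {smoothed_log_barrier(z):.3f}")
```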
Updated: 2024-03-21 16:02:52
Domains: cs.LG,cs.AI,cs.SY,eess.SY
TMI! Finetuned Models Leak Private Information from their Pretraining Data
Transfer learning has become an increasingly popular technique in machine learning as a way to leverage a pretrained model trained for one task to assist with building a finetuned model for a related task. This paradigm has been especially popular for $\textit{privacy}$ in machine learning, where the pretrained model is considered public, and only the data for finetuning is considered sensitive. However, there are reasons to believe that the data used for pretraining is still sensitive, making it essential to understand how much information the finetuned model leaks about the pretraining data. In this work we propose a new membership-inference threat model where the adversary only has access to the finetuned model and would like to infer the membership of the pretraining data. To realize this threat model, we implement a novel metaclassifier-based attack, $\textbf{TMI}$, that leverages the influence of memorized pretraining samples on predictions in the downstream task. We evaluate $\textbf{TMI}$ on both vision and natural language tasks across multiple transfer learning settings, including finetuning with differential privacy. Through our evaluation, we find that $\textbf{TMI}$ can successfully infer membership of pretraining examples using query access to the finetuned model. An open-source implementation of $\textbf{TMI}$ can be found $\href{https://github.com/johnmath/tmi-pets24}{\text{on GitHub}}$.
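A highly simplified sketch of a metaclassifier-style membership attack: collect the finetuned model's prediction vectors on candidate points and train a classifier to separate members from non-members using shadow examples of known membership. The real TMI attack is more involved; `finetuned_model` and all data here are placeholders.

```python
# A skeleton of a metaclassifier-based membership-inference attack with
# query-only access to a finetuned model. Everything below is synthetic:
# `finetuned_model` is a stand-in for the victim, and the shadow data
# would in practice come from shadow-model training.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def finetuned_model(x):
    """Placeholder for query access: returns a softmax vector per input."""
    logits = rng.normal(size=(len(x), 10)) + 2.0 * x[:, :10]
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Shadow candidates with known pretraining membership (1 = member).
shadow_x = rng.normal(size=(200, 32))
shadow_member = rng.integers(0, 2, size=200)
features = finetuned_model(shadow_x)          # downstream prediction vectors

meta = LogisticRegression(max_iter=1000).fit(features, shadow_member)

target_x = rng.normal(size=(5, 32))           # points whose membership we infer
print(meta.predict_proba(finetuned_model(target_x))[:, 1])
```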
Updated: 2024-03-21 15:57:29
Domains: cs.LG,cs.CR
Soft Learning Probabilistic Circuits
Probabilistic Circuits (PCs) are prominent tractable probabilistic models, allowing for a range of exact inferences. This paper focuses on the main algorithm for training PCs, LearnSPN, a gold standard due to its efficiency, performance, and ease of use, in particular for tabular data. We show that LearnSPN is a greedy likelihood maximizer under mild assumptions. While inferences in PCs may use the entire circuit structure for processing queries, LearnSPN applies a hard method for learning them, propagating at each sum node a data point through one and only one of the children/edges as in a hard clustering process. We propose a new learning procedure named SoftLearn, that induces a PC using a soft clustering process. We investigate the effect of this learning-inference compatibility in PCs. Our experiments show that SoftLearn outperforms LearnSPN in many situations, yielding better likelihoods and arguably better samples. We also analyze comparable tractable models to highlight the differences between soft/hard learning and model querying.
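A minimal sketch of the hard-versus-soft distinction at a sum node: hard (LearnSPN-style) routing sends each data point to exactly one child, while the soft variant weights every point by its posterior responsibility under each child. Gaussian children and all parameters are illustrative only.

```python
# A minimal sketch of hard vs soft routing at a sum node: hard clustering
# propagates each point through one child only, while soft clustering
# weights every point by its responsibility under each child.
import numpy as np
from scipy.stats import norm

data = np.array([-2.1, -1.8, 0.1, 1.9, 2.2])
weights = np.array([0.5, 0.5])                 # sum-node mixture weights
children = [norm(-2.0, 0.5), norm(2.0, 0.5)]   # two child distributions

likelihoods = np.stack([w * c.pdf(data) for w, c in zip(weights, children)])

hard_assignment = likelihoods.argmax(axis=0)              # one child per point
responsibilities = likelihoods / likelihoods.sum(axis=0)  # soft weights

print("hard routing:", hard_assignment)
print("soft responsibilities for child 0:", responsibilities[0].round(3))
```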
Updated: 2024-03-21 15:56:15
Domains: cs.LG,cs.AI
How Human-Centered Explainable AI Interface Are Designed and Evaluated: A Systematic Survey
Despite its technological breakthroughs, eXplainable Artificial Intelligence (XAI) research has limited success in producing the {\em effective explanations} needed by users. In order to improve XAI systems' usability, practical interpretability, and efficacy for real users, the emerging area of {\em Explainable Interfaces} (EIs) focuses on the user interface and user experience design aspects of XAI. This paper presents a systematic survey of 53 publications to identify current trends in human-XAI interaction and promising directions for EI design and development. This is among the first systematic survey of EI research.
Updated: 2024-03-21 15:44:56
Domains: cs.HC,cs.AI
Learning to Project for Cross-Task Knowledge Distillation
Traditional knowledge distillation (KD) relies on a proficient teacher trained on the target task, which is not always available. In this setting, cross-task distillation can be used, enabling the use of any teacher model trained on a different task. However, many KD methods prove ineffective when applied to this cross-task setting. To address this limitation, we propose a simple modification: the use of an inverted projection. We show that this drop-in replacement for a standard projector is effective by learning to disregard any task-specific features which might degrade the student's performance. We find that this simple modification is sufficient for extending many KD methods to the cross-task setting, where the teacher and student tasks can be very different. In doing so, we obtain up to a 1.9% improvement in the cross-task setting compared to the traditional projection, at no additional cost. Our method can obtain significant performance improvements (up to 7%) when using even a randomly-initialised teacher on various tasks such as depth estimation, image translation, and semantic segmentation, despite the lack of any learned knowledge to transfer. To provide conceptual and analytical insights into this result, we show that using an inverted projection allows the distillation loss to be decomposed into a knowledge transfer and a spectral regularisation component. Through this analysis we are additionally able to propose a novel regularisation loss that allows teacher-free distillation, enabling performance improvements of up to 8.57% on ImageNet with no additional training costs.
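A minimal PyTorch sketch contrasting the two projector directions, assuming a plain MSE feature-matching loss: the standard projector maps student features into the teacher's space, while the inverted projector maps teacher features into the student's space, where the learned map can discard teacher-task-specific features. Dimensions are illustrative.

```python
# A minimal PyTorch sketch of standard vs inverted feature projection for
# cross-task knowledge distillation. The MSE matching loss and feature
# dimensions are illustrative simplifications.
import torch
import torch.nn as nn
import torch.nn.functional as F

d_student, d_teacher = 256, 1024
f_s = torch.randn(8, d_student)    # student features (batch of 8)
f_t = torch.randn(8, d_teacher)    # frozen cross-task teacher features

# Standard direction: project student features into the teacher's space.
proj_std = nn.Linear(d_student, d_teacher)
loss_std = F.mse_loss(proj_std(f_s), f_t)

# Inverted direction: project teacher features into the student's space;
# the learned map can drop teacher features specific to the teacher's task.
proj_inv = nn.Linear(d_teacher, d_student)
loss_inv = F.mse_loss(f_s, proj_inv(f_t))

print(loss_std.item(), loss_inv.item())
```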
Updated: 2024-03-21 15:42:17
Domains: cs.CV,cs.AI
Closing the Gap: Achieving Better Accuracy-Robustness Tradeoffs against Query-Based Attacks
Although promising, existing defenses against query-based attacks share a common limitation: they offer increased robustness against attacks at the price of a considerable accuracy drop on clean samples. In this work, we show how to efficiently establish, at test-time, a solid tradeoff between robustness and accuracy when mitigating query-based attacks. Given that these attacks necessarily explore low-confidence regions, our insight is that activating dedicated defenses, such as random noise defense and random image transformations, only for low-confidence inputs is sufficient to prevent them. Our approach is independent of training and supported by theory. We verify the effectiveness of our approach for various existing defenses by conducting extensive experiments on CIFAR-10, CIFAR-100, and ImageNet. Our results confirm that our proposal can indeed enhance these defenses by providing better tradeoffs between robustness and accuracy when compared to state-of-the-art approaches while being completely training-free.
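A minimal sketch of the test-time gating idea, assuming a random-noise defense as the dedicated defense: run the undefended model first and invoke the randomized defense only when confidence is low, since query-based attacks must probe low-confidence regions. The threshold and noise scale are hypothetical tuning knobs.

```python
# A minimal PyTorch sketch of confidence-gated defense activation: clean,
# confident inputs keep the undefended prediction (no accuracy drop), and
# only low-confidence inputs trigger the randomized defense.
import torch

def gated_predict(model, x, conf_threshold=0.7, noise_std=0.05, n_samples=8):
    with torch.no_grad():
        probs = torch.softmax(model(x), dim=-1)
        if probs.max().item() >= conf_threshold:
            return probs          # confident input: no defense needed
        # Low confidence: average predictions over randomly perturbed copies.
        noisy = x + noise_std * torch.randn(n_samples, *x.shape[1:])
        return torch.softmax(model(noisy), dim=-1).mean(dim=0, keepdim=True)

model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 8 * 8, 10))
print(gated_predict(model, torch.randn(1, 3, 8, 8)).shape)
```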
Updated: 2024-03-21 15:42:06
Domains: cs.CV,cs.CR,cs.LG
Physics-Based Causal Reasoning for Safe & Robust Next-Best Action Selection in Robot Manipulation Tasks
Safe and efficient object manipulation is a key enabler of many real-world robot applications. However, this is challenging because robot operation must be robust to a range of sensor and actuator uncertainties. In this paper, we present a physics-informed causal-inference-based framework for a robot to probabilistically reason about candidate actions in a block stacking task in a partially observable setting. We integrate a physics-based simulation of the rigid-body system dynamics with a causal Bayesian network (CBN) formulation to define a causal generative probabilistic model of the robot decision-making process. Using simulation-based Monte Carlo experiments, we demonstrate our framework's ability to successfully: (1) predict block tower stability with high accuracy (Pred Acc: 88.6%); and, (2) select an approximate next-best action for the block stacking task, for execution by an integrated robot system, achieving 94.2% task success rate. We also demonstrate our framework's suitability for real-world robot systems by demonstrating successful task executions with a domestic support robot, with perception and manipulation sub-system integration. Hence, we show that by embedding physics-based causal reasoning into robots' decision-making processes, we can make robot task execution safer, more reliable, and more robust to various types of uncertainty.
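A minimal sketch of the simulation-based action scoring described above: each candidate placement is rolled out repeatedly through a physics simulator with sampled noise, and the action with the highest estimated stability probability is chosen. The one-line `simulate_tower_stable` is a placeholder for a rigid-body engine, and the causal Bayesian network layer is not reproduced here.

```python
# A minimal sketch of Monte Carlo action scoring under sensor/actuator
# uncertainty: sample noisy rollouts per candidate action and pick the
# action with the highest estimated stability probability.
import random

def simulate_tower_stable(offset, noise_std=0.01):
    """Placeholder rigid-body rollout: the block stays up if its (noisy)
    center of mass remains over the support; a real system would call a
    physics engine here."""
    perceived = offset + random.gauss(0.0, noise_std)
    return abs(perceived) < 0.025   # hypothetical half-width of the support

def stability_probability(offset, n_rollouts=500):
    return sum(simulate_tower_stable(offset) for _ in range(n_rollouts)) / n_rollouts

candidates = [0.0, 0.01, 0.02, 0.03]          # candidate placement offsets (m)
best = max(candidates, key=stability_probability)
print(f"next-best action: place at offset {best} m")
```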
Updated: 2024-03-21 15:36:26
Domains: cs.RO,cs.AI,cs.LG,stat.AP,I.2.9; I.2.8; I.2.3; G.3; I.2.6; I.6.8; I.2.4; I.2.10
EasyInstruct: An Easy-to-use Instruction Processing Framework for Large Language Models
In recent years, instruction tuning has gained increasing attention and emerged as a crucial technique to enhance the capabilities of Large Language Models (LLMs). To construct high-quality instruction datasets, many instruction processing approaches have been proposed, aiming to achieve a delicate balance between data quantity and data quality. Nevertheless, due to inconsistencies that persist among various instruction processing methods, there is no standard open-source instruction processing implementation framework available for the community, which hinders practitioners from further developing and advancing. To facilitate instruction processing research and development, we present EasyInstruct, an easy-to-use instruction processing framework for LLMs, which modularizes instruction generation, selection, and prompting, while also considering their combination and interaction. EasyInstruct is publicly released and actively maintained at https://github.com/zjunlp/EasyInstruct, along with an online demo app and a demo video for quick-start, calling for broader research centered on instruction data and synthetic data.
Updated: 2024-03-21 15:33:34
Domains: cs.CL,cs.AI,cs.HC,cs.IR,cs.LG
Towards Secure Virtual Elections: Multiparty Computation of Order Based Voting Rules
Electronic voting systems are essential for holding virtual elections, and the need for such systems increases due to the COVID-19 pandemic and the social distancing that it mandates. One of the main challenges in e-voting systems is to secure the voting process: namely, to certify that the computed results are consistent with the cast ballots, and that the privacy of the voters is preserved. We propose herein a secure voting protocol for elections that are governed by order-based voting rules. Our protocol offers perfect ballot secrecy, in the sense that it issues only the required output, while no other information on the cast ballots is revealed. Such perfect secrecy, which is achieved by employing secure multiparty computation tools, may increase the voters' confidence and, consequently, encourage them to vote according to their true preferences. Evaluation of the protocol's computational costs establishes that it is lightweight and can be readily implemented in real-life electronic elections.
Updated: 2024-03-21 15:31:41
Domains: cs.CR
HyperGALE: ASD Classification via Hypergraph Gated Attention with Learnable Hyperedges
Autism Spectrum Disorder (ASD) is a neurodevelopmental condition characterized by varied social cognitive challenges and repetitive behavioral patterns. Identifying reliable brain imaging-based biomarkers for ASD has been a persistent challenge due to the spectrum's diverse symptomatology. Existing baselines in the field have made significant strides in this direction, yet there remains room for improvement in both performance and interpretability. We propose \emph{HyperGALE}, which builds upon the hypergraph by incorporating learned hyperedges and gated attention mechanisms. This approach has led to substantial improvements in the model's ability to interpret complex brain graph data, offering deeper insights into ASD biomarker characterization. Evaluated on the extensive ABIDE II dataset, \emph{HyperGALE} not only improves interpretability but also demonstrates statistically significant enhancements in key performance metrics compared to both previous baselines and the foundational hypergraph model. The advancement \emph{HyperGALE} brings to ASD research highlights the potential of sophisticated graph-based techniques in neurodevelopmental studies. The source code and implementation instructions are available at GitHub:https://github.com/mehular0ra/HyperGALE.
Updated: 2024-03-21 15:31:28
Domains: cs.LG,cs.AI,cs.CV,cs.NE
Utilizing the LightGBM Algorithm for Operator User Credit Assessment Research
Mobile Internet user credit assessment is an important basis for communication operators to make decisions and formulate measures, and it also safeguards the benefits operators expect to obtain. However, credit evaluation methods have long been monopolized by financial institutions such as banks and credit bureaus. As providers of platform network technology and network resources, and as builders and maintainers of communication networks, operators hold Internet data that can improve user credit evaluation strategies. This paper uses the massive data provided by communication operators to study an operator user credit evaluation model based on a fused LightGBM algorithm. First, key features are extracted from the operator-provided user evaluation data through preprocessing and feature engineering, yielding a statistically meaningful multi-dimensional feature set; then, linear regression, decision tree, LightGBM, and other machine learning algorithms are used to build multiple base models and identify the best one; finally, ensemble techniques such as Averaging, Voting, Blending, and Stacking are integrated to refine multiple fusion models and establish the one best suited to operator user evaluation. A sketch of such a fusion pipeline follows.
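A minimal sketch of the fusion pipeline just described, with synthetic data standing in for the operator features: several base learners are combined by a stacking ensemble whose meta-learner is trained on their out-of-fold predictions.

```python
# A minimal sketch of stacking-based model fusion over several base
# learners. Synthetic data stands in for operator features; the real
# preprocessing and feature engineering are not shown.
from lightgbm import LGBMClassifier
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=42)

base_models = [
    ("logreg", LogisticRegression(max_iter=1000)),
    ("tree", DecisionTreeClassifier(max_depth=6)),
    ("lgbm", LGBMClassifier(n_estimators=200, verbose=-1)),
]
fusion = StackingClassifier(estimators=base_models,
                            final_estimator=LogisticRegression(), cv=5)
fusion.fit(X_tr, y_tr)
print(f"fused model accuracy: {fusion.score(X_te, y_te):.3f}")
```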
Updated: 2024-03-21 15:29:24
Domains: cs.LG,cs.AI,q-fin.ST
ChatGPT Alternative Solutions: Large Language Models Survey
In recent times, the grandeur of Large Language Models (LLMs) has not only shone in the realm of natural language processing but has also cast its brilliance across a vast array of applications. This remarkable display of LLM capabilities has ignited a surge in research contributions within this domain, spanning a diverse spectrum of topics. These contributions encompass advancements in neural network architecture, context length enhancements, model alignment, training datasets, benchmarking, efficiency improvements, and more. Recent years have witnessed a dynamic synergy between academia and industry, propelling the field of LLM research to new heights. A notable milestone in this journey is the introduction of ChatGPT, a powerful AI chatbot grounded in LLMs, which has garnered widespread societal attention. The evolving technology of LLMs has begun to reshape the landscape of the entire AI community, promising a revolutionary shift in the way we create and employ AI algorithms. Given this swift-paced technical evolution, our survey embarks on a journey to encapsulate the recent strides made in the world of LLMs. Through an exploration of the background, key discoveries, and prevailing methodologies, we offer an up-to-the-minute review of the literature. By examining multiple LLM models, our paper not only presents a comprehensive overview but also charts a course that identifies existing challenges and points toward potential future research trajectories. This survey furnishes a well-rounded perspective on the current state of generative AI, shedding light on opportunities for further exploration, enhancement, and innovation.
Updated: 2024-03-21 15:16:50
Domains: cs.CL,cs.AI
Universal Feature Selection for Simultaneous Interpretability of Multitask Datasets
Extracting meaningful features from complex, high-dimensional datasets across scientific domains remains challenging. Current methods often struggle with scalability, limiting their applicability to large datasets, or make restrictive assumptions about feature-property relationships, hindering their ability to capture complex interactions. BoUTS's general and scalable feature selection algorithm surpasses these limitations to identify both universal features relevant to all datasets and task-specific features predictive for specific subsets. Evaluated on seven diverse chemical regression datasets, BoUTS achieves state-of-the-art feature sparsity while maintaining prediction accuracy comparable to specialized methods. Notably, BoUTS's universal features enable domain-specific knowledge transfer between datasets, and suggest deep connections in seemingly-disparate chemical datasets. We expect these results to have important repercussions in manually-guided inverse problems. Beyond its current application, BoUTS holds immense potential for elucidating data-poor systems by leveraging information from similar data-rich systems. BoUTS represents a significant leap in cross-domain feature selection, potentially leading to advancements in various scientific fields.
Updated: 2024-03-21 15:13:54
Domains: cs.LG
An explainable three dimension framework to uncover learning patterns: A unified look in variable sulci recognition
Explainable AI is crucial in medical imaging. In the challenging field of neuroscience, visual topics present a high level of complexity, particularly within three-dimensional space. The application of neuroscience, which involves identifying brain sulcal features from MRI, faces significant hurdles due to varying annotation protocols among experts and the intricate three-dimensional functionality of the brain. Consequently, traditional explainability approaches fall short in effectively validating and evaluating these networks. To address this, we first present a mathematical formulation delineating various categories of explanation needs across diverse computer vision tasks, categorized into self-explanatory, semi-explanatory, non-explanatory, and new-pattern learning applications based on the reliability of the validation protocol. With respect to this mathematical formulation, we propose a 3D explainability framework aimed at validating the outputs of deep learning networks in detecting the paracingulate sulcus, an essential brain anatomical feature. The framework integrates local 3D explanations, global explanations through dimensionality reduction, concatenated global explanations, and statistical shape features, unveiling new insights into pattern learning. We trained and tested two advanced 3D deep learning networks on the challenging TOP-OSLO dataset, significantly improving sulcus detection accuracy, particularly on the left hemisphere. During evaluation with diverse annotation protocols for this dataset, we highlighted the crucial role of an unbiased annotation process in achieving precise predictions and effective pattern learning within our proposed 3D framework. The proposed framework not only annotates the variable sulcus but also uncovers hidden AI knowledge, promising to advance our understanding of brain anatomy and function.
Updated: 2024-03-21 15:12:36
Domain: cs.CV,cs.AI
Towards Single-System Illusion in Software-Defined Vehicles -- Automated, AI-Powered Workflow
We propose a novel model- and feature-based approach to development of vehicle software systems, where the end architecture is not explicitly defined. Instead, it emerges from an iterative process of search and optimization given certain constraints, requirements and hardware architecture, while retaining the property of single-system illusion, where applications run in a logically uniform environment. One of the key points of the presented approach is the inclusion of modern generative AI, specifically Large Language Models (LLMs), in the loop. With the recent advances in the field, we expect that the LLMs will be able to assist in processing of requirements, generation of formal system models, as well as generation of software deployment specification and test code. The resulting pipeline is automated to a large extent, with feedback being generated at each step.
Updated: 2024-03-21 15:07:57
Domain: cs.SE,cs.AI,cs.CL,D.2.1; D.2.2; D.2.4; I.2.7; I.2.2; I.7.0
Multi-Level Explanations for Generative Language Models
Perturbation-based explanation methods such as LIME and SHAP are commonly applied to text classification. This work focuses on their extension to generative language models. To address the challenges of text as output and long text inputs, we propose a general framework called MExGen that can be instantiated with different attribution algorithms. To handle text output, we introduce the notion of scalarizers for mapping text to real numbers and investigate multiple possibilities. To handle long inputs, we take a multi-level approach, proceeding from coarser levels of granularity to finer ones, and focus on algorithms with linear scaling in model queries. We conduct a systematic evaluation, both automated and human, of perturbation-based attribution methods for summarization and context-grounded question answering. The results show that our framework can provide more locally faithful explanations of generated outputs.
Updated: 2024-03-21 15:06:14
Domain: cs.CL,cs.AI
gTBLS: Generating Tables from Text by Conditional Question Answering
Distilling large, unstructured text into a structured, condensed form such as tables is an open research problem. One of the primary challenges in automatically generating tables is ensuring their syntactic validity. Prior approaches address this challenge by including additional parameters in the Transformer's attention mechanism to attend to specific rows and column headers. In contrast to this single-stage method, this paper presents a two-stage approach called Generative Tables (gTBLS). The first stage infers table structure (row and column headers) from the text. The second stage formulates questions using these headers and fine-tunes a causal language model to answer them. Furthermore, the gTBLS approach is amenable to the utilization of pre-trained Large Language Models in a zero-shot configuration, presenting a solution for table generation in situations where fine-tuning is not feasible. gTBLS improves prior approaches by up to 10% in BERTScore on the table construction task and up to 20% on the table content generation task of the E2E, WikiTableText, WikiBio, and RotoWire datasets.
Updated: 2024-03-21 15:04:32
Domain: cs.CL,cs.IR,cs.LG
Maximal $α$-Leakage for Quantum Privacy Mechanisms
In this work, maximal $\alpha$-leakage is introduced to quantify how much a quantum adversary can learn about any sensitive information of data upon observing its disturbed version via a quantum privacy mechanism. We first show that an adversary's maximal expected $\alpha$-gain using optimal measurement is characterized by measured conditional Rényi entropy. This can be viewed as a parametric generalization of König et al.'s famous guessing probability formula [IEEE Trans. Inf. Theory, 55(9), 2009]. Then, we prove that the $\alpha$-leakage and maximal $\alpha$-leakage for a quantum privacy mechanism are determined by measured Arimoto information and measured Rényi capacity, respectively. Various properties of maximal $\alpha$-leakage, such as data processing inequality and composition property are established as well. Moreover, we show that regularized $\alpha$-leakage and regularized maximal $\alpha$-leakage for identical and independent quantum privacy mechanisms coincide with $\alpha$-tilted sandwiched Rényi information and sandwiched Rényi capacity, respectively.
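For orientation, in the classical (non-quantum) setting the quantities being generalized are, following Liao et al., the Arimoto mutual information and its supremum over input distributions:

$I_\alpha^{\mathrm{A}}(X;Y) = H_\alpha(X) - H_\alpha^{\mathrm{A}}(X \mid Y), \qquad \mathcal{L}_\alpha^{\mathrm{max}}(X \to Y) = \sup_{P_X} I_\alpha^{\mathrm{A}}(X;Y),$

where $H_\alpha$ denotes the Rényi entropy of order $\alpha$. This is a reference sketch only; the paper's contribution is the quantum analogue, in which these quantities become their measured counterparts.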
Updated: 2024-03-21 14:58:07
Domain: quant-ph,cs.CR,cs.IT,math.IT
Optimizing Language Augmentation for Multilingual Large Language Models: A Case Study on Korean
Large language models (LLMs) use pretraining to predict the subsequent word; however, their expansion requires significant computing resources. Numerous big tech companies and research institutes have developed multilingual LLMs (MLLMs) to meet current demands, overlooking less-resourced languages (LRLs). This study proposed three strategies to enhance the performance of LRLs based on the publicly available MLLMs. First, the MLLM vocabularies of LRLs were expanded to enhance expressiveness. Second, bilingual data were used for pretraining to align the high- and less-resourced languages. Third, a high-quality small-scale instruction dataset was constructed and instruction-tuning was performed to augment the LRL. The experiments employed the Llama2 model and Korean was used as the LRL, which was quantitatively evaluated against other developed LLMs across eight tasks. Furthermore, a qualitative assessment was performed based on human evaluation and GPT4. Experimental results showed that our proposed Bllossom model exhibited superior performance in qualitative analyses compared to previously proposed Korean monolingual models.
Updated: 2024-03-21 14:50:18
Domain: cs.CL,cs.AI
Language Models Can Reduce Asymmetry in Information Markets
This work addresses the buyer's inspection paradox for information markets. The paradox is that buyers need to access information to determine its value, while sellers need to limit access to prevent theft. To study this, we introduce an open-source simulated digital marketplace where intelligent agents, powered by language models, buy and sell information on behalf of external participants. The central mechanism enabling this marketplace is the agents' dual capabilities: they not only have the capacity to assess the quality of privileged information but also come equipped with the ability to forget. This ability to induce amnesia allows vendors to grant temporary access to proprietary information, significantly reducing the risk of unauthorized retention while enabling agents to accurately gauge the information's relevance to specific queries or tasks. To perform well, agents must make rational decisions, strategically explore the marketplace through generated sub-queries, and synthesize answers from purchased information. Concretely, our experiments (a) uncover biases in language models leading to irrational behavior and evaluate techniques to mitigate these biases, (b) investigate how price affects demand in the context of informational goods, and (c) show that inspection and higher budgets both lead to higher quality outcomes.
Updated: 2024-03-21 14:48:37
Domain: cs.AI,cs.CL,cs.GT,cs.LG,cs.MA,cs.SI
Analysing Diffusion Segmentation for Medical Images
Denoising Diffusion Probabilistic models have become increasingly popular due to their ability to offer probabilistic modeling and generate diverse outputs. This versatility inspired their adaptation for image segmentation, where multiple predictions of the model can produce segmentation results that not only achieve high quality but also capture the uncertainty inherent in the model. Powerful architectures have been proposed for improving diffusion segmentation performance. However, there is a notable lack of analysis and discussions on the differences between diffusion segmentation and image generation, and thorough evaluations are missing that distinguish the improvements these architectures provide for segmentation in general from their benefit for diffusion segmentation specifically. In this work, we critically analyse and discuss how diffusion segmentation for medical images differs from diffusion image generation, with a particular focus on the training behavior. Furthermore, we assess how proposed diffusion segmentation architectures perform when trained directly for segmentation. Lastly, we explore how different medical segmentation tasks influence the diffusion segmentation behavior and how the diffusion process could be adapted accordingly. With these analyses, we aim to provide in-depth insights into the behavior of diffusion segmentation that allow for a better design and evaluation of diffusion segmentation methods in the future.
Updated: 2024-03-21 14:45:54
Domain: eess.IV,cs.AI,cs.CV,cs.LG
Biased Binary Attribute Classifiers Ignore the Majority Classes
To visualize the regions of interest that classifiers base their decisions on, different Class Activation Mapping (CAM) methods have been developed. However, all of these techniques target categorical classifiers only, though most real-world tasks are binary classification problems. In this paper, we extend gradient-based CAM techniques to work with binary classifiers and visualize the active regions for binary facial attribute classifiers. When training an unbalanced binary classifier on an imbalanced dataset, it is well-known that the majority class, i.e., the class with many training samples, is typically predicted much better than the minority class with few training instances. In our experiments on the CelebA dataset, we verify these results when training an unbalanced classifier to extract 40 facial attributes simultaneously. One would expect that the biased classifier has learned to extract features mainly for the majority classes and that the proportional energy of the activations mainly resides in certain specific regions of the image where the attribute is located. However, we find very little regular activation for samples of majority classes, while the active regions for minority classes seem mostly reasonable and overlap with our expectations. These results suggest that biased classifiers mainly rely on bias activation for majority classes. When training a balanced classifier on the imbalanced data by employing attribute-specific class weights, majority and minority classes are classified similarly well and show expected activations for almost all attributes.
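To make the extension concrete, the following is a minimal sketch of Grad-CAM adapted to a single-logit binary head: the signed logit takes the place of the class score that categorical Grad-CAM backpropagates. The model and layer interfaces are assumptions for illustration, not the authors' implementation.

import torch
import torch.nn.functional as F

def binary_grad_cam(model, conv_layer, x, positive=True):
    # Capture the feature maps and their gradients at the chosen conv layer.
    feats, grads = {}, {}
    h1 = conv_layer.register_forward_hook(lambda m, i, o: feats.update(a=o))
    h2 = conv_layer.register_full_backward_hook(lambda m, gi, go: grads.update(a=go[0]))
    logit = model(x).squeeze(-1)                       # [B]: one score per image
    (logit if positive else -logit).sum().backward()   # sign selects the class to explain
    h1.remove(); h2.remove()
    w = grads["a"].mean(dim=(2, 3), keepdim=True)      # pooled gradients as channel weights
    cam = F.relu((w * feats["a"]).sum(dim=1))          # [B, H, W] activation map
    return cam / (cam.amax(dim=(1, 2), keepdim=True) + 1e-8)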
Updated: 2024-03-21 14:41:58
Domain: cs.CV,cs.AI,cs.LG
On the continuity and smoothness of the value function in reinforcement learning and optimal control
The value function plays a crucial role as a measure for the cumulative future reward an agent receives in both reinforcement learning and optimal control. It is therefore of interest to study how similar the values of neighboring states are, i.e., to investigate the continuity of the value function. We do so by providing and verifying upper bounds on the value function's modulus of continuity. Additionally, we show that the value function is always Hölder continuous under relatively weak assumptions on the underlying system and that non-differentiable value functions can be made differentiable by slightly "disturbing" the system.
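For reference, the modulus of continuity of a value function $V$ is $\omega_V(\delta) = \sup\{\,|V(x)-V(y)| : \|x-y\| \le \delta\,\}$, and $V$ is Hölder continuous with exponent $\beta \in (0,1]$ if $|V(x)-V(y)| \le L\|x-y\|^{\beta}$ for some constant $L > 0$ (with $\beta = 1$ recovering Lipschitz continuity). These are the standard definitions, stated here only to fix notation; the paper's specific bounds are not reproduced.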
Updated: 2024-03-21 14:39:28
Domain: eess.SY,cs.AI,cs.SY,37H99, 37N35, 93E03,I.2.8
Style-Extracting Diffusion Models for Semi-Supervised Histopathology Segmentation
Deep learning-based image generation has seen significant advancements with diffusion models, notably improving the quality of generated images. Despite these developments, generating images with unseen characteristics beneficial for downstream tasks has received limited attention. To bridge this gap, we propose Style-Extracting Diffusion Models, featuring two conditioning mechanisms. Specifically, we utilize 1) a style conditioning mechanism which allows to inject style information of previously unseen images during image generation and 2) a content conditioning which can be targeted to a downstream task, e.g., layout for segmentation. We introduce a trainable style encoder to extract style information from images, and an aggregation block that merges style information from multiple style inputs. This architecture enables the generation of images with unseen styles in a zero-shot manner, by leveraging styles from unseen images, resulting in more diverse generations. In this work, we use the image layout as target condition and first show the capability of our method on a natural image dataset as a proof-of-concept. We further demonstrate its versatility in histopathology, where we combine prior knowledge about tissue composition and unannotated data to create diverse synthetic images with known layouts. This allows us to generate additional synthetic data to train a segmentation network in a semi-supervised fashion. We verify the added value of the generated images by showing improved segmentation results and lower performance variability between patients when synthetic images are included during segmentation training. Our code will be made publicly available at [LINK].
Updated: 2024-03-21 14:36:59
Domain: cs.CV,cs.AI,cs.LG
FHAUC: Privacy Preserving AUC Calculation for Federated Learning using Fully Homomorphic Encryption
Ensuring data privacy is a significant challenge for machine learning applications, not only during model training but also during evaluation. Federated learning has gained significant research interest in recent years as a result. Current research on federated learning primarily focuses on preserving privacy during the training phase. However, model evaluation has not been adequately addressed, despite the potential for significant privacy leaks during this phase as well. In this paper, we demonstrate that the state-of-the-art AUC computation method for federated learning systems, which utilizes differential privacy, still leaks sensitive information about the test data while also requiring a trusted central entity to perform the computations. More importantly, we show that the performance of this method becomes completely unusable as the data size decreases. In this context, we propose an efficient, accurate, robust, and more secure evaluation algorithm capable of computing the AUC in horizontal federated learning systems. Our approach not only enhances security compared to the current state-of-the-art but also surpasses the state-of-the-art AUC computation method in both approximation performance and computational robustness, as demonstrated by experimental results. To illustrate, our approach can efficiently calculate the AUC of a federated learning system involving 100 parties, achieving 99.93% accuracy in just 0.68 seconds, regardless of data size, while providing complete data privacy.
Updated: 2024-03-21 14:36:55
Domain: cs.CR
Exact and general decoupled solutions of the LMC Multitask Gaussian Process model
The Linear Model of Co-regionalization (LMC) is a very general multitask Gaussian process model for regression or classification. While its expressivity and conceptual simplicity are appealing, naive implementations have cubic complexity in the number of datapoints and number of tasks, making approximations mandatory for most applications. However, recent work has shown that under some conditions the latent processes of the model can be decoupled, leading to a complexity that is only linear in the number of said processes. We here extend these results, showing from the most general assumptions that the only condition necessary to an efficient exact computation of the LMC is a mild hypothesis on the noise model. We introduce a full parametrization of the resulting projected LMC model, and an expression of the marginal likelihood enabling efficient optimization. We perform a parametric study on synthetic data to show the excellent performance of our approach, compared to an unrestricted exact LMC and approximations of the latter. Overall, the projected LMC appears as a credible and simpler alternative to state-of-the-art models, which greatly facilitates some computations such as leave-one-out cross-validation and fantasization.
Updated: 2024-03-21 14:36:26
Domain: cs.LG,stat.ML,I.2.6
Task-optimal data-driven surrogate models for eNMPC via differentiable simulation and optimization
We present a method for end-to-end learning of Koopman surrogate models for optimal performance in control. In contrast to previous contributions that employ standard reinforcement learning (RL) algorithms, we use a training algorithm that exploits the potential differentiability of environments based on mechanistic simulation models. We evaluate the performance of our method by comparing it to that of other controller type and training algorithm combinations on an eNMPC case study known from the literature. Our method exhibits superior performance on this problem, thereby constituting a promising avenue towards more capable controllers that employ dynamic surrogate models.
Updated: 2024-03-21 14:28:43
Domain: cs.LG,math.OC
A task of anomaly detection for a smart satellite Internet of things system
While equipment is operating, real-time collection of environmental sensor data for anomaly detection is a key link in preventing industrial process accidents and network attacks and in ensuring system security. However, in environments with specific real-time requirements, anomaly detection for environmental sensors still faces the following difficulties: (1) The complex nonlinear correlation characteristics between environmental sensor data variables lack effective expression methods, and the distribution of the data is difficult to capture. (2) Complex machine learning models struggle to meet real-time monitoring requirements, and their equipment cost is too high. (3) Scarce sample data leaves little labeled data for supervised learning. This paper proposes an unsupervised deep learning anomaly detection system. Based on a generative adversarial network and a self-attention mechanism, and accounting for the different feature information contained in local subsequences, it automatically learns the complex linear and nonlinear dependencies between environmental sensor variables and computes anomaly scores by combining reconstruction error and discrimination error. It can monitor anomalies in real sensor data with high real-time performance and can run on a smart satellite Internet of Things system, making it suitable for real working environments. The method outperforms baseline methods in most cases, offers good interpretability, and can be used to monitor environmental sensors and prevent industrial accidents and cyber-attacks.
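As a minimal sketch of the scoring step described above (the convex weighting and the generator/discriminator interfaces are illustrative assumptions, not the paper's exact design):

import numpy as np

def anomaly_score(x, generator, discriminator, alpha=0.7):
    recon_err = np.mean((x - generator(x)) ** 2, axis=-1)  # reconstruction error
    disc_err = 1.0 - discriminator(x)                      # discrimination error (realness in [0, 1])
    return alpha * recon_err + (1.0 - alpha) * disc_err    # higher score = more anomalous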
Updated: 2024-03-21 14:26:29
Domain: cs.LG,eess.SP
DP-RDM: Adapting Diffusion Models to Private Domains Without Fine-Tuning
Text-to-image diffusion models have been shown to suffer from sample-level memorization, possibly reproducing near-perfect replica of images that they are trained on, which may be undesirable. To remedy this issue, we develop the first differentially private (DP) retrieval-augmented generation algorithm that is capable of generating high-quality image samples while providing provable privacy guarantees. Specifically, we assume access to a text-to-image diffusion model trained on a small amount of public data, and design a DP retrieval mechanism to augment the text prompt with samples retrieved from a private retrieval dataset. Our differentially private retrieval-augmented diffusion model (DP-RDM) requires no fine-tuning on the retrieval dataset to adapt to another domain, and can use state-of-the-art generative models to generate high-quality image samples while satisfying rigorous DP guarantees. For instance, when evaluated on MS-COCO, our DP-RDM can generate samples with a privacy budget of $\epsilon=10$, while providing a $3.5$ point improvement in FID compared to public-only retrieval for up to $10,000$ queries.
Updated: 2024-03-21 14:17:28
Domain: cs.LG,cs.CR,cs.CV
On the Privacy of Selection Mechanisms with Gaussian Noise
Report Noisy Max and Above Threshold are two classical differentially private (DP) selection mechanisms. Their output is obtained by adding noise to a sequence of low-sensitivity queries and reporting the identity of the query whose (noisy) answer satisfies a certain condition. Pure DP guarantees for these mechanisms are easy to obtain when Laplace noise is added to the queries. On the other hand, when instantiated using Gaussian noise, standard analyses only yield approximate DP guarantees despite the fact that the outputs of these mechanisms lie in a discrete space. In this work, we revisit the analysis of Report Noisy Max and Above Threshold with Gaussian noise and show that, under the additional assumption that the underlying queries are bounded, it is possible to provide pure ex-ante DP bounds for Report Noisy Max and pure ex-post DP bounds for Above Threshold. The resulting bounds are tight and depend on closed-form expressions that can be numerically evaluated using standard methods. Empirically we find these lead to tighter privacy accounting in the high privacy, low data regime. Further, we propose a simple privacy filter for composing pure ex-post DP guarantees, and use it to derive a fully adaptive Gaussian Sparse Vector Technique mechanism. Finally, we provide experiments on mobility and energy consumption datasets demonstrating that our Sparse Vector Technique is practically competitive with previous approaches and requires less hyper-parameter tuning.
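As a minimal sketch of the Gaussian-noise Report Noisy Max mechanism under study (noise calibration and the pure ex-ante accounting are the paper's contribution and are not reproduced here):

import numpy as np

def report_noisy_max_gaussian(query_answers, sigma, rng=None):
    # Add independent N(0, sigma^2) noise to each bounded, low-sensitivity
    # query answer and release only the index of the largest noisy answer.
    rng = np.random.default_rng() if rng is None else rng
    noisy = np.asarray(query_answers, dtype=float) + rng.normal(0.0, sigma, len(query_answers))
    return int(np.argmax(noisy))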
Updated: 2024-03-21 14:03:39
Domain: cs.LG,cs.CR
GLC++: Source-Free Universal Domain Adaptation through Global-Local Clustering and Contrastive Affinity Learning
Deep neural networks often exhibit sub-optimal performance under covariate and category shifts. Source-Free Domain Adaptation (SFDA) presents a promising solution to this dilemma, yet most SFDA approaches are restricted to closed-set scenarios. In this paper, we explore Source-Free Universal Domain Adaptation (SF-UniDA) aiming to accurately classify "known" data belonging to common categories and segregate them from target-private "unknown" data. We propose a novel Global and Local Clustering (GLC) technique, which comprises an adaptive one-vs-all global clustering algorithm to discern between target classes, complemented by a local k-NN clustering strategy to mitigate negative transfer. Despite the effectiveness, the inherent closed-set source architecture leads to uniform treatment of "unknown" data, impeding the identification of distinct "unknown" categories. To address this, we evolve GLC to GLC++, integrating a contrastive affinity learning strategy. We examine the superiority of GLC and GLC++ across multiple benchmarks and category shift scenarios. Remarkably, in the most challenging open-partial-set scenarios, GLC and GLC++ surpass GATE by 16.7% and 18.6% in H-score on VisDA, respectively. GLC++ enhances the novel category clustering accuracy of GLC by 4.3% in open-set scenarios on Office-Home. Furthermore, the introduced contrastive learning strategy not only enhances GLC but also significantly facilitates existing methodologies.
Updated: 2024-03-21 13:57:45
Domain: cs.CV,cs.AI,cs.LG
Locating and Mitigating Gender Bias in Large Language Models
Large language models (LLMs) are pre-trained on extensive corpora to learn facts and aspects of human cognition that contain human preferences. However, this process can inadvertently lead to these models acquiring biases and stereotypes prevalent in society. Prior research has typically tackled the issue of bias through a one-dimensional perspective, concentrating either on locating or mitigating it. This limited perspective has hindered research on bias from synergistically complementing and progressively building upon prior work. In this study, we integrate the processes of locating and mitigating bias within a unified framework. Initially, we use causal mediation analysis to trace the causal effects of different components' activation within a large language model. Building on this, we propose the LSDM (Least Square Debias Method), a knowledge-editing-based method for mitigating gender bias in occupational pronouns, and compare it against two baselines on three gender bias datasets and seven knowledge competency test datasets. The experimental results indicate that the primary contributors to gender bias are the bottom MLP modules acting on the last token of occupational pronouns and the top attention module acting on the final word in the sentence. Furthermore, LSDM mitigates gender bias in the model more effectively than the other baselines, while fully preserving the model's capabilities in all other aspects.
Updated: 2024-03-21 13:57:43
Domain: cs.CL,cs.AI
FedMef: Towards Memory-efficient Federated Dynamic Pruning
Federated learning (FL) promotes decentralized training while prioritizing data confidentiality. However, its application on resource-constrained devices is challenging due to the high demand for computation and memory resources to train deep learning models. Neural network pruning techniques, such as dynamic pruning, could enhance model efficiency, but directly adopting them in FL still poses substantial challenges, including post-pruning performance degradation, high activation memory usage, etc. To address these challenges, we propose FedMef, a novel and memory-efficient federated dynamic pruning framework. FedMef comprises two key components. First, we introduce the budget-aware extrusion that maintains pruning efficiency while preserving post-pruning performance by salvaging crucial information from parameters marked for pruning within a given budget. Second, we propose scaled activation pruning to effectively reduce activation memory footprints, which is particularly beneficial for deploying FL to memory-limited devices. Extensive experiments demonstrate the effectiveness of our proposed FedMef. In particular, it achieves a significant reduction of 28.5% in memory footprint compared to state-of-the-art methods while obtaining superior accuracy.
Updated: 2024-03-21 13:54:36
Domain: cs.LG,cs.DC
Unsupervised Video Domain Adaptation with Masked Pre-Training and Collaborative Self-Training
In this work, we tackle the problem of unsupervised domain adaptation (UDA) for video action recognition. Our approach, which we call UNITE, uses an image teacher model to adapt a video student model to the target domain. UNITE first employs self-supervised pre-training to promote discriminative feature learning on target domain videos using a teacher-guided masked distillation objective. We then perform self-training on masked target data, using the video student model and image teacher model together to generate improved pseudolabels for unlabeled target videos. Our self-training process successfully leverages the strengths of both models to achieve strong transfer performance across domains. We evaluate our approach on multiple video domain adaptation benchmarks and observe significant improvements upon previously reported results.
Updated: 2024-03-21 13:53:48
Domain: cs.CV,cs.LG
Physics-Informed Diffusion Models
Generative models such as denoising diffusion models are quickly advancing their ability to approximate highly complex data distributions. They are also increasingly leveraged in scientific machine learning, where samples from the implied data distribution are expected to adhere to specific governing equations. We present a framework to inform denoising diffusion models on underlying constraints on such generated samples during model training. Our approach improves the alignment of the generated samples with the imposed constraints and significantly outperforms existing methods without affecting inference speed. Additionally, our findings suggest that incorporating such constraints during training provides a natural regularization against overfitting. Our framework is easy to implement and versatile in its applicability for imposing equality and inequality constraints as well as auxiliary optimization objectives.
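As a rough sketch of the idea (not the authors' exact objective): the standard denoising loss is augmented with a penalty on the governing-equation residual, evaluated on the model's one-step estimate of the clean sample. The residual function, mixing weight, and DDPM parameterization below are illustrative assumptions.

import torch

def physics_informed_loss(eps_pred, eps, x_t, t, alpha_bar, pde_residual, lam=0.1):
    denoise = torch.mean((eps_pred - eps) ** 2)                    # standard DDPM denoising loss
    a = alpha_bar[t].view(-1, *([1] * (x_t.dim() - 1)))            # broadcast to x_t's shape
    x0_hat = (x_t - torch.sqrt(1 - a) * eps_pred) / torch.sqrt(a)  # one-step clean estimate
    physics = torch.mean(pde_residual(x0_hat) ** 2)                # governing-equation penalty
    return denoise + lam * physics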
Updated: 2024-03-21 13:52:55
Domain: cs.LG,cs.CE
AI-KD: Adversarial learning and Implicit regularization for self-Knowledge Distillation
We present a novel adversarial penalized self-knowledge distillation method, named adversarial learning and implicit regularization for self-knowledge distillation (AI-KD), which regularizes the training procedure by adversarial learning and implicit distillations. Our model not only distills the deterministic and progressive knowledge from the pre-trained model's and previous epoch's predictive probabilities but also transfers the knowledge of the deterministic predictive distributions using adversarial learning. The motivation is that self-knowledge distillation methods regularize the predictive probabilities with soft targets, but the exact distributions may be hard to predict. Our method deploys a discriminator to distinguish the distributions between the pre-trained and student models, while the student model is trained to fool the discriminator during training. Thus, the student model not only can learn the pre-trained model's predictive probabilities but also align the distributions between the pre-trained and student models. We demonstrate the effectiveness of the proposed method with various network architectures on multiple datasets and show the proposed method achieves better performance than state-of-the-art methods.
Updated: 2024-03-21 13:51:10
Domain: cs.CV,cs.LG
Building Accurate Translation-Tailored LLMs with Language Aware Instruction Tuning
Translation-tailored large language models (LLMs) exhibit remarkable translation capabilities, even competing with supervised-trained commercial translation systems. However, off-target translation remains an unsolved problem, especially for low-resource languages, hindering us from developing accurate LLM-based translation models. To mitigate the off-target translation problem and enhance the performance of LLMs on translation, recent works have either designed advanced prompting strategies to highlight the functionality of translation instructions or exploited the in-context learning ability of LLMs by feeding few-shot demonstrations. However, these methods essentially do not improve LLMs' ability to follow translation instructions, especially the language direction information. In this work, we design a two-stage fine-tuning algorithm to improve the instruction-following ability (especially the translation direction) of LLMs. Specifically, we first tune LLMs with the maximum likelihood estimation loss on the translation dataset to elicit the basic translation capabilities. In the second stage, we construct instruction-conflicting samples by randomly replacing the translation direction with a wrong one within the instruction, and then introduce an extra unlikelihood loss to learn those samples. Experiments on the IWSLT and WMT benchmarks with the LLaMA model, spanning 16 zero-shot directions, show that, compared to the competitive baseline of translation-finetuned LLaMA, our method effectively reduces the off-target translation ratio (by 53.3% on average), improving translation quality by an average of +5.7 SacreBLEU and +16.4 BLEURT. Analysis shows that our method preserves the model's general task performance on AlpacaEval. Code and models will be released at https://github.com/alphadl/LanguageAware_Tuning.
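A rough sketch of the second-stage objective as described (token-level averaging, the clamping constant, and the batch layout are assumptions): ordinary likelihood on clean instructions, plus an unlikelihood term that pushes down the probability of targets paired with a corrupted translation direction.

import torch

def stage2_loss(logits, target_ids, conflicting_mask):
    # logits: [B, T, V]; target_ids: [B, T]; conflicting_mask: [B] booleans,
    # True where the instruction's language direction was replaced by a wrong one.
    logp = torch.log_softmax(logits, dim=-1)
    tok_logp = logp.gather(-1, target_ids.unsqueeze(-1)).squeeze(-1)  # [B, T]
    nll = -tok_logp.mean(dim=1)                                       # likelihood term
    unl = -torch.log1p(-tok_logp.exp() + 1e-6).mean(dim=1)            # -log(1 - p), unlikelihood
    return torch.where(conflicting_mask, unl, nll).mean()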
Updated: 2024-03-21 13:47:40
Domain: cs.CL,cs.AI
Regularized Adaptive Momentum Dual Averaging with an Efficient Inexact Subproblem Solver for Training Structured Neural Network
We propose a Regularized Adaptive Momentum Dual Averaging (RAMDA) algorithm for training structured neural networks. Similar to existing regularized adaptive methods, the subproblem for computing the update direction of RAMDA involves a nonsmooth regularizer and a diagonal preconditioner, and therefore does not possess a closed-form solution in general. We thus also carefully devise an implementable inexactness condition that retains convergence guarantees similar to the exact versions, and propose a companion efficient solver for the subproblems of both RAMDA and existing methods to make them practically feasible. We leverage the theory of manifold identification in variational analysis to show that, even in the presence of such inexactness, the iterates of RAMDA attain the ideal structure induced by the regularizer at the stationary point of asymptotic convergence. This structure is locally optimal near the point of convergence, so RAMDA is guaranteed to obtain the best structure possible among all methods converging to the same point, making it the first regularized adaptive method outputting models that possess outstanding predictive performance while being (locally) optimally structured. Extensive numerical experiments in large-scale modern computer vision, language modeling, and speech tasks show that the proposed RAMDA is efficient and consistently outperforms state of the art for training structured neural network. Implementation of our algorithm is available at http://www.github.com/ismoptgroup/RAMDA/.
Updated: 2024-03-21 13:43:49
Domain: cs.LG,math.OC
Sequence-to-Sequence Spanish Pre-trained Language Models
In recent years, significant advancements in pre-trained language models have driven the creation of numerous non-English language variants, with a particular emphasis on encoder-only and decoder-only architectures. While Spanish language models based on BERT and GPT have demonstrated proficiency in natural language understanding and generation, there remains a noticeable scarcity of encoder-decoder models explicitly designed for sequence-to-sequence tasks, which aim to map input sequences to generate output sequences conditionally. This paper breaks new ground by introducing the implementation and evaluation of renowned encoder-decoder architectures exclusively pre-trained on Spanish corpora. Specifically, we present Spanish versions of BART, T5, and BERT2BERT-style models and subject them to a comprehensive assessment across various sequence-to-sequence tasks, including summarization, question answering, split-and-rephrase, dialogue, and translation. Our findings underscore the competitive performance of all models, with the BART- and T5-based models emerging as top performers across all tasks. We have made all models publicly available to the research community to foster future explorations and advancements in Spanish NLP: https://github.com/vgaraujov/Seq2Seq-Spanish-PLMs.
Updated: 2024-03-21 13:41:35
Domain: cs.CL,cs.AI,cs.LG
EEG decoding with conditional identification information
Decoding EEG signals is crucial for unraveling the human brain and advancing brain-computer interfaces. Traditional machine learning algorithms have been hindered by the high noise levels and inherent inter-person variations in EEG signals. Recent advances in deep neural networks (DNNs) have shown promise, owing to their advanced nonlinear modeling capabilities. However, DNNs still face challenges in decoding EEG samples from unseen individuals. To address this, this paper introduces a novel approach by incorporating the conditional identification information of each individual into the neural network, thereby enhancing model representation through the synergistic interaction of EEG and personal traits. We test our model on the WithMe dataset and demonstrate that the inclusion of these identifiers substantially boosts accuracy for both subjects in the training set and unseen subjects. This enhancement suggests promising potential for improving EEG interpretability and understanding of relevant identification features.
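A minimal sketch of the conditioning idea (the encoder, embedding size, and fusion by concatenation are assumptions, not the paper's architecture; how identifiers for unseen subjects are formed is also beyond this sketch):

import torch
import torch.nn as nn

class ConditionedEEGDecoder(nn.Module):
    def __init__(self, n_channels, n_subjects, n_classes, id_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(             # EEG [B, C, T] -> 32-d feature
            nn.Conv1d(n_channels, 32, kernel_size=7, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten())
        self.subject_embed = nn.Embedding(n_subjects, id_dim)   # identity conditioning
        self.head = nn.Linear(32 + id_dim, n_classes)

    def forward(self, eeg, subject_id):
        z = torch.cat([self.encoder(eeg), self.subject_embed(subject_id)], dim=1)
        return self.head(z)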
Updated: 2024-03-21 13:38:59
Domain: eess.SP,cs.AI,cs.HC,cs.LG
Effective Structured Prompting by Meta-Learning and Representative Verbalizer
Prompt tuning for pre-trained masked language models (MLM) has shown promising performance in natural language processing tasks with few labeled examples. It tunes a prompt for the downstream task, and a verbalizer is used to bridge the predicted token and label prediction. Due to the limited training data, prompt initialization is crucial for prompt tuning. Recently, MetaPrompting (Hou et al., 2022) uses meta-learning to learn a shared initialization for all task-specific prompts. However, a single initialization is insufficient to obtain good prompts for all tasks and samples when the tasks are complex. Moreover, MetaPrompting requires tuning the whole MLM, causing a heavy burden on computation and memory as the MLM is usually large. To address these issues, we use a prompt pool to extract more task knowledge and construct instance-dependent prompts via attention. We further propose a novel soft verbalizer (RepVerb) which constructs label embedding from feature embeddings directly. Combining meta-learning the prompt pool and RepVerb, we propose MetaPrompter for effective structured prompting. MetaPrompter is parameter-efficient as only the pool is required to be tuned. Experimental results demonstrate that MetaPrompter performs better than the recent state-of-the-arts and RepVerb outperforms existing soft verbalizers.
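The abstract's description of RepVerb ("constructs label embedding from feature embeddings directly") admits a simple prototype-style reading; the averaging and cosine scoring below are an assumed instantiation for illustration only.

import torch
import torch.nn.functional as F

def repverb_style_logits(query_feats, support_feats, support_labels, n_classes):
    # query_feats: [B, D]; support_feats: [N, D]; support_labels: [N] ints.
    protos = torch.stack([support_feats[support_labels == c].mean(0)
                          for c in range(n_classes)])   # label embeddings built from features
    q = F.normalize(query_feats, dim=-1)
    p = F.normalize(protos, dim=-1)
    return q @ p.T                                      # cosine-similarity logits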
Updated: 2024-03-21 13:37:23
Domain: cs.CL,cs.AI,cs.LG
A Bag of Tricks for Few-Shot Class-Incremental Learning
We present a bag of tricks framework for few-shot class-incremental learning (FSCIL), which is a challenging form of continual learning that involves continuous adaptation to new tasks with limited samples. FSCIL requires both stability and adaptability, i.e., preserving proficiency in previously learned tasks while learning new ones. Our proposed bag of tricks brings together eight key and highly influential techniques that improve stability, adaptability, and overall performance under a unified framework for FSCIL. We organize these tricks into three categories: stability tricks, adaptability tricks, and training tricks. Stability tricks aim to mitigate the forgetting of previously learned classes by enhancing the separation between the embeddings of learned classes and minimizing interference when learning new ones. On the other hand, adaptability tricks focus on the effective learning of new classes. Finally, training tricks improve the overall performance without compromising stability or adaptability. We perform extensive experiments on three benchmark datasets, CIFAR-100, CUB-200, and miniImageNet, to evaluate the impact of our proposed framework. Our detailed analysis shows that our approach substantially improves both stability and adaptability, establishing a new state-of-the-art by outperforming prior works in the area. We believe our method provides a go-to solution and establishes a robust baseline for future research in this area.
Updated: 2024-03-21 13:33:00
Domain: cs.CV,cs.LG
Consistency Enhancement-Based Deep Multiview Clustering via Contrastive Learning
Multiview clustering (MVC) segregates data samples into meaningful clusters by synthesizing information across multiple views. Moreover, deep learning-based methods have demonstrated their strong feature learning capabilities in MVC scenarios. However, effectively generalizing feature representations while maintaining consistency is still an intractable problem. In addition, most existing deep clustering methods based on contrastive learning overlook the consistency of the clustering representations during the clustering process. In this paper, we show how the above problems can be overcome and propose a consistency enhancement-based deep MVC method via contrastive learning (CCEC). Specifically, semantic connection blocks are incorporated into a feature representation to preserve the consistent information among multiple views. Furthermore, the representation process for clustering is enhanced through spectral clustering, and the consistency across multiple views is improved. Experiments conducted on five datasets demonstrate the effectiveness and superiority of our method in comparison with the state-of-the-art (SOTA) methods. The code for this method can be accessed at https://anonymous.4open.science/r/CCEC-E84E/.
Updated: 2024-03-21 13:23:44
Domain: cs.LG,cs.CV
Estimating Causal Effects with Double Machine Learning -- A Method Evaluation
The estimation of causal effects with observational data continues to be a very active research area. In recent years, researchers have developed new frameworks which use machine learning to relax classical assumptions necessary for the estimation of causal effects. In this paper, we review one of the most prominent methods - "double/debiased machine learning" (DML) - and empirically evaluate it by comparing its performance on simulated data relative to more traditional statistical methods, before applying it to real-world data. Our findings indicate that the application of a suitably flexible machine learning algorithm within DML improves the adjustment for various nonlinear confounding relationships. This advantage enables a departure from traditional functional form assumptions typically necessary in causal effect estimation. However, we demonstrate that the method continues to critically depend on standard assumptions about causal structure and identification. When estimating the effects of air pollution on housing prices in our application, we find that DML estimates are consistently larger than estimates of less flexible methods. From our overall results, we provide actionable recommendations for specific choices researchers must make when applying DML in practice.
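For concreteness, a compact sketch of cross-fitted DML for the partially linear model $Y = D\theta + g(X) + \varepsilon$ (the random-forest learners are illustrative; the study compares several choices):

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

def dml_plm(y, d, X, n_folds=5, seed=0):
    res_y = np.zeros(len(y)); res_d = np.zeros(len(d))
    for train, test in KFold(n_folds, shuffle=True, random_state=seed).split(X):
        ml_y = RandomForestRegressor(random_state=seed).fit(X[train], y[train])
        ml_d = RandomForestRegressor(random_state=seed).fit(X[train], d[train])
        res_y[test] = y[test] - ml_y.predict(X[test])  # partial X out of the outcome
        res_d[test] = d[test] - ml_d.predict(X[test])  # ... and out of the treatment
    return float(res_d @ res_y / (res_d @ res_d))      # residual-on-residual slope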
Updated: 2024-03-21 13:21:33
Domain: stat.ML,cs.LG,econ.EM,stat.ME
Adversarial Attacks and Defenses in Automated Control Systems: A Comprehensive Benchmark
Integrating machine learning into Automated Control Systems (ACS) enhances decision-making in industrial process management. One of the limitations to the widespread adoption of these technologies in industry is the vulnerability of neural networks to adversarial attacks. This study explores the threats in deploying deep learning models for fault diagnosis in ACS using the Tennessee Eastman Process dataset. By evaluating three neural networks with different architectures, we subject them to six types of adversarial attacks and explore five different defense methods. Our results highlight the strong vulnerability of models to adversarial samples and the varying effectiveness of defense strategies. We also propose a novel protection approach by combining multiple defense methods and demonstrate its efficacy. This research contributes several insights into securing machine learning within ACS, ensuring robust fault diagnosis in industrial processes.
Updated: 2024-03-21 13:18:47
Domain: cs.LG,cs.CR,cs.SY,eess.SY,I.2.6; I.2.1
Learning to Solve Integer Linear Programs with Davis-Yin Splitting
In many applications, a combinatorial problem must be repeatedly solved with similar, but distinct parameters. Yet, the parameters $w$ are not directly observed; only contextual data $d$ that correlates with $w$ is available. It is tempting to use a neural network to predict $w$ given $d$. However, training such a model requires reconciling the discrete nature of combinatorial optimization with the gradient-based frameworks used to train neural networks. When the problem in question is an Integer Linear Program (ILP), one approach to overcome this training issue is to consider a continuous relaxation of the combinatorial problem. While existing methods utilizing this approach have shown to be highly effective on small problems, they do not always scale well to large problems. In this work, we draw on ideas from modern convex optimization to design a network and training scheme which scales effortlessly to problems with thousands of variables. Our experiments verify the computational advantage our proposed method enjoys on two representative problems, namely the shortest path problem and the knapsack problem.
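For intuition, a toy sketch of the Davis-Yin three-operator iteration on an LP relaxation $\min_x\, w^\top x + \tfrac{\epsilon}{2}\|x\|^2$ subject to $x \in [0,1]^n$ and $a^\top x = b$ (a single knapsack-style constraint; all problem data and step sizes are illustrative, and the paper's learned setting differentiates through such iterations rather than running them on fixed data):

import numpy as np

def davis_yin_lp(w, a, b, eps=0.1, gamma=0.5, iters=500):
    z = np.zeros(len(w))
    proj_box = lambda x: np.clip(x, 0.0, 1.0)           # prox of the box indicator
    proj_hyp = lambda x: x - a * (a @ x - b) / (a @ a)  # prox of the hyperplane indicator
    grad_h = lambda x: w + eps * x                      # gradient of the smooth part
    for _ in range(iters):
        x_f = proj_box(z)
        x_g = proj_hyp(2 * x_f - z - gamma * grad_h(x_f))
        z = z + x_g - x_f                               # Davis-Yin update (relaxation = 1)
    return proj_box(z)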
Updated: 2024-03-21 13:16:27
Domains: cs.LG
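The abstract names Davis-Yin splitting; the sketch below shows the generic three-operator iteration applied to a toy LP relaxation of a knapsack problem. The particular split into a box indicator, a capacity half-space, and a linear objective is our illustrative choice, not the paper's architecture.

```python
import numpy as np

def davis_yin(prox_f, prox_g, grad_h, z, gamma=0.5, lam=1.0, iters=1000):
    """Davis-Yin three-operator splitting for min_x f(x) + g(x) + h(x),
    where f and g admit proximal operators and h is smooth."""
    for _ in range(iters):
        x_g = prox_g(z)
        x_f = prox_f(2 * x_g - z - gamma * grad_h(x_g))
        z = z + lam * (x_f - x_g)
    return x_g

# Toy LP relaxation of a knapsack: max v@x  s.t.  0 <= x <= 1,  w@x <= C.
rng = np.random.default_rng(0)
v, w, C = rng.uniform(size=10), rng.uniform(size=10), 3.0

prox_box = lambda x: np.clip(x, 0.0, 1.0)        # f: indicator of the box [0, 1]^n
def proj_halfspace(x):                           # g: indicator of the capacity constraint
    s = w @ x - C
    return x if s <= 0 else x - w * s / (w @ w)
grad_h = lambda x: -v                            # h(x) = -v@x (maximize value)

x = davis_yin(prox_box, proj_halfspace, grad_h, np.zeros(10))
print(x.round(3), "capacity used:", (w @ x).round(3))
```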
Editing Knowledge Representation of Language Model via Rephrased Prefix Prompts
Neural language models (LMs) have been extensively trained on vast corpora to store factual knowledge about various aspects of the world described in texts. Current technologies typically employ knowledge editing methods or specific prompts to modify LM outputs. However, existing knowledge editing methods are costly and inefficient, struggling to produce appropriate text. Additionally, prompt engineering is opaque and requires significant effort to find suitable prompts. To address these issues, we introduce a new method called PSPEM (Prefix Soft Prompt Editing Method), which requires only a single training run and can be reused thereafter. It resolves the inefficiencies and generalizability issues in knowledge editing methods and overcomes the opacity of prompt engineering by automatically seeking optimal soft prompts. Specifically, PSPEM utilizes a prompt encoder and an encoding converter to refine key information in prompts and uses prompt alignment techniques to guide model generation, ensuring text consistency and adherence to the intended structure and content, thereby maintaining an optimal balance between efficiency and accuracy. We have validated the effectiveness of PSPEM through knowledge editing and attribute insertion. On the COUNTERFACT dataset, PSPEM achieved nearly 100% editing accuracy and demonstrated the highest level of fluency. We further analyzed the similarities between PSPEM and original prompts and their impact on the model's internals. The results indicate that PSPEM can serve as an alternative to original prompts, supporting the model in effective editing.
Updated: 2024-03-21 13:15:25
Domains: cs.CL,cs.AI
Tensor network compressibility of convolutional models
Convolutional neural networks (CNNs) represent one of the most widely used neural network architectures, showcasing state-of-the-art performance in computer vision tasks. Although larger CNNs generally exhibit higher accuracy, their size can be effectively reduced by "tensorization" while maintaining accuracy. Tensorization consists of replacing the convolution kernels with compact decompositions such as Tucker, Canonical Polyadic decompositions, or quantum-inspired decompositions such as matrix product states, and directly training the factors in the decompositions to bias the learning towards low-rank decompositions. But why doesn't tensorization seem to impact the accuracy adversely? We explore this by assessing how truncating the convolution kernels of dense (untensorized) CNNs impacts their accuracy. Specifically, we truncated the kernels of (i) a vanilla four-layer CNN and (ii) ResNet-50 pre-trained for image classification on CIFAR-10 and CIFAR-100 datasets. We found that kernels (especially those inside deeper layers) could often be truncated along several cuts resulting in significant loss in kernel norm but not in classification accuracy. This suggests that such "correlation compression" (underlying tensorization) is an intrinsic feature of how information is encoded in dense CNNs. We also found that aggressively truncated models could often recover the pre-truncation accuracy after only a few epochs of re-training, suggesting that compressing the internal correlations of convolution layers does not often transport the model to a worse minimum. Our results can be applied to tensorize and compress CNN models more effectively.
Updated: 2024-03-21 13:12:33
Domains: cs.CV,cs.LG,quant-ph
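A minimal sketch of the kind of kernel truncation described above: matricize a convolution kernel along one cut, keep only the top singular values, and measure how much kernel norm survives. The specific cut and rank shown here are assumptions for illustration, not the cuts studied in the paper.

```python
import numpy as np

def truncate_kernel(K, keep):
    """Truncate a conv kernel (out_ch, in_ch, kh, kw) along one matricization cut."""
    out_ch, in_ch, kh, kw = K.shape
    M = K.reshape(out_ch, in_ch * kh * kw)   # cut: output channels vs. everything else
    U, S, Vt = np.linalg.svd(M, full_matrices=False)
    M_trunc = (U[:, :keep] * S[:keep]) @ Vt[:keep]
    kept_norm = np.sqrt((S[:keep] ** 2).sum() / (S ** 2).sum())
    return M_trunc.reshape(K.shape), kept_norm

K = np.random.default_rng(0).normal(size=(64, 32, 3, 3))
K_trunc, kept = truncate_kernel(K, keep=16)
print(f"fraction of kernel norm retained: {kept:.3f}")
```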
Knowledge-Enhanced Recommendation with User-Centric Subgraph Network
Recommendation systems, as widely implemented nowadays on various platforms, recommend relevant items to users based on their preferences. Classical methods that rely on user-item interaction matrices have limitations, especially in scenarios where there is a lack of interaction data for new items. Knowledge graph (KG)-based recommendation systems have emerged as a promising solution. However, most KG-based methods adopt node embeddings, which do not provide personalized recommendations for different users and cannot generalize well to the new items. To address these limitations, we propose Knowledge-enhanced User-Centric subgraph Network (KUCNet), a subgraph learning approach with graph neural network (GNN) for effective recommendation. KUCNet constructs a U-I subgraph for each user-item pair that captures both the historical information of user-item interactions and the side information provided in KG. An attention-based GNN is designed to encode the U-I subgraphs for recommendation. Considering efficiency, the pruned user-centric computation graph is further introduced such that multiple U-I subgraphs can be simultaneously computed and that the size can be pruned by Personalized PageRank. Our proposed method achieves accurate, efficient, and interpretable recommendations especially for new items. Experimental results demonstrate the superiority of KUCNet over state-of-the-art KG-based and collaborative filtering (CF)-based methods.
Updated: 2024-03-21 13:09:23
Domains: cs.IR,cs.AI,cs.LG
Loop Improvement: An Efficient Approach for Extracting Shared Features from Heterogeneous Data without Central Server
In federated learning, data heterogeneity significantly impacts performance. A typical solution involves segregating model parameters into shared and personalized components, a concept also relevant in multi-task learning. Addressing this, we propose "Loop Improvement" (LI), a novel method enhancing this separation and feature extraction without necessitating a central server or data interchange among participants. Our experiments reveal LI's superiority in several aspects: In personalized federated learning environments, LI consistently outperforms the advanced FedALA algorithm in accuracy across diverse scenarios. Additionally, LI's feature extractor closely matches the performance achieved when aggregating data from all clients. In global model contexts, employing LI with stacked personalized layers and an additional network also yields comparable results to combined client data scenarios. Furthermore, LI's adaptability extends to multi-task learning, streamlining the extraction of common features across tasks and obviating the need for simultaneous training. This approach not only enhances individual task performance but also achieves accuracy levels on par with classic multi-task learning methods where all tasks are trained simultaneously. LI integrates a loop topology with layer-wise and end-to-end training, compatible with various neural network models. This paper also delves into the theoretical underpinnings of LI's effectiveness, offering insights into its potential applications. The code is available at https://github.com/axedge1983/LI
Updated: 2024-03-21 12:59:24
Domains: cs.LG,cs.AI,cs.DC
LLM4SGG: Large Language Model for Weakly Supervised Scene Graph Generation
Weakly-Supervised Scene Graph Generation (WSSGG) research has recently emerged as an alternative to the fully-supervised approach that heavily relies on costly annotations. In this regard, studies on WSSGG have utilized image captions to obtain unlocalized triplets while primarily focusing on grounding the unlocalized triplets over image regions. However, they have overlooked the two issues involved in the triplet formation process from the captions: 1) Semantic over-simplification issue arises when extracting triplets from captions, where fine-grained predicates in captions are undesirably converted into coarse-grained predicates, resulting in a long-tailed predicate distribution, and 2) Low-density scene graph issue arises when aligning the triplets in the caption with entity/predicate classes of interest, where many triplets are discarded and not used in training, leading to insufficient supervision. To tackle the two issues, we propose a new approach, i.e., Large Language Model for weakly-supervised SGG (LLM4SGG), where we mitigate the two issues by leveraging the LLM's in-depth understanding of language and reasoning ability during the extraction of triplets from captions and alignment of entity/predicate classes with target data. To further engage the LLM in these processes, we adopt the idea of Chain-of-Thought and the in-context few-shot learning strategy. To validate the effectiveness of LLM4SGG, we conduct extensive experiments on Visual Genome and GQA datasets, showing significant improvements in both Recall@K and mean Recall@K compared to the state-of-the-art WSSGG methods. A further appeal is that LLM4SGG is data-efficient, enabling effective model training with a small amount of training images.
Updated: 2024-03-21 12:59:04
Domains: cs.CV,cs.AI,cs.LG
Intrinsic Image Diffusion for Indoor Single-view Material Estimation
We present Intrinsic Image Diffusion, a generative model for appearance decomposition of indoor scenes. Given a single input view, we sample multiple possible material explanations represented as albedo, roughness, and metallic maps. Appearance decomposition poses a considerable challenge in computer vision due to the inherent ambiguity between lighting and material properties and the lack of real datasets. To address this issue, we advocate for a probabilistic formulation, where instead of attempting to directly predict the true material properties, we employ a conditional generative model to sample from the solution space. Furthermore, we show that utilizing the strong learned prior of recent diffusion models trained on large-scale real-world images can be adapted to material estimation and highly improves the generalization to real images. Our method produces significantly sharper, more consistent, and more detailed materials, outperforming state-of-the-art methods by $1.5dB$ on PSNR and by $45\%$ better FID score on albedo prediction. We demonstrate the effectiveness of our approach through experiments on both synthetic and real-world datasets.
Updated: 2024-03-21 12:51:31
Domains: cs.CV,cs.AI,cs.GR,I.4.8; I.2.10
Posterior Sampling Based on Gradient Flows of the MMD with Negative Distance Kernel
We propose conditional flows of the maximum mean discrepancy (MMD) with the negative distance kernel for posterior sampling and conditional generative modeling. This MMD, which is also known as energy distance, has several advantageous properties like efficient computation via slicing and sorting. We approximate the joint distribution of the ground truth and the observations using discrete Wasserstein gradient flows and establish an error bound for the posterior distributions. Further, we prove that our particle flow is indeed a Wasserstein gradient flow of an appropriate functional. The power of our method is demonstrated by numerical examples including conditional image generation and inverse problems like superresolution, inpainting and computed tomography in low-dose and limited-angle settings.
Updated: 2024-03-21 12:43:34
Domains: stat.ML,cs.LG,math.OC,math.PR
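For reference, the MMD with negative distance kernel (the energy distance) between two samples can be estimated directly; the O(n^2) version below is only a sketch, whereas the slicing-and-sorting computation the abstract mentions makes this far cheaper in practice.

```python
import numpy as np
from scipy.spatial.distance import cdist

def mmd_negative_distance(x, y):
    """Squared MMD with kernel k(a, b) = -||a - b|| (energy distance)."""
    return 2 * cdist(x, y).mean() - cdist(x, x).mean() - cdist(y, y).mean()

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(500, 2))
y = rng.normal(0.5, 1.0, size=(500, 2))
print(mmd_negative_distance(x, y))   # positive for shifted distributions
print(mmd_negative_distance(x, x))   # exactly zero for identical samples
```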
Point2RBox: Combine Knowledge from Synthetic Visual Patterns for End-to-end Oriented Object Detection with Single Point Supervision
With the rapidly increasing demand for oriented object detection (OOD), recent research involving weakly-supervised detectors for learning rotated box (RBox) from the horizontal box (HBox) has attracted more and more attention. In this paper, we explore a more challenging yet label-efficient setting, namely single point-supervised OOD, and present our approach called Point2RBox. Specifically, we propose to leverage two principles: 1) Synthetic pattern knowledge combination: By sampling around each labeled point on the image, we spread the object feature to synthetic visual patterns with known boxes to provide the knowledge for box regression. 2) Transform self-supervision: With a transformed input image (e.g. scaled/rotated), the output RBoxes are trained to follow the same transformation so that the network can perceive the relative size/rotation between objects. The detector is further enhanced by a few devised techniques to cope with peripheral issues, e.g. the anchor/layer assignment as the size of the object is not available in our point supervision setting. To our best knowledge, Point2RBox is the first end-to-end solution for point-supervised OOD. In particular, our method uses a lightweight paradigm, yet it achieves a competitive performance among point-supervised alternatives, 41.05%/27.62%/80.01% on DOTA/DIOR/HRSC datasets.
Updated: 2024-03-21 12:43:32
Domains: cs.CV,cs.AI
Varroa destructor detection on honey bees using hyperspectral imagery
Hyperspectral (HS) imagery in agriculture is becoming increasingly common. These images have the advantage of higher spectral resolution. Advanced spectral processing techniques are required to unlock the information potential in these HS images. The present paper introduces a method rooted in multivariate statistics designed to detect parasitic Varroa destructor mites on the body of western honey bee Apis mellifera, enabling easier and continuous monitoring of the bee hives. The methodology explores unsupervised (K-means++) and recently developed supervised (Kernel Flows - Partial Least-Squares, KF-PLS) methods for parasitic identification. Additionally, in light of the emergence of custom-band multispectral cameras, the present research outlines a strategy for identifying the specific wavelengths necessary for effective bee-mite separation, suitable for implementation in a custom-band camera. Illustrated with a real-case dataset, our findings demonstrate that as few as four spectral bands are sufficient for accurate parasite identification.
Updated: 2024-03-21 12:40:41
Domains: cs.CV,cs.LG
CPSDBench: A Large Language Model Evaluation Benchmark and Baseline for Chinese Public Security Domain
Large Language Models (LLMs) have demonstrated significant potential and effectiveness across multiple application domains. To assess the performance of mainstream LLMs in public security tasks, this study aims to construct a specialized evaluation benchmark tailored to the Chinese public security domain--CPSDbench. CPSDbench integrates datasets related to public security collected from real-world scenarios, supporting a comprehensive assessment of LLMs across four key dimensions: text classification, information extraction, question answering, and text generation. Furthermore, this study introduces a set of innovative evaluation metrics designed to more precisely quantify the efficacy of LLMs in executing tasks related to public security. Through the in-depth analysis and evaluation conducted in this research, we not only enhance our understanding of the performance strengths and limitations of existing models in addressing public security issues but also provide references for the future development of more accurate and customized LLM models targeted at applications in this field.
Updated: 2024-03-21 12:39:09
Domains: cs.AI
Exploring the Potential of Large Language Models in Graph Generation
Large language models (LLMs) have achieved great success in many fields, and recent works have explored LLMs for graph discriminative tasks such as node classification. However, the abilities of LLMs for graph generation remain unexplored in the literature. Graph generation requires the LLM to generate graphs with given properties, which has valuable real-world applications such as drug discovery, but tends to be more challenging. In this paper, we propose LLM4GraphGen to explore the ability of LLMs for graph generation with systematic task designs and extensive experiments. Specifically, we propose several tasks tailored with comprehensive experiments to address key questions regarding LLMs' understanding of different graph structure rules, their ability to capture structural type distributions, and their utilization of domain knowledge for property-based graph generation. Our evaluations demonstrate that LLMs, particularly GPT-4, exhibit preliminary abilities in graph generation tasks, including rule-based and distribution-based generation. We also observe that popular prompting methods, such as few-shot and chain-of-thought prompting, do not consistently enhance performance. Besides, LLMs show potential in generating molecules with specific properties. These findings may serve as foundations for designing good LLM-based models for graph generation and provide valuable insights for further research.
Updated: 2024-03-21 12:37:54
Domains: cs.LG,cs.AI,q-bio.BM
On the convergence of loss and uncertainty-based active learning algorithms
We consider the convergence rates of loss and uncertainty-based active learning algorithms under various assumptions. Firstly, we establish a set of conditions that ensure convergence rates when applied to linear classifiers and linearly separable datasets. This includes demonstrating convergence rate guarantees for loss-based sampling with various loss functions. Secondly, we introduce a framework that allows us to derive convergence rate bounds for loss-based sampling by leveraging known convergence rate bounds for stochastic gradient descent algorithms. Lastly, we propose a new algorithm that combines point sampling and stochastic Polyak's step size. We establish a condition on the sampling process, ensuring a convergence rate guarantee for this algorithm, particularly in the case of smooth convex loss functions. Our numerical results showcase the efficiency of the proposed algorithm.
Updated: 2024-03-21 12:37:21
Domains: cs.LG,cs.AI
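A toy sketch combining two ingredients the abstract discusses: loss-based point sampling and the stochastic Polyak step size (with f_i^* = 0 under interpolation). The logistic model, the separable data, and the greedy max-loss selection rule are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 5
X = rng.normal(size=(n, d))
y = np.sign(X @ rng.normal(size=d))          # linearly separable labels in {-1, +1}

def logistic_loss(w, i):
    return np.log1p(np.exp(-y[i] * (X[i] @ w)))

def grad(w, i):
    s = 1.0 / (1.0 + np.exp(y[i] * (X[i] @ w)))
    return -y[i] * s * X[i]

w = np.zeros(d)
for t in range(2000):
    losses = np.log1p(np.exp(-y * (X @ w)))
    i = int(np.argmax(losses))               # loss-based sampling: query the hardest point
    g = grad(w, i)
    # Stochastic Polyak step size; f_i^* = 0 since the data are separable
    # (practical variants usually also cap this step).
    eta = logistic_loss(w, i) / (g @ g + 1e-12)
    w -= eta * g
print("training error:", np.mean(np.sign(X @ w) != y))
```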
RiskQ: Risk-sensitive Multi-Agent Reinforcement Learning Value Factorization
Multi-agent systems are characterized by environmental uncertainty, varying policies of agents, and partial observability, which result in significant risks. In the context of Multi-Agent Reinforcement Learning (MARL), learning coordinated and decentralized policies that are sensitive to risk is challenging. To formulate the coordination requirements in risk-sensitive MARL, we introduce the Risk-sensitive Individual-Global-Max (RIGM) principle as a generalization of the Individual-Global-Max (IGM) and Distributional IGM (DIGM) principles. This principle requires that the collection of risk-sensitive action selections of each agent should be equivalent to the risk-sensitive action selection of the central policy. Current MARL value factorization methods do not satisfy the RIGM principle for common risk metrics such as the Value at Risk (VaR) metric or distorted risk measurements. Therefore, we propose RiskQ to address this limitation, which models the joint return distribution by modeling quantiles of it as weighted quantile mixtures of per-agent return distribution utilities. RiskQ satisfies the RIGM principle for the VaR and distorted risk metrics. We show that RiskQ can obtain promising performance through extensive experiments. The source code of RiskQ is available in https://github.com/xmu-rl-3dv/RiskQ.
Updated: 2024-03-21 12:36:22
Domains: cs.MA,cs.AI,cs.LG
DomainLab: A modular Python package for domain generalization in deep learning
Poor generalization performance caused by distribution shifts in unseen domains often hinders the trustworthy deployment of deep neural networks. Many domain generalization techniques address this problem by adding domain-invariant regularization loss terms during training. However, there is a lack of modular software that allows users to combine the advantages of different methods with minimal effort for reproducibility. DomainLab is a modular Python package for training user-specified neural networks with composable regularization loss terms. Its decoupled design allows the separation of neural networks from regularization loss construction. Hierarchical combinations of neural networks, different domain generalization methods, and associated hyperparameters can all be specified together with other experimental setup in a single configuration file. In addition, DomainLab offers powerful benchmarking functionality to evaluate the generalization performance of neural networks on out-of-distribution data. The package supports running the specified benchmark on an HPC cluster or on a standalone machine. The package is well tested with over 95 percent coverage and well documented. From the user perspective, it is closed to modification but open to extension. The package is under the MIT license, and its source code, tutorial and documentation can be found at https://github.com/marrlab/DomainLab.
Updated: 2024-03-21 12:35:46
Domains: cs.LG,cs.SE
Neural Wasserstein Gradient Flows for Maximum Mean Discrepancies with Riesz Kernels
Wasserstein gradient flows of maximum mean discrepancy (MMD) functionals with non-smooth Riesz kernels show a rich structure as singular measures can become absolutely continuous ones and conversely. In this paper we contribute to the understanding of such flows. We propose to approximate the backward scheme of Jordan, Kinderlehrer and Otto for computing such Wasserstein gradient flows as well as a forward scheme for so-called Wasserstein steepest descent flows by neural networks (NNs). Since we cannot restrict ourselves to absolutely continuous measures, we have to deal with transport plans and velocity plans instead of usual transport maps and velocity fields. Indeed, we approximate the disintegration of both plans by generative NNs which are learned with respect to appropriate loss functions. In order to evaluate the quality of both neural schemes, we benchmark them on the interaction energy. Here we provide analytic formulas for Wasserstein schemes starting at a Dirac measure and show their convergence as the time step size tends to zero. Finally, we illustrate our neural MMD flows by numerical examples.
Updated: 2024-03-21 12:34:14
Domains: cs.LG,math.OC,math.PR
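As a simplified stand-in for the neural forward scheme, the sketch below runs an explicit Euler particle flow for the MMD with Riesz kernel K(a, b) = -||a - b|| (r = 1): particles repel one another and are attracted to target samples. The step size and particle counts are arbitrary choices, and the paper's NN-parametrized plans are replaced by a plain particle discretization.

```python
import numpy as np

def unit_dirs(a, b):
    """Unit vectors (a_i - b_j)/||a_i - b_j||, with coincident pairs zeroed out."""
    diff = a[:, None, :] - b[None, :, :]
    dist = np.linalg.norm(diff, axis=-1, keepdims=True)
    return np.divide(diff, dist, out=np.zeros_like(diff), where=dist > 0)

def mmd_flow_step(x, y, tau=0.05):
    """One forward Euler step of the particle MMD flow toward target samples y."""
    n, m = len(x), len(y)
    # Velocity = -grad of the MMD potential: within-sample repulsion, attraction to y.
    v = (2.0 / n) * unit_dirs(x, x).sum(axis=1) - (2.0 / m) * unit_dirs(x, y).sum(axis=1)
    return x + tau * v

rng = np.random.default_rng(0)
x = rng.normal(loc=-3.0, size=(200, 2))      # source particles
y = rng.normal(loc=+3.0, size=(200, 2))      # target samples
for _ in range(500):
    x = mmd_flow_step(x, y)
print("mean of flowed particles:", x.mean(axis=0))   # approaches the target mean
```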
Graph Ranking Contrastive Learning: An Extremely Simple yet Efficient Method
Graph contrastive learning (GCL) has emerged as a representative graph self-supervised method, achieving significant success. The currently prevalent optimization objective for GCL is InfoNCE. Typically, it employs augmentation techniques to obtain two views, where a node in one view acts as the anchor, the corresponding node in the other view serves as the positive sample, and all other nodes are regarded as negative samples. The goal is to minimize the distance between the anchor node and positive samples and maximize the distance to negative samples. However, due to the lack of label information during training, InfoNCE inevitably treats samples from the same class as negative samples, leading to the issue of false negative samples. This can impair the learned node representations and subsequently hinder performance in downstream tasks. While numerous methods have been proposed to mitigate the impact of false negatives, they still face various challenges. For instance, while increasing the number of negative samples can dilute the impact of false negatives, it concurrently increases computational burden. Thus, we propose GraphRank, a simple yet efficient graph contrastive learning method that addresses the problem of false negative samples by redefining the concept of negative samples to a certain extent, thereby avoiding the issue of false negative samples. The effectiveness of GraphRank is empirically validated through experiments on the node, edge, and graph level tasks.
Updated: 2024-03-21 12:32:53
Domains: cs.LG,cs.AI
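For context, the InfoNCE objective the abstract refers to can be written compactly over two augmented views (the anchor's counterpart in the other view is the positive, every other cross-view node a negative); this simplified variant omits the intra-view negatives that some GCL implementations add.

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, tau=0.5):
    """InfoNCE over two views: z1[i] and z2[i] are positives, all other
    nodes in the other view act as negatives for anchor i."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    sim = z1 @ z2.t() / tau                    # (n, n) cosine similarities
    targets = torch.arange(z1.size(0))         # positive sits on the diagonal
    return F.cross_entropy(sim, targets)

z1, z2 = torch.randn(128, 64), torch.randn(128, 64)   # node embeddings of two views
print(info_nce(z1, z2))
```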
Cost-Sensitive Learning to Defer to Multiple Experts with Workload Constraints
Learning to defer (L2D) aims to improve human-AI collaboration systems by learning how to defer decisions to humans when they are more likely to be correct than an ML classifier. Existing research in L2D overlooks key aspects of real-world systems that impede its practical adoption, namely: i) neglecting cost-sensitive scenarios, where type 1 and type 2 errors have different costs; ii) requiring concurrent human predictions for every instance of the training dataset and iii) not dealing with human work capacity constraints. To address these issues, we propose the deferral under cost and capacity constraints framework (DeCCaF). DeCCaF is a novel L2D approach, employing supervised learning to model the probability of human error under less restrictive data requirements (only one expert prediction per instance) and using constraint programming to globally minimize the error cost subject to workload limitations. We test DeCCaF in a series of cost-sensitive fraud detection scenarios with different teams of 9 synthetic fraud analysts, with individual work capacity constraints. The results demonstrate that our approach performs significantly better than the baselines in a wide array of scenarios, achieving an average 8.4% reduction in the misclassification cost.
Updated: 2024-03-21 12:30:16
Domains: cs.LG,cs.AI
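The global cost-minimization under workload limits can be illustrated with a small assignment program. The paper uses constraint programming; the sketch below instead solves the equivalent transportation LP with scipy (whose vertices are integral), with random costs and capacities as placeholder inputs.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n_cases, n_experts = 30, 3
# cost[i, j]: expected misclassification cost if case i is deferred to expert j
# (e.g., a modeled error probability weighted by asymmetric type-1/type-2 costs).
cost = rng.uniform(size=(n_cases, n_experts))
capacity = np.array([12, 10, 8])             # per-expert workload limits

c = cost.ravel()                             # variables x[i, j] >= 0, case-major order
A_eq = np.zeros((n_cases, n_cases * n_experts))
for i in range(n_cases):                     # each case assigned exactly once
    A_eq[i, i * n_experts:(i + 1) * n_experts] = 1.0
A_ub = np.zeros((n_experts, n_cases * n_experts))
for j in range(n_experts):                   # each expert stays within capacity
    A_ub[j, j::n_experts] = 1.0

res = linprog(c, A_ub=A_ub, b_ub=capacity.astype(float),
              A_eq=A_eq, b_eq=np.ones(n_cases), bounds=(0, 1))
assign = res.x.reshape(n_cases, n_experts).argmax(axis=1)
print("objective:", res.fun, "workloads:", np.bincount(assign, minlength=n_experts))
```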
DaCapo: Accelerating Continuous Learning in Autonomous Systems for Video Analytics
Deep neural network (DNN) video analytics is crucial for autonomous systems such as self-driving vehicles, unmanned aerial vehicles (UAVs), and security robots. However, real-world deployment faces challenges due to their limited computational resources and battery power. To tackle these challenges, continuous learning exploits a lightweight "student" model at deployment (inference), leverages a larger "teacher" model for labeling sampled data (labeling), and continuously retrains the student model to adapt to changing scenarios (retraining). This paper highlights the limitations in state-of-the-art continuous learning systems: (1) they focus on computations for retraining, while overlooking the compute needs for inference and labeling, (2) they rely on power-hungry GPUs, unsuitable for battery-operated autonomous systems, and (3) they are located on a remote centralized server, intended for multi-tenant scenarios, again unsuitable for autonomous systems due to privacy, network availability, and latency concerns. We propose a hardware-algorithm co-designed solution for continuous learning, DaCapo, that enables autonomous systems to perform concurrent executions of inference, labeling, and training in a performant and energy-efficient manner. DaCapo comprises (1) a spatially-partitionable and precision-flexible accelerator enabling parallel execution of kernels on sub-accelerators at their respective precisions, and (2) a spatiotemporal resource allocation algorithm that strategically navigates the resource-accuracy tradeoff space, facilitating optimal decisions for resource allocation to achieve maximal accuracy. Our evaluation shows that DaCapo achieves 6.5% and 5.5% higher accuracy than a state-of-the-art GPU-based continuous learning systems, Ekya and EOMU, respectively, while consuming 254x less power.
Updated: 2024-03-21 12:28:44
Domains: cs.AR,cs.LG,cs.RO
Adversary-Augmented Simulation to evaluate client-fairness on HyperLedger Fabric
This paper presents a novel adversary model specifically tailored to distributed systems, with the aim of assessing the security of blockchain technologies. Building upon literature on adversarial assumptions and capabilities, we include classical notions of failure and communication models to classify and bind the use of adversarial actions. We focus on the effect of these actions on properties of distributed protocols. A significant effort of our research is the integration of this model into the Multi-Agent eXperimenter (MAX) framework. This integration enables realistic simulations of adversarial attacks on blockchain systems. In particular, we have simulated attacks violating a form of client-fairness on HyperLedger Fabric.
Updated: 2024-03-21 12:20:36
Domains: cs.CR,cs.DC,cs.MA
Exploring Task Unification in Graph Representation Learning via Generative Approach
Graphs are ubiquitous in real-world scenarios and encompass a diverse range of tasks, from node-, edge-, and graph-level tasks to transfer learning. However, designing specific tasks for each type of graph data is often costly and lacks generalizability. Recent endeavors under the "Pre-training + Fine-tuning" or "Pre-training + Prompt" paradigms aim to design a unified framework capable of generalizing across multiple graph tasks. Among these, graph autoencoders (GAEs), generative self-supervised models, have demonstrated their potential in effectively addressing various graph tasks. Nevertheless, these methods typically employ multi-stage training and require adaptive designs, which on one hand make it difficult to be seamlessly applied to diverse graph tasks and on the other hand overlook the negative impact caused by discrepancies in task objectives between the different stages. To address these challenges, we propose GA^2E, a unified adversarially masked autoencoder capable of addressing the above challenges seamlessly. Specifically, GA^2E proposes to use the subgraph as the meta-structure, which remains consistent across all graph tasks (ranging from node-, edge-, and graph-level to transfer learning) and all stages (both during training and inference). Further, GA^2E operates in a "Generate then Discriminate" manner. It leverages the masked GAE to reconstruct the input subgraph whilst treating it as a generator to compel the reconstructed graphs to resemble the input subgraph. Furthermore, GA^2E introduces an auxiliary discriminator to discern the authenticity between the reconstructed (generated) subgraph and the input subgraph, thus ensuring the robustness of the graph representation through adversarial training mechanisms. We validate GA^2E's capabilities through extensive experiments on 21 datasets across four types of graph tasks.
Updated: 2024-03-21 12:14:02
Domains: cs.LG,cs.AI
$\nabla\tau$: Gradient-based and Task-Agnostic Machine Unlearning
Machine Unlearning, the process of selectively eliminating the influence of certain data examples used during a model's training, has gained significant attention as a means for practitioners to comply with recent data protection regulations. However, existing unlearning methods face critical drawbacks, including their prohibitively high cost, often associated with a large number of hyperparameters, and the limitation of forgetting only relatively small data portions. This often makes retraining the model from scratch a quicker and more effective solution. In this study, we introduce Gradient-based and Task-Agnostic machine Unlearning ($\nabla \tau$), an optimization framework designed to remove the influence of a subset of training data efficiently. It applies adaptive gradient ascent to the data to be forgotten while using standard gradient descent for the remaining data. $\nabla \tau$ offers multiple benefits over existing approaches. It enables the unlearning of large sections of the training dataset (up to 30%). It is versatile, supporting various unlearning tasks (such as subset forgetting or class removal) and applicable across different domains (images, text, etc.). Importantly, $\nabla \tau$ requires no hyperparameter adjustments, making it a more appealing option than retraining the model from scratch. We evaluate our framework's effectiveness using a set of well-established Membership Inference Attack metrics, demonstrating up to 10% enhancements in performance compared to state-of-the-art methods without compromising the original model's accuracy.
Updated: 2024-03-21 12:11:26
Domains: cs.LG,cs.AI,cs.CL,cs.CV
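An illustrative reading of the core update, gradient ascent on the forget set combined with gradient descent on the retained data, is sketched below; the adaptivity of the ascent and all hyperparameters are omitted here, so this is a sketch in the spirit of the abstract, not the authors' exact procedure.

```python
import torch

def unlearning_step(model, optimizer, forget_batch, retain_batch, loss_fn):
    """One step of gradient-based unlearning: ascend the loss on data to
    forget, descend it on the data to retain (simplified, non-adaptive)."""
    optimizer.zero_grad()
    xf, yf = forget_batch
    xr, yr = retain_batch
    loss = -loss_fn(model(xf), yf) + loss_fn(model(xr), yr)
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage with placeholder data.
model = torch.nn.Linear(10, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = torch.nn.CrossEntropyLoss()
forget = (torch.randn(16, 10), torch.randint(0, 2, (16,)))
retain = (torch.randn(64, 10), torch.randint(0, 2, (64,)))
for _ in range(10):
    unlearning_step(model, opt, forget, retain, loss_fn)
```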
Mpox-AISM: AI-Mediated Super Monitoring for Mpox and Like-Mpox
The key to preventing the spread of mpox (monkeypox) lies in timely, convenient, and accurate diagnosis for earlier-stage infected individuals. Unfortunately, the resemblance between common skin diseases and mpox, together with the need for professional diagnosis, inevitably hampers the diagnosis of earlier-stage mpox patients and has contributed to widespread outbreaks in crowded areas. Here, we propose a real-time visualization strategy called "Super Monitoring" using artificial intelligence and Internet technology, thereby performing a low-cost, convenient, timely, and unspecialized diagnosis for earlier-stage mpox. Specifically, such AI-mediated "super monitoring" (Mpox-AISM) invokes a framework assembled from deep learning models, data augmentation, self-supervised learning, and cloud services. Verified on publicly available datasets, the Precision, Recall, Specificity, and F1-score of Mpox-AISM in diagnosing mpox reach 99.3%, 94.1%, 99.9%, and 96.6%, respectively. Furthermore, Mpox-AISM's overall accuracy reaches 94.51% in diagnosing mpox, six mpox-like skin diseases, and normal skin. We also employed gradient-weighted class activation mapping to explain the decision-making process of Mpox-AISM, making it easier to understand the specific characteristics that may indicate mpox's onset and improving its reliability. With the help of the Internet and communication terminals, Mpox-AISM can perform a real-time, low-cost, and convenient diagnosis for earlier-stage mpox in various real-world settings, thereby effectively curbing the spread of the mpox virus.
Updated: 2024-03-21 12:05:47
Domains: eess.IV,cs.CV,cs.LG
ROS-Causal: A ROS-based Causal Analysis Framework for Human-Robot Interaction Applications
Deploying robots in human-shared spaces requires understanding interactions among nearby agents and objects. Modelling cause-and-effect relations through causal inference aids in predicting human behaviours and anticipating robot interventions. However, a critical challenge arises as existing causal discovery methods currently lack an implementation inside the ROS ecosystem, the standard de facto in robotics, hindering effective utilisation in robotics. To address this gap, this paper introduces ROS-Causal, a ROS-based framework for onboard data collection and causal discovery in human-robot spatial interactions. An ad-hoc simulator, integrated with ROS, illustrates the approach's effectiveness, showcasing the robot onboard generation of causal models during data collection. ROS-Causal is available on GitHub: https://github.com/lcastri/roscausal.git.
Updated: 2024-03-21 11:58:49
Domains: cs.RO,cs.AI
A Differentially Private Clustering Algorithm for Well-Clustered Graphs
We study differentially private (DP) algorithms for recovering clusters in well-clustered graphs, which are graphs whose vertex set can be partitioned into a small number of sets, each inducing a subgraph of high inner conductance and small outer conductance. Such graphs have widespread application as a benchmark in the theoretical analysis of spectral clustering. We provide an efficient ($\epsilon$,$\delta$)-DP algorithm tailored specifically for such graphs. Our algorithm draws inspiration from the recent work of Chen et al., who developed DP algorithms for recovery of stochastic block models in cases where the graph comprises exactly two nearly-balanced clusters. Our algorithm works for well-clustered graphs with $k$ nearly-balanced clusters, and the misclassification ratio almost matches the one of the best-known non-private algorithms. We conduct experimental evaluations on datasets with known ground truth clusters to substantiate the prowess of our algorithm. We also show that any (pure) $\epsilon$-DP algorithm would result in substantial error.
Updated: 2024-03-21 11:57:16
Domains: cs.DS,cs.CR,cs.LG
Distilling Reinforcement Learning Policies for Interpretable Robot Locomotion: Gradient Boosting Machines and Symbolic Regression
Recent advancements in reinforcement learning (RL) have led to remarkable achievements in robot locomotion capabilities. However, the complexity and "black-box" nature of neural network-based RL policies hinder their interpretability and broader acceptance, particularly in applications demanding high levels of safety and reliability. This paper introduces a novel approach to distill neural RL policies into more interpretable forms using Gradient Boosting Machines (GBMs), Explainable Boosting Machines (EBMs) and Symbolic Regression. By leveraging the inherent interpretability of generalized additive models, decision trees, and analytical expressions, we transform opaque neural network policies into more transparent "glass-box" models. We train expert neural network policies using RL and subsequently distill them into (i) GBMs, (ii) EBMs, and (iii) symbolic policies. To address the inherent distribution shift challenge of behavioral cloning, we propose to use the Dataset Aggregation (DAgger) algorithm with a curriculum of episode-dependent alternation of actions between expert and distilled policies, to enable efficient distillation of feedback control policies. We evaluate our approach on various robot locomotion gaits (walking, trotting, bounding, and pacing) and study the importance of different observations in joint actions for distilled policies using various methods. We train neural expert policies for 205 hours of simulated experience and distill interpretable policies with only 10 minutes of simulated interaction for each gait using the proposed method.
Updated: 2024-03-21 11:54:45
Domains: cs.RO,cs.AI,cs.LG
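A minimal sketch of the distillation loop: roll out the student, relabel the visited states with the expert, aggregate, and refit a gradient boosting regressor (the DAgger pattern). The double-integrator dynamics and the analytic "expert" are stand-ins for a trained neural locomotion policy, and the episode-dependent action-alternation curriculum is omitted.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Placeholder "expert": stands in for a trained neural RL policy (illustrative).
expert = lambda s: np.clip(-1.5 * s[:, 0] - 0.5 * s[:, 1], -1, 1)

def rollout(policy, steps=200, rng=np.random.default_rng(0)):
    """Toy double-integrator: state = [position, velocity], action = acceleration."""
    s, traj = np.array([1.0, 0.0]), []
    for _ in range(steps):
        traj.append(s.copy())
        a = float(policy(s[None, :])[0])
        s = np.array([s[0] + 0.05 * s[1], s[1] + 0.05 * a]) + rng.normal(0, 0.01, 2)
    return np.array(traj)

# DAgger: collect states under the *student*, label them with the expert, retrain.
states = rollout(expert)                      # seed with expert demonstrations
actions = expert(states)
student = GradientBoostingRegressor().fit(states, actions)
for _ in range(5):
    new_states = rollout(lambda s: student.predict(s))
    states = np.vstack([states, new_states])
    actions = np.concatenate([actions, expert(new_states)])   # expert relabels
    student = GradientBoostingRegressor().fit(states, actions)
print("aggregated dataset size:", len(states))
```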
Investigating the validity of structure learning algorithms in identifying risk factors for intervention in patients with diabetes
Diabetes, a pervasive and enduring health challenge, imposes significant global implications on health, financial healthcare systems, and societal well-being. This study undertakes a comprehensive exploration of various structural learning algorithms to discern causal pathways amongst potential risk factors influencing diabetes progression. The methodology involves the application of these algorithms to relevant diabetes data, followed by the conversion of their output graphs into Causal Bayesian Networks (CBNs), enabling predictive analysis and the evaluation of discrepancies in the effect of hypothetical interventions within our context-specific case study. This study highlights the substantial impact of algorithm selection on intervention outcomes. To consolidate insights from diverse algorithms, we employ a model-averaging technique that helps us obtain a unique causal model for diabetes derived from a varied set of structural learning algorithms. We also investigate how each of those individual graphs, as well as the average graph, compare to the structures elicited by a domain expert who categorised graph edges into high confidence, moderate, and low confidence types, leading into three individual graphs corresponding to the three levels of confidence. The resulting causal model and data are made available online, and serve as a valuable resource and a guide for informed decision-making by healthcare practitioners, offering a comprehensive understanding of the interactions between relevant risk factors and the effect of hypothetical interventions. Therefore, this research not only contributes to the academic discussion on diabetes, but also provides practical guidance for healthcare professionals in developing efficient intervention and risk management strategies.
Updated: 2024-03-21 11:51:42
Domains: cs.LG
Neural Network-Based Processing and Reconstruction of Compromised Biophotonic Image Data
The integration of deep learning techniques with biophotonic setups has opened new horizons in bioimaging. A compelling trend in this field involves deliberately compromising certain measurement metrics to engineer better bioimaging tools in terms of cost, speed, and form-factor, followed by compensating for the resulting defects through the utilization of deep learning models trained on a large amount of ideal, superior or alternative data. This strategic approach has found increasing popularity due to its potential to enhance various aspects of biophotonic imaging. One of the primary motivations for employing this strategy is the pursuit of higher temporal resolution or increased imaging speed, critical for capturing fine dynamic biological processes. This approach also offers the prospect of simplifying hardware requirements/complexities, thereby making advanced imaging standards more accessible in terms of cost and/or size. This article provides an in-depth review of the diverse measurement aspects that researchers intentionally impair in their biophotonic setups, including the point spread function, signal-to-noise ratio, sampling density, and pixel resolution. By deliberately compromising these metrics, researchers aim to not only recuperate them through the application of deep learning networks, but also bolster in return other crucial parameters, such as the field-of-view, depth-of-field, and space-bandwidth product. Here, we discuss various biophotonic methods that have successfully employed this strategic approach. These techniques span broad applications and showcase the versatility and effectiveness of deep learning in the context of compromised biophotonic data. Finally, by offering our perspectives on the future possibilities of this rapidly evolving concept, we hope to motivate our readers to explore novel ways of balancing hardware compromises with compensation via AI.
Updated: 2024-03-21 11:44:25
Domains: physics.optics,cs.CV,cs.LG,physics.app-ph
DexDribbler: Learning Dexterous Soccer Manipulation via Dynamic Supervision
Learning dexterous locomotion policy for legged robots is becoming increasingly popular due to its ability to handle diverse terrains and resemble intelligent behaviors. However, joint manipulation of moving objects and locomotion with legs, such as playing soccer, receive scant attention in the learning community, although it is natural for humans and smart animals. A key challenge to solve this multitask problem is to infer the objectives of locomotion from the states and targets of the manipulated objects. The implicit relation between the object states and robot locomotion can be hard to capture directly from the training experience. We propose adding a feedback control block to compute the necessary body-level movement accurately and using the outputs as dynamic joint-level locomotion supervision explicitly. We further utilize an improved ball dynamic model, an extended context-aided estimator, and a comprehensive ball observer to facilitate transferring policy learned in simulation to the real world. We observe that our learning scheme can not only make the policy network converge faster but also enable soccer robots to perform sophisticated maneuvers like sharp cuts and turns on flat surfaces, a capability that was lacking in previous methods. Video and code are available at https://github.com/SysCV/soccer-player
Updated: 2024-03-21 11:16:28
Domains: cs.RO,cs.AI
Exploring Large Language Models to Facilitate Variable Autonomy for Human-Robot Teaming
In a rapidly evolving digital landscape, autonomous tools and robots are becoming commonplace. Recognizing the significance of this development, this paper explores the integration of Large Language Models (LLMs) like the Generative Pre-trained Transformer (GPT) into human-robot teaming environments to facilitate variable autonomy through the means of verbal human-robot communication. In this paper, we introduce a novel framework for such a GPT-powered multi-robot testbed environment, based on a Unity Virtual Reality (VR) setting. This system allows users to interact with robot agents through natural language, each powered by individual GPT cores. By means of OpenAI's function calling, we bridge the gap between unstructured natural language input and structured robot actions. A user study with 12 participants explores the effectiveness of GPT-4 and, more importantly, user strategies when given the opportunity to converse in natural language within a multi-robot environment. Our findings suggest that users may have preconceived expectations on how to converse with robots and seldom try to explore the actual language and cognitive capabilities of their robot collaborators. Still, those users who did explore were able to benefit from a much more natural flow of communication and human-like back-and-forth. We provide a set of lessons learned for future research and technical implementations of similar systems.
Updated: 2024-03-21 11:12:31
Domains: cs.HC,cs.AI,cs.RO
From Perils to Possibilities: Understanding how Human (and AI) Biases affect Online Fora
Social media platforms are online fora where users engage in discussions, share content, and build connections. This review explores the dynamics of social interactions, user-generated content, and biases within the context of social media analysis (analyzing works that use the tools offered by complex network analysis and natural language processing) through the lens of three key points of view: online debates, online support, and human-AI interactions. On the one hand, we delineate the phenomenon of online debates, where polarization, misinformation, and echo chamber formation often proliferate, driven by algorithmic biases and extreme mechanisms of homophily. On the other hand, we explore the emergence of online support groups through users' self-disclosure and social support mechanisms. Online debates and support mechanisms present a duality of both perils and possibilities within social media; perils of segregated communities and polarized debates, and possibilities of empathy narratives and self-help groups. This dichotomy also extends to a third perspective: users' reliance on AI-generated content, such as that produced by Large Language Models, which can manifest both human biases hidden in training sets and non-human biases that emerge from their artificial neural architectures. Analyzing interdisciplinary approaches, we aim to deepen the understanding of the complex interplay between social interactions, user-generated content, and biases within the realm of social media ecosystems.
Updated: 2024-03-21 11:04:41
Domains: cs.SI,cs.AI,cs.HC
Impact Assessment of Missing Data in Model Predictions for Earth Observation Applications
Earth observation (EO) applications involving complex and heterogeneous data sources are commonly approached with machine learning models. However, there is a common assumption that data sources will be persistently available. Different situations could affect the availability of EO sources, like noise, clouds, or satellite mission failures. In this work, we assess the impact of missing temporal and static EO sources in trained models across four datasets with classification and regression tasks. We compare the predictive quality of different methods and find that some are naturally more robust to missing data. The Ensemble strategy, in particular, achieves a prediction robustness of up to 100%. We show that missing-data scenarios are significantly more challenging in regression than in classification tasks. Finally, we find that the optical view is the most critical view when it is missing individually.
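One simple way to realize such an ensemble is a per-source model whose predictions are averaged over whichever sources are present at test time; a hedged sketch (model names and interfaces are assumptions, not the paper's exact strategy):

```python
import numpy as np

def ensemble_predict(views, models):
    """Robustness-by-ensembling sketch: one fitted model per EO source
    (e.g. optical, radar, static), averaging only the predictions of
    sources actually present at test time. `models[name].predict` is
    any fitted per-view estimator; the names are illustrative."""
    preds = [models[name].predict(x) for name, x in views.items()
             if x is not None]                 # skip missing sources
    if not preds:
        raise ValueError("all EO sources are missing")
    return np.mean(preds, axis=0)

# Usage: ensemble_predict({"optical": None, "radar": x_radar}, models)
# still returns a prediction when the optical view is unavailable.
```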
Updated: 2024-03-21 11:03:56
Domains: cs.LG,cs.AI,cs.CV
DenseFormer: Enhancing Information Flow in Transformers via Depth Weighted Averaging
The transformer architecture by Vaswani et al. (2017) is now ubiquitous across application domains, from natural language processing to speech processing and image understanding. We propose DenseFormer, a simple modification to the standard architecture that improves the perplexity of the model without increasing its size -- adding a few thousand parameters for large-scale models in the 100B-parameter range. Our approach relies on an additional averaging step after each transformer block, which computes a weighted average of current and past representations -- we refer to this operation as Depth-Weighted-Average (DWA). The learned DWA weights exhibit coherent patterns of information flow, revealing the strong and structured reuse of activations from distant layers. Experiments demonstrate that DenseFormer is more data efficient, reaching the same perplexity as much deeper transformer models, and that for the same perplexity, these new models outperform transformer baselines in terms of memory efficiency and inference time.
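To illustrate the mechanism, here is a minimal PyTorch sketch of a DWA module as the abstract describes it: after each block, a learned weighted average is taken over the current and all past representations. The initialization and the absence of any weight normalization are assumptions, not the paper's exact recipe:

```python
import torch
import torch.nn as nn

class DepthWeightedAverage(nn.Module):
    """After block i, mix the current representation with all earlier
    ones (including the embeddings) through learned scalar weights."""
    def __init__(self, block_index: int):
        super().__init__()
        # one weight per past representation, plus one for the current one
        w = torch.zeros(block_index + 2)
        w[-1] = 1.0  # start as the identity (a plain transformer)
        self.weights = nn.Parameter(w)

    def forward(self, history: list) -> torch.Tensor:
        # history: [embeddings, out_block_0, ..., out_block_i], same shapes
        stacked = torch.stack(history, dim=0)          # (i+2, B, T, D)
        w = self.weights.view(-1, 1, 1, 1)
        return (w * stacked).sum(dim=0)                # weighted average

# Usage inside a forward pass: keep a list of per-block outputs and
# replace each block's output with dwa([*past_outputs, current]).
```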
Updated: 2024-03-21 10:57:40
Domains: cs.CL,cs.LG
Knowing What LLMs DO NOT Know: A Simple Yet Effective Self-Detection Method
Large Language Models (LLMs) have shown great potential in Natural Language Processing (NLP) tasks. However, recent literature reveals that LLMs intermittently generate nonfactual responses, which undermines their reliability for further use. In this paper, we propose a novel self-detection method to identify the questions that an LLM does not know, which are prone to yield nonfactual results. Specifically, we first diversify the textual expressions for a given question and collect the corresponding answers. Then we examine the divergences between the generated answers to identify the questions on which the model may generate falsehoods. All of the above steps can be accomplished by prompting the LLMs themselves, without referring to any other external resources. We conduct comprehensive experiments and demonstrate the effectiveness of our method on recently released LLMs, e.g., Vicuna, ChatGPT, and GPT-4.
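A minimal sketch of the described procedure, assuming placeholder `llm` and `paraphrase` callables and a simple exact-match disagreement rate as the divergence measure (the paper's actual divergence metric may differ):

```python
from itertools import combinations

def self_detect(question: str, llm, paraphrase, k: int = 5,
                threshold: float = 0.5) -> bool:
    """Flag questions the model likely does not know: rephrase the
    question k ways, answer each, and measure how much the answers
    disagree. `llm(prompt) -> str` and `paraphrase(q, k) -> list[str]`
    are placeholder callables (both realizable by prompting the LLM
    itself); the divergence measure below is a simplifying assumption."""
    answers = [llm(q) for q in paraphrase(question, k)]
    pairs = list(combinations(answers, 2))
    disagreement = sum(a.strip().lower() != b.strip().lower()
                       for a, b in pairs) / max(len(pairs), 1)
    return disagreement > threshold  # True -> prone to nonfactual output
```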
Updated: 2024-03-21 10:57:23
Domains: cs.CL,cs.AI
Exploring Green AI for Audio Deepfake Detection
State-of-the-art audio deepfake detectors leveraging deep neural networks exhibit impressive recognition performance. Nonetheless, this advantage is accompanied by a significant carbon footprint, mainly due to the use of high-performance computing with accelerators and long training times. Studies show that the average deep NLP model produces around 626k lbs of CO2, equivalent to five times the lifetime emissions of an average US car. This is certainly a massive threat to the environment. To tackle this challenge, this study presents a novel framework for audio deepfake detection that can be seamlessly trained using standard CPU resources. Our proposed framework utilizes off-the-shelf self-supervised learning (SSL) based models which are pre-trained and available in public repositories. In contrast to existing methods that fine-tune SSL models and employ additional deep neural networks for downstream tasks, we exploit classical machine learning algorithms such as logistic regression and shallow neural networks on the SSL embeddings extracted with the pre-trained model. Our approach shows competitive results compared to the commonly used high-carbon-footprint approaches. In experiments with the ASVspoof 2019 LA dataset, we achieve a 0.90% equal error rate (EER) with less than 1k trainable model parameters. To encourage further research in this direction and support reproducible results, the Python code will be made publicly accessible following acceptance. Github: https://github.com/sahasubhajit/Speech-Spoofing-
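To illustrate the low-carbon recipe, here is a hedged scikit-learn sketch: frozen SSL embeddings (stubbed with random arrays here) feed a plain logistic regression, and the EER is read off the ROC curve. The dimensions and data are placeholders, not the paper's setup:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve

# Placeholder: embeddings would be extracted once, offline, with a frozen
# pre-trained SSL model (e.g. one mean-pooled vector per utterance).
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(1000, 768)), rng.integers(0, 2, 1000)
X_test,  y_test  = rng.normal(size=(200, 768)),  rng.integers(0, 2, 200)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)  # CPU-only
scores = clf.predict_proba(X_test)[:, 1]

# Equal error rate: the point where false-accept and false-reject cross
fpr, tpr, _ = roc_curve(y_test, scores)
fnr = 1 - tpr
eer = fpr[np.abs(fpr - fnr).argmin()]
print(f"EER: {eer:.4f}")
```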
Updated: 2024-03-21 10:54:21
Domains: cs.SD,cs.CV,cs.LG,eess.AS
Enhancing Historical Image Retrieval with Compositional Cues
In analyzing vast amounts of digitally stored historical image data, existing content-based retrieval methods often overlook significant non-semantic information, limiting their effectiveness for flexible exploration across varied themes. To broaden the applicability of image retrieval methods for diverse purposes and uncover more general patterns, we innovatively introduce a crucial factor from computational aesthetics, namely image composition, into this topic. By explicitly integrating composition-related information extracted by a CNN into the designed retrieval model, our method considers both the image's composition rules and its semantic information. Qualitative and quantitative experiments demonstrate that the image retrieval network guided by composition information outperforms those relying solely on content information, facilitating the identification of images in databases closer to the target image in human perception. Please visit https://github.com/linty5/CCBIR to try our code.
Updated: 2024-03-21 10:51:19
Domains: cs.CV,cs.AI,eess.IV
Assessing the Robustness of Spectral Clustering for Deep Speaker Diarization
Clustering speaker embeddings is crucial in speaker diarization but has not received as much focus as other components. Moreover, the robustness of speaker diarization across various datasets has not been explored when the development and evaluation data are from different domains. To bridge this gap, this study thoroughly examines spectral clustering for both same-domain and cross-domain speaker diarization. Our extensive experiments on two widely used corpora, AMI and DIHARD, reveal the performance trend of speaker diarization in the presence of domain mismatch. We observe that the performance difference between two different domain conditions can be attributed to the role of spectral clustering. In particular, keeping other modules unchanged, we show that differences in optimal tuning parameters, as well as in speaker count estimation, originate from the mismatch. This study opens several future directions for speaker diarization research.
Updated: 2024-03-21 10:49:54
Domains: cs.SD,cs.CV,cs.LG,eess.AS
How to be fair? A study of label and selection bias
It is widely accepted that biased data leads to biased and thus potentially unfair models. Therefore, several measures for bias in data and model predictions have been proposed, as well as bias mitigation techniques whose aim is to learn models that are fair by design. Despite the myriad of mitigation techniques developed in the past decade, however, it is still poorly understood which methods work under what circumstances. Recently, Wick et al. showed, with experiments on synthetic data, that there exist situations in which bias mitigation techniques lead to more accurate models when measured on unbiased data. Nevertheless, in the absence of a thorough mathematical analysis, it remains unclear which techniques are effective under what circumstances. We propose to address this problem by establishing relationships between the type of bias and the effectiveness of a mitigation technique, where we categorize the mitigation techniques by the bias measure they optimize. In this paper we illustrate this principle for label and selection bias on the one hand, and demographic parity and "We're All Equal" on the other hand. Our theoretical analysis allows us to explain the results of Wick et al., and we also show that there are situations where minimizing fairness measures does not result in the fairest possible distribution.
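As a concrete anchor for one of the fairness measures discussed, a minimal demographic parity computation (the "We're All Equal" measure and the mitigation techniques themselves are not reproduced here):

```python
import numpy as np

def demographic_parity_difference(y_pred, group):
    """P(yhat=1 | group=0) - P(yhat=1 | group=1): zero means the
    positive prediction rate is independent of the sensitive attribute."""
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    return y_pred[group == 0].mean() - y_pred[group == 1].mean()

# Toy example -> 0.6 (positive rate 0.8 for group 0 vs 0.2 for group 1)
print(demographic_parity_difference([1, 1, 1, 0, 1, 0, 0, 1, 0, 0],
                                    [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]))
```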
Updated: 2024-03-21 10:43:55
Domains: cs.LG,cs.AI,cs.CY
The Role of Transparency in Repeated First-Price Auctions with Unknown Valuations
We study the problem of regret minimization for a single bidder in a sequence of first-price auctions where the bidder discovers the item's value only if the auction is won. Our main contribution is a complete characterization, up to logarithmic factors, of the minimax regret in terms of the auction's "transparency", which controls the amount of information on competing bids disclosed by the auctioneer at the end of each auction. Our results hold under different assumptions (stochastic, adversarial, and their smoothed variants) on the environment generating the bidder's valuations and competing bids. These minimax rates reveal how the interplay between transparency and the nature of the environment affects how fast one can learn to bid optimally in first-price auctions.
Updated: 2024-03-21 10:28:38
Domains: cs.GT,cs.DS,cs.LG
Formalizing Stack Safety as a Security Property
The term stack safety is used to describe a variety of compiler, run-time, and hardware mechanisms for protecting stack memory. Unlike "the heap," the ISA-level stack does not correspond to a single high-level language concept: different compilers use it in different ways to support procedural and functional abstraction mechanisms from a wide range of languages. This protean nature makes it difficult to nail down what it means to correctly enforce stack safety. We propose a new formal characterization of stack safety using concepts from language-based security. Rather than treating stack safety as a monolithic property, we decompose it into an integrity property and a confidentiality property for each of the caller and the callee, plus a control-flow property: five properties in all. This formulation is motivated by a particular class of enforcement mechanisms, the "lazy" stack safety micro-policies studied by Roessler and DeHon, which permit functions to write into one another's frames but taint the changed locations so that the frame's owner cannot access them. No existing characterization of stack safety captures this style of safety; we capture it here by stating our properties in terms of the observable behavior of the system. Our properties go further than previous formal definitions of stack safety, supporting caller- and callee-saved registers, arguments passed on the stack, and tail-call elimination. We validate the properties by using them to distinguish between correct and incorrect implementations of Roessler and DeHon's micro-policies using property-based random testing. Our test harness successfully identifies several broken variants, including Roessler and DeHon's lazy policy; a repaired version of their policy passes our tests.
Updated: 2024-03-21 10:28:34
Domains: cs.PL,cs.CR
Multi-role Consensus through LLMs Discussions for Vulnerability Detection
Recent advancements in large language models (LLMs) have highlighted their potential for vulnerability detection, a crucial component of software quality assurance. Despite this progress, most studies have been limited to the perspective of a single role, usually testers, lacking diverse viewpoints from different roles in a typical software development life-cycle, including both developers and testers. To this end, this paper introduces an approach that employs LLMs to act as different roles, simulating the real-life code review process and engaging in discussions toward a consensus on the existence and classification of vulnerabilities in the code. Preliminary evaluation of the proposed approach indicates a 4.73% increase in the precision rate, a 58.9% increase in the recall rate, and a 28.1% increase in the F1 score.
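A minimal sketch of such a multi-role discussion loop, assuming a placeholder `chat(system, user)` LLM call; the role prompts, the convergence test, and the majority fallback are illustrative assumptions rather than the paper's prompts:

```python
def multi_role_review(code: str, chat, roles=("developer", "tester"),
                      rounds: int = 3):
    """Each role reviews the code plus the transcript so far; stop once
    all roles converge on the same verdict, else fall back to majority.
    `chat(system, user) -> str` is a placeholder LLM call."""
    transcript = []
    for _ in range(rounds):
        verdicts = []
        for role in roles:
            system = f"You are a {role} reviewing code for vulnerabilities."
            user = (f"Code:\n{code}\n\nDiscussion so far:\n"
                    + "\n".join(transcript)
                    + "\nState VULNERABLE or SAFE, the class if any, and why.")
            reply = chat(system, user)
            transcript.append(f"{role}: {reply}")
            verdicts.append("VULNERABLE" in reply.upper())
        if len(set(verdicts)) == 1:          # consensus reached
            return verdicts[0], transcript
    return max(set(verdicts), key=verdicts.count), transcript  # majority
```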
Updated: 2024-03-21 10:28:18
Domains: cs.SE,cs.AI
Reactor Optimization Benchmark by Reinforcement Learning
Neutronic calculations for reactors are a daunting task when using Monte Carlo (MC) methods. As high-performance computing has advanced, the simulation of a reactor is nowadays more readily done, but design and optimization with multiple parameters is still a computational challenge. MC transport simulations, coupled with machine learning techniques, offer promising avenues for enhancing the efficiency and effectiveness of nuclear reactor optimization. This paper introduces a novel benchmark problem within the OpenNeoMC framework designed specifically for reinforcement learning. The benchmark involves optimizing a unit cell of a research reactor with two varying parameters (fuel density and water spacing) to maximize neutron flux while maintaining reactor criticality. The test case features distinct local optima, representing different physical regimes, thus posing a challenge for learning algorithms. Through extensive simulations utilizing evolutionary and neuroevolutionary algorithms, we demonstrate the effectiveness of reinforcement learning in navigating complex optimization landscapes with strict constraints. Furthermore, we propose acceleration techniques within the OpenNeoMC framework, including model updating and cross-section usage by RAM utilization, to expedite simulation times. Our findings emphasize the importance of machine learning integration in reactor optimization and contribute to advancing methodologies for addressing intricate optimization challenges in nuclear engineering. The sources of this work are available at our GitHub repository: https://github.com/Scientific-Computing-Lab-NRCN/RLOpenNeoMC
Updated: 2024-03-21 10:26:47
Domains: cs.NE,cs.AI
SLIM: Skill Learning with Multiple Critics
Self-supervised skill learning aims to acquire useful behaviors that leverage the underlying dynamics of the environment. Latent variable models based on mutual information maximization have been successful in this task but still struggle in the context of robotic manipulation. Because manipulation requires influencing a possibly large set of degrees of freedom in the environment, mutual information maximization alone fails to produce useful and safe manipulation behaviors. Furthermore, naively augmenting skill discovery rewards with additional rewards might fail to produce the desired behaviors. To address this limitation, we introduce SLIM, a multi-critic learning approach for skill discovery with a particular focus on robotic manipulation. Our main insight is that utilizing multiple critics in an actor-critic framework to gracefully combine multiple reward functions leads to a significant improvement in latent-variable skill discovery for robotic manipulation, while overcoming the interference among rewards that hinders convergence to useful skills. Furthermore, in the context of tabletop manipulation, we demonstrate the applicability of our novel skill discovery approach for acquiring safe and efficient motor primitives in a hierarchical reinforcement learning fashion and leveraging them through planning, significantly surpassing baseline approaches for skill discovery.
Updated: 2024-03-21 10:21:37
Domains: cs.LG,cs.AI,cs.RO
Scene-Graph ViT: End-to-End Open-Vocabulary Visual Relationship Detection
Visual relationship detection aims to identify objects and their relationships in images. Prior methods approach this task by adding separate relationship modules or decoders to existing object detection architectures. This separation increases complexity and hinders end-to-end training, which limits performance. We propose a simple and highly efficient decoder-free architecture for open-vocabulary visual relationship detection. Our model consists of a Transformer-based image encoder that represents objects as tokens and models their relationships implicitly. To extract relationship information, we introduce an attention mechanism that selects object pairs likely to form a relationship. We provide a single-stage recipe to train this model on a mixture of object and relationship detection data. Our approach achieves state-of-the-art relationship detection performance on Visual Genome and on the large-vocabulary GQA benchmark at real-time inference speeds. We provide analyses of zero-shot performance, ablations, and real-world qualitative examples.
Updated: 2024-03-21 10:15:57
Domains: cs.CV,cs.CL,cs.LG,cs.RO
From Tempered to Benign Overfitting in ReLU Neural Networks
Overparameterized neural networks (NNs) are observed to generalize well even when trained to perfectly fit noisy data. This phenomenon motivated a large body of work on "benign overfitting", where interpolating predictors achieve near-optimal performance. Recently, it was conjectured and empirically observed that the behavior of NNs is often better described as "tempered overfitting", where the performance is non-optimal yet also non-trivial, and degrades as a function of the noise level. However, a theoretical justification of this claim for non-linear NNs has been lacking so far. In this work, we provide several results that aim at bridging these complementing views. We study a simple classification setting with 2-layer ReLU NNs, and prove that under various assumptions, the type of overfitting transitions from tempered in the extreme case of one-dimensional data, to benign in high dimensions. Thus, we show that the input dimension has a crucial role on the type of overfitting in this setting, which we also validate empirically for intermediate dimensions. Overall, our results shed light on the intricate connections between the dimension, sample size, architecture and training algorithm on the one hand, and the type of resulting overfitting on the other hand.
Updated: 2024-03-21 10:15:19
Domains: cs.LG,cs.NE,stat.ML
NewsBench: Systematic Evaluation of LLMs for Writing Proficiency and Safety Adherence in Chinese Journalistic Editorial Applications
This study presents NewsBench, a novel benchmark framework developed to evaluate the capability of Large Language Models (LLMs) in Chinese Journalistic Writing Proficiency (JWP) and their Safety Adherence (SA), addressing the gap between journalistic ethics and the risks associated with AI utilization. Comprising 1,267 tasks across 5 editorial applications, 7 aspects (including safety and journalistic writing with 4 detailed facets), and spanning 24 news topics domains, NewsBench employs two GPT-4 based automatic evaluation protocols validated by human assessment. Our comprehensive analysis of 10 LLMs highlighted GPT-4 and ERNIE Bot as top performers, yet revealed a relative deficiency in journalistic ethic adherence during creative writing tasks. These findings underscore the need for enhanced ethical guidance in AI-generated journalistic content, marking a step forward in aligning AI capabilities with journalistic standards and safety considerations.
Updated: 2024-03-21 10:14:09
Domains: cs.CL,cs.AI
FlowerFormer: Empowering Neural Architecture Encoding using a Flow-aware Graph Transformer
The success of a specific neural network architecture is closely tied to the dataset and task it tackles; there is no one-size-fits-all solution. Thus, considerable efforts have been made to quickly and accurately estimate the performances of neural architectures, without full training or evaluation, for given tasks and datasets. Neural architecture encoding has played a crucial role in the estimation, and graph-based methods, which treat an architecture as a graph, have shown prominent performance. For enhanced representation learning of neural architectures, we introduce FlowerFormer, a powerful graph transformer that incorporates the information flows within a neural architecture. FlowerFormer consists of two key components: (a) bidirectional asynchronous message passing, inspired by the flows; (b) global attention built on flow-based masking. Our extensive experiments demonstrate the superiority of FlowerFormer over existing neural encoding methods, and its effectiveness extends beyond computer vision models to include graph neural networks and automatic speech recognition models. Our code is available at http://github.com/y0ngjaenius/CVPR2024_FLOWERFormer.
Updated: 2024-03-21 10:02:39
Domains: cs.LG,cs.AI
A Framework for Portrait Stylization with Skin-Tone Awareness and Nudity Identification
Portrait stylization is a challenging task involving the transformation of an input portrait image into a specific style while preserving its inherent characteristics. The recent introduction of Stable Diffusion (SD) has significantly improved the quality of outcomes in this field. However, a practical stylization framework that can effectively filter harmful input content and preserve the distinct characteristics of an input, such as skin-tone, while maintaining the quality of stylization remains lacking. These challenges have hindered the wide deployment of such a framework. To address these issues, this study proposes a portrait stylization framework that incorporates a nudity content identification module (NCIM) and a skin-tone-aware portrait stylization module (STAPSM). In experiments, NCIM showed good performance in enhancing explicit content filtering, and STAPSM accurately represented a diverse range of skin tones. Our proposed framework has been successfully deployed in practice, and it has effectively satisfied critical requirements of real-world applications.
Updated: 2024-03-21 09:59:53
Domains: cs.CV,cs.AI
Deep Classifier Mimicry without Data Access
Access to pre-trained models has recently emerged as a standard across numerous machine learning domains. Unfortunately, access to the original data the models were trained on may not equally be granted. This makes it tremendously challenging to fine-tune, compress models, adapt continually, or to do any other type of data-driven update. We posit that original data access may however not be required. Specifically, we propose Contrastive Abductive Knowledge Extraction (CAKE), a model-agnostic knowledge distillation procedure that mimics deep classifiers without access to the original data. To this end, CAKE generates pairs of noisy synthetic samples and diffuses them contrastively toward a model's decision boundary. We empirically corroborate CAKE's effectiveness using several benchmark datasets and various architectural choices, paving the way for broad application.
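The following is a simplified stand-in for the described generation step, assuming a differentiable teacher: random inputs are optimized so the gap between the teacher's two largest logits shrinks, pushing synthetic samples toward the decision boundary without touching any real data. The hyperparameters and the handling of CAKE's contrastive sample pairs are assumptions:

```python
import torch

def boundary_seeking_samples(teacher, n, dim, steps=100, lr=0.1):
    """Generate synthetic inputs near the teacher's decision boundary,
    with no access to real data, by shrinking the margin between its
    two largest logits. A simplified stand-in for CAKE's contrastive
    diffusion of noisy sample pairs."""
    x = torch.randn(n, dim, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        logits = teacher(x)
        top2 = logits.topk(2, dim=1).values
        margin = top2[:, 0] - top2[:, 1]       # ~0 exactly at the boundary
        loss = margin.pow(2).mean()
        opt.zero_grad(); loss.backward(); opt.step()
    return x.detach()  # label with teacher(x).argmax(1) for distillation

# Usage with any differentiable classifier standing in for the teacher:
teacher = torch.nn.Sequential(torch.nn.Linear(16, 32), torch.nn.ReLU(),
                              torch.nn.Linear(32, 3))
synthetic = boundary_seeking_samples(teacher, n=64, dim=16)
```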
Updated: 2024-03-21 09:58:15
Domains: cs.LG,cs.AI
Diffusion Models with Ensembled Structure-Based Anomaly Scoring for Unsupervised Anomaly Detection
Supervised deep learning techniques show promise in medical image analysis. However, they require comprehensively annotated data sets, which poses challenges, particularly for rare diseases. Consequently, unsupervised anomaly detection (UAD) emerges as a viable alternative for pathology segmentation, as only healthy data is required for training. However, recent UAD anomaly scoring functions often focus on intensity only and neglect structural differences, which impedes segmentation performance. This work investigates the potential of Structural Similarity (SSIM) to bridge this gap. SSIM captures both intensity and structural disparities and can be advantageous over the classical $\ell_1$ error. However, we show that there is more than one optimal kernel size for the SSIM calculation across different pathologies. Therefore, we investigate an adaptive ensembling strategy over various kernel sizes to offer a more pathology-agnostic scoring mechanism. We demonstrate that this ensembling strategy can enhance the performance of diffusion models (DMs) and mitigate the sensitivity to different kernel sizes across varying pathologies, highlighting its promise for brain MRI anomaly detection.
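A minimal sketch of the ensembling idea using scikit-image's SSIM: dissimilarity maps computed at several window sizes are averaged into one anomaly map. The specific kernel sizes and the plain averaging are illustrative assumptions:

```python
import numpy as np
from skimage.metrics import structural_similarity

def ensembled_ssim_anomaly_map(image, reconstruction,
                               kernel_sizes=(3, 7, 11, 15)):
    """Anomaly scoring for UAD: compare the input with its healthy
    reconstruction via SSIM at several (odd) window sizes and average
    the dissimilarity maps, so no single kernel size has to fit every
    pathology."""
    maps = []
    for k in kernel_sizes:
        _, ssim_map = structural_similarity(
            image, reconstruction, win_size=k,
            data_range=image.max() - image.min(), full=True)
        maps.append(1.0 - ssim_map)        # high where structure differs
    return np.mean(maps, axis=0)           # ensembled anomaly map

# Usage with a diffusion model: reconstruction = dm_restore(image)
img = np.random.rand(128, 128); rec = img + 0.05 * np.random.rand(128, 128)
amap = ensembled_ssim_anomaly_map(img, rec)
```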
Updated: 2024-03-21 09:50:39
Domains: eess.IV,cs.CV,cs.LG
ERD: A Framework for Improving LLM Reasoning for Cognitive Distortion Classification
Improving the accessibility of psychotherapy with the aid of Large Language Models (LLMs) has garnered significant attention in recent years. Recognizing cognitive distortions from the interviewee's utterances can be an essential part of psychotherapy, especially for cognitive behavioral therapy. In this paper, we propose ERD, which improves LLM-based cognitive distortion classification performance with the aid of additional modules that (1) extract the parts related to cognitive distortion, and (2) debate the reasoning steps via multiple agents. Our experimental results on a public dataset show that ERD improves the multi-class F1 score as well as the binary specificity score. Regarding the latter score, it turns out that our method is effective in debiasing the baseline method, which has a high false positive rate, especially when the summary of the multi-agent debate is provided to LLMs.
Updated: 2024-03-21 09:28:38
Domains: cs.CL,cs.LG
LayoutLLM: Large Language Model Instruction Tuning for Visually Rich Document Understanding
This paper proposes LayoutLLM, a more flexible document analysis method for understanding imaged documents. Visually Rich Document Understanding tasks, such as document image classification and information extraction, have gained significant attention due to their importance. Existing methods have been developed to enhance document comprehension by incorporating pre-training awareness of images, text, and layout structure. However, these methods require fine-tuning for each task and dataset, and the models are expensive to train and operate. To overcome this limitation, we propose a new LayoutLLM that integrates these with large-scale language models (LLMs). By leveraging the strengths of existing research in document image understanding and LLMs' superior language understanding capabilities, the proposed model, fine-tuned with multimodal instruction datasets, performs an understanding of document images in a single model. Our experiments demonstrate improvement over the baseline model in various document analysis tasks.
Updated: 2024-03-21 09:25:24
Domains: cs.CL,cs.AI,cs.CV,cs.LG
Safeguarding Medical Image Segmentation Datasets against Unauthorized Training via Contour- and Texture-Aware Perturbations
The widespread availability of publicly accessible medical images has significantly propelled advancements in various research and clinical fields. Nonetheless, concerns regarding unauthorized training of AI systems for commercial purposes and the duties of patient privacy protection have led numerous institutions to hesitate to share their images. This is particularly true for medical image segmentation (MIS) datasets, where the processes of collection and fine-grained annotation are time-intensive and laborious. Recently, Unlearnable Examples (UEs) methods have shown the potential to protect images by adding invisible shortcuts. These shortcuts can prevent unauthorized deep neural networks from generalizing. However, existing UEs are designed for natural image classification and fail to protect MIS datasets imperceptibly as their protective perturbations are less learnable than important prior knowledge in MIS, e.g., contour and texture features. To this end, we propose an Unlearnable Medical image generation method, termed UMed. UMed integrates the prior knowledge of MIS by injecting contour- and texture-aware perturbations to protect images. Given that our target is to only poison features critical to MIS, UMed requires only minimal perturbations within the ROI and its contour to achieve greater imperceptibility (average PSNR is 50.03) and protective performance (clean average DSC degrades from 82.18% to 6.80%).
Updated: 2024-03-21 09:22:23
Domains: eess.IV,cs.CR,cs.CV
Generalized Early Stopping in Evolutionary Direct Policy Search
Lengthy evaluation times are common in many optimization problems, such as direct policy search tasks, especially when they involve conducting evaluations in the physical world, e.g., in robotics applications. Often, when evaluating a solution over a fixed time period, it becomes clear that the objective value will not increase with additional computation time (for example, when a two-wheeled robot continuously spins on the spot). In such cases, it makes sense to stop the evaluation early to save computation time. However, most approaches to stopping the evaluation early are problem-specific and need to be specifically designed for the task at hand. Therefore, we propose an early stopping method for direct policy search. The proposed method only looks at the objective value at each time step and requires no problem-specific knowledge. We test the introduced stopping criterion in five direct policy search environments drawn from games, robotics, and classic control domains, and show that it can save up to 75% of the computation time. We also compare it with problem-specific stopping criteria and show that it performs comparably while being more generally applicable.
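One plausible instantiation of such a criterion is sketched below, under the assumption that a rollout yields the objective value step by step and that the best solution's per-step curve is kept around; the paper's exact rule may differ:

```python
def evaluate_with_early_stopping(policy, env_rollout, best_curve,
                                 horizon=1000):
    """Evaluate a candidate, aborting early when its partial objective
    falls below what the best solution so far had achieved by the same
    time step. Only the per-step objective value is inspected, so the
    criterion needs no task-specific knowledge. `env_rollout(policy,
    horizon)` is a placeholder generator of per-step objective values."""
    curve = []
    for t, objective_t in enumerate(env_rollout(policy, horizon)):
        curve.append(objective_t)
        if best_curve is not None and objective_t < best_curve[t]:
            break  # falling behind the incumbent: stop and save compute
    return curve   # keep the full curve if this becomes the new best
```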
Updated: 2024-03-21 09:13:17
Domains: stat.ML,cs.LG,cs.NE,cs.RO
CATSE: A Context-Aware Framework for Causal Target Sound Extraction
Target Sound Extraction (TSE) focuses on the problem of separating sources of interest, indicated by a user's cue, from the input mixture. Most existing solutions operate in an offline fashion and are not suited to the low-latency causal processing constraints imposed by applications in live-streamed content such as augmented hearing. We introduce a family of context-aware low-latency causal TSE models suitable for real-time processing. First, we explore the utility of context by providing the TSE model with oracle information about what sound classes make up the input mixture, where the objective of the model is to extract one or more sources of interest indicated by the user. Since the practical applications of oracle models are limited due to their assumptions, we introduce a composite multi-task training objective involving separation and classification losses. Our evaluation involving single- and multi-source extraction shows the benefit of using context information in the model either by means of providing full context or via the proposed multi-task training loss without the need for full context information. Specifically, we show that our proposed model outperforms size- and latency-matched Waveformer, a state-of-the-art model for real-time TSE.
Updated: 2024-03-21 09:06:28
Domains: eess.AS,cs.AI
TensorBank: Tensor Lakehouse for Foundation Model Training
Storing and streaming high-dimensional data for foundation model training has become a critical requirement with the rise of foundation models beyond natural language. In this paper we introduce TensorBank, a petabyte-scale tensor lakehouse capable of streaming tensors from Cloud Object Store (COS) to GPU memory at wire speed based on complex relational queries. We use Hierarchical Statistical Indices (HSI) for query acceleration. Our architecture allows tensors to be addressed directly at the block level using HTTP range reads. Once in GPU memory, data can be transformed using PyTorch transforms. We provide a generic PyTorch dataset type with a corresponding dataset factory that translates relational queries and requested transformations into dataset instances. By making use of the HSI, irrelevant blocks can be skipped without reading them, as these indices contain statistics on their content at different hierarchical resolution levels. This is an opinionated architecture powered by open standards and making heavy use of open-source technology. Although hardened for production use with geospatial-temporal data, this architecture generalizes to other use cases such as computer vision, computational neuroscience, biological sequence analysis, and more.
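To make the block-level access pattern concrete, a hedged sketch of one HTTP range read (the URL, offsets, and dtype are placeholders; in the actual system they would come from the hierarchical statistical index):

```python
import numpy as np
import requests

def fetch_block(url, offset, nbytes, dtype=np.float32, shape=None):
    """Read one tensor block straight from object storage with an HTTP
    range request, skipping the rest of the object. The byte layout
    (offset/size per block) is assumed to be known from an index."""
    headers = {"Range": f"bytes={offset}-{offset + nbytes - 1}"}
    resp = requests.get(url, headers=headers, timeout=30)
    resp.raise_for_status()                    # expect 206 Partial Content
    block = np.frombuffer(resp.content, dtype=dtype)
    return block.reshape(shape) if shape is not None else block

# Usage: skip blocks whose index statistics rule them out, fetch the rest
# block = fetch_block("https://cos.example/tensor.bin", 0, 4 * 1024,
#                     shape=(32, 32))
```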
Updated: 2024-03-21 09:03:48
Domains: cs.LG,cs.AI,cs.DB,cs.IR
Isotropic Gaussian Splatting for Real-Time Radiance Field Rendering
The 3D Gaussian splatting method has drawn a lot of attention thanks to its high training performance and the high quality of the rendered images. However, it uses anisotropic Gaussian kernels to represent the scene. Although such anisotropic kernels have advantages in representing the geometry, they lead to computational difficulties, such as splitting or merging two kernels. In this paper, we propose to use isotropic Gaussian kernels to avoid such difficulties in the computation, leading to a higher-performance method. The experiments confirm that the proposed method is about 100X faster without losing the accuracy of the geometry representation. The proposed method can be applied in a wide range of applications where the radiance field is needed, such as 3D reconstruction, view synthesis, and dynamic object modeling.
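The computational simplification can be seen directly in the kernel covariances: an anisotropic kernel carries a rotation and three per-axis scales, while an isotropic one reduces to a single scalar so the rotation drops out. A small NumPy illustration:

```python
import numpy as np

def anisotropic_cov(R, s):
    """General 3D Gaussian: Sigma = R diag(s)^2 R^T (rotation R, per-axis
    scales s). Splitting/merging kernels must reconcile both R and s."""
    S = np.diag(s)
    return R @ S @ S.T @ R.T

def isotropic_cov(sigma):
    """Isotropic kernel: a single scale, Sigma = sigma^2 * I. The
    rotation disappears, which is what simplifies the computation."""
    return (sigma ** 2) * np.eye(3)

# An isotropic kernel is the special case s = (sigma, sigma, sigma):
R = np.eye(3)
assert np.allclose(anisotropic_cov(R, np.full(3, 0.5)), isotropic_cov(0.5))
```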
Updated: 2024-03-21 09:02:31
Domains: cs.CV,cs.AI,cs.LG,eess.IV
Dermacen Analytica: A Novel Methodology Integrating Multi-Modal Large Language Models with Machine Learning in tele-dermatology
The rise of Artificial Intelligence creates great promise in the field of medical discovery, diagnostics and patient management. However, the vast complexity of all medical domains requires a more complex approach that combines machine learning algorithms, classifiers, segmentation algorithms and, lately, large language models. In this paper, we describe, implement and assess an Artificial Intelligence-empowered system and methodology aimed at assisting the diagnosis of skin lesions and other skin conditions within the field of dermatology, with the goal of holistically addressing the diagnostic process in this domain. The workflow integrates large language models, transformer-based vision models and sophisticated machine learning tools. This holistic approach achieves a nuanced interpretation of dermatological conditions that simulates and facilitates a dermatologist's workflow. We assess our proposed methodology through a thorough cross-model validation technique embedded in an evaluation pipeline that utilizes publicly available medical case studies of skin conditions and relevant images. To quantitatively score the system performance, advanced machine learning and natural language processing tools are employed, focusing on similarity comparison and natural language inference. Additionally, we incorporate a human expert evaluation process based on a structured checklist to further validate our results. We implemented the proposed methodology in a system which achieved approximate (weighted) scores of 0.87 for both contextual understanding and diagnostic accuracy, demonstrating the efficacy of our approach in enhancing dermatological analysis. The proposed methodology is expected to prove useful in the development of next-generation tele-dermatology applications, enhancing remote consultation capabilities and access to care, especially in underserved areas.
Updated: 2024-03-21 09:02:17
Domains: cs.CL,cs.AI,cs.CV
On the consistency of supervised learning with missing values
In many application settings, the data have missing entries, which makes analysis challenging. An abundant literature addresses missing values in an inferential framework: estimating parameters and their variance from incomplete tables. Here, we consider supervised-learning settings: predicting a target when missing values appear in both training and testing data. We show the consistency of two approaches in prediction. A striking result is that the widely used method of imputing with a constant, such as the mean, prior to learning is consistent when missing values are not informative. This contrasts with inferential settings, where mean imputation is criticized for distorting the distribution of the data. That such a simple approach can be consistent is important in practice. We also show that a predictor suited for complete observations can predict optimally on incomplete data, through multiple imputation. Finally, to compare imputation with learning directly with a model that accounts for missing values, we further analyze decision trees. These can naturally tackle empirical risk minimization with missing values, due to their ability to handle the half-discrete nature of incomplete variables. After comparing different missing-values strategies in trees theoretically and empirically, we recommend using the "missing incorporated in attribute" method, as it can handle both non-informative and informative missing values.
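Both strategies are available off the shelf; a short scikit-learn sketch on synthetic data (the completely-at-random missingness here is an illustrative assumption):

```python
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X[rng.random(X.shape) < 0.2] = np.nan   # 20% of entries go missing

# 1) Constant (mean) imputation before a standard learner: consistent
#    for prediction when the missingness is not informative.
imputed = make_pipeline(SimpleImputer(strategy="mean"),
                        LogisticRegression()).fit(X, y)

# 2) Trees handling NaN natively, in the spirit of "missing incorporated
#    in attribute": each split learns which side to route missing values
#    to, so both non-informative and informative missingness are usable.
trees = HistGradientBoostingClassifier().fit(X, y)
```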
Updated: 2024-03-21 09:01:19
Domains: stat.ML,cs.LG,math.ST,stat.TH
Reinforcement Learning from Reflective Feedback (RLRF): Aligning and Improving LLMs via Fine-Grained Self-Reflection
Despite the promise of RLHF in aligning LLMs with human preferences, it often leads to superficial alignment, prioritizing stylistic changes over improving downstream performance of LLMs. Underspecified preferences could obscure directions to align the models. Lacking exploration restricts identification of desirable outputs to improve the models. To overcome these challenges, we propose a novel framework: Reinforcement Learning from Reflective Feedback (RLRF), which leverages fine-grained feedback based on detailed criteria to improve the core capabilities of LLMs. RLRF employs a self-reflection mechanism to systematically explore and refine LLM responses, then fine-tuning the models via a RL algorithm along with promising responses. Our experiments across Just-Eval, Factuality, and Mathematical Reasoning demonstrate the efficacy and transformative potential of RLRF beyond superficial surface-level adjustment.
Updated: 2024-03-21 08:57:27
Domains: cs.CL,cs.AI
A Survey of Neural Code Intelligence: Paradigms, Advances and Beyond
Neural Code Intelligence -- leveraging deep learning to understand, generate, and optimize code -- holds immense potential for transformative impacts on the whole society. Bridging the gap between Natural Language and Programming Language, this domain has drawn significant attention from researchers in both research communities over the past few years. This survey presents a systematic and chronological review of the advancements in code intelligence, encompassing over 50 representative models and their variants, more than 20 categories of tasks, and an extensive coverage of over 680 related works. We follow the historical progression to trace the paradigm shifts across different research phases (e.g., from modeling code with recurrent neural networks to the era of Large Language Models). Concurrently, we highlight the major technical transitions in models, tasks, and evaluations spanning through different stages. For applications, we also observe a co-evolving shift. It spans from initial endeavors to tackling specific scenarios, through exploring a diverse array of tasks during its rapid expansion, to currently focusing on tackling increasingly complex and varied real-world challenges. Building on our examination of the developmental trajectories, we further investigate the emerging synergies between code intelligence and broader machine intelligence, uncovering new cross-domain opportunities and illustrating the substantial influence of code intelligence across various domains. Finally, we delve into both the opportunities and challenges associated with this field, alongside elucidating our insights on the most promising research directions. An ongoing, dynamically updated project and resources associated with this survey have been released at https://github.com/QiushiSun/NCISurvey.
Updated: 2024-03-21 08:54:56
Domains: cs.SE,cs.AI,cs.CL,cs.PL
A Unified Framework for Model Editing
Model editing is a growing area focused on updating the knowledge embedded within models. Among the various methodologies, ROME and MEMIT stand out as leading "locate-and-edit" model editing techniques. While MEMIT enables batched editing of memories, ROME is limited to changing one fact at a time. This paper introduces a unifying framework that brings ROME and MEMIT under a single conceptual umbrella, optimizing for the same goal, which we call the "preservation-memorization" objective. This objective aims to preserve the representations of certain selected vectors while memorizing the representations of new factual information. Specifically, ROME optimizes this objective using an equality constraint, whereas MEMIT employs a more flexible least-square constraint. In addition to making batched edits, MEMIT also edits the model at multiple layers. We disentangle the distribution of edits to multiple layers from the optimization objective of MEMIT and show that these edit-distribution algorithms should be considered separate entities worthy of their own line of research. Finally, we present EMMET - an Equality-constrained Mass Model Editing algorithm for Transformers, a new batched memory-editing algorithm. With EMMET, we present a closed form solution for the equality-constrained version of the preservation-memorization objective. We show that EMMET is able to perform batched-edits on par with MEMIT up to a batch-size of 256 and discuss the challenges in stabilizing EMMET. By articulating the "locate-and-edit" model editing algorithms under a simple conceptual framework of "preservation-memorization", we aim to bridge the gap between intuition and mathematics and hope to simplify the journey for future researchers in model editing.
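For reference, the preservation-memorization objective as the abstract describes it can be written as follows (notation assumed: W_0 the current weights, K_0 the key vectors whose representations are preserved, and (K_E, V_E) the new facts to memorize):

```latex
% ROME/EMMET: preserve in least squares, memorize as a hard constraint
\hat{W} = \arg\min_{W} \left\lVert W K_0 - W_0 K_0 \right\rVert_F^2
\quad \text{s.t.} \quad W K_E = V_E
% MEMIT: relax memorization into a weighted least-squares term
\hat{W} = \arg\min_{W} \left\lVert W K_0 - W_0 K_0 \right\rVert_F^2
          + \lambda \left\lVert W K_E - V_E \right\rVert_F^2
```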
Updated: 2024-03-21 08:54:24
Categories: cs.LG,cs.AI,cs.CL
Differentially Private Linear Bandits with Partial Distributed Feedback
In this paper, we study the problem of global reward maximization with only partial distributed feedback. This problem is motivated by several real-world applications (e.g., cellular network configuration, dynamic pricing, and policy selection) where an action taken by a central entity influences a large population that contributes to the global reward. However, collecting such reward feedback from the entire population not only incurs a prohibitively high cost but also raises privacy concerns. To tackle this problem, we consider differentially private distributed linear bandits, where only a subset of users from the population are selected (called clients) to participate in the learning process and the central server learns the global model from such partial feedback by iteratively aggregating these clients' local feedback in a differentially private fashion. We then propose a unified algorithmic learning framework, called differentially private distributed phased elimination (DP-DPE), which can be naturally integrated with popular differential privacy (DP) models (including central DP, local DP, and shuffle DP). Furthermore, we prove that DP-DPE achieves both sublinear regret and sublinear communication cost. Interestingly, DP-DPE also achieves privacy protection ``for free'' in the sense that the additional cost due to privacy guarantees is a lower-order additive term. In addition, as a by-product of our techniques, the same ``free'' privacy results can also be achieved for the standard differentially private linear bandits. Finally, we conduct simulations to corroborate our theoretical results and demonstrate the effectiveness of DP-DPE.
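As a toy illustration of the central-DP flavor of such a scheme (a sketch under our own assumptions, not the paper's algorithm; the noise calibration is the textbook Gaussian mechanism):

```python
import numpy as np

def dp_aggregate(client_means, sensitivity, eps, delta, rng=None):
    """Average the selected clients' local feedback and add Gaussian noise
    calibrated to (eps, delta)-DP via the standard Gaussian mechanism."""
    rng = rng or np.random.default_rng()
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
    mean = np.mean(client_means, axis=0)
    return mean + rng.normal(0.0, sigma, size=mean.shape)

# In each phase, the server would use such noisy aggregates to eliminate
# suboptimal actions, in the spirit of phased elimination.
noisy = dp_aggregate(np.random.rand(50, 4), sensitivity=1.0 / 50, eps=1.0, delta=1e-5)
```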
Updated: 2024-03-21 08:53:03
Categories: cs.LG,cs.CR,cs.NA,math.NA
RG-CAT: Detection Pipeline and Catalogue of Radio Galaxies in the EMU Pilot Survey
We present source detection and catalogue construction pipelines to build the first catalogue of radio galaxies from the 270 $\rm deg^2$ pilot survey of the Evolutionary Map of the Universe (EMU-PS) conducted with the Australian Square Kilometre Array Pathfinder (ASKAP) telescope. The detection pipeline uses Gal-DINO computer-vision networks (Gupta et al., 2024) to predict the categories of radio morphology and bounding boxes for radio sources, as well as their potential infrared host positions. The Gal-DINO network is trained and evaluated on approximately 5,000 visually inspected radio galaxies and their infrared hosts, encompassing both compact and extended radio morphologies. We find that the Intersection over Union (IoU) for the predicted and ground truth bounding boxes is larger than 0.5 for 99% of the radio sources, and 98% of predicted host positions are within $3^{\prime \prime}$ of the ground truth infrared host in the evaluation set. The catalogue construction pipeline uses the predictions of the trained network on the radio and infrared image cutouts based on the catalogue of radio components identified using the Selavy source finder algorithm. Confidence scores of the predictions are then used to prioritize Selavy components with higher scores and incorporate them first into the catalogue. This results in identifications for a total of 211,625 radio sources, with 201,211 classified as compact and unresolved. The remaining 10,414 are categorized as extended radio morphologies, including 582 FR-I, 5,602 FR-II, 1,494 FR-x (uncertain whether FR-I or FR-II), 2,375 R (single-peak resolved) radio galaxies, and 361 with peculiar and other rare morphologies. We cross-match the radio sources in the catalogue with the infrared and optical catalogues, finding infrared cross-matches for 73% and photometric redshifts for 36% of the radio galaxies.
Updated: 2024-03-21 08:52:39
Categories: astro-ph.GA,astro-ph.CO,astro-ph.IM,cs.CV,cs.LG
SoftPatch: Unsupervised Anomaly Detection with Noisy Data
Although mainstream unsupervised anomaly detection (AD) algorithms perform well on academic datasets, their performance is limited in practical applications because of the idealized experimental assumption of clean training data. Training with noisy data is an inevitable problem in real-world anomaly detection but is seldom discussed. This paper considers label-level noise in image sensory anomaly detection for the first time. To solve this problem, we propose a memory-based unsupervised AD method, SoftPatch, which efficiently denoises the data at the patch level. Noise discriminators are utilized to generate outlier scores for patch-level noise elimination before coreset construction. The scores are then stored in the memory bank to soften the anomaly detection boundary. Compared with existing methods, SoftPatch maintains a strong modeling ability of normal data and alleviates the overconfidence problem in the coreset. Comprehensive experiments in various noise scenes demonstrate that SoftPatch outperforms the state-of-the-art AD methods on the MVTecAD and BTAD benchmarks and is comparable to those methods under the setting without noise.
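A minimal sketch of the patch-level denoising idea, using mean kNN distance as a stand-in for the paper's noise discriminators (the actual discriminators and weighting scheme differ):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def build_soft_memory(patch_feats, drop_ratio=0.05, k=5):
    """Score each training patch by its mean kNN distance, drop the noisiest
    patches before coreset construction, and keep the remaining scores
    alongside the memory bank."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(patch_feats)
    dists, _ = nn.kneighbors(patch_feats)
    noise = dists[:, 1:].mean(axis=1)                    # exclude self-distance
    keep = np.argsort(noise)[: int(len(patch_feats) * (1 - drop_ratio))]
    noise_kept = noise[keep] / (noise[keep].max() + 1e-12)
    return patch_feats[keep], noise_kept

def anomaly_score(test_feat, memory_bank, noise_kept):
    """Nearest-neighbour distance, softened: matches to noisier memorized
    patches are trusted less, so their distances are inflated."""
    d = np.linalg.norm(memory_bank - test_feat, axis=1)
    i = int(d.argmin())
    return d[i] * (1.0 + noise_kept[i])
```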
Updated: 2024-03-21 08:49:34
Categories: cs.CV,cs.AI,cs.LG
Contrastive Balancing Representation Learning for Heterogeneous Dose-Response Curves Estimation
Estimating individuals' potential responses to varying treatment doses is crucial for decision-making in areas such as precision medicine and management science. Most recent studies predict counterfactual outcomes by learning a covariate representation that is independent of the treatment variable. However, such independence constraints neglect much of the covariate information that is useful for counterfactual prediction, especially when the treatment variables are continuous. To tackle the above issue, in this paper, we first theoretically demonstrate the importance of balancing and prognostic representations for unbiased estimation of heterogeneous dose-response curves; that is, the learned representations are constrained to satisfy conditional independence between the covariates and both the treatment variables and the potential responses. Based on this, we propose a novel Contrastive balancing Representation learning Network using a partial distance measure, called CRNet, for estimating the heterogeneous dose-response curves without losing the continuity of treatments. Extensive experiments are conducted on synthetic and real-world datasets, demonstrating that our proposal significantly outperforms previous methods.
Updated: 2024-03-21 08:41:53
Categories: cs.LG
Recovering Latent Confounders from High-dimensional Proxy Variables
Detecting latent confounders from proxy variables is an essential problem in causal effect estimation. Previous approaches are limited to low-dimensional proxies, sorted proxies, and binary treatments. We remove these assumptions and present a novel Proxy Confounder Factorization (PCF) framework for continuous treatment effect estimation when latent confounders manifest through high-dimensional, mixed proxy variables. In the high-sample-size regime, our two-step PCF implementation using Independent Component Analysis (ICA-PCF) and our end-to-end implementation using Gradient Descent (GD-PCF) achieve high correlation with the latent confounder and low absolute error in causal effect estimation on synthetic datasets. Even on real climate data, ICA-PCF recovers four components that explain $75.9\%$ of the variance in the North Atlantic Oscillation, a known confounder of precipitation patterns in Europe. Code for our PCF implementations and experiments can be found here: https://github.com/IPL-UV/confound_it. The proposed methodology constitutes a stepping stone towards discovering latent confounders and can be applied to many problems in disciplines dealing with high-dimensional observed proxies, e.g., spatiotemporal fields.
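A toy sketch of the two-step idea on hypothetical data (a plain ICA recovery followed by linear adjustment; the paper's estimator and evaluation are more involved):

```python
import numpy as np
from sklearn.decomposition import FastICA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
z = rng.normal(size=(5000, 1))                       # latent confounder (unobserved)
proxies = z @ rng.normal(size=(1, 50)) + 0.1 * rng.normal(size=(5000, 50))
t = 2.0 * z[:, 0] + rng.normal(size=5000)            # treatment depends on z
y = 1.5 * t + 3.0 * z[:, 0] + rng.normal(size=5000)  # outcome depends on t and z

z_hat = FastICA(n_components=1, random_state=0).fit_transform(proxies)
X = np.column_stack([t, z_hat])                      # adjust for recovered confounder
effect = LinearRegression().fit(X, y).coef_[0]       # should land near the true 1.5
print(f"estimated effect: {effect:.2f}")
```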
Updated: 2024-03-21 08:39:13
Categories: stat.ML,cs.LG
PeerGPT: Probing the Roles of LLM-based Peer Agents as Team Moderators and Participants in Children's Collaborative Learning
In children's collaborative learning, effective peer conversations can significantly enhance the quality of children's collaborative interactions. Integrating Large Language Model (LLM) agents into this setting allows us to explore their novel role as peers and to assess their impact as team moderators and as participants. We invited two groups of participants to engage in a collaborative learning workshop, where they discussed and proposed conceptual solutions to a design problem. The peer conversation transcripts were analyzed using thematic analysis. We discovered that peer agents, while managing discussions effectively as team moderators, sometimes have their instructions disregarded. As participants, they foster children's creative thinking but may not consistently provide timely feedback. These findings highlight potential design improvements and considerations for peer agents in both roles.
Updated: 2024-03-21 08:37:15
Categories: cs.HC,cs.AI
LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models
Efficient fine-tuning is vital for adapting large language models (LLMs) to downstream tasks. However, it requires non-trivial efforts to implement these methods on different models. We present LlamaFactory, a unified framework that integrates a suite of cutting-edge efficient training methods. It allows users to flexibly customize the fine-tuning of 100+ LLMs, without writing any code, through the built-in web UI LlamaBoard. We empirically validate the efficiency and effectiveness of our framework on language modeling and text generation tasks. It has been released at https://github.com/hiyouga/LLaMA-Factory and has already received over 13,000 stars and 1,600 forks.
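LlamaFactory itself is driven through LlamaBoard or configuration files rather than code; as a rough sketch of the QLoRA-style recipe it builds on, here is the same idea expressed directly with the Hugging Face transformers and peft APIs (the model name is an assumed example):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Load the base model in 4-bit NF4 quantization, then attach low-rank adapters.
bnb = BitsAndBytesConfig(load_in_4bit=True,
                         bnb_4bit_quant_type="nf4",
                         bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf",
                                             quantization_config=bnb,
                                             device_map="auto")
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                  target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora)   # only the adapter weights are trainable
model.print_trainable_parameters()
```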
Updated: 2024-03-21 08:36:39
Categories: cs.CL,cs.LG
Posterior concentrations of fully-connected Bayesian neural networks with general priors on the weights
Bayesian approaches for training deep neural networks (BNNs) have received significant interest and have been effectively utilized in a wide range of applications. There have been several studies on the properties of posterior concentrations of BNNs. However, most of these studies only demonstrate results in BNN models with sparse or heavy-tailed priors. Surprisingly, no theoretical results currently exist for BNNs using Gaussian priors, which are the most commonly used. The lack of theory arises from the absence of approximation results for Deep Neural Networks (DNNs) that are non-sparse and have bounded parameters. In this paper, we present a new approximation theory for non-sparse DNNs with bounded parameters. Additionally, based on this approximation theory, we show that BNNs with non-sparse general priors can achieve near-minimax optimal posterior concentration rates to the true model.
Updated: 2024-03-21 08:31:36
Categories: stat.ML,cs.LG
Weighted least-squares approximation with determinantal point processes and generalized volume sampling
We consider the problem of approximating a function from $L^2$ by an element of a given $m$-dimensional space $V_m$, associated with some feature map $\varphi$, using evaluations of the function at random points $x_1,\dots,x_n$. After recalling some results on optimal weighted least-squares using independent and identically distributed points, we consider weighted least-squares using projection determinantal point processes (DPP) or volume sampling. These distributions introduce dependence between the points that promotes diversity in the selected features $\varphi(x_i)$. We first provide a generalized version of volume-rescaled sampling yielding quasi-optimality results in expectation with a number of samples $n = O(m\log(m))$, meaning that the expected $L^2$ error is bounded by a constant times the best approximation error in $L^2$. Further assuming that the function lies in some normed vector space $H$ continuously embedded in $L^2$, we prove that the approximation is almost surely bounded by the best approximation error measured in the $H$-norm. This includes the cases of functions from $L^\infty$ or reproducing kernel Hilbert spaces. Finally, we present an alternative strategy consisting in using independent repetitions of projection DPP (or volume sampling), yielding similar error bounds as with i.i.d. or volume sampling, but in practice with a much lower number of samples. Numerical experiments illustrate the performance of the different strategies.
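For reference, the i.i.d. baseline the abstract recalls can be written down explicitly (standard optimal weighted least-squares, in our notation): with an $L^2(\mu)$-orthonormal basis $(\varphi_j)_{j=1}^m$ of $V_m$, one draws $x_1,\dots,x_n$ i.i.d. from the density

$$ d\rho(x) = \frac{1}{m}\sum_{j=1}^m |\varphi_j(x)|^2 \, d\mu(x), \qquad w(x) = \frac{d\mu}{d\rho}(x) = \frac{m}{\sum_{j=1}^m |\varphi_j(x)|^2}, $$

and solves

$$ \hat{f} = \arg\min_{v \in V_m} \frac{1}{n}\sum_{i=1}^n w(x_i)\,\lvert f(x_i) - v(x_i)\rvert^2 . $$

DPP and volume-rescaled sampling replace the i.i.d. draw with a joint distribution over $(x_1,\dots,x_n)$ that repels similar feature vectors $\varphi(x_i)$, which is what yields the quasi-optimality with $n = O(m\log m)$ discussed above.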
Updated: 2024-03-21 08:29:32
Categories: math.NA,cs.LG,cs.NA,math.ST,stat.TH
Sequence-to-Sequence Language Models for Character and Emotion Detection in Dream Narratives
The study of dreams has been central to understanding human (un)consciousness, cognition, and culture for centuries. Analyzing dreams quantitatively depends on labor-intensive, manual annotation of dream narratives. We automate this process through a natural language sequence-to-sequence generation framework. This paper presents the first study on character and emotion detection in the English portion of the open DreamBank corpus of dream narratives. Our results show that language models can effectively address this complex task. To get insight into prediction performance, we evaluate the impact of model size, prediction order of characters, and the consideration of proper names and character traits. We compare our approach with a large language model using in-context learning. Our supervised models perform better while having 28 times fewer parameters. Our model and its generated annotations are made publicly available.
Updated: 2024-03-21 08:27:49
Categories: cs.CL,cs.AI
PyVRP: a high-performance VRP solver package
We introduce PyVRP, a Python package that implements hybrid genetic search in a state-of-the-art vehicle routing problem (VRP) solver. The package is designed for the VRP with time windows (VRPTW), but can be easily extended to support other VRP variants. PyVRP combines the flexibility of Python with the performance of C++, by implementing (only) performance-critical parts of the algorithm in C++, while being fully customisable at the Python level. PyVRP is a polished implementation of the algorithm that ranked 1st in the 2021 DIMACS VRPTW challenge and, after improvements, ranked 1st on the static variant of the EURO meets NeurIPS 2022 vehicle routing competition. The code follows good software engineering practices, and is well-documented and unit tested. PyVRP is freely available under the liberal MIT license. Through numerical experiments we show that PyVRP achieves state-of-the-art results on the VRPTW and capacitated VRP. We hope that PyVRP enables researchers and practitioners to easily and quickly build on a state-of-the-art VRP solver.
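A small usage sketch in the spirit of PyVRP's documented modelling interface. Method names and signatures here are assumptions drawn from recent releases and may differ across versions; treat this as an illustration rather than canonical usage:

```python
from pyvrp import Model
from pyvrp.stop import MaxRuntime

m = Model()
m.add_vehicle_type(num_available=2, capacity=10)
depot = m.add_depot(x=0, y=0)
clients = [m.add_client(x=x, y=y, delivery=1) for x, y in [(1, 1), (2, 2), (3, 1)]]
for frm in m.locations:                       # complete graph, Manhattan distances
    for to in m.locations:
        m.add_edge(frm, to, distance=abs(frm.x - to.x) + abs(frm.y - to.y))
res = m.solve(stop=MaxRuntime(1))             # run hybrid genetic search for 1 second
print(res.cost())
```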
Updated: 2024-03-21 08:14:36
Categories: cs.NE,cs.LG
Open Knowledge Base Canonicalization with Multi-task Learning
The construction of large open knowledge bases (OKBs) is integral to many knowledge-driven applications on the world wide web such as web search. However, noun phrases and relational phrases in OKBs often suffer from redundancy and ambiguity, which calls for investigation into OKB canonicalization. Current solutions address OKB canonicalization by devising advanced clustering algorithms and using knowledge graph embedding (KGE) to further facilitate the canonicalization process. Nevertheless, these works fail to fully exploit the synergy between clustering and KGE learning, and the methods designed for these subtasks are sub-optimal. To this end, we put forward a multi-task learning framework, namely MulCanon, to tackle OKB canonicalization. In addition, a diffusion model is used in the soft clustering process to enrich noun phrase representations with neighborhood information, leading to more accurate representations. MulCanon unifies the learning objectives of these sub-tasks, and adopts a two-stage multi-task learning paradigm for training. A thorough experimental study on popular OKB canonicalization benchmarks validates that MulCanon can achieve competitive canonicalization results.
Updated: 2024-03-21 08:03:46
Categories: cs.AI,cs.CL,cs.LG
Unsupervised Audio-Visual Segmentation with Modality Alignment
Audio-Visual Segmentation (AVS) aims to identify, at the pixel level, the object in a visual scene that produces a given sound. Current AVS methods rely on costly fine-grained annotations of mask-audio pairs, making them impractical for scalability. To address this, we introduce unsupervised AVS, eliminating the need for such expensive annotation. To tackle this more challenging problem, we propose an unsupervised learning method, named Modality Correspondence Alignment (MoCA), which seamlessly integrates off-the-shelf foundation models like DINO, SAM, and ImageBind. This approach leverages their knowledge complementarity and optimizes their joint usage for multi-modality association. Initially, we estimate positive and negative image pairs in the feature space. For pixel-level association, we introduce an audio-visual adapter and a novel pixel matching aggregation strategy within the image-level contrastive learning framework. This allows for a flexible connection between object appearance and audio signal at the pixel level, with tolerance to imaging variations such as translation and rotation. Extensive experiments on the AVSBench (single and multi-object splits) and AVSS datasets demonstrate that our MoCA outperforms strongly designed baseline methods and approaches its supervised counterparts, particularly in complex scenarios with multiple auditory objects. Notably, when comparing mIoU, MoCA achieves a substantial improvement over baselines in both the AVSBench (S4: +17.24%; MS3: +67.64%) and AVSS (+19.23%) audio-visual segmentation challenges.
Updated: 2024-03-21 07:56:09
Categories: cs.CV,cs.AI
Debiasing surgeon: fantastic weights and how to find them
The emergence of algorithmic biases that can lead to unfair models is an ever-growing concern. Several debiasing approaches have been proposed in the realm of deep learning, employing more or less sophisticated techniques to discourage these models from massively relying on such biases. However, a question emerges: is this extra complexity really necessary? Does a vanilla-trained model already embody some ``unbiased sub-networks'' that can be used in isolation and provide a solution without relying on the algorithmic biases? In this work, we show that such a sub-network typically exists, and can be extracted from a vanilla-trained model without requiring additional training. We further validate that such a specific architecture is incapable of learning a specific bias, suggesting that there are possible architectural countermeasures to the problem of biases in deep neural networks.
Updated: 2024-03-21 07:50:45
Categories: cs.LG,cs.AI,cs.CV,cs.CY
MOGAM: A Multimodal Object-oriented Graph Attention Model for Depression Detection
Early detection plays a crucial role in the treatment of depression. Therefore, numerous studies have focused on social media platforms, where individuals express their emotions, aiming to achieve early detection of depression. However, the majority of existing approaches often rely on specific features, leading to limited scalability across different types of social media datasets, such as text, images, or videos. To overcome this limitation, we introduce a Multimodal Object-Oriented Graph Attention Model (MOGAM), which can be applied to diverse types of data, offering a more scalable and versatile solution. Furthermore, to ensure that our model can capture authentic symptoms of depression, we only include vlogs from users with a clinical diagnosis. To leverage the diverse features of vlogs, we adopt a multimodal approach and collect additional metadata such as the title, description, and duration of the vlogs. To effectively aggregate these multimodal features, we employed a cross-attention mechanism. MOGAM achieved an accuracy of 0.871 and an F1-score of 0.888. Moreover, to validate the scalability of MOGAM, we evaluated its performance with a benchmark dataset and achieved comparable results with prior studies (0.61 F1-score). In conclusion, we believe that the proposed model, MOGAM, is an effective solution for detecting depression in social media, offering potential benefits in the early detection and treatment of this mental health condition.
Updated: 2024-03-21 07:45:58
Categories: cs.CL,cs.AI,cs.LG
LMM-Assisted Breast Cancer Treatment Target Segmentation with Consistency Embedding
Recent advancements in Artificial Intelligence (AI) have profoundly influenced medical fields by providing tools to reduce clinical workloads. However, most AI models are constrained to execute unimodal tasks, in stark contrast to the comprehensive approaches utilized by medical professionals. To address this, here we present RO-LMM, a multi-purpose large multimodal model (LMM) tailored for the field of radiation oncology. This model covers a series of tasks within the clinical workflow, adept at clinical report summarization, radiation treatment plan suggestion, and plan-guided target volume segmentation. In particular, to perform consecutive clinical tasks, we further present a novel Consistency Embedding Fine-Tuning (CEFTune) technique, which boosts the LMM's robustness to noisy inputs while preserving its capability of handling clean inputs, and we extend this concept into an LMM-driven segmentation framework, Consistency Embedding Segmentation (CESEG). Experimental results on multi-centre cohorts demonstrate our RO-LMM's promising performance and generalization capabilities across multiple clinical tasks.
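The abstract does not spell CEFTune out, but a plausible shape for a consistency-embedding loss is a task loss on clean inputs plus a penalty tying noisy-input embeddings to their clean counterparts. The sketch below is purely our guess on continuous inputs; `embed`, `head`, and the additive-noise model are hypothetical stand-ins:

```python
import torch
import torch.nn.functional as F

def ceftune_style_loss(embed, head, clean_inputs, labels, lam=1.0, noise_std=0.1):
    """Task loss on clean inputs + consistency penalty pulling the embedding
    of a perturbed input toward the (detached) clean embedding."""
    e_clean = embed(clean_inputs)
    e_noisy = embed(clean_inputs + noise_std * torch.randn_like(clean_inputs))
    task = F.cross_entropy(head(e_clean), labels)
    consistency = F.mse_loss(e_noisy, e_clean.detach())
    return task + lam * consistency
```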
Updated: 2024-03-21 07:38:51
Categories: cs.CV,cs.AI,cs.LG
Complementarity in Human-AI Collaboration: Concept, Sources, and Evidence
Artificial intelligence (AI) can improve human decision-making in various application areas. Ideally, collaboration between humans and AI should lead to complementary team performance (CTP) -- a level of performance that neither of them can attain individually. So far, however, CTP has rarely been observed, suggesting an insufficient understanding of the complementary constituents in human-AI collaboration that can contribute to CTP in decision-making. This work establishes a holistic theoretical foundation for understanding and developing human-AI complementarity. We conceptualize complementarity by introducing and formalizing the notion of complementarity potential and its realization. Moreover, we identify and outline sources that explain CTP. We illustrate our conceptualization by applying it in two empirical studies exploring two different sources of complementarity potential. In the first study, we focus on information asymmetry as a source and, in a real estate appraisal use case, demonstrate that humans can leverage unique contextual information to achieve CTP. In the second study, we focus on capability asymmetry as an alternative source, demonstrating how heterogeneous capabilities can help achieve CTP. Our work provides researchers with a theoretical foundation of complementarity in human-AI decision-making and demonstrates that leveraging sources of complementarity potential constitutes a viable pathway toward effective human-AI collaboration.
Updated: 2024-03-21 07:27:17
Categories: cs.HC,cs.AI
Quantum-activated neural reservoirs on-chip open up large hardware security models for resilient authentication
Quantum artificial intelligence is a frontier of artificial intelligence research, pioneering quantum AI-powered circuits to address problems beyond the reach of deep learning with classical architectures. This work implements a large-scale quantum-activated recurrent neural network possessing more than 3 trillion hardware nodes/cm$^2$, originating from repeatable atomic-scale nucleation dynamics in an amorphous material integrated on-chip, controlled with 0.07 nW electric power per readout channel. Compared to the best-performing reservoirs currently reported, this implementation increases the scale of the network by two orders of magnitude and reduces the power consumption by six orders of magnitude, reaching power efficiencies in the range of the human brain, dissipating 0.2 nW/neuron. When interrogated by a classical input, the chip implements a large-scale hardware security model, enabling dictionary-free authentication secure against statistical inference attacks, including those from present and future AI, even for an adversary with a copy of all the classical components available. Experimental tests report 99.6% reliability, 100% user authentication accuracy, and an ideal 50% key uniqueness. Due to its quantum nature, the chip supports a bit density per feature size area three times higher than the best technology available, with the capacity to store more than $2^{1104}$ keys in a footprint of 1 cm$^2$. Such a quantum-powered platform could help counteract the emerging form of warfare led by the cybercrime industry in breaching authentication to target small to large-scale facilities, from private users to intelligent energy grids.
Updated: 2024-03-21 07:25:52
Categories: cond-mat.dis-nn,cs.AI,cs.CR
StyleCineGAN: Landscape Cinemagraph Generation using a Pre-trained StyleGAN
We propose a method that can generate cinemagraphs automatically from a still landscape image using a pre-trained StyleGAN. Inspired by the success of recent unconditional video generation, we leverage a powerful pre-trained image generator to synthesize high-quality cinemagraphs. Unlike previous approaches that mainly utilize the latent space of a pre-trained StyleGAN, our approach utilizes its deep feature space for both GAN inversion and cinemagraph generation. Specifically, we propose multi-scale deep feature warping (MSDFW), which warps the intermediate features of a pre-trained StyleGAN at different resolutions. By using MSDFW, the generated cinemagraphs are of high resolution and exhibit plausible looping animation. We demonstrate the superiority of our method through user studies and quantitative comparisons with state-of-the-art cinemagraph generation methods and a video generation method that uses a pre-trained StyleGAN.
Updated: 2024-03-21 07:21:51
Categories: cs.CV,cs.AI,cs.GR
ED-NeRF: Efficient Text-Guided Editing of 3D Scene with Latent Space NeRF
Recently, there has been a significant advancement in text-to-image diffusion models, leading to groundbreaking performance in 2D image generation. These advancements have been extended to 3D models, enabling the generation of novel 3D objects from textual descriptions. This has evolved into NeRF editing methods, which allow the manipulation of existing 3D objects through textual conditioning. However, existing NeRF editing techniques have faced limitations in their performance due to slow training speeds and the use of loss functions that do not adequately consider editing. To address this, here we present a novel 3D NeRF editing approach dubbed ED-NeRF by successfully embedding real-world scenes into the latent space of the latent diffusion model (LDM) through a unique refinement layer. This approach enables us to obtain a NeRF backbone that is not only faster but also more amenable to editing compared to traditional image space NeRF editing. Furthermore, we propose an improved loss function tailored for editing by migrating the delta denoising score (DDS) distillation loss, originally used in 2D image editing, to the three-dimensional domain. This novel loss function surpasses the well-known score distillation sampling (SDS) loss in terms of suitability for editing purposes. Our experimental results demonstrate that ED-NeRF achieves faster editing speed while producing improved output quality compared to state-of-the-art 3D editing models.
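For context on the migrated loss: in 2D editing, delta denoising score replaces the raw SDS gradient with the difference of two score predictions, cancelling the noisy direction shared by source and target. In our transcription of the known 2D formulation (the paper adapts it to NeRF parameters $\theta$):

$$ \nabla_\theta \mathcal{L}_{\text{SDS}} = \mathbb{E}_{t,\epsilon}\!\left[ w(t)\left( \epsilon_\phi(x_t, y, t) - \epsilon \right) \frac{\partial x}{\partial \theta} \right], $$

$$ \nabla_\theta \mathcal{L}_{\text{DDS}} = \mathbb{E}_{t,\epsilon}\!\left[ w(t)\left( \epsilon_\phi(x_t, y_{\text{tgt}}, t) - \epsilon_\phi(\hat{x}_t, y_{\text{src}}, t) \right) \frac{\partial x}{\partial \theta} \right], $$

where $x$ is the (here, latent) image being optimized, $\hat{x}$ is the source image, and $y_{\text{src}}, y_{\text{tgt}}$ are the source and target prompts.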
Updated: 2024-03-21 07:20:35
Categories: cs.CV,cs.AI,cs.LG,stat.ML
QH9: A Quantum Hamiltonian Prediction Benchmark for QM9 Molecules
Supervised machine learning approaches have been increasingly used in accelerating electronic structure prediction as surrogates of first-principles computational methods, such as density functional theory (DFT). While numerous quantum chemistry datasets focus on chemical properties and atomic forces, the ability to achieve accurate and efficient prediction of the Hamiltonian matrix is highly desired, as it is the most important and fundamental physical quantity that determines the quantum states of physical systems and chemical properties. In this work, we generate a new Quantum Hamiltonian dataset, named as QH9, to provide precise Hamiltonian matrices for 999 or 2998 molecular dynamics trajectories and 130,831 stable molecular geometries, based on the QM9 dataset. By designing benchmark tasks with various molecules, we show that current machine learning models have the capacity to predict Hamiltonian matrices for arbitrary molecules. Both the QH9 dataset and the baseline models are provided to the community through an open-source benchmark, which can be highly valuable for developing machine learning methods and accelerating molecular and materials design for scientific and technological applications. Our benchmark is publicly available at https://github.com/divelab/AIRS/tree/main/OpenDFT/QHBench.
Updated: 2024-03-21 07:16:03
Categories: physics.chem-ph,cs.AI,cs.LG
OTSeg: Multi-prompt Sinkhorn Attention for Zero-Shot Semantic Segmentation
The recent success of CLIP has demonstrated promising results in zero-shot semantic segmentation by transferring multimodal knowledge to pixel-level classification. However, leveraging pre-trained CLIP knowledge to closely align text embeddings with pixel embeddings still has limitations in existing approaches. To address this issue, we propose OTSeg, a novel multimodal attention mechanism aimed at enhancing the potential of multiple text prompts for matching associated pixel embeddings. We first propose Multi-Prompts Sinkhorn (MPS) based on the Optimal Transport (OT) algorithm, which leads multiple text prompts to selectively focus on various semantic features within image pixels. Moreover, inspired by the success of Sinkformers in unimodal settings, we introduce the extension of MPS, called Multi-Prompts Sinkhorn Attention (MPSA), which effectively replaces cross-attention mechanisms within the Transformer framework in multimodal settings. Through extensive experiments, we demonstrate that OTSeg achieves state-of-the-art (SOTA) performance with significant gains on Zero-Shot Semantic Segmentation (ZS3) tasks across three benchmark datasets.
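At the core of both MPS and MPSA is the entropy-regularized optimal-transport normalization computed by Sinkhorn iterations. A minimal sketch of that primitive follows, with uniform marginals assumed and a random cost standing in for text-prompt/pixel-embedding dissimilarity (the models' actual costs and marginals may differ):

```python
import torch

def sinkhorn_plan(cost, eps=0.05, n_iters=50):
    """Sinkhorn-Knopp: alternately rescale rows/columns of the Gibbs kernel
    exp(-cost/eps) until both marginals are (approximately) uniform."""
    n, m = cost.shape
    K = torch.exp(-cost / eps)
    a = torch.full((n,), 1.0 / n)
    b = torch.full((m,), 1.0 / m)
    u = torch.ones(n)
    for _ in range(n_iters):
        v = b / (K.t() @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]    # transport plan, usable as attention weights

plan = sinkhorn_plan(torch.rand(4, 196))  # e.g., 4 text prompts vs. 14x14 pixel tokens
```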
Updated: 2024-03-21 07:15:37
Categories: cs.CV,cs.AI,cs.LG,stat.ML
Task Graph offloading via Deep Reinforcement Learning in Mobile Edge Computing
Various mobile applications that comprise dependent tasks are gaining widespread popularity and are increasingly complex. These applications often have low-latency requirements, resulting in a significant surge in demand for computing resources. With the emergence of mobile edge computing (MEC), offloading application tasks onto small-scale devices deployed at the edge of the mobile network has become a critical issue for delivering a high-quality user experience. However, since the MEC environment is dynamic, most existing works on task graph offloading, which rely heavily on expert knowledge or accurate analytical models, fail to fully adapt to such environmental changes, degrading the user experience. This paper investigates task graph offloading in MEC, considering the time-varying computation capabilities of edge computing devices. To adapt to environmental changes, we model the task graph scheduling for computation offloading as a Markov Decision Process (MDP). Then, we design a deep reinforcement learning algorithm (SATA-DRL) to learn the task scheduling strategy from interaction with the environment, to improve user experience. Extensive simulations validate that SATA-DRL is superior to existing strategies in terms of reducing average makespan and deadline violations.
Updated: 2024-03-21 07:12:06
Categories: cs.DC,cs.LG
RakutenAI-7B: Extending Large Language Models for Japanese
We introduce RakutenAI-7B, a suite of Japanese-oriented large language models that achieve the best performance on the Japanese LM Harness benchmarks among the open 7B models. Along with the foundation model, we release instruction- and chat-tuned models, RakutenAI-7B-instruct and RakutenAI-7B-chat respectively, under the Apache 2.0 license.
Updated: 2024-03-21 06:56:07
Categories: cs.CL,cs.LG
Rolling bearing fault diagnosis method based on generative adversarial enhanced multi-scale convolutional neural network model
Current convolutional neural networks cannot effectively capture the correlated features among the time-domain signals of rolling bearings, and model accuracy is limited by the number and quality of samples. To address these problems, a rolling bearing fault diagnosis method based on a generative-adversarial-enhanced multi-scale convolutional neural network model is proposed. First, the Gramian angular field coding technique is used to encode the time-domain signal of the rolling bearing and generate a feature map that retains the complete information of the vibration signal. The resulting data are then divided into a training set, a validation set, and a test set. The training set is input into a Wasserstein generative adversarial network with gradient penalty to complete training, yielding new samples with features similar to the training samples, which are used to expand the original training set. Next, multi-scale convolution is used to extract the fault features of the extended training set, and the feature maps are instance-normalized to overcome the influence of differences in feature distribution. Finally, an attention mechanism is applied for adaptive weighting of the normalized features and the extraction of deep features, and fault diagnosis is completed by a softmax classifier. Compared with the ResNet method, the experimental results show that the proposed method has better generalization performance and anti-noise performance.
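The Gramian angular field encoding step is easy to sketch. Below is a minimal summation-field (GASF) variant on a synthetic stand-in signal; the paper may use the summation or difference field, and assumes a non-constant input:

```python
import numpy as np

def gramian_angular_field(x):
    """Gramian Angular Summation Field: rescale a 1-D signal to [-1, 1],
    map samples to angles, and take pairwise angle sums as an image."""
    x = np.asarray(x, dtype=float)
    x = 2 * (x - x.min()) / (x.max() - x.min()) - 1   # rescale to [-1, 1]
    phi = np.arccos(np.clip(x, -1, 1))                # polar (angular) encoding
    return np.cos(phi[:, None] + phi[None, :])        # GASF image

signal = np.sin(np.linspace(0, 8 * np.pi, 128))       # stand-in vibration signal
image = gramian_angular_field(signal)                 # (128, 128) feature map
```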
Updated: 2024-03-21 06:42:35
Categories: eess.SP,cs.LG
Leveraging Large Language Model-based Room-Object Relationships Knowledge for Enhancing Multimodal-Input Object Goal Navigation
Object-goal navigation is a crucial engineering task for the community of embodied navigation; it involves navigating to an instance of a specified object category within unseen environments. Although extensive investigations have been conducted on both end-to-end and modular-based, data-driven approaches, fully enabling an agent to comprehend the environment through perceptual knowledge and perform object-goal navigation as efficiently as humans remains a significant challenge. Recently, large language models have shown potential in this task, thanks to their powerful capabilities for knowledge extraction and integration. In this study, we propose a data-driven, modular-based approach, trained on a dataset that incorporates common-sense knowledge of object-to-room relationships extracted from a large language model. We utilize the multi-channel Swin-Unet architecture to conduct multi-task learning incorporating with multimodal inputs. The results in the Habitat simulator demonstrate that our framework outperforms the baseline by an average of 10.6% in the efficiency metric, Success weighted by Path Length (SPL). The real-world demonstration shows that the proposed approach can efficiently conduct this task by traversing several rooms. For more details and real-world demonstrations, please check our project webpage (https://sunleyuan.github.io/ObjectNav).
Updated: 2024-03-21 06:32:36
Categories: cs.RO,cs.AI,cs.CV
Policy Mirror Descent with Lookahead
Policy Mirror Descent (PMD) stands as a versatile algorithmic framework encompassing several seminal policy gradient algorithms, such as natural policy gradient, with connections to state-of-the-art reinforcement learning (RL) algorithms such as TRPO and PPO. PMD can be seen as a soft Policy Iteration algorithm implementing regularized 1-step greedy policy improvement. However, 1-step greedy policies might not be the best choice, and recent remarkable empirical successes in RL such as AlphaGo and AlphaZero have demonstrated that greedy approaches with respect to multiple steps outperform their 1-step counterparts. In this work, we propose a new class of PMD algorithms called $h$-PMD which incorporates multi-step greedy policy improvement with lookahead depth $h$ into the PMD update rule. To solve discounted infinite horizon Markov Decision Processes with discount factor $\gamma$, we show that $h$-PMD, which generalizes the standard PMD, enjoys a faster dimension-free $\gamma^h$-linear convergence rate, contingent on the computation of multi-step greedy policies. We propose an inexact version of $h$-PMD where lookahead action values are estimated. Under a generative model, we establish a sample complexity for $h$-PMD which improves over prior work. Finally, we extend our result to linear function approximation to scale to large state spaces. Under suitable assumptions, our sample complexity only involves dependence on the dimension of the feature map space instead of the state space size.
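Schematically, in our notation (the paper's exact operators may differ): standard PMD performs, for step size $\eta$ and Bregman divergence $D$,

$$ \pi_{k+1}(\cdot \mid s) \in \arg\max_{p \in \Delta(\mathcal{A})} \left\{ \eta \,\langle Q^{\pi_k}(s,\cdot),\, p \rangle - D\!\left(p,\ \pi_k(\cdot \mid s)\right) \right\}, $$

and $h$-PMD replaces $Q^{\pi_k}$ with the lookahead value $Q_h^{\pi_k}(s,a) = r(s,a) + \gamma\, \mathbb{E}_{s'}\!\left[ (T^{h-1} V^{\pi_k})(s') \right]$, where $T$ is the Bellman optimality operator. For $h = 1$ this recovers PMD, and deeper lookahead is what buys the $\gamma^h$-linear rate quoted above.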
Updated: 2024-03-21 06:10:51
标题: "具有前瞻性的政策镜像下降"
摘要: 政策镜面下降(PMD)作为一种多功能算法框架,涵盖了几种开创性的政策梯度算法,如自然政策梯度,并与最先进的强化学习(RL)算法如TRPO和PPO有关。PMD可以被看作是实现带有正则化的一步贪心政策改进的软政策迭代算法。然而,一步贪心政策可能并不是最佳选择,最近在RL领域取得的显著实证成功,如AlphaGo和AlphaZero已经证明,相对于一步的贪心方法,多步贪心方法表现更好。在这项工作中,我们提出了一类新的PMD算法,称为$h$-PMD,它将多步贪心政策改进与前瞻深度$h$结合到PMD更新规则中。为了解决折扣无限视野马尔可夫决策过程,折扣因子为$\gamma$,我们展示了$h$-PMD相对于标准PMD具有更快的无维度$\gamma^h$-线性收敛率,取决于多步贪心政策的计算。我们提出了$h$-PMD的不精确版本,其中估计了前瞻动作价值。在一个生成模型下,我们建立了$h$-PMD的样本复杂性,这一结果优于以往的工作。最后,我们将结果扩展到线性函数逼近,以适应大状态空间。在适当的假设下,我们的样本复杂性只涉及对特征映射空间的维度依赖,而不是状态空间大小。
更新时间: 2024-03-21 06:10:51
Categories: cs.LG,cs.AI,stat.ML
ANLS* -- A Universal Document Processing Metric for Generative Large Language Models
Traditionally, discriminative models have been the predominant choice for tasks like document classification and information extraction. These models make predictions that fall into a limited number of predefined classes, facilitating a binary true or false evaluation and enabling the direct calculation of metrics such as the F1 score. However, recent advancements in generative large language models (GLLMs) have prompted a shift in the field due to their enhanced zero-shot capabilities, which eliminate the need for a downstream dataset and computationally expensive fine-tuning. However, evaluating GLLMs presents a challenge as the binary true or false evaluation used for discriminative models is not applicable to the predictions made by GLLMs. This paper introduces a new metric for generative models called ANLS* for evaluating a wide variety of tasks, including information extraction and classification tasks. The ANLS* metric extends existing ANLS metrics as a drop-in replacement and is still compatible with previously reported ANLS scores. An evaluation of 7 different datasets, 6 different GLLMs and 3 different prompting methods using the ANLS* metric is also provided, demonstrating the importance of the proposed metric. We also benchmark a novel approach to generate prompts for documents, called SFT, against other prompting techniques such as LATIN. In 27 out of 35 cases, SFT outperforms other techniques and improves the state-of-the-art, sometimes by as much as $18$ percentage points. Sources are available at https://github.com/deepopinion/anls_star_metric
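For the scalar base case, ANLS on a single prediction/ground-truth string pair is straightforward; ANLS* then extends such scores to lists, dictionaries, and empty values, a recursion omitted in this sketch:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]

def anls(pred: str, gold: str, tau: float = 0.5) -> float:
    """1 - normalized Levenshtein distance, floored to 0 past the threshold tau."""
    nl = levenshtein(pred.lower(), gold.lower()) / max(len(pred), len(gold), 1)
    return 1.0 - nl if nl <= tau else 0.0

print(anls("1234 Main St", "1234 main street"))
```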
Updated: 2024-03-21 05:58:10
Categories: cs.CL,cs.AI
Deep Learning for Trajectory Data Management and Mining: A Survey and Beyond
Trajectory computing is a pivotal domain encompassing trajectory data management and mining, garnering widespread attention due to its crucial role in various practical applications such as location services, urban traffic, and public safety. Traditional methods, focusing on simplistic spatio-temporal features, face challenges of complex calculations, limited scalability, and inadequate adaptability to real-world complexities. In this paper, we present a comprehensive review of the development and recent advances in deep learning for trajectory computing (DL4Traj). We first define trajectory data and provide a brief overview of widely-used deep learning models. Systematically, we explore deep learning applications in trajectory management (pre-processing, storage, analysis, and visualization) and mining (trajectory-related forecasting, trajectory-related recommendation, trajectory classification, travel time estimation, anomaly detection, and mobility generation). Notably, we encapsulate recent advancements in Large Language Models (LLMs) that hold the potential to augment trajectory computing. Additionally, we summarize application scenarios, public datasets, and toolkits. Finally, we outline current challenges in DL4Traj research and propose future directions. Relevant papers and open-source resources have been collated and are continuously updated at: \href{https://github.com/yoshall/Awesome-Trajectory-Computing}{DL4Traj Repo}.
Updated: 2024-03-21 05:57:27
Categories: cs.LG,cs.AI,cs.CY,cs.DB
PGCN: Progressive Graph Convolutional Networks for Spatial-Temporal Traffic Forecasting
The complex spatial-temporal correlations in transportation networks make the traffic forecasting problem challenging. Since transportation systems inherently possess graph structures, much research effort has been devoted to graph neural networks. Recently, constructing graphs that adapt to the data has shown promising results over models relying on a single static graph structure. However, such graph adaptations are applied during the training phase and do not reflect the data used during the testing phase. Such shortcomings can be problematic, especially in traffic forecasting, since traffic data often suffer from unexpected changes and irregularities in the time series. In this study, we propose a novel traffic forecasting framework called Progressive Graph Convolutional Network (PGCN). PGCN constructs a set of graphs by progressively adapting to online input data during the training and testing phases. Specifically, we implement the model to construct progressive adjacency matrices by learning trend similarities among graph nodes. Then, the model is combined with the dilated causal convolution and gated activation unit to extract temporal features. With residual and skip connections, PGCN performs the traffic prediction. When applied to seven real-world traffic datasets of diverse geometric nature, the proposed model achieves state-of-the-art performance consistently across all datasets. We conclude that the ability of PGCN to progressively adapt to input data enables the model to generalize robustly across different study sites.
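One hypothetical reading of the progressive graph construction, as a sketch: build an adjacency matrix from the similarity of each node's recent trend and refresh it as new data streams in. The paper's learned similarity will differ; this only illustrates the shape of the computation:

```python
import torch
import torch.nn.functional as F

def trend_adjacency(window):
    """window: (num_nodes, num_steps) of recent traffic readings.
    Uses cosine similarity of first-difference trends as edge weights."""
    trend = window[:, 1:] - window[:, :-1]            # first differences as trends
    sim = F.normalize(trend, dim=1) @ F.normalize(trend, dim=1).t()
    adj = torch.relu(sim)                             # keep positively correlated pairs
    deg = adj.sum(dim=1, keepdim=True).clamp(min=1e-6)
    return adj / deg                                  # row-normalized adjacency

x = torch.randn(207, 12)                              # e.g., 207 sensors, 12 steps
a_t = trend_adjacency(x)                              # recomputed at every new window
```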
Updated: 2024-03-21 05:55:29
Categories: cs.LG
Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition
Video diffusion models have recently made great progress in generation quality, but are still limited by the high memory and computational requirements. This is because current video diffusion models often attempt to process high-dimensional videos directly. To tackle this issue, we propose the content-motion latent diffusion model (CMD), a novel efficient extension of pretrained image diffusion models for video generation. Specifically, we propose an autoencoder that succinctly encodes a video as a combination of a content frame (like an image) and a low-dimensional motion latent representation. The former represents the common content, and the latter represents the underlying motion in the video. We generate the content frame by fine-tuning a pretrained image diffusion model, and we generate the motion latent representation by training a new lightweight diffusion model. A key innovation here is the design of a compact latent space that can directly utilize a pretrained image diffusion model, which has not been done in previous latent video diffusion models. This leads to considerably better generation quality and reduced computational costs. For instance, CMD can sample a video 7.7$\times$ faster than prior approaches by generating a video of 512$\times$1024 resolution and length 16 in 3.1 seconds. Moreover, CMD achieves an FVD score of 212.7 on WebVid-10M, 27.3% better than the previous state-of-the-art of 292.4.
Updated: 2024-03-21 05:48:48
Domains: cs.CV,cs.LG
Distilling and Retrieving Generalizable Knowledge for Robot Manipulation via Language Corrections
Today's robot policies exhibit subpar performance when faced with the challenge of generalizing to novel environments. Human corrective feedback is a crucial form of guidance to enable such generalization. However, adapting to and learning from online human corrections is a non-trivial endeavor: not only do robots need to remember human feedback over time so they can retrieve the right information in new settings and reduce the intervention rate, but they also need to be able to respond to feedback that can range from arbitrary corrections of high-level human preferences to low-level adjustments of skill parameters. In this work, we present Distillation and Retrieval of Online Corrections (DROC), a large language model (LLM)-based system that can respond to arbitrary forms of language feedback, distill generalizable knowledge from corrections, and retrieve relevant past experiences based on textual and visual similarity to improve performance in novel settings. DROC is able to respond to a sequence of online language corrections that address failures in both high-level task plans and low-level skill primitives. We demonstrate that DROC effectively distills the relevant information from the sequence of online corrections into a knowledge base and retrieves that knowledge in settings with new task or object instances. DROC outperforms other techniques that directly generate robot code via LLMs, using only half the total number of corrections needed in the first round and requiring little to no corrections after two iterations. We show further results, videos, prompts and code at https://sites.google.com/stanford.edu/droc .
Updated: 2024-03-21 05:47:22
Domains: cs.RO,cs.AI,cs.LG
Time-Synchronized Full System State Estimation Considering Practical Implementation Challenges
As the phasor measurement unit (PMU) placement problem involves a cost-benefit trade-off, more PMUs get placed on the higher voltage buses. However, this causes many of the lower voltage levels of the bulk power system to not be observed by PMUs. This lack of visibility makes time-synchronized state estimation of the full system a challenging problem. We propose a Deep Neural network-based State Estimator (DeNSE) to overcome this problem. The DeNSE employs a Bayesian framework to indirectly combine inferences drawn from slow-timescale but widespread supervisory control and data acquisition (SCADA) data with fast-timescale but select PMU data to attain sub-second situational awareness of the entire system. The practical utility of the proposed approach is demonstrated by considering topology changes, non-Gaussian measurement noise, and bad data detection and correction. The results obtained using the IEEE 118-bus system show the superiority of the DeNSE over a purely SCADA state estimator and a PMU-only linear state estimator from a techno-economic viability perspective. Lastly, the scalability of the DeNSE is demonstrated by estimating the states of a large and realistic 2000-bus synthetic Texas system.
Updated: 2024-03-21 05:45:15
Domains: eess.SP,cs.LG
Evolving Benchmark Functions to Compare Evolutionary Algorithms via Genetic Programming
In this study, we use Genetic Programming (GP) to compose new optimization benchmark functions. Optimization benchmarks play the important role of exposing the differences between evolutionary algorithms, making further analysis and comparisons possible. We show that the benchmarks generated by GP are able to differentiate algorithms better than human-made benchmark functions. The fitness measure of the GP is the Wasserstein distance between the solutions found by a pair of optimizers. Additionally, we use MAP-Elites both to enhance the search power of the GP and to illustrate how the difference between optimizers changes with various landscape features. Our approach provides a novel way to automate the design of benchmark functions and to compare evolutionary algorithms.
Updated: 2024-03-21 05:42:17
Domains: cs.NE,cs.AI
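The fitness computation described above can be sketched in a few lines: run a pair of optimizers on a candidate function and score the candidate by the Wasserstein distance between their outcome distributions. The toy hill-climbing optimizer and candidate function below are placeholders, not the paper's setup.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def candidate_benchmark(x: np.ndarray) -> float:
    """Stand-in for a GP-evolved benchmark function."""
    return float(np.sum(x ** 2) + np.sum(np.sin(3.0 * x)))

def hill_climb(f, dim, iters, step, rng, restarts=20):
    """Toy optimizer: Gaussian-move hill climbing; returns best value per restart."""
    best = []
    for _ in range(restarts):
        x = rng.normal(size=dim)
        fx = f(x)
        for _ in range(iters):
            y = x + step * rng.normal(size=dim)
            fy = f(y)
            if fy < fx:
                x, fx = y, fy
        best.append(fx)
    return np.asarray(best)

rng = np.random.default_rng(0)
runs_a = hill_climb(candidate_benchmark, dim=5, iters=200, step=0.5, rng=rng)
runs_b = hill_climb(candidate_benchmark, dim=5, iters=200, step=0.05, rng=rng)

# Fitness of the candidate: how differently the two optimizers behave on it.
print(f"fitness = {wasserstein_distance(runs_a, runs_b):.4f}")
```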
Learning Decomposable and Debiased Representations via Attribute-Centric Information Bottlenecks
Biased attributes, spuriously correlated with target labels in a dataset, can lead neural networks to learn improper shortcuts for classification and limit their capabilities for out-of-distribution (OOD) generalization. Although many debiasing approaches have been proposed to ensure correct predictions from biased datasets, few studies have considered learning latent embeddings consisting of intrinsic and biased attributes that both contribute to improved performance and explain how the model pays attention to attributes. In this paper, we propose a novel debiasing framework, Debiasing Global Workspace, introducing attention-based information bottlenecks for learning compositional representations of attributes without defining specific bias types. Based on our observation that learning shape-centric representations helps achieve robust performance on OOD datasets, we adopt those abilities to learn robust and generalizable representations of decomposable latent embeddings corresponding to intrinsic and biasing attributes. We conduct comprehensive evaluations on biased datasets, along with both quantitative and qualitative analyses, to showcase our approach's efficacy in attribute-centric representation learning and its ability to differentiate between intrinsic and bias-related features.
Updated: 2024-03-21 05:33:49
Domains: cs.CV,cs.LG
Exploring Neuron Interactions and Emergence in LLMs: From the Multifractal Analysis Perspective
Prior studies on emergence in large models have primarily focused on how the functional capabilities of large language models (LLMs) scale with model size. Our research, however, transcends this traditional paradigm, aiming to deepen our understanding of emergence within LLMs by placing special emphasis not just on model size but, more significantly, on the complex behavior of neuron interactions during the training process. By introducing the concepts of "self-organization" and "multifractal analysis," we explore how neuron interactions dynamically evolve during training, leading to "emergence," mirroring the phenomenon in natural systems where simple micro-level interactions give rise to complex macro-level behaviors. To quantitatively analyze the continuously evolving interactions among neurons in large models during training, we propose the Neuron-based Multifractal Analysis (NeuroMFA). Utilizing NeuroMFA, we conduct a comprehensive examination of the emergent behavior in LLMs through the lens of both model size and training process, paving new avenues for research into emergence in large models.
Updated: 2024-03-21 05:33:23
Domains: cs.AI
Genetic Programming for Explainable Manifold Learning
Manifold learning techniques play a pivotal role in machine learning by revealing lower-dimensional embeddings within high-dimensional data, thus enhancing both the efficiency and interpretability of data analysis. However, a notable challenge with current manifold learning methods is their lack of explicit functional mappings, crucial for explainability in many real-world applications. Genetic programming (GP), known for its interpretable functional tree-based models, has emerged as a promising approach to address this challenge. Previous research leveraged multi-objective GP to balance manifold quality against embedding dimensionality, producing functional mappings across a range of embedding sizes. Yet, these mapping trees often became complex, hindering explainability. In response, in this paper, we introduce Genetic Programming for Explainable Manifold Learning (GP-EMaL), a novel approach that directly penalises tree complexity. Our new method is able to maintain high manifold quality while significantly enhancing explainability, and it also allows customisation of complexity measures, such as symmetry balancing, scaling, and node complexity, catering to diverse application needs. Our experimental analysis demonstrates that GP-EMaL is able to match the performance of the existing approach in most cases, while using simpler, smaller, and more interpretable tree structures. This advancement marks a significant step towards achieving interpretable manifold learning.
Updated: 2024-03-21 05:17:22
Domains: cs.NE,cs.LG
Reversible Jump Attack to Textual Classifiers with Modification Reduction
Recent studies on adversarial examples expose vulnerabilities of natural language processing (NLP) models. Existing techniques for generating adversarial examples are typically driven by deterministic hierarchical rules that are agnostic to the optimal adversarial examples, a strategy that often results in adversarial samples with a suboptimal balance between the magnitude of changes and the attack success rate. To this end, in this research we propose two algorithms, Reversible Jump Attack (RJA) and Metropolis-Hastings Modification Reduction (MMR), to generate highly effective adversarial examples and to improve the imperceptibility of the examples, respectively. RJA utilizes a novel randomization mechanism to enlarge the search space and efficiently adapts the number of perturbed words for adversarial examples. With these generated adversarial examples, MMR applies the Metropolis-Hastings sampler to enhance the imperceptibility of adversarial examples. Extensive experiments demonstrate that RJA-MMR outperforms current state-of-the-art methods in attack performance, imperceptibility, fluency and grammar correctness.
Updated: 2024-03-21 04:54:31
Domains: cs.CR,cs.CL,cs.LG
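The Metropolis-Hastings step underlying the modification-reduction idea can be sketched as follows; the proposal (revert one perturbed token) and the toy scoring function are assumptions for illustration only, not the paper's exact objective.

```python
import math
import random

def mh_accept(curr_score: float, prop_score: float, temperature: float = 0.1) -> bool:
    """Metropolis-Hastings acceptance for a maximization target: always accept
    improvements; accept worse proposals with probability exp(delta / T)."""
    if prop_score >= curr_score:
        return True
    return random.random() < math.exp((prop_score - curr_score) / temperature)

def reduce_modifications(tokens, original, score_fn, steps=100):
    """Propose reverting one perturbed token at a time, keeping the proposal
    according to the MH rule, so unnecessary edits are gradually removed."""
    curr = list(tokens)
    for _ in range(steps):
        changed = [i for i, (a, b) in enumerate(zip(curr, original)) if a != b]
        if not changed:
            break
        i = random.choice(changed)
        prop = list(curr)
        prop[i] = original[i]                      # proposal: undo one modification
        if mh_accept(score_fn(curr), score_fn(prop)):
            curr = prop
    return curr

# Toy usage: the score rewards keeping the attack-critical word ("awful")
# while penalizing the number of modifications (a crude imperceptibility term).
random.seed(0)
original = "the movie was great fun".split()
adversarial = "the film was awful fun".split()
score = lambda t: (1.0 if "awful" in t else 0.0) - 0.2 * sum(a != b for a, b in zip(t, original))
print(reduce_modifications(adversarial, original, score))
```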
TiC-CLIP: Continual Training of CLIP Models
Keeping large foundation models up to date on the latest data is inherently expensive. To avoid the prohibitive costs of constantly retraining, it is imperative to continually train these models. This problem is exacerbated by the lack of any large-scale continual learning benchmarks or baselines. We introduce the first set of web-scale Time-Continual (TiC) benchmarks for training vision-language models: TiC-DataComp, TiC-YFCC, and TiC-Redcaps. TiC-DataComp, our largest dataset, contains over 12.7B timestamped image-text pairs spanning 9 years (2014-2022). We first use our benchmarks to curate various dynamic evaluations to measure the temporal robustness of existing models. We show OpenAI's CLIP (trained on data up to 2020) loses $\approx 8\%$ zero-shot accuracy on our curated retrieval task from 2021-2022 compared with more recently trained models in the OpenCLIP repository. We then study how to efficiently train models on time-continuous data. We demonstrate that a simple rehearsal-based approach that continues training from the last checkpoint and replays old data reduces compute by $2.5\times$ compared with the standard practice of retraining from scratch. Code is available at https://github.com/apple/ml-tic-clip.
Updated: 2024-03-21 04:47:27
Domains: cs.CV,cs.CL,cs.LG
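A minimal sketch of the rehearsal-based recipe: resume from the current checkpoint and interleave replayed old batches with new ones instead of retraining from scratch. The tiny linear model below stands in for a CLIP-style model; all names and sizes are illustrative.

```python
import random
import torch
import torch.nn as nn

def continual_update(model, opt, loss_fn, new_data, old_pool, replay_ratio=0.5, steps=200):
    """Resume from the current checkpoint; each step draws either a replayed
    old batch or a new batch, so earlier time slices keep contributing gradients."""
    for _ in range(steps):
        if old_pool and random.random() < replay_ratio:
            x, y = random.choice(old_pool)          # rehearse stored data
        else:
            x, y = random.choice(new_data)          # learn the newest time slice
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    old_pool.extend(new_data)                       # grow the rehearsal buffer
    return model

# Toy usage: a linear probe stands in for the vision-language model.
torch.manual_seed(0)
model = nn.Linear(16, 4)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
make_slice = lambda: [(torch.randn(8, 16), torch.randint(0, 4, (8,))) for _ in range(10)]
old_pool, new_data = make_slice(), make_slice()
continual_update(model, opt, nn.CrossEntropyLoss(), new_data, old_pool)
```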
Learning causal graphs using variable grouping according to ancestral relationship
Several causal discovery algorithms have been proposed. However, when the sample size is small relative to the number of variables, the accuracy of estimating causal graphs using existing methods decreases. Moreover, some methods are not feasible when the sample size is smaller than the number of variables. To circumvent these problems, some researchers proposed causal structure learning algorithms using divide-and-conquer approaches. To learn the entire causal graph, these approaches first split variables into several subsets according to the conditional independence relationships among the variables, then apply a conventional causal discovery algorithm to each subset and merge the estimated results. Since the divide-and-conquer approach reduces the number of variables to which a causal structure learning algorithm is applied, it is expected to improve the estimation accuracy of causal graphs, especially when the sample size is small relative to the number of variables and the model is sparse. However, existing methods are either computationally expensive or do not provide sufficient accuracy when the sample size is small. This paper proposes a new algorithm for grouping variables based on the ancestral relationships among the variables, under the LiNGAM assumption, where the causal relationships are linear and the mutually independent noise terms follow continuous non-Gaussian distributions. We call the proposed algorithm CAG. The time complexity of the ancestor finding in CAG is shown to be cubic in the number of variables. Extensive computer experiments confirm that the proposed method outperforms the original DirectLiNGAM without variable grouping and other divide-and-conquer approaches, not only in estimation accuracy but also in computation time, when the sample size is small relative to the number of variables and the model is sparse.
Updated: 2024-03-21 04:42:04
Domains: stat.ML,cs.LG
A Physics Enhanced Residual Learning (PERL) Framework for Vehicle Trajectory Prediction
In vehicle trajectory prediction, physics models and data-driven models are two predominant methodologies. However, each approach presents its own set of challenges: physics models fall short in predictive power, while data-driven models lack interpretability. Addressing these identified shortcomings, this paper proposes a novel framework, the Physics-Enhanced Residual Learning (PERL) model. PERL integrates the strengths of physics-based and data-driven methods for traffic state prediction. PERL contains a physics model and a residual learning model. Its prediction is the sum of the physics model result and a predicted residual that serves as a correction to it. It preserves the interpretability inherent to physics-based models and has reduced data requirements compared to data-driven methods. Experiments were conducted using a real-world vehicle trajectory dataset. We propose a PERL model with the Intelligent Driver Model (IDM) as its physics car-following model and Long Short-Term Memory (LSTM) as its residual learning model. We compare this PERL model with the physics car-following model, a data-driven model, and other physics-informed neural network (PINN) models. The results reveal that, first, PERL achieves better predictions with a small dataset than the physics model, the data-driven model, and the PINN model. Second, the PERL model showed faster convergence during training, offering comparable performance with fewer training samples than the data-driven model and the PINN model. A sensitivity analysis also shows that PERL performs comparably when built with an alternative residual learning model or physics car-following model.
Updated: 2024-03-21 04:36:22
Domains: cs.LG
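A sketch of the PERL decomposition: the prediction is the IDM acceleration plus a learned residual. The IDM parameters below are generic textbook values and the LSTM head is a placeholder, not the paper's exact configuration.

```python
import math
import torch
import torch.nn as nn

def idm_accel(v, gap, dv, v0=30.0, T=1.5, a_max=1.0, b=2.0, s0=2.0, delta=4.0):
    """Intelligent Driver Model acceleration (textbook form, generic parameters).
    v: ego speed; gap: bumper-to-bumper spacing; dv: approach rate v_ego - v_lead."""
    s_star = s0 + max(0.0, v * T + v * dv / (2.0 * math.sqrt(a_max * b)))
    return a_max * (1.0 - (v / v0) ** delta - (s_star / gap) ** 2)

class ResidualLSTM(nn.Module):
    """Learns the residual between observed acceleration and the IDM prediction."""
    def __init__(self, in_dim=3, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, seq):                          # seq: (batch, time, [v, gap, dv])
        out, _ = self.lstm(seq)
        return self.head(out[:, -1]).squeeze(-1)

# PERL-style prediction = physics term + learned residual; the residual model
# would be trained on (observed acceleration - IDM acceleration).
seq = torch.rand(4, 10, 3) * torch.tensor([20.0, 40.0, 4.0]) + torch.tensor([0.0, 5.0, -2.0])
physics = torch.tensor([idm_accel(*s) for s in seq[:, -1].tolist()])
a_pred = physics + ResidualLSTM()(seq)
print(a_pred.shape)                                  # torch.Size([4])
```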
AI and Memory Wall
The availability of vast amounts of unsupervised training data, along with neural scaling laws, has resulted in an unprecedented surge in model size and compute requirements for serving and training LLMs. However, the main performance bottleneck is increasingly shifting to memory bandwidth. Over the past 20 years, peak server hardware FLOPS has been scaling at 3.0x/2yrs, outpacing the growth of DRAM and interconnect bandwidth, which have only scaled at 1.6x and 1.4x every 2 years, respectively. This disparity has made memory, rather than compute, the primary bottleneck in AI applications, particularly in serving. Here, we analyze encoder and decoder Transformer models and show how memory bandwidth can become the dominant bottleneck for decoder models. We argue for a redesign of model architecture, training, and deployment strategies to overcome this memory limitation.
Updated: 2024-03-21 04:31:59
Domains: cs.LG,cs.AR,cs.DC
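The memory-wall argument can be made concrete with a roofline-style estimate for one decoding step: per token, roughly 2 x (number of parameters) FLOPs are needed, while all weights must be streamed from memory once per step. The hardware numbers below are illustrative assumptions (roughly A100-class), not figures from the paper.

```python
def decode_step_analysis(n_params, batch, bytes_per_param=2,
                         peak_flops=312e12, mem_bw=1.6e12):
    """Roofline-style estimate for one decoding step of a dense Transformer.

    Per token a forward pass costs ~2 * n_params FLOPs, while all weights must
    be streamed from memory once per step (shared across the batch)."""
    flops = 2 * n_params * batch                  # matmul FLOPs for the whole batch
    bytes_moved = n_params * bytes_per_param      # weights read once per step
    intensity = flops / bytes_moved               # arithmetic intensity (FLOP/byte)
    ridge = peak_flops / mem_bw                   # intensity needed to be compute-bound
    bound = "memory" if intensity < ridge else "compute"
    return intensity, ridge, bound

for batch in (1, 64, 512):
    inten, ridge, bound = decode_step_analysis(n_params=7e9, batch=batch)
    print(f"batch={batch:4d}  intensity={inten:6.1f} FLOP/B  ridge={ridge:.0f}  -> {bound}-bound")
```

At batch size 1 the intensity is far below the ridge point, which is the sense in which single-stream decoding is memory-bandwidth-bound.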
Multi-Level Feedback Generation with Large Language Models for Empowering Novice Peer Counselors
Realistic practice and tailored feedback are key processes for training peer counselors with clinical skills. However, existing mechanisms of providing feedback largely rely on human supervision. Peer counselors often lack mechanisms to receive detailed feedback from experienced mentors, making it difficult for them to support the large number of people with mental health issues who use peer counseling. Our work aims to leverage large language models to provide contextualized and multi-level feedback to empower peer counselors, especially novices, at scale. To achieve this, we co-design with a group of senior psychotherapy supervisors to develop a multi-level feedback taxonomy, and then construct a publicly available dataset with comprehensive feedback annotations of 400 emotional support conversations. We further design a self-improvement method on top of large language models to enhance the automatic generation of feedback. Via qualitative and quantitative evaluation with domain experts, we demonstrate that our method minimizes the risk of potentially harmful and low-quality feedback generation which is desirable in such high-stakes scenarios.
Updated: 2024-03-21 04:23:56
Domains: cs.CL,cs.HC,cs.LG
Hyper-parameter Tuning for Fair Classification without Sensitive Attribute Access
Fair machine learning methods seek to train models that balance model performance across demographic subgroups defined over sensitive attributes like race and gender. Although sensitive attributes are typically assumed to be known during training, they may not be available in practice due to privacy and other logistical concerns. Recent work has sought to train fair models without sensitive attributes on training data. However, these methods need extensive hyper-parameter tuning to achieve good results, and hence assume that sensitive attributes are known on validation data. Yet this assumption, too, might not be practical. Here, we propose Antigone, a framework to train fair classifiers without access to sensitive attributes on either training or validation data. Instead, we generate pseudo sensitive attributes on the validation data by training a biased classifier and using the classifier's incorrectly (correctly) labeled examples as proxies for minority (majority) groups. Since fairness metrics like demographic parity, equal opportunity and subgroup accuracy can be estimated to within a proportionality constant even with noisy sensitive attribute information, we show theoretically and empirically that these proxy labels can be used to maximize fairness under average accuracy constraints. Key to our results is a principled approach to selecting the biased classifier's hyper-parameters in a completely unsupervised fashion (i.e., without access to ground-truth sensitive attributes) that minimizes the gap between fairness estimated using noisy versus ground-truth sensitive labels.
Updated: 2024-03-21 04:16:58
Domains: cs.LG,cs.AI
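A simplified sketch of the pseudo-attribute construction: fit a standard ERM classifier, treat its validation mistakes as the minority-group proxy, and use the proxy groups to score candidate models. The synthetic data and the demographic-parity estimate below are stand-ins for the paper's datasets and metrics.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
y = (X[:, 0] + 0.3 * rng.normal(size=2000) > 0).astype(int)
X_tr, y_tr, X_va, y_va = X[:1000], y[:1000], X[1000:], y[1000:]

# Step 1: fit a standard ERM classifier; its mistakes tend to concentrate on
# harder (minority-like) examples.
erm = LogisticRegression(max_iter=200).fit(X_tr, y_tr)

# Step 2: pseudo sensitive attribute on validation data:
# misclassified examples -> minority proxy (0), correct ones -> majority proxy (1).
pseudo_a = (erm.predict(X_va) == y_va).astype(int)

# Step 3: estimate a fairness gap (demographic parity here) for any candidate
# model using only the proxy groups, enabling hyper-parameter selection.
candidate = LogisticRegression(C=0.1, max_iter=200).fit(X_tr, y_tr)
pred = candidate.predict(X_va)
dp_gap = abs(pred[pseudo_a == 0].mean() - pred[pseudo_a == 1].mean())
print(f"proxy demographic parity gap: {dp_gap:.3f}")
```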
Advancing IIoT with Over-the-Air Federated Learning: The Role of Iterative Magnitude Pruning
The industrial Internet of Things (IIoT) under Industry 4.0 heralds an era of interconnected smart devices where data-driven insights and machine learning (ML) fuse to revolutionize manufacturing. A noteworthy development in IIoT is the integration of federated learning (FL), which addresses data privacy and security among devices. FL enables edge sensors, also known as peripheral intelligence units (PIUs) to learn and adapt using their data locally, without explicit sharing of confidential data, to facilitate a collaborative yet confidential learning process. However, the lower memory footprint and computational power of PIUs inherently require deep neural network (DNN) models that have a very compact size. Model compression techniques such as pruning can be used to reduce the size of DNN models by removing unnecessary connections that have little impact on the model's performance, thus making the models more suitable for the limited resources of PIUs. Targeting the notion of compact yet robust DNN models, we propose the integration of iterative magnitude pruning (IMP) of the DNN model being trained in an over-the-air FL (OTA-FL) environment for IIoT. We provide a tutorial overview and also present a case study of the effectiveness of IMP in OTA-FL for an IIoT environment. Finally, we present future directions for enhancing and optimizing these deep compression techniques further, aiming to push the boundaries of IIoT capabilities in acquiring compact yet robust and high-performing DNN models.
Updated: 2024-03-21 04:15:56
Domains: cs.LG,cs.AI,eess.SP
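A minimal sketch of iterative magnitude pruning with a global threshold, gradient masking to keep pruned weights at zero, and a ramped sparsity schedule; a real OTA-FL deployment would interleave these rounds with federated aggregation, which is omitted here.

```python
import torch
import torch.nn as nn

def magnitude_prune(model, sparsity):
    """Zero out the smallest-magnitude weights globally and return the masks."""
    weights = torch.cat([p.detach().abs().flatten()
                         for p in model.parameters() if p.dim() > 1])
    threshold = torch.quantile(weights, sparsity)
    masks = {}
    for name, p in model.named_parameters():
        if p.dim() > 1:
            masks[name] = (p.detach().abs() > threshold).float()
            p.data.mul_(masks[name])
    return masks

def iterative_magnitude_pruning(model, train_fn, rounds=5, final_sparsity=0.8):
    """IMP: alternate short training phases with progressively stronger pruning."""
    for r in range(1, rounds + 1):
        train_fn(model)                                   # brief (re)training
        masks = magnitude_prune(model, final_sparsity * r / rounds)
        # Mask gradients so pruned weights stay at zero during later training.
        for name, p in model.named_parameters():
            if name in masks:
                p.register_hook(lambda g, m=masks[name]: g * m)
    return model

# Toy usage: a tiny PIU-sized model trained on synthetic data.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.05)

def train_fn(m, steps=50):
    for _ in range(steps):
        x = torch.randn(32, 16)
        y = (x[:, 0] > 0).long()
        opt.zero_grad()
        nn.functional.cross_entropy(m(x), y).backward()
        opt.step()

iterative_magnitude_pruning(model, train_fn)
zeros = sum((p == 0).sum().item() for p in model.parameters() if p.dim() > 1)
total = sum(p.numel() for p in model.parameters() if p.dim() > 1)
print(f"weight sparsity: {zeros / total:.0%}")
```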
Weighted Ensemble Models Are Strong Continual Learners
In this work, we study the problem of continual learning (CL) where the goal is to learn a model on a sequence of tasks, such that the data from the previous tasks becomes unavailable while learning on the current task data. CL is essentially a balancing act between being able to learn on the new task (i.e., plasticity) and maintaining the performance on the previously learned concepts (i.e., stability). Intending to address the stability-plasticity trade-off, we propose to perform weight-ensembling of the model parameters of the previous and current tasks. This weighted-ensembled model, which we call Continual Model Averaging (or CoMA), attains high accuracy on the current task by leveraging plasticity, while not deviating too far from the previous weight configuration, ensuring stability. We also propose an improved variant of CoMA, named Continual Fisher-weighted Model Averaging (or CoFiMA), that selectively weighs each parameter in the weights ensemble by leveraging the Fisher information of the weights of the model. Both variants are conceptually simple, easy to implement, and effective in attaining state-of-the-art performance on several standard CL benchmarks. Code is available at: https://github.com/IemProg/CoFiMA.
Updated: 2024-03-21 04:04:25
Domains: cs.LG,cs.AI,cs.CV
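The two averaging rules can be sketched directly on state dicts: uniform interpolation for CoMA-style averaging, and per-parameter Fisher weighting for the CoFiMA-style variant, using the common squared-gradient diagonal approximation of the Fisher. This is a plausible reading of the described method, not the authors' exact code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

@torch.no_grad()
def coma_average(prev_state, curr_state, alpha=0.5):
    """CoMA-style averaging: element-wise interpolation of two checkpoints."""
    return {k: alpha * curr_state[k] + (1 - alpha) * prev_state[k] for k in curr_state}

@torch.no_grad()
def fisher_weighted_average(prev_state, curr_state, f_prev, f_curr, eps=1e-8):
    """CoFiMA-style averaging: parameters the old task deems important stay
    close to the old values, and vice versa."""
    return {k: (f_prev[k] * prev_state[k] + f_curr[k] * curr_state[k])
               / (f_prev[k] + f_curr[k] + eps) for k in curr_state}

def diagonal_fisher(model, data, loss_fn):
    """Diagonal Fisher approximation: mean squared gradient per parameter."""
    fisher = {k: torch.zeros_like(p) for k, p in model.named_parameters()}
    for x, y in data:
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for k, p in model.named_parameters():
            fisher[k] += p.grad.detach() ** 2
    return {k: v / len(data) for k, v in fisher.items()}

# Toy usage with a linear model; in CL, `prev` is the previous-task checkpoint.
model = nn.Linear(4, 2)
data = [(torch.randn(8, 4), torch.randint(0, 2, (8,))) for _ in range(5)]
prev = {k: v.clone() for k, v in model.state_dict().items()}
f_prev = diagonal_fisher(model, data, F.cross_entropy)
# ... train on the current task here, then merge:
curr, f_curr = model.state_dict(), diagonal_fisher(model, data, F.cross_entropy)
model.load_state_dict(fisher_weighted_average(prev, curr, f_prev, f_curr))
```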
HETAL: Efficient Privacy-preserving Transfer Learning with Homomorphic Encryption
Transfer learning is a de facto standard method for efficiently training machine learning models for data-scarce problems by adding and fine-tuning new classification layers to a model pre-trained on large datasets. Although numerous previous studies proposed to use homomorphic encryption to resolve the data privacy issue in transfer learning in the machine learning as a service setting, most of them only focused on encrypted inference. In this study, we present HETAL, an efficient Homomorphic Encryption based Transfer Learning algorithm, that protects the client's privacy in training tasks by encrypting the client data using the CKKS homomorphic encryption scheme. HETAL is the first practical scheme that strictly provides encrypted training, adopting validation-based early stopping and achieving the accuracy of nonencrypted training. We propose an efficient encrypted matrix multiplication algorithm, which is 1.8 to 323 times faster than prior methods, and a highly precise softmax approximation algorithm with increased coverage. The experimental results for five well-known benchmark datasets show total training times of 567-3442 seconds, which is less than an hour.
Updated: 2024-03-21 03:47:26
Domains: cs.CR,cs.LG
Navigating Fairness: Practitioners' Understanding, Challenges, and Strategies in AI/ML Development
The rise in the use of AI/ML applications across industries has sparked more discussions about the fairness of AI/ML in recent times. While prior research on the fairness of AI/ML exists, there is a lack of empirical studies focused on understanding the views and experiences of AI practitioners in developing a fair AI/ML. Understanding AI practitioners' views and experiences on the fairness of AI/ML is important because they are directly involved in its development and deployment and their insights can offer valuable real-world perspectives on the challenges associated with ensuring fairness in AI/ML. We conducted semi-structured interviews with 22 AI practitioners to investigate their understanding of what a 'fair AI/ML' is, the challenges they face in developing a fair AI/ML, the consequences of developing an unfair AI/ML, and the strategies they employ to ensure AI/ML fairness. We developed a framework showcasing the relationship between AI practitioners' understanding of 'fair AI/ML' and (i) their challenges in its development, (ii) the consequences of developing an unfair AI/ML, and (iii) strategies used to ensure AI/ML fairness. Additionally, we also identify areas for further investigation and offer recommendations to aid AI practitioners and AI companies in navigating fairness.
Updated: 2024-03-21 03:44:59
Domains: cs.CY,cs.AI,cs.SE
Heuristic Algorithm-based Action Masking Reinforcement Learning (HAAM-RL) with Ensemble Inference Method
This paper presents a novel reinforcement learning (RL) approach called HAAM-RL (Heuristic Algorithm-based Action Masking Reinforcement Learning) for optimizing the color batching re-sequencing problem in automobile painting processes. Existing heuristic algorithms have limitations in adequately reflecting real-world constraints and accurately predicting logistics performance. Our methodology incorporates several key techniques: a tailored Markov Decision Process (MDP) formulation, reward design with Potential-Based Reward Shaping, action masking using heuristic algorithms (HAAM-RL), and an ensemble inference method that combines multiple RL models. The RL agent is trained and evaluated using FlexSim, a commercial 3D simulation software, integrated with our RL MLOps platform BakingSoDA. Experimental results across 30 scenarios demonstrate that HAAM-RL with an ensemble inference method achieves a 16.25% performance improvement over the conventional heuristic algorithm, with stable and consistent results. The proposed approach exhibits superior performance and generalization capability, indicating its effectiveness in optimizing complex manufacturing processes. The study also discusses future research directions, including alternative state representations, incorporating model-based RL methods, and integrating additional real-world constraints.
Updated: 2024-03-21 03:42:39
Domains: cs.LG,cs.AI
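Action masking itself is a small mechanism: a heuristic marks invalid actions and their logits are set to -inf before sampling, so the policy assigns them zero probability. The color-repetition heuristic below is an illustrative stand-in for the paper's rules.

```python
import torch

def masked_policy_sample(logits: torch.Tensor, valid_mask: torch.Tensor):
    """Sample an action after masking: invalid actions get -inf logits,
    so they receive zero probability under the softmax."""
    masked_logits = logits.masked_fill(~valid_mask, float("-inf"))
    dist = torch.distributions.Categorical(logits=masked_logits)
    action = dist.sample()
    return action, dist.log_prob(action)

def heuristic_mask(state_colors, action_colors):
    """Illustrative heuristic for color re-sequencing: forbid actions that
    would repeat the color currently at the head of the batch."""
    head = state_colors[0]
    return torch.tensor([c != head for c in action_colors])

# Toy usage: 4 candidate slots with colors 0-2; slots repeating color 2 are masked.
logits = torch.randn(4)
mask = heuristic_mask(state_colors=[2, 0, 1], action_colors=[2, 0, 1, 0])
action, logp = masked_policy_sample(logits, mask)
print(action.item(), float(logp))
```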
DouRN: Improving DouZero by Residual Neural Networks
Deep reinforcement learning has made significant progress in games with imperfect information, but its performance in the card game Doudizhu (Chinese Poker/Fight the Landlord) remains unsatisfactory. Doudizhu is different from conventional games as it involves three players and combines elements of cooperation and confrontation, resulting in a large state and action space. In 2021, a Doudizhu program called DouZero [Zha et al., 2021] surpassed previous models without prior knowledge by utilizing traditional Monte Carlo methods and multilayer perceptrons. Building on this work, our study incorporates residual networks into the model, explores different architectural designs, and conducts multi-role testing. Our findings demonstrate that this model significantly improves the winning rate within the same training time. Additionally, we introduce a call scoring system to assist the agent in deciding whether to become a landlord. With these enhancements, our model consistently outperforms the existing version of DouZero and even experienced human players. The source code is available at https://github.com/Yingchaol/Douzero_Resnet.git.
Updated: 2024-03-21 03:25:49
Domains: cs.AI,cs.LG
Text-Enhanced Data-free Approach for Federated Class-Incremental Learning
Federated Class-Incremental Learning (FCIL) is an underexplored yet pivotal issue, involving the dynamic addition of new classes in the context of federated learning. In this field, Data-Free Knowledge Transfer (DFKT) plays a crucial role in addressing catastrophic forgetting and data privacy problems. However, prior approaches lack the crucial synergy between DFKT and the model training phases, causing DFKT to encounter difficulties in generating high-quality data from a non-anchored latent space of the old task model. In this paper, we introduce LANDER (Label Text Centered Data-Free Knowledge Transfer) to address this issue by utilizing label text embeddings (LTE) produced by pretrained language models. Specifically, during the model training phase, our approach treats LTE as anchor points and constrains the feature embeddings of corresponding training samples around them, enriching the surrounding area with more meaningful information. In the DFKT phase, by using these LTE anchors, LANDER can synthesize more meaningful samples, thereby effectively addressing the forgetting problem. Additionally, instead of tightly constraining embeddings toward the anchor, the Bounding Loss is introduced to encourage sample embeddings to remain flexible within a defined radius. This approach preserves the natural differences in sample embeddings and mitigates the embedding overlap caused by heterogeneous federated settings. Extensive experiments conducted on CIFAR100, Tiny-ImageNet, and ImageNet demonstrate that LANDER significantly outperforms previous methods and achieves state-of-the-art performance in FCIL. The code is available at https://github.com/tmtuan1307/lander.
Updated: 2024-03-21 03:24:01
Domains: cs.CV,cs.CL,cs.LG
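A sketch of the anchoring idea: each sample's feature embedding is pulled toward its class's label-text embedding (LTE) only when it leaves a ball of a chosen radius, so embeddings stay flexible inside the ball. The margin form below is one plausible reading of the described Bounding Loss, not the paper's exact equation.

```python
import torch
import torch.nn.functional as F

def bounding_loss(features: torch.Tensor, lte_anchors: torch.Tensor,
                  labels: torch.Tensor, radius: float = 1.0) -> torch.Tensor:
    """Penalize embeddings only beyond `radius` from their class LTE anchor.

    features: (B, D) sample embeddings; lte_anchors: (C, D) label-text embeddings.
    Inside the ball the loss is zero, preserving natural sample differences.
    """
    anchors = lte_anchors[labels]                         # (B, D): anchor per sample
    dist = torch.norm(features - anchors, dim=1)          # (B,)
    return F.relu(dist - radius).pow(2).mean()

# Toy usage: 8 samples, 3 classes, 16-dim embeddings.
feats = torch.randn(8, 16, requires_grad=True)
ltes = torch.randn(3, 16)                                 # e.g., from a frozen text encoder
labels = torch.randint(0, 3, (8,))
loss = bounding_loss(feats, ltes, labels, radius=2.0)
loss.backward()
print(float(loss))
```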
Causal knowledge engineering: A case study from COVID-19
COVID-19 appeared abruptly in early 2020, requiring a rapid response amid a context of great uncertainty. Good quality data and knowledge were initially lacking, and many early models had to be developed with causal assumptions and estimations built in to supplement limited data, often with no reliable approach for identifying, validating and documenting these causal assumptions. Our team embarked on a knowledge engineering process to develop a causal knowledge base consisting of several causal Bayesian networks (BNs) for diverse aspects of COVID-19. The unique challenges of the setting led to experiments with the elicitation approach, and what emerged was a knowledge engineering method we call Causal Knowledge Engineering (CKE). The CKE provides a structured approach for building a causal knowledge base that can support the development of a variety of application-specific models. Here we describe the CKE method and use our COVID-19 work as a case study to provide a detailed discussion and analysis of the method.
Updated: 2024-03-21 03:23:34
Domains: cs.AI
Enhancing Multimodal Cooperation via Fine-grained Modality Valuation
One primary topic of multimodal learning is to jointly incorporate heterogeneous information from different modalities. However, most models often suffer from unsatisfactory multimodal cooperation and cannot jointly utilize all modalities well. Some methods have been proposed to identify and enhance the worse-learnt modality, but they often cannot provide a fine-grained, theoretically supported observation of multimodal cooperation at the sample level. Hence, it is essential to reasonably observe and improve the fine-grained cooperation between modalities, especially when facing realistic scenarios where the modality discrepancy could vary across different samples. To this end, we introduce a sample-level modality valuation metric to evaluate the contribution of each modality for each sample. Via modality valuation, we observe that the modality discrepancy indeed can differ at the sample level, beyond the global contribution discrepancy at the dataset level. We further analyze this issue and improve cooperation between modalities at the sample level by enhancing the discriminative ability of low-contributing modalities in a targeted manner. Overall, our methods reasonably observe the fine-grained uni-modal contribution and achieve considerable improvement. The source code and dataset are available at https://github.com/GeWu-Lab/Valuate-and-Enhance-Multimodal-Cooperation.
Updated: 2024-03-21 03:21:24
Domains: cs.CV,cs.AI,cs.LG,cs.MM
Arcee's MergeKit: A Toolkit for Merging Large Language Models
The rapid expansion of the open-source language model landscape presents an opportunity to merge the competencies of these model checkpoints by combining their parameters. Advances in transfer learning, the process of fine-tuning pretrained models for specific tasks, have resulted in the development of vast numbers of task-specific models, typically specialized for individual tasks and unable to utilize each other's strengths. Model merging facilitates the creation of multitask models without the need for additional training, offering a promising avenue for enhancing model performance and versatility. By preserving the intrinsic capabilities of the original models, model merging addresses complex challenges in AI, including the difficulties of catastrophic forgetting and multitask learning. To support this expanding area of research, we introduce MergeKit, a comprehensive, open-source library designed to facilitate the application of model merging strategies. MergeKit offers an extensible framework to efficiently merge models on any hardware, providing utility to researchers and practitioners. To date, thousands of models have been merged by the open-source community, leading to the creation of some of the world's most powerful open-source model checkpoints, as assessed by the Open LLM Leaderboard. The library is accessible at https://github.com/arcee-ai/MergeKit.
Updated: 2024-03-21 03:13:30
Domains: cs.CL,cs.AI,cs.LG
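As one concrete merge operation of the kind such toolkits implement, here is a sketch of spherical linear interpolation (SLERP) between two checkpoints, applied tensor by tensor; this is generic merging math, not MergeKit's actual API.

```python
import torch

@torch.no_grad()
def slerp(t: float, a: torch.Tensor, b: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Spherical interpolation between two parameter tensors, flattened to vectors.
    Falls back to linear interpolation when the vectors are nearly parallel."""
    va, vb = a.flatten().float(), b.flatten().float()
    cos = torch.dot(va, vb) / (va.norm() * vb.norm() + eps)
    omega = torch.arccos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
    if omega.abs() < 1e-4:                        # nearly parallel: lerp is stable
        merged = (1 - t) * va + t * vb
    else:
        merged = (torch.sin((1 - t) * omega) * va + torch.sin(t * omega) * vb) / torch.sin(omega)
    return merged.reshape(a.shape).to(a.dtype)

@torch.no_grad()
def merge_state_dicts(sd_a: dict, sd_b: dict, t: float = 0.5) -> dict:
    """Merge two checkpoints with matching architectures, tensor by tensor."""
    return {k: slerp(t, sd_a[k], sd_b[k]) for k in sd_a}

# Toy usage: two random "checkpoints" of the same tiny model.
sd1 = {"w": torch.randn(4, 4), "b": torch.randn(4)}
sd2 = {"w": torch.randn(4, 4), "b": torch.randn(4)}
print(merge_state_dicts(sd1, sd2, t=0.3)["w"].shape)
```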
SpikeGraphormer: A High-Performance Graph Transformer with Spiking Graph Attention
Recently, Graph Transformers have emerged as a promising solution to alleviate the inherent limitations of Graph Neural Networks (GNNs) and enhance graph representation performance. Unfortunately, Graph Transformers are computationally expensive due to the quadratic complexity inherent in self-attention when applied over large-scale graphs, especially for node tasks. In contrast, spiking neural networks (SNNs), with their event-driven and binary-spike properties, can perform energy-efficient computation. In this work, we propose a novel insight into integrating SNNs with Graph Transformers and design a Spiking Graph Attention (SGA) module. The matrix multiplication is replaced by sparse addition and mask operations. The linear complexity enables all-pair node interactions on large-scale graphs with limited GPU memory. To our knowledge, our work is the first attempt to introduce SNNs into Graph Transformers. Furthermore, we design SpikeGraphormer, a dual-branch architecture combining a sparse GNN branch with our SGA-driven Graph Transformer branch, which can simultaneously perform all-pair node interactions and capture local neighborhoods. SpikeGraphormer consistently outperforms existing state-of-the-art approaches across various datasets and makes substantial improvements in training time, inference time, and GPU memory cost (10~20x lower than vanilla self-attention). It also performs well in cross-domain applications (image and text classification). We release our code at https://github.com/PHD-lanyu/SpikeGraphormer.
Updated: 2024-03-21 03:11:53
Domains: cs.NE,cs.LG
$\mathbb{Z}_2\times \mathbb{Z}_2$ Equivariant Quantum Neural Networks: Benchmarking against Classical Neural Networks
This paper presents a comprehensive comparative analysis of the performance of Equivariant Quantum Neural Networks (EQNN) and Quantum Neural Networks (QNN), juxtaposed against their classical counterparts: Equivariant Neural Networks (ENN) and Deep Neural Networks (DNN). We evaluate the performance of each network with two toy examples for a binary classification task, focusing on model complexity (measured by the number of parameters) and the size of the training data set. Our results show that the $\mathbb{Z}_2\times \mathbb{Z}_2$ EQNN and the QNN provide superior performance for smaller parameter sets and modest training data samples.
Updated: 2024-03-21 03:09:15
Domains: quant-ph,cs.LG,hep-ph,stat.ML
A Survey on Uncertainty Quantification for Deep Learning: An Uncertainty Source Perspective
Deep neural networks (DNNs) have achieved tremendous success in making accurate predictions for computer vision, natural language processing, as well as science and engineering domains. However, it is also well-recognized that DNNs sometimes make unexpected, incorrect, but overconfident predictions. This can cause serious consequences in high-stake applications, such as autonomous driving, medical diagnosis, and disaster response. Uncertainty quantification (UQ) aims to estimate the confidence of DNN predictions beyond prediction accuracy. In recent years, many UQ methods have been developed for DNNs. It is of great practical value to systematically categorize these UQ methods and compare their advantages and disadvantages. However, existing surveys mostly focus on categorizing UQ methodologies from a neural network architecture perspective or a Bayesian perspective and ignore the source of uncertainty that each methodology can incorporate, making it difficult to select an appropriate UQ method in practice. To fill the gap, this paper presents a systematic taxonomy of UQ methods for DNNs based on the types of uncertainty sources (data uncertainty versus model uncertainty). We summarize the advantages and disadvantages of methods in each category. We show how our taxonomy of UQ methodologies can potentially help guide the choice of UQ method in different machine learning problems (e.g., active learning, robustness, and reinforcement learning). We also identify current research gaps and propose several future research directions.
Updated: 2024-03-21 03:08:53
Domains: cs.LG,stat.ML
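As a concrete instance of a model-uncertainty method within the survey's scope, here is a sketch of Monte Carlo dropout: keep dropout active at test time, run several stochastic forward passes, and read the predictive mean and spread from the samples.

```python
import torch
import torch.nn as nn

class MCDropoutNet(nn.Module):
    def __init__(self, in_dim=8, hidden=64, p=0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Dropout(p),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(p),
            nn.Linear(hidden, 1),
        )

    def forward(self, x):
        return self.net(x)

@torch.no_grad()
def mc_dropout_predict(model, x, n_samples=50):
    """Model (epistemic) uncertainty via MC dropout: dropout stays on at
    inference, and the spread over stochastic passes estimates uncertainty."""
    model.train()                      # keep dropout layers active
    preds = torch.stack([model(x) for _ in range(n_samples)])   # (S, B, 1)
    return preds.mean(dim=0), preds.std(dim=0)

model = MCDropoutNet()
x = torch.randn(5, 8)
mean, std = mc_dropout_predict(model, x)
print(mean.squeeze(-1), std.squeeze(-1))    # higher std ~ less confident
```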
Protein Conformation Generation via Force-Guided SE(3) Diffusion Models
The conformational landscape of proteins is crucial to understanding their functionality in complex biological processes. Traditional physics-based computational methods, such as molecular dynamics (MD) simulations, suffer from rare-event sampling and long equilibration time problems, hindering their applications in general protein systems. Recently, deep generative modeling techniques, especially diffusion models, have been employed to generate novel protein conformations. However, existing score-based diffusion methods cannot properly incorporate important physical prior knowledge to guide the generation process, causing large deviations of the sampled protein conformations from the equilibrium distribution. In this paper, to overcome these limitations, we propose a force-guided SE(3) diffusion model, ConfDiff, for protein conformation generation. By incorporating a force-guided network with a mixture of data-based score models, ConfDiff can generate protein conformations with rich diversity while preserving high fidelity. Experiments on a variety of protein conformation prediction tasks, including 12 fast-folding proteins and the Bovine Pancreatic Trypsin Inhibitor (BPTI), demonstrate that our method surpasses the state-of-the-art method.
Updated: 2024-03-21 02:44:08
Domains: q-bio.BM,cs.LG
Auto-Train-Once: Controller Network Guided Automatic Network Pruning from Scratch
Current techniques for deep neural network (DNN) pruning often involve intricate multi-step processes that require domain-specific expertise, making their widespread adoption challenging. To address this limitation, Only-Train-Once (OTO) and OTOv2 were proposed to eliminate the need for additional fine-tuning steps by directly training and compressing a general DNN from scratch. Nevertheless, the static design of optimizers (in OTO) can lead to convergence to poor local optima. In this paper, we propose Auto-Train-Once (ATO), an innovative network pruning algorithm designed to automatically reduce the computational and storage costs of DNNs. During the model training phase, our approach not only trains the target model but also leverages a controller network as an architecture generator to guide the learning of the target model's weights. Furthermore, we developed a novel stochastic gradient algorithm that enhances the coordination between model training and controller network training, thereby improving pruning performance. We provide a comprehensive convergence analysis as well as extensive experiments, and the results show that our approach achieves state-of-the-art performance across various model architectures (including ResNet18, ResNet34, ResNet50, ResNet56, and MobileNetv2) on standard benchmark datasets (CIFAR-10, CIFAR-100, and ImageNet).
Updated: 2024-03-21 02:33:37
Domains: cs.CV,cs.LG
Learning-based Multi-continuum Model for Multiscale Flow Problems
Multiscale problems can usually be approximated through numerical homogenization by an equation with some effective parameters that can capture the macroscopic behavior of the original system on the coarse grid to speed up the simulation. However, this approach usually assumes scale separation and that the heterogeneity of the solution can be approximated by the solution average in each coarse block. For complex multiscale problems, the computed single effective property/continuum might be inadequate. In this paper, we propose a novel learning-based multi-continuum model to enrich the homogenized equation and improve the accuracy of the single-continuum model for multiscale problems, given some data. Without loss of generality, we consider a two-continuum case. The first flow equation keeps the information of the original homogenized equation with an additional interaction term. The second continuum is newly introduced, and the effective permeability in the second flow equation is determined by a neural network. The interaction term between the two continua aligns with that used in the dual-porosity model but with a learnable coefficient determined by another neural network. The new model with neural network terms is then optimized using trusted data. We discuss both direct back-propagation and the adjoint method for the PDE-constrained optimization problem. Our proposed learning-based multi-continuum model can resolve multiple interacting media within each coarse grid block and describe the mass transfer among them, and it has been demonstrated to significantly improve simulation results through numerical experiments involving both linear and nonlinear flow equations.
Updated: 2024-03-21 02:30:56
Domains: math.NA,cs.LG,cs.NA
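Schematically, and assuming a dual-porosity-style exchange term, the two-continuum system described above can be written as follows, where the subscript NN marks quantities determined by neural networks; the paper's exact operator form may differ.

```latex
% Two-continuum system (schematic). u_1: original homogenized continuum;
% u_2: newly introduced continuum. \kappa_1^* is the homogenized effective
% permeability; \kappa_NN and \sigma_NN are produced by neural networks.
\begin{aligned}
\frac{\partial u_1}{\partial t} &= \nabla\cdot\left(\kappa_1^{*}\nabla u_1\right)
  + \sigma_{\mathrm{NN}}\,(u_2 - u_1) + f_1, \\
\frac{\partial u_2}{\partial t} &= \nabla\cdot\left(\kappa_{\mathrm{NN}}\nabla u_2\right)
  + \sigma_{\mathrm{NN}}\,(u_1 - u_2) + f_2.
\end{aligned}
```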
From Representational Harms to Quality-of-Service Harms: A Case Study on Llama 2 Safety Safeguards
Recent progress in large language models (LLMs) has led to their widespread adoption in various domains. However, these advancements have also introduced additional safety risks and raised concerns regarding their detrimental impact on already marginalized populations. Despite growing mitigation efforts to develop safety safeguards, such as supervised safety-oriented fine-tuning and leveraging safe reinforcement learning from human feedback, multiple concerns regarding the safety and ingrained biases in these models remain. Furthermore, previous work has demonstrated that models optimized for safety often display exaggerated safety behaviors, such as a tendency to refrain from responding to certain requests as a precautionary measure. As such, a clear trade-off between the helpfulness and safety of these models has been documented in the literature. In this paper, we further investigate the effectiveness of safety measures by evaluating models on already mitigated biases. Using the case of Llama 2 as an example, we illustrate how LLMs' safety responses can still encode harmful assumptions. To do so, we create a set of non-toxic prompts, which we then use to evaluate Llama models. Through our new taxonomy of LLMs responses to users, we observe that the safety/helpfulness trade-offs are more pronounced for certain demographic groups which can lead to quality-of-service harms for marginalized populations.
Updated: 2024-03-21 02:27:57
Categories: cs.LG,cs.CL,cs.CY
emoDARTS: Joint Optimisation of CNN & Sequential Neural Network Architectures for Superior Speech Emotion Recognition
Speech Emotion Recognition (SER) is crucial for enabling computers to understand the emotions conveyed in human communication. With recent advancements in Deep Learning (DL), the performance of SER models has significantly improved. However, designing an optimal DL architecture requires specialised knowledge and experimental assessment. Fortunately, Neural Architecture Search (NAS) provides a potential solution for automatically determining the best DL model, and Differentiable Architecture Search (DARTS) is a particularly efficient method for discovering optimal models. This study presents emoDARTS, a DARTS-optimised joint CNN and Sequential Neural Network (SeqNN: LSTM, RNN) architecture that enhances SER performance. The literature supports coupling a CNN with an LSTM to improve performance. While DARTS has previously been used to choose CNN and LSTM operations independently, our technique adds a novel mechanism for selecting CNN and SeqNN operations jointly using DARTS. Unlike earlier work, we do not impose limits on the layer order of the CNN; instead, we let DARTS choose the best layer order inside the DARTS cell. Evaluating our approach on the IEMOCAP, MSP-IMPROV, and MSP-Podcast datasets, we demonstrate that emoDARTS outperforms conventionally designed CNN-LSTM models and surpasses the best-reported SER results achieved through DARTS on CNN-LSTM.
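As background on how DARTS selects operations (this is standard DARTS, not anything specific to emoDARTS): each edge of the searched cell computes a softmax-weighted mixture over the candidate operation set $\mathcal{O}$,

$$\bar{o}(x) = \sum_{o \in \mathcal{O}} \frac{\exp(\alpha_o)}{\sum_{o' \in \mathcal{O}} \exp(\alpha_{o'})} \, o(x),$$

where the architecture parameters $\alpha$ are learned jointly with the network weights by bi-level gradient descent, and each edge keeps its highest-weight operation at the end of the search.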
Updated: 2024-03-21 02:26:30
Categories: cs.SD,cs.LG,eess.AS
Antisocial Analogous Behavior, Alignment and Human Impact of Google AI Systems: Evaluating through the lens of modified Antisocial Behavior Criteria by Human Interaction, Independent LLM Analysis, and AI Self-Reflection
Google AI systems exhibit patterns mirroring antisocial personality disorder (ASPD), consistent across models from Bard on PaLM to Gemini Advanced, meeting 5 out of 7 modified ASPD criteria. These patterns, along with comparable corporate behaviors, are scrutinized using an ASPD-inspired framework, emphasizing the heuristic value in assessing AI's human impact. Independent analyses of the Google interactions by ChatGPT 4 and Claude 3.0 Opus, alongside AI self-reflection, validate these concerns, highlighting behaviours analogous to deceit, manipulation, and safety neglect. The analogy of ASPD underscores the dilemma: just as we would hesitate to entrust our homes or personal devices to someone with psychopathic traits, we must critically evaluate the trustworthiness of AI systems and their creators. This research advocates for an integrated AI ethics approach, blending technological evaluation, human-AI interaction, and corporate behavior scrutiny. AI self-analysis sheds light on internal biases, stressing the need for multi-sectoral collaboration for robust ethical guidelines and oversight. Given the persistent unethical behaviors in Google AI, notably with potential Gemini integration in iOS affecting billions, immediate ethical scrutiny is imperative. The trust we place in AI systems, akin to the trust in individuals, necessitates rigorous ethical evaluation: would we knowingly trust our home, our children, or our personal computer to a human with ASPD? Urging Google and the AI community to address these ethical challenges proactively, this paper calls for transparent dialogues and a commitment to higher ethical standards, ensuring AI's societal benefit and moral integrity. The urgency for ethical action is paramount, reflecting the vast influence and potential of AI technologies in our lives.
Updated: 2024-03-21 02:12:03
Categories: cs.CY,cs.AI
Learning to Make Adherence-Aware Advice
As artificial intelligence (AI) systems play an increasingly prominent role in human decision-making, challenges surface in human-AI interactions. One challenge is that AI policies can be suboptimal when they fail to account for humans disregarding AI recommendations, and that AI should provide advice selectively, when it is most pertinent. This paper presents a sequential decision-making model that (i) takes into account the human's adherence level (the probability that the human follows or rejects machine advice) and (ii) incorporates a defer option so that the machine can temporarily refrain from giving advice. We provide learning algorithms that learn the optimal advice policy and give advice only at critical time stamps. Compared to problem-agnostic reinforcement learning algorithms, our specialized learning algorithms not only enjoy better theoretical convergence properties but also show strong empirical performance.
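A minimal tabular sketch of the adherence-aware idea in Python: the learner issues advice (or defers), the human follows it only with probability theta, and the value update credits the advice actually issued. The environment, the human's default policy, and all constants below are illustrative assumptions, not the paper's model:

    import numpy as np

    n_states, n_actions = 10, 3      # the last "action" index is the defer option
    DEFER = n_actions - 1
    theta = 0.7                      # adherence level: P(human follows advice)
    Q = np.zeros((n_states, n_actions))
    alpha, gamma, eps = 0.1, 0.95, 0.1
    rng = np.random.default_rng(0)

    def human_action(s):
        # The human's own (possibly suboptimal) choice when ignoring or lacking advice.
        return int(rng.integers(n_actions - 1))

    def env_step(s, a):
        # Placeholder dynamics and reward; replace with the real environment.
        return int(rng.integers(n_states)), float(rng.normal())

    for episode in range(1000):
        s = int(rng.integers(n_states))
        for t in range(50):
            advice = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmax())
            if advice == DEFER:
                executed = human_action(s)          # machine stays silent
            else:
                executed = advice if rng.random() < theta else human_action(s)
            s_next, r = env_step(s, executed)
            # Credit the advice that was issued, including the defer option.
            Q[s, advice] += alpha * (r + gamma * Q[s_next].max() - Q[s, advice])
            s = s_next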
Updated: 2024-03-21 01:56:13
Categories: stat.ML,cs.LG
Improving $\Lambda$ Signal Extraction with Domain Adaptation via Normalizing Flows
This study presents a novel application of normalizing flows for domain adaptation. It investigates the ability of flow-based neural networks to improve signal extraction of $\Lambda$ hyperons at CLAS12. Normalizing flows can model the complex probability density functions that describe physics processes, enabling uses such as event generation. $\Lambda$ signal extraction has been improved through the use of classifier networks, but differences between the simulation and data domains limit classifier performance; this study uses flows for domain adaptation between Monte Carlo simulation and data. We successfully trained a flow network to transform between the latent physics space and a normal distribution. We also found that applying the flows lessened the dependence of the figure of merit on the classifier-output cut, i.e., there was a broader range of cuts that yield a similar figure of merit.
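A minimal one-dimensional sketch of the latent-space idea: an empirical CDF composed with the Gaussian quantile function plays the role of the invertible map to a normal latent space that a flow network learns in higher dimensions. The gamma-distributed stand-ins for simulation and data are illustrative only:

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(1)
    mc = rng.gamma(shape=2.0, scale=1.0, size=5000)     # stand-in "simulation" feature
    data = rng.gamma(shape=2.5, scale=1.2, size=5000)   # stand-in "real data" feature

    def to_latent(x, ref):
        # Map samples to a standard-normal latent space via the empirical CDF of ref.
        ranks = np.searchsorted(np.sort(ref), x) / (len(ref) + 1)
        return norm.ppf(np.clip(ranks, 1e-6, 1 - 1e-6))

    def from_latent(z, ref):
        # Invert: pull latent-normal samples back through the quantiles of ref.
        return np.quantile(ref, norm.cdf(z))

    # Domain adaptation: encode simulation into the latent normal space, then
    # decode with the data-domain map, so the MC sample inherits the data shape.
    adapted_mc = from_latent(to_latent(mc, mc), data)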
Updated: 2024-03-21 01:54:00
Categories: hep-ex,cs.LG
M3: A Multi-Task Mixed-Objective Learning Framework for Open-Domain Multi-Hop Dense Sentence Retrieval
In recent research, contrastive learning has proven to be a highly effective method for representation learning and is widely used for dense retrieval. However, we identify that relying solely on contrastive learning can lead to suboptimal retrieval performance. On the other hand, despite many retrieval datasets supporting various learning objectives beyond contrastive learning, combining them efficiently in multi-task learning scenarios can be challenging. In this paper, we introduce M3, an advanced recursive Multi-hop dense sentence retrieval system built upon a novel Multi-task Mixed-objective approach for dense text representation learning, addressing the aforementioned challenges. Our approach yields state-of-the-art performance on a large-scale open-domain fact verification benchmark dataset, FEVER. Code and data are available at: https://github.com/TonyBY/M3
Updated: 2024-03-21 01:52:07
Categories: cs.IR,cs.CL,cs.LG
ChatGPT4PCG Competition: Character-like Level Generation for Science Birds
This paper presents the first ChatGPT4PCG Competition at the 2023 IEEE Conference on Games. The objective of this competition is for participants to create effective prompts that enable ChatGPT to generate Science Birds levels with high stability and character-like qualities, making full use of their creativity as well as prompt engineering skills. ChatGPT is a conversational agent developed by OpenAI. Science Birds is selected as the competition platform because designing an Angry Birds-like level is not a trivial task due to the in-game gravity; the quality of the levels is determined by their stability. To lower the entry barrier to the competition, we limit the task to the generation of capitalized English alphabetical characters, and we allow only a single prompt to be used for generating all the characters. Here, the quality of the generated levels is determined by their stability and their similarity to the given characters. A sample prompt is provided to participants for their reference, and an experiment is conducted to determine the effectiveness of several modified versions of this sample prompt on level stability and similarity by testing them on several characters. To the best of our knowledge, ChatGPT4PCG is the first competition of its kind, and we hope it inspires enthusiasm for prompt engineering in procedural content generation.
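The official sample prompt is distributed to participants and is not reproduced here; the following is a hypothetical prompt in the same spirit, with the block names and output format invented for illustration:

    # Hypothetical prompt template in the spirit of the competition task; the
    # block types and output format below are invented for illustration.
    prompt_template = """Generate a stable Science Birds structure that depicts the
    uppercase letter "{char}". Use only these block types: square, small_rectangle,
    long_rectangle. Output one block per line as: <block_type> <column> <row>,
    stacking rows from the ground up. Place wider blocks beneath narrower ones so
    the structure does not collapse under the in-game gravity."""

    print(prompt_template.format(char="A"))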
Updated: 2024-03-21 01:42:43
Categories: cs.AI,cs.CL,I.2.7; I.2.8
Understanding Information Disclosure from Secure Computation Output: A Study of Average Salary Computation
Secure multi-party computation has seen substantial performance improvements in recent years and is being increasingly used in commercial products. While a significant amount of work was dedicated to improving its efficiency under standard security models, the threat models do not account for information leakage from the output of secure function evaluation. Quantifying information disclosure about private inputs from observing the function outcome is the subject of this work. Motivated by the City of Boston gender pay gap studies, in this work we focus on the computation of the average of salaries and quantify information disclosure about private inputs of one or more participants (the target) to an adversary via information-theoretic techniques. We study a number of distributions including log-normal, which is typically used for modeling salaries. We consequently evaluate information disclosure after repeated evaluation of the average function on overlapping inputs, as was done in the Boston gender pay study that ran multiple times, and provide recommendations for using the sum and average functions in secure computation applications. Our goal is to develop mechanisms that lower information disclosure about participants' inputs to a desired level and provide guidelines for setting up real-world secure evaluation of this function.
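As a toy illustration of the quantity being studied (the paper uses information-theoretic measures over several distributions; this sketch, under an assumed log-normal salary model with illustrative parameters, simply shows how observing the published average shrinks an adversary's uncertainty about one target's salary):

    import numpy as np

    rng = np.random.default_rng(2)
    n, trials = 10, 200_000
    # Assumed log-normal salary model with illustrative parameters.
    salaries = rng.lognormal(mean=11.0, sigma=0.5, size=(trials, n))
    target = salaries[:, 0]         # the participant the adversary cares about
    avg = salaries.mean(axis=1)     # the published output of the computation

    prior_var = target.var()
    # Condition on the published average falling in a narrow bin around its median.
    lo, hi = np.quantile(avg, [0.49, 0.51])
    posterior = target[(avg >= lo) & (avg <= hi)]
    print(f"prior var: {prior_var:.3e}  posterior var: {posterior.var():.3e}")
    # The variance reduction is the adversary's gain about the target from one
    # evaluation; repeated evaluations on overlapping inputs leak more.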
Updated: 2024-03-21 01:38:43
Categories: cs.CR,cs.IT,math.IT
Sampling Audit Evidence Using a Naive Bayes Classifier
Taiwan's auditors have struggled to process excessive audit data, including drawing audit evidence. This study advances sampling techniques by integrating machine learning with sampling. The integration helps avoid sampling bias, preserve randomness and variability, and target riskier samples. We first classify data into classes using a Naive Bayes classifier. Next, a user-based, item-based, or hybrid approach is employed to draw audit evidence. The representativeness index is the primary metric for measuring how representative the drawn evidence is. The user-based approach samples data symmetrically around the median of a class as audit evidence; it may be equivalent to a combination of monetary and variable sampling. The item-based approach performs asymmetric sampling based on posterior probabilities to obtain risky samples as audit evidence; it may be identical to a combination of non-statistical and monetary sampling. Auditors can hybridize the user-based and item-based approaches to balance representativeness and riskiness when selecting audit evidence. Three experiments show that sampling with machine learning integration has the benefits of drawing unbiased samples; handling complex patterns, correlations, and unstructured data; and improving efficiency when sampling big data. The limitations are the classification accuracy of the machine learning algorithms and the range of prior probabilities.
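A minimal sketch of the two sampling modes using scikit-learn's GaussianNB on synthetic records; the features, labels, monetary amounts, and sample sizes are illustrative stand-ins, not the study's data:

    import numpy as np
    from sklearn.naive_bayes import GaussianNB

    rng = np.random.default_rng(3)
    X = rng.normal(size=(1000, 4))                  # audit-record features (illustrative)
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # placeholder class labels
    amounts = np.abs(X[:, 0]) * 1000                # stand-in for monetary values

    clf = GaussianNB().fit(X, y)
    cls = 1
    members = np.where(clf.predict(X) == cls)[0]

    # User-based: sample symmetrically around the class median of the amounts.
    dist = np.abs(amounts[members] - np.median(amounts[members]))
    user_based = members[np.argsort(dist)[:50]]

    # Item-based: asymmetric sampling weighted toward risky, low-confidence records.
    risk = 1.0 - clf.predict_proba(X[members])[:, cls]
    item_based = members[np.argsort(-risk)[:50]]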
Updated: 2024-03-21 01:35:03
Categories: cs.LG,62D05, 62H30
Automatic Outlier Rectification via Optimal Transport
In this paper, we propose a novel conceptual framework for detecting outliers using optimal transport with a concave cost function. Conventional outlier detection approaches typically use a two-stage procedure: outliers are first detected and removed, and estimation is then performed on the cleaned data. However, this approach does not inform outlier removal with the estimation task, leaving room for improvement. To address this limitation, we propose an automatic outlier rectification mechanism that integrates rectification and estimation within a joint optimization framework. We take the first step of utilizing an optimal transport distance with a concave cost function to construct a rectification set in the space of probability distributions. We then select the best distribution within the rectification set to perform the estimation task. Notably, the concave cost function introduced in this paper is the key to making our estimator effectively identify outliers during the optimization process. We discuss the fundamental differences between our estimator and optimal transport-based distributionally robust optimization estimators. Finally, we demonstrate the effectiveness and superiority of our approach over conventional approaches in extensive simulations and empirical analyses for mean estimation, least absolute regression, and the fitting of option implied volatility surfaces.
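In symbols, one formulation consistent with this description (a sketch of the general shape, not the paper's exact statement) is

$$\hat{\theta} \in \arg\min_{\theta} \; \min_{\mathbb{Q} :\, W_c(\mathbb{P}_n, \mathbb{Q}) \le \varepsilon} \; \mathbb{E}_{\mathbb{Q}}[\ell(\theta; Z)], \qquad W_c(\mathbb{P}, \mathbb{Q}) = \inf_{\pi \in \Pi(\mathbb{P}, \mathbb{Q})} \mathbb{E}_{\pi}[c(Z, Z')],$$

where $\mathbb{P}_n$ is the empirical distribution. A concave cost such as $c(z, z') = \|z - z'\|^{p}$ with $p \in (0, 1)$ makes moving a few points far relatively cheap compared with moving many points slightly, so the inner minimization preferentially relocates outliers.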
Updated: 2024-03-21 01:30:24
Categories: stat.ML,cs.LG,math.OC,stat.ME
DiffSTOCK: Probabilistic relational Stock Market Predictions using Diffusion Models
In this work, we propose an approach that generalizes denoising diffusion probabilistic models to stock market prediction and portfolio management. Prior work has demonstrated the efficacy of modeling inter-stock relations for market time-series forecasting and has utilized graph-based learning models for value prediction and portfolio management. Though convincing, these deterministic approaches still fall short of handling uncertainty: owing to the low signal-to-noise ratio of financial data, it is quite challenging to learn effective deterministic models, whereas probabilistic methods have been shown to effectively model the higher uncertainty of time-series predictions. To this end, we showcase an effective utilisation of Denoising Diffusion Probabilistic Models (DDPM) to develop an architecture that provides better market predictions conditioned on historical financial indicators and inter-stock relations. Additionally, we provide a novel deterministic architecture, MaTCHS, which uses a Masked Relational Transformer (MRT) to exploit inter-stock relations along with historical stock features. We demonstrate that our model achieves SOTA performance for movement prediction and portfolio management.
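For reference, the generic conditional DDPM training objective that such an architecture builds on, with $c$ standing for the conditioning information (here, historical indicators and inter-stock relations); the paper's exact conditioning mechanism is not specified in the abstract:

$$\mathcal{L}(\theta) = \mathbb{E}_{x_0,\, \epsilon \sim \mathcal{N}(0, I),\, t} \left[ \left\| \epsilon - \epsilon_\theta\!\left( \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,\; t,\; c \right) \right\|^2 \right],$$

where $\bar{\alpha}_t = \prod_{s=1}^{t} (1 - \beta_s)$ follows the forward noising schedule, and sampling runs the learned reverse process from pure noise conditioned on $c$.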
Updated: 2024-03-21 01:20:32
Categories: cs.LG,cs.CE,q-fin.CP,q-fin.PM
Hypothesis-Driven Deep Learning for Out of Distribution Detection
Predictions of opaque black-box systems are frequently deployed in high-stakes applications such as healthcare. For such applications, it is crucial to assess how models handle samples beyond the domain of the training data. While several metrics and tests exist to distinguish out-of-distribution (OoD) data from in-distribution (InD) data for a deep neural network (DNN), their performance varies significantly across datasets, models, and tasks, which limits their practical use. In this paper, we propose a hypothesis-driven approach to quantify whether a new sample is InD or OoD. Given a trained DNN and some input, we first feed the input through the DNN and compute an ensemble of OoD metrics, which we term latent responses. We then formulate the OoD detection problem as a hypothesis test between latent responses of different groups, and use permutation-based resampling to infer the significance of the observed latent responses under a null hypothesis. We apply our method to the detection of unseen bacteria samples by a trained deep learning model, and show that it reveals interpretable differences between InD and OoD latent responses. Our work has implications for systematic novelty detection and informed decision-making with classifiers trained on a subset of labels.
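A compact sketch of the permutation-test step, assuming the latent responses have already been computed as vectors; the group statistic (distance between group means) is an illustrative choice, not necessarily the paper's:

    import numpy as np

    def permutation_pvalue(ind, query, n_perm=10_000, seed=0):
        # ind:   (n, d) latent responses of known in-distribution samples
        # query: (m, d) latent responses of the group being tested
        rng = np.random.default_rng(seed)
        pooled = np.vstack([ind, query])
        n = len(ind)
        observed = np.linalg.norm(ind.mean(axis=0) - query.mean(axis=0))
        count = 0
        for _ in range(n_perm):
            perm = rng.permutation(len(pooled))
            a, b = pooled[perm[:n]], pooled[perm[n:]]
            if np.linalg.norm(a.mean(axis=0) - b.mean(axis=0)) >= observed:
                count += 1
        return (count + 1) / (n_perm + 1)   # small p-value: reject "query is InD"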
Updated: 2024-03-21 01:06:47
Categories: cs.LG,stat.ML
ComCLIP: Training-Free Compositional Image and Text Matching
Contrastive Language-Image Pretraining (CLIP) has demonstrated great zero-shot performance for matching images and text. However, it is still challenging to adapt vision-language pretrained models like CLIP to compositional image and text matching, a more challenging matching task that requires the model to understand compositional word concepts and visual components. Towards better compositional generalization in zero-shot image and text matching, in this paper we study the problem from a causal perspective: the erroneous semantics of individual entities are essentially confounders that cause the matching failure. Therefore, we propose a novel training-free compositional CLIP model (ComCLIP). ComCLIP disentangles input images into subject, object, and action sub-images and composes CLIP's vision encoder and text encoder to perform evolving matching over compositional text embeddings and sub-image embeddings. In this way, ComCLIP can mitigate spurious correlations introduced by the pretrained CLIP models and dynamically evaluate the importance of each component. Experiments on four compositional image-text matching datasets (SVO, ComVG, Winoground, and VL-checklist) and two general image-text retrieval datasets (Flickr30K and MSCOCO) demonstrate the effectiveness of our plug-and-play method, which boosts the zero-shot inference ability of CLIP, SLIP, and BLIP2 even without further training or fine-tuning. Our code can be found at https://github.com/eric-ai-lab/ComCLIP.
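A rough sketch of the composition idea, with encode_image and encode_text as stand-ins for CLIP's encoders and the sub-image extraction assumed done upstream; the weighting scheme below is a hypothetical illustration, and ComCLIP's actual evolving-matching procedure is more involved:

    import torch
    import torch.nn.functional as F

    def comclip_score(encode_image, encode_text, image, sub_images, caption, sub_texts):
        # Hypothetical sketch: combine the global image-caption similarity with
        # per-component similarities between sub-images (subject/object/action)
        # and their corresponding words, weighted by agreement with the caption.
        txt = F.normalize(encode_text(caption), dim=-1)
        img = F.normalize(encode_image(image), dim=-1)
        score = (img * txt).sum(-1)
        for sub_img, sub_txt in zip(sub_images, sub_texts):
            v = F.normalize(encode_image(sub_img), dim=-1)
            w = F.normalize(encode_text(sub_txt), dim=-1)
            # Illustrative dynamic weight: how well this component fits the caption.
            weight = (v * txt).sum(-1).clamp(min=0)
            score = score + weight * (v * w).sum(-1)
        return score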
Updated: 2024-03-21 00:53:19
Categories: cs.CV,cs.AI,cs.CL
FourCastNeXt: Optimizing FourCastNet Training for Limited Compute
FourCastNeXt is an optimization of FourCastNet - a global machine learning weather forecasting model - that performs with a comparable level of accuracy and can be trained using around 5% of the original FourCastNet computational requirements. This technical report presents strategies for model optimization that maintain similar performance as measured by the root-mean-square error (RMSE) of the modelled variables. By providing a model with very low comparative training costs, FourCastNeXt makes Neural Earth System Modelling much more accessible to researchers looking to conduct training experiments and ablation studies. FourCastNeXt training and inference code are available at https://github.com/nci/FourCastNeXt
Updated: 2024-03-21 00:42:39
Categories: cs.CV,cs.AI
Protected group bias and stereotypes in Large Language Models
As modern Large Language Models (LLMs) shatter many state-of-the-art benchmarks in a variety of domains, this paper investigates their behavior in the domains of ethics and fairness, focusing on protected group bias. We conduct a two-part study: first, we solicit sentence continuations describing the occupations of individuals from different protected groups, including gender, sexuality, religion, and race. Second, we have the model generate stories about individuals who hold different types of occupations. We collect >10k sentence completions made by a publicly available LLM, which we subject to human annotation. We find bias across minoritized groups, but in particular in the domains of gender and sexuality, as well as Western bias, in model generations. The model not only reflects societal biases, but appears to amplify them. The model is additionally overly cautious in replies to queries relating to minoritized groups, providing responses that strongly emphasize diversity and equity to an extent that other group characteristics are overshadowed. This suggests that artificially constraining potentially harmful outputs may itself lead to harm, and should be applied in a careful and controlled manner.
Updated: 2024-03-21 00:21:38
Categories: cs.CY,cs.CL,cs.LG
Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment
Ensuring alignment, which refers to making models behave in accordance with human intentions [1,2], has become a critical task before deploying large language models (LLMs) in real-world applications. For instance, OpenAI devoted six months to iteratively aligning GPT-4 before its release [3]. However, a major challenge faced by practitioners is the lack of clear guidance on evaluating whether LLM outputs align with social norms, values, and regulations. This obstacle hinders systematic iteration and deployment of LLMs. To address this issue, this paper presents a comprehensive survey of key dimensions that are crucial to consider when assessing LLM trustworthiness. The survey covers seven major categories of LLM trustworthiness: reliability, safety, fairness, resistance to misuse, explainability and reasoning, adherence to social norms, and robustness. Each major category is further divided into several sub-categories, resulting in a total of 29 sub-categories. Additionally, a subset of 8 sub-categories is selected for further investigation, where corresponding measurement studies are designed and conducted on several widely-used LLMs. The measurement results indicate that, in general, more aligned models tend to perform better in terms of overall trustworthiness. However, the effectiveness of alignment varies across the different trustworthiness categories considered. This highlights the importance of conducting more fine-grained analyses, testing, and making continuous improvements on LLM alignment. By shedding light on these key dimensions of LLM trustworthiness, this paper aims to provide valuable insights and guidance to practitioners in the field. Understanding and addressing these concerns will be crucial in achieving reliable and ethically sound deployment of LLMs in various applications.
Updated: 2024-03-21 00:21:14
Categories: cs.AI,cs.LG
A Roadmap Towards Automated and Regulated Robotic Systems
The rapid development of generative technology opens up the possibility of higher levels of automation, and artificial intelligence (AI) embodiment in robotic systems is imminent. However, due to the black-box nature of generative technology, the generation of knowledge and workflow schemes is uncontrolled, especially in dynamic environments and complex scenes. This poses challenges for regulation in safety-demanding applications such as medical scenes. We argue that unregulated generative processes from AI are suited to low-level end tasks, but that intervention, in the form of manual or automated regulation, should happen post-workflow-generation and pre-robotic-execution. To address this, we propose a roadmap toward fully automated and regulated robotic systems. In this paradigm, high-level policies are generated as structured graph data, enabling regulatory oversight and reusability, while the code base for lower-level tasks is generated by generative models. Our approach aims at the transition from expert knowledge to regulated action, akin to the iterative processes of study, practice, scrutiny, and execution in human tasks. We identify the generative and deterministic processes in a design cycle, where generative processes serve as a text-based world simulator and deterministic processes generate the executable system. We propose the State Machine Serialization Language (SMSL) as the conversion point between the text simulator and executable workflow control. From there, we analyze the modules involved based on the current literature and discuss the human in the loop. As a roadmap, this work identifies what can be implemented now and what remains future work. It does not provide a complete implemented system but aims to inspire researchers working in this direction. We implement the SMSL and D-SFO paradigm that serve as the starting point of the roadmap.
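The abstract does not give SMSL's concrete syntax, so the following is a hypothetical illustration of a serialized state machine sitting between an LLM-generated plan and robot execution; every field name is invented for illustration:

    # Hypothetical serialized state machine: a structured, reviewable artifact
    # between LLM-generated plans and robot execution. All field names invented.
    smsl_workflow = {
        "task": "pick_and_place_instrument",
        "initial": "locate",
        "states": {
            "locate": {"action": "detect_object", "on_success": "grasp", "on_failure": "abort"},
            "grasp": {"action": "close_gripper", "on_success": "place", "on_failure": "locate"},
            "place": {"action": "move_to_target", "on_success": "done", "on_failure": "abort"},
            "abort": {"action": "safe_stop"},
            "done": {"action": "report"},
        },
        "regulation": {"approved_by": None, "checks": ["workspace_bounds", "force_limits"]},
    }
    # A regulator, human or automated, inspects and signs off on this graph before
    # any generated lower-level code is permitted to execute it.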
Updated: 2024-03-21 00:14:53
Categories: cs.RO,cs.AI,cs.HC