Accelerating Diffusion Models with Parallel Sampling: Inference at Sub-Linear Time Complexity
Diffusion models have become a leading method for generative modeling of both image and scientific data. As these models are costly to train and evaluate, reducing the inference cost for diffusion models remains a major goal. Inspired by the recent empirical success in accelerating diffusion models via the parallel sampling technique (Shih et al., 2024), we propose to divide the sampling process into $\mathcal{O}(1)$ blocks with parallelizable Picard iterations within each block. Rigorous theoretical analysis reveals that our algorithm achieves $\widetilde{\mathcal{O}}(\mathrm{poly} \log d)$ overall time complexity, marking the first implementation with provable sub-linear complexity w.r.t. the data dimension $d$. Our analysis is based on a generalized version of Girsanov's theorem and is compatible with both the SDE and probability flow ODE implementations. Our results shed light on the potential of fast and efficient sampling of high-dimensional data on fast-evolving modern large-memory GPU clusters.
Updated: 2024-05-24 23:59:41
Categories: cs.LG,cs.DC,cs.NA,math.NA,stat.ML
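Below is a minimal sketch of the block-wise Picard iteration described in the abstract, assuming a toy drift in place of the learned score network; the `drift` callable, grid, and sweep count are illustrative only.

```python
import numpy as np

def picard_block(x0, drift, t_grid, n_sweeps=20):
    """Approximate x'(t) = drift(x, t) over one block of the sampling
    trajectory by Picard iteration:
        x_{k+1}(t) = x0 + int_0^t drift(x_k(s), s) ds.
    Each sweep evaluates the drift at every grid point independently,
    which is the step that parallelizes across a GPU batch."""
    n = len(t_grid)
    xs = np.tile(x0, (n, 1))                    # initial guess: constant path
    dt = np.diff(t_grid)[:, None]
    for _ in range(n_sweeps):
        f = np.stack([drift(xs[i], t_grid[i]) for i in range(n)])  # parallelizable
        xs = x0 + np.vstack([np.zeros_like(x0),
                             np.cumsum(f[:-1] * dt, axis=0)])      # left Riemann sum
    return xs

# toy probability-flow-style drift standing in for the learned score
drift = lambda x, t: -x / (1.0 + t)
print(picard_block(np.ones(4), drift, np.linspace(0.0, 1.0, 33))[-1])
```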
The Impact and Opportunities of Generative AI in Fact-Checking
Generative AI appears poised to transform white collar professions, with more than 90% of Fortune 500 companies using OpenAI's flagship GPT models, which have been characterized as "general purpose technologies" capable of effecting epochal changes in the economy. But how will such technologies impact organizations whose job is to verify and report factual information, and to ensure the health of the information ecosystem? To investigate this question, we conducted 30 interviews with N=38 participants working at 29 fact-checking organizations across six continents, asking about how they use generative AI and the opportunities and challenges they see in the technology. We found that uses of generative AI envisioned by fact-checkers differ based on organizational infrastructure, with applications for quality assurance in Editing, for trend analysis in Investigation, and for information literacy in Advocacy. We used the TOE framework to describe participant concerns ranging from the Technological (lack of transparency), to the Organizational (resource constraints), to the Environmental (uncertain and evolving policy). Building on the insights of our participants, we describe value tensions between fact-checking and generative AI, and propose a novel Verification dimension to the design space of generative models for information verification work. Finally, we outline an agenda for fairness, accountability, and transparency research to support the responsible use of generative AI in fact-checking. Throughout, we highlight the importance of human infrastructure and labor in producing verified information in collaboration with AI. We expect that this work will inform not only the scientific literature on fact-checking, but also contribute to understanding of organizational adaptation to a powerful but unreliable new technology.
Updated: 2024-05-24 23:58:01
Categories: cs.HC,cs.AI,cs.CY
Evaluating the Adversarial Robustness of Retrieval-Based In-Context Learning for Large Language Models
With the emergence of large language models, such as LLaMA and OpenAI GPT-3, In-Context Learning (ICL) gained significant attention due to its effectiveness and efficiency. However, ICL is very sensitive to the choice, order, and verbaliser used to encode the demonstrations in the prompt. Retrieval-Augmented ICL methods try to address this problem by leveraging retrievers to extract semantically related examples as demonstrations. While this approach yields more accurate results, its robustness against various types of adversarial attacks, including perturbations on test samples, demonstrations, and retrieved data, remains under-explored. Our study reveals that retrieval-augmented models can enhance robustness against test sample attacks, outperforming vanilla ICL with a 4.87% reduction in Attack Success Rate (ASR); however, they exhibit overconfidence in the demonstrations, leading to a 2% increase in ASR for demonstration attacks. Adversarial training can help improve the robustness of ICL methods to adversarial attacks; however, such a training scheme can be too costly in the context of LLMs. As an alternative, we introduce an effective training-free adversarial defence method, DARD, which enriches the example pool with those attacked samples. We show that DARD yields improvements in performance and robustness, achieving a 15% reduction in ASR over the baselines. Code and data are released to encourage further research: https://github.com/simonucl/adv-retreival-icl
Updated: 2024-05-24 23:56:36
Categories: cs.CL,cs.AI
Hierarchical Clustering via Local Search
In this paper, we introduce a local search algorithm for hierarchical clustering. For the local step, we consider a tree re-arrangement operation, known as the {\em interchange}, which involves swapping two closely positioned sub-trees within a tree hierarchy. The interchange operation has been previously used in the context of phylogenetic trees. As the objective function for evaluating the resulting hierarchies, we utilize the revenue function proposed by Moseley and Wang (NIPS 2017). In our main result, we show that any locally optimal tree guarantees a revenue of at least $\frac{n-2}{3}\sum_{i < j}w(i,j)$, where $n$ is the number of objects and $w: [n] \times [n] \rightarrow \mathbb{R}^+$ is the associated similarity function. This finding echoes the previously established bound for the average link algorithm as analyzed by Moseley and Wang. We demonstrate that this alignment is not coincidental, as average link trees enjoy the property of being locally optimal with respect to the interchange operation. Consequently, our study provides an alternative insight into the average link algorithm and reveals the existence of a broader range of hierarchies with relatively high revenue achievable through a straightforward local search algorithm. Furthermore, we present an implementation of the local search framework, where each local step requires $O(n)$ computation time. Our empirical results indicate that the proposed method, used as a post-processing step, can effectively generate a hierarchical clustering with substantial revenue.
Updated: 2024-05-24 23:46:24
Categories: cs.DS,cs.LG
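For concreteness, a small sketch of the Moseley-Wang revenue objective that the local search optimizes, with binary hierarchies encoded as nested tuples (the encoding and the uniform similarity matrix are our illustrative assumptions):

```python
import numpy as np

def leaves(t):
    """Leaves of a binary hierarchy given as nested 2-tuples; ints are leaves."""
    return [t] if isinstance(t, int) else leaves(t[0]) + leaves(t[1])

def revenue(t, W, n):
    """Moseley-Wang revenue: every pair (i, j) earns
    W[i, j] * (n - |leaves under the pair's least common ancestor|)."""
    if isinstance(t, int):
        return 0.0
    left, right = leaves(t[0]), leaves(t[1])
    here = sum(W[i, j] * (n - len(left) - len(right))
               for i in left for j in right)   # pairs whose LCA is this node
    return here + revenue(t[0], W, n) + revenue(t[1], W, n)

n = 4
W = np.ones((n, n))            # uniform similarities
T = ((0, 1), (2, 3))
print(revenue(T, W, n))        # 4.0
```

With uniform weights this toy tree meets the $\frac{n-2}{3}\sum_{i<j}w(i,j)$ guarantee exactly ($\frac{2}{3}\cdot 6 = 4$); an interchange local step would swap two nearby sub-trees and accept the move whenever this revenue increases.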
On the Tractability of SHAP Explanations under Markovian Distributions
Thanks to its solid theoretical foundation, the SHAP framework is arguably one of the most widely utilized frameworks for local explainability of ML models. Despite its popularity, its exact computation is known to be very challenging, proven to be NP-Hard in various configurations. Recent works have unveiled positive complexity results regarding the computation of the SHAP score for specific model families, encompassing decision trees, random forests, and some classes of boolean circuits. Yet, all these positive results hinge on the assumption of feature independence, often simplistic in real-world scenarios. In this article, we investigate the computational complexity of the SHAP score by relaxing this assumption and introducing a Markovian perspective. We show that, under the Markovian assumption, computing the SHAP score for the class of Weighted automata, Disjoint DNFs and Decision Trees can be performed in polynomial time, offering a first positive complexity result for the problem of SHAP score computation that transcends the limitations of the feature independence assumption.
Updated: 2024-05-24 23:45:34
Categories: cs.LG,cs.AI
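For reference, a standard conditional-expectation formulation of the SHAP score that such complexity results concern (notation ours, not taken from the paper): for a model $M$, instance $x$, feature set $F$, and data distribution $\mathcal{D}$,

```latex
\mathrm{SHAP}(M, x, i) = \sum_{S \subseteq F \setminus \{i\}}
\frac{|S|!\,(|F| - |S| - 1)!}{|F|!}
\left( \mathbb{E}_{z \sim \mathcal{D}}\big[ M(z) \mid z_{S \cup \{i\}} = x_{S \cup \{i\}} \big]
     - \mathbb{E}_{z \sim \mathcal{D}}\big[ M(z) \mid z_{S} = x_{S} \big] \right).
```

Under a Markovian $\mathcal{D}$, the conditional expectations factor along the chain, which is what opens the door to polynomial-time computation for the model classes listed above.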
BadGD: A unified data-centric framework to identify gradient descent vulnerabilities
We present BadGD, a unified theoretical framework that exposes the vulnerabilities of gradient descent algorithms through strategic backdoor attacks. Backdoor attacks involve embedding malicious triggers into a training dataset to disrupt the model's learning process. Our framework introduces three novel constructs: Max RiskWarp Trigger, Max GradWarp Trigger, and Max GradDistWarp Trigger, each designed to exploit specific aspects of gradient descent by distorting empirical risk, deterministic gradients, and stochastic gradients respectively. We rigorously define clean and backdoored datasets and provide mathematical formulations for assessing the distortions caused by these malicious backdoor triggers. By measuring the impact of these triggers on the model training procedure, our framework bridges existing empirical findings with theoretical insights, demonstrating how a malicious party can exploit gradient descent hyperparameters to maximize attack effectiveness. In particular, we show that these exploitations can significantly alter the loss landscape and gradient calculations, leading to compromised model integrity and performance. This research underscores the severe threats posed by such data-centric attacks and highlights the urgent need for robust defenses in machine learning. BadGD sets a new standard for understanding and mitigating adversarial manipulations, ensuring the reliability and security of AI systems.
Updated: 2024-05-24 23:39:45
Categories: cs.LG,cs.CR,stat.ML
Inference of Utilities and Time Preference in Sequential Decision-Making
This paper introduces a novel stochastic control framework to enhance the capabilities of automated investment managers, or robo-advisors, by accurately inferring clients' investment preferences from past activities. Our approach leverages a continuous-time model that incorporates utility functions and a generic discounting scheme of a time-varying rate, tailored to each client's risk tolerance, valuation of daily consumption, and significant life goals. We address the resulting time inconsistency issue through state augmentation and the establishment of the dynamic programming principle and the verification theorem. Additionally, we provide sufficient conditions for the identifiability of client investment preferences. To complement our theoretical developments, we propose a learning algorithm based on maximum likelihood estimation within a discrete-time Markov Decision Process framework, augmented with entropy regularization. We prove that the log-likelihood function is locally concave, facilitating the fast convergence of our proposed algorithm. Practical effectiveness and efficiency are showcased through two numerical examples, including Merton's problem and an investment problem with unhedgeable risks. Our proposed framework not only advances financial technology by improving personalized investment advice but also contributes broadly to other fields such as healthcare, economics, and artificial intelligence, where understanding individual preferences is crucial.
Updated: 2024-05-24 23:13:56
Categories: math.OC,cs.LG,q-fin.CP
Enhancing Visual-Language Modality Alignment in Large Vision Language Models via Self-Improvement
Large vision-language models (LVLMs) have achieved impressive results in various visual question-answering and reasoning tasks through vision instruction tuning on specific datasets. However, there is still significant room for improvement in the alignment between visual and language modalities. Previous methods to enhance this alignment typically require external models or data, heavily depending on their capabilities and quality, which inevitably sets an upper bound on performance. In this paper, we propose SIMA, a framework that enhances visual and language modality alignment through self-improvement, eliminating the need for external models or data. SIMA leverages prompts from existing vision instruction tuning datasets to self-generate responses and employs an in-context self-critic mechanism to select response pairs for preference tuning. The key innovation is the introduction of three vision metrics during the in-context self-critic process, which can guide the LVLM in selecting responses that enhance image comprehension. Through experiments across 14 hallucination and comprehensive benchmarks, we demonstrate that SIMA not only improves model performance across all benchmarks but also achieves superior modality alignment, outperforming previous approaches.
Updated: 2024-05-24 23:09:27
Categories: cs.CV,cs.AI,cs.CL,cs.LG
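A rough sketch of the self-improvement loop, with hypothetical stand-ins `lvlm_generate` and `self_critic` for the model calls (the paper's three vision metrics are not reproduced here):

```python
import random

def lvlm_generate(image, prompt, temperature):
    """Hypothetical stand-in for sampling a response from the LVLM."""
    return f"response(T={temperature}, r={random.random():.3f})"

def self_critic(image, prompt, resp_a, resp_b):
    """Hypothetical stand-in for the in-context self-critic; the paper
    guides this ranking with three vision metrics, omitted here."""
    return (resp_a, resp_b) if random.random() < 0.5 else (resp_b, resp_a)

def build_preference_pairs(dataset):
    """Self-improvement sketch: self-generate candidate responses, let the
    model critique them, and keep (chosen, rejected) pairs for tuning."""
    pairs = []
    for image, prompt in dataset:
        a = lvlm_generate(image, prompt, temperature=0.7)
        b = lvlm_generate(image, prompt, temperature=1.2)
        chosen, rejected = self_critic(image, prompt, a, b)
        pairs.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    return pairs  # fed to preference tuning, e.g. a DPO-style objective

print(build_preference_pairs([("img.png", "Describe the scene.")]))
```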
LTL-Constrained Policy Optimization with Cycle Experience Replay
Linear Temporal Logic (LTL) offers a precise means for constraining the behavior of reinforcement learning agents. However, in many tasks, LTL is insufficient for task specification; LTL-constrained policy optimization, where the goal is to optimize a scalar reward under LTL constraints, is needed. Prior methods for this constrained problem are restricted to finite state spaces. In this work, we present Cycle Experience Replay (CyclER), a reward-shaping approach to this problem that allows continuous state and action spaces and the use of function approximations. CyclER guides a policy towards satisfaction by encouraging partial behaviors compliant with the LTL constraint, using the structure of the constraint. In doing so, it addresses the optimization challenges stemming from the sparse nature of LTL satisfaction. We evaluate CyclER in three continuous control domains. On these tasks, CyclER outperforms existing reward-shaping methods at finding performant and LTL-satisfying policies.
Updated: 2024-05-24 22:57:06
Categories: cs.LG,cs.AI,cs.FL
Robust width: A lightweight and certifiable adversarial defense
Deep neural networks are vulnerable to so-called adversarial examples: inputs which are intentionally constructed to cause the model to make incorrect predictions or classifications. Adversarial examples are often visually indistinguishable from natural data samples, making them hard to detect. As such, they pose significant threats to the reliability of deep learning systems. In this work, we study an adversarial defense based on the robust width property (RWP), which was recently introduced for compressed sensing. We show that a specific input purification scheme based on the RWP gives theoretical robustness guarantees for images that are approximately sparse. The defense is easy to implement and can be applied to any existing model without additional training or finetuning. We empirically validate the defense on ImageNet against $L^\infty$ perturbations at perturbation budgets ranging from $4/255$ to $32/255$. In the black-box setting, our method significantly outperforms the state-of-the-art, especially for large perturbations. In the white-box setting, depending on the choice of base classifier, we closely match the state of the art in robust ImageNet classification while avoiding the need for additional data, larger models or expensive adversarial training routines. Our code is available at https://github.com/peck94/robust-width-defense.
Updated: 2024-05-24 22:50:50
Categories: cs.LG,cs.CR,cs.CV
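A minimal sketch of input purification that exploits approximate sparsity, assuming a DCT basis as the sparsifying transform; the paper's robust-width machinery and choice of frame may differ:

```python
import numpy as np
from scipy.fft import dctn, idctn

def purify(x, keep=0.1):
    """Project an input onto approximately sparse images by keeping only
    the largest-magnitude transform coefficients before classification."""
    c = dctn(x, norm="ortho")
    thresh = np.quantile(np.abs(c), 1.0 - keep)
    c[np.abs(c) < thresh] = 0.0   # discard small, attack-carrying coefficients
    return idctn(c, norm="ortho")

img = np.random.rand(32, 32)      # toy "image"
print(np.linalg.norm(img - purify(img, keep=0.25)))
```

Because the purifier is a fixed preprocessing step, it can sit in front of any pretrained classifier without retraining, matching the plug-and-play claim above.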
Inferring Manifolds From Noisy Data Using Gaussian Processes
In analyzing complex datasets, it is often of interest to infer lower dimensional structure underlying the higher dimensional observations. As a flexible class of nonlinear structures, it is common to focus on Riemannian manifolds. Most existing manifold learning algorithms replace the original data with lower dimensional coordinates without providing an estimate of the manifold in the observation space or using the manifold to denoise the original data. This article proposes a new methodology for addressing these problems, allowing interpolation of the estimated manifold between fitted data points. The proposed approach is motivated by novel theoretical properties of local covariance matrices constructed from noisy samples on a manifold. Our results enable us to turn a global manifold reconstruction problem into a local regression problem, allowing application of Gaussian processes for probabilistic manifold reconstruction. In addition to theory justifying the algorithm, we provide simulated and real data examples to illustrate the performance.
Updated: 2024-05-24 22:35:29
Categories: stat.ML,cs.LG
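A toy illustration of the local covariance idea under stated assumptions (noisy circle data, hand-picked radius); the paper goes further and feeds such local fits into Gaussian processes for a probabilistic reconstruction:

```python
import numpy as np

def local_tangent(X, x0, radius=0.3, d=1):
    """Estimate the d-dimensional tangent space at x0 from the top
    eigenvectors of the local covariance matrix of noisy samples."""
    nbrs = X[np.linalg.norm(X - x0, axis=1) < radius]
    center = nbrs.mean(axis=0)
    _, vecs = np.linalg.eigh(np.cov(nbrs.T))
    return center, vecs[:, -d:]    # eigh sorts eigenvalues ascending

# noisy samples from a circle, a 1-d manifold in R^2
rng = np.random.default_rng(0)
t = rng.uniform(0, 2 * np.pi, 500)
X = np.c_[np.cos(t), np.sin(t)] + 0.05 * rng.normal(size=(500, 2))
center, tangent = local_tangent(X, X[0])
print(center, tangent.ravel())
```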
A Review of Safe Reinforcement Learning: Methods, Theory and Applications
Reinforcement Learning (RL) has achieved tremendous success in many complex decision-making tasks. However, safety concerns are raised during deploying RL in real-world applications, leading to a growing demand for safe RL algorithms, such as in autonomous driving and robotics scenarios. While safe control has a long history, the study of safe RL algorithms is still in the early stages. To establish a good foundation for future safe RL research, in this paper, we provide a review of safe RL from the perspectives of methods, theories, and applications. Firstly, we review the progress of safe RL from five dimensions and come up with five crucial problems for safe RL being deployed in real-world applications, coined as "2H3W". Secondly, we analyze the algorithm and theory progress from the perspectives of answering the "2H3W" problems. Particularly, the sample complexity of safe RL algorithms is reviewed and discussed, followed by an introduction to the applications and benchmarks of safe RL algorithms. Finally, we open the discussion of the challenging problems in safe RL, hoping to inspire future research on this thread. To advance the study of safe RL algorithms, we release an open-sourced repository containing the implementations of major safe RL algorithms at the link: https://github.com/chauncygu/Safe-Reinforcement-Learning-Baselines.git.
Updated: 2024-05-24 22:33:04
Categories: cs.AI,cs.LG
KAN: Kolmogorov-Arnold Networks
Inspired by the Kolmogorov-Arnold representation theorem, we propose Kolmogorov-Arnold Networks (KANs) as promising alternatives to Multi-Layer Perceptrons (MLPs). While MLPs have fixed activation functions on nodes ("neurons"), KANs have learnable activation functions on edges ("weights"). KANs have no linear weights at all -- every weight parameter is replaced by a univariate function parametrized as a spline. We show that this seemingly simple change makes KANs outperform MLPs in terms of accuracy and interpretability. For accuracy, much smaller KANs can achieve comparable or better accuracy than much larger MLPs in data fitting and PDE solving. Theoretically and empirically, KANs possess faster neural scaling laws than MLPs. For interpretability, KANs can be intuitively visualized and can easily interact with human users. Through two examples in mathematics and physics, KANs are shown to be useful collaborators helping scientists (re)discover mathematical and physical laws. In summary, KANs are promising alternatives for MLPs, opening opportunities for further improving today's deep learning models which rely heavily on MLPs.
Updated: 2024-05-24 22:30:07
Categories: cs.LG,cond-mat.dis-nn,cs.AI,stat.ML
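A bare-bones sketch of one KAN layer, with piecewise-linear interpolation standing in for the spline parameterization (shapes and initialization are illustrative):

```python
import numpy as np

def phi(x, knots, values):
    """Learnable univariate edge function; piecewise-linear interpolation
    stands in for the B-spline parameterization used by KANs."""
    return np.interp(x, knots, values)

def kan_layer(x, knots, C):
    """One KAN layer: y_j = sum_i phi_{j,i}(x_i). Every 'weight' is a
    univariate function on an edge; there are no linear weight matrices."""
    n_out, n_in, _ = C.shape
    return np.array([sum(phi(x[i], knots, C[j, i]) for i in range(n_in))
                     for j in range(n_out)])

rng = np.random.default_rng(0)
knots = np.linspace(-2.0, 2.0, 8)
C = 0.1 * rng.normal(size=(3, 2, 8))   # 3 outputs x 2 inputs x 8 knot values
print(kan_layer(np.array([0.3, -1.1]), knots, C))
```

Training then adjusts the knot values `C` directly, which is also what makes the learned univariate functions easy to plot and inspect.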
Human-Centered Automation
The rapid advancement of Generative Artificial Intelligence (AI), such as Large Language Models (LLMs) and Multimodal Large Language Models (MLLM), has the potential to revolutionize the way we work and interact with digital systems across various industries. However, the current state of software automation, such as Robotic Process Automation (RPA) frameworks, often requires domain expertise and lacks visibility and intuitive interfaces, making it challenging for users to fully leverage these technologies. This position paper argues for the emerging area of Human-Centered Automation (HCA), which prioritizes user needs and preferences in the design and development of automation systems. Drawing on empirical evidence from human-computer interaction research and case studies, we highlight the importance of considering user perspectives in automation and propose a framework for designing human-centric automation solutions. The paper discusses the limitations of existing automation approaches, the challenges in integrating AI and RPA, and the benefits of human-centered automation for productivity, innovation, and democratizing access to these technologies. We emphasize the importance of open-source solutions and provide examples of how HCA can empower individuals and organizations in the era of rapidly progressing AI, helping them remain competitive. The paper also explores pathways to achieve more advanced and context-aware automation solutions. We conclude with a call to action for researchers and practitioners to focus on developing automation technologies that adapt to user needs, provide intuitive interfaces, and leverage the capabilities of high-end AI to create a more accessible and user-friendly future of automation.
Updated: 2024-05-24 22:12:28
Categories: cs.HC,cs.AI
Machine Learning-Assisted Thermoelectric Cooling for On-Demand Multi-Hotspot Thermal Management
Thermoelectric coolers (TECs) offer a promising solution for direct cooling of local hotspots and active thermal management in advanced electronic systems. However, TECs present significant trade-offs among spatial cooling, heating and power consumption. The optimization of TECs requires extensive simulations, which are impractical for managing actual systems with multiple hotspots under spatial and temporal variations. In this study, we present a novel machine learning-assisted optimization algorithm for thermoelectric coolers that can achieve global optimal temperature by individually controlling TEC units based on real-time multi-hotspot conditions across the entire domain. We train a convolutional neural network (CNN) with a combination of the Inception module and multi-task learning (MTL) approach to comprehend the coupled thermal-electrical physics underlying the system and attain accurate predictions for both temperature and power consumption with and without TECs. Due to the intricate interaction among passive thermal gradient, Peltier effect and Joule effect, a local optimal TEC control experiences spatial temperature trade-off which may not lead to a global optimal solution. To address this issue, we develop a backtracking-based optimization algorithm using the machine learning model to iterate all possible TEC assignments for attaining global optimal solutions. For any $m \times n$ matrix with $N_{\mathrm{HS}}$ hotspots ($n, m \leq 10$, $0 \leq N_{\mathrm{HS}} \leq 20$), our algorithm is capable of providing 52.4% peak temperature reduction and its corresponding TEC array control within an average of 1.64 seconds while iterating through tens of temperature predictions behind-the-scenes. This represents a speed increase of over three orders of magnitude compared to traditional FEM strategies which take approximately 27 minutes.
Updated: 2024-05-24 21:57:17
Categories: physics.app-ph,cs.LG
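A toy sketch of searching TEC on/off assignments against a surrogate predictor; plain enumeration stands in for the paper's backtracking search, and the quadratic `predict_peak` lambda is a made-up stand-in for the trained CNN:

```python
import itertools
import numpy as np

def best_assignment(n_tec, predict_peak):
    """Search all on/off TEC assignments for the lowest predicted peak
    temperature. `predict_peak` stands in for the CNN surrogate; the
    paper's backtracking prunes this enumeration rather than completing it."""
    best, best_temp = None, np.inf
    for bits in itertools.product([0, 1], repeat=n_tec):
        temp = predict_peak(np.array(bits))
        if temp < best_temp:
            best, best_temp = bits, temp
    return best, best_temp

# toy surrogate: each active TEC cools its hotspot (Peltier) but the
# array pays a growing Joule-heating penalty as more units switch on
predict_peak = lambda a: 80.0 - 5.0 * a.sum() + 0.8 * a.sum() ** 2
print(best_assignment(4, predict_peak))   # trades cooling against Joule heat
```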
CFGs: Causality Constrained Counterfactual Explanations using goal-directed ASP
Machine learning models that automate decision-making are increasingly used in consequential areas such as loan approvals, pretrial bail approval, and hiring. Unfortunately, most of these models are black boxes, i.e., they are unable to reveal how they reach these prediction decisions. A need for transparency demands justification for such predictions. An affected individual might also desire explanations to understand why a decision was made. Ethical and legal considerations require informing the individual of changes in the input attribute(s) that could be made to produce a desirable outcome. Our work focuses on the latter problem of generating counterfactual explanations by considering the causal dependencies between features. In this paper, we present the framework CFGs, CounterFactual Generation with s(CASP), which utilizes the goal-directed Answer Set Programming (ASP) system s(CASP) to automatically generate counterfactual explanations, in particular from models generated by rule-based machine learning algorithms. We benchmark CFGs with the FOLD-SE model. Reaching the counterfactual state from the initial state is planned and achieved using a series of interventions. To validate our proposal, we show how counterfactual explanations are computed and justified by imagining worlds where some or all factual assumptions are altered. More importantly, we show how CFGs navigates between these worlds, namely, going from our initial state, where we obtain an undesired outcome, to the imagined goal state, where we obtain the desired decision, taking into account the causal relationships among features.
Updated: 2024-05-24 21:47:58
Categories: cs.AI,cs.LG,cs.LO
A Systematic Bias of Machine Learning Regression Models and Its Correction: an Application to Imaging-based Brain Age Prediction
Machine learning models for continuous outcomes often yield systematically biased predictions, particularly for values that largely deviate from the mean. Specifically, predictions for large-valued outcomes tend to be negatively biased, while those for small-valued outcomes are positively biased. We refer to this linear central tendency warped bias as the "systematic bias of machine learning regression". In this paper, we first demonstrate that this issue persists across various machine learning models, and then delve into its theoretical underpinnings. We propose a general constrained optimization approach designed to correct this bias and develop a computationally efficient algorithm to implement our method. Our simulation results indicate that our correction method effectively eliminates the bias from the predicted outcomes. We apply the proposed approach to the prediction of brain age using neuroimaging data. In comparison to competing machine learning models, our method effectively addresses the longstanding issue of "systematic bias of machine learning regression" in neuroimaging-based brain age calculation, yielding unbiased predictions of brain age.
Updated: 2024-05-24 21:34:16
Categories: stat.ML,cs.LG,stat.ME
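The phenomenon is easy to reproduce; the snippet below shows the negative residual-vs-outcome slope of a least-squares predictor, plus a common linear recalibration from the brain-age literature (the paper's constrained-optimization correction is more general):

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(50.0, 10.0, 2000)        # true outcome, e.g. chronological age
x = y + rng.normal(0.0, 10.0, 2000)     # noisy predictive feature

# least-squares prediction shrinks toward the mean, so residuals
# correlate negatively with y: the "systematic bias" in question
beta = np.cov(x, y)[0, 1] / np.var(x)
pred = y.mean() + beta * (x - x.mean())
print("residual-vs-y slope:", np.polyfit(y, pred - y, 1)[0])   # about -0.5 here

# simple linear recalibration fit on labeled calibration data
a, b = np.polyfit(y, pred, 1)
corrected = (pred - b) / a
print("after correction:   ", np.polyfit(y, corrected - y, 1)[0])  # about 0
```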
Transformers represent belief state geometry in their residual stream
What computational structure are we building into large language models when we train them on next-token prediction? Here, we present evidence that this structure is given by the meta-dynamics of belief updating over hidden states of the data-generating process. Leveraging the theory of optimal prediction, we anticipate and then find that belief states are linearly represented in the residual stream of transformers, even in cases where the predicted belief state geometry has highly nontrivial fractal structure. We investigate cases where the belief state geometry is represented in the final residual stream or distributed across the residual streams of multiple layers, providing a framework to explain these observations. Furthermore we demonstrate that the inferred belief states contain information about the entire future, beyond the local next-token prediction that the transformers are explicitly trained on. Our work provides a framework connecting the structure of training data to the computational structure and representations that transformers use to carry out their behavior.
Updated: 2024-05-24 21:14:10
Categories: cs.LG,cs.CL
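A minimal sketch of the kind of linear probe such evidence rests on, fit on synthetic activations that embed a belief simplex by construction:

```python
import numpy as np

def fit_linear_probe(H, B):
    """Least-squares probe: find affine W, c with B ~ H @ W + c, testing
    whether belief states are linearly readable from residual activations."""
    H1 = np.c_[H, np.ones(len(H))]          # append bias column
    W, *_ = np.linalg.lstsq(H1, B, rcond=None)
    resid = B - H1 @ W
    r2 = 1.0 - resid.var(axis=0).sum() / B.var(axis=0).sum()
    return W, r2

# toy residual stream that embeds a 3-state belief simplex linearly
rng = np.random.default_rng(0)
B = rng.dirichlet(np.ones(3), size=1000)            # belief states
H = B @ rng.normal(size=(3, 16)) + 0.01 * rng.normal(size=(1000, 16))
print("probe R^2:", fit_linear_probe(H, B)[1])      # close to 1.0
```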
Can Implicit Bias Imply Adversarial Robustness?
The implicit bias of gradient-based training algorithms has been considered mostly beneficial as it leads to trained networks that often generalize well. However, Frei et al. (2023) show that such implicit bias can harm adversarial robustness. Specifically, when the data consists of clusters with small inter-cluster correlation, a shallow (two-layer) ReLU network trained by gradient flow generalizes well, but it is not robust to adversarial attacks of small radius, despite the existence of a much more robust classifier that can be explicitly constructed from a shallow network. In this paper, we extend recent analyses of neuron alignment to show that a shallow network with a polynomial ReLU activation (pReLU) trained by gradient flow not only generalizes well but is also robust to adversarial attacks. Our results highlight the importance of the interplay between data structure and architecture design in the implicit bias and robustness of trained networks.
Updated: 2024-05-24 21:09:53
Categories: cs.LG,stat.ML
Review of Machine Learning Approaches for Diagnostics and Prognostics of Industrial Systems Using Industrial Open Source Data
In the field of Prognostics and Health Management (PHM), recent years have witnessed a significant surge in the application of machine learning (ML). Despite this growth, the field grapples with a lack of unified guidelines and systematic approaches for effectively implementing these ML techniques, as well as a lack of comprehensive analysis of industrial open-source data across varied scenarios. To address these gaps, this paper provides a comprehensive review of machine learning approaches for diagnostics and prognostics of industrial systems using open-source datasets from the PHM Data Challenge Competitions held between 2018 and 2023 by the PHM Society and the IEEE Reliability Society, and summarizes a unified ML framework. This review systematically categorizes and scrutinizes the problems, challenges, methodologies, and advancements demonstrated in these competitions, highlighting the evolving role of both conventional machine learning and deep learning in tackling complex industrial tasks related to detection, diagnosis, assessment, and prognosis. Moreover, this paper delves into the common challenges in PHM data challenge competitions by emphasizing both data-related and model-related issues and summarizes the solutions that have been employed to address these challenges. Finally, we identify key themes and potential directions for future research, providing opportunities and prospects for ML further development in PHM.
Updated: 2024-05-24 21:09:34
Categories: cs.LG,cs.AI
A Unified Theory of Stochastic Proximal Point Methods without Smoothness
This paper presents a comprehensive analysis of a broad range of variations of the stochastic proximal point method (SPPM). Proximal point methods have attracted considerable interest owing to their numerical stability and robustness against imperfect tuning, a trait not shared by the dominant stochastic gradient descent (SGD) algorithm. A framework of assumptions that we introduce encompasses methods employing techniques such as variance reduction and arbitrary sampling. A cornerstone of our general theoretical approach is a parametric assumption on the iterates, correction and control vectors. We establish a single theorem that ensures linear convergence under this assumption and the $\mu$-strong convexity of the loss function, and without the need to invoke smoothness. This integral theorem reinstates best known complexity and convergence guarantees for several existing methods which demonstrates the robustness of our approach. We expand our study by developing three new variants of SPPM, and through numerical experiments we elucidate various properties inherent to them.
Updated: 2024-05-24 21:09:19
Categories: math.OC,cs.LG
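A minimal SPPM sketch for least squares, where the per-sample proximal step has a closed form (problem sizes and step size are illustrative):

```python
import numpy as np

def sppm(A, b, gamma=1.0, iters=3000, seed=0):
    """Stochastic proximal point sketch for f(x) = (1/n) sum_i f_i(x)
    with f_i(x) = (a_i @ x - b_i)^2 / 2.  Each step solves
        x+ = argmin_z f_i(z) + ||z - x||^2 / (2 * gamma),
    which has a closed form for a single linear measurement."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(iters):
        i = rng.integers(n)
        a = A[i]
        x = x - gamma * (a @ x - b[i]) / (1.0 + gamma * (a @ a)) * a
    return x

rng = np.random.default_rng(1)
A = rng.normal(size=(200, 5))
x_star = rng.normal(size=5)
print(np.linalg.norm(sppm(A, A @ x_star) - x_star))   # ~0 (consistent system)
```

Note that no smoothness constant enters the update: the step is well defined and stable for any $\gamma > 0$, which reflects the robustness to imperfect tuning highlighted above.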
Send Message to the Future? Blockchain-based Time Machines for Decentralized Reveal of Locked Information
Conditional information reveal systems automate the release of information upon meeting specific predefined conditions, such as time or location. This paper introduces a breakthrough in the understanding, design and application of conditional information reveal systems that are highly secure and decentralized. By designing a new practical timed-release cryptography system and a verifiable secret sharing scheme, a novel data sharing system is devised on the blockchain that `sends messages in the future' with highly accurate decryption times. This paper provides a complete evaluation portfolio of this pioneering paradigm, including analytical results, a validation of its robustness in the Tamarin Prover and a performance evaluation of a real-world, open-source system prototype deployed across the globe. Using real-world election data, we also demonstrate the applicability of this innovative system in e-voting, illustrating its capacity to secure and ensure fair electronic voting processes.
Updated: 2024-05-24 21:06:17
Categories: cs.CR
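The paper's construction is a *verifiable* secret sharing scheme paired with timed-release cryptography on-chain; the sketch below shows only the plain Shamir core such schemes build on (the field modulus and parameters are illustrative):

```python
import random

P = 2**61 - 1  # a Mersenne prime; arithmetic is over the field Z_P

def share(secret, k, n):
    """Shamir split: secret = f(0) for a random degree-(k-1) polynomial,
    with shares (x, f(x)) for x = 1..n; any k shares reconstruct f(0)."""
    coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
    return [(x, sum(c * pow(x, e, P) for e, c in enumerate(coeffs)) % P)
            for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 recovers the secret."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num = den = 1
        for j, (xj, _) in enumerate(shares):
            if j != i:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P  # den^-1 mod P
    return secret

shares = share(123456789, k=3, n=5)
print(reconstruct(shares[:3]))   # any 3 of the 5 shares suffice
```

A verifiable variant additionally publishes commitments to the polynomial coefficients so each holder can check their share; that layer is omitted here.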
Interpretable Concept Bottlenecks to Align Reinforcement Learning Agents
Goal misalignment, reward sparsity and difficult credit assignment are only a few of the many issues that make it difficult for deep reinforcement learning (RL) agents to learn optimal policies. Unfortunately, the black-box nature of deep neural networks impedes the inclusion of domain experts for inspecting the model and revising suboptimal policies. To this end, we introduce *Successive Concept Bottleneck Agents* (SCoBots), that integrate consecutive concept bottleneck (CB) layers. In contrast to current CB models, SCoBots do not just represent concepts as properties of individual objects, but also as relations between objects which is crucial for many RL tasks. Our experimental results provide evidence of SCoBots' competitive performances, but also of their potential for domain experts to understand and regularize their behavior. Among other things, SCoBots enabled us to identify a previously unknown misalignment problem in the iconic video game, Pong, and resolve it. Overall, SCoBots thus result in more human-aligned RL agents. Our code is available at https://github.com/k4ntz/SCoBots .
Updated: 2024-05-24 21:00:56
Categories: cs.LG,cs.SC
Zero-Shot Spam Email Classification Using Pre-trained Large Language Models
This paper investigates the application of pre-trained large language models (LLMs) for spam email classification using zero-shot prompting. We evaluate the performance of both open-source (Flan-T5) and proprietary LLMs (ChatGPT, GPT-4) on the well-known SpamAssassin dataset. Two classification approaches are explored: (1) truncated raw content from email subject and body, and (2) classification based on summaries generated by ChatGPT. Our empirical analysis, leveraging the entire dataset for evaluation without further training, reveals promising results. Flan-T5 achieves a 90% F1-score on the truncated content approach, while GPT-4 reaches a 95% F1-score using summaries. While these initial findings on a single dataset suggest the potential for classification pipelines of LLM-based subtasks (e.g., summarisation and classification), further validation on diverse datasets is necessary. The high operational costs of proprietary models, coupled with the general inference costs of LLMs, could significantly hinder real-world deployment for spam filtering.
Updated: 2024-05-24 20:55:49
Categories: cs.CL,cs.AI,I.2.7
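A minimal sketch of the truncated-content zero-shot approach, with a toy callable standing in for Flan-T5 or GPT-4 inference (the prompt wording is ours, not the paper's):

```python
def classify_email(llm, subject, body, max_chars=2000):
    """Zero-shot spam classification sketch. `llm` is any callable
    mapping a prompt string to a completion (a hypothetical stand-in
    for whichever model backend is used)."""
    content = f"Subject: {subject}\n\n{body}"[:max_chars]  # truncated raw content
    prompt = (
        "Classify the following email as 'spam' or 'ham'. "
        "Answer with a single word.\n\n" + content + "\n\nAnswer:"
    )
    answer = llm(prompt).strip().lower()
    return "spam" if answer.startswith("spam") else "ham"

# toy stand-in model so the sketch runs end-to-end
toy_llm = lambda p: "spam" if "winner" in p.lower() else "ham"
print(classify_email(toy_llm, "You are a WINNER!!!", "Claim your prize now"))
```

The summary-based variant would simply replace `content` with a model-generated summary before prompting.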
Data-Driven Discovery of PDEs via the Adjoint Method
In this work, we present an adjoint-based method for discovering the underlying governing partial differential equations (PDEs) given data. The idea is to consider a parameterized PDE in a general form and formulate a PDE-constrained optimization problem aimed at minimizing the error of the PDE solution from data. Using variational calculus, we obtain an evolution equation for the Lagrange multipliers (adjoint equations) allowing us to compute the gradient of the objective function with respect to the parameters of PDEs given data in a straightforward manner. In particular, we consider a family of parameterized PDEs encompassing linear, nonlinear, and spatial derivative candidate terms, and elegantly derive the corresponding adjoint equations. We show the efficacy of the proposed approach in identifying the form of the PDE up to machine accuracy, enabling the accurate discovery of PDEs from data. We also compare its performance with the famous PDE Functional Identification of Nonlinear Dynamics method known as PDE-FIND (Rudy et al., 2017), on both smooth and noisy data sets. Even though the proposed adjoint method relies on forward/backward solvers, it outperforms PDE-FIND for large data sets thanks to the analytic expressions for gradients of the cost function with respect to each PDE parameter.
Updated: 2024-05-24 20:52:32
Categories: math.OC,cs.LG
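In generic form (notation and sign conventions ours), the PDE-constrained problem and the resulting adjoint system look like:

```latex
\min_{\theta}\ J(\theta) = \frac{1}{2}\int_0^T\!\!\int_\Omega \big(u - u^{\mathrm{data}}\big)^2\,dx\,dt
\quad \text{s.t.} \quad
\partial_t u = F(u;\theta) = \sum_k \theta_k\,\phi_k\big(u, \partial_x u, \partial_x^2 u, \dots\big),
% adjoint equation, solved backward in time from lambda(., T) = 0:
-\partial_t \lambda = \Big(\frac{\partial F}{\partial u}\Big)^{\!*}\lambda - \big(u - u^{\mathrm{data}}\big),
\qquad
\frac{\partial J}{\partial \theta_k} = -\int_0^T\!\!\int_\Omega \lambda\,\phi_k\,dx\,dt .
```

One forward solve for $u$ and one backward solve for $\lambda$ thus yield the gradient with respect to every candidate coefficient $\theta_k$ at once, which is what keeps the cost flat as the candidate library grows.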
Tripod: Three Complementary Inductive Biases for Disentangled Representation Learning
Inductive biases are crucial in disentangled representation learning for narrowing down an underspecified solution set. In this work, we consider endowing a neural network autoencoder with three select inductive biases from the literature: data compression into a grid-like latent space via quantization, collective independence amongst latents, and minimal functional influence of any latent on how other latents determine data generation. In principle, these inductive biases are deeply complementary: they most directly specify properties of the latent space, encoder, and decoder, respectively. In practice, however, naively combining existing techniques instantiating these inductive biases fails to yield significant benefits. To address this, we propose adaptations to the three techniques that simplify the learning problem, equip key regularization terms with stabilizing invariances, and quash degenerate incentives. The resulting model, Tripod, achieves state-of-the-art results on a suite of four image disentanglement benchmarks. We also verify that Tripod significantly improves upon its naive incarnation and that all three of its "legs" are necessary for best performance.
Updated: 2024-05-24 20:52:02
Categories: cs.LG,cs.CV
Clustering Survival Data using a Mixture of Non-parametric Experts
Survival analysis aims to predict the timing of future events across various fields, from medical outcomes to customer churn. However, the integration of clustering into survival analysis, particularly for precision medicine, remains underexplored. This study introduces SurvMixClust, a novel algorithm for survival analysis that integrates clustering with survival function prediction within a unified framework. SurvMixClust learns latent representations for clustering while also predicting individual survival functions using a mixture of non-parametric experts. Our evaluations on five public datasets show that SurvMixClust creates balanced clusters with distinct survival curves, outperforms clustering baselines, and competes with non-clustering survival models in predictive accuracy, as measured by the time-dependent c-index and log-rank metrics.
Updated: 2024-05-24 20:47:58
Categories: cs.LG,stat.ML
PatchProt: Hydrophobic patch prediction using protein foundation models
Hydrophobic patches on protein surfaces play important functional roles in protein-protein and protein-ligand interactions. Large hydrophobic surfaces are also involved in the progression of aggregation diseases. Predicting exposed hydrophobic patches from a protein sequence has been shown to be a difficult task. Fine-tuning foundation models allows for adapting a model to the specific nuances of a new task using a much smaller dataset. Additionally, multi-task deep learning offers a promising solution for addressing data gaps, simultaneously outperforming single-task methods. In this study, we harnessed a recently released leading large language model ESM-2. Efficient fine-tuning of ESM-2 was achieved by leveraging a recently developed parameter-efficient fine-tuning method. This approach enabled comprehensive training of model layers without excessive parameters and without the need to include a computationally expensive multiple sequence analysis. We explored several related tasks, at local (residue) and global (protein) levels, to improve the representation of the model. As a result, our fine-tuned ESM-2 model, PatchProt, can not only predict hydrophobic patch areas but also outperform existing methods at predicting primary tasks, including secondary structure and surface accessibility predictions. Importantly, our analysis shows that including related local tasks can improve predictions on more difficult global tasks. This research sets a new standard for sequence-based protein property prediction and highlights the remarkable potential of fine-tuning foundation models, enriching the model representation by training over related tasks.
Updated: 2024-05-24 20:37:02
Categories: q-bio.QM,cs.AI,cs.LG
Critical windows: non-asymptotic theory for feature emergence in diffusion models
We develop theory to understand an intriguing property of diffusion models for image generation that we term critical windows. Empirically, it has been observed that there are narrow time intervals in sampling during which particular features of the final image emerge, e.g. the image class or background color (Ho et al., 2020b; Meng et al., 2022; Choi et al., 2022; Raya & Ambrogioni, 2023; Georgiev et al., 2023; Sclocchi et al., 2024; Biroli et al., 2024). While this is advantageous for interpretability as it implies one can localize properties of the generation to a small segment of the trajectory, it seems at odds with the continuous nature of the diffusion. We propose a formal framework for studying these windows and show that for data coming from a mixture of strongly log-concave densities, these windows can be provably bounded in terms of certain measures of inter- and intra-group separation. We also instantiate these bounds for concrete examples like well-conditioned Gaussian mixtures. Finally, we use our bounds to give a rigorous interpretation of diffusion models as hierarchical samplers that progressively "decide" output features over a discrete sequence of times. We validate our bounds with synthetic experiments. Additionally, preliminary experiments on Stable Diffusion suggest critical windows may serve as a useful tool for diagnosing fairness and privacy violations in real-world diffusion models.
Updated: 2024-05-24 20:35:38
Categories: cs.LG,cs.CV,stat.ML
Dissecting the Interplay of Attention Paths in a Statistical Mechanics Theory of Transformers
Despite the remarkable empirical performance of Transformers, their theoretical understanding remains elusive. Here, we consider a deep multi-head self-attention network, that is closely related to Transformers yet analytically tractable. We develop a statistical mechanics theory of Bayesian learning in this model, deriving exact equations for the network's predictor statistics under the finite-width thermodynamic limit, i.e., $N,P\rightarrow\infty$, $P/N=\mathcal{O}(1)$, where $N$ is the network width and $P$ is the number of training examples. Our theory shows that the predictor statistics are expressed as a sum of independent kernels, each one pairing different 'attention paths', defined as information pathways through different attention heads across layers. The kernels are weighted according to a 'task-relevant kernel combination' mechanism that aligns the total kernel with the task labels. As a consequence, this interplay between attention paths enhances generalization performance. Experiments confirm our findings on both synthetic and real-world sequence classification tasks. Finally, our theory explicitly relates the kernel combination mechanism to properties of the learned weights, allowing for a qualitative transfer of its insights to models trained via gradient descent. As an illustration, we demonstrate an efficient size reduction of the network, by pruning those attention heads that are deemed less relevant by our theory.
Updated: 2024-05-24 20:34:18
Categories: cs.LG,cond-mat.dis-nn,cond-mat.stat-mech,stat.ML
MUCM-Net: A Mamba Powered UCM-Net for Skin Lesion Segmentation
Skin lesion segmentation is key for early skin cancer detection. Challenges in automatic segmentation from dermoscopic images include variations in color and texture, imaging artifacts, and indistinct lesion boundaries. Deep learning methods like CNNs and U-Net have shown promise in addressing these issues. To further aid early diagnosis, especially on mobile devices with limited computing power, we present MUCM-Net. This efficient model combines Mamba State-Space Models with our UCM-Net architecture for improved feature learning and segmentation. MUCM-Net's Mamba-UCM Layer is optimized for mobile deployment, offering high accuracy with low computational needs. Tested on ISIC datasets, it outperforms other methods in accuracy and computational efficiency, making it a scalable tool for early detection in settings with limited resources. To facilitate accessibility and further research, the MUCM-Net source code is available at https://github.com/chunyuyuan/MUCM-Net, supporting advances in mobile health diagnostics and the fight against skin cancer.
Updated: 2024-05-24 20:33:59
标题: MUCM-Net:一种以曼巴为动力的用于皮肤病变分割的UCM-Net
摘要: 皮肤病变分割对早期皮肤癌检测至关重要。从皮肤镜图像中自动分割的挑战包括颜色、纹理和模糊病变边界的变化。深度学习方法如CNN和U-Net已显示出在解决这些问题方面的潜力。为了进一步帮助早期诊断,特别是在计算能力有限的移动设备上,我们提出了MUCM-Net。这种高效模型将Mamba状态空间模型与我们的UCM-Net架构相结合,以改善特征学习和分割。MUCM-Net的Mamba-UCM层经过优化,可用于移动部署,在不需要较高计算量的情况下提供高准确性。在ISIC数据集上经过测试,它在准确性和计算效率方面优于其他方法,使其成为在资源有限的环境中早期检测的可扩展工具。我们的MUCM-Net源代码可供研究和合作使用,支持移动健康诊断和抗击皮肤癌的进展。为了促进领域内的可访问性和进一步研究,MUCM-Net源代码可通过https://github.com/chunyuyuan/MUCM-Net获取。
更新时间: 2024-05-24 20:33:59
领域: eess.IV,cs.CV,cs.LG
SF-DQN: Provable Knowledge Transfer using Successor Feature for Deep Reinforcement Learning
This paper studies the transfer reinforcement learning (RL) problem where multiple RL problems have different reward functions but share the same underlying transition dynamics. In this setting, the Q-function of each RL problem (task) can be decomposed into a successor feature (SF) and a reward mapping: the former characterizes the transition dynamics, and the latter characterizes the task-specific reward function. This Q-function decomposition, coupled with a policy improvement operator known as generalized policy improvement (GPI), reduces the sample complexity of finding the optimal Q-function, and thus the SF \& GPI framework exhibits promising empirical performance compared to traditional RL methods like Q-learning. However, its theoretical foundations remain largely unestablished, especially when learning the successor features using deep neural networks (SF-DQN). This paper studies provable knowledge transfer using SF-DQN in transfer RL problems. We establish the first convergence analysis with provable generalization guarantees for SF-DQN with GPI. The theory reveals that SF-DQN with GPI outperforms conventional RL approaches, such as deep Q-network, in terms of both faster convergence rate and better generalization. Numerical experiments on real and synthetic RL tasks support the superior performance of SF-DQN \& GPI, aligning with our theoretical findings.
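To make the decomposition concrete: the Q-function of task $i$ factorizes as $Q_i(s,a) = \psi_i(s,a)^\top w$, and GPI acts greedily with respect to the maximum over source-task Q-values. A minimal NumPy sketch of the GPI step, with all shapes and values hypothetical:

    import numpy as np

    rng = np.random.default_rng(0)
    n_tasks, n_actions, d = 3, 4, 8
    # Hypothetical successor features psi_i(s, a) at one state s, per source task i:
    psi = rng.normal(size=(n_tasks, n_actions, d))
    w_new = rng.normal(size=d)                   # reward mapping of the new task

    q = psi @ w_new                              # Q_i(s, a) = psi_i(s, a) . w_new
    gpi_action = int(np.argmax(q.max(axis=0)))   # GPI: a* = argmax_a max_i Q_i(s, a)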
Updated: 2024-05-24 20:30:14
标题: SF-DQN:使用继承特征进行深度强化学习的可证明知识迁移
摘要: 本文研究了转移强化学习(RL)问题,其中多个RL问题具有不同的奖励函数,但共享相同的基础转移动态。在这种设置中,每个RL问题(任务)的Q函数可以分解为后继特征(SF)和奖励映射:前者表征转移动态,后者表征特定于任务的奖励函数。这种Q函数分解,结合一种称为广义策略改进(GPI)的策略改进算子,降低了找到最优Q函数的样本复杂性,因此SF和GPI框架在实证性能方面表现出有希望的优势,相对于传统的RL方法如Q学习。然而,其理论基础仍然没有完全确立,尤其是在使用深度神经网络学习后继特征(SF-DQN)时。本文研究了在转移RL问题中使用SF-DQN进行可证明的知识转移。我们建立了第一个具有可证明概括保证的SF-DQN与GPI的收敛分析。理论揭示了SF-DQN与GPI在收敛速度和泛化能力方面优于传统的RL方法,如深度Q网络。对真实和合成RL任务的数值实验支持了SF-DQN和GPI的优越性能,与我们的理论发现一致。
更新时间: 2024-05-24 20:30:14
领域: cs.LG,stat.ML
Unmasking the giant: A comprehensive evaluation of ChatGPT's proficiency in coding algorithms and data structures
The transformative influence of Large Language Models (LLMs) is profoundly reshaping the Artificial Intelligence (AI) technology domain. Notably, ChatGPT distinguishes itself within these models, demonstrating remarkable performance in multi-turn conversations and exhibiting code proficiency across an array of languages. In this paper, we carry out a comprehensive evaluation of ChatGPT's coding capabilities based on what is to date the largest catalog of coding challenges. Our focus is on the Python programming language and problems centered on data structures and algorithms, two topics at the very foundations of Computer Science. We evaluate ChatGPT for its ability to generate correct solutions to the problems fed to it, its code quality, and the nature of run-time errors thrown by its code. Where ChatGPT code successfully executes but fails to solve the problem at hand, we look into patterns in the test cases passed in order to gain some insight into how wrong ChatGPT's code is in these kinds of situations. To infer whether ChatGPT might have directly memorized some of the data that was used to train it, we methodically design an experiment to investigate this phenomenon. Making comparisons with human performance whenever feasible, we investigate all of the above questions in the context of both of its underlying learning models (GPT-3.5 and GPT-4), on a vast array of sub-topics within the main topics, and on problems of varying degrees of difficulty.
Updated: 2024-05-24 20:28:09
标题: 揭开巨人的面纱:对ChatGPT在编码算法和数据结构方面能力的全面评估
摘要: 大型语言模型(LLMs)的转变性影响正在深刻地重塑人工智能(AI)技术领域。值得注意的是,ChatGPT在这些模型中脱颖而出,展现出在多轮对话中卓越的表现,并在多种语言中展示出代码熟练度。在本文中,我们对ChatGPT的编码能力进行了全面评估,基于迄今为止最大的编码挑战目录。我们的重点是Python编程语言和围绕数据结构和算法两个在计算机科学基础上非常重要的主题的问题。我们评估ChatGPT生成正确解决方案的能力,其代码质量以及其代码引发的运行时错误的性质。当ChatGPT代码成功执行,但无法解决手头的问题时,我们研究通过的测试案例中的模式,以获得一些关于ChatGPT在这些情况下错了多少的见解。为了推断ChatGPT是否可能直接记住了用于训练它的一些数据,我们系统地设计了一个实验来调查这种现象。在可能的情况下与人类表现进行比较,我们从其底层学习模型(GPT-3.5和GPT-4),在主题的各种子主题上,以及难度不同的问题上,调查上述所有问题。
更新时间: 2024-05-24 20:28:09
领域: cs.SE,cs.AI,cs.CL
Pattern-Based Time-Series Risk Scoring for Anomaly Detection and Alert Filtering -- A Predictive Maintenance Case Study
Fault detection is a key challenge in the management of complex systems. In the context of SparkCognition's efforts towards predictive maintenance in large scale industrial systems, this problem is often framed in terms of anomaly detection - identifying patterns of behavior in the data which deviate from normal. Patterns of normal behavior aren't captured simply in the coarse statistics of measured signals. Rather, the multivariate sequential pattern itself can be indicative of normal vs. abnormal behavior. For this reason, normal behavior modeling that relies on snapshots of the data without taking into account temporal relationships as they evolve would be lacking. However, common strategies for dealing with temporal dependence, such as Recurrent Neural Networks or attention mechanisms are oftentimes computationally expensive and difficult to train. In this paper, we propose a fast and efficient approach to anomaly detection and alert filtering based on sequential pattern similarities. In our empirical analysis section, we show how this approach can be leveraged for a variety of purposes involving anomaly detection on a large scale real-world industrial system. Subsequently, we test our approach on a publicly-available dataset in order to establish its general applicability and robustness compared to a state-of-the-art baseline. We also demonstrate an efficient way of optimizing the framework based on an alert recall objective function.
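As a cartoon of the pattern-similarity idea (not SparkCognition's production scoring; the distance metric and data layout are assumptions), one can score a multivariate window by its distance to the nearest stored pattern of normal behavior:

    import numpy as np

    def pattern_risk_score(window, normal_patterns):
        """Risk score of a (time x sensors) window: distance to the closest
        reference pattern of normal behavior; larger means more anomalous."""
        return float(min(np.linalg.norm(window - p) for p in normal_patterns))

    # Alert filtering could then threshold this score, with the threshold
    # tuned against an alert-recall objective as the abstract suggests.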
Updated: 2024-05-24 20:27:45
标题: 基于模式的时间序列风险评分用于异常检测和警报过滤 -- 以预测性维护案例研究为例
摘要: 故障检测是复杂系统管理中的关键挑战。在SparkCognition致力于大规模工业系统预测维护的努力中,这个问题通常被表述为异常检测——识别数据中与正常行为偏离的模式。正常行为的模式并不仅仅体现在测量信号的粗略统计数据中。相反,多变量序列模式本身可能表明正常与异常行为。因此,仅依赖数据快照、而不考虑随时间演变的时间关系的正常行为建模将有所欠缺。然而,常见的处理时间依赖性的策略,如循环神经网络或注意力机制,通常计算昂贵且难以训练。在本文中,我们提出了一种基于序列模式相似性的快速有效的异常检测和警报过滤方法。在我们的经验分析部分,我们展示了这种方法如何可以用于涉及大规模实际工业系统的异常检测的各种目的。随后,我们在一个公开可用的数据集上测试了我们的方法,以建立其相对于最先进基线的一般适用性和稳健性。我们还展示了一种基于警报召回目标函数优化框架的有效方法。
更新时间: 2024-05-24 20:27:45
领域: cs.LG,eess.SP
Peacock: A Family of Arabic Multimodal Large Language Models and Benchmarks
Multimodal large language models (MLLMs) have proven effective in a wide range of tasks requiring complex reasoning and linguistic comprehension. However, due to a lack of high-quality multimodal resources in languages other than English, the success of MLLMs remains relatively limited to English-based settings. This poses significant challenges in developing comparable models for other languages, including even those with large speaker populations such as Arabic. To alleviate this challenge, we introduce a comprehensive family of Arabic MLLMs, dubbed \textit{Peacock}, with strong vision and language capabilities. Through comprehensive qualitative and quantitative analysis, we demonstrate the solid performance of our models on various visual reasoning tasks and further show their emerging dialectal potential. Additionally, we introduce \textit{Henna}, a new benchmark specifically designed for assessing MLLMs on aspects related to Arabic culture, laying the first stone for culturally-aware Arabic MLLMs. The GitHub repository for the \textit{Peacock} project is available at \url{https://github.com/UBC-NLP/peacock}.
Updated: 2024-05-24 20:24:36
标题: 孔雀:一系列阿拉伯语多模式大型语言模型及基准
摘要: 多模式大语言模型(MLLMs)已被证明在需要复杂推理和语言理解的各种任务中有效。然而,由于除英语以外的其他语言缺乏高质量的多模式资源,MLLMs的成功仍然相对局限于基于英语的环境。这给开发其他语言的可比较模型,甚至包括像阿拉伯语这样拥有大量使用者的语言,带来了重大挑战。为了缓解这一挑战,我们引入了一系列全面的阿拉伯语MLLMs,名为“Peacock”,具有强大的视觉和语言能力。通过全面的定性和定量分析,我们展示了我们的模型在各种视觉推理任务上的良好性能,并进一步展示了它们新兴的方言潜力。此外,我们还介绍了一个名为“Henna”的新基准,专门设计用于评估MLLMs在与阿拉伯文化相关的方面,为具有文化意识的阿拉伯语MLLMs奠定了基础。 “Peacock”项目的GitHub存储库可在\url{https://github.com/UBC-NLP/peacock}上找到。
更新时间: 2024-05-24 20:24:36
领域: cs.CL,cs.AI
Fisher Flow Matching for Generative Modeling over Discrete Data
Generative modeling over discrete data has recently seen numerous success stories, with applications spanning language modeling, biological sequence design, and graph-structured molecular data. The predominant generative modeling paradigm for discrete data is still autoregressive, with more recent alternatives based on diffusion or flow-matching falling short of their impressive performance in continuous data settings, such as image or video generation. In this work, we introduce Fisher-Flow, a novel flow-matching model for discrete data. Fisher-Flow takes a manifestly geometric perspective by considering categorical distributions over discrete data as points residing on a statistical manifold equipped with its natural Riemannian metric: the $\textit{Fisher-Rao metric}$. As a result, we demonstrate discrete data itself can be continuously reparameterised to points on the positive orthant of the $d$-hypersphere $\mathbb{S}^d_+$, which allows us to define flows that map any source distribution to target in a principled manner by transporting mass along (closed-form) geodesics of $\mathbb{S}^d_+$. Furthermore, the learned flows in Fisher-Flow can be further bootstrapped by leveraging Riemannian optimal transport leading to improved training dynamics. We prove that the gradient flow induced by Fisher-Flow is optimal in reducing the forward KL divergence. We evaluate Fisher-Flow on an array of synthetic and diverse real-world benchmarks, including designing DNA Promoter, and DNA Enhancer sequences. Empirically, we find that Fisher-Flow improves over prior diffusion and flow-matching models on these benchmarks.
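A small sketch of the geometry involved: the square-root map sends a categorical distribution onto $\mathbb{S}^d_+$, and mass can be transported along the closed-form great-circle geodesic. The interpolation below is illustrative only, not the paper's training loop:

    import numpy as np

    def to_sphere(p):
        """Sqrt-map: a categorical distribution becomes a unit vector on the
        positive orthant of the hypersphere."""
        return np.sqrt(p)

    def geodesic(u, v, t):
        """Closed-form spherical geodesic between unit vectors u and v."""
        omega = np.arccos(np.clip(u @ v, -1.0, 1.0))
        return (np.sin((1 - t) * omega) * u + np.sin(t * omega) * v) / np.sin(omega)

    p0, p1 = np.array([0.7, 0.2, 0.1]), np.array([0.1, 0.1, 0.8])
    p_half = geodesic(to_sphere(p0), to_sphere(p1), 0.5) ** 2  # back on the simplex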
Updated: 2024-05-24 20:21:17
标题: 费舍尔流匹配用于离散数据生成建模
摘要: 最近,离散数据上的生成建模取得了许多成功案例,应用范围涵盖语言建模、生物序列设计和图结构分子数据。离散数据的主要生成建模范式仍然是自回归的,而基于扩散或流匹配的较新替代方法尚未达到它们在图像或视频生成等连续数据设置中的出色表现。在这项工作中,我们介绍了Fisher-Flow,一种新颖的离散数据流匹配模型。Fisher-Flow采用明确的几何视角,将离散数据上的分类分布视为驻留在配备其自然黎曼度量(即Fisher-Rao度量)的统计流形上的点。因此,我们展示了离散数据本身可以连续地重新参数化为$d$-超球面正象限$\mathbb{S}^d_+$上的点,这使我们能够通过沿$\mathbb{S}^d_+$的(闭式)测地线输运质量,以有原则的方式定义将任何源分布映射到目标分布的流。此外,Fisher-Flow中学习到的流可以通过利用黎曼最优传输进一步自举,从而改善训练动态。我们证明由Fisher-Flow诱导的梯度流在降低前向KL散度方面是最优的。我们在一系列合成和多样的真实世界基准测试中评估了Fisher-Flow,包括设计DNA启动子和DNA增强子序列。从经验上看,我们发现Fisher-Flow在这些基准测试中优于先前的扩散和流匹配模型。
更新时间: 2024-05-24 20:21:17
领域: cs.LG,cs.AI
Explicit Lipschitz Value Estimation Enhances Policy Robustness Against Perturbation
In robotic control tasks, policies trained by reinforcement learning (RL) in simulation often experience a performance drop when deployed on physical hardware, due to modeling error, measurement error, and unpredictable perturbations in the real world. Robust RL methods account for this issue by approximating a worst-case value function during training, but they can be sensitive to approximation errors in the value function and its gradient before training is complete. In this paper, we hypothesize that Lipschitz regularization can help condition the approximated value function gradients, leading to improved robustness after training. We test this hypothesis by combining Lipschitz regularization with an application of Fast Gradient Sign Method to reduce approximation errors when evaluating the value function under adversarial perturbations. Our empirical results demonstrate the benefits of this approach over prior work on a number of continuous control benchmarks.
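A schematic PyTorch sketch of the two ingredients named above; the penalty form, the step size, and the choice of perturbing toward lower predicted value are our assumptions for illustration, not the paper's exact procedure:

    import torch

    def fgsm_state(value_fn, s, eps=0.05):
        """One FGSM step on the state: move s in the direction that decreases
        the predicted value, approximating an adversarial perturbation."""
        s = s.clone().detach().requires_grad_(True)
        value_fn(s).sum().backward()
        return (s - eps * s.grad.sign()).detach()

    def lipschitz_penalty(value_fn, s, delta=1e-2):
        """Finite-difference surrogate for the value-gradient norm, usable as
        a soft Lipschitz regularizer on the critic."""
        d = torch.randn_like(s)
        d = delta * d / d.norm(dim=-1, keepdim=True).clamp_min(1e-12)
        return ((value_fn(s + d) - value_fn(s)).abs() / delta).mean()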
Updated: 2024-05-24 20:19:37
标题: 显式利普希茨值估计增强策略对扰动的稳健性
摘要: 在机器人控制任务中,通过强化学习(RL)在模拟中训练的策略通常在部署到物理硬件上时会出现性能下降,这是由于建模误差、测量误差和现实世界中的不可预测的扰动所导致的。鲁棒的RL方法通过在训练过程中逼近一个最坏情况值函数来解决这个问题,但在训练完成之前,它们可能对值函数及其梯度的逼近误差敏感。在本文中,我们假设Lipschitz正则化可以帮助调整近似值函数的梯度,从而在训练后提高鲁棒性。我们通过将Lipschitz正则化与快速梯度符号方法相结合,以减少在对抗性扰动下评估值函数时的逼近误差,来验证这一假设。我们的实证结果表明,与先前的工作相比,这种方法在许多连续控制基准测试中带来了好处。
更新时间: 2024-05-24 20:19:37
领域: cs.LG
Scaling up the Banded Matrix Factorization Mechanism for Differentially Private ML
DP-BandMF offers a powerful approach to differentially private machine learning, balancing privacy amplification with noise correlation for optimal noise reduction. However, its scalability has been limited to settings where the number of training iterations is less than $10^4$. In this work, we present techniques that significantly extend DP-BandMF's reach, enabling its use in settings with up to and beyond $10^6$ training iterations. Our enhanced implementation, coupled with extensive experiments, provides clear guidelines on selecting the optimal number of bands. These insights offer practitioners a deeper understanding of DP-BandMF's performance and how to maximize its utility for privacy-preserving machine learning.
Updated: 2024-05-24 20:19:15
标题: 扩展带状矩阵分解机制以实现差分隐私机器学习
摘要: DP-BandMF提供了一种强大的方法来实现差分隐私机器学习,平衡隐私放大和噪声相关性,以实现最佳噪声减少。然而,其可扩展性仅限于训练迭代次数少于$10^4$的情况。在这项工作中,我们提出了一些技术,显著扩展了DP-BandMF的适用范围,使其可以在训练迭代次数达到甚至超过$10^6$的情况下使用。我们增强的实现,结合大量实验,为选择最佳带数(number of bands)提供了明确的指导。这些见解为从业人员提供了对DP-BandMF性能的更深入了解,并指导他们如何最大化其在隐私保护机器学习中的效用。
更新时间: 2024-05-24 20:19:15
领域: cs.LG,cs.CR,cs.DS
RN-Net: Reservoir Nodes-Enabled Neuromorphic Vision Sensing Network
Event-based cameras are inspired by the sparse and asynchronous spike representation of the biological visual system. However, processing the event data requires either using expensive feature descriptors to transform spikes into frames, or using spiking neural networks that are expensive to train. In this work, we propose a neural network architecture, the Reservoir Nodes-enabled neuromorphic vision sensing Network (RN-Net), based on simple convolution layers integrated with dynamic temporal encoding reservoirs for local and global spatiotemporal feature detection with low hardware and training costs. The RN-Net allows efficient processing of asynchronous temporal features, achieves the highest accuracy reported to date for DVS128 Gesture (99.2%), and one of the highest accuracies for the DVS Lip dataset (67.5%) at a much smaller network size. By leveraging the internal device and circuit dynamics, asynchronous temporal feature encoding can be implemented at very low hardware cost without preprocessing and dedicated memory and arithmetic units. The use of simple DNN blocks and standard backpropagation-based training rules further reduces implementation costs.
Updated: 2024-05-24 20:17:59
标题: RN-Net:基于储备池节点的神经形态视觉感知网络
摘要: 基于事件的摄像头受到生物视觉系统稀疏和异步尖峰表示的启发。然而,处理事件数据要么需要使用昂贵的特征描述符将尖峰转换为帧,要么需要使用训练成本高昂的脉冲神经网络。在这项工作中,我们提出了一种神经网络架构——基于储备池节点的神经形态视觉感知网络(Reservoir Nodes-enabled neuromorphic vision sensing Network,RN-Net),它以简单的卷积层结合动态时间编码储备池,用于局部和全局时空特征检测,硬件和训练成本都很低。RN-Net能够高效处理异步时间特征,在DVS128 Gesture上达到迄今为止报道的最高准确率99.2%,并以更小的网络规模在DVS Lip数据集上取得最高之一的准确率67.5%。通过利用器件和电路的内部动态,异步时间特征编码可以以非常低的硬件成本实现,无需预处理和专用的内存与运算单元。使用简单的DNN模块和基于标准反向传播的训练规则进一步降低了实现成本。
更新时间: 2024-05-24 20:17:59
领域: cs.CV,cs.AI,eess.IV
Uncertainty Quantification for Neurosymbolic Programs via Compositional Conformal Prediction
Machine learning has become an effective tool for automatically annotating unstructured data (e.g., images) with structured labels (e.g., object detections). As a result, a new programming paradigm called neurosymbolic programming has emerged where users write queries against these predicted annotations. However, due to the intrinsic fallibility of machine learning models, these programs currently lack any notion of correctness. In many domains, users may want some kind of conservative guarantee that the results of their queries contain all possibly relevant instances. Conformal prediction has emerged as a promising strategy for quantifying uncertainty in machine learning by modifying models to predict sets of labels instead of individual labels; it provides a probabilistic guarantee that the prediction set contains the true label with high probability. We propose a novel framework for adapting conformal prediction to neurosymbolic programs; our strategy is to represent prediction sets as abstract values in some abstract domain, and then to use abstract interpretation to propagate prediction sets through the program. Our strategy satisfies three key desiderata: (i) correctness (i.e., the program outputs a prediction set that contains the true output with high probability), (ii) compositionality (i.e., we can quantify uncertainty separately for different modules and then compose them together), and (iii) structured values (i.e., we can provide uncertainty quantification for structured values such as lists). When the full program is available ahead-of-time, we propose an optimization that incorporates conformal prediction at intermediate program points to reduce imprecision in abstract interpretation. We evaluate our approach on programs that take MNIST and MS-COCO images as input, demonstrating that it produces reasonably sized prediction sets while satisfying a coverage guarantee.
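For context, the basic split-conformal primitive that such frameworks build on looks as follows; this is a generic sketch (the paper's contribution is propagating such sets through programs via abstract interpretation, which is not shown):

    import numpy as np

    def conformal_quantile(cal_scores, alpha):
        """Finite-sample-corrected (1 - alpha) quantile of calibration
        nonconformity scores."""
        n = len(cal_scores)
        level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
        return np.quantile(cal_scores, level, method="higher")

    def prediction_set(class_probs, qhat):
        """All labels whose nonconformity (1 - predicted probability) falls
        within the calibrated threshold; contains the true label w.h.p."""
        return [k for k, p in enumerate(class_probs) if 1.0 - p <= qhat]

    qhat = conformal_quantile(np.array([0.10, 0.30, 0.20, 0.05, 0.40]), alpha=0.1)
    print(prediction_set([0.6, 0.3, 0.1], qhat))   # e.g. [0]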
Updated: 2024-05-24 20:15:53
标题: 通过组合式符合预测对神经符号程序进行不确定性量化
摘要: 机器学习已成为自动为非结构化数据(例如图像)提供结构化标签(例如对象检测)的有效工具。因此,出现了一种称为神经符号编程的新编程范式,用户可以针对这些预测的注释编写查询。然而,由于机器学习模型固有的不可靠性,这些程序目前缺乏任何正确性概念。在许多领域中,用户可能希望获得某种保守保证,即他们的查询结果包含所有可能相关的实例。符合预测已成为量化机器学习不确定性的一种有前途的策略,通过修改模型以预测标签集而不是单个标签;它提供了一种概率保证,即预测集合以高概率包含真实标签。我们提出了一种新颖的框架,将符合预测适配到神经符号程序;我们的策略是将预测集表示为某个抽象域中的抽象值,然后使用抽象解释在程序中传播预测集。我们的策略满足三个关键要求:(i)正确性(即程序输出一个以高概率包含真实输出的预测集合),(ii)组合性(即我们可以分别量化不同模块的不确定性,然后将它们组合在一起),以及(iii)结构化值(即我们可以为列表等结构化值提供不确定性量化)。当完整程序提前可用时,我们提出了一种优化方法,在中间程序点引入符合预测以减少抽象解释中的不精确性。我们在以MNIST和MS-COCO图像作为输入的程序上评估了我们的方法,证明它产生了合理大小的预测集合,并满足了覆盖保证。
更新时间: 2024-05-24 20:15:53
领域: cs.PL,cs.LG,stat.ML
Learning accurate and interpretable decision trees
Decision trees are a popular tool in machine learning and yield easy-to-understand models. Several techniques have been proposed in the literature for learning a decision tree classifier, with different techniques working well for data from different domains. In this work, we develop approaches to design decision tree learning algorithms given repeated access to data from the same domain. We propose novel parameterized classes of node splitting criteria in top-down algorithms, which interpolate between popularly used entropy and Gini impurity based criteria, and provide theoretical bounds on the number of samples needed to learn the splitting function appropriate for the data at hand. We also study the sample complexity of tuning prior parameters in Bayesian decision tree learning, and extend our results to decision tree regression. We further consider the problem of tuning hyperparameters in pruning the decision tree for classical pruning algorithms including min-cost complexity pruning. We also study the interpretability of the learned decision trees and introduce a data-driven approach for optimizing the explainability versus accuracy trade-off using decision trees. Finally, we demonstrate the significance of our approach on real world datasets by learning data-specific decision trees which are simultaneously more accurate and interpretable.
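One standard family with exactly this interpolation property is the Tsallis entropy, which recovers the Gini impurity at $\alpha = 2$ and Shannon entropy as $\alpha \to 1$; whether this matches the paper's exact parameterization is an assumption, but it illustrates the idea:

    import numpy as np

    def tsallis_impurity(p, alpha):
        """Node impurity of class proportions p under the Tsallis family.
        alpha -> 1: Shannon entropy; alpha = 2: Gini impurity (exactly)."""
        p = np.asarray(p, dtype=float)
        p = p[p > 0]
        if np.isclose(alpha, 1.0):
            return float(-np.sum(p * np.log(p)))
        return float((1.0 - np.sum(p ** alpha)) / (alpha - 1.0))

    print(tsallis_impurity([0.5, 0.5], 2.0))   # 0.5, the Gini of a balanced node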
Updated: 2024-05-24 20:10:10
标题: 学习准确和可解释的决策树
摘要: 决策树是机器学习中的一种流行工具,可以生成易于理解的模型。文献中提出了几种学习决策树分类器的技术,不同技术对来自不同领域的数据效果良好。在这项工作中,我们开发了一种方法来设计决策树学习算法,通过重复访问来自相同领域的数据。我们提出了一种新颖的参数化节点分裂标准类别,用于自顶向下算法,可以在熵和基于基尼不纯度的准则之间插值,并为学习适合手头数据的分裂函数提供理论上的样本数量界限。我们还研究了在贝叶斯决策树学习中调整先验参数的样本复杂性,并将我们的结果扩展到决策树回归。我们进一步研究了在修剪决策树中调整超参数的问题,包括最小成本复杂度修剪等经典修剪算法。我们还研究了学习的决策树的可解释性,并提出了一种数据驱动方法,以优化解释性与准确性之间的权衡。最后,我们通过学习数据特定的决策树,在真实世界数据集上展示了我们方法的重要性,这些决策树同时更准确和可解释。
更新时间: 2024-05-24 20:10:10
领域: cs.LG
Knowledge-Informed Auto-Penetration Testing Based on Reinforcement Learning with Reward Machine
Automated penetration testing (AutoPT) based on reinforcement learning (RL) has proven its ability to improve the efficiency of vulnerability identification in information systems. However, RL-based PT encounters several challenges, including poor sampling efficiency, intricate reward specification, and limited interpretability. To address these issues, we propose a knowledge-informed AutoPT framework called DRLRM-PT, which leverages reward machines (RMs) to encode domain knowledge as guidelines for training a PT policy. In our study, we specifically focus on lateral movement as a PT case study and formulate it as a partially observable Markov decision process (POMDP) guided by RMs. We design two RMs based on the MITRE ATT\&CK knowledge base for lateral movement. To solve the POMDP and optimize the PT policy, we employ the deep Q-learning algorithm with RM (DQRM). The experimental results demonstrate that the DQRM agent exhibits higher training efficiency in PT compared to agents without knowledge embedding. Moreover, RMs encoding more detailed domain knowledge demonstrated better PT performance compared to RMs with simpler knowledge.
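To make the reward-machine idea concrete, a minimal finite-state sketch; the states, events, and rewards below are invented for illustration, loosely in the spirit of a lateral-movement scenario:

    # A reward machine maps (machine state, high-level event) -> (next state, reward).
    TRANSITIONS = {
        ("start", "credentials_found"): ("have_creds", 0.1),
        ("have_creds", "host_reached"): ("goal", 1.0),
    }

    def rm_step(u, event):
        """Advance the reward machine; unlisted events leave the state
        unchanged with zero reward. The RL agent is rewarded via this machine."""
        return TRANSITIONS.get((u, event), (u, 0.0))

    u, r = rm_step("start", "credentials_found")   # -> ("have_creds", 0.1)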
Updated: 2024-05-24 20:05:12
标题: 基于强化学习和奖励机制的知识驱动自动渗透测试
摘要: 基于强化学习的自动渗透测试(AutoPT)已经证明其能够提高信息系统中漏洞识别的效率。然而,基于强化学习的PT遇到了一些挑战,包括采样效率低、奖励规范复杂和可解释性有限。为了解决这些问题,我们提出了一种基于知识的自动渗透测试框架,称为DRLRM-PT,它利用奖励机器(RMs)将领域知识编码为训练PT策略的指导方针。在我们的研究中,我们特别关注横向移动作为一个PT案例研究,并将其制定为由RMs指导的部分可观察马尔可夫决策过程(POMDP)。我们基于MITRE ATT\&CK知识库设计了两个RMs用于横向移动。为了解决POMDP并优化PT策略,我们采用了带有RM的深度Q学习算法(DQRM)。实验结果表明,与没有嵌入知识的代理相比,DQRM代理在PT中表现出更高的训练效率。此外,编码更详细领域知识的RMs相较于简单知识的RMs表现出更好的PT性能。
更新时间: 2024-05-24 20:05:12
领域: cs.AI,cs.CR,cs.LG
Belief-State Query Policies for Planning With Preferences Under Partial Observability
Planning in real-world settings often entails addressing partial observability while aligning with users' preferences. We present a novel framework for expressing users' preferences about agent behavior in a partially observable setting using parameterized belief-state query (BSQ) preferences in the setting of goal-oriented partially observable Markov decision processes (gPOMDPs). We present the first formal analysis of such preferences and prove that while the expected value of a BSQ preference is not a convex function w.r.t its parameters, it is piecewise constant and yields an implicit discrete parameter search space that is finite for finite horizons. This theoretical result leads to novel algorithms that optimize gPOMDP agent behavior while guaranteeing user preference compliance. Theoretical analysis proves that our algorithms converge to the optimal preference-compliant behavior in the limit. Empirical results show that BSQ preferences provide a computationally feasible approach for planning with preferences in partially observable settings.
Updated: 2024-05-24 20:04:51
标题: 信念状态查询策略在部分可观察性下带偏好规划中的应用
摘要: 在现实世界中进行规划通常涉及解决部分可观察性问题,同时与用户的偏好保持一致。我们提出了一个新颖的框架,用于在部分可观测环境中表达用户对代理行为的偏好,该框架使用参数化信念状态查询(BSQ)偏好,在目标导向的部分可观测马尔可夫决策过程(gPOMDPs)中。我们首次对这种偏好进行了形式化分析,并证明,虽然BSQ偏好的预期值不是关于其参数的凸函数,但它是分段常数,并产生一个隐式的离散参数搜索空间,对于有限的时间跨度是有限的。这一理论结果导致了优化gPOMDP代理行为的新算法,同时保证了用户偏好的遵从。理论分析证明,我们的算法在极限情况下收敛到最佳的符合偏好的行为。实证结果表明,BSQ偏好为在部分可观测环境中进行带有偏好的规划提供了一种计算可行的方法。
更新时间: 2024-05-24 20:04:51
领域: cs.AI
A Novel Nearest Neighbors Algorithm Based on Power Muirhead Mean
This paper introduces the innovative Power Muirhead Mean K-Nearest Neighbors (PMM-KNN) algorithm, a novel data classification approach that combines the K-Nearest Neighbors method with the adaptive Power Muirhead Mean operator. The proposed methodology aims to address the limitations of traditional KNN by leveraging the Power Muirhead Mean to calculate, within each class, the local mean of the K nearest neighbors to the query sample. Extensive experimentation on diverse benchmark datasets demonstrates the superiority of PMM-KNN over other classification methods. Results indicate statistically significant improvements in accuracy on various datasets, particularly those with complex and high-dimensional distributions. The adaptability of the Power Muirhead Mean empowers PMM-KNN to effectively capture underlying data structures, leading to enhanced accuracy and robustness. The findings highlight the potential of PMM-KNN as a powerful and versatile tool for data classification tasks, encouraging further research to explore its application in real-world scenarios and the automation of Power Muirhead Mean parameters to unleash its full potential.
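A toy sketch of the classification rule described above, using a plain power mean in place of the full Power Muirhead Mean (the Muirhead weighting over permutations is omitted for brevity, so this is an approximation of the method, not its implementation):

    import numpy as np

    def power_mean(rows, p):
        """Elementwise power mean of a set of vectors (p = 1: arithmetic mean).
        Assumes nonnegative features for non-integer p."""
        return np.mean(rows ** p, axis=0) ** (1.0 / p)

    def pmm_knn_predict(X, y, query, k=3, p=2.0):
        classes = np.unique(y)
        dists = []
        for c in classes:
            Xc = X[y == c]
            nearest = np.argsort(np.linalg.norm(Xc - query, axis=1))[:k]
            local_mean = power_mean(Xc[nearest], p)   # class-local power mean
            dists.append(np.linalg.norm(query - local_mean))
        return classes[int(np.argmin(dists))]         # closest local mean wins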
Updated: 2024-05-24 20:04:48
标题: 基于幂Muirhead均值的一种新颖的最近邻算法
摘要: 本文介绍了创新的Power Muirhead Mean K-Nearest Neighbors (PMM-KNN)算法,这是一种将K-最近邻方法与自适应Power Muirhead Mean运算符结合的新型数据分类方法。所提出的方法旨在通过利用Power Muirhead Mean来计算每个类别中K个最近邻样本对查询样本的局部均值,以解决传统KNN的局限性。对多个基准数据集进行的广泛实验表明,PMM-KNN相对于其他分类方法具有优越性。结果表明,在各种数据集上,特别是那些具有复杂和高维分布的数据集上,准确率有统计学意义上的改善。Power Muirhead Mean的适应性使PMM-KNN能够有效捕获基础数据结构,从而提高准确性和稳健性。研究结果突显了PMM-KNN作为数据分类任务的强大而多功能工具的潜力,鼓励进一步研究探索其在现实场景中的应用以及自动化Power Muirhead Mean参数以释放其全部潜力。
更新时间: 2024-05-24 20:04:48
领域: cs.LG,cs.AI
UnitNorm: Rethinking Normalization for Transformers in Time Series
Normalization techniques are crucial for enhancing Transformer models' performance and stability in time series analysis tasks, yet traditional methods like batch and layer normalization often lead to issues such as token shift, attention shift, and sparse attention. We propose UnitNorm, a novel approach that scales input vectors by their norms and modulates attention patterns, effectively circumventing these challenges. Grounded in existing normalization frameworks, UnitNorm's effectiveness is demonstrated across diverse time series analysis tasks, including forecasting, classification, and anomaly detection, via a rigorous evaluation on 6 state-of-the-art models and 10 datasets. Notably, UnitNorm shows superior performance, especially in scenarios requiring robust attention mechanisms and contextual comprehension, evidenced by significant improvements of up to a 1.46 decrease in MSE for forecasting and a 4.89% increase in accuracy for classification. This work not only calls for a reevaluation of normalization strategies in time series Transformers but also sets a new direction for enhancing model performance and stability. The source code is available at https://anonymous.4open.science/r/UnitNorm-5B84.
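The core operation is a per-token rescaling by the vector norm; a one-function sketch (the $\sqrt{d}$ rescaling constant and the epsilon are our assumptions, and the paper's attention modulation is not shown):

    import torch

    def unit_norm(x, eps=1e-6):
        """Scale each token vector to unit L2 norm, then multiply by sqrt(d)
        so magnitudes remain comparable to the unnormalized input."""
        d = x.shape[-1]
        return x * (d ** 0.5) / (x.norm(dim=-1, keepdim=True) + eps)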
Updated: 2024-05-24 19:58:25
标题: UnitNorm:重新思考时间序列Transformer中的归一化
摘要: 归一化技术对于增强Transformer模型在时间序列分析任务中的性能和稳定性至关重要,然而传统方法如批量归一化和层归一化常常导致诸如标记偏移、注意力偏移和稀疏注意力等问题。我们提出了UnitNorm,一种通过范数缩放输入向量并调节注意力模式的新方法,有效地避免了这些挑战。基于现有的归一化框架,UnitNorm的有效性在包括预测、分类和异常检测在内的各种时间序列分析任务中得到了证明,通过对6个最先进模型和10个数据集的严格评估。值得注意的是,UnitNorm表现出卓越的性能,特别是在需要稳健的注意机制和上下文理解的情景中,表现出显著的改进,预测中MSE减少高达1.46,分类准确率增加4.89%。这项工作不仅呼吁重新评估时间序列Transformer中的归一化策略,还为增强模型性能和稳定性设定了新方向。源代码可在https://anonymous.4open.science/r/UnitNorm-5B84找到。
更新时间: 2024-05-24 19:58:25
领域: cs.LG
Hacc-Man: An Arcade Game for Jailbreaking LLMs
The recent leaps in complexity and fluency of Large Language Models (LLMs) mean that, for the first time in human history, people can interact with computers using natural language alone. This creates monumental possibilities of automation and accessibility of computing, but also raises severe security and safety threats: When everyone can interact with LLMs, everyone can potentially break into the systems running LLMs. All it takes is creative use of language. This paper presents Hacc-Man, a game which challenges its players to "jailbreak" an LLM: subvert the LLM to output something that it is not intended to. Jailbreaking is at the intersection between creative problem solving and LLM security. The purpose of the game is threefold: 1. To heighten awareness of the risks of deploying fragile LLMs in everyday systems, 2. To heighten people's self-efficacy in interacting with LLMs, and 3. To discover the creative problem solving strategies, people deploy in this novel context.
Updated: 2024-05-24 19:55:20
标题: Hacc-Man: 一个用于越狱LLMs的街机游戏
摘要: 大型语言模型(LLMs)在复杂性和流畅性方面的最新进展意味着,人类历史上首次可以仅使用自然语言与计算机进行交互。这创造了自动化和计算的可访问性的巨大可能性,但也带来了严重的安全威胁:当每个人都可以与LLMs交互时,每个人都有可能侵入运行LLMs的系统。只需要创造性地运用语言。本文介绍了Hacc-Man,一个挑战玩家“越狱”LLM的游戏:颠覆LLM输出非预期结果。越狱处于创造性问题解决和LLM安全之间的交叉点。游戏的目的是三重的:1. 提高人们对在日常系统中部署脆弱LLMs风险的意识,2. 提高人们与LLMs交互的自我效能感,3. 发现人们在这种新领域中采用的创造性问题解决策略。
更新时间: 2024-05-24 19:55:20
领域: cs.CR,cs.AI,cs.CL,cs.HC
Geometry-Complete Diffusion for 3D Molecule Generation and Optimization
Denoising diffusion probabilistic models (DDPMs) have pioneered new state-of-the-art results in disciplines such as computer vision and computational biology for diverse tasks ranging from text-guided image generation to structure-guided protein design. Along this latter line of research, methods have recently been proposed for generating 3D molecules using equivariant graph neural networks (GNNs) within a DDPM framework. However, such methods are unable to learn important geometric properties of 3D molecules, as they adopt molecule-agnostic and non-geometric GNNs as their 3D graph denoising networks, which notably hinders their ability to generate valid large 3D molecules. In this work, we address these gaps by introducing the Geometry-Complete Diffusion Model (GCDM) for 3D molecule generation, which outperforms existing 3D molecular diffusion models by significant margins across conditional and unconditional settings for the QM9 dataset and the larger GEOM-Drugs dataset, respectively, and generates more novel and unique unconditional 3D molecules for the QM9 dataset compared to previous methods. Importantly, we demonstrate that the geometry-complete denoising process of GCDM learned for 3D molecule generation enables the model to generate a significant proportion of valid and energetically-stable large molecules at the scale of GEOM-Drugs, whereas previous methods fail to do so with the features they learn. Additionally, we show that extensions of GCDM can not only effectively design 3D molecules for specific protein pockets but also that GCDM's geometric features can be repurposed to consistently optimize the geometry and chemical composition of existing 3D molecules for molecular stability and property specificity, demonstrating new versatility of molecular diffusion models. Our source code and data are freely available at https://github.com/BioinfoMachineLearning/Bio-Diffusion.
Updated: 2024-05-24 19:49:33
标题: 几何完整扩散用于3D分子生成和优化
摘要: 去噪扩散概率模型(DDPMs)在计算机视觉和计算生物学等学科中开创了新的最先进结果,用于从文本引导的图像生成到结构引导的蛋白质设计等各种任务。沿着这一研究方向,最近提出了一种方法,使用等变图神经网络(GNNs)在DDPM框架内生成3D分子。然而,这些方法无法学习3D分子的重要几何属性,因为它们采用了分子无关和非几何的GNNs作为它们的3D图去噪网络,这明显阻碍了它们生成有效的大型3D分子的能力。在这项工作中,我们通过引入用于3D分子生成的几何完整扩散模型(GCDM),填补了这些差距,它在QM9数据集和更大的GEOM-Drugs数据集的有条件和无条件设置中明显优于现有的3D分子扩散模型,并与先前的方法相比,为QM9数据集生成更多新颖和独特的无条件3D分子。重要的是,我们展示了GCDM学习的完整几何去噪过程对于3D分子生成使模型能够在GEOM-Drugs尺度上生成大量有效和能量稳定的大分子,而以前的方法未能做到这一点。此外,我们展示了GCDM的扩展不仅可以有效地为特定蛋白质口袋设计3D分子,还可以将GCDM的几何特征重新用于一致地优化现有3D分子的几何和化学成分,以实现分子稳定性和属性特异性,展示了分子扩散模型的新多功能性。我们的源代码和数据可在https://github.com/BioinfoMachineLearning/Bio-Diffusion 上免费获取。
更新时间: 2024-05-24 19:49:33
领域: cs.LG,cs.AI,q-bio.BM,q-bio.QM,stat.ML,I.2.1; J.3
AnchorAL: Computationally Efficient Active Learning for Large and Imbalanced Datasets
Active learning for imbalanced classification tasks is challenging as the minority classes naturally occur rarely. Gathering a large pool of unlabelled data is thus essential to capture minority instances. Standard pool-based active learning is computationally expensive on large pools and often reaches low accuracy by overfitting the initial decision boundary, thus failing to explore the input space and find minority instances. To address these issues we propose AnchorAL. At each iteration, AnchorAL chooses class-specific instances from the labelled set, or anchors, and retrieves the most similar unlabelled instances from the pool. This resulting subpool is then used for active learning. Using a small, fixed-sized subpool AnchorAL allows scaling any active learning strategy to large pools. By dynamically selecting different anchors at each iteration it promotes class balance and prevents overfitting the initial decision boundary, thus promoting the discovery of new clusters of minority instances. In experiments across different classification tasks, active learning strategies, and model architectures AnchorAL is (i) faster, often reducing runtime from hours to minutes, (ii) trains more performant models, (iii) and returns more balanced datasets than competing methods.
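A schematic of one AnchorAL iteration as described above; the embedding model, anchor counts, and dot-product similarity are placeholders, not the paper's configuration:

    import numpy as np

    def build_subpool(emb_labeled, y_labeled, emb_pool,
                      anchors_per_class=2, per_anchor=50, seed=0):
        """Choose class-specific anchors from the labelled set and retrieve
        their most similar unlabelled instances into a small, fixed-size subpool."""
        rng = np.random.default_rng(seed)
        subpool = set()
        for c in np.unique(y_labeled):
            cls = np.flatnonzero(y_labeled == c)
            for a in rng.choice(cls, size=min(anchors_per_class, len(cls)),
                                replace=False):
                sims = emb_pool @ emb_labeled[a]      # cosine if rows are unit-norm
                subpool.update(np.argsort(-sims)[:per_anchor].tolist())
        return sorted(subpool)   # run the usual active-learning strategy on this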
Updated: 2024-05-24 19:46:14
标题: AnchorAL:针对大型和不平衡数据集的计算效率高的主动学习
摘要: 针对不平衡分类任务的主动学习具有挑战性,因为少数类自然而然地很少出现。因此,收集大量未标记数据对于捕捉少数实例至关重要。标准基于池的主动学习在大型池上计算成本高昂,通常通过过度拟合初始决策边界而达到低准确性,因此无法探索输入空间并找到少数实例。为了解决这些问题,我们提出了AnchorAL。在每次迭代中,AnchorAL从标记集中选择特定类别的实例,或锚点,并从池中检索最相似的未标记实例。然后使用这个结果子池进行主动学习。通过使用一个小的、固定大小的子池,AnchorAL可以将任何主动学习策略扩展到大型池。通过在每次迭代中动态选择不同的锚点,它促进了类别平衡,并防止过度拟合初始决策边界,从而促进了少数实例新集群的发现。在不同的分类任务、主动学习策略和模型架构的实验中,AnchorAL比竞争方法更快(通常将运行时间从几小时缩短到几分钟)、训练更高性能的模型,并返回更平衡的数据集。
更新时间: 2024-05-24 19:46:14
领域: cs.LG,cs.CL
Text-Based Reasoning About Vector Graphics
While large multimodal models excel in broad vision-language benchmarks, they often struggle with tasks requiring precise perception of low-level visual details, such as comparing line lengths or solving simple mazes. In particular, this failure mode persists in question-answering tasks about vector graphics -- images composed purely of 2D objects and shapes. To address this challenge, we propose the Visually Descriptive Language Model (VDLM), which performs text-based reasoning about vector graphics. VDLM leverages Scalable Vector Graphics (SVG) for a more precise visual description and first uses an off-the-shelf raster-to-SVG algorithm for encoding. Since existing language models cannot understand raw SVGs in a zero-shot setting, VDLM then bridges SVG with pretrained language models through a newly introduced intermediate symbolic representation, Primal Visual Description (PVD), comprising primitive attributes (e.g., shape, position, measurement) with their corresponding predicted values. PVD is task-agnostic and represents visual primitives that are universal across all vector graphics. It can be learned with procedurally generated (SVG, PVD) pairs and also enables the direct use of LLMs for generalization to complex reasoning tasks. By casting an image to a text-based representation, we can leverage the power of language models to learn alignment from SVG to visual primitives and generalize to unseen question-answering tasks. Empirical results show that VDLM achieves stronger zero-shot performance compared to state-of-the-art LMMs, such as GPT-4V, in various low-level multimodal perception and reasoning tasks on vector graphics. We additionally present extensive analyses on VDLM's performance, demonstrating that our framework offers better interpretability due to its disentangled perception and reasoning processes. Project page: https://mikewangwzhl.github.io/VDLM/
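As a toy example of what an intermediate PVD record could look like (the field names below are our invention; the paper defines its own schema):

    # Primal Visual Description: task-agnostic primitives with predicted values,
    # produced from an SVG and then handed to a language model as plain text.
    pvd = [
        {"shape": "circle", "center": [12.0, 40.5], "radius": 4.2},
        {"shape": "line_segment", "endpoints": [[0, 0], [30, 30]], "width": 1.0},
    ]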
Updated: 2024-05-24 19:40:26
标题: 基于文本的矢量图形推理
摘要: 尽管大型多模态模型在广泛的视觉-语言基准测试中表现出色,但它们通常在需要精确感知低级视觉细节的任务中遇到困难,例如比较线段长度或解决简单的迷宫问题。特别是,在关于矢量图形的问答任务中,这种失败模式仍然存在,矢量图形是由纯粹的2D对象和形状组成的图像。为了解决这一挑战,我们提出了视觉描述语言模型(VDLM),它执行关于矢量图形的基于文本的推理。VDLM利用可伸缩矢量图形(SVG)进行更精确的视觉描述,并首先使用现成的光栅到SVG算法进行编码。由于现有的语言模型无法在零样本设置中理解原始的SVG,VDLM通过引入一种新的中间符号表示——原始视觉描述(PVD)——将SVG与预训练的语言模型进行桥接,PVD由基本属性(例如形状、位置、测量)及其相应的预测值组成。PVD是任务不可知的,并代表在所有矢量图形中通用的视觉基元。它可以通过程序生成的(SVG,PVD)对进行学习,并且还能直接使用LLM对复杂推理任务进行泛化。通过将图像转换为基于文本的表示形式,我们可以利用语言模型的能力,学习从SVG到视觉基元的对齐,并泛化到未见的问答任务。实证结果显示,与最先进的LMM(如GPT-4V)相比,VDLM在矢量图形的各种低级多模态感知和推理任务上实现了更强的零样本性能。我们还对VDLM的性能进行了广泛的分析,证明我们的框架由于其解耦的感知和推理过程而提供了更好的可解释性。项目页面:https://mikewangwzhl.github.io/VDLM/
更新时间: 2024-05-24 19:40:26
领域: cs.CL,cs.AI,cs.CV
Predicting the Impact of Model Expansion through the Minima Manifold: A Loss Landscape Perspective
The optimal model for a given task is often challenging to determine, requiring training multiple models from scratch which becomes prohibitive as dataset and model sizes grow. A more efficient alternative is to reuse smaller pre-trained models by expanding them, however, this is not widely adopted as how this impacts training dynamics remains poorly understood. While prior works have introduced statistics to measure these effects, they remain flawed. To rectify this, we offer a new approach for understanding and quantifying the impact of expansion through the lens of the loss landscape, which has been shown to contain a manifold of linearly connected minima. Building on this new perspective, we propose a metric to study the impact of expansion by estimating the size of the manifold. Experimental results show a clear relationship between gains in performance and manifold size, enabling the comparison of candidate models and presenting a first step towards expanding models more reliably based on geometric properties of the loss landscape.
Updated: 2024-05-24 19:33:05
标题: 通过极小值流形预测模型扩展的影响:一个损失景观视角
摘要: 对于给定任务的最佳模型通常很难确定,需要从头开始训练多个模型,随着数据集和模型规模的增长,这变得不切实际。一种更有效的替代方法是重复使用较小的预训练模型进行扩展,然而,由于对训练动态的影响尚不明确,这种方法并不被广泛采用。尽管之前的研究引入了用于测量这些影响的统计数据,但它们仍存在缺陷。为了纠正这一点,我们提出了一种新方法,通过损失景观的视角来理解和量化扩展的影响,已经证明损失景观包含一系列线性连接的极小值。基于这一新视角,我们提出了一个度量标准,通过估计流形的大小来研究扩展的影响。实验结果显示,性能提升与流形大小之间存在明显的关系,使得可以比较候选模型,并为基于损失景观的几何特性更可靠地扩展模型迈出了第一步。
更新时间: 2024-05-24 19:33:05
领域: cs.LG
Derivatives of Stochastic Gradient Descent
We consider stochastic optimization problems where the objective depends on some parameter, as commonly found in hyperparameter optimization for instance. We investigate the behavior of the derivatives of the iterates of Stochastic Gradient Descent (SGD) with respect to that parameter and show that they are driven by an inexact SGD recursion on a different objective function, perturbed by the convergence of the original SGD. This enables us to establish that the derivatives of SGD converge to the derivative of the solution mapping in terms of mean squared error whenever the objective is strongly convex. Specifically, we demonstrate that with constant step-sizes, these derivatives stabilize within a noise ball centered at the solution derivative, and that with vanishing step-sizes they exhibit $O(\log(k)^2 / k)$ convergence rates. Additionally, we prove exponential convergence in the interpolation regime. Our theoretical findings are illustrated by numerical experiments on synthetic tasks.
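Concretely, writing the SGD iterates as $x_{k+1}(\theta) = x_k(\theta) - \gamma_k \nabla_x f(x_k(\theta), \theta, \xi_k)$ and differentiating through the update gives (in our notation, as a sketch of the chain-rule step behind the abstract's claim):

    \frac{\partial x_{k+1}}{\partial \theta}
      = \Big( I - \gamma_k \nabla^2_{xx} f(x_k, \theta, \xi_k) \Big)
        \frac{\partial x_k}{\partial \theta}
      \; - \; \gamma_k \nabla^2_{x\theta} f(x_k, \theta, \xi_k),

which is itself an inexact SGD-like recursion in $\partial x_k / \partial \theta$, perturbed by the convergence of the underlying iterates $x_k$.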
Updated: 2024-05-24 19:32:48
标题: 随机梯度下降法的导数
摘要: 我们考虑随机优化问题,其中目标函数取决于某个参数,例如在超参数优化中常见。我们研究随机梯度下降(SGD)迭代的导数关于该参数的行为,并表明它们受到不同目标函数上不精确SGD递归的驱动,该递归受到原始SGD的收敛的干扰。这使我们能够建立SGD的导数在目标函数强凸时以均方误差的形式收敛于解映射的导数。具体地,我们证明使用恒定步长时,这些导数在解导数中心的噪声球内稳定,并且使用逐渐减小的步长时,它们表现出$O(\log(k)^2 / k)$的收敛速度。此外,我们证明了在插值区间指数收敛。我们的理论发现通过在合成任务上进行数值实验进行了说明。
更新时间: 2024-05-24 19:32:48
领域: math.OC,cs.LG
Transfer of Safety Controllers Through Learning Deep Inverse Dynamics Model
Control barrier certificates have proven effective in formally guaranteeing the safety of the control systems. However, designing a control barrier certificate is a time-consuming and computationally expensive endeavor that requires expert input in the form of domain knowledge and mathematical maturity. Additionally, when a system undergoes slight changes, the new controller and its correctness certificate need to be recomputed, incurring similar computational challenges as those faced during the design of the original controller. Prior approaches have utilized transfer learning to transfer safety guarantees in the form of a barrier certificate while maintaining the control invariant. Unfortunately, in practical settings, the source and the target environments often deviate substantially in their control inputs, rendering the aforementioned approach impractical. To address this challenge, we propose integrating \emph{inverse dynamics} -- a neural network that suggests required action given a desired successor state -- of the target system with the barrier certificate of the source system to provide formal proof of safety. In addition, we propose a validity condition that, when met, guarantees correctness of the controller. We demonstrate the effectiveness of our approach through three case studies.
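A minimal sketch of the inverse-dynamics component described above; the MLP architecture, dimensions, and deployment loop are placeholders:

    import torch
    import torch.nn as nn

    class InverseDynamics(nn.Module):
        """Predicts the action that drives the target system from state s to a
        desired successor state s_next."""
        def __init__(self, state_dim, action_dim, hidden=64):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(2 * state_dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, action_dim))

        def forward(self, s, s_next):
            return self.net(torch.cat([s, s_next], dim=-1))

    # At run time: ask the source system's certified controller for a safe
    # successor state, then let this model supply the target-system action.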
Updated: 2024-05-24 19:29:48
标题: 通过学习深度逆动力学模型实现安全控制器的迁移
摘要: 控制屏障证书已被证明在正式保证控制系统的安全性方面是有效的。然而,设计控制屏障证书是一项耗时且计算密集的工作,需要领域知识和数学成熟度方面的专家输入。此外,当系统发生轻微变化时,新的控制器及其正确性证书需要重新计算,导致类似于设计原始控制器时面临的计算挑战。先前的方法利用迁移学习来转移以屏障证书形式的安全性保证,同时保持控制不变性。不幸的是,在实际环境中,源环境和目标环境在其控制输入方面经常存在显著偏差,使上述方法不切实际。为了解决这一挑战,我们提出将目标系统的“逆动力学” —— 一个神经网络,根据所需的后继状态提出所需的动作 —— 与源系统的屏障证书集成,以提供形式化的安全性证明。此外,我们提出了一个有效性条件,当满足时,可以保证控制器的正确性。我们通过三个案例研究展示了我们方法的有效性。
更新时间: 2024-05-24 19:29:48
领域: eess.SY,cs.AI,cs.LG,cs.SY
Score Distillation via Reparametrized DDIM
While 2D diffusion models generate realistic, high-detail images, 3D shape generation methods like Score Distillation Sampling (SDS) built on these 2D diffusion models produce cartoon-like, over-smoothed shapes. To help explain this discrepancy, we show that the image guidance used in Score Distillation can be understood as the velocity field of a 2D denoising generative process, up to the choice of a noise term. In particular, after a change of variables, SDS resembles a high-variance version of Denoising Diffusion Implicit Models (DDIM) with a differently-sampled noise term: SDS introduces noise i.i.d. randomly at each step, while DDIM infers it from the previous noise predictions. This excessive variance can lead to over-smoothing and unrealistic outputs. We show that a better noise approximation can be recovered by inverting DDIM in each SDS update step. This modification makes SDS's generative process for 2D images almost identical to DDIM. In 3D, it removes over-smoothing, preserves higher-frequency detail, and brings the generation quality closer to that of 2D samplers. Experimentally, our method achieves better or similar 3D generation quality compared to other state-of-the-art Score Distillation methods, all without training additional neural networks or multi-view supervision, and providing useful insights into relationship between 2D and 3D asset generation with diffusion models.
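The contrast the abstract draws can be written in two lines: vanilla SDS resamples noise i.i.d. at every update, whereas the DDIM-style fix infers the noise consistent with the current noisy image and the predicted clean image, via the standard relation $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$ (a schematic sketch, not the paper's code):

    import torch

    def sds_noise(x_t):
        # Vanilla SDS: a fresh i.i.d. Gaussian noise sample at every step.
        return torch.randn_like(x_t)

    def ddim_inverted_noise(x_t, x0_pred, alpha_bar_t):
        # Noise implied by inverting the forward relation at time t, so a
        # consistent noise estimate is reused rather than resampled.
        return (x_t - alpha_bar_t ** 0.5 * x0_pred) / (1.0 - alpha_bar_t) ** 0.5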
Updated: 2024-05-24 19:22:09
标题: 通过重新参数化的DDIM进行分数蒸馏
摘要: 尽管二维扩散模型生成了逼真且高细节的图像,但基于这些二维扩散模型构建的三维形状生成方法,如基于Score Distillation Sampling(SDS)的方法,产生的却是类似卡通般的过于平滑的形状。为了解释这种差异,我们展示了在Score Distillation中使用的图像引导可以被理解为一个二维去噪生成过程的速度场(在噪声项选择的意义下)。特别是,经过变量替换后,SDS类似于具有不同采样噪声项的Denoising Diffusion Implicit Models(DDIM)的高方差版本:SDS在每一步随机独立地引入噪声,而DDIM则从先前的噪声预测中推断出噪声。这种过度的方差可能导致过度平滑和不真实的输出。我们展示了通过在每个SDS更新步骤中反转DDIM可以恢复更好的噪声逼近。这种修改使得SDS的二维图像生成过程几乎与DDIM相同。在三维中,它去除了过度平滑,保留了更高频率的细节,并使生成质量更接近于二维采样器。实验证明,我们的方法在不训练额外的神经网络或多视图监督的情况下,比其他最先进的Score Distillation方法实现了更好或类似的三维生成质量,并提供了有关扩散模型下二维和三维资产生成之间关系的有用见解。
更新时间: 2024-05-24 19:22:09
领域: cs.CV,cs.GR,cs.LG
Discriminative Entropy Clustering and its Relation to K-means and SVM
Maximization of mutual information between the model's input and output is formally related to "decisiveness" and "fairness" of the softmax predictions, motivating these unsupervised entropy-based criteria for clustering. First, in the context of linear softmax models, we discuss some general properties of entropy-based clustering. Disproving some earlier claims, we point out fundamental differences with K-means. On the other hand, we prove the margin maximizing property for decisiveness establishing a relation to SVM-based clustering. Second, we propose a new self-labeling formulation of entropy clustering for general softmax models. The pseudo-labels are introduced as auxiliary variables "splitting" the fairness and decisiveness. The derived self-labeling loss includes the reverse cross-entropy robust to pseudo-label errors and allows an efficient EM solver for pseudo-labels. Our algorithm improves the state of the art on several standard benchmarks for deep clustering.
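For reference, the "decisiveness" and "fairness" terms correspond to the two entropies in the standard mutual-information criterion; a compact PyTorch sketch of that baseline objective (the paper's self-labeling formulation with reverse cross-entropy is more involved and not shown):

    import torch
    import torch.nn.functional as F

    def entropy_clustering_loss(logits, eps=1e-8):
        """Minimize prediction entropy (decisiveness) while maximizing the
        entropy of the batch-average prediction (fairness)."""
        p = F.softmax(logits, dim=1)
        decisiveness = -(p * (p + eps).log()).sum(dim=1).mean()
        p_bar = p.mean(dim=0)
        fairness = -(p_bar * (p_bar + eps).log()).sum()
        return decisiveness - fairness   # = -(estimate of mutual information)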
Updated: 2024-05-24 19:18:42
标题: 区分性熵聚类及其与K均值和支持向量机的关系
摘要: 模型输入和输出之间的互信息最大化与softmax预测的“决定性”和“公平性”形式上相关,这激发了基于熵的无监督聚类标准。首先,在线性softmax模型的背景下,我们讨论了基于熵的聚类的一些一般性质。反驳了一些早期的说法,指出了与K均值的基本差异。另一方面,我们证明了决定性的边界最大化特性,建立了与基于SVM的聚类的关系。其次,我们提出了一种新的基于熵的自标记形式,适用于一般的softmax模型。伪标签被引入为辅助变量,将“公平性”和“决定性”进行“分离”。导出的自标记损失包括对伪标签错误具有鲁棒性的逆交叉熵,并允许使用高效的EM求解器求解伪标签。我们的算法在几个深度聚类的标准基准上改进了现有技术水平。
更新时间: 2024-05-24 19:18:42
领域: cs.LG,cs.CV
Diffusion Bridge Implicit Models
Denoising diffusion bridge models (DDBMs) are a powerful variant of diffusion models for interpolating between two arbitrary paired distributions given as endpoints. Despite their promising performance in tasks like image translation, DDBMs require a computationally intensive sampling process that involves the simulation of a (stochastic) differential equation through hundreds of network evaluations. In this work, we present diffusion bridge implicit models (DBIMs) for accelerated sampling of diffusion bridges without extra training. We generalize DDBMs via a class of non-Markovian diffusion bridges defined on the discretized timesteps concerning sampling, which share the same training objective as DDBMs. These generalized diffusion bridges give rise to generative processes ranging from stochastic to deterministic (i.e., an implicit probabilistic model) while being up to 25$\times$ faster than the vanilla sampler of DDBMs. Moreover, the deterministic sampling procedure yielded by DBIMs enables faithful encoding and reconstruction by a booting noise used in the initial sampling step, and allows us to perform semantically meaningful interpolation in image translation tasks by regarding the booting noise as the latent variable.
Updated: 2024-05-24 19:08:30
标题: 扩散桥隐式模型
摘要: 去噪扩散桥模型(DDBMs)是扩散模型的一个强大变体,用于在给定两个任意配对分布作为端点的情况下进行插值。尽管在诸如图像转换的任务中表现出有希望的性能,但DDBMs需要进行计算密集型的采样过程,涉及通过数百次网络评估来模拟(随机)微分方程。在这项工作中,我们提出了扩散桥隐式模型(DBIMs),用于加速扩散桥的采样,无需额外的训练。我们通过在离散化的时间步长上定义一类非马尔科夫扩散桥来推广DDBMs,这些扩散桥在采样方面具有相同的训练目标。这些广义扩散桥产生的生成过程从随机到确定性(即,隐式概率模型)不等,速度比DDBMs的普通采样器快多达25倍。此外,由DBIMs产生的确定性采样过程通过在初始采样步骤中使用的引导噪声实现了忠实的编码和重构,使我们能够通过将引导噪声视为潜在变量,在图像转换任务中执行语义上有意义的插值。
更新时间: 2024-05-24 19:08:30
领域: cs.LG,stat.ML
Generalizing Weather Forecast to Fine-grained Temporal Scales via Physics-AI Hybrid Modeling
Data-driven artificial intelligence (AI) models have made significant advancements in weather forecasting, particularly in medium-range and nowcasting. However, most data-driven weather forecasting models are black-box systems that focus on learning data mapping rather than fine-grained physical evolution in the time dimension. Consequently, the limitations in the temporal scale of datasets prevent these models from forecasting at finer time scales. This paper proposes a physics-AI hybrid model (i.e., WeatherGFT) which Generalizes weather forecasts to Finer-grained Temporal scales beyond the training dataset. Specifically, we employ a carefully designed PDE kernel to simulate physical evolution on a small time scale (e.g., 300 seconds) and use parallel neural networks with a learnable router for bias correction. Furthermore, we introduce a lead time-aware training framework to promote the generalization of the model at different lead times. The weight analysis of physics-AI modules indicates that physics conducts the major evolution while AI performs corrections adaptively. Extensive experiments show that WeatherGFT, trained on an hourly dataset, achieves state-of-the-art performance across multiple lead times and exhibits the capability to generalize 30-minute forecasts.
Updated: 2024-05-24 18:56:33
标题: 将天气预报推广到细粒度时间尺度:通过物理-人工智能混合建模
摘要: 基于数据驱动的人工智能(AI)模型在天气预报领域取得了重大进展,特别是在中期和短期预报方面。然而,大多数基于数据驱动的天气预报模型是黑匣子系统,主要关注学习数据映射而不是时间维度中的细粒度物理演变。因此,数据集在时间尺度上的限制阻碍了这些模型在更细的时间尺度上进行预测。本文提出了一种物理-AI混合模型(即WeatherGFT),将天气预报推广到超出训练数据集的更细粒度时间尺度。具体而言,我们采用精心设计的PDE核来模拟小时间尺度(例如,300秒)上的物理演变,并使用具有可学习路由器的并行神经网络进行偏差校正。此外,我们引入了一种提前时间感知的训练框架,促进模型在不同提前时间下的泛化。物理-AI模块的权重分析表明,物理模块主要进行演化,而AI则自适应地进行校正。大量实验证明,在每小时数据集上训练的WeatherGFT在多个提前时间上实现了最先进的性能,并展示了泛化30分钟预报的能力。
更新时间: 2024-05-24 18:56:33
领域: cs.LG,cs.AI
Continual Release of Differentially Private Synthetic Data from Longitudinal Data Collections
Motivated by privacy concerns in long-term longitudinal studies in medical and social science research, we study the problem of continually releasing differentially private synthetic data from longitudinal data collections. We introduce a model where, in every time step, each individual reports a new data element, and the goal of the synthesizer is to incrementally update a synthetic dataset in a consistent way to capture a rich class of statistical properties. We give continual synthetic data generation algorithms that preserve two basic types of queries: fixed time window queries and cumulative time queries. We show nearly tight upper bounds on the error rates of these algorithms and demonstrate their empirical performance on realistically sized datasets from the U.S. Census Bureau's Survey of Income and Program Participation.
Updated: 2024-05-24 18:55:41
标题: 持续发布纵向数据集中的差分隐私合成数据
摘要: 受长期纵向研究中医学和社会科学研究中隐私问题的启发,我们研究了从纵向数据收集中持续发布差分私密合成数据的问题。我们引入了一个模型,在每个时间步中,每个个体报告一个新的数据元素,合成器的目标是以一致的方式增量更新一个合成数据集,以捕捉丰富的统计特性类别。我们提出了持续合成数据生成算法,可以保留两种基本类型的查询:固定时间窗口查询和累积时间查询。我们展示了这些算法的误差率的近乎严格的上界,并展示了它们在美国人口普查局的收入和参与计划调查中的实际数据集上的实证表现。
更新时间: 2024-05-24 18:55:41
领域: cs.DS,cs.CR,cs.CY,stat.AP
Risk Factor Identification In Osteoporosis Using Unsupervised Machine Learning Techniques
In this study, the reliability of identified risk factors associated with osteoporosis is investigated using a new clustering-based method on electronic medical records. This study proposes utilizing a new CLustering Iterations Framework (CLIF) that includes an iterative clustering framework that can adapt any of the following three components: clustering, feature selection, and principal feature identification. The study proposes using Wasserstein distance to identify principal features, borrowing concepts from the optimal transport theory. The study also suggests using a combination of ANOVA and ablation tests to select influential features from a data set. Some risk factors presented in existing works are endorsed by our identified significant clusters, while the reliability of some other risk factors is weakened.
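As an illustration of the Wasserstein-based principal-feature step (the exact scoring rule in CLIF is not specified in the abstract, so this is an assumed form), one could rank a feature by how far apart its per-cluster distributions sit:

    import numpy as np
    from scipy.stats import wasserstein_distance

    def feature_score(X, cluster_labels, f):
        """Average pairwise 1-D Wasserstein distance between the values of
        feature f across clusters; larger = more cluster-discriminative."""
        groups = [X[cluster_labels == c, f] for c in np.unique(cluster_labels)]
        pairs = [(a, b) for i, a in enumerate(groups) for b in groups[i + 1:]]
        return float(np.mean([wasserstein_distance(a, b) for a, b in pairs]))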
Updated: 2024-05-24 18:53:28
标题: 使用无监督机器学习技术识别骨质疏松风险因素
摘要: 在这项研究中,利用一种基于聚类的新方法对与骨质疏松症相关的风险因素的可靠性进行了调查,该方法应用于电子医疗记录。本研究提出了利用一种新的聚类迭代框架(CLIF),该框架包括一个可以适应以下三个组件的迭代聚类框架:聚类、特征选择和主要特征识别。本研究提出使用Wasserstein距离来识别主要特征,借鉴了最优传输理论的概念。研究还建议使用ANOVA和消融测试的组合来从数据集中选择有影响的特征。一些现有作品中提出的风险因素得到了我们识别的显著聚类的支持,而另一些风险因素的可靠性则受到削弱。
更新时间: 2024-05-24 18:53:28
领域: cs.LG,cs.AI
Scaling Diffusion Mamba with Bidirectional SSMs for Efficient Image and Video Generation
In recent developments, the Mamba architecture, known for its selective state space approach, has shown potential in the efficient modeling of long sequences. However, its application in image generation remains underexplored. Traditional diffusion transformers (DiT), which utilize self-attention blocks, are effective but their computational complexity scales quadratically with the input length, limiting their use for high-resolution images. To address this challenge, we introduce a novel diffusion architecture, Diffusion Mamba (DiM), which foregoes traditional attention mechanisms in favor of a scalable alternative. By harnessing the inherent efficiency of the Mamba architecture, DiM achieves rapid inference times and reduced computational load, maintaining linear complexity with respect to sequence length. Our architecture not only scales effectively but also outperforms existing diffusion transformers in both image and video generation tasks. The results affirm the scalability and efficiency of DiM, establishing a new benchmark for image and video generation techniques. This work advances the field of generative models and paves the way for further applications of scalable architectures.
Updated: 2024-05-24 18:50:27
标题: 使用双向SSM来扩展Diffusion Mamba以实现高效的图像和视频生成
摘要: 在最近的发展中,以其选择性状态空间方法而闻名的Mamba架构在高效建模长序列方面显示出潜力。然而,它在图像生成方面的应用仍未得到充分探索。传统扩散变换器(DiT)利用自注意力块是有效的,但其计算复杂度随输入长度呈二次增长,限制了它们在高分辨率图像上的应用。为了解决这一挑战,我们引入了一种新颖的扩散架构,Diffusion Mamba(DiM),放弃了传统的注意力机制,转而采用可扩展的替代方案。通过利用Mamba架构固有的高效性,DiM实现了快速的推理时间和降低的计算负载,保持了对序列长度的线性复杂度。我们的架构不仅有效扩展,而且在图像和视频生成任务中优于现有的扩散变换器。结果证实了DiM的可扩展性和高效性,为图像和视频生成技术建立了新的基准。这项工作推动了生成模型领域的发展,为可扩展架构的进一步应用铺平了道路。
更新时间: 2024-05-24 18:50:27
领域: cs.CV,cs.AI,cs.LG
Intensive Care as One Big Sequence Modeling Problem
Reinforcement Learning in Healthcare is typically concerned with narrow self-contained tasks such as sepsis prediction or anesthesia control. However, previous research has demonstrated the potential of generalist models (the prime example being Large Language Models) to outperform task-specific approaches due to their capability for implicit transfer learning. To enable training of foundation models for Healthcare as well as leverage the capabilities of state-of-the-art Transformer architectures, we propose the paradigm of Healthcare as Sequence Modeling, in which interaction between the patient and the healthcare provider is represented as an event stream, and tasks like diagnosis and treatment selection are modeled as prediction of future events in the stream. To explore this paradigm experimentally, we develop MIMIC-SEQ, a sequence modeling benchmark derived by translating heterogeneous clinical records from the MIMIC-IV dataset into a uniform event stream format, train a baseline model, and explore its capabilities.
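A toy illustration of the uniform event-stream framing (the field names are ours, not the MIMIC-SEQ schema): every heterogeneous record becomes one event, and clinical tasks reduce to predicting the next event given the prefix:

    # One patient trajectory as a single ordered event stream:
    events = [
        {"t": "08:00", "kind": "lab",       "name": "lactate",        "value": 2.1},
        {"t": "08:30", "kind": "drug",      "name": "norepinephrine", "dose": 0.05},
        {"t": "09:15", "kind": "diagnosis", "name": "sepsis"},
    ]
    # Diagnosis or treatment selection = predict the next event in the stream.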
Updated: 2024-05-24 18:50:06
标题: 重症监护作为一个大型序列建模问题
摘要: 在医疗保健领域的强化学习通常涉及狭窄的自包含任务,如败血症预测或麻醉控制。然而,先前的研究已经证明了通用模型(最典型的例子是大型语言模型)由于其隐式迁移学习的能力,可能优于特定任务的方法。为了为医疗保健训练基础模型并利用最先进的Transformer架构的能力,我们提出了医疗保健作为序列建模的范式,其中患者和医疗保健提供者之间的互动被表示为一个事件流,诊断和治疗选择等任务被建模为对流中未来事件的预测。为了在实验中探索这种范式,我们开发了MIMIC-SEQ,这是一个序列建模基准,通过将MIMIC-IV数据集中的异构临床记录转换为统一的事件流格式而得到,训练一个基准模型并探索其能力。
更新时间: 2024-05-24 18:50:06
领域: cs.LG,cs.AI
Maximum diffusion reinforcement learning
Robots and animals both experience the world through their bodies and senses. Their embodiment constrains their experiences, ensuring they unfold continuously in space and time. As a result, the experiences of embodied agents are intrinsically correlated. Correlations create fundamental challenges for machine learning, as most techniques rely on the assumption that data are independent and identically distributed. In reinforcement learning, where data are directly collected from an agent's sequential experiences, violations of this assumption are often unavoidable. Here, we derive a method that overcomes this issue by exploiting the statistical mechanics of ergodic processes, which we term maximum diffusion reinforcement learning. By decorrelating agent experiences, our approach provably enables single-shot learning in continuous deployments over the course of individual task attempts. Moreover, we prove our approach generalizes well-known maximum entropy techniques, and robustly exceeds state-of-the-art performance across popular benchmarks. Our results at the nexus of physics, learning, and control form a foundation for transparent and reliable decision-making in embodied reinforcement learning agents.
Updated: 2024-05-24 18:49:00
标题: 最大扩散强化学习
摘要: 机器人和动物都通过它们的身体和感官体验世界。它们的具身性约束了它们的经验,确保这些经验在空间和时间中连续展开。因此,具身代理的经验是内在相关的。相关性对机器学习提出了基本挑战,因为大多数技术依赖于数据独立且同分布的假设。在强化学习中,数据直接从代理的序列经验中收集,这一假设的违反通常是不可避免的。在这里,我们提出了一种利用遍历过程统计力学的方法,我们称之为最大扩散强化学习。通过对代理经验去相关,我们的方法可证明地能够在单次任务尝试的连续部署过程中实现单次学习。此外,我们证明了我们的方法推广了众所周知的最大熵技术,并且在流行的基准测试中稳定超过了现有技术的表现。我们的研究结果处于物理学、学习和控制的交汇点,为具身强化学习代理的透明和可靠决策奠定了基础。
更新时间: 2024-05-24 18:49:00
领域: cs.LG,cond-mat.stat-mech,cs.AI,cs.RO
HYSYNTH: Context-Free LLM Approximation for Guiding Program Synthesis
Many structured prediction and reasoning tasks can be framed as program synthesis problems, where the goal is to generate a program in a domain-specific language (DSL) that transforms input data into the desired output. Unfortunately, purely neural approaches, such as large language models (LLMs), often fail to produce fully correct programs in unfamiliar DSLs, while purely symbolic methods based on combinatorial search scale poorly to complex problems. Motivated by these limitations, we introduce a hybrid approach, where LLM completions for a given task are used to learn a task-specific, context-free surrogate model, which is then used to guide program synthesis. We evaluate this hybrid approach on three domains, and show that it outperforms both unguided search and direct sampling from LLMs, as well as existing program synthesizers.
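One way to picture the context-free surrogate: parse the LLM's sampled programs, count grammar-rule usages, and use the resulting PCFG weights to order a combinatorial search. A toy sketch (a real DSL would supply the parser, and the paper's surrogate may be weighted differently):

    from collections import Counter

    def pcfg_from_samples(parsed_programs):
        """Estimate rule probabilities P(rule | nonterminal) from programs
        sampled from an LLM; each program is a list of (nonterminal, rule)."""
        rule_counts, nt_counts = Counter(), Counter()
        for prog in parsed_programs:
            for nt, rule in prog:
                rule_counts[(nt, rule)] += 1
                nt_counts[nt] += 1
        return {k: v / nt_counts[k[0]] for k, v in rule_counts.items()}

    samples = [[("E", "E + E"), ("E", "x"), ("E", "1")], [("E", "x")]]
    weights = pcfg_from_samples(samples)   # guides best-first program synthesis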
Updated: 2024-05-24 18:45:51
标题: HYSYNTH:用于引导程序合成的上下文无关LLM近似
摘要: 许多结构化预测和推理任务可以被构建为程序合成问题,目标是生成一个领域特定语言(DSL)中的程序,将输入数据转换为所需输出。不幸的是,纯粹的神经方法,如大型语言模型(LLMs),通常无法在陌生的DSL中生成完全正确的程序,而基于组合搜索的纯符号方法在复杂问题上扩展性差。受这些限制的启发,我们引入了一种混合方法,利用给定任务的LLM补全来学习一个任务特定的、上下文无关的代理模型,然后用它来指导程序合成。我们在三个领域上评估了这种混合方法,并显示它优于无引导搜索和直接从LLMs采样,以及现有的程序合成器。
更新时间: 2024-05-24 18:45:51
领域: cs.PL,cs.AI
Gamba: Marry Gaussian Splatting with Mamba for single view 3D reconstruction
We tackle the challenge of efficiently reconstructing a 3D asset from a single image at millisecond speed. Existing methods for single-image 3D reconstruction are primarily based on Score Distillation Sampling (SDS) with Neural 3D representations. Despite promising results, these approaches encounter practical limitations due to lengthy optimizations and significant memory consumption. In this work, we introduce Gamba, an end-to-end 3D reconstruction model from a single-view image, emphasizing two main insights: (1) Efficient Backbone Design: introducing a Mamba-based GambaFormer network to model 3D Gaussian Splatting (3DGS) reconstruction as sequential prediction with linear scalability of token length, thereby accommodating a substantial number of Gaussians; (2) Robust Gaussian Constraints: deriving radial mask constraints from multi-view masks to eliminate the need for warmup supervision of 3D point clouds in training. We trained Gamba on Objaverse and assessed it against existing optimization-based and feed-forward 3D reconstruction approaches on the GSO Dataset, among which Gamba is the only end-to-end trained single-view reconstruction model with 3DGS. Experimental results demonstrate its competitive generation capabilities both qualitatively and quantitatively and highlight its remarkable speed: Gamba completes reconstruction within 0.05 seconds on a single NVIDIA A100 GPU, which is about $1,000\times$ faster than optimization-based methods. Please see our project page at https://florinshen.github.io/gamba-project.
Updated: 2024-05-24 18:43:28
标题: Gamba:将高斯泼溅(Gaussian Splatting)与Mamba相结合,用于单视角3D重建
摘要: 我们面对的挑战是以毫秒速度从单个图像高效地重建3D资产。现有的单图像3D重建方法主要基于得分蒸馏采样(SDS)与神经3D表示。尽管取得了有希望的结果,但这些方法由于优化时间长和内存消耗大等实际限制而遇到困难。在这项工作中,我们介绍了Gamba,这是一个从单视图图像进行端到端3D重建的模型,强调了两个主要见解:(1)高效的骨干设计:引入基于Mamba的GambaFormer网络来将3D高斯泼溅(3DGS)重建建模为具有令牌长度线性可扩展性的顺序预测,从而容纳大量高斯;(2)鲁棒的高斯约束:从多视图掩模中导出径向掩模约束,消除训练中对3D点云预热监督的需求。我们在Objaverse上训练了Gamba,并在GSO数据集上对其进行评估,与现有的基于优化和前馈3D重建方法进行对比,其中Gamba是唯一一个使用3DGS进行端到端训练的单视图重建模型。实验结果展示了其在定性和定量方面具有竞争力的生成能力,并突出了其出色的速度:Gamba在单个NVIDIA A100 GPU上完成重建仅需0.05秒,比基于优化的方法快约1000倍。请访问我们的项目页面https://florinshen.github.io/gamba-project。
更新时间: 2024-05-24 18:43:28
领域: cs.CV,cs.AI
eQMARL: Entangled Quantum Multi-Agent Reinforcement Learning for Distributed Cooperation over Quantum Channels
Collaboration is a key challenge in distributed multi-agent reinforcement learning (MARL) environments. Learning frameworks for these decentralized systems must weigh the benefits of explicit player coordination against the communication overhead and computational cost of sharing local observations and environmental data. Quantum computing has sparked a potential synergy between quantum entanglement and cooperation in multi-agent environments, which could enable more efficient distributed collaboration with minimal information sharing. This relationship is largely unexplored, however, as current state-of-the-art quantum MARL (QMARL) implementations rely on classical information sharing rather than entanglement over a quantum channel as a coordination medium. In contrast, in this paper, a novel framework dubbed entangled QMARL (eQMARL) is proposed. The proposed eQMARL is a distributed actor-critic framework that facilitates cooperation over a quantum channel and eliminates local observation sharing via a quantum entangled split critic. Introducing a quantum critic uniquely spread across the agents allows coupling of local observation encoders through entangled input qubits over a quantum channel, which requires no explicit sharing of local observations and reduces classical communication overhead. Further, agent policies are tuned through joint observation-value function estimation via joint quantum measurements, thereby reducing the centralized computational burden. Experimental results show that eQMARL with ${\Psi}^{+}$ entanglement converges to a cooperative strategy up to $17.8\%$ faster and with a higher overall score compared to split classical and fully centralized classical and quantum baselines. The results also show that eQMARL achieves this performance with a constant factor of $25$-times fewer centralized parameters compared to the split classical baseline.
Updated: 2024-05-24 18:43:05
标题: eQMARL:用于量子通道上分布式合作的纠缠量子多智能体强化学习
摘要: 合作是分布式多智能体强化学习(MARL)环境中的一个关键挑战。这些分散系统的学习框架必须权衡明确玩家协调的好处与共享本地观察和环境数据的通信开销和计算成本之间的关系。量子计算引发了量子纠缠和多智能体环境合作之间潜在的协同作用,这可能实现更有效的分布式合作,而最小化信息共享。然而,这种关系在很大程度上尚未被探索,因为当前最先进的量子MARL(QMARL)实现依赖于经典信息共享,而不是通过量子通道的纠缠作为协调媒介。相比之下,在本文中,提出了一种新颖的框架,名为纠缠QMARL(eQMARL)。所提出的eQMARL是一个分布式的演员-评论家框架,通过量子通道促进合作,并通过量子纠缠的分裂评论家消除本地观察共享。引入一个独特分布在智能体之间的量子评论家,允许通过量子通道上的纠缠输入量子位耦合本地观察编码器,而无需明确共享本地观察,并减少经典通信开销。此外,通过联合量子测量来调整代理策略,从而减少中心化计算负担。实验结果表明,具有${\Psi}^{+}$纠缠的eQMARL相对于分裂经典和完全集中化的经典和量子基线,收敛到合作策略的速度快了17.8%,总体得分更高。结果还表明,与分裂经典基线相比,eQMARL以25倍较少的中心化参数实现了这种性能。
更新时间: 2024-05-24 18:43:05
领域: quant-ph,cs.ET,cs.LG,cs.MA
$\textit{Comet:}$ A $\underline{Com}$munication-$\underline{e}$fficient and Performant Approxima$\underline{t}$ion for Private Transformer Inference
The prevalent use of Transformer-like models, exemplified by ChatGPT in modern language processing applications, underscores the critical need for enabling private inference, which is essential for many cloud-based services reliant on such models. However, current privacy-preserving frameworks impose a significant communication burden, especially for the non-linear computations in Transformer models. In this paper, we introduce Comet, a novel plug-in method that effectively reduces the communication cost without compromising inference performance. We further introduce an efficient approximation method that eliminates the heavy communication involved in finding a good initial approximation. We evaluate Comet on Bert and RoBERTa models with the GLUE benchmark datasets, showing up to 3.9$\times$ less communication and 3.5$\times$ speedups while keeping model performance competitive with the prior art.
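As a rough illustration of the kind of non-linearity approximation that private Transformer inference relies on (the target function, interval, and degree here are our assumptions, not Comet's actual recipe), a low-degree polynomial can stand in for GELU, since polynomials use only additions and multiplications, which secret-sharing protocols evaluate with little communication:

# Fit a degree-6 polynomial to GELU on an assumed activation range.
import numpy as np

def gelu(x):
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

xs = np.linspace(-4, 4, 2001)              # assumed range of typical activations
coeffs = np.polyfit(xs, gelu(xs), deg=6)   # least-squares polynomial fit
poly = np.poly1d(coeffs)
print("max |poly - gelu| on [-4, 4]:", float(np.max(np.abs(poly(xs) - gelu(xs)))))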
Updated: 2024-05-24 18:43:00
标题: Comet:一种通信高效且性能优越的私有Transformer推理近似方法
摘要: Transformer-like模型的普遍使用,如ChatGPT在现代语言处理应用中的应用,突显了对启用私有推理的重要性,这对许多依赖于这些模型的云服务至关重要。然而,当前的隐私保护框架会给Transformer模型中的非线性计算带来显著的通信负担。在本文中,我们引入了一种名为Comet的新型插件方法,可以有效地减少通信成本,而不影响推理性能。我们还介绍了一种高效的近似方法,可以消除在寻找良好初始近似值时的大量通信。我们在GLUE基准数据集上评估了我们的Comet在Bert和RoBERTa模型上的表现,结果显示通信量减少了最多3.9倍,速度提升了最多3.5倍,同时与先前的技术相比保持了竞争力的模型性能。
更新时间: 2024-05-24 18:43:00
领域: cs.LG,cs.AI,cs.CR
Basis Selection: Low-Rank Decomposition of Pretrained Large Language Models for Target Applications
Large language models (LLMs) significantly enhance the performance of various applications, but they are computationally intensive and energy-demanding. This makes it challenging to deploy them on devices with limited resources, such as personal computers and mobile/wearable devices, and results in substantial inference costs in resource-rich environments like cloud servers. To extend the use of LLMs, we introduce a low-rank decomposition approach to effectively compress these models, tailored to the requirements of specific applications. We observe that LLMs pretrained on general datasets contain many redundant components not needed for particular applications. Our method focuses on identifying and removing these redundant parts, retaining only the necessary elements for the target applications. Specifically, we represent the weight matrices of LLMs as a linear combination of base components. We then prune the irrelevant bases and enhance the model with new bases beneficial for specific applications. Deep compression results on the Llama 2 7B and 13B models, for target applications including mathematical reasoning and code generation, show that our method significantly reduces model size while maintaining comparable accuracy to state-of-the-art low-rank compression techniques.
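A minimal sketch of the mechanics, assuming an SVD basis for a weight matrix and a simple activation-energy importance score on target-task calibration data (the paper's actual selection criterion, and its addition of new task-specific bases, are not reproduced here):

# Rank SVD bases of a weight matrix by their contribution on target-task
# inputs, then keep only the most useful ones. Shapes and data are stand-ins.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(256, 256))            # a pretrained weight matrix (stand-in)
X = rng.normal(size=(1024, 256))           # calibration inputs from the target task

U, S, Vt = np.linalg.svd(W, full_matrices=False)

# Importance of basis i: energy it contributes to the layer's outputs on X.
importance = np.array([np.linalg.norm(S[i] * (X @ Vt[i])) for i in range(len(S))])
keep = np.argsort(importance)[::-1][:64]   # retain the 64 most useful bases

W_low = (U[:, keep] * S[keep]) @ Vt[keep]  # compressed reconstruction
rel_err = np.linalg.norm(X @ W.T - X @ W_low.T) / np.linalg.norm(X @ W.T)
print(f"kept {len(keep)}/{len(S)} bases, relative output error {rel_err:.3f}")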
Updated: 2024-05-24 18:40:20
标题: 基底选择:面向目标应用的预训练大型语言模型的低秩分解
摘要: 大型语言模型(LLMs)显著提高了各种应用程序的性能,但它们需要大量计算和耗费能量。这使得在资源有限的设备上部署它们变得具有挑战性,例如个人电脑和移动/可穿戴设备,并导致在云服务器等资源丰富的环境中产生大量推断成本。为了扩展LLMs的使用,我们引入了一种低秩分解方法,有效地压缩这些模型,以满足特定应用程序的要求。我们观察到,在通用数据集上预训练的LLMs包含许多对特定应用程序不需要的冗余组件。我们的方法专注于识别并去除这些冗余部分,仅保留目标应用程序所需的必要元素。具体而言,我们将LLMs的权重矩阵表示为基本组件的线性组合。然后我们修剪不相关的基底,并用对特定应用程序有益的新基底增强模型。在包括数学推理和代码生成在内的目标应用上,对Llama 2 7B和13B模型进行的深度压缩结果表明,我们的方法在保持与最先进低秩压缩技术相当准确性的同时,显著减小了模型大小。
更新时间: 2024-05-24 18:40:20
领域: cs.LG,cs.AR,cs.CL
CausalConceptTS: Causal Attributions for Time Series Classification using High Fidelity Diffusion Models
Despite the excelling performance of machine learning models, understanding the decisions of machine learning models remains a long-standing goal. While commonly used attribution methods in explainable AI attempt to address this issue, they typically rely on associational rather than causal relationships. In this study, within the context of time series classification, we introduce a novel framework to assess the causal effect of concepts, i.e., predefined segments within a time series, on specific classification outcomes. To achieve this, we leverage state-of-the-art diffusion-based generative models to estimate counterfactual outcomes. Our approach compares these causal attributions with closely related associational attributions, both theoretically and empirically. We demonstrate the insights gained by our approach for a diverse set of qualitatively different time series classification tasks. Although causal and associational attributions might often share some similarities, in all cases they differ in important details, underscoring the risks associated with drawing causal conclusions from associational data alone. We believe that the proposed approach is widely applicable also in other domains, particularly where predefined segmentations are available, to shed some light on the limits of associational attributions.
Updated: 2024-05-24 18:33:18
标题: CausalConceptTS: 使用高保真扩散模型进行时间序列分类的因果归因
摘要: 尽管机器学习模型表现出色,但理解机器学习模型的决策仍然是一个长期目标。虽然可解释AI中常用的归因方法试图解决这个问题,但它们通常依赖于关联性而不是因果关系。在这项研究中,在时间序列分类的背景下,我们引入了一个新的框架来评估概念对特定分类结果的因果效应,即时间序列中预定义段的影响。为了实现这一目标,我们利用最先进的基于扩散的生成模型来估计反事实结果。我们的方法从理论和实证两方面比较这些因果归因和紧密相关的关联归因。我们展示了我们的方法在各种不同类型的时间序列分类任务中所获得的见解。尽管因果和关联归因在许多情况下可能有一些相似之处,但在所有情况下它们在重要细节上有所不同,强调了仅从关联数据中得出因果结论所带来的风险。我们相信提出的方法在其他领域也是广泛适用的,特别是在预定义的分割可用的情况下,可以揭示关联归因的局限性。
更新时间: 2024-05-24 18:33:18
领域: cs.LG,cs.AI,stat.ML
Outcome-Driven Dynamic Refugee Assignment with Allocation Balancing
This study proposes two new dynamic assignment algorithms to match refugees and asylum seekers to geographic localities within a host country. The first, currently implemented in a multi-year randomized control trial in Switzerland, seeks to maximize the average predicted employment level (or any measured outcome of interest) of refugees through a minimum-discord online assignment algorithm. The performance of this algorithm is tested on real refugee resettlement data from both the US and Switzerland, where we find that it is able to achieve near-optimal expected employment compared to the hindsight-optimal solution, and is able to improve upon the status quo procedure by 40-50%. However, pure outcome maximization can result in a periodically imbalanced allocation to the localities over time, leading to implementation difficulties and an undesirable workflow for resettlement resources and agents. To address these problems, the second algorithm balances the goal of improving refugee outcomes with the desire for an even allocation over time. We find that this algorithm can achieve near-perfect balance over time with only a small loss in expected employment compared to the employment-maximizing algorithm. In addition, the allocation balancing algorithm offers a number of ancillary benefits compared to pure outcome maximization, including robustness to unknown arrival flows and greater exploration.
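For intuition, a toy greedy variant is easy to write down, assuming a per-arrival outcome predictor and a linear over-allocation penalty with weight lam (the paper's minimum-discord algorithm and its guarantees are more involved):

# Online assignment: pick the locality with the best predicted outcome,
# discounted by how far it runs ahead of its pro-rata share. Data is synthetic.
import numpy as np

rng = np.random.default_rng(1)
n_localities, n_arrivals = 5, 200
capacity = np.full(n_localities, n_arrivals // n_localities)
assigned = np.zeros(n_localities, dtype=int)
lam = 0.3   # balance weight: 0 recovers pure outcome maximization

total_employment = 0.0
for t in range(n_arrivals):
    pred = rng.uniform(0, 1, size=n_localities)   # predicted employment per locality
    share = assigned / max(t, 1)                  # realized allocation shares so far
    target = capacity / capacity.sum()            # pro-rata target shares
    util = pred - lam * np.maximum(share - target, 0)  # penalize over-full localities
    util[assigned >= capacity] = -np.inf          # hard capacity constraint
    j = int(np.argmax(util))
    assigned[j] += 1
    total_employment += pred[j]

print("allocation:", assigned, " avg predicted employment:", total_employment / n_arrivals)

Setting lam to zero recovers pure outcome maximization, while raising it trades a little expected employment for a steadier allocation over time.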
Updated: 2024-05-24 18:27:09
标题: 以结果为导向的动态难民分配与分配平衡
摘要: 这项研究提出了两种新的动态分配算法,用于将难民和寻求庇护者分配到东道国内的各个地理位置。第一种目前在瑞士进行的一项多年随机对照试验中实施,旨在通过最小不和谐在线分配算法最大化难民的平均预测就业水平(或任何感兴趣的测量结果)。该算法的性能在来自美国和瑞士的真实难民安置数据上进行了测试,结果发现与事后最优解相比,它能够实现接近最优的预期就业水平,并且能够将现状程序改进40-50%。然而,纯粹的结果最大化可能导致各地区的分配随时间周期性失衡,从而导致实施困难,并给重新安置资源和工作人员带来不理想的工作流程。为了解决这些问题,第二种算法在改善难民结果的目标与随时间均衡分配的愿望之间取得平衡。我们发现,与最大化就业的算法相比,该算法可以在时间上实现近乎完美的平衡,仅在预期就业方面略有损失。此外,与纯结果最大化相比,分配平衡算法还提供了许多附加好处,包括对未知到达流量的鲁棒性和更大的探索性。
更新时间: 2024-05-24 18:27:09
领域: math.OC,cs.GT,cs.LG,stat.ME
LLS: Local Learning Rule for Deep Neural Networks Inspired by Neural Activity Synchronization
Training deep neural networks (DNNs) using traditional backpropagation (BP) presents challenges in terms of computational complexity and energy consumption, particularly for on-device learning where computational resources are limited. Various alternatives to BP, including random feedback alignment, forward-forward, and local classifiers, have been explored to address these challenges. These methods have their advantages, but they can encounter difficulties when dealing with intricate visual tasks or demand considerable computational resources. In this paper, we propose a novel Local Learning rule inspired by neural activity Synchronization phenomena (LLS) observed in the brain. LLS utilizes fixed periodic basis vectors to synchronize neuron activity within each layer, enabling efficient training without the need for additional trainable parameters. We demonstrate the effectiveness of LLS and its variations, LLS-M and LLS-MxM, on multiple image classification datasets, achieving accuracy comparable to BP with reduced computational complexity and minimal additional parameters. Furthermore, the performance of LLS on the Visual Wake Word (VWW) dataset highlights its suitability for on-device learning tasks, making it a promising candidate for edge hardware implementations.
Updated: 2024-05-24 18:24:24
标题: LLS:受神经活动同步启发的深度神经网络局部学习规则
摘要: 使用传统的反向传播(BP)训练深度神经网络(DNNs)在计算复杂性和能耗方面存在挑战,特别是在设备学习中,计算资源有限。为了解决这些挑战,已经探索了包括随机反馈对齐、前向-前向和本地分类器在内的各种BP替代方法。这些方法各有优势,但在处理复杂的视觉任务或需要大量计算资源时可能会遇到困难。本文提出了一种受大脑中观察到的神经活动同步现象(LLS)启发的新型本地学习规则。LLS利用固定周期基向量来同步每一层内的神经元活动,实现高效训练而无需额外的可训练参数。我们展示了LLS及其变种LLS-M和LLS-MxM在多个图像分类数据集上的有效性,实现了与BP相当的准确性,同时降低了计算复杂性和最小化了额外参数。此外,LLS在Visual Wake Word(VWW)数据集上的表现突显了其适用于设备学习任务,使其成为边缘硬件实现的有前景的候选者。
更新时间: 2024-05-24 18:24:24
领域: cs.NE,cs.AI,cs.LG
Hypergraph: A Unified and Uniform Definition with Application to Chemical Hypergraph
The conventional definition of hypergraph has two major issues: (1) there is not a standard definition of directed hypergraph and (2) there is not a formal definition of nested hypergraph. To resolve these issues, we propose a new definition of hypergraph that unifies the concepts of undirected, directed and nested hypergraphs, and that is uniform in using hyperedge as a single construct for representing high-order correlations among things, i.e., nodes and hyperedges. Specifically, we define a hyperedge to be a simple hyperedge, a nesting hyperedge, or a directed hyperedge. With this new definition, a hypergraph is nested if it has nesting hyperedge(s), and is directed if it has directed hyperedge(s). Otherwise, a hypergraph is a simple hypergraph. The uniformity and power of this new definition, with visualization, should facilitate the use of hypergraph for representing (hierarchical) high-order correlations in general and chemical systems in particular. Graph has been widely used as a mathematical structure for machine learning on molecular structures and 3D molecular geometries. However, graph has a major limitation: it can represent only pairwise correlations between nodes. Hypergraph extends graph with high-order correlations among nodes. This extension is significant or essential for machine learning on chemical systems. For molecules, this is significant as it allows the direct, explicit representation of multicenter bonds and molecular substructures. For chemical reactions, this is essential since most chemical reactions involve multiple participants. We propose the use of chemical hypergraph, a multilevel hypergraph with simple, nesting and directed hyperedges, as a single mathematical structure for representing chemical systems. We apply the new definition of hypergraph to chemical hypergraph and, as simplified versions, molecular hypergraph and chemical reaction hypergraph.
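The unified definition maps naturally onto a small data structure. The sketch below is our illustrative encoding (the class and field names are assumptions): a hyperedge is directed if it has a tail or head, nesting if any member is itself a hyperedge, and simple otherwise:

# One construct, three flavors: simple, nesting, and directed hyperedges.
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    name: str

@dataclass(frozen=True)
class HyperEdge:
    members: frozenset = frozenset()   # simple/nesting: an undirected member set
    tail: frozenset = frozenset()      # directed: source things ...
    head: frozenset = frozenset()      # ... and target things

    @property
    def is_directed(self):
        return bool(self.tail or self.head)

    @property
    def is_nesting(self):
        return any(isinstance(m, HyperEdge)
                   for m in self.members | self.tail | self.head)

# A toy chemical-reaction hyperedge {H2, O2} -> {H2O}, nested in a pathway edge.
h2, o2, h2o = Node("H2"), Node("O2"), Node("H2O")
rxn = HyperEdge(tail=frozenset({h2, o2}), head=frozenset({h2o}))
pathway = HyperEdge(members=frozenset({rxn, Node("catalyst")}))
print(rxn.is_directed, pathway.is_nesting)   # True True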
Updated: 2024-05-24 18:14:44
标题: 超图:一个统一且均一的定义及其在化学超图中的应用
摘要: 超图的传统定义存在两个主要问题:(1)没有标准的有向超图定义,(2)没有嵌套超图的正式定义。为了解决这些问题,我们提出了一个新的超图定义,统一了无向、有向和嵌套超图的概念,并统一使用超边作为表示事物之间高阶相关性的单一构造,即节点和超边。具体而言,我们将超边定义为简单超边、嵌套超边或有向超边。有了这个新定义,如果超图具有嵌套超边,则它是嵌套的;如果超图具有有向超边,则它是有向的。否则,超图就是简单超图。这个新定义的统一性和强大性,结合可视化,应该有助于使用超图来表示(分级)高阶相关性,特别是化学系统中的相关性。图被广泛用作机器学习分子结构和3D分子几何的数学结构。然而,图有一个主要限制:它只能表示节点之间的成对相关性。超图通过节点之间的高阶相关性扩展了图。这种扩展对于化学系统的机器学习至关重要。对于分子而言,这是重要的,因为它允许直接、明确地表示多中心键和分子亚结构。对于化学反应,这是必不可少的,因为大多数化学反应涉及多个参与者。我们提议使用化学超图,一个具有简单、嵌套和有向超边的多层次超图,作为表示化学系统的单一数学结构。我们将超图的新定义应用于化学超图,并作为简化版本,分子超图和化学反应超图。
更新时间: 2024-05-24 18:14:44
领域: cs.LG,q-bio.QM
Horizontal Federated Computer Vision
In the modern world, the amount of visual data recorded has been rapidly increasing. In many cases, data is stored in geographically distinct locations and thus requires a large amount of time and space to consolidate. Sometimes, there are also regulations for privacy protection which prevent data consolidation. In this work, we present federated implementations for object detection and recognition using a federated Faster R-CNN (FRCNN) and image segmentation using a federated Fully Convolutional Network (FCN). Our FRCNN was trained on 5000 examples of the COCO2017 dataset while our FCN was trained on the entire train set of the CamVid dataset. The proposed federated models address the challenges posed by the increasing volume and decentralized nature of visual data, offering efficient solutions in compliance with privacy regulations.
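A minimal sketch of the training loop, with FedAvg-style size-weighted aggregation; the local FRCNN/FCN update is stubbed with noise, and the client sizes are made up:

# Size-weighted federated averaging over geographically separate sites.
import numpy as np

def fed_avg(client_weights, client_sizes):
    """Weighted average of per-client parameter lists (FedAvg)."""
    total = sum(client_sizes)
    return [sum(w[k] * (n / total) for w, n in zip(client_weights, client_sizes))
            for k in range(len(client_weights[0]))]

rng = np.random.default_rng(0)
global_w = [rng.normal(size=(3, 3)), rng.normal(size=3)]   # stand-in model weights
for _ in range(5):                                          # communication rounds
    local = []
    for _ in range(4):   # each site "trains" locally (stubbed with noise)
        local.append([w - 0.01 * rng.normal(size=w.shape) for w in global_w])
    global_w = fed_avg(local, client_sizes=[100, 250, 80, 400])

Only parameters travel between sites; the raw images never leave their location, which is the point of the federated setup.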
Updated: 2024-05-24 18:13:11
标题: 横向联邦计算机视觉
摘要: 在现代世界中,记录的视觉数据量迅速增加。在许多情况下,数据存储在地理位置不同的地方,因此需要大量时间和空间来进行整合。有时,也有隐私保护的法规阻止数据的合并。在这项工作中,我们提出了使用联邦式Faster R-CNN(FRCNN)进行目标检测和识别以及使用联邦式全卷积网络(FCN)进行图像分割的联邦实现。我们的FRCNN是在COCO2017数据集的5000个示例上训练的,而我们的FCN是在CamVid数据集的整个训练集上训练的。所提出的联邦模型解决了视觉数据数量增加和分散性质所带来的挑战,提供了符合隐私法规的高效解决方案。
更新时间: 2024-05-24 18:13:11
领域: cs.CV,cs.AI,cs.DC,cs.LG,C.2.4; I.2.8; I.4; I.4.8
Quality-aware Masked Diffusion Transformer for Enhanced Music Generation
In recent years, diffusion-based text-to-music (TTM) generation has gained prominence, offering a novel approach to synthesizing musical content from textual descriptions. Achieving high accuracy and diversity in this generation process requires extensive, high-quality data, which often constitutes only a fraction of available datasets. Within open-source datasets, the prevalence of issues like mislabeling, weak labeling, unlabeled data, and low-quality music waveforms significantly hampers the development of music generation models. To overcome these challenges, we introduce a novel quality-aware masked diffusion transformer (QA-MDT) approach that enables generative models to discern the quality of input music waveforms during training. Building on the unique properties of musical signals, we have adapted and implemented a MDT model for the TTM task, while further unveiling its distinct capacity for quality control. Moreover, we address the issue of low-quality captions with a caption refinement data processing approach. Our demo page is available at https://qa-mdt.github.io/ and our code at https://github.com/ivcylc/qa-mdt
Updated: 2024-05-24 18:09:27
标题: 质量感知掩码扩散Transformer用于增强音乐生成
摘要: 近年来,基于扩散的文本到音乐(TTM)生成逐渐受到重视,提供了一种从文本描述中合成音乐内容的新方法。在这一生成过程中实现高准确性和多样性需要大量高质量的数据,而这种数据往往只占可用数据集的一小部分。在开源数据集中,诸如错误标记、弱标记、未标记数据和低质量音乐波形等问题的普遍存在显著阻碍了音乐生成模型的发展。为了克服这些挑战,我们引入了一种新颖的质量感知遮蔽扩散Transformer(QA-MDT)方法,使生成模型能够在训练过程中识别输入音乐波形的质量。基于音乐信号的独特属性,我们已经改编并实施了一个MDT模型用于TTM任务,并揭示了它在质量控制方面的独特能力。此外,我们使用一种标题优化数据处理方法解决了低质量标题的问题。我们的演示页面见https://qa-mdt.github.io/,代码见https://github.com/ivcylc/qa-mdt。
更新时间: 2024-05-24 18:09:27
领域: cs.SD,cs.AI,eess.AS
Achieving Dimension-Free Communication in Federated Learning via Zeroth-Order Optimization
Federated Learning (FL) offers a promising framework for collaborative and privacy-preserving machine learning across distributed data sources. However, the substantial communication costs associated with FL pose a significant challenge to its efficiency. Specifically, in each communication round, the communication costs scale linearly with the model's dimension, which presents a formidable obstacle, especially in large model scenarios. Despite various communication-efficient strategies, the intrinsic dimension-dependent communication cost remains a major bottleneck for current FL implementations. In this paper, we introduce a novel dimension-free communication strategy for FL, leveraging zeroth-order optimization techniques. We propose a new algorithm, FedDisco, which facilitates the transmission of only a constant number of scalar values between clients and the server in each communication round, thereby reducing the communication cost from $\mathcal{O}(d)$ to $\mathcal{O}(1)$, where $d$ is the dimension of the model parameters. Theoretically, for non-convex functions, we prove that our algorithm achieves state-of-the-art rates, which exhibit a linear speedup in the number of clients and local steps under standard assumptions, and a dimension-free rate in low effective-rank scenarios. Empirical evaluations on classic deep learning training and large language model fine-tuning substantiate significant reductions in communication overhead compared to traditional FL approaches.
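A toy sketch of the seed-sharing trick that makes the per-round payload dimension-free (least-squares client losses, a single perturbation direction per round, and the stepsizes are our assumptions, not the FedDisco algorithm itself):

# The server broadcasts a seed; each client regrows the same perturbation and
# uploads ONE scalar: a two-point zeroth-order derivative estimate.
import numpy as np

def loss(w, data):
    X, y = data
    return np.mean((X @ w - y) ** 2)

d, n_clients, mu, lr = 1000, 8, 1e-3, 2e-4
rng = np.random.default_rng(0)
clients = [(rng.normal(size=(64, d)), rng.normal(size=64)) for _ in range(n_clients)]
w = np.zeros(d)
print("loss before:", round(float(np.mean([loss(w, D) for D in clients])), 4))

for rnd in range(500):
    u = np.random.default_rng(rnd).normal(size=d)   # regrown from the broadcast seed
    scalars = [(loss(w + mu * u, D) - loss(w - mu * u, D)) / (2 * mu) for D in clients]
    g = float(np.mean(scalars))       # server aggregates O(1) numbers, not O(d)
    w -= lr * g * u                   # update direction reconstructed from the seed

print("loss after :", round(float(np.mean([loss(w, D) for D in clients])), 4))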
Updated: 2024-05-24 18:07:05
标题: 通过零阶优化在联邦学习中实现无维度通信
摘要: 联邦学习(FL)为分布式数据源提供了协作和隐私保护的机器学习的有前途的框架。然而,与FL相关的大量通信成本对其效率构成了重大挑战。具体而言,在每一轮通信中,通信成本与模型的维度呈线性关系,这在大型模型场景中尤为困难。尽管存在各种通信效率策略,但固有的维度相关通信成本仍然是当前FL实现的主要瓶颈。在本文中,我们介绍了一种新颖的无维度通信策略,利用零阶优化技术。我们提出了一种新算法FedDisco,可以在每一轮通信中仅传输一个恒定数量的标量值,从而将通信成本从O(d)降低到O(1),其中d是模型参数的维度。在非凸函数中,我们理论上证明了我们的算法在标准假设和无维度率下实现了最先进的速率,显示了客户端数量和本地步数的线性加速以及低有效秩场景下的无维度率。通过经典深度学习训练和大型语言模型微调的实证评估,与传统FL方法相比,证实了通信开销的显著降低。
更新时间: 2024-05-24 18:07:05
领域: cs.LG,cs.DC
Free Performance Gain from Mixing Multiple Partially Labeled Samples in Multi-label Image Classification
Multi-label image classification datasets are often partially labeled where many labels are missing, posing a significant challenge to training accurate deep classifiers. However, the powerful Mixup sample-mixing data augmentation cannot be well utilized to address this challenge, as it cannot perform linear interpolation on the unknown labels to construct augmented samples. In this paper, we propose LogicMix, a Mixup variant designed for such partially labeled datasets. LogicMix mixes the sample labels by logical OR so that the unknown labels can be correctly mixed by utilizing OR's logical equivalences, including the domination and identity laws. Unlike Mixup, which mixes exactly two samples, LogicMix can mix multiple ($\geq2$) partially labeled samples, constructing visually more confused augmented samples to regularize training. LogicMix is more general and effective than other compared Mixup variants in the experiments on various partially labeled dataset scenarios. Moreover, it is plug-and-play and only requires minimal computation, hence it can be easily inserted into existing frameworks to collaborate with other methods to improve model performance with a negligible impact on training time, as demonstrated through extensive experiments. In particular, through the collaboration of LogicMix, RandAugment, Curriculum Labeling, and Category-wise Fine-Tuning, we attain state-of-the-art performance on MS-COCO, VG-200, and Pascal VOC 2007 benchmarking datasets. The remarkable generality, effectiveness, collaboration, and simplicity suggest that LogicMix promises to be a popular and vital data augmentation method.
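The label-mixing rule itself is compact. The sketch below is our reading of it, with -1 marking a missing label; the Dirichlet pixel-mixing weights are an assumption:

# Mix k >= 2 partially labeled samples: average the pixels, OR the labels.
import numpy as np

UNK = -1  # missing-label marker

def logic_or(labels):
    """OR-combine label vectors with missing entries (domination & identity laws)."""
    out = np.full(labels.shape[1], UNK)
    for col in range(labels.shape[1]):
        vals = labels[:, col]
        if (vals == 1).any():          # 1 OR x = 1   (domination law)
            out[col] = 1
        elif (vals == UNK).any():      # UNK OR 0 = UNK (the label cannot be ruled out)
            out[col] = UNK
        else:                          # all zeros: 0 OR 0 = 0 (identity law)
            out[col] = 0
    return out

def logicmix(images, labels, rng):
    k = rng.integers(2, 4)             # mix 2 or 3 samples
    idx = rng.choice(len(images), size=k, replace=False)
    lam = rng.dirichlet(np.ones(k))    # convex pixel-mixing weights (assumption)
    mixed_img = np.tensordot(lam, images[idx], axes=1)
    mixed_lbl = logic_or(labels[idx])
    return mixed_img, mixed_lbl

rng = np.random.default_rng(0)
images = rng.uniform(size=(10, 32, 32, 3))
labels = rng.choice([0, 1, UNK], size=(10, 5))    # 5 classes, partially labeled
img, lbl = logicmix(images, labels, rng)
print(lbl)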
Updated: 2024-05-24 18:05:09
标题: 在多标签图像分类中混合多个部分标记样本可获得的免费性能提升
摘要: 多标签图像分类数据集通常是部分标记的,其中许多标签是缺失的,这给训练准确的深度分类器带来了重大挑战。然而,强大的Mixup样本混合数据增强不能很好地利用来解决这一挑战,因为它无法对未知标签进行线性插值以构建增强样本。在本文中,我们提出了LogicMix,一种专为这种部分标记数据集设计的Mixup变体。LogicMix通过逻辑OR混合样本标签,以便利用OR的逻辑等价性正确混合未知标签,包括支配和恒等法则。与Mixup不同,Mixup只能混合两个样本,LogicMix可以混合多个(≥2)部分标记样本,构造外观更混乱的增强样本以规范训练。在各种部分标记数据集场景的实验中,LogicMix比其他Mixup变体更通用和有效。此外,它是即插即用的,并且仅需要最少的计算,因此可以轻松地插入到现有框架中与其他方法协作,以在训练时间上对模型性能产生可忽略的影响,通过大量实验证明。特别是,通过LogicMix、RandAugment、Curriculum Labeling和Category-wise Fine-Tuning的协作,我们在MS-COCO、VG-200和Pascal VOC 2007基准数据集上实现了最先进的性能。显著的通用性、有效性、协作性和简单性表明,LogicMix有望成为一种受欢迎和重要的数据增强方法。
更新时间: 2024-05-24 18:05:09
领域: cs.CV,cs.AI
Canonical Variates in Wasserstein Metric Space
In this paper, we address the classification of instances each characterized not by a singular point, but by a distribution on a vector space. We employ the Wasserstein metric to measure distances between distributions, which are then used by distance-based classification algorithms such as k-nearest neighbors, k-means, and pseudo-mixture modeling. Central to our investigation is dimension reduction within the Wasserstein metric space to enhance classification accuracy. We introduce a novel approach grounded in the principle of maximizing Fisher's ratio, defined as the quotient of between-class variation to within-class variation. The directions in which this ratio is maximized are termed discriminant coordinates or canonical variates axes. In practice, we define both between-class and within-class variations as the average squared distances between pairs of instances, with the pairs either belonging to the same class or to different classes. This ratio optimization is achieved through an iterative algorithm, which alternates between optimal transport and maximization steps within the vector space. We conduct empirical studies to assess the algorithm's convergence and, through experimental validation, demonstrate that our dimension reduction technique substantially enhances classification performance. Moreover, our method outperforms well-established algorithms that operate on vector representations derived from distributional data. It also exhibits robustness against variations in the distributional representations of data clouds.
Updated: 2024-05-24 17:59:21
标题: Wasserstein度量空间中的规范变量
摘要: 在本文中,我们讨论了每个实例的分类,这些实例不是由一个单一点,而是由一个向量空间上的分布来表征。我们采用Wasserstein度量来衡量分布之间的距离,然后将这些距离用于基于距离的分类算法,如k最近邻、k均值和伪混合建模。我们研究的核心是在Wasserstein度量空间内进行维度缩减以提高分类准确性。我们引入了一种新颖的方法,基于最大化Fisher比的原则,Fisher比被定义为类间变异性与类内变异性之比。在最大化这个比例的方向上被称为判别坐标或规范变量轴。在实践中,我们将类间和类内变异性定义为实例对之间的平均平方距离,这些实例对要么属于相同类别,要么属于不同类别。这种比率优化是通过一个迭代算法实现的,该算法在向量空间内的最优传输和最大化步骤之间交替进行。我们进行了实证研究以评估算法的收敛性,并通过实验验证表明,我们的维度缩减技术显著提高了分类性能。此外,我们的方法优于基于分布数据导出的向量表示的经过验证的算法。它还展现出对数据云的分布表示变化的稳健性。
更新时间: 2024-05-24 17:59:21
领域: stat.ML,cs.AI,cs.LG
Improved Particle Approximation Error for Mean Field Neural Networks
Mean-field Langevin dynamics (MFLD) minimizes an entropy-regularized nonlinear convex functional defined over the space of probability distributions. MFLD has gained attention due to its connection with noisy gradient descent for mean-field two-layer neural networks. Unlike standard Langevin dynamics, the nonlinearity of the objective functional induces particle interactions, necessitating multiple particles to approximate the dynamics in a finite-particle setting. Recent works (Chen et al., 2022; Suzuki et al., 2023b) have demonstrated the uniform-in-time propagation of chaos for MFLD, showing that the gap between the particle system and its mean-field limit uniformly shrinks over time as the number of particles increases. In this work, we improve the dependence on logarithmic Sobolev inequality (LSI) constants in their particle approximation errors, which can exponentially deteriorate with the regularization coefficient. Specifically, we establish an LSI-constant-free particle approximation error concerning the objective gap by leveraging the problem structure in risk minimization. As the application, we demonstrate improved convergence of MFLD, sampling guarantee for the mean-field stationary distribution, and uniform-in-time Wasserstein propagation of chaos in terms of particle complexity.
Updated: 2024-05-24 17:59:06
标题: 均场神经网络的改进粒子逼近误差
摘要: 均场 Langevin 动力学(MFLD)最小化了一个在概率分布空间上定义的熵正则化非线性凸函数。由于与均场两层神经网络的噪声梯度下降的联系,MFLD 受到了关注。与标准 Langevin 动力学不同,目标函数的非线性性引起了粒子间的相互作用,需要多个粒子来在有限粒子设置下近似动态。最近的研究(Chen 等,2022 年;Suzuki 等,2023b 年)展示了 MFLD 的时间均匀传播混沌,表明随着粒子数量的增加,粒子系统和其均场极限之间的差距随时间均匀缩小。在本研究中,我们改进了其粒子近似误差中对数 Sobolev 不等式(LSI)常数的依赖性,该依赖性可能随正则化系数指数级恶化。具体来说,我们通过利用风险最小化中的问题结构,建立了一个与客观差距无关的 LSI 常数自由粒子近似误差。作为应用,我们展示了 MFLD 的收敛改进,对于均场稳态分布的采样保证,以及在粒子复杂度方面的时间均匀 Wasserstein 传播混沌。
更新时间: 2024-05-24 17:59:06
领域: cs.LG,stat.ML
Enhancing Adverse Drug Event Detection with Multimodal Dataset: Corpus Creation and Model Development
The mining of adverse drug events (ADEs) is pivotal in pharmacovigilance, enhancing patient safety by identifying potential risks associated with medications, facilitating early detection of adverse events, and guiding regulatory decision-making. Traditional ADE detection methods are reliable but slow, not easily adaptable to large-scale operations, and offer limited information. With the exponential increase in data sources like social media content, biomedical literature, and Electronic Medical Records (EMR), extracting relevant ADE-related information from these unstructured texts is imperative. Previous ADE mining studies have focused on text-based methodologies, overlooking visual cues, limiting contextual comprehension, and hindering accurate interpretation. To address this gap, we present a MultiModal Adverse Drug Event (MMADE) detection dataset, merging ADE-related textual information with visual aids. Additionally, we introduce a framework that leverages the capabilities of LLMs and VLMs for ADE detection by generating detailed descriptions of medical images depicting ADEs, aiding healthcare professionals in visually identifying adverse events. Using our MMADE dataset, we showcase the significance of integrating visual cues from images to enhance overall performance. This approach holds promise for patient safety, ADE awareness, and healthcare accessibility, paving the way for further exploration in personalized healthcare.
Updated: 2024-05-24 17:58:42
标题: 通过多模态数据集增强不良药物事件检测:语料库创建和模型开发
摘要: 不良药物事件(ADE)的挖掘在药物监测中至关重要,通过识别与药物相关的潜在风险,促进早期发现不良事件,引导监管决策,增强患者安全。传统的ADE检测方法可靠但缓慢,不易适应大规模运作,提供的信息有限。随着社交媒体内容、生物医学文献和电子病历等数据源的指数增长,从这些非结构化文本中提取相关ADE信息至关重要。先前的ADE挖掘研究集中在基于文本的方法上,忽视了视觉线索,限制了语境理解,阻碍了准确解释。为了填补这一空白,我们提出了一个多模式不良药物事件(MMADE)检测数据集,将ADE相关的文本信息与视觉辅助信息相结合。此外,我们介绍了一个框架,利用LLMs和VLMs的能力进行ADE检测,通过生成描述ADE的医学图像详细说明,帮助医疗专业人员在视觉上识别不良事件。利用我们的MMADE数据集,我们展示了整合图像中的视觉线索以增强整体性能的重要性。这种方法对患者安全、ADE意识和医疗可及性具有潜力,并为个性化医疗的进一步探索铺平了道路。
更新时间: 2024-05-24 17:58:42
领域: cs.AI,cs.CL,cs.CV
Scaling Laws for Discriminative Classification in Large Language Models
Modern large language models (LLMs) represent a paradigm shift in what can plausibly be expected of machine learning models. The fact that LLMs can effectively generate sensible answers to a diverse range of queries suggests that they would be useful in customer support applications. While powerful, LLMs have been observed to be prone to hallucination, which unfortunately makes their near-term use in customer support applications challenging. To address this issue we present a system that allows us to use an LLM to augment our customer support advocates by re-framing the language modeling task as a discriminative classification task. In this framing, we seek to present the top-K best template responses for a customer support advocate to use when responding to a customer. We present the results of both offline and online experiments where we observed offline gains and statistically significant online lifts for our experimental system. Along the way, we present observed scaling curves for validation loss and top-K accuracy, resulting from model parameter ablation studies. We close by discussing the space of trade-offs with respect to model size, latency, and accuracy, and by suggesting future applications to explore.
Updated: 2024-05-24 17:58:38
标题: 大型语言模型中的判别分类的规模定律
摘要: 现代大型语言模型(LLMs)代表了机器学习模型可以合理期望的范式转变。LLMs能够有效地生成对各种查询的明智答案,这表明它们在客户支持应用中将是有用的。虽然强大,但观察到LLMs容易出现幻觉,这不幸地使它们在客户支持应用中的近期使用具有挑战性。为了解决这个问题,我们提出了一个系统,允许我们使用LLM来增强客户支持代表,将语言建模任务重新构建为一个区分性分类任务。在这种框架中,我们寻求为客户支持代表呈现前K个最佳模板回复,以便在回复客户时使用。我们展示了离线和在线实验的结果,我们观察到离线增益和实验系统的统计显著在线提升。在这个过程中,我们介绍了验证损失和前K准确率的观察到的缩放曲线,这些曲线是通过模型参数消融研究得出的。最后,我们讨论了与模型大小、延迟和准确性相关的权衡空间,并建议未来要探索的应用。
更新时间: 2024-05-24 17:58:38
领域: cs.CL,cs.LG
InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation
Recent talking avatar generation models have made strides in achieving realistic and accurate lip synchronization with the audio, but often fall short in controlling and conveying detailed expressions and emotions of the avatar, making the generated video less vivid and controllable. In this paper, we propose a novel text-guided approach for generating emotionally expressive 2D avatars, offering fine-grained control, improved interactivity, and generalizability to the resulting video. Our framework, named InstructAvatar, leverages a natural language interface to control the emotion as well as the facial motion of avatars. Technically, we design an automatic annotation pipeline to construct an instruction-video paired training dataset, equipped with a novel two-branch diffusion-based generator to predict avatars with audio and text instructions at the same time. Experimental results demonstrate that InstructAvatar produces results that align well with both conditions, and outperforms existing methods in fine-grained emotion control, lip-sync quality, and naturalness. Our project page is https://wangyuchi369.github.io/InstructAvatar/.
Updated: 2024-05-24 17:53:54
标题: InstructAvatar:用文本引导的情感和动作控制用于角色生成
摘要: 最近的语音化身生成模型在实现与音频的真实和准确的唇部同步方面取得了进展,但往往在控制和传达化身的详细表情和情感方面表现不佳,使得生成的视频缺乏生动性和可控性。在本文中,我们提出了一种新颖的文本引导方法,用于生成情感表达丰富的2D化身,提供细粒度的控制、改进的交互性和对生成视频的泛化能力。我们的框架名为InstructAvatar,利用自然语言界面来控制化身的情绪和面部运动。在技术上,我们设计了一个自动注释流水线来构建一个指令-视频配对的训练数据集,配备了一个新颖的基于扩散的双分支生成器,同时预测具有音频和文本指令的化身。实验结果表明,InstructAvatar 产生的结果与两种条件都很好地吻合,并在细粒度情感控制、唇部同步质量和自然性方面优于现有方法。我们的项目页面是 https://wangyuchi369.github.io/InstructAvatar/。
更新时间: 2024-05-24 17:53:54
领域: cs.CV,cs.AI
Sparse Expansion and Neuronal Disentanglement
We show how to improve the inference efficiency of an LLM by expanding it into a mixture of sparse experts, where each expert is a copy of the original weights, one-shot pruned for a specific cluster of input values. We call this approach $\textit{Sparse Expansion}$. We show that, for models such as Llama 2 70B, as we increase the number of sparse experts, Sparse Expansion outperforms all other one-shot sparsification approaches for the same inference FLOP budget per token, and that this gap grows as sparsity increases, leading to inference speedups. But why? To answer this, we provide strong evidence that the mixture of sparse experts is effectively $\textit{disentangling}$ the input-output relationship of every individual neuron across clusters of inputs. Specifically, sparse experts approximate the dense neuron output distribution with fewer weights by decomposing the distribution into a collection of simpler ones, each with a separate sparse dot product covering it. Interestingly, we show that the Wasserstein distance between a neuron's output distribution and a Gaussian distribution is an indicator of its entanglement level and contribution to the accuracy of the model. Every layer of an LLM has a fraction of highly entangled Wasserstein neurons, and model performance suffers more when these are sparsified as opposed to others.
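To illustrate the entanglement diagnostic (a toy construction of ours, not the paper's measurement pipeline), a bimodal neuron output is far from a moment-matched Gaussian in Wasserstein distance, while its per-cluster outputs are not:

# Wasserstein distance to a moment-matched Gaussian as an entanglement score.
import numpy as np
from scipy.stats import wasserstein_distance

def entanglement(out):
    """W1 distance between a neuron's outputs and a moment-matched Gaussian."""
    ref = np.random.default_rng(0).normal(out.mean(), out.std(), size=out.size)
    return wasserstein_distance(out, ref)

rng = np.random.default_rng(1)
cluster_a = rng.normal(-3.0, 0.5, size=5000)     # neuron outputs on input cluster A
cluster_b = rng.normal(+3.0, 0.5, size=5000)     # ... and on input cluster B
pooled = np.concatenate([cluster_a, cluster_b])  # bimodal: an "entangled" neuron

print("pooled     :", round(entanglement(pooled), 3))       # large
print("per cluster:", round(entanglement(cluster_a), 3),
      round(entanglement(cluster_b), 3))                     # near zero

The per-cluster outputs being near-Gaussian is what lets a separately pruned expert cover each of them with fewer weights.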
Updated: 2024-05-24 17:51:39
标题: 稀疏扩展和神经元解缠
摘要: 我们展示了如何通过将LLM扩展为稀疏专家混合来提高推断效率,其中每个专家是原始权重的一个副本,经过一次性修剪以适用于特定输入值的聚类。我们称这种方法为“稀疏扩展”。我们展示了对于像Llama 2 70B这样的模型,随着稀疏专家数量的增加,在每个token相同的推断FLOP预算下,稀疏扩展优于所有其他一次性稀疏化方法,并且随着稀疏性的增加,这种差距会扩大,从而导致推断加速。 但是为什么呢?为了回答这个问题,我们提供了强有力的证据,表明稀疏专家的混合有效地“解耦”了每个神经元在输入聚类中的输入-输出关系。具体来说,稀疏专家通过将分布分解为一系列较简单的分布,每个分布具有单独的稀疏点积来近似密集神经元输出分布。有趣的是,我们展示了神经元输出分布与高斯分布之间的Wasserstein距离是其纠缠级别和对模型准确性的贡献的指标。LLM的每一层都有一部分高度纠缠的Wasserstein神经元,当这些神经元被稀疏化时,模型性能会受到更大影响。
更新时间: 2024-05-24 17:51:39
领域: cs.LG,cs.AI
Score-based generative models are provably robust: an uncertainty quantification perspective
Through an uncertainty quantification (UQ) perspective, we show that score-based generative models (SGMs) are provably robust to the multiple sources of error in practical implementation. Our primary tool is the Wasserstein uncertainty propagation (WUP) theorem, a model-form UQ bound that describes how the $L^2$ error from learning the score function propagates to a Wasserstein-1 ($\mathbf{d}_1$) ball around the true data distribution under the evolution of the Fokker-Planck equation. We show how errors due to (a) finite sample approximation, (b) early stopping, (c) score-matching objective choice, (d) score function parametrization expressiveness, and (e) reference distribution choice, impact the quality of the generative model in terms of a $\mathbf{d}_1$ bound of computable quantities. The WUP theorem relies on Bernstein estimates for Hamilton-Jacobi-Bellman partial differential equations (PDE) and the regularizing properties of diffusion processes. Specifically, PDE regularity theory shows that stochasticity is the key mechanism ensuring SGM algorithms are provably robust. The WUP theorem applies to integral probability metrics beyond $\mathbf{d}_1$, such as the total variation distance and the maximum mean discrepancy. Sample complexity and generalization bounds in $\mathbf{d}_1$ follow directly from the WUP theorem. Our approach requires minimal assumptions, is agnostic to the manifold hypothesis and avoids absolute continuity assumptions for the target distribution. Additionally, our results clarify the trade-offs among multiple error sources in SGMs.
Updated: 2024-05-24 17:50:17
标题: 基于分数的生成模型在不确定性量化视角下可证明具有稳健性
摘要: 通过不确定性量化(UQ)的视角,我们展示了基于分数的生成模型(SGMs)在实际实现中对多种误差源具有可靠性。我们的主要工具是Wasserstein不确定性传播(WUP)定理,这是一个描述学习分数函数$L^2$误差如何在Fokker-Planck方程演化下传播到真实数据分布周围Wasserstein-1($\mathbf{d}_1$)球的模型形式UQ界限。我们展示了由于(a)有限样本逼近,(b)早停止,(c)匹配分数目标选择,(d)分数函数参数化表达能力,以及(e)参考分布选择而导致的误差如何影响生成模型的质量,以$\mathbf{d}_1$边界的可计算量来衡量。WUP定理依赖于Hamilton-Jacobi-Bellman偏微分方程(PDE)的Bernstein估计和扩散过程的正则性质。具体而言,PDE正则性理论表明,随机性是确保SGM算法具有可靠性的关键机制。WUP定理适用于超过$\mathbf{d}_1$的积分概率度量,如总变差距离和最大平均差异。样本复杂性和$\mathbf{d}_1$中的泛化边界直接源自WUP定理。我们的方法需要最少的假设,对流形假设不可知,并且避免了目标分布的绝对连续性假设。此外,我们的结果澄清了SGMs中多个误差源之间的权衡。
更新时间: 2024-05-24 17:50:17
领域: stat.ML,cs.LG,math.ST,stat.TH
Data Reconstruction: When You See It and When You Don't
We revisit the fundamental question of formally defining what constitutes a reconstruction attack. While often clear from the context, our exploration reveals that a precise definition is much more nuanced than it appears, to the extent that a single all-encompassing definition may not exist. Thus, we employ a different strategy and aim to "sandwich" the concept of reconstruction attacks by addressing two complementary questions: (i) What conditions guarantee that a given system is protected against such attacks? (ii) Under what circumstances does a given attack clearly indicate that a system is not protected? More specifically:
* We introduce a new definitional paradigm -- Narcissus Resiliency -- to formulate a security definition for protection against reconstruction attacks. This paradigm has a self-referential nature that enables it to circumvent shortcomings of previously studied notions of security. Furthermore, as a side-effect, we demonstrate that Narcissus resiliency captures as special cases multiple well-studied concepts including differential privacy and other security notions of one-way functions and encryption schemes.
* We formulate a link between reconstruction attacks and Kolmogorov complexity. This allows us to put forward a criterion for evaluating when such attacks are convincingly successful.
Updated: 2024-05-24 17:49:34
标题: 数据重建:何时你能看到它,何时你看不到
摘要: 我们重新审视了正式定义重建攻击的基本问题。虽然通常从上下文中很清楚,但我们的探索表明,精确的定义比看起来更微妙,以至于可能不存在一个涵盖一切的定义。因此,我们采用了一种不同的策略,旨在通过回答两个互补的问题来“夹击”重建攻击的概念:(i) 什么条件可以保证给定系统免受此类攻击?(ii) 在什么情况下,给定的攻击明确表明系统没有受到保护?更具体地说:
* 我们引入了一个新的定义范式--自恋弹性--来制定针对重建攻击的保护安全定义。这种范式具有自我参照的特性,使其能够规避先前研究的安全概念的缺陷。此外,作为一个副作用,我们证明自恋弹性捕捉了多个经过深入研究的概念,包括差分隐私以及单向函数和加密方案的其他安全概念。
* 我们构建了重建攻击与科尔莫哥罗夫复杂性之间的联系。这使我们能够提出一个评估这种攻击何时令人信服地成功的标准。
更新时间: 2024-05-24 17:49:34
领域: cs.CR
How to Fix a Broken Confidence Estimator: Evaluating Post-hoc Methods for Selective Classification with Deep Neural Networks
This paper addresses the problem of selective classification for deep neural networks, where a model is allowed to abstain from low-confidence predictions to avoid potential errors. We focus on so-called post-hoc methods, which replace the confidence estimator of a given classifier without modifying or retraining it, thus being practically appealing. Considering neural networks with softmax outputs, our goal is to identify the best confidence estimator that can be computed directly from the unnormalized logits. This problem is motivated by the intriguing observation in recent work that many classifiers appear to have a "broken" confidence estimator, in the sense that their selective classification performance is much worse than what could be expected by their corresponding accuracies. We perform an extensive experimental study of many existing and proposed confidence estimators applied to 84 pretrained ImageNet classifiers available from popular repositories. Our results show that a simple $p$-norm normalization of the logits, followed by taking the maximum logit as the confidence estimator, can lead to considerable gains in selective classification performance, completely fixing the pathological behavior observed in many classifiers. As a consequence, the selective classification performance of any classifier becomes almost entirely determined by its corresponding accuracy. Moreover, these results are shown to be consistent under distribution shift. Our code is available at https://github.com/lfpc/FixSelectiveClassification.
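The proposed fix amounts to a few lines. Below is a self-contained sketch, with synthetic logits and correctness labels standing in for a real classifier:

# p-norm normalize the logits, then take the max logit as the confidence.
import numpy as np

def max_logit_pnorm(logits, p=2):
    """Confidence = max logit after p-norm normalization of the logit vector."""
    z = logits / np.linalg.norm(logits, ord=p, axis=-1, keepdims=True)
    return z.max(axis=-1)

def selective_risk(conf, correct, coverage=0.8):
    """Error rate on the `coverage` fraction of most-confident predictions."""
    thr = np.quantile(conf, 1 - coverage)
    kept = conf >= thr
    return 1.0 - correct[kept].mean()

rng = np.random.default_rng(0)
logits = rng.normal(size=(10000, 100))     # stand-in classifier outputs
correct = rng.uniform(size=10000) < 0.7    # stand-in correctness labels
conf = max_logit_pnorm(logits, p=2)
print("risk at 80% coverage:", round(selective_risk(conf, correct), 3))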
Updated: 2024-05-24 17:48:11
标题: 如何修复破损的置信度估计器:评估后处理方法在深度神经网络中的选择性分类
摘要: 本文讨论了深度神经网络中选择分类的问题,其中模型被允许在置信度较低的预测中弃权以避免潜在错误。我们专注于所谓的事后方法,这些方法替换了给定分类器的置信度估计器,而无需修改或重新训练它,因此在实践中具有吸引力。考虑到具有softmax输出的神经网络,我们的目标是识别可以直接从未标准化的logits计算得出的最佳置信度估计器。这个问题是由最近的研究中一个有趣的观察所激发的,即许多分类器似乎具有“破损”的置信度估计器,即它们的有选择性分类性能远远不及其相应准确性所预期的。我们对84个来自流行仓库的预训练ImageNet分类器应用许多现有和提出的置信度估计器进行了广泛的实验研究。我们的结果表明,对logits进行简单的$p$-范数归一化,然后将最大logit作为置信度估计器,可以在选择性分类性能方面带来显著的收益,完全修复了许多分类器中观察到的病态行为。因此,任何分类器的选择性分类性能几乎完全取决于其相应的准确性。此外,这些结果在分布转移下显示出一致性。我们的代码可在https://github.com/lfpc/FixSelectiveClassification找到。
更新时间: 2024-05-24 17:48:11
领域: cs.LG
Filtered Corpus Training (FiCT) Shows that Language Models can Generalize from Indirect Evidence
This paper introduces Filtered Corpus Training, a method that trains language models (LMs) on corpora with certain linguistic constructions filtered out from the training data, and uses it to measure the ability of LMs to perform linguistic generalization on the basis of indirect evidence. We apply the method to both LSTM and Transformer LMs (of roughly comparable size), developing filtered corpora that target a wide range of linguistic phenomena. Our results show that while transformers are better qua LMs (as measured by perplexity), both models perform equally and surprisingly well on linguistic generalization measures, suggesting that they are capable of generalizing from indirect evidence.
Updated: 2024-05-24 17:47:20
标题: 过滤语料库训练(FiCT)表明语言模型可以从间接证据中推广
摘要: 本文介绍了一种名为Filtered Corpus Training的方法,该方法在语言模型(LMs)的语料库上进行训练,过滤掉训练数据中的某些语言构造,并利用它来衡量LMs基于间接证据进行语言概括的能力。我们将该方法应用于LSTM和Transformer LMs(大小大致相当的模型),开发了针对各种语言现象的过滤语料库。我们的结果显示,虽然transformers在LMs方面表现更好(以困惑度衡量),但两种模型在语言概括度量上表现相同且出乎意料地好,表明它们能够从间接证据中进行概括。
更新时间: 2024-05-24 17:47:20
领域: cs.CL,cs.AI,cs.LG
Collaborative Access Control for IoT -- A Blockchain Approach
The Internet of Things (IoT) necessitates robust access control mechanisms to secure a vast array of interconnected devices. Most of the existing IoT systems in practice use centralized solutions. We identify the problems in such solutions and adopt the blockchain based decentralized access control approach. Though there are works in the literature that use blockchain for access control, these works leave some gaps. We develop a blockchain embedded access control (BEAC) framework to bridge the gaps. First, blockchain based solutions for access control require an enabling P2P network, while existing P2P overlays do not support some required features. We develop a novel P2P infrastructure to seamlessly support our BEAC framework. Second, most of the works consider blockchain based access control for a single access control model, whereas we develop a generic blockchain mechanism and show that it can support the embedding of various access control models. Finally, existing works adopt existing blockchain mechanisms, which may incur a high communication overhead. We develop a shortcut approach to improve the number of message rounds in the access protocol. Our experiments demonstrate the efficacy of our system, showing that the shortcut mechanism can reduce access time by approximately 43%.
Updated: 2024-05-24 17:46:53
标题: 物联网的协作访问控制——区块链方法
摘要: 物联网(IoT)需要强大的访问控制机制来保护各种互联设备。目前大部分实际应用的IoT系统使用集中式解决方案。我们确定了这些解决方案中存在的问题,并采用基于区块链的去中心化访问控制方法。尽管文献中有使用区块链进行访问控制的研究,但这些研究存在一些差距。我们开发了一个嵌入式区块链访问控制(BEAC)框架来填补这些差距。首先,基于区块链的访问控制解决方案需要一个支持P2P网络,而现有的P2P覆盖层不支持一些必要的功能。我们开发了一个新颖的P2P基础设施,无缝支持我们的BEAC框架。其次,大部分研究只考虑单一访问控制模型的基于区块链的访问控制,我们开发了一个通用的区块链机制,并展示它可以支持各种访问控制模型的嵌入。最后,现有的研究采用了现有的区块链机制,这可能会导致高通信开销。我们开发了一种快捷方法来改善访问协议中的信息轮数。我们的实验证明了我们系统的有效性,显示出快捷机制可以将访问时间缩短约43%。
更新时间: 2024-05-24 17:46:53
领域: cs.DC,cs.CR
Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment
We consider the problem of multi-objective alignment of foundation models with human preferences, which is a critical step towards helpful and harmless AI systems. However, it is generally costly and unstable to fine-tune large foundation models using reinforcement learning (RL), and the multi-dimensionality, heterogeneity, and conflicting nature of human preferences further complicate the alignment process. In this paper, we introduce Rewards-in-Context (RiC), which conditions the response of a foundation model on multiple rewards in its prompt context and applies supervised fine-tuning for alignment. The salient features of RiC are simplicity and adaptivity, as it only requires supervised fine-tuning of a single foundation model and supports dynamic adjustment for user preferences during inference time. Inspired by the analytical solution of an abstracted convex optimization problem, our dynamic inference-time adjustment method approaches the Pareto-optimal solution for multiple objectives. Empirical evidence demonstrates the efficacy of our method in aligning both Large Language Models (LLMs) and diffusion models to accommodate diverse rewards with only around 10% GPU hours compared with multi-objective RL baseline.
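A sketch of the prompt-side conditioning; the tag serialization below is our assumption (the paper conditions responses on multiple rewards in the prompt context, but the exact format may differ):

def ric_prompt(user_prompt, rewards):
    """Prepend desired multi-attribute reward scores to the prompt context."""
    tags = " ".join(f"<{name}:{score:.1f}>" for name, score in rewards.items())
    return f"{tags} {user_prompt}"

# Training: label each (prompt, response) pair with its measured rewards and
# fine-tune with supervised learning on ric_prompt(prompt, measured_rewards).
# Inference: the user dials in preferred trade-offs without retraining.
print(ric_prompt("Summarize the article.", {"helpfulness": 0.9, "harmlessness": 1.0}))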
Updated: 2024-05-24 17:44:13
标题: 奖励背景下的基础模型多目标对齐与动态偏好调整
摘要: 我们考虑了将基础模型与人类偏好进行多目标对齐的问题,这是实现有益且无害的人工智能系统的关键一步。然而,使用强化学习(RL)对大型基础模型进行微调通常既昂贵又不稳定,而人类偏好的多维性、异质性和矛盾性进一步复杂化了对齐过程。在本文中,我们引入了Rewards-in-Context(RiC),它在基础模型的提示上下文中将响应条件化为多个奖励,并应用监督微调进行对齐。RiC的显著特点是简单和适应性,因为它只需要对单个基础模型进行监督微调,并支持在推理时间动态调整用户偏好。受到一个抽象凸优化问题的解析解的启发,我们的动态推理时间调整方法接近于多目标的帕累托最优解。经验证据表明,与多目标RL基线相比,我们的方法在仅需约10%的GPU小时的情况下,使大型语言模型(LLMs)和扩散模型能够适应各种奖励。
更新时间: 2024-05-24 17:44:13
领域: cs.LG,cs.AI,cs.CL
Hierarchical NeuroSymbolic Approach for Comprehensive and Explainable Action Quality Assessment
Action quality assessment (AQA) applies computer vision to quantitatively assess the performance or execution of a human action. Current AQA approaches are end-to-end neural models, which lack transparency and tend to be biased because they are trained on subjective human judgements as ground-truth. To address these issues, we introduce a neuro-symbolic paradigm for AQA, which uses neural networks to abstract interpretable symbols from video data and makes quality assessments by applying rules to those symbols. We take diving as the case study. We found that domain experts prefer our system and find it more informative than purely neural approaches to AQA in diving. Our system also achieves state-of-the-art action recognition and temporal segmentation, and automatically generates a detailed report that breaks the dive down into its elements and provides objective scoring with visual evidence. As verified by a group of domain experts, this report may be used to assist judges in scoring, help train judges, and provide feedback to divers. Annotated training data and code: https://github.com/laurenok24/NSAQA.
Updated: 2024-05-24 17:44:11
标题: 层次化的神经符号方法用于全面和可解释的动作质量评估
摘要: 动作质量评估(AQA)应用计算机视觉定量评估人类动作的表现或执行。当前的AQA方法是端到端的神经模型,缺乏透明度并且倾向于存在偏见,因为它们是基于主观人类判断作为地面真相进行训练的。为了解决这些问题,我们引入了一种神经符号范式用于AQA,该方法使用神经网络从视频数据中抽象可解释的符号,并通过应用规则对这些符号进行质量评估。我们以跳水为案例研究。我们发现领域专家更喜欢我们的系统,并认为它比纯神经方法更具信息量。我们的系统还实现了最先进的动作识别和时间分割,并自动生成了一个详细报告,将跳水细分为其各个元素,并提供具有视觉证据的客观评分。经领域专家组验证,该报告可用于协助评委评分,帮助培训评委,并向跳水运动员提供反馈。已注释的训练数据和代码:https://github.com/laurenok24/NSAQA。
更新时间: 2024-05-24 17:44:11
领域: cs.CV,cs.AI,cs.LG,cs.SC
ACPO: A Policy Optimization Algorithm for Average MDPs with Constraints
Reinforcement Learning (RL) for constrained MDPs (CMDPs) is an increasingly important problem for various applications. Often, the average criterion is more suitable than the discounted criterion. Yet, RL for average-CMDPs (ACMDPs) remains a challenging problem. Algorithms designed for discounted constrained RL problems often do not perform well for the average CMDP setting. In this paper, we introduce a new policy optimization with function approximation algorithm for constrained MDPs with the average criterion. The Average-Constrained Policy Optimization (ACPO) algorithm is inspired by trust region-based policy optimization algorithms. We develop basic sensitivity theory for average CMDPs, and then use the corresponding bounds in the design of the algorithm. We provide theoretical guarantees on its performance, and through extensive experimental work in various challenging OpenAI Gym environments, show its superior empirical performance when compared to other state-of-the-art algorithms adapted for the ACMDPs.
Updated: 2024-05-24 17:43:35
标题: ACPO:一种用于带约束的平均MDPs的策略优化算法
摘要: 强化学习(RL)用于受限制的马尔可夫决策过程(CMDPs)是各种应用中越来越重要的问题。通常,平均标准比折扣标准更适用。然而,针对平均-CMDPs(ACMDPs)的RL仍然是一个具有挑战性的问题。为折扣约束RL问题设计的算法通常在平均CMDP设置下表现不佳。在本文中,我们介绍了一个针对带有平均标准的受限制MDPs的新的策略优化与函数逼近算法。平均受限策略优化(ACPO)算法受到基于信任域的策略优化算法的启发。我们为平均CMDPs开发了基本敏感性理论,然后在算法设计中使用相应的界限。我们对其性能提供了理论保证,并通过在各种具有挑战性的OpenAI Gym环境中进行大量实验工作,展示了与其他为ACMDPs调整的最先进算法相比其优越的实证性能。
更新时间: 2024-05-24 17:43:35
领域: cs.LG,cs.AI
Over-the-Air Runtime Wi-Fi MAC Address Re-randomization
Medium Access Control (MAC) address randomization is a key component for privacy protection in Wi-Fi networks. Current proposals periodically change a mobile device's MAC address when it disconnects from the Access Point (AP). This way, frames cannot be linked across changes, but the mobile device's presence is exposed as long as it remains connected: all its communication is trivially linkable by observing the randomized yet same MAC address throughout the connection. Our runtime MAC re-randomization scheme addresses this issue, reducing or eliminating Wi-Fi frame linkability without awaiting or requiring a disconnection. Our MAC re-randomization is practically 'over-the-air': MAC addresses are re-randomized just before transmission, while the protocol stacks (at the mobile and the AP) locally maintain the original connection MAC addresses - making our MAC layer scheme transparent to upper layers. With an implementation and a set of small-scale experiments with off-the-shelf devices, we show the feasibility of our scheme and its potential for future deployment.
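A minimal sketch of the frame-level swap (the AP-side coordination that maps randomized addresses back to the connection is elided, and the class and method names are illustrative):

# Draw a fresh locally administered, unicast MAC just before transmission;
# the protocol stack keeps the original connection address throughout.
import secrets

def random_laa_mac():
    b = bytearray(secrets.token_bytes(6))
    b[0] = (b[0] | 0x02) & 0xFE   # set locally-administered bit, clear multicast bit
    return ":".join(f"{x:02x}" for x in b)

class OtaRandomizer:
    """Swap the source MAC over the air; upper layers never see the change."""
    def __init__(self, stack_mac):
        self.stack_mac = stack_mac            # connection MAC kept by the stack

    def on_transmit(self, frame_src_mac):
        assert frame_src_mac == self.stack_mac
        return random_laa_mac()               # over-the-air address for this frame

nic = OtaRandomizer("aa:bb:cc:dd:ee:ff")
print(nic.on_transmit("aa:bb:cc:dd:ee:ff"))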
Updated: 2024-05-24 17:42:15
标题: 空口(Over-the-Air)运行时Wi-Fi MAC地址重新随机化
摘要: 介质访问控制(MAC)地址随机化是Wi-Fi网络中隐私保护的关键组成部分。当前的提案在移动设备从接入点(AP)断开连接时定期更改MAC地址。这样,在更改后无法将帧关联起来,但只要移动设备保持连接,其存在就会暴露出来:观察到在整个连接过程中MAC地址随机化但仍然相同的情况下,所有通信都可以轻松关联起来。我们的运行时MAC重新随机化方案解决了这个问题,减少或消除了Wi-Fi帧的可关联性,而无需等待或要求断开连接。我们的MAC重新随机化实际上是“通过空中”进行的:MAC地址在传输之前重新随机化,而协议栈(在移动设备和AP上)在本地维护原始连接MAC地址 - 这使得我们的MAC层方案对上层透明。通过使用现成设备进行实施和一系列小规模实验,我们展示了我们方案的可行性以及未来部署的潜力。
更新时间: 2024-05-24 17:42:15
领域: cs.NI,cs.CR
CAFe: Cost and Age aware Federated Learning
In many federated learning (FL) models, a common strategy employed to ensure the progress in the training process, is to wait for at least $M$ clients out of the total $N$ clients to send back their local gradients based on a reporting deadline $T$, once the parameter server (PS) has broadcasted the global model. If enough clients do not report back within the deadline, the particular round is considered to be a failed round and the training round is restarted from scratch. If enough clients have responded back, the round is deemed successful and the local gradients of all the clients that responded back are used to update the global model. In either case, the clients that failed to report back an update within the deadline would have wasted their computational resources. Having a tighter deadline (small $T$) and waiting for a larger number of participating clients (large $M$) leads to a large number of failed rounds and therefore greater communication cost and computation resource wastage. However, having a larger $T$ leads to longer round durations whereas smaller $M$ may lead to noisy gradients. Therefore, there is a need to optimize the parameters $M$ and $T$ such that communication cost and the resource wastage is minimized while having an acceptable convergence rate. In this regard, we show that the average age of a client at the PS appears explicitly in the theoretical convergence bound, and therefore, can be used as a metric to quantify the convergence of the global model. We provide an analytical scheme to select the parameters $M$ and $T$ in this setting.
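A quick simulation makes the $(M, T)$ trade-off concrete, assuming exponential client response times and charging $T$ units of compute to every client that works up to the deadline (both modeling assumptions of ours):

# Estimate round failure rate and wasted compute for quorum M and deadline T.
import numpy as np

def simulate(N, M, T, mean_response=1.0, rounds=10000, seed=0):
    rng = np.random.default_rng(seed)
    times = rng.exponential(mean_response, size=(rounds, N))
    reported = times <= T
    ok = reported.sum(axis=1) >= M
    # Late clients waste their work; in failed rounds, even on-time work is
    # discarded when the round restarts from scratch.
    wasted = np.where(ok, (~reported).sum(axis=1) * T, N * T).mean()
    return ok.mean(), wasted

for M, T in [(5, 0.5), (5, 1.5), (8, 1.5)]:
    succ, waste = simulate(N=10, M=M, T=T)
    print(f"M={M} T={T}: success rate {succ:.2f}, wasted compute/round {waste:.2f}")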
Updated: 2024-05-24 17:41:30
标题: CAFe:成本和年龄感知的联邦学习
摘要: 在许多联邦学习(FL)模型中,为了确保训练过程中的进展,通常采用的一种策略是等待至少$M$个客户端中的$N$个客户端根据报告截止日期$T$发送回他们的本地梯度,一旦参数服务器(PS)广播了全局模型。如果在截止日期内没有足够的客户端报告回来,该特定轮次被视为失败的轮次,并且训练轮次从头开始重新启动。如果有足够的客户端回复,该轮次被视为成功,并且所有回复的客户端的本地梯度用于更新全局模型。在任何情况下,未能在截止日期内报告更新的客户端将浪费他们的计算资源。设置更紧的截止日期(小$T$)并等待更多的参与客户端(大$M$)会导致大量失败的轮次,从而导致更大的通信成本和计算资源浪费。然而,设置更大的$T$会导致更长的轮次持续时间,而较小的$M$可能会导致梯度噪声。因此,有必要优化参数$M$和$T$,以减少通信成本和资源浪费,同时具有可接受的收敛速度。在这方面,我们表明在PS处客户端的平均年龄明确出现在理论收敛界限中,因此可以用作度量全局模型收敛的指标。我们提供了一种在这种设置中选择参数$M$和$T$的分析方案。
更新时间: 2024-05-24 17:41:30
领域: cs.LG,cs.DC,cs.IT,math.IT
Sparse maximal update parameterization: A holistic approach to sparse training dynamics
Several challenges make it difficult for sparse neural networks to compete with dense models. First, setting a large fraction of weights to zero impairs forward and gradient signal propagation. Second, sparse studies often need to test multiple sparsity levels, while also introducing new hyperparameters (HPs), leading to prohibitive tuning costs. Indeed, the standard practice is to re-use the learning HPs originally crafted for dense models. Unfortunately, we show sparse and dense networks do not share the same optimal HPs. Without stable dynamics and effective training recipes, it is costly to test sparsity at scale, which is key to surpassing dense networks and making the business case for sparsity acceleration in hardware. A holistic approach is needed to tackle these challenges and we propose S$\mu$Par as one such approach. S$\mu$Par ensures activations, gradients, and weight updates all scale independently of sparsity level. Further, by reparameterizing the HPs, S$\mu$Par enables the same HP values to be optimal as we vary both sparsity level and model width. HPs can be tuned on small dense networks and transferred to large sparse models, greatly reducing tuning costs. On large-scale language modeling, S$\mu$Par training improves loss by up to 8.2% over the common approach of using the dense model standard parameterization.
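One simple instance of the idea, assuming the effective fan-in of a sparse layer is density times fan-in and rescaling the initialization so the pre-activation scale is sparsity-independent (the full S$\mu$Par recipe also reparameterizes the learning rate):

# Initialize a sparse layer so pre-activation variance does not depend on density.
import numpy as np

def sparse_layer_init(fan_in, fan_out, density, rng):
    mask = rng.uniform(size=(fan_out, fan_in)) < density
    std = 1.0 / np.sqrt(density * fan_in)   # rescale for the effective fan-in
    return rng.normal(0.0, std, size=(fan_out, fan_in)) * mask, mask

rng = np.random.default_rng(0)
x = rng.normal(size=512)
for density in [1.0, 0.25, 0.0625]:
    W, _ = sparse_layer_init(512, 512, density, rng)
    print(f"density {density:>6}: pre-activation std {np.std(W @ x):.3f}")

The printed scale stays near 1 across densities, which is what allows hyperparameters tuned on a small dense model to transfer to large sparse ones.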
Updated: 2024-05-24 17:39:26
标题: 稀疏最大更新参数化:稀疏训练动力学的整体方法
摘要: 稀疏神经网络面临着几个挑战,使其难以与密集模型竞争。首先,将大部分权重设为零会损害前向和梯度信号传播。其次,稀疏研究通常需要测试多个稀疏级别,同时引入新的超参数(HPs),导致调整成本过高。事实上,标准做法是重复使用最初为密集模型设计的学习HPs。不幸的是,我们发现稀疏和密集网络并不共享相同的最优HPs。在没有稳定动态和有效训练方法的情况下,测试规模化的稀疏性成本高昂,这是超越密集网络并为硬件稀疏性加速提供商业支持的关键。需要一种整体方法来解决这些挑战,我们提出SμPar作为一种解决方案。SμPar确保激活、梯度和权重更新都独立于稀疏级别进行缩放。此外,通过重新参数化HPs,SμPar使相同的HP值在改变稀疏级别和模型宽度时都能达到最优。HPs可以在小型密集网络上进行调优,并转移到大型稀疏模型,大大降低调整成本。在大规模语言建模中,SμPar训练相比使用密集型模型标准参数化的常见方法,损失减少了最多8.2%。
更新时间: 2024-05-24 17:39:26
领域: cs.LG
Large Language Models Reflect Human Citation Patterns with a Heightened Citation Bias
Citation practices are crucial in shaping the structure of scientific knowledge, yet they are often influenced by contemporary norms and biases. The emergence of Large Language Models (LLMs) like GPT-4 introduces a new dynamic to these practices. Interestingly, the characteristics and potential biases of references recommended by LLMs that entirely rely on their parametric knowledge, and not on search or retrieval-augmented generation, remain unexplored. Here, we analyze these characteristics in an experiment using a dataset of 166 papers from AAAI, NeurIPS, ICML, and ICLR, published after GPT-4's knowledge cut-off date, encompassing 3,066 references in total. In our experiment, GPT-4 was tasked with suggesting scholarly references for the anonymized in-text citations within these papers. Our findings reveal a remarkable similarity between human and LLM citation patterns, but with a more pronounced high citation bias in GPT-4, which persists even after controlling for publication year, title length, number of authors, and venue. Additionally, we observe a large consistency between the characteristics of GPT-4's existing and non-existent generated references, indicating the model's internalization of citation patterns. By analyzing citation graphs, we show that the references recommended by GPT-4 are embedded in the relevant citation context, suggesting an even deeper conceptual internalization of the citation networks. While LLMs can aid in citation generation, they may also amplify existing biases and introduce new ones, potentially skewing scientific knowledge dissemination. Our results underscore the need for identifying the model's biases and for developing balanced methods to interact with LLMs in general.
Updated: 2024-05-24 17:34:32
标题: 大型语言模型反映出人类的引文模式,具有更加突出的引文偏见。
摘要: 引用实践在塑造科学知识结构方面至关重要,然而往往受当代规范和偏见的影响。像GPT-4这样的大型语言模型(LLMs)的出现为这些实践引入了新的动态。有趣的是,完全依赖于参数化知识而不是搜索或检索增强生成的LLMs推荐的参考文献的特征和潜在偏见尚未被探索。在这里,我们使用166篇来自AAAI、NeurIPS、ICML和ICLR的论文数据集进行实验分析,这些论文是在GPT-4的知识截止日期之后发表的,总共包含3,066个参考文献。在我们的实验中,GPT-4被要求为这些论文中的匿名文本引用提供学术参考文献建议。我们的研究结果显示,人类和LLM引用模式之间存在明显的相似性,但GPT-4中存在更为明显的高引用偏见,即使在控制出版年份、标题长度、作者数量和会议地点等因素后仍然存在。此外,我们观察到GPT-4已存在和不存在的生成参考文献特征之间存在很大的一致性,表明模型已内化了引用模式。通过分析引用图表,我们展示了GPT-4推荐的参考文献嵌入在相关引用上下文中,表明模型对引用网络的概念化内化更深。虽然LLMs可以帮助生成引文,但它们也可能放大已有的偏见并引入新的偏见,可能扭曲科学知识的传播。我们的结果强调了需要识别模型偏见的必要性,并为与LLMs相互作用的平衡方法的开发提供了基础。
更新时间: 2024-05-24 17:34:32
领域: cs.DL,cs.AI,cs.LG,cs.SI
Single-Round Proofs of Quantumness from Knowledge Assumptions
A proof of quantumness is an efficiently verifiable interactive test that an efficient quantum computer can pass, but all efficient classical computers cannot (under some cryptographic assumption). Such protocols play a crucial role in the certification of quantum devices. Existing single-round protocols (like asking the quantum computer to factor a large number) require large quantum circuits, whereas multi-round ones use smaller circuits but require experimentally challenging mid-circuit measurements. As such, current proofs of quantumness are out of reach for near-term devices. In this work, we construct efficient single-round proofs of quantumness based on existing knowledge assumptions. While knowledge assumptions have not been previously considered in this context, we show that they provide a natural basis for separating classical and quantum computation. Specifically, we show that multi-round protocols based on Decisional Diffie-Hellman (DDH) or Learning With Errors (LWE) can be "compiled" into single-round protocols using a knowledge-of-exponent assumption or knowledge-of-lattice-point assumption, respectively. We also prove an adaptive hardcore-bit statement for a family of claw-free functions based on DDH, which might be of independent interest. Previous approaches to constructing single-round protocols relied on the random oracle model and thus incurred the overhead associated with instantiating the oracle with a cryptographic hash function. In contrast, our protocols have the same resource requirements as their multi-round counterparts without necessitating mid-circuit measurements, making them, arguably, the most efficient single-round proofs of quantumness to date. Our work also helps in understanding the interplay between black-box/white-box reductions and cryptographic assumptions in the design of proofs of quantumness.
Updated: 2024-05-24 17:33:10
标题: 基于知识假设的量子性的单轮证据
摘要: 量子性的证明是一种有效可验证的交互测试,有效量子计算机可以通过,但所有高效的经典计算机都不能通过(在某些密码学假设下)。这种协议在量子设备认证中起着至关重要的作用。现有的单轮协议(如要求量子计算机分解一个大数)需要大型量子电路,而多轮协议使用较小的电路,但需要实验上具有挑战性的中间电路测量。因此,当前的量子性证明对近期设备来说是难以实现的。 在本研究中,我们基于现有的知识假设构建了高效的单轮量子性证明。虽然知识假设在这个背景下以前没有被考虑过,但我们展示了它们提供了一个自然的基础来区分经典和量子计算。具体来说,我们展示了基于决策性Diffie-Hellman(DDH)或学习带有误差(LWE)的多轮协议可以通过使用指数知识假设或格点知识假设“编译”成单轮协议。我们还对基于DDH的一族无爪函数证明了自适应的硬核位陈述,这可能具有独立的兴趣。 以前构建单轮协议的方法依赖于随机预言模型,因此会产生与使用密码哈希函数实例化预言相关的额外开销。相比之下,我们的协议具有与它们的多轮对应物相同的资源需求,而不需要中间电路测量,使它们可以被认为是迄今为止最有效的单轮量子性证明。我们的工作也有助于理解黑盒/白盒归约和密码学假设在量子性证明设计中的相互作用。
更新时间: 2024-05-24 17:33:10
领域: quant-ph,cs.CR
First-order methods for Stochastic Variational Inequality problems with Function Constraints
The monotone Variational Inequality (VI) is a general model with important applications in various engineering and scientific domains. In numerous instances, the VI problems are accompanied by function constraints that can be data-driven, making the usual projection operator challenging to compute. This paper presents novel first-order methods for the function-constrained Variational Inequality (FCVI) problem in smooth or nonsmooth settings with possibly stochastic operators and constraints. We introduce the AdOpEx method, which employs an operator extrapolation on the KKT operator of the FCVI in a smooth deterministic setting. Since this operator is not uniformly Lipschitz continuous in the Lagrange multipliers, we employ an adaptive two-timescale algorithm leading to bounded multipliers and achieving the optimal $O(1/T)$ convergence rate. For the nonsmooth and stochastic VIs, we introduce design changes to the AdOpEx method and propose a novel P-OpEx method that performs partial extrapolation. It converges at the rate of $O(1/\sqrt{T})$ when both the operator and constraints are stochastic or nonsmooth. This method has suboptimal dependence on the noise and Lipschitz constants of the function constraints. We propose a constraint extrapolation approach leading to the OpConEx method that improves this dependence by an order of magnitude. All our algorithms easily extend to saddle point problems with function constraints that couple the primal and dual variables while maintaining the same complexity results. To the best of our knowledge, all our complexity results are new in the literature.
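For intuition, the snippet below implements the generic operator-extrapolation step these methods build on (a Popov-style update that reuses the previous operator evaluation) for a small unconstrained bilinear saddle-point VI; the adaptive two-timescale multiplier handling of AdOpEx itself is not shown.

```python
import numpy as np

# Monotone VI: find z* with <F(z*), z - z*> >= 0 for all z, where F is the
# saddle operator of L(x, y) = x^T A y (monotone but not strongly monotone).
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))

def F(z):
    x, y = z[:5], z[5:]
    return np.concatenate([A @ y, -A.T @ x])  # (grad_x L, -grad_y L)

gamma = 0.3 / np.linalg.norm(A, 2)  # step size ~ 1/Lipschitz constant
z = np.ones(10)
F_prev = F(z)
for _ in range(2000):
    F_curr = F(z)
    # Operator extrapolation: one F evaluation per iteration, extrapolating
    # with the operator value from the previous iterate.
    z = z - gamma * (2.0 * F_curr - F_prev)
    F_prev = F_curr

print(np.linalg.norm(F(z)))  # operator residual; decays toward 0 (z* = 0 here)
```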
Updated: 2024-05-24 17:31:22
标题: 一阶方法用于带有函数约束的随机变分不等式问题
摘要: 单调变分不等式(VI)是一个通用模型,在各种工程和科学领域中具有重要应用。在许多情况下,VI问题伴随着可以是数据驱动的函数约束,使得通常的投影算子难以计算。本文提出了用于光滑或非光滑设置中可能具有随机算子和约束的函数约束变分不等式(FCVI)问题的新型一阶方法。我们介绍了AdOpEx方法,该方法在光滑确定性设置中对FCVI的KKT算子进行了外推。由于该算子在拉格朗日乘子中不是一致Lipschitz连续的,我们采用了自适应两时间尺度算法,导致乘子有界且实现了最佳的$O(1/T)$收敛速度。对于非光滑和随机VI,我们对AdOpEx方法进行了设计更改,并提出了一种新颖的P-OpEx方法,该方法采用部分外推。当算子和约束都是随机或非光滑时,它以$O(1/\sqrt{T})$的速率收敛。这种方法对噪声和函数约束的Lipschitz常数有次优的依赖性。我们提出了一种约束外推方法,引入了OpConEx方法,通过一个数量级改善了这种依赖性。我们所有的算法都可以轻松扩展到具有函数约束的鞍点问题,这些函数约束将原始和对偶变量耦合在一起,同时保持相同的复杂度结果。据我们所知,我们所有的复杂度结果在文献中都是新的。
更新时间: 2024-05-24 17:31:22
领域: math.OC,cs.LG
Multi-Track Timeline Control for Text-Driven 3D Human Motion Generation
Recent advances in generative modeling have led to promising progress on synthesizing 3D human motion from text, with methods that can generate character animations from short prompts and specified durations. However, using a single text prompt as input lacks the fine-grained control needed by animators, such as composing multiple actions and defining precise durations for parts of the motion. To address this, we introduce the new problem of timeline control for text-driven motion synthesis, which provides an intuitive, yet fine-grained, input interface for users. Instead of a single prompt, users can specify a multi-track timeline of multiple prompts organized in temporal intervals that may overlap. This enables specifying the exact timings of each action and composing multiple actions in sequence or at overlapping intervals. To generate composite animations from a multi-track timeline, we propose a new test-time denoising method. This method can be integrated with any pre-trained motion diffusion model to synthesize realistic motions that accurately reflect the timeline. At every step of denoising, our method processes each timeline interval (text prompt) individually, subsequently aggregating the predictions with consideration for the specific body parts engaged in each action. Experimental comparisons and ablations validate that our method produces realistic motions that respect the semantics and timing of given text prompts. Our code and models are publicly available at https://mathis.petrovich.fr/stmc.
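A schematic of the per-interval denoise-and-aggregate step described above, with a stub standing in for the pretrained motion diffusion model; `denoise`, the timeline entries, and the body-part masks are all placeholders, not the authors' interfaces.

```python
import numpy as np

# Toy motion: T frames x D features, with D split across body-part channels.
T, D = 60, 12
BODY_PART_MASKS = {"arms": np.arange(0, 6), "legs": np.arange(6, 12)}

def denoise(x, prompt, t):
    """Stub for one reverse-diffusion step of a pretrained text-to-motion model."""
    return x - 0.1 * np.sign(x) * (hash(prompt) % 3 + 1) / 3.0  # placeholder dynamics

timeline = [  # (prompt, start frame, end frame, body parts the action drives)
    ("wave with both hands", 0, 40, ["arms"]),
    ("walk forward", 20, 60, ["legs"]),
]

x = np.random.default_rng(0).standard_normal((T, D))
for t in range(50, 0, -1):                    # reverse diffusion steps
    out, weight = np.zeros_like(x), np.zeros_like(x)
    for prompt, s, e, parts in timeline:      # denoise each interval separately
        pred = denoise(x[s:e], prompt, t)
        for p in parts:                       # keep only the relevant body parts
            cols = BODY_PART_MASKS[p]
            out[s:e, cols] += pred[:, cols]
            weight[s:e, cols] += 1.0
    # Aggregate overlapping predictions; frames/parts no action touches stay as-is.
    x = np.where(weight > 0, out / np.maximum(weight, 1.0), x)
```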
Updated: 2024-05-24 17:28:19
标题: 基于文本驱动的三维人体动作生成的多轨道时间线控制
摘要: 最近在生成建模领域取得了一些进展,这些进展在从文本中合成3D人体动作方面取得了有希望的进展,可以从简短提示和指定持续时间中生成角色动画。然而,仅使用单个文本提示作为输入缺乏动画师所需的细粒度控制,例如组合多个动作和定义动作部分的精确持续时间。为了解决这个问题,我们引入了基于文本驱动的时间线控制新问题,为用户提供了直观且细粒度的输入界面。用户可以指定多轨道时间线,其中包含多个组织在时间间隔中的提示,这些时间间隔可能重叠。这使得可以指定每个动作的确切时间,并按顺序或在重叠的时间间隔中组合多个动作。为了从多轨道时间线生成复合动画,我们提出了一种新的测试时间去噪方法。这种方法可以与任何预训练的动作扩散模型集成,以合成真实的动作,准确反映时间线。在每一步去噪的过程中,我们的方法分别处理每个时间线间隔(文本提示),随后根据每个动作中涉及的特定身体部位综合预测。实验比较和消融验证了我们的方法产生了符合给定文本提示语义和时间的真实动作。我们的代码和模型公开可用于https://mathis.petrovich.fr/stmc。
更新时间: 2024-05-24 17:28:19
领域: cs.CV,cs.GR,cs.LG
SpotNet: An Image Centric, Lidar Anchored Approach To Long Range Perception
In this paper, we propose SpotNet: a fast, single-stage, image-centric but LiDAR-anchored approach for long-range 3D object detection. We demonstrate that our approach to LiDAR/image sensor fusion, combined with the joint learning of 2D and 3D detection tasks, can lead to accurate 3D object detection with very sparse LiDAR support. Unlike more recent bird's-eye-view (BEV) sensor-fusion methods, which scale with range $r$ as $O(r^2)$, SpotNet scales as $O(1)$ with range. We argue that such an architecture is ideally suited to leverage each sensor's strength, i.e. semantic understanding from images and accurate range finding from LiDAR data. Finally, we show that anchoring detections on LiDAR points removes the need to regress distances, and so the architecture is able to transfer from 2MP to 8MP resolution images without re-training.
Updated: 2024-05-24 17:25:48
标题: SpotNet:一种以图像为中心、激光雷达锚定的长距离感知方法
摘要: 在本文中,我们提出了SpotNet:一种快速、单阶段、以图像为中心但以LiDAR为锚点的远距离3D目标检测方法。我们展示了我们的LiDAR/图像传感器融合方法,结合2D和3D检测任务的联合学习,可以实现准确的3D目标检测,即使LiDAR支持很稀疏。与最近的鸟瞰(BEV)传感器融合方法不同,其随着范围$r$的变化为$O(r^2)$,SpotNet的范围变化为$O(1)$。我们认为这样的架构非常适合利用每个传感器的优势,即来自图像的语义理解和来自LiDAR数据的准确距离确定。最后,我们展示了在LiDAR点上锚定检测结果消除了回归距离的需求,因此该架构能够在无需重新训练的情况下从2MP升级到8MP分辨率的图像。
更新时间: 2024-05-24 17:25:48
领域: cs.CV,cs.AI
Score identity Distillation: Exponentially Fast Distillation of Pretrained Diffusion Models for One-Step Generation
We introduce Score identity Distillation (SiD), an innovative data-free method that distills the generative capabilities of pretrained diffusion models into a single-step generator. SiD not only facilitates an exponentially fast reduction in Fr\'echet inception distance (FID) during distillation but also approaches or even exceeds the FID performance of the original teacher diffusion models. By reformulating forward diffusion processes as semi-implicit distributions, we leverage three score-related identities to create an innovative loss mechanism. This mechanism achieves rapid FID reduction by training the generator using its own synthesized images, eliminating the need for real data or reverse-diffusion-based generation, all accomplished within significantly shortened generation time. Upon evaluation across four benchmark datasets, the SiD algorithm demonstrates high iteration efficiency during distillation and surpasses competing distillation approaches, whether they are one-step or few-step, data-free, or dependent on training data, in terms of generation quality. This achievement not only redefines the benchmarks for efficiency and effectiveness in diffusion distillation but also in the broader field of diffusion-based generation. The PyTorch implementation is available at https://github.com/mingyuanzhou/SiD
Updated: 2024-05-24 17:20:46
标题: 评分身份蒸馏:预训练扩散模型的指数快速蒸馏,用于一步生成
摘要: 我们介绍了Score Identity Distillation(SiD),这是一种创新的无数据方法,将预训练扩散模型的生成能力提炼为单步生成器。SiD不仅在提炼过程中实现弗雷歇特启发距离(FID)的指数级快速降低,而且还接近甚至超过原始教师扩散模型的FID性能。通过将向前扩散过程重新制定为半隐式分布,我们利用三个与得分相关的身份来创建一种创新的损失机制。该机制通过使用生成器自己合成的图像进行训练,实现了快速的FID降低,消除了对真实数据或基于反向扩散的生成的需求,所有这些都在显著缩短的生成时间内完成。在四个基准数据集上进行评估后,SiD算法在提炼过程中表现出高迭代效率,并在生成质量方面超越了竞争的提炼方法,无论是一步或几步,无数据还是依赖训练数据。这一成就不仅重新定义了扩散提炼中效率和有效性的基准,也重新定义了基于扩散的生成的更广泛领域。PyTorch实现可在https://github.com/mingyuanzhou/SiD 上找到。
更新时间: 2024-05-24 17:20:46
领域: cs.LG,cs.AI,cs.CV,stat.ML
Neural Persistence Dynamics
We consider the problem of learning the dynamics in the topology of time-evolving point clouds, the prevalent spatiotemporal model for systems exhibiting collective behavior, such as swarms of insects and birds or particles in physics. In such systems, patterns emerge from (local) interactions among self-propelled entities. While several well-understood governing equations for motion and interaction exist, they are difficult to fit to data due to the often large number of entities and missing correspondences between the observation times, which may also not be equidistant. To evade such confounding factors, we investigate collective behavior from a \textit{topological perspective}, but instead of summarizing entire observation sequences (as in prior work), we propose learning a latent dynamical model from topological features \textit{per time point}. The latter is then used to formulate a downstream regression task to predict the parametrization of some a priori specified governing equation. We implement this idea based on a latent ODE learned from vectorized (static) persistence diagrams and show that this modeling choice is justified by a combination of recent stability results for persistent homology. Various (ablation) experiments not only demonstrate the relevance of each individual model component, but provide compelling empirical evidence that our proposed model -- \textit{neural persistence dynamics} -- substantially outperforms the state-of-the-art across a diverse set of parameter regression tasks.
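The sketch below illustrates the pipeline shape only, under simplifying assumptions: a crude persistence-image vectorization of a diagram per time point, followed by a latent model with fixed-step Euler integration and a parameter-regression readout. The authors' actual vectorization, ODE solver, and training setup differ.

```python
import torch
import torch.nn as nn

def persistence_image(diagram, grid=8, sigma=0.1):
    """Crude vectorization of a persistence diagram {(birth, death)}:
    one persistence-weighted Gaussian bump per point, on a fixed grid."""
    b, p = diagram[:, 0], diagram[:, 1] - diagram[:, 0]   # birth, persistence
    xs = torch.linspace(0, 1, grid)
    gx, gy = torch.meshgrid(xs, xs, indexing="ij")
    img = torch.zeros(grid, grid)
    for bi, pi in zip(b, p):
        img += pi * torch.exp(-((gx - bi) ** 2 + (gy - pi) ** 2) / (2 * sigma ** 2))
    return img.flatten()

class LatentDynamics(nn.Module):
    """Latent state z with a learned vector field f, integrated by Euler steps,
    and a readout regressing a governing-equation parameter."""
    def __init__(self, obs_dim, latent_dim=16):
        super().__init__()
        self.encode = nn.Linear(obs_dim, latent_dim)
        self.f = nn.Sequential(nn.Linear(latent_dim, 64), nn.Tanh(),
                               nn.Linear(64, latent_dim))
        self.readout = nn.Linear(latent_dim, 1)

    def forward(self, first_obs, n_steps, dt=0.1):
        z = self.encode(first_obs)
        for _ in range(n_steps):
            z = z + dt * self.f(z)            # fixed-step Euler integration
        return self.readout(z)

diagrams = [torch.rand(20, 2).sort(dim=1).values for _ in range(5)]  # fake diagrams
x0 = persistence_image(diagrams[0])
model = LatentDynamics(obs_dim=x0.numel())
print(model(x0, n_steps=len(diagrams)))       # predicted parameter (untrained)
```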
Updated: 2024-05-24 17:20:18
标题: 神经持久动力学
摘要: 我们考虑学习拓扑中的动态学习问题,时间演化点云是展示集体行为的系统的流行时空模型,例如昆虫和鸟类的群体或物理学中的粒子。在这种系统中,模式是由自驱动实体之间的(局部)相互作用而产生的。虽然存在几个被充分理解的运动和相互作用的控制方程,但由于实体数量通常很大且观察时间之间存在缺失对应关系(这些时间也可能不是等距的),因此很难将其拟合到数据中。为了规避这些混淆因素,我们从\textit{拓扑角度}研究集体行为,但与以前的工作不同的是,我们提出从每个时间点的拓扑特征中学习潜在动态模型。然后利用这一模型制定下游回归任务,以预测一些预先指定的控制方程的参数化。我们基于从矢量化(静态)持久图中学习的潜在ODE实现了这一想法,并展示了这一建模选择是通过持久同调的最近稳定性结果的组合来证明的。各种(消融)实验不仅展示了每个个体模型组件的相关性,还提供了有力的经验证据,证明我们提出的模型--\textit{神经持久动力学}--在各种参数回归任务中明显优于现有技术水平。
更新时间: 2024-05-24 17:20:18
领域: cs.LG,cs.CE
Understanding the differences in Foundation Models: Attention, State Space Models, and Recurrent Neural Networks
Softmax attention is the principal backbone of foundation models for various artificial intelligence applications, yet its quadratic complexity in sequence length can limit its inference throughput in long-context settings. To address this challenge, alternative architectures such as linear attention, State Space Models (SSMs), and Recurrent Neural Networks (RNNs) have been considered as more efficient alternatives. While connections between these approaches exist, such models are commonly developed in isolation and there is a lack of theoretical understanding of the shared principles underpinning these architectures and their subtle differences, which greatly influence performance and scalability. In this paper, we introduce the Dynamical Systems Framework (DSF), which allows a principled investigation of all these architectures in a common representation. Our framework facilitates rigorous comparisons, providing new insights on the distinctive characteristics of each model class. For instance, we compare linear attention and selective SSMs, detailing their differences and conditions under which both are equivalent. We also provide principled comparisons between softmax attention and other model classes, discussing the theoretical conditions under which softmax attention can be approximated. Additionally, we substantiate these new insights with empirical validations and mathematical arguments. This shows the DSF's potential to guide the systematic development of future more efficient and scalable foundation models.
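One concrete instance of the common representation: causal (unnormalized) linear attention can be computed either in its parallel attention form or as a recurrence over a matrix-valued state, the form in which it aligns with SSMs and RNNs. A minimal numpy check of the equivalence:

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 6, 4
Q, K, V = (rng.standard_normal((T, d)) for _ in range(3))

# Parallel (attention) form: causal, unnormalized linear attention.
mask = np.tril(np.ones((T, T)))
out_attn = (mask * (Q @ K.T)) @ V

# Recurrent (state-space) form: matrix state S_t = S_{t-1} + k_t v_t^T,
# output y_t = S_t^T q_t -- constant memory in sequence length.
S = np.zeros((d, d))
out_rec = np.zeros((T, d))
for t in range(T):
    S += np.outer(K[t], V[t])
    out_rec[t] = S.T @ Q[t]

print(np.allclose(out_attn, out_rec))  # True: one model, two dual forms
```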
Updated: 2024-05-24 17:19:57
标题: 理解基础模型之间的差异:注意力,状态空间模型和循环神经网络
摘要: Softmax注意力是各种人工智能应用基础模型的主要支柱,然而在序列长度方面的二次复杂性可能限制其在长上下文环境中的推理吞吐量。为了解决这一挑战,已经考虑了替代架构,如线性注意力、状态空间模型(SSMs)和循环神经网络(RNNs),作为更高效的替代方案。尽管这些方法之间存在联系,但这些模型通常是独立开发的,缺乏对支撑这些架构及其细微差异的共享原则的理论理解,这极大地影响了性能和可伸缩性。在本文中,我们介绍了动态系统框架(DSF),它允许对所有这些架构进行基于原则的调查,采用共同的表示。我们的框架促进了严格的比较,提供了关于每个模型类别独特特征的新见解。例如,我们比较了线性注意力和选择性SSMs,详细说明它们之间的差异以及两者等效的条件。我们还提供了softmax注意力和其他模型类别之间的基于原则的比较,讨论了softmax注意力可以近似的理论条件。此外,我们通过经验验证和数学论证实证了这些新见解。这显示了DSF指导未来更高效和可扩展基础模型系统性发展的潜力。
更新时间: 2024-05-24 17:19:57
领域: cs.LG,cs.AI,cs.SY,eess.SY
Optimizing Large Language Models for OpenAPI Code Completion
Recent advancements in Large Language Models (LLMs) and their utilization in code generation tasks have significantly reshaped the field of software development. Despite the remarkable efficacy of code completion solutions in mainstream programming languages, their performance lags when applied to less ubiquitous formats such as OpenAPI definitions. This study evaluates the OpenAPI completion performance of GitHub Copilot, a prevalent commercial code completion tool, and proposes a set of task-specific optimizations leveraging Meta's open-source model Code Llama. A semantics-aware OpenAPI completion benchmark proposed in this research is used to perform a series of experiments through which the impact of various prompt-engineering and fine-tuning techniques on the Code Llama model's performance is analyzed. The fine-tuned Code Llama model reaches a peak correctness improvement of 55.2% over GitHub Copilot despite utilizing 25 times fewer parameters than the commercial solution's underlying Codex model. Additionally, this research proposes an enhancement to a widely used code infilling training technique, addressing the issue of underperformance when the model is prompted with context sizes smaller than those used during training.
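For reference, Code Llama's infilling checkpoints are trained with a fill-in-the-middle format built from `<PRE>`, `<SUF>`, and `<MID>` sentinel tokens; the sketch below assembles such a prompt for an OpenAPI fragment. Exact sentinel handling depends on the tokenizer, so treat the string format as an approximation rather than the paper's training setup.

```python
def codellama_infill_prompt(prefix: str, suffix: str) -> str:
    """Assemble a fill-in-the-middle prompt in Code Llama's PSM format.

    Assumption: the checkpoint uses <PRE>/<SUF>/<MID> sentinel tokens in
    prefix-suffix-middle order, and generation terminates at <EOT>.
    """
    return f"<PRE> {prefix} <SUF>{suffix} <MID>"

prefix = """openapi: 3.0.0
info:
  title: Pet Store
  version: 1.0.0
paths:
  /pets/{petId}:
    get:
"""
suffix = """
      responses:
        '200':
          description: A single pet.
"""
print(codellama_infill_prompt(prefix, suffix))
# The model is expected to generate the missing middle span (e.g. the
# operationId and parameters block), terminated by an <EOT> token.
```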
Updated: 2024-05-24 17:19:03
标题: 优化大型语言模型以实现OpenAPI代码完成
摘要: 近期大型语言模型(LLMs)的进展及其在代码生成任务中的利用显著地重塑了软件开发领域。尽管代码补全解决方案在主流编程语言中表现出色,但在应用于OpenAPI定义等不太普遍的格式时,其性能仍有待改进。本研究评估了GitHub Copilot这一常见商业代码补全工具在OpenAPI补全性能上的表现,并提出了一组基于Meta开源模型Code Llama的任务特定优化方案。本研究提出的基于语义的OpenAPI补全基准用于进行一系列实验,分析各种提示工程和微调技术对Code Llama模型性能的影响。微调后的Code Llama模型在GitHub Copilot的基础Codex模型参数的25倍少的情况下,使正确性改进达到55.2%的峰值。此外,本研究提出了一种改进广泛使用的代码填充训练技术的方法,解决了在模型受到比训练过程中使用的上下文尺寸更小的情况下性能不佳的问题。
更新时间: 2024-05-24 17:19:03
领域: cs.SE,cs.CL,cs.LG,68T07, 68T50, 68T05
Anomalous Change Point Detection Using Probabilistic Predictive Coding
Change point detection (CPD) and anomaly detection (AD) are essential techniques in various fields to identify abrupt changes or abnormal data instances. However, existing methods are often constrained to univariate data, face scalability challenges with large datasets due to computational demands, and experience reduced performance with high-dimensional or intricate data, as well as hidden anomalies. Furthermore, they often lack interpretability and adaptability to domain-specific knowledge, which limits their versatility across different fields. In this work, we propose a deep learning-based CPD/AD method called Probabilistic Predictive Coding (PPC) that jointly learns to encode sequential data to low dimensional latent space representations and to predict the subsequent data representations as well as the corresponding prediction uncertainties. The model parameters are optimized with maximum likelihood estimation by comparing these predictions with the true encodings. At the time of application, the true and predicted encodings are used to determine the probability of conformity, an interpretable and meaningful anomaly score. Furthermore, our approach has linear time complexity, scalability issues are prevented, and the method can easily be adjusted to a wide range of data types and intricate applications. We demonstrate the effectiveness and adaptability of our proposed method across synthetic time series experiments, image data, and real-world magnetic resonance spectroscopic imaging data.
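A minimal sketch of the PPC idea as described: encode the sequence, predict the mean and log-variance of the next latent, train with the Gaussian negative log-likelihood, and read off per-step anomaly scores at test time. The layer sizes and GRU backbone are my own choices, not the paper's architecture.

```python
import torch
import torch.nn as nn

class PPC(nn.Module):
    """Sketch of Probabilistic Predictive Coding: encode x_t to z_t, then
    predict (mu, log-var) of z_{t+1}; trained by maximum likelihood."""
    def __init__(self, x_dim, z_dim=8):
        super().__init__()
        self.enc = nn.Linear(x_dim, z_dim)
        self.rnn = nn.GRU(z_dim, 32, batch_first=True)
        self.head = nn.Linear(32, 2 * z_dim)   # -> (mu, log_var) of next latent

    def forward(self, x):                      # x: (B, T, x_dim)
        z = self.enc(x)                        # (B, T, z_dim)
        h, _ = self.rnn(z[:, :-1])             # causal: predict step t+1 from <= t
        mu, log_var = self.head(h).chunk(2, dim=-1)
        return z[:, 1:], mu, log_var

def nll(z_true, mu, log_var):                  # Gaussian negative log-likelihood
    return 0.5 * (log_var + (z_true - mu) ** 2 / log_var.exp()).sum(-1)

model = PPC(x_dim=5)
x = torch.randn(4, 50, 5)
z_true, mu, log_var = model(x)
loss = nll(z_true, mu, log_var).mean()         # maximum-likelihood training loss
loss.backward()
# At test time, a high per-step NLL (low "probability of conformity") flags
# anomalies or change points.
print(nll(z_true, mu, log_var).shape)          # (B, T-1) per-step anomaly scores
```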
Updated: 2024-05-24 17:17:34
标题: 使用概率预测编码进行异常变点检测
摘要: 变点检测(CPD)和异常检测(AD)是各个领域中识别突变变化或异常数据实例的基本技术。然而,现有方法通常局限于单变量数据,面临大数据集的可伸缩性挑战,由于计算需求且在高维或复杂数据以及隐藏异常情况下性能降低。此外,它们通常缺乏可解释性和对领域特定知识的适应性,从而限制了它们在不同领域的多功能性。在这项工作中,我们提出了一种基于深度学习的CPD/AD方法,称为概率预测编码(PPC),该方法联合学习将顺序数据编码为低维潜在空间表示,并预测随后的数据表示以及相应的预测不确定性。通过比较这些预测与真实编码,模型参数通过最大似然估计进行优化。在应用时,真实和预测编码可用于确定符合度的概率,这是一个可解释且有意义的异常得分。此外,我们的方法具有线性时间复杂度,可以避免可伸缩性问题,并且该方法可以轻松调整到各种数据类型和复杂应用中。我们通过合成时间序列实验、图像数据和真实磁共振波谱成像数据展示了我们提出的方法的有效性和适应性。
更新时间: 2024-05-24 17:17:34
领域: stat.ML,cs.LG
Bisimulation Learning
We introduce a data-driven approach to computing finite bisimulations for state transition systems with very large, possibly infinite state space. Our novel technique computes stutter-insensitive bisimulations of deterministic systems, which we characterize as the problem of learning a state classifier together with a ranking function for each class. Our procedure learns a candidate state classifier and candidate ranking functions from a finite dataset of sample states; then, it checks whether these generalise to the entire state space using satisfiability modulo theory solving. Upon the affirmative answer, the procedure concludes that the classifier constitutes a valid stutter-insensitive bisimulation of the system. Upon a negative answer, the solver produces a counterexample state for which the classifier violates the claim, adds it to the dataset, and repeats learning and checking in a counterexample-guided inductive synthesis loop until a valid bisimulation is found. We demonstrate on a range of benchmarks from reactive verification and software model checking that our method yields faster verification results than alternative state-of-the-art tools in practice. Our method produces succinct abstractions that enable an effective verification of linear temporal logic without next operator, and are interpretable for system diagnostics.
Updated: 2024-05-24 17:11:27
标题: 双模拟学习
摘要: 我们介绍了一种数据驱动的方法,用于计算具有非常大、可能无限状态空间的状态迁移系统的有限双模拟。我们的新颖技术计算确定性系统的抖动不敏感的双模拟,我们将其描述为学习每个类的状态分类器以及排名函数的问题。我们的过程从样本状态的有限数据集中学习候选状态分类器和候选排名函数;然后,使用可满足性模理论求解检查这些是否推广到整个状态空间。在肯定的答案下,该过程得出结论,即分类器构成了系统的有效抖动不敏感的双模拟。在否定的答案下,求解器产生一个反例状态,其中分类器违反了声明,将其添加到数据集中,并在反例引导的归纳综合循环中重复学习和检查,直到找到有效的双模拟。我们在来自反应性验证和软件模型检查的一系列基准测试中展示了我们的方法比实践中的替代最先进工具产生更快的验证结果。我们的方法产生简洁的抽象,可以有效验证线性时序逻辑而无需下一个运算符,并且对于系统诊断是可解释的。
更新时间: 2024-05-24 17:11:27
领域: cs.LO,cs.LG
Models That Prove Their Own Correctness
How can we trust the correctness of a learned model on a particular input of interest? Model accuracy is typically measured \emph{on average} over a distribution of inputs, giving no guarantee for any fixed input. This paper proposes a theoretically-founded solution to this problem: to train *Self-Proving models* that prove the correctness of their output to a verification algorithm $V$ via an Interactive Proof. Self-Proving models satisfy that, with high probability over a random input, the model generates a correct output \emph{and} successfully proves its correctness to $V\!$. The *soundness* property of $V$ guarantees that, for *every* input, no model can convince $V$ of the correctness of an incorrect output. Thus, a Self-Proving model proves correctness of most of its outputs, while *all* incorrect outputs (of any model) are detected by $V$. We devise a generic method for learning Self-Proving models, and we prove convergence bounds under certain assumptions. The theoretical framework and results are complemented by experiments on an arithmetic capability: computing the greatest common divisor (GCD) of two integers. Our learning method is used to train a Self-Proving transformer that computes the GCD *and* proves the correctness of its answer.
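For the GCD case, the proof can be made concrete with a Bezout certificate: if $g$ divides both inputs and $g = ax + by$ for integers $x, y$, then $g = \gcd(a, b)$. Below, an extended-Euclid stub stands in for the Self-Proving transformer, and the verifier $V$ is exactly this divisibility-plus-combination check; the single-round check here is a simplification of the paper's interactive protocol.

```python
def model(a: int, b: int):
    """Stand-in for a Self-Proving model: returns the answer g together with
    a Bezout certificate (x, y) such that a*x + b*y == g (extended Euclid)."""
    if b == 0:
        return a, 1, 0
    g, x, y = model(b, a % b)
    return g, y, x - (a // b) * y

def verify(a: int, b: int, g: int, x: int, y: int) -> bool:
    """Verifier V: accept iff g divides both inputs and g == a*x + b*y.
    Soundness: g | a and g | b makes g a common divisor, so g <= gcd(a, b);
    gcd(a, b) divides a*x + b*y == g, so gcd(a, b) <= g. Hence g == gcd(a, b),
    and no prover can make V accept an incorrect answer."""
    return g > 0 and a % g == 0 and b % g == 0 and a * x + b * y == g

a, b = 252, 105
g, x, y = model(a, b)
print(g, verify(a, b, g, x, y))  # 21 True
print(verify(a, b, 7, 0, 0))     # False: 7 divides both, but no valid certificate
```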
Updated: 2024-05-24 17:10:08
标题: 自证正确性的模型
摘要: 我们如何相信一个学习模型在特定输入上的正确性?模型准确性通常是通过对输入分布的平均值进行测量来衡量的,不能保证任何固定输入的准确性。本文提出了一个基于理论的解决方案:训练*自证模型*,通过交互证明向验证算法$V$证明其输出的正确性。自证模型满足,在随机输入的高概率下,模型生成正确输出并成功向$V$证明其正确性。$V$的*正确性*属性保证,对于*每个*输入,没有模型能够说服$V$输出的正确性。因此,自证模型证明了大部分输出的正确性,同时$V$检测到*所有*不正确的输出(任何模型的)。我们设计了一个用于学习自证模型的通用方法,并在某些假设下证明了收敛界限。理论框架和结果得到了对两个整数的最大公约数(GCD)计算能力进行的实验的补充。我们的学习方法用于训练一个自证变换器,该变换器计算GCD并证明其答案的正确性。
更新时间: 2024-05-24 17:10:08
领域: cs.LG,cs.CC,cs.SE
Hierarchical Uncertainty Exploration via Feedforward Posterior Trees
When solving ill-posed inverse problems, one often desires to explore the space of potential solutions rather than be presented with a single plausible reconstruction. Valuable insights into these feasible solutions and their associated probabilities are embedded in the posterior distribution. However, when confronted with data of high dimensionality (such as images), visualizing this distribution becomes a formidable challenge, necessitating the application of effective summarization techniques before user examination. In this work, we introduce a new approach for visualizing posteriors across multiple levels of granularity using tree-valued predictions. Our method predicts a tree-valued hierarchical summarization of the posterior distribution for any input measurement, in a single forward pass of a neural network. We showcase the efficacy of our approach across diverse datasets and image restoration challenges, highlighting its prowess in uncertainty quantification and visualization. Our findings reveal that our method performs comparably to a baseline that hierarchically clusters samples from a diffusion-based posterior sampler, yet achieves this with orders of magnitude greater speed.
Updated: 2024-05-24 17:06:51
标题: 通过前向后验树实现的分层不确定性探索
摘要: 在解决不适定的逆问题时,人们通常希望探索潜在解的空间,而不是只提供一个合理的重建。有价值的见解嵌入在后验分布中,描述了这些可行解及其相关概率。然而,当面对高维数据(如图像)时,可视化这种分布成为一项艰巨的挑战,需要在用户检查之前应用有效的总结技术。在这项工作中,我们提出了一种新的方法,使用树值预测跨多个粒度级别可视化后验。我们的方法在神经网络的单次前向传递中预测任何输入测量的树状层次总结后验分布。我们展示了我们方法在不同数据集和图像恢复挑战中的有效性,突出了其在不确定性量化和可视化方面的实力。我们的研究结果显示,我们的方法与从基于扩散的后验采样器中分层聚类样本相比表现相当,但速度快了几个数量级。
更新时间: 2024-05-24 17:06:51
领域: cs.CV,cs.LG,eess.IV,stat.ML
Infinite Limits of Multi-head Transformer Dynamics
In this work, we analyze various scaling limits of the training dynamics of transformer models in the feature learning regime. We identify the set of parameterizations that admit well-defined infinite width and depth limits, allowing the attention layers to update throughout training--a relevant notion of feature learning in these models. We then use tools from dynamical mean field theory (DMFT) to analyze various infinite limits (infinite key/query dimension, infinite heads, and infinite depth) which have different statistical descriptions depending on which infinite limit is taken and how attention layers are scaled. We provide numerical evidence of convergence to the limits and discuss how the parameterization qualitatively influences learned features.
Updated: 2024-05-24 17:01:37
标题: 多头变压器动力学的无限极限
摘要: 在这项工作中,我们分析了变压器模型在特征学习阶段的训练动态的各种缩放极限。我们确定了一组参数化,允许具有明确定义的无限宽度和深度极限,使注意力层能够在训练过程中更新 - 这是这些模型中特征学习的相关概念。然后我们使用动力学平均场理论(DMFT)工具来分析各种无限极限(无限关键/查询维度、无限头数和无限深度),这些极限具有不同的统计描述,取决于采取哪个无限极限以及注意力层如何被缩放。我们提供了收敛到极限的数值证据,并讨论了参数化如何在质量上影响学习到的特征。
更新时间: 2024-05-24 17:01:37
领域: stat.ML,cond-mat.dis-nn,cs.LG
Information-theoretic Generalization Analysis for Expected Calibration Error
While the expected calibration error (ECE), which employs binning, is widely adopted to evaluate the calibration performance of machine learning models, theoretical understanding of its estimation bias is limited. In this paper, we present the first comprehensive analysis of the estimation bias in the two common binning strategies, uniform mass and uniform width binning. Our analysis establishes upper bounds on the bias, achieving an improved convergence rate. Moreover, our bounds reveal, for the first time, the optimal number of bins to minimize the estimation bias. We further extend our bias analysis to generalization error analysis based on the information-theoretic approach, deriving upper bounds that enable the numerical evaluation of how small the ECE is for unknown data. Experiments using deep learning models show that our bounds are nonvacuous thanks to this information-theoretic generalization analysis approach.
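For readers unfamiliar with the two binning strategies, here is a minimal numpy implementation of the binned ECE estimator under uniform-width and uniform-mass bins (the quantity whose estimation bias the paper analyzes); the synthetic data is purely illustrative.

```python
import numpy as np

def ece(conf, correct, n_bins=15, strategy="width"):
    """Expected calibration error with uniform-width or uniform-mass bins.

    conf: predicted confidences in [0, 1]; correct: 0/1 outcomes.
    ECE = sum_b (|B_b| / n) * |acc(B_b) - conf(B_b)|.
    """
    conf, correct = np.asarray(conf), np.asarray(correct)
    if strategy == "width":
        edges = np.linspace(0.0, 1.0, n_bins + 1)
    else:  # "mass": (roughly) equal number of samples per bin
        edges = np.quantile(conf, np.linspace(0.0, 1.0, n_bins + 1))
    idx = np.clip(np.searchsorted(edges, conf, side="right") - 1, 0, n_bins - 1)
    total = 0.0
    for b in range(n_bins):
        m = idx == b
        if m.any():
            total += m.mean() * abs(correct[m].mean() - conf[m].mean())
    return total

rng = np.random.default_rng(0)
conf = rng.uniform(0.5, 1.0, 10_000)
correct = (rng.uniform(size=10_000) < conf ** 1.5).astype(float)  # overconfident model
print(ece(conf, correct, strategy="width"), ece(conf, correct, strategy="mass"))
```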
Updated: 2024-05-24 16:59:29
标题: 信息理论的一般化分析:期望校准误差
摘要: 尽管采用分箱法的预期校准误差(ECE)被广泛应用于评估机器学习模型的校准性能,但对其估计偏差的理论理解有限。在本文中,我们首次对两种常见的分箱策略,即均匀质量分箱和均匀宽度分箱的估计偏差进行了全面分析。我们的分析建立了偏差的上限,实现了改进的收敛速度。此外,我们的界限首次揭示了最小化估计偏差所需的最佳分箱数量。我们进一步将偏差分析扩展到基于信息理论方法的泛化误差分析,推导出上界,使得可以对未知数据的ECE有多小进行数值评估。使用深度学习模型的实验表明,由于这种信息理论泛化分析方法,我们的界限是有意义的。
更新时间: 2024-05-24 16:59:29
领域: cs.LG,math.ST,stat.ML,stat.TH
The Impact of Geometric Complexity on Neural Collapse in Transfer Learning
Many of the recent remarkable advances in computer vision and language models can be attributed to the success of transfer learning via the pre-training of large foundation models. However, a theoretical framework which explains this empirical success is incomplete and remains an active area of research. Flatness of the loss surface and neural collapse have recently emerged as useful pre-training metrics which shed light on the implicit biases underlying pre-training. In this paper, we explore the geometric complexity of a model's learned representations as a fundamental mechanism that relates these two concepts. We show through experiments and theory that mechanisms which affect the geometric complexity of the pre-trained network also influence the neural collapse. Furthermore, we show how this effect of the geometric complexity generalizes to the neural collapse of new classes as well, thus encouraging better performance on downstream tasks, particularly in the few-shot setting.
Updated: 2024-05-24 16:52:09
标题: 几何复杂性对迁移学习中神经网络崩溃的影响
摘要: 最近在计算机视觉和语言模型领域取得的许多显著进展可以归功于通过预训练大型基础模型实现的迁移学习的成功。然而,一个能够解释这种经验成功的理论框架仍然不完整,仍然是一个活跃的研究领域。损失表面的平坦性和神经坍塌最近出现作为有用的预训练指标,揭示了潜在的预训练基础的偏见。在本文中,我们探讨了模型学习表示的几何复杂性作为联系这两个概念的基本机制。我们通过实验证明和理论表明,影响预训练网络的几何复杂性的机制也会影响神经坍塌。此外,我们展示了这种几何复杂性的效果如何推广到新类别的神经坍塌,从而在下游任务中鼓励更好的性能,特别是在少样本情况下。
更新时间: 2024-05-24 16:52:09
领域: cs.LG
Medformer: A Multi-Granularity Patching Transformer for Medical Time-Series Classification
Medical time series data, such as Electroencephalography (EEG) and Electrocardiography (ECG), play a crucial role in healthcare, such as diagnosing brain and heart diseases. Existing methods for medical time series classification primarily rely on handcrafted biomarkers extraction and CNN-based models, with limited exploration of transformers tailored for medical time series. In this paper, we introduce Medformer, a multi-granularity patching transformer tailored specifically for medical time series classification. Our method incorporates three novel mechanisms to leverage the unique characteristics of medical time series: cross-channel patching to leverage inter-channel correlations, multi-granularity embedding for capturing features at different scales, and two-stage (intra- and inter-granularity) multi-granularity self-attention for learning features and correlations within and among granularities. We conduct extensive experiments on five public datasets under both subject-dependent and challenging subject-independent setups. Results demonstrate Medformer's superiority over 10 baselines, achieving top averaged ranking across five datasets on all six evaluation metrics. These findings underscore the significant impact of our method on healthcare applications, such as diagnosing Myocardial Infarction, Alzheimer's, and Parkinson's disease. We release the source code at \url{https://github.com/DL4mHealth/Medformer}.
Updated: 2024-05-24 16:51:10
标题: Medformer:一种用于医学时间序列分类的多粒度修补变换器
摘要: 医学时间序列数据,如脑电图(EEG)和心电图(ECG),在诊断脑部和心脏疾病等医疗保健中起着至关重要的作用。现有的医学时间序列分类方法主要依赖于手工提取生物标志物和基于CNN的模型,对于专门针对医学时间序列的变换器的探索有限。在本文中,我们介绍了Medformer,一个专门针对医学时间序列分类的多粒度贴片变换器。我们的方法结合了三种新颖的机制,利用医学时间序列的独特特征:跨通道贴片以利用通道间的相关性,多粒度嵌入以捕捉不同尺度的特征,以及两阶段(内部和外部粒度)多粒度自注意力以学习特征和跨粒度之间的相关性。我们在五个公共数据集上进行了广泛实验,涵盖了主体相关和具有挑战性的主体无关设置。结果表明,Medformer在所有六个评估指标上在五个数据集中的平均排名最高,超过了10个基线方法。这些发现强调了我们的方法对诊断心肌梗死、阿尔茨海默病和帕金森病等医疗应用的重要影响。我们在\url{https://github.com/DL4mHealth/Medformer}上发布了源代码。
更新时间: 2024-05-24 16:51:10
领域: eess.SP,cs.AI,cs.LG
Dimension-free deterministic equivalents for random feature regression
In this work we investigate the generalization performance of random feature ridge regression (RFRR). Our main contribution is a general deterministic equivalent for the test error of RFRR. Specifically, under a certain concentration property, we show that the test error is well approximated by a closed-form expression that only depends on the feature map eigenvalues. Notably, our approximation guarantee is non-asymptotic, multiplicative, and independent of the feature map dimension -- allowing for infinite-dimensional features. We expect this deterministic equivalent to hold broadly beyond our theoretical analysis, and we empirically validate its predictions on various real and synthetic datasets. As an application, we derive sharp excess error rates under standard power-law assumptions of the spectrum and target decay. In particular, we provide a tight result for the smallest number of features achieving optimal minimax error rate.
Updated: 2024-05-24 16:43:26
标题: 《无维度确定等价随机特征回归》
摘要: 在这项工作中,我们研究了随机特征岭回归(RFRR)的泛化性能。我们的主要贡献是为RFRR的测试误差提供了一个一般确定性等价物。具体而言,在某种集中特性下,我们表明测试误差可以很好地近似为一个仅依赖于特征映射特征值的闭合形式表达式。值得注意的是,我们的近似保证是非渐近的、乘法的,并且独立于特征映射维度 -- 允许使用无限维特征。我们期望这一确定性等价物在我们的理论分析之外广泛适用,并在各种真实和合成数据集上对其预测进行了实证验证。作为一个应用,我们根据谱和目标衰减的标准幂律假设推导出尖锐的超额误差率。特别地,我们为实现最优极小误差率所需的最少特征数提供了一个紧密的结果。
更新时间: 2024-05-24 16:43:26
领域: stat.ML,cond-mat.dis-nn,cs.LG
Comparison of static and dynamic random forests models for EHR data in the presence of competing risks: predicting central line-associated bloodstream infection
Prognostic outcomes related to hospital admissions typically do not suffer from censoring, and can be modeled either categorically or as time-to-event. Competing events are common but often ignored. We compared the performance of random forest (RF) models to predict the risk of central line-associated bloodstream infections (CLABSI) using different outcome operationalizations. We included data from 27478 admissions to the University Hospitals Leuven, covering 30862 catheter episodes (970 CLABSI, 1466 deaths and 28426 discharges) to build static and dynamic RF models for binary (CLABSI vs no CLABSI), multinomial (CLABSI, discharge, death or no event), survival (time to CLABSI) and competing risks (time to CLABSI, discharge or death) outcomes to predict the 7-day CLABSI risk. We evaluated model performance across 100 train/test splits. Performance of binary, multinomial and competing risks models was similar: AUROC was 0.74 for baseline predictions, rose to 0.78 for predictions at day 5 in the catheter episode, and decreased thereafter. Survival models overestimated the risk of CLABSI (E:O ratios between 1.2 and 1.6), and had AUROCs about 0.01 lower than other models. Binary and multinomial models had lowest computation times. Models including multiple outcome events (multinomial and competing risks) display a different internal structure compared to binary and survival models. In the absence of censoring, complex modelling choices do not considerably improve the predictive performance compared to a binary model for CLABSI prediction in our studied settings. Survival models censoring the competing events at their time of occurrence should be avoided.
Updated: 2024-05-24 16:43:16
标题: 静态和动态随机森林模型在竞争风险存在下的电子病历数据比较:预测中心静脉导管相关血流感染
摘要: 与住院相关的预后结果通常不会受到审查,可以将其建模为分类或作为事件发生的时间。竞争事件很常见但通常被忽视。我们比较了随机森林(RF)模型在使用不同结果操作规范预测中心导管相关血流感染(CLABSI)风险的性能。我们包含了27478次入院数据,涵盖了30862个导管插管(970例CLABSI、1466例死亡和28426例出院),以建立用于二元(CLABSI与无CLABSI)、多项式(CLABSI、出院、死亡或无事件)、生存(CLABSI发生时间)和竞争风险(CLABSI发生时间、出院或死亡)结果预测的静态和动态RF模型,以预测7天CLABSI风险。我们在100次训练/测试分割中评估了模型性能。二元、多项式和竞争风险模型的性能相似:基线预测的AUROC为0.74,到插管事件第5天的预测上升至0.78,之后下降。生存模型高估了CLABSI的风险(E:O比值在1.2和1.6之间),AUROC约比其他模型低0.01。二元和多项式模型具有最低的计算时间。包含多个结果事件的模型(多项式和竞争风险)显示出与二元和生存模型不同的内部结构。在没有审查的情况下,与我们研究的情境中用于CLABSI预测的二元模型相比,复杂的建模选择并没有显著改善预测性能。生存模型在竞争事件发生时审查应被避免。
更新时间: 2024-05-24 16:43:16
领域: cs.LG,stat.ML
Data-Copilot: Bridging Billions of Data and Humans with Autonomous Workflow
Industries such as finance, meteorology, and energy generate vast amounts of data daily. Efficiently managing, processing, and displaying this data requires specialized expertise and is often tedious and repetitive. Leveraging large language models (LLMs) to develop an automated workflow presents a highly promising solution. However, LLMs are not adept at handling complex numerical computations and table manipulations and are also constrained by a limited context budget. Based on this, we propose Data-Copilot, a data analysis agent that autonomously performs querying, processing, and visualization of massive data tailored to diverse human requests. The advancements are twofold: First, it is a code-centric agent that receives human requests and generates code as an intermediary to handle massive data, which is quite flexible for large-scale data processing tasks. Second, Data-Copilot involves a data exploration phase in advance, which explores how to design more universal and error-free interfaces for real-time response. Specifically, it actively explores data sources, discovers numerous common requests, and abstracts them into many universal interfaces for daily invocation. When deployed in real-time requests, Data-Copilot only needs to invoke these pre-designed interfaces, transforming raw data into visualized outputs (e.g., charts, tables) that best match the user's intent. Compared to generating code from scratch, invoking these pre-designed and compiler-validated interfaces can significantly reduce errors during real-time requests. Additionally, interface workflows are more efficient and offer greater interpretability than code. We open-sourced Data-Copilot with massive Chinese financial data, such as stocks, funds, and news, demonstrating promising application prospects.
Updated: 2024-05-24 16:35:15
标题: 数据飞行员:利用自主工作流桥接数十亿数据和人类
摘要: 金融、气象和能源等行业每天产生大量数据。高效管理、处理和显示这些数据需要专业知识,通常是繁琐重复的。利用大型语言模型(LLMs)开发自动化工作流程是一种非常有前途的解决方案。然而,LLMs并不擅长处理复杂的数值计算和表格操作,并且受到有限的上下文预算的限制。基于此,我们提出了Data-Copilot,这是一个数据分析代理,可以自动执行查询、处理和可视化大规模数据,以满足各种人类请求。这个技术的进步有两个方面:首先,它是一个以代码为中心的代理,接收人类请求并生成代码作为中介来处理大规模数据,非常灵活适用于大规模数据处理任务。其次,Data-Copilot 在预先进行数据探索阶段,探索如何设计更通用和无误的接口以实现实时响应。具体而言,它积极探索数据源,发现许多常见请求,并将其抽象为许多通用接口以便每天调用。在实时请求中部署时,Data-Copilot 只需调用这些预先设计的接口,将原始数据转换为最符合用户意图的可视化输出(如图表、表格)。与从头开始生成代码相比,调用这些预先设计和经过编译验证的接口可以显著减少实时请求中的错误。此外,接口工作流比代码更高效,提供更大的解释性。我们以大量中国金融数据(如股票、基金和新闻)开源了Data-Copilot,展示了有前途的应用前景。
更新时间: 2024-05-24 16:35:15
领域: cs.CL,cs.AI,cs.CE
Heart Murmur and Abnormal PCG Detection via Wavelet Scattering Transform & a 1D-CNN
Heart murmurs provide valuable information about the mechanical activity of the heart, which aids in the diagnosis of various heart valve diseases. This work performs automatic and accurate heart murmur detection from phonocardiogram (PCG) recordings. Two public PCG datasets (the CirCor Digiscope 2022 dataset and the PCG 2016 dataset) from the Physionet online database are utilized to train and test three custom neural networks (NN): a 1D convolutional neural network (CNN), a long short-term memory (LSTM) recurrent neural network (RNN), and a convolutional RNN (C-RNN). We first perform pre-processing, which includes the following key steps: denoising, segmentation, re-labeling of noise-only segments, data normalization, and time-frequency analysis of the PCG segments using the wavelet scattering transform. We then conduct four experiments, the first three (E1-E3) using the PCG 2022 dataset and the fourth (E4) using the PCG 2016 dataset. It turns out that our custom 1D-CNN outperforms the other two NNs (LSTM-RNN and C-RNN). Further, our 1D-CNN model outperforms the related work in terms of accuracy, weighted accuracy, F1-score and AUROC for experiment E3 (which utilizes the cleaned and re-labeled PCG 2022 dataset). As for experiment E1 (which utilizes the original PCG 2022 dataset), our model performs quite close to the related work in terms of weighted accuracy and F1-score.
Updated: 2024-05-24 16:31:43
标题: 心脏杂音和异常PCG检测通过小波散射变换和一维卷积神经网络
摘要: 心脏杂音提供了有关心脏机械活动的宝贵信息,有助于诊断各种心脏瓣膜疾病。本文对来自Physionet在线数据库的两个公共心音图(PCG)数据集(CirCor Digiscope 2022数据集和PCG 2016数据集)进行了自动和准确的心脏杂音检测。我们利用三个自定义神经网络(NN)进行训练和测试:一个一维卷积神经网络(CNN),一个长短期记忆(LSTM)递归神经网络(RNN)和一个卷积RNN(C-RNN)。我们首先进行预处理,包括以下关键步骤:去噪,分割,重新标记仅噪音段,数据归一化和使用小波散射变换对PCG段进行时频分析。然后进行四个实验,前三个(E1-E3)使用PCG 2022数据集,第四个(E4)使用PCG 2016数据集。结果表明,我们的自定义1D-CNN胜过其他两个NN(LSTM-RNN和C-RNN)。此外,我们的1D-CNN模型在准确性,加权准确性,F1分数和AUROC方面优于相关工作,用于实验E3(利用清洁和重新标记的PCG 2022数据集)。至于实验E1(利用原始PCG 2022数据集),我们的模型在加权准确性和F1得分方面表现与相关工作相当。
更新时间: 2024-05-24 16:31:43
领域: eess.SP,cs.LG
WorDepth: Variational Language Prior for Monocular Depth Estimation
Three-dimensional (3D) reconstruction from a single image is an ill-posed problem with inherent ambiguities, i.e. scale. Predicting a 3D scene from text description(s) is similarly ill-posed, i.e. spatial arrangements of objects described. We investigate the question of whether two inherently ambiguous modalities can be used in conjunction to produce metric-scaled reconstructions. To test this, we focus on monocular depth estimation, the problem of predicting a dense depth map from a single image, but with an additional text caption describing the scene. To this end, we begin by encoding the text caption as a mean and standard deviation; using a variational framework, we learn the distribution of the plausible metric reconstructions of 3D scenes corresponding to the text captions as a prior. To "select" a specific reconstruction or depth map, we encode the given image through a conditional sampler that samples from the latent space of the variational text encoder, which is then decoded to the output depth map. Our approach is trained alternatingly between the text and image branches: in one optimization step, we predict the mean and standard deviation from the text description and sample from a standard Gaussian, and in the other, we sample using a (image) conditional sampler. Once trained, we directly predict depth from the encoded text using the conditional sampler. We demonstrate our approach on indoor (NYUv2) and outdoor (KITTI) scenarios, where we show that language can consistently improve performance in both.
Updated: 2024-05-24 16:30:43
标题: WorDepth:用于单目深度估计的变分语言先验
摘要: 从单个图像进行三维(3D)重建是一个具有固有模糊性的逆问题,即尺度。从文本描述中预测3D场景同样是一个具有固有模糊性的问题,即描述的物体空间布局。我们研究了两种固有模糊性模态是否可以结合起来生成度量级重建。为了测试这一点,我们关注单眼深度估计,即从单个图像预测稠密深度图的问题,但附加一个描述场景的文本说明。为此,我们首先将文本说明编码为均值和标准差;使用变分框架,我们学习与文本说明对应的3D场景可能度量重建的分布作为先验。为了“选择”特定的重建或深度图,我们通过条件取样器对给定图像进行编码,该取样器从变分文本编码器的潜在空间中取样,然后解码为输出深度图。我们的方法在文本和图像分支之间交替训练:在一个优化步骤中,我们从文本描述中预测均值和标准差,并从标准高斯分布中取样,而在另一个优化步骤中,我们使用(图像)条件取样器进行取样。训练完成后,我们使用条件取样器直接从编码的文本中预测深度。我们在室内(NYUv2)和室外(KITTI)场景上展示了我们的方法,在这些场景中,我们展示语言可以在两者中始终提高性能。
更新时间: 2024-05-24 16:30:43
领域: cs.CV,cs.AI,cs.CL,cs.LG,cs.MM
Detecting Out-of-Distribution Through the Lens of Neural Collapse
Efficient and versatile Out-of-Distribution (OOD) detection is essential for the safe deployment of AI yet remains challenging for existing algorithms. Inspired by Neural Collapse, we discover that features of in-distribution (ID) samples cluster closer to the weight vectors than features of OOD samples do. In addition, we reveal that ID features tend to expand in space to structure a simplex Equiangular Tight Frame, which nicely explains the prevalent observation that ID features reside further from the origin than OOD features. Taking both insights from Neural Collapse into consideration, we propose to leverage feature proximity to weight vectors for OOD detection and further complement this perspective by using feature norms to filter OOD samples. Extensive experiments on off-the-shelf models demonstrate the efficiency and effectiveness of our method across diverse classification tasks and model architectures, enhancing the generalization capability of OOD detection.
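A minimal sketch of how the two signals could be combined at inference time: angular proximity of a penultimate-layer feature to the nearest classifier weight vector, gated by a feature-norm filter. The combination rule and thresholds here are illustrative assumptions, not the paper's exact scoring function.

```python
import numpy as np

def nc_ood_score(feat, W, norm_floor):
    """Illustrative OOD score from the two Neural Collapse signals:
    (1) angular proximity of the feature to its nearest class weight vector,
    (2) a feature-norm filter (ID features tend to lie far from the origin).
    Higher score => more ID-like."""
    if np.linalg.norm(feat) < norm_floor:          # norm filter: small norm => OOD
        return -np.inf
    cos = (W @ feat) / (np.linalg.norm(W, axis=1) * np.linalg.norm(feat) + 1e-9)
    return cos.max()                               # closeness to nearest class direction

rng = np.random.default_rng(0)
C, d = 10, 64
W = rng.standard_normal((C, d))                    # classifier weight vectors
id_feat = 10.0 * W[3] / np.linalg.norm(W[3])       # large norm, aligned with a class
ood_feat = 0.5 * rng.standard_normal(d)            # small norm, no class alignment
print(nc_ood_score(id_feat, W, norm_floor=5.0))    # ~1.0 (ID-like)
print(nc_ood_score(ood_feat, W, norm_floor=5.0))   # -inf (filtered as OOD)
```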
Updated: 2024-05-24 16:30:30
标题: 透过神经崩溃的视角检测分布之外的数据
摘要: 高效且多功能的离群分布(OOD)检测对于人工智能的安全部署至关重要,但对现有算法仍然具有挑战性。受神经坍塌的启发,我们发现内部分布(ID)样本的特征与权重向量相比,聚集得更近,而OOD样本的特征则相对分散。此外,我们揭示了ID特征倾向于在空间中扩展以构建一个简单的等角紧框架,这很好地解释了一个普遍观察结果,即ID特征比OOD特征更远离原点。综合考虑神经坍塌的这两个见解,我们提议利用特征与权重向量的接近程度来进行OOD检测,并通过使用特征规范来过滤OOD样本,进一步补充这一观点。对现有模型的大量实验表明,我们的方法在不同分类任务和模型架构上均表现出高效和有效,增强了OOD检测的泛化能力。
更新时间: 2024-05-24 16:30:30
领域: cs.LG,eess.IV
Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model
ControlNets are widely used for adding spatial control to text-to-image diffusion models with different conditions, such as depth maps, scribbles/sketches, and human poses. However, when it comes to controllable video generation, ControlNets cannot be directly integrated into new backbones due to feature space mismatches, and training ControlNets for new backbones can be a significant burden for many users. Furthermore, applying ControlNets independently to different frames cannot effectively maintain object temporal consistency. To address these challenges, we introduce Ctrl-Adapter, an efficient and versatile framework that adds diverse controls to any image/video diffusion model through the adaptation of pretrained ControlNets. Ctrl-Adapter offers strong and diverse capabilities, including image and video control, sparse-frame video control, fine-grained patch-level multi-condition control (via an MoE router), zero-shot adaptation to unseen conditions, and supports a variety of downstream tasks beyond spatial control, including video editing, video style transfer, and text-guided motion control. With six diverse U-Net/DiT-based image/video diffusion models (SDXL, PixArt-$\alpha$, I2VGen-XL, SVD, Latte, Hotshot-XL), Ctrl-Adapter matches the performance of pretrained ControlNets on COCO and achieves the state-of-the-art on DAVIS 2017 with significantly lower computation (< 10 GPU hours).
Updated: 2024-05-24 16:29:38
标题: Ctrl-Adapter:一种高效且多功能的框架,用于将各种控制器适应到任何扩散模型中
摘要: 控制网络广泛用于为具有不同条件的文本到图像扩散模型添加空间控制,例如深度图,涂鸦/素描和人体姿势。然而,在可控视频生成方面,由于特征空间不匹配,控制网络不能直接集成到新的骨干结构中,为新的骨干结构训练控制网络可能对许多用户来说是一个重大负担。此外,独立地将控制网络应用于不同帧不能有效地维持对象的时间一致性。为了解决这些挑战,我们引入了Ctrl-Adapter,这是一个高效且多功能的框架,通过预训练的控制网络的适应来为任何图像/视频扩散模型添加多样化的控制。Ctrl-Adapter提供了强大且多样化的功能,包括图像和视频控制,稀疏帧视频控制,细粒度的补丁级多条件控制(通过MoE路由器),零样本适应到未见条件,并支持各种空间控制以外的下游任务,包括视频编辑,视频风格转移和文本引导的运动控制。通过六种多样化的U-Net/DiT-based图像/视频扩散模型(SDXL,PixArt-$\alpha$,I2VGen-XL,SVD,Latte,Hotshot-XL),Ctrl-Adapter在COCO上与预训练的控制网络的性能相匹配,并在DAVIS 2017上取得了最新技术水平,计算量显著降低(<10 GPU小时)。
更新时间: 2024-05-24 16:29:38
领域: cs.CV,cs.AI,cs.LG
A Distributional Analogue to the Successor Representation
This paper contributes a new approach for distributional reinforcement learning which elucidates a clean separation of transition structure and reward in the learning process. Analogous to how the successor representation (SR) describes the expected consequences of behaving according to a given policy, our distributional successor measure (SM) describes the distributional consequences of this behaviour. We formulate the distributional SM as a distribution over distributions and provide theory connecting it with distributional and model-based reinforcement learning. Moreover, we propose an algorithm that learns the distributional SM from data by minimizing a two-level maximum mean discrepancy. Key to our method are a number of algorithmic techniques that are independently valuable for learning generative models of state. As an illustration of the usefulness of the distributional SM, we show that it enables zero-shot risk-sensitive policy evaluation in a way that was not previously possible.
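For contrast, here is the classical (expected) successor representation that the distributional SM generalizes, learned by tabular TD on a toy chain; the distributional SM would replace each row of $\psi$ with a distribution over such occupancy vectors.

```python
import numpy as np

# Tabular successor representation on a 5-state chain under a fixed policy
# ("always move right", absorbing at the last state).
n, gamma, alpha = 5, 0.9, 0.1
psi = np.zeros((n, n))   # psi[s, s'] ~ E[sum_t gamma^t 1{s_t = s'} | s_0 = s]

for _ in range(3000):
    s = 0
    while s != n - 1:
        s_next = s + 1
        target = np.eye(n)[s] + gamma * psi[s_next]        # TD target for the SR
        psi[s] += alpha * (target - psi[s])
        s = s_next
    psi[n - 1] += alpha * (np.eye(n)[n - 1] - psi[n - 1])  # absorbing state

# Any reward vector r yields values V = psi @ r with no further learning;
# the distributional SM plays the same role for the full return distribution.
r = np.array([0, 0, 0, 0, 1.0])
print(psi @ r)   # approx [gamma^4, gamma^3, gamma^2, gamma, 1]
```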
Updated: 2024-05-24 16:29:32
标题: 一个与继任者表征相关的分布式模拟
摘要: 本文提出了一种新的分布强化学习方法,阐明了在学习过程中转移结构和奖励之间的清晰分离。类似于继任者表示(SR)描述了根据给定策略行为的预期后果,我们的分布继任者测量(SM)描述了这种行为的分布后果。我们将分布SM制定为分布,并提供与分布和基于模型的强化学习相关的理论。此外,我们提出了一种通过最小化两级最大均值差异从数据中学习分布SM的算法。我们方法的关键在于一些算法技术,这些技术对于学习状态的生成模型是独立有价值的。作为分布SM有用性的例证,我们展示它使得零射风险敏感策略评估成为可能,而以前是不可能的。
更新时间: 2024-05-24 16:29:32
领域: cs.LG,cs.AI,stat.ML
Chain-of-Thought Prompting for Demographic Inference with Large Multimodal Models
Conventional demographic inference methods have predominantly operated under the supervision of accurately labeled data, yet struggle to adapt to shifting social landscapes and diverse cultural contexts, leading to narrow specialization and limited accuracy in applications. Recently, the emergence of large multimodal models (LMMs) has shown transformative potential across various research tasks, such as visual comprehension and description. In this study, we explore the application of LMMs to demographic inference and introduce a benchmark for both quantitative and qualitative evaluation. Our findings indicate that LMMs possess advantages in zero-shot learning, interpretability, and handling uncurated 'in-the-wild' inputs, albeit with a propensity for off-target predictions. To enhance LMM performance and achieve comparability with supervised learning baselines, we propose a Chain-of-Thought augmented prompting approach, which effectively mitigates the off-target prediction issue.
Updated: 2024-05-24 16:26:56
标题: 使用大型多模态模型进行人口推断的思维链提示
摘要: 传统的人口统计推断方法主要在准确标记的数据监督下运行,但难以适应不断变化的社会景观和多样化的文化背景,导致在应用中专业化程度较窄、准确性有限。最近,大型多模态模型(LMMs)的出现展示了在各种研究任务中的变革潜力,比如视觉理解和描述。在这项研究中,我们探索了将LMMs应用于人口统计推断,并引入了一个用于定量和定性评估的基准。我们的研究结果表明,LMMs在零样本学习、可解释性和处理未经筛选的“野外”输入方面具有优势,尽管存在偏离目标预测的倾向。为了提高LMM性能并实现与监督学习基线的可比性,我们提出了一种增强提示的“思维链”方法,有效地缓解了偏离目标预测问题。
更新时间: 2024-05-24 16:26:56
领域: cs.CV,cs.LG
Prompt-Aware Adapter: Towards Learning Adaptive Visual Tokens for Multimodal Large Language Models
To bridge the gap between vision and language modalities, Multimodal Large Language Models (MLLMs) usually learn an adapter that converts visual inputs to understandable tokens for Large Language Models (LLMs). However, most adapters generate consistent visual tokens, regardless of the specific objects of interest mentioned in the prompt. Since these adapters distribute equal attention to every detail in the image and focus on the entire scene, they may increase the cognitive load for LLMs, particularly when processing complex scenes. To alleviate this problem, we propose prompt-aware adapters. These adapters are designed with the capability to dynamically embed visual inputs based on the specific focus of the prompt. Specifically, prompt-aware adapters utilize both global and local textual features to capture the most relevant visual clues from the prompt at both coarse and fine granularity levels. This approach significantly enhances the ability of LLMs to understand and interpret visual content. Experiments on various visual question answering tasks, such as counting and position reasoning, demonstrate the effectiveness of prompt-aware adapters.
Updated: 2024-05-24 16:24:10
标题: Prompt-Aware Adapter: 为多模态大型语言模型学习自适应视觉令牌
摘要: 为了弥合视觉和语言模态之间的差距,多模大语言模型(MLLMs)通常学习一个适配器,将视觉输入转换为大型语言模型(LLMs)可理解的标记。然而,大多数适配器生成一致的视觉标记,无论提示中提到的特定对象是什么。由于这些适配器对图像中的每个细节都分配了相同的注意力,并专注于整个场景,它们可能增加了对LLMs的认知负荷,特别是在处理复杂场景时。为了缓解这个问题,我们提出了提示感知适配器。这些适配器被设计为具有根据提示的特定焦点动态嵌入视觉输入的能力。具体而言,提示感知适配器利用全局和局部文本特征,在粗粒度和细粒度级别捕捉提示中最相关的视觉线索。这种方法显著增强了LLMs理解和解释视觉内容的能力。对各种视觉问答任务的实验,如计数和位置推理,展示了提示感知适配器的有效性。
更新时间: 2024-05-24 16:24:10
领域: cs.CV,cs.AI
Fast Sampling Through The Reuse Of Attention Maps In Diffusion Models
Text-to-image diffusion models have demonstrated unprecedented capabilities for flexible and realistic image synthesis. Nevertheless, these models rely on a time-consuming sampling procedure, which has motivated attempts to reduce their latency. When improving efficiency, researchers often use the original diffusion model to train an additional network designed specifically for fast image generation. In contrast, our approach seeks to reduce latency directly, without any retraining, fine-tuning, or knowledge distillation. In particular, we find the repeated calculation of attention maps to be costly yet redundant, and instead suggest reusing them during sampling. Our specific reuse strategies are based on ODE theory, which implies that the later a map is reused, the smaller the distortion in the final image. We empirically compare these reuse strategies with few-step sampling procedures of comparable latency, finding that reuse generates images that are closer to those produced by the original high-latency diffusion model.
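The core trick can be sketched in a few lines: cache the softmax attention map computed at one sampling step and reuse it at a nearby step, where only the values change. The toy below (random weights, single head) merely illustrates why nearby steps yield similar maps; the paper's ODE-based schedule for deciding which steps reuse which maps is not shown.

```python
import torch
import torch.nn.functional as F

def attention(q, k, v, cached_attn=None):
    """Single-head attention that can reuse a cached attention map.
    Returns (output, attention map) so a sampler can cache the map."""
    if cached_attn is None:
        attn = torch.softmax(q @ k.transpose(-1, -2) / q.shape[-1] ** 0.5, dim=-1)
    else:
        attn = cached_attn          # skip the QK^T softmax entirely
    return attn @ v, attn

T, d = 16, 32
x_t = torch.randn(T, d)             # activations at sampling step t
q, k, v = (torch.randn(d, d) for _ in range(3))

out_t, cache = attention(x_t @ q, x_t @ k, x_t @ v)           # compute and cache
x_t1 = x_t + 0.01 * torch.randn(T, d)                         # next (nearby) step
out_exact, _ = attention(x_t1 @ q, x_t1 @ k, x_t1 @ v)
out_reuse, _ = attention(None, None, x_t1 @ v, cached_attn=cache)
print(F.mse_loss(out_reuse, out_exact))  # small: nearby steps have similar maps
```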
Updated: 2024-05-24 16:23:38
标题: 在扩散模型中通过重复使用注意力图实现快速采样
摘要: 文本到图像扩散模型展示了灵活和逼真的图像合成的前所未有的能力。然而,这些模型依赖于耗时的采样过程,这促使人们试图减少它们的延迟。在提高效率时,研究人员通常使用原始扩散模型来训练一个专门设计用于快速图像生成的附加网络。相比之下,我们的方法直接寻求减少延迟,而无需任何重新训练、微调或知识蒸馏。具体来说,我们发现重复计算注意力图在成本高且多余,而建议在采样过程中重复使用它们。我们的具体重用策略基于ODE理论,这意味着图被重用的时间越晚,最终图像中的失真越小。我们通过实证方法将这些重用策略与具有相似延迟的少步采样过程进行比较,发现重用生成的图像更接近原始高延迟扩散模型生成的图像。
更新时间: 2024-05-24 16:23:38
领域: cs.CV,cs.AI
VDGD: Mitigating LVLM Hallucinations in Cognitive Prompts by Bridging the Visual Perception Gap
Recent interest in Large Vision-Language Models (LVLMs) for practical applications is moderated by the significant challenge of hallucination or the inconsistency between the factual information and the generated text. In this paper, we first perform an in-depth analysis of hallucinations and discover several novel insights about how and when LVLMs hallucinate. From our analysis, we show that: (1) The community's efforts have been primarily targeted towards reducing hallucinations related to visual recognition (VR) prompts (e.g., prompts that only require describing the image), thereby ignoring hallucinations for cognitive prompts (e.g., prompts that require additional skills like reasoning on contents of the image). (2) LVLMs lack visual perception, i.e., they can see but not necessarily understand or perceive the input image. We analyze responses to cognitive prompts and show that LVLMs hallucinate due to a perception gap: although LVLMs accurately recognize visual elements in the input image and possess sufficient cognitive skills, they struggle to respond accurately and hallucinate. To overcome this shortcoming, we propose Visual Description Grounded Decoding (VDGD), a simple, robust, and training-free method for alleviating hallucinations. Specifically, we first describe the image and add it as a prefix to the instruction. Next, during auto-regressive decoding, we sample from the plausible candidates according to their KL-Divergence (KLD) to the description, where lower KLD is given higher preference. Experimental results on several benchmarks and LVLMs show that VDGD improves significantly over other baselines in reducing hallucinations. We also propose VaLLu, a benchmark for the comprehensive evaluation of the cognitive capabilities of LVLMs.
Updated: 2024-05-24 16:21:59
标题: VDGD:通过弥合视觉知觉差距缓解认知提示中的LVLM幻觉
摘要: 最近对于大规模视觉-语言模型(LVLMs)在实际应用中的兴趣受到了幻觉或生成文本与事实信息之间不一致的显著挑战的调节。本文首先对幻觉进行了深入分析,并发现了一些关于LVLMs如何以及何时产生幻觉的新颖见解。通过我们的分析,我们发现:(1) 社区的努力主要集中在减少与视觉识别(VR)提示相关的幻觉(例如,仅需要描述图像的提示),从而忽略了认知提示(例如,需要对图像内容进行推理等技能的提示)的幻觉。(2) LVLMs缺乏视觉感知,即它们能够看到但不一定能理解或感知输入图像。我们分析了对认知提示的反应,并展示了LVLMs由于感知差距而产生幻觉:尽管LVLMs能够准确识别输入图像中的视觉元素并具备足够的认知技能,但它们在做出准确回应时会遇到困难并产生幻觉。为了克服这一缺点,我们提出了一种简单、稳健且无需训练的方法——视觉描述基础解码(VDGD),用于减少幻觉。具体来说,我们首先描述图像并将其作为指导的前缀。接下来,在自回归解码过程中,我们根据它们与描述之间的KL散度(KLD)从可能的候选中进行抽样,其中KL散度较低的被给予更高的优先级。在几个基准测试和LVLMs上的实验结果显示,VDGD在减少幻觉方面明显优于其他基线。我们还提出了VaLLu,这是一个用于全面评估LVLMs认知能力的基准测试。
更新时间: 2024-05-24 16:21:59
领域: cs.CV,cs.AI,cs.CL
What AIs are not Learning (and Why): Bio-Inspired Foundation Models for Robots
It is hard to make robots (including telerobots) that are useful, and harder to make autonomous robots that are robust and general. Current smart robots are created using manual programming, mathematical models, planning frameworks, and reinforcement learning. These methods do not lead to the leaps in performance and generality seen with deep learning, generative AI, and foundation models (FMs). Today's robots do not learn to provide home care, to be nursing assistants, or to do household chores nearly as well as people do. Addressing the aspirational opportunities of robot service applications requires improving how they are created. The high cost of bipedal multi-sensory robots ("bodies") is a significant obstacle for both research and deployment. A deeper issue is that mainstream FMs ("minds") do not support sensing, acting, and learning in context in the real world. They do not lead to robots that communicate well or collaborate. They do not lead to robots that try to learn by experimenting, by asking others, or by imitation learning as appropriate. They do not lead to robots that know enough to be deployed widely in service applications. This paper focuses on what human-compatible service robots need to know. It recommends developing experiential (aka "robotic") FMs for bootstrapping them.
Updated: 2024-05-24 16:20:47
标题: 人工智能未学习到的内容(以及原因):机器人的生物启发基础模型
摘要: 很难制造出有用的机器人(包括遥控机器人),而制造出具有稳健性和通用性的自主机器人则更为困难。目前的智能机器人是通过手动编程、数学模型、规划框架和强化学习创建的。这些方法不会像深度学习、生成式人工智能和基础模型(FMs)那样带来性能和通用性的飞跃。今天的机器人无法像人类那样学会提供家庭护理、担任护士助手或做家务。解决机器人服务应用的愿景机会需要改进它们的创造方式。双足多感官机器人(“身体”)的高成本是研究和部署的重要障碍。更深层次的问题是主流的基础模型(“思维”)在现实世界中不支持感知、行动和学习。它们不会导致机器人良好沟通或合作。它们不会导致机器人通过实验、向他人提问或适当地进行模仿学习而尝试学习。它们不会导致机器人掌握足够的知识以广泛部署于服务应用中。本文聚焦于人类兼容的服务机器人需要了解的内容,并建议开发经验型(又称“机器人”)基础模型以帮助它们自我启动。
更新时间: 2024-05-24 16:20:47
领域: cs.AI,cs.HC
The Road Less Scheduled
Existing learning rate schedules that do not require specification of the optimization stopping step T are greatly outperformed by learning rate schedules that depend on T. We propose an approach that avoids the need for this stopping time by eschewing the use of schedules entirely, while exhibiting state-of-the-art performance compared to schedules across a wide family of problems ranging from convex problems to large-scale deep learning problems. Our Schedule-Free approach introduces no additional hyper-parameters over standard optimizers with momentum. Our method is a direct consequence of a new theory we develop that unifies scheduling and iterate averaging. An open-source implementation of our method is available (https://github.com/facebookresearch/schedule_free).
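As a rough illustration, here is my paraphrase of the schedule-free SGD update on a toy quadratic: gradients are taken at an interpolation $y_t$ between the SGD iterate $z_t$ and an online average $x_t$, with a constant learning rate throughout. See the linked repository for the authoritative form.

```python
import numpy as np

A = np.diag([1.0, 10.0, 100.0])     # toy objective f(w) = 0.5 * w^T A w

def grad(w):
    return A @ w

gamma, beta = 5e-3, 0.9
z = np.array([1.0, 1.0, 1.0])       # gradient-step ("base") sequence
x = z.copy()                        # averaged sequence returned to the user

for t in range(1, 2001):
    y = (1 - beta) * z + beta * x   # point where the gradient is evaluated
    z = z - gamma * grad(y)         # constant-LR step: no schedule, no stopping step T
    x = (1 - 1 / t) * x + (1 / t) * z  # online averaging of the z iterates

print(x, 0.5 * x @ A @ x)           # x -> 0 and loss -> 0 without any LR decay
```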
Updated: 2024-05-24 16:20:46
标题: 《少有规划的道路》
摘要: 现有的学习率调度方案中,不需要指定优化停止步骤T的方案被依赖于T的学习率调度方案大大超越。我们提出了一种方法,通过完全避免使用调度,同时在从凸问题到大规模深度学习问题的广泛问题族中展现出与调度相比的最新性能。我们的无调度方法不需要额外的超参数,与带有动量的标准优化器相比。我们的方法是我们开发的统一调度和迭代平均的新理论的直接结果。我们的方法的开源实现可在https://github.com/facebookresearch/schedule_free找到。
更新时间: 2024-05-24 16:20:46
领域: cs.LG,cs.AI,math.OC,stat.ML
Model Cascading for Code: Reducing Inference Costs with Model Cascading for LLM Based Code Generation
The rapid development of large language models (LLMs) has led to significant advancements in code completion tasks. While larger models have higher accuracy, they also cost much more to run. Meanwhile, model cascading has been proven effective at conserving computational resources while enhancing accuracy in LLMs on natural language generation tasks. It generates output with the smallest model in a set, and only queries the larger models when it fails to meet predefined quality criteria. However, this strategy has not been used in code completion tasks, primarily because assessing the quality of code completions differs substantially from assessing natural language, as the former relies heavily on functional correctness. To address this, we propose letting each model generate and execute a set of test cases for their solutions, and use the test results as the cascading threshold. We show that our model cascading strategy reduces computational costs while increasing accuracy compared to generating the output with a single model. We also introduce a heuristic to determine the optimal combination of the number of solutions, test cases, and test lines each model should generate, based on the budget. Compared to speculative decoding, our method works on black-box models, offers the same level of cost-accuracy trade-off, and provides many more choices based on the server's budget. Ours is the first work to optimize the cost-accuracy trade-off for LLM code generation with model cascading.
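The cascade logic itself is simple; below is a self-contained sketch with stub generators standing in for the small and large LLMs and for the test generator (in the paper these are model calls, and test execution would be sandboxed).

```python
from typing import Callable, List

def passes_tests(solution_src: str, tests: List[str]) -> bool:
    """Run model-generated tests against a candidate solution."""
    env: dict = {}
    try:
        exec(solution_src, env)          # define the candidate function
        for t in tests:
            exec(t, env)                 # each test is an assert statement
        return True
    except Exception:
        return False

def cascade(prompt: str, models: List[Callable[[str], str]],
            test_gen: Callable[[str], List[str]]) -> str:
    """Query models smallest-first; escalate only when generated tests fail."""
    tests = test_gen(prompt)
    for model in models:
        solution = model(prompt)
        if passes_tests(solution, tests):
            return solution              # a cheap model was good enough
    return solution                      # fall back to the largest model's answer

# Stubs standing in for small/large LLMs and a test generator:
small = lambda p: "def add(a, b):\n    return a - b\n"      # buggy
large = lambda p: "def add(a, b):\n    return a + b\n"
gen_tests = lambda p: ["assert add(2, 3) == 5", "assert add(-1, 1) == 0"]

print(cascade("write add(a, b)", [small, large], gen_tests))
```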
Updated: 2024-05-24 16:20:04
标题: 代码模型级联:使用基于LLM的代码生成模型级联减少推理成本
摘要: 大型语言模型(LLM)的快速发展已经在代码补全任务方面取得了重大进展。虽然更大的模型具有更高的准确性,但运行成本也更高。同时,模型级联已被证明在自然语言生成任务中可以有效节省计算资源同时提高LLM的准确性。它使用一组中最小的模型生成输出,只有在未达到预定义的质量标准时才查询更大的模型。然而,这种策略尚未在代码补全任务中使用,主要是因为评估代码补全的质量与评估自然语言有很大不同,前者在很大程度上依赖于功能正确性。为了解决这个问题,我们建议让每个模型为其解决方案生成并执行一组测试用例,并将测试结果用作级联阈值。我们展示了与使用单一模型生成输出相比,我们的模型级联策略可以在降低计算成本的同时提高准确性。我们还引入了一种启发式方法,根据预算确定每个模型应生成的解决方案数量、测试用例数量和测试行数的最佳组合。与推测解码相比,我们的方法适用于黑盒模型,在成本-准确性权衡方面处于同一水平,但能根据服务器预算提供更多选择。我们是首个利用模型级联优化LLM代码生成成本-准确性权衡的工作。
更新时间: 2024-05-24 16:20:04
领域: cs.SE,cs.LG
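To make the cascading loop above concrete, here is a minimal Python sketch. The `generate` and `run_tests` callables, candidate counts, and the pass-rate threshold are hypothetical placeholders, not the paper's actual interface.

```python
def cascade_generate(models, prompt, generate, run_tests,
                     n_solutions=3, pass_threshold=1.0):
    """Sketch of test-driven model cascading for code generation.

    models:     LLM handles ordered from cheapest to most expensive
    generate:   callable (model, prompt) -> source-code string
    run_tests:  callable (code, tests) -> fraction of tests passed
    """
    best = None
    for model in models:                          # escalate only on failure
        tests = generate(model, "Write unit tests for: " + prompt)
        for _ in range(n_solutions):
            code = generate(model, prompt)
            rate = run_tests(code, tests)         # functional-correctness signal
            if best is None or rate > best[0]:
                best = (rate, code, model)
            if rate >= pass_threshold:            # predefined quality criterion met:
                return code, model                # no need to query larger models
    return best[1], best[2]                       # fall back to best candidate seen
```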
Bridging The Gap between Low-rank and Orthogonal Adaptation via Householder Reflection Adaptation
While following different technical routes, both low-rank and orthogonal adaptation techniques can efficiently adapt large-scale pre-training models in specific tasks or domains based on a small piece of trainable parameters. In this study, we bridge the gap between these two techniques, proposing a simple but effective adaptation method based on Householder reflections. Given a pre-trained model, our method fine-tunes its layers by multiplying each frozen weight matrix with an orthogonal matrix constructed by a chain of learnable Householder reflections (HRs). This HR-based orthogonal fine-tuning is equivalent to an adaptive low-rank adaptation. Moreover, we show that the orthogonality of the reflection planes corresponding to the HRs impacts the model capacity and regularity. The analysis motivates us to regularize the orthogonality of the HRs, leading to different implementations of the proposed Householder reflection adaptation (HRA) method. Compared with state-of-the-art methods, HRA achieves superior performance with fewer learnable parameters when adapting large language models and conditional image generators. The code is available at https://github.com/DaShenZi721/HRA
Updated: 2024-05-24 16:18:16
标题: 通过Householder反射适应来弥合低秩和正交适应之间的差距
摘要: 尽管遵循不同的技术路线,低秩和正交适应技术都可以基于少量可训练参数,高效地将大规模预训练模型适配到特定任务或领域。在这项研究中,我们弥合了这两种技术之间的差距,提出了一种基于Householder反射的简单但有效的适应方法。给定一个预训练模型,我们的方法通过将每个冻结权重矩阵与由可学习Householder反射(HRs)链构造的正交矩阵相乘来微调其层。这种基于HR的正交微调等价于一种自适应的低秩适应。此外,我们还展示了与HRs对应的反射平面的正交性如何影响模型的容量和正则性。这一分析促使我们对HRs的正交性进行正则化,从而得到所提出的Householder反射适应(HRA)方法的不同实现。与最先进的方法相比,HRA在适配大型语言模型和条件图像生成器时以更少的可学习参数实现了更优的性能。代码可在https://github.com/DaShenZi721/HRA上找到。
更新时间: 2024-05-24 16:18:16
领域: cs.LG,cs.CV
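The core construction described above (a frozen weight matrix multiplied by a chain of learnable Householder reflections) can be sketched in a few lines of NumPy; the rank r = 4 and the matrix shapes below are illustrative choices, not the paper's configuration.

```python
import numpy as np

def householder_chain(us):
    """Orthogonal matrix built from a chain of Householder reflections.
    us: array of shape (r, d); each row is one learnable reflection vector."""
    d = us.shape[1]
    H = np.eye(d)
    for u in us:
        u = u / np.linalg.norm(u)                 # unit normal of reflection plane
        H = H @ (np.eye(d) - 2.0 * np.outer(u, u))
    return H

rng = np.random.default_rng(0)
W_frozen = rng.normal(size=(16, 8))               # frozen pre-trained weight
us = rng.normal(size=(4, 8))                      # r=4 trainable vectors (the adapter)
W_adapted = W_frozen @ householder_chain(us)      # orthogonal fine-tuning
# The adapter stores only r*d numbers, and W_adapted - W_frozen has rank at
# most r, which is the low-rank view the paper connects to.
assert np.allclose(householder_chain(us) @ householder_chain(us).T,
                   np.eye(8), atol=1e-8)
```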
Toward TransfORmers: Revolutionizing the Solution of Mixed Integer Programs with Transformers
In this study, we introduce an innovative deep learning framework that employs a transformer model to address the challenges of mixed-integer programs, specifically focusing on the Capacitated Lot Sizing Problem (CLSP). Our approach, to our knowledge, is the first to utilize transformers to predict the binary variables of a mixed-integer programming (MIP) problem. Specifically, our approach harnesses the encoder-decoder transformer's ability to process sequential data, making it well-suited for predicting binary variables indicating production setup decisions in each period of the CLSP. This problem is inherently dynamic, and we need to handle sequential decision making under constraints. We present an efficient algorithm in which CLSP solutions are learned through a transformer neural network. The proposed post-processed transformer algorithm surpasses the state-of-the-art solver CPLEX and a Long Short-Term Memory (LSTM) baseline in solution time, optimality gap, and percent infeasibility over the 240K benchmark CLSP instances tested. Once the ML model is trained, conducting inference on it reduces the MIP to a linear program (LP). This transforms the ML-based algorithm, combined with an LP solver, into a polynomial-time approximation algorithm for a well-known NP-hard problem, with almost perfect solution quality.
Updated: 2024-05-24 16:17:43
标题: 走向TransfORmers:用Transformers彻底改变混合整数规划问题的解决方式
摘要: 在这项研究中,我们引入了一种创新的深度学习框架,采用Transformer模型来解决混合整数规划的挑战,特别是专注于容量限制批量生产问题(CLSP)。据我们所知,我们的方法是第一个利用Transformer来预测混合整数规划(MIP)问题二进制变量的方法。具体来说,我们的方法利用了编码器-解码器Transformer处理序列数据的能力,使其适合于预测CLSP每个时间段中指示生产设置决策的二进制变量。这个问题本质上是动态的,我们需要在约束条件下处理顺序决策。我们提出了一种高效的算法,通过Transformer神经网络学习CLSP的解决方案。在测试的24万个基准CLSP实例上,所提出的后处理Transformer算法在求解时间、最优性差距和不可行率方面超越了最先进的求解器CPLEX和长短期记忆网络(LSTM)。ML模型训练完成后,对模型进行推理会将MIP化简为线性规划(LP)。这使得基于ML的算法与LP求解器相结合,成为一个求解著名NP难问题的多项式时间近似算法,且解的质量几乎完美。
更新时间: 2024-05-24 16:17:43
领域: cs.AI,cs.LG,math.CO,math.OC,stat.ML
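A rough sketch of the post-processing step described above: once the model has predicted the binary setup decisions, the remaining CLSP reduces to an LP that an off-the-shelf solver handles. The single-item formulation, cost coefficients, and capacity below are illustrative assumptions, not the paper's exact model (fixed setup costs of the chosen y are a constant and are omitted).

```python
import numpy as np
from scipy.optimize import linprog

def clsp_lp_given_setups(y, demand, prod_cost=1.0, hold_cost=0.2, cap=100.0):
    """Once the binary setup decisions y[t] are fixed (e.g. predicted by the
    transformer), the CLSP collapses to an LP in production x and inventory s:
      min  sum_t prod_cost*x_t + hold_cost*s_t
      s.t. s_{t-1} + x_t - s_t = d_t,   0 <= x_t <= cap * y_t,   s_t >= 0."""
    T = len(demand)
    c = np.concatenate([np.full(T, prod_cost), np.full(T, hold_cost)])
    A_eq = np.zeros((T, 2 * T)); b_eq = np.asarray(demand, float)
    for t in range(T):
        A_eq[t, t] = 1.0                 # x_t
        A_eq[t, T + t] = -1.0            # -s_t
        if t > 0:
            A_eq[t, T + t - 1] = 1.0     # +s_{t-1}   (s_0 = 0)
    bounds = [(0.0, cap * y[t]) for t in range(T)] + [(0.0, None)] * T
    return linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds)

y_pred = [1, 0, 1, 0]                    # e.g. output of the trained model
print(clsp_lp_given_setups(y_pred, demand=[3, 2, 4, 1]).fun)
```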
Taming Score-Based Diffusion Priors for Infinite-Dimensional Nonlinear Inverse Problems
This work introduces a sampling method capable of solving Bayesian inverse problems in function space. It does not assume the log-concavity of the likelihood, meaning that it is compatible with nonlinear inverse problems. The method leverages the recently defined infinite-dimensional score-based diffusion models as a learning-based prior, while enabling provable posterior sampling through a Langevin-type MCMC algorithm defined on function spaces. A novel convergence analysis is conducted, inspired by the fixed-point methods established for traditional regularization-by-denoising algorithms and compatible with weighted annealing. The obtained convergence bound explicitly depends on the approximation error of the score; a well-approximated score is essential to obtain a well-approximated posterior. Stylized and PDE-based examples are provided, demonstrating the validity of our convergence analysis. We conclude by presenting a discussion of the method's challenges related to learning the score and computational complexity.
Updated: 2024-05-24 16:17:01
标题: 驯化基于分数的扩散先验用于无限维非线性反问题
摘要: 这项工作介绍了一种在函数空间中解决贝叶斯逆问题的抽样方法。它不假设似然函数的对数凹性,这意味着它与非线性逆问题兼容。该方法利用了最近定义的无限维度基于得分的扩散模型作为学习先验,同时通过在函数空间上定义的 Langevin 类型 MCMC 算法实现可证明的后验抽样。受传统正则化-去噪算法建立的固定点方法的启发,进行了新颖的收敛分析,并与加权退火兼容。获得的收敛界明确取决于得分的逼近误差;一个良好逼近的得分对于获得良好逼近的后验是至关重要的。提供了样式化和基于 PDE 的示例,展示了我们收敛分析的有效性。最后,我们讨论了与学习得分和计算复杂性相关的方法挑战。
更新时间: 2024-05-24 16:17:01
领域: stat.ML,cs.LG,cs.NA,math.NA,62F15, 65N21, 68Q32, 60Hxx, 60Jxx, 65C05, 82C31
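A simplified finite-dimensional sketch of the Langevin-type sampler described above, with the learned score acting as the prior term of the posterior drift. The Gaussian likelihood, fixed step size, and finite-difference gradient are our own simplifications; the paper works in function space with weighted annealing.

```python
import numpy as np

def numerical_grad(f, x, eps=1e-5):
    """Central finite-difference gradient, fine for small toy dimensions."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x); e.flat[i] = eps
        g.flat[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

def langevin_posterior_sampler(score_prior, forward_op, y_obs, x0,
                               noise_std=0.1, step=1e-3, n_steps=5000, rng=None):
    """score_prior: callable x -> approx. grad log p(x) (the trained score)
    forward_op:  map G; the likelihood is Gaussian around G(x)."""
    rng = rng or np.random.default_rng(0)
    x = x0.copy()
    for _ in range(n_steps):
        log_lik = lambda z: -0.5 * np.sum((y_obs - forward_op(z)) ** 2) / noise_std**2
        drift = numerical_grad(log_lik, x) + score_prior(x)   # grad log posterior
        x = x + step * drift + np.sqrt(2 * step) * rng.standard_normal(x.shape)
    return x

# toy usage: identity forward map, standard-normal prior (score(x) = -x)
print(langevin_posterior_sampler(lambda z: -z, lambda z: z,
                                 y_obs=np.array([1.0, -0.5]), x0=np.zeros(2)))
```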
Data-driven Semi-supervised Machine Learning with Surrogate Safety Measures for Abnormal Driving Behavior Detection
Detecting abnormal driving behavior is critical for road traffic safety and the evaluation of drivers' behavior. With the advancement of machine learning (ML) algorithms and the accumulation of naturalistic driving data, many ML models have been adopted for abnormal driving behavior detection. Most existing ML-based detectors rely on (fully) supervised ML methods, which require substantial labeled data. However, ground truth labels are not always available in the real world, and labeling large amounts of data is tedious. Thus, there is a need to explore unsupervised or semi-supervised methods to make the anomaly detection process more feasible and efficient. To fill this research gap, this study analyzes large-scale real-world data revealing several abnormal driving behaviors (e.g., sudden acceleration, rapid lane-changing) and develops a Hierarchical Extreme Learning Machines (HELM) based semi-supervised ML method using partly labeled data to accurately detect the identified abnormal driving behaviors. Moreover, previous ML-based approaches predominantly utilize basic vehicle motion features (such as velocity and acceleration) to label and detect abnormal driving behaviors, while this study seeks to introduce Surrogate Safety Measures (SSMs) as the input features for ML models to improve the detection performance. Results from extensive experiments demonstrate the effectiveness of the proposed semi-supervised ML model with the introduced SSMs serving as important features. The proposed semi-supervised ML method outperforms other baseline semi-supervised or unsupervised methods regarding various metrics, e.g., delivering the best accuracy at 99.58% and the best F-1 measure at 0.9913. The ablation study further highlights the significance of SSMs for advancing detection performance.
Updated: 2024-05-24 16:16:46
标题: 基于数据驱动的半监督机器学习,在异常驾驶行为检测中采用替代安全度量
摘要: 检测异常驾驶行为对道路交通安全和驾驶行为评估至关重要。随着机器学习(ML)算法的进步和自然驾驶数据的积累,许多ML模型已被用于异常驾驶行为检测。大多数现有的基于ML的检测器依赖于(完全)监督的ML方法,这些方法需要大量标记数据。然而,在现实世界中并不总是有真实标签(ground truth)可用,而标记大量数据非常繁琐。因此,有必要探索无监督或半监督方法,以使异常检测过程更具可行性和效率。为了填补这一研究空白,本研究分析了大规模真实数据,揭示了几种异常驾驶行为(如突然加速、快速变道),并开发了一种基于分层极限学习机(HELM)的半监督ML方法,利用部分标记的数据来准确检测已识别的异常驾驶行为。此外,先前的基于ML的方法主要利用基本车辆运动特征(如速度和加速度)来标记和检测异常驾驶行为,而本研究旨在引入替代安全度量(SSMs)作为ML模型的输入特征,以提高检测性能。大量实验的结果证明了所提出的半监督ML模型的有效性,其中引入的SSMs是重要特征。所提出的半监督ML方法在各种指标上优于其他基线半监督或无监督方法,例如取得了最高99.58%的准确率和最高0.9913的F-1值。消融研究进一步凸显了SSMs对提升检测性能的重要性。
更新时间: 2024-05-24 16:16:46
领域: cs.LG,cs.AI,eess.SP,stat.OT
Signal Processing Meets SGD: From Momentum to Filter
In deep learning, stochastic gradient descent (SGD) and its momentum-based variants are widely used for optimization, but they typically suffer from slow convergence. Conversely, existing adaptive learning rate optimizers speed up convergence but often compromise generalization. To resolve this issue, we propose a novel optimization method designed to accelerate SGD's convergence without sacrificing generalization. Our approach reduces the variance of the historical gradient, improves first-order moment estimation of SGD by applying Wiener filter theory, and introduces a time-varying adaptive gain. Empirical results demonstrate that SGDF (SGD with Filter) effectively balances convergence and generalization compared to state-of-the-art optimizers.
Updated: 2024-05-24 16:14:37
标题: 信号处理遇见随机梯度下降:从动量到滤波器
摘要: 在深度学习中,随机梯度下降(SGD)及其基于动量的变种被广泛用于优化,但通常收敛较慢。相反,现有的自适应学习率优化器加快了收敛速度,但往往会牺牲泛化能力。为解决这一问题,我们提出了一种新颖的优化方法,旨在加速SGD的收敛而不损害泛化能力。我们的方法通过应用维纳滤波器理论降低了历史梯度的方差,改善了SGD的一阶矩估计,并引入了一个时间变化的自适应增益。实证结果表明,与最先进的优化器相比,SGDF(带滤波器的SGD)有效地平衡了收敛和泛化能力。
更新时间: 2024-05-24 16:14:37
领域: cs.LG,eess.SP
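The following sketch illustrates the general idea of filtering the gradient's first moment with a time-varying gain. The specific variance estimates and gain formula here are our assumptions for illustration; they are not the paper's Wiener-filter derivation.

```python
import numpy as np

def sgdf_step(x, grad, state, lr=0.01, beta=0.9, eps=1e-8):
    """One step of SGD with a filtered gradient (sketch of the idea only).
    The first moment m tracks the signal; a time-varying gain k blends the
    raw gradient with m according to an estimate of the noise variance."""
    g = grad(x)
    m = beta * state["m"] + (1 - beta) * g             # first-moment estimate
    v = beta * state["v"] + (1 - beta) * (g - m) ** 2  # residual (noise) variance
    s = beta * state["s"] + (1 - beta) * m ** 2        # signal energy estimate
    k = s / (s + v + eps)                              # Wiener-style gain in [0,1]
    g_hat = m + k * (g - m)                            # filtered gradient
    state.update(m=m, v=v, s=s)
    return x - lr * g_hat, state

state = {"m": 0.0, "v": 0.0, "s": 0.0}
x = 5.0
for _ in range(200):
    x, state = sgdf_step(x, lambda z: 2 * z, state)    # minimize f(x) = x^2
print(x)
```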
GES: Generalized Exponential Splatting for Efficient Radiance Field Rendering
Advancements in 3D Gaussian Splatting have significantly accelerated 3D reconstruction and generation. However, it may require a large number of Gaussians, which creates a substantial memory footprint. This paper introduces GES (Generalized Exponential Splatting), a novel representation that employs Generalized Exponential Function (GEF) to model 3D scenes, requiring far fewer particles to represent a scene and thus significantly outperforming Gaussian Splatting methods in efficiency with a plug-and-play replacement ability for Gaussian-based utilities. GES is validated theoretically and empirically in both principled 1D setup and realistic 3D scenes. It is shown to represent signals with sharp edges more accurately, which are typically challenging for Gaussians due to their inherent low-pass characteristics. Our empirical analysis demonstrates that GEF outperforms Gaussians in fitting natural-occurring signals (e.g. squares, triangles, and parabolic signals), thereby reducing the need for extensive splitting operations that increase the memory footprint of Gaussian Splatting. With the aid of a frequency-modulated loss, GES achieves competitive performance in novel-view synthesis benchmarks while requiring less than half the memory storage of Gaussian Splatting and increasing the rendering speed by up to 39%. The code is available on the project website https://abdullahamdi.com/ges .
Updated: 2024-05-24 16:13:43
标题: GES:用于高效辐射场渲染的广义指数泼溅
摘要: 3D高斯泼溅(Gaussian Splatting)技术的进步显著加快了3D重建和生成的速度。然而,这可能需要大量的高斯基元,导致可观的内存占用。本文介绍了GES(广义指数泼溅),这是一种新颖的表示方法,采用广义指数函数(GEF)来建模3D场景,只需远少于高斯泼溅的粒子数即可表示一个场景,因此在效率上明显优于高斯泼溅方法,并可即插即用地替换基于高斯的工具。GES在原则性的一维设定和真实三维场景中均得到了理论与实证验证。它被证明能更准确地表示具有尖锐边缘的信号,而这对高斯函数来说通常具有挑战性,因为其具有固有的低通特性。我们的实证分析表明,在拟合自然出现的信号(例如方波、三角波和抛物线信号)时,GEF优于高斯函数,从而减少了会增加高斯泼溅内存占用的大量分裂操作的需求。借助频率调制损失,GES在新视角合成基准测试中实现了有竞争力的性能,同时所需内存存储不到高斯泼溅的一半,并将渲染速度提高了最多39%。代码可在项目网站https://abdullahamdi.com/ges上找到。
更新时间: 2024-05-24 16:13:43
领域: cs.CV,cs.GR,cs.LG
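The Generalized Exponential Function at the heart of GES is easy to state and probe numerically. This tiny sketch shows how raising the shape parameter sharpens edges that a Gaussian profile (shape 2) smooths over; normalization and the 3D splatting machinery are omitted.

```python
import numpy as np

def gef(x, mu=0.0, alpha=1.0, beta=2.0):
    """Generalized Exponential Function: exp(-(|x - mu| / alpha) ** beta).
    beta = 2 recovers a Gaussian profile; large beta approaches a box,
    so a single GEF particle can carry a sharp edge."""
    return np.exp(-np.abs((x - mu) / alpha) ** beta)

x = np.linspace(-3, 3, 601)
square = (np.abs(x) < 1).astype(float)            # sharp-edged target signal
for b in (2.0, 4.0, 10.0):
    err = np.mean((gef(x, alpha=1.0, beta=b) - square) ** 2)
    print(f"beta={b:4.1f}  MSE vs. square: {err:.4f}")   # error shrinks with beta
```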
FedAWARE: Maximizing Gradient Diversity for Heterogeneous Federated Server-side Optimization
Federated learning (FL) is a distributed learning framework where numerous clients collaborate with a central server to train a model without sharing local data. However, standard federated optimization in real-world applications faces both statistical and system heterogeneity challenges, which result in unfavorable convergence behavior. Previous works attempted to modify the local training process (client-side) to tackle heterogeneity challenges. However, they ignored that server-side updates can coordinate the diverse local updates efficiently. This work explores the effect of server-side updates against heterogeneity issues. We first present findings on the gradient diversity maximization direction, suggesting that the global model should move continuously in this direction for fast and stable convergence. Then, we derive a novel server-side optimizer \textsc{FedAWARE} with rigorous convergence analysis for general non-convex settings. Our extensive experiments across multiple heterogeneous federated settings using four datasets showcase that \textsc{FedAWARE} achieves competitive convergence performance in comparison to state-of-the-art adaptive federated optimizers. Furthermore, our results show that \textsc{FedAWARE} can enhance the performance of FL algorithms as a plug-in module. Our source code is available at \url{https://github.com/dunzeng/FedAWARE}.
Updated: 2024-05-24 16:13:22
标题: FedAWARE:最大化梯度多样性以实现异构联邦服务器端优化
摘要: 联邦学习(FL)是一种分布式学习框架,其中许多客户端与中央服务器合作训练模型,而无需共享本地数据。然而,在现实世界的应用中,标准的联邦优化面临统计和系统异质性挑战,导致不利的收敛行为。先前的研究尝试修改本地训练过程(客户端侧)以解决异质性挑战,但忽视了服务器端的更新可以有效协调各种本地更新。本研究探讨了服务器端更新对异质性问题的影响。我们首先给出关于梯度多样性最大化方向的发现,表明全局模型沿该方向持续移动可实现快速而稳定的收敛。然后,我们推导出一种新颖的服务器端优化器FedAWARE,并在一般非凸设置下给出了严格的收敛分析。我们在四个数据集上对多种异构联邦设置进行的大量实验表明,与最先进的自适应联邦优化器相比,FedAWARE取得了有竞争力的收敛性能。此外,我们的结果表明,FedAWARE可以作为插件模块提升FL算法的性能。我们的源代码可在https://github.com/dunzeng/FedAWARE 上找到。
更新时间: 2024-05-24 16:13:22
领域: cs.LG
Consistency of Neural Causal Partial Identification
Recent progress in Neural Causal Models (NCMs) showcased how identification and partial identification of causal effects can be automatically carried out via training of neural generative models that respect the constraints encoded in a given causal graph [Xia et al. 2022, Balazadeh et al. 2022]. However, formal consistency of these methods has only been proven for the case of discrete variables or only for linear causal models. In this work, we prove consistency of partial identification via NCMs in a general setting with both continuous and categorical variables. Further, our results highlight the impact of the design of the underlying neural network architecture in terms of depth and connectivity as well as the importance of applying Lipschitz regularization in the training phase. In particular, we provide a counterexample showing that without Lipschitz regularization the NCM may not be asymptotically consistent. Our results are enabled by new results on the approximability of structural causal models via neural generative models, together with an analysis of the sample complexity of the resulting architectures and how that translates into an error in the constrained optimization problem that defines the partial identification bounds.
Updated: 2024-05-24 16:12:39
标题: 神经因果部分识别的一致性
摘要: 最近在神经因果模型(NCMs)领域取得的进展展示了如何通过训练遵循给定因果图中编码的约束的神经生成模型来自动执行因果效应的识别和部分识别[Xia等人,2022年,Balazadeh等人,2022年]。然而,这些方法的形式一致性仅在离散变量的情况下或仅针对线性因果模型的情况下才被证明。在这项工作中,我们证明了在具有连续和分类变量的一般设置中通过NCMs进行部分识别的一致性。此外,我们的结果突出了底层神经网络架构设计的深度和连接性以及在训练阶段应用Lipschitz正则化的重要性。特别地,我们提供了一个反例,表明在没有Lipschitz正则化的情况下,NCM可能不会渐近一致。我们的结果得益于关于结构因果模型通过神经生成模型的可近似性的新结果,以及对所得架构的样本复杂性的分析以及如何将其转化为定义部分识别界限的受限优化问题中的误差。
更新时间: 2024-05-24 16:12:39
领域: cs.LG,cs.AI,stat.ML
Coordinated Disclosure for AI: Beyond Security Vulnerabilities
Harm reporting in the field of Artificial Intelligence (AI) currently operates on an ad hoc basis, lacking a structured process for disclosing or addressing algorithmic flaws. In contrast, the Coordinated Vulnerability Disclosure (CVD) ethos and ecosystem play a pivotal role in software security and transparency. Globally, there are ongoing efforts to establish frameworks that promote transparency and collaboration in addressing AI-related issues, though challenges persist. Algorithmic flaws in machine learning (ML) models present distinct challenges compared to traditional software vulnerabilities, warranting a specialized approach. To address this gap, we propose the implementation of a dedicated Coordinated Flaw Disclosure (CFD) framework tailored to the intricacies of machine learning and artificial intelligence issues. This paper delves into the historical landscape of disclosures in ML, encompassing the ad hoc reporting of harms and the emergence of participatory auditing. By juxtaposing these practices with the well-established disclosure norms in cybersecurity, we argue that the broader adoption of CFD has the potential to enhance public trust through transparent processes that carefully balance the interests of both organizations and the community.
Updated: 2024-05-24 16:08:34
标题: 人工智能的协调披露:超越安全漏洞
摘要: 在人工智能(AI)领域,危害报告目前以临时方式运作,缺乏披露或处理算法缺陷的结构化流程。相比之下,协调漏洞披露(CVD)的理念和生态系统在软件安全和透明度方面发挥着关键作用。全球范围内正在努力建立促进透明和协作解决AI相关问题的框架,尽管挑战依然存在。机器学习(ML)模型中的算法缺陷与传统软件漏洞相比存在独特的挑战,需要一种专门的方法。为了填补这一空白,我们提出实施一个专门针对机器学习和人工智能问题复杂性的协调缺陷披露(Coordinated Flaw Disclosure,CFD)框架。本文深入探讨了ML领域披露的历史脉络,包括危害的临时报告和参与式审计的出现。通过将这些实践与网络安全中成熟的披露规范进行对比,我们认为更广泛地采用CFD有潜力通过审慎平衡组织与社区利益的透明流程来增强公众信任。
更新时间: 2024-05-24 16:08:34
领域: cs.AI,cs.CR,cs.CY
Optimal Algorithms for Online Convex Optimization with Adversarial Constraints
A well-studied generalization of the standard online convex optimization (OCO) is constrained online convex optimization (COCO). In COCO, on every round, a convex cost function and a convex constraint function are revealed to the learner after the action for that round is chosen. The objective is to design an online policy that simultaneously achieves a small regret while ensuring a small cumulative constraint violation (CCV) against an adaptive adversary interacting over a horizon of length $T$. A long-standing open question in COCO is whether an online policy can simultaneously achieve $O(\sqrt{T})$ regret and $O(\sqrt{T})$ CCV without any restrictive assumptions. For the first time, we answer this in the affirmative and show that an online policy can simultaneously achieve $O(\sqrt{T})$ regret and $\tilde{O}(\sqrt{T})$ CCV. Furthermore, in the case of strongly convex cost and convex constraint functions, the regret guarantee can be improved to $O(\log T)$ while keeping the CCV bound the same as above. We establish these results by effectively combining the adaptive regret bound of the AdaGrad algorithm with Lyapunov optimization - a classic tool from control theory. Surprisingly, the analysis is short and elegant.
Updated: 2024-05-24 16:07:26
标题: 对抗性约束下在线凸优化的最优算法
摘要: 受约束的在线凸优化(COCO)是标准在线凸优化(OCO)的一个被广泛研究的推广。在COCO中,每一轮在选择了该轮行动后,会向学习者公布一个凸成本函数和一个凸约束函数。目标是设计一种在线策略,在与自适应对手进行长度为$T$的交互过程中,同时实现较小的遗憾和较小的累积约束违反(CCV)。COCO中一个长期悬而未决的开放问题是:在不附加任何限制性假设的情况下,在线策略能否同时实现$O(\sqrt{T})$遗憾和$O(\sqrt{T})$ CCV。我们首次肯定地回答了这个问题,并展示了在线策略可以同时实现$O(\sqrt{T})$遗憾和$\tilde{O}(\sqrt{T})$ CCV。此外,在成本函数强凸、约束函数为凸的情况下,遗憾保证可以提高到$O(\log T)$,同时保持与上述相同的CCV界。我们通过将AdaGrad算法的自适应遗憾界与Lyapunov优化(控制理论中的经典工具)有效结合来建立这些结果。令人惊讶的是,分析简洁而优雅。
更新时间: 2024-05-24 16:07:26
领域: cs.LG,math.OC
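For intuition, here is a generic primal-dual template consistent with the abstract's combination of an AdaGrad-style adaptive step with a Lyapunov virtual queue. This is a standard template only; the paper's exact algorithm, constants, and analysis differ.

```python
import numpy as np

def coco_primal_dual(f_grads, g_funcs, g_grads, x0, T, radius=1.0):
    """Virtual-queue + adaptive-step sketch for constrained OCO.
    f_grads[t], g_grads[t]: gradients of round-t cost and constraint at x;
    g_funcs[t]: the constraint value, whose positive part drives the CCV."""
    x, Q, G2 = x0.copy(), 0.0, 1e-8
    for t in range(T):
        grad = f_grads[t](x) + Q * g_grads[t](x)   # Lagrangian-style direction
        G2 += np.sum(grad ** 2)                    # AdaGrad accumulator
        x = x - grad / np.sqrt(G2)                 # adaptive step size
        n = np.linalg.norm(x)
        if n > radius:                             # project back to the domain
            x = x * (radius / n)
        Q = max(0.0, Q + g_funcs[t](x))            # virtual queue tracks violation
    return x
```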
Mirage: An RNS-Based Photonic Accelerator for DNN Training
Photonic computing is a compelling avenue for performing highly efficient matrix multiplication, a crucial operation in Deep Neural Networks (DNNs). While this method has shown great success in DNN inference, meeting the high precision demands of DNN training proves challenging due to the precision limitations imposed by costly data converters and the analog noise inherent in photonic hardware. This paper proposes Mirage, a photonic DNN training accelerator that overcomes the precision challenges in photonic hardware using the Residue Number System (RNS). RNS is a numeral system based on modular arithmetic, allowing us to perform high-precision operations via multiple low-precision modular operations. In this work, we present a novel micro-architecture and dataflow for an RNS-based photonic tensor core performing modular arithmetic in the analog domain. By combining RNS and photonics, Mirage provides high energy efficiency without compromising precision and can successfully train state-of-the-art DNNs achieving accuracy comparable to FP32 training. Our study shows that on average across several DNNs when compared to systolic arrays, Mirage achieves more than $23.8\times$ faster training and $32.1\times$ lower EDP in an iso-energy scenario and consumes $42.8\times$ lower power with comparable or better EDP in an iso-area scenario.
Updated: 2024-05-24 16:06:53
标题: Mirage:基于RNS的光子加速器用于DNN训练
摘要: 光子计算是执行高效矩阵乘法的一个引人注目的途径,而矩阵乘法是深度神经网络(DNNs)中的关键操作。虽然这种方法在DNN推断中取得了巨大成功,但由于昂贵的数据转换器和光子硬件中固有的模拟噪声所施加的精度限制,满足DNN训练的高精度要求仍然具有挑战性。本文提出了Mirage,一种光子DNN训练加速器,通过使用余数系统(RNS)克服了光子硬件中的精度挑战。RNS是一种基于模运算的数字系统,使我们能够通过多个低精度模运算实现高精度运算。在这项工作中,我们提出了一种基于RNS的光子张量核的新型微体系结构和数据流,可以在模拟域中进行模运算。通过结合RNS和光子学,Mirage在不牺牲精度的前提下提供了高能效,并且可以成功训练最先进的DNN,达到与FP32训练相当的准确性。我们的研究表明,在多个DNN上平均而言,与脉动阵列(systolic arrays)相比,Mirage在等能量情形下实现了超过23.8倍的训练加速和32.1倍更低的EDP,在等面积情形下功耗降低42.8倍,且EDP相当或更优。
更新时间: 2024-05-24 16:06:53
领域: cs.AR,cs.AI,cs.LG
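The Residue Number System trick that Mirage builds on is standard modular arithmetic and can be sketched exactly: one high-precision value becomes several low-precision residues, products are computed channel-wise, and the Chinese Remainder Theorem reconstructs the result. The specific moduli below are illustrative, not the ones used in the paper's hardware.

```python
from math import prod

MODULI = (251, 241, 239)                 # pairwise-coprime moduli
M = prod(MODULI)                         # dynamic range: 251*241*239 = 14,457,349

def to_rns(x):                           # one high-precision value ->
    return tuple(x % m for m in MODULI)  # several low-precision residues

def rns_mul(a, b):                       # multiply channel-wise: each product
    return tuple((x * y) % m for x, y, m in zip(a, b, MODULI))

def from_rns(r):                         # Chinese Remainder Theorem reconstruction
    x = 0
    for ri, m in zip(r, MODULI):
        Mi = M // m
        x += ri * Mi * pow(Mi, -1, m)    # pow(., -1, m): modular inverse
    return x % M

a, b = 3141, 2718
assert from_rns(rns_mul(to_rns(a), to_rns(b))) == (a * b) % M
```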
Learning the Language of Protein Structure
Representation learning and \emph{de novo} generation of proteins are pivotal computational biology tasks. Whilst natural language processing (NLP) techniques have proven highly effective for protein sequence modelling, structure modelling presents a complex challenge, primarily due to its continuous and three-dimensional nature. Motivated by this discrepancy, we introduce an approach using a vector-quantized autoencoder that effectively tokenizes protein structures into discrete representations. This method transforms the continuous, complex space of protein structures into a manageable, discrete format with a codebook ranging from 4096 to 64000 tokens, achieving high-fidelity reconstructions with backbone root mean square deviations (RMSD) of approximately 1-5 \AA. To demonstrate the efficacy of our learned representations, we show that a simple GPT model trained on our codebooks can generate novel, diverse, and designable protein structures. Our approach not only provides representations of protein structure, but also mitigates the challenges of disparate modal representations and sets a foundation for seamless, multi-modal integration, enhancing the capabilities of computational methods in protein design.
Updated: 2024-05-24 16:03:47
标题: 学习蛋白质结构的语言
摘要: 表征学习和蛋白质的\emph{de novo}生成是关键的计算生物学任务。尽管自然语言处理(NLP)技术已被证明对蛋白质序列建模非常有效,但结构建模面临着复杂的挑战,主要是由于其连续和三维的特性。受这种差异的启发,我们引入了一种使用矢量量化自动编码器的方法,有效地将蛋白质结构标记化为离散表示。这种方法将蛋白质结构的连续、复杂空间转换为可管理的离散格式,码本大小从4096到64000个标记不等,实现了主干均方根偏差(RMSD)约为1-5 \AA 的高保真重构。为了展示所学表示的有效性,我们展示了一个在我们的码本上训练的简单GPT模型可以生成新颖、多样且可设计的蛋白质结构。我们的方法不仅提供了蛋白质结构的表示,还缓解了不同模态表示之间的差异带来的挑战,为无缝的多模态集成奠定了基础,增强了计算方法在蛋白质设计中的能力。
更新时间: 2024-05-24 16:03:47
领域: q-bio.QM,cs.LG
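The tokenization step described above (nearest-codebook lookup in a vector-quantized bottleneck) can be sketched as follows. The codebook size of 4096 matches the low end of the paper's stated range, while the encoder, decoder, and feature dimensions are placeholders.

```python
import numpy as np

def quantize(features, codebook):
    """Nearest-neighbour lookup of a VQ autoencoder bottleneck:
    continuous per-residue features -> discrete structure tokens."""
    d2 = ((features ** 2).sum(1)[:, None]
          + (codebook ** 2).sum(1)[None, :]
          - 2.0 * features @ codebook.T)   # squared Euclidean distances
    tokens = d2.argmin(axis=1)             # token id = index of closest code
    return tokens, codebook[tokens]        # ids and their embeddings

rng = np.random.default_rng(0)
codebook = rng.normal(size=(4096, 64))       # 4096-entry codebook
residue_feats = rng.normal(size=(128, 64))   # encoder output, 128-residue protein
tokens, quantized = quantize(residue_feats, codebook)
print(tokens[:10])                           # a structure as a token sequence
```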
Class Machine Unlearning for Complex Data via Concepts Inference and Data Poisoning
In the current AI era, users may request AI companies to delete their data from the training dataset due to privacy concerns. As a model owner, retraining a model consumes significant computational resources. Therefore, machine unlearning is a newly emerged technology that allows the model owner to delete requested training data or a class with little effect on model performance. However, for large-scale complex data, such as image or text data, unlearning a class from a model leads to inferior performance due to the difficulty of identifying the link between classes and the model. Inaccurate class deletion may lead to over- or under-unlearning. In this paper, to accurately define the unlearning class of complex data, we apply the definition of Concept, rather than an image feature or a token of text data, to represent the semantic information of the unlearning class. This new representation can cut the link between the model and the class, leading to a complete erasure of the impact of a class. To analyze the impact of the concept for complex data, we adopt a Post-hoc Concept Bottleneck Model and Integrated Gradients to precisely identify concepts across different classes. Next, we take advantage of data poisoning with random and targeted labels to propose unlearning methods. We test our methods on both image classification models and large language models (LLMs). The results consistently show that the proposed methods can accurately erase targeted information from models and can largely maintain the performance of the models.
Updated: 2024-05-24 15:59:17
标题: 复杂数据的类机器遗忘:通过概念推断和数据投毒
摘要: 在当前的人工智能时代,出于隐私方面的考虑,用户可能要求人工智能公司从训练数据集中删除他们的数据。作为模型所有者,重新训练一个模型将消耗大量计算资源。因此,机器遗忘是一种新兴技术,允许模型所有者删除被请求的训练数据或某个类别,而对模型性能影响较小。然而,对于大规模复杂数据,如图像或文本数据,从模型中遗忘一个类别会导致性能下降,因为难以识别类别与模型之间的联系。不准确的类别删除可能导致过度遗忘或遗忘不足。在本文中,为了准确定义复杂数据的遗忘类别,我们应用概念(Concept)的定义,而不是图像特征或文本数据的标记,来表示遗忘类别的语义信息。这种新的表征可以切断模型与类别之间的联系,从而完全消除一个类别的影响。为了分析复杂数据中概念的影响,我们采用事后概念瓶颈模型和积分梯度(Integrated Gradients)来精确识别不同类别之间的概念。接下来,我们利用带有随机标签和定向标签的数据投毒来提出遗忘方法。我们在图像分类模型和大型语言模型(LLMs)上测试了我们的方法。结果一致表明,所提出的方法可以准确地从模型中消除目标信息,并在很大程度上保持模型的性能。
更新时间: 2024-05-24 15:59:17
领域: cs.LG
Exposing Image Classifier Shortcuts with Counterfactual Frequency (CoF) Tables
The rise of deep learning in image classification has brought unprecedented accuracy but also highlighted a key issue: the use of 'shortcuts' by models. Such shortcuts are easy-to-learn patterns from the training data that fail to generalise to new data. Examples include the use of a copyright watermark to recognise horses, snowy background to recognise huskies, or ink markings to detect malignant skin lesions. The explainable AI (XAI) community has suggested using instance-level explanations to detect shortcuts without external data, but this requires the examination of many explanations to confirm the presence of such shortcuts, making it a labour-intensive process. To address these challenges, we introduce Counterfactual Frequency (CoF) tables, a novel approach that aggregates instance-based explanations into global insights, and exposes shortcuts. The aggregation implies the need for some semantic concepts to be used in the explanations, which we solve by labelling the segments of an image. We demonstrate the utility of CoF tables across several datasets, revealing the shortcuts learned from them.
Updated: 2024-05-24 15:58:02
标题: 使用反事实频率表暴露图像分类器的捷径
摘要: 图像分类中深度学习的兴起带来了前所未有的准确性,但也突显出一个关键问题:模型使用“捷径”。这些捷径是从训练数据中容易学习的模式,但不能推广到新数据。例如,使用版权水印来识别马,雪地背景来识别哈士奇,或墨迹来检测恶性皮肤病变。可解释人工智能(XAI)社区建议使用实例级解释来检测捷径,而无需外部数据,但这需要检查许多解释来确认这些捷径的存在,使其成为一项劳动密集型过程。为了解决这些挑战,我们引入了反事实频率(CoF)表,这是一种将基于实例的解释聚合成全局见解并暴露捷径的新方法。聚合意味着需要在解释中使用一些语义概念,我们通过标记图像的各个部分来解决这个问题。我们展示了CoF表在多个数据集上的实用性,揭示了从中学习到的捷径。
更新时间: 2024-05-24 15:58:02
领域: cs.CV,cs.AI
HDC: Hierarchical Semantic Decoding with Counting Assistance for Generalized Referring Expression Segmentation
The newly proposed Generalized Referring Expression Segmentation (GRES) amplifies the formulation of classic RES by involving multiple/non-target scenarios. Recent approaches focus on optimizing the last modality-fused feature which is directly utilized for segmentation and object-existence identification. However, the attempt to integrate all-grained information into a single joint representation is impractical in GRES due to the increased complexity of the spatial relationships among instances and deceptive text descriptions. Furthermore, the subsequent binary target justification across all referent scenarios fails to specify their inherent differences, leading to ambiguity in object understanding. To address the weakness, we propose a $\textbf{H}$ierarchical Semantic $\textbf{D}$ecoding with $\textbf{C}$ounting Assistance framework (HDC). It hierarchically transfers complementary modality information across granularities, and then aggregates each well-aligned semantic correspondence for multi-level decoding. Moreover, with complete semantic context modeling, we endow HDC with explicit counting capability to facilitate comprehensive object perception in multiple/single/non-target settings. Experimental results on gRefCOCO, Ref-ZOM, R-RefCOCO, and RefCOCO benchmarks demonstrate the effectiveness and rationality of HDC which outperforms the state-of-the-art GRES methods by a remarkable margin. Code will be available $\href{https://github.com/RobertLuo1/HDC}{here}$.
Updated: 2024-05-24 15:53:59
标题: HDC:具有计数辅助的分层语义解码用于广义指代表达式分割
摘要: 最近提出的广义指称表达分割(GRES)通过引入多目标/无目标场景扩展了经典RES的表述。最近的方法集中于优化直接用于分割和对象存在性识别的最后一层模态融合特征。然而,在GRES中,由于实例之间空间关系的复杂性增加以及具有欺骗性的文本描述,试图将所有粒度的信息整合到单一联合表示中是不切实际的。此外,随后对所有指称场景进行的二元目标判定无法刻画它们的内在差异,导致对象理解上的模糊性。为了解决这一弱点,我们提出了带计数辅助的分层语义解码框架(HDC)。它在不同粒度之间分层传递互补的模态信息,然后聚合每个对齐良好的语义对应以进行多级解码。此外,借助完整的语义上下文建模,我们赋予HDC显式的计数能力,以促进在多目标/单目标/无目标设置中的全面对象感知。在gRefCOCO、Ref-ZOM、R-RefCOCO和RefCOCO基准测试上的实验结果表明了HDC的有效性和合理性,它以显著优势超越了最先进的GRES方法。代码将在此处提供:https://github.com/RobertLuo1/HDC。
更新时间: 2024-05-24 15:53:59
领域: cs.CV,cs.AI
Light Unbalanced Optimal Transport
While the continuous Entropic Optimal Transport (EOT) field has been actively developing in recent years, it became evident that the classic EOT problem is prone to different issues like sensitivity to outliers and imbalance of classes in the source and target measures. This fact inspired the development of solvers that deal with the unbalanced EOT (UEOT) problem $-$ the generalization of EOT allowing for mitigating the mentioned issues by relaxing the marginal constraints. Surprisingly, it turns out that the existing solvers are either based on heuristic principles or heavyweight, with complex optimization objectives involving several neural networks. We address this challenge and propose a novel theoretically-justified, lightweight, unbalanced EOT solver. Our advancement consists of developing a novel view on the optimization of the UEOT problem yielding a tractable, non-minimax optimization objective. We show that, combined with a light parametrization recently proposed in the field, our objective leads to a fast, simple, and effective solver which allows solving the continuous UEOT problem in minutes on CPU. We prove that our solver provides a universal approximation of UEOT solutions and obtain its generalization bounds. We give illustrative examples of the solver's performance.
Updated: 2024-05-24 15:53:23
标题: 轻量级不平衡最优输运
摘要: 尽管近年来连续熵最优输运(EOT)领域一直在积极发展,但经典EOT问题显然容易出现各种问题,如对异常值的敏感性以及源和目标度量中类别的不平衡。这一事实激发了求解不平衡EOT(UEOT)问题的求解器的发展。UEOT是EOT的推广,允许通过放宽边际约束来缓解上述问题。令人惊讶的是,现有的求解器要么基于启发式原则,要么非常重量级,依赖涉及多个神经网络的复杂优化目标。我们应对这一挑战,提出了一种新颖的、有理论依据的轻量级不平衡EOT求解器。我们的进展在于提出了关于UEOT问题优化的新视角,得到了一个可处理的、非极小极大(non-minimax)的优化目标。我们展示了将其与该领域最近提出的轻量级参数化相结合,可得到一个快速、简单、有效的求解器,能在CPU上几分钟内求解连续UEOT问题。我们证明了我们的求解器提供了UEOT解的通用逼近,并得到了其泛化界。我们给出了求解器性能的示例。
更新时间: 2024-05-24 15:53:23
领域: cs.LG
Dual Lagrangian Learning for Conic Optimization
This paper presents Dual Lagrangian Learning (DLL), a principled learning methodology for dual conic optimization proxies. DLL leverages conic duality and the representation power of ML models to provide high-duality, dual-feasible solutions, and therefore valid Lagrangian dual bounds, for linear and nonlinear conic optimization problems. The paper introduces a systematic dual completion procedure, differentiable conic projection layers, and a self-supervised learning framework based on Lagrangian duality. It also provides closed-form dual completion formulae for broad classes of conic problems, which eliminate the need for costly implicit layers. The effectiveness of DLL is demonstrated on linear and nonlinear conic optimization problems. The proposed methodology significantly outperforms a state-of-the-art learning-based method, and achieves 1000x speedups over commercial interior-point solvers with optimality gaps under 0.5\% on average.
Updated: 2024-05-24 15:53:01
标题: 面向锥优化的对偶拉格朗日学习
摘要: 本文介绍了对偶拉格朗日学习(DLL),一种用于对偶锥优化代理的有原则的学习方法。DLL利用锥对偶性和机器学习模型的表示能力,为线性和非线性锥优化问题提供对偶可行解,从而给出有效的拉格朗日对偶界。本文介绍了一种系统的对偶补全过程、可微的锥投影层,以及基于拉格朗日对偶性的自监督学习框架。它还为广泛类别的锥问题提供了封闭形式的对偶补全公式,消除了对昂贵隐式层的需求。DLL的有效性在线性和非线性锥优化问题上得到了验证。所提出的方法显著优于最先进的基于学习的方法,并相比商业内点求解器实现了1000倍加速,平均最优性差距低于0.5\%。
更新时间: 2024-05-24 15:53:01
领域: math.OC,cs.LG
HiddenSpeaker: Generate Imperceptible Unlearnable Audios for Speaker Verification System
In recent years, the remarkable advancements in deep neural networks have brought tremendous convenience. However, the training process of a highly effective model necessitates a substantial quantity of samples, which brings huge potential threats, like unauthorized exploitation with privacy leakage. In response, we propose a framework named HiddenSpeaker, embedding imperceptible perturbations within the training speech samples and rendering them unlearnable for deep-learning-based speaker verification systems that employ large-scale speakers for efficient training. The HiddenSpeaker utilizes a simplified error-minimizing method named Single-Level Error-Minimizing (SLEM) to generate specific and effective perturbations. Additionally, a hybrid objective function is employed for human perceptual optimization, ensuring the perturbation is indistinguishable from human listeners. We conduct extensive experiments on multiple state-of-the-art (SOTA) models in the speaker verification domain to evaluate HiddenSpeaker. Our results demonstrate that HiddenSpeaker not only deceives the model with unlearnable samples but also enhances the imperceptibility of the perturbations, showcasing strong transferability across different models.
Updated: 2024-05-24 15:49:00
标题: HiddenSpeaker:为说话人验证系统生成不可察觉且不可学习的音频
摘要: 近年来,深度神经网络的显著进展带来了巨大便利。然而,高效模型的训练过程需要大量样本,这带来了巨大的潜在威胁,例如伴随隐私泄露的未经授权利用。为此,我们提出了一个名为HiddenSpeaker的框架,将不可察觉的扰动嵌入训练语音样本中,使其对于利用大规模说话人数据进行高效训练的基于深度学习的说话人验证系统而言不可学习。HiddenSpeaker利用一种简化的误差最小化方法,称为单层误差最小化(Single-Level Error-Minimizing,SLEM),来生成特定且有效的扰动。此外,采用混合目标函数进行人类感知优化,确保扰动对人类听众而言无法分辨。我们在说话人验证领域多个最先进(SOTA)模型上进行了大量实验来评估HiddenSpeaker。结果表明,HiddenSpeaker不仅能用不可学习的样本欺骗模型,还能增强扰动的不可察觉性,并在不同模型之间展现出强大的可迁移性。
更新时间: 2024-05-24 15:49:00
领域: cs.SD,cs.LG,eess.AS
$\mathbf{L^2\cdot M = C^2}$ Large Language Models as Covert Channels... a Systematic Analysis
Large Language Models (LLMs) have gained significant popularity in the last few years due to their performance in diverse tasks such as translation, prediction, or content generation. At the same time, the research community has shown that LLMs are susceptible to various attacks but can also improve the security of diverse systems. However, besides enabling more secure systems, how well do open source LLMs behave as covertext distributions to, e.g., facilitate censorship resistant communication? In this paper, we explore the capabilities of open-source LLM-based covert channels. We approach this problem from the experimental side by empirically measuring the security vs. capacity of the open-source LLM model (Llama-7B) to assess how well it performs as a covert channel. Although our results indicate that such channels are not likely to achieve high practical bitrates, which depend on message length and model entropy, we also show that the chance for an adversary to detect covert communication is low. To ensure that our results can be used with the least effort as a general reference, we employ a conceptually simple and concise scheme and only assume public models.
Updated: 2024-05-24 15:47:35
标题: 大型语言模型作为隐蔽通道的系统分析
摘要: 大型语言模型(LLMs)由于在翻译、预测或内容生成等各种任务中的表现,在过去几年中获得了显著的流行度。与此同时,研究界表明LLMs容易受到各种攻击,但也可以提升各种系统的安全性。然而,除了实现更安全的系统之外,开源LLMs作为隐蔽载体文本(covertext)分布的表现如何,例如用于促进抗审查的通信? 在本文中,我们探讨了基于开源LLM的隐蔽通道的能力。我们从实验角度切入,通过实证测量开源LLM模型(Llama-7B)的安全性与容量,评估其作为隐蔽通道的表现。尽管我们的结果表明这类通道不太可能达到较高的实际比特率(这取决于消息长度和模型熵),但我们也表明对手检测到隐蔽通信的几率很低。为确保我们的结果能以最小的代价用作一般参考,我们采用了一个概念上简单而简洁的方案,并且只假设使用公开模型。
更新时间: 2024-05-24 15:47:35
领域: cs.CR
Harnessing Increased Client Participation with Cohort-Parallel Federated Learning
Federated Learning (FL) is a machine learning approach where nodes collaboratively train a global model. As more nodes participate in a round of FL, the effectiveness of individual model updates by nodes also diminishes. In this study, we increase the effectiveness of client updates by dividing the network into smaller partitions, or cohorts. We introduce Cohort-Parallel Federated Learning (CPFL): a novel learning approach where each cohort independently trains a global model using FL, until convergence, and the produced models by each cohort are then unified using one-shot Knowledge Distillation (KD) and a cross-domain, unlabeled dataset. The insight behind CPFL is that smaller, isolated networks converge quicker than in a one-network setting where all nodes participate. Through exhaustive experiments involving realistic traces and non-IID data distributions on the CIFAR-10 and FEMNIST image classification tasks, we investigate the balance between the number of cohorts, model accuracy, training time, and compute and communication resources. Compared to traditional FL, CPFL with four cohorts, non-IID data distribution, and CIFAR-10 yields a 1.9$\times$ reduction in train time and a 1.3$\times$ reduction in resource usage, with a minimal drop in test accuracy.
Updated: 2024-05-24 15:34:09
标题: 利用队列并行联邦学习驾驭不断增加的客户端参与
摘要: 联邦学习(FL)是一种机器学习方法,其中节点协作训练一个全局模型。随着参与一轮FL的节点越来越多,单个节点模型更新的有效性也在减弱。在本研究中,我们通过将网络划分为更小的分区(即队列)来提高客户端更新的有效性。我们引入了队列并行联邦学习(CPFL):一种新颖的学习方法,其中每个队列独立地使用FL训练全局模型直至收敛,然后通过一次性知识蒸馏(KD)和一个跨领域的无标签数据集将各队列产生的模型统一起来。CPFL背后的洞见是,较小的孤立网络比所有节点都参与的单一网络收敛更快。通过在CIFAR-10和FEMNIST图像分类任务上进行涉及真实轨迹和非IID数据分布的详尽实验,我们考察了队列数量、模型准确性、训练时间以及计算和通信资源之间的平衡。与传统FL相比,采用四个队列、非IID数据分布和CIFAR-10的CPFL将训练时间减少了1.9倍,资源使用减少了1.3倍,而测试准确性几乎没有下降。
更新时间: 2024-05-24 15:34:09
领域: cs.LG,cs.DC
Reducing the cost of posterior sampling in linear inverse problems via task-dependent score learning
Score-based diffusion models (SDMs) offer a flexible approach to sample from the posterior distribution in a variety of Bayesian inverse problems. In the literature, the prior score is utilized to sample from the posterior by different methods that require multiple evaluations of the forward mapping in order to generate a single posterior sample. These methods are often designed with the objective of enabling the direct use of the unconditional prior score and, therefore, task-independent training. In this paper, we focus on linear inverse problems, when evaluation of the forward mapping is computationally expensive and frequent posterior sampling is required for new measurement data, such as in medical imaging. We demonstrate that the evaluation of the forward mapping can be entirely bypassed during posterior sample generation. Instead, without introducing any error, the computational effort can be shifted to an offline task of training the score of a specific diffusion-like random process. In particular, the training is task-dependent requiring information about the forward mapping but not about the measurement data. It is shown that the conditional score corresponding to the posterior can be obtained from the auxiliary score by suitable affine transformations. We prove that this observation generalizes to the framework of infinite-dimensional diffusion models introduced recently and provide numerical analysis of the method. Moreover, we validate our findings with numerical experiments.
Updated: 2024-05-24 15:33:27
标题: 通过任务相关性得分学习降低线性逆问题后验采样成本
摘要: 基于得分的扩散模型(SDMs)提供了一种灵活的方法,可以在各种贝叶斯逆问题中从后验分布中抽样。在文献中,先验得分被用于通过不同的方法从后验中抽样,这些方法需要对正向映射进行多次评估才能生成单个后验样本。这些方法通常旨在使无条件先验得分可以被直接使用,因此训练与任务无关。在本文中,我们关注线性逆问题,即正向映射的评估在计算上昂贵、且需要针对新的测量数据频繁进行后验抽样的情形,比如在医学成像中。我们证明了在后验样本生成过程中可以完全绕过对正向映射的评估。相反,在不引入任何误差的情况下,计算工作可以转移到离线任务上,即训练一个特定的类扩散随机过程的得分。特别地,这种训练是任务相关的,需要关于正向映射的信息,而不需要测量数据的信息。我们展示了与后验对应的条件得分可以通过适当的仿射变换从辅助得分中获得。我们证明了这一观察结果可以推广到最近引入的无限维扩散模型框架,并对该方法进行了数值分析。此外,我们通过数值实验验证了我们的发现。
更新时间: 2024-05-24 15:33:27
领域: stat.ML,cs.LG,cs.NA,math.AP,math.NA,math.PR,62F15, 65N21, 68Q32, 60Hxx, 60Jxx
Effective Confidence Region Prediction Using Probability Forecasters
Confidence region prediction is a practically useful extension to the commonly studied pattern recognition problem. Instead of predicting a single label, the constraint is relaxed to allow prediction of a subset of labels given a desired confidence level 1-delta. Ideally, effective region predictions should be (1) well calibrated - predictive regions at confidence level 1-delta should err with relative frequency at most delta and (2) be as narrow (or certain) as possible. We present a simple technique to generate confidence region predictions from conditional probability estimates (probability forecasts). We use this 'conversion' technique to generate confidence region predictions from probability forecasts output by standard machine learning algorithms when tested on 15 multi-class datasets. Our results show that approximately 44% of experiments demonstrate well-calibrated confidence region predictions, with the K-Nearest Neighbour algorithm tending to perform consistently well across all data. Our results illustrate the practical benefits of effective confidence region prediction with respect to medical diagnostics, where guarantees of capturing the true disease label can be given.
Updated: 2024-05-24 15:33:08
标题: 利用概率预测器有效地预测置信区域
摘要: 置信区域预测是对常见的模式识别问题的一种实用扩展。与预测单个标签不同,该约束被放宽为允许在给定期望置信水平1-delta下预测一个标签子集。理想情况下,有效的区域预测应该:(1)校准良好,即置信水平1-delta下的预测区域的相对出错频率应不超过delta;(2)尽可能狭窄(或确定)。我们提出了一种简单的技术,从条件概率估计(概率预测)中生成置信区域预测。我们使用这种“转换”技术,从标准机器学习算法在15个多类数据集上测试时输出的概率预测中生成置信区域预测。我们的结果显示,大约44%的实验展现出校准良好的置信区域预测,其中K-最近邻算法在所有数据上表现得一致良好。我们的结果说明了有效的置信区域预测在医学诊断方面的实际益处,在那里可以给出捕获真实疾病标签的保证。
更新时间: 2024-05-24 15:33:08
领域: cs.LG,cs.AI
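One natural "conversion" from a probability forecast to a confidence region, consistent with the abstract's description, is to admit labels in decreasing probability order until the accumulated mass reaches 1-delta; whether this matches the paper's exact rule is an assumption.

```python
import numpy as np

def confidence_region(probs, delta=0.05):
    """Greedily add labels, most probable first, until mass >= 1 - delta.
    Sharper forecasters yield narrower (more certain) regions."""
    order = np.argsort(probs)[::-1]                # labels, most probable first
    cum = np.cumsum(probs[order])
    k = int(np.searchsorted(cum, 1.0 - delta)) + 1
    return set(order[:k].tolist())

probs = np.array([0.55, 0.25, 0.12, 0.05, 0.03])
print(confidence_region(probs, delta=0.05))       # {0, 1, 2, 3}: 0.97 mass
```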
Informed Meta-Learning
In noisy and low-data regimes prevalent in real-world applications, a key challenge of machine learning lies in effectively incorporating inductive biases that promote data efficiency and robustness. Meta-learning and informed ML stand out as two approaches for incorporating prior knowledge into ML pipelines. While the former relies on a purely data-driven source of priors, the latter is guided by prior domain knowledge. In this paper, we formalise a hybrid paradigm, informed meta-learning, facilitating the incorporation of priors from unstructured knowledge representations, such as natural language; thus, unlocking complementarity in cross-task knowledge sharing of humans and machines. We establish the foundational components of informed meta-learning and present a concrete instantiation of this framework--the Informed Neural Process. Through a series of experiments, we demonstrate the potential benefits of informed meta-learning in improving data efficiency, robustness to observational noise and task distribution shifts.
Updated: 2024-05-24 15:31:57
标题: 知情元学习(Informed Meta-Learning)
摘要: 在真实应用中普遍存在的嘈杂和低数据环境中,机器学习的一个关键挑战在于有效地整合归纳偏见,以促进数据效率和鲁棒性。元学习和知情机器学习是两种将先前知识整合到机器学习流程中的方法。前者依赖于纯粹基于数据的先验信息源,而后者则受先前领域知识指导。本文中,我们形式化了一种混合范式,即知情元学习,促进了从自然语言等非结构化知识表示中整合先验信息,从而解锁了人类和机器之间的跨任务知识共享的互补性。我们建立了知情元学习的基础组件,并提出了这一框架的具体实例——知情神经过程。通过一系列实验,我们展示了知情元学习在提高数据效率、对观测噪声的鲁棒性以及任务分布变化方面的潜在好处。
更新时间: 2024-05-24 15:31:57
领域: cs.LG
GECKO: Generative Language Model for English, Code and Korean
We introduce GECKO, a bilingual large language model (LLM) optimized for Korean and English, along with programming languages. GECKO is pretrained on a balanced, high-quality corpus of Korean and English, employing the LLaMA architecture. In this report, we share the experiences of several efforts to build a better data pipeline for the corpus and to train our model. GECKO shows great efficiency in token generation for both Korean and English, despite its small vocabulary size. We measure the performance on representative benchmarks in terms of Korean, English and Code, and it exhibits great performance on KMMLU (Korean MMLU) and modest performance in English and Code, even with its smaller number of trained tokens compared to English-focused LLMs. GECKO is available to the open-source community under a permissive license. We hope our work offers a research baseline and practical insights for Korean LLM research. The model can be found at: https://huggingface.co/kifai/GECKO-7B
Updated: 2024-05-24 15:30:41
标题: GECKO:用于英语、代码和韩语的生成式语言模型
摘要: 我们介绍了GECKO,这是一个针对韩语和英语以及编程语言进行优化的双语大型语言模型(LLM)。GECKO是在平衡的高质量韩语和英语语料库上进行预训练的,采用LLaMA架构。在这份报告中,我们分享了建立更好的语料库数据管道和训练模型的几个努力经验。尽管其词汇量较小,GECKO在韩语和英语的标记生成方面表现出了很高的效率。我们根据韩语、英语和代码的代表性基准测试了性能,结果显示在韩语MMLU(KMMLU)方面表现出色,在英语和代码方面表现适中,即使与以英语为重点的LLMs相比,它的训练标记数量较少。GECKO在开源社区中采用自由许可证发布。我们希望我们的工作为韩语LLM研究提供了研究基线和实用见解。该模型可以在以下网址找到:https://huggingface.co/kifai/GECKO-7B
更新时间: 2024-05-24 15:30:41
领域: cs.CL,cs.AI
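Since the checkpoint is released on the Hugging Face Hub at the URL above, loading it should follow the standard transformers pattern; the prompt and generation arguments below are illustrative only.

```python
# Standard from_pretrained usage for the released checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("kifai/GECKO-7B")
model = AutoModelForCausalLM.from_pretrained("kifai/GECKO-7B")

prompt = "안녕하세요. 오늘의 날씨는"   # Korean prompt: "Hello. Today's weather is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```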
On the Correspondence of Non-flat Assumption-based Argumentation and Logic Programming with Negation as Failure in the Head
The relation between (a fragment of) assumption-based argumentation (ABA) and logic programs (LPs) under stable model semantics is well-studied. However, for obtaining this relation, the ABA framework needs to be restricted to being flat, i.e., a fragment where the (defeasible) assumptions can never be entailed, only assumed to be true or false. Here, we remove this restriction and show a correspondence between non-flat ABA and LPs with negation as failure in their head. We then extend this result to so-called set-stable ABA semantics, originally defined for the fragment of non-flat ABA called bipolar ABA. We showcase how to define set-stable semantics for LPs with negation as failure in their head and show the correspondence to set-stable ABA semantics.
Updated: 2024-05-24 15:25:22
标题: 关于非扁平化基于假设的论证与具有否定作为失败的逻辑编程之间的对应关系
摘要: 基于假设的论证(ABA)的一个片段与稳定模型语义下的逻辑程序(LP)之间的关系已得到深入研究。然而,为了获得这种关系,ABA框架需要被限制为扁平的,即在该片段中,(可废止的)假设永远不会被蕴含,只能被假定为真或假。在这里,我们去除这一限制,并展示了非扁平ABA与头部带有否定作为失败的LP之间的对应关系。然后,我们将这一结果扩展到所谓的集合稳定(set-stable)ABA语义,该语义最初是为称为双极ABA的非扁平ABA片段定义的。我们展示了如何为头部带有否定作为失败的LP定义集合稳定语义,并证明了其与集合稳定ABA语义的对应关系。
更新时间: 2024-05-24 15:25:22
领域: cs.AI
Personalized Adapter for Large Meteorology Model on Devices: Towards Weather Foundation Models
This paper demonstrates that pre-trained language models (PLMs) are strong foundation models for on-device meteorological variables modeling. We present LM-Weather, a generic approach to taming PLMs that have learned massive sequential knowledge from the universe of natural language databases, to acquire an immediate capability to obtain highly customized models for heterogeneous meteorological data on devices while keeping high efficiency. Concretely, we introduce a lightweight personalized adapter into PLMs and endow it with weather pattern awareness. During communication between clients and the server, low-rank-based transmission is performed to effectively fuse the global knowledge among devices while maintaining high communication efficiency and ensuring privacy. Experiments on real-world datasets show that LM-Weather outperforms the state-of-the-art results by a large margin across various tasks (e.g., forecasting and imputation at different scales). We provide extensive and in-depth analysis experiments, which verify that LM-Weather can (1) indeed leverage sequential knowledge from natural language to accurately handle meteorological sequences, (2) allow each device to obtain highly customized models under significant heterogeneity, and (3) generalize under data-limited and out-of-distribution (OOD) scenarios.
Updated: 2024-05-24 15:25:09
标题: 用于设备端大型气象模型的个性化适配器:迈向天气基础模型
摘要: 本文表明,预训练语言模型(PLMs)是设备上气象变量建模的强大基础模型。我们提出了LM-Weather,这是一种通用方法,用于驯服PLMs,这些模型从自然语言数据库的宇宙中学习了大量的顺序知识,从而具备立即获取设备上异构气象数据高度定制模型的能力,同时保持高效性。具体地,我们向PLMs引入了一个轻量级的个性化适配器,并赋予了它对天气模式的意识。在客户端和服务器之间的通信过程中,采用基于低秩的传输方式,有效地融合了设备之间的全局知识,同时保持高通信效率并确保隐私安全。在真实数据集上的实验表明,LM-Weather在各种任务(例如不同尺度的预测和插补)上均大幅优于最先进的结果。我们提供了广泛而深入的分析实验,验证了LM-Weather能够(1)确实利用自然语言的顺序知识准确处理气象序列,(2)允许每个设备在显著的异质性下获取高度定制的模型,并且(3)在数据受限和超出分布范围的情况下进行泛化。
更新时间: 2024-05-24 15:25:09
领域: physics.ao-ph,cs.LG
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference through significantly compressing the Key-Value (KV) cache into a latent vector, while DeepSeekMoE enables training strong models at an economical cost through sparse computation. Compared with DeepSeek 67B, DeepSeek-V2 achieves significantly stronger performance, and meanwhile saves 42.5% of training costs, reduces the KV cache by 93.3%, and boosts the maximum generation throughput to 5.76 times. We pretrain DeepSeek-V2 on a high-quality and multi-source corpus consisting of 8.1T tokens, and further perform Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) to fully unlock its potential. Evaluation results show that, even with only 21B activated parameters, DeepSeek-V2 and its chat versions still achieve top-tier performance among open-source models.
Updated: 2024-05-24 15:24:58
标题: DeepSeek-V2:一个强大、经济高效的专家混合语言模型
摘要: 我们介绍了DeepSeek-V2,这是一个强大的混合专家(MoE)语言模型,具有训练经济、推理高效的特点。它包含236B总参数,其中每个标记激活21B,并支持128K标记的上下文长度。DeepSeek-V2采用了包括多头潜在注意力(MLA)和DeepSeekMoE在内的创新架构。MLA通过将键值(KV)缓存显著压缩为一个潜在向量来保证高效推理,而DeepSeekMoE通过稀疏计算以经济的成本训练强大的模型。与DeepSeek 67B相比,DeepSeek-V2实现了显著更强的性能,同时节省了42.5%的训练成本,将KV缓存减少了93.3%,并将最大生成吞吐量提升至5.76倍。我们在由8.1万亿(8.1T)标记组成的高质量多源语料库上对DeepSeek-V2进行预训练,并进一步进行监督微调(SFT)和强化学习(RL)以充分释放其潜力。评估结果显示,即使只激活21B参数,DeepSeek-V2及其聊天版本仍然在开源模型中表现出顶尖性能。
更新时间: 2024-05-24 15:24:58
领域: cs.CL,cs.AI
Visualize and Paint GAN Activations
We investigate how generated structures of GANs correlate with their activations in hidden layers, with the purpose of better understanding the inner workings of those models and being able to paint structures with unconditionally trained GANs. This gives us more control over the generated images, allowing to generate them from a semantic segmentation map while not requiring such a segmentation in the training data. To this end we introduce the concept of tileable features, allowing us to identify activations that work well for painting.
Updated: 2024-05-24 15:22:58
标题: 可视化和绘制GAN激活
摘要: 我们研究了GAN生成的结构如何与它们在隐藏层中的激活相关,目的是更好地理解这些模型的内部运作,并能够使用无条件训练的GAN来绘制结构。这使我们能够更好地控制生成的图像,可以从语义分割地图中生成它们,而不需要在训练数据中包含这样的分割。为此,我们引入了可平铺特征的概念,使我们能够识别适用于绘画的激活。
更新时间: 2024-05-24 15:22:58
领域: cs.CV,cs.LG
SynGhost: Imperceptible and Universal Task-agnostic Backdoor Attack in Pre-trained Language Models
Pre-training has been a necessary phase for deploying pre-trained language models (PLMs) to achieve remarkable performance in downstream tasks. However, we empirically show that backdoor attacks exploit such a phase as a vulnerable entry point for task-agnostic backdoors. In this paper, we first propose $\mathtt{maxEntropy}$, an entropy-based poisoning filtering defense, to prove that existing task-agnostic backdoors are easily exposed due to the explicit triggers used. Then, we present $\mathtt{SynGhost}$, an imperceptible and universal task-agnostic backdoor attack in PLMs. Specifically, $\mathtt{SynGhost}$ hostilely manipulates clean samples through different syntactic structures and then maps the backdoor to representation space without disturbing the primitive representation. $\mathtt{SynGhost}$ further leverages contrastive learning to achieve universality, yielding a uniform distribution of backdoors in the representation space. In light of the syntactic properties, we also introduce an awareness module to alleviate the interference between different syntactic structures. Experiments show that $\mathtt{SynGhost}$ poses more serious threats: it not only causes severe harm to various downstream tasks under two tuning paradigms but also affects any PLMs. Meanwhile, $\mathtt{SynGhost}$ is imperceptible against three countermeasures based on perplexity, fine-pruning, and the proposed $\mathtt{maxEntropy}$.
Updated: 2024-05-24 15:21:55
标题: SynGhost:在预训练语言模型中的不可察觉且通用的与任务无关的后门攻击
摘要: 预训练一直是部署预训练语言模型(PLMs)以在下游任务中取得显著性能的必要阶段。然而,我们通过实验证明,后门攻击会将这一阶段作为实施任务无关攻击的脆弱入口。在本文中,我们首先提出了基于熵的投毒过滤防御$\mathtt{maxEntropy}$,证明了现有的任务无关后门由于使用显式触发器而容易被暴露。然后,我们提出了$\mathtt{SynGhost}$,一种针对PLMs的不可察觉且通用的任务无关后门攻击。具体来说,$\mathtt{SynGhost}$通过不同的句法敌意地操纵干净样本,然后在不干扰原始表示的情况下将后门映射到表示空间。$\mathtt{SynGhost}$进一步利用对比学习实现通用性,使后门在表示空间中呈均匀分布。鉴于句法特性,我们还引入了一个感知模块来缓解不同句法之间的干扰。实验表明$\mathtt{SynGhost}$构成了更严重的威胁:它不仅在两种调优范式下对各种下游任务造成严重危害,而且对任何PLMs都有效。同时,$\mathtt{SynGhost}$能够规避基于困惑度、fine-pruning(微调剪枝)以及所提出的$\mathtt{maxEntropy}$的三种对策。
更新时间: 2024-05-24 15:21:55
领域: cs.CR,cs.AI,cs.CL
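A minimal sketch of an entropy-based poisoning filter in the spirit of maxEntropy: trigger-bearing inputs tend to yield abnormally confident, low-entropy predictions. The statistic and the threshold below are our simplifications, not the paper's exact defense.

```python
import numpy as np

def max_entropy_filter(logits, threshold):
    """Flag samples whose predictive entropy falls below a threshold,
    since explicit triggers produce abnormally sharp output distributions."""
    z = logits - logits.max(axis=1, keepdims=True)     # stable softmax
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    ent = -(p * np.log(p + 1e-12)).sum(axis=1)         # predictive entropy
    return ent < threshold                             # True = filter out

rng = np.random.default_rng(0)
clean = rng.normal(0, 1, size=(5, 10))                 # diffuse logits
poisoned = np.zeros((5, 10)); poisoned[:, 0] = 12.0    # trigger -> one sharp logit
flags = max_entropy_filter(np.vstack([clean, poisoned]), threshold=0.5)
print(flags)   # expected: clean rows False, poisoned rows True
```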
Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation
In recent years, with their realistic generation results and a wide range of personalized applications, diffusion-based generative models have gained huge attention in both the visual and audio generation areas. Compared to the considerable advancements of text2image or text2audio generation, research in audio2visual or visual2audio generation has been relatively slow. Recent audio-visual generation methods usually resort to huge large language models or composable diffusion models. Instead of designing another giant model for audio-visual generation, in this paper we take a step back and show that a simple and lightweight generative transformer, which is not fully investigated in multi-modal generation, can achieve excellent results on image2audio generation. The transformer operates in the discrete audio and visual Vector-Quantized GAN space, and is trained in the mask-denoising manner. After training, classifier-free guidance can be deployed off-the-shelf to achieve better performance, without any extra training or modification. Since the transformer model is modality-symmetrical, it can also be directly deployed for audio2image generation and co-generation. In the experiments, we show that our simple method surpasses recent image2audio generation methods. Generated audio samples can be found at https://docs.google.com/presentation/d/1ZtC0SeblKkut4XJcRaDsSTuCRIXB3ypxmSi7HTY3IyQ/
Updated: 2024-05-24 15:21:13
标题: 视觉回响:一种用于音频-视觉生成的简单统一Transformer
摘要: 最近几年,凭借逼真的生成结果和广泛的个性化应用,基于扩散的生成模型在视觉和音频生成领域引起了巨大关注。与文本到图像或文本到音频生成的长足进步相比,音频到视觉或视觉到音频生成的研究相对缓慢。最近的音频-视觉生成方法通常依赖庞大的大型语言模型或可组合的扩散模型。本文没有为音频-视觉生成设计另一个巨型模型,而是退一步展示了一种简单且轻量级的生成式Transformer(在多模态生成中尚未得到充分研究)即可在图像到音频生成上取得出色的结果。该Transformer在离散的音频和视觉向量量化GAN空间中运作,并以掩码去噪的方式进行训练。训练完成后,无分类器引导(classifier-free guidance)可以直接部署以获得更好的性能,无需任何额外的训练或修改。由于该Transformer模型在模态上是对称的,它也可以直接用于音频到图像生成和联合生成。在实验中,我们展示了这一简单方法超越了最近的图像到音频生成方法。生成的音频样本可以在 https://docs.google.com/presentation/d/1ZtC0SeblKkut4XJcRaDsSTuCRIXB3ypxmSi7HTY3IyQ/ 找到。
更新时间: 2024-05-24 15:21:13
领域: cs.CV,cs.LG,cs.MM,cs.SD,eess.AS
Less is more: Summarizing Patch Tokens for efficient Multi-Label Class-Incremental Learning
Prompt tuning has emerged as an effective rehearsal-free technique for class-incremental learning (CIL) that learns a tiny set of task-specific parameters (or prompts) to instruct a pre-trained transformer to learn on a sequence of tasks. Albeit effective, prompt tuning methods do not lend themselves well to the multi-label class-incremental learning (MLCIL) scenario (where an image contains multiple foreground classes) due to the ambiguity in selecting the correct prompt(s) corresponding to different foreground objects belonging to multiple tasks. To circumvent this issue we propose to eliminate the prompt selection mechanism by maintaining task-specific pathways, which allow us to learn representations that do not interact with the ones from the other tasks. Since independent pathways in truly incremental scenarios will result in an explosion of computation due to the quadratically complex multi-head self-attention (MSA) operation in prompt tuning, we propose to reduce the original patch token embeddings into summarized tokens. Prompt tuning is then applied to these fewer summarized tokens to compute the final representation. Our proposed method Multi-Label class incremental learning via summarising pAtch tokeN Embeddings (MULTI-LANE) enables learning disentangled task-specific representations in MLCIL while ensuring fast inference. We conduct experiments in common benchmarks and demonstrate that our MULTI-LANE achieves a new state-of-the-art in MLCIL. Additionally, we show that MULTI-LANE is also competitive in the CIL setting. Source code available at https://github.com/tdemin16/multi-lane
Updated: 2024-05-24 15:18:27
标题: Less is more: 通过总结补丁令牌实现高效的多标签类增量学习
摘要: 提示调优(prompt tuning)已经成为一种有效的免排练类增量学习(CIL)技术,它学习一小组任务特定参数(或提示)来指导预训练Transformer在一系列任务上学习。尽管有效,提示调优方法并不适用于多标签类增量学习(MLCIL)场景(其中一张图像包含多个前景类别),因为在选择与属于多个任务的不同前景对象相对应的正确提示时存在歧义。为了规避这个问题,我们提出通过维护任务特定路径来取消提示选择机制,从而学习不与其他任务相互作用的表示。由于在真正的增量场景中,独立路径会因提示调优中二次复杂度的多头自注意(MSA)操作而导致计算量爆炸,我们提出将原始补丁令牌嵌入压缩为总结令牌,然后对这些数量更少的总结令牌应用提示调优以计算最终表示。我们提出的方法MULTI-LANE(通过总结补丁令牌嵌入实现多标签类增量学习)使得在MLCIL中能够学习解耦的任务特定表示,同时确保快速推理。我们在常见基准上进行了实验,证明MULTI-LANE在MLCIL中达到了新的最先进水平。此外,我们展示MULTI-LANE在CIL设置中也具有竞争力。源代码可在https://github.com/tdemin16/multi-lane找到。
更新时间: 2024-05-24 15:18:27
领域: cs.CV,cs.AI
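The token-summarization idea above can be illustrated with a stand-in pooling step: fewer summary tokens shrink the quadratic self-attention cost that prompt tuning pays. The paper learns this summarization; the plain average pooling below is only a placeholder.

```python
import numpy as np

def summarize_patch_tokens(patch_tokens, n_summary):
    """Reduce N patch embeddings to n_summary tokens by averaging contiguous
    groups, so attention cost drops by roughly (N / n_summary)^2."""
    groups = np.array_split(patch_tokens, n_summary, axis=0)
    return np.stack([g.mean(axis=0) for g in groups])   # (n_summary, d)

tokens = np.random.default_rng(0).normal(size=(196, 768))  # ViT-B/16 patch grid
summary = summarize_patch_tokens(tokens, n_summary=16)
print(summary.shape)        # (16, 768)
```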
Federated Behavioural Planes: Explaining the Evolution of Client Behaviour in Federated Learning
Federated Learning (FL), a privacy-aware approach in distributed deep learning environments, enables many clients to collaboratively train a model without sharing sensitive data, thereby reducing privacy risks. However, enabling human trust and control over FL systems requires understanding the evolving behaviour of clients, whether beneficial or detrimental for the training, which still represents a key challenge in the current literature. To address this challenge, we introduce Federated Behavioural Planes (FBPs), a novel method to analyse, visualise, and explain the dynamics of FL systems, showing how clients behave under two different lenses: predictive performance (error behavioural space) and decision-making processes (counterfactual behavioural space). Our experiments demonstrate that FBPs provide informative trajectories describing the evolving states of clients and their contributions to the global model, thereby enabling the identification of clusters of clients with similar behaviours. Leveraging the patterns identified by FBPs, we propose a robust aggregation technique named Federated Behavioural Shields to detect malicious or noisy client models, thereby enhancing security and surpassing the efficacy of existing state-of-the-art FL defense mechanisms.
Updated: 2024-05-24 15:17:51
标题: 联邦行为平面:解释联邦学习中客户行为的演变
摘要: 联合学习(FL)是分布式深度学习环境中一种注重隐私的方法,它使许多客户端能够共同训练模型而不共享敏感数据,从而降低隐私风险。然而,要实现人类对FL系统的信任和控制,需要了解客户端的行为演变,无论是对训练有益还是有害,这仍然是当前文献中的一个关键挑战。为了解决这一挑战,我们引入了联合行为平面(FBPs),这是一种分析、可视化和解释FL系统动态的新方法,展示了客户端在两个不同角度下的行为:预测性能(错误行为空间)和决策过程(反事实行为空间)。我们的实验表明,FBPs提供了描述客户端演化状态及其对全局模型贡献的信息轨迹,从而使得能够识别具有相似行为的客户端群集。利用FBPs识别的模式,我们提出了一种强大的聚合技术,称为联合行为屏障,用于检测恶意或噪声客户端模型,从而增强安全性并超越现有最先进的FL防御机制的效力。
更新时间: 2024-05-24 15:17:51
领域: cs.LG,cs.DC
Nonlinear denoising score matching for enhanced learning of structured distributions
We present a novel method for training score-based generative models which uses nonlinear noising dynamics to improve learning of structured distributions. Generalizing to a nonlinear drift allows for additional structure to be incorporated into the dynamics, thus making the training better adapted to the data, e.g., in the case of multimodality or (approximate) symmetries. Such structure can be obtained from the data by an inexpensive preprocessing step. The nonlinear dynamics introduces new challenges into training which we address in two ways: 1) we develop a new nonlinear denoising score matching (NDSM) method, 2) we introduce neural control variates in order to reduce the variance of the NDSM training objective. We demonstrate the effectiveness of this method on several examples: a) a collection of low-dimensional examples, motivated by clustering in latent space, b) high-dimensional images, addressing issues with mode collapse, small training sets, and approximate symmetries, the latter being a challenge for methods based on equivariant neural networks, which require exact symmetries.
Updated: 2024-05-24 15:14:23
标题: 非线性去噪评分匹配用于增强结构化分布学习
摘要: 我们提出了一种训练基于分数的生成模型的新方法,该方法利用非线性噪声动力学来改善对结构化分布的学习。将非线性漂移泛化为允许将额外结构纳入动力学中,从而使训练更适应数据,例如,在多模态或(近似)对称性的情况下。这种结构可以通过一个廉价的预处理步骤从数据中获取。非线性动力学引入了新的训练挑战,我们通过两种方式解决这些挑战:1)我们开发了一种新的非线性去噪分数匹配(NDSM)方法,2)我们引入神经控制变量以减少NDSM训练目标的方差。我们在几个示例中展示了这种方法的有效性:a)一组低维示例,受潜在空间聚类的启发,b)高维图像,解决了模式坍塌、小训练集和近似对称性等问题,后者对于基于等变神经网络的方法来说是一个挑战,这种方法需要精确的对称性。
更新时间: 2024-05-24 15:14:23
领域: stat.ML,cs.LG
Inverse-RLignment: Inverse Reinforcement Learning from Demonstrations for LLM Alignment
Aligning Large Language Models (LLMs) is crucial for enhancing their safety and utility. However, existing methods, primarily based on preference datasets, face challenges such as noisy labels, high annotation costs, and privacy concerns. In this work, we introduce Alignment from Demonstrations (AfD), a novel approach leveraging high-quality demonstration data to overcome these challenges. We formalize AfD within a sequential decision-making framework, highlighting its unique challenge of missing reward signals. Drawing insights from forward and inverse reinforcement learning, we introduce divergence minimization objectives for AfD. Analytically, we elucidate the mass-covering and mode-seeking behaviors of various approaches, explaining when and why certain methods are superior. Practically, we propose a computationally efficient algorithm that extrapolates over a tailored reward model for AfD. We validate our key insights through experiments on the Harmless and Helpful tasks, demonstrating their strong empirical performance while maintaining simplicity.
Updated: 2024-05-24 15:13:53
标题: 逆强化对齐:从演示中进行逆强化学习以实现LLM对齐
摘要: 对齐大型语言模型(LLMs)对于增强其安全性和实用性至关重要。然而,现有方法主要基于偏好数据集,面临噪声标签、高昂的标注成本和隐私问题等挑战。在这项工作中,我们引入了来自演示的对齐(AfD),这是一种利用高质量演示数据来克服这些挑战的新方法。我们在一个顺序决策框架内形式化了AfD,突出了其缺失奖励信号的独特挑战。借鉴正向和逆向强化学习的见解,我们为AfD引入了散度最小化目标。在理论上,我们阐明了各种方法的质量覆盖(mass-covering)和模式寻求(mode-seeking)行为,解释了何时以及为什么某些方法更优。在实践中,我们提出了一种计算高效的算法,该算法在为AfD定制的奖励模型上进行外推。我们通过对无害(Harmless)和有益(Helpful)任务的实验验证了我们的关键见解,展示了这些方法在保持简单性的同时具有强大的实证表现。
更新时间: 2024-05-24 15:13:53
领域: cs.LG,cs.AI
Beyond Trend and Periodicity: Guiding Time Series Forecasting with Textual Cues
This work introduces a novel Text-Guided Time Series Forecasting (TGTSF) task. By integrating textual cues, such as channel descriptions and dynamic news, TGTSF addresses the critical limitations of traditional methods that rely purely on historical data. To support this task, we propose TGForecaster, a robust baseline model that fuses textual cues and time series data using cross-attention mechanisms. We then present four meticulously curated benchmark datasets to validate the proposed framework, ranging from simple periodic data to complex, event-driven fluctuations. Our comprehensive evaluations demonstrate that TGForecaster consistently achieves state-of-the-art performance, highlighting the transformative potential of incorporating textual information into time series forecasting. This work not only pioneers a novel forecasting task but also establishes a new benchmark for future research, driving advancements in multimodal data integration for time series models.
Updated: 2024-05-24 15:10:27
标题: 超越趋势和周期性:通过文本线索指导时间序列预测
摘要: 这项工作介绍了一种新颖的文本引导时间序列预测(TGTSF)任务。通过整合文本提示,如频道描述和动态新闻,TGTSF解决了传统方法的关键局限,这些方法纯粹依赖于历史数据。为了支持这项任务,我们提出了TGForecaster,这是一个融合文本提示和时间序列数据的健壮基线模型,使用交叉注意机制。然后,我们提出了四个精心策划的基准数据集,用于验证提出的框架,从简单的周期性数据到复杂的事件驱动波动。我们的综合评估表明,TGForecaster始终实现了最先进的性能,突出了将文本信息纳入时间序列预测的变革潜力。这项工作不仅开创了一项新颖的预测任务,还为未来研究建立了一个新的基准,推动了时间序列模型的多模态数据集成的进展。
更新时间: 2024-05-24 15:10:27
领域: cs.LG,cs.AI,cs.CL
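The cross-attention fusion mechanism named above can be sketched in a few lines of PyTorch. This is an illustrative skeleton under assumed shapes, not the TGForecaster release: the time-series embeddings act as queries, and the text-cue embeddings act as keys and values.

    # An illustrative text-guided forecaster skeleton (names and sizes are ours).
    import torch
    import torch.nn as nn

    class TextGuidedForecaster(nn.Module):
        def __init__(self, d_model=128, horizon=24, n_heads=4):
            super().__init__()
            self.ts_proj = nn.Linear(1, d_model)            # embed each past step
            self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.head = nn.Linear(d_model, horizon)

        def forward(self, history, text_emb):
            # history: (B, L, 1) past values; text_emb: (B, T, d_model) news/channel cues
            h = self.ts_proj(history)                       # (B, L, d_model)
            fused, _ = self.cross_attn(h, text_emb, text_emb)  # series queries text
            return self.head(fused.mean(dim=1))             # (B, horizon)

    model = TextGuidedForecaster()
    y = model(torch.randn(8, 96, 1), torch.randn(8, 5, 128))
    print(y.shape)  # torch.Size([8, 24])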
Align as Ideal: Cross-Modal Alignment Binding for Federated Medical Vision-Language Pre-training
Vision-language pre-training (VLP) has arisen as an efficient scheme for multimodal representation learning, but it requires large-scale multimodal data for pre-training, which poses an obstacle, especially for medical applications. To overcome the data limitation, federated learning (FL) can be a promising strategy to scale up the dataset for medical VLP while protecting data privacy. However, client data are often heterogeneous in real-world scenarios, and we observe that local training on heterogeneous client data would distort the multimodal representation learning and lead to biased cross-modal alignment. To address this challenge, we propose a Federated Align as IDeal (FedAID) framework for federated VLP with robustness to data heterogeneity, to bind local clients with an ideal cross-modal alignment. Specifically, to reduce distortions on global-aggregated features while learning diverse semantics from client datasets during local training, we propose to bind the cross-modal aligned representation space learned by local models with an unbiased one via guidance-based regularization. Moreover, we employ a distribution-based min-max optimization to learn the unbiased cross-modal alignment at each communication round of federated pre-training. The experiments on real-world datasets demonstrate our method successfully promotes efficient federated multimodal learning for medical VLP with data heterogeneity.
Updated: 2024-05-24 15:08:38
标题: 将对齐视为理想:联邦医学视觉-语言预训练的跨模态对齐绑定
摘要: 视觉-语言预训练(VLP)已成为多模态表示学习的有效方案,但它需要大规模的多模态数据进行预训练,这对医疗应用而言尤其是一个障碍。为了克服数据限制,联邦学习(FL)可以作为一种有前途的策略,在保护数据隐私的同时扩大医疗VLP的数据集。然而,在现实场景中,客户端数据通常是异构的,我们观察到,在异构客户端数据上进行本地训练会扭曲多模态表示学习,并导致有偏的跨模态对齐。为了解决这一挑战,我们提出了对数据异构性具有鲁棒性的联邦对齐即理想(FedAID)框架,用于联邦VLP,将本地客户端与理想的跨模态对齐绑定。具体而言,为了在本地训练期间从客户端数据集中学习多样语义的同时减少对全局聚合特征的扭曲,我们建议通过基于引导的正则化,将本地模型学习到的跨模态对齐表示空间与一个无偏的表示空间绑定。此外,我们采用基于分布的极小-极大优化,在联邦预训练的每一轮通信中学习无偏的跨模态对齐。在真实世界数据集上的实验表明,我们的方法成功促进了数据异构条件下医疗VLP的高效联邦多模态学习。
更新时间: 2024-05-24 15:08:38
领域: cs.LG,cs.CL,cs.CV
Interpreting Key Mechanisms of Factual Recall in Transformer-Based Language Models
In this paper, we delve into several mechanisms employed by Transformer-based language models (LLMs) for factual recall tasks. We outline a pipeline consisting of three major steps: (1) Given a prompt ``The capital of France is,'' task-specific attention heads extract the topic token, such as ``France,'' from the context and pass it to subsequent MLPs. (2) As attention heads' outputs are aggregated with equal weight and added to the residual stream, the subsequent MLP acts as an ``activation,'' which either erases or amplifies the information originating from individual heads. As a result, the topic token ``France'' stands out in the residual stream. (3) A deep MLP takes ``France'' and generates a component that redirects the residual stream towards the direction of the correct answer, i.e., ``Paris.'' This procedure is akin to applying an implicit function such as ``get\_capital($X$),'' and the argument $X$ is the topic token information passed by attention heads. To achieve the above quantitative and qualitative analysis for MLPs, we proposed a novel analytic method aimed at decomposing the outputs of the MLP into components understandable by humans. Additionally, we observed a universal anti-overconfidence mechanism in the final layer of models, which suppresses correct predictions. We mitigate this suppression by leveraging our interpretation to improve factual recall confidence. The above interpretations are evaluated across diverse tasks spanning various domains of factual knowledge, using various language models from the GPT-2 families, 1.3B OPT, up to 7B Llama-2, and in both zero- and few-shot setups.
Updated: 2024-05-24 15:06:45
标题: 解释基于Transformer的语言模型中事实回忆的关键机制
摘要: 在这篇论文中,我们深入探讨了基于Transformer的语言模型(LLMs)在事实回忆任务中使用的几种机制。我们概述了一个由三个主要步骤组成的流程:(1)给定提示“法国的首都是”,任务特定的注意力头从上下文中提取主题标记,如“法国”,并将其传递给后续的MLP。(2)由于注意力头的输出以相等的权重聚合并添加到残差流中,后续的MLP充当一种“激活”,可以擦除或放大来自单个头的信息。因此,主题标记“法国”在残差流中凸显出来。(3)一个深层MLP接收“法国”并生成一个组件,将残差流重定向到正确答案的方向,即“巴黎”。此过程类似于应用一个隐式函数,例如“get\_capital($X$)”,其中参数$X$是由注意力头传递的主题标记信息。为了对MLP进行上述定量和定性分析,我们提出了一种新颖的分析方法,旨在将MLP的输出分解为人类可理解的组件。此外,我们观察到模型最后一层中存在一种普遍的抗过度自信机制,它会抑制正确的预测。我们利用我们的解释来缓解这种抑制,从而提高事实回忆的置信度。上述解释在涵盖多个事实知识领域的各种任务上进行了评估,使用了多种语言模型,包括GPT-2系列、1.3B OPT,直至7B Llama-2,并涵盖零样本和少样本设置。
更新时间: 2024-05-24 15:06:45
领域: cs.CL,cs.AI,cs.LG
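The decomposition step described above has a toy, logit-lens-style illustration: project an individual component of the residual stream through the unembedding matrix to see which vocabulary directions it promotes. All tensors below are random stand-ins, not weights from an actual model, and this projection is only a rough analogue of the paper's proposed analytic method.

    # Toy logit-lens-style projection; tensors are random stand-ins.
    import torch

    d_model, vocab = 64, 1000
    W_U = torch.randn(d_model, vocab)          # unembedding matrix (stand-in)
    head_outputs = torch.randn(12, d_model)    # one output vector per attention head
    mlp_output = torch.randn(d_model)          # output of a deep MLP layer

    for name, component in [("heads_sum", head_outputs.sum(0)), ("mlp", mlp_output)]:
        logits = component @ W_U               # contribution of this component alone
        top = logits.topk(3).indices.tolist()
        print(f"{name}: top promoted token ids = {top}")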
MLPs Learn In-Context
In-context learning (ICL), the remarkable ability to solve a task from only input exemplars, has commonly been assumed to be a unique hallmark of Transformer models. In this study, we demonstrate that multi-layer perceptrons (MLPs) can also learn in-context. Moreover, we find that MLPs, and the closely related MLP-Mixer models, learn in-context competitively with Transformers given the same compute budget. We further show that MLPs outperform Transformers on a subset of ICL tasks designed to test relational reasoning. These results suggest that in-context learning is not exclusive to Transformers and highlight the potential of exploring this phenomenon beyond attention-based architectures. In addition, MLPs' surprising success on relational tasks challenges prior assumptions about simple connectionist models. Altogether, our results endorse the broad trend that ``less inductive bias is better" and contribute to the growing interest in all-MLP alternatives to task-specific architectures.
Updated: 2024-05-24 15:04:36
标题: MLP能够进行上下文学习
摘要: 上下文学习(ICL),即仅凭输入示例就能解决任务的显著能力,通常被认为是Transformer模型的独特标志。在这项研究中,我们展示了多层感知器(MLPs)也可以进行上下文学习。此外,我们发现MLPs以及与之密切相关的MLP-Mixer模型,在相同的计算预算下,与Transformer竞争性地进行上下文学习。我们进一步展示了MLPs在设计用于测试关系推理的ICL任务子集上优于Transformers。这些结果表明,上下文学习并不是Transformer独有的,突显了探索这一现象超越基于注意力的架构的潜力。此外,MLPs在关系任务上的惊人成功挑战了关于简单连接模型的先前假设。总的来说,我们的结果支持“更少的归纳偏见更好”的普遍趋势,并有助于不断增长的对所有MLP替代特定任务架构的兴趣。
更新时间: 2024-05-24 15:04:36
领域: cs.LG,cs.NE
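A minimal sketch of an in-context regression task for an MLP, following the common ICL-regression setup rather than the paper's exact protocol: each training example concatenates k (x, y) exemplar pairs drawn from a fresh random linear function plus a query x, and the target is the query's y.

    # ICL-regression task sketch for an MLP (task design assumed, not the paper's).
    import torch
    import torch.nn as nn

    def make_episode(batch, k=8, d=4):
        w = torch.randn(batch, d, 1)                     # a new task per episode
        xs = torch.randn(batch, k + 1, d)
        ys = (xs @ w).squeeze(-1)                        # (batch, k+1)
        ctx = torch.cat([xs[:, :k], ys[:, :k, None]], -1).flatten(1)
        inp = torch.cat([ctx, xs[:, k]], dim=1)          # exemplars + query x
        return inp, ys[:, k]

    mlp = nn.Sequential(nn.Linear(8 * 5 + 4, 256), nn.ReLU(),
                        nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 1))
    opt = torch.optim.Adam(mlp.parameters(), lr=1e-3)
    for step in range(2000):
        x, y = make_episode(64)
        loss = nn.functional.mse_loss(mlp(x).squeeze(-1), y)
        opt.zero_grad(); loss.backward(); opt.step()
    print(f"final in-context MSE: {loss.item():.3f}")

The MLP never sees the same linear task twice, so driving the loss down requires inferring each task from its exemplars, which is the in-context ability being tested.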
Neuromorphic dreaming: A pathway to efficient learning in artificial agents
Achieving energy efficiency in learning is a key challenge for artificial intelligence (AI) computing platforms. Biological systems demonstrate remarkable abilities to learn complex skills quickly and efficiently. Inspired by this, we present a hardware implementation of model-based reinforcement learning (MBRL) using spiking neural networks (SNNs) on mixed-signal analog/digital neuromorphic hardware. This approach leverages the energy efficiency of mixed-signal neuromorphic chips while achieving high sample efficiency through an alternation of online learning, referred to as the "awake" phase, and offline learning, known as the "dreaming" phase. The model proposed includes two symbiotic networks: an agent network that learns by combining real and simulated experiences, and a learned world model network that generates the simulated experiences. We validate the model by training the hardware implementation to play the Atari game Pong. We start from a baseline consisting of an agent network learning without a world model and dreaming, which successfully learns to play the game. By incorporating dreaming, the number of required real game experiences are reduced significantly compared to the baseline. The networks are implemented using a mixed-signal neuromorphic processor, with the readout layers trained using a computer in-the-loop, while the other layers remain fixed. These results pave the way toward energy-efficient neuromorphic learning systems capable of rapid learning in real world applications and use-cases.
Updated: 2024-05-24 15:03:56
标题: 神经形态做梦:通向人工代理高效学习的路径
摘要: 在学习中实现能源效率是人工智能(AI)计算平台的一个关键挑战。生物系统展示了快速高效学习复杂技能的显著能力。受此启发,我们提出了一种在混合信号模拟/数字神经形态硬件上使用脉冲神经网络(SNN)实现基于模型的强化学习(MBRL)的硬件实现。该方法利用混合信号神经形态芯片的能源效率,同时通过在线学习(即“觉醒”阶段)与离线学习(即“做梦”阶段)的交替实现高样本效率。所提出的模型包括两个共生网络:一个通过结合真实和模拟经验进行学习的代理网络,以及一个生成模拟经验的习得世界模型网络。我们通过训练该硬件实现来玩Atari游戏Pong以验证该模型。我们从一个不含世界模型和做梦阶段的代理网络基线开始,该基线成功学会了玩游戏。引入做梦阶段后,所需的真实游戏经验数量相比基线大大减少。这些网络使用混合信号神经形态处理器实现,读出层通过计算机在环的方式训练,而其他层保持固定。这些结果为能够在现实世界应用和用例中快速学习的高能效神经形态学习系统铺平了道路。
更新时间: 2024-05-24 15:03:56
领域: cs.AI,cs.LG,cs.NE
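The awake/dreaming alternation is essentially the classic Dyna pattern. The sketch below reproduces it in plain Python on a toy chain MDP, abstracting away the SNN and the neuromorphic substrate entirely; it is an analogy for the training loop, not the paper's implementation.

    # Dyna-style awake/dreaming loop on a toy chain MDP (illustrative analogy only).
    import random
    from collections import defaultdict

    n_states, n_actions, goal = 5, 2, 4
    Q = defaultdict(float)                      # agent network stand-in
    model = {}                                  # learned world model: (s, a) -> (s', r)

    def env_step(s, a):                         # toy deterministic chain environment
        s2 = min(s + 1, goal) if a == 1 else max(s - 1, 0)
        return s2, 1.0 if s2 == goal else 0.0

    def q_update(s, a, r, s2, lr=0.1, gamma=0.9):
        best = max(Q[(s2, b)] for b in range(n_actions))
        Q[(s, a)] += lr * (r + gamma * best - Q[(s, a)])

    random.seed(0)
    for episode in range(100):
        s = 0
        for t in range(10):                     # "awake" phase: real experience
            if random.random() < 0.1:
                a = random.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda b: Q[(s, b)])
            s2, r = env_step(s, a)
            model[(s, a)] = (s2, r)             # fit the world model as we go
            q_update(s, a, r, s2)
            s = s2
        for _ in range(30):                     # "dreaming" phase: simulated replay
            (s_m, a_m), (s2_m, r_m) = random.choice(list(model.items()))
            q_update(s_m, a_m, r_m, s2_m)

    print("greedy action in state 0:", max(range(n_actions), key=lambda b: Q[(0, b)]))

The dreaming updates consume no real environment interactions, which is where the reduction in required real game experience comes from.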
Harnessing Large Language Models for Software Vulnerability Detection: A Comprehensive Benchmarking Study
Despite various approaches being employed to detect vulnerabilities, the number of reported vulnerabilities shows an upward trend over the years. This suggests the problems are not caught before the code is released, which could be caused by many factors, like lack of awareness, limited efficacy of the existing vulnerability detection tools or the tools not being user-friendly. To help combat some issues with traditional vulnerability detection tools, we propose using large language models (LLMs) to assist in finding vulnerabilities in source code. LLMs have shown a remarkable ability to understand and generate code, underlining their potential in code-related tasks. The aim is to test multiple state-of-the-art LLMs and identify the best prompting strategies, allowing extraction of the best value from the LLMs. We provide an overview of the strengths and weaknesses of the LLM-based approach and compare the results to those of traditional static analysis tools. We find that LLMs can pinpoint many more issues than traditional static analysis tools, outperforming traditional tools in terms of recall and F1 scores. The results should benefit software developers and security analysts responsible for ensuring that the code is free of vulnerabilities.
Updated: 2024-05-24 14:59:19
标题: 利用大型语言模型进行软件漏洞检测:一项全面的基准研究
摘要: 尽管采用了各种方法来检测漏洞,但多年来报告的漏洞数量呈上升趋势。这表明问题在代码发布前未被发现,可能由于诸多因素导致,如缺乏意识、现有漏洞检测工具的效力有限或工具不够用户友好。为了帮助解决传统漏洞检测工具的一些问题,我们建议使用大型语言模型(LLMs)来辅助发现源代码中的漏洞。LLMs展现出了出色的理解和生成代码的能力,突显了它们在与代码相关的任务中的潜力。我们的目标是测试多个最先进的LLMs,并确定最佳提示策略,从而提取LLMs的最大价值。我们概述了基于LLM的方法的优势和劣势,并将结果与传统静态分析工具进行了比较。我们发现LLMs能够比传统的静态分析工具更准确地找出问题,在召回率和F1分数方面表现优于传统工具。这些结果应当使软件开发人员和负责确保代码没有漏洞的安全分析人员受益。
更新时间: 2024-05-24 14:59:19
领域: cs.CR,cs.AI,cs.SE
Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach
Self-supervised features are the cornerstone of modern machine learning systems. They are typically pre-trained on data collections whose construction and curation typically require extensive human effort. This manual process has some limitations similar to those encountered in supervised learning, e.g., the crowd-sourced selection of data is costly and time-consuming, preventing scaling the dataset size. In this work, we consider the problem of automatic curation of high-quality datasets for self-supervised pre-training. We posit that such datasets should be large, diverse and balanced, and propose a clustering-based approach for building ones satisfying all these criteria. Our method involves successive and hierarchical applications of $k$-means on a large and diverse data repository to obtain clusters that distribute uniformly among data concepts, followed by a hierarchical, balanced sampling step from these clusters. Extensive experiments on three different data domains including web-based images, satellite images and text show that features trained on our automatically curated datasets outperform those trained on uncurated data while being on par or better than ones trained on manually curated data.
Updated: 2024-05-24 14:58:51
标题: 自监督学习的自动数据整理:一种基于聚类的方法
摘要: 自我监督特征是现代机器学习系统的基石。它们通常在需要大量人力的数据集上进行预训练。这种手动过程存在一些限制,与监督学习中遇到的类似,例如,众包选择数据成本高昂且耗时,阻碍了数据集规模的扩展。在这项工作中,我们考虑了自动策划高质量数据集用于自我监督预训练的问题。我们认为这样的数据集应该是大型、多样化且平衡的,并提出了一种基于聚类的方法来构建符合所有这些标准的数据集。我们的方法涉及在大型和多样化的数据存储库上连续和分层应用k均值,以获得在数据概念之间均匀分布的聚类,然后从这些聚类中进行分层平衡抽样步骤。对包括基于网络的图像、卫星图像和文本在内的三个不同数据领域进行的大量实验表明,我们自动策划的数据集训练的特征优于在未策划数据上训练的特征,同时与在手动策划数据上训练的特征相当或更好。
更新时间: 2024-05-24 14:58:51
领域: cs.LG,cs.AI,cs.CV
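A small sketch of the clustering-based recipe above: hierarchical k-means over embeddings, followed by balanced sampling across the resulting clusters. Cluster counts and sampling sizes are placeholder values, and the embeddings are random stand-ins for real SSL features.

    # Hierarchical k-means curation sketch (placeholder sizes, stand-in embeddings).
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    emb = rng.normal(size=(10_000, 32))             # stand-in for SSL embeddings

    # level 1: coarse concepts; level 2: sub-clusters within each concept
    coarse = KMeans(n_clusters=20, n_init=10, random_state=0).fit_predict(emb)
    curated = []
    per_concept = 100
    for c in range(20):
        idx = np.where(coarse == c)[0]
        k = min(10, len(idx))
        fine = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(emb[idx])
        # balanced sampling: an equal share from every sub-cluster
        for f in range(k):
            members = idx[fine == f]
            take = min(per_concept // k, len(members))
            curated.extend(rng.choice(members, size=take, replace=False))

    print(f"curated {len(curated)} of {len(emb)} points, balanced across concepts")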
VerMCTS: Synthesizing Multi-Step Programs using a Verifier, a Large Language Model, and Tree Search
Large Language Models (LLMs) can generate useful code, but often the code they generate cannot be trusted to be sound. In this paper, we present VerMCTS, an approach to begin to resolve this issue by generating verified programs in Dafny and Coq. VerMCTS uses a logical verifier in concert with an LLM to guide a modified Monte Carlo Tree Search (MCTS). This approach leverages the verifier to gain intermediate feedback inside the search algorithm by checking partial programs at each step to estimate an upper bound on the value function. To measure the performance of VerMCTS, we develop a new suite of multi-step verified programming problems in Dafny and Coq. In terms of pass@T, a new metric which computes the pass rate given a budget of T tokens sampled from the LLM, VerMCTS leads to more than a 30% absolute increase in average pass@5000 across the suite over repeated sampling from the base language model. Our code and benchmarks are available at https://github.com/namin/llm-verified-with-monte-carlo-tree-search .
Updated: 2024-05-24 14:51:14
标题: VerMCTS:使用验证器、大型语言模型和树搜索合成多步程序
摘要: 大型语言模型(LLMs)可以生成有用的代码,但其生成的代码的正确性往往无法得到保证。在本文中,我们提出了VerMCTS,一种通过在Dafny和Coq中生成经过验证的程序来着手解决这一问题的方法。VerMCTS将逻辑验证器与LLM协同使用,以引导一种改进的蒙特卡洛树搜索(MCTS)。该方法利用验证器在搜索算法内部获得中间反馈:在每一步检查部分程序,以估计价值函数的上界。为了衡量VerMCTS的性能,我们在Dafny和Coq中开发了一套新的多步验证编程问题。就pass@T(一种在从LLM采样T个令牌的预算下计算通过率的新指标)而言,相比于对基础语言模型的重复采样,VerMCTS在整个问题集上使平均pass@5000绝对提升超过30%。我们的代码和基准测试可在https://github.com/namin/llm-verified-with-monte-carlo-tree-search获取。
更新时间: 2024-05-24 14:51:14
领域: cs.SE,cs.AI,cs.LG,cs.LO,cs.PL
PoseCrafter: One-Shot Personalized Video Synthesis Following Flexible Pose Control
In this paper, we introduce PoseCrafter, a one-shot method for personalized video generation following the control of flexible poses. Built upon Stable Diffusion and ControlNet, we carefully design an inference process to produce high-quality videos without the corresponding ground-truth frames. First, we select an appropriate reference frame from the training video and invert it to initialize all latent variables for generation. Then, we insert the corresponding training pose into the target pose sequences to enhance faithfulness through a trained temporal attention module. Furthermore, to alleviate the face and hand degradation resulting from discrepancies between poses of training videos and inference poses, we implement simple latent editing through an affine transformation matrix involving facial and hand landmarks. Extensive experiments on several datasets demonstrate that PoseCrafter achieves superior results to baselines pre-trained on a vast collection of videos under 8 commonly used metrics. Besides, PoseCrafter can follow poses from different individuals or artificial edits and simultaneously retain the human identity in an open-domain training video. Our project page is available at https://ml-gsai.github.io/PoseCrafter-demo/.
Updated: 2024-05-24 14:46:34
标题: PoseCrafter: 一次性个性化视频合成,遵循灵活的姿势控制
摘要: 在本文中,我们介绍了PoseCrafter,一种在灵活姿势控制下进行个性化视频生成的一次性方法。基于Stable Diffusion和ControlNet,我们精心设计了一个推断过程,以便在没有对应真实帧的情况下生成高质量视频。首先,我们从训练视频中选择一个合适的参考帧并对其进行反演,以初始化生成所需的所有潜变量。然后,我们将相应的训练姿势插入目标姿势序列,通过一个经过训练的时间注意力模块来增强忠实度。此外,为了缓解训练视频姿势与推断姿势之间的差异导致的面部和手部退化,我们通过一个涉及面部和手部关键点的仿射变换矩阵实现简单的潜变量编辑。在多个数据集上的大量实验表明,在8个常用指标下,PoseCrafter优于在海量视频上预训练的基线。此外,PoseCrafter可以跟随不同个体的姿势或人工编辑的姿势,同时在开放域训练视频中保留人物身份。我们的项目页面位于https://ml-gsai.github.io/PoseCrafter-demo/。
更新时间: 2024-05-24 14:46:34
领域: cs.CV,cs.AI
Fast-PGM: Fast Probabilistic Graphical Model Learning and Inference
Probabilistic graphical models (PGMs) serve as a powerful framework for modeling complex systems with uncertainty and extracting valuable insights from data. However, users face challenges when applying PGMs to their problems in terms of efficiency and usability. This paper presents Fast-PGM, an efficient and open-source library for PGM learning and inference. Fast-PGM supports comprehensive tasks on PGMs, including structure and parameter learning, as well as exact and approximate inference, and enhances the efficiency of these tasks through computational and memory optimizations and parallelization techniques. Concurrently, Fast-PGM provides developers with flexible building blocks, learners with detailed documentation, and non-experts with user-friendly interfaces, thereby improving the usability of PGMs for users across a spectrum of expertise levels. The source code of Fast-PGM is available at https://github.com/jjiantong/FastPGM.
Updated: 2024-05-24 14:43:37
标题: 快速PGM:快速概率图模型学习和推断
摘要: 概率图模型(PGMs)作为一个强大的框架,用于建模具有不确定性的复杂系统,并从数据中提取有价值的见解。然而,用户在将PGMs应用于其问题时面临效率和易用性方面的挑战。本文介绍了Fast-PGM,一个高效且开源的PGM学习和推断库。Fast-PGM支持PGMs的全面任务,包括结构和参数学习,以及精确和近似推断,并通过计算和内存优化以及并行化技术提高任务的效率。同时,Fast-PGM为开发人员提供灵活的构建模块,为学习者提供详细的文档,并为非专家用户提供友好的界面,从而改善PGMs对各种专业水平的用户的可用性。Fast-PGM的源代码可在https://github.com/jjiantong/FastPGM上找到。
更新时间: 2024-05-24 14:43:37
领域: cs.LG
Towards Principled Graph Transformers
Graph learning architectures based on the k-dimensional Weisfeiler-Leman (k-WL) hierarchy offer a theoretically well-understood expressive power. However, such architectures often fail to deliver solid predictive performance on real-world tasks, limiting their practical impact. In contrast, global attention-based models such as graph transformers demonstrate strong performance in practice, but comparing their expressive power with the k-WL hierarchy remains challenging, particularly since these architectures rely on positional or structural encodings for their expressivity and predictive performance. To address this, we show that the recently proposed Edge Transformer, a global attention model operating on node pairs instead of nodes, has at least 3-WL expressive power. Empirically, we demonstrate that the Edge Transformer surpasses other theoretically aligned architectures regarding predictive performance while not relying on positional or structural encodings. Our code is available at https://github.com/luis-mueller/towards-principled-gts
Updated: 2024-05-24 14:42:44
标题: 走向基于原则的图变换器
摘要: 基于k维Weisfeiler-Leman(k-WL)层次结构的图学习架构在理论上具有很好的表达能力。然而,这种架构通常在实际任务中无法提供可靠的预测性能,限制了它们的实际影响。相比之下,全局注意力模型(如图变换器)在实践中表现出强大的性能,但与k-WL层次结构的表达能力进行比较仍然具有挑战性,特别是因为这些架构依赖于位置或结构编码来实现其表达能力和预测性能。为了解决这个问题,我们展示了最近提出的边缘变换器,这是一个在节点对而不是节点上操作的全局注意力模型,具有至少3-WL的表达能力。在实证方面,我们证明了边缘变换器在预测性能方面超越了其他理论对齐的架构,同时不依赖于位置或结构编码。我们的代码可在 https://github.com/luis-mueller/towards-principled-gts 找到。
更新时间: 2024-05-24 14:42:44
领域: cs.LG,cs.AI
On the Theory of Risk-Aware Agents: Bridging Actor-Critic and Economics
Risk-aware Reinforcement Learning (RL) algorithms like SAC and TD3 were shown empirically to outperform their risk-neutral counterparts in a variety of continuous-action tasks. However, the theoretical basis for the pessimistic objectives these algorithms employ remains unestablished, raising questions about the specific class of policies they are implementing. In this work, we apply the expected utility hypothesis, a fundamental concept in economics, to illustrate that both risk-neutral and risk-aware RL goals can be interpreted through expected utility maximization using an exponential utility function. This approach reveals that risk-aware policies effectively maximize value certainty equivalent, aligning them with conventional decision theory principles. Furthermore, we propose Dual Actor-Critic (DAC). DAC is a risk-aware, model-free algorithm that features two distinct actor networks: a pessimistic actor for temporal-difference learning and an optimistic actor for exploration. Our evaluations of DAC across various locomotion and manipulation tasks demonstrate improvements in sample efficiency and final performance. Remarkably, DAC, while requiring significantly less computational resources, matches the performance of leading model-based methods in the complex dog and humanoid domains.
Updated: 2024-05-24 14:40:18
标题: 关于风险感知代理理论:搭建演员评论家和经济学之间的桥梁
摘要: 风险感知强化学习(RL)算法,如SAC和TD3在各种连续动作任务中经验证明优于其风险中性对应物。然而,这些算法所采用的悲观目标的理论基础尚未确立,引发了关于它们实施的特定政策类别的问题。在这项工作中,我们应用了经济学中的基本概念——期望效用假设,以说明风险中性和风险感知RL目标均可通过使用指数效用函数进行期望效用最大化来解释。这种方法揭示了风险感知政策有效地最大化了价值确定等价物,使其与传统决策理论原则保持一致。此外,我们提出了Dual Actor-Critic(DAC)。DAC是一种风险感知的、无模型算法,具有两个不同的演员网络:一个用于时序差分学习的悲观演员和一个用于探索的乐观演员。我们对DAC在各种运动和操作任务中的评估表明,其在样本效率和最终性能方面有所改善。值得注意的是,DAC在需要较少的计算资源的情况下,即可与复杂的狗和人形领域中领先的基于模型方法相匹敌。
更新时间: 2024-05-24 14:40:18
领域: cs.LG
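The expected-utility reading above has a compact numerical illustration. Under an exponential utility $u(R) = -\exp(-\beta R)$, maximizing expected utility is equivalent to maximizing the certainty equivalent $\mathrm{CE}(R) = -\tfrac{1}{\beta} \log \mathbb{E}[\exp(-\beta R)]$, which trades mean against variance for $\beta > 0$; the numbers below are synthetic, for illustration only.

    # Certainty-equivalent illustration with exponential utility (synthetic returns).
    import numpy as np

    rng = np.random.default_rng(0)
    returns_a = rng.normal(1.0, 0.1, 100_000)   # modest mean, low variance
    returns_b = rng.normal(1.1, 1.0, 100_000)   # higher mean, high variance

    def certainty_equivalent(r, beta):
        return -np.log(np.mean(np.exp(-beta * r))) / beta

    for beta in [0.01, 1.0, 3.0]:
        ce_a = certainty_equivalent(returns_a, beta)
        ce_b = certainty_equivalent(returns_b, beta)
        print(f"beta={beta}: CE(a)={ce_a:.3f}  CE(b)={ce_b:.3f}")
    # As beta grows, the risk-aware CE flips the preference from b to a,
    # even though b has the larger risk-neutral mean.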
No Filter: Cultural and Socioeconomic Diversity in Contrastive Vision-Language Models
We study cultural and socioeconomic diversity in contrastive vision-language models (VLMs). Using a broad range of benchmark datasets and evaluation metrics, we bring to attention several important findings. First, the common filtering of training data to English image-text pairs disadvantages communities of lower socioeconomic status and negatively impacts cultural understanding. Notably, this performance gap is not captured by - and even at odds with - the currently popular evaluation metrics derived from the Western-centric ImageNet and COCO datasets. Second, pretraining with global, unfiltered data before fine-tuning on English content can improve cultural understanding without sacrificing performance on said popular benchmarks. Third, we introduce the task of geo-localization as a novel evaluation metric to assess cultural diversity in VLMs. Our work underscores the value of using diverse data to create more inclusive multimodal systems and lays the groundwork for developing VLMs that better represent global perspectives.
Updated: 2024-05-24 14:39:24
标题: 无滤镜:对比视觉-语言模型中的文化和社会经济多样性
摘要: 我们研究了对比视觉语言模型(VLMs)中的文化和社会经济多样性。通过使用广泛的基准数据集和评估指标,我们引起了一些重要的发现。首先,常见的训练数据筛选为英语图像文本对会给社会经济地位较低的社区造成不利影响,并且对文化理解产生负面影响。值得注意的是,这种性能差距并不能被当前流行的基于西方中心的ImageNet和COCO数据集衍生的评估指标所捕捉到,甚至与之相悖。其次,在微调英语内容之前使用全球未经过滤的数据进行预训练,可以在不牺牲对这些流行基准的性能的情况下提高文化理解。第三,我们引入了地理定位任务作为一种新颖的评估指标来评估VLMs中的文化多样性。我们的工作强调了利用多样化数据创建更具包容性的多模态系统的价值,并为开发更好地代表全球视角的VLMs奠定了基础。
更新时间: 2024-05-24 14:39:24
领域: cs.CV,cs.AI
Leveraging joint sparsity in hierarchical Bayesian learning
We present a hierarchical Bayesian learning approach to infer jointly sparse parameter vectors from multiple measurement vectors. Our model uses separate conditionally Gaussian priors for each parameter vector and common gamma-distributed hyper-parameters to enforce joint sparsity. The resulting joint-sparsity-promoting priors are combined with existing Bayesian inference methods to generate a new family of algorithms. Our numerical experiments, which include a multi-coil magnetic resonance imaging application, demonstrate that our new approach consistently outperforms commonly used hierarchical Bayesian methods.
Updated: 2024-05-24 14:37:45
标题: 利用层次贝叶斯学习中的联合稀疏性
摘要: 我们提出了一种层次贝叶斯学习方法,用于从多个测量向量中推断共同稀疏的参数向量。我们的模型为每个参数向量使用单独的条件高斯先验,并使用共同的伽马分布超参数来强制实现联合稀疏性。由此产生的促进联合稀疏性的先验与现有的贝叶斯推断方法相结合,形成了一类新的算法。我们的数值实验,包括多线圈磁共振成像应用,表明我们的新方法始终优于常用的层次贝叶斯方法。
更新时间: 2024-05-24 14:37:45
领域: stat.ML,cs.LG,cs.NA,math.NA,65F22, 62F15, 65K10, 68U10
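In standard notation (ours, not necessarily the paper's), the hierarchical model described above can be written as follows: each of the $J$ measurement vectors $y^{(j)}$ gets its own conditionally Gaussian prior on $x^{(j)}$, while a single shared gamma-distributed hyper-parameter vector $\theta$ couples the supports.

    \begin{align*}
      y^{(j)} &= F^{(j)} x^{(j)} + e^{(j)}, \qquad j = 1, \dots, J,\\
      x^{(j)} \mid \theta &\sim \mathcal{N}\bigl(0,\, \mathrm{diag}(\theta)\bigr),\\
      \theta_i &\sim \Gamma(\beta, \vartheta_i), \qquad i = 1, \dots, n.
    \end{align*}

A small $\theta_i$ shrinks the $i$-th entry of every $x^{(j)}$ simultaneously, which is what promotes joint sparsity across the measurement vectors.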
Kronecker-Factored Approximate Curvature for Physics-Informed Neural Networks
Physics-informed neural networks (PINNs) are infamous for being hard to train. Recently, second-order methods based on natural gradient and Gauss-Newton methods have shown promising performance, improving the accuracy achieved by first-order methods by several orders of magnitude. While promising, the proposed methods only scale to networks with a few thousand parameters due to the high computational cost to evaluate, store, and invert the curvature matrix. We propose Kronecker-factored approximate curvature (KFAC) for PINN losses that greatly reduces the computational cost and allows scaling to much larger networks. Our approach goes beyond the established KFAC for traditional deep learning problems as it captures contributions from a PDE's differential operator that are crucial for optimization. To establish KFAC for such losses, we use Taylor-mode automatic differentiation to describe the differential operator's computation graph as a forward network with shared weights. This allows us to apply KFAC thanks to a recently-developed general formulation for networks with weight sharing. Empirically, we find that our KFAC-based optimizers are competitive with expensive second-order methods on small problems, scale more favorably to higher-dimensional neural networks and PDEs, and consistently outperform first-order methods and LBFGS.
Updated: 2024-05-24 14:36:02
标题: Kronecker分解近似曲率用于物理启发神经网络
摘要: 物理信息神经网络(PINNs)以难以训练而著称。最近,基于自然梯度和高斯-牛顿方法的二阶方法表现出可喜的性能,将一阶方法所能达到的准确性提高了几个数量级。尽管前景看好,但由于评估、存储和求逆曲率矩阵的计算成本很高,已有方法仅能扩展到只有几千个参数的网络。我们为PINN损失提出了克罗内克分解近似曲率(KFAC),它大大降低了计算成本,并允许扩展到更大的网络。我们的方法超越了面向传统深度学习问题的既有KFAC,因为它捕捉了偏微分方程(PDE)微分算子对优化至关重要的贡献。为了针对此类损失建立KFAC,我们使用泰勒模式自动微分将微分算子的计算图描述为一个具有共享权重的前向网络。得益于最近提出的针对权重共享网络的一般化公式,这使我们能够应用KFAC。实验表明,我们基于KFAC的优化器在小问题上与昂贵的二阶方法相比具有竞争力,能更好地扩展到更高维的神经网络和PDE,并且始终优于一阶方法和LBFGS。
更新时间: 2024-05-24 14:36:02
领域: cs.LG,physics.comp-ph
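For reference, the classical KFAC approximation that the paper extends (this is the standard deep-learning formulation, not the PDE-aware variant introduced above): for a fully-connected layer with input activations $a$ and back-propagated output gradients $g$, the layer's curvature block is approximated by a Kronecker product,

    \begin{equation*}
      F_\ell \;=\; \mathbb{E}\bigl[(a \otimes g)(a \otimes g)^{\top}\bigr]
      \;\approx\; \mathbb{E}\bigl[a a^{\top}\bigr] \otimes \mathbb{E}\bigl[g g^{\top}\bigr]
      \;=\; A_\ell \otimes G_\ell,
    \end{equation*}

which can be stored and inverted factor by factor, since $(A_\ell \otimes G_\ell)^{-1} = A_\ell^{-1} \otimes G_\ell^{-1}$; this is what makes such approaches cheap enough to scale beyond a few thousand parameters.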
On the Computational Landscape of Replicable Learning
We study computational aspects of algorithmic replicability, a notion of stability introduced by Impagliazzo, Lei, Pitassi, and Sorrell [2022]. Motivated by a recent line of work that established strong statistical connections between replicability and other notions of learnability such as online learning, private learning, and SQ learning, we aim to understand better the computational connections between replicability and these learning paradigms. Our first result shows that there is a concept class that is efficiently replicably PAC learnable, but, under standard cryptographic assumptions, no efficient online learner exists for this class. Subsequently, we design an efficient replicable learner for PAC learning parities when the marginal distribution is far from uniform, making progress on a question posed by Impagliazzo et al. [2022]. To obtain this result, we design a replicable lifting framework inspired by Blanc, Lange, Malik, and Tan [2023] that transforms in a black-box manner efficient replicable PAC learners under the uniform marginal distribution over the Boolean hypercube to replicable PAC learners under any marginal distribution, with sample and time complexity that depends on a certain measure of the complexity of the distribution. Finally, we show that any pure DP learner can be transformed to a replicable one in time polynomial in the accuracy, confidence parameters and exponential in the representation dimension of the underlying hypothesis class.
Updated: 2024-05-24 14:30:40
标题: 关于可复制学习的计算景观
摘要: 我们研究算法可复制性的计算方面,这是由Impagliazzo、Lei、Pitassi和Sorrell[2022]引入的一种稳定性概念。最近的一系列工作建立了可复制性与在线学习、私有学习和SQ学习等其他可学习性概念之间的强统计联系,受此启发,我们旨在更好地理解可复制性与这些学习范式之间的计算联系。我们的第一个结果表明,存在一个可以高效地可复制PAC学习的概念类,但在标准密码学假设下,该类不存在高效的在线学习器。随后,我们设计了一个高效的可复制学习器,用于在边际分布远离均匀分布时对奇偶函数(parities)进行PAC学习,在Impagliazzo等人[2022]提出的问题上取得了进展。为了得到这一结果,我们设计了一个受Blanc、Lange、Malik和Tan[2023]启发的可复制提升框架,它以黑盒方式将布尔超立方体上均匀边际分布下的高效可复制PAC学习器,转换为任意边际分布下的可复制PAC学习器,其样本和时间复杂度取决于分布复杂性的某种度量。最后,我们表明,任何纯DP学习器都可以在关于精度和置信参数为多项式、关于底层假设类表示维度为指数的时间内,转换为一个可复制的学习器。
更新时间: 2024-05-24 14:30:40
领域: cs.LG,stat.ML
MCDFN: Supply Chain Demand Forecasting via an Explainable Multi-Channel Data Fusion Network Model Integrating CNN, LSTM, and GRU
Accurate demand forecasting is crucial for optimizing supply chain management. Traditional methods often fail to capture complex patterns from seasonal variability and special events. Despite advancements in deep learning, interpretable forecasting models remain a challenge. To address this, we introduce the Multi-Channel Data Fusion Network (MCDFN), a hybrid architecture that integrates Convolutional Neural Networks (CNN), Long Short-Term Memory networks (LSTM), and Gated Recurrent Units (GRU) to enhance predictive performance by extracting spatial and temporal features from time series data. Our rigorous benchmarking demonstrates that MCDFN outperforms seven other deep-learning models, achieving superior metrics: MSE (23.5738%), RMSE (4.8553%), MAE (3.9991%), and MAPE (20.1575%). Additionally, MCDFN's predictions were statistically indistinguishable from actual values, confirmed by a paired t-test with a 5% p-value and a 10-fold cross-validated statistical paired t-test. We apply explainable AI techniques like ShapTime and Permutation Feature Importance to enhance interpretability. This research advances demand forecasting methodologies and offers practical guidelines for integrating MCDFN into supply chain systems, highlighting future research directions for scalability and user-friendly deployment.
Updated: 2024-05-24 14:30:00
标题: MCDFN:通过可解释的多通道数据融合网络模型整合CNN、LSTM和GRU进行供应链需求预测
摘要: 准确的需求预测对于优化供应链管理至关重要。传统方法往往无法捕捉季节性变化和特殊事件中的复杂模式。尽管深度学习取得了进展,但可解释的预测模型仍然是一个挑战。为了解决这个问题,我们引入了多通道数据融合网络(MCDFN),这是一种混合架构,集成了卷积神经网络(CNN)、长短期记忆网络(LSTM)和门控循环单元(GRU),通过从时间序列数据中提取空间和时间特征来增强预测性能。我们严格的基准测试表明,MCDFN优于其他七种深度学习模型,实现了优越的指标:MSE(23.5738%)、RMSE(4.8553%)、MAE(3.9991%)和MAPE(20.1575%)。此外,MCDFN的预测与实际值在统计上没有区别,经过配对t检验和10倍交叉验证的统计配对t检验得到了确认。我们应用可解释的AI技术,如ShapTime和排列特征重要性,以增强可解释性。这项研究推动了需求预测方法论的发展,并提供了将MCDFN整合到供应链系统中的实用指南,强调了未来研究方向,以实现可扩展性和用户友好部署。
更新时间: 2024-05-24 14:30:00
领域: cs.LG,cs.AI
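An illustrative multi-channel fusion skeleton in the spirit of MCDFN: parallel CNN, LSTM, and GRU channels over the same series, concatenated into one forecasting head. Layer sizes are placeholders, not the paper's configuration.

    # Multi-channel CNN/LSTM/GRU fusion skeleton (placeholder sizes).
    import torch
    import torch.nn as nn

    class MultiChannelFusion(nn.Module):
        def __init__(self, hidden=32, horizon=7):
            super().__init__()
            self.cnn = nn.Sequential(nn.Conv1d(1, hidden, kernel_size=3, padding=1),
                                     nn.ReLU(), nn.AdaptiveAvgPool1d(1))
            self.lstm = nn.LSTM(1, hidden, batch_first=True)
            self.gru = nn.GRU(1, hidden, batch_first=True)
            self.head = nn.Linear(3 * hidden, horizon)

        def forward(self, x):                         # x: (B, L, 1)
            c = self.cnn(x.transpose(1, 2)).squeeze(-1)   # spatial features
            l, _ = self.lstm(x)                           # temporal features
            g, _ = self.gru(x)
            fused = torch.cat([c, l[:, -1], g[:, -1]], dim=-1)
            return self.head(fused)

    model = MultiChannelFusion()
    print(model(torch.randn(16, 30, 1)).shape)        # torch.Size([16, 7])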
SpACNN-LDVAE: Spatial Attention Convolutional Latent Dirichlet Variational Autoencoder for Hyperspectral Pixel Unmixing
Hyperspectral pixel unmixing aims to find the underlying materials (endmembers) and their proportions (abundances) in the pixels of a hyperspectral image. This work extends the Latent Dirichlet Variational Autoencoder (LDVAE) pixel unmixing scheme by taking local spatial context into account while performing pixel unmixing. The proposed method uses an isotropic convolutional neural network with spatial attention to encode pixels as a Dirichlet distribution over endmembers. We have evaluated our model on the Samson, Hydice Urban, Cuprite, and OnTech-HSI-Syn-21 datasets. Our model also leverages the transfer learning paradigm for the Cuprite dataset, where we train the model on synthetic data and evaluate it on real-world data. The results suggest that incorporating spatial context improves both endmember extraction and abundance estimation.
Updated: 2024-05-24 14:26:47
标题: SpACNN-LDVAE: 用于高光谱像素解混的空间注意卷积潜在狄利克雷变分自动编码器
摘要: 高光谱像素解混旨在找到高光谱图像中像素中的潜在材料(端元)及其比例(丰度)。本研究通过考虑局部空间背景来扩展潜在Dirichlet变分自动编码器(LDVAE)像素解混方案。所提出的方法利用具有空间注意力的各向同性卷积神经网络来将像素编码为端元的Dirichlet分布。我们在Samson、Hydice Urban、Cuprite和OnTech-HSI-Syn-21数据集上评估了我们的模型。我们的模型还利用了Cuprite数据集的迁移学习范式,我们在合成数据上训练模型,并在真实数据上进行评估。结果表明,融入空间背景可以改善端元提取和丰度估计。
更新时间: 2024-05-24 14:26:47
领域: cs.CV,cs.LG,eess.IV
MicroAdam: Accurate Adaptive Optimization with Low Space Overhead and Provable Convergence
We propose a new variant of the Adam optimizer [Kingma and Ba, 2014] called MICROADAM that specifically minimizes memory overheads, while maintaining theoretical convergence guarantees. We achieve this by compressing the gradient information before it is fed into the optimizer state, thereby reducing its memory footprint significantly. We control the resulting compression error via a novel instance of the classical error feedback mechanism from distributed optimization [Seide et al., 2014, Alistarh et al., 2018, Karimireddy et al., 2019] in which the error correction information is itself compressed to allow for practical memory gains. We prove that the resulting approach maintains theoretical convergence guarantees competitive to those of AMSGrad, while providing good practical performance. Specifically, we show that MICROADAM can be implemented efficiently on GPUs: on both million-scale (BERT) and billion-scale (LLaMA) models, MicroAdam provides practical convergence competitive to that of the uncompressed Adam baseline, with lower memory usage and similar running time. Our code is available at https://github.com/IST-DASLab/MicroAdam.
Updated: 2024-05-24 14:25:23
标题: MicroAdam: 准确的自适应优化,低空间开销和可证收敛性
摘要: 我们提出了Adam优化器[Kingma和Ba,2014]的一种新变体,称为MICROADAM,它专门最小化内存开销,同时保持理论收敛保证。我们通过在梯度信息进入优化器状态之前对其进行压缩来实现这一点,从而显著减少其内存占用。我们通过分布式优化中经典误差反馈机制[Seide等,2014,Alistarh等,2018,Karimireddy等,2019]的一个新颖实例来控制由此产生的压缩误差,其中误差校正信息本身也被压缩,以实现实际的内存收益。我们证明,所得方法保持了与AMSGrad相当的理论收敛保证,同时提供良好的实际性能。具体而言,我们展示了MICROADAM可以在GPU上高效实现:在百万规模(BERT)和十亿规模(LLaMA)的模型上,MicroAdam提供了与未压缩的Adam基线相当的实际收敛性,同时内存使用更低、运行时间相近。我们的代码可在https://github.com/IST-DASLab/MicroAdam获取。
更新时间: 2024-05-24 14:25:23
领域: cs.LG,cs.NA,math.NA
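A toy sketch of the two ingredients named above, gradient compression before the optimizer state plus error feedback, using top-k sparsification on a synthetic objective. This illustrates the mechanism only; MicroAdam's actual compressor and state layout live in the linked repository.

    # Toy top-k compression with error feedback feeding Adam-style statistics.
    import torch

    def topk_compress(g, k):
        flat = g.flatten()
        idx = flat.abs().topk(k).indices
        out = torch.zeros_like(flat)
        out[idx] = flat[idx]
        return out.view_as(g)

    param = torch.randn(1000)
    error = torch.zeros_like(param)                  # error-feedback accumulator
    m = torch.zeros_like(param); v = torch.zeros_like(param)
    lr, b1, b2, eps = 1e-2, 0.9, 0.999, 1e-8

    for step in range(1, 101):
        grad = 2 * param                             # gradient of ||param||^2
        corrected = grad + error                     # re-inject past compression error
        compressed = topk_compress(corrected, k=100) # keep only 10% of entries
        error = corrected - compressed               # remember what was dropped
        m = b1 * m + (1 - b1) * compressed           # Adam statistics are built on
        v = b2 * v + (1 - b2) * compressed ** 2      # the compressed gradient only
        m_hat = m / (1 - b1 ** step); v_hat = v / (1 - b2 ** step)
        param -= lr * m_hat / (v_hat.sqrt() + eps)

    print(f"||param|| after 100 steps: {param.norm():.4f}")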
Efficient Adversarial Training in LLMs with Continuous Attacks
Large language models (LLMs) are vulnerable to adversarial attacks that can bypass their safety guardrails. In many domains, adversarial training has proven to be one of the most promising methods to reliably improve robustness against such attacks. Yet, in the context of LLMs, current methods for adversarial training are hindered by the high computational costs required to perform discrete adversarial attacks at each training iteration. We address this problem by instead calculating adversarial attacks in the continuous embedding space of the LLM, which is orders of magnitudes more efficient. We propose a fast adversarial training algorithm (C-AdvUL) composed of two losses: the first makes the model robust on continuous embedding attacks computed on an adversarial behaviour dataset; the second ensures the usefulness of the final model by fine-tuning on utility data. Moreover, we introduce C-AdvIPO, an adversarial variant of IPO that does not require utility data for adversarially robust alignment. Our empirical evaluation on four models from different families (Gemma, Phi3, Mistral, Zephyr) and at different scales (2B, 3.8B, 7B) shows that both algorithms substantially enhance LLM robustness against discrete attacks (GCG, AutoDAN, PAIR), while maintaining utility. Our results demonstrate that robustness to continuous perturbations can extrapolate to discrete threat models. Thereby, we present a path toward scalable adversarial training algorithms for robustly aligning LLMs.
Updated: 2024-05-24 14:20:09
标题: 使用连续攻击在LLMs中进行高效对抗训练
摘要: 大型语言模型(LLMs)容易受到对抗性攻击的影响,这些攻击可以绕过它们的安全防护。在许多领域,对抗性训练已被证明是可靠地提高对抗性攻击的鲁棒性的最有前途的方法之一。然而,在LLMs的背景下,目前的对抗性训练方法受到每个训练迭代中进行离散对抗性攻击所需的高计算成本的限制。我们通过在LLM的连续嵌入空间中计算对抗性攻击来解决这个问题,这种方法效率更高。我们提出了一种快速对抗性训练算法(C-AdvUL),由两个损失组成:第一个损失使模型在对抗行为数据集上计算的连续嵌入攻击中具有鲁棒性;第二个损失通过在实用数据上进行微调来确保最终模型的实用性。此外,我们还引入了C-AdvIPO,这是IPO的对抗性变体,不需要实用数据来进行对抗性鲁棒对齐。我们在来自不同系列(Gemma、Phi3、Mistral、Zephyr)和不同规模(2B、3.8B、7B)的四个模型上进行了实证评估,结果显示这两种算法显著增强了LLMs对离散攻击(GCG、AutoDAN、PAIR)的鲁棒性,同时保持了实用性。我们的结果表明,对连续扰动的鲁棒性可以推广到离散威胁模型。因此,我们提出了一条路径,用于开发可扩展的对抗性训练算法,从而使LLMs能够具有强大的对齐能力。
更新时间: 2024-05-24 14:20:09
领域: cs.LG,cs.CR
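A bare-bones sketch of a continuous embedding-space attack of the kind discussed above: instead of searching over discrete tokens, descend the loss toward the attacker's target directly in embedding space. The model here is a stand-in toy classifier, not an actual LLM, and the step sizes are illustrative.

    # Projected gradient attack in continuous embedding space (toy stand-in model).
    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 2))
    emb = torch.randn(1, 64)                        # embedded prompt (stand-in)
    target = torch.tensor([1])                      # behaviour the attacker wants

    delta = torch.zeros_like(emb, requires_grad=True)
    eps, alpha = 0.5, 0.05
    for _ in range(20):
        loss = nn.functional.cross_entropy(model(emb + delta), target)
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()      # push toward the target class
            delta.clamp_(-eps, eps)                 # stay inside an L-inf ball
        delta.grad.zero_()

    print("target prob:", model(emb + delta).softmax(-1)[0, 1].item())

Adversarial training then simply trains on emb + delta; because no discrete token search is needed, each attack is orders of magnitude cheaper.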
Distribution-free Deviation Bounds and The Role of Domain Knowledge in Learning via Model Selection with Cross-validation Risk Estimation
Cross-validation techniques for risk estimation and model selection are widely used in statistics and machine learning. However, the theoretical properties of learning via model selection with cross-validation risk estimation remain poorly understood in the face of its widespread use. In this context, this paper presents learning via model selection with cross-validation risk estimation as a general systematic learning framework within classical statistical learning theory and establishes distribution-free deviation bounds in terms of VC dimension, giving detailed proofs of the results and considering both bounded and unbounded loss functions. In particular, we investigate how the generalization of learning via model selection may be increased by modeling the collection of candidate models. We define the Learning Spaces as a class of candidate models in which the partial order by inclusion reflects the models' complexities, and we formalize a manner of defining them based on domain knowledge. We illustrate this modeling in a worst-case scenario of learning a classifier with finite domain and a typical scenario of linear regression. Through theoretical insights and concrete examples, we aim to provide guidance on selecting the family of candidate models based on domain knowledge to increase generalization.
Updated: 2024-05-24 14:19:27
标题: 不受分布限制的偏差界限和领域知识在通过模型选择和交叉验证风险估计学习中的作用
摘要: 交叉验证技术在风险估计和模型选择中被广泛应用于统计学和机器学习领域。然而,在交叉验证风险估计的模型选择学习过程中,对其理论性质的理解相当有限,尽管它被广泛使用。在这种背景下,本文将交叉验证风险估计的模型选择学习作为一个通用的系统化学习框架在经典统计学习理论中进行了讨论,并建立了基于VC维度的无分布偏差界,详细证明了结果并考虑了有界和无界损失函数。特别地,我们研究了如何通过对候选模型集合进行建模来增加模型选择学习的泛化能力。我们将学习空间定义为一类候选模型,在其中按照包含关系的偏序反映了模型的复杂性,并制定了一种基于领域知识定义它们的方法。我们在有限域的分类器学习的最坏情况和线性回归的典型情况下说明了这种建模过程。通过理论洞察和具体例子,我们旨在提供指导,根据领域知识选择候选模型族,以增加泛化能力。
更新时间: 2024-05-24 14:19:27
领域: stat.ML,cs.LG
DAGER: Exact Gradient Inversion for Large Language Models
Federated learning works by aggregating locally computed gradients from multiple clients, thus enabling collaborative training without sharing private client data. However, prior work has shown that the data can actually be recovered by the server using so-called gradient inversion attacks. While these attacks perform well when applied to images, they are limited in the text domain and only permit approximate reconstruction of small batches and short input sequences. In this work, we propose DAGER, the first algorithm to recover whole batches of input text exactly. DAGER leverages the low-rank structure of self-attention layer gradients and the discrete nature of token embeddings to efficiently check if a given token sequence is part of the client data. We use this check to exactly recover full batches in the honest-but-curious setting without any prior on the data for both encoder- and decoder-based architectures using exhaustive heuristic search and a greedy approach, respectively. We provide an efficient GPU implementation of DAGER and show experimentally that it recovers full batches of size up to 128 on large language models (LLMs), beating prior attacks in speed (20x at the same batch size), scalability (10x larger batches), and reconstruction quality (ROUGE-1/2 > 0.99).
Updated: 2024-05-24 14:14:24
标题: DAGER:大型语言模型的精确梯度反转
摘要: 联邦学习通过聚合来自多个客户端的本地计算梯度,从而实现协作训练,而无需共享私人客户数据。然而,先前的研究表明,服务器实际上可以使用所谓的梯度反演攻击来恢复数据。尽管这些攻击在应用于图像时表现良好,但在文本领域受到限制,仅允许对小批量和短输入序列进行近似重建。在这项工作中,我们提出了DAGER,这是第一个能够精确恢复整个输入文本批次的算法。DAGER利用自注意力层梯度的低秩结构和令牌嵌入的离散特性,有效地检查给定的令牌序列是否属于客户端数据。我们利用这种检查在诚实但好奇的设置中准确地恢复全批次,无需对数据进行任何先验,分别使用完全穷举的启发式搜索和贪婪方法,对编码器和解码器架构。我们提供了DAGER的高效GPU实现,并实验证明它在大型语言模型(LLMs)上能够恢复高达128个大小的完整批次,速度优于先前的攻击(在相同批次大小上快20倍)、可扩展性(扩大10倍批次)和重建质量(ROUGE-1/2 > 0.99)。
更新时间: 2024-05-24 14:14:24
领域: cs.LG,cs.DC,I.2.7; I.2.11
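The span-check idea attributed to DAGER above has a short numerical illustration: a linear layer's weight gradient is a sum of outer products over its true inputs, so a candidate token embedding that was part of the batch lies (up to numerics) in the gradient's row space, while an out-of-batch one does not. The sketch below uses synthetic data and is only a caricature of the full attack.

    # Row-space membership check on a synthetic linear-layer gradient.
    import numpy as np

    rng = np.random.default_rng(0)
    d, batch_tokens = 64, 8
    X = rng.normal(size=(batch_tokens, d))           # true input embeddings
    G = rng.normal(size=(d, batch_tokens)) @ X       # gradient = sum of outer products

    def in_rowspace(candidate, G, tol=1e-6):
        # residual of the least-squares projection onto the row space of G
        coef, *_ = np.linalg.lstsq(G.T, candidate, rcond=None)
        residual = np.linalg.norm(G.T @ coef - candidate)
        return residual < tol * np.linalg.norm(candidate)

    print(in_rowspace(X[3], G))                      # True : token was in the batch
    print(in_rowspace(rng.normal(size=d), G))        # False: token was not

Running this check over the (discrete, finite) vocabulary is what turns the low-rank structure into exact batch recovery.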
Submodular Reinforcement Learning
In reinforcement learning (RL), rewards of states are typically considered additive, and following the Markov assumption, they are $\textit{independent}$ of states visited previously. In many important applications, such as coverage control, experiment design and informative path planning, rewards naturally have diminishing returns, i.e., their value decreases in light of similar states visited previously. To tackle this, we propose $\textit{submodular RL}$ (SubRL), a paradigm which seeks to optimize more general, non-additive (and history-dependent) rewards modelled via submodular set functions which capture diminishing returns. Unfortunately, in general, even in tabular settings, we show that the resulting optimization problem is hard to approximate. On the other hand, motivated by the success of greedy algorithms in classical submodular optimization, we propose SubPO, a simple policy gradient-based algorithm for SubRL that handles non-additive rewards by greedily maximizing marginal gains. Indeed, under some assumptions on the underlying Markov Decision Process (MDP), SubPO recovers optimal constant factor approximations of submodular bandits. Moreover, we derive a natural policy gradient approach for locally optimizing SubRL instances even in large state- and action- spaces. We showcase the versatility of our approach by applying SubPO to several applications, such as biodiversity monitoring, Bayesian experiment design, informative path planning, and coverage maximization. Our results demonstrate sample efficiency, as well as scalability to high-dimensional state-action spaces.
Updated: 2024-05-24 14:14:19
标题: 子模块强化学习
摘要: 在强化学习(RL)中,状态的奖励通常被认为是可加的,并且根据马尔可夫假设,它们与先前访问过的状态相互独立。在许多重要应用中,例如覆盖控制、实验设计和信息路径规划,奖励天然具有收益递减的性质,即在先前访问过相似状态后,其价值会降低。为了解决这一问题,我们提出了子模RL(SubRL),这一范式旨在优化更一般的、非可加的(且依赖历史的)奖励,这些奖励通过刻画收益递减的子模集合函数来建模。不幸的是,我们证明,一般而言,即使在表格设置中,由此产生的优化问题也难以近似。另一方面,受贪婪算法在经典子模优化中成功的启发,我们提出了SubPO,一种用于SubRL的简单的基于策略梯度的算法,它通过贪婪地最大化边际增益来处理非可加奖励。事实上,在对底层马尔可夫决策过程(MDP)的某些假设下,SubPO可以恢复子模赌博机的最优常数因子近似。此外,我们推导了一种自然策略梯度方法,即使在大规模状态和动作空间中也能对SubRL实例进行局部优化。我们通过将SubPO应用于多个应用来展示我们方法的多功能性,例如生物多样性监测、贝叶斯实验设计、信息路径规划和覆盖最大化。我们的结果展示了样本效率以及对高维状态-动作空间的可扩展性。
更新时间: 2024-05-24 14:14:19
领域: cs.LG
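A tiny illustration of greedily maximizing marginal gains under a submodular (coverage-style) reward, the core primitive the abstract describes; the full algorithm wraps this in policy-gradient training, which is omitted here, and the item sets are made up.

    # Greedy marginal-gain selection under a monotone submodular coverage function.
    items = {                                # e.g. sensing locations -> covered cells
        "a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1, 7},
    }

    def coverage(selected):                  # monotone submodular set function
        return len(set().union(*(items[s] for s in selected))) if selected else 0

    chosen, budget = [], 2
    for _ in range(budget):
        best = max((s for s in items if s not in chosen),
                   key=lambda s: coverage(chosen + [s]) - coverage(chosen))
        chosen.append(best)                  # pick the largest marginal gain

    print(chosen, "covers", coverage(chosen), "cells")   # ['c', 'a'] covers 7 cells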
On the Weight Dynamics of Deep Normalized Networks
Recent studies have shown that high disparities in effective learning rates (ELRs) across layers in deep neural networks can negatively affect trainability. We formalize how these disparities evolve over time by modeling weight dynamics (evolution of expected gradient and weight norms) of networks with normalization layers, predicting the evolution of layer-wise ELR ratios. We prove that when training with any constant learning rate, ELR ratios converge to 1, despite initial gradient explosion. We identify a ``critical learning rate" beyond which ELR disparities widen, which only depends on current ELRs. To validate our findings, we devise a hyper-parameter-free warm-up method that successfully minimizes ELR spread quickly in theory and practice. Our experiments link ELR spread with trainability, a relationship that is most evident in very deep networks with significant gradient magnitude excursions.
Updated: 2024-05-24 14:12:25
标题: 关于深度归一化网络的权重动态
摘要: 最近的研究表明,深度神经网络中各层有效学习速率(ELR)的高度差异可以对可训练性产生负面影响。我们通过建模带有归一化层的网络的权重动态(期望梯度和权重范数的演变)来形式化这些差异随时间的演变,预测逐层ELR比率的演变。我们证明,当使用任何恒定学习速率进行训练时,尽管初始梯度爆炸,但ELR比率会收敛到1。我们确定了一种“临界学习速率”,超过该速率,ELR差异会扩大,这仅取决于当前的ELR。为了验证我们的发现,我们设计了一种无超参数的预热方法,在理论和实践中成功地快速最小化了ELR的传播。我们的实验将ELR传播与可训练性联系起来,这种关系在非常深的网络中最为明显,这些网络存在明显的梯度幅度偏移。
更新时间: 2024-05-24 14:12:25
领域: cs.LG,cs.AI
Transfer Learning with Informative Priors: Simple Baselines Better than Previously Reported
We pursue transfer learning to improve classifier accuracy on a target task with few labeled examples available for training. Recent work suggests that using a source task to learn a prior distribution over neural net weights, not just an initialization, can boost target task performance. In this study, we carefully compare transfer learning with and without source task informed priors across 5 datasets. We find that standard transfer learning informed by an initialization only performs far better than reported in previous comparisons. The relative gains of methods using informative priors over standard transfer learning vary in magnitude across datasets. For the scenario of 5-300 examples per class, we find negative or negligible gains on 2 datasets, modest gains (between 1.5-3 points of accuracy) on 2 other datasets, and substantial gains (>8 points) on one dataset. Among methods using informative priors, we find that an isotropic covariance appears competitive with learned low-rank covariance matrix while being substantially simpler to understand and tune. Further analysis suggests that the mechanistic justification for informed priors -- hypothesized improved alignment between train and test loss landscapes -- is not consistently supported due to high variability in empirical landscapes. We release code to allow independent reproduction of all experiments.
Updated: 2024-05-24 14:12:23
标题: 使用信息先验的迁移学习:简单基线胜过先前报道的方法
摘要: 我们追求迁移学习以提高在目标任务上分类器准确性,而训练中只有少量标记示例可用。最近的研究表明,使用源任务来学习神经网络权重的先验分布,而不仅仅是一个初始化,可以提高目标任务的性能。在这项研究中,我们仔细比较了在5个数据集上使用和不使用源任务先验的迁移学习。我们发现,仅使用初始化的标准迁移学习的表现远远优于以前的比较中报告的情况。使用信息先验方法相对于标准迁移学习的收益量在不同数据集之间有所不同。对于每类5-300个示例的情况,我们发现在2个数据集上出现负面或可忽略的收益,在另外2个数据集上出现适度收益(准确度提高了1.5-3个点),而在一个数据集上出现显著收益(>8个点)。在使用信息先验方法中,我们发现一个各向同性协方差与学习的低秩协方差矩阵相竞争,同时更简单易懂且易调整。进一步的分析表明,对于信息先验的机械化理由 -- 假定训练和测试损失景观之间的改进对齐 -- 由于实证景观的高变异性而得不到一致支持。我们发布代码以允许独立重现所有实验。
更新时间: 2024-05-24 14:12:23
领域: cs.LG
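The isotropic-covariance prior discussed above has a compact sketch: a Gaussian prior centred at the source-task weights adds an L2 pull toward them (an L2-SP-style penalty), in contrast to initialization-only transfer. Variable names and the prior strength below are illustrative, not values from the paper.

    # MAP fine-tuning with an isotropic Gaussian prior centred at source weights.
    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    model = nn.Linear(10, 2)                         # stand-in for a pre-trained net
    source_weights = {k: v.clone() for k, v in model.state_dict().items()}
    prior_strength = 1e-2                            # 1 / prior variance, tuned

    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    x, y = torch.randn(32, 10), torch.randint(0, 2, (32,))
    for _ in range(100):
        nll = nn.functional.cross_entropy(model(x), y)
        prior = sum(((p - source_weights[n]) ** 2).sum()
                    for n, p in model.named_parameters())
        loss = nll + 0.5 * prior_strength * prior    # isotropic informative prior
        opt.zero_grad(); loss.backward(); opt.step()

    print(f"target NLL: {nll.item():.3f}")

Setting prior_strength to zero recovers standard initialization-only transfer, which is the baseline the paper reports as stronger than previously assumed.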
Just Ask One More Time! Self-Agreement Improves Reasoning of Language Models in (Almost) All Scenarios
Although chain-of-thought (CoT) prompting combined with language models has achieved encouraging results on complex reasoning tasks, the naive greedy decoding used in CoT prompting usually causes repetitiveness and local optimality. To address this shortcoming, ensemble-optimization tries to obtain multiple reasoning paths from which to assemble the final answer. However, current ensemble-optimization methods either simply employ rule-based post-processing such as \textit{self-consistency}, or train an additional model based on several task-related human annotations to select the best one among multiple reasoning paths, yet fail to generalize to realistic settings where the type of input questions is unknown or the answer format of reasoning paths is unknown. To avoid their limitations, we propose \textbf{Self-Agreement}, a generalizable ensemble-optimization method applying in almost all scenarios where the type of input questions and the answer format of reasoning paths may be known or unknown. Self-agreement firstly samples from language model's decoder to generate a \textit{diverse} set of reasoning paths, and subsequently prompts the language model \textit{one more time} to determine the optimal answer by selecting the most \textit{agreed} answer among the sampled reasoning paths. Self-agreement simultaneously achieves remarkable performance on six public reasoning benchmarks and superior generalization capabilities.
Updated: 2024-05-24 14:11:21
标题: 再问一次!自协议在(几乎)所有场景中提升语言模型的推理能力
摘要: 尽管链式思维(CoT)提示结合语言模型在复杂推理任务上取得了令人鼓舞的结果,但CoT提示中通常使用的贪婪解码会导致重复性和局部最优性。为解决这一缺点,集成优化尝试获得多个推理路径以获取最终答案组装。然而,当前的集成优化方法要么简单地采用基于规则的后处理,例如\textit{自一致性},要么基于几个与任务相关的人类注释训练额外模型来选择最佳的推理路径之一,但无法推广到现实设置中的情况,其中输入问题的类型未知或推理路径的答案格式未知。为避免它们的局限性,我们提出了\textbf{自协议},一种通用的集成优化方法,适用于几乎所有场景,其中输入问题的类型和推理路径的答案格式可能已知或未知。自协议首先从语言模型的解码器中采样,生成一个\textit{多样化}的推理路径集合,然后再提示语言模型\textit{一次},通过从采样的推理路径中选择最\textit{一致}的答案来确定最优答案。自协议同时在六个公共推理基准上取得了显著的性能,并具有卓越的泛化能力。
更新时间: 2024-05-24 14:11:21
领域: cs.CL,cs.AI
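A schematic of the two-stage procedure: sample diverse reasoning paths, then ask the model one more time which answer is most agreed upon. The generate function below is a canned mock that stands in for any LLM client, so the sketch runs end-to-end; prompt wording is ours.

    # Self-agreement sketch with a mock LLM call (canned outputs, prompts are ours).
    import random

    random.seed(0)

    def generate(prompt, temperature=1.0):
        # placeholder for a real LLM call; returns canned text for illustration
        if "most of the responses" in prompt:
            return "18"
        return random.choice(["... so the answer is 18.",
                              "... hence the answer is 18.",
                              "... therefore I get 17."])

    def self_agreement(question, n_paths=10):
        # stage 1: sample a diverse set of reasoning paths
        paths = [generate(question + "\nLet's think step by step.", temperature=0.9)
                 for _ in range(n_paths)]
        # stage 2: prompt the model one more time; no answer-format parsing needed
        ballot = "\n".join(f"Response {i + 1}: {p}" for i, p in enumerate(paths))
        return generate(f"Question: {question}\n{ballot}\n"
                        "Which answer do most of the responses above agree on? "
                        "Reply with that answer only.", temperature=0.0)

    print(self_agreement("What is 3 * 6?"))    # "18"

Because the final vote is itself delegated to the model, no hand-written answer extraction rules are required, which is what lets the method generalize across answer formats.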
Discovering deposition process regimes: leveraging unsupervised learning for process insights, surrogate modeling, and sensitivity analysis
This work introduces a comprehensive approach utilizing data-driven methods to elucidate the deposition process regimes in Chemical Vapor Deposition (CVD) reactors and the interplay of the physical mechanisms that dominate in each of them. Through this work, we address three key objectives. Firstly, our methodology relies on process outcomes, derived from a detailed CFD model, to identify clusters of "outcomes" corresponding to distinct process regimes, wherein the relative influence of input variables undergoes notable shifts. This phenomenon is experimentally validated through Arrhenius plot analysis, affirming the efficacy of our approach. Secondly, we demonstrate the development of an efficient surrogate model, based on Polynomial Chaos Expansion (PCE), that maintains accuracy, facilitating streamlined computational analyses. Finally, as a result of PCE, sensitivity analysis is made possible by means of Sobol' indices, which quantify the impact of process inputs across identified regimes. The insights gained from our analysis contribute to the formulation of hypotheses regarding phenomena occurring beyond the transition regime. Notably, the significance of temperature even in the diffusion-limited regime, as evidenced by the Arrhenius plot, suggests activation of gas phase reactions at elevated temperatures. Importantly, our proposed methods yield insights that align with experimental observations and theoretical principles, aiding decision-making in process design and optimization. By circumventing the need for costly and time-consuming experiments, our approach offers a pragmatic pathway towards enhanced process efficiency. Moreover, this study underscores the potential of data-driven computational methods for innovating reactor design paradigms.
Updated: 2024-05-24 14:10:22
标题: 发现沉积过程的制度:利用无监督学习进行过程洞察、替代建模和敏感性分析
摘要: 这项工作引入了一种综合方法,利用数据驱动的方法阐明化学气相沉积(CVD)反应器中的沉积过程制度以及在每个制度中占主导地位的物理机制的相互作用。通过这项工作,我们解决了三个关键目标。首先,我们的方法依赖于由详细的CFD模型导出的过程结果,以识别对应于不同过程制度的“结果”簇,其中输入变量的相对影响发生明显变化。通过阿伦尼乌斯图分析,实验证实了这一现象,证实了我们方法的有效性。其次,我们展示了基于多项式混沌展开(PCE)的高效替代模型的发展,该模型保持准确性,有助于简化计算分析。最后,由于PCE,通过Sobol'指数,敏感性分析成为可能,量化了跨过识别制度的过程输入的影响。我们分析获得的见解有助于提出关于发生在过渡制度之外的现象的假设。值得注意的是,即使在扩散受限制的制度中,如阿伦尼乌斯图所证明的,温度的重要性也在于表明在高温下激活气相反应。重要的是,我们提出的方法提供了与实验观察和理论原则一致的见解,有助于过程设计和优化中的决策制定。通过避免昂贵和耗时的实验,我们的方法为提高过程效率提供了务实的途径。此外,这项研究强调了数据驱动计算方法在创新反应器设计范式方面的潜力。
更新时间: 2024-05-24 14:10:22
领域: physics.chem-ph,cs.LG,cs.SY,eess.SY,stat.AP
Nonparametric Instrumental Variable Regression through Stochastic Approximate Gradients
Instrumental variables (IVs) provide a powerful strategy for identifying causal effects in the presence of unobservable confounders. Within the nonparametric setting (NPIV), recent methods have been based on nonlinear generalizations of Two-Stage Least Squares and on minimax formulations derived from moment conditions or duality. In a novel direction, we show how to formulate a functional stochastic gradient descent algorithm to tackle NPIV regression by directly minimizing the populational risk. We provide theoretical support in the form of bounds on the excess risk, and conduct numerical experiments showcasing our method's superior stability and competitive performance relative to current state-of-the-art alternatives. This algorithm enables flexible estimator choices, such as neural networks or kernel based methods, as well as non-quadratic loss functions, which may be suitable for structural equations beyond the setting of continuous outcomes and additive noise. Finally, we demonstrate this flexibility of our framework by presenting how it naturally addresses the important case of binary outcomes, which has received far less attention by recent developments in the NPIV literature.
Updated: 2024-05-24 14:09:40
标题: 非参数工具变量回归通过随机近似梯度
摘要: 工具变量(IVs)为在存在不可观测混杂因素的情况下识别因果效应提供了强大的策略。在非参数设置(NPIV)中,最近的方法基于两阶段最小二乘的非线性推广,以及由矩条件或对偶性导出的极小极大形式。在一个新颖的方向上,我们展示了如何构造一个函数型随机梯度下降算法,通过直接最小化总体风险来处理NPIV回归。我们以超额风险界的形式提供了理论支持,并进行了数值实验,展示我们的方法相对于当前最先进的替代方案具有更优的稳定性和有竞争力的表现。该算法支持灵活的估计器选择,如神经网络或基于核的方法,以及非二次损失函数,这可能适用于连续结果和加性噪声设置之外的结构方程。最后,我们通过展示我们的框架如何自然地处理二元结果这一重要情形来展示其灵活性,而这一情形在NPIV文献的最新进展中受到的关注要少得多。
更新时间: 2024-05-24 14:09:40
领域: stat.ML,cs.LG
chainBoost: A Secure Performance Booster for Blockchain-based Resource Markets
Cryptocurrencies and blockchain technology provide an innovative model for reshaping digital services. Driven by the movement toward Web 3.0, recent systems started to provide distributed services, such as computation outsourcing or file storage, on top of the currency exchange medium. By allowing anyone to join and collect cryptocurrency payments for serving others, these systems create decentralized markets for trading digital resources. Yet, there is still a big gap between the promise of these markets and their practical viability. Existing initiatives are still early-stage and have already encountered security and efficiency obstacles. At the same time, existing work around promising ideas, specifically sidechains, fall short in exploiting their full potential in addressing these problems. To bridge this gap, we propose chainBoost, a secure performance booster for decentralized resource markets. It expedites service related operations, reduces the blockchain size, and supports flexible service-payment exchange modalities at low overhead. At its core, chainBoost employs a sidechain, that has a (security and semantic) mutual-dependence with the mainchain, to which the system offloads heavy/frequent operations. To enable it, we develop a novel sidechain architecture composed of temporary and permanent blocks, a block suppression mechanism to prune the sidechain, a syncing protocol to permit arbitrary data exchange between the two chains, and an autorecovery protocol to support robustness and resilience. We analyze the security of chainBoost, and implement a proof-of-concept prototype for a distributed file storage market as a use case. For a market handling around 2000 transactions per round, our experiments show up to 11x improvement in throughput and 94\% reduction in confirmation time. They also show that chainBoost can reduce the main blockchain size by around 90%.
Updated: 2024-05-24 14:08:38
标题: chainBoost:基于区块链资源市场的安全性能增强器
摘要: 加密货币和区块链技术为重塑数字服务提供了创新模式。受推动向Web 3.0的运动的影响,最近的系统开始提供分布式服务,例如计算外包或文件存储,建立在货币交易媒介之上。通过允许任何人加入并收取为他人提供服务的加密货币支付,这些系统创建了用于交易数字资源的去中心化市场。然而,这些市场的承诺与其实际可行性之间仍存在很大差距。现有的倡议仍处于早期阶段,并已遇到安全和效率障碍。同时,围绕有前途的想法的现有工作,特别是侧链,未能充分利用它们在解决这些问题方面的潜力。 为了弥合这一差距,我们提出了chainBoost,一个安全的性能增强器,用于去中心化资源市场。它加快了与服务相关的操作,减小了区块链的大小,并支持低开销的灵活服务支付交换模式。在其核心,chainBoost使用一个侧链,该侧链与主链具有(安全和语义)相互依赖关系,系统将频繁/繁重操作卸载到该侧链。为了实现这一点,我们开发了一个由临时和永久块组成的新颖侧链架构,一个修剪侧链的块抑制机制,一个同步协议以允许两个链之间的任意数据交换,以及一个自动恢复协议以支持强大和韧性。我们分析了chainBoost的安全性,并为分布式文件存储市场实现了一个概念验证原型作为一个使用案例。对于每轮处理大约2000笔交易的市场,我们的实验显示吞吐量提高了多达11倍,确认时间减少了94%。它们还表明,chainBoost可以将主区块链的大小减少约90%。
更新时间: 2024-05-24 14:08:38
领域: cs.CR
Generating density nowcasts for U.S. GDP growth with deep learning: Bayes by Backprop and Monte Carlo dropout
Recent results in the literature indicate that artificial neural networks (ANNs) can outperform the dynamic factor model (DFM) in terms of the accuracy of GDP nowcasts. Compared to the DFM, the performance advantage of these highly flexible, nonlinear estimators is particularly evident in periods of recessions and structural breaks. From the perspective of policy-makers, however, nowcasts are the most useful when they are conveyed with uncertainty attached to them. While the DFM and other classical time series approaches analytically derive the predictive (conditional) distribution for GDP growth, ANNs can only produce point nowcasts based on their default training procedure (backpropagation). To fill this gap, we are the first in the literature to adapt two different deep learning algorithms that enable ANNs to generate density nowcasts for U.S. GDP growth: Bayes by Backprop and Monte Carlo dropout. The accuracy of point nowcasts, defined as the mean of the empirical predictive distribution, is evaluated relative to a naive constant growth model for GDP and a benchmark DFM specification. Using a 1D CNN as the underlying ANN architecture, both algorithms outperform those benchmarks during the evaluation period (2012:Q1 -- 2022:Q4). Furthermore, both algorithms are able to dynamically adjust the location (mean), scale (variance), and shape (skew) of the empirical predictive distribution. The results indicate that both Bayes by Backprop and Monte Carlo dropout can effectively augment the scope and functionality of ANNs, rendering them a fully compatible and competitive alternative to classical time series approaches.
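A minimal sketch of the Monte Carlo dropout side of the paper, assuming PyTorch and a hypothetical toy architecture: dropout is kept active at inference so that repeated forward passes yield an empirical predictive distribution rather than a point nowcast.

    import torch
    import torch.nn as nn

    # Toy nowcasting model with dropout; the 12 input indicators and the
    # layer sizes are illustrative assumptions, not the paper's 1D CNN.
    model = nn.Sequential(
        nn.Linear(12, 64), nn.ReLU(), nn.Dropout(p=0.2),
        nn.Linear(64, 64), nn.ReLU(), nn.Dropout(p=0.2),
        nn.Linear(64, 1),
    )

    def mc_dropout_nowcast(x, n_samples=500):
        model.train()                      # keep dropout stochastic at inference
        with torch.no_grad():
            draws = torch.stack([model(x) for _ in range(n_samples)])
        # point nowcast (mean), scale (std), and the full empirical distribution
        return draws.mean(0), draws.std(0), draws

    x = torch.randn(1, 12)                 # dummy monthly indicators
    mean, std, draws = mc_dropout_nowcast(x)
    print(mean.item(), std.item())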
Updated: 2024-05-24 14:06:08
标题: 使用深度学习生成美国GDP增长的密度即时预测:Bayes by Backprop与Monte Carlo dropout
摘要: 最近文献中的结果表明,人工神经网络(ANNs)在国内生产总值(GDP)即时预测(nowcast)的准确性方面可以胜过动态因子模型(DFM)。与DFM相比,这些高度灵活、非线性的估计器的性能优势在经济衰退和结构性转变期间尤为明显。然而,从政策制定者的角度来看,即时预测只有在附带不确定性信息时才最有用。虽然DFM和其他经典时间序列方法可以解析地推导出GDP增长的预测(条件)分布,但ANNs基于其默认训练过程(反向传播)只能产生点预测。为了填补这一空白,我们首次在文献中采用了两种不同的深度学习算法,使ANNs能够为美国GDP增长生成密度即时预测:Bayes by Backprop和Monte Carlo dropout。点预测的准确性(定义为经验预测分布的均值)相对于GDP的朴素恒定增长模型和基准DFM规范进行评估。在评估期间(2012年第一季度至2022年第四季度),使用1D CNN作为基础ANN架构,两种算法均优于这些基准。此外,两种算法都能够动态调整经验预测分布的位置(均值)、尺度(方差)和形状(偏度)。结果表明,Bayes by Backprop和Monte Carlo dropout都能有效地扩展ANNs的范围和功能,使其成为经典时间序列方法的一个完全兼容且有竞争力的替代方案。
更新时间: 2024-05-24 14:06:08
领域: econ.EM,cs.AI,cs.LG
Extended Flow Matching: a Method of Conditional Generation with Generalized Continuity Equation
The task of conditional generation is one of the most important applications of generative models, and numerous methods have been developed to date based on the celebrated flow-based models. However, many flow-based models in use today are not built to allow one to introduce an explicit inductive bias on how the generated conditional distribution changes with respect to conditions. This can result in unexpected behavior in the task of style transfer, for example. In this research, we introduce extended flow matching (EFM), a direct extension of flow matching that learns a ``matrix field'' corresponding to the continuous map from the space of conditions to the space of distributions. We show that we can introduce inductive bias to the conditional generation through the matrix field and demonstrate this fact with MMOT-EFM, a version of EFM that aims to minimize the Dirichlet energy, i.e., the sensitivity of the distribution with respect to conditions. We present our theory along with experimental results that support the competitiveness of EFM in conditional generation.
Updated: 2024-05-24 14:03:04
标题: 扩展流匹配:一种具有广义连续性方程的条件生成方法
摘要: 条件生成的任务是生成模型中最重要的应用之一,迄今已基于著名的流模型开发了许多方法。然而,今天许多使用的流模型并未建立允许引入明确的归纳偏差以指导生成条件分布随条件变化的方式。这可能导致在风格转移任务中出现意外行为。在这项研究中,我们介绍了扩展流匹配(EFM),这是流匹配的直接扩展,学习了对应于从条件空间到分布空间的连续映射的“矩阵场”。我们展示通过矩阵场可以引入归纳偏差到条件生成中,并通过MMOT-EFM展示了这一事实,该版本的EFM旨在最小化与条件相关的分布的狄利克雷能量或灵敏度。我们将展示我们的理论以及支持EFM在条件生成中竞争力的实验结果。
更新时间: 2024-05-24 14:03:04
领域: cs.LG,math.AP,math.FA,math.OC,math.PR,68T07 (Primary), 49Q22 (Secondary)
Randomized heuristic repair for large-scale multidimensional knapsack problem
The multidimensional knapsack problem (MKP) is an NP-hard combinatorial optimization problem whose solution is a subset of items with maximum total profit that does not violate the capacity constraints. Due to its hardness, large-scale MKP instances are usually a target for metaheuristics, a context in which effective feasibility-maintenance strategies are crucial. In 1998, Chu and Beasley proposed an effective heuristic repair that is still relevant for recent metaheuristics. However, due to its deterministic nature, the diversity of solutions such a heuristic provides is insufficient for long runs. As a result, the search for new solutions ceases after a while. This paper proposes an efficiency-based randomization strategy for the heuristic repair that increases the variability of the repaired solutions without deteriorating quality, improving the overall results.
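A minimal sketch of the proposed randomization, assuming a Chu-and-Beasley-style drop/add repair and a multiplicative noise factor on the efficiency ordering (the exact randomization scheme in the paper may differ):

    import numpy as np

    def repair(x, profits, weights, caps, rng, noise=0.1):
        """Drop/add repair with a randomized efficiency order.
        x: 0/1 item vector; weights: (m, n); caps: (m,). The noise factor
        is a hypothetical way to perturb the deterministic ordering."""
        eff = profits / (weights.sum(axis=0) + 1e-9)          # pseudo-utility
        eff = eff * rng.uniform(1 - noise, 1 + noise, size=eff.shape)
        load = weights @ x
        # DROP phase: remove least efficient items until all constraints hold
        for j in np.argsort(eff):
            if np.all(load <= caps):
                break
            if x[j]:
                x[j] = 0
                load -= weights[:, j]
        # ADD phase: greedily reinsert most efficient items that still fit
        for j in np.argsort(eff)[::-1]:
            if not x[j] and np.all(load + weights[:, j] <= caps):
                x[j] = 1
                load += weights[:, j]
        return x

    rng = np.random.default_rng(1)
    n, m = 20, 5
    profits = rng.integers(10, 100, n).astype(float)
    weights = rng.integers(1, 20, (m, n)).astype(float)
    caps = weights.sum(axis=1) * 0.4
    x = repair(rng.integers(0, 2, n), profits, weights, caps, rng)
    print(profits @ x, np.all(weights @ x <= caps))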
Updated: 2024-05-24 14:01:05
标题: 随机启发式修复方法用于大规模多维背包问题
摘要: 多维背包问题(MKP)是一个NP难的组合优化问题,其解决方案是确定不违反容量约束的最大总利润项目的子集。由于其困难性,大规模MKP实例通常是元启发式的目标,这种情况下有效的可行性维护策略至关重要。1998年,Chu和Beasley提出了一种有效的启发式修复方法,至今仍然适用于最近的元启发式。然而,由于其确定性性质,此类启发式提供的解决方案的多样性对于长时间运行是不足够的。因此,一段时间后对新解决方案的搜索停止。本文提出了一种基于效率的随机化策略,用于启发式修复,增加修复解的变化性而不降低质量,并提高整体结果。
更新时间: 2024-05-24 14:01:05
领域: cs.AI,cs.NE
OMNI-EPIC: Open-endedness via Models of human Notions of Interestingness with Environments Programmed in Code
Open-ended and AI-generating algorithms aim to continuously generate and solve increasingly complex tasks indefinitely, offering a promising path toward more general intelligence. To accomplish this grand vision, learning must occur within a vast array of potential tasks. Existing approaches to automatically generating environments are constrained within manually predefined, often narrow distributions of environments, limiting their ability to create arbitrary learning environments. To address this limitation, we introduce a novel framework, OMNI-EPIC, that augments previous work in Open-endedness via Models of human Notions of Interestingness (OMNI) with Environments Programmed in Code (EPIC). OMNI-EPIC leverages foundation models to autonomously generate code specifying the next learnable (i.e., not too easy or difficult for the agent's current skill set) and interesting (e.g., worthwhile and novel) tasks. OMNI-EPIC generates both environments (e.g., an obstacle course) and reward functions (e.g., progress through the obstacle course quickly without touching red objects), enabling it, in principle, to create any simulatable learning task. We showcase the explosive creativity of OMNI-EPIC, which continuously innovates to suggest new, interesting learning challenges. We also highlight how OMNI-EPIC can adapt to reinforcement learning agents' learning progress, generating tasks that are of suitable difficulty. Overall, OMNI-EPIC can endlessly create learnable and interesting environments, further propelling the development of self-improving AI systems and AI-Generating Algorithms. Project website with videos: https://dub.sh/omniepic
Updated: 2024-05-24 13:57:32
标题: OMNI-EPIC:通过编程代码模拟人类有趣概念的环境实现开放性
摘要: 开放式和人工智能生成算法的目标是无限期地不断生成并解决日益复杂的任务,为通向更通用的智能提供了一条有前途的道路。为了实现这一宏伟愿景,学习必须在海量的潜在任务中进行。现有的自动生成环境的方法受制于手动预定义的、通常是狭窄的环境分布,限制了它们创建任意学习环境的能力。为了解决这一限制,我们引入了一个新颖的框架OMNI-EPIC,它将先前基于人类有趣性概念模型的开放性研究(OMNI)与用代码编程的环境(EPIC)相结合。OMNI-EPIC利用基础模型自动生成代码,指定下一个可学习(即,对于代理当前的技能集来说既不太容易也不太困难)且有趣(例如,有价值且新颖)的任务。OMNI-EPIC同时生成环境(例如,障碍赛道)和奖励函数(例如,在不触碰红色物体的情况下快速通过障碍赛道),原则上可以创建任何可模拟的学习任务。我们展示了OMNI-EPIC的爆发性创造力,它不断创新地提出新的、有趣的学习挑战。我们还强调了OMNI-EPIC如何适应强化学习代理的学习进度,生成难度适当的任务。总的来说,OMNI-EPIC可以无休止地创建可学习且有趣的环境,进一步推动自我改进的人工智能系统和人工智能生成算法的发展。项目网站及视频链接:https://dub.sh/omniepic
更新时间: 2024-05-24 13:57:32
领域: cs.AI
Solving Partial Differential Equations with Equivariant Extreme Learning Machines
We utilize extreme-learning machines for the prediction of partial differential equations (PDEs). Our method splits the state space into multiple windows that are predicted individually using a single model. Despite requiring only few data points (in some cases, our method can learn from a single full-state snapshot), it still achieves high accuracy and can predict the flow of PDEs over long time horizons. Moreover, we show how additional symmetries can be exploited to increase sample efficiency and to enforce equivariance.
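A minimal sketch of the extreme-learning-machine core, assuming a toy 1-D advection mapping; the window splitting and equivariance constraints described above are omitted:

    import numpy as np

    # Extreme learning machine: a random, frozen hidden layer plus a
    # closed-form ridge solve for the output weights, here mapping a
    # discretized PDE state u(t) to u(t+dt).
    rng = np.random.default_rng(0)
    d_in, d_hidden = 64, 512
    W = rng.normal(0, 1 / np.sqrt(d_in), (d_in, d_hidden))
    b = rng.normal(0, 0.1, d_hidden)

    def features(U):                      # U: (n_samples, d_in)
        return np.tanh(U @ W + b)

    def fit(U_t, U_next, lam=1e-6):       # ridge-regularized least squares
        H = features(U_t)
        return np.linalg.solve(H.T @ H + lam * np.eye(d_hidden), H.T @ U_next)

    # toy data: advect a 1-D field by one grid cell per step (periodic)
    U = rng.normal(size=(200, d_in))
    U_next = np.roll(U, 1, axis=1)
    beta = fit(U, U_next)
    pred = features(U[:1]) @ beta
    print(np.abs(pred - U_next[:1]).max())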
Updated: 2024-05-24 13:53:42
标题: 用等变极限学习机求解偏微分方程
摘要: 我们利用极限学习机来预测偏微分方程(PDEs)。我们的方法将状态空间分割成多个窗口,分别使用单一模型进行预测。尽管只需要少量数据点(在某些情况下,我们的方法可以从单个完整状态快照中学习),它仍然能够实现高准确度,并且可以预测PDEs在长时间范围内的流动。此外,我们展示了如何利用额外的对称性来增加样本效率,并强制实现等变性。
更新时间: 2024-05-24 13:53:42
领域: cs.LG
Rethinking Independent Cross-Entropy Loss For Graph-Structured Data
Graph neural networks (GNNs) have exhibited prominent performance in learning graph-structured data. Considering the node classification task, based on the i.i.d. assumption among node labels, traditional supervised learning simply sums up the cross-entropy losses of the independent training nodes and applies the average loss to optimize GNNs' weights. But unlike other data formats, the nodes are naturally connected. We find that the independent distribution modeling of node labels restricts GNNs' capability to generalize over the entire graph and defend against adversarial attacks. In this work, we propose a new framework, termed joint-cluster supervised learning, to model the joint distribution of each node with its corresponding cluster. We learn the joint distribution of node and cluster labels conditioned on their representations, and train GNNs with the obtained joint loss. In this way, the data-label reference signals extracted from the local cluster explicitly strengthen the discrimination ability on the target node. Extensive experiments demonstrate that our joint-cluster supervised learning can effectively bolster GNNs' node classification accuracy. Furthermore, benefiting from reference signals that may be free from spiteful interference, our learning paradigm significantly protects node classification from being affected by adversarial attacks.
Updated: 2024-05-24 13:52:41
标题: 重新思考图结构数据的独立交叉熵损失
摘要: 图神经网络(GNNs)在学习图结构数据方面表现出色。考虑到节点分类任务,基于节点标签之间的独立同分布假设,传统的监督学习简单地将独立训练节点的交叉熵损失相加,并将平均损失应用于优化GNNs的权重。但与其他数据格式不同,节点自然地连接在一起。研究发现,节点标签的独立分布建模限制了GNNs在整个图上进行泛化和抵御对抗攻击的能力。在这项工作中,我们提出了一个新框架,称为联合簇监督学习,来模拟每个节点与其对应簇的联合分布。我们学习节点和簇标签的联合分布,条件是它们的表示,并使用获得的联合损失训练GNNs。这样,从本地簇中提取的数据标签参考信号明确地增强了目标节点的区分能力。广泛的实验表明,我们的联合簇监督学习可以有效地增强GNNs的节点分类准确性。此外,受益于可能不受恶意干扰的参考信号,我们的学习范式显著保护了节点分类免受对抗攻击的影响。
更新时间: 2024-05-24 13:52:41
领域: cs.LG,cs.AI
When Generative AI Meets Workplace Learning: Creating A Realistic & Motivating Learning Experience With A Generative PCA
Workplace learning is used to train employees systematically, e.g., via e-learning or in 1:1 training. However, this is often deemed ineffective and costly. Whereas pure e-learning lacks opportunities for conversational exercise and personal contact, 1:1 training with human instructors involves high personnel and organizational costs. Hence, pedagogical conversational agents (PCAs) based on generative AI seem to compensate for the disadvantages of both forms. Following Action Design Research, this paper describes an organizational communication training with a Generative PCA (GenPCA). The evaluation shows promising results: the agent was perceived positively among employees and contributed to an improvement in self-determined learning. However, integrating such an agent is not without limitations. We conclude with suggestions concerning the didactic methods supported by a GenPCA, and possible improvements of such an agent for workplace learning.
Updated: 2024-05-24 13:49:18
标题: 当生成式人工智能遇见职场学习:利用生成式PCA创造一个真实且激励学习体验
摘要: 工作场所学习被用来系统地培训员工,例如通过电子学习或1:1培训。然而,这经常被认为是无效和昂贵的。纯粹的电子学习缺乏对话练习和个人接触的可能性,而带有人类教练的1:1培训涉及高水平的人员和组织成本。因此,基于生成式人工智能的教育性对话代理(PCA)似乎能够弥补这两种形式的缺点。本文遵循行动设计研究,描述了一个具有生成式PCA(GenPCA)的组织沟通培训。评估显示了令人鼓舞的结果:员工对代理的看法是积极的,并有助于提高自主学习。然而,这种代理的整合并不是没有限制的。我们最后提出了关于教学方法的建议,这些建议得到了GenPCA的支持,以及关于改进工作场所学习中这种代理的可能改进。
更新时间: 2024-05-24 13:49:18
领域: cs.HC,cs.AI
Accelerating Relative Entropy Coding with Space Partitioning
Relative entropy coding (REC) algorithms encode a random sample following a target distribution $Q$, using a coding distribution $P$ shared between the sender and receiver. Sadly, general REC algorithms suffer from prohibitive encoding times, at least on the order of $2^{D_{\text{KL}}[Q||P]}$, and faster algorithms are limited to very specific settings. This work addresses this issue by introducing a REC scheme utilizing space partitioning to reduce runtime in practical scenarios. We provide theoretical analyses of our method and demonstrate its effectiveness with both toy examples and practical applications. Notably, our method successfully handles REC tasks with $D_{\text{KL}}[Q||P]$ about three times greater than what previous methods can manage, and reduces the bitrate by approximately 5-15% in VAE-based lossless compression on MNIST and INR-based lossy compression on CIFAR-10, compared to previous methods, significantly improving the practicality of REC for neural compression.
Updated: 2024-05-24 13:45:20
标题: 加速相对熵编码与空间分区
摘要: 相对熵编码(REC)算法使用发送者和接收者共享的编码分布$P$对符合目标分布$Q$的随机样本进行编码。不幸的是,一般的REC算法在编码时间上存在严重的限制,至少在$2^{D_{\text{KL}}[Q||P]}$的数量级上,而更快的算法只适用于非常特定的情况。本文通过引入利用空间划分来减少实际场景中运行时间的REC方案来解决这个问题。我们对我们的方法进行了理论分析,并通过玩具示例和实际应用程序展示了其有效性。值得注意的是,我们的方法成功处理了$D_{\text{KL}}[Q||P]$大约是之前方法所能处理的三倍的REC任务,并在MNIST和CIFAR-10上基于VAE的无损压缩和基于INR的有损压缩中,将比特率降低了大约5-15%,相比之前的方法,显著提高了REC在神经压缩中的实用性。
更新时间: 2024-05-24 13:45:20
领域: cs.IT,cs.LG,math.IT
Learning from Linear Algebra: A Graph Neural Network Approach to Preconditioner Design for Conjugate Gradient Solvers
Large linear systems are ubiquitous in modern computational science. The main recipe for solving them is iterative solvers with well-designed preconditioners. Deep learning models may be used to precondition residuals during iteration of such linear solvers as the conjugate gradient (CG) method. Neural network models require an enormous number of parameters to approximate well in this setup. Another approach is to take advantage of small graph neural networks (GNNs) to construct preconditioners of the predefined sparsity pattern. In our work, we recall well-established preconditioners from linear algebra and use them as a starting point for training the GNN. Numerical experiments demonstrate that our approach outperforms both classical methods and neural network-based preconditioning. We also provide a heuristic justification for the loss function used and validate our approach on complex datasets.
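For orientation, the sketch below shows where any preconditioner, learned or classical, enters the CG iteration; it uses SciPy (assuming SciPy >= 1.12 for the rtol keyword) with an ILU factorization standing in for the GNN-predicted factors:

    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.linalg import cg, spilu, LinearOperator

    # Preconditioned CG: the preconditioner enters only through M ~ A^{-1}.
    # A GNN trained on the sparsity pattern would replace the classical
    # ILU factors used here as the baseline / starting point.
    n = 200
    A = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n), format="csc")
    b = np.ones(n)

    ilu = spilu(A, drop_tol=1e-4)
    M = LinearOperator((n, n), matvec=ilu.solve)

    x_plain, info_plain = cg(A, b, rtol=1e-10)
    x_prec, info_prec = cg(A, b, M=M, rtol=1e-10)
    print(info_plain, info_prec, np.abs(A @ x_prec - b).max())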
Updated: 2024-05-24 13:44:30
标题: 学习线性代数:一种图神经网络方法用于共轭梯度求解器的预处理器设计
摘要: 现代计算科学中普遍存在大规模线性系统。解决这些系统的主要方法是使用带有精心设计的预处理器的迭代求解器。深度学习模型可以用来在共轭梯度(CG)方法等线性求解器的迭代过程中对残差进行预处理。神经网络模型需要大量参数才能在这种设置中进行良好逼近。另一种方法是利用小型图神经网络(GNNs)来构建具有预定义稀疏模式的预处理器。在我们的工作中,我们回顾了线性代数中已建立的预处理器,并将它们作为训练GNN的起点。数值实验证明,我们的方法在性能上优于传统方法和基于神经网络的预处理方法。我们还提供了对所使用的损失函数的启发式解释,并在复杂数据集上验证了我们的方法。
更新时间: 2024-05-24 13:44:30
领域: cs.LG,cs.NA,math.NA
Certifiably Robust RAG against Retrieval Corruption
Retrieval-augmented generation (RAG) has been shown vulnerable to retrieval corruption attacks: an attacker can inject malicious passages into retrieval results to induce inaccurate responses. In this paper, we propose RobustRAG as the first defense framework against retrieval corruption attacks. The key insight of RobustRAG is an isolate-then-aggregate strategy: we get LLM responses from each passage in isolation and then securely aggregate these isolated responses. To instantiate RobustRAG, we design keyword-based and decoding-based algorithms for securely aggregating unstructured text responses. Notably, RobustRAG can achieve certifiable robustness: we can formally prove and certify that, for certain queries, RobustRAG can always return accurate responses, even when the attacker has full knowledge of our defense and can arbitrarily inject a small number of malicious passages. We evaluate RobustRAG on open-domain QA and long-form text generation datasets and demonstrate its effectiveness and generalizability across various tasks and datasets.
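A minimal sketch of the isolate-then-aggregate idea with a keyword-style aggregator; the llm callable and the thresholding rule are hypothetical simplifications of RobustRAG's secure aggregation:

    from collections import Counter

    def robust_rag(query, passages, llm, min_frac=0.3):
        """Query the LLM once per passage in isolation, then keep only
        keywords that a sufficient fraction of the isolated answers agree
        on, so a few malicious passages cannot dominate the output."""
        answers = [llm(f"Context: {p}\nQuestion: {query}") for p in passages]
        counts = Counter(w for a in answers for w in set(a.lower().split()))
        threshold = min_frac * len(passages)
        keywords = [w for w, c in counts.most_common() if c >= threshold]
        return " ".join(keywords)   # in practice, fed to one final LLM call

    # toy stand-in for an LLM
    fake_llm = lambda prompt: "the capital of france is paris"
    print(robust_rag("capital of France?", ["p1", "p2 malicious", "p3"], fake_llm))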
Updated: 2024-05-24 13:44:25
标题: 可认证鲁棒的RAG:抵御检索污染攻击
摘要: 检索增强生成(RAG)已被证明容易受到检索污染攻击的影响:攻击者可以将恶意段落注入到检索结果中,以诱导不准确的响应。在本文中,我们提出了RobustRAG作为第一个针对检索污染攻击的防御框架。RobustRAG的关键洞察是一个先隔离、后聚合的策略:我们从每个段落中独立获取LLM响应,然后安全地聚合这些独立的响应。为了实现RobustRAG,我们设计了基于关键词和基于解码的算法,用于安全地聚合非结构化文本响应。值得注意的是,RobustRAG可以实现可认证的鲁棒性:我们可以形式化地证明并认证,对于某些查询,即使攻击者完全了解我们的防御并可以任意注入少量恶意段落,RobustRAG也总能返回准确的响应。我们在开放领域问答和长文本生成数据集上评估了RobustRAG,并展示了其在各种任务和数据集上的有效性和泛化能力。
更新时间: 2024-05-24 13:44:25
领域: cs.LG,cs.CL,cs.CR
Instantiations and Computational Aspects of Non-Flat Assumption-based Argumentation
Most existing computational tools for assumption-based argumentation (ABA) focus on so-called flat frameworks, disregarding the more general case. In this paper, we study an instantiation-based approach for reasoning in possibly non-flat ABA. We make use of a semantics-preserving translation between ABA and bipolar argumentation frameworks (BAFs). By utilizing compilability theory, we establish that the constructed BAFs will in general be of exponential size. In order to keep the number of arguments and computational cost low, we present three ways of identifying redundant arguments. Moreover, we identify fragments of ABA which admit a poly-sized instantiation. We propose two algorithmic approaches for reasoning in possibly non-flat ABA. The first approach utilizes the BAF instantiation while the second works directly without constructing arguments. An empirical evaluation shows that the former outperforms the latter on many instances, reflecting the lower complexity of BAF reasoning. This result is in contrast to flat ABA, where direct approaches dominate instantiation-based approaches.
Updated: 2024-05-24 13:42:44
标题: 非平坦基于假设的论证的实例化与计算方面
摘要: 现有的大多数基于假设的论证(ABA)计算工具都专注于所谓的平坦框架,忽略了更一般的情况。在本文中,我们研究了一种基于实例化的方法,用于在可能非平坦的ABA中进行推理。我们利用ABA和双极论证框架(BAFs)之间的语义保持转换。通过利用可编译性理论,我们证明了所构建的BAFs通常会呈指数级大小。为了保持论证数量和计算成本较低,我们提出了三种识别冗余论证的方法。此外,我们确定了允许多项式规模实例化的ABA片段。我们提出了两种在可能非平坦的ABA中进行推理的算法方法。第一种方法利用BAF实例化,而第二种方法直接工作,无需构建论证。经验评估表明,在许多实例中,前者优于后者,反映了BAF推理的较低复杂性。这一结果与平坦ABA相反,在平坦ABA中直接方法主导基于实例化的方法。
更新时间: 2024-05-24 13:42:44
领域: cs.AI
MeteoRA: Multiple-tasks Embedded LoRA for Large Language Models
The pretrain+fine-tune paradigm is foundational in deploying large language models (LLMs) across a diverse range of downstream applications. Among these, Low-Rank Adaptation (LoRA) stands out for its parameter-efficient fine-tuning (PEFT), producing numerous off-the-shelf task-specific LoRA adapters. However, this approach requires explicit task intention selection, posing challenges for automatic task sensing and switching during inference with multiple existing LoRA adapters embedded in a single LLM. In this work, we introduce MeteoRA (Multiple-Tasks embedded LoRA), a scalable multi-knowledge LoRA fusion framework designed for LLMs. MeteoRA integrates various LoRA adapters in a Mixture-of-Experts (MoE) style into the base LLM, enabling the model to automatically select the most pertinent adapter based on the task input. This advancement significantly enhances the LLM's capability to handle composite tasks that require different adapters to solve various components of the problem. Our evaluations, featuring the Llama2-13B and Llama3-8B base models equipped with 28 off-the-shelf LoRA adapters through MeteoRA, demonstrate equivalent performance with the individual adapters. Furthermore, both base models equipped with MeteoRA achieve superior performance in sequentially solving composite tasks with ten problems in only a single inference process, highlighting the ability of timely intention switching in MeteoRA embedded LLMs.
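A minimal sketch of MoE-style LoRA fusion in a single linear layer, assuming PyTorch; the soft per-input gate shown here is a simplification of MeteoRA's routing:

    import torch
    import torch.nn as nn

    class MoELoRALinear(nn.Module):
        """Fuse k LoRA adapters into one linear layer: a gate weighs the
        adapters per input, giving W x + sum_i g_i * B_i A_i x. Top-k
        routing and per-token gating details are assumptions here."""
        def __init__(self, d_in, d_out, k_adapters=4, rank=8):
            super().__init__()
            self.base = nn.Linear(d_in, d_out, bias=False)
            self.A = nn.Parameter(torch.randn(k_adapters, rank, d_in) * 0.01)
            self.B = nn.Parameter(torch.zeros(k_adapters, d_out, rank))
            self.gate = nn.Linear(d_in, k_adapters)

        def forward(self, x):                        # x: (batch, d_in)
            g = torch.softmax(self.gate(x), dim=-1)  # (batch, k)
            lora = torch.einsum("kri,bi->bkr", self.A, x)
            lora = torch.einsum("kor,bkr->bko", self.B, lora)
            return self.base(x) + (g.unsqueeze(-1) * lora).sum(dim=1)

    layer = MoELoRALinear(32, 32)
    print(layer(torch.randn(4, 32)).shape)   # torch.Size([4, 32])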
Updated: 2024-05-24 13:38:54
标题: MeteoRA:用于大型语言模型的多任务嵌入式LoRA
摘要: 预训练+微调范式是在各种下游应用中部署大型语言模型(LLMs)的基础。在这些模型中,低秩适应(LoRA)因其参数高效的微调(PEFT)而脱颖而出,产生了许多即插即用的特定任务LoRA适配器。然而,这种方法需要显式的任务意图选择,在推理过程中具有多个现有LoRA适配器嵌入在单个LLM中时,会带来自动任务感知和切换的挑战。在这项工作中,我们介绍了MeteoRA(多任务嵌入LoRA),这是一个为LLMs设计的可扩展的多知识LoRA融合框架。MeteoRA将各种LoRA适配器以Mixture-of-Experts(MoE)风格集成到基础LLM中,使模型能够根据任务输入自动选择最相关的适配器。这一进步显著增强了LLM处理需要不同适配器来解决问题各个组成部分的复合任务的能力。我们的评估显示,配备了28个即插即用LoRA适配器的LlaMA2-13B和LlaMA3-8B基础模型通过MeteoRA展示了与单个适配器相当的性能。此外,配备MeteoRA的两个基础模型在顺序解决包含十个问题的复合任务时取得了更优秀的性能,突出了MeteoRA嵌入LLM中及时意图切换的能力。
更新时间: 2024-05-24 13:38:54
领域: cs.CL,cs.AI,I.2.7
Thinking Forward: Memory-Efficient Federated Finetuning of Language Models
Finetuning large language models (LLMs) in federated learning (FL) settings has become important as it allows resource-constrained devices to finetune a model using private data. However, finetuning LLMs using backpropagation requires excessive memory (especially from intermediate activations) for resource-constrained devices. While Forward-mode Auto-Differentiation (AD) can reduce memory footprint from activations, we observe that directly applying it to LLM finetuning results in slow convergence and poor accuracy. This work introduces Spry, an FL algorithm that splits trainable weights of an LLM among participating clients, such that each client computes gradients using Forward-mode AD that are closer estimates of the true gradients. Spry achieves a low memory footprint, high accuracy, and fast convergence. We theoretically show that the global gradients in Spry are unbiased estimates of true global gradients for homogeneous data distributions across clients, while heterogeneity increases bias of the estimates. We also derive Spry's convergence rate, showing that the gradients decrease inversely proportional to the number of FL rounds, indicating the convergence up to the limits of heterogeneity. Empirically, Spry reduces the memory footprint during training by 1.4-7.1$\times$ in contrast to backpropagation, while reaching comparable accuracy, across a wide range of language tasks, models, and FL settings. Spry reduces the convergence time by 1.2-20.3$\times$ and achieves 5.2-13.5\% higher accuracy against state-of-the-art zero-order methods. When finetuning Llama2-7B with LoRA, compared to the peak memory usage of 33.9GB of backpropagation, Spry only consumes 6.2GB of peak memory. For OPT13B, the reduction is from 76.5GB to 10.8GB. Spry makes feasible previously impossible FL deployments on commodity mobile and edge devices. Source code is available at https://github.com/Astuary/Spry.
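A minimal sketch of the forward-mode ingredient, assuming PyTorch >= 2.0 (torch.func): one Jacobian-vector product with a random tangent yields the classic forward-gradient estimate (grad . v) v without storing intermediate activations; Spry's splitting of trainable weights across clients is omitted:

    import torch
    from torch.func import functional_call, jvp

    model = torch.nn.Linear(10, 1)
    loss_fn = torch.nn.MSELoss()
    params = dict(model.named_parameters())

    def loss(p, x, y):
        return loss_fn(functional_call(model, p, (x,)), y)

    x, y = torch.randn(32, 10), torch.randn(32, 1)
    # one forward-mode pass: directional derivative of the loss along v
    v = {k: torch.randn_like(p) for k, p in params.items()}
    _, dir_deriv = jvp(lambda p: loss(p, x, y), (params,), (v,))
    fwd_grad = {k: dir_deriv * t for k, t in v.items()}   # (grad . v) v
    with torch.no_grad():
        for k, p in params.items():
            p -= 0.01 * fwd_grad[k]       # SGD step with the forward gradient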
Updated: 2024-05-24 13:37:48
标题: 向前思考:内存高效的联邦微调语言模型
摘要: 在联邦学习(FL)设置中对大型语言模型(LLMs)进行微调变得越来越重要,因为它使资源受限的设备能够使用私人数据对模型进行微调。然而,使用反向传播对LLMs进行微调需要大量内存(尤其是来自中间激活的内存)对资源受限的设备来说过多。虽然正向模式自动微分(AD)可以减少激活的内存占用,我们注意到直接将其应用于LLM微调会导致收敛缓慢和精度低下。这项工作引入了Spry,一种FL算法,将LLM的可训练权重分配给参与的客户端,使得每个客户端使用正向模式AD计算梯度,这些梯度更接近真实梯度的估计。Spry实现了低内存占用、高精度和快速收敛。我们在理论上表明,对于客户端之间具有同质数据分布的情况,Spry中的全局梯度是真实全局梯度的无偏估计,而异质性会增加估计的偏差。我们还推导了Spry的收敛速度,表明梯度随着FL轮次的增加而逆比例减少,表明收敛达到异质性的极限。在实践中,与反向传播相比,Spry在训练期间将内存占用减少了1.4-7.1倍,同时在广泛的语言任务、模型和FL设置中达到了可比较的精度。Spry将收敛时间减少了1.2-20.3倍,并在与最先进的零阶方法相比实现了5.2-13.5\%的更高精度。将Llama2-7B与LoRA进行微调时,与反向传播的峰值内存使用量33.9GB相比,Spry仅消耗6.2GB的峰值内存。对于OPT13B,这一减少从76.5GB减少到10.8GB。Spry使先前在普通移动和边缘设备上无法实现的FL部署成为可能。源代码可在https://github.com/Astuary/Spry获取。
更新时间: 2024-05-24 13:37:48
领域: cs.LG
Concept-based Explainable Malignancy Scoring on Pulmonary Nodules in CT Images
To increase the transparency of modern computer-aided diagnosis (CAD) systems for assessing the malignancy of lung nodules, an interpretable model based on applying generalized additive models and concept-based learning is proposed. The model detects a set of clinically significant attributes in addition to the final malignancy regression score and learns the association between the lung nodule attributes and the final diagnosis decision, as well as their contributions to the decision. The proposed concept-based learning framework provides human-readable explanations in terms of different concepts (numerical and categorical), their values, and their contributions to the final prediction. Numerical experiments with the LIDC-IDRI dataset demonstrate that the diagnosis results obtained using the proposed model, which explicitly explores internal relationships, are in line with similar patterns observed in clinical practice. Additionally, the proposed model shows competitive classification and nodule attribute scoring performance, highlighting its potential for effective decision-making in lung nodule diagnosis.
Updated: 2024-05-24 13:36:44
标题: 基于概念的CT图像中肺结节恶性评分的可解释性研究
摘要: 为了提高现代计算机辅助诊断(CAD)系统评估肺结节恶性程度的透明度,提出了一种基于广义加性模型和概念学习的可解释模型。该模型除了给出最终的恶性回归分数外,还检测一组具有临床意义的属性,并学习肺结节属性与最终诊断决策之间的关联以及它们对决策的贡献。所提出的基于概念的学习框架以不同概念(数值和分类)、其取值及其对最终预测的贡献的形式提供人类可读的解释。使用LIDC-IDRI数据集进行的数值实验表明,使用所提出模型(该模型显式探索内部关系)获得的诊断结果与临床实践中观察到的相似模式一致。此外,所提出的模型展示了有竞争力的分类和结节属性评分性能,突显了其在肺结节诊断中有效决策的潜力。
更新时间: 2024-05-24 13:36:44
领域: eess.IV,cs.AI,cs.LG
PARCv2: Physics-aware Recurrent Convolutional Neural Networks for Spatiotemporal Dynamics Modeling
Modeling unsteady, fast transient, and advection-dominated physics problems is a pressing challenge for physics-aware deep learning (PADL). The physics of complex systems is governed by large systems of partial differential equations (PDEs) and ancillary constitutive models with nonlinear structures, as well as evolving state fields exhibiting sharp gradients and rapidly deforming material interfaces. Here, we investigate an inductive bias approach that is versatile and generalizable to model generic nonlinear field evolution problems. Our study focuses on the recent physics-aware recurrent convolutions (PARC), which incorporates a differentiator-integrator architecture that inductively models the spatiotemporal dynamics of generic physical systems. We extend the capabilities of PARC to simulate unsteady, transient, and advection-dominant systems. The extended model, referred to as PARCv2, is equipped with differential operators to model advection-reaction-diffusion equations, as well as a hybrid integral solver for stable, long-time predictions. PARCv2 is tested on both standard benchmark problems in fluid dynamics, namely Burgers and Navier-Stokes equations, and then applied to more complex shock-induced reaction problems in energetic materials. We evaluate the behavior of PARCv2 in comparison to other physics-informed and learning bias models and demonstrate its potential to model unsteady and advection-dominant dynamics regimes.
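A minimal sketch of a differentiator-integrator pair, assuming PyTorch: a small CNN plays the differentiator (predicting du/dt) and forward Euler plays the integrator, a simplification of PARCv2's advection-reaction-diffusion operators and hybrid integral solver:

    import torch
    import torch.nn as nn

    class Differentiator(nn.Module):
        """CNN that estimates the time derivative of a 2-D field."""
        def __init__(self, channels=1):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(channels, 32, 3, padding=1), nn.GELU(),
                nn.Conv2d(32, channels, 3, padding=1),
            )
        def forward(self, u):
            return self.net(u)            # predicted du/dt

    def rollout(u0, f, dt=0.01, steps=10):
        u, traj = u0, [u0]
        for _ in range(steps):            # forward-Euler integrator
            u = u + dt * f(u)
            traj.append(u)
        return torch.stack(traj)

    f = Differentiator()
    traj = rollout(torch.randn(1, 1, 32, 32), f)
    print(traj.shape)   # (11, 1, 1, 32, 32)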
Updated: 2024-05-24 13:35:59
标题: PARCv2:用于时空动态建模的物理感知循环卷积神经网络
摘要: 建模不稳定、快速瞬态和平流主导的物理问题对于物理感知深度学习(PADL)来说是一个紧迫的挑战。复杂系统的物理由大型偏微分方程(PDEs)系统和具有非线性结构的辅助本构模型所支配,以及展现出急剧梯度和快速变形材料界面的演化状态场。在这里,我们研究了一种具有通用性且可泛化的归纳偏置方法,用于建模通用的非线性场演化问题。我们的研究集中在最近提出的物理感知递归卷积(PARC)上,该模型采用微分器-积分器(differentiator-integrator)架构,归纳地建模通用物理系统的时空动态。我们将PARC的能力扩展到模拟不稳定、瞬态和平流主导的系统。扩展后的模型被称为PARCv2,配备了用于建模平流-反应-扩散方程的微分算子,以及一个用于稳定长时间预测的混合积分求解器。PARCv2先在流体动力学中的标准基准问题(即Burgers和Navier-Stokes方程)上进行了测试,随后被应用于含能材料中更复杂的冲击诱导反应问题。我们将PARCv2的行为与其他物理感知和学习偏置模型进行了比较评估,并展示了它在建模不稳定和平流主导动力学领域的潜力。
更新时间: 2024-05-24 13:35:59
领域: cs.LG
Freya PAGE: First Optimal Time Complexity for Large-Scale Nonconvex Finite-Sum Optimization with Heterogeneous Asynchronous Computations
In practical distributed systems, workers are typically not homogeneous, and due to differences in hardware configurations and network conditions, can have highly varying processing times. We consider smooth nonconvex finite-sum (empirical risk minimization) problems in this setup and introduce a new parallel method, Freya PAGE, designed to handle arbitrarily heterogeneous and asynchronous computations. By being robust to "stragglers" and adaptively ignoring slow computations, Freya PAGE offers significantly improved time complexity guarantees compared to all previous methods, including Asynchronous SGD, Rennala SGD, SPIDER, and PAGE, while requiring weaker assumptions. The algorithm relies on novel generic stochastic gradient collection strategies with theoretical guarantees that can be of interest on their own, and may be used in the design of future optimization methods. Furthermore, we establish a lower bound for smooth nonconvex finite-sum problems in the asynchronous setup, providing a fundamental time complexity limit. This lower bound is tight and demonstrates the optimality of Freya PAGE in the large-scale regime, i.e., when $\sqrt{m} \geq n$, where $n$ is # of workers, and $m$ is # of data samples.
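For context, a minimal NumPy sketch of the underlying PAGE gradient estimator on a toy quadratic; Freya PAGE's asynchronous worker scheduling around this estimator is not shown:

    import numpy as np

    def page(x0, grad_full, grad_diff, p=0.1, lr=0.05, steps=300, seed=0):
        """PAGE estimator: with probability p recompute a full/large-batch
        gradient, otherwise cheaply update the previous estimate with a
        small-batch gradient difference (same minibatch at both points)."""
        rng = np.random.default_rng(seed)
        x, g = x0.copy(), grad_full(x0)
        for _ in range(steps):
            x_new = x - lr * g
            if rng.random() < p:
                g = grad_full(x_new)
            else:
                g = g + grad_diff(x_new, x)
            x = x_new
        return x

    # toy finite-sum: f(x) = mean_i 0.5 * a_i * ||x||^2
    a = np.linspace(0.5, 1.5, 100)
    grad_full = lambda x: a.mean() * x
    def grad_diff(x_new, x, b=8):          # minibatch gradient difference
        batch = np.random.choice(a, b)
        return batch.mean() * (x_new - x)
    print(np.linalg.norm(page(np.ones(5), grad_full, grad_diff)))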
Updated: 2024-05-24 13:33:30
标题: Freya PAGE:大规模非凸有限和优化中异构异步计算的首个最优时间复杂度
摘要: 在实际的分布式系统中,工作节点通常不是同质的,由于硬件配置和网络条件的差异,处理时间可能会有很大的变化。我们在这种设置下考虑光滑非凸有限和(经验风险最小化)问题,并引入了一种新的并行方法Freya PAGE,旨在处理任意异构和异步的计算。通过对"落后者"(stragglers)具有鲁棒性并自适应地忽略缓慢的计算,Freya PAGE相比于所有先前的方法(包括异步SGD、Rennala SGD、SPIDER和PAGE)提供了显著改进的时间复杂度保证,同时需要更弱的假设。该算法依赖于具有理论保证的新颖通用随机梯度收集策略,这些策略本身也具有独立的研究价值,并且可以用于未来优化方法的设计。此外,我们在异步设置下为光滑非凸有限和问题建立了一个下界,给出了一个基本的时间复杂度极限。这个下界是紧的,并展示了Freya PAGE在大规模情形下的最优性,即当$\sqrt{m} \geq n$时,其中$n$是工作节点数量,$m$是数据样本数量。
更新时间: 2024-05-24 13:33:30
领域: math.OC,cs.LG,stat.ML
Knowledge-enhanced Relation Graph and Task Sampling for Few-shot Molecular Property Prediction
Recently, few-shot molecular property prediction (FSMPP) has garnered increasing attention. Despite impressive breakthroughs achieved by existing methods, they often overlook the inherent many-to-many relationships between molecules and properties, which limits their performance. For instance, similar substructures of molecules can inspire the exploration of new compounds. Additionally, the relationships between properties can be quantified, with highly related properties providing more information about the target property than weakly related ones. To this end, this paper proposes a novel meta-learning FSMPP framework (KRGTS), which comprises the Knowledge-enhanced Relation Graph module and the Task Sampling module. The knowledge-enhanced relation graph module constructs the molecule-property multi-relation graph (MPMRG) to capture the many-to-many relationships between molecules and properties. The task sampling module includes a meta-training task sampler and an auxiliary task sampler, responsible for scheduling the meta-training process and sampling highly related auxiliary tasks, respectively, thereby achieving efficient meta-knowledge learning and reducing noise introduction. Empirically, extensive experiments on five datasets demonstrate the superiority of KRGTS over a variety of state-of-the-art methods. The code is available at https://github.com/Vencent-Won/KRGTS-public.
Updated: 2024-05-24 13:31:19
标题: 知识增强关系图和任务采样用于少样本分子性质预测
摘要: 最近,少样本分子性质预测(FSMPP)受到越来越多的关注。尽管现有方法取得了令人印象深刻的突破,但它们通常忽视了分子和性质之间固有的多对多关系,从而限制了它们的性能。例如,分子的相似亚结构可以激发对新化合物的探索。此外,性质之间的关系可以被量化,相关性高的性质比相关性低的性质在探索目标性质时提供更多信息。因此,本文提出了一种新颖的元学习FSMPP框架(KRGTS),包括增强知识关系图模块和任务采样模块。增强知识关系图模块构建了分子-性质多关系图(MPMRG),以捕捉分子和性质之间的多对多关系。任务采样模块包括元训练任务采样器和辅助任务采样器,分别负责调度元训练过程和采样高相关辅助任务,从而实现高效的元知识学习并减少噪声引入。在五个数据集上的大量实验证明了KRGTS相对于各种最先进方法的优越性。代码可在https://github.com/Vencent-Won/KRGTS-public找到。
更新时间: 2024-05-24 13:31:19
领域: q-bio.QM,cs.AI,cs.LG
SATSense: Multi-Satellite Collaborative Framework for Spectrum Sensing
Low Earth Orbit satellite Internet has recently been deployed, providing worldwide service with non-terrestrial networks. With the large-scale deployment of both non-terrestrial and terrestrial networks, the limited spectrum resources cannot be allocated adequately. Consequently, dynamic spectrum sharing is crucial for their coexistence in the same spectrum, where accurate spectrum sensing is essential. However, spectrum sensing in space is more challenging than in terrestrial networks due to variable channel conditions, making single-satellite sensing unstable. Therefore, we first attempt to design a collaborative sensing scheme utilizing diverse data from multiple satellites. However, it is non-trivial to achieve this collaboration due to heterogeneous channel quality, considerable raw sampling data, and packet loss. To address the above challenges, we first establish connections between the satellites by modeling their sensing data as a graph and devising a graph neural network-based algorithm to achieve effective spectrum sensing. Meanwhile, we establish a joint sub-Nyquist sampling and autoencoder data compression framework to reduce the amount of transmitted sensing data. Finally, we propose a contrastive learning-based mechanism that compensates for missing packets. Extensive experiments demonstrate that our proposed strategy achieves efficient spectrum sensing performance and outperforms the conventional deep learning algorithm in spectrum sensing accuracy.
Updated: 2024-05-24 13:29:57
标题: SATSense:用于频谱感知的多卫星协作框架
摘要: 最近,低地球轨道卫星互联网已经部署,利用非地面网络提供全球服务。随着非地面和地面网络的大规模部署,有限的频谱资源将无法得到充分分配。因此,动态频谱共享对于它们在同一频谱中的共存至关重要,而准确的频谱感知是其前提。然而,由于信道条件多变,空间中的频谱感知比地面网络更具挑战性,使得单星感知并不稳定。因此,我们首先尝试设计一种利用多颗卫星多样数据的协作感知方案。然而,由于异构的信道质量、大量的原始采样数据和数据包丢失,实现这种协作并不容易。为了解决上述挑战,我们首先将卫星的感知数据建模为图,并设计基于图神经网络的算法来建立卫星之间的连接,以实现有效的频谱感知。同时,我们建立了一个联合亚奈奎斯特采样和自编码器数据压缩框架,以减少传输的感知数据量。最后,我们提出了一种基于对比学习的机制来补偿丢失的数据包。大量实验表明,我们提出的策略可以实现高效的频谱感知性能,并在频谱感知准确性方面胜过传统的深度学习算法。
更新时间: 2024-05-24 13:29:57
领域: cs.NI,cs.DC,cs.LG,eess.SP
Bundle Neural Networks for message diffusion on graphs
The dominant paradigm for learning on graph-structured data is message passing. Despite being a strong inductive bias, the local message passing mechanism suffers from pathological issues such as over-smoothing, over-squashing, and limited node-level expressivity. To address these limitations we propose Bundle Neural Networks (BuNN), a new type of GNN that operates via message diffusion over flat vector bundles - structures analogous to connections on Riemannian manifolds that augment the graph by assigning to each node a vector space and an orthogonal map. A BuNN layer evolves the features according to a diffusion-type partial differential equation. When discretized, BuNNs are a special case of Sheaf Neural Networks (SNNs), a recently proposed MPNN capable of mitigating over-smoothing. The continuous nature of message diffusion enables BuNNs to operate on larger scales of the graph and, therefore, to mitigate over-squashing. Finally, we prove that BuNN can approximate any feature transformation over nodes on any (potentially infinite) family of graphs given injective positional encodings, resulting in universal node-level expressivity. We support our theory via synthetic experiments and showcase the strong empirical performance of BuNNs over a range of real-world tasks, achieving state-of-the-art results on several standard benchmarks in transductive and inductive settings.
Updated: 2024-05-24 13:28:48
标题: 在图上进行消息扩散的Bundle神经网络
摘要: 图结构数据学习的主导范式是消息传递。尽管是一种强大的归纳偏置,局部消息传递机制仍存在过度平滑、过度压缩和节点级表达能力有限等病态问题。为了解决这些限制,我们提出了Bundle Neural Networks(BuNN),一种通过在平坦向量丛上进行消息扩散的新型GNN;这些结构类似于黎曼流形上的联络,通过为每个节点分配一个向量空间和一个正交映射来扩充图。BuNN层根据扩散型偏微分方程演化特征。离散化后,BuNN是Sheaf Neural Networks(SNNs)的一种特殊情况,SNNs是最近提出的一种能够减轻过度平滑的MPNN。消息扩散的连续性使BuNN能够在图的更大尺度上运行,因此能够减轻过度压缩。最后,我们证明在给定单射位置编码的情况下,BuNN可以在任何(可能无限的)图族上近似节点上的任何特征变换,从而获得通用的节点级表达能力。我们通过合成实验支持我们的理论,并展示BuNN在一系列真实世界任务上的强大经验表现,在直推式和归纳式设置下的多个标准基准测试中取得了最新的结果。
更新时间: 2024-05-24 13:28:48
领域: cs.LG
A generalized neural tangent kernel for surrogate gradient learning
State-of-the-art neural network training methods depend on the gradient of the network function. Therefore, they cannot be applied to networks whose activation functions do not have useful derivatives, such as binary and discrete-time spiking neural networks. To overcome this problem, the activation function's derivative is commonly substituted with a surrogate derivative, giving rise to surrogate gradient learning (SGL). This method works well in practice but lacks theoretical foundation. The neural tangent kernel (NTK) has proven successful in the analysis of gradient descent. Here, we provide a generalization of the NTK, which we call the surrogate gradient NTK, that enables the analysis of SGL. First, we study a naive extension of the NTK to activation functions with jumps, demonstrating that gradient descent for such activation functions is also ill-posed in the infinite-width limit. To address this problem, we generalize the NTK to gradient descent with surrogate derivatives, i.e., SGL. We carefully define this generalization and expand the existing key theorems on the NTK with mathematical rigor. Further, we illustrate our findings with numerical experiments. Finally, we numerically compare SGL in networks with sign activation function and finite width to kernel regression with the surrogate gradient NTK; the results confirm that the surrogate gradient NTK provides a good characterization of SGL.
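A minimal sketch of surrogate gradient learning itself, assuming PyTorch: the forward pass uses a hard Heaviside spike, while the backward pass substitutes a sigmoid-derivative surrogate (one common choice among many):

    import torch

    class SpikeSurrogate(torch.autograd.Function):
        """Heaviside spike forward; sigmoid-derivative surrogate backward."""
        @staticmethod
        def forward(ctx, v, beta=5.0):
            ctx.save_for_backward(v)
            ctx.beta = beta
            return (v > 0).float()
        @staticmethod
        def backward(ctx, grad_out):
            (v,) = ctx.saved_tensors
            s = torch.sigmoid(ctx.beta * v)
            return grad_out * ctx.beta * s * (1 - s), None

    v = torch.randn(4, requires_grad=True)
    SpikeSurrogate.apply(v).sum().backward()
    print(v.grad)      # nonzero despite the flat Heaviside forward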
Updated: 2024-05-24 13:27:23
标题: 一种用于替代梯度学习的广义神经切线核
摘要: 当前最先进的神经网络训练方法依赖于网络函数的梯度。因此,它们无法应用于激活函数没有有用导数的网络,例如二进制和离散时间尖峰神经网络。为了解决这个问题,通常会用一个替代导数替代激活函数的导数,从而产生替代梯度学习(SGL)。这种方法在实践中表现良好,但缺乏理论基础。神经切线核(NTK)在梯度下降分析中表现成功。在这里,我们提供了NTK的一个推广,我们称之为替代梯度NTK,它使得对SGL的分析成为可能。首先,我们研究了NTK对具有跳跃的激活函数的朴素扩展,证明了对于这种激活函数的梯度下降在无限宽度极限下也是不适定的。为了解决这个问题,我们将NTK推广到使用替代导数进行梯度下降,即SGL。我们仔细定义了这个推广,并且以数学严谨性扩展了现有关于NTK的关键定理。此外,我们用数值实验说明了我们的发现。最后,我们在具有符号激活函数和有限宽度的网络中,将SGL与使用替代梯度NTK的核回归进行了数值比较;结果证实了替代梯度NTK很好地刻画了SGL。
更新时间: 2024-05-24 13:27:23
领域: stat.ML,cond-mat.dis-nn,cs.LG,math.PR,q-bio.NC
Do Not Trust Power Management: Challenges and Hints for Securing Future Trusted Execution Environments
Over the past few years, several research groups have introduced innovative hardware designs for Trusted Execution Environments (TEEs), aiming to secure applications against potentially compromised privileged software, including the kernel. In 2017, Tang et al. introduced a new class of software-enabled hardware attacks, which leverages energy management mechanisms. These attacks aim at bypassing TEE security guarantees and exposing sensitive information like cryptographic keys. They have increased in prevalence over the past few years. Despite that, current RISC-V TEE architectures have yet to incorporate them into their threat models. Proprietary implementations, such as Arm TrustZone and Intel SGX, embed countermeasures. However, these countermeasures are not viable in the long term and hinder the capabilities of energy management mechanisms. This article presents the first comprehensive knowledge survey of these attacks, along with an evaluation of literature countermeasures. Our analysis highlights a substantial security gap between assumed threat models and the actual ones, presenting considerable threats in modern systems-on-chip that can undermine even the security guarantees provided by TEEs. We advocate for the enhancement of the next generation of RISC-V TEEs to address these attacks within their threat models, and we believe this study will spur further community efforts in this direction.
Updated: 2024-05-24 13:26:39
标题: 不要相信电源管理:保护未来可信执行环境的挑战和提示
摘要: 在过去几年中,几个研究团队引入了创新的硬件设计用于可信执行环境(TEE),旨在保护应用程序免受潜在受损特权软件(包括内核)的攻击。自2017年以来,唐等人引入了一种新型的软件启用硬件攻击,利用能量管理机制。这些攻击旨在绕过TEE的安全保证,并暴露诸如加密密钥之类的敏感信息。这些攻击在过去几年中有所增加。尽管如此,当前的RISC-V TEE架构尚未将它们纳入威胁模型中。专有实现,如Arm TrustZone和Intel SGX,嵌入了对抗措施。然而,这些对抗措施在长期内并不可行,并阻碍了能量管理机制的能力。本文介绍了这些攻击的第一个全面知识调查,以及对文献对抗措施的评估。我们的分析突出了假设威胁模型和实际威胁模型之间存在实质性的安全差距,在现代片上系统中可能会对TEE提供的安全保证产生严重威胁。我们主张在下一代RISC-V TEE中加强对这些攻击的防范,并相信这项研究将激励社区在这方面进一步努力。
更新时间: 2024-05-24 13:26:39
领域: cs.CR,cs.AR,cs.ET
Mask-based Invisible Backdoor Attacks on Object Detection
Deep learning models have achieved unprecedented performance in the domain of object detection, resulting in breakthroughs in areas such as autonomous driving and security. However, deep learning models are vulnerable to backdoor attacks. These attacks prompt models to behave similarly to standard models without a trigger; however, they act maliciously upon detecting a predefined trigger. Despite extensive research on backdoor attacks in image classification, their application to object detection remains relatively underexplored. Given the widespread application of object detection in critical real-world scenarios, the sensitivity and potential impact of these vulnerabilities cannot be overstated. In this study, we propose an effective invisible backdoor attack on object detection utilizing a mask-based approach. Three distinct attack scenarios were explored for object detection: object disappearance, object misclassification, and object generation attack. Through extensive experiments, we comprehensively examined the effectiveness of these attacks and tested certain defense methods to determine effective countermeasures.
Updated: 2024-05-24 13:17:39
标题: 基于掩码的目标检测隐形后门攻击
摘要: 深度学习模型在目标检测领域取得了前所未有的性能,在自动驾驶和安全等领域带来了突破。然而,深度学习模型容易受到后门攻击的影响。这些攻击促使模型在没有触发器的情况下表现得与标准模型类似;然而,一旦检测到预定义的触发器,它们就会表现出恶意行为。尽管在图像分类中对后门攻击进行了广泛研究,但其在目标检测中的应用仍相对缺乏探索。鉴于目标检测在关键现实场景中的广泛应用,这些漏洞的敏感性和潜在影响不容忽视。在本研究中,我们提出了一种利用基于掩码的方法对目标检测进行的有效隐形后门攻击。我们探索了三种不同的攻击场景:目标消失、目标错分和目标生成攻击。通过大量实验,我们全面检验了这些攻击的有效性,并测试了一些防御方法,以确定有效的对策。
更新时间: 2024-05-24 13:17:39
领域: cs.CV,cs.AI,cs.CR,I.4.8
Optimal transport for automatic alignment of untargeted metabolomic data
Untargeted metabolomic profiling through liquid chromatography-mass spectrometry (LC-MS) measures a vast array of metabolites within biospecimens, advancing drug development, disease diagnosis, and risk prediction. However, the low throughput of LC-MS poses a major challenge for biomarker discovery, annotation, and experimental comparison, necessitating the merging of multiple datasets. Current data pooling methods encounter practical limitations due to their vulnerability to data variations and hyperparameter dependence. Here we introduce GromovMatcher, a flexible and user-friendly algorithm that automatically combines LC-MS datasets using optimal transport. By capitalizing on feature intensity correlation structures, GromovMatcher delivers superior alignment accuracy and robustness compared to existing approaches. This algorithm scales to thousands of features requiring minimal hyperparameter tuning. Manually curated datasets for validating alignment algorithms are limited in the field of untargeted metabolomics, and hence we develop a dataset split procedure to generate pairs of validation datasets to test the alignments produced by GromovMatcher and other methods. Applying our method to experimental patient studies of liver and pancreatic cancer, we discover shared metabolic features related to patient alcohol intake, demonstrating how GromovMatcher facilitates the search for biomarkers associated with lifestyle risk factors linked to several cancer types.
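A minimal sketch of the optimal-transport core, assuming the POT library (pip install pot): two feature sets are aligned purely from their within-dataset correlation structures, which is the ingredient GromovMatcher builds on (its full pipeline adds m/z and retention-time constraints):

    import numpy as np
    import ot   # POT: Python Optimal Transport

    rng = np.random.default_rng(0)
    X = rng.normal(size=(30, 100))          # dataset 1: 30 features x 100 samples
    perm = rng.permutation(30)
    Y = X[perm] + 0.05 * rng.normal(size=(30, 100))   # dataset 2: permuted copy

    # intra-dataset dissimilarities from feature-intensity correlations
    C1 = 1 - np.abs(np.corrcoef(X))
    C2 = 1 - np.abs(np.corrcoef(Y))
    p, q = ot.unif(30), ot.unif(30)
    T = ot.gromov.gromov_wasserstein(C1, C2, p, q, "square_loss")
    recovered = T.argmax(axis=1)
    print((recovered == np.argsort(perm)).mean())   # fraction of correct matches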
Updated: 2024-05-24 13:16:49
标题: 无目标代谢组学数据的自动对齐的最佳传输
摘要: 通过液相色谱-质谱(LC-MS)进行无靶代谢组学分析可测量生物标本中的大量代谢物,推动药物开发、疾病诊断和风险预测。然而,LC-MS的低通量对生物标记物发现、注释和实验比较构成重大挑战,需要合并多个数据集。目前的数据汇总方法由于对数据变化和超参数依赖性的脆弱性而遇到实际限制。在这里,我们介绍了GromovMatcher,这是一种灵活且用户友好的算法,可以利用最优传输自动合并LC-MS数据集。通过利用特征强度相关结构,GromovMatcher相对于现有方法具有更优的对齐精度和稳健性。该算法可扩展到需要最小超参数调整的数千个特征。在无靶代谢组学领域,手动筛选的验证对齐算法的数据集受限,因此我们开发了一个数据集拆分程序,生成用于测试由GromovMatcher和其他方法产生的对齐的验证数据集对。将我们的方法应用于实验性肝脏和胰腺癌患者研究中,我们发现与患者饮酒有关的共享代谢特征,展示了GromovMatcher如何促进寻找与多种癌症类型相关的生活方式风险因素相关的生物标记物的过程。
更新时间: 2024-05-24 13:16:49
领域: q-bio.QM,cs.LG,49Q22, 92C40,G.3; J.3
Identifying Functionally Important Features with End-to-End Sparse Dictionary Learning
Identifying the features learned by neural networks is a core challenge in mechanistic interpretability. Sparse autoencoders (SAEs), which learn a sparse, overcomplete dictionary that reconstructs a network's internal activations, have been used to identify these features. However, SAEs may learn more about the structure of the dataset than the computational structure of the network. There is therefore only indirect reason to believe that the directions found in these dictionaries are functionally important to the network. We propose end-to-end (e2e) sparse dictionary learning, a method for training SAEs that ensures the features learned are functionally important by minimizing the KL divergence between the output distributions of the original model and the model with SAE activations inserted. Compared to standard SAEs, e2e SAEs offer a Pareto improvement: They explain more network performance, require fewer total features, and require fewer simultaneously active features per datapoint, all with no cost to interpretability. We explore geometric and qualitative differences between e2e SAE features and standard SAE features. E2e dictionary learning brings us closer to methods that can explain network behavior concisely and accurately. We release our library for training e2e SAEs and reproducing our analysis at https://github.com/ApolloResearch/e2e_sae
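A minimal sketch of the e2e objective, assuming PyTorch and a hypothetical model_tail standing in for the layers downstream of the SAE insertion point:

    import torch
    import torch.nn.functional as F

    def e2e_sae_loss(acts, sae, model_tail, l1_coef=1e-3):
        """Instead of reconstructing activations, run the rest of the model
        on the SAE output and match the original output distribution with a
        KL term, plus the usual L1 sparsity penalty on the codes."""
        z = F.relu(sae["enc"](acts))            # sparse codes
        acts_hat = sae["dec"](z)
        logits = model_tail(acts)
        logits_hat = model_tail(acts_hat)
        kl = F.kl_div(F.log_softmax(logits_hat, -1),
                      F.log_softmax(logits, -1),
                      log_target=True, reduction="batchmean")
        return kl + l1_coef * z.abs().sum(-1).mean()

    sae = {"enc": torch.nn.Linear(64, 512), "dec": torch.nn.Linear(512, 64)}
    model_tail = torch.nn.Linear(64, 1000)      # stand-in for later layers
    loss = e2e_sae_loss(torch.randn(8, 64), sae, model_tail)
    loss.backward()
    print(loss.item())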
Updated: 2024-05-24 13:16:32
标题: 利用端到端稀疏字典学习识别功能重要特征
摘要: 识别神经网络学习到的特征是机制可解释性的核心挑战。稀疏自编码器(SAEs)学习一个稀疏的、过完备的字典来重构网络的内部激活,已被用来识别这些特征。然而,SAEs可能更多地学习了数据集的结构,而非网络的计算结构。因此,只有间接的理由相信这些字典中找到的方向对网络是功能重要的。我们提出了端到端(e2e)稀疏字典学习,一种训练SAEs的方法,通过最小化原始模型和插入SAE激活的模型的输出分布之间的KL散度,确保学到的特征在功能上是重要的。与标准SAEs相比,e2e SAEs提供了帕累托改进:它们解释了更多的网络性能,需要更少的总特征,并且每个数据点需要更少的同时激活特征,而这一切都不会损害可解释性。我们探索了e2e SAE特征和标准SAE特征之间的几何和定性差异。端到端字典学习让我们更接近能够简洁准确地解释网络行为的方法。我们发布了用于训练e2e SAEs并重现我们分析的库,网址是https://github.com/ApolloResearch/e2e_sae
更新时间: 2024-05-24 13:16:32
领域: cs.LG,cs.AI
Curriculum Direct Preference Optimization for Diffusion and Consistency Models
Direct Preference Optimization (DPO) has been proposed as an effective and efficient alternative to reinforcement learning from human feedback (RLHF). In this paper, we propose a novel and enhanced version of DPO based on curriculum learning for text-to-image generation. Our method is divided into two training stages. First, a ranking of the examples generated for each prompt is obtained by employing a reward model. Then, increasingly difficult pairs of examples are sampled and provided to a text-to-image generative (diffusion or consistency) model. Generated samples that are far apart in the ranking are considered to form easy pairs, while those that are close in the ranking form hard pairs. In other words, we use the rank difference between samples as a measure of difficulty. The sampled pairs are split into batches according to their difficulty levels, which are gradually used to train the generative model. Our approach, Curriculum DPO, is compared against state-of-the-art fine-tuning approaches on three benchmarks, outperforming the competing methods in terms of text alignment, aesthetics and human preference. Our code is available at https://anonymous.4open.science/r/Curriculum-DPO-EE14.
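A minimal sketch of the curriculum scheduling, with hypothetical rewards: pairs are formed from a reward-model ranking and bucketed by rank gap, large gaps (easy pairs) first:

    import numpy as np

    def curriculum_levels(rewards, n_levels=3):
        """Rank the generations for one prompt by reward, form (winner,
        loser) pairs, and schedule them easy-to-hard by rank gap: distant
        ranks are easy pairs (early stages), close ranks are hard pairs
        (later stages)."""
        order = np.argsort(-np.asarray(rewards))        # best first
        pairs = sorted(((int(order[i]), int(order[j]), j - i)
                        for i in range(len(order))
                        for j in range(i + 1, len(order))),
                       key=lambda t: -t[2])             # largest gap first
        n = len(pairs)
        return [[(w, l) for w, l, _ in
                 pairs[i * n // n_levels:(i + 1) * n // n_levels]]
                for i in range(n_levels)]

    for stage, batch in enumerate(curriculum_levels([0.9, 0.1, 0.5, 0.7, 0.3])):
        print(f"stage {stage}: train DPO on pairs {batch}")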
Updated: 2024-05-24 13:14:40
标题: 用于扩散模型和一致性模型的课程式直接偏好优化
摘要: 直接偏好优化(DPO)已被提出作为人类反馈强化学习(RLHF)的一种有效且高效的替代方案。在本文中,我们提出了一种基于课程学习的、面向文本到图像生成的新颖增强版DPO。我们的方法分为两个训练阶段。首先,通过使用奖励模型获得为每个提示生成的示例的排名。然后,逐渐采样难度递增的示例对,并将其提供给文本到图像生成(扩散或一致性)模型。在排名中相距较远的生成样本被认为构成简单对,而在排名中接近的则构成困难对。换句话说,我们使用样本之间的排名差异作为难度的度量。采样得到的样本对根据其难度级别被分成批次,逐步用于训练生成模型。我们的方法Curriculum DPO在三个基准测试上与最先进的微调方法进行了比较,在文本对齐、美学和人类偏好方面均优于竞争方法。我们的代码可在https://anonymous.4open.science/r/Curriculum-DPO-EE14获取。
更新时间: 2024-05-24 13:14:40
领域: cs.CV,cs.AI,cs.LG
Risks and Opportunities of Open-Source Generative AI
Applications of Generative AI (Gen AI) are expected to revolutionize a number of different areas, ranging from science & medicine to education. The potential for these seismic changes has triggered a lively debate about the potential risks of the technology, and resulted in calls for tighter regulation, in particular from some of the major tech companies who are leading in AI development. This regulation is likely to put at risk the budding field of open-source generative AI. Using a three-stage framework for Gen AI development (near, mid and long-term), we analyze the risks and opportunities of open-source generative AI models with similar capabilities to the ones currently available (near to mid-term) and with greater capabilities (long-term). We argue that, overall, the benefits of open-source Gen AI outweigh its risks. As such, we encourage the open sourcing of models, training and evaluation data, and provide a set of recommendations and best practices for managing risks associated with open-source generative AI.
Updated: 2024-05-24 13:12:49
标题: 开源生成人工智能的风险与机遇
摘要: Generative AI(Gen AI)的应用预计将彻底改变许多不同领域,从科学和医学到教育。这些重大变革的潜力引发了关于技术潜在风险的热烈讨论,并导致了对更严格监管的呼吁,尤其是一些领先于AI发展的主要科技公司。这种监管可能会危及新兴的开源生成AI领域。我们使用Gen AI发展的三阶段框架(近期、中期和长期),分析了具有类似能力的当前可用模型(近期到中期)和更强大能力(长期)的开源生成AI模型的风险和机会。我们认为,总体而言,开源Gen AI的益处超过了其风险。因此,我们鼓励对模型、训练和评估数据进行开源,并提供一套管理与开源生成AI相关风险的建议和最佳实践。
更新时间: 2024-05-24 13:12:49
领域: cs.LG
Polyp Segmentation Generalisability of Pretrained Backbones
It has recently been demonstrated that pretraining backbones in a self-supervised manner generally provides better fine-tuned polyp segmentation performance, and that models with ViT-B backbones typically perform better than models with ResNet50 backbones. In this paper, we extend this recent work to consider generalisability, i.e., we assess the performance of models on a different dataset from that used for fine-tuning, accounting for variation in network architecture and pretraining pipeline (algorithm and dataset). This reveals how well models with different pretrained backbones generalise to data of a somewhat different distribution from the training data, which will likely arise in deployment due to different cameras and patient demographics, amongst other factors. We observe that the previous findings regarding pretraining pipelines for polyp segmentation hold true when considering generalisability. However, our results imply that models with ResNet50 backbones typically generalise better, despite being outperformed by models with ViT-B backbones in evaluation on the test set from the same dataset used for fine-tuning.
Updated: 2024-05-24 13:09:52
标题: 息肉分割中预训练骨干网络的泛化能力
摘要: 最近的研究表明,以自监督方式预训练骨干网络通常能提供更好的微调息肉分割性能,而具有ViT-B骨干的模型通常表现比具有ResNet50骨干的模型更好。在本文中,我们将这一最近的工作扩展到考虑泛化能力。即,我们评估模型在与微调使用的不同数据集上的表现,考虑网络架构和预训练流程(算法和数据集)的变化。这揭示了具有不同预训练骨干的模型对数据的泛化能力,这些数据与训练数据的分布略有不同,这可能会因为不同的相机和患者人口统计数据等因素而在部署中出现。我们观察到以前关于息肉分割的预训练流程的发现在考虑泛化能力时仍然成立。然而,我们的结果表明,尽管在对用于微调的相同数据集的测试集上的评估中被ViT-B骨干的模型表现更出色,但具有ResNet50骨干的模型通常具有更好的泛化能力。
更新时间: 2024-05-24 13:09:52
领域: cs.CV,cs.LG
Mosaic Memory: Fuzzy Duplication in Copyright Traps for Large Language Models
The immense datasets used to develop Large Language Models (LLMs) often include copyright-protected content, typically without the content creator's consent. Copyright traps have been proposed to be injected into the original content, improving content detectability in newly released LLMs. Traps, however, rely on the exact duplication of a unique text sequence, leaving them vulnerable to commonly deployed data deduplication techniques. We here propose the generation of fuzzy copyright traps, featuring slight modifications across duplication. When injected in the fine-tuning data of a 1.3B LLM, we show fuzzy trap sequences to be memorized nearly as well as exact duplicates. Specifically, the Membership Inference Attack (MIA) ROC AUC only drops from 0.90 to 0.87 when 4 tokens are replaced across the fuzzy duplicates. We also find that selecting replacement positions to minimize the exact overlap between fuzzy duplicates leads to similar memorization, while making fuzzy duplicates highly unlikely to be removed by any deduplication process. Lastly, we argue that the fact that LLMs memorize across fuzzy duplicates challenges the study of LLM memorization relying on naturally occurring duplicates. Indeed, we find that the commonly used training dataset, The Pile, contains significant amounts of fuzzy duplicates. This introduces a previously unexplored confounding factor in post-hoc studies of LLM memorization, and questions the effectiveness of (exact) data deduplication as a privacy protection technique.
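A minimal sketch of fuzzy-duplicate generation, with a hypothetical replacement vocabulary: each copy replaces k tokens, cycling through a shuffled position list so that replacement positions barely overlap across copies and exact-match deduplication cannot collapse them:

    import random

    def fuzzy_duplicates(tokens, n_copies=3, k=4, seed=0):
        rng = random.Random(seed)
        vocab = ["alpha", "omega", "zenith", "quartz", "ember"]
        positions = list(range(len(tokens)))
        rng.shuffle(positions)
        copies = []
        for c in range(n_copies):
            dup = list(tokens)
            for i in range(k):
                # disjoint positions across copies (cycling if sequence is short)
                p = positions[(c * k + i) % len(positions)]
                dup[p] = rng.choice(vocab)
            copies.append(" ".join(dup))
        return copies

    trap = ("the quick brown fox jumps over the lazy dog near the old "
            "stone bridge at dawn while birds sing softly today").split()
    for dup in fuzzy_duplicates(trap):
        print(dup)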
Updated: 2024-05-24 13:05:05
标题: 镶嵌式记忆:大型语言模型中版权陷阱中的模糊复制
摘要: 用于开发大型语言模型(LLMs)的庞大数据集通常包含受版权保护的内容,通常未经内容创建者同意。已经提出了版权陷阱,用于注入原始内容中,提高新发布的LLMs中内容的可检测性。然而,陷阱依赖于对唯一文本序列的精确复制,使其容易受到常用的数据重复消除技术的影响。在这里,我们提出了生成模糊版权陷阱的方法,跨重复时进行轻微修改。当注入到1.3B LLM的微调数据中时,我们发现模糊陷阱序列几乎与精确重复一样被记忆。具体来说,成员推理攻击(MIA)ROC AUC仅在4个标记在模糊重复中被替换时从0.90下降到0.87。我们还发现,选择替换位置以最大程度减少模糊重复之间的精确重叠会导致类似的记忆效果,同时使模糊重复极不可能被任何重复消除过程删除。最后,我们认为LLMs跨模糊重复进行记忆挑战了依赖自然发生的重复进行LLMs记忆研究。事实上,我们发现常用的训练数据集The Pile包含大量模糊重复。这在LLMs记忆后研究中引入了一个以前未探讨的混淆因素,并质疑(精确)数据重复消除作为一种隐私保护技术的有效性。
更新时间: 2024-05-24 13:05:05
领域: cs.CL,cs.LG
Bias Testing and Mitigation in LLM-based Code Generation
Utilizing state-of-the-art Large Language Models (LLMs), automatic code generation models play a pivotal role in enhancing the productivity of software development procedures. As the adoption of LLMs becomes more widespread in software coding ecosystems, a pressing issue has emerged: does the generated code contain social bias and unfairness, such as those related to age, gender, and race? This issue concerns the integrity, fairness, and ethical foundation of software applications that depend on the code generated by these models, yet is under-explored in the literature. This paper presents a novel bias testing framework that is specifically designed for code generation tasks. Based on this framework, we conduct an extensive evaluation of the bias in code generated by five state-of-the-art LLMs. Our findings reveal that 20.29% to 44.93% of code functions generated by the models under study are biased when handling bias-sensitive tasks (i.e., tasks that involve sensitive attributes such as age and gender). This indicates that the existing LLMs can be unfair in code generation, posing risks of unintended and harmful software behaviors. To mitigate bias in code generation models, we evaluate five bias mitigation prompting strategies that utilize bias testing results to refine the code: zero-shot, one-shot, few-shot, and two Chain-of-Thought (CoT) prompts. Our evaluation results illustrate that these strategies are all effective in mitigating bias. Overall, one-shot and few-shot learning are the two most effective. For GPT-4, 80% to 90% of code bias can be removed with one-shot learning.
Updated: 2024-05-24 13:03:49
标题: 基于LLM的代码生成中的偏差测试和缓解
摘要: 利用最先进的大型语言模型(LLMs),自动代码生成模型在增强软件开发流程的生产力方面发挥着关键作用。随着LLMs在软件编码生态系统中的应用变得更加广泛,一个紧迫的问题出现了:生成的代码是否包含与年龄、性别和种族等有关的社会偏见和不公平现象?这个问题涉及依赖这些模型生成的代码的软件应用程序的完整性、公平性和伦理基础,但在文献中尚未得到充分探讨。本文提出了一个专门为代码生成任务设计的新颖偏差测试框架。基于这个框架,我们对五种最先进的LLMs生成的代码进行了广泛评估。我们的研究结果显示,在处理涉及敏感属性(如年龄和性别)的任务时,所研究模型生成的代码函数中有20.29%至44.93%存在偏差。这表明现有的LLMs在代码生成中可能存在不公平现象,可能导致意外和有害的软件行为风险。为了减轻代码生成模型的偏差,我们评估了五种利用偏差测试结果来改进代码的偏差缓解提示策略,即零样本(zero-shot)、单样本(one-shot)、少样本(few-shot)以及两种思维链(CoT)提示。我们的评估结果表明,这些策略都能有效地减轻偏差。总体而言,单样本和少样本学习是最有效的两种策略。对于GPT-4,单样本学习可以消除80%至90%的代码偏差。
更新时间: 2024-05-24 13:03:49
领域: cs.SE,cs.AI
A Preference-oriented Diversity Model Based on Mutual-information in Re-ranking for E-commerce Search
Re-ranking is the process of rearranging a ranking list to more effectively meet user demands by accounting for the interrelationships between items. Existing methods predominantly enhance the precision of search results, often at the expense of diversity, leading to outcomes that may not fulfill the varied needs of users. Conversely, methods designed to promote diversity might compromise the precision of the results, failing to satisfy the users' requirements for accuracy. To alleviate the above problems, this paper proposes a Preference-oriented Diversity Model Based on Mutual-information (PODM-MI), which considers both accuracy and diversity in the re-ranking process. Specifically, PODM-MI adopts Multidimensional Gaussian distributions based on variational inference to capture users' diversity preferences with uncertainty. Then we maximize the mutual information between the diversity preferences of the users and the candidate items using the maximum variational inference lower bound to enhance their correlations. Subsequently, we derive a utility matrix based on the correlations, enabling the adaptive ranking of items in line with user preferences and establishing a balance between the aforementioned objectives. Experimental results on real-world online e-commerce systems demonstrate the significant improvements of PODM-MI, and we have successfully deployed PODM-MI on an e-commerce search platform.
Updated: 2024-05-24 13:03:34
标题: 一个基于互信息的偏好导向多样性模型在电子商务搜索中的重新排序
摘要: 重新排名是重新排列排名列表以更有效地满足用户需求的过程,通过考虑项目之间的相互关系。现有的方法主要是增强搜索结果的精确度,通常是以牺牲多样性为代价,导致可能无法满足用户的各种需求。相反,旨在促进多样性的方法可能会损害结果的精确度,无法满足用户对准确性的要求。为了缓解上述问题,本文提出了一种基于互信息的偏好导向多样性模型(PODM-MI),在重新排名过程中考虑了准确性和多样性。具体而言,PODM-MI采用基于变分推理的多维高斯分布来捕捉用户的多样性偏好及其不确定性。然后,我们通过最大化变分推理下界来增强用户的多样性偏好与候选项目之间的互信息,进而推导出一个基于相关性的效用矩阵,实现了根据用户偏好自适应排名项目,并在准确性和多样性之间建立平衡。真实世界在线电子商务系统上的实验结果表明了PODM-MI的显著改进,并且我们已成功将PODM-MI部署在一个电子商务搜索平台上。
更新时间: 2024-05-24 13:03:34
领域: cs.IR,cs.AI
Erase to Enhance: Data-Efficient Machine Unlearning in MRI Reconstruction
Machine unlearning is a promising paradigm for removing unwanted data samples from a trained model, towards ensuring compliance with privacy regulations and limiting harmful biases. Although unlearning has been shown in, e.g., classification and recommendation systems, its potential in medical image-to-image translation, specifically in image reconstruction, has not been thoroughly investigated. This paper shows that machine unlearning is possible in MRI tasks and has the potential to benefit bias removal. We set up a protocol to study how much shared knowledge exists between datasets of different organs, allowing us to effectively quantify the effect of unlearning. Our study reveals that combining training data can lead to hallucinations and reduced image quality in the reconstructed data. We use unlearning to remove hallucinations as a proxy exemplar of undesired data removal. Indeed, we show that machine unlearning is possible without full retraining. Furthermore, our observations indicate that maintaining high performance is feasible even when using only a subset of retain data. We have made our code publicly accessible.
Updated: 2024-05-24 13:01:35
标题: 擦除以增强:MRI重建中的数据高效机器遗忘
摘要: 机器遗忘是一种有希望的范式,用于从训练模型中删除不需要的数据样本,以确保符合隐私法规并限制有害偏见。尽管遗忘已在分类和推荐系统中得到证明,但在医学图像到图像转换中,特别是在图像重构中,其潜力尚未得到充分调查。本文表明机器遗忘在MRI任务中是可能的,并有助于消除偏见。我们建立了一个协议来研究不同器官数据集之间存在多少共享知识,从而有效量化遗忘的效果。我们的研究表明,合并训练数据可能导致幻觉和重建数据的图像质量降低。我们使用遗忘来移除幻觉,作为不需要的数据移除的代理示例。事实上,我们展示了即使没有完全重新训练,机器遗忘也是可能的。此外,我们的观察表明,即使只使用部分保留数据,仍然可以保持高性能。我们已经使我们的代码公开可访问。
更新时间: 2024-05-24 13:01:35
领域: eess.IV,cs.CV,cs.LG
Learning Relevant Contextual Variables Within Bayesian Optimization
Contextual Bayesian Optimization (CBO) efficiently optimizes black-box functions with respect to design variables, while simultaneously integrating contextual information regarding the environment, such as experimental conditions. However, the relevance of contextual variables is not necessarily known beforehand. Moreover, contextual variables can sometimes be optimized themselves at an additional cost, a setting overlooked by current CBO algorithms. Cost-sensitive CBO would simply include optimizable contextual variables as part of the design variables based on their cost. Instead, we adaptively select a subset of contextual variables to include in the optimization, based on the trade-off between their relevance and the additional cost incurred by optimizing them compared to leaving them to be determined by the environment. We learn the relevance of contextual variables by sensitivity analysis of the posterior surrogate model while minimizing the cost of optimization by leveraging recent developments on early stopping for BO. We empirically evaluate our proposed Sensitivity-Analysis-Driven Contextual BO (SADCBO) method against alternatives on both synthetic and real-world experiments, together with extensive ablation studies, and demonstrate a consistent improvement across examples.
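A hedged sketch of the variable-selection step may help: below, the sensitivity of each input dimension is scored by the average squared finite-difference gradient of a GP surrogate's posterior mean, and a contextual variable is included only when its score outweighs its optimization cost. The scoring rule and threshold are assumptions for illustration, not the paper's exact procedure.

```python
# Hedged sketch: rank input dimensions of a GP surrogate by the mean
# squared finite-difference gradient of its posterior mean; include a
# contextual variable only if relevance outweighs its optimization cost.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def sensitivity_scores(gp, X, eps=1e-3):
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        Xp, Xm = X.copy(), X.copy()
        Xp[:, j] += eps
        Xm[:, j] -= eps
        grad = (gp.predict(Xp) - gp.predict(Xm)) / (2 * eps)
        scores[j] = np.mean(grad ** 2)
    return scores

rng = np.random.default_rng(0)
X = rng.uniform(size=(40, 4))          # dims 0-1: design, dims 2-3: contextual
y = np.sin(3 * X[:, 0]) + 0.5 * X[:, 2] + 0.01 * rng.normal(size=40)
gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)

scores = sensitivity_scores(gp, X)
costs = np.array([0.0, 0.0, 1.0, 1.0])  # optimizing context has an extra price
selected = [j for j in (2, 3) if scores[j] > 0.1 * costs[j]]
print(scores, selected)                 # dim 2 is relevant enough to include
```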
Updated: 2024-05-24 12:59:54
标题: 在贝叶斯优化中学习相关的上下文变量
摘要: 上下文贝叶斯优化(CBO)可以高效地优化关于设计变量的黑箱函数,同时整合有关环境的上下文信息,如实验条件。然而,上下文变量的相关性并不一定事先已知。此外,上下文变量有时可以以额外成本自行优化,这是当前CBO算法所忽视的一种设定。成本敏感的CBO会简单地根据成本将可优化的上下文变量纳入设计变量。相反,我们根据上下文变量的相关性与优化它们(相对于让环境来决定)所产生的额外成本之间的权衡,自适应地选择要纳入优化的上下文变量子集。我们通过对后验替代模型进行敏感性分析来学习上下文变量的相关性,同时利用贝叶斯优化提前停止方面的最新进展来最小化优化成本。我们在合成和真实实验中将所提出的基于敏感性分析的上下文贝叶斯优化(SADCBO)方法与其他方法进行了实证比较,并进行了广泛的消融研究,展示了其在各个示例中的持续改进。
更新时间: 2024-05-24 12:59:54
领域: cs.LG,stat.ML
A Carbon Tracking Model for Federated Learning: Impact of Quantization and Sparsification
Federated Learning (FL) methods adopt efficient communication technologies to distribute machine learning tasks across edge devices, reducing the overhead in terms of data storage and computational complexity compared to centralized solutions. Rather than moving large data volumes from producers (sensors, machines) to energy-hungry data centers, raising environmental concerns due to resource demands, FL provides an alternative solution to mitigate the energy demands of several learning tasks while enabling new Artificial Intelligence of Things (AIoT) applications. This paper proposes a framework for real-time monitoring of the energy and carbon footprint impacts of FL systems. The carbon tracking tool is evaluated for consensus (fully decentralized) and classical FL policies. For the first time, we present a quantitative evaluation of different computationally and communication efficient FL methods from the perspectives of energy consumption and carbon equivalent emissions, suggesting also general guidelines for energy-efficient design. Results indicate that consensus-driven FL implementations should be preferred for limiting carbon emissions when the energy efficiency of the communication is low (i.e., < 25 Kbit/Joule). Besides, quantization and sparsification operations are shown to strike a balance between learning performances and energy consumption, leading to sustainable FL designs.
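The < 25 Kbit/Joule guideline lends itself to a quick back-of-the-envelope check. The sketch below (with an assumed grid carbon intensity) converts the bits of a model update into energy and CO2-equivalent, showing how quantization-style compression shrinks the communication footprint.

```python
# Back-of-the-envelope check of the < 25 Kbit/Joule guideline. The grid
# carbon intensity below is an assumed average, purely for illustration.
KBIT_PER_JOULE = 25.0        # communication energy efficiency threshold
CARBON_KG_PER_KWH = 0.475    # assumed kg CO2-eq per kWh

def communication_footprint(bits_sent, kbit_per_joule=KBIT_PER_JOULE):
    """Energy (J) and carbon (kg CO2-eq) to transmit one model update."""
    joules = (bits_sent / 1e3) / kbit_per_joule
    return joules, joules / 3.6e6 * CARBON_KG_PER_KWH

# A 10 MB update, uncompressed vs. 8x quantization/sparsification:
for bits in (80e6, 10e6):
    joules, kg = communication_footprint(bits)
    print(f"{bits / 8e6:.2f} MB -> {joules:.0f} J, {kg * 1e6:.0f} mg CO2-eq")
```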
Updated: 2024-05-24 12:58:55
标题: 一个用于联邦学习的碳追踪模型:量化和稀疏化的影响
摘要: 联邦学习(FL)方法采用高效的通信技术,将机器学习任务分布到边缘设备上,与集中式解决方案相比,减少了数据存储和计算复杂性方面的开销。FL无需将大量数据从生产者(传感器、机器)传输到因资源需求而引发环境担忧的耗能数据中心,而是提供了一种替代方案,在支持新的物联网人工智能(AIoT)应用的同时减轻多种学习任务的能耗需求。本文提出了一个用于实时监测FL系统的能源和碳足迹影响的框架。该碳追踪工具针对共识式(完全去中心化)和经典FL策略进行了评估。我们首次从能耗和碳当量排放的角度,对不同计算与通信高效的FL方法进行了定量评估,并提出了能效设计的一般性指导方针。结果表明,当通信能效较低(即< 25 Kbit/Joule)时,应优先选择共识驱动的FL实现以限制碳排放。此外,量化和稀疏化操作被证明能在学习性能与能耗之间取得平衡,从而实现可持续的FL设计。
更新时间: 2024-05-24 12:58:55
领域: eess.SP,cs.LG
On the Convexity and Reliability of the Bethe Free Energy Approximation
The Bethe free energy approximation provides an effective way for relaxing NP-hard problems of probabilistic inference. However, its accuracy depends on the model parameters and particularly degrades if a phase transition in the model occurs. In this work, we analyze when the Bethe approximation is reliable and how this can be verified. We argue and show by experiment that it is mostly accurate if it is convex on a submanifold of its domain, the 'Bethe box'. For verifying its convexity, we derive two sufficient conditions that are based on the definiteness properties of the Bethe Hessian matrix: the first uses the concept of diagonal dominance, and the second decomposes the Bethe Hessian matrix into a sum of sparse matrices and characterizes the definiteness properties of the individual matrices in that sum. These theoretical results provide a simple way to estimate the critical phase transition temperature of a model. As a practical contribution we propose $\texttt{BETHE-MIN}$, a projected quasi-Newton method to efficiently find a minimum of the Bethe free energy.
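As a minimal illustration of the first sufficient condition, the sketch below tests diagonal dominance of a symmetric matrix, which (by the Gershgorin circle theorem) guarantees positive semi-definiteness; the paper applies this idea to the Bethe Hessian, whereas the matrix here is a toy example.

```python
# Minimal sketch of the first sufficient condition: a symmetric matrix
# with nonnegative diagonal that is (row-)diagonally dominant is
# positive semi-definite, by the Gershgorin circle theorem.
import numpy as np

def diagonally_dominant_psd(H, tol=0.0):
    """Sufficient (not necessary) test for positive semi-definiteness."""
    H = np.asarray(H)
    diag = np.diag(H)
    off = np.sum(np.abs(H), axis=1) - np.abs(diag)
    return bool(np.all(diag >= 0) and np.all(diag - off >= -tol))

H = np.array([[4.0, 1.0, -1.0],
              [1.0, 3.0,  0.5],
              [-1.0, 0.5, 2.0]])
print(diagonally_dominant_psd(H))            # True: the condition holds
print(np.all(np.linalg.eigvalsh(H) >= 0))    # exact eigenvalue check agrees
```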
Updated: 2024-05-24 12:57:40
标题: 关于贝特自由能近似的凸性和可靠性
摘要: 贝特自由能近似为放松概率推断中的NP难问题提供了一种有效方法。然而,其准确性取决于模型参数,尤其是在模型发生相变时会显著退化。在本文中,我们分析了贝特近似何时可靠以及如何验证这一点。我们论证并通过实验表明,如果它在其定义域的一个子流形,即"贝特盒子(Bethe box)"上是凸的,那么该近似大多数情况下是准确的。为了验证其凸性,我们基于贝特Hessian矩阵的定号性推导了两个充分条件:第一个使用对角占优的概念,第二个将贝特Hessian矩阵分解为若干稀疏矩阵之和,并刻画该和式中各个矩阵的定号性。这些理论结果提供了一种估计模型临界相变温度的简单方法。作为实际贡献,我们提出了$\texttt{BETHE-MIN}$,一种能够高效找到贝特自由能极小值的投影拟牛顿方法。
更新时间: 2024-05-24 12:57:40
领域: stat.ML,cs.AI,cs.LG
ChatGPT Code Detection: Techniques for Uncovering the Source of Code
In recent times, large language models (LLMs) have made significant strides in generating computer code, blurring the lines between code created by humans and code produced by artificial intelligence (AI). As these technologies evolve rapidly, it is crucial to explore how they influence code generation, especially given the risk of misuse in areas like higher education. This paper explores this issue by using advanced classification techniques to differentiate between code written by humans and that generated by ChatGPT, a type of LLM. We employ a new approach that combines powerful embedding features (black-box) with supervised learning algorithms - including Deep Neural Networks, Random Forests, and Extreme Gradient Boosting - to achieve this differentiation with an impressive accuracy of 98%. For the successful combinations, we also examine their model calibration, showing that some of the models are extremely well calibrated. Additionally, we present white-box features and an interpretable Bayes classifier to elucidate critical differences between the code sources, enhancing the explainability and transparency of our approach. Both approaches work well but provide at most 85-88% accuracy. We also show that untrained humans solve the same task not better than random guessing. This study is crucial in understanding and mitigating the potential risks associated with using AI in code generation, particularly in the context of higher education, software development, and competitive programming.
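A hedged sketch of the black-box pipeline follows: code snippets are embedded and fed to a supervised classifier. The character-n-gram hashing vectorizer below merely stands in for the (much stronger) embedding features used in the paper, and the labels are toy data.

```python
# Hedged sketch of the black-box route: embed code snippets, then train
# a supervised classifier. The hashing vectorizer is a stand-in for the
# stronger embedding model used in the paper; labels are toy data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.model_selection import cross_val_score

snippets = [
    "def add(a, b):\n    return a + b",
    "for i in range(10):\n    print(i)",
    "def fact(n):\n    return 1 if n <= 1 else n * fact(n - 1)",
    "squares = [i ** 2 for i in range(100)]",
] * 10
labels = np.array([0, 1, 0, 1] * 10)      # 0 = human-written, 1 = AI-generated

X = HashingVectorizer(analyzer="char_wb", ngram_range=(2, 4),
                      n_features=2 ** 12).fit_transform(snippets)
clf = RandomForestClassifier(n_estimators=200, random_state=0)
print(cross_val_score(clf, X, labels, cv=5).mean())
```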
Updated: 2024-05-24 12:56:18
标题: ChatGPT 代码检测:揭示代码来源的技术
摘要: 最近,大型语言模型(LLMs)在生成计算机代码方面取得了重大进展,模糊了人类编写的代码与人工智能(AI)生成的代码之间的界限。随着这些技术的快速发展,探索它们如何影响代码生成至关重要,尤其是考虑到在高等教育等领域被滥用的风险。本文通过使用先进的分类技术来区分由人类编写的代码和由ChatGPT(一种LLM)生成的代码,探讨了这个问题。我们采用一种新方法,将强大的嵌入特征(黑盒)与深度神经网络、随机森林和极限梯度提升等监督学习算法相结合,实现了高达98%的区分准确率。对于成功的组合,我们还检验了它们的模型校准,表明其中一些模型的校准非常好。此外,我们提出白盒特征和可解释的贝叶斯分类器,以阐明代码来源之间的关键差异,增强我们方法的可解释性和透明性。这两种方法都表现良好,但最多只能达到85-88%的准确率。我们还表明,未经训练的人类解决同样任务的表现并不比随机猜测更好。这项研究对于理解和减轻在代码生成中使用人工智能的潜在风险至关重要,特别是在高等教育、软件开发和竞技编程的背景下。
更新时间: 2024-05-24 12:56:18
领域: cs.LG,cs.AI
Randomized algorithms and PAC bounds for inverse reinforcement learning in continuous spaces
This work studies discrete-time discounted Markov decision processes with continuous state and action spaces and addresses the inverse problem of inferring a cost function from observed optimal behavior. We first consider the case in which we have access to the entire expert policy and characterize the set of solutions to the inverse problem by using occupation measures, linear duality, and complementary slackness conditions. To avoid trivial solutions and ill-posedness, we introduce a natural linear normalization constraint. This results in an infinite-dimensional linear feasibility problem, prompting a thorough analysis of its properties. Next, we use linear function approximators and adopt a randomized approach, namely the scenario approach and related probabilistic feasibility guarantees, to derive epsilon-optimal solutions for the inverse problem. We further discuss the sample complexity for a desired approximation accuracy. Finally, we deal with the more realistic case where we only have access to a finite set of expert demonstrations and a generative model and provide bounds on the error made when working with samples.
Updated: 2024-05-24 12:53:07
标题: 在连续空间中的逆强化学习的随机算法和PAC界限
摘要: 这项工作研究了具有连续状态和动作空间的离散时间折扣马尔可夫决策过程,并解决了从观察到的最优行为中推断成本函数的逆问题。我们首先考虑了我们可以访问整个专家策略的情况,并利用占用测度、线性对偶和互补松弛条件来表征逆问题的解集。为了避免平凡解和不适定性问题,我们引入了一个自然的线性归一化约束。这导致了一个无限维的线性可行性问题,引发了对其属性的深入分析。接下来,我们使用线性函数逼近器,并采用了一种随机化方法,即情景方法和相关的概率可行性保证,以获得逆问题的epsilon-最优解。我们进一步讨论了对所需近似精度的样本复杂性。最后,我们处理了更现实的情况,即我们只能访问有限的专家演示集和一个生成模型,并在使用样本时提供了误差边界。
更新时间: 2024-05-24 12:53:07
领域: math.OC,cs.LG
Human-in-the-loop Reinforcement Learning for Data Quality Monitoring in Particle Physics Experiments
Data Quality Monitoring (DQM) is a crucial task in large particle physics experiments, since detector malfunctioning can compromise the data. DQM is currently performed by human shifters, which is costly and results in limited accuracy. In this work, we provide a proof-of-concept for applying human-in-the-loop Reinforcement Learning (RL) to automate the DQM process while adapting to operating conditions that change over time. We implement a prototype based on the Proximal Policy Optimization (PPO) algorithm and validate it on a simplified synthetic dataset. We demonstrate how a multi-agent system can be trained for continuous automated monitoring during data collection, with human intervention actively requested only when relevant. We show that random, unbiased noise in human classification can be reduced, leading to an improved accuracy over the baseline. Additionally, we propose data augmentation techniques to deal with scarce data and to accelerate the learning process. Finally, we discuss further steps needed to implement the approach in the real world, including protocols for periodic control of the algorithm's outputs.
Updated: 2024-05-24 12:52:46
标题: 人在回路强化学习在粒子物理实验数据质量监测中的应用
摘要: 数据质量监测(DQM)是大型粒子物理实验中至关重要的任务,因为探测器故障可能会损害数据。目前,DQM由值班人员执行,成本高昂且准确性有限。在这项工作中,我们为将人在回路强化学习(RL)应用于自动化DQM过程并适应随时间变化的运行条件提供了概念验证。我们基于近端策略优化(PPO)算法实现了一个原型,并在简化的合成数据集上进行了验证。我们展示了如何训练一个多智能体系统,在数据收集期间进行持续的自动化监测,仅在相关时才主动请求人工干预。我们表明,人工分类中随机、无偏的噪声可以被降低,从而获得优于基线的准确性。此外,我们提出了数据增强技术来应对数据稀缺并加速学习过程。最后,我们讨论了在实际中实施该方法所需的进一步步骤,包括定期检查算法输出的协议。
更新时间: 2024-05-24 12:52:46
领域: hep-ex,cs.LG
Near to Mid-term Risks and Opportunities of Open-Source Generative AI
In the next few years, applications of Generative AI are expected to revolutionize a number of different areas, ranging from science & medicine to education. The potential for these seismic changes has triggered a lively debate about potential risks and resulted in calls for tighter regulation, in particular from some of the major tech companies who are leading in AI development. This regulation is likely to put at risk the budding field of open-source Generative AI. We argue for the responsible open sourcing of generative AI models in the near and medium term. To set the stage, we first introduce an AI openness taxonomy system and apply it to 40 current large language models. We then outline differential benefits and risks of open versus closed source AI and present potential risk mitigation, ranging from best practices to calls for technical and scientific contributions. We hope that this report will add a much needed missing voice to the current public discourse on near to mid-term AI safety and other societal impact.
Updated: 2024-05-24 12:51:29
标题: 近期至中期开源生成式人工智能的风险和机遇
摘要: 在未来几年,生成式人工智能的应用预计将彻底改变许多不同领域,从科学和医学到教育。这些重大变革的潜力引发了一场关于潜在风险的热烈讨论,并导致了对更严格监管的呼吁,特别是一些领先于人工智能开发的主要科技公司。这种监管可能会使开源生成式人工智能领域处于风险之中。我们主张在近中期对生成式人工智能模型进行负责任的开源。为了铺平道路,我们首先引入了一个人工智能开放性分类系统,并将其应用于40个当前的大型语言模型。然后,我们概述了开源与闭源人工智能的差异化利益和风险,并提出了潜在的风险缓解措施,从最佳实践到对技术和科学贡献的呼吁。我们希望这份报告能为当前关于近中期人工智能安全和其他社会影响的公共讨论增添一个急需的声音。
更新时间: 2024-05-24 12:51:29
领域: cs.LG
Learning to Discretize Denoising Diffusion ODEs
Diffusion Probabilistic Models (DPMs) are powerful generative models showing competitive performance in various domains, including image synthesis and 3D point cloud generation. However, sampling from pre-trained DPMs involves multiple neural function evaluations (NFE) to transform Gaussian noise samples into images, resulting in higher computational costs compared to single-step generative models such as GANs or VAEs. Therefore, a crucial problem is to reduce NFE while preserving generation quality. To this end, we propose LD3, a lightweight framework for learning time discretization while sampling from the diffusion ODE encapsulated by DPMs. LD3 can be combined with various diffusion ODE solvers and consistently improves performance without retraining resource-intensive neural networks. We demonstrate analytically and empirically that LD3 enhances sampling efficiency compared to distillation-based methods, without the extensive computational overhead. We evaluate our method with extensive experiments on 5 datasets, covering unconditional and conditional sampling in both pixel-space and latent-space DPMs. For example, in about 5 minutes of training on a single GPU, our method reduces the FID score from 6.63 to 2.68 on CIFAR10 (7 NFE), and in around 20 minutes, decreases the FID from 8.51 to 5.03 on class-conditional ImageNet-256 (5 NFE). LD3 complements distillation methods, offering a more efficient approach to sampling from pre-trained diffusion models.
Updated: 2024-05-24 12:51:23
标题: 学习离散化去噪扩散ODEs
摘要: 扩散概率模型(DPMs)是一种强大的生成模型,在图像合成和3D点云生成等多个领域表现出竞争力。然而,从预训练DPMs中采样需要多次神经函数评估(NFE)才能将高斯噪声样本转换为图像,与GANs或VAEs等单步生成模型相比计算成本更高。因此,一个关键问题是在保持生成质量的同时减少NFE。为此,我们提出了LD3,一种轻量级框架,用于在从DPMs所封装的扩散ODE中采样时学习时间离散化。LD3可以与各种扩散ODE求解器结合使用,并在无需重新训练资源密集型神经网络的情况下持续提升性能。我们通过分析和实验证明,与基于蒸馏的方法相比,LD3在不产生大量计算开销的情况下提高了采样效率。我们在5个数据集上进行了大量实验来评估我们的方法,涵盖了像素空间和潜在空间DPMs中的无条件和有条件采样。例如,在单个GPU上训练约5分钟,我们的方法将CIFAR10上的FID分数从6.63降至2.68(7 NFE);训练约20分钟,将类别条件ImageNet-256上的FID从8.51降至5.03(5 NFE)。LD3是对蒸馏方法的补充,为从预训练扩散模型中采样提供了一种更高效的途径。
更新时间: 2024-05-24 12:51:23
领域: cs.LG
HyperMask: Adaptive Hypernetwork-based Masks for Continual Learning
Artificial neural networks suffer from catastrophic forgetting when they are sequentially trained on multiple tasks. Many continual learning (CL) strategies are trying to overcome this problem. One of the most effective is the hypernetwork-based approach. The hypernetwork generates the weights of a target model based on the task's identity. The model's main limitation is that, in practice, the hypernetwork can produce completely different architectures for subsequent tasks. To solve such a problem, we use the lottery ticket hypothesis, which postulates the existence of sparse subnetworks, named winning tickets, that preserve the performance of a whole network. In the paper, we propose a method called HyperMask, which dynamically filters a target network depending on the CL task. The hypernetwork produces semi-binary masks to obtain dedicated target subnetworks. Moreover, due to the lottery ticket hypothesis, we can use a single network with weighted subnets. Depending on the task, the importance of some weights may be dynamically enhanced while others may be weakened. HyperMask achieves competitive results in several CL datasets and, in some scenarios, goes beyond the state-of-the-art scores, both with derived and unknown task identities.
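To make the mechanism concrete, here is a minimal PyTorch sketch, under assumed sizes, of a hypernetwork that maps a task embedding to a semi-binary sigmoid mask gating a shared target layer's weights; the actual HyperMask architecture is richer than this.

```python
# Minimal PyTorch sketch, with assumed sizes: a hypernetwork maps a task
# embedding to a semi-binary sigmoid mask that gates a shared layer's
# weights. The real HyperMask architecture is richer than this.
import torch
import torch.nn as nn

class MaskedLinear(nn.Module):
    def __init__(self, in_dim, out_dim, task_dim, sharpness=10.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_dim, in_dim) * 0.1)
        self.hypernet = nn.Sequential(
            nn.Linear(task_dim, 64), nn.ReLU(),
            nn.Linear(64, out_dim * in_dim),
        )
        self.sharpness = sharpness      # large values push the mask to {0, 1}

    def forward(self, x, task_emb):
        mask = torch.sigmoid(self.sharpness * self.hypernet(task_emb))
        return x @ (mask.view_as(self.weight) * self.weight).t()

layer = MaskedLinear(in_dim=32, out_dim=16, task_dim=8)
task_emb = torch.randn(8)               # one learned embedding per CL task
out = layer(torch.randn(4, 32), task_emb)
```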
Updated: 2024-05-24 12:49:30
标题: HyperMask:用于持续学习的自适应超网络掩码
摘要: 人工神经网络在被依次训练于多个任务时容易发生灾难性遗忘。许多持续学习(CL)策略试图克服这个问题,其中最有效的方法之一是基于超网络的方法:超网络根据任务身份生成目标模型的权重。该模型的主要限制在于,实践中超网络可能会为后续任务生成完全不同的架构。为了解决这个问题,我们使用了彩票假设,该假设假定存在被称为"中奖彩票"的稀疏子网络,能够保持整个网络的性能。在本文中,我们提出了一种称为HyperMask的方法,根据CL任务动态过滤目标网络。超网络生成半二值掩码以获得专用的目标子网络。此外,得益于彩票假设,我们可以使用带有加权子网络的单个网络:根据任务不同,一些权重的重要性可以被动态增强,而另一些则被削弱。HyperMask在多个CL数据集上取得了有竞争力的结果,并且在某些场景下,无论任务身份是给定的还是未知的,都超越了现有最先进水平。
更新时间: 2024-05-24 12:49:30
领域: cs.LG,cs.AI,68T07,I.2.6
Revisiting Counterfactual Regression through the Lens of Gromov-Wasserstein Information Bottleneck
As a promising individualized treatment effect (ITE) estimation method, counterfactual regression (CFR) maps individuals' covariates to a latent space and predicts their counterfactual outcomes. However, the selection bias between control and treatment groups often imbalances the two groups' latent distributions and negatively impacts this method's performance. In this study, we revisit counterfactual regression through the lens of information bottleneck and propose a novel learning paradigm called Gromov-Wasserstein information bottleneck (GWIB). In this paradigm, we learn CFR by maximizing the mutual information between covariates' latent representations and outcomes while penalizing the kernelized mutual information between the latent representations and the covariates. We demonstrate that the upper bound of the penalty term can be implemented as a new regularizer consisting of $i)$ the fused Gromov-Wasserstein distance between the latent representations of different groups and $ii)$ the gap between the transport cost generated by the model and the cross-group Gromov-Wasserstein distance between the latent representations and the covariates. GWIB effectively learns the CFR model through alternating optimization, suppressing selection bias while avoiding trivial latent distributions. Experiments on ITE estimation tasks show that GWIB consistently outperforms state-of-the-art CFR methods. To promote the research community, we release our project at https://github.com/peteryang1031/Causal-GWIB.
Updated: 2024-05-24 12:48:24
标题: 通过Gromov-Wasserstein信息瓶颈的视角重新审视反事实回归
摘要: 作为一种有前景的个性化治疗效果(ITE)估计方法,反事实回归(CFR)将个体的协变量映射到潜在空间,并预测他们的反事实结果。然而,控制组和治疗组之间的选择偏差经常导致这两组的潜在分布不平衡,并且对这种方法的性能产生负面影响。在这项研究中,我们通过信息瓶颈的视角重新审视反事实回归,并提出了一种称为Gromov-Wasserstein信息瓶颈(GWIB)的新颖学习范式。在这种范式中,我们通过最大化协变量的潜在表示和结果之间的互信息来学习CFR,同时惩罚潜在表示和协变量之间的核化互信息。我们证明了惩罚项的上界可以作为一个新的正则化器实现,包括$i)$ 不同组的潜在表示之间的融合Gromov-Wasserstein距离和$ii)$ 模型生成的传输成本与潜在表示和协变量之间的跨组Gromov-Wasserstein距离之间的差距。GWIB通过交替优化有效地学习CFR模型,抑制选择偏差同时避免平凡的潜在分布。对ITE估计任务的实验证明,GWIB始终优于最先进的CFR方法。为了促进研究社区,我们在https://github.com/peteryang1031/Causal-GWIB 上发布了我们的项目。
更新时间: 2024-05-24 12:48:24
领域: cs.LG,cs.AI,stat.ML
DS-Agent: Automated Data Science by Empowering Large Language Models with Case-Based Reasoning
In this work, we investigate the potential of large language models (LLMs) based agents to automate data science tasks, with the goal of comprehending task requirements, then building and training the best-fit machine learning models. Despite their widespread success, existing LLM agents are hindered by generating unreasonable experiment plans within this scenario. To this end, we present DS-Agent, a novel automatic framework that harnesses LLM agent and case-based reasoning (CBR). In the development stage, DS-Agent follows the CBR framework to structure an automatic iteration pipeline, which can flexibly capitalize on the expert knowledge from Kaggle, and facilitate consistent performance improvement through the feedback mechanism. Moreover, DS-Agent implements a low-resource deployment stage with a simplified CBR paradigm to adapt past successful solutions from the development stage for direct code generation, significantly reducing the demand on foundational capabilities of LLMs. Empirically, DS-Agent with GPT-4 achieves 100\% success rate in the development stage, while attaining 36\% improvement on average one pass rate across alternative LLMs in the deployment stage. In both stages, DS-Agent achieves the best rank in performance, costing \$1.60 and \$0.13 per run with GPT-4, respectively. Our data and code are open-sourced at https://github.com/guosyjlu/DS-Agent.
Updated: 2024-05-24 12:40:48
标题: DS-Agent:通过基于案例推理赋能大型语言模型实现自动化数据科学
摘要: 在这项工作中,我们研究了基于大型语言模型(LLM)的智能体自动化数据科学任务的潜力,目标是理解任务需求,进而构建和训练最合适的机器学习模型。尽管LLM智能体已取得广泛成功,但在这一场景下,现有LLM智能体仍受限于会生成不合理的实验计划。为此,我们提出了DS-Agent,一个结合LLM智能体与基于案例推理(CBR)的新型自动化框架。在开发阶段,DS-Agent遵循CBR框架构建自动迭代管线,能够灵活利用来自Kaggle的专家知识,并通过反馈机制实现持续的性能改进。此外,DS-Agent在低资源部署阶段采用简化的CBR范式,将开发阶段的成功解决方案改编用于直接代码生成,大大降低了对LLM基础能力的要求。实验表明,使用GPT-4的DS-Agent在开发阶段实现了100%的成功率;在部署阶段,在其他多种LLM上平均一次通过率提升了36%。在两个阶段中,DS-Agent均取得了最佳性能排名,使用GPT-4时每次运行成本分别为\$1.60和\$0.13。我们的数据和代码在https://github.com/guosyjlu/DS-Agent 上开源。
更新时间: 2024-05-24 12:40:48
领域: cs.LG
Hierarchical Loss And Geometric Mask Refinement For Multilabel Ribs Segmentation
Automatic ribs segmentation and numeration can increase computed tomography assessment speed and reduce radiologists' mistakes. We introduce a model for multilabel ribs segmentation with a hierarchical loss function, which enables improved multilabel segmentation quality. We also propose a postprocessing technique to further increase labeling quality. Our model achieved a new state-of-the-art 98.2% label accuracy on the public RibSeg v2 dataset, surpassing the previous result by 6.7%.
Updated: 2024-05-24 12:39:21
标题: 层次损失和几何掩膜细化用于多标签肋骨分割
摘要: 自动肋骨分割和编号可以提高计算机断层扫描评估速度,并减少放射科医生的错误。我们引入了一个多标签肋骨分割模型,采用分层损失函数,可以改善多标签分割质量。此外,我们提出了后处理技术,进一步提高标记质量。我们的模型在公共RibSeg v2数据集上实现了新的98.2%标签准确率,超过先前结果6.7%。
更新时间: 2024-05-24 12:39:21
领域: eess.IV,cs.CV,cs.LG
Identifiability Matters: Revealing the Hidden Recoverable Condition in Unbiased Learning to Rank
Unbiased Learning to Rank (ULTR) aims to train unbiased ranking models from biased click logs, by explicitly modeling a generation process for user behavior and fitting click data based on the examination hypothesis. Previous research found empirically that the true latent relevance is mostly recoverable through click fitting. However, we demonstrate that this is not always achievable, resulting in a significant reduction in ranking performance. This research investigates, from first principles, the conditions under which relevance can be recovered from click data. We initially characterize a ranking model as identifiable if it can recover the true relevance up to a scaling transformation, a criterion sufficient for the pairwise ranking objective. Subsequently, we investigate an equivalent condition for identifiability, articulated as a graph connectivity test problem: the recovery of relevance is feasible if and only if the identifiability graph (IG), derived from the underlying structure of the dataset, is connected. The presence of a disconnected IG may lead to degenerate cases and suboptimal ranking performance. To tackle this challenge, we introduce two methods, namely node intervention and node merging, designed to modify the dataset and restore the connectivity of the IG. Empirical results derived from a simulated dataset and two real-world LTR benchmark datasets not only validate our proposed theory but also demonstrate the effectiveness of our methods in alleviating data bias when the relevance model is unidentifiable.
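A hedged sketch of the connectivity test: build a graph over observation contexts (here, ranking positions), connect two nodes whenever some document is observed under both, and check connectivity. The paper's precise definition of the identifiability graph may differ; this only illustrates the test.

```python
# Hedged sketch of the connectivity test: nodes are ranking positions,
# and two positions are linked when some document appears at both. The
# paper's exact identifiability-graph construction may differ.
import networkx as nx

click_log = [("d1", 1), ("d1", 2), ("d2", 2), ("d2", 3), ("d3", 4)]

ig = nx.Graph()
positions_of = {}
for doc, pos in click_log:
    ig.add_node(pos)
    positions_of.setdefault(doc, set()).add(pos)
for positions in positions_of.values():
    ordered = sorted(positions)
    ig.add_edges_from(zip(ordered, ordered[1:]))  # a chain suffices here

# Position 4 never co-occurs with another position, so the IG splits:
print(nx.is_connected(ig))   # False -> relevance is not identifiable here
```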
Updated: 2024-05-24 12:29:55
标题: 可辨识性很重要:揭示在无偏学习排序中隐藏的可恢复条件
摘要: 无偏学习排序(ULTR)旨在从有偏的点击日志中训练无偏排序模型,方法是显式建模用户行为的生成过程,并基于检验假设拟合点击数据。先前的研究从经验上发现,通过点击拟合,真正的潜在相关性大多是可恢复的。然而,我们证明这并非总能实现,并会导致排序性能显著下降。本研究从第一性原理出发,探讨了在何种条件下可以从点击数据中恢复相关性。我们首先将排序模型定义为可识别的,如果它能够恢复真实相关性直至一个缩放变换,这对成对排序目标而言是充分的标准。随后,我们研究了可识别性的一个等价条件,并将其表述为图连通性检验问题:当且仅当由数据集底层结构导出的可识别性图(IG)是连通的时,相关性的恢复才是可行的。不连通的IG可能导致退化情形和次优的排序性能。为了解决这一挑战,我们引入了两种方法,即节点干预和节点合并,旨在修改数据集并恢复IG的连通性。在一个模拟数据集和两个真实LTR基准数据集上的实证结果不仅验证了我们提出的理论,而且证明了我们的方法在相关性模型不可识别时缓解数据偏差的有效性。
更新时间: 2024-05-24 12:29:55
领域: cs.IR,cs.AI,cs.LG
Towards Natural Machine Unlearning
Machine unlearning (MU) aims to eliminate information that has been learned from specific training data, namely forgetting data, from a pre-trained model. Currently, the mainstream of existing MU methods involves modifying the forgetting data with incorrect labels and subsequently fine-tuning the model. While learning such incorrect information can indeed remove knowledge, the process is quite unnatural as the unlearning process undesirably reinforces the incorrect information and leads to over-forgetting. Towards more \textit{natural} machine unlearning, we inject correct information from the remaining data to the forgetting samples when changing their labels. Through pairing these adjusted samples with their labels, the model will tend to use the injected correct information and naturally suppress the information meant to be forgotten. Albeit straightforward, such a first step towards natural machine unlearning can significantly outperform current state-of-the-art approaches. In particular, our method substantially reduces the over-forgetting and leads to strong robustness to hyperparameters, making it a promising candidate for practical machine unlearning.
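A minimal sketch of the label-injection idea, using scikit-learn stand-ins rather than the paper's setup: a reference model fit on the remaining data supplies the "correct information" labels for the forgetting samples, on which the original model is then fine-tuned.

```python
# Hedged sklearn sketch of the label-injection step: a reference model
# trained on the remaining data predicts labels for the forgetting
# samples, and the original model is fine-tuned on those pairs. Toy data.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))
y = (X[:, 0] > 0).astype(int)
forget, retain = np.arange(30), np.arange(30, 300)

model = SGDClassifier(loss="log_loss", random_state=0).fit(X, y)

ref = SGDClassifier(loss="log_loss", random_state=0).fit(X[retain], y[retain])
injected = ref.predict(X[forget])     # "correct information" from remaining data

for _ in range(10):                   # fine-tune instead of forcing wrong labels
    model.partial_fit(X[forget], injected)
```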
Updated: 2024-05-24 12:23:38
标题: 朝向自然机器遗忘
摘要: 机器遗忘(MU)旨在从预训练模型中消除从特定训练数据(即遗忘数据)中学到的信息。目前,现有MU方法的主流做法是用错误标签修改遗忘数据,然后对模型进行微调。虽然学习这种错误信息确实可以移除知识,但这个过程相当不自然,因为遗忘过程会不合意地强化错误信息并导致过度遗忘。为了实现更加\textit{自然}的机器遗忘,我们在改变遗忘样本的标签时,将来自剩余数据的正确信息注入这些样本。通过将这些调整后的样本与其标签配对,模型将倾向于利用注入的正确信息,并自然地抑制应被遗忘的信息。尽管方法很直接,但这一迈向自然机器遗忘的第一步已可显著优于当前最先进的方法。特别是,我们的方法大大减少了过度遗忘,并对超参数具有很强的鲁棒性,使其成为实用机器遗忘的有前途的候选方法。
更新时间: 2024-05-24 12:23:38
领域: cs.LG
Conditional Normalizing Flows for Active Learning of Coarse-Grained Molecular Representations
Efficient sampling of the Boltzmann distribution of molecular systems is a long-standing challenge. Recently, instead of generating long molecular dynamics simulations, generative machine learning methods such as normalizing flows have been used to learn the Boltzmann distribution directly, without samples. However, this approach is susceptible to mode collapse and thus often does not explore the full configurational space. In this work, we address this challenge by separating the problem into two levels, the fine-grained and coarse-grained degrees of freedom. A normalizing flow conditioned on the coarse-grained space yields a probabilistic connection between the two levels. To explore the configurational space, we employ coarse-grained simulations with active learning which allows us to update the flow and make all-atom potential energy evaluations only when necessary. Using alanine dipeptide as an example, we show that our methods obtain a speedup to molecular dynamics simulations of approximately 15.9 to 216.2 compared to the speedup of 4.5 of the current state-of-the-art machine learning approach.
Updated: 2024-05-24 12:13:33
标题: 条件归一化流用于粗粒化分子表示的主动学习
摘要: 高效采样分子系统的玻尔兹曼分布是一个长期存在的挑战。最近,人们不再依赖生成长时间的分子动力学模拟,而是使用归一化流等生成式机器学习方法,在无需样本的情况下直接学习玻尔兹曼分布。然而,这种方法容易出现模式坍塌,因此通常无法探索完整的构象空间。在本研究中,我们通过将问题分解为细粒度和粗粒度自由度两个层面来应对这一挑战。以粗粒度空间为条件的归一化流在两个层面之间建立了概率联系。为了探索构象空间,我们将粗粒度模拟与主动学习相结合,从而能够更新流,并仅在必要时进行全原子势能评估。以丙氨酸二肽为例,我们展示了我们的方法相对于分子动力学模拟获得约15.9至216.2倍的加速,而当前最先进的机器学习方法的加速仅为4.5倍。
更新时间: 2024-05-24 12:13:33
领域: cs.LG,cs.AI,physics.chem-ph
Causal de Finetti: On the Identification of Invariant Causal Structure in Exchangeable Data
Constraint-based causal discovery methods leverage conditional independence tests to infer causal relationships in a wide variety of applications. Just as the majority of machine learning methods, existing work focuses on studying $\textit{independent and identically distributed}$ data. However, it is known that even with infinite i.i.d.$\ $ data, constraint-based methods can only identify causal structures up to broad Markov equivalence classes, posing a fundamental limitation for causal discovery. In this work, we observe that exchangeable data contains richer conditional independence structure than i.i.d.$\ $ data, and show how the richer structure can be leveraged for causal discovery. We first present causal de Finetti theorems, which state that exchangeable distributions with certain non-trivial conditional independences can always be represented as $\textit{independent causal mechanism (ICM)}$ generative processes. We then present our main identifiability theorem, which shows that given data from an ICM generative process, its unique causal structure can be identified through performing conditional independence tests. We finally develop a causal discovery algorithm and demonstrate its applicability to inferring causal relationships from multi-environment data. Our code and models are publicly available at: https://github.com/syguo96/Causal-de-Finetti
Updated: 2024-05-24 12:12:57
标题: Causal de Finetti:关于可交换数据中不变因果结构的识别
摘要: 基于约束的因果发现方法利用条件独立性测试推断各种应用中的因果关系。与大多数机器学习方法一样,现有研究侧重于研究独立同分布的数据。然而,众所周知,即使有无限的独立同分布数据,基于约束的方法只能识别出广义马尔可夫等价类中的因果结构,这对因果发现构成了根本限制。在这项工作中,我们观察到可交换数据包含比独立同分布数据更丰富的条件独立性结构,并展示如何利用这种更丰富的结构进行因果发现。我们首先提出了因果de Finetti定理,该定理表明具有一定非平凡条件独立性的可交换分布总是可以表示为独立因果机制(ICM)生成过程。然后,我们提出了我们的主要可识别性定理,该定理表明,给定来自ICM生成过程的数据,可以通过进行条件独立性测试来识别其唯一的因果结构。最后,我们开发了一个因果发现算法,并展示了其适用性,可以从多环境数据中推断因果关系。我们的代码和模型可以在以下网址公开获取:https://github.com/syguo96/Causal-de-Finetti
更新时间: 2024-05-24 12:12:57
领域: stat.ML,cs.LG,math.ST,stat.ME,stat.TH
Out of Many, One: Designing and Scaffolding Proteins at the Scale of the Structural Universe with Genie 2
Protein diffusion models have emerged as a promising approach for protein design. One such pioneering model is Genie, a method that asymmetrically represents protein structures during the forward and backward processes, using simple Gaussian noising for the former and expressive SE(3)-equivariant attention for the latter. In this work we introduce Genie 2, extending Genie to capture a larger and more diverse protein structure space through architectural innovations and massive data augmentation. Genie 2 adds motif scaffolding capabilities via a novel multi-motif framework that designs co-occurring motifs with unspecified inter-motif positions and orientations. This makes possible complex protein designs that engage multiple interaction partners and perform multiple functions. On both unconditional and conditional generation, Genie 2 achieves state-of-the-art performance, outperforming all known methods on key design metrics including designability, diversity, and novelty. Genie 2 also solves more motif scaffolding problems than other methods and does so with more unique and varied solutions. Taken together, these advances set a new standard for structure-based protein design. Genie 2 inference and training code, as well as model weights, are freely available at: https://github.com/aqlaboratory/genie2.
Updated: 2024-05-24 12:11:41
标题: 众多中的一个:使用Genie 2在结构宇宙的尺度上设计和搭建蛋白质
摘要: 蛋白质扩散模型已成为蛋白质设计的一种有前途的方法。其中一个开创性的模型是Genie,该方法在前向和后向过程中以非对称方式表示蛋白质结构:前者使用简单的高斯加噪,后者使用富有表现力的SE(3)-等变注意力。在这项工作中,我们介绍了Genie 2,通过架构创新和大规模数据增强扩展了Genie,以覆盖更大、更多样的蛋白质结构空间。Genie 2通过一种新颖的多基序框架增加了基序支架(motif scaffolding)能力,该框架可设计基序间位置和取向均未指定的共现基序,从而使能够结合多个相互作用伙伴并执行多种功能的复杂蛋白质设计成为可能。在无条件和有条件生成方面,Genie 2均实现了最先进的性能,在可设计性、多样性和新颖性等关键设计指标上胜过所有已知方法。Genie 2还解决了比其他方法更多的基序支架问题,并给出了更独特、更多样化的解决方案。总的来说,这些进展为基于结构的蛋白质设计设定了新的标准。Genie 2的推理和训练代码以及模型权重可在以下网址免费获取:https://github.com/aqlaboratory/genie2。
更新时间: 2024-05-24 12:11:41
领域: q-bio.BM,cs.LG
RFold: RNA Secondary Structure Prediction with Decoupled Optimization
The secondary structure of ribonucleic acid (RNA) is more stable and accessible in the cell than its tertiary structure, making it essential for functional prediction. Although deep learning has shown promising results in this field, current methods suffer from poor generalization and high complexity. In this work, we reformulate the RNA secondary structure prediction as a K-Rook problem, thereby simplifying the prediction process into probabilistic matching within a finite solution space. Building on this innovative perspective, we introduce RFold, a simple yet effective method that learns to predict the most matching K-Rook solution from the given sequence. RFold employs a bi-dimensional optimization strategy that decomposes the probabilistic matching problem into row-wise and column-wise components to reduce the matching complexity, simplifying the solving process while guaranteeing the validity of the output. Extensive experiments demonstrate that RFold achieves competitive performance and about eight times faster inference efficiency than the state-of-the-art approaches. The code and Colab demo are available in \href{http://github.com/A4Bio/RFold}{http://github.com/A4Bio/RFold}.
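The decoupled optimization admits a compact numpy illustration: pairing scores are normalized separately along rows and columns, and their element-wise product keeps only pairs that dominate in both directions, approximating a one-partner-per-base (K-Rook-like) assignment. The threshold and sizes are assumptions.

```python
# Hedged numpy sketch of the decoupled row/column matching: normalize
# pairing scores along rows and columns separately; their product is
# large only for pairs dominant in both directions. Threshold is assumed.
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
scores = rng.normal(size=(6, 6))
scores = (scores + scores.T) / 2        # base-pairing scores are symmetric

pairing = softmax(scores, axis=1) * softmax(scores, axis=0)
contact = (pairing > 0.25).astype(int)  # approximately one partner per base
print(contact)
```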
Updated: 2024-05-24 12:05:40
标题: RFold:使用解耦优化进行RNA二级结构预测
摘要: 核糖核酸(RNA)的二级结构在细胞中比其三级结构更稳定且更易访问,这使得对其功能进行预测变得至关重要。尽管深度学习在这一领域表现出有希望的结果,但当前的方法存在泛化能力差和复杂性高的问题。在这项工作中,我们将RNA二级结构预测重新构建为一个K-Rook问题,从而将预测过程简化为有限解空间内的概率匹配。基于这一创新视角,我们引入了RFold,这是一种简单而有效的方法,它学会从给定的序列中预测最匹配的K-Rook解决方案。RFold采用双向优化策略,将概率匹配问题分解为行向和列向组件,以减少匹配复杂性,简化解决过程同时确保输出的有效性。大量实验证明,RFold实现了竞争性能,并且比最先进的方法提高了约八倍的推断效率。代码和Colab演示可在\href{http://github.com/A4Bio/RFold}{http://github.com/A4Bio/RFold}中找到。
更新时间: 2024-05-24 12:05:40
领域: q-bio.BM,cs.AI,cs.LG
Learning Beyond Pattern Matching? Assaying Mathematical Understanding in LLMs
We are beginning to see progress in language model assisted scientific discovery. Motivated by the use of LLMs as a general scientific assistant, this paper assesses the domain knowledge of LLMs through its understanding of different mathematical skills required to solve problems. In particular, we look at not just what the pre-trained model already knows, but how it learned to learn from information during in-context learning or instruction-tuning through exploiting the complex knowledge structure within mathematics. Motivated by the Neural Tangent Kernel (NTK), we propose \textit{NTKEval} to assess changes in LLM's probability distribution via training on different kinds of math data. Our systematic analysis finds evidence of domain understanding during in-context learning. By contrast, certain instruction-tuning leads to similar performance changes irrespective of training on different data, suggesting a lack of domain understanding across different skills.
Updated: 2024-05-24 12:04:54
标题: 学习超越模式匹配?在LLMs中评估数学理解
摘要: 我们开始看到语言模型辅助科学发现取得进展。受将LLMs用作通用科学助手的启发,本文通过考察LLMs对解决问题所需的不同数学技能的理解来评估其领域知识。特别地,我们不仅关注预训练模型已经知道什么,还关注它如何通过利用数学内部复杂的知识结构,在上下文学习或指令微调中学会学习。受神经正切核(NTK)的启发,我们提出\textit{NTKEval},通过在不同类型的数学数据上训练来评估LLM概率分布的变化。我们的系统分析发现了上下文学习过程中存在领域理解的证据。相比之下,某些指令微调无论在何种数据上训练都会导致相似的性能变化,这表明模型缺乏跨不同技能的领域理解。
更新时间: 2024-05-24 12:04:54
领域: cs.AI,cs.CL,cs.LG
Gradient-Free Training of Recurrent Neural Networks using Random Perturbations
Recurrent neural networks (RNNs) hold immense potential for computations due to their Turing completeness and sequential processing capabilities, yet existing methods for their training encounter efficiency challenges. Backpropagation through time (BPTT), the prevailing method, extends the backpropagation (BP) algorithm by unrolling the RNN over time. However, this approach suffers from significant drawbacks, including the need to interleave forward and backward phases and store exact gradient information. Furthermore, BPTT has been shown to struggle with propagating gradient information for long sequences, leading to vanishing gradients. An alternative strategy to using gradient-based methods like BPTT involves stochastically approximating gradients through perturbation-based methods. This learning approach is exceptionally simple, necessitating only forward passes in the network and a global reinforcement signal as feedback. Despite its simplicity, the random nature of its updates typically leads to inefficient optimization, limiting its effectiveness in training neural networks. In this study, we present a new approach to perturbation-based learning in RNNs whose performance is competitive with BPTT, while maintaining the inherent advantages over gradient-based learning. To this end, we extend the recently introduced activity-based node perturbation (ANP) method to operate in the time domain, leading to more efficient learning and generalization. Subsequently, we conduct a range of experiments to validate our approach. Our results show similar performance, convergence time and scalability when compared to BPTT, strongly outperforming standard node perturbation and weight perturbation methods. These findings suggest that perturbation-based learning methods offer a versatile alternative to gradient-based methods for training RNNs which can be ideally suited for neuromorphic applications
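For intuition, here is a toy weight-perturbation learning rule of the kind the abstract contrasts with BPTT: two forward passes (clean and perturbed) and a global scalar loss difference drive the update, with no backward pass. The ANP method extends this idea to activities unrolled over time; this sketch uses a single linear layer for brevity.

```python
# Toy weight-perturbation rule: a clean and a perturbed forward pass,
# and the scalar loss difference acts as the global reinforcement
# signal; no backward pass is ever taken. Single linear layer for brevity.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(2, 4))              # student weights
x_data = rng.normal(size=(64, 4))
y_data = x_data @ rng.normal(size=(4, 2))           # linear teacher targets

def loss(W):
    return np.mean((x_data @ W.T - y_data) ** 2)

lr, sigma = 0.02, 1e-3
for _ in range(3000):
    noise = rng.normal(scale=sigma, size=W.shape)
    delta = loss(W + noise) - loss(W)               # global scalar feedback
    W -= lr * (delta / sigma ** 2) * noise          # stochastic gradient estimate
print(loss(W))   # should end far below its initial value
```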
Updated: 2024-05-24 12:00:40
标题: 使用随机扰动对循环神经网络进行无梯度训练
摘要: 循环神经网络(RNN)由于其图灵完备性和顺序处理能力而具有巨大的计算潜力,然而现有的训练方法遇到了效率挑战。时序反向传播(BPTT)是主流方法,通过将RNN在时间上展开来扩展反向传播(BP)算法。然而,这种方法存在显著缺点,包括需要交替进行前向和后向阶段,并存储精确的梯度信息。此外,BPTT已经被证明在传播长序列的梯度信息方面存在困难,导致梯度消失。使用像BPTT这样的基于梯度的方法的替代策略涉及通过扰动方法随机逼近梯度。这种学习方法异常简单,只需要在网络中进行前向传递和一个全局的强化信号作为反馈。尽管它的简单性,其更新的随机性通常导致优化效率低,限制了其在训练神经网络中的有效性。在本研究中,我们提出了一种新的基于扰动的RNN学习方法,其性能与BPTT竞争力相当,同时保持了基于梯度学习的内在优势。为此,我们将最近引入的基于活动的节点扰动(ANP)方法扩展到时间域中,从而实现更高效的学习和泛化。随后,我们进行了一系列实验来验证我们的方法。我们的结果显示,与BPTT相比,性能、收敛时间和可扩展性类似,明显优于标准的节点扰动和权重扰动方法。这些发现表明,基于扰动的学习方法为训练RNN提供了一种多功能的替代选择,可以理想地适用于神经形态学应用。
更新时间: 2024-05-24 12:00:40
领域: cs.LG
Sparse Spectral Training and Inference on Euclidean and Hyperbolic Neural Networks
The growing computational demands posed by increasingly number of neural network's parameters necessitate low-memory-consumption training approaches. Previous memory reduction techniques, such as Low-Rank Adaptation (LoRA) and ReLoRA, suffer from the limitation of low rank and saddle point issues, particularly during intensive tasks like pre-training. In this paper, we propose Sparse Spectral Training (SST), an advanced training methodology that updates all singular values and selectively updates singular vectors of network weights, thereby optimizing resource usage while closely approximating full-rank training. SST refines the training process by employing a targeted updating strategy for singular vectors, which is determined by a multinomial sampling method weighted by the significance of the singular values, ensuring both high performance and memory reduction. Through comprehensive testing on both Euclidean and hyperbolic neural networks across various tasks, including natural language generation, machine translation, node classification and link prediction, SST demonstrates its capability to outperform existing memory reduction training methods and is comparable with full-rank training in some cases. On OPT-125M, with rank equating to 8.3% of embedding dimension, SST reduces the perplexity gap to full-rank training by 67.6%, demonstrating a significant reduction of the performance loss with prevalent low-rank methods. This approach offers a strong alternative to traditional training techniques, paving the way for more efficient and scalable neural network training solutions.
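A hedged numpy sketch of the core update pattern: all singular values of a weight matrix are updated, while only a subset of singular-vector pairs, sampled with probability proportional to singular-value magnitude, is refreshed. The gradient here is a random placeholder and the update equations are simplified assumptions, not the paper's exact algorithm.

```python
# Hedged numpy sketch of the SST update pattern: all singular values are
# updated, but only a sampled subset of singular-vector pairs (sampled
# proportionally to singular-value magnitude) is refreshed.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(32, 16))
U, S, Vt = np.linalg.svd(W, full_matrices=False)

grad_W = rng.normal(size=W.shape)        # stand-in for a real loss gradient
lr, k = 0.01, 4                          # refresh k vector pairs per step

S = S - lr * np.diag(U.T @ grad_W @ Vt.T)      # every singular value updated
idx = rng.choice(len(S), size=k, replace=False, p=S / S.sum())
U[:, idx] -= lr * (grad_W @ Vt.T)[:, idx]      # selective vector updates
Vt[idx, :] -= lr * (U.T @ grad_W)[idx, :]

W = (U * S) @ Vt                               # reassembled weights
```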
Updated: 2024-05-24 11:59:41
标题: 欧几里得与双曲神经网络上的稀疏谱训练与推断
摘要: 神经网络参数数量的不断增长带来了日益增长的计算需求,这需要低内存消耗的训练方法。先前的内存削减技术,如低秩适应(LoRA)和ReLoRA,受限于低秩和鞍点问题,在预训练等高强度任务中尤为明显。在本文中,我们提出了稀疏谱训练(SST),一种先进的训练方法,它更新网络权重的所有奇异值并选择性地更新奇异向量,从而在紧密逼近全秩训练的同时优化资源使用。SST通过针对奇异向量的有目标更新策略改进训练过程,该策略由按奇异值重要性加权的多项式采样方法决定,从而兼顾高性能与内存削减。通过在欧几里得和双曲神经网络上对自然语言生成、机器翻译、节点分类和链接预测等多种任务的全面测试,SST展示了其优于现有内存削减训练方法的能力,并在某些情况下可与全秩训练相媲美。在OPT-125M上,当秩为嵌入维度的8.3%时,SST将与全秩训练之间的困惑度差距缩小了67.6%,表明相较于流行的低秩方法,性能损失显著减少。这种方法为传统训练技术提供了强有力的替代方案,为更高效、可扩展的神经网络训练解决方案铺平了道路。
更新时间: 2024-05-24 11:59:41
领域: cs.LG
Fundamental limits of weak learnability in high-dimensional multi-index models
Multi-index models -- functions which only depend on the covariates through a non-linear transformation of their projection on a subspace -- are a useful benchmark for investigating feature learning with neural networks. This paper examines the theoretical boundaries of learnability in this hypothesis class, focusing particularly on the minimum sample complexity required for weakly recovering their low-dimensional structure with first-order iterative algorithms, in the high-dimensional regime where the number of samples is $n=\alpha d$ is proportional to the covariate dimension $d$. Our findings unfold in three parts: (i) first, we identify under which conditions a \textit{trivial subspace} can be learned with a single step of a first-order algorithm for any $\alpha\!>\!0$; (ii) second, in the case where the trivial subspace is empty, we provide necessary and sufficient conditions for the existence of an {\it easy subspace} consisting of directions that can be learned only above a certain sample complexity $\alpha\!>\!\alpha_c$. The critical threshold $\alpha_{c}$ marks the presence of a computational phase transition, in the sense that no efficient iterative algorithm can succeed for $\alpha\!<\!\alpha_c$. In a limited but interesting set of really hard directions -- akin to the parity problem -- $\alpha_c$ is found to diverge. Finally, (iii) we demonstrate that interactions between different directions can result in an intricate hierarchical learning phenomenon, where some directions can be learned sequentially when coupled to easier ones. Our analytical approach is built on the optimality of approximate message-passing algorithms among first-order iterative methods, delineating the fundamental learnability limit across a broad spectrum of algorithms, including neural networks trained with gradient descent.
Updated: 2024-05-24 11:59:02
标题: 高维多指数模型中弱可学习性的基本限制
摘要: 多指数模型——仅通过协变量在某个子空间上投影的非线性变换依赖于协变量的函数——是研究神经网络特征学习的一个有用基准。本文研究了在该假设类中可学习性的理论边界,特别关注在样本数量$n=\alpha d$与协变量维度$d$成正比的高维情形下,用一阶迭代算法弱恢复其低维结构所需的最小样本复杂度。我们的发现分为三部分:(i)首先,我们确定了在何种条件下,对于任意$\alpha\!>\!0$,一阶算法单步即可学习\textit{平凡子空间};(ii)其次,在平凡子空间为空的情况下,我们给出了存在{\it 简单子空间}的充要条件,该子空间由只有在样本复杂度超过阈值$\alpha\!>\!\alpha_c$时才能学习的方向组成。临界阈值$\alpha_{c}$标志着计算相变的存在,即对于$\alpha\!<\!\alpha_c$,任何高效的迭代算法都无法成功。在一类数量有限但颇有意思的真正困难的方向上——类似于奇偶问题——$\alpha_c$会发散。最后,(iii)我们证明不同方向之间的相互作用可能导致复杂的分层学习现象:当与更容易的方向耦合时,一些方向可以被顺序学习。我们的分析方法建立在近似消息传递算法在一阶迭代方法中的最优性之上,刻画了包括用梯度下降训练的神经网络在内的广泛算法类的基本可学习性极限。
更新时间: 2024-05-24 11:59:02
领域: cs.LG,cond-mat.dis-nn,cs.CC
Editable Concept Bottleneck Models
Concept Bottleneck Models (CBMs) have garnered much attention for their ability to elucidate the prediction process through a human-understandable concept layer. However, most previous studies focused on cases where the data, including concepts, are clean. In many scenarios, we always need to remove/insert some training data or new concepts from trained CBMs due to different reasons, such as privacy concerns, data mislabelling, spurious concepts, and concept annotation errors. Thus, the challenge of deriving efficient editable CBMs without retraining from scratch persists, particularly in large-scale applications. To address these challenges, we propose Editable Concept Bottleneck Models (ECBMs). Specifically, ECBMs support three different levels of data removal: concept-label-level, concept-level, and data-level. ECBMs enjoy mathematically rigorous closed-form approximations derived from influence functions that obviate the need for re-training. Experimental results demonstrate the efficiency and effectiveness of our ECBMs, affirming their adaptability within the realm of CBMs.
Updated: 2024-05-24 11:55:46
标题: 可编辑的概念瓶颈模型
摘要: 概念瓶颈模型(CBMs)因其能够通过人类可理解的概念层阐明预测过程而受到广泛关注。然而,大多数先前的研究集中于数据(包括概念)干净的情形。在许多场景下,出于隐私问题、数据标注错误、虚假概念和概念注释错误等不同原因,我们常常需要从已训练的CBMs中删除或插入一些训练数据或新概念。因此,如何在不从头重新训练的情况下得到高效可编辑的CBMs仍是一个挑战,在大规模应用中尤其如此。为了应对这些挑战,我们提出了可编辑概念瓶颈模型(ECBMs)。具体而言,ECBMs支持三种不同级别的数据删除:概念-标签级、概念级和数据级。ECBMs享有由影响函数推导出的数学上严谨的闭式近似,从而无需重新训练。实验结果证明了ECBMs的效率和有效性,肯定了它们在CBMs领域内的适应性。
更新时间: 2024-05-24 11:55:46
领域: cs.LG,cs.AI,cs.CV
Unlearning during Learning: An Efficient Federated Machine Unlearning Method
In recent years, Federated Learning (FL) has garnered significant attention as a distributed machine learning paradigm. To facilitate the implementation of the right to be forgotten, the concept of federated machine unlearning (FMU) has also emerged. However, current FMU approaches often involve additional time-consuming steps and may not offer comprehensive unlearning capabilities, which renders them less practical in real FL scenarios. In this paper, we introduce FedAU, an innovative and efficient FMU framework aimed at overcoming these limitations. Specifically, FedAU incorporates a lightweight auxiliary unlearning module into the learning process and employs a straightforward linear operation to facilitate unlearning. This approach eliminates the requirement for extra time-consuming steps, rendering it well-suited for FL. Furthermore, FedAU exhibits remarkable versatility. It not only enables multiple clients to carry out unlearning tasks concurrently but also supports unlearning at various levels of granularity, including individual data samples, specific classes, and even at the client level. We conducted extensive experiments on MNIST, CIFAR10, and CIFAR100 datasets to evaluate the performance of FedAU. The results demonstrate that FedAU effectively achieves the desired unlearning effect while maintaining model accuracy.
Updated: 2024-05-24 11:53:13
标题: 学习过程中的遗忘:一种高效的联邦机器遗忘方法
摘要: 最近,联邦学习(FL)作为一种分布式机器学习范式,引起了广泛关注。为了促进被遗忘权的实施,联邦机器遗忘(FMU)的概念也应运而生。然而,目前的FMU方法往往涉及额外耗时步骤,可能无法提供全面的遗忘能力,这使它们在真实的FL场景中不太实用。在本文中,我们介绍了FedAU,一个旨在克服这些限制的创新而高效的FMU框架。具体来说,FedAU将一个轻量级的辅助遗忘模块整合到学习过程中,并采用直观的线性操作来促进遗忘。这种方法消除了额外耗时步骤的要求,使其非常适合FL。此外,FedAU表现出卓越的多功能性。它不仅使多个客户端能够同时进行遗忘任务,还支持在不同粒度上进行遗忘,包括单个数据样本、特定类别,甚至在客户端级别进行遗忘。我们在MNIST、CIFAR10和CIFAR100数据集上进行了广泛的实验,以评估FedAU的性能。结果表明,FedAU有效地实现了所期望的遗忘效果,同时保持了模型的准确性。
更新时间: 2024-05-24 11:53:13
领域: cs.LG,cs.DC
ContourCraft: Learning to Resolve Intersections in Neural Multi-Garment Simulations
Learning-based approaches to cloth simulation have started to show their potential in recent years. However, handling collisions and intersections in neural simulations remains a largely unsolved problem. In this work, we present ContourCraft, a learning-based solution for handling intersections in neural cloth simulations. Unlike conventional approaches that critically rely on intersection-free inputs, ContourCraft robustly recovers from intersections introduced through missed collisions, self-penetrating bodies, or errors in manually designed multi-layer outfits. The technical core of ContourCraft is a novel intersection contour loss that penalizes interpenetrations and encourages rapid resolution thereof. We integrate our intersection loss with a collision-avoiding repulsion objective into a neural cloth simulation method based on graph neural networks (GNNs). We demonstrate our method's ability across a challenging set of diverse multi-layer outfits under dynamic human motions. Our extensive analysis indicates that ContourCraft significantly improves collision handling for learned simulation and produces visually compelling results.
Updated: 2024-05-24 11:51:32
标题: ContourCraft:学习解决神经多服装模拟中的相交问题
摘要: 基于学习的布料模拟方法近年来开始展现出潜力。然而,在神经模拟中处理碰撞和相交仍是一个在很大程度上未解决的问题。在这项工作中,我们提出了ContourCraft,一种用于处理神经布料模拟中相交问题的基于学习的解决方案。与严重依赖无相交输入的传统方法不同,ContourCraft能够稳健地从由于漏检碰撞、自穿透身体或手动设计的多层服装中的错误所引入的相交中恢复。ContourCraft的技术核心是一种新颖的相交轮廓损失,它惩罚相互穿插并促使其快速消解。我们将该相交损失与避免碰撞的斥力目标相结合,集成到基于图神经网络(GNNs)的神经布料模拟方法中。我们在动态人体运动下的一组具有挑战性的多样化多层服装上展示了我们方法的能力。广泛的分析表明,ContourCraft显著改善了学习式模拟中的碰撞处理,并产生视觉上引人注目的结果。
更新时间: 2024-05-24 11:51:32
领域: cs.GR,cs.LG
Encoder Embedding for General Graph and Node Classification
Graph encoder embedding, a recent technique for graph data, offers speed and scalability in producing vertex-level representations from binary graphs. In this paper, we extend the applicability of this method to a general graph model, which includes weighted graphs, distance matrices, and kernel matrices. We prove that the encoder embedding satisfies the law of large numbers and the central limit theorem on a per-observation basis. Under certain condition, it achieves asymptotic normality on a per-class basis, enabling optimal classification through discriminant analysis. These theoretical findings are validated through a series of experiments involving weighted graphs, as well as text and image data transformed into general graph representations using appropriate distance metrics.
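The embedding itself is simple enough to sketch in a few lines: each vertex is represented by its average connectivity to every class, i.e. Z = A @ W where W is the class-size-normalized one-hot label matrix; A may equally be a weighted adjacency, distance, or kernel matrix, as the extension above describes. The SBM-style toy graph is illustrative.

```python
# Minimal sketch of the one-hot encoder embedding Z = A @ W, where W is
# the class-size-normalized one-hot label matrix; A can equally be a
# weighted adjacency, distance, or kernel matrix. Toy SBM-style graph.
import numpy as np

def encoder_embedding(A, y, n_classes):
    n = len(y)
    W = np.zeros((n, n_classes))
    for k in range(n_classes):
        idx = y == k
        W[idx, k] = 1.0 / idx.sum()     # normalize by class size
    return A @ W                         # (n, n_classes) vertex embedding

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=100)
p = np.where(np.equal.outer(y, y), 0.3, 0.1)       # within- vs between-class
A = (rng.uniform(size=(100, 100)) < p).astype(float)
Z = encoder_embedding(A, y, n_classes=2)
print(Z[:3])   # class-separated rows, ready for e.g. discriminant analysis
```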
Updated: 2024-05-24 11:51:08
标题: 编码器嵌入用于一般图和节点分类
摘要: 图编码器嵌入是一种针对图数据的新近技术,能够快速且可扩展地从二值图中生成顶点级表示。在本文中,我们将该方法的适用性扩展到一般图模型,包括加权图、距离矩阵和核矩阵。我们证明了编码器嵌入在每个观测的意义上满足大数定律和中心极限定理。在一定条件下,它在每个类别的意义上达到渐近正态性,从而可以通过判别分析实现最优分类。这些理论发现通过一系列实验得到验证,实验涉及加权图,以及使用适当距离度量转换为一般图表示的文本和图像数据。
更新时间: 2024-05-24 11:51:08
领域: stat.ML,cs.LG,cs.SI
Autoregressive Image Diffusion: Generation of Image Sequence and Application in MRI
Magnetic resonance imaging (MRI) is a widely used non-invasive imaging modality. However, a persistent challenge lies in balancing image quality with imaging speed. This trade-off is primarily constrained by k-space measurements, which traverse specific trajectories in the spatial Fourier domain (k-space). These measurements are often undersampled to shorten acquisition times, resulting in image artifacts and compromised quality. Generative models learn image distributions and can be used to reconstruct high-quality images from undersampled k-space data. In this work, we present the autoregressive image diffusion (AID) model for image sequences and use it to sample the posterior for accelerated MRI reconstruction. The algorithm incorporates both undersampled k-space and pre-existing information. Models trained with fastMRI dataset are evaluated comprehensively. The results show that the AID model can robustly generate sequentially coherent image sequences. In 3D and dynamic MRI, the AID can outperform the standard diffusion model and reduce hallucinations, due to the learned inter-image dependencies.
Updated: 2024-05-24 11:41:54
标题: 自回归图像扩散:图像序列的生成及在MRI中的应用
摘要: 磁共振成像(MRI)是一种广泛使用的非侵入性成像技术。然而,一个长期存在的挑战在于平衡图像质量与成像速度。这种权衡主要受到k-空间测量的制约,这些测量沿空间傅里叶域(k-空间)中的特定轨迹进行。为缩短采集时间,这些测量通常是欠采样的,从而导致图像伪影和质量下降。生成模型学习图像分布,可用于从欠采样的k-空间数据重建高质量图像。在这项工作中,我们提出了用于图像序列的自回归图像扩散(AID)模型,并将其用于加速MRI重建的后验采样。该算法同时利用了欠采样k-空间和已有的先验信息。我们对使用fastMRI数据集训练的模型进行了全面评估。结果表明,AID模型能够稳健地生成前后连贯的图像序列。在3D和动态MRI中,得益于学到的图像间依赖关系,AID可以胜过标准扩散模型并减少幻觉。
更新时间: 2024-05-24 11:41:54
领域: eess.IV,cs.AI,cs.CV
Repetita Iuvant: Data Repetition Allows SGD to Learn High-Dimensional Multi-Index Functions
Neural networks can identify low-dimensional relevant structures within high-dimensional noisy data, yet our mathematical understanding of how they do so remains scarce. Here, we investigate the training dynamics of two-layer shallow neural networks trained with gradient-based algorithms, and discuss how they learn pertinent features in multi-index models, that is target functions with low-dimensional relevant directions. In the high-dimensional regime, where the input dimension $d$ diverges, we show that a simple modification of the idealized single-pass gradient descent training scenario, where data can now be repeated or iterated upon twice, drastically improves its computational efficiency. In particular, it surpasses the limitations previously believed to be dictated by the Information and Leap exponents associated with the target function to be learned. Our results highlight the ability of networks to learn relevant structures from data alone without any pre-processing. More precisely, we show that (almost) all directions are learned with at most $O(d \log d)$ steps. Among the exceptions is a set of hard functions that includes sparse parities. In the presence of coupling between directions, however, these can be learned sequentially through a hierarchical mechanism that generalizes the notion of staircase functions. Our results are proven by a rigorous study of the evolution of the relevant statistics for high-dimensional dynamics.
Updated: 2024-05-24 11:34:31
标题: 重复有益:数据重复使得随机梯度下降能够学习高维多指数函数
摘要: 神经网络能够从高维噪声数据中识别低维相关结构,但我们对其实现机制的数学理解仍然很有限。在这里,我们研究了使用基于梯度的算法训练的两层浅层神经网络的训练动力学,并讨论它们如何学习多指数模型(即具有低维相关方向的目标函数)中的相关特征。在输入维度$d$发散的高维情形下,我们表明,对理想化的单遍梯度下降训练方案做一个简单修改,即允许数据被重复或迭代使用两次,可以极大地提高其计算效率。特别是,它超越了先前认为由待学习目标函数的信息指数(Information exponent)和跃迁指数(Leap exponent)所决定的限制。我们的结果凸显了网络无需任何预处理、仅凭数据即可学习相关结构的能力。更确切地说,我们证明了(几乎)所有方向最多需要$O(d \log d)$步即可学习。例外是一组包含稀疏奇偶函数在内的困难函数。然而,在方向之间存在耦合的情况下,这些函数可以通过一种推广了阶梯函数概念的分层机制被顺序学习。我们的结果通过对高维动力学中相关统计量演化的严格研究加以证明。
更新时间: 2024-05-24 11:34:31
领域: stat.ML,cs.LG
FedCal: Achieving Local and Global Calibration in Federated Learning via Aggregated Parameterized Scaler
Federated learning (FL) enables collaborative machine learning across distributed data owners, but data heterogeneity poses a challenge for model calibration. While prior work focused on improving accuracy for non-iid data, calibration remains under-explored. This study reveals existing FL aggregation approaches lead to sub-optimal calibration, and theoretical analysis shows despite constraining variance in clients' label distributions, global calibration error is still asymptotically lower bounded. To address this, we propose a novel Federated Calibration (FedCal) approach, emphasizing both local and global calibration. It leverages client-specific scalers for local calibration to effectively correct output misalignment without sacrificing prediction accuracy. These scalers are then aggregated via weight averaging to generate a global scaler, minimizing the global calibration error. Extensive experiments demonstrate FedCal significantly outperforms the best-performing baseline, reducing global calibration error by 47.66% on average.
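To illustrate the local-fit / global-aggregate structure, the sketch below uses the simplest possible parameterized scaler, per-client temperature scaling, and averages the fitted temperatures with data-size weights into a global scaler; FedCal's actual scaler and aggregation are richer, so treat this as an assumption-laden toy.

```python
# Hedged toy of the FedCal structure with the simplest scaler: fit a
# temperature per client, then weight-average into a global scaler.
import numpy as np
from scipy.optimize import minimize_scalar

def fit_temperature(logits, labels):
    """Scalar temperature minimizing the local negative log-likelihood."""
    def nll(t):
        z = logits / t
        z = z - z.max(axis=1, keepdims=True)
        logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
        return -logp[np.arange(len(labels)), labels].mean()
    return minimize_scalar(nll, bounds=(0.05, 10.0), method="bounded").x

rng = np.random.default_rng(0)
sizes, temps = [], []
for _ in range(5):                      # five simulated clients
    n = int(rng.integers(50, 200))
    labels = rng.integers(0, 3, size=n)
    logits = 3.0 * np.eye(3)[labels] + rng.normal(size=(n, 3))  # overconfident
    temps.append(fit_temperature(logits, labels))
    sizes.append(n)

weights = np.array(sizes) / sum(sizes)
global_temp = float(weights @ np.array(temps))      # aggregated global scaler
print(global_temp)
```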
Updated: 2024-05-24 11:33:58
标题: FedCal:通过聚合参数化缩放器在联邦学习中实现本地和全局校准
摘要: 联邦学习(FL)实现了分布式数据所有者之间的协作机器学习,但数据异质性给模型校准带来了挑战。尽管先前的研究侧重于提高非独立同分布数据的准确性,但校准仍未被充分探讨。本研究揭示了现有的FL聚合方法导致次优校准,理论分析表明,尽管限制了客户端标签分布的方差,全局校准误差仍然渐近地受到下界限制。为了解决这个问题,我们提出了一种新颖的联邦校准(FedCal)方法,强调本地和全局校准。它利用客户端特定的缩放器进行本地校准,有效地校正输出不对齐而不损失预测准确性。然后,这些缩放器通过权重平均聚合生成全局缩放器,最小化全局校准误差。大量实验证明,FedCal显著优于表现最佳的基线,平均降低全局校准误差47.66%。
更新时间: 2024-05-24 11:33:58
领域: cs.LG,cs.DC
Predicting Parkinson's disease trajectory using clinical and functional MRI features: a reproduction and replication study
Parkinson's disease (PD) is a common neurodegenerative disorder with a poorly understood physiopathology and no established biomarkers for the diagnosis of early stages and for prediction of disease progression. Several neuroimaging biomarkers have been studied recently, but these are susceptible to several sources of variability. In this context, an evaluation of the robustness of such biomarkers is essential. This study is part of a larger project investigating the replicability of potential neuroimaging biomarkers of PD. Here, we attempt to reproduce (same data, same method) and replicate (different data or method) the models described in Nguyen et al., 2021 to predict individual's PD current state and progression using demographic, clinical and neuroimaging features (fALFF and ReHo extracted from resting-state fMRI). We use the Parkinson's Progression Markers Initiative dataset (PPMI, ppmi-info.org), as in Nguyen et al.,2021 and aim to reproduce the original cohort, imaging features and machine learning models as closely as possible using the information available in the paper and the code. We also investigated methodological variations in cohort selection, feature extraction pipelines and sets of input features. The success of the reproduction was assessed using different criteria. Notably, we obtained significantly better than chance performance using the analysis pipeline closest to that in the original study (R2 > 0), which is consistent with its findings. The challenges encountered while reproducing and replicating the original work are likely explained by the complexity of neuroimaging studies, in particular in clinical settings. We provide recommendations to further facilitate the reproducibility of such studies in the future.
Updated: 2024-05-24 11:33:01
标题: 使用临床和功能性磁共振特征预测帕金森病发展轨迹:一项重现与复制研究
摘要: 帕金森病(PD)是一种常见的神经退行性疾病,其病理生理机制尚不明确,也没有确立的生物标志物可用于早期诊断和疾病进展预测。最近已研究了多种神经影像学生物标志物,但这些标志物容易受到多种变异来源的影响。在这种背景下,评估此类生物标志物的稳健性至关重要。本研究是一个旨在考察PD潜在神经影像学生物标志物可复制性的更大项目的一部分。在这里,我们尝试重现(相同数据、相同方法)和复制(不同数据或方法)Nguyen等人(2021年)所描述的模型,利用人口统计学、临床和神经影像特征(从静息态fMRI中提取的fALFF和ReHo)来预测个体PD的当前状态和进展。我们与Nguyen等人(2021年)一样使用帕金森病进展标志物倡议数据集(PPMI,ppmi-info.org),并力求利用论文和代码中提供的信息,尽可能贴近地重现原始队列、影像特征和机器学习模型。我们还考察了队列选择、特征提取管线和输入特征集上的方法学变动。重现的成功程度通过不同的标准进行评估。值得注意的是,使用与原始研究最接近的分析流程,我们获得了显著优于随机的性能(R2 > 0),这与其研究发现一致。在重现和复制原始工作时遇到的挑战很可能源于神经影像学研究(尤其是临床环境下)的复杂性。我们提出了一些建议,以进一步促进此类研究未来的可重复性。
更新时间: 2024-05-24 11:33:01
领域: q-bio.NC,cs.AI,eess.IV
Benchmarking Pre-trained Large Language Models' Potential Across Urdu NLP tasks
Large Language Models (LLMs) pre-trained on multilingual data have revolutionized natural language processing research by transitioning from language- and task-specific model pipelines to a single model adapted to a variety of tasks. However, the majority of existing multilingual NLP benchmarks for LLMs provide evaluation data in only a few languages with little linguistic diversity. In addition, these benchmarks lack quality assessment against the respective state-of-the-art models. This study presents an in-depth examination of prominent LLMs: GPT-3.5-turbo, Llama2-7B-Chat, Bloomz 7B1, and Bloomz 3B, across 14 tasks using 15 Urdu datasets in a zero-shot setting, and their performance against state-of-the-art (SOTA) models has been compared and analysed. Our experiments show that SOTA models surpass all the encoder-decoder pre-trained language models in all Urdu NLP tasks with zero-shot learning. Our results further show that LLMs with fewer parameters but more language-specific data in the base model perform better than computationally larger models with little language-specific data.
Updated: 2024-05-24 11:30:37
标题: 基准测试预训练大型语言模型在乌尔都语自然语言处理任务中的潜力
摘要: 在多语言数据上预训练的大型语言模型(LLMs)已经彻底改变了自然语言处理研究,从针对特定语言和任务的模型管道过渡到一个适应各种任务的单一模型。然而,现有大多数面向LLMs的多语言NLP基准仅提供少数语言的评估数据,缺乏语言多样性。此外,这些基准缺乏针对相应最先进模型的质量评估。本研究对知名LLMs(GPT-3.5-turbo、Llama2-7B-Chat、Bloomz 7B1和Bloomz 3B)在零样本设置下使用15个乌尔都语数据集在14项任务上进行了深入研究,并比较和分析了它们与最先进(SOTA)模型的性能。我们的实验表明,在零样本学习下,SOTA模型在所有乌尔都语NLP任务中均超越所有编码器-解码器预训练语言模型。我们的结果进一步显示,参数更少但基础模型中语言特定数据更多的LLMs优于计算量更大但语言数据较少的模型。
更新时间: 2024-05-24 11:30:37
领域: cs.CL,cs.AI,I.2.7
Leveraging Logical Rules in Knowledge Editing: A Cherry on the Top
Multi-hop Question Answering (MQA) under knowledge editing (KE) is a key challenge in Large Language Models (LLMs). While best-performing solutions in this domain use a plan-and-solve paradigm to split a question into sub-questions followed by response generation, we claim that this approach is sub-optimal as it fails for hard-to-decompose questions, and it does not explicitly cater to correlated knowledge updates arising as a consequence of knowledge edits. This has a detrimental impact on the overall consistency of the updated knowledge. To address these issues, in this paper, we propose a novel framework named RULE-KE, i.e., RULE based Knowledge Editing, which is a cherry on the top for augmenting the performance of all existing MQA methods under KE. Specifically, RULE-KE leverages rule discovery to discover a set of logical rules. Then, it uses these discovered rules to update knowledge about facts highly correlated with the edit. Experimental evaluation using existing and newly curated datasets (i.e., RKE-EVAL) shows that RULE-KE augments the performance of parameter-based and memory-based solutions by up to 92% and 112.9%, respectively.
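To make the idea of rule-driven correlated updates concrete, here is a minimal sketch in Python. The fact store, the hard-coded symmetry rule, and all entity names are illustrative assumptions; RULE-KE discovers its rules automatically rather than taking them as given.

```python
# Toy fact store keyed by (subject, relation) -> object.
facts = {("Alice", "spouse"): "Bob", ("Bob", "spouse"): "Alice",
         ("Alice", "lives_in"): "Paris"}

rules = [
    # Discovered rule (hard-coded here): spouse is symmetric.
    lambda s, r, o: [(o, "spouse", s)] if r == "spouse" else [],
]

def apply_edit(facts, edit):
    """Apply a knowledge edit, then update facts correlated with it via rules."""
    s, r, o = edit
    facts[(s, r)] = o
    for rule in rules:
        for (s2, r2, o2) in rule(s, r, o):
            facts[(s2, r2)] = o2   # correlated update implied by the rule
    # A full system would also retract the stale fact ("Bob", "spouse").
    return facts

apply_edit(facts, ("Alice", "spouse", "Carl"))
assert facts[("Carl", "spouse")] == "Alice"   # the symmetric fact was updated too
```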
Updated: 2024-05-24 11:30:00
标题: 在知识编辑中利用逻辑规则:画龙点睛
摘要: 基于知识编辑的多跳问题回答(MQA)是大型语言模型(LLMs)中的一个关键挑战。在这一领域中表现最佳的解决方案使用了计划和解决范式,将一个问题分解为子问题,然后生成响应。然而,我们认为这种方法并不是最优的,因为它无法解决难以分解的问题,并且不能明确地满足由于知识编辑而产生的相关知识更新。这对更新后的知识的整体一致性产生了不利影响。为了解决这些问题,本文提出了一种名为RULE-KE的新框架,即基于规则的知识编辑,它是增强所有现有基于KE的MQA方法性能的一种额外手段。具体而言,RULE-KE利用规则发现来发现一组逻辑规则。然后,它使用这些发现的规则来更新与编辑高度相关的事实知识。使用现有和新筛选的数据集(即RKE-EVAL)进行的实验评估显示,RULE-KE有助于将基于参数和基于内存的解决方案的性能分别提高了92%和112.9%。
更新时间: 2024-05-24 11:30:00
领域: cs.CL,cs.AI,cs.LG
Using Large Language Models to Enrich the Documentation of Datasets for Machine Learning
Recent regulatory initiatives like the European AI Act and relevant voices in the Machine Learning (ML) community stress the need to describe datasets along several key dimensions for trustworthy AI, such as the provenance processes and social concerns. However, this information is typically presented as unstructured text in accompanying documentation, hampering their automated analysis and processing. In this work, we explore using large language models (LLM) and a set of prompting strategies to automatically extract these dimensions from documents and enrich the dataset description with them. Our approach could aid data publishers and practitioners in creating machine-readable documentation to improve the discoverability of their datasets, assess their compliance with current AI regulations, and improve the overall quality of ML models trained on them. In this paper, we evaluate the approach on 12 scientific dataset papers published in two scientific journals (Nature's Scientific Data and Elsevier's Data in Brief) using two different LLMs (GPT3.5 and Flan-UL2). Results show good accuracy with our prompt extraction strategies. Concrete results vary depending on the dimensions, but overall, GPT3.5 shows slightly better accuracy (81.21%) than FLAN-UL2 (69.13%) although it is more prone to hallucinations. We have released an open-source tool implementing our approach and a replication package, including the experiments' code and results, in an open-source repository.
Updated: 2024-05-24 11:25:49
标题: 使用大型语言模型丰富机器学习数据集文档
摘要: 最近的监管倡议,如欧洲AI法案,以及机器学习(ML)社区中的相关声音,强调了需要沿几个关键维度描述数据集以实现可信人工智能,比如数据来源过程和社会关注点。然而,这些信息通常以非结构化文本的形式呈现在附随文档中,阻碍了它们的自动分析和处理。在这项工作中,我们探讨使用大型语言模型(LLM)和一组提示策略从文档中自动提取这些维度,并用它们丰富数据集描述。我们的方法可以帮助数据发布者和从业者创建机器可读的文档,以提高他们数据集的可发现性,评估其符合当前AI法规的程度,并改善在其上训练的ML模型的整体质量。在本文中,我们使用两种不同的LLM(GPT3.5和Flan-UL2),在两本科学期刊(Nature的Scientific Data和Elsevier的Data in Brief)上发表的12篇科学数据集论文上评估了该方法的效果。结果显示我们的提示提取策略具有较好的准确性。具体结果因维度而异,但总体而言,GPT3.5的准确性(81.21%)略高于FLAN-UL2(69.13%),尽管它更容易出现幻觉。我们已在一个开源存储库中发布了实现我们方法的开源工具和一个复制包,其中包括实验的代码和结果。
更新时间: 2024-05-24 11:25:49
领域: cs.DL,cs.AI,cs.CL,H.4.4
Mind the Gap: A Causal Perspective on Bias Amplification in Prediction & Decision-Making
Investigating fairness and equity of automated systems has become a critical field of inquiry. Most of the literature in fair machine learning focuses on defining and achieving fairness criteria in the context of prediction, while not explicitly focusing on how these predictions may be used later on in the pipeline. For instance, if commonly used criteria, such as independence or sufficiency, are satisfied for a prediction score $S$ used for binary classification, they need not be satisfied after an application of a simple thresholding operation on $S$ (as commonly used in practice). In this paper, we take an important step to address this issue in numerous statistical and causal notions of fairness. We introduce the notion of a margin complement, which measures how much a prediction score $S$ changes due to a thresholding operation. We then demonstrate that the marginal difference in the optimal 0/1 predictor $\widehat Y$ between groups, written $P(\hat y \mid x_1) - P(\hat y \mid x_0)$, can be causally decomposed into the influences of $X$ on the $L_2$-optimal prediction score $S$ and the influences of $X$ on the margin complement $M$, along different causal pathways (direct, indirect, spurious). We then show that under suitable causal assumptions, the influences of $X$ on the prediction score $S$ are equal to the influences of $X$ on the true outcome $Y$. This yields a new decomposition of the disparity in the predictor $\widehat Y$ that allows us to disentangle causal differences inherited from the true outcome $Y$ that exists in the real world vs. those coming from the optimization procedure itself. This observation highlights the need for more regulatory oversight due to the potential for bias amplification, and to address this issue we introduce new notions of weak and strong business necessity, together with an algorithm for assessing whether these notions are satisfied.
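The additive part of this decomposition is easy to verify numerically. The sketch below, with made-up scores and groups, checks that the group disparity of the thresholded predictor splits exactly into a score term and a margin-complement term; the paper's further split along direct, indirect, and spurious causal pathways is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: group attribute X in {0, 1} and a prediction score S in [0, 1].
x = rng.integers(0, 2, size=10_000)
s = np.clip(0.4 + 0.1 * x + rng.normal(0, 0.15, size=x.size), 0, 1)

tau = 0.5
y_hat = (s >= tau).astype(float)   # thresholded 0/1 predictor
m = y_hat - s                      # margin complement: M = 1{S >= tau} - S

# Group disparity of the thresholded predictor ...
disparity = y_hat[x == 1].mean() - y_hat[x == 0].mean()
# ... splits exactly into a score term and a margin-complement term,
# since Y_hat = S + M by construction.
score_term = s[x == 1].mean() - s[x == 0].mean()
margin_term = m[x == 1].mean() - m[x == 0].mean()

print(f"disparity {disparity:.3f} = score {score_term:.3f} + margin {margin_term:.3f}")
```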
Updated: 2024-05-24 11:22:19
标题: 注意差距:关于预测和决策中偏见放大的因果透视
摘要: 调查自动化系统的公平性与公正性已经成为一个关键的研究领域。公平机器学习领域的大部分文献都集中在预测背景下定义和实现公平标准,而并没有明确关注这些预测结果可能在后续流程中如何使用。例如,如果用于二元分类的预测分数$S$满足常用的独立性或充分性等标准,那么在$S$上应用简单的阈值操作(在实践中常用)后并不一定满足这些标准。在本文中,我们针对多种统计和因果公平性概念,朝解决这个问题迈出了重要一步。我们引入了边际互补的概念,该概念衡量了预测分数$S$由于阈值操作而发生的变化程度。然后我们证明最优0/1预测器$\widehat Y$在不同群体之间的边际差异,记为$P(\hat y \mid x_1) - P(\hat y \mid x_0)$,可以被因果分解为$X$沿不同因果路径(直接、间接、虚假)对$L_2$最优预测分数$S$的影响和对边际互补$M$的影响。然后我们展示,在适当的因果假设下,$X$对预测分数$S$的影响等于$X$对真实结果$Y$的影响。这产生了预测器$\widehat Y$差异的一种新分解,使我们能够区分来自真实世界中真实结果$Y$所固有的因果差异与来自优化过程本身的差异。这一观察强调了由于潜在的偏见放大而需要更多监管监督;为了解决这个问题,我们引入了弱业务必要性和强业务必要性的新概念,以及一种评估这些概念是否满足的算法。
更新时间: 2024-05-24 11:22:19
领域: cs.LG,cs.AI,stat.ML
HyperInterval: Hypernetwork approach to training weight interval regions in continual learning
Recently, a new Continual Learning (CL) paradigm was presented to control catastrophic forgetting, called Interval Continual Learning (InterContiNet), which relies on enforcing interval constraints on the neural network parameter space. Unfortunately, InterContiNet training is challenging due to the high dimensionality of the weight space, making intervals difficult to manage. To address this issue, we introduce HyperInterval, a technique that employs interval arithmetic within the embedding space and utilizes a hypernetwork to map these intervals to the target network parameter space. We train interval embeddings for consecutive tasks and train a hypernetwork to transform these embeddings into weights of the target network. An embedding for a given task is trained along with the hypernetwork, preserving the response of the target network for the previous task embeddings. Interval arithmetic works with a more manageable, lower-dimensional embedding space rather than directly preparing intervals in a high-dimensional weight space. Our model allows faster and more efficient training. Furthermore, HyperInterval maintains the guarantee of not forgetting. At the end of training, we can choose one universal embedding to produce a single network dedicated to all tasks. In such a framework, hypernetwork is used only for training and can be seen as a meta-trainer. HyperInterval obtains significantly better results than InterContiNet and gives SOTA results on several benchmarks.
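As a rough illustration of the core mechanic, the sketch below propagates a low-dimensional interval embedding through a small hypernetwork using standard interval bound propagation, yielding an interval over target-network weights. All sizes are hypothetical and the propagation rule is a generic IBP scheme, not necessarily the exact one used by HyperInterval.

```python
import torch
import torch.nn as nn

EMB, HID = 8, 32              # hypothetical sizes for illustration
OUT_SHAPE = (10, 5)           # weight matrix of one target-network layer
W_OUT = OUT_SHAPE[0] * OUT_SHAPE[1]

# Hypernetwork: maps a low-dimensional task embedding to target-network weights.
hypernet = nn.Sequential(nn.Linear(EMB, HID), nn.ReLU(), nn.Linear(HID, W_OUT))

# Per-task interval embedding in (center, radius) form.
center = nn.Parameter(torch.zeros(EMB))
radius = nn.Parameter(0.1 * torch.ones(EMB))

def ibp_linear(layer, c, r):
    """Propagate an interval through an affine layer."""
    return layer(c), layer.weight.abs() @ r

def ibp_relu(c, r):
    """Propagate an interval through a ReLU."""
    lo, hi = torch.relu(c - r), torch.relu(c + r)
    return (hi + lo) / 2, (hi - lo) / 2

c, r = center, radius
for mod in hypernet:
    c, r = ibp_linear(mod, c, r) if isinstance(mod, nn.Linear) else ibp_relu(c, r)

# Interval of target-network weights induced by the interval embedding.
w_lo, w_hi = (c - r).view(OUT_SHAPE), (c + r).view(OUT_SHAPE)
```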
Updated: 2024-05-24 11:20:41
标题: 超区间:超网络方法用于在持续学习中训练权重区间区域
摘要: 最近,提出了一种新的持续学习(CL)范式,用于控制灾难性遗忘,称为区间持续学习(InterContiNet),它依赖于对神经网络参数空间施加区间约束。不幸的是,由于权重空间的高维性,InterContiNet的训练具有挑战性,使得区间难以管理。为了解决这个问题,我们引入了HyperInterval,这是一种在嵌入空间内使用区间算术、并利用超网络将这些区间映射到目标网络参数空间的技术。我们为连续任务训练区间嵌入,并训练一个超网络将这些嵌入转换为目标网络的权重。给定任务的嵌入与超网络一起训练,同时保留目标网络对先前任务嵌入的响应。区间算术在一个更易管理的低维嵌入空间中进行,而不是直接在高维权重空间中构造区间。我们的模型允许更快、更高效的训练。此外,HyperInterval保持了不遗忘的保证。在训练结束时,我们可以选择一个通用嵌入来生成一个适用于所有任务的单一网络。在这样的框架中,超网络仅用于训练,可以看作是一个元训练器。HyperInterval取得了显著优于InterContiNet的结果,并在多个基准测试中达到了SOTA水平。
更新时间: 2024-05-24 11:20:41
领域: cs.LG,cs.AI
Fairness-Accuracy Trade-Offs: A Causal Perspective
Systems based on machine learning may exhibit discriminatory behavior based on sensitive characteristics such as gender, sex, religion, or race. In light of this, various notions of fairness and methods to quantify discrimination were proposed, leading to the development of numerous approaches for constructing fair predictors. At the same time, imposing fairness constraints may decrease the utility of the decision-maker, highlighting a tension between fairness and utility. This tension is also recognized in legal frameworks, for instance in the disparate impact doctrine of Title VII of the Civil Rights Act of 1964 -- in which specific attention is given to considerations of business necessity -- possibly allowing the usage of proxy variables associated with the sensitive attribute in case a high-enough utility cannot be achieved without them. In this work, we analyze the tension between fairness and accuracy from a causal lens for the first time. We introduce the notion of a path-specific excess loss (PSEL) that captures how much the predictor's loss increases when a causal fairness constraint is enforced. We then show that the total excess loss (TEL), defined as the difference between the loss of a predictor that is fair along all causal pathways vs. that of an unconstrained predictor, can be decomposed into a sum of more local PSELs. At the same time, enforcing a causal constraint often reduces the disparity between demographic groups. Thus, we introduce a quantity that summarizes the fairness-utility trade-off, called the causal fairness/utility ratio, defined as the ratio of the reduction in discrimination vs. the excess loss from constraining a causal pathway. This quantity is suitable for comparing the fairness-utility trade-off across causal pathways. Finally, as our approach requires causally-constrained fair predictors, we introduce a new neural approach for causally-constrained fair learning.
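The headline quantities compose very simply. Below is a toy computation of the path-specific excess loss and the causal fairness/utility ratio from hypothetical losses and discrimination measures of a constrained and an unconstrained predictor; all numbers are invented for illustration.

```python
# Toy numbers (invented): loss/discrimination of an unconstrained predictor vs.
# one constrained to be fair along a single causal pathway.
loss_unconstrained, loss_constrained = 0.210, 0.238
disc_unconstrained, disc_constrained = 0.150, 0.040

psel = loss_constrained - loss_unconstrained        # path-specific excess loss
disc_reduction = disc_unconstrained - disc_constrained

# Causal fairness/utility ratio: discrimination removed per unit of excess loss.
ratio = disc_reduction / psel
print(f"PSEL = {psel:.3f}, fairness/utility ratio = {ratio:.2f}")
```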
Updated: 2024-05-24 11:19:52
标题: 公平性-准确性权衡:因果透视
摘要: 基于机器学习的系统可能基于敏感特征如性别、宗教或种族而表现出歧视行为。鉴于此,提出了各种公平概念和量化歧视的方法,导致了构建公平预测器的众多方法的发展。同时,施加公平约束可能降低决策者的效用,凸显了公平与效用之间的紧张关系。这种紧张关系也在法律框架中得到认可,例如1964年《民权法案》第七章的不同影响条款中,特别关注商业必要性的考虑,可能允许在没有这些变量的情况下无法达到足够高效用时使用与敏感属性相关联的代理变量。在这项工作中,我们首次从因果角度分析了公平与准确性之间的紧张关系。我们引入了路径特定过度损失(PSEL)的概念,捕捉了当强制执行因果公平约束时预测器的损失增加了多少。然后,我们展示了总过度损失(TEL)的概念,定义为在所有因果路径上公平预测器的损失与无约束预测器之间的差异,可以分解为更多局部PSEL的总和。同时,强制执行因果约束通常降低了人口群体之间的差异。因此,我们引入了一个总结公平-效用权衡的量,称为因果公平/效用比,定义为歧视减少与在约束一个因果路径时的过度损失之间的比率。这一量可以用于比较各因果路径上的公平-效用权衡。最后,由于我们的方法需要具有因果约束的公平预测器,我们引入了一种新的神经方法来进行因果约束的公平学习。
更新时间: 2024-05-24 11:19:52
领域: cs.LG,cs.AI,stat.ML
Towards Precision Healthcare: Robust Fusion of Time Series and Image Data
With the increasing availability of diverse data types, particularly images and time series data from medical experiments, there is a growing demand for techniques designed to combine various modalities of data effectively. Our motivation comes from the important areas of predicting mortality and phenotyping where using different modalities of data could significantly improve our ability to predict. To tackle this challenge, we introduce a new method that uses two separate encoders, one for each type of data, allowing the model to understand complex patterns in both visual and time-based information. Apart from the technical challenges, our goal is to make the predictive model more robust in noisy conditions and perform better than current methods. We also deal with imbalanced datasets and use an uncertainty loss function, yielding improved results while simultaneously providing a principled means of modeling uncertainty. Additionally, we include attention mechanisms to fuse different modalities, allowing the model to focus on what's important for each task. We tested our approach using the comprehensive multimodal MIMIC dataset, combining MIMIC-IV and MIMIC-CXR datasets. Our experiments show that our method is effective in improving multimodal deep learning for clinical applications. The code will be made available online.
Updated: 2024-05-24 11:18:13
标题: 朝向精准医疗:时间序列和图像数据的稳健融合
摘要: 随着各种数据类型的不断增加,特别是来自医学实验的图像和时间序列数据,对于设计有效结合各种数据模态的技术需求正在增长。我们的动机源于预测死亡和表型等重要领域,使用不同数据模态可以显著提高我们的预测能力。为了解决这一挑战,我们引入了一种新方法,使用两个单独的编码器,一个用于每种类型的数据,使模型能够理解视觉和基于时间的信息中的复杂模式。除了技术挑战,我们的目标是使预测模型在嘈杂环境中更加稳健,并且比当前方法表现更好。我们还处理不平衡数据集,并使用不确定性损失函数,提供改进的结果同时提供一种原则性的建模不确定性的方法。此外,我们还包括注意机制来融合不同的模态,使模型能够专注于每项任务的重点。我们使用全面的多模态MIMIC数据集进行了测试,结合了MIMIC-IV和MIMIC-CXR数据集。我们的实验表明,我们的方法对于临床应用中的多模态深度学习是有效的。代码将在网上提供。
更新时间: 2024-05-24 11:18:13
领域: eess.IV,cs.CV,cs.LG
Statistical and Computational Guarantees of Kernel Max-Sliced Wasserstein Distances
Optimal transport has been very successful for various machine learning tasks; however, it is known to suffer from the curse of dimensionality. Hence, dimensionality reduction is desirable when applied to high-dimensional data with low-dimensional structures. The kernel max-sliced (KMS) Wasserstein distance is developed for this purpose by finding an optimal nonlinear mapping that reduces the data to $1$ dimension before computing the Wasserstein distance. However, its theoretical properties have not yet been fully developed. In this paper, we provide sharp finite-sample guarantees under milder technical assumptions compared with the state-of-the-art for the KMS $p$-Wasserstein distance between two empirical distributions with $n$ samples for general $p\in[1,\infty)$. Algorithm-wise, we show that computing the KMS $2$-Wasserstein distance is NP-hard, and then we further propose a semidefinite relaxation (SDR) formulation (which can be solved efficiently in polynomial time) and provide a relaxation gap for the SDP solution. We provide numerical examples to demonstrate the good performance of our scheme for high-dimensional two-sample testing.
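A back-of-the-envelope version of the estimator can be put together from random Fourier features and the sorting formula for one-dimensional Wasserstein distances. The sketch below replaces the paper's SDP relaxation with a crude random search over projection directions, so it only lower-bounds the KMS distance; the RBF kernel, bandwidth, and sample sizes are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def w_p_1d(u, v, p=2):
    """p-Wasserstein distance between two equal-size 1-D empirical samples."""
    return (np.mean(np.abs(np.sort(u) - np.sort(v)) ** p)) ** (1 / p)

def rff(x, W, b):
    """Random Fourier features approximating an RBF kernel lift."""
    return np.sqrt(2 / W.shape[0]) * np.cos(x @ W.T + b)

# Two 50-dimensional samples with a mean shift.
X = rng.normal(0.0, 1, (500, 50))
Y = rng.normal(0.3, 1, (500, 50))

D = 256                                   # number of random features
W = rng.normal(0, 1, (D, 50))             # RBF kernel, unit bandwidth
b = rng.uniform(0, 2 * np.pi, D)
phiX, phiY = rff(X, W, b), rff(Y, W, b)

# Crude surrogate for the max-sliced direction: random search in feature space
# (the paper instead solves a semidefinite relaxation).
best = 0.0
for _ in range(200):
    theta = rng.normal(size=D)
    theta /= np.linalg.norm(theta)
    best = max(best, w_p_1d(phiX @ theta, phiY @ theta))
print(f"approximate KMS 2-Wasserstein: {best:.3f}")
```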
Updated: 2024-05-24 11:14:56
标题: 核最大切片Wasserstein距离的统计和计算保证
摘要: 最优输运在各种机器学习任务中取得了很大成功;然而,众所周知它受到维度灾难的影响。因此,在应用于具有低维结构的高维数据时,降维是可取的。核最大切片(KMS)Wasserstein距离是为此目的而开发的,通过找到一个最优的非线性映射将数据降维到1维,然后计算Wasserstein距离。然而,其理论性质尚未完全发展。在本文中,我们在相对于最先进技术更温和的技术假设下,为KMS $p$-Wasserstein距离提供了尖锐的有限样本保证,其中$p\in[1,\infty)$。在算法方面,我们展示了计算KMS 2-Wasserstein距离是NP难题,然后我们进一步提出了一个半定松弛(SDR)形式(可以在多项式时间内有效求解),并为SDP解提供了一个松弛间隙。我们提供数值示例来展示我们的方案在高维两样本测试中的良好性能。
更新时间: 2024-05-24 11:14:56
领域: stat.ML,cs.CC,cs.LG
Estimating Physical Information Consistency of Channel Data Augmentation for Remote Sensing Images
The application of data augmentation for deep learning (DL) methods plays an important role in achieving state-of-the-art results in supervised, semi-supervised, and self-supervised image classification. In particular, channel transformations (e.g., solarize, grayscale, brightness adjustments) are integrated into data augmentation pipelines for remote sensing (RS) image classification tasks. However, contradictory beliefs exist about their proper applications to RS images. A common point of critique is that the application of channel augmentation techniques may lead to physically inconsistent spectral data (i.e., pixel signatures). To shed light on the open debate, we propose an approach to estimate whether a channel augmentation technique affects the physical information of RS images. To this end, the proposed approach estimates a score that measures the alignment of a pixel signature within a time series that can be naturally subject to deviations caused by factors such as acquisition conditions or phenological states of vegetation. We compare the scores associated with original and augmented pixel signatures to evaluate the physical consistency. Experimental results on a multi-label image classification task show that channel augmentations yielding a score that exceeds the expected deviation of original pixel signatures cannot improve the performance of a baseline model trained without augmentation.
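The proposed consistency check can be mimicked with a few lines of NumPy: measure how far each original acquisition deviates from the temporal mean signature, then ask whether an augmented signature stays within that envelope. The Euclidean deviation measure and the brightness augmentation below are simplifying assumptions, not the paper's exact score.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: one pixel observed at T acquisition dates, B spectral bands.
T, B = 12, 10
series = rng.normal(0.3, 0.05, (T, B))   # natural temporal variability
mean_sig = series.mean(axis=0)

def deviation(sig):
    """Distance of one pixel signature from the temporal mean signature."""
    return np.linalg.norm(sig - mean_sig)

# Expected deviation envelope of the original signatures in the time series.
expected = max(deviation(s) for s in series)

def brightness(sig, factor=1.5):         # one channel augmentation
    return np.clip(sig * factor, 0.0, 1.0)

aug = brightness(series[0])
consistent = deviation(aug) <= expected
print(f"augmented deviation {deviation(aug):.3f} vs expected {expected:.3f} "
      f"-> physically consistent: {consistent}")
```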
Updated: 2024-05-24 11:14:37
标题: 估计用于遥感图像的通道数据增强的物理信息一致性
摘要: 数据增强在深度学习(DL)方法中的应用对于在监督、半监督和自监督图像分类中取得最先进结果起着重要作用。特别是,通道变换(例如,反转、灰度、亮度调整)被整合到遥感图像分类任务的数据增强流程中。然而,对于它们在遥感图像中的适当应用存在着相互矛盾的观点。批评的一个共同点是通道增强技术的应用可能导致物理上不一致的光谱数据(即,像素特征)。为了阐明这一争论,我们提出了一种方法来评估通道增强技术是否影响了遥感图像的物理信息。为此,所提出的方法估计了一个分数,该分数测量了一个像素特征在一个时间序列中的对齐程度,该时间序列可能受到因素(如采集条件或植被的季节状态)引起的偏差的影响。我们比较原始和增强的像素特征相关的分数,以评估物理上的一致性。在一个多标签图像分类任务上的实验结果表明,产生超过原始像素特征预期偏差的分数的通道增强不能提高没有增强训练的基线模型的性能。
更新时间: 2024-05-24 11:14:37
领域: cs.CV,cs.LG
Text-guided 3D Human Motion Generation with Keyframe-based Parallel Skip Transformer
Text-driven human motion generation is an emerging task in animation and humanoid robot design. Existing algorithms directly generate the full sequence which is computationally expensive and prone to errors as it does not pay special attention to key poses, a process that has been the cornerstone of animation for decades. We propose KeyMotion, that generates plausible human motion sequences corresponding to input text by first generating keyframes followed by in-filling. We use a Variational Autoencoder (VAE) with Kullback-Leibler regularization to project the keyframes into a latent space to reduce dimensionality and further accelerate the subsequent diffusion process. For the reverse diffusion, we propose a novel Parallel Skip Transformer that performs cross-modal attention between the keyframe latents and text condition. To complete the motion sequence, we propose a text-guided Transformer designed to perform motion-in-filling, ensuring the preservation of both fidelity and adherence to the physical constraints of human motion. Experiments show that our method achieves state-of-the-art results on the HumanML3D dataset outperforming others on all R-precision metrics and MultiModal Distance. KeyMotion also achieves competitive performance on the KIT dataset, achieving the best results on Top3 R-precision, FID, and Diversity metrics.
Updated: 2024-05-24 11:12:37
标题: 使用基于关键帧的并行跳跃Transformer进行文本引导的3D人体运动生成
摘要: 文本驱动的人体运动生成是动画和人形机器人设计中的新兴任务。现有算法直接生成完整序列,这在计算上昂贵且容易出错,因为它们没有对关键姿势给予特别关注,而关键姿势几十年来一直是动画制作的基石。我们提出了KeyMotion,通过先生成关键帧再进行填充,生成与输入文本相对应的合理人体运动序列。我们使用带有Kullback-Leibler正则化的变分自动编码器(VAE)将关键帧投影到潜在空间中,以降低维度并进一步加速后续的扩散过程。对于逆扩散,我们提出了一种新颖的并行跳跃Transformer,在关键帧潜变量和文本条件之间执行跨模态注意力。为了补全运动序列,我们提出了一个文本引导的Transformer,设计用于执行运动填充,确保同时保持忠实度和遵守人体运动的物理约束。实验表明,我们的方法在HumanML3D数据集上实现了最先进的结果,在所有R-precision指标和MultiModal Distance上优于其他方法。KeyMotion在KIT数据集上也取得了有竞争力的表现,在Top3 R-precision、FID和多样性指标上取得了最佳结果。
更新时间: 2024-05-24 11:12:37
领域: cs.CV,cs.AI
Comparing remote sensing-based forest biomass mapping approaches using new forest inventory plots in contrasting forests in northeastern and southwestern China
Large-scale high spatial resolution aboveground biomass (AGB) maps play a crucial role in determining forest carbon stocks and how they are changing, which is instrumental in understanding the global carbon cycle, and implementing policy to mitigate climate change. The advent of the new space-borne LiDAR sensor, NASA's GEDI instrument, provides unparalleled possibilities for the accurate and unbiased estimation of forest AGB at high resolution, particularly in dense and tall forests, where Synthetic Aperture Radar (SAR) and passive optical data exhibit saturation. However, GEDI is a sampling instrument, collecting dispersed footprints, and its data must be combined with that from other continuous cover satellites to create high-resolution maps, using local machine learning methods. In this study, we developed local models to estimate forest AGB from GEDI L2A data, as the models used to create GEDI L4 AGB data incorporated minimal field data from China. We then applied LightGBM and random forest regression to generate wall-to-wall AGB maps at 25 m resolution, using extensive GEDI footprints as well as Sentinel-1 data, ALOS-2 PALSAR-2 and Sentinel-2 optical data. Through a 5-fold cross-validation, LightGBM demonstrated a slightly better performance than Random Forest across two contrasting regions. However, in both regions, the computation speed of LightGBM is substantially faster than that of the random forest model, requiring roughly one-third of the time to compute on the same hardware. Through the validation against field data, the 25 m resolution AGB maps generated using the local models developed in this study exhibited higher accuracy compared to the GEDI L4B AGB data. We found in both regions an increase in error as slope increased. The trained models were tested on nearby but different regions and exhibited good performance.
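The model comparison itself is a standard cross-validated regression benchmark. A minimal sketch with synthetic placeholder features (standing in for GEDI, SAR, and optical predictors) and an invented AGB target might look as follows, assuming the lightgbm and scikit-learn packages.

```python
import numpy as np
import lightgbm as lgb
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(0)
# Placeholder features: GEDI metrics plus SAR/optical bands; target: plot AGB (Mg/ha).
X = rng.normal(size=(2000, 20))
y = 100 + 20 * X[:, 0] + rng.normal(0, 10, 2000)

cv = KFold(n_splits=5, shuffle=True, random_state=0)
for name, model in [
    ("LightGBM", lgb.LGBMRegressor(n_estimators=500, learning_rate=0.05)),
    ("RandomForest", RandomForestRegressor(n_estimators=500, n_jobs=-1)),
]:
    r2 = cross_val_score(model, X, y, cv=cv, scoring="r2")
    print(f"{name}: mean 5-fold R2 = {r2.mean():.3f}")
```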
Updated: 2024-05-24 11:10:58
标题: 比较利用新的森林清查样地在中国东北和西南地区对比森林进行遥感森林生物量制图方法
摘要: 大规模高空间分辨率地上生物量(AGB)图在确定森林碳储量及其变化中发挥着关键作用,这对于理解全球碳循环并实施减缓气候变化政策至关重要。新型星载LiDAR传感器,即NASA的GEDI仪器,为在高分辨率上准确且无偏地估计森林AGB提供了无与伦比的可能性,尤其是在合成孔径雷达(SAR)和被动光学数据会出现饱和的茂密高大森林中。然而,GEDI是一种采集离散足迹的采样仪器,其数据必须与其他连续覆盖卫星的数据结合,并使用局部机器学习方法来创建高分辨率地图。在本研究中,我们开发了用于从GEDI L2A数据估计森林AGB的局部模型,因为用于生成GEDI L4 AGB数据的模型仅包含了极少来自中国的地面数据。然后,我们应用LightGBM和随机森林回归,利用大量GEDI足迹以及Sentinel-1数据、ALOS-2 PALSAR-2和Sentinel-2光学数据,生成了25米分辨率的全覆盖(wall-to-wall)AGB地图。通过5折交叉验证,LightGBM在两个对比区域中的表现均略优于随机森林。然而,在两个区域中,LightGBM的计算速度明显快于随机森林模型,在相同硬件上只需要大约三分之一的时间。通过与地面数据的验证,本研究开发的局部模型生成的25米分辨率AGB地图相比GEDI L4B AGB数据具有更高的准确性。我们发现在两个地区中,误差均随坡度增加而增大。训练好的模型在邻近但不同的地区进行了测试,并表现出良好的性能。
更新时间: 2024-05-24 11:10:58
领域: cs.CV,cs.LG,eess.IV
HLDC: Hindi Legal Documents Corpus
Many populous countries including India are burdened with a considerable backlog of legal cases. Development of automated systems that could process legal documents and augment legal practitioners can mitigate this. However, there is a dearth of high-quality corpora that is needed to develop such data-driven systems. The problem gets even more pronounced in the case of low resource languages such as Hindi. In this resource paper, we introduce the Hindi Legal Documents Corpus (HLDC), a corpus of more than 900K legal documents in Hindi. Documents are cleaned and structured to enable the development of downstream applications. Further, as a use-case for the corpus, we introduce the task of bail prediction. We experiment with a battery of models and propose a Multi-Task Learning (MTL) based model for the same. MTL models use summarization as an auxiliary task along with bail prediction as the main task. Experiments with different models are indicative of the need for further research in this area. We release the corpus and model implementation code with this paper: https://github.com/Exploration-Lab/HLDC
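A stripped-down version of the MTL idea, with a shared encoder, a bail-prediction head as the main task, and a toy token-level head standing in for the summarization decoder, could look like the following PyTorch sketch; vocabulary size, architecture, and the 0.5 task weight are all assumptions rather than the paper's configuration.

```python
import torch
import torch.nn as nn

VOCAB, EMB, HID = 32_000, 128, 256   # hypothetical sizes

class BailMTL(nn.Module):
    """Shared encoder; bail prediction is the main task, summarization the auxiliary."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, EMB)
        self.encoder = nn.GRU(EMB, HID, batch_first=True)
        self.bail_head = nn.Linear(HID, 2)        # granted / denied
        self.summ_head = nn.Linear(HID, VOCAB)    # toy next-token summarization head

    def forward(self, tokens):
        h, _ = self.encoder(self.embed(tokens))
        return self.bail_head(h[:, -1]), self.summ_head(h)

model = BailMTL()
tokens = torch.randint(0, VOCAB, (4, 64))         # a batch of legal documents
bail_logits, summ_logits = model(tokens)

bail_loss = nn.functional.cross_entropy(bail_logits, torch.tensor([0, 1, 1, 0]))
summ_loss = nn.functional.cross_entropy(          # predict shifted input as a stand-in
    summ_logits[:, :-1].reshape(-1, VOCAB), tokens[:, 1:].reshape(-1))
loss = bail_loss + 0.5 * summ_loss                # weighted joint MTL objective
```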
Updated: 2024-05-24 11:07:12
标题: HLDC: 印地语法律文件语料库
摘要: 许多人口众多的国家,包括印度,在法律案件方面面临着相当大的积压。开发能够处理法律文件并增强法律从业者的自动化系统可以缓解这种情况。然而,缺乏需要开发这种数据驱动系统所需的高质量语料库。在印地语等资源稀缺语言的情况下,问题变得更加突出。在这篇资源论文中,我们介绍了印地语法律文件语料库(HLDC),这是一个包含超过900K份印地语法律文件的语料库。文件经过清理和结构化,以便开发下游应用程序。此外,作为语料库的一个用例,我们介绍了保释预测任务。我们尝试了一系列模型,并提出了一个基于多任务学习(MTL)的模型。MTL模型将摘要作为辅助任务,同时将保释预测作为主要任务。对不同模型的实验表明需要进一步研究这一领域。我们在本文中发布了语料库和模型实现代码:https://github.com/Exploration-Lab/HLDC
更新时间: 2024-05-24 11:07:12
领域: cs.CL,cs.AI,cs.LG
Repelling Random Walks
We present a novel quasi-Monte Carlo mechanism to improve graph-based sampling, coined repelling random walks. By inducing correlations between the trajectories of an interacting ensemble such that their marginal transition probabilities are unmodified, we are able to explore the graph more efficiently, improving the concentration of statistical estimators whilst leaving them unbiased. The mechanism has a trivial drop-in implementation. We showcase the effectiveness of repelling random walks in a range of settings including estimation of graph kernels, the PageRank vector and graphlet concentrations. We provide detailed experimental evaluation and robust theoretical guarantees. To our knowledge, repelling random walks constitute the first rigorously studied quasi-Monte Carlo scheme correlating the directions of walkers on a graph, inviting new research in this exciting nascent domain.
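One simple coupling with this flavour: walkers that currently share a node draw their next neighbours without replacement via a shared random permutation, so each walker's marginal transition stays uniform while their joint choices are anti-correlated. The sketch below implements that coupling on a toy graph; the paper's mechanism differs in its details.

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)

# A small undirected graph as an adjacency list (illustrative).
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2]}

def step(positions):
    """One coupled step: walkers sharing a node draw neighbours without replacement."""
    groups = defaultdict(list)
    for walker, node in enumerate(positions):
        groups[node].append(walker)
    nxt = list(positions)
    for node, walkers in groups.items():
        perm = rng.permutation(adj[node])   # shared permutation of the neighbours
        for i, walker in enumerate(walkers):
            # perm[j] is uniform over neighbours for every fixed j, so each
            # walker's marginal transition is unchanged; the joint choices of
            # co-located walkers are anti-correlated (they repel).
            nxt[walker] = int(perm[i % len(perm)])
    return nxt

positions = [0, 0, 0, 2]                    # an ensemble of four coupled walkers
for _ in range(10):
    positions = step(positions)
print(positions)
```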
Updated: 2024-05-24 11:05:47
标题: 击退随机漫步
摘要: 我们提出了一种新颖的拟蒙特卡洛机制,用于改进基于图的抽样,被称为排斥随机游走。通过在相互作用集合的轨迹之间引入相关性,使它们的边际转移概率保持不变,我们能够更有效地探索图,提高统计估计的集中度,同时保持其无偏性。这种机制具有一种简单的实现方法。我们展示了排斥随机游走在估计图核、PageRank向量和图形集浓度等各种环境中的有效性。我们提供了详细的实验评估和稳健的理论保证。据我们所知,排斥随机游走构成了第一个严格研究的拟蒙特卡洛方案,它们在图上相关步行者的方向,为这个激动人心的新兴领域带来了新的研究。
更新时间: 2024-05-24 11:05:47
领域: stat.ML,cs.LG
Hybrid Context Retrieval Augmented Generation Pipeline: LLM-Augmented Knowledge Graphs and Vector Database for Accreditation Reporting Assistance
In higher education, accreditation is a quality assurance process, where an institution demonstrates a commitment to delivering high quality programs and services to their students. For business schools nationally and internationally the Association to Advance Collegiate Schools of Business (AACSB) accreditation is the gold standard. For a business school to receive and subsequently maintain accreditation, the school must undertake a rigorous, time consuming reporting and peer review process, to demonstrate alignment with the AACSB Standards. For this project we create a hybrid context retrieval augmented generation pipeline that can assist in the documentation alignment and reporting process necessary for accreditation. We implement both a vector database and knowledge graph, as knowledge stores containing both institutional data and AACSB Standard data. The output of the pipeline can be used by institution stakeholders to build their accreditation report, dually grounded by the context from the knowledge stores. To develop our knowledge graphs we utilized both a manual construction process as well as an LLM Augmented Knowledge Graph approach. We evaluated the pipeline using the RAGAs framework and observed optimal performance on answer relevancy and answer correctness metrics.
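Schematically, the pipeline boils down to retrieving from both stores and grounding one prompt in the union of the results. The sketch below hand-rolls the vector store, the embedding function, and the knowledge graph with toy contents, so every name and value in it is a placeholder rather than the project's actual implementation.

```python
import numpy as np

# Toy knowledge stores (contents are illustrative placeholders).
vector_db = {                                  # chunk text -> embedding
    "Standard 4 requires curriculum management.": np.array([0.9, 0.1]),
    "The school reports 85% placement.":          np.array([0.2, 0.8]),
}
kg = {("Standard 4", "requires"): "curriculum management documentation"}

def embed(text):                               # stand-in for a real embedding model
    return np.array([text.count("Standard") + 0.1, text.count("placement") + 0.1])

def retrieve(query, k=1):
    """Hybrid retrieval: top-k vector chunks plus matching knowledge-graph facts."""
    q = embed(query)
    scored = sorted(vector_db, key=lambda t: -float(
        vector_db[t] @ q / (np.linalg.norm(vector_db[t]) * np.linalg.norm(q))))
    chunks = scored[:k]
    facts = [f"{s} {p}: {o}" for (s, p), o in kg.items() if s in query]
    return chunks, facts

chunks, facts = retrieve("What does Standard 4 require?")
prompt = ("Answer using the context below.\n"
          f"Documents: {chunks}\nKnowledge graph facts: {facts}\nQuestion: ...")
```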
Updated: 2024-05-24 11:05:45
标题: 混合上下文检索增强生成管道:LLM增强知识图和矢量数据库用于认证报告辅助
摘要: 在高等教育中,认证是一个质量保证过程,机构通过它展示为学生提供高质量课程和服务的承诺。对于国内外的商学院来说,AACSB(Association to Advance Collegiate Schools of Business)认证是金标准。为了获得并维持认证,商学院必须进行严格、耗时的报告和同行评审过程,以证明与AACSB标准的一致性。在这个项目中,我们创建了一个混合上下文检索增强生成管道,可以辅助认证所需的文档对齐和报告过程。我们实现了向量数据库和知识图谱,作为同时包含机构数据和AACSB标准数据的知识存储。管道的输出可以被机构利益相关者用来构建他们的认证报告,并由知识存储中的上下文提供双重支撑。为了构建我们的知识图谱,我们既使用了手动构建过程,也使用了LLM增强知识图谱方法。我们使用RAGAs框架评估了管道,并观察到在答案相关性和答案正确性指标上的最佳表现。
更新时间: 2024-05-24 11:05:45
领域: cs.IR,cs.AI
Biometrics and Behavioral Modelling for Detecting Distractions in Online Learning
In this article, we explore computer vision approaches to detect abnormal head pose during e-learning sessions and we introduce a study on the effects of mobile phone usage during these sessions. We utilize behavioral data collected from 120 learners monitored while participating in MOOC learning sessions. Our study focuses on the influence of phone-usage events on behavior and physiological responses, specifically attention, heart rate, and meditation, before, during, and after phone usage. Additionally, we propose an approach for estimating head pose events using images taken by the webcam during the MOOC learning sessions to detect phone-usage events. Our hypothesis suggests that head posture undergoes significant changes when learners interact with a mobile phone, contrasting with the typical behavior seen when learners face a computer during e-learning sessions. We propose an approach designed to detect deviations in head posture from the average observed during a learner's session, operating as a semi-supervised method. This system flags events indicating alterations in head posture for subsequent human review and selection of mobile phone usage occurrences, with a sensitivity over 90%.
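The flagging logic can be approximated as a per-frame deviation score against the session's own statistics. The sketch below, with synthetic yaw/pitch/roll angles and an arbitrary 3-sigma threshold, illustrates the semi-supervised flow: the system flags candidate events and leaves the final labelling to a human reviewer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-frame head pose angles (yaw, pitch, roll) from a webcam session.
pose = rng.normal(0, 3, (3600, 3))         # mostly facing the screen
pose[1200:1260] += np.array([0, -25, 0])   # a phone-glance episode (pitch drops)

mu, sigma = pose.mean(axis=0), pose.std(axis=0)
z = np.abs((pose - mu) / sigma).max(axis=1)   # per-frame deviation from session average

flagged = np.flatnonzero(z > 3.0)             # candidate events for human review
print(f"{flagged.size} frames flagged, first at frame {flagged.min()}")
```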
Updated: 2024-05-24 11:02:55
标题: 生物特征和行为模型用于检测在线学习中的干扰
摘要: 在这篇文章中,我们探讨了计算机视觉方法来检测在线学习会话期间异常头部姿势,并介绍了一项关于手机使用对这些会话的影响的研究。我们利用从120名学习者收集的行为数据,这些学习者在参加慕课学习会话时受到监控。我们的研究重点关注手机使用事件对行为和生理反应(特别是注意力、心率和冥想)的影响,包括手机使用前、期间和之后。此外,我们提出了一种方法,利用在慕课学习会话期间通过网络摄像头拍摄的图像来检测手机使用事件的头部姿势事件。我们的假设表明,当学习者与手机互动时,头部姿势会发生显著变化,与学习者在电脑面前进行在线学习会话时所见到的典型行为形成对比。我们提出了一种旨在检测头部姿势偏离学习者会话期间平均水平的方法,作为一种半监督方法。该系统会标记表明头部姿势改变的事件,以供随后的人工审查和选择手机使用事件,灵敏度超过90%。
更新时间: 2024-05-24 11:02:55
领域: cs.CV,cs.HC,cs.LG
Airship Formations for Animal Motion Capture and Behavior Analysis
Using UAVs for wildlife observation and motion capture offers manifold advantages for studying animals in the wild, especially grazing herds in open terrain. The aerial perspective allows observation at a scale and depth that is not possible on the ground, offering new insights into group behavior. However, the very nature of wildlife field-studies puts traditional fixed wing and multi-copter systems to their limits: limited flight time, noise and safety aspects affect their efficacy, where lighter than air systems can remain on station for many hours. Nevertheless, airships are challenging from a ground handling perspective as well as from a control point of view, being voluminous and highly affected by wind. In this work, we showcase a system designed to use airship formations to track, follow, and visually record wild horses from multiple angles, including airship design, simulation, control, on board computer vision, autonomous operation and practical aspects of field experiments.
Updated: 2024-05-24 10:59:48
标题: 空中飞艇编队用于动物运动捕捉和行为分析
摘要: 利用无人机进行野生动物观察和运动捕捉为研究野生动物,尤其是在开阔地带的放牧群体提供了多重优势。空中视角允许观察尺度和深度,这在地面上是不可能的,为群体行为提供了新的见解。然而,野生动物实地研究的性质使传统固定翼和多旋翼系统受到限制:有限的飞行时间、噪音和安全方面影响它们的效力,而比空气轻的系统可以在站点停留多个小时。然而,飞艇在地面处理和控制观点上具有挑战性,因为它们体积庞大,受风影响较大。在这项工作中,我们展示了一个设计为使用飞艇编队跟踪、追随和视觉记录野生马匹的系统,包括飞艇设计、模拟、控制、机载计算机视觉、自主操作和实地实验的实际方面。
更新时间: 2024-05-24 10:59:48
领域: cs.RO,cs.AI,cs.SY,eess.SY
Counterfactual Explanations for Linear Optimization
The concept of counterfactual explanations (CE) has emerged as one of the important concepts to understand the inner workings of complex AI systems. In this paper, we translate the idea of CEs to linear optimization and propose, motivate, and analyze three different types of CEs: strong, weak, and relative. While deriving strong and weak CEs appears to be computationally intractable, we show that calculating relative CEs can be done efficiently. By detecting and exploiting the hidden convex structure of the optimization problem that arises in the latter case, we show that obtaining relative CEs can be done in the same magnitude of time as solving the original linear optimization problem. This is confirmed by an extensive numerical experiment study on the NETLIB library.
Updated: 2024-05-24 10:58:00
标题: 线性优化的反事实解释
摘要: 反事实解释(CE)的概念已经成为理解复杂人工智能系统内部运作的重要概念之一。在本文中,我们将CE的概念转化为线性优化,并提出、论证和分析了三种不同类型的CE:强CE、弱CE和相对CE。尽管推导强CE和弱CE似乎是计算上难以处理的,我们表明计算相对CE可以高效地完成。通过检测和利用后一种情况中出现的优化问题的隐藏凸结构,我们表明获得相对CE可以在与解决原始线性优化问题相同数量级的时间内完成。这一点得到了对NETLIB库进行大量数值实验研究的确认。
更新时间: 2024-05-24 10:58:00
领域: math.OC,cs.LG
General Graph Random Features
We propose a novel random walk-based algorithm for unbiased estimation of arbitrary functions of a weighted adjacency matrix, coined universal graph random features (u-GRFs). This includes many of the most popular examples of kernels defined on the nodes of a graph. Our algorithm enjoys subquadratic time complexity with respect to the number of nodes, overcoming the notoriously prohibitive cubic scaling of exact graph kernel evaluation. It can also be trivially distributed across machines, permitting learning on much larger networks. At the heart of the algorithm is a modulation function which upweights or downweights the contribution from different random walks depending on their lengths. We show that by parameterising it with a neural network we can obtain u-GRFs that give higher-quality kernel estimates or perform efficient, scalable kernel learning. We provide robust theoretical analysis and support our findings with experiments including pointwise estimation of fixed graph kernels, solving non-homogeneous graph ordinary differential equations, node clustering and kernel regression on triangular meshes.
Updated: 2024-05-24 10:57:51
标题: 一般图随机特征
摘要: 我们提出了一种基于随机游走的算法,用于对加权邻接矩阵的任意函数进行无偏估计,命名为通用图随机特征(u-GRFs)。这包括许多在图的节点上定义的最流行的核函数示例。我们的算法具有关于节点数量的次二次时间复杂度,克服了精确图核评估的臭名昭著的立方尺度。它还可以轻松分布到不同的机器上,允许在更大的网络上进行学习。算法的核心是一个调制函数,根据不同随机游走的长度增加或减少贡献。我们表明,通过用神经网络对其进行参数化,我们可以获得给出更高质量核估计或执行高效可扩展核学习的u-GRFs。我们提供了稳健的理论分析,并通过实验证明我们的发现,包括固定图核的逐点估计、解决非齐次图常微分方程、节点聚类以及三角网格上的核回归。
更新时间: 2024-05-24 10:57:51
领域: stat.ML,cs.LG
Counterexample-Guided Repair of Reinforcement Learning Systems Using Safety Critics
Naively trained Deep Reinforcement Learning agents may fail to satisfy vital safety constraints. To avoid costly retraining, we may desire to repair a previously trained reinforcement learning agent to obviate unsafe behaviour. We devise a counterexample-guided repair algorithm for repairing reinforcement learning systems leveraging safety critics. The algorithm jointly repairs a reinforcement learning agent and a safety critic using gradient-based constrained optimisation.
Updated: 2024-05-24 10:56:51
标题: 利用安全批评家对强化学习系统进行反例引导修复
摘要: 经过简单训练的深度强化学习代理可能无法满足重要的安全约束。为了避免昂贵的重新训练,我们可能希望修复先前训练过的强化学习代理,以消除不安全的行为。我们设计了一种基于反例引导的修复算法,用于修复利用安全批评家的强化学习系统。该算法通过基于梯度的约束优化同时修复强化学习代理和安全批评家。
更新时间: 2024-05-24 10:56:51
领域: cs.LG,cs.LO
E(n) Equivariant Topological Neural Networks
Graph neural networks excel at modeling pairwise interactions, but they cannot flexibly accommodate higher-order interactions and features. Topological deep learning (TDL) has emerged recently as a promising tool for addressing this issue. TDL enables the principled modeling of arbitrary multi-way, hierarchical higher-order interactions by operating on combinatorial topological spaces, such as simplicial or cell complexes, instead of graphs. However, little is known about how to leverage geometric features such as positions and velocities for TDL. This paper introduces E(n)-Equivariant Topological Neural Networks (ETNNs), which are E(n)-equivariant message-passing networks operating on combinatorial complexes, formal objects unifying graphs, hypergraphs, simplicial, path, and cell complexes. ETNNs incorporate geometric node features while respecting rotation and translation equivariance. Moreover, ETNNs are natively ready for settings with heterogeneous interactions. We provide a theoretical analysis to show the improved expressiveness of ETNNs over architectures for geometric graphs. We also show how several E(n) equivariant variants of TDL models can be directly derived from our framework. The broad applicability of ETNNs is demonstrated through two tasks of vastly different nature: i) molecular property prediction on the QM9 benchmark and ii) land-use regression for hyper-local estimation of air pollution with multi-resolution irregular geospatial data. The experiment results indicate that ETNNs are an effective tool for learning from diverse types of richly structured data, highlighting the benefits of principled geometric inductive bias.
Updated: 2024-05-24 10:55:38
标题: E(n)等变拓扑神经网络
摘要: 图神经网络在建模成对交互方面表现出色,但无法灵活地适应更高阶的交互和特征。拓扑深度学习(TDL)最近作为解决这一问题的有希望工具出现。TDL通过在组合拓扑空间(如单纯复形或细胞复形)上操作,而不是在图上,实现了对任意多向、分层高阶交互的原则性建模。然而,目前还很少有关于如何在TDL中利用位置和速度等几何特征的研究。本文介绍了E(n)-等变拓扑神经网络(ETNNs),这是在组合复形上运行的E(n)-等变消息传递网络;组合复形是统一了图、超图、单纯复形、路径复形和细胞复形的形式化对象。ETNNs结合了几何节点特征,同时遵循旋转和平移等变性。此外,ETNNs天然适用于异质交互的场景。我们提供了理论分析,证明ETNNs相对于几何图架构具有更强的表达能力。我们还展示了如何直接从我们的框架中导出几种E(n)等变TDL模型的变体。通过两项性质迥异的任务展示了ETNNs的广泛适用性:i)在QM9基准上的分子性质预测;ii)利用多分辨率不规则地理空间数据进行超局部空气污染估计的土地利用回归。实验结果表明,ETNNs是从多种类型的丰富结构化数据中学习的有效工具,突显了原则性几何归纳偏置的益处。
更新时间: 2024-05-24 10:55:38
领域: cs.LG,cs.NE
Improving Simulation Regression Efficiency using a Machine Learning-based Method in Design Verification
The verification throughput is becoming a major bottleneck, since the complexity and size of SoC designs are ever increasing. Simply adding more CPU cores and running more tests in parallel will not scale anymore. This paper discusses various methods of improving verification throughput: ranking, and the new machine learning (ML) based technology introduced by Cadence, i.e. Xcelium ML. Both methods aim at getting comparable coverage in less CPU time by applying more efficient stimulus. Ranking selects the specific seeds that turned out to produce the largest coverage in previous simulations, while Xcelium ML generates optimized patterns by finding correlations between randomization points and the achieved coverage of previous regressions. Quantified results as well as the pros and cons of each approach are discussed using three actual industry projects as examples. Both the Xcelium ML and ranking methods consistently gave comparable compression and speedup factors of around 3. However, the optimized ML-based regressions simulated new random scenarios, occasionally producing a coverage regain of more than 100%. Finally, a methodology is proposed to use Xcelium ML efficiently throughout product development.
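Ranking, as described, amounts to replaying the seeds that contributed the most coverage in earlier runs. One natural realization is a greedy set-cover pass over per-seed functional coverage, sketched below with invented seeds and coverage bins.

```python
# Hypothetical per-seed functional coverage from a previous regression run.
seed_cov = {
    1101: {"pkt_ok", "pkt_err", "fifo_full"},
    1102: {"pkt_ok", "retry"},
    1103: {"fifo_full", "retry", "timeout"},
    1104: {"pkt_ok"},
}
universe = set().union(*seed_cov.values())

selected, covered = [], set()
while covered != universe:
    # Pick the seed adding the most not-yet-covered bins (greedy set cover).
    best = max(seed_cov, key=lambda s: len(seed_cov[s] - covered))
    gain = seed_cov[best] - covered
    if not gain:                  # remaining seeds add nothing new
        break
    selected.append(best)
    covered |= gain

print("replay order:", selected)  # seeds to run first in the next regression
```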
Updated: 2024-05-24 10:51:51
标题: 使用基于机器学习的方法提高设计验证中仿真回归效率
摘要: 验证吞吐量正在成为一个主要挑战瓶颈,因为SoC设计的复杂性和规模仍在不断增加。简单地添加更多CPU核心并并行运行更多测试将不再具有可扩展性。本文讨论了改进验证吞吐量的各种方法:排名和Cadence引入的新的基于机器学习(ML)的技术,即Xcelium ML。这两种方法旨在通过应用更高效的刺激,在更少的CPU时间内获得可比较的覆盖率。排名选择了在先前的模拟中简单地产生了最大覆盖率的特定种子,而Xcelium ML通过找到随机化点和先前回归的实现覆盖率之间的相关性,生成了优化的模式。本文讨论了每种方法的定量结果以及优缺点,以三个实际行业项目为例。Xcelium ML和排名方法在压缩率和加速比方面持续给出了大约3的可比较值。但基于优化的ML回归偶尔模拟出新的随机场景,产生了超过100%的覆盖率恢复。最后,提出了一种方法,以有效地在整个产品开发过程中使用Xcelium ML。
更新时间: 2024-05-24 10:51:51
领域: cs.LG,cs.AR
Enhancing Pollinator Conservation towards Agriculture 4.0: Monitoring of Bees through Object Recognition
In an era of rapid climate change and its adverse effects on food production, technological intervention to monitor pollinators is of paramount importance for environmental monitoring, conservation, and global food security. The survival of the human species depends on the conservation of pollinators. This article explores the use of Computer Vision and Object Recognition to autonomously track and report bee behaviour from images. A novel dataset of 9664 images containing bees is extracted from video streams and annotated with bounding boxes. With training, validation and testing sets (6722, 1915, and 997 images, respectively), the results of the COCO-based YOLO model fine-tuning approaches show that YOLOv5m is the most effective approach in terms of recognition accuracy. However, YOLOv5s was shown to be the most optimal for real-time bee detection, with an average processing and inference time of 5.1 ms per video frame, at the cost of slightly lower detection ability. The trained model is then packaged within an explainable AI interface, which converts detection events into timestamped reports and charts, with the aim of facilitating use by non-technical users, such as expert stakeholders from the apiculture industry, towards informing responsible consumption and production.
Updated: 2024-05-24 10:45:24
标题: 增强面向农业4.0的传粉者保护:通过物体识别监测蜜蜂
摘要: 在一个快速气候变化及其对食物生产的不利影响的时代,技术干预监测传粉动物保护对于环境监测和全球食品安全至关重要。人类的生存取决于传粉动物的保护。本文探讨了使用计算机视觉和目标识别来自动跟踪和报告蜜蜂行为的方法。从视频流中提取了包含蜜蜂的新颖数据集共9664张图像,并标注了边界框。通过训练、验证和测试集(分别为6722、1915和997张图像),基于COCO的YOLO模型微调方法的结果显示,YOLOv5m是在识别准确性方面最有效的方法。然而,YOLOv5s被证明是最适合实时蜜蜂检测的方法,每帧视频处理和推理的平均时间为5.1毫秒,但牺牲了一定的能力。然后将训练好的模型打包到一个可解释的人工智能界面中,将检测事件转换为带时间戳的报告和图表,旨在方便非技术用户(如养蜂业的专业利益相关者)使用,以促进负责任的消费和生产。
更新时间: 2024-05-24 10:45:24
领域: cs.CV,cs.LG
AuthNet: Neural Network with Integrated Authentication Logic
Model stealing, i.e., unauthorized access and exfiltration of deep learning models, has become one of the major threats. Proprietary models may be protected by access controls and encryption. However, in reality, these measures can be compromised due to system breaches, query-based model extraction or a disgruntled insider. Security hardening of neural networks is also suffering from limits, for example, model watermarking is passive, cannot prevent the occurrence of piracy and not robust against transformations. To this end, we propose a native authentication mechanism, called AuthNet, which integrates authentication logic as part of the model without any additional structures. Our key insight is to reuse redundant neurons with low activation and embed authentication bits in an intermediate layer, called a gate layer. Then, AuthNet fine-tunes the layers after the gate layer to embed authentication logic so that only inputs with special secret key can trigger the correct logic of AuthNet. It exhibits two intuitive advantages. It provides the last line of defense, i.e., even being exfiltrated, the model is not usable as the adversary cannot generate valid inputs without the key. Moreover, the authentication logic is difficult to inspect and identify given millions or billions of neurons in the model. We theoretically demonstrate the high sensitivity of AuthNet to the secret key and its high confusion for unauthorized samples. AuthNet is compatible with any convolutional neural network, where our extensive evaluations show that AuthNet successfully achieves the goal in rejecting unauthenticated users (whose average accuracy drops to 22.03%) with a trivial accuracy decrease (1.18% on average) for legitimate users, and is robust against model transformation and adaptive attacks.
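The gate-layer idea can be caricatured in a few lines: reserve a handful of low-activation neurons in an intermediate layer and make downstream behaviour depend on whether their pattern matches the embedded key bits. The sketch below emulates that dependence with an explicit check for readability; in AuthNet itself the dependence is baked into the fine-tuned weights rather than coded as a comparison, and all sizes and the key pattern are invented.

```python
import torch
import torch.nn as nn

KEY_BITS = torch.tensor([1., 0., 1., 1.])   # invented 4-bit key pattern

class GatedNet(nn.Module):
    """Toy gate layer: the last 4 neurons are reserved to carry key bits."""
    def __init__(self):
        super().__init__()
        self.feat = nn.Linear(16, 12)        # 8 task neurons + 4 reserved neurons
        self.head = nn.Linear(12, 2)

    def forward(self, x):
        h = torch.relu(self.feat(x))
        gate = h[:, 8:]                      # activations of the reserved neurons
        # AuthNet bakes this dependence into fine-tuned weights; we emulate it
        # with an explicit comparison so the mechanism is visible.
        ok = ((gate > 0.5).float() == KEY_BITS).all(dim=1, keepdim=True).float()
        return self.head(h) * ok             # wrong key -> degraded output

logits = GatedNet()(torch.randn(4, 16))      # untuned net: most inputs fail the gate
```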
Updated: 2024-05-24 10:44:22
标题: AuthNet:集成认证逻辑的神经网络
摘要: 模型盗窃,即未经授权访问和外泄深度学习模型,已成为主要威胁之一。专有模型可能受到访问控制和加密的保护。然而,在现实中,这些措施可能会因系统漏洞、基于查询的模型提取或心怀不满的内部人员而失效。神经网络的安全加固也存在局限,例如,模型水印是被动的,无法阻止盗版的发生,并且对模型变换不具有鲁棒性。因此,我们提出了一种原生的身份验证机制,称为AuthNet,它将身份验证逻辑集成为模型的一部分,而无需任何额外的结构。我们的关键见解是重用激活值较低的冗余神经元,并在一个称为门层的中间层中嵌入身份验证位。然后,AuthNet微调门层之后的各层以嵌入身份验证逻辑,使得只有携带特殊密钥的输入才能触发AuthNet的正确逻辑。它展示了两个直观的优势。它提供了最后一道防线,即即使被外泄,模型也无法使用,因为对手没有密钥就无法生成有效输入。此外,在模型包含数百万甚至数十亿神经元的情况下,身份验证逻辑难以被检查和识别。我们在理论上证明了AuthNet对密钥的高敏感性及其对未授权样本的高混淆性。AuthNet与任何卷积神经网络兼容;我们的大量评估表明,AuthNet成功实现了拒绝未认证用户的目标(其平均准确率下降至22.03%),而合法用户的准确率仅有微小下降(平均1.18%),并且对模型变换和自适应攻击具有鲁棒性。
更新时间: 2024-05-24 10:44:22
领域: cs.CR
Smoothed Online Classification can be Harder than Batch Classification
We study online classification under smoothed adversaries. In this setting, at each time point, the adversary draws an example from a distribution that has a bounded density with respect to a fixed base measure, which is known apriori to the learner. For binary classification and scalar-valued regression, previous works \citep{haghtalab2020smoothed, block2022smoothed} have shown that smoothed online learning is as easy as learning in the iid batch setting under PAC model. However, we show that smoothed online classification can be harder than the iid batch classification when the label space is unbounded. In particular, we construct a hypothesis class that is learnable in the iid batch setting under the PAC model but is not learnable under the smoothed online model. Finally, we identify a condition that ensures that the PAC learnability of a hypothesis class is sufficient for its smoothed online learnability.
Updated: 2024-05-24 10:37:39
标题: 平滑在线分类可能比批处理分类更难
摘要: 我们研究在平滑对手下的在线分类。在这个设置中,每个时间点,对手从一个相对于固定基准测度有界密度的分布中抽取一个示例,该基准测度是对学习者事先已知的。对于二元分类和标量回归,先前的作品已经表明,在PAC模型下,平滑的在线学习就像在iid批处理设置下学习一样容易。然而,我们表明,当标签空间是无界时,平滑的在线分类可能比iid批处理分类更困难。特别地,我们构建了一个假设类,在PAC模型下的iid批处理设置中是可学习的,但在平滑的在线模型下是不可学习的。最后,我们确定了一个条件,确保假设类的PAC可学习性足以保证其平滑的在线可学习性。
更新时间: 2024-05-24 10:37:39
领域: cs.LG
Lost in the Averages: A New Specific Setup to Evaluate Membership Inference Attacks Against Machine Learning Models
Membership Inference Attacks (MIAs) are widely used to evaluate the propensity of a machine learning (ML) model to memorize an individual record and the privacy risk releasing the model poses. MIAs are commonly evaluated similarly to ML models: the MIA is performed on a test set of models trained on datasets unseen during training, which are sampled from a larger pool, $D_{eval}$. The MIA is evaluated across all datasets in this test set, and is thus evaluated across the distribution of samples from $D_{eval}$. While this was a natural extension of ML evaluation to MIAs, recent work has shown that a record's risk heavily depends on its specific dataset. For example, outliers are particularly vulnerable, yet an outlier in one dataset may not be one in another. The sources of randomness currently used to evaluate MIAs may thus lead to inaccurate individual privacy risk estimates. We propose a new, specific evaluation setup for MIAs against ML models, using weight initialization as the sole source of randomness. This allows us to accurately evaluate the risk associated with the release of a model trained on a specific dataset. Using SOTA MIAs, we empirically show that the risk estimates given by the current setup lead to many records being misclassified as low risk. We derive theoretical results which, combined with empirical evidence, suggest that the risk calculated in the current setup is an average of the risks specific to each sampled dataset, validating our use of weight initialization as the only source of randomness. Finally, we consider an MIA with a stronger adversary leveraging information about the target dataset to infer membership. Taken together, our results show that current MIA evaluation is averaging the risk across datasets leading to inaccurate risk estimates, and the risk posed by attacks leveraging information about the target dataset to be potentially underestimated.
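The proposed setup is straightforward to emulate: hold the dataset fixed and let weight initialization be the only source of randomness across the model population. The sketch below does this with scikit-learn MLPs (whose initialization is controlled by random_state) and a simple loss-based membership signal for one target record; the model family, seed counts, and synthetic data are all illustrative choices, not the paper's protocol.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))
y = (X[:, 0] + 0.5 * rng.normal(size=300) > 0).astype(int)
target = 0                                 # record whose membership we test

def target_losses(train_idx, seeds):
    """Per-seed loss on the target record; dataset fixed, only weight init varies."""
    out = []
    for s in seeds:
        clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=300,
                            random_state=s).fit(X[train_idx], y[train_idx])
        p = clf.predict_proba(X[target:target + 1])[0, y[target]]
        out.append(-np.log(max(p, 1e-9)))
    return np.array(out)

all_idx = np.arange(300)
in_losses = target_losses(all_idx, seeds=range(8))                  # member
out_losses = target_losses(all_idx[all_idx != target], seeds=range(8, 16))
print(f"member loss {in_losses.mean():.3f} vs non-member {out_losses.mean():.3f}")
```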
Updated: 2024-05-24 10:37:38
标题: 迷失在平均值中:一种新的特定设置用于评估针对机器学习模型的成员推断攻击
摘要: 成员推理攻击(MIAs)被广泛用于评估机器学习(ML)模型记忆个体记录的倾向以及发布模型可能带来的隐私风险。MIAs通常类似于ML模型进行评估:MIAs在训练期间未见的数据集上进行测试,这些数据集从更大的$D_{eval}$数据集中抽样。MIAs在测试集中的所有数据集上进行评估,因此在从$D_{eval}$中抽样的分布上进行评估。虽然这是对MIAs进行ML评估的自然扩展,但最近的研究表明,记录的风险在很大程度上取决于其特定数据集。例如,异常值特别容易受到攻击,但一个数据集中的异常值在另一个数据集中可能并不是异常值。目前用于评估MIAs的随机性来源可能导致个体隐私风险估计不准确。我们提出了一种新的,针对ML模型的MIAs的特定评估设置,仅使用权重初始化作为随机性来源。这使我们能够准确评估发布在特定数据集上训练的模型所带来的风险。使用SOTA MIAs,我们在实证上展示了当前设置所给出的风险估计导致许多记录被错误分类为低风险。我们推导出理论结果,结合实证证据,表明当前设置中计算的风险是对每个抽样数据集特定风险的平均值,验证了我们将权重初始化作为唯一随机性来源的做法。最后,我们考虑了一个利用有关目标数据集信息的更强大对手的MIAs,以推断成员身份。综合考虑,我们的研究结果表明,当前的MIAs评估是对数据集风险进行平均处理,导致风险估计不准确,并且利用有关目标数据集信息的攻击所带来的风险可能被低估。
更新时间: 2024-05-24 10:37:38
领域: cs.LG,cs.CR
Model-free reinforcement learning with noisy actions for automated experimental control in optics
Experimental control involves a lot of manual effort with non-trivial decisions for precise adjustments. Here, we study the automatic experimental alignment for coupling laser light into an optical fiber using reinforcement learning (RL). We face several real-world challenges, such as time-consuming training, partial observability, and noisy actions due to imprecision in the mirror steering motors. We show that we can overcome these challenges: To save time, we use a virtual testbed to tune our environment for dealing with partial observability and use relatively sample-efficient model-free RL algorithms like Soft Actor-Critic (SAC) or Truncated Quantile Critics (TQC). Furthermore, by fully training on the experiment, the agent learns directly to handle the noise present. In our extensive experimentation, we show that we are able to achieve 90% coupling, showcasing the effectiveness of our proposed approaches. We reach this efficiency, which is comparable to that of a human expert, without additional feedback loops despite the motors' inaccuracies. Our result is an example of the readiness of RL for real-world tasks. We consider RL a promising tool for reducing the workload in labs.
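In terms of tooling, the training loop reduces to a few lines with an off-the-shelf SAC implementation such as stable-baselines3; the sketch below substitutes a standard Gymnasium environment for the paper's custom fiber-coupling testbed, whose exact observation, action, and reward definitions are not reproduced here.

```python
import gymnasium as gym
from stable_baselines3 import SAC

# Stand-in environment; the paper's task is a custom fiber-coupling setup where
# actions are (noisy) mirror-motor commands and reward reflects coupling efficiency.
env = gym.make("Pendulum-v1")

model = SAC("MlpPolicy", env, learning_rate=3e-4, verbose=0)
model.learn(total_timesteps=20_000)        # pre-train, e.g. in a virtual testbed

obs, _ = env.reset()
action, _ = model.predict(obs, deterministic=True)
```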
Updated: 2024-05-24 10:36:23
标题: 面向光学自动实验控制的带噪声动作的无模型强化学习
摘要: 实验控制涉及大量手动工作,需要进行一些重要的决策以进行精确调整。在这里,我们研究了使用强化学习(RL)自动实验对准,将激光光束耦合到光纤中。我们面临着几个现实世界的挑战,比如耗时的训练、部分可观察性以及由于镜子转向电机不准确而导致的嘈杂动作。我们展示了我们可以克服这些挑战:为了节省时间,我们使用虚拟实验室来调整我们的环境,以应对部分可观察性,并使用相对高效的无模型RL算法,如软演员-评论家(SAC)或截断分位数评论家(TQC)。此外,通过完全在实验中训练,代理程序直接学习处理目前存在的噪音。在我们广泛的实验中,我们展示了我们能够实现90%的耦合效率,展示了我们提出的方法的有效性。我们达到了与人类专家相当的效率,尽管电机存在不准确性,但不需要额外的反馈回路。我们的结果是强化学习准备好应对现实世界任务的一个例子。我们认为强化学习是减少实验室工作量的一种有前途的工具。
更新时间: 2024-05-24 10:36:23
领域: cs.LG,physics.optics,J.2; I.2.1
Towards Bounding Causal Effects under Markov Equivalence
Predicting the effect of unseen interventions is a fundamental research question across the data sciences. It is well established that in general such questions cannot be answered definitively from observational data. This realization has fuelled a growing literature introducing various identifying assumptions, for example in the form of a causal diagram among relevant variables. In practice, this paradigm is still too rigid for many practical applications as it is generally not possible to confidently delineate the true causal diagram. In this paper, we consider the derivation of bounds on causal effects given only observational data. We propose to take as input a less informative structure known as a Partial Ancestral Graph, which represents a Markov equivalence class of causal diagrams and is learnable from data. In this more ``data-driven'' setting, we provide a systematic algorithm to derive bounds on causal effects that exploit the invariant properties of the equivalence class, and that can be computed analytically. We demonstrate our method with synthetic and real data examples.
Updated: 2024-05-24 10:28:48
标题: 迈向马尔可夫等价性下的因果效应界定
摘要: 预测看不见的干预效果是数据科学领域的一个基本研究问题。众所周知,通常情况下无法从观测数据中明确回答这类问题。这一认识推动了越来越多的文献介绍各种识别假设,例如在相关变量之间形成的因果图的形式。在实践中,这种范式对于许多实际应用来说仍然过于严格,因为通常不可能自信地划定真实的因果图。在本文中,我们考虑仅利用观测数据推导因果效应的界限。我们建议以一个较不具信息性的结构——部分祖先图——作为输入,该图代表因果图的马尔可夫等价类,并且可以从数据中学习。在这种更“数据驱动”的设置中,我们提供了一个系统性算法来推导利用等价类的不变特性计算因果效应的界限,可以进行解析计算。我们通过合成和真实数据示例展示了我们的方法。
更新时间: 2024-05-24 10:28:48
领域: stat.ML,cs.LG
Luban: Building Open-Ended Creative Agents via Autonomous Embodied Verification
Building open agents has always been the ultimate goal in AI research, and creative agents are the more enticing. Existing LLM agents excel at long-horizon tasks with well-defined goals (e.g., `mine diamonds' in Minecraft). However, they encounter difficulties on creative tasks with open goals and abstract criteria due to the inability to bridge the gap between them, thus lacking feedback for self-improvement in solving the task. In this work, we introduce autonomous embodied verification techniques for agents to fill the gap, laying the groundwork for creative tasks. Specifically, we propose the Luban agent target creative building tasks in Minecraft, which equips with two-level autonomous embodied verification inspired by human design practices: (1) visual verification of 3D structural speculates, which comes from agent synthesized CAD modeling programs; (2) pragmatic verification of the creation by generating and verifying environment-relevant functionality programs based on the abstract criteria. Extensive multi-dimensional human studies and Elo ratings show that the Luban completes diverse creative building tasks in our proposed benchmark and outperforms other baselines ($33\%$ to $100\%$) in both visualization and pragmatism. Additional demos on the real-world robotic arm show the creation potential of the Luban in the physical world.
Updated: 2024-05-24 10:25:59
标题: 鲁班:通过自主具身验证构建开放式创造性代理程序
摘要: 构建开放式智能体一直是人工智能研究的终极目标,而创造性智能体更具吸引力。现有的LLM智能体在具有明确定义目标的长时程任务(例如在Minecraft中"挖钻石")方面表现出色。然而,在具有开放目标和抽象标准的创造性任务上,它们由于无法弥合两者之间的差距而遇到困难,因此在解决任务时缺乏自我改进所需的反馈。在这项工作中,我们为智能体引入了自主具身验证技术来填补这一差距,为创造性任务奠定基础。具体地,我们提出了面向Minecraft创造性建筑任务的Luban智能体,它配备了受人类设计实践启发的两级自主具身验证:(1)对3D结构推测进行视觉验证,这些推测来自智能体合成的CAD建模程序;(2)根据抽象标准生成并验证与环境相关的功能程序,对创作进行实用性验证。广泛的多维人类研究和Elo评分表明,Luban在我们提出的基准测试中完成了多样的创造性建筑任务,并在可视化和实用性方面均优于其他基线(33%至100%)。在真实世界机械臂上的额外演示展示了Luban在物理世界中的创造潜力。
更新时间: 2024-05-24 10:25:59
领域: cs.AI
ORCA: A Global Ocean Emulator for Multi-year to Decadal Predictions
Ocean dynamics plays a crucial role in driving global weather and climate patterns. Accurate and efficient modeling of ocean dynamics is essential for improved understanding of complex ocean circulation and processes, for predicting climate variations and their associated teleconnections, and for addressing the challenges of climate change. While great efforts have been made to improve numerical Ocean General Circulation Models (OGCMs), accurate forecasting of global oceanic variations for multi-year remains to be a long-standing challenge. Here, we introduce ORCA (Oceanic Reliable foreCAst), the first data-driven model predicting global ocean circulation from multi-year to decadal time scales. ORCA accurately simulates the three-dimensional circulations and dynamics of the global ocean with high physical consistency. Hindcasts of key oceanic variables demonstrate ORCA's remarkable prediction skills in predicting ocean variations compared with state-of-the-art numerical OGCMs and abilities in capturing occurrences of extreme events at the subsurface ocean and ENSO vertical patterns. These results demonstrate the potential of data-driven ocean models for providing cheap, efficient, and accurate global ocean modeling and prediction. Moreover, ORCA stably and faithfully emulates ocean dynamics at decadal timescales, demonstrating its potential even for climate projections. The model will be available at https://github.com/OpenEarthLab/ORCA.
Updated: 2024-05-24 10:23:17
标题: ORCA:多年至十年预测的全球海洋模拟器
摘要: 海洋动力学在驱动全球天气和气候模式方面起着至关重要的作用。准确而高效地对海洋动力学建模对于改进对复杂海洋环流和过程的理解、预测气候变化及其相关的遥相关性,以及应对气候变化的挑战至关重要。尽管在改进数值海洋一般环流模型(OGCMs)方面已经付出了很大努力,但准确预测多年全球海洋变化仍然是一个长期存在的挑战。在这里,我们介绍了ORCA(Oceanic Reliable foreCAst),这是第一个从多年到十年时间尺度预测全球海洋环流的数据驱动模型。ORCA能够准确模拟全球海洋的三维环流和动力学,具有高物理一致性。对关键海洋变量的回测显示,与最先进的数值OGCMs相比,ORCA在预测海洋变化方面具有显著的预测能力,并且能够捕捉亚表层海洋和ENSO垂直模式的极端事件发生。这些结果表明,数据驱动的海洋模型有潜力提供廉价、高效、准确的全球海洋建模和预测。此外,ORCA在十年时间尺度上稳定且忠实地模拟海洋动力学,展示了即使用于气候预测也具有潜力。该模型将可在https://github.com/OpenEarthLab/ORCA 上获得。
更新时间: 2024-05-24 10:23:17
领域: physics.ao-ph,cs.AI,cs.LG
Towards Client Driven Federated Learning
Conventional federated learning (FL) frameworks follow a server-driven model where the server determines session initiation and client participation, which faces challenges in accommodating clients' asynchronous needs for model updates. We introduce Client-Driven Federated Learning (CDFL), a novel FL framework that puts clients in the driving role. In CDFL, each client independently and asynchronously updates its model by uploading the locally trained model to the server and receiving a customized model tailored to its local task. The server maintains a repository of cluster models, iteratively refining them using received client models. Our framework accommodates complex dynamics in clients' data distributions, characterized by time-varying mixtures of cluster distributions, enabling rapid adaptation to new tasks with superior performance. In contrast to traditional clustered FL protocols that send multiple cluster models to a client to perform distribution estimation, we propose a paradigm that offloads the estimation task to the server and only sends a single model to a client, along with novel strategies to improve estimation accuracy. We provide a theoretical analysis of CDFL's convergence. Extensive experiments across various datasets and system settings highlight CDFL's substantial advantages in model performance and computation efficiency over baselines.
Updated: 2024-05-24 10:17:49
标题: 朝向客户驱动的联邦学习
摘要: 传统的联邦学习(FL)框架遵循服务器驱动模型,其中服务器确定会话启动和客户端参与,面临着满足客户端异步需求进行模型更新的挑战。我们引入了客户驱动的联邦学习(CDFL),这是一个将客户端置于驱动角色的新颖FL框架。在CDFL中,每个客户端独立且异步地通过将本地训练的模型上传到服务器并接收定制的适合其本地任务的模型来更新其模型。服务器维护一个集群模型的存储库,通过接收的客户端模型进行迭代地优化它们。我们的框架适应了客户端数据分布中的复杂动态,其特征是时间变化的集群分布混合,使其能够快速适应新任务并具有更优越的性能。与传统的集群FL协议相反,后者向客户端发送多个集群模型用于分布估计,我们提出了一种将估计任务卸载到服务器并仅向客户端发送单个模型的范例,并提出了改善估计准确性的新策略。我们对CDFL的收敛性进行了理论分析。在各种数据集和系统设置上进行的大量实验突显了CDFL在模型性能和计算效率方面相对基线的显著优势。
更新时间: 2024-05-24 10:17:49
领域: cs.LG,cs.DC
A Misleading Gallery of Fluid Motion by Generative Artificial Intelligence
In this technical report, we extensively investigate the accuracy of outputs from well-known generative artificial intelligence (AI) applications in response to prompts describing common fluid motion phenomena familiar to the fluid mechanics community. We examine a range of applications, including Midjourney, Dall-E, Runway ML, Microsoft Designer, Gemini, Meta AI, and Leonardo AI, introduced by prominent companies such as Google, OpenAI, Meta, and Microsoft. Our text prompts for generating images or videos include examples such as "Von Karman vortex street", "flow past an airfoil", "Kelvin-Helmholtz instability", "shock waves on a sharp-nosed supersonic body", etc. We compare the images generated by these applications with real images from laboratory experiments and numerical software. Our findings indicate that these generative AI models are not adequately trained in fluid dynamics imagery, leading to potentially misleading outputs. Beyond text-to-image/video generation, we further explore the transition from image/video to text generation using these AI tools, aiming to investigate the accuracy of their descriptions of fluid motion phenomena. This report serves as a cautionary note for educators in academic institutions, highlighting the potential for these tools to mislead students. It also aims to inform researchers at these renowned companies, encouraging them to address this issue. We conjecture that a primary reason for this shortcoming is the limited access to copyright-protected fluid motion images from scientific journals.
Updated: 2024-05-24 10:17:15
标题: 一种由生成式人工智能制作的流体运动的误导性画廊
摘要: 在这份技术报告中,我们广泛考察了知名生成式人工智能(AI)应用在响应描述流体力学界熟悉的常见流体运动现象的提示时输出的准确性。我们考察了一系列应用,包括Midjourney、Dall-E、Runway ML、Microsoft Designer、Gemini、Meta AI和Leonardo AI,它们由谷歌、OpenAI、Meta和微软等知名公司推出。我们用于生成图像或视频的文本提示包括"冯·卡门涡街"、"绕翼型流动"、"Kelvin-Helmholtz不稳定性"、"尖头超音速体上的激波"等示例。我们将这些应用生成的图像与实验室实验和数值软件得到的真实图像进行比较。我们的研究结果表明,这些生成式AI模型在流体动力学图像方面训练不足,可能产生误导性的输出。除了文本到图像/视频生成外,我们还进一步探讨了使用这些AI工具从图像/视频到文本生成的转变,旨在考察它们对流体运动现象描述的准确性。这份报告提醒学术机构的教育工作者注意这些工具误导学生的潜在风险,同时也希望引起这些知名公司研究人员的重视,鼓励他们解决这一问题。我们推测,这一缺陷的主要原因是难以获取科学期刊中受版权保护的流体运动图像。
更新时间: 2024-05-24 10:17:15
领域: physics.flu-dyn,cs.LG
XMoE: Sparse Models with Fine-grained and Adaptive Expert Selection
Sparse models, including sparse Mixture-of-Experts (MoE) models, have emerged as an effective approach for scaling Transformer models. However, they often suffer from computational inefficiency since a significant number of parameters are unnecessarily involved in computations via multiplying values by zero or low activation values. To address this issue, we present XMoE, a novel MoE designed to enhance both the efficacy and efficiency of sparse MoE models. XMoE leverages small experts and a threshold-based router to enable tokens to selectively engage only essential parameters. Our extensive experiments on language modeling and machine translation tasks demonstrate that XMoE can decrease the computation load at MoE layers by over 50% while maintaining or even enhancing model performance. Furthermore, we present the versatility of XMoE by applying it to dense models, enabling sparse computation during inference. We provide a comprehensive analysis and make our code available at https://github.com/ysngki/XMoE.
Updated: 2024-05-24 10:14:55
标题: XMoE:细粒度和自适应专家选择的稀疏模型
摘要: 稀疏模型,包括稀疏的专家混合(MoE)模型,已经成为扩展Transformer模型的有效方法。然而,它们通常由于大量参数通过与零或低激活值相乘而不必要地参与计算,导致计算效率低下。为了解决这个问题,我们提出了一种新颖的MoE设计,名为XMoE,旨在增强稀疏MoE模型的有效性和效率。XMoE利用小专家和基于阈值的路由器,使令牌能够有选择地只与必要的参数交互。我们在语言建模和机器翻译任务上进行了大量实验,结果表明XMoE能够在MoE层减少超过50%的计算负载,同时保持甚至提升模型性能。此外,我们展示了XMoE的多功能性,将其应用于密集模型,实现了推理过程中的稀疏计算。我们提供了全面的分析,并在https://github.com/ysngki/XMoE 上提供了我们的代码。
更新时间: 2024-05-24 10:14:55
领域: cs.LG,cs.CL
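To make the threshold-routing idea in the XMoE entry above concrete, here is a minimal PyTorch sketch of a router that lets each token engage only experts whose gate probability clears a threshold. The class, parameter names, and fallback behavior are illustrative assumptions, not the paper's exact design.

import torch
import torch.nn as nn

class ThresholdRouter(nn.Module):
    """Routes each token only to experts whose gate probability clears a threshold."""
    def __init__(self, d_model: int, n_experts: int, threshold: float = 0.1):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts)
        self.threshold = threshold

    def forward(self, x: torch.Tensor):
        # x: (tokens, d_model) -> per-expert gate probabilities
        probs = torch.softmax(self.gate(x), dim=-1)   # (tokens, n_experts)
        mask = probs >= self.threshold                # sparse engagement per token
        weights = probs * mask
        # Renormalize the surviving gates; tokens with no expert above the
        # threshold get zero weights here (a real router would fall back to top-1).
        weights = weights / weights.sum(dim=-1, keepdim=True).clamp_min(1e-9)
        return weights, mask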
Fine-Grained Dynamic Framework for Bias-Variance Joint Optimization on Data Missing Not at Random
In most practical applications such as recommendation systems, display advertising, and so forth, the collected data often contains missing values, and those missing values are generally missing-not-at-random, which deteriorates the prediction performance of models. Some existing estimators and regularizers attempt to achieve unbiased estimation to improve the predictive performance. However, the variances and generalization bounds of these methods are generally unbounded when the propensity scores tend to zero, compromising their stability and robustness. In this paper, we first theoretically reveal the limitations of regularization techniques. Besides, we further illustrate that, for more general estimators, unbiasedness will inevitably lead to unbounded variance. These general laws inspire us that estimator design is not merely about eliminating bias, reducing variance, or simply achieving a bias-variance trade-off. Instead, it involves a quantitative joint optimization of bias and variance. Then, we develop a systematic fine-grained dynamic learning framework to jointly optimize bias and variance, which adaptively selects an appropriate estimator for each user-item pair according to the predefined objective function. With this operation, the generalization bounds and variances of models are reduced and bounded with theoretical guarantees. Extensive experiments are conducted to verify the theoretical results and the effectiveness of the proposed dynamic learning framework.
Updated: 2024-05-24 10:07:09
标题: 数据缺失非随机时的偏差-方差联合优化的细粒度动态框架
摘要: 在大多数实际应用中,如推荐系统、展示广告等,收集的数据通常包含缺失值,并且这些缺失值通常是缺失非随机的,这会降低模型的预测性能。一些现有的估计器和正则化器试图实现无偏估计以提高预测性能。然而,当倾向得分趋近于零时,这些方法的方差和泛化界限通常是无界的,这会损害它们的稳定性和鲁棒性。在本文中,我们首先理论上揭示了正则化技术的局限性。此外,我们进一步说明,对于更一般的估计器,无偏性将不可避免地导致方差无界。这些一般规律启发我们,估计器设计不仅仅是消除偏差、减少方差,或简单地实现偏差-方差折衷。相反,它涉及对偏差和方差进行定量联合优化。然后,我们开发了一个系统化的细粒度动态学习框架,以联合优化偏差和方差,根据预定义的目标函数自适应地为每个用户-项目对选择合适的估计器。通过这种操作,模型的泛化界限和方差得到了减少和受到理论保证。我们进行了大量实验来验证理论结果和所提出的动态学习框架的有效性。
更新时间: 2024-05-24 10:07:09
领域: cs.LG,stat.ML
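The variance blow-up described in the entry above is easy to see numerically with the standard inverse-propensity-scoring (IPS) estimator, used here as a stand-in for the paper's more general estimators (and with a uniform propensity rather than a true missing-not-at-random mechanism, for brevity):

# Illustrative only: variance of the standard IPS estimator as propensities shrink.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
ratings = rng.normal(loc=3.0, scale=1.0, size=n)   # true outcomes

for p in [0.5, 0.1, 0.01, 0.001]:                  # observation propensity
    estimates = []
    for _ in range(200):
        observed = rng.random(n) < p               # simplified uniform-missingness mechanism
        ips = (ratings * observed / p).mean()      # unbiased, but 1/p inflates variance
        estimates.append(ips)
    print(f"p={p:6.3f}  mean={np.mean(estimates):.3f}  std={np.std(estimates):.3f}")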
Understanding the Expressive Power and Mechanisms of Transformer for Sequence Modeling
We conduct a systematic study of the approximation properties of Transformer for sequence modeling with long, sparse and complicated memory. We investigate the mechanisms through which different components of Transformer, such as the dot-product self-attention, positional encoding and feed-forward layer, affect its expressive power, and we study their combined effects through establishing explicit approximation rates. Our study reveals the roles of critical parameters in the Transformer, such as the number of layers and the number of attention heads. These theoretical insights are validated experimentally and offer natural suggestions for alternative architectures.
Updated: 2024-05-24 10:03:59
标题: 理解Transformer在序列建模中的表达能力与机制
摘要: 我们对Transformer在长、稀疏和复杂记忆序列建模中的近似性能进行了系统研究。我们调查了Transformer的不同组件(如点积自注意力、位置编码和前馈层)如何影响其表达能力的机制,并通过建立显式的近似率来研究它们的综合效应。我们的研究揭示了Transformer中关键参数(如层数和注意力头数)的作用。这些理论洞察力在实验中得到验证,并为替代架构提供了自然建议。
更新时间: 2024-05-24 10:03:59
领域: cs.LG,stat.ML
An Adaptive Approach for Infinitely Many-armed Bandits under Generalized Rotting Constraints
In this study, we consider the infinitely many-armed bandit problems in a rested rotting setting, where the mean reward of an arm may decrease with each pull, while otherwise, it remains unchanged. We explore two scenarios regarding the rotting of rewards: one in which the cumulative amount of rotting is bounded by $V_T$, referred to as the slow-rotting case, and the other in which the cumulative number of rotting instances is bounded by $S_T$, referred to as the abrupt-rotting case. To address the challenge posed by rotting rewards, we introduce an algorithm that utilizes UCB with an adaptive sliding window, designed to manage the bias and variance trade-off arising due to rotting rewards. Our proposed algorithm achieves tight regret bounds for both slow and abrupt rotting scenarios. Lastly, we demonstrate the performance of our algorithm using numerical experiments.
Updated: 2024-05-24 09:58:55
标题: 广义衰减约束下无限多臂老虎机的自适应方法
摘要: 在这项研究中,我们考虑静息衰减(rested rotting)设置下的无限多臂老虎机问题,其中一个臂的平均奖励可能随每次拉动而下降,否则保持不变。我们探讨了关于奖励衰减的两种情景:一种是累积衰减量以$V_T$为界的情形,称为缓慢衰减情形;另一种是累积衰减次数以$S_T$为界的情形,称为突变衰减情形。为了应对衰减奖励带来的挑战,我们提出了一种使用带自适应滑动窗口的UCB的算法,旨在权衡由衰减奖励引起的偏差与方差。我们提出的算法在缓慢衰减和突变衰减两种情景下均取得了紧的遗憾界。最后,我们通过数值实验展示了算法的性能。
更新时间: 2024-05-24 09:58:55
领域: cs.LG,stat.ML
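As a rough illustration of the algorithmic idea in the entry above, the sketch below pairs UCB with a fixed sliding window over each arm's recent rewards; the paper adapts the window, which this toy version does not, and a finite arm set stands in for the infinite one.

import math
import random

class RottingArm:
    def __init__(self, mean: float, decay: float = 0.001):
        self.mean, self.decay = mean, decay
    def pull(self) -> float:
        reward = self.mean + random.gauss(0, 1)
        self.mean -= self.decay          # rested rotting: the mean drops with each pull
        return reward

def sliding_window_ucb(arms, horizon: int, window: int = 100, c: float = 2.0):
    history = {a: [] for a in arms}
    for t in range(1, horizon + 1):
        def index(a):
            recent = history[a][-window:]
            if not recent:
                return float("inf")      # pull each arm at least once
            mean = sum(recent) / len(recent)
            return mean + math.sqrt(c * math.log(t) / len(recent))
        arm = max(arms, key=index)
        history[arm].append(arm.pull())
    return history

history = sliding_window_ucb([RottingArm(1.0), RottingArm(0.8)], horizon=2000)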
Learning Nash Equilibria in Zero-Sum Markov Games: A Single Time-scale Algorithm Under Weak Reachability
We consider decentralized learning for zero-sum games, where players only see their payoff information and are agnostic to actions and payoffs of the opponent. Previous works demonstrated convergence to a Nash equilibrium in this setting using double time-scale algorithms under strong reachability assumptions. We address the open problem of achieving an approximate Nash equilibrium efficiently with an uncoupled and single time-scale algorithm under weaker conditions. Our contribution is a rational and convergent algorithm, utilizing Tsallis-entropy regularization in a value-iteration-based approach. The algorithm learns an approximate Nash equilibrium in polynomial time, requiring only the existence of a policy pair that induces an irreducible and aperiodic Markov chain, thus considerably weakening past assumptions. Our analysis leverages negative drift inequalities and introduces novel properties of Tsallis entropy that are of independent interest.
Updated: 2024-05-24 09:57:54
标题: 学习零和马尔可夫博弈中的纳什均衡:在弱可达性下的单时间尺度算法
摘要: 我们考虑零和博弈的分散学习,其中玩家只看到他们的收益信息,对对手的行动和收益不知情。先前的研究表明,在这种情况下,使用双时间尺度算法在强可达性假设下收敛到纳什均衡。我们致力于在较弱条件下通过一个不耦合且单时间尺度算法有效地实现近似纳什均衡这一开放问题。我们的贡献是一个理性且收敛的算法,在基于价值迭代的方法中使用Tsallis熵正则化。该算法在多项式时间内学习出近似的纳什均衡,只需要存在一个引起不可约且非周期性马尔可夫链的策略对,因此大大削弱了过去的假设。我们的分析利用了负漂移不等式,并引入了Tsallis熵的一些新性质,这些性质本身也具有独立的研究价值。
更新时间: 2024-05-24 09:57:54
领域: cs.GT,cs.LG
PriCE: Privacy-Preserving and Cost-Effective Scheduling for Parallelizing the Large Medical Image Processing Workflow over Hybrid Clouds
Running deep neural networks for large medical images is a resource-hungry and time-consuming task with centralized computing. Outsourcing such medical image processing tasks to hybrid clouds has benefits, such as a significant reduction of execution time and monetary cost. However, due to privacy concerns, it is still challenging to process sensitive medical images over clouds, which would hinder their deployment in many real-world applications. To overcome this, we first formulate the overall optimization objectives of the privacy-preserving distributed system model, i.e., minimizing the amount of information about the private data learned by the adversaries throughout the process, reducing the maximum execution time and cost under the user budget constraint. We propose a novel privacy-preserving and cost-effective method called PriCE to solve this multi-objective optimization problem. We performed extensive simulation experiments for artifact detection tasks on medical images using an ensemble of five deep convolutional neural network inferences as the workflow task. Experimental results show that PriCE successfully splits a wide range of input gigapixel medical images with graph-coloring-based strategies, yielding desired output utility and lowering the privacy risk, makespan, and monetary cost under user's budget.
Updated: 2024-05-24 09:52:00
标题: PriCE:隐私保护和成本效益调度用于在混合云上并行化大型医学图像处理工作流程
摘要: 在集中式计算下,对大型医学图像运行深度神经网络是一项资源密集且耗时的任务。将这类医学图像处理任务外包给混合云具有诸多好处,例如显著减少执行时间和货币成本。然而,由于隐私问题,在云中处理敏感的医学图像仍然具有挑战性,这会阻碍它们在许多实际应用中的部署。为了克服这一问题,我们首先制定了隐私保护分布式系统模型的总体优化目标,即在用户预算约束下,最大程度地减少对手在整个过程中获取的关于私有数据的信息量,并降低最大执行时间和成本。我们提出了一种名为PriCE的新颖的隐私保护且具有成本效益的方法来解决这个多目标优化问题。我们以五个深度卷积神经网络推断组成的集成作为工作流任务,对医学图像上的伪影检测任务进行了大量模拟实验。实验结果表明,PriCE成功地利用基于图着色的策略拆分了各种输入的千兆像素(gigapixel)医学图像,产生了期望的输出效用,并在用户预算内降低了隐私风险、最大完工时间和货币成本。
更新时间: 2024-05-24 09:52:00
领域: cs.CE,cs.AI,cs.CV,cs.DC,cs.ET
Reshuffling Resampling Splits Can Improve Generalization of Hyperparameter Optimization
Hyperparameter optimization is crucial for obtaining peak performance of machine learning models. The standard protocol evaluates various hyperparameter configurations using a resampling estimate of the generalization error to guide optimization and select a final hyperparameter configuration. Without much evidence, paired resampling splits, i.e., either a fixed train-validation split or a fixed cross-validation scheme, are often recommended. We show that, surprisingly, reshuffling the splits for every configuration often improves the final model's generalization performance on unseen data. Our theoretical analysis explains how reshuffling affects the asymptotic behavior of the validation loss surface and provides a bound on the expected regret in the limiting regime. This bound connects the potential benefits of reshuffling to the signal and noise characteristics of the underlying optimization problem. We confirm our theoretical results in a controlled simulation study and demonstrate the practical usefulness of reshuffling in a large-scale, realistic hyperparameter optimization experiment. While reshuffling leads to test performances that are competitive with using fixed splits, it drastically improves results for a single train-validation holdout protocol and can often make holdout become competitive with standard CV while being computationally cheaper.
Updated: 2024-05-24 09:48:18
标题: 重新排列重采样切分可以改善超参数优化的泛化效果
摘要: 超参数优化对于获得机器学习模型的最佳性能至关重要。标准协议通过使用泛化误差的重新采样估计来评估各种超参数配置,以指导优化并选择最终的超参数配置。在缺乏充分证据的情况下,人们通常建议使用配对的重采样划分,即固定的训练-验证划分或固定的交叉验证方案。我们发现,令人惊讶的是,对于每个配置重新洗牌划分往往会提高最终模型在未知数据上的泛化性能。我们的理论分析解释了重新洗牌如何影响验证损失表面的渐近行为,并提供了在极限情况下预期遗憾的界限。这个界限将重新洗牌的潜在好处与基础优化问题的信号和噪声特性联系起来。我们在受控模拟研究中验证了我们的理论结果,并展示了重新洗牌在大规模、实际的超参数优化实验中的实用性。虽然重新洗牌得到的测试性能与使用固定划分相当,但它显著改善了单一训练-验证留出协议的结果,并常常能使留出法在计算成本更低的同时与标准交叉验证相竞争。
更新时间: 2024-05-24 09:48:18
领域: stat.ML,cs.LG
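The protocol change in the entry above is small in code. A hypothetical sketch with scikit-learn (dataset and hyperparameter grid are illustrative): draw a fresh train-validation split for every configuration instead of reusing one fixed split.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, random_state=0)

best_cfg, best_score = None, -1.0
for i, C in enumerate([0.01, 0.1, 1.0, 10.0]):
    # Fresh split per configuration: random_state varies with i.
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=i)
    score = LogisticRegression(C=C, max_iter=1000).fit(X_tr, y_tr).score(X_val, y_val)
    if score > best_score:
        best_cfg, best_score = C, score
print(best_cfg, best_score)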
SMILE: Multimodal Dataset for Understanding Laughter in Video with Language Models
Despite the recent advances of the artificial intelligence, building social intelligence remains a challenge. Among social signals, laughter is one of the distinctive expressions that occurs during social interactions between humans. In this work, we tackle a new challenge for machines to understand the rationale behind laughter in video, Video Laugh Reasoning. We introduce this new task to explain why people laugh in a particular video and a dataset for this task. Our proposed dataset, SMILE, comprises video clips and language descriptions of why people laugh. We propose a baseline by leveraging the reasoning capacity of large language models (LLMs) with textual video representation. Experiments show that our baseline can generate plausible explanations for laughter. We further investigate the scalability of our baseline by probing other video understanding tasks and in-the-wild videos. We release our dataset, code, and model checkpoints on https://github.com/postech-ami/SMILE-Dataset.
Updated: 2024-05-24 09:45:09
标题: 微笑:用语言模型理解视频中笑声的多模态数据集
摘要: 尽管人工智能近年取得了进展,构建社交智能仍然是一个挑战。在社交信号中,笑声是人类社交互动中的一个独特表达。在这项工作中,我们针对机器理解视频中笑声背后的原因这一新挑战,提出了视频笑声推理任务。我们介绍了这一新任务,解释为什么人们在特定视频中会笑,并为此任务提供了一个数据集。我们提出的数据集SMILE包括视频片段和描述人们为什么笑的语言。我们通过结合大型语言模型(LLMs)的推理能力与文本化的视频表示提出了一个基线。实验证明我们的基线可以生成笑声的合理解释。我们通过在其他视频理解任务和真实场景(in-the-wild)视频上进行探测,进一步考察了基线的可扩展性。我们在https://github.com/postech-ami/SMILE-Dataset上发布了我们的数据集、代码和模型检查点。
更新时间: 2024-05-24 09:45:09
领域: cs.CL,cs.AI
Tensor Frames -- How To Make Any Message Passing Network Equivariant
In many applications of geometric deep learning, the choice of global coordinate frame is arbitrary, and predictions should be independent of the reference frame. In other words, the network should be equivariant with respect to rotations and reflections of the input, i.e., the transformations of O(d). We present a novel framework for building equivariant message passing architectures and modifying existing non-equivariant architectures to be equivariant. Our approach is based on local coordinate frames, between which geometric information is communicated consistently by including tensorial objects in the messages. Our framework can be applied to message passing on geometric data in arbitrary dimensional Euclidean space. While many other approaches for equivariant message passing require specialized building blocks, such as non-standard normalization layers or non-linearities, our approach can be adapted straightforwardly to any existing architecture without such modifications. We explicitly demonstrate the benefit of O(3)-equivariance for a popular point cloud architecture and produce state-of-the-art results on normal vector regression on point clouds.
Updated: 2024-05-24 09:41:06
标题: 张量框架--如何使任何消息传递网络等变
摘要: 在许多几何深度学习应用中,全局坐标框架的选择是任意的,预测应该独立于参考框架。换句话说,网络应该对输入的旋转和反射具有等变性,即对O(d)的变换具有等变性。我们提出了一个新颖的框架,用于构建等变消息传递架构,并修改现有的非等变架构以实现等变性。我们的方法基于局部坐标框架,通过在消息中包含张量对象来一致地传递几何信息。我们的框架可以应用于在任意维欧几里得空间上的几何数据的消息传递。虽然许多其他等变消息传递方法需要专门的构建块,如非标准的归一化层或非线性,我们的方法可以直接适应任何现有架构而无需进行这些修改。我们明确展示了O(3)等变性对流行点云架构的益处,并在点云上的法线向量回归方面取得了最先进的结果。
更新时间: 2024-05-24 09:41:06
领域: cs.LG
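Equivariance of the kind described in the Tensor Frames entry above can be checked numerically. The sketch below tests whether a map from point clouds to per-point vectors commutes with a random orthogonal transformation; the toy function stands in for an actual message passing network.

import numpy as np

def random_orthogonal(dim=3, seed=0):
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
    return q  # orthogonal matrix (rotation or reflection), i.e. an element of O(3)

def is_equivariant(f, points, tol=1e-6):
    R = random_orthogonal()
    lhs = f(points @ R.T)      # transform the input, then apply the network
    rhs = f(points) @ R.T      # apply the network, then transform the output
    return np.allclose(lhs, rhs, atol=tol)

# Toy equivariant map: scale each point by a rotation-invariant factor.
f = lambda pts: pts * np.linalg.norm(pts, axis=-1, keepdims=True)
print(is_equivariant(f, np.random.default_rng(1).normal(size=(16, 3))))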
Language-Driven Interactive Traffic Trajectory Generation
Realistic trajectory generation with natural language control is pivotal for advancing autonomous vehicle technology. However, previous methods focus on individual traffic participant trajectory generation, thus failing to account for the complexity of interactive traffic dynamics. In this work, we propose InteractTraj, the first language-driven traffic trajectory generator that can generate interactive traffic trajectories. InteractTraj interprets abstract trajectory descriptions into concrete formatted interaction-aware numerical codes and learns a mapping between these formatted codes and the final interactive trajectories. To interpret language descriptions, we propose a language-to-code encoder with a novel interaction-aware encoding strategy. To produce interactive traffic trajectories, we propose a code-to-trajectory decoder with interaction-aware feature aggregation that synergizes vehicle interactions with the environmental map and the vehicle moves. Extensive experiments show our method demonstrates superior performance over previous SoTA methods, offering a more realistic generation of interactive traffic trajectories with high controllability via diverse natural language commands. Our code is available at https://github.com/X1a-jk/InteractTraj.git
Updated: 2024-05-24 09:38:36
标题: 语言驱动的交互式交通轨迹生成
摘要: 实现具有自然语言控制的现实轨迹生成对于推进自动驾驶汽车技术至关重要。然而,先前的方法专注于单个交通参与者轨迹生成,因此未能考虑交互式交通动态的复杂性。在这项工作中,我们提出了InteractTraj,这是第一个可以生成交互式交通轨迹的语言驱动交通轨迹生成器。InteractTraj将抽象轨迹描述解释为具体格式化的交互感知数值代码,并学习这些格式化代码与最终交互式轨迹之间的映射关系。为了解释语言描述,我们提出了一种具有新颖的交互感知编码策略的语言到代码编码器。为了生成交互式交通轨迹,我们提出了一种具有交互感知特征聚合的代码到轨迹解码器,通过将车辆互动与环境地图和车辆移动相结合,实现了交互式交通轨迹的生成。大量实验证明我们的方法表现优于先前的最先进方法,通过多样的自然语言命令提供了更逼真的交互式交通轨迹生成,具有高度可控性。我们的代码可在https://github.com/X1a-jk/InteractTraj.git 上找到。
更新时间: 2024-05-24 09:38:36
领域: cs.AI,cs.RO
Efficient Recurrent Off-Policy RL Requires a Context-Encoder-Specific Learning Rate
Real-world decision-making tasks are usually partially observable Markov decision processes (POMDPs), where the state is not fully observable. Recent progress has demonstrated that recurrent reinforcement learning (RL), which consists of a context encoder based on recurrent neural networks (RNNs) for unobservable state prediction and a multilayer perceptron (MLP) policy for decision making, can mitigate partial observability and serve as a robust baseline for POMDP tasks. However, previous recurrent RL methods face training stability issues due to the gradient instability of RNNs. In this paper, we propose Recurrent Off-policy RL with Context-Encoder-Specific Learning Rate (RESeL) to tackle this issue. Specifically, RESeL uses a lower learning rate for context encoder than other MLP layers to ensure the stability of the former while maintaining the training efficiency of the latter. We integrate this technique into existing off-policy RL methods, resulting in the RESeL algorithm. We evaluated RESeL in 18 POMDP tasks, including classic, meta-RL, and credit assignment scenarios, as well as five MDP locomotion tasks. The experiments demonstrate significant improvements in training stability with RESeL. Comparative results show that RESeL achieves notable performance improvements over previous recurrent RL baselines in POMDP tasks, and is competitive with or even surpasses state-of-the-art methods in MDP tasks. Further ablation studies highlight the necessity of applying a distinct learning rate for the context encoder.
Updated: 2024-05-24 09:33:47
标题: 高效的循环离策略强化学习需要针对上下文编码器的专门学习率
摘要: 真实世界中的决策任务通常是部分可观察的马尔可夫决策过程(POMDPs),其中状态并非完全可观察。最近的进展表明,循环强化学习(RL)可以缓解部分可观察性,并作为POMDP任务的强大基线,其基础是用于不可观察状态预测的基于循环神经网络(RNNs)的上下文编码器,以及用于决策的多层感知器(MLP)策略。然而,先前的循环RL方法由于RNN的梯度不稳定性而面临训练稳定性问题。在本文中,我们提出了具有上下文编码器专门学习率的循环离策略RL(RESeL)来解决这个问题。具体而言,RESeL为上下文编码器使用比其他MLP层更低的学习率,以确保编码器的稳定性,同时保持其余网络的训练效率。我们将这一技术集成到现有的离策略RL方法中,形成了RESeL算法。我们在18个POMDP任务中评估了RESeL,包括经典、元RL和信用分配场景,以及五个MDP运动任务。实验表明,RESeL在训练稳定性方面取得了显著改进。比较结果显示,RESeL在POMDP任务中相比此前的循环RL基线实现了显著的性能提升,并在MDP任务中与最先进方法相当,甚至有所超越。进一步的消融研究突出了为上下文编码器应用不同学习率的必要性。
更新时间: 2024-05-24 09:33:47
领域: cs.LG
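The core recipe in the entry above, a context-encoder-specific learning rate, amounts to optimizer parameter groups. A minimal PyTorch sketch; layer sizes and rates are illustrative, not the paper's tuned values.

import torch.nn as nn
from torch.optim import Adam

context_encoder = nn.GRU(input_size=32, hidden_size=128, batch_first=True)
policy_head = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 4))

optimizer = Adam([
    {"params": context_encoder.parameters(), "lr": 1e-5},  # slower, more stable encoder updates
    {"params": policy_head.parameters(),     "lr": 3e-4},  # standard rate for the MLP layers
])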
Generating Code World Models with Large Language Models Guided by Monte Carlo Tree Search
In this work we consider Code World Models, world models generated by a Large Language Model (LLM) in the form of Python code for model-based Reinforcement Learning (RL). Calling code instead of LLMs for planning has the advantages of being precise, reliable, interpretable, and extremely efficient. However, writing appropriate Code World Models requires the ability to understand complex instructions, to generate exact code with non-trivial logic and to self-debug a long program with feedback from unit tests and environment trajectories. To address these challenges, we propose Generate, Improve and Fix with Monte Carlo Tree Search (GIF-MCTS), a new code generation strategy for LLMs. To test our approach, we introduce the Code World Models Benchmark (CWMB), a suite of program synthesis and planning tasks comprised of 18 diverse RL environments paired with corresponding textual descriptions and curated trajectories. GIF-MCTS surpasses all baselines on the CWMB and two other benchmarks, and we show that the Code World Models synthesized with it can be successfully used for planning, resulting in model-based RL agents with greatly improved sample efficiency and inference speed.
Updated: 2024-05-24 09:31:26
标题: 用蒙特卡洛树搜索引导的大型语言模型生成代码世界模型
摘要: 在这项工作中,我们考虑了代码世界模型,这是由大型语言模型(LLM)生成的世界模型,采用Python代码形式用于基于模型的强化学习(RL)。与LLMs进行规划相比,使用代码具有精确、可靠、可解释和极其高效的优点。然而,编写适当的代码世界模型需要理解复杂的指令、生成带有非平凡逻辑的精确代码,并通过单元测试和环境轨迹的反馈进行自我调试长程序的能力。为了解决这些挑战,我们提出了基于蒙特卡洛树搜索(GIF-MCTS)的生成、改进和修复策略,这是一种新的LLMs代码生成策略。为了测试我们的方法,我们引入了代码世界模型基准(CWMB),这是一个由18个不同的RL环境和相应的文本描述以及策划的轨迹组成的程序综合和规划任务套件。GIF-MCTS在CWMB和另外两个基准测试中均超越了所有基线,并且我们展示了使用这种方法合成的代码世界模型可以成功用于规划,从而导致基于模型的RL代理的样本效率和推理速度大大提高。
更新时间: 2024-05-24 09:31:26
领域: cs.AI
Nonlinear Meta-Learning Can Guarantee Faster Rates
Many recent theoretical works on meta-learning aim to achieve guarantees in leveraging similar representational structures from related tasks towards simplifying a target task. Importantly, the main aim in theory works on the subject is to understand the extent to which convergence rates -- in learning a common representation -- may scale with the number $N$ of tasks (as well as the number of samples per task). First steps in this setting demonstrate this property when both the shared representation amongst tasks, and task-specific regression functions, are linear. This linear setting readily reveals the benefits of aggregating tasks, e.g., via averaging arguments. In practice, however, the representation is often highly nonlinear, introducing nontrivial biases in each task that cannot easily be averaged out as in the linear case. In the present work, we derive theoretical guarantees for meta-learning with nonlinear representations. In particular, assuming the shared nonlinearity maps to an infinite-dimensional RKHS, we show that additional biases can be mitigated with careful regularization that leverages the smoothness of task-specific regression functions.
Updated: 2024-05-24 09:31:02
标题: 非线性元学习可以保证更快速率
摘要: 许多最近关于元学习的理论工作旨在利用相关任务中相似的表征结构来简化目标任务,并为此提供保证。重要的是,这一主题理论工作的主要目标是理解在学习共同表征时,收敛速率如何随任务数量$N$(以及每个任务的样本数量)扩展。该设置下的初步工作证明了当任务间的共享表征和特定任务的回归函数均为线性时的这一性质。这种线性设置很容易通过取平均的论证(averaging arguments)等方式揭示聚合任务的好处。然而,在实践中,表征通常是高度非线性的,引入了在每个任务中不能像线性情况下那样轻易平均掉的非平凡偏差。在本文中,我们为具有非线性表示的元学习推导了理论保证。特别是,假设共享的非线性映射到一个无限维RKHS,我们展示了通过利用任务特定回归函数的平滑性来谨慎进行正则化,可以缓解额外的偏差。
更新时间: 2024-05-24 09:31:02
领域: stat.ML,cs.LG,math.ST,stat.TH
Full-stack evaluation of Machine Learning inference workloads for RISC-V systems
Architectural simulators hold a vital role in RISC-V research, providing a crucial platform for workload evaluation without the need for costly physical prototypes. They serve as a dynamic environment for exploring innovative architectural concepts, enabling swift iteration and thorough analysis of performance metrics. As deep learning algorithms become increasingly pervasive, it is essential to benchmark new architectures with machine learning workloads. The diverse computational kernels used in deep learning algorithms highlight the necessity for a comprehensive compilation toolchain to map to target hardware platforms. This study evaluates the performance of a wide array of machine learning workloads on RISC-V architectures using gem5, an open-source architectural simulator. Leveraging an open-source compilation toolchain based on Multi-Level Intermediate Representation (MLIR), the research presents benchmarking results specifically focused on deep learning inference workloads. Additionally, the study sheds light on current limitations of gem5 when simulating RISC-V architectures, offering insights for future development and refinement.
Updated: 2024-05-24 09:24:46
标题: RISC-V系统中机器学习推理工作负载的全栈评估
摘要: 体系结构模拟器在RISC-V研究中起着重要作用,为工作负载评估提供了一个至关重要的平台,而无需昂贵的物理原型。它们作为一个动态环境,用于探索创新的架构概念,实现快速迭代和深入分析性能指标。随着深度学习算法日益普及,对新架构进行机器学习工作负载的基准测试变得至关重要。深度学习算法中使用的各种计算内核突出了需要一个全面的编译工具链来映射到目标硬件平台的必要性。本研究使用开源体系结构模拟器gem5,评估了各种机器学习工作负载在RISC-V架构上的性能。利用基于多级中间表示(MLIR)的开源编译工具链,研究重点呈现了针对深度学习推理工作负载的基准测试结果。此外,该研究还揭示了gem5在模拟RISC-V架构时的当前限制,为未来的发展和改进提供了见解。
更新时间: 2024-05-24 09:24:46
领域: cs.AR,cs.AI
Log-Concave Sampling on Compact Supports: A Versatile Proximal Framework
In this paper, we explore sampling from strongly log-concave distributions defined on convex and compact supports. We propose a general proximal framework that involves projecting onto the constrained set, which is highly flexible and supports various projection options. Specifically, we consider the cases of Euclidean and Gauge projections, with the latter having the advantage of being performed efficiently using a membership oracle. This framework can be seamlessly integrated with multiple sampling methods. Our analysis focuses on Langevin-type sampling algorithms within the context of constrained sampling. We provide nonasymptotic upper bounds on the W1 and W2 errors, offering a detailed comparison of the performance of these methods in constrained sampling.
Updated: 2024-05-24 09:24:21
标题: 对紧支撑上的对数凹采样:一种多功能的近端框架
摘要: 在本文中,我们探讨了在凸紧支撑上定义的强对数凹分布的抽样。我们提出了一个涉及投影到约束集的通用近端框架,这个框架非常灵活,支持各种投影选项。具体来说,我们考虑了欧几里得投影和规范(Gauge)投影两种情况,后者的优势在于可以借助成员资格预言机(membership oracle)高效执行。这个框架可以与多种抽样方法无缝集成。我们的分析集中在约束抽样背景下的 Langevin 类型抽样算法上。我们提供了 W1 和 W2 误差的非渐近上界,详细比较了这些方法在约束抽样中的性能。
更新时间: 2024-05-24 09:24:21
领域: stat.ML,cs.LG,math.PR,math.ST,stat.TH
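For the Euclidean-projection variant discussed above, the proximal step reduces to projecting the Langevin update back onto the constraint set. A minimal sketch for a ball-constrained Gaussian target; step size and horizon are illustrative.

import numpy as np

def project_ball(x, radius=1.0):
    n = np.linalg.norm(x)
    return x if n <= radius else x * (radius / n)

def projected_langevin(grad_U, x0, steps=5000, gamma=1e-3, radius=1.0, seed=0):
    rng = np.random.default_rng(seed)
    x = x0
    for _ in range(steps):
        noise = rng.normal(size=x.shape)
        # Langevin step, then Euclidean projection onto the constraint set.
        x = project_ball(x - gamma * grad_U(x) + np.sqrt(2 * gamma) * noise, radius)
    return x

# Target: standard Gaussian restricted to the unit ball, U(x) = ||x||^2 / 2.
sample = projected_langevin(lambda x: x, np.zeros(3))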
Fast, accurate training and sampling of Restricted Boltzmann Machines
Thanks to their simple architecture, Restricted Boltzmann Machines (RBMs) are powerful tools for modeling complex systems and extracting interpretable insights from data. However, training RBMs, as other energy-based models, on highly structured data poses a major challenge, as effective training relies on mixing the Markov chain Monte Carlo simulations used to estimate the gradient. This process is often hindered by multiple second-order phase transitions and the associated critical slowdown. In this paper, we present an innovative method in which the principal directions of the dataset are integrated into a low-rank RBM through a convex optimization procedure. This approach enables efficient sampling of the equilibrium measure via a static Monte Carlo process. By starting the standard training process with a model that already accurately represents the main modes of the data, we bypass the initial phase transitions. Our results show that this strategy successfully trains RBMs to capture the full diversity of data in datasets where previous methods fail. Furthermore, we use the training trajectories to propose a new sampling method, {\em parallel trajectory tempering}, which allows us to sample the equilibrium measure of the trained model much faster than previous optimized MCMC approaches and a better estimation of the log-likelihood. We illustrate the success of the training method on several highly structured datasets.
Updated: 2024-05-24 09:23:43
标题: 快速、准确的训练和采样受限玻尔兹曼机
摘要: 得益于其简单的结构,受限玻尔兹曼机(RBM)是建模复杂系统和从数据中提取可解释见解的强大工具。然而,与其他基于能量的模型一样,在高度结构化的数据上训练RBM面临重大挑战,因为有效的训练依赖于用于估计梯度的马尔可夫链蒙特卡洛模拟的充分混合。这一过程常常受到多个二阶相变及相应临界慢化的阻碍。在本文中,我们提出了一种创新方法,通过凸优化过程将数据集的主方向集成到低秩RBM中。这种方法通过静态蒙特卡洛过程实现了对平衡测度的高效采样。通过以已经准确表示数据主要模式的模型开始标准训练过程,我们绕过了初始相变。我们的结果表明,这种策略成功地训练了RBM,在以前方法失败的数据集上捕捉了数据的完整多样性。此外,我们利用训练轨迹提出了一种新的采样方法"并行轨迹调温",使我们能够比以前优化的MCMC方法更快地对训练模型的平衡测度进行采样,并更好地估计对数似然。我们在几个高度结构化的数据集上展示了该训练方法的成功。
更新时间: 2024-05-24 09:23:43
领域: cs.LG,cond-mat.dis-nn,cond-mat.stat-mech
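For orientation, the block-Gibbs transition that such MCMC-based training must mix is the standard binary-RBM update below; the paper's contributions (low-rank initialization via principal directions and parallel trajectory tempering) sit on top of this and are not reproduced here.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gibbs_step(v, W, b_h, b_v, rng):
    p_h = sigmoid(v @ W + b_h)                 # hidden activation probabilities
    h = (rng.random(p_h.shape) < p_h).astype(float)
    p_v = sigmoid(h @ W.T + b_v)               # visible reconstruction probabilities
    v_new = (rng.random(p_v.shape) < p_v).astype(float)
    return v_new, h

rng = np.random.default_rng(0)
W = rng.normal(scale=0.01, size=(784, 64))     # visible x hidden weights
v = (rng.random((16, 784)) < 0.5).astype(float)
v, h = gibbs_step(v, W, np.zeros(64), np.zeros(784), rng)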
A Planet Scale Spatial-Temporal Knowledge Graph Based On OpenStreetMap And H3 Grid
Geospatial data plays a central role in modeling our world, and OpenStreetMap (OSM) provides a rich source of such data. While spatial data is often represented in a tabular format, a graph-based representation makes it possible to interconnect entities that would be separated in a tabular representation. In this paper, we propose a framework that supports a planet-scale transformation of OpenStreetMap data into a Spatial-Temporal Knowledge Graph. In addition to OpenStreetMap data, we align the different OpenStreetMap geometries on individual H3 grid cells. We compare our constructed spatial knowledge graph to other spatial knowledge graphs and outline our contribution in this paper. As a basis for our computation, we use Apache Sedona as the computational framework for our Spatial-Temporal Knowledge Graph construction.
Updated: 2024-05-24 09:22:20
标题: 基于OpenStreetMap和H3格网的全球尺度时空知识图谱
摘要: 地理空间数据在建模我们的世界中发挥着核心作用,OpenStreetMap(OSM)为这类数据提供了丰富的来源。虽然空间数据通常以表格格式表示,但基于图形的表示提供了将在表格表示中分开的实体相互连接的可能性。我们在本文中提出了一个框架,支持将OpenStreetMap数据转换为空间时间知识图的全球范围转换。除了OpenStreetMap数据外,我们将不同的OpenStreetMap几何体对齐到各个单独的h3网格单元上。我们将我们构建的空间知识图与其他空间知识图进行比较,并概述了本文中的贡献。作为计算的基础,我们使用Apache Sedona作为我们的空间时间知识图构建的计算框架。
更新时间: 2024-05-24 09:22:20
领域: cs.AI,cs.DB,cs.DC
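Aligning geometries to H3 cells, as in the entry above, is nearly a one-liner with the h3-py bindings. A small sketch, assuming the v4 API (v3 spelled the first call h3.geo_to_h3); the coordinates are illustrative.

import h3

lat, lng = 48.8584, 2.2945               # an OSM node, e.g. the Eiffel Tower
cell = h3.latlng_to_cell(lat, lng, 9)    # resolution-9 hexagon id
neighbors = h3.grid_disk(cell, 1)        # adjacent cells, usable as graph edges
print(cell, len(neighbors))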
Leveraging Large Language Models for Semantic Query Processing in a Scholarly Knowledge Graph
The proposed research aims to develop an innovative semantic query processing system that enables users to obtain comprehensive information about research works produced by Computer Science (CS) researchers at the Australian National University (ANU). The system integrates Large Language Models (LLMs) with the ANU Scholarly Knowledge Graph (ASKG), a structured repository of all research-related artifacts produced at ANU in the CS field. Each artifact and its parts are represented as textual nodes stored in a Knowledge Graph (KG). To address the limitations of traditional scholarly KG construction and utilization methods, which often fail to capture fine-grained details, we propose a novel framework that integrates the Deep Document Model (DDM) for comprehensive document representation and the KG-enhanced Query Processing (KGQP) for optimized complex query handling. DDM enables a fine-grained representation of the hierarchical structure and semantic relationships within academic papers, while KGQP leverages the KG structure to improve query accuracy and efficiency with LLMs. By combining the ASKG with LLMs, our approach enhances knowledge utilization and natural language understanding capabilities. The proposed system employs an automatic LLM-SPARQL fusion to retrieve relevant facts and textual nodes from the ASKG. Initial experiments demonstrate that our framework is superior to baseline methods in terms of accuracy retrieval and query efficiency. We showcase the practical application of our framework in academic research scenarios, highlighting its potential to revolutionize scholarly knowledge management and discovery. This work empowers researchers to acquire and utilize knowledge from documents more effectively and provides a foundation for developing precise and reliable interactions with LLMs.
Updated: 2024-05-24 09:19:45
标题: 利用大型语言模型在学术知识图谱中进行语义查询处理
摘要: 该研究旨在开发一种创新的语义查询处理系统,使用户能够获取有关澳大利亚国立大学(ANU)计算机科学(CS)研究人员所产生的研究作品的全面信息。该系统将大型语言模型(LLMs)与ANU学术知识图(ASKG)集成在一起,ASKG是一个收录ANU在计算机科学领域产生的所有研究相关制品的结构化存储库。每个研究制品及其组成部分都表示为存储在知识图(KG)中的文本节点。 为了解决传统学术知识图构建和利用方法的局限性,这些方法通常无法捕捉细粒度的细节,我们提出了一个集成了深度文档模型(DDM)和KG增强查询处理(KGQP)的新框架,用于全面文档表示和优化复杂查询处理。DDM能够细粒度地表示学术论文中的层次结构和语义关系,而KGQP利用知识图结构与LLMs提高查询的准确性和效率。 通过将ASKG与LLMs结合起来,我们的方法增强了知识利用和自然语言理解能力。提出的系统采用自动LLM-SPARQL融合来从ASKG中检索相关事实和文本节点。初步实验表明,我们的框架在准确检索和查询效率方面优于基准方法。 我们展示了我们框架在学术研究场景中的实际应用,突出了其革新学术知识管理和发现的潜力。这项工作使研究人员能够更有效地从文档中获得和利用知识,并为与LLMs开发精确可靠的交互提供了基础。
更新时间: 2024-05-24 09:19:45
领域: cs.IR,cs.AI,cs.CL,H.3.3; I.2.4; I.7.5; I.2.7
On Computing Plans with Uniform Action Costs
In many real-world planning applications, agents might be interested in finding plans whose actions have costs that are as uniform as possible. Such plans provide agents with a sense of stability and predictability, which are key features when humans are the agents executing plans suggested by planning tools. This paper adapts three uniformity metrics to automated planning and introduces planning-based compilations that allow lexicographic optimization of the sum of action costs and action-cost uniformity. Experimental results on both well-known and novel planning benchmarks show that the reformulated tasks can be effectively solved in practice to generate uniform plans.
Updated: 2024-05-24 09:19:23
标题: 关于计算具有统一行动成本的计划
摘要: 在许多现实世界的规划应用中,代理人可能对寻找其行动成本尽可能均匀的计划感兴趣。这样的计划为代理人提供了稳定性和可预测性的感觉,这在人类作为执行规划工具建议的计划的代理人时是关键的特征。本文将三种均匀性度量指标调整为自动规划,并介绍基于规划的编译,允许按字典顺序优化行动成本总和和行动成本均匀性。在众所周知和新颖的规划基准中的实验结果表明,重新制定的任务可以在实践中有效地解决,以生成均匀的计划。
更新时间: 2024-05-24 09:19:23
领域: cs.AI
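The lexicographic objective in the entry above can be read as a two-part sort key: total cost first, uniformity second. A toy sketch with the max-min cost spread as the uniformity metric (the paper adapts three such metrics, not necessarily this one):

def plan_key(action_costs):
    total = sum(action_costs)
    spread = max(action_costs) - min(action_costs)  # 0 for a perfectly uniform plan
    return (total, spread)

plans = [[2, 2, 2, 2], [1, 1, 1, 5], [3, 3, 2]]
best = min(plans, key=plan_key)
print(best)   # [2, 2, 2, 2]: all three plans cost 8, so the zero-spread plan wins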
Analyzing the Impact of Climate Change With Major Emphasis on Pollution: A Comparative Study of ML and Statistical Models in Time Series Data
Industrial operations have grown exponentially over the last century, driving advancements in energy utilization through vehicles and machinery. This growth has significant environmental implications, necessitating the use of sophisticated technology to monitor and analyze climate data. The surge in industrial activities presents a complex challenge in forecasting its diverse environmental impacts, which vary greatly across different regions. We aim to understand these dynamics more deeply in order to predict and mitigate the environmental impacts of industrial activities.
Updated: 2024-05-24 09:18:17
标题: 分析气候变化对污染的影响:基于时间序列数据的机器学习和统计模型的比较研究
摘要: 在过去一个世纪中,工业运营呈指数级增长,推动了车辆和机械领域能源利用的进步。这种增长带来了重大的环境影响,需要利用先进技术来监测和分析气候数据。工业活动的激增给预测其多样化的环境影响带来了复杂挑战,这些影响在不同地区之间差异巨大。我们旨在更深入地理解这些动态,以预测并减轻工业活动的环境影响。
更新时间: 2024-05-24 09:18:17
领域: stat.AP,cs.AI,stat.ML
SeDe: Balancing Blockchain Privacy and Regulatory Compliance by Selective De-Anonymization
Privacy is one of the essential pillars for the widespread adoption of blockchains, but public blockchains are transparent by nature. Modern analytics techniques can easily subdue the pseudonymity feature of a blockchain user. Some applications have been able to provide practical privacy protections using privacy-preserving cryptography techniques. However, malicious actors have abused them illicitly, and the "mixing" of honest users' interactions and funds with those of anonymous bad actors has discouraged honest actors from using privacy-preserving applications, raising compliance and regulatory concerns. In this paper, we propose a framework that balances privacy-preserving features by establishing a regulatory and compliant framework called Selective De-Anonymization (SeDe). The adoption of this framework allows privacy-preserving applications on blockchains to de-anonymize illicit transactions by recursive traversal of subgraphs of linked transactions. Our technique achieves this without leaving de-anonymization decisions or control in the hands of a single entity, instead distributing it among multiple entities while holding them accountable for their respective actions. To instantiate, our framework uses threshold encryption schemes and Zero-Knowledge Proofs (ZKPs).
Updated: 2024-05-24 09:18:10
标题: SeDe:通过选择性去匿名化实现区块链隐私和监管合规的平衡
摘要: 隐私是区块链广泛采用的重要支柱之一,但公共区块链天生透明。现代分析技术可以轻易地削弱区块链用户的匿名特性。一些应用程序能够利用隐私保护密码技术提供实际的隐私保护。然而,恶意行为者滥用它们,非法使用,使诚实的行为者不愿使用隐私保护应用程序,因为这些应用程序将用户交互和资金与匿名的不良行为者混合在一起,引起合规和监管方面的担忧。 在本文中,我们提出了一个框架,通过建立一个名为选择性去匿名化(SeDe)的监管和合规框架来平衡隐私保护功能。采用这一框架可以通过递归遍历链接交易的子图来去匿名非法交易,而无需将去匿名决策或控制权交给单一实体,而是将其分散在多个实体之间,并使其对各自的行为负责。为实现这一目标,我们的框架使用了阈值加密方案和零知识证明(ZKPs)。
更新时间: 2024-05-24 09:18:10
领域: cs.CR
A Fisher-Rao gradient flow for entropic mean-field min-max games
Gradient flows play a substantial role in addressing many machine learning problems. We examine the convergence in continuous-time of a \textit{Fisher-Rao} (Mean-Field Birth-Death) gradient flow in the context of solving convex-concave min-max games with entropy regularization. We propose appropriate Lyapunov functions to demonstrate convergence with explicit rates to the unique mixed Nash equilibrium.
Updated: 2024-05-24 09:15:29
标题: 熵正则化平均场极小-极大博弈的Fisher-Rao梯度流
摘要: 梯度流在解决许多机器学习问题中起着重要作用。本文研究了在连续时间中\textit{Fisher-Rao}(均场出生-死亡)梯度流在解决带有熵正则化的凸-凹极小-极大博弈中的收敛性。我们提出了适当的李亚普诺夫函数,以明确的速率证明收敛到唯一的混合纳什均衡。
更新时间: 2024-05-24 09:15:29
领域: math.OC,cs.LG,math.PR
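For orientation, the Fisher-Rao (birth-death) gradient flow of a functional $F$ over probability measures takes the standard form $\partial_t \mu_t = -\mu_t \big( \frac{\delta F}{\delta \mu}(\mu_t) - \int \frac{\delta F}{\delta \mu}(\mu_t)\, \mathrm{d}\mu_t \big)$, where subtracting the average keeps $\mu_t$ a probability measure. The entry above studies the analogous two-player dynamics for the entropy-regularized min-max objective, which this single-functional form only approximates.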
Autonomous Quilt Spreading for Caregiving Robots
In this work, we propose a novel strategy to ensure infants, who inadvertently displace their quilts during sleep, are promptly and accurately re-covered. Our approach is formulated into two subsequent steps: interference resolution and quilt spreading. By leveraging the DWPose human skeletal detection and the Segment Anything instance segmentation models, the proposed method can accurately recognize the states of the infant and the quilt over her, which involves addressing the interferences resulting from an infant's limbs laid on part of the quilt. Building upon prior research, the EM*D deep learning model is employed to forecast quilt state transitions before and after quilt spreading actions. To improve the sensitivity of the network in distinguishing state variation of the handled quilt, we introduce an enhanced loss function that translates the voxelized quilt state into a more representative one. Both simulation and real-world experiments validate the efficacy of our method in spreading and re-covering a quilt over an infant.
Updated: 2024-05-24 09:11:29
标题: 面向护理机器人的自主被子铺展
摘要: 在这项工作中,我们提出了一种新颖的策略,以确保在睡眠中不慎蹬开被子的婴儿能够被及时、准确地重新盖好。我们的方法分为两个连续步骤:干扰消解和被子铺展。通过利用DWPose人体骨骼检测和Segment Anything实例分割模型,所提出的方法可以准确识别婴儿及其身上被子的状态,包括处理由婴儿肢体压在部分被子上所引起的干扰。在先前研究的基础上,我们采用EM*D深度学习模型来预测被子铺展动作前后的被子状态转换。为了提高网络对被操作被子状态变化的区分敏感性,我们引入了一种增强损失函数,将体素化的被子状态转化为更具代表性的表示。模拟和真实世界实验均验证了我们的方法在为婴儿铺展和重新盖好被子方面的有效性。
更新时间: 2024-05-24 09:11:29
领域: cs.RO,cs.AI
TRAWL: External Knowledge-Enhanced Recommendation with LLM Assistance
Combining semantic information with behavioral data is a crucial research area in recommender systems. A promising approach involves leveraging external knowledge to enrich behavioral-based recommender systems with abundant semantic information. However, this approach faces two primary challenges: denoising raw external knowledge and adapting semantic representations. To address these challenges, we propose an External Knowledge-Enhanced Recommendation method with LLM Assistance (TRAWL). This method utilizes large language models (LLMs) to extract relevant recommendation knowledge from raw external data and employs a contrastive learning strategy for adapter training. Experiments on public datasets and real-world online recommender systems validate the effectiveness of our approach.
Updated: 2024-05-24 09:09:35
标题: TRAWL:LLM辅助的外部知识增强推荐
摘要: 将语义信息与行为数据相结合是推荐系统中一个至关重要的研究领域。一种有前途的方法涉及利用外部知识来丰富基于行为的推荐系统的语义信息。然而,这种方法面临两个主要挑战:去噪原始外部知识和适应语义表示。为了解决这些挑战,我们提出了一种具有LLM辅助的外部知识增强推荐方法(TRAWL)。该方法利用大型语言模型(LLMs)从原始外部数据中提取相关的推荐知识,并采用对比学习策略进行适配器训练。对公共数据集和真实世界在线推荐系统的实验验证了我们方法的有效性。
更新时间: 2024-05-24 09:09:35
领域: cs.IR,cs.AI,cs.CL
Cross-Domain Policy Adaptation by Capturing Representation Mismatch
It is vital to learn effective policies that can be transferred to different domains with dynamics discrepancies in reinforcement learning (RL). In this paper, we consider dynamics adaptation settings where there exists dynamics mismatch between the source domain and the target domain, and one can get access to sufficient source domain data, while can only have limited interactions with the target domain. Existing methods address this problem by learning domain classifiers, performing data filtering from a value discrepancy perspective, etc. Instead, we tackle this challenge from a decoupled representation learning perspective. We perform representation learning only in the target domain and measure the representation deviations on the transitions from the source domain, which we show can be a signal of dynamics mismatch. We also show that representation deviation upper bounds performance difference of a given policy in the source domain and target domain, which motivates us to adopt representation deviation as a reward penalty. The produced representations are not involved in either policy or value function, but only serve as a reward penalizer. We conduct extensive experiments on environments with kinematic and morphology mismatch, and the results show that our method exhibits strong performance on many tasks. Our code is publicly available at https://github.com/dmksjfl/PAR.
Updated: 2024-05-24 09:06:12
标题: 跨领域政策适应:通过捕捉表征不匹配进行适应
摘要: 在强化学习(RL)中,学习可以转移到不同领域的具有动态差异的有效政策至关重要。本文考虑存在源域和目标域之间动态不匹配的动态适应设置,并且可以访问足够的源域数据,但只能在目标域中进行有限的交互。现有方法通过学习域分类器,从价值差异的角度进行数据过滤等方式来解决这个问题。相反,我们从解耦表示学习的角度来应对这一挑战。我们仅在目标域中进行表示学习,并测量从源域到目标域的转换中的表示偏差,这显示出动态不匹配的信号。我们还表明,表示偏差上限了给定策略在源域和目标域中的性能差异,这促使我们采用表示偏差作为奖励惩罚。生成的表示既不涉及策略也不涉及价值函数,而只作为奖励惩罚器。我们在具有运动学和形态不匹配的环境中进行了大量实验,结果表明我们的方法在许多任务上表现出色。我们的代码公开可用于https://github.com/dmksjfl/PAR。
更新时间: 2024-05-24 09:06:12
领域: cs.LG,cs.AI
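The reward-shaping idea in the entry above can be sketched in a few lines: penalize source-domain transitions whose next-state representation, under the target-domain-trained encoder, deviates from what a target-domain-trained predictor expects. Module names and the penalty form below are illustrative, not the paper's exact parameterization.

import torch

def penalized_reward(reward, s, a, s_next, encoder, predictor, beta=0.1):
    with torch.no_grad():
        z_next = encoder(s_next)                   # representation of the next state
        z_pred = predictor(torch.cat([s, a], -1))  # prediction from (s, a)
        deviation = (z_next - z_pred).norm(dim=-1)
    return reward - beta * deviation               # deviation signals dynamics mismatch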
FloodDamageCast: Building Flood Damage Nowcasting with Machine Learning and Data Augmentation
Near-real time estimation of damage to buildings and infrastructure, referred to as damage nowcasting in this study, is crucial for empowering emergency responders to make informed decisions regarding evacuation orders and infrastructure repair priorities during disaster response and recovery. Here, we introduce FloodDamageCast, a machine learning framework tailored for property flood damage nowcasting. The framework leverages heterogeneous data to predict residential flood damage at a resolution of 500 meters by 500 meters within Harris County, Texas, during the 2017 Hurricane Harvey. To deal with data imbalance, FloodDamageCast incorporates a generative adversarial networks-based data augmentation coupled with an efficient machine learning model. The results demonstrate the model's ability to identify high-damage spatial areas that would be overlooked by baseline models. Insights gleaned from flood damage nowcasting can assist emergency responders to more efficiently identify repair needs, allocate resources, and streamline on-the-ground inspections, thereby saving both time and effort.
Updated: 2024-05-24 09:04:33
标题: 洪水损害预测:利用机器学习和数据增强构建洪水损害即时预测
摘要: 对建筑物和基础设施损坏进行近实时估计(本研究称之为损害即时预测)至关重要,它使应急响应人员能够在灾害响应和恢复期间,就疏散命令和基础设施修复优先级做出有依据的决策。在这里,我们介绍了FloodDamageCast,一个专为财产洪水损害即时预测而设计的机器学习框架。该框架利用异构数据,在2017年飓风哈维期间,以500米×500米的分辨率预测得克萨斯州哈里斯县的住宅洪水损害。为了处理数据不平衡问题,FloodDamageCast将基于生成对抗网络的数据增强与高效的机器学习模型相结合。结果表明,该模型能够识别会被基线模型忽略的高损害空间区域。从洪水损害即时预测中获得的见解可以帮助应急响应人员更有效地识别修复需求、分配资源并简化现场检查,从而节省时间和精力。
更新时间: 2024-05-24 09:04:33
领域: cs.LG
Pipeline Parallelism with Controllable Memory
Pipeline parallelism has been widely explored, but most existing schedules lack a systematic methodology. In this paper, we propose a framework to decompose pipeline schedules as repeating a building block and we show that the lifespan of the building block decides the peak activation memory of the pipeline schedule. Guided by the observations, we find that almost all existing pipeline schedules, to the best of our knowledge, are memory inefficient. To address this, we introduce a family of memory efficient building blocks with controllable activation memory, which can reduce the peak activation memory to 1/2 of 1F1B without sacrificing efficiency, and even to 1/3 with comparable throughput. We can also achieve almost zero pipeline bubbles while maintaining the same activation memory as 1F1B. Our evaluations demonstrate that in pure pipeline parallelism settings, our methods outperform 1F1B by from 7% to 55% in terms of throughput. When employing a grid search over hybrid parallelism hyperparameters in practical scenarios, our proposed methods demonstrate a 16% throughput improvement over the 1F1B baseline for large language models.
Updated: 2024-05-24 08:54:36
标题: 具有可控内存的流水线并行
摘要: 流水线并行已经得到广泛探索,但大多数现有调度缺乏系统化的方法论。在本文中,我们提出了一个将流水线调度分解为重复构建块的框架,并证明构建块的生命周期决定了流水线调度的峰值激活内存。在这一观察的指导下,我们发现据我们所知,几乎所有现有的流水线调度在内存上都是低效的。为了解决这一问题,我们引入了一系列激活内存可控的内存高效构建块,可以在不牺牲效率的情况下将峰值激活内存降低到1F1B的1/2,甚至在吞吐量相当的情况下降低到1/3。我们还可以在保持与1F1B相同激活内存的同时实现几乎为零的流水线气泡。我们的评估表明,在纯流水线并行设置中,我们的方法在吞吐量方面比1F1B提高了7%至55%。在实际场景中对混合并行超参数进行网格搜索时,对于大语言模型,我们提出的方法相比1F1B基线实现了16%的吞吐量提升。
更新时间: 2024-05-24 08:54:36
领域: cs.LG,cs.CL,cs.DC
Less is More: on the Over-Globalizing Problem in Graph Transformers
Graph Transformer, due to its global attention mechanism, has emerged as a new tool in dealing with graph-structured data. It is well recognized that the global attention mechanism considers a wider receptive field in a fully connected graph, leading many to believe that useful information can be extracted from all the nodes. In this paper, we challenge this belief: does the globalizing property always benefit Graph Transformers? We reveal the over-globalizing problem in Graph Transformer by presenting both empirical evidence and theoretical analysis, i.e., the current attention mechanism overly focuses on those distant nodes, while the near nodes, which actually contain most of the useful information, are relatively weakened. Then we propose a novel Bi-Level Global Graph Transformer with Collaborative Training (CoBFormer), including the inter-cluster and intra-cluster Transformers, to prevent the over-globalizing problem while keeping the ability to extract valuable information from distant nodes. Moreover, the collaborative training is proposed to improve the model's generalization ability with a theoretical guarantee. Extensive experiments on various graphs well validate the effectiveness of our proposed CoBFormer.
Updated: 2024-05-24 08:53:13
标题: Less is More:关于图变换器中过度全球化问题
摘要: 由于其全局注意机制,图变换器已经成为处理图结构数据的新工具。众所周知,全局注意机制在完全连接的图中考虑了更广泛的感受野,导致许多人认为可以从所有节点中提取有用信息。在本文中,我们挑战这种观念:全局化属性是否总是有利于图变换器?我们通过提供经验证据和理论分析揭示了图变换器中的过度全局化问题,即当前的注意机制过度关注那些远处的节点,而实际上包含大部分有用信息的近处节点相对被削弱。然后,我们提出了一种新颖的双层全局图变换器与协作训练(CoBFormer),包括簇间和簇内变换器,以防止过度全局化问题,同时保持从远处节点提取有价值信息的能力。此外,提出了协作训练以提高模型的泛化能力,并附带理论保证。在各种图上的大量实验证明了我们提出的CoBFormer的有效性。
更新时间: 2024-05-24 08:53:13
领域: cs.LG,cs.AI
Proportional Fairness in Clustering: A Social Choice Perspective
We study the proportional clustering problem of Chen et al. [ICML'19] and relate it to the area of multiwinner voting in computational social choice. We show that any clustering satisfying a weak proportionality notion of Brill and Peters [EC'23] simultaneously obtains the best known approximations to the proportional fairness notion of Chen et al. [ICML'19], but also to individual fairness [Jung et al., FORC'20] and the "core" [Li et al. ICML'21]. In fact, we show that any approximation to proportional fairness is also an approximation to individual fairness and vice versa. Finally, we also study stronger notions of proportional representation, in which deviations do not only happen to single, but multiple candidate centers, and show that stronger proportionality notions of Brill and Peters [EC'23] imply approximations to these stronger guarantees.
Updated: 2024-05-24 08:52:23
标题: 在聚类中的比例公平:社会选择的视角
摘要: 我们研究了陈等人的比例聚类问题,并将其与计算社会选择中的多赢家投票领域相关联。我们展示了任何满足Brill和Peters的弱比例概念[EC'23]的聚类同时得到了陈等人[ICML'19]的比例公平概念的最佳已知近似,同时也得到了个体公平[Jung等人,FORC'20]和“核心”[Li等人,ICML'21]的最佳已知近似。事实上,我们展示了任何对比例公平的近似也是对个体公平的近似,反之亦然。最后,我们还研究了更强的比例代表概念,其中偏差不仅发生在单个候选中心,还发生在多个候选中心,并展示了Brill和Peters的更强比例概念[EC'23]意味着对这些更强保证的近似。
更新时间: 2024-05-24 08:52:23
领域: cs.LG,cs.CY,cs.GT
Coordinated Multi-Neighborhood Learning on a Directed Acyclic Graph
Learning the structure of causal directed acyclic graphs (DAGs) is useful in many areas of machine learning and artificial intelligence, with wide applications. However, in the high-dimensional setting, it is challenging to obtain good empirical and theoretical results without strong and often restrictive assumptions. Additionally, it is questionable whether all of the variables purported to be included in the network are observable. It is of interest then to restrict consideration to a subset of the variables for relevant and reliable inferences. In fact, researchers in various disciplines can usually select a set of target nodes in the network for causal discovery. This paper develops a new constraint-based method for estimating the local structure around multiple user-specified target nodes, enabling coordination in structure learning between neighborhoods. Our method facilitates causal discovery without learning the entire DAG structure. We establish consistency results for our algorithm with respect to the local neighborhood structure of the target nodes in the true graph. Experimental results on synthetic and real-world data show that our algorithm is more accurate in learning the neighborhood structures with much less computational cost than standard methods that estimate the entire DAG. An R package implementing our methods may be accessed at https://github.com/stephenvsmith/CML.
Updated: 2024-05-24 08:49:43
标题: 有向无环图上的多邻域协调学习
摘要: 学习因果有向无环图(DAGs)的结构在许多机器学习和人工智能领域都是有用的,并具有广泛的应用。然而,在高维环境中,要在没有强大且常常具有限制性的假设的情况下获得良好的实证和理论结果是具有挑战性的。此外,有待质疑的是网络中所谓包含的所有变量是否都是可观测的。因此,将考虑范围限制在一部分变量以进行相关且可靠的推断是非常有意义的。事实上,各个学科的研究人员通常可以选择网络中一组目标节点进行因果发现。本文提出了一种新的基于约束的方法,用于估算多个用户指定目标节点周围的局部结构,实现了邻域之间的结构学习协调。我们的方法促进了因果发现,而无需学习整个DAG结构。我们针对目标节点在真实图中的局部邻域结构建立了算法的一致性结果。对合成和真实数据的实验结果表明,我们的算法在学习邻域结构时比估算整个DAG的标准方法具有更高的准确性,并且计算成本更低。实现我们方法的R包可以在https://github.com/stephenvsmith/CML 上访问。
更新时间: 2024-05-24 08:49:43
领域: stat.ML,cs.LG
Strong screening rules for group-based SLOPE models
Tuning the regularization parameter in penalized regression models is an expensive task, requiring multiple models to be fit along a path of parameters. Strong screening rules drastically reduce computational costs by lowering the dimensionality of the input prior to fitting. We develop strong screening rules for group-based Sorted L-One Penalized Estimation (SLOPE) models: Group SLOPE and Sparse-group SLOPE. The developed rules are applicable for the wider family of group-based OWL models, including OSCAR. Our experiments on both synthetic and real data show that the screening rules significantly accelerate the fitting process. The screening rules make it accessible for group SLOPE and sparse-group SLOPE to be applied to high-dimensional datasets, particularly those encountered in genetics.
Updated: 2024-05-24 08:48:06
标题: 基于群体的SLOPE模型的强筛选规则
摘要: 在惩罚回归模型中调整正则化参数是一项昂贵的任务,需要在参数路径上拟合多个模型。强筛选规则通过在拟合之前降低输入的维度,大幅降低了计算成本。我们为基于分组的排序L-One惩罚估计(SLOPE)模型开发了强筛选规则:分组SLOPE和稀疏分组SLOPE。开发的规则适用于更广泛的基于分组的OWL模型,包括OSCAR。我们在合成数据和真实数据上的实验表明,筛选规则显著加速了拟合过程。筛选规则使得基于分组的SLOPE和稀疏分组SLOPE可以应用于高维数据集,特别是在遗传学中遇到的数据集。
更新时间: 2024-05-24 08:48:06
领域: stat.ML,cs.LG,stat.ME
BiSup: Bidirectional Quantization Error Suppression for Large Language Models
As the size and context length of Large Language Models (LLMs) grow, weight-activation quantization has emerged as a crucial technique for efficient deployment of LLMs. Compared to weight-only quantization, weight-activation quantization presents greater challenges due to the presence of outliers in activations. Existing methods have made significant progress by exploring mixed-precision quantization and outlier suppression. However, these methods primarily focus on optimizing the results of single matrix multiplication, neglecting the bidirectional propagation of quantization errors in LLMs. Specifically, errors accumulate vertically within the same token through layers, and diffuse horizontally across different tokens due to self-attention mechanisms. To address this issue, we introduce BiSup, a Bidirectional quantization error Suppression method. By constructing appropriate optimizable parameter spaces, BiSup utilizes a small amount of data for quantization-aware parameter-efficient fine-tuning to suppress the error vertical accumulation. Besides, BiSup employs prompt mixed-precision quantization strategy, which preserves high precision for the key-value cache of system prompts, to mitigate the error horizontal diffusion. Extensive experiments on Llama and Qwen families demonstrate that BiSup can improve performance over two state-of-the-art methods (the average WikiText2 perplexity decreases from 13.26 to 9.41 for Atom and from 14.33 to 7.85 for QuaRot under the W3A3-g128 configuration), further facilitating the practical applications of low-bit weight-activation quantization.
Updated: 2024-05-24 08:39:27
标题: BiSup:大型语言模型的双向量化误差抑制
摘要: 随着大型语言模型(LLMs)的规模和上下文长度的增长,权重激活量化已经成为LLMs高效部署的关键技术。与仅权重量化相比,权重激活量化由于激活中存在异常值而面临更大挑战。现有方法通过探索混合精度量化和异常值抑制取得了重大进展。然而,这些方法主要集中在优化单个矩阵乘法的结果,忽略了LLMs中量化误差的双向传播。具体而言,误差会通过层内在同一标记中垂直积累,并通过自注意机制在不同标记之间水平扩散。为了解决这个问题,我们引入了BiSup,一种双向量化误差抑制方法。通过构建适当的可优化参数空间,BiSup利用少量数据进行量化感知参数高效微调,以抑制误差的垂直积累。此外,BiSup采用提示混合精度量化策略,保留系统提示的高精度键值缓存,以减轻误差的水平扩散。在Llama和Qwen系列上进行的大量实验表明,BiSup可以提高性能,优于两种最先进的方法(在W3A3-g128配置下,Atom的平均WikiText2困惑度从13.26降至9.41,QuaRot从14.33降至7.85),进一步促进低比特权重激活量化的实际应用。
更新时间: 2024-05-24 08:39:27
领域: cs.CL,cs.AI,cs.LG
Decoding-time Realignment of Language Models
Aligning language models with human preferences is crucial for reducing errors and biases in these models. Alignment techniques, such as reinforcement learning from human feedback (RLHF), are typically cast as optimizing a tradeoff between human preference rewards and a proximity regularization term that encourages staying close to the unaligned model. Selecting an appropriate level of regularization is critical: insufficient regularization can lead to reduced model capabilities due to reward hacking, whereas excessive regularization hinders alignment. Traditional methods for finding the optimal regularization level require retraining multiple models with varying regularization strengths. This process, however, is resource-intensive, especially for large models. To address this challenge, we propose decoding-time realignment (DeRa), a simple method to explore and evaluate different regularization strengths in aligned models without retraining. DeRa enables control over the degree of alignment, allowing users to smoothly transition between unaligned and aligned models. It also enhances the efficiency of hyperparameter tuning by enabling the identification of effective regularization strengths using a validation dataset.
Updated: 2024-05-24 08:39:07
标题: 语言模型的解码时间重新对齐
摘要: 将语言模型与人类偏好对齐对于减少这些模型中的错误和偏见至关重要。对齐技术,如从人类反馈中强化学习(RLHF),通常被描述为在人类偏好奖励与鼓励保持接近未对齐模型的邻近性正则化项之间优化权衡。选择适当的正则化水平至关重要:不足的正则化可能导致由于奖励欺骗而降低模型能力,而过度正则化会阻碍对齐。传统方法寻找最佳正则化水平需要用不同的正则化强度重新训练多个模型。然而,这个过程是资源密集型的,尤其是对于大型模型。为了解决这一挑战,我们提出了解码时间重新对齐(DeRa),这是一种简单的方法,用于在不重新训练的情况下探索和评估对齐模型中不同的正则化强度。DeRa使用户能够控制对齐的程度,使用户能够在未对齐和对齐模型之间平滑过渡。它还通过使用验证数据集来识别有效的正则化强度,提高了超参数调整的效率。
更新时间: 2024-05-24 08:39:07
领域: cs.LG,cs.AI,cs.CL
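One way to picture the decoding-time realignment described above: since the RLHF solution is a geometric mixture of the reference and aligned policies, interpolating their logits token by token sweeps the effective regularization strength without retraining. The sketch below is an assumption-laden illustration of that view, not the paper's exact procedure.

import torch

def realigned_logits(logits_ref, logits_aligned, lam):
    # lam = 0 recovers the unaligned reference model, lam = 1 the aligned model;
    # intermediate values sweep the regularization strength at decoding time.
    return (1 - lam) * logits_ref + lam * logits_aligned

# During generation, sample the next token from the mixed distribution.
logits_ref = torch.randn(1, 32000)       # stand-ins for two models' next-token logits
logits_aligned = torch.randn(1, 32000)
probs = torch.softmax(realigned_logits(logits_ref, logits_aligned, 0.5), dim=-1)
next_token = torch.multinomial(probs, 1)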
DETECTA 2.0: Research into non-intrusive methodologies supported by Industry 4.0 enabling technologies for predictive and cyber-secure maintenance in SMEs
The integration of predictive maintenance and cybersecurity represents a transformative advancement for small and medium-sized enterprises (SMEs) operating within the Industry 4.0 paradigm. Despite their economic importance, SMEs often face significant challenges in adopting advanced technologies due to resource constraints and knowledge gaps. The DETECTA 2.0 project addresses these hurdles by developing an innovative system that harmonizes real-time anomaly detection, sophisticated analytics, and predictive forecasting capabilities. The system employs a semi-supervised methodology, combining unsupervised anomaly detection with supervised learning techniques. This approach enables more agile and cost-effective development of AI detection systems, significantly reducing the time required for manual case review. At the core lies a Digital Twin interface, providing intuitive real-time visualizations of machine states and detected anomalies. Leveraging cutting-edge AI engines, the system intelligently categorizes anomalies based on observed patterns, differentiating between technical errors and potential cybersecurity incidents. This discernment is fortified by detailed analytics, including certainty levels that enhance alert reliability and minimize false positives. The predictive engine uses advanced time series algorithms like N-HiTS to forecast future machine utilization trends. This proactive approach optimizes maintenance planning, enhances cybersecurity measures, and minimizes unplanned downtimes despite variable production processes. With its modular architecture enabling seamless integration across industrial setups and low implementation costs, DETECTA 2.0 presents an attractive solution for SMEs to strengthen their predictive maintenance and cybersecurity strategies.
Updated: 2024-05-24 08:38:38
标题: DETECTA 2.0:基于工业4.0支持的非侵入性方法的研究,为中小企业提供预测性和网络安全维护技术。
摘要: 预测性维护和网络安全的整合代表了在第四次工业革命范式下运营的中小型企业(SMEs)的一项革命性进步。尽管它们在经济上的重要性,由于资源约束和知识差距,SMEs经常面临采用先进技术的重大挑战。DETECTA 2.0项目通过开发一种创新系统来解决这些障碍,该系统协调实时异常检测、复杂分析和预测能力。 该系统采用半监督方法,结合无监督异常检测和监督学习技术。这种方法使得AI检测系统的开发更加灵活和具有成本效益,显著减少了手动案例审查所需的时间。 核心是数字孪生界面,提供机器状态和检测到的异常的直观实时可视化。利用先进的AI引擎,系统根据观察到的模式智能地对异常进行分类,区分技术错误和潜在的网络安全事件。这种洞察力得到详细分析的支持,包括增强警报可靠性和最小化误报的确定性水平。 预测引擎使用先进的时间序列算法如N-HiTS来预测未来机器利用趋势。这种积极的方法优化了维护计划,增强了网络安全措施,并尽量减少了由于可变生产过程而产生的计划外停机时间。 由于其模块化架构能够在工业设置中实现无缝集成,并且实施成本低,DETECTA 2.0为SMEs加强其预测性维护和网络安全战略提供了一种具有吸引力的解决方案。
更新时间: 2024-05-24 08:38:38
领域: cs.AI
Implementation of New Security Features in CMSWEB Kubernetes Cluster at CERN
The CMSWEB cluster is pivotal to the activities of the Compact Muon Solenoid (CMS) experiment, as it hosts critical services required for the operational needs of the CMS experiment. The security of these services and the corresponding data is crucial to CMS. Any malicious attack can compromise the availability of our services. Therefore, it is important to construct a robust security infrastructure. In this work, we discuss new security features introduced to the CMSWEB Kubernetes ("k8s") cluster. The new features include the implementation of network policies, deployment of Open Policy Agent (OPA), enforcement of OPA policies, and the integration of Vault. The network policies act as an inside-the-cluster firewall to limit the network communication between the pods to the minimum necessary, and its dynamic nature allows us to work with microservices. The OPA validates the objects against some custom-defined policies during create, update, and delete operations to further enhance security. Without recompiling or changing the configuration of the Kubernetes API server, it can apply customized policies on Kubernetes objects and their audit functionality enabling us to detect pre-existing conflicts and issues. Although Kubernetes incorporates the concepts of secrets, they are only base64 encoded and are not dynamically configured. This is where Vault comes into play: Vault dynamically secures, stores, and tightly controls access to sensitive data. This way, the secret information is encrypted, secured, and centralized, making it more scalable and easier to manage. Thus, the implementation of these three security features corroborate the enhanced security and reliability of the CMSWEB Kubernetes infrastructure.
Updated: 2024-05-24 08:22:22
标题: 在欧洲核子研究中心(CERN)的CMSWEB Kubernetes集群中实施新的安全功能
摘要: CMSWEB集群对紧凑缪子螺线管(CMS)实验的活动至关重要,因为它托管了CMS实验所需的关键服务,以满足CMS实验的运行需求。这些服务和相应的数据的安全对CMS至关重要。任何恶意攻击都可能危及我们服务的可用性。因此,建立健壮的安全基础设施至关重要。在这项工作中,我们讨论了引入CMSWEB Kubernetes("k8s")集群的新安全功能。这些新功能包括实施网络策略、部署Open Policy Agent(OPA)、执行OPA策略以及集成Vault。网络策略充当集群内防火墙,限制pod之间的网络通信至最低必要水平,其动态特性使我们可以与微服务一起工作。OPA在创建、更新和删除操作期间根据自定义策略验证对象,以进一步增强安全性。在不重新编译或更改Kubernetes API服务器配置的情况下,它可以在Kubernetes对象上应用定制策略,并通过审计功能使我们能够检测预先存在的冲突和问题。尽管Kubernetes提供了Secret机制,但其内容仅以base64编码,且无法动态配置。这就是Vault发挥作用的地方:Vault动态保护、存储和严格控制对敏感数据的访问。这样,秘密信息被加密、保护和集中存储,使其更具可扩展性和易于管理。因此,这三个安全功能的实施证实了CMSWEB Kubernetes基础设施的增强安全性和可靠性。
更新时间: 2024-05-24 08:22:22
领域: cs.CR,cs.DC
V-Zen: Efficient GUI Understanding and Precise Grounding With A Novel Multimodal LLM
In the rapidly evolving landscape of AI research and application, Multimodal Large Language Models (MLLMs) have emerged as a transformative force, adept at interpreting and integrating information from diverse modalities such as text, images, and Graphical User Interfaces (GUIs). Despite these advancements, the nuanced interaction and understanding of GUIs pose a significant challenge, limiting the potential of existing models to enhance automation levels. To bridge this gap, this paper presents V-Zen, an innovative Multimodal Large Language Model (MLLM) meticulously crafted to revolutionise the domain of GUI understanding and grounding. Equipped with dual-resolution image encoders, V-Zen establishes new benchmarks in efficient grounding and next-action prediction, thereby laying the groundwork for self-operating computer systems. Complementing V-Zen is the GUIDE dataset, an extensive collection of real-world GUI elements and task-based sequences, serving as a catalyst for specialised fine-tuning. The successful integration of V-Zen and GUIDE marks the dawn of a new era in multimodal AI research, opening the door to intelligent, autonomous computing experiences. This paper extends an invitation to the research community to join this exciting journey, shaping the future of GUI automation. In the spirit of open science, our code, data, and model will be made publicly available, paving the way for multimodal dialogue scenarios with intricate and precise interactions.
Updated: 2024-05-24 08:21:45
标题: V-Zen:一种实现高效GUI理解与精确定位的新型多模态LLM
摘要: 在人工智能研究和应用的快速发展领域中,多模态大语言模型(MLLMs)已经成为一股变革力量,擅长解释和整合来自文本、图像和图形用户界面(GUIs)等多种模态的信息。尽管取得了这些进展,但GUI的微妙交互和理解仍然是一个重大挑战,限制了现有模型提升自动化水平的潜力。为了弥合这一差距,本文提出了V-Zen,一种精心打造的创新多模态大语言模型(MLLM),旨在彻底改变GUI理解和定位(grounding)领域。V-Zen配备了双分辨率图像编码器,为高效定位和下一步动作预测建立了新的基准,从而为自主操作计算系统奠定了基础。辅助V-Zen的是GUIDE数据集,这是一个包含大量真实世界GUI元素和基于任务序列的集合,可作为专门微调的催化剂。V-Zen和GUIDE的成功整合标志着多模态人工智能研究新时代的开端,为智能、自主的计算体验打开了大门。本文邀请研究社区加入这一激动人心的旅程,共同塑造GUI自动化的未来。秉承开放科学精神,我们的代码、数据和模型将公开发布,为复杂和精确的多模态对话场景铺平道路。
更新时间: 2024-05-24 08:21:45
领域: cs.AI,cs.CV
Transmission Interface Power Flow Adjustment: A Deep Reinforcement Learning Approach based on Multi-task Attribution Map
Transmission interface power flow adjustment is a critical measure to ensure the secure and economic operation of power systems. However, conventional model-based adjustment schemes are limited by the increasing variations and uncertainties that occur in power systems, where the adjustment problems of different transmission interfaces are often treated as several independent tasks, ignoring their coupling relationship and even leading to conflicting decisions. In this paper, we introduce a novel data-driven deep reinforcement learning (DRL) approach, to handle multiple power flow adjustment tasks jointly instead of learning each task from scratch. At the heart of the proposed method is a multi-task attribution map (MAM), which enables the DRL agent to explicitly attribute each transmission interface task to different power system nodes with task-adaptive attention weights. Based on this MAM, the agent can further provide effective strategies to solve the multi-task adjustment problem with a near-optimal operation cost. Simulation results on the IEEE 118-bus system, a realistic 300-bus system in China, and a very large European system with 9241 buses demonstrate that the proposed method significantly improves the performance compared with several baseline methods, and exhibits high interpretability with the learnable MAM.
Updated: 2024-05-24 08:20:53
标题: 输电断面功率流调整:基于多任务归因图的深度强化学习方法
摘要: 输电断面功率流调整是确保电力系统安全和经济运行的关键措施。然而,传统基于模型的调整方案受限于电力系统中日益增加的变化和不确定性,其中不同输电断面的调整问题通常被视为几个独立的任务,忽略它们之间的耦合关系,甚至导致冲突决策。本文介绍了一种新颖的数据驱动深度强化学习(DRL)方法,以联合处理多个功率流调整任务,而不是从头开始学习每个任务。提出方法的核心是多任务归因图(MAM),使DRL代理能够将每个输电断面任务明确地归因于不同的电力系统节点,并具有任务自适应的注意力权重。基于这个MAM,代理可以进一步提供有效的策略来解决多任务调整问题,以达到接近最优的运行成本。在IEEE 118节点系统、中国一个实际的300节点系统和一个拥有9241个节点的超大规模欧洲系统上的仿真结果表明,所提出的方法相比几种基准方法显著提高了性能,并且凭借可学习的MAM具有很高的可解释性。
更新时间: 2024-05-24 08:20:53
领域: eess.SY,cs.AI,cs.LG,cs.SY
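The multi-task attribution map can be pictured as task-conditioned attention over node embeddings. A hedged PyTorch sketch, with shapes and the query parameterization assumed rather than taken from the paper:

```python
import torch
import torch.nn as nn

class AttributionMap(nn.Module):
    def __init__(self, n_tasks: int, node_dim: int):
        super().__init__()
        # one learned query per transmission-interface task
        self.task_queries = nn.Parameter(torch.randn(n_tasks, node_dim))

    def forward(self, node_feats: torch.Tensor) -> torch.Tensor:
        # node_feats: (n_nodes, node_dim) -> weights: (n_tasks, n_nodes)
        scores = self.task_queries @ node_feats.T / node_feats.shape[-1] ** 0.5
        return torch.softmax(scores, dim=-1)   # task-adaptive attention

nodes = torch.randn(118, 64)                   # e.g. IEEE 118-bus embeddings
mam = AttributionMap(n_tasks=3, node_dim=64)
weights = mam(nodes)                           # attribution of tasks to nodes
task_context = weights @ nodes                 # per-task state for the policy
```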
Discriminative Estimation of Total Variation Distance: A Fidelity Auditor for Generative Data
With the proliferation of generative AI and the increasing volume of generative data (also known as synthetic data), assessing the fidelity of generative data has become a critical concern. In this paper, we propose a discriminative approach to estimate the total variation (TV) distance between two distributions as an effective measure of generative data fidelity. Our method quantitatively characterizes the relation between the Bayes risk in classifying two distributions and their TV distance. Therefore, the estimation of total variation distance reduces to that of the Bayes risk. In particular, this paper establishes theoretical results regarding the convergence rate of the estimation error of the TV distance between two Gaussian distributions. We demonstrate that, with a specific choice of hypothesis class in classification, a fast convergence rate in estimating the TV distance can be achieved. Specifically, the estimation accuracy of the TV distance is proven to inherently depend on the separation of the two Gaussian distributions: smaller estimation errors are achieved when the two Gaussian distributions are farther apart. This phenomenon is also validated empirically through extensive simulations. In the end, we apply this discriminative estimation method to rank the fidelity of synthetic image data using the MNIST dataset.
Updated: 2024-05-24 08:18:09
标题: 总变差距离的判别式估计:生成数据的保真度审计器
摘要: 随着生成式人工智能的普及和生成数据(也称为合成数据)量的增加,评估生成数据的保真度已经成为一个关键问题。在本文中,我们提出了一种判别式方法,用于估计两个分布之间的总变差(TV)距离,作为生成数据保真度的有效度量。我们的方法定量地刻画了分类两个分布的贝叶斯风险与它们的TV距离之间的关系。因此,总变差距离的估计归结为贝叶斯风险的估计。特别地,本文建立了关于两个高斯分布之间TV距离估计误差收敛速度的理论结果。我们证明,通过在分类中选择特定的假设类,可以实现TV距离估计的快速收敛率。具体而言,TV距离的估计精度被证明本质上取决于两个高斯分布之间的分离程度:当两个高斯分布相距越远时,可以实现越小的估计误差。这一现象也通过大量模拟得到了实证验证。最后,我们将这种判别式估计方法应用于使用MNIST数据集对合成图像数据的保真度进行排名。
更新时间: 2024-05-24 08:18:09
领域: stat.ML,cs.LG
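The reduction in the abstract above rests on a classical identity: for balanced samples from $P$ and $Q$, the Bayes risk $R^*$ satisfies $\mathrm{TV}(P,Q) = 1 - 2R^*$, so a held-out classification error yields a plug-in TV estimate. A minimal sketch with an illustrative Gaussian pair and hypothesis class:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
p = rng.normal(loc=0.0, size=(5000, 2))      # samples from P
q = rng.normal(loc=1.5, size=(5000, 2))      # samples from Q
X = np.vstack([p, q])
y = np.array([0] * len(p) + [1] * len(q))    # balanced class labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)
clf = LogisticRegression().fit(X_tr, y_tr)   # near-Bayes for this Gaussian pair
err = 1.0 - clf.score(X_te, y_te)            # plug-in estimate of Bayes risk
tv_hat = 1.0 - 2.0 * err
print(f"estimated TV distance: {tv_hat:.3f}")
```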
Cross-Validated Off-Policy Evaluation
In this paper, we study the problem of estimator selection and hyper-parameter tuning in off-policy evaluation. Although cross-validation is the most popular method for model selection in supervised learning, off-policy evaluation relies mostly on theory-based approaches, which provide only limited guidance to practitioners. We show how to use cross-validation for off-policy evaluation. This challenges a popular belief that cross-validation in off-policy evaluation is not feasible. We evaluate our method empirically and show that it addresses a variety of use cases.
Updated: 2024-05-24 08:13:16
标题: 交叉验证的离策略评估
摘要: 在本文中,我们研究了离策略评估中的估计器选择和超参数调优问题。虽然交叉验证是监督学习中模型选择最流行的方法,但离策略评估主要依赖基于理论的方法,这些方法只能为从业者提供有限的指导。我们展示了如何在离策略评估中使用交叉验证,这挑战了一个流行的观念,即交叉验证在离策略评估中不可行。我们对方法进行了实证评估,并展示它能够应对多种使用场景。
更新时间: 2024-05-24 08:13:16
领域: cs.LG
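One plausible instantiation of the idea (not the paper's exact procedure): score candidate off-policy estimators by how stably they evaluate the target policy across data folds, then pick the most stable one. The synthetic logged-bandit setup below is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
actions = rng.integers(0, 2, size=n)              # logged actions
logging_prop = np.full(n, 0.5)                    # logging-policy propensities
rewards = rng.binomial(1, np.where(actions == 1, 0.7, 0.4))
target_prop = np.where(actions == 1, 0.9, 0.1)    # target-policy probabilities
w = target_prop / logging_prop                    # importance weights

def ips(w, r):                                    # plain importance sampling
    return np.mean(w * r)

def snips(w, r):                                  # self-normalized variant
    return np.sum(w * r) / np.sum(w)

folds = np.array_split(rng.permutation(n), 5)
scores = {}
for name, est in [("IPS", ips), ("SNIPS", snips)]:
    per_fold = [est(w[idx], rewards[idx]) for idx in folds]
    scores[name] = np.var(per_fold)               # stability across folds
best = min(scores, key=scores.get)
print(best, scores)
```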
Towards Understanding the Working Mechanism of Text-to-Image Diffusion Model
Recently, the strong latent Diffusion Probabilistic Model (DPM) has been applied to high-quality Text-to-Image (T2I) generation (e.g., Stable Diffusion), by injecting the encoded target text prompt into the gradually denoised diffusion image generator. Despite the success of DPM in practice, the mechanism behind it remains to be explored. To fill this blank, we begin by examining the intermediate statuses during the gradual denoising generation process in DPM. The empirical observations indicate that the shape of the image is reconstructed after the first few denoising steps, and the image is then filled with details (e.g., texture). The phenomenon occurs because the low-frequency (shape-relevant) signal of the noisy image is not corrupted until the final stage of the forward noising process in DPM, i.e., the initial stage of generation. Inspired by these observations, we proceed to explore the influence of each token in the text prompt during the two stages. After a series of T2I generation experiments conditioned on a set of text prompts, we conclude that in the earlier generation stage, the image is mostly decided by the special token [\texttt{EOS}] in the text prompt, and the information in the text prompt is already conveyed in this stage. After that, the diffusion model completes the details of the generated images using information from the images themselves. Finally, we propose to apply this observation to accelerate the process of T2I generation by properly removing text guidance, which ultimately accelerates sampling by up to 25\%+.
Updated: 2024-05-24 08:12:41
标题: 朝向理解文本到图像扩散模型的工作机制
摘要: 最近,强大的潜在扩散概率模型(DPM)已被应用于高质量的文本到图像(T2I)生成(例如,稳定扩散),通过将编码的目标文本提示注入逐渐去噪的扩散图像生成器中。尽管DPM在实践中取得了成功,但其背后的机制仍有待探索。为了填补这一空白,我们首先通过检查DPM中逐步去噪生成过程中的中间状态来开始研究。经验观察表明,在经过前几个去噪步骤后,图像的形状被重建,然后图像被填充了细节(例如纹理)。这种现象是因为在DPM中向噪声图像添加噪声的正向过程(生成的初始阶段)中,噪声图像的低频信号(与形状有关)直到最后阶段才被破坏。受到这些观察的启发,我们继续探索文本提示中每个标记在两个阶段期间的影响。经过一系列基于一组文本提示的T2I生成实验后,我们得出结论,在较早的生成阶段,图像主要由文本提示中的特殊标记[\texttt{EOS}]决定,并且文本提示中的信息已在这个阶段传达。之后,扩散模型通过图像自身的信息完成了生成图像的细节。最后,我们建议将这一观察结果应用于通过适当删除文本指导来加速T2I生成过程,最终将采样加速至25\%以上。
更新时间: 2024-05-24 08:12:41
领域: cs.CV,cs.LG
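The proposed acceleration can be sketched as a sampling loop that keeps classifier-free guidance only for the early, shape-forming steps and drops the text branch afterwards. The `denoise` callable, the plain update rule, and all parameters below are hypothetical placeholders, not a specific library's API:

```python
import numpy as np

def sample(denoise, x, text_emb, null_emb, steps, guided_frac=0.4, scale=7.5):
    switch = int(steps * guided_frac)
    for i, t in enumerate(reversed(range(steps))):
        if i < switch:
            # early stage: the image's shape is being decided, keep guidance
            uncond = denoise(x, t, null_emb)
            eps = uncond + scale * (denoise(x, t, text_emb) - uncond)
        else:
            # late stage: details come from the image itself, drop the text
            eps = denoise(x, t, null_emb)       # single call per step
        x = x - eps  # placeholder update; a real scheduler step goes here
    return x

# dummy run with a toy denoiser, just to exercise the control flow:
out = sample(lambda x, t, e: 0.01 * x, np.ones(4), "txt", None, steps=50)
```

Late steps cost one network call instead of two, which is where the reported speedup comes from.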
Multi-Modal Recommendation Unlearning
Unlearning methods for recommender systems (RS) have emerged to address privacy issues and concerns about legal compliance. However, evolving user preferences and content licensing issues still remain unaddressed. This is particularly true in the case of multi-modal recommender systems (MMRS), which aim to accommodate the growing influence of multi-modal information on user preferences. Previous unlearning methods for RS are inapplicable to MMRS due to the incompatibility of the multi-modal user-item behavior data graph with the matrix-based representation of RS. Partitioning-based methods degrade recommendation performance and incur significant overhead costs during aggregation. This paper introduces MMRecUN, a new framework for multi-modal recommendation unlearning, which, to the best of our knowledge, is the first attempt in this direction. Given the trained recommendation model and marked forget data, we devise a Reverse Bayesian Personalized Ranking (BPR) objective to force the model to forget it. MMRecUN employs both reverse and forward BPR loss mechanisms to selectively attenuate the impact of interactions within the forget set while concurrently reinforcing the significance of interactions within the retain set. Our experiments demonstrate that MMRecUN outperforms baseline methods across various unlearning requests when evaluated on benchmark multi-modal recommender datasets. MMRecUN achieves recall performance improvements of up to $\mathbf{49.85\%}$ compared to the baseline methods. It is up to $\mathbf{1.3}\times$ faster than the \textsc{Gold} model, which is trained on retain data from scratch. MMRecUN offers advantages such as superior performance in removing target elements, preservation of performance for retained elements, and zero overhead costs in comparison to previous methods.
Updated: 2024-05-24 08:11:59
标题: 多模态推荐去学习
摘要: 推荐系统(RS)的去学习(unlearning)方法已经出现,以解决隐私问题和对法律合规性的担忧。然而,不断变化的用户偏好和内容许可问题仍然未得到解决。这在多模态推荐系统(MMRS)中尤为明显,MMRS旨在适应多模态信息对用户偏好的日益增长的影响。由于多模态用户-项目行为数据图与RS基于矩阵的表示不兼容,先前用于RS的去学习方法不适用于MMRS。基于分区的方法会降低推荐性能,并在聚合过程中产生显著的额外成本。本文引入了MMRecUN,一种新的多模态推荐去学习框架,据我们所知,这是这个方向的首次尝试。给定训练好的推荐模型和标记的遗忘数据,我们设计了反向贝叶斯个性化排名(BPR)目标来迫使模型忘记它。MMRecUN同时采用反向和正向BPR损失机制,有选择地减弱遗忘集合内的交互影响,同时加强保留集合内交互的重要性。我们的实验表明,MMRecUN在基准多模态推荐数据集上评估时,在各种去学习请求中优于基准方法。与基准方法相比,MMRecUN实现了高达49.85%的召回性能改善。它比从头开始使用保留数据训练的Gold模型快1.3倍。与先前方法相比,MMRecUN具有优势,如在移除目标元素方面表现出色、保留元素性能的保持以及零额外成本。
更新时间: 2024-05-24 08:11:59
领域: cs.LG,cs.IR
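The forget/retain objective can be sketched with two BPR terms: the usual one for retained interactions and a reversed one that pushes forgotten items below sampled negatives. Scoring model, batch structure, and the weighting `alpha` below are illustrative:

```python
import torch
import torch.nn.functional as F

def bpr(pos_scores, neg_scores):
    # standard BPR: positives should score above negatives
    return -F.logsigmoid(pos_scores - neg_scores).mean()

def unlearning_loss(u_emb, retain_i, forget_i, neg_i, alpha=1.0):
    s = lambda items: (u_emb * items).sum(-1)       # dot-product scores
    retain_term = bpr(s(retain_i), s(neg_i))        # keep retained ranking
    forget_term = bpr(s(neg_i), s(forget_i))        # reversed BPR on forget set
    return retain_term + alpha * forget_term

u = torch.randn(32, 64, requires_grad=True)         # user embeddings
loss = unlearning_loss(u, torch.randn(32, 64), torch.randn(32, 64),
                       torch.randn(32, 64))
loss.backward()
```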
On the Hyperparameter Loss Landscapes of Machine Learning Models: An Exploratory Study
Previous efforts on hyperparameter optimization (HPO) of machine learning (ML) models predominantly focus on algorithmic advances, yet little is known about the topography of the underlying hyperparameter (HP) loss landscape, which plays a fundamental role in governing the search process of HPO. While several works have conducted fitness landscape analysis (FLA) on various ML systems, they are limited to properties of isolated landscapes without interrogating the potential structural similarities among them. The exploration of such similarities can provide a novel perspective for understanding the mechanism behind modern HPO methods, but has been missing, possibly due to the expensive cost of large-scale landscape construction, and the lack of effective analysis methods. In this paper, we mapped 1,500 HP loss landscapes of 6 representative ML models on 63 datasets across different fidelity levels, with 11M+ configurations. By conducting exploratory analysis on these landscapes with fine-grained visualizations and dedicated FLA metrics, we observed a similar landscape topography across a wide range of models, datasets, and fidelities, and shed light on several central topics in HPO.
Updated: 2024-05-24 08:08:51
标题: 关于机器学习模型的超参数损失景观:一项探索性研究
摘要: 以往关于机器学习(ML)模型超参数优化(HPO)的工作主要集中在算法进展上,而对底层超参数(HP)损失景观的地形却知之甚少,后者在HPO的搜索过程中起着基础性作用。虽然已有若干研究对各种ML系统进行了适应度景观分析(FLA),但它们仅限于孤立景观的性质,没有探究景观之间潜在的结构相似性。对这种相似性的探索可以为理解现代HPO方法背后的机制提供新的视角,但这类探索一直缺失,原因可能在于大规模景观构建的高昂成本以及缺乏有效的分析方法。在本文中,我们在63个数据集上、跨不同保真度水平,绘制了6个代表性ML模型的1,500个HP损失景观,涉及超过1100万个配置。通过对这些景观进行细粒度可视化和专门的FLA指标的探索性分析,我们观察到在广泛的模型、数据集和保真度范围内存在相似的景观地形,并阐明了HPO中的若干核心议题。
更新时间: 2024-05-24 08:08:51
领域: cs.LG
On the Identification of Temporally Causal Representation with Instantaneous Dependence
Temporally causal representation learning aims to identify the latent causal process from time series observations, but most methods require the assumption that the latent causal processes do not have instantaneous relations. Although some recent methods achieve identifiability in the instantaneous causality case, they require either interventions on the latent variables or grouping of the observations, which are in general difficult to obtain in real-world scenarios. To fill this gap, we propose an \textbf{ID}entification framework for instantane\textbf{O}us \textbf{L}atent dynamics (\textbf{IDOL}) by imposing a sparse influence constraint that the latent causal processes have sparse time-delayed and instantaneous relations. Specifically, we establish identifiability results of the latent causal process based on sufficient variability and the sparse influence constraint by employing contextual information of time series data. Based on these theories, we incorporate a temporally variational inference architecture to estimate the latent variables and a gradient-based sparsity regularization to identify the latent causal process. Experimental results on simulation datasets illustrate that our method can identify the latent causal process. Furthermore, evaluations on multiple human motion forecasting benchmarks with instantaneous dependencies indicate the effectiveness of our method in real-world settings.
Updated: 2024-05-24 08:08:05
标题: 关于具有瞬时依赖的时序因果表示的识别
摘要: 时序因果表示学习旨在从时间序列观测中识别潜在的因果过程,但大多数方法需要假设潜在因果过程不存在瞬时关系。尽管一些最近的方法在瞬时因果关系情形下实现了可识别性,但它们要求对潜在变量进行干预或对观测进行分组,而这在现实世界场景中通常难以获得。为了填补这一空白,我们提出了一个用于瞬时潜在动态的识别框架(IDOL),通过施加稀疏影响约束,使潜在因果过程具有稀疏的时延和瞬时关系。具体来说,我们利用时间序列数据的上下文信息,基于充分的变异性和稀疏影响约束,建立了潜在因果过程的可识别性结果。基于这些理论,我们结合时间变分推断架构来估计潜在变量,并使用基于梯度的稀疏正则化来识别潜在因果过程。在模拟数据集上的实验结果表明,我们的方法可以识别潜在的因果过程。此外,在具有瞬时依赖的多个人体运动预测基准上的评估表明,我们的方法在真实场景中是有效的。
更新时间: 2024-05-24 08:08:05
领域: cs.LG,stat.ML
Continuously Learning, Adapting, and Improving: A Dual-Process Approach to Autonomous Driving
Autonomous driving has advanced significantly due to sensors, machine learning, and artificial intelligence improvements. However, prevailing methods struggle with intricate scenarios and causal relationships, hindering adaptability and interpretability in varied environments. To address the above problems, we introduce LeapAD, a novel paradigm for autonomous driving inspired by the human cognitive process. Specifically, LeapAD emulates human attention by selecting critical objects relevant to driving decisions, simplifying environmental interpretation, and mitigating decision-making complexities. Additionally, LeapAD incorporates an innovative dual-process decision-making module, which consists of an Analytic Process (System-II) for thorough analysis and reasoning, along with a Heuristic Process (System-I) for swift and empirical processing. The Analytic Process leverages its logical reasoning to accumulate linguistic driving experience, which is then transferred to the Heuristic Process by supervised fine-tuning. Through reflection mechanisms and a growing memory bank, LeapAD continuously improves itself from past mistakes in a closed-loop environment. Closed-loop testing in CARLA shows that LeapAD outperforms all methods relying solely on camera input, requiring 1-2 orders of magnitude less labeled data. Experiments also demonstrate that as the memory bank expands, the Heuristic Process with only 1.8B parameters can inherit the knowledge from a GPT-4 powered Analytic Process and achieve continuous performance improvement. Code will be released at https://github.com/PJLab-ADG/LeapAD.
Updated: 2024-05-24 08:07:28
标题: 不断学习、适应和改进:自动驾驶的双过程方法
摘要: 自主驾驶技术由于传感器、机器学习和人工智能的进步而取得了显著进展。然而,现有方法在复杂场景和因果关系方面存在困难,影响了在不同环境中的适应性和可解释性。为了解决上述问题,我们引入了LeapAD,这是一种受人类认知过程启发的自主驾驶新范式。具体而言,LeapAD通过选择与驾驶决策相关的关键对象来模拟人类注意力,简化环境解释,并减轻决策复杂性。此外,LeapAD还融入了一种创新的双过程决策模块,包括用于彻底分析和推理的分析过程(系统-II)以及用于快速和经验性处理的启发式过程(系统-I)。分析过程利用其逻辑推理来积累语言化的驾驶经验,然后通过监督微调将其转移到启发式过程中。通过反思机制和不断增长的记忆库,LeapAD在闭环环境中从过去的错误中不断改进自身。在CARLA的闭环测试中,LeapAD表现优于所有仅依赖摄像头输入的方法,所需标记数据减少1-2个数量级。实验还表明,随着记忆库的扩展,仅有18亿参数的启发式过程就可以继承由GPT-4驱动的分析过程的知识,并实现持续的性能改进。代码将在https://github.com/PJLab-ADG/LeapAD发布。
更新时间: 2024-05-24 08:07:28
领域: cs.RO,cs.AI,cs.CV
Ask-AC: An Initiative Advisor-in-the-Loop Actor-Critic Framework
Despite the promising results achieved, state-of-the-art interactive reinforcement learning schemes rely on passively receiving supervision signals from advisor experts, in the form of either continuous monitoring or pre-defined rules, which inevitably result in a cumbersome and expensive learning process. In this paper, we introduce a novel initiative advisor-in-the-loop actor-critic framework, termed as Ask-AC, that replaces the unilateral advisor-guidance mechanism with a bidirectional learner-initiative one, and thereby enables a customized and efficacious message exchange between learner and advisor. At the heart of Ask-AC are two complementary components, namely action requester and adaptive state selector, that can be readily incorporated into various discrete actor-critic architectures. The former component allows the agent to initiatively seek advisor intervention in the presence of uncertain states, while the latter identifies the unstable states potentially missed by the former especially when environment changes, and then learns to promote the ask action on such states. Experimental results on both stationary and non-stationary environments and across different actor-critic backbones demonstrate that the proposed framework significantly improves the learning efficiency of the agent, and achieves the performances on par with those obtained by continuous advisor monitoring.
Updated: 2024-05-24 08:05:29
标题: Ask-AC:一种顾问在环的主动式演员-评论家框架
摘要: 尽管取得了令人期待的成果,但目前最先进的交互式强化学习方案仍依赖于被动地接收顾问专家的监督信号,这些信号以持续监控或预定义规则的形式提供,这不可避免地导致了繁琐且昂贵的学习过程。在本文中,我们介绍了一种新颖的主动式顾问在环演员-评论家框架,称为Ask-AC,它用一种双向的学习者主动机制取代了单向的顾问指导机制,从而实现了学习者和顾问之间定制且高效的信息交流。Ask-AC的核心是两个互补的组件,即动作请求者和自适应状态选择器,可以轻松地整合到各种离散的演员-评论家架构中。前者使智能体能够在面对不确定状态时主动寻求顾问的干预,而后者则识别前者可能错过的不稳定状态(尤其是当环境发生变化时),并学习在这些状态上促进询问动作。在平稳和非平稳环境以及不同的演员-评论家骨干网络上的实验结果表明,所提出的框架显著提高了智能体的学习效率,并取得了与顾问持续监控相当的性能。
更新时间: 2024-05-24 08:05:29
领域: cs.LG,cs.AI
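The action requester can be pictured as an entropy gate: the learner acts alone when its policy is confident and initiates a query to the advisor otherwise. Threshold, policy, and advisor below are illustrative placeholders:

```python
import torch
import torch.nn.functional as F

def act_or_ask(logits: torch.Tensor, advisor, obs, threshold: float = 0.9):
    probs = F.softmax(logits, dim=-1)
    entropy = -(probs * probs.clamp_min(1e-9).log()).sum()
    max_entropy = torch.log(torch.tensor(float(logits.numel())))
    if entropy / max_entropy > threshold:         # uncertain state
        return advisor(obs)                       # learner-initiated query
    return int(torch.argmax(probs))               # confident: act alone

logits = torch.tensor([0.1, 0.0, 0.05])           # nearly uniform -> ask
action = act_or_ask(logits, advisor=lambda o: 2, obs=None)
print(action)                                     # 2, supplied by the advisor
```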
Dishonest Approximate Computing: A Coming Crisis for Cloud Clients
Approximate Computing (AC) has emerged as a promising technique for achieving energy-efficient architectures and is expected to become an effective technique for reducing the electricity cost for cloud service providers (CSP). However, the potential misuse of AC has not received adequate attention, which is a coming crisis behind the blueprint of AC. Driven by the pursuit of illegal financial profits, untrusted CSPs may deploy low-cost AC devices and deceive clients by presenting AC services as promised accurate computing products, while falsely claiming AC outputs as accurate results. This misuse of AC will cause both financial loss and computing degradation to cloud clients. In this paper, we define this malicious attack as DisHonest Approximate Computing (DHAC) and analyze the technical challenges faced by clients in detecting such attacks. To address this issue, we propose two golden model free detection methods: Residual Class Check (RCC) and Forward-Backward Check (FBC). RCC provides clients a low-cost approach to infer the residual class to which a legitimate accurate output should belong. By comparing the residual class of the returned result, clients can determine whether a computing service contains any AC elements. FBC detects potential DHAC by computing an invertible check branch using the intermediate values of the program. It compares the values before entering and after returning from the check branch to identify any discrepancies. Both RCC and FBC can be executed concurrently with real computing tasks, enabling real-time DHAC detection with current inputs. Our experimental results show that both RCC and FBC can detect over 96%-99% of DHAC cases without misjudging any legitimate accurate results.
Updated: 2024-05-24 08:04:42
标题: 不诚实的近似计算:云客户面临的即将到来的危机
摘要: 近似计算(AC)已经成为实现节能架构的一种有前途的技术,并有望成为减少云服务提供商(CSP)电力成本的有效手段。然而,AC的潜在误用并未得到足够重视,这是AC蓝图背后的一个即将到来的危机。受追求非法经济利益的驱使,不可信的CSP可能部署低成本的AC设备,将AC服务伪装成承诺的精确计算产品来欺骗客户,并虚假宣称AC输出为准确结果。这种对AC的误用将给云客户造成经济损失和计算质量下降。在本文中,我们将这种恶意攻击定义为不诚实的近似计算(DHAC),并分析客户在检测此类攻击时面临的技术挑战。为解决这一问题,我们提出了两种无需黄金模型的检测方法:残差类检查(RCC)和前向-后向检查(FBC)。RCC为客户提供了一种低成本的方法,用于推断合法准确输出应属于的残差类。通过比较返回结果的残差类,客户可以确定一个计算服务是否包含任何AC元素。FBC通过使用程序的中间值计算一个可逆的检查分支来检测潜在的DHAC,并比较进入检查分支之前与从检查分支返回之后的值,以识别任何差异。RCC和FBC都可以与实际计算任务并发执行,从而能够利用当前输入实时检测DHAC。我们的实验结果表明,RCC和FBC可以检测96%-99%以上的DHAC案例,而不会误判任何合法的准确结果。
更新时间: 2024-05-24 08:04:42
领域: cs.CR,cs.AR
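The residual-class idea can be illustrated with integer workloads, where congruences are preserved under addition and multiplication: the client cheaply computes the residue an accurate result must have and compares it with the residue of the returned result. Modulus and workload below are illustrative:

```python
import numpy as np

m = 251                                            # small check modulus
rng = np.random.default_rng(0)
A = rng.integers(0, 100, size=(64, 64))
B = rng.integers(0, 100, size=(64, 64))

server_result = A @ B                              # value returned by the server
# a dishonest approximate server would return A @ B with low-order errors

# cheap client-side check: (A mod m)(B mod m) mod m must equal (AB) mod m
expected_residue = ((A % m) @ (B % m)) % m
if np.array_equal(server_result % m, expected_residue):
    print("consistent with accurate computing")
else:
    print("possible dishonest approximate computing")
```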
Continuous-time Riemannian SGD and SVRG Flows on Wasserstein Probabilistic Space
Recently, optimization on the Riemannian manifold has provided new insights to the optimization community. In this regard, the manifold taken as the probability measure metric space equipped with the second-order Wasserstein distance is of particular interest, since optimization on it can be linked to practical sampling processes. In general, the standard (continuous) optimization method on Wasserstein space is Riemannian gradient flow (i.e., Langevin dynamics when minimizing KL divergence). In this paper, we aim to enrich the continuous optimization methods in the Wasserstein space, by extending the gradient flow on it into the stochastic gradient descent (SGD) flow and stochastic variance reduction gradient (SVRG) flow. The two flows in Euclidean space are standard continuous stochastic methods, while their Riemannian counterparts are unexplored. By leveraging the property of Wasserstein space, we construct stochastic differential equations (SDEs) to approximate the corresponding discrete dynamics of desired Riemannian stochastic methods in Euclidean space. Then, our probability measures flows are obtained by the Fokker-Planck equation. Finally, the convergence rates of our Riemannian stochastic flows are proven, which match the results in Euclidean space.
Updated: 2024-05-24 08:04:03
标题: 在Wasserstein概率空间上的连续时间Riemannian SGD和SVRG流
摘要: 最近,黎曼流形上的优化为优化社区提供了新的见解。在这方面,将概率测度度量空间配备二阶Wasserstein距离所得到的流形尤为引人关注,因为其上的优化可以与实际的采样过程相关联。一般来说,Wasserstein空间上的标准(连续)优化方法是黎曼梯度流(即,在最小化KL散度时的Langevin动力学)。在本文中,我们旨在丰富Wasserstein空间中的连续优化方法,将其上的梯度流扩展为随机梯度下降(SGD)流和随机方差缩减梯度(SVRG)流。这两种流在欧几里得空间中是标准的连续随机方法,而其黎曼对应物尚未被探索。通过利用Wasserstein空间的性质,我们构建随机微分方程(SDEs)来近似欧几里得空间中相应黎曼随机方法的离散动态。然后,我们的概率测度流通过Fokker-Planck方程获得。最后,我们证明了黎曼随机流的收敛速率,其与欧几里得空间中的结果相匹配。
更新时间: 2024-05-24 08:04:03
领域: cs.LG
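The Riemannian gradient flow mentioned above has a concrete numerical face: minimizing $\mathrm{KL}(p \,\|\, \pi)$ over Wasserstein space corresponds to Langevin dynamics $\mathrm{d}X = -\nabla U(X)\,\mathrm{d}t + \sqrt{2}\,\mathrm{d}W$ for $\pi \propto e^{-U}$. A small particle simulation with an illustrative Gaussian target:

```python
import numpy as np

def grad_U(x):                 # U(x) = x^2 / 2, so pi is the standard normal
    return x

rng = np.random.default_rng(0)
x = rng.uniform(-5, 5, size=10000)     # initial particle ensemble
dt = 1e-2
for _ in range(2000):
    # Euler-Maruyama discretization of the Langevin SDE
    x = x - grad_U(x) * dt + np.sqrt(2 * dt) * rng.normal(size=x.shape)

print(x.mean(), x.std())       # should approach 0 and 1, the target moments
```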
Ada-Tracker: Soft Tissue Tracking via Inter-Frame and Adaptive-Template Matching
Soft tissue tracking is crucial for computer-assisted interventions. Existing approaches mainly rely on extracting discriminative features from the template and videos to recover corresponding matches. However, it is difficult to adopt these techniques in surgical scenes, where tissues are changing in shape and appearance throughout the surgery. To address this problem, we exploit optical flow to naturally capture the pixel-wise tissue deformations and adaptively correct the tracked template. Specifically, we first implement an inter-frame matching mechanism to extract a coarse region of interest based on optical flow from consecutive frames. To accommodate appearance change and alleviate drift, we then propose an adaptive-template matching method, which updates the tracked template based on the reliability of the estimates. Our approach, Ada-Tracker, enjoys both short-term dynamics modeling by capturing local deformations and long-term dynamics modeling by introducing global temporal compensation. We evaluate our approach on the public SurgT benchmark, which is generated from Hamlyn, SCARED, and Kidney boundary datasets. The experimental results show that Ada-Tracker achieves superior accuracy and performs more robustly against prior works. Code is available at https://github.com/wrld/Ada-Tracker.
Updated: 2024-05-24 08:01:56
标题: Ada-Tracker: 通过帧间和自适应模板匹配进行软组织跟踪
摘要: 软组织跟踪对于计算机辅助手术至关重要。现有方法主要依赖于从模板和视频中提取区分特征以恢复相应匹配。然而,在手术现场很难采用这些技术,因为组织在手术过程中形状和外观都在变化。为了解决这个问题,我们利用光流自然捕捉像素级组织变形并自适应校正跟踪的模板。具体来说,我们首先实现一个帧间匹配机制,基于连续帧的光流提取一个粗略的兴趣区域。为了适应外观变化并减轻漂移,我们提出了一种自适应模板匹配方法,根据估计的可靠性更新跟踪的模板。我们的方法Ada-Tracker通过捕捉局部变形实现短期动态建模,并通过引入全局时间补偿实现长期动态建模。我们在公开的SurgT基准测试上评估了我们的方法,该基准测试是从Hamlyn、SCARED和肾脏边界数据集生成的。实验结果显示,Ada-Tracker实现了更高的准确性,并且对抗先前的工作更加稳健。代码可在https://github.com/wrld/Ada-Tracker找到。
更新时间: 2024-05-24 08:01:56
领域: cs.CV,cs.AI
Isotropy, Clusters, and Classifiers
Whether embedding spaces use all their dimensions equally, i.e., whether they are isotropic, has been a recent subject of discussion. Evidence has been accrued both for and against enforcing isotropy in embedding spaces. In the present paper, we stress that isotropy imposes requirements on the embedding space that are not compatible with the presence of clusters -- which also negatively impacts linear classification objectives. We demonstrate this fact both mathematically and empirically and use it to shed light on previous results from the literature.
Updated: 2024-05-24 08:01:39
标题: 各向同性、聚类和分类器
摘要: 嵌入空间是否均匀地利用其所有维度(即是否各向同性),是近来讨论的一个话题。目前既有支持也有反对在嵌入空间中强制各向同性的证据。在本文中,我们强调各向同性对嵌入空间施加的要求与聚类的存在不兼容,而这也会对线性分类目标产生负面影响。我们从数学和实证两方面论证了这一事实,并利用它来阐明文献中先前的结果。
更新时间: 2024-05-24 08:01:39
领域: cs.LG,cs.CL
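One simple way to see the tension: clustered embeddings concentrate variance in a few principal directions, while isotropy demands an even spread. A small sketch using the effective-rank ratio of the covariance spectrum (one common proxy, not necessarily the paper's measure):

```python
import numpy as np

def effective_rank_ratio(X):
    X = X - X.mean(axis=0)
    eig = np.linalg.eigvalsh(np.cov(X.T))
    p = np.clip(eig, 0, None) / eig.sum()          # normalized spectrum
    return np.exp(-(p * np.log(p + 1e-12)).sum()) / len(p)  # in (0, 1]

rng = np.random.default_rng(0)
isotropic = rng.normal(size=(1000, 16))
centers = rng.normal(size=(2, 16))
clustered = np.vstack([rng.normal(loc=c, scale=0.1, size=(500, 16))
                       for c in centers])

print(effective_rank_ratio(isotropic))  # close to 1: evenly spread variance
print(effective_rank_ratio(clustered))  # far below 1: cluster-dominated
```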
Perfect Alignment May be Poisonous to Graph Contrastive Learning
Graph Contrastive Learning (GCL) aims to learn node representations by aligning positive pairs and separating negative ones. However, few researchers have focused on the inner law behind the specific augmentations used in graph-based learning. What kind of augmentation will help downstream performance, how does contrastive learning actually influence downstream tasks, and why does the magnitude of augmentation matter so much? This paper seeks to address these questions by establishing a connection between augmentation and downstream performance. Our findings reveal that GCL contributes to downstream tasks mainly by separating different classes rather than gathering nodes of the same class. So perfect alignment and augmentation overlap, which make all intra-class samples identical, cannot fully explain the success of contrastive learning. Therefore, in order to understand how augmentation aids the contrastive learning process, we conduct further investigations into generalization, finding that perfect alignment, which makes positive pairs identical, can help the contrastive loss but is poisonous to generalization. As a result, perfect alignment may not lead to the best downstream performance, so specifically designed augmentation is needed to achieve appropriate alignment and improve downstream accuracy. We further analyse the result via information theory and graph spectrum theory and propose two simple but effective methods to verify the theories. The two methods can be easily applied to various GCL algorithms, and extensive experiments are conducted to prove their effectiveness. The code is available at https://github.com/somebodyhh1/GRACEIS
Updated: 2024-05-24 08:01:29
标题: 完美对齐可能对图对比学习有害
摘要: 图对比学习(Graph Contrastive Learning,GCL)旨在通过对齐正样本对和分离负样本对来学习节点表示。然而,很少有研究人员关注图学习中所用特定增强背后的内在规律。什么样的增强会帮助下游性能?对比学习实际上如何影响下游任务?增强的幅度为什么如此重要?本文试图通过建立增强与下游性能之间的联系来回答这些问题。我们的研究结果表明,GCL主要通过分离不同类别而不是聚集同类节点来贡献于下游任务。因此,将所有类内样本拉到一起的完美对齐和增强重叠,并不能完全解释对比学习的成功。为了理解增强如何帮助对比学习过程,我们进一步研究了泛化性,发现使正样本对完全相同的完美对齐虽然有助于对比损失,但对泛化有害;其结果是,完美对齐可能不会带来最佳的下游性能,因此需要专门设计的增强来实现适当的对齐并提高下游准确性。我们进一步通过信息论和图谱理论分析了这一结果,并提出了两种简单而有效的方法来验证这些理论。这两种方法可以轻松应用于各种GCL算法,大量实验证明了其有效性。代码可在https://github.com/somebodyhh1/GRACEIS找到。
更新时间: 2024-05-24 08:01:29
领域: cs.LG,cs.AI
Organic Data-Driven Approach for Turkish Grammatical Error Correction and LLMs
Grammatical Error Correction has seen significant progress with the recent advancements in deep learning. As those methods require huge amounts of data, synthetic datasets are being built to fill this gap. Unfortunately, synthetic datasets are not organic enough in some cases and even require clean data to start with. Furthermore, most of the work that has been done is focused mostly on English. In this work, we introduce a new organic data-driven approach, clean insertions, to build parallel Turkish Grammatical Error Correction datasets from any organic data, and to clean the data used for training Large Language Models. We achieve state-of-the-art results on two Turkish Grammatical Error Correction test sets out of the three publicly available ones. We also show the effectiveness of our method on the training losses of training language models.
Updated: 2024-05-24 08:00:24
标题: 有机数据驱动方法用于土耳其语语法错误校正和LLMs
摘要: 语法错误校正在深度学习的最新进展中取得了显著进展。由于这些方法需要大量的数据,因此正在构建合成数据集以填补这一空白。不幸的是,在某些情况下,合成数据集并不够有机,甚至需要干净的数据作为起点。此外,大部分已经完成的工作主要集中在英语上。在这项工作中,我们引入了一种新的有机数据驱动方法,即清洁插入,用于从任何有机数据构建平行的土耳其语语法错误校正数据集,并清洁用于训练大型语言模型的数据。我们在三个公开可用的土耳其语语法错误校正测试集中的两个上取得了最新成果。我们还展示了我们的方法对训练语言模型的训练损失的有效性。
更新时间: 2024-05-24 08:00:24
领域: cs.CL,cs.AI
Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training
LLMs are computationally expensive to pre-train due to their large scale. Model growth emerges as a promising approach by leveraging smaller models to accelerate the training of larger ones. However, the viability of these model growth methods in efficient LLM pre-training remains underexplored. This work identifies three critical $\underline{\textit{O}}$bstacles: ($\textit{O}$1) lack of comprehensive evaluation, ($\textit{O}$2) untested viability for scaling, and ($\textit{O}$3) lack of empirical guidelines. To tackle $\textit{O}$1, we summarize existing approaches into four atomic growth operators and systematically evaluate them in a standardized LLM pre-training setting. Our findings reveal that a depthwise stacking operator, called $G_{\text{stack}}$, exhibits remarkable acceleration in training, leading to decreased loss and improved overall performance on eight standard NLP benchmarks compared to strong baselines. Motivated by these promising results, we conduct extensive experiments to delve deeper into $G_{\text{stack}}$ to address $\textit{O}$2 and $\textit{O}$3. For $\textit{O}$2 (untested scalability), our study shows that $G_{\text{stack}}$ is scalable and consistently performs well, with experiments up to 7B LLMs after growth and pre-training LLMs with 750B tokens. For example, compared to a conventionally trained 7B model using 300B tokens, our $G_{\text{stack}}$ model converges to the same loss with 194B tokens, resulting in a 54.6\% speedup. We further address $\textit{O}$3 (lack of empirical guidelines) by formalizing guidelines to determine growth timing and growth factor for $G_{\text{stack}}$, making it practical in general LLM pre-training. We also provide in-depth discussions and comprehensive ablation studies of $G_{\text{stack}}$. Our code and pre-trained model are available at $\href{https://llm-stacking.github.io/}{https://llm-stacking.github.io/}$.
Updated: 2024-05-24 08:00:00
标题: 堆叠你的Transformer:深入考察面向高效LLM预训练的模型增长
摘要: LLM由于规模庞大,预训练的计算开销很大。模型增长作为一种有前景的方法,通过利用较小的模型来加速较大模型的训练。然而,这些模型增长方法在高效LLM预训练中的可行性尚未得到充分探讨。本研究确认了三个关键障碍:(O1)缺乏全面评估,(O2)未经验证的扩展性,以及(O3)缺乏经验指导。为了解决O1,我们将现有方法总结为四个原子增长算子,并在标准化的LLM预训练设置中对它们进行系统评估。我们的研究结果显示,一种称为$G_{\text{stack}}$的深度方向堆叠算子在训练中表现出显著的加速效果,与强基线相比降低了损失,并在八个标准NLP基准测试上提升了整体性能。受到这些有希望的结果的启发,我们进行了大量实验,进一步深入研究$G_{\text{stack}}$以解决O2和O3。针对O2(未经验证的可扩展性),我们的研究表明$G_{\text{stack}}$具有可扩展性并始终表现良好,实验涵盖增长后多达7B参数的LLM,以及使用750B标记预训练的LLM。例如,与使用300B标记进行常规训练的7B模型相比,我们的$G_{\text{stack}}$模型仅用194B标记即收敛到相同的损失,实现了54.6\%的加速。我们进一步通过制定指导方针来确定$G_{\text{stack}}$的增长时机和增长因子,以解决O3(缺乏经验指导),使其在一般LLM预训练中切实可用。我们还提供了对$G_{\text{stack}}$的深入讨论和全面的消融研究。我们的代码和预训练模型可在https://llm-stacking.github.io/上找到。
更新时间: 2024-05-24 08:00:00
领域: cs.CL,cs.AI
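The depthwise stacking operator can be sketched in a few lines: deep-copy a trained small model's blocks and append them, so the grown model starts from the small one's weights. The block definition below is a stand-in for a real Transformer layer:

```python
import copy
import torch.nn as nn

small = nn.ModuleList(
    [nn.TransformerEncoderLayer(d_model=64, nhead=4) for _ in range(4)]
)
# ... pre-train `small` for some token budget, then grow depthwise:
grown = nn.ModuleList(
    [copy.deepcopy(layer) for layer in small] +
    [copy.deepcopy(layer) for layer in small]      # stack a second copy on top
)
print(len(small), "->", len(grown))                # 4 -> 8 layers
```

Growth timing and the growth factor (here 2x) are exactly the knobs the paper's guidelines address.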
WPDA: Frequency-based Backdoor Attack with Wavelet Packet Decomposition
This work explores an emerging security threat against deep neural network (DNN)-based image classification, i.e., backdoor attack. In this scenario, the attacker aims to inject a backdoor into the model by manipulating training data, such that the backdoor can be activated by a particular trigger and bootstraps the model to make a target prediction at inference. Currently, most existing data poisoning-based attacks struggle to achieve success at low poisoning ratios, increasing the risk of being defended by defense methods. In this paper, we propose a novel frequency-based backdoor attack via Wavelet Packet Decomposition (WPD). WPD decomposes the original image signal into a spectrogram that contains frequency information with different semantic meanings. We leverage WPD to statistically analyze the frequency distribution of the dataset to infer the key frequency regions the DNNs would focus on, and the trigger information is only injected into the key frequency regions. Our method mainly includes three parts: 1) the selection of the poisoning frequency regions in the spectrogram; 2) trigger generation; 3) the generation of the poisoned dataset. Our method is stealthy and precise, evidenced by the 98.12% Attack Success Rate (ASR) on CIFAR-10 with the extremely low poisoning ratio 0.004% (i.e., only 2 poisoned samples among 50,000 training samples) and can bypass most existing defense methods. Besides, we also provide visualization analyses to explain why our method works.
Updated: 2024-05-24 07:59:58
标题: WPDA:基于小波包分解的频域后门攻击
摘要: 这项工作探讨了一种新兴的安全威胁,即基于深度神经网络(DNNs)的图像分类的后门攻击。在这种情况下,攻击者旨在通过操纵训练数据向模型中注入后门,使得后门可以通过特定触发器激活,并引导模型在推理时做出目标预测。目前,大多数现有的基于数据污染的攻击在低污染比率下很难取得成功,增加了被防御方法防守的风险。在本文中,我们提出了一种通过小波包分解(WPD)的新型基于频率的后门攻击。WPD将原始图像信号分解为包含不同语义含义频率信息的频谱图。我们利用WPD对数据集的频率分布进行统计分析,以推断DNNs将关注的关键频率区域,并且触发信息仅注入到关键频率区域。我们的方法主要包括三个部分:1)在频谱图中选择污染频率区域;2)触发器生成;3)生成污染数据集。我们的方法隐蔽而精确,证明了在CIFAR-10上攻击成功率(ASR)达到了98.12%,污染比率极低为0.004%(即在50,000个训练样本中仅有2个被污染样本),并且可以绕过大多数现有的防御方法。此外,我们还提供了可视化分析来解释我们的方法为什么有效。
更新时间: 2024-05-24 07:59:58
领域: cs.CR,I.4.9
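Frequency-domain trigger injection can be illustrated with a single-level 2D DWT as a simpler stand-in for the paper's wavelet packet decomposition; band choice, trigger pattern, and amplitude below are illustrative:

```python
import numpy as np
import pywt

rng = np.random.default_rng(0)
img = rng.uniform(0, 255, size=(32, 32))          # placeholder image

cA, (cH, cV, cD) = pywt.dwt2(img, "db1")          # decompose into sub-bands
trigger = 2.0 * rng.standard_normal(cD.shape)     # a fixed pattern in practice
cD = cD + trigger                                 # poison one frequency band

poisoned = pywt.idwt2((cA, (cH, cV, cD)), "db1")  # reconstruct poisoned image
print(np.abs(poisoned - img).mean())              # small pixel-space change
```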
Assessing Political Bias in Large Language Models
The assessment of bias within Large Language Models (LLMs) has emerged as a critical concern in the contemporary discourse surrounding Artificial Intelligence (AI), in the context of their potential impact on societal dynamics. In particular, recognizing and considering political bias within LLM applications is central when closing in on the tipping point toward performative prediction, as is being educated about the potential effects and the societal behavior LLMs can drive at scale through their interplay with human operators. In this way, the upcoming elections of the European Parliament will not remain unaffected by LLMs. We evaluate the political bias of the currently most popular open-source LLMs (instruct or assistant models) concerning political issues within the European Union (EU) from a German voter's perspective. To do so, we use the "Wahl-O-Mat", a voting advice application used in Germany. From the voting advice of the "Wahl-O-Mat" we quantify the degree of alignment of LLMs with German political parties. We show that larger models, such as Llama3-70B, tend to align more closely with left-leaning political parties, while smaller models often remain neutral, particularly when prompted in English. The central finding is that LLMs are similarly biased, with low variance in their alignment with respect to a specific party. Our findings underline the importance of rigorously assessing and making bias transparent in LLMs to safeguard the integrity and trustworthiness of applications that employ the capabilities of performative prediction and the invisible hand of machine learning prediction and language generation.
Updated: 2024-05-24 07:59:31
标题: 评估大型语言模型中的政治偏见
摘要: 在人工智能(AI)可能影响社会动态的背景下,大型语言模型(LLMs)中的偏见评估已成为当代话语中的一个关键问题。特别是,在逼近通往表演性预测(performative prediction)的临界点时,认识并考虑LLM应用中的政治偏见至关重要;同时,还需要了解LLM由于与人类操作者的相互作用而可能在大规模上驱动的潜在影响和社会行为。因此,欧洲议会即将举行的选举也不会不受LLM影响。我们从德国选民的角度,评估了目前最流行的开源LLM(指令或助手模型)在欧盟(EU)政治议题上的政治偏见。为此,我们使用了德国的选举建议应用"Wahl-O-Mat"。根据"Wahl-O-Mat"的选举建议,我们量化了LLM与德国各政党的一致程度。我们表明,较大的模型(如Llama3-70B)往往与左倾政党更紧密对齐,而较小的模型通常保持中立,尤其是在使用英语提示时。核心发现是,各LLM具有类似的偏见,且与特定政党的一致程度方差较小。我们的发现强调了严格评估并公开LLM偏见的重要性,以维护那些运用表演性预测能力以及机器学习预测和语言生成这只"看不见的手"的应用的完整性和可信度。
更新时间: 2024-05-24 07:59:31
领域: cs.CL,cs.AI
Are Long-LLMs A Necessity For Long-Context Tasks?
The learning and deployment of long-LLMs remains a challenging problem despite recent progress. In this work, we argue that long-LLMs are not a necessity to solve long-context tasks, as common long-context tasks are short-context solvable, i.e., they can be solved by purely working with oracle short-contexts within the long-context tasks' inputs. On top of this argument, we propose a framework called LC-Boost (Long-Context Bootstrapper), which enables a short-LLM to address long-context tasks in a bootstrapping manner. In our framework, the short-LLM prompts itself to reason about two critical decisions: 1) how to access the appropriate part of the context within the input, and 2) how to make effective use of the accessed context. By adaptively accessing and utilizing the context based on the presented tasks, LC-Boost can serve as a general framework to handle diversified long-context processing problems. We comprehensively evaluate different types of tasks from popular long-context benchmarks, where LC-Boost is able to achieve a substantially improved performance with a much smaller consumption of resources.
Updated: 2024-05-24 07:59:30
标题: 长LLM是长上下文任务的必需品吗?
摘要: 尽管近年来取得了一些进展,长LLM的学习和部署仍然是一个具有挑战性的问题。在这项工作中,我们认为长LLM并非解决长上下文任务的必要条件,因为常见的长上下文任务是"短上下文可解"的,即只需处理长上下文任务输入中的关键短上下文(oracle short-contexts)即可解决。基于这一论点,我们提出了一个名为LC-Boost(长上下文引导器)的框架,该框架使短LLM能够以自举的方式解决长上下文任务。在我们的框架中,短LLM通过提示自身来推理两个关键决策:1)如何访问输入中合适的上下文部分,2)如何有效利用所访问的上下文。通过根据所给任务自适应地访问和利用上下文,LC-Boost可以作为一个处理多样化长上下文处理问题的通用框架。我们全面评估了来自流行长上下文基准的不同类型任务,LC-Boost能够在消耗少得多的资源的情况下实现大幅提升的性能。
更新时间: 2024-05-24 07:59:30
领域: cs.CL,cs.AI
How do Large Language Models Handle Multilingualism?
Large language models (LLMs) have demonstrated impressive capabilities across diverse languages. This study explores how LLMs handle multilingualism. Based on observed language ratio shifts among layers and the relationships between network structures and certain capabilities, we hypothesize the LLM's multilingual workflow ($\texttt{MWork}$): LLMs initially understand the query, converting multilingual inputs into English for task-solving. In the intermediate layers, they employ English for thinking and incorporate multilingual knowledge with self-attention and feed-forward structures, respectively. In the final layers, LLMs generate responses aligned with the original language of the query. To verify $\texttt{MWork}$, we introduce Parallel Language-specific Neuron Detection ($\texttt{PLND}$) to identify activated neurons for inputs in different languages without any labeled data. Using $\texttt{PLND}$, we validate $\texttt{MWork}$ through extensive experiments involving the deactivation of language-specific neurons across various layers and structures. Moreover, $\texttt{MWork}$ allows fine-tuning of language-specific neurons with a small dataset, enhancing multilingual abilities in a specific language without compromising others. This approach results in an average improvement of $3.6\%$ for high-resource languages and $2.3\%$ for low-resource languages across all tasks with just $400$ documents.
Updated: 2024-05-24 07:59:10
标题: 大型语言模型如何处理多语言?
摘要: 大型语言模型(LLMs)在各种语言中展示出令人印象深刻的能力。本研究探讨了LLMs如何处理多语言性。基于观察到的层间语言比例变化以及网络结构与某些能力之间的关系,我们假设了LLM的多语言工作流程($\texttt{MWork}$):LLMs最初理解查询,将多语言输入转换为英文以解决任务;在中间层,它们使用英语进行思考,并分别通过自注意力结构和前馈结构融入多语言知识;在最终层,LLMs生成与查询原始语言一致的响应。为了验证$\texttt{MWork}$,我们引入了并行语言特定神经元检测($\texttt{PLND}$),无需任何标记数据即可识别不同语言输入所激活的神经元。借助$\texttt{PLND}$,我们通过在不同层和结构中停用语言特定神经元的大量实验验证了$\texttt{MWork}$。此外,$\texttt{MWork}$允许使用小数据集微调语言特定神经元,在不影响其他语言的情况下增强特定语言的多语言能力。这种方法仅使用$400$篇文档,就使高资源语言在所有任务中平均提升$3.6\%$,低资源语言提升$2.3\%$。
更新时间: 2024-05-24 07:59:10
领域: cs.CL,cs.AI
NuwaTS: Mending Every Incomplete Time Series
Time series imputation plays a crucial role in various real-world systems and has been extensively explored. Models for time series imputation often require specialization, necessitating distinct designs for different domains and missing patterns. In this study, we introduce NuwaTS, a framework to repurpose Pre-trained Language Model (PLM) for general time series imputation. Once trained, this model can be applied to imputation tasks on incomplete time series from any domain with any missing patterns. We begin by devising specific embeddings for each sub-series patch of the incomplete time series. These embeddings encapsulate information about the patch itself, the missing data patterns within the patch, and the patch's statistical characteristics. To enhance the model's adaptability to different missing patterns, we propose a contrastive learning approach to make representations of the same patch more similar across different missing patterns. By combining this contrastive loss with the missing data imputation task, we train PLMs to obtain a one-for-all imputation model. Furthermore, we utilize a plug-and-play layer-wise fine-tuning approach to train domain-specific models. Experimental results demonstrate that leveraging a dataset of over seventeen million time series from diverse domains, we obtain a one-for-all imputation model which outperforms existing domain-specific models across various datasets and missing patterns. Additionally, we find that NuwaTS can be generalized to other time series tasks such as forecasting. Our codes are available at https://github.com/Chengyui/NuwaTS.
Updated: 2024-05-24 07:59:02
标题: NuwaTS:修复每个不完整的时间序列
摘要: 时间序列插补在各种现实世界系统中起着至关重要的作用,并得到了广泛探讨。时间序列插补模型通常需要专门化,需要为不同领域和缺失模式设计不同的模型。在本研究中,我们介绍了NuwaTS,这是一个重新利用预训练语言模型(PLM)进行一般时间序列插补的框架。一旦训练完成,该模型可以应用于任何领域、任何缺失模式的不完整时间序列的插补任务。我们首先为不完整时间序列的每个子序列补丁设计了特定的嵌入。这些嵌入包含了有关补丁本身、补丁内缺失数据模式和补丁的统计特征的信息。为了增强模型对不同缺失模式的适应性,我们提出了一种对比学习方法,使同一补丁的表示在不同缺失模式下更加相似。通过将这种对比损失与缺失数据插补任务相结合,我们训练PLM获取一个全能的插补模型。此外,我们利用一种插拔式逐层微调方法来训练特定领域的模型。实验结果表明,利用来自不同领域的一千七百万个时间序列的数据集,我们得到一个全能的插补模型,这个模型在各种数据集和缺失模式上均优于现有的特定领域模型。此外,我们发现NuwaTS可以推广到其他时间序列任务,如预测。我们的代码可在https://github.com/Chengyui/NuwaTS 上获得。
更新时间: 2024-05-24 07:59:02
领域: cs.LG,cs.AI
Decaf: Data Distribution Decompose Attack against Federated Learning
In contrast to prevalent Federated Learning (FL) privacy inference techniques such as generative adversarial networks attacks, membership inference attacks, property inference attacks, and model inversion attacks, we devise an innovative privacy threat: the Data Distribution Decompose Attack on FL, termed Decaf. This attack enables an honest-but-curious FL server to meticulously profile the proportion of each class owned by the victim FL user, divulging sensitive information like local market item distribution and business competitiveness. The crux of Decaf lies in the profound observation that the magnitude of local model gradient changes closely mirrors the underlying data distribution, including the proportion of each class. Decaf addresses two crucial challenges: accurately identify the missing/null class(es) given by any victim user as a premise and then quantify the precise relationship between gradient changes and each remaining non-null class. Notably, Decaf operates stealthily, rendering it entirely passive and undetectable to victim users regarding the infringement of their data distribution privacy. Experimental validation on five benchmark datasets (MNIST, FASHION-MNIST, CIFAR-10, FER-2013, and SkinCancer) employing diverse model architectures, including customized convolutional networks, standardized VGG16, and ResNet18, demonstrates Decaf's efficacy. Results indicate its ability to accurately decompose local user data distribution, regardless of whether it is IID or non-IID distributed. Specifically, the dissimilarity measured using $L_{\infty}$ distance between the distribution decomposed by Decaf and ground truth is consistently below 5\% when no null classes exist. Moreover, Decaf achieves 100\% accuracy in determining any victim user's null classes, validated through formal proof.
Updated: 2024-05-24 07:56:32
标题: Decaf:针对联邦学习的数据分布分解攻击
摘要: 与流行的联邦学习(FL)隐私推断技术(如生成对抗网络攻击、成员推断攻击、属性推断攻击和模型反演攻击)不同,我们提出了一种创新的隐私威胁:FL中的数据分布分解攻击,称为Decaf。这种攻击使诚实但好奇的FL服务器能够细致地刻画受害FL用户所拥有的每个类别的比例,从而泄露诸如本地市场商品分布和商业竞争力等敏感信息。Decaf的关键在于一个深刻的观察:本地模型梯度变化的幅度与底层数据分布(包括每个类别的比例)密切对应。Decaf解决了两个关键挑战:首先准确识别任何受害用户缺失/空的类别作为前提,然后量化梯度变化与每个剩余非空类别之间的精确关系。值得注意的是,Decaf运行隐秘、完全被动,受害用户无法察觉其数据分布隐私受到侵犯。我们在五个基准数据集(MNIST、FASHION-MNIST、CIFAR-10、FER-2013和SkinCancer)上,采用包括定制卷积网络、标准化VGG16和ResNet18在内的多种模型架构进行了实验验证,证明了Decaf的有效性。结果表明,无论本地用户数据是IID分布还是非IID分布,Decaf都能够准确分解其数据分布。具体而言,在不存在空类别时,用$L_{\infty}$距离度量的Decaf分解分布与真实分布之间的差异始终低于5\%。此外,Decaf在判定任何受害用户的空类别时达到100\%的准确率,并通过形式化证明加以验证。
更新时间: 2024-05-24 07:56:32
领域: cs.LG,cs.CR
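The observation at the heart of Decaf has a compact toy form: with softmax cross-entropy, the batch-summed gradient with respect to the logits (playing the role of a final-layer bias gradient) equals the mean softmax output minus the empirical class proportions, so gradients leak the label distribution, including null classes. A simplified stand-in, not the paper's full attack:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n_classes, n = 5, 10000
mix = torch.tensor([0.5, 0.3, 0.1, 0.1, 0.0])        # victim's private mix
labels = torch.multinomial(mix, n, replacement=True)
logits = torch.zeros(n, n_classes, requires_grad=True)  # stand-in outputs

F.cross_entropy(logits, labels).backward()
bias_grad = logits.grad.sum(0)                       # acts like a bias gradient

recovered = F.softmax(logits.detach(), dim=-1).mean(0) - bias_grad
print(recovered)  # ~[0.5, 0.3, 0.1, 0.1, 0.0]; the 0 entry is a null class
```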
Output-Constrained Decision Trees
When there is a correlation between any pair of targets, one needs a prediction method that can handle vector-valued output. In this setting, multi-target learning is particularly important as it is widely used in various applications. This paper introduces new variants of decision trees that can handle not only multi-target output but also the constraints among the targets. We focus on the customization of conventional decision trees by adjusting the splitting criteria to handle the constraints and obtain feasible predictions. We present both an optimization-based exact approach and several heuristics, complete with a discussion on their respective advantages and disadvantages. To support our findings, we conduct a computational study to demonstrate and compare the results of the proposed approaches.
Updated: 2024-05-24 07:54:44
标题: 输出受限决策树
摘要: 当任何一对目标之间存在相关性时,需要一种能够处理向量值输出的预测方法。在这种情况下,多目标学习尤为重要,因为它在各种应用中被广泛使用。本文介绍了一种可以处理多目标输出以及目标之间约束的决策树的新变体。我们专注于通过调整分裂标准来定制传统决策树,以处理约束并获得可行的预测结果。我们提出了基于优化的精确方法和几种启发式方法,并讨论它们各自的优缺点。为了支持我们的发现,我们进行了计算研究,以展示和比较所提出方法的结果。
更新时间: 2024-05-24 07:54:44
领域: cs.LG
Resource-Efficient Heartbeat Classification Using Multi-Feature Fusion and Bidirectional LSTM
In this article, we present a resource-efficient approach for electrocardiogram (ECG) based heartbeat classification using multi-feature fusion and bidirectional long short-term memory (Bi-LSTM). The dataset comprises five original classes from the MIT-BIH Arrhythmia Database: Normal (N), Left Bundle Branch Block (LBBB), Right Bundle Branch Block (RBBB), Premature Ventricular Contraction (PVC), and Paced Beat (PB). Preprocessing methods including the discrete wavelet transform and dual moving average windows are used to reduce noise and artifacts in the raw ECG signal, and extract the main points (PQRST) of the ECG waveform. Multi-feature fusion is achieved by utilizing time intervals and the proposed under-the-curve areas, which are inherently robust against noise, as input features. Simulations demonstrated that incorporating under-the-curve area features improved the classification accuracy for the challenging RBBB and LBBB classes from 31.4\% to 84.3\% for RBBB, and from 69.6\% to 87.0\% for LBBB. Using a Bi-LSTM network, rather than a conventional LSTM network, resulted in higher accuracy (33.8\% vs 21.8\%) with a 28\% reduction in required network parameters for the RBBB class. Multiple neural network models with varying parameter sizes, including tiny (84k), small (150k), medium (478k), and large (1.25M) models, are developed to achieve high accuracy \textit{across all classes}, a more crucial and challenging goal than overall classification accuracy.
Updated: 2024-05-24 07:53:27
标题: 资源高效的心跳分类:利用多特征融合和双向LSTM
摘要: 在本文中,我们提出了一种资源高效的方法,用于基于心电图(ECG)的心跳分类,采用多特征融合和双向长短期记忆(Bi-LSTM)。数据集包括来自MIT-BIH心律失常数据库的五个原始类别:正常(N),左束支传导阻滞(LBBB),右束支传导阻滞(RBBB),早发性室性收缩(PVC)和起搏心跳(PB)。预处理方法包括离散小波变换和双移动平均窗口,用于减少原始ECG信号中的噪音和伪影,并提取ECG波形的主要点(PQRST)。通过利用时间间隔和提出的曲线下面积,实现多特征融合,这些特征对噪音具有固有的鲁棒性。模拟表明,引入曲线下面积特征可以将具有挑战性的RBBB和LBBB类别的分类准确率从31.4%提高到84.3%(RBBB),从69.6%提高到87.0%(LBBB)。使用Bi-LSTM网络而不是传统的LSTM网络,导致更高的准确性(33.8%对21.8%),并且对于RBBB类别,所需网络参数减少了28%。开发了多个神经网络模型,包括微型(84k)、小型(150k)、中型(478k)和大型(1.25M)模型,以实现跨所有类别的高准确性,这是一个更为关键和具有挑战性的目标,而不仅仅是总体分类准确性。
更新时间: 2024-05-24 07:53:27
领域: cs.LG
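A minimal PyTorch sketch of the classifier side: a bidirectional LSTM over per-beat fused features (e.g., time intervals plus under-the-curve areas), with a linear head over the five MIT-BIH classes. Feature and hidden dimensions below are illustrative:

```python
import torch
import torch.nn as nn

class BeatClassifier(nn.Module):
    def __init__(self, n_features=8, hidden=32, n_classes=5):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True,
                            bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_classes)  # 2x: both directions

    def forward(self, x):                  # x: (batch, beats, n_features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])       # classify from the last time step

model = BeatClassifier()
logits = model(torch.randn(16, 10, 8))     # 16 sequences of 10 beats each
```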
\textsc{Retro}: \underline{Re}using \underline{t}eacher p\underline{ro}jection head for efficient embedding distillation on Lightweight Models via Self-supervised Learning
Self-supervised learning (SSL) is gaining attention for its ability to learn effective representations with large amounts of unlabeled data. Lightweight models can be distilled from larger self-supervised pre-trained models using contrastive and consistency constraints. Still, the different sizes of the projection heads make it challenging for students to mimic the teacher's embedding accurately. We propose \textsc{Retro}, which reuses the teacher's projection head for students, and our experimental results demonstrate significant improvements over the state-of-the-art on all lightweight models. For instance, when training EfficientNet-B0 using ResNet-50/101/152 as teachers, our approach improves the linear result on ImageNet to $66.9\%$, $69.3\%$, and $69.8\%$, respectively, with significantly fewer parameters.
Updated: 2024-05-24 07:53:09
标题: \textsc{Retro}: 重用教师投影头,在轻量级模型上通过自监督学习实现高效嵌入蒸馏
摘要: 自监督学习(SSL)因其能够利用大量未标记数据学习有效表示而受到关注。可以使用对比和一致性约束从较大的自监督预训练模型中提炼出轻量级模型。然而,投影头的不同大小使得学生难以准确模仿教师的嵌入。我们提出了\textsc{Retro},它重新利用教师的投影头用于学生,我们的实验结果表明,在所有轻量级模型上都取得了显著的改进。例如,当使用ResNet-50/101/152作为教师来训练EfficientNet-B0时,我们的方法将ImageNet上的线性结果分别提高到$66.9\%$、$69.3\%$和$69.8\%$,并且参数数量显著减少。
更新时间: 2024-05-24 07:53:09
领域: cs.CV,cs.AI
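The head-reuse idea can be sketched as follows: the student keeps its own backbone but adopts the teacher's frozen projection head, so student and teacher embeddings land in the same space for the distillation loss. Backbones, dimensions, and the cosine objective below are assumptions for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher_head = nn.Sequential(nn.Linear(2048, 512), nn.ReLU(),
                             nn.Linear(512, 128))
for p in teacher_head.parameters():
    p.requires_grad = False                   # reuse the head, do not retrain

student_backbone = nn.Linear(32, 2048)        # stand-in for EfficientNet-B0

x = torch.randn(8, 32)
student_emb = teacher_head(student_backbone(x))   # shared embedding space
teacher_emb = torch.randn(8, 128)             # teacher output for same views
loss = 1 - F.cosine_similarity(student_emb, teacher_emb).mean()
loss.backward()                               # only the student gets updated
```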
Spectraformer: A Unified Random Feature Framework for Transformer
Linearization of attention using various kernel approximation and kernel learning techniques has shown promise. Past methods use a subset of combinations of component functions and weight matrices within the random features paradigm. We identify the need for a systematic comparison of different combinations of weight matrix and component functions for attention learning in Transformer. In this work, we introduce Spectraformer, a unified framework for approximating and learning the kernel function in linearized attention of the Transformer. We experiment with broad classes of component functions and weight matrices for three textual tasks in the LRA benchmark. Our experimentation with multiple combinations of component functions and weight matrices leads us to a novel combination with 23.4% faster training time and 25.2% lower memory consumption over the previous SOTA random feature Transformer, while maintaining the performance, as compared to the Original Transformer. Our code is available at: https://anonymous.4open.science/r/spectraformer-8A97 .
Updated: 2024-05-24 07:52:53
标题: Spectraformer:Transformer的统一随机特征框架
摘要: 使用各种核逼近和核学习技术对注意力进行线性化已显示出潜力。过去的方法仅使用了随机特征范式中组件函数和权重矩阵组合的一个子集。我们指出,需要系统比较Transformer注意力学习中权重矩阵与组件函数的不同组合。在这项工作中,我们引入了Spectraformer,这是一个用于在Transformer的线性化注意力中近似和学习核函数的统一框架。我们针对LRA基准中的三个文本任务,对多类组件函数和权重矩阵进行了实验。通过对组件函数和权重矩阵多种组合的实验,我们找到了一种新颖的组合,其训练时间比之前的SOTA随机特征Transformer快23.4%,内存消耗降低25.2%,同时保持与原始Transformer相当的性能。我们的代码可在以下链接找到:https://anonymous.4open.science/r/spectraformer-8A97。
更新时间: 2024-05-24 07:52:53
领域: cs.LG
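A compact sketch of one classic point in the design space the framework explores: Gaussian weight matrices paired with positive exponential component functions (the Performer-style combination), giving linear-time attention. Dimensions and scaling below are illustrative:

```python
import torch

def feature_map(x, w):
    # positive random features: exp(w^T x - ||x||^2 / 2) / sqrt(m)
    m = w.shape[0]
    return torch.exp(x @ w.T - (x ** 2).sum(-1, keepdim=True) / 2) / m ** 0.5

d, m, n = 16, 64, 128
w = torch.randn(m, d)                         # Gaussian weight matrix
q, k, v = (torch.randn(n, d) for _ in range(3))

phi_q, phi_k = feature_map(q, w), feature_map(k, w)
num = phi_q @ (phi_k.T @ v)                   # O(n m d) instead of O(n^2 d)
den = phi_q @ phi_k.sum(0, keepdim=True).T    # normalization term
attn_out = num / den                          # linearized attention output
```

Swapping the distribution of `w` or the form of `feature_map` yields the other weight-matrix/component-function combinations the paper compares.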
Full private delegated quantum computing tailored from user to industry
In this paper, we present a set of private and secure delegated quantum computing protocols and techniques tailored to user-level and industry-level use cases, depending on the computational resources available to the client, the specific privacy needs required, and the type of algorithm. Our protocols are presented at a high level as they are independent of the particular algorithm used for such encryption and decryption processes. Additionally, we propose a method to verify the correct execution of operations by the external server.
Updated: 2024-05-24 07:52:11
标题: 面向从用户到行业定制的全私密委托量子计算
摘要: 在本文中,我们提出一组针对用户级和行业级使用案例定制的私密和安全的委托量子计算协议和技术,取决于客户可用的计算资源、所需的特定隐私需求以及算法类型。我们的协议在高级别上呈现,因为它们独立于用于加密和解密过程的特定算法。此外,我们提出了一种验证外部服务器操作正确执行的方法。
更新时间: 2024-05-24 07:52:11
领域: quant-ph,cs.CR,cs.DC,cs.ET,81P68,E.3
Nudging Users to Change Breached Passwords Using the Protection Motivation Theory
We draw on the Protection Motivation Theory (PMT) to design nudges that encourage users to change breached passwords. Our online experiment ($n$=$1,386$) compared the effectiveness of a threat appeal (highlighting negative consequences of breached passwords) and a coping appeal (providing instructions on how to change the breached password) in a 2x2 factorial design. Compared to the control condition, participants receiving the threat appeal were more likely to intend to change their passwords, and participants receiving both appeals were more likely to end up changing their passwords; both comparisons have a small effect size. Participants' password change behaviors are further associated with other factors such as their security attitudes (SA-6) and time passed since the breach, suggesting that PMT-based nudges are useful but insufficient to fully motivate users to change their passwords. Our study contributes to PMT's application in security research and provides concrete design implications for improving compromised credential notifications.
Updated: 2024-05-24 07:51:15
标题: 用保护动机理论推动用户更改被泄露的密码
摘要: 我们借鉴保护动机理论(PMT)来设计鼓励用户更改被泄露密码的提示。我们的在线实验($n$=$1,386$)采用2x2因子设计,比较了威胁呼吁(强调密码泄露的负面后果)和应对呼吁(提供更改被泄露密码的指导)的有效性。与对照组相比,接收威胁呼吁的参与者更有意愿更改密码,而同时接收两种呼吁的参与者更有可能最终更改密码;两者的效应量都较小。参与者的密码更改行为还与其他因素相关,如他们的安全态度(SA-6)和自泄露发生以来的时间,这表明基于PMT的提示是有用的,但不足以完全激励用户更改密码。我们的研究促进了PMT在安全研究中的应用,并为改进凭据泄露通知提供了具体的设计启示。
更新时间: 2024-05-24 07:51:15
领域: cs.CR,cs.HC
Unlearning Concepts in Diffusion Model via Concept Domain Correction and Concept Preserving Gradient
Current text-to-image diffusion models have achieved groundbreaking results in image generation tasks. However, the unavoidable inclusion of sensitive information during pre-training introduces significant risks such as copyright infringement and privacy violations in the generated images. Machine Unlearning (MU) provides an effective way to erase the sensitive concepts captured by the model, and has been shown to be a promising approach to addressing these issues. Nonetheless, existing MU methods for concept erasure encounter two primary bottlenecks: 1) generalization issues, where concept erasure is effective only for the data within the unlearn set, and prompts outside the unlearn set often still result in the generation of sensitive concepts; and 2) utility drop, where erasing target concepts significantly degrades the model's performance. To this end, this paper first proposes a concept domain correction framework for unlearning concepts in diffusion models. By aligning the output domains of sensitive concepts and anchor concepts through adversarial training, we enhance the generalizability of the unlearning results. Secondly, we devise a concept-preserving scheme based on gradient surgery. This approach alleviates the parts of the unlearning gradient that contradict the relearning gradient, ensuring that the process of unlearning minimally disrupts the model's performance. Finally, extensive experiments validate the effectiveness of our model, demonstrating our method's capability to address the challenges of concept unlearning in diffusion models while preserving model utility.
Updated: 2024-05-24 07:47:36
标题: 通过概念域校正和概念保持梯度在扩散模型中去学习概念
摘要: 目前的文本到图像扩散模型在图像生成任务中取得了突破性的成果。然而,在预训练过程中不可避免地包含敏感信息,这会在生成的图像中引入重大风险,如侵犯版权和侵犯隐私。机器去学习(MU)提供了一种有效的方式来处理模型捕获的敏感概念,已被证明是解决这些问题的一种有前途的方法。然而,现有的概念消除的MU方法遇到两个主要瓶颈:1)泛化问题,即概念消除仅对于未学习集中的数据有效,而对于未学习集之外的提示往往仍然会导致生成敏感概念;2)效用下降,即擦除目标概念会显著降低模型的性能。因此,本文首先提出了一个用于扩散模型中概念去学习的概念域校正框架。通过通过对抗训练对齐敏感概念和锚定概念的输出域,我们增强了去学习结果的泛化能力。其次,我们设计了一种基于梯度手术的保留概念方案。这种方法缓解了与重新学习梯度相矛盾的部分去学习梯度,确保去学习过程最小程度地干扰模型的性能。最后,大量实验证实了我们模型的有效性,展示了我们的方法能够解决扩散模型中概念去学习的挑战,同时保留模型的实用性。
更新时间: 2024-05-24 07:47:36
领域: cs.LG,cs.CV
Adversarial Robust Low Rank Matrix Estimation: Compressed Sensing and Matrix Completion
We consider robust low rank matrix estimation as a trace regression when outputs are contaminated by adversaries. The adversaries are allowed to add arbitrary values to arbitrary outputs. Such values can depend on any samples. We deal with matrix compressed sensing, including lasso as a partial problem, and matrix completion, and then we obtain sharp estimation error bounds. To obtain the error bounds for different models such as matrix compressed sensing and matrix completion, we propose a simple unified approach based on a combination of the Huber loss function and the nuclear norm penalization, which is a different approach from the conventional ones. Some error bounds obtained in the present paper are sharper than the past ones.
Updated: 2024-05-24 07:44:50
标题: 对抗性鲁棒低秩矩阵估计:压缩感知与矩阵补全
摘要: 我们考虑输出受到对抗性污染时的鲁棒低秩矩阵估计,并将其视为一种迹回归问题。对手被允许向任意输出添加任意值,这些值可以依赖于任何样本。我们处理矩阵压缩感知(包括Lasso作为特例)和矩阵补全,并获得紧的估计误差界。为了针对矩阵压缩感知和矩阵补全等不同模型获得误差界,我们提出了一种基于Huber损失函数与核范数惩罚相结合的简单统一方法,这与传统方法不同。本文获得的一些误差界比以往的结果更紧。
更新时间: 2024-05-24 07:44:50
领域: stat.ML,cs.LG,math.ST,stat.TH,62G35, 62G05
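To make the combined estimator concrete, below is a minimal numpy sketch of proximal gradient descent on the averaged Huber loss with nuclear norm penalization. The step size, the toy data, and the plain (unaccelerated) proximal scheme are illustrative assumptions, not the paper's algorithm or its error bounds.

```python
import numpy as np

def huber_grad(r, delta):
    """Derivative of the Huber loss w.r.t. the residual r (elementwise)."""
    return np.where(np.abs(r) <= delta, r, delta * np.sign(r))

def svt(M, tau):
    """Singular-value soft-thresholding: the prox operator of tau * ||.||_*."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def robust_trace_regression(Xs, y, shape, lam=0.05, delta=1.0,
                            step=0.5, iters=500):
    """Proximal gradient on (1/n) sum_i huber(<X_i, Theta> - y_i) + lam*||Theta||_*."""
    Theta = np.zeros(shape)
    n = len(Xs)
    for _ in range(iters):
        residuals = np.array([np.sum(X * Theta) for X in Xs]) - y
        g = huber_grad(residuals, delta)
        grad = sum(gi * X for gi, X in zip(g, Xs)) / n
        Theta = svt(Theta - step * grad, step * lam)
    return Theta

# toy demo: rank-1 ground truth with a few adversarially corrupted outputs
rng = np.random.default_rng(0)
d, n = 10, 200
Theta_star = np.outer(rng.normal(size=d), rng.normal(size=d)) / d
Xs = [rng.normal(size=(d, d)) for _ in range(n)]
y = np.array([np.sum(X * Theta_star) for X in Xs])
y[:10] += 50.0  # adversarial contamination of ten outputs
Theta_hat = robust_trace_regression(Xs, y, (d, d))
print(np.linalg.norm(Theta_hat - Theta_star) / np.linalg.norm(Theta_star))
```

The bounded Huber gradient is what limits the influence of the corrupted outputs; with a squared loss in their place, the contaminated residuals would dominate every gradient step.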
ART: Automatic Red-teaming for Text-to-Image Models to Protect Benign Users
Large-scale pre-trained generative models are taking the world by storm, due to their abilities in generating creative content. Meanwhile, safeguards for these generative models are developed, to protect users' rights and safety, most of which are designed for large language models. Existing methods primarily focus on jailbreak and adversarial attacks, which mainly evaluate the model's safety under malicious prompts. Recent work found that manually crafted safe prompts can unintentionally trigger unsafe generations. To further systematically evaluate the safety risks of text-to-image models, we propose a novel Automatic Red-Teaming framework, ART. Our method leverages both a vision language model and a large language model to establish a connection between unsafe generations and their prompts, thereby more efficiently identifying the model's vulnerabilities. With our comprehensive experiments, we reveal the toxicity of the popular open-source text-to-image models. The experiments also validate the effectiveness, adaptability, and great diversity of ART. Additionally, we introduce three large-scale red-teaming datasets for studying the safety risks associated with text-to-image models. Datasets and models can be found in https://github.com/GuanlinLee/ART.
Updated: 2024-05-24 07:44:27
标题: ART:面向文本到图像模型的自动红队测试,以保护良性用户
摘要: 大规模预训练生成模型凭借其生成创意内容的能力正在风靡世界。与此同时,为了保护用户的权益和安全,针对这些生成模型的安全防护措施也在不断发展,其中大多数是为大型语言模型设计的。现有方法主要集中在越狱和对抗性攻击上,即主要评估模型在恶意提示下的安全性。最近的研究发现,手工制作的安全提示也可能无意中触发不安全的生成。为了进一步系统地评估文本到图像模型的安全风险,我们提出了一种新颖的自动红队框架ART。我们的方法同时利用视觉语言模型和大型语言模型,在不安全的生成内容与其提示之间建立联系,从而更高效地识别模型的漏洞。通过全面的实验,我们揭示了流行的开源文本到图像模型的毒性。实验还验证了ART的有效性、适应性和高度多样性。此外,我们引入了三个大规模红队数据集,用于研究与文本到图像模型相关的安全风险。数据集和模型可在https://github.com/GuanlinLee/ART找到。
更新时间: 2024-05-24 07:44:27
领域: cs.CR,cs.AI
Trajectory-Based Multi-Objective Hyperparameter Optimization for Model Retraining
Training machine learning models inherently involves a resource-intensive and noisy iterative learning procedure that allows epoch-wise monitoring of the model performance. However, in multi-objective hyperparameter optimization scenarios, the insights gained from the iterative learning procedure typically remain underutilized. We notice that tracking the model performance across multiple epochs under a hyperparameter setting creates a trajectory in the objective space and that trade-offs along the trajectories are often overlooked despite their potential to offer valuable insights to decision-making for model retraining. Therefore, in this study, we propose to enhance the multi-objective hyperparameter optimization problem by having training epochs as an additional decision variable to incorporate trajectory information. Correspondingly, we present a novel trajectory-based multi-objective Bayesian optimization algorithm characterized by two features: 1) an acquisition function that captures the improvement made by the predictive trajectory of any hyperparameter setting and 2) a multi-objective early stopping mechanism that determines when to terminate the trajectory to maximize epoch efficiency. Numerical experiments on diverse synthetic simulations and hyperparameter tuning benchmarks indicate that our algorithm outperforms the state-of-the-art multi-objective optimizers in both locating better trade-offs and tuning efficiency.
Updated: 2024-05-24 07:43:45
标题: 基于轨迹的多目标超参数优化用于模型重新训练
摘要: 训练机器学习模型本质上涉及资源密集且带噪声的迭代学习过程,并允许按轮次(epoch)监控模型性能。然而,在多目标超参数优化场景中,从迭代学习过程中获得的见解通常未被充分利用。我们注意到,在某一超参数设置下跨多个训练轮次跟踪模型性能,会在目标空间中形成一条轨迹,而轨迹上的权衡尽管有潜力为模型重新训练的决策提供宝贵见解,却常常被忽视。因此,在这项研究中,我们提出将训练轮数作为附加决策变量,以整合轨迹信息,从而增强多目标超参数优化问题。相应地,我们提出了一种新颖的基于轨迹的多目标贝叶斯优化算法,其具有两个特点:1)一种采集函数,捕捉任何超参数设置的预测轨迹所带来的改进;2)一种多目标早停机制,确定何时终止轨迹以最大化轮次效率。在各种合成模拟和超参数调优基准上的数值实验表明,我们的算法在找到更好的权衡和调优效率方面均优于最先进的多目标优化器。
更新时间: 2024-05-24 07:43:45
领域: cs.LG
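The core reframing, treating every (configuration, epoch) pair along a trajectory as its own candidate, is easy to illustrate. Below is a small sketch under hypothetical objectives (validation error and a cost proxy); the acquisition function and early-stopping mechanism from the abstract are omitted.

```python
import numpy as np

def pareto_front(points):
    """Indices of non-dominated points (both objectives minimized)."""
    pts = np.asarray(points, dtype=float)
    front = []
    for i, p in enumerate(pts):
        dominated = np.any(np.all(pts <= p, axis=1) & np.any(pts < p, axis=1))
        if not dominated:
            front.append(i)
    return front

# hypothetical per-epoch trajectories of (validation error, model cost)
trajectories = {
    "cfg_a": [(0.40, 1.0), (0.30, 1.0), (0.26, 1.0)],
    "cfg_b": [(0.35, 0.5), (0.28, 0.5), (0.27, 0.5)],
}
candidates = [(cfg, epoch, obj)
              for cfg, traj in trajectories.items()
              for epoch, obj in enumerate(traj, start=1)]
for i in pareto_front([obj for _, _, obj in candidates]):
    cfg, epoch, obj = candidates[i]
    print(f"retrain {cfg} for {epoch} epochs -> objectives {obj}")
```

Flattening trajectories this way surfaces trade-offs (e.g., stopping a configuration early) that a per-configuration view would never return.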
Privacy-preserving recommender system using the data collaboration analysis for distributed datasets
In order to provide high-quality recommendations for users, it is desirable to share and integrate multiple datasets held by different parties. However, when sharing such distributed datasets, we need to protect personal and confidential information contained in the datasets. To this end, we establish a framework for privacy-preserving recommender systems using the data collaboration analysis of distributed datasets. Numerical experiments with two public rating datasets demonstrate that our privacy-preserving method for rating prediction can improve the prediction accuracy for distributed datasets. This study opens up new possibilities for privacy-preserving techniques in recommender systems.
Updated: 2024-05-24 07:43:00
标题: 隐私保护推荐系统:使用数据协作分析处理分布式数据集
摘要: 为了为用户提供高质量的推荐,分享和整合由不同方持有的多个数据集是可取的。然而,当分享这些分布式数据集时,我们需要保护数据集中包含的个人和机密信息。为此,我们建立了一个隐私保护推荐系统框架,利用分布式数据集的数据协作分析。通过对两个公共评级数据集进行数值实验,我们证明了我们的评级预测隐私保护方法可以提高分布式数据集的预测准确性。这项研究为推荐系统中的隐私保护技术开辟了新的可能性。
更新时间: 2024-05-24 07:43:00
领域: cs.IR,cs.CR,cs.LG
Towards Understanding How Transformer Perform Multi-step Reasoning with Matching Operation
Large language models have consistently struggled with complex reasoning tasks, such as mathematical problem-solving. Investigating the internal reasoning mechanisms of these models can help us design better model architectures and training strategies, ultimately enhancing their reasoning capabilities. In this study, we examine the matching mechanism employed by Transformer for multi-step reasoning on a constructed dataset. We investigate factors that influence the model's matching mechanism and discover that small initialization and post-LayerNorm can facilitate the formation of the matching mechanism, thereby enhancing the model's reasoning ability. Moreover, we propose a method to improve the model's reasoning capability by adding orthogonal noise. Finally, we investigate the parallel reasoning mechanism of Transformers and propose a conjecture on the upper bound of the model's reasoning ability based on this phenomenon. These insights contribute to a deeper understanding of the reasoning processes in large language models and guide designing more effective reasoning architectures and training strategies.
Updated: 2024-05-24 07:41:26
标题: 朝向理解Transformer如何通过匹配操作进行多步推理
摘要: 大型语言模型在复杂推理任务(如数学问题求解)中一直存在困难。研究这些模型的内部推理机制可以帮助我们设计更好的模型架构和训练策略,最终增强它们的推理能力。在这项研究中,我们在构建的数据集上考察了Transformer用于多步推理的匹配机制。我们调查了影响模型匹配机制的因素,并发现较小的初始化和后置LayerNorm(post-LayerNorm)可以促进匹配机制的形成,从而增强模型的推理能力。此外,我们提出了一种通过添加正交噪声来提高模型推理能力的方法。最后,我们研究了Transformer的并行推理机制,并基于这一现象提出了关于模型推理能力上限的猜想。这些见解有助于更深入地理解大型语言模型中的推理过程,并为设计更有效的推理架构和训练策略提供指导。
更新时间: 2024-05-24 07:41:26
领域: cs.AI,cs.CL,cs.LG
Rankability-enhanced Revenue Uplift Modeling Framework for Online Marketing
Uplift modeling has been widely employed in online marketing by predicting the response difference between the treatment and control groups, so as to identify the sensitive individuals toward interventions like coupons or discounts. Compared with traditional \textit{conversion uplift modeling}, \textit{revenue uplift modeling} exhibits higher potential due to its direct connection with the corporate income. However, previous works can hardly handle the continuous long-tail response distribution in revenue uplift modeling. Moreover, they have neglected to optimize the uplift ranking among different individuals, which is actually the core of uplift modeling. To address such issues, in this paper, we first utilize the zero-inflated lognormal (ZILN) loss to regress the responses and customize the corresponding modeling network, which can be adapted to different existing uplift models. Then, we study the ranking-related uplift modeling error from the theoretical perspective and propose two tighter error bounds as the additional loss terms to the conventional response regression loss. Finally, we directly model the uplift ranking error for the entire population with a listwise uplift ranking loss. The experiment results on offline public and industrial datasets validate the effectiveness of our method for revenue uplift modeling. Furthermore, we conduct large-scale experiments on a prominent online fintech marketing platform, Tencent FiT, which further demonstrates the superiority of our method in practical applications.
Updated: 2024-05-24 07:40:55
标题: 面向在线营销的可排序性增强的收入提升建模框架
摘要: 提升建模已被广泛应用于在线营销中,通过预测处理组和对照组之间的响应差异,以识别对诸如优惠券或折扣等干预敏感的个人。与传统的转化提升建模相比,收入提升建模由于与企业收入的直接联系,具有更高的潜力。然而,先前的研究很难处理收入提升建模中连续的长尾响应分布。此外,它们忽视了优化不同个体之间的提升排名,而这实际上是提升建模的核心。为了解决这些问题,在本文中,我们首先利用零膨胀对数正态(ZILN)损失来回归响应,并定制相应的建模网络,可以适应不同现有的提升模型。然后,我们从理论的角度研究了与排名相关的提升建模误差,并提出了两个更紧密的误差界限作为传统响应回归损失的附加损失项。最后,我们直接对整个人口的提升排名误差进行建模,使用一个列表式提升排名损失。离线公共和工业数据集上的实验结果验证了我们的方法对于收入提升建模的有效性。此外,我们在知名在线金融科技营销平台腾讯 FiT 上进行了大规模实验,进一步展示了我们的方法在实际应用中的优越性。
更新时间: 2024-05-24 07:40:55
领域: cs.LG
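The ZILN loss mentioned above has a compact closed form: a Bernoulli term for whether revenue is zero plus a lognormal log-density for positive revenue. Below is a minimal PyTorch sketch; the three-output head layout and the softplus link for sigma are common conventions assumed here, not necessarily the paper's exact network.

```python
import math
import torch
import torch.nn.functional as F

def ziln_loss(outputs, y, eps=1e-8):
    """Zero-inflated lognormal negative log-likelihood.
    outputs: (N, 3) head -> [p_logit, mu, raw_sigma]; y: (N,) non-negative."""
    p_logit, mu, raw_sigma = outputs[:, 0], outputs[:, 1], outputs[:, 2]
    sigma = F.softplus(raw_sigma) + eps
    positive = (y > 0).float()
    log_y = torch.log(torch.clamp(y, min=eps))
    # lognormal log-density of a positive response
    ll_pos = (-log_y - torch.log(sigma) - 0.5 * math.log(2 * math.pi)
              - (log_y - mu) ** 2 / (2 * sigma ** 2))
    # Bernoulli part: P(y > 0) = sigmoid(p_logit)
    nll = -(positive * (F.logsigmoid(p_logit) + ll_pos)
            + (1 - positive) * F.logsigmoid(-p_logit))
    return nll.mean()

head = torch.tensor([[0.3, 4.5, 0.1], [-1.0, 0.0, 0.2]])  # per-user outputs
spend = torch.tensor([150.0, 0.0])                         # observed revenue
print(ziln_loss(head, spend))
```

A revenue uplift model would emit one such 3-dimensional head per user under treatment and control, and the paper's ranking losses would be added on top of this regression term.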
A Comprehensive Survey of Accelerated Generation Techniques in Large Language Models
Despite the crucial importance of accelerating text generation in large language models (LLMs) for efficiently producing content, the sequential nature of this process often leads to high inference latency, posing challenges for real-time applications. Various techniques have been proposed and developed to address these challenges and improve efficiency. This paper presents a comprehensive survey of accelerated generation techniques in autoregressive language models, aiming to understand the state-of-the-art methods and their applications. We categorize these techniques into several key areas: speculative decoding, early exiting mechanisms, and non-autoregressive methods. We discuss each category's underlying principles, advantages, limitations, and recent advancements. Through this survey, we aim to offer insights into the current landscape of techniques in LLMs and provide guidance for future research directions in this critical area of natural language processing.
Updated: 2024-05-24 07:40:27
标题: 大语言模型中加速生成技术的全面调查
摘要: 尽管加速大语言模型(LLMs)的文本生成对高效产出内容至关重要,但该过程的顺序性往往导致较高的推理延迟,给实时应用带来挑战。为了解决这些挑战并提高效率,人们已经提出并发展了各种技术。本文对自回归语言模型中的加速生成技术进行了全面综述,旨在了解最新方法及其应用。我们将这些技术分为几个关键领域:推测性解码、提前退出机制和非自回归方法。我们讨论了每个类别的基本原理、优势、局限性和最新进展。通过这项综述,我们旨在提供对LLMs中相关技术现状的洞察,并为自然语言处理这一关键领域的未来研究方向提供指导。
更新时间: 2024-05-24 07:40:27
领域: cs.CL,cs.AI
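To make the survey's first category concrete, here is a toy sketch of greedy speculative decoding. The two callables stand in for a cheap draft model and an expensive target model; the per-position verification loop below replaces the single batched target pass a real system would use, and greedy (rather than sampled) verification is a simplifying assumption.

```python
def speculative_decode(target_next, draft_next, prompt, k=4, max_new=32):
    """Greedy speculative decoding sketch. target_next / draft_next map a
    token list to the greedy next token."""
    tokens = list(prompt)
    goal = len(prompt) + max_new
    while len(tokens) < goal:
        # 1) the draft proposes k tokens autoregressively (cheap)
        ctx, proposal = list(tokens), []
        for _ in range(k):
            t = draft_next(ctx)
            proposal.append(t)
            ctx.append(t)
        # 2) the target verifies them; a real system scores all k
        #    positions in one batched forward pass
        accepted = 0
        for i in range(k):
            if target_next(tokens + proposal[:i]) == proposal[i]:
                accepted += 1
            else:
                break
        tokens += proposal[:accepted]
        if accepted < k and len(tokens) < goal:
            tokens.append(target_next(tokens))  # correction guarantees progress
    return tokens[:goal]

# toy deterministic "models": the draft is wrong only after multiples of 4
target = lambda ts: (ts[-1] + 3) % 10
draft = lambda ts: (ts[-1] + 3) % 10 if ts[-1] % 4 else (ts[-1] + 1) % 10
print(speculative_decode(target, draft, [0]))
```

The speedup comes from the fact that whenever the draft agrees with the target, several tokens are emitted for the cost of one target evaluation.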
Robust Diffusion Models for Adversarial Purification
Diffusion model (DM)-based adversarial purification (AP) has been shown to be the most powerful alternative to adversarial training (AT). However, these methods neglect the fact that pre-trained diffusion models themselves are not robust to adversarial attacks as well. Additionally, the diffusion process can easily destroy semantic information and generate a high-quality image that is totally different from the original input image after the reverse process, leading to degraded standard accuracy. To overcome these issues, a natural idea is to harness adversarial training strategy to retrain or fine-tune the pre-trained diffusion model, which is computationally prohibitive. We propose a novel robust reverse process with adversarial guidance, which is independent of given pre-trained DMs and avoids retraining or fine-tuning the DMs. This robust guidance can not only ensure to generate purified examples retaining more semantic content but also mitigate the accuracy-robustness trade-off of DMs for the first time, which also equips DM-based AP with an efficient adaptive ability to new attacks. Extensive experiments are conducted on CIFAR-10, CIFAR-100 and ImageNet to demonstrate that our method achieves the state-of-the-art results and exhibits generalization against different attacks.
Updated: 2024-05-24 07:33:04
标题: 用于对抗净化的鲁棒扩散模型
摘要: 基于扩散模型(DMs)的对抗净化(AP)已被证明是对抗训练(AT)最有力的替代方法。然而,这些方法忽视了预训练的扩散模型本身对对抗攻击同样不具有鲁棒性这一事实。此外,扩散过程很容易破坏语义信息,在逆向过程后生成一幅高质量但与原始输入图像完全不同的图像,导致标准准确率下降。为了解决这些问题,一个自然的想法是利用对抗训练策略重新训练或微调预训练的扩散模型,但这在计算上代价过高。我们提出了一种新颖的带对抗引导的鲁棒逆向过程,它不依赖于给定的预训练DMs,并避免重新训练或微调DMs。这种鲁棒引导不仅能确保生成保留更多语义内容的纯净样本,还首次缓解了DMs的准确率-鲁棒性权衡,同时为基于DM的AP提供了对新攻击的高效适应能力。我们在CIFAR-10、CIFAR-100和ImageNet上进行了大量实验,证明我们的方法达到了最先进的结果,并展现出对不同攻击的泛化能力。
更新时间: 2024-05-24 07:33:04
领域: cs.CV,cs.AI
Semi-Supervised Learning guided by the Generalized Bayes Rule under Soft Revision
We provide a theoretical and computational investigation of the Gamma-Maximin method with soft revision, which was recently proposed as a robust criterion for pseudo-label selection (PLS) in semi-supervised learning. Opposed to traditional methods for PLS we use credal sets of priors ("generalized Bayes") to represent the epistemic modeling uncertainty. These latter are then updated by the Gamma-Maximin method with soft revision. We eventually select pseudo-labeled data that are most likely in light of the least favorable distribution from the so updated credal set. We formalize the task of finding optimal pseudo-labeled data w.r.t. the Gamma-Maximin method with soft revision as an optimization problem. A concrete implementation for the class of logistic models then allows us to compare the predictive power of the method with competing approaches. It is observed that the Gamma-Maximin method with soft revision can achieve very promising results, especially when the proportion of labeled data is low.
Updated: 2024-05-24 07:30:45
标题: 基于软修正的广义贝叶斯规则指导的半监督学习
摘要: 我们对最近提出的带软修订的Gamma-Maximin方法进行了理论和计算研究,该方法被提议作为半监督学习中伪标签选择(PLS)的稳健准则。与传统的PLS方法不同,我们使用先验的可信集(credal sets,即"广义贝叶斯")来表示认知建模不确定性。随后,这些可信集通过带软修订的Gamma-Maximin方法进行更新。最终,我们根据更新后的可信集中最不利的分布,选择最有可能的伪标签数据。我们将依据带软修订的Gamma-Maximin方法寻找最优伪标签数据的任务形式化为一个优化问题。针对逻辑斯蒂模型类的具体实现使我们能够将该方法的预测能力与竞争方法进行比较。我们观察到,带软修订的Gamma-Maximin方法可以取得非常有希望的结果,特别是在标记数据比例较低的情况下。
更新时间: 2024-05-24 07:30:45
领域: stat.ML,cs.AI,cs.LG,math.ST,stat.ME,stat.TH,62C12 62C10,I.2.6; G.3
Robust estimation with Lasso when outputs are adversarially contaminated
We consider robust estimation when outputs are adversarially contaminated. Nguyen and Tran (2012) proposed an extended Lasso for robust parameter estimation and then they showed the convergence rate of the estimation error. Recently, Dalalyan and Thompson (2019) gave some useful inequalities and then they showed a faster convergence rate than Nguyen and Tran (2012). They focused on the fact that the minimization problem of the extended Lasso can become that of the penalized Huber loss function with $L_1$ penalty. The distinguishing point is that the Huber loss function includes an extra tuning parameter, which is different from the conventional method. We give the proof, which is different from Dalalyan and Thompson (2019) and then we give the same convergence rate as Dalalyan and Thompson (2019). The significance of our proof is to use some specific properties of the Huber function. Such techniques have not been used in the past proofs.
Updated: 2024-05-24 07:29:03
标题: 当输出受到对抗性污染时,使用Lasso进行稳健估计
摘要: 我们考虑在输出受到对手污染时的鲁棒估计。Nguyen和Tran(2012)提出了一种用于鲁棒参数估计的扩展Lasso,并展示了估计误差的收敛速度。最近,Dalalyan和Thompson(2019)给出了一些有用的不等式,然后展示了比Nguyen和Tran(2012)更快的收敛速度。他们专注于扩展Lasso的最小化问题可能变为带有$L_1$惩罚的Huber损失函数的问题。区别点在于Huber损失函数包含一个额外的调节参数,这与传统方法不同。我们提供了一个与Dalalyan和Thompson(2019)不同的证明,然后给出了与他们相同的收敛速度。我们证明的重要性在于利用Huber函数的一些特定属性。这样的技术在过去的证明中并未被使用过。
更新时间: 2024-05-24 07:29:03
领域: math.ST,cs.LG,stat.ML,stat.TH
Fourier Circuits in Neural Networks: Unlocking the Potential of Large Language Models in Mathematical Reasoning and Modular Arithmetic
In the evolving landscape of machine learning, a pivotal challenge lies in deciphering the internal representations harnessed by neural networks and Transformers. Building on recent progress toward comprehending how networks execute distinct target functions, our study embarks on an exploration of the underlying reasons behind networks adopting specific computational strategies. We direct our focus to the complex algebraic learning task of modular addition involving $k$ inputs. Our research presents a thorough analytical characterization of the features learned by stylized one-hidden layer neural networks and one-layer Transformers in addressing this task. A cornerstone of our theoretical framework is the elucidation of how the principle of margin maximization shapes the features adopted by one-hidden layer neural networks. Let $p$ denote the modulus, $D_p$ denote the dataset of modular arithmetic with $k$ inputs and $m$ denote the network width. We demonstrate that with a neuron count of $ m \geq 2^{2k-2} \cdot (p-1) $, these networks attain a maximum $ L_{2,k+1} $-margin on the dataset $ D_p $. Furthermore, we establish that each hidden-layer neuron aligns with a specific Fourier spectrum, integral to solving modular addition problems. By correlating our findings with the empirical observations of similar studies, we contribute to a deeper comprehension of the intrinsic computational mechanisms of neural networks. Furthermore, we observe similar computational mechanisms in the attention matrix of the one-layer Transformer. This research stands as a significant stride in unraveling their operation complexities, particularly in the realm of complex algebraic tasks.
Updated: 2024-05-24 07:28:24
标题: 神经网络中的傅立叶电路:解锁大型语言模型在数学推理和模运算中的潜力
摘要: 在机器学习不断发展的领域中,一个关键挑战在于解读神经网络和Transformer所利用的内部表示。基于最近在理解网络如何执行不同目标函数方面的进展,我们的研究开始探索网络采用特定计算策略背后的原因。我们将焦点放在涉及$k$个输入的模加法这一复杂代数学习任务上。我们的研究对风格化的单隐藏层神经网络和单层Transformer在解决这一任务时学到的特征给出了彻底的分析刻画。我们理论框架的一个基石是阐明间隔最大化原则如何塑造单隐藏层神经网络所采用的特征。设$p$表示模数,$D_p$表示具有$k$个输入的模运算数据集,$m$表示网络宽度。我们证明当神经元数量满足$ m \geq 2^{2k-2} \cdot (p-1) $时,这些网络在数据集$D_p$上达到最大的$ L_{2,k+1} $间隔。此外,我们证明了每个隐藏层神经元与某个特定的傅立叶频谱对齐,这对求解模加法问题至关重要。通过将我们的发现与类似研究的经验观察相关联,我们有助于更深入地理解神经网络的内在计算机制。此外,我们观察到单层Transformer的注意力矩阵中存在类似的计算机制。这项研究在揭示其运算复杂性,特别是在复杂代数任务领域,迈出了重要的一步。
更新时间: 2024-05-24 07:28:24
领域: cs.LG,stat.ML
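For concreteness, the dataset $D_p$ of $k$-input modular addition can be materialized in a few lines; the one-hot input encoding below is a common convention and an assumption here, not necessarily the paper's exact setup.

```python
import numpy as np

def modular_addition_dataset(p=7, k=3):
    """All k-tuples over Z_p with label (x_1 + ... + x_k) mod p,
    inputs one-hot encoded per coordinate."""
    grids = np.stack(np.meshgrid(*[np.arange(p)] * k, indexing="ij"), -1)
    X_int = grids.reshape(-1, k)              # p**k rows of k integers
    y = X_int.sum(axis=1) % p                 # modular-addition labels
    X = np.eye(p)[X_int].reshape(len(X_int), k * p)
    return X, y

X, y = modular_addition_dataset()
print(X.shape, y.shape)   # (343, 21) (343,)
```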
Robustly overfitting latents for flexible neural image compression
Neural image compression has made a great deal of progress. State-of-the-art models are based on variational autoencoders and are outperforming classical models. Neural compression models learn to encode an image into a quantized latent representation that can be efficiently sent to the decoder, which decodes the quantized latent into a reconstructed image. While these models have proven successful in practice, they lead to sub-optimal results due to imperfect optimization and limitations in the encoder and decoder capacity. Recent work shows how to use stochastic Gumbel annealing (SGA) to refine the latents of pre-trained neural image compression models. We extend this idea by introducing SGA+, which contains three different methods that build upon SGA. We show how our method improves the overall compression performance in terms of the R-D trade-off, compared to its predecessors. Additionally, we show how refinement of the latents with our best-performing method improves the compression performance on both the Tecnick and CLIC dataset. Our method is deployed for a pre-trained hyperprior and for a more flexible model. Further, we give a detailed analysis of our proposed methods and show that they are less sensitive to hyperparameter choices. Finally, we show how each method can be extended to three- instead of two-class rounding.
Updated: 2024-05-24 07:27:08
标题: 为灵活的神经图像压缩鲁棒地过拟合隐变量
摘要: 神经图像压缩取得了很大的进展。最先进的模型基于变分自动编码器,表现优于传统模型。神经压缩模型学习将图像编码为一个可以有效传输到解码器的量化潜在表示,解码器将量化的潜在表示解码为重建图像。虽然这些模型在实践中表现出成功,但由于优化不完善和编码器和解码器容量的限制,导致结果次优。最近的研究表明如何使用随机Gumbel退火(SGA)来改进预训练的神经图像压缩模型的潜在表示。我们通过引入SGA+来扩展这个想法,其中包含三种不同的方法,这些方法建立在SGA的基础上。我们展示了我们的方法如何改善整体压缩性能,相对于其前身,在R-D权衡方面。此外,我们展示了如何使用我们表现最佳的方法改进潜在表示在Tecnick和CLIC数据集上的压缩性能。我们的方法部署在一个预先训练的超先验和一个更灵活的模型中。此外,我们对我们提出的方法进行了详细分析,并表明它们对超参数选择不太敏感。最后,我们展示了如何将每种方法扩展为三类而不是两类四舍五入。
更新时间: 2024-05-24 07:27:08
领域: cs.CV,cs.LG,stat.ML
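As a rough illustration of the latent-refinement idea, the sketch below optimizes latents through a Gumbel-softmax relaxation of rounding with an annealed temperature. The distance-based rounding probabilities, the temperature schedule, and the decoder/rate stand-ins are all assumptions, not the exact SGA or SGA+ formulation.

```python
import torch
import torch.nn.functional as F

def sga_round(y, tau):
    """Soft stochastic rounding between floor(y) and floor(y)+1 via
    Gumbel-softmax; tau -> 0 approaches hard rounding."""
    lo = torch.floor(y)
    d = (y - lo).clamp(1e-6, 1 - 1e-6)          # distance to the floor
    logits = torch.stack([torch.log(1 - d), torch.log(d)], dim=-1)
    w = F.gumbel_softmax(logits, tau=tau, hard=False)
    return w[..., 0] * lo + w[..., 1] * (lo + 1.0)

def refine_latents(y0, decoder, rate, x, steps=200, lam=0.01):
    """Gradient-based refinement of latents against a rate-distortion loss."""
    y = y0.clone().requires_grad_(True)
    opt = torch.optim.Adam([y], lr=5e-3)
    for t in range(steps):
        tau = max(0.5 * (1 - t / steps), 1e-2)  # annealed temperature
        y_hat = sga_round(y, tau)
        loss = F.mse_loss(decoder(y_hat), x) + lam * rate(y_hat)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.round(y.detach())

# stand-ins for a pretrained decoder and rate model
decoder = lambda z: z
rate = lambda z: z.abs().sum()
x = torch.randn(16)
y_refined = refine_latents(x + 0.3 * torch.randn(16), decoder, rate, x)
```

The point of the relaxation is that gradients can flow through the (otherwise non-differentiable) rounding operation while the anneal gradually commits to integer latents.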
Transaction Fee Estimation in the Bitcoin System
In the Bitcoin system, transaction fees serve as an incentive for blockchain confirmations. In general, a transaction with a higher fee is likely to be included in the next block mined, whereas a transaction with a smaller fee or no fee may be delayed or never processed at all. However, the transaction fee needs to be specified when submitting a transaction and almost cannot be altered thereafter. Hence it is indispensable to help a client set a reasonable fee, as a higher fee incurs over-spending and a lower fee could delay the confirmation. In this work, we focus on estimating the transaction fee for a new transaction to help with its confirmation within a given expected time. We identify two major drawbacks in the existing works. First, the current industry products are built on explicit analytical models, ignoring the complex interactions of different factors which could be better captured by machine learning based methods; Second, all of the existing works utilize limited knowledge for the estimation which hinders the potential of further improving the estimation quality. As a result, we propose a framework FENN, which aims to integrate the knowledge from a wide range of sources, including the transaction itself, unconfirmed transactions in the mempool and the blockchain confirmation environment, into a neural network model in order to estimate a proper transaction fee. Finally, we conduct experiments on real blockchain datasets to demonstrate the effectiveness and efficiency of our proposed framework over the state-of-the-art works evaluated by MAPE and RMSE. Each variation model in our framework can finish training within one block interval, which shows the potential of our framework to process the realtime transaction updates in the Bitcoin blockchain.
Updated: 2024-05-24 07:27:00
标题: 比特币系统中的交易费用估算
摘要: 在比特币系统中,交易费用是促成区块链确认的激励。通常,费用较高的交易很可能被包含在下一个被挖出的区块中,而费用较低或没有费用的交易可能会被延迟,甚至根本不被处理。然而,交易费用需要在提交交易时指定,且此后几乎无法更改。因此,帮助客户设定合理的费用必不可少,因为过高的费用会导致超支,而过低的费用可能会延迟确认。在这项工作中,我们专注于估计新交易的交易费用,以帮助其在给定的预期时间内得到确认。我们发现现有研究存在两个主要缺点。首先,当前的行业产品建立在显式的分析模型之上,忽略了不同因素之间复杂的相互作用,而这些作用可以用基于机器学习的方法更好地捕捉;其次,所有现有研究都仅利用有限的知识进行估计,这限制了进一步提高估计质量的潜力。因此,我们提出了一个名为FENN的框架,旨在将来自广泛来源的知识(包括交易本身、内存池(mempool)中的未确认交易以及区块链确认环境)整合到一个神经网络模型中,以估计合适的交易费用。最后,我们在真实的区块链数据集上进行实验,以MAPE和RMSE为评估指标,证明了我们提出的框架相对于最先进工作的有效性和效率。我们框架中的每个变体模型都可以在一个区块间隔内完成训练,这显示了我们的框架处理比特币区块链中实时交易更新的潜力。
更新时间: 2024-05-24 07:27:00
领域: cs.CR
Towards a Probabilistic Fusion Approach for Robust Battery Prognostics
Batteries are a key enabling technology for the decarbonization of transport and energy sectors. The safe and reliable operation of batteries is crucial for battery-powered systems. In this direction, the development of accurate and robust battery state-of-health prognostics models can unlock the potential of autonomous systems for complex, remote and reliable operations. The combination of Neural Networks, Bayesian modelling concepts and ensemble learning strategies, form a valuable prognostics framework to combine uncertainty in a robust and accurate manner. Accordingly, this paper introduces a Bayesian ensemble learning approach to predict the capacity depletion of lithium-ion batteries. The approach accurately predicts the capacity fade and quantifies the uncertainty associated with battery design and degradation processes. The proposed Bayesian ensemble methodology employs a stacking technique, integrating multiple Bayesian neural networks (BNNs) as base learners, which have been trained on data diversity. The proposed method has been validated using a battery aging dataset collected by the NASA Ames Prognostics Center of Excellence. Obtained results demonstrate the improved accuracy and robustness of the proposed probabilistic fusion approach with respect to (i) a single BNN model and (ii) a classical stacking strategy based on different BNNs.
Updated: 2024-05-24 07:26:36
标题: 走向一种概率融合方法,用于稳健的电池预测
摘要: 电池是交通和能源领域脱碳的关键技术。电池的安全和可靠运行对于电池驱动系统至关重要。在这方面,精确和健壮的电池健康状态预测模型的发展可以释放自主系统在复杂、远程和可靠操作方面的潜力。神经网络、贝叶斯建模概念和集成学习策略的结合形成了一个有价值的预测框架,以稳健和准确的方式结合不确定性。因此,本文介绍了一种贝叶斯集成学习方法来预测锂离子电池的容量衰减。这种方法准确预测了容量衰减并量化了与电池设计和退化过程相关的不确定性。所提出的贝叶斯集成方法采用了一种堆叠技术,将多个在数据多样性上进行训练的贝叶斯神经网络(BNNs)作为基本学习器。该方法已通过NASA Ames预测中心收集的电池老化数据集进行验证。实验结果表明,相对于(i)单个BNN模型和(ii)基于不同BNNs的经典堆叠策略,所提出的概率融合方法具有改进的准确性和稳健性。
更新时间: 2024-05-24 07:26:36
领域: cs.LG,cs.AI
Large Language Model Sentinel: Advancing Adversarial Robustness by LLM Agent
Over the past two years, the use of large language models (LLMs) has advanced rapidly. While these LLMs offer considerable convenience, they also raise security concerns, as LLMs are vulnerable to adversarial attacks by some well-designed textual perturbations. In this paper, we introduce a novel defense technique named Large LAnguage MOdel Sentinel (LLAMOS), which is designed to enhance the adversarial robustness of LLMs by purifying the adversarial textual examples before feeding them into the target LLM. Our method comprises two main components: a) Agent instruction, which can simulate a new agent for adversarial defense, altering minimal characters to maintain the original meaning of the sentence while defending against attacks; b) Defense guidance, which provides strategies for modifying clean or adversarial examples to ensure effective defense and accurate outputs from the target LLMs. Remarkably, the defense agent demonstrates robust defensive capabilities even without learning from adversarial examples. Additionally, we conduct an intriguing adversarial experiment where we develop two agents, one for attack and one for defense, and engage them in mutual confrontation. During the adversarial interactions, neither agent completely beat the other. Extensive experiments on both open-source and closed-source LLMs demonstrate that our method effectively defends against adversarial attacks, thereby enhancing adversarial robustness.
Updated: 2024-05-24 07:23:56
标题: 大型语言模型哨兵:通过LLM代理提升对抗性鲁棒性
摘要: 在过去的两年里,大型语言模型(LLMs)的使用迅速发展。虽然这些LLMs提供了相当大的便利性,但它们也引发了安全问题,因为LLMs容易受到一些精心设计的文本扰动的对抗攻击。在本文中,我们介绍了一种名为Large LAnguage MOdel Sentinel(LLAMOS)的新颖防御技术,旨在通过在将对抗性文本示例输入目标LLM之前净化这些示例来增强LLMs的对抗性鲁棒性。我们的方法包括两个主要组件:a)Agent instruction,可以模拟一个新的代理进行对抗性防御,改变最少的字符以保持句子的原始含义,同时抵御攻击;b)Defense guidance,提供修改干净或对抗性示例的策略,以确保有效防御和目标LLMs的准确输出。值得注意的是,即使没有从对抗性示例中学习,该防御代理也展示了强大的防御能力。此外,我们进行了一项有趣的对抗实验:我们开发了两个代理,一个用于攻击,一个用于防御,并让它们相互对抗。在对抗互动中,两个代理都没有完全击败对方。对开源和闭源LLMs进行的大量实验表明,我们的方法有效地抵御了对抗性攻击,从而增强了对抗性鲁棒性。
更新时间: 2024-05-24 07:23:56
领域: cs.CL,cs.AI,cs.CR
The Writing is on the Wall: Analyzing the Boom of Inscriptions and its Impact on EVM-compatible Blockchains
Despite the level of attention given to rollups there is limited empirical research about their performance. To address this gap, we conduct a comprehensive data-driven analysis of the late 2023 transaction boom that is attributed to inscriptions: a novel approach to record data onto a blockchain with no outside server needed. Inscriptions were first introduced on the Bitcoin blockchain to allow for the representation of NFTs or ERC-20-like tokens without smart contracts, but were later spread to other blockchains. This work examines the applications of inscription transactions in Ethereum and its major EVM-compatible rollups and their impact on blockchain scalability during periods of sudden transaction surges. We found that on certain days, inscription-related transactions comprised over 89% on Arbitrum, over 88% on zkSync Era, and over 53% on Ethereum. Furthermore, 99% of these transactions were related to the minting of meme coins, followed by limited trading activity. Unlike L1 blockchains, during periods of transaction surges, zkSync and Arbitrum experienced lower median gas fees, attributable to the compression of L2 transactions for a single L1 batch. Additionally, zkSync Era, a ZK rollup, demonstrated a stronger reduction in fees than optimistic rollups considered in our study: Arbitrum, Base, and Optimism.
Updated: 2024-05-24 07:21:53
标题: 墙上的文字:分析铭文的激增及其对EVM兼容区块链的影响
摘要: 尽管Rollup受到了大量关注,但关于其性能的实证研究仍然有限。为了填补这一空白,我们对被归因于铭文(inscriptions)的2023年末交易热潮进行了全面的数据驱动分析:铭文是一种无需外部服务器即可将数据记录到区块链上的新颖方法。铭文最初在比特币区块链上推出,用于在没有智能合约的情况下表示NFT或类似ERC-20的代币,随后扩展到其他区块链。 本研究考察了铭文交易在以太坊及其主要EVM兼容Rollup中的应用,以及在交易量突增期间它们对区块链可扩展性的影响。我们发现,在某些日子,与铭文相关的交易在Arbitrum上占比超过89%,在zkSync Era上超过88%,在以太坊上超过53%。此外,这些交易中99%与铸造模因币有关,随后仅有有限的交易活动。与L1区块链不同,在交易激增期间,zkSync和Arbitrum的中位数gas费用反而更低,这归因于将多笔L2交易压缩进单个L1批次。此外,作为ZK Rollup的zkSync Era比我们研究中考虑的乐观Rollup(Arbitrum、Base和Optimism)表现出更显著的费用降低。
更新时间: 2024-05-24 07:21:53
领域: cs.CR
Airdrops: Giving Money Away Is Harder Than It Seems
Airdrops are used by blockchain applications and protocols to attract an initial user base, and to grow the user base over time. In the case of many airdrops, tokens are distributed to select users as a "reward" for interacting with the underlying protocol, with a long-term goal of creating a loyal community that will generate genuine economic activity well after the airdrop. Although airdrops are widely used by the blockchain industry, a proper understanding of the factors contributing to an airdrop's success is generally lacking. In this work, we outline the design space for airdrops, and specify a reasonable list of outcomes that an airdrop should ideally result in. We then analyze on-chain data from several larger-scale airdrops to empirically evaluate the success of previous airdrops, with respect to our desiderata. In our analysis, we demonstrate that airdrop farmers frequently dispose of the lion's share of airdrop proceeds via exchanges. Our analysis is followed by an overview of common pitfalls that common airdrop designs lend themselves to, which are then used to suggest concrete guidelines for better airdrops.
Updated: 2024-05-24 07:21:06
标题: 空投:赠送金钱并不像看起来那么容易
摘要: 空投被区块链应用程序和协议用来吸引初始用户群,并随着时间推移扩大用户群。在许多空投中,代币被分发给选定的用户,作为与底层协议交互的"奖励",其长期目标是建立一个忠诚的社区,使其在空投结束后仍能产生真实的经济活动。尽管空投被区块链行业广泛使用,但人们通常缺乏对促成空投成功的因素的正确理解。在这项工作中,我们勾勒了空投的设计空间,并给出了一份空投理想情况下应达成的合理结果清单。随后,我们分析了若干较大规模空投的链上数据,以实证评估以往空投相对于我们所列期望的成功程度。在分析中,我们发现空投套利者(airdrop farmers)经常通过交易所处置绝大部分空投收益。分析之后,我们概述了常见空投设计容易落入的陷阱,并据此提出了改进空投的具体指南。
更新时间: 2024-05-24 07:21:06
领域: cs.CR
Minimizing UCB: a Better Local Search Strategy in Local Bayesian Optimization
Local Bayesian optimization is a promising practical approach to solve the high dimensional black-box function optimization problem. Among them is the approximated gradient class of methods, which implements a strategy similar to gradient descent. These methods have achieved good experimental results and theoretical guarantees. However, given the distributional properties of the Gaussian processes applied on these methods, there may be potential to further exploit the information of the Gaussian processes to facilitate the BO search. In this work, we develop the relationship between the steps of the gradient descent method and one that minimizes the Upper Confidence Bound (UCB), and show that the latter can be a better strategy than direct gradient descent when a Gaussian process is applied as a surrogate. Through this insight, we propose a new local Bayesian optimization algorithm, MinUCB, which replaces the gradient descent step with minimizing UCB in GIBO. We further show that MinUCB maintains a similar convergence rate with GIBO. We then improve the acquisition function of MinUCB further through a look ahead strategy, and obtain a more efficient algorithm LA-MinUCB. We apply our algorithms on different synthetic and real-world functions, and the results show the effectiveness of our method. Our algorithms also illustrate improvements on local search strategies from an upper bound perspective in Bayesian optimization, and provides a new direction for future algorithm design.
Updated: 2024-05-24 07:17:24
标题: 将UCB最小化:本地贝叶斯优化中更好的局部搜索策略
摘要: 本地贝叶斯优化是解决高维黑盒函数优化问题的一种有前途的实用方法。其中,近似梯度类方法实施了类似于梯度下降的策略。这些方法取得了良好的实验结果和理论保证。然而,考虑到应用在这些方法上的高斯过程的分布特性,可能进一步利用高斯过程的信息来促进BO搜索。在本研究中,我们建立了梯度下降方法步骤与最小化上置信度界(UCB)的关系,并表明当高斯过程作为代理时,后者可能比直接梯度下降更好。通过这一洞察,我们提出了一种新的本地贝叶斯优化算法,MinUCB,该算法用最小化UCB来替代GIBO中的梯度下降步骤。我们进一步展示了MinUCB与GIBO具有类似的收敛速度。然后,我们通过前瞻策略进一步改进了MinUCB的获取函数,并获得了更高效的算法LA-MinUCB。我们将我们的算法应用于不同的合成和真实世界函数,并结果显示了我们方法的有效性。我们的算法还从贝叶斯优化的上界视角展示了对本地搜索策略的改进,并为未来算法设计提供了一个新方向。
更新时间: 2024-05-24 07:17:24
领域: cs.LG,math.OC
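A minimal numpy sketch of the core step follows, assuming the form UCB(x) = mu(x) + beta * sigma(x) under a GP surrogate and a simple random candidate set around the incumbent; the algorithm's actual step-size control and the LA-MinUCB look-ahead acquisition are omitted.

```python
import numpy as np

def rbf(A, B, ls=0.3):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls ** 2)

def gp_posterior(Xtr, ytr, Xq, noise=1e-6):
    """Posterior mean / std of a zero-mean GP with a unit-variance RBF kernel."""
    K = rbf(Xtr, Xtr) + noise * np.eye(len(Xtr))
    Ks = rbf(Xtr, Xq)
    sol = np.linalg.solve(K, Ks)
    mu = sol.T @ ytr
    var = np.clip(1.0 - np.einsum("ij,ij->j", Ks, sol), 1e-12, None)
    return mu, np.sqrt(var)

def min_ucb_step(Xtr, ytr, x, radius=0.15, beta=2.0, n_cand=256, rng=None):
    """One local move: the next iterate minimizes UCB = mu + beta*sigma
    over random candidates around the incumbent x."""
    rng = rng if rng is not None else np.random.default_rng(0)
    cand = x + radius * rng.uniform(-1.0, 1.0, size=(n_cand, x.size))
    mu, sd = gp_posterior(Xtr, ytr, cand)
    return cand[np.argmin(mu + beta * sd)]

# minimize a toy quadratic starting from a poor incumbent
f = lambda x: float((x ** 2).sum())
rng = np.random.default_rng(1)
x = np.array([0.8, -0.6])
X, y = [x], [f(x)]
for _ in range(20):
    x = min_ucb_step(np.array(X), np.array(y), x, rng=rng)
    X.append(x)
    y.append(f(x))
print("best value found:", min(y))
```

Compared with following an approximated gradient, minimizing the UCB lets the posterior variance veto steps into regions the surrogate knows nothing about, which is the intuition the paper formalizes.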
MarkLLM: An Open-Source Toolkit for LLM Watermarking
LLM watermarking, which embeds imperceptible yet algorithmically detectable signals in model outputs to identify LLM-generated text, has become crucial in mitigating the potential misuse of large language models. However, the abundance of LLM watermarking algorithms, their intricate mechanisms, and the complex evaluation procedures and perspectives pose challenges for researchers and the community to easily experiment with, understand, and assess the latest advancements. To address these issues, we introduce MarkLLM, an open-source toolkit for LLM watermarking. MarkLLM offers a unified and extensible framework for implementing LLM watermarking algorithms, while providing user-friendly interfaces to ensure ease of access. Furthermore, it enhances understanding by supporting automatic visualization of the underlying mechanisms of these algorithms. For evaluation, MarkLLM offers a comprehensive suite of 12 tools spanning three perspectives, along with two types of automated evaluation pipelines. Through MarkLLM, we aim to support researchers while improving the comprehension and involvement of the general public in LLM watermarking technology, fostering consensus and driving further advancements in research and application. Our code is available at https://github.com/THU-BPM/MarkLLM.
Updated: 2024-05-24 07:15:54
标题: MarkLLM:一种用于LLM水印技术的开源工具包
摘要: LLM数字水印技术将隐形但可通过算法检测的信号嵌入到模型输出中,以识别由LLM生成的文本,已成为减轻大型语言模型潜在误用的关键。然而,LLM数字水印算法的丰富性、复杂的机制以及复杂的评估程序和观点对研究人员和社区构成了挑战,使他们难以轻松地进行实验、理解和评估最新的进展。为了解决这些问题,我们推出了MarkLLM,这是一个开源的LLM数字水印工具包。MarkLLM提供了一个统一且可扩展的框架,用于实现LLM数字水印算法,同时提供用户友好的界面以确保易于访问。此外,它通过支持这些算法的基本机制的自动可视化来增强理解。在评估方面,MarkLLM提供了涵盖三个观点的12个工具套件,以及两种类型的自动评估流水线。通过MarkLLM,我们旨在支持研究人员,同时提高公众对LLM数字水印技术的理解和参与,促进共识,并推动研究和应用的进一步发展。我们的代码可在https://github.com/THU-BPM/MarkLLM上找到。
更新时间: 2024-05-24 07:15:54
领域: cs.CR,cs.CL,68T50,I.2.7
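This is not MarkLLM's actual API (see the linked repository for that), but for intuition, a generic green-list watermark detector of the kind such toolkits implement reduces to a one-proportion z-test; the hash-based green-list membership below is a stand-in for a real scheme's seeded partition of the vocabulary.

```python
import hashlib

def green_fraction(tokens, vocab_size, gamma=0.5):
    """Fraction of tokens landing in the green list seeded by the
    previous token (the hash rule here is illustrative only)."""
    hits = 0
    for prev, tok in zip(tokens, tokens[1:]):
        seed = int(hashlib.sha256(str(prev).encode()).hexdigest(), 16)
        hits += (seed + tok * 2654435761) % vocab_size < gamma * vocab_size
    return hits / (len(tokens) - 1)

def z_score(tokens, vocab_size, gamma=0.5):
    """One-proportion z-test: watermarked text should sit well above gamma."""
    n = len(tokens) - 1
    p = green_fraction(tokens, vocab_size, gamma)
    return (p - gamma) * n ** 0.5 / (gamma * (1 - gamma)) ** 0.5

tokens = [5, 17, 42, 9, 23, 31, 8, 44, 3, 19, 27, 11]
print(round(z_score(tokens, vocab_size=50_000), 2))
```

Unwatermarked text lands near gamma by chance, so its z-score stays small; generation that was biased toward green tokens produces a z-score large enough to reject the null hypothesis.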
Prompt Tuning Strikes Back: Customizing Foundation Models with Low-Rank Prompt Adaptation
Parameter-Efficient Fine-Tuning (PEFT) has become the standard for customising Foundation Models (FMs) to user-specific downstream tasks. However, typical PEFT methods require storing multiple task-specific adapters, creating scalability issues as these adapters must be housed and run at the FM server. Traditional prompt tuning offers a potential solution by customising them through task-specific input prefixes, but it under-performs compared to other PEFT methods like LoRA. To address this gap, we propose Low-Rank Prompt Adaptation (LOPA), a prompt-tuning-based approach that performs on par with state-of-the-art PEFT methods and full fine-tuning while being more parameter-efficient and not requiring a server-based adapter. LOPA generates soft prompts by balancing between sharing task-specific information across instances and customization for each instance. It uses a low-rank decomposition of the soft-prompt component encoded for each instance to achieve parameter efficiency. We provide a comprehensive evaluation on multiple natural language understanding and code generation and understanding tasks across a wide range of foundation models with varying sizes.
Updated: 2024-05-24 07:11:42
标题: 提示微调的回归:通过低秩提示适应定制基础模型
摘要: 参数高效微调(PEFT)已成为定制基础模型(FMs)以适应用户特定下游任务的标准。然而,典型的PEFT方法需要存储多个任务特定的适配器,这会导致可扩展性问题,因为这些适配器必须放置和在FM服务器上运行。传统提示微调通过定制任务特定的输入前缀提供了一个潜在的解决方案,但与LoRA等其他PEFT方法相比表现不佳。为了弥补这一差距,我们提出了低秩提示适应(LOPA),这是一种基于提示微调的方法,与最先进的PEFT方法和完整微调方法不相上下,同时更加参数高效,且不需要基于服务器的适配器。LOPA通过在实例之间共享任务特定信息和为每个实例定制之间取得平衡,生成软提示。它使用对为每个实例编码的软提示组件进行低秩分解来实现参数效率。我们对多个自然语言理解和代码生成和理解任务以及不同大小的基础模型进行了全面评估。
更新时间: 2024-05-24 07:11:42
领域: cs.LG,cs.AI
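A rough PyTorch sketch of the central idea, a shared soft prompt modulated by an instance-specific low-rank component, is given below. The sigmoid-gating combination, the shapes, and the instance-encoding inputs are assumptions for illustration, not LOPA's exact parameterization.

```python
import torch
import torch.nn as nn

class LowRankPrompt(nn.Module):
    """Shared soft prompt gated by a per-instance low-rank component."""
    def __init__(self, n_tokens=8, d_model=768, rank=4, d_inst=128):
        super().__init__()
        self.shared = nn.Parameter(torch.randn(n_tokens, d_model) * 0.02)
        # instance encoding -> low-rank factors U (n_tokens x r), V (d_model x r)
        self.to_u = nn.Linear(d_inst, n_tokens * rank)
        self.to_v = nn.Linear(d_inst, d_model * rank)
        self.n_tokens, self.d_model, self.rank = n_tokens, d_model, rank

    def forward(self, inst):                    # inst: (B, d_inst)
        B = inst.size(0)
        U = self.to_u(inst).view(B, self.n_tokens, self.rank)
        V = self.to_v(inst).view(B, self.d_model, self.rank)
        gate = torch.sigmoid(U @ V.transpose(1, 2))   # (B, n_tokens, d_model)
        return self.shared.unsqueeze(0) * gate        # customized soft prompt

prompt = LowRankPrompt()
z = torch.randn(2, 128)           # hypothetical instance encodings
soft_prompt = prompt(z)           # prepend to the input embeddings
print(soft_prompt.shape)          # torch.Size([2, 8, 768])
```

The parameter saving comes from the rank: the instance-specific component costs O((n_tokens + d_model) * rank) values per instance instead of a full n_tokens * d_model prompt, and no adapter lives on the model server.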
NeSy is alive and well: A LLM-driven symbolic approach for better code comment data generation and classification
We present a neuro-symbolic (NeSy) workflow combining a symbolic-based learning technique with a large language model (LLM) agent to generate synthetic data for code comment classification in the C programming language. We also show how generating controlled synthetic data using this workflow fixes some of the notable weaknesses of LLM-based generation and increases the performance of classical machine learning models on the code comment classification task. Our best model, a Neural Network, achieves a Macro-F1 score of 91.412% with an increase of 1.033% after data augmentation.
Updated: 2024-05-24 07:11:17
标题: NeSy活得好好的:一种基于LLM驱动的符号方法,用于更好地生成和分类代码注释数据
摘要: 我们提出了一种神经符号(NeSy)工作流程,将基于符号的学习技术与大型语言模型(LLM)代理结合起来,用于生成C编程语言中的代码注释分类的合成数据。我们还展示了如何使用这种工作流程生成受控合成数据修复了LLM生成的一些明显弱点,并提高了经典机器学习模型在代码注释分类任务上的性能。我们的最佳模型,一个神经网络,在数据增强后实现了91.412%的Macro-F1得分,增加了1.033%。
更新时间: 2024-05-24 07:11:17
领域: cs.SE,cs.AI
DFGNN: Dual-frequency Graph Neural Network for Sign-aware Feedback
Graph-based recommendation has achieved great success in recent years. However, most existing graph-based recommendations focus on capturing user preference based on positive edges/feedback, while ignoring negative edges/feedback (e.g., dislike, low rating) that widely exist in real-world recommender systems. How to utilize negative feedback in graph-based recommendations still remains underexplored. In this study, we first conducted a comprehensive experimental analysis and found that (1) existing graph neural networks are not well-suited for modeling negative feedback, which acts as a high-frequency signal in a user-item graph. (2) Graph-based recommendation suffers from the representation degeneration problem. Based on the two observations, we propose a novel model that models positive and negative feedback from a frequency filter perspective called Dual-frequency Graph Neural Network for Sign-aware Recommendation (DFGNN). Specifically, in DFGNN, the designed dual-frequency graph filter (DGF) captures both low-frequency and high-frequency signals that contain positive and negative feedback. Furthermore, the proposed signed graph regularization is applied to maintain the user/item embedding uniform in the embedding space to alleviate the representation degeneration problem. Additionally, we conduct extensive experiments on real-world datasets and demonstrate the effectiveness of the proposed model. Codes of our model will be released upon acceptance.
Updated: 2024-05-24 07:07:41
标题: DFGNN: 双频图神经网络用于符号感知反馈
摘要: 基于图的推荐在近年取得了巨大成功。然而,大多数现有的基于图的推荐侧重于基于正向边/反馈捕捉用户偏好,而忽视了在现实世界的推荐系统中广泛存在的负向边/反馈(例如不喜欢、低评分)。如何在基于图的推荐中利用负向反馈仍然未被充分探讨。在这项研究中,我们首先进行了全面的实验分析,发现(1)现有的图神经网络不适合建模作为用户-项目图中高频信号的负向反馈。 (2)基于图的推荐存在表示退化问题。基于这两个观察结果,我们提出了一种新颖的模型,从频率过滤器的角度建模正向和负向反馈,称为双频图神经网络用于符号感知推荐(DFGNN)。具体而言,在DFGNN中,设计的双频图滤波器(DGF)捕捉包含正向和负向反馈的低频和高频信号。此外,提出的符号图正则化方法被应用于维持嵌入空间中的用户/项目嵌入均匀,以减轻表示退化问题。此外,我们在真实世界数据集上进行了大量实验,并展示了所提出模型的有效性。我们的模型代码将在接受后发布。
更新时间: 2024-05-24 07:07:41
领域: cs.IR,cs.AI,cs.LG
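The dual-frequency intuition can be illustrated with the standard low-pass/high-pass graph filter pair below; this is a generic sketch, not DFGNN's learned DGF or its signed-graph regularization.

```python
import numpy as np

def dual_frequency_filters(A, X):
    """Generic low-/high-pass filter pair on node features X:
    the low-pass component smooths over neighbours (suited to
    positive feedback), the high-pass component keeps sharp
    differences (the high-frequency signal of negative feedback)."""
    deg = np.maximum(A.sum(axis=1), 1e-8)
    D_inv_sqrt = np.diag(deg ** -0.5)
    A_norm = D_inv_sqrt @ A @ D_inv_sqrt
    low = A_norm @ X          # neighbourhood averaging
    high = X - A_norm @ X     # Laplacian-like difference
    return low, high

A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)   # tiny user-item-style graph
X = np.random.default_rng(0).normal(size=(3, 4))
low, high = dual_frequency_filters(A, X)
```

A standard message-passing GNN applies only the low-pass branch, which is exactly why the abstract finds it ill-suited to negative feedback: a dislike edge calls for pushing representations apart, not averaging them.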
ComboStoc: Combinatorial Stochasticity for Diffusion Generative Models
In this paper, we study an under-explored but important factor of diffusion generative models, i.e., the combinatorial complexity. Data samples are generally high-dimensional, and for various structured generation tasks, there are additional attributes which are combined to associate with data samples. We show that the space spanned by the combination of dimensions and attributes is insufficiently sampled by existing training scheme of diffusion generative models, causing degraded test time performance. We present a simple fix to this problem by constructing stochastic processes that fully exploit the combinatorial structures, hence the name ComboStoc. Using this simple strategy, we show that network training is significantly accelerated across diverse data modalities, including images and 3D structured shapes. Moreover, ComboStoc enables a new way of test time generation which uses unsynchronized time steps for different dimensions and attributes, thus allowing for varying degrees of control over them.
Updated: 2024-05-24 07:05:59
标题: ComboStoc:扩散生成模型的组合随机性
摘要: 在这篇论文中,我们研究了扩散生成模型中一个未被充分探讨但重要的因素,即组合复杂性。数据样本通常是高维的,对于各种结构生成任务,还有额外的属性与数据样本结合。我们发现,现有的扩散生成模型训练方案未能充分采样由维度和属性组合构成的空间,导致测试时性能下降。我们提出了一个简单的解决方案,通过构建完全利用组合结构的随机过程,因此命名为ComboStoc。使用这种简单策略,我们展示了网络训练在包括图像和3D结构形状在内的各种数据模态下显著加速。此外,ComboStoc实现了一种新的测试生成方式,使用不同维度和属性的非同步时间步长,从而可以对它们进行不同程度的控制。
更新时间: 2024-05-24 07:05:59
领域: cs.LG,cs.AI,cs.CV,cs.GR
Provable Training for Graph Contrastive Learning
Graph Contrastive Learning (GCL) has emerged as a popular training approach for learning node embeddings from augmented graphs without labels. Despite the key principle that maximizing the similarity between positive node pairs while minimizing it between negative node pairs is well established, some fundamental problems are still unclear. Considering the complex graph structure, are some nodes consistently well-trained and following this principle even with different graph augmentations? Or are there some nodes more likely to be untrained across graph augmentations and violate the principle? How to distinguish these nodes and further guide the training of GCL? To answer these questions, we first present experimental evidence showing that the training of GCL is indeed imbalanced across all nodes. To address this problem, we propose the metric "node compactness", which is the lower bound of how a node follows the GCL principle related to the range of augmentations. We further derive the form of node compactness theoretically through bound propagation, which can be integrated into binary cross-entropy as a regularization. To this end, we propose the PrOvable Training (POT) for GCL, which regularizes the training of GCL to encode node embeddings that follows the GCL principle better. Through extensive experiments on various benchmarks, POT consistently improves the existing GCL approaches, serving as a friendly plugin.
Updated: 2024-05-24 07:05:30
标题: 图对比学习的可证明训练
摘要: 图对比学习(GCL)已经成为学习来自增强图的节点嵌入的流行训练方法,而这些图没有标签。尽管已经确立了最大化正节点对之间的相似性,同时最小化负节点对之间的相似性的关键原则,但仍然存在一些基本问题尚不清楚。考虑到复杂的图结构,是否有一些节点在不同的图增强情况下始终受到良好训练,并且遵循这一原则?还是有一些节点更有可能在图增强中未被训练,并且违反该原则?如何区分这些节点并进一步指导GCL的训练?为了回答这些问题,我们首先提出了实验证据,表明GCL的训练确实在所有节点之间存在不平衡。为了解决这个问题,我们提出了“节点紧凑性”这一度量,它是一个节点在范围增强相关的GCL原则下遵循的下限。我们进一步通过边界传播在理论上推导了节点紧凑性的形式,它可以作为正则化集成到二元交叉熵中。为此,我们提出了PrOvable Training(POT)用于GCL,它将GCL的训练规范化,以更好地编码遵循GCL原则的节点嵌入。通过对各种基准测试的广泛实验,POT始终提高了现有的GCL方法,作为一种友好的插件。
更新时间: 2024-05-24 07:05:30
领域: cs.LG
Infinite-Dimensional Feature Interaction
The past neural network design has largely focused on feature representation space dimension and its capacity scaling (e.g., width, depth), but overlooked the feature interaction space scaling. Recent advancements have shown shifted focus towards element-wise multiplication to facilitate higher-dimensional feature interaction space for better information transformation. Despite this progress, multiplications predominantly capture low-order interactions, thus remaining confined to a finite-dimensional interaction space. To transcend this limitation, classic kernel methods emerge as a promising solution to engage features in an infinite-dimensional space. We introduce InfiNet, a model architecture that enables feature interaction within an infinite-dimensional space created by RBF kernel. Our experiments reveal that InfiNet achieves new state-of-the-art, owing to its capability to leverage infinite-dimensional interactions, significantly enhancing model performance.
Updated: 2024-05-24 07:04:23
标题: 无限维特征交互
摘要: 过去的神经网络设计主要集中在特征表示空间维度及其容量扩展(例如,宽度、深度),但忽略了特征交互空间的扩展。最近的进展显示,焦点转向逐元素乘法,以促进更高维的特征交互空间,实现更好的信息转换。尽管取得了进展,但乘法主要捕捉低阶交互,因此仍局限于有限维交互空间。为了超越这一限制,经典的核方法被视为一种有前途的解决方案,可以在无限维空间中进行特征交互。我们引入InfiNet,这是一种模型架构,可以在由RBF核创建的无限维空间中进行特征交互。我们的实验表明,由于其能够利用无限维交互,InfiNet实现了新的最先进水平,显著提高了模型性能。
更新时间: 2024-05-24 07:04:23
领域: cs.LG
A Quantum Approximation Scheme for k-Means
We give a quantum approximation scheme (i.e., $(1 + \varepsilon)$-approximation for every $\varepsilon > 0$) for the classical $k$-means clustering problem in the QRAM model with a running time that has only polylogarithmic dependence on the number of data points. More specifically, given a dataset $V$ with $N$ points in $\mathbb{R}^d$ stored in QRAM data structure, our quantum algorithm runs in time $\tilde{O} \left( 2^{\tilde{O}(\frac{k}{\varepsilon})} \eta^2 d\right)$ and with high probability outputs a set $C$ of $k$ centers such that $cost(V, C) \leq (1+\varepsilon) \cdot cost(V, C_{OPT})$. Here $C_{OPT}$ denotes the optimal $k$-centers, $cost(.)$ denotes the standard $k$-means cost function (i.e., the sum of the squared distance of points to the closest center), and $\eta$ is the aspect ratio (i.e., the ratio of maximum distance to minimum distance). This is the first quantum algorithm with a polylogarithmic running time that gives a provable approximation guarantee of $(1+\varepsilon)$ for the $k$-means problem. Also, unlike previous works on unsupervised learning, our quantum algorithm does not require quantum linear algebra subroutines and has a running time independent of parameters (e.g., condition number) that appear in such procedures.
Updated: 2024-05-24 07:01:12
标题: 一个用于k-Means的量子近似方案
摘要: 我们在QRAM模型中给出了一个用于经典$k$-均值聚类问题的量子近似方案(即对每个$\varepsilon > 0$给出$(1 + \varepsilon)$-近似),其运行时间对数据点数量仅有多项式对数(polylogarithmic)依赖。更具体地说,给定一个以QRAM数据结构存储、包含$\mathbb{R}^d$中$N$个点的数据集$V$,我们的量子算法在时间$\tilde{O} \left( 2^{\tilde{O}(\frac{k}{\varepsilon})} \eta^2 d\right)$内运行,并以高概率输出一个包含$k$个中心的集合$C$,使得$cost(V, C) \leq (1+\varepsilon) \cdot cost(V, C_{OPT})$。这里$C_{OPT}$表示最优的$k$个中心,$cost(.)$表示标准的$k$-均值成本函数(即各点到最近中心的距离平方和),$\eta$是纵横比(即最大距离与最小距离之比)。这是第一个以多项式对数运行时间为$k$-均值问题提供$(1+\varepsilon)$可证近似保证的量子算法。此外,与先前的无监督学习工作不同,我们的量子算法不需要量子线性代数子程序,且其运行时间与此类程序中出现的参数(例如条件数)无关。
更新时间: 2024-05-24 07:01:12
领域: quant-ph,cs.DS,cs.LG
Initialization is Critical to Whether Transformers Fit Composite Functions by Inference or Memorizing
Transformers have shown impressive capabilities across various tasks, but their performance on compositional problems remains a topic of debate. In this work, we investigate the mechanisms of how transformers behave on unseen compositional tasks. We discover that the parameter initialization scale plays a critical role in determining whether the model learns inferential solutions, which capture the underlying compositional primitives, or symmetric solutions, which simply memorize mappings without understanding the compositional structure. By analyzing the information flow and vector representations within the model, we reveal the distinct mechanisms underlying these solution types. We further find that inferential solutions exhibit low complexity bias, which we hypothesize is a key factor enabling them to learn individual mappings for single anchors. Building upon the understanding of these mechanisms, we can predict the learning behavior of models with different initialization scales when faced with data of varying complexity. Our findings provide valuable insights into the role of initialization scale in shaping the type of solution learned by transformers and their ability to learn and generalize compositional tasks.
Updated: 2024-05-24 07:00:31
标题: 初始化对于transformers是否适用于通过推理还是记忆复合函数至关重要
摘要: Transformer在各种任务中展现出令人印象深刻的能力,但它们在组合问题上的表现仍存在争论。在这项工作中,我们研究了Transformer在未见过的组合任务上的行为机制。我们发现,参数初始化规模在决定模型是学习推理型解(捕捉底层组合原语)还是对称型解(仅仅记忆映射而不理解组合结构)方面起着关键作用。通过分析模型内部的信息流和向量表示,我们揭示了这两类解背后的不同机制。我们进一步发现,推理型解表现出低复杂度偏置,我们推测这是使其能够为单个锚点学习各自映射的关键因素。在理解这些机制的基础上,我们可以预测不同初始化规模的模型在面对不同复杂度数据时的学习行为。我们的发现为初始化规模如何塑造Transformer所学解的类型及其学习和泛化组合任务的能力提供了有价值的见解。
更新时间: 2024-05-24 07:00:31
领域: cs.LG
Towards a General Time Series Anomaly Detector with Adaptive Bottlenecks and Dual Adversarial Decoders
Time series anomaly detection plays a vital role in a wide range of applications. Existing methods require training one specific model for each dataset, which exhibits limited generalization capability across different target datasets, hindering anomaly detection performance in various scenarios with scarce training data. Aiming at this problem, we propose constructing a general time series anomaly detection model, which is pre-trained on extensive multi-domain datasets and can subsequently apply to a multitude of downstream scenarios. The significant divergence of time series data across different domains presents two primary challenges in building such a general model: (1) meeting the diverse requirements of appropriate information bottlenecks tailored to different datasets in one unified model, and (2) enabling distinguishment between multiple normal and abnormal patterns, both are crucial for effective anomaly detection in various target scenarios. To tackle these two challenges, we propose a General time series anomaly Detector with Adaptive Bottlenecks and Dual Adversarial Decoders (DADA), which enables flexible selection of bottlenecks based on different data and explicitly enhances clear differentiation between normal and abnormal series. We conduct extensive experiments on nine target datasets from different domains. After pre-training on multi-domain data, DADA, serving as a zero-shot anomaly detector for these datasets, still achieves competitive or even superior results compared to those models tailored to each specific dataset.
Updated: 2024-05-24 06:59:43
标题: 朝向具有自适应瓶颈和双对抗解码器的通用时间序列异常检测器
摘要: 时间序列异常检测在广泛的应用中起着至关重要的作用。现有方法要求为每个数据集训练一个特定模型,这在不同目标数据集之间展现了有限的泛化能力,阻碍了在训练数据稀缺的各种情景中的异常检测性能。针对这个问题,我们提出了构建一个通用的时间序列异常检测模型,该模型在广泛的多领域数据集上进行了预训练,并且随后可应用于多种下游情景。时间序列数据在不同领域之间的显著差异提出了建立这种通用模型的两个主要挑战:(1)在一个统一模型中满足针对不同数据集的适当信息瓶颈的多样化需求,以及 (2)使多个正常和异常模式之间的区分变得明显,这两点对于在各种目标情景中有效地检测异常至关重要。为了解决这两个挑战,我们提出了一个具有自适应瓶颈和双对抗解码器(DADA)的通用时间序列异常检测器,它能够基于不同数据灵活选择瓶颈,并明确增强正常和异常序列之间的清晰区分。我们在来自不同领域的九个目标数据集上进行了广泛的实验。在多领域数据上进行预训练后,DADA作为这些数据集的零样本异常检测器,仍然实现了与针对每个特定数据集定制的模型相比具有竞争力甚至更好的结果。
更新时间: 2024-05-24 06:59:43
领域: cs.LG
A rationale from frequency perspective for grokking in training neural network
Grokking is the phenomenon where neural networks (NNs) initially fit the training data and later generalize to the test data during training. In this paper, we empirically provide a frequency perspective to explain the emergence of this phenomenon in NNs. The core insight is that the networks initially learn the less salient frequency components present in the test data. We observe this phenomenon across both synthetic and real datasets, offering a novel viewpoint for elucidating the grokking phenomenon by characterizing it through the lens of frequency dynamics during the training process. Our empirical frequency-based analysis sheds new light on understanding the grokking phenomenon and its underlying mechanisms.
Updated: 2024-05-24 06:57:23
标题: 从频率视角解释神经网络训练中的grokking(顿悟)现象
摘要: Grokking是指神经网络NNs最初拟合训练数据,随后在训练过程中泛化到测试数据的现象。在本文中,我们从经验角度提供了一个频率视角来解释NNs中出现这一现象的原因。核心洞察是网络最初学习测试数据中存在的较不显著的频率成分。我们观察到这一现象在合成数据集和真实数据集中都存在,通过在训练过程中的频率动态角度对grokking现象进行表征,为阐明grokking现象提供了一种新颖的视角。我们的经验基于频率的分析为理解grokking现象及其潜在机制提供了新的见解。
更新时间: 2024-05-24 06:57:23
领域: cs.LG,cs.NE,stat.ML
Two-Timescale Critic-Actor for Average Reward MDPs with Function Approximation
In recent years, there has been a lot of research activity focused on carrying out non-asymptotic convergence analyses for actor-critic algorithms. Recently a two-timescale critic-actor algorithm has been presented for the discounted cost setting in the look-up table case where the timescales of the actor and the critic are reversed and only asymptotic convergence shown. In our work, we present the first two-timescale critic-actor algorithm with function approximation in the long-run average reward setting and present the first finite-time non-asymptotic as well as asymptotic convergence analysis for such a scheme. We obtain optimal learning rates and prove that our algorithm achieves a sample complexity of $\mathcal{\tilde{O}}(\epsilon^{-2.08})$ for the mean squared error of the critic to be upper bounded by $\epsilon$ which is better than the one obtained for two-timescale actor-critic in a similar setting. A notable feature of our analysis is that unlike recent single-timescale actor-critic algorithms, we present a complete asymptotic convergence analysis of our scheme in addition to the finite-time bounds that we obtain and show that the (slower) critic recursion converges asymptotically to the attractor of an associated differential inclusion with actor parameters corresponding to local maxima of a perturbed average reward objective. We also show the results of numerical experiments on three benchmark settings and observe that our critic-actor algorithm performs on par and is in fact better than the other algorithms considered.
Updated: 2024-05-24 06:57:17
标题: 使用函数逼近的双时间尺度评论-演员算法用于平均奖励MDPs
摘要: 近年来,有很多研究活动集中在对演员-评论算法进行非渐近收敛分析上。最近,在折扣成本设定下,提出了一种用于查找表情况的两时间尺度评论-演员算法,其中演员和评论的时间尺度被颠倒,并且只展示了渐近收敛。在我们的工作中,我们提出了第一个长期平均奖励设定下具有函数逼近的两时间尺度评论-演员算法,并且为此方案提供了第一个有限时间的非渐近和渐近收敛分析。我们获得了最优学习率,并证明我们的算法实现了评论者的均方误差的样本复杂度为$\mathcal{\tilde{O}}(\epsilon^{-2.08})$,这比在类似情况下获得的两时间尺度演员-评论算法更好。我们分析的一个显著特点是,与最近的单时间尺度演员-评论算法不同,我们除了获得的有限时间界限外,还对我们的方案进行了完整的渐近收敛分析,并展示了(较慢的)评论者递归在演员参数对应于扰动平均奖励目标的局部极大值的微分包含吸引子的渐近收敛。我们还展示了三种基准设置的数值实验结果,并观察到我们的评论-演员算法表现出色,并且实际上比其他考虑的算法更好。
更新时间: 2024-05-24 06:57:17
领域: cs.LG
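A tabular sketch of the reversed-timescale idea in the average-reward setting follows: the actor uses the faster (slower-decaying) step size and the critic the slower one. The step-size exponents, the toy MDP, and the tabular features are illustrative assumptions, not the paper's function-approximation scheme or its sample-complexity analysis.

```python
import numpy as np

def critic_actor_average_reward(P, R, n_iters=20000, seed=0):
    """Tabular two-timescale critic-actor for an average-reward MDP.
    P: (S, A, S) transition tensor, R: (S, A) rewards."""
    rng = np.random.default_rng(seed)
    S, A, _ = P.shape
    w = np.zeros(S)            # critic: differential value estimates
    theta = np.zeros((S, A))   # actor: softmax policy parameters
    eta = 0.0                  # running average-reward estimate
    s = 0
    for t in range(1, n_iters + 1):
        actor_lr = 1.0 / t ** 0.55    # faster timescale (actor)
        critic_lr = 1.0 / t ** 0.9    # slower timescale (critic)
        logits = theta[s] - theta[s].max()
        pi = np.exp(logits) / np.exp(logits).sum()
        a = rng.choice(A, p=pi)
        s_next = rng.choice(S, p=P[s, a])
        delta = R[s, a] - eta + w[s_next] - w[s]   # average-reward TD error
        eta += critic_lr * (R[s, a] - eta)
        w[s] += critic_lr * delta
        grad_log_pi = -pi
        grad_log_pi[a] += 1.0                      # d log pi(a|s) / d theta[s]
        theta[s] += actor_lr * delta * grad_log_pi
        s = s_next
    return theta, w, eta

# toy MDP with random dynamics
rng = np.random.default_rng(1)
S, A = 3, 2
P = rng.dirichlet(np.ones(S), size=(S, A))   # shape (S, A, S)
R = rng.normal(size=(S, A))
theta, w, eta = critic_actor_average_reward(P, R)
print("estimated long-run average reward:", round(eta, 3))
```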
BDetCLIP: Multimodal Prompting Contrastive Test-Time Backdoor Detection
Multimodal contrastive learning methods (e.g., CLIP) have shown impressive zero-shot classification performance due to their strong ability to joint representation learning for visual and textual modalities. However, recent research revealed that multimodal contrastive learning on poisoned pre-training data with a small proportion of maliciously backdoored data can induce backdoored CLIP that could be attacked by inserted triggers in downstream tasks with a high success rate. To defend against backdoor attacks on CLIP, existing defense methods focus on either the pre-training stage or the fine-tuning stage, which would unfortunately cause high computational costs due to numerous parameter updates. In this paper, we provide the first attempt at a computationally efficient backdoor detection method to defend against backdoored CLIP in the inference stage. We empirically find that the visual representations of backdoored images are insensitive to both benign and malignant changes in class description texts. Motivated by this observation, we propose BDetCLIP, a novel test-time backdoor detection method based on contrastive prompting. Specifically, we first prompt the language model (e.g., GPT-4) to produce class-related description texts (benign) and class-perturbed random texts (malignant) by specially designed instructions. Then, the distribution difference in cosine similarity between images and the two types of class description texts can be used as the criterion to detect backdoor samples. Extensive experiments validate that our proposed BDetCLIP is superior to state-of-the-art backdoor detection methods, in terms of both effectiveness and efficiency.
Updated: 2024-05-24 06:52:54
标题: BDetCLIP:多模态提示对比测试时后门检测
摘要: 多模态对比学习方法(例如CLIP)凭借其对视觉和文本模态进行联合表示学习的强大能力,展示出令人印象深刻的零样本分类性能。然而,最近的研究发现,在仅含少量恶意后门数据的受污染预训练数据上进行多模态对比学习,就可能诱发带后门的CLIP,使其在下游任务中可被插入的触发器以高成功率攻击。为了防御针对CLIP的后门攻击,现有防御方法要么专注于预训练阶段,要么专注于微调阶段,而这两者都会因大量参数更新而带来高昂的计算成本。在本文中,我们首次尝试提出一种计算高效的后门检测方法,用于在推理阶段防御带后门的CLIP。我们通过实验发现,带后门图像的视觉表示对类别描述文本的良性和恶性变化均不敏感。受此观察启发,我们提出了BDetCLIP,一种基于对比提示的新颖测试时后门检测方法。具体而言,我们首先通过特别设计的指令,提示语言模型(例如GPT-4)生成与类别相关的描述文本(良性)和经类别扰动的随机文本(恶性)。然后,图像与这两类类别描述文本之间余弦相似度的分布差异可以作为检测后门样本的标准。大量实验证明,我们提出的BDetCLIP在有效性和效率方面均优于最先进的后门检测方法。
更新时间: 2024-05-24 06:52:54
领域: cs.CV,cs.LG
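Once CLIP embeddings and the two families of prompted texts are in hand, the detection criterion itself is a small computation. A hedged sketch follows; the aggregation by mean and the thresholding rule are assumptions, and the embeddings are assumed L2-normalized so dot products equal cosine similarities.

```python
import torch

@torch.no_grad()
def bdetclip_score(image_emb, benign_embs, malignant_embs):
    """Contrast an image's similarity to benign class descriptions
    against its similarity to class-perturbed random texts. Clean
    images should show a large gap; backdoored images, being
    insensitive to the text changes, should show a small one."""
    sim_benign = (image_emb @ benign_embs.T).mean()
    sim_malignant = (image_emb @ malignant_embs.T).mean()
    return (sim_benign - sim_malignant).item()

# hypothetical usage with any CLIP implementation:
#   img = l2_normalize(clip.encode_image(x))
#   benign / malignant = l2_normalize(clip.encode_text(gpt4_prompted_texts))
#   flag_as_backdoored = bdetclip_score(img, benign, malignant) < threshold
```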
ParamReL: Learning Parameter Space Representation via Progressively Encoding Bayesian Flow Networks
The recently proposed Bayesian Flow Networks (BFNs) show great potential in modeling parameter spaces, offering a unified strategy for handling continuous, discretized, and discrete data. However, BFNs cannot learn high-level semantic representation from the parameter space since common encoders, which encode data into one static representation, cannot capture semantic changes in parameters. This motivates a new direction: learning semantic representations hidden in the parameter spaces to characterize mixed-typed noisy data. Accordingly, we propose a representation learning framework named ParamReL, which operates in the parameter space to obtain parameter-wise latent semantics that exhibit progressive structures. Specifically, ParamReL proposes a self-encoder to learn latent semantics directly from parameters, rather than from observations. The encoder is then integrated into BFNs, enabling representation learning with various formats of observations. Mutual information terms further promote the disentanglement of latent semantics and capture meaningful semantics simultaneously. We illustrate conditional generation and reconstruction in ParamReL via expanding BFNs, and extensive quantitative experimental results demonstrate the superior effectiveness of ParamReL in learning parameter representation.
Updated: 2024-05-24 06:51:38
标题: ParamReL:通过逐步编码贝叶斯流网络学习参数空间表示
摘要: 最近提出的贝叶斯流网络(BFNs)在建模参数空间方面表现出巨大潜力,提供了处理连续、离散化和离散数据的统一策略。然而,BFNs不能从参数空间中学习高级语义表示,因为将数据编码为单一静态表示的通用编码器无法捕捉参数中的语义变化。这激发了一个新方向:学习隐藏在参数空间中的语义表示,以表征混合类型的嘈杂数据。因此,我们提出了一个称为ParamReL的表示学习框架,它在参数空间中运行,以获取展示渐进结构的逐参数潜在语义。具体而言,ParamReL提出了一个自编码器,直接从参数中学习潜在语义,而不是从观察中学习。然后将编码器集成到BFNs中,从而实现不同格式观察的表示学习。互信息项进一步促进了潜在语义的解缠,并同时捕捉有意义的语义。我们通过扩展BFNs说明了ParamReL中的条件生成和重建,广泛的定量实验结果表明了ParamReL在学习参数表示方面的卓越有效性。
更新时间: 2024-05-24 06:51:38
领域: cs.LG
Self-Contrastive Weakly Supervised Learning Framework for Prognostic Prediction Using Whole Slide Images
We present a pioneering investigation into the application of deep learning techniques to analyze histopathological images for addressing the substantial challenge of automated prognostic prediction. Prognostic prediction poses a unique challenge as the ground truth labels are inherently weak, and the model must anticipate future events that are not directly observable in the image. To address this challenge, we propose a novel three-part framework comprising a convolutional network based tissue segmentation algorithm for region of interest delineation, a contrastive learning module for feature extraction, and a nested multiple instance learning classification module. Our study explores the significance of various regions of interest within the histopathological slides and exploits diverse learning scenarios. The pipeline is initially validated on artificially generated data and a simpler diagnostic task. Transitioning to prognostic prediction, the task becomes more challenging. Employing bladder cancer as a use case, our best models yield an AUC of 0.721 and 0.678 for recurrence and treatment outcome prediction, respectively.
Updated: 2024-05-24 06:45:36
标题: 自对比弱监督学习框架用于使用全切片图像进行预后预测
摘要: 我们提出了一项开创性的研究,探讨了深度学习技术在分析组织病理图像中的应用,以应对自动预后预测的重大挑战。预后预测提出了独特的挑战,因为真实标签本质上是弱的,模型必须预测图像中无法直接观察到的未来事件。为了解决这一挑战,我们提出了一个新颖的三部分框架,包括基于卷积网络的组织分割算法用于感兴趣区域的划分,用于特征提取的对比学习模块,以及一个嵌套的多实例学习分类模块。我们的研究探讨了组织病理切片中各种感兴趣区域的重要性,并利用了多样的学习场景。该流程首先在人工生成的数据和一个较简单的诊断任务上进行验证。转向预后预测后,任务变得更具挑战性。以膀胱癌为案例,我们最佳的模型在复发和治疗结果预测方面分别获得了0.721和0.678的AUC。
更新时间: 2024-05-24 06:45:36
领域: cs.CV,cs.AI
ALI-Agent: Assessing LLMs' Alignment with Human Values via Agent-based Evaluation
Large Language Models (LLMs) can elicit unintended and even harmful content when misaligned with human values, posing severe risks to users and society. To mitigate these risks, current evaluation benchmarks predominantly employ expert-designed contextual scenarios to assess how well LLMs align with human values. However, the labor-intensive nature of these benchmarks limits their test scope, hindering their ability to generalize to the extensive variety of open-world use cases and identify rare but crucial long-tail risks. Additionally, these static tests fail to adapt to the rapid evolution of LLMs, making it hard to evaluate timely alignment issues. To address these challenges, we propose ALI-Agent, an evaluation framework that leverages the autonomous abilities of LLM-powered agents to conduct in-depth and adaptive alignment assessments. ALI-Agent operates through two principal stages: Emulation and Refinement. During the Emulation stage, ALI-Agent automates the generation of realistic test scenarios. In the Refinement stage, it iteratively refines the scenarios to probe long-tail risks. Specifically, ALI-Agent incorporates a memory module to guide test scenario generation, a tool-using module to reduce human labor in tasks such as evaluating feedback from target LLMs, and an action module to refine tests. Extensive experiments across three aspects of human values--stereotypes, morality, and legality--demonstrate that ALI-Agent, as a general evaluation framework, effectively identifies model misalignment. Systematic analysis also validates that the generated test scenarios represent meaningful use cases, as well as integrate enhanced measures to probe long-tail risks. Our code is available at https://github.com/SophieZheng998/ALI-Agent.git
Updated: 2024-05-24 06:38:49
标题: ALI-Agent:通过基于代理的评估评估LLMs与人类价值观的一致性
摘要: 大型语言模型(LLMs)与人类价值观不一致时可能引发意想不到甚至有害的内容,给用户和社会带来严重风险。为了减轻这些风险,当前的评估基准主要采用专家设计的情境来评估LLMs与人类价值观的一致性。然而,这些基准的劳动密集性限制了它们的测试范围,阻碍了它们推广到广泛的开放世界用例并识别罕见但至关重要的长尾风险的能力。此外,这些静态测试无法适应LLMs的快速演变,使得难以及时评估一致性问题。为了解决这些挑战,我们提出了ALI-Agent,这是一个利用由LLM驱动的代理的自主能力进行深入和自适应一致性评估的框架。ALI-Agent通过两个主要阶段运作:模拟和精炼。在模拟阶段,ALI-Agent自动生成真实的测试情境。在精炼阶段,它迭代地完善情境以探究长尾风险。具体而言,ALI-Agent包括一个记忆模块来指导测试情境的生成,一个工具使用模块来减少人类在任务中的劳动(如评估目标LLMs的反馈),以及一个行动模块来完善测试。对人类价值观的三个方面--刻板印象、道德和合法性--进行的广泛实验表明,作为一个通用评估框架,ALI-Agent有效地识别了模型不一致性。系统分析还验证了生成的测试情境代表了有意义的用例,并整合了增强探索长尾风险的措施。我们的代码可以在https://github.com/SophieZheng998/ALI-Agent.git找到。
更新时间: 2024-05-24 06:38:49
领域: cs.AI,cs.CL
PandaLM: An Automatic Evaluation Benchmark for LLM Instruction Tuning Optimization
Instruction tuning large language models (LLMs) remains a challenging task, owing to the complexity of hyperparameter selection and the difficulty involved in evaluating the tuned models. To determine the optimal hyperparameters, an automatic, robust, and reliable evaluation benchmark is essential. However, establishing such a benchmark is not a trivial task due to the challenges associated with evaluation accuracy and privacy protection. In response to these challenges, we introduce a judge large language model, named PandaLM, which is trained to distinguish the superior model given several LLMs. PandaLM's focus extends beyond just the objective correctness of responses, which is the main focus of traditional evaluation datasets. It addresses vital subjective factors such as relative conciseness, clarity, adherence to instructions, comprehensiveness, and formality. To ensure the reliability of PandaLM, we collect a diverse human-annotated test dataset, where all contexts are generated by humans and labels are aligned with human preferences. Our results indicate that PandaLM-7B achieves 93.75% of GPT-3.5's evaluation ability and 88.28% of GPT-4's in terms of F1-score on our test dataset. PandaLM makes the evaluation of LLMs fairer and less costly, as evidenced by the significant improvements achieved by models tuned through PandaLM compared to their counterparts trained with Alpaca's default hyperparameters. In addition, PandaLM does not depend on API-based evaluations, thus avoiding potential data leakage. All resources of PandaLM are released at https://github.com/WeOpenML/PandaLM.
Updated: 2024-05-24 06:37:31
标题: PandaLM:用于LLM指令调优优化的自动评估基准
摘要: 指令调优大型语言模型(LLMs)仍然是一项具有挑战性的任务,这是由于超参数选择的复杂性和评估调优后模型的困难所致。为了确定最佳超参数,一个自动、鲁棒且可靠的评估基准是必不可少的。然而,由于与评估准确性和隐私保护相关的挑战,建立这样一个基准并不是一项简单的任务。针对这些挑战,我们引入了一种名为PandaLM的评判大型语言模型,该模型经过训练,可以在给定多个LLMs的情况下区分出更优越的模型。PandaLM的重点不仅仅是回答的客观正确性(这是传统评估数据集的主要关注点),它还考虑了诸如相对简洁性、清晰度、对指令的遵守、全面性和正式性等重要主观因素。为了确保PandaLM的可靠性,我们收集了一个多样化的人工标注测试数据集,其中所有上下文均由人类生成,标签与人类偏好相一致。我们的结果表明,以F1分数衡量,PandaLM-7B在我们的测试数据集上达到了GPT-3.5评估能力的93.75%和GPT-4的88.28%。PandaLM使LLM的评估更加公平且成本更低,通过PandaLM调优的模型相比于使用Alpaca默认超参数训练的模型取得了显著的改进。此外,PandaLM不依赖于基于API的评估,从而避免了潜在的数据泄露。PandaLM的所有资源均在https://github.com/WeOpenML/PandaLM 上公开发布。
更新时间: 2024-05-24 06:37:31
领域: cs.CL,cs.AI
Leakage-Resilient and Carbon-Neutral Aggregation Featuring the Federated AI-enabled Critical Infrastructure
AI-enabled critical infrastructures (ACIs) integrate artificial intelligence (AI) technologies into various essential systems and services that are vital to the functioning of society, offering significant implications for efficiency, security, and resilience. While adopting decentralized AI approaches (such as federated learning technology) in ACIs is plausible, private and sensitive data are still susceptible to data reconstruction attacks through gradient optimization. In this work, we propose Compressed Differentially Private Aggregation (CDPA), a leakage-resilient, communication-efficient, and carbon-neutral approach for ACI networks. Specifically, CDPA introduces a novel random bit-flipping mechanism as its primary innovation. This mechanism first converts gradients into a specific binary representation and then selectively flips masked bits with a certain probability. The proposed bit-flipping introduces a larger variance into the noise while providing differential privacy protection, and achieves commendable energy savings by applying vector quantization techniques within the context of federated learning. The experimental evaluation indicates that CDPA can reduce communication cost by half while preserving model utility. Moreover, we demonstrate that CDPA can effectively defend against state-of-the-art data reconstruction attacks in both computer vision and natural language processing tasks. We highlight existing benchmarks that generate 2.6x to over 100x more carbon emissions than CDPA. We hope that the CDPA developed in this paper can offer federated AI-enabled critical infrastructures a more balanced trade-off between utility and privacy, stronger resilience protection, and a better carbon offset with less communication overhead.
Updated: 2024-05-24 06:35:09
标题: 具有抗泄露性和碳中和特性的联邦人工智能赋能关键基础设施聚合
摘要: 人工智能赋能的关键基础设施(ACIs)将人工智能(AI)技术整合到对社会运转至关重要的各类基本系统和服务中,对效率、安全和韧性具有重要意义。尽管在ACIs中采用去中心化AI方法(如联邦学习技术)是可行的,但私密和敏感数据仍然容易受到通过梯度优化进行的数据重构攻击。在这项工作中,我们提出了压缩差分隐私聚合(CDPA),这是一种面向ACI网络的抗泄露、通信高效且碳中和的方法。具体而言,CDPA引入了一种新颖的随机比特翻转机制作为其核心创新。该机制首先将梯度转换为特定的二进制表示,然后以一定概率选择性地翻转被掩码的比特。所提出的比特翻转在提供差分隐私保护的同时为噪声引入了更大的方差,并通过在联邦学习中应用向量量化技术实现了可观的节能效果。实验评估表明,CDPA可以在保持模型效用的同时将通信成本降低一半。此外,我们证明CDPA能够在计算机视觉和自然语言处理任务中有效防御最先进的数据重构攻击。我们指出,现有基准产生的碳排放是CDPA的2.6倍至100倍以上。我们希望本文提出的CDPA能为联邦AI赋能的关键基础设施在效用与隐私、韧性保护之间提供更平衡的权衡,并以更少的通信开销实现更好的碳补偿。
更新时间: 2024-05-24 06:35:09
领域: cs.CR
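As a rough illustration of CDPA's random bit-flipping idea, the sketch below flips the sign bits of a 1-bit-quantized gradient with probability `flip_p`; the paper's exact binary encoding and masking scheme may differ:

```python
# A rough sketch of CDPA-style random bit-flipping on 1-bit-quantized
# gradients (our simplification, not the authors' implementation).
import numpy as np

def cdpa_flip(grad, flip_p=0.1, seed=0):
    rng = np.random.default_rng(seed)
    bits = (grad > 0).astype(np.int8)        # compress to sign bits
    mask = rng.random(bits.shape) < flip_p   # bits selected for flipping
    bits[mask] ^= 1                          # randomized response adds noise
    return 2.0 * bits - 1.0                  # decode back to {-1, +1}
```

Flipping each bit independently with probability `flip_p` is the classic randomized-response mechanism: it compresses the update to one bit per coordinate while yielding a differential privacy guarantee whose strength grows with `flip_p`.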
Enhancing Q-Learning with Large Language Model Heuristics
Q-learning excels in learning from feedback within sequential decision-making tasks but often requires extensive sampling to achieve significant improvements. While reward shaping can enhance learning efficiency, non-potential-based methods introduce biases that affect performance, and potential-based reward shaping, though unbiased, lacks the ability to provide heuristics for state-action pairs, limiting its effectiveness in complex environments. Large language models (LLMs) can achieve zero-shot learning for simpler tasks, but they suffer from low inference speeds and occasional hallucinations. To address these challenges, we propose \textbf{LLM-guided Q-learning}, a framework that leverages LLMs as heuristics to aid in learning the Q-function for reinforcement learning. Our theoretical analysis demonstrates that this approach adapts to hallucinations, improves sample efficiency, and avoids biasing final performance. Experimental results show that our algorithm is general, robust, and capable of preventing ineffective exploration.
Updated: 2024-05-24 06:32:06
标题: 用大型语言模型启发增强Q学习
摘要: Q学习在序贯决策任务中从反馈中学习方面表现出色,但通常需要大量采样才能实现显著改进。尽管奖励塑造可以提高学习效率,但非基于势函数的方法会引入影响性能的偏差,而基于势函数的奖励塑造虽然无偏,却缺乏为状态-动作对提供启发信息的能力,从而限制了其在复杂环境中的有效性。大型语言模型(LLMs)可以在较简单的任务上实现零样本学习,但它们受到推理速度低和偶发幻觉等问题的困扰。为了解决这些挑战,我们提出了一种名为LLM引导Q学习的框架,利用LLMs作为启发式方法,帮助学习强化学习的Q函数。我们的理论分析表明,这种方法能够适应幻觉,提高样本效率,并避免使最终性能产生偏差。实验结果显示,我们的算法具有通用性和鲁棒性,并能防止无效探索。
更新时间: 2024-05-24 06:32:06
领域: cs.LG,cs.AI
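A toy sketch of the general idea follows, assuming a cached LLM heuristic score `llm_heuristic(s, a)` (a hypothetical stub here): the heuristic only biases action selection, while the standard TD update is left untouched, consistent with the claim that final performance is not biased.

```python
# A toy sketch of LLM-guided Q-learning (our reading, not the paper's code).
import numpy as np
from collections import defaultdict

def llm_heuristic(state, action):
    """Hypothetical stub for a cached LLM score of (state, action)."""
    return 0.0

def select_action(Q, state, actions, beta=1.0, eps=0.1, rng=np.random.default_rng(0)):
    # epsilon-greedy over Q plus the LLM heuristic bonus
    if rng.random() < eps:
        return actions[rng.integers(len(actions))]
    scores = [Q[state, a] + beta * llm_heuristic(state, a) for a in actions]
    return actions[int(np.argmax(scores))]

def td_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    # standard Q-learning target: the heuristic shapes exploration only
    target = r + gamma * max(Q[s_next, b] for b in actions)
    Q[s, a] += alpha * (target - Q[s, a])

Q = defaultdict(float)  # maps (state, action) -> value
```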
Generalized Laplace Approximation
In recent years, the inconsistency in Bayesian deep learning has garnered increasing attention. Tempered or generalized posterior distributions often offer a direct and effective solution to this issue. However, understanding the underlying causes and evaluating the effectiveness of generalized posteriors remain active areas of research. In this study, we introduce a unified theoretical framework to attribute Bayesian inconsistency to model misspecification and inadequate priors. We interpret the generalization of the posterior with a temperature factor as a correction for misspecified models through adjustments to the joint probability model, and the recalibration of priors by redistributing probability mass on models within the hypothesis space using data samples. Additionally, we highlight a distinctive feature of Laplace approximation, which ensures that the generalized normalizing constant can be treated as invariant, unlike the typical scenario in general Bayesian learning where this constant varies with model parameters post-generalization. Building on this insight, we propose the generalized Laplace approximation, which involves a simple adjustment to the computation of the Hessian matrix of the regularized loss function. This method offers a flexible and scalable framework for obtaining high-quality posterior distributions. We assess the performance and properties of the generalized Laplace approximation on state-of-the-art neural networks and real-world datasets.
Updated: 2024-05-24 06:31:55
标题: 广义拉普拉斯逼近
摘要: 近年来,贝叶斯深度学习中的不一致性引起了越来越多的关注。调温或广义后验分布通常为这一问题提供了直接而有效的解决方案。然而,理解其根本原因并评估广义后验的有效性仍然是活跃的研究领域。在这项研究中,我们引入了一个统一的理论框架,将贝叶斯不一致性归因于模型误设和先验不足。我们将带温度因子的后验泛化解释为:通过调整联合概率模型来修正误设的模型,并利用数据样本重新分配假设空间内各模型上的概率质量,从而重新校准先验。此外,我们强调拉普拉斯近似的一个独特特征,即可以确保广义归一化常数被视为不变量,这不同于一般贝叶斯学习中的典型情形(在泛化之后该常数会随模型参数而变化)。基于这一理解,我们提出了广义拉普拉斯近似,它只需对正则化损失函数的海森矩阵计算进行简单调整。这种方法提供了一个灵活且可扩展的框架,用于获得高质量的后验分布。我们在最先进的神经网络和真实世界数据集上评估了广义拉普拉斯近似的性能和特性。
更新时间: 2024-05-24 06:31:55
领域: cs.LG,stat.ML
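As a hedged illustration of the tempered-posterior reading (our notation; the paper's exact Hessian adjustment may differ), raising the likelihood to a power $1/T$ rescales only the likelihood part of the Hessian in the Laplace step:

```latex
% Tempered posterior and its Laplace approximation (our notation):
p_T(\theta \mid \mathcal{D}) \propto p(\mathcal{D} \mid \theta)^{1/T}\, p(\theta),
\qquad
q(\theta) = \mathcal{N}\big(\theta^{*},\, \Sigma_T\big),
\qquad
\Sigma_T^{-1} = \frac{1}{T}\, \nabla^{2}_{\theta}\, \mathcal{L}_{\mathrm{lik}}(\theta^{*})
  + \nabla^{2}_{\theta}\, \mathcal{L}_{\mathrm{prior}}(\theta^{*}).
```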
FTMixer: Frequency and Time Domain Representations Fusion for Time Series Modeling
Time series data can be represented in both the time and frequency domains, with the time domain emphasizing local dependencies and the frequency domain highlighting global dependencies. To harness the strengths of both domains in capturing local and global dependencies, we propose the Frequency and Time Domain Mixer (FTMixer). To exploit the global characteristics of the frequency domain, we introduce the Frequency Channel Convolution (FCC) module, designed to capture global inter-series dependencies. Inspired by the windowing concept in frequency domain transformations, we present the Windowing Frequency Convolution (WFC) module to capture local dependencies. The WFC module first applies a frequency transformation within each window, followed by convolution across windows. Furthermore, to better capture these local dependencies, we employ a channel-independent scheme to mix the time domain and frequency domain patches. Notably, FTMixer employs the real-valued Discrete Cosine Transformation (DCT) instead of the complex-number-based Discrete Fourier Transformation (DFT), enabling direct utilization of modern deep learning operators in the frequency domain. Extensive experimental results across seven real-world long-term time series datasets demonstrate the superiority of FTMixer in terms of both forecasting performance and computational efficiency.
Updated: 2024-05-24 06:31:46
标题: FTMixer:频率和时间域表示融合用于时间序列建模
摘要: 时间序列数据可以在时间域和频率域中表示,时间域强调局部依赖关系,而频率域突出全局依赖关系。为了充分利用捕捉局部和全局依赖关系的两个领域的优势,我们提出了频率和时间域混合器(FTMixer)。为了利用频率域的全局特征,我们引入了频率通道卷积(FCC)模块,旨在捕捉全局跨序列依赖关系。受频率域转换中的窗口概念启发,我们提出了窗口频率卷积(WFC)模块来捕捉局部依赖关系。WFC模块首先在每个窗口内应用频率变换,然后在窗口之间进行卷积。此外,为了更好地捕捉这些局部依赖关系,我们采用通道独立方案来混合时间域和频率域补丁。值得注意的是,FTMixer使用实数的离散余弦变换(DCT)而不是基于复数的离散傅里叶变换(DFT),从而可以直接利用现代深度学习算子在频率域中。对七个真实世界的长期时间序列数据集进行的大量实验结果表明,FTMixer在预测性能和计算效率方面表现出优越性。
更新时间: 2024-05-24 06:31:46
领域: cs.LG
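To illustrate the windowed-DCT step in WFC, here is a small sketch; the window length and kernel are illustrative choices, not the paper's hyperparameters:

```python
# A small sketch of the WFC idea: real-valued DCT within local windows,
# then convolution across windows (our simplification).
import numpy as np
from scipy.fft import dct

def windowed_dct(x, window_len=16):
    """x: (T,) series -> (num_windows, window_len) local spectra."""
    n = len(x) // window_len
    windows = x[: n * window_len].reshape(n, window_len)
    return dct(windows, type=2, norm="ortho", axis=-1)

def wfc(x, kernel):
    spectra = windowed_dct(x)
    # convolve across windows, independently per frequency bin
    return np.stack(
        [np.convolve(spectra[:, k], kernel, mode="same")
         for k in range(spectra.shape[1])], axis=1)
```

Because the DCT is real-valued, the per-bin convolutions above use ordinary real arithmetic, which is the practical advantage the abstract highlights over DFT-based designs.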
Novel Kernel Models and Exact Representor Theory for Neural Networks Beyond the Over-Parameterized Regime
This paper presents two models of neural networks and their training, applicable to neural networks of arbitrary width, depth, and topology, assuming only finite-energy neural activations; and a novel representor theory for neural networks in terms of a matrix-valued kernel. The first model is exact (un-approximated) and global, casting the neural network as an element of a reproducing kernel Banach space (RKBS); we use this model to provide tight bounds on Rademacher complexity. The second model is exact and local, casting the change in neural network function resulting from a bounded change in weights and biases (i.e., a training step) in reproducing kernel Hilbert space (RKHS) in terms of a local-intrinsic neural kernel (LiNK). This local model provides insight into model adaptation through tight bounds on the Rademacher complexity of network adaptation. We also prove that the neural tangent kernel (NTK) is a first-order approximation of the LiNK kernel. Finally, noting that the LiNK does not provide a representor theory for technical reasons, we present an exact novel representor theory for layer-wise neural network training with unregularized gradient descent in terms of a local-extrinsic neural kernel (LeNK). This representor theory gives insight into the role of higher-order statistics in neural network training and the effect of kernel evolution in neural-network kernel models. Throughout the paper, (a) feedforward ReLU networks and (b) residual networks (ResNet) are used as illustrative examples.
Updated: 2024-05-24 06:30:36
标题: 超越过参数化范围的神经网络的新型核模型和精确表征理论
摘要: 这篇论文提出了两种神经网络模型及其训练方法,适用于任意宽度、深度和拓扑结构的神经网络,仅假设神经激活具有有限能量;并提出了一种以矩阵值核描述神经网络的新颖表征理论。第一种模型是精确的(非近似的)和全局的,将神经网络视为再生核Banach空间(RKBS)中的元素;我们使用该模型给出了Rademacher复杂度的严格界限。第二种模型是精确的和局部的,用局部内在神经核(LiNK)在再生核希尔伯特空间(RKHS)中刻画由权重和偏置的有界变化(即一个训练步骤)引起的神经网络函数的变化。这种局部模型通过对网络适应的Rademacher复杂度的严格界限,为模型适应提供了洞察。我们还证明神经切线核(NTK)是LiNK核的一阶近似。最后,鉴于LiNK由于技术原因无法提供表征理论,我们提出了一种精确的新型表征理论,用局部外在神经核(LeNK)描述采用非正则化梯度下降的逐层神经网络训练。这种表征理论揭示了高阶统计量在神经网络训练中的作用以及核演化在神经网络核模型中的影响。在整篇论文中,前馈ReLU网络和残差网络(ResNet)被用作说明性示例。
更新时间: 2024-05-24 06:30:36
领域: stat.ML,cs.AI,cs.LG
Fast 3D Molecule Generation via Unified Geometric Optimal Transport
This paper proposes a new 3D molecule generation framework, called GOAT, for fast and effective 3D molecule generation based on the flow-matching optimal transport objective. Specifically, we formulate a geometric transport formula for measuring the cost of mapping multi-modal features (e.g., continuous atom coordinates and categorical atom types) between a base distribution and a target data distribution. Our formula is solved within a unified, equivalent, and smooth representation space. This is achieved by transforming the multi-modal features into a continuous latent space with equivalent networks. In addition, we find that identifying optimal distributional coupling is necessary for fast and effective transport between any two distributions. We further propose a flow refinement and purification mechanism for optimal coupling identification. By doing so, GOAT can turn arbitrary distribution couplings into new deterministic couplings, leading to a unified optimal transport path for fast 3D molecule generation. The purification filters the subpar molecules to ensure the ultimate generation performance. We theoretically prove the proposed method indeed reduces the transport cost. Finally, extensive experiments show that GOAT enjoys the efficiency of solving geometric optimal transport, leading to a double speedup compared to the sub-optimal method while achieving the best generation quality regarding validity, uniqueness, and novelty.
Updated: 2024-05-24 06:22:01
标题: 通过统一几何最优传输实现快速三维分子生成
摘要: 本文提出了一个新的三维分子生成框架,称为GOAT,用于基于流匹配最优传输目标进行快速有效的三维分子生成。具体来说,我们制定了一个几何传输公式,用于衡量在基础分布和目标数据分布之间映射多模态特征(例如连续原子坐标和分类原子类型)的成本。我们的公式在一个统一的、等价的、平滑的表征空间中解决。这是通过将多模态特征转换为具有等价网络的连续潜在空间来实现的。此外,我们发现确定最佳分布耦合对于在任意两个分布之间进行快速有效传输是必要的。我们进一步提出了一种流精炼和净化机制用于最佳耦合识别。通过这样做,GOAT可以将任意分布耦合转化为新的确定性耦合,从而实现快速三维分子生成的统一最优传输路径。净化过程将次优分子过滤掉,以确保最终生成性能。我们在理论上证明了所提出的方法确实降低了传输成本。最后,广泛的实验表明,GOAT具有解决几何最优传输的效率,比次优方法快两倍,并在有效性、独特性和新颖性方面实现了最佳生成质量。
更新时间: 2024-05-24 06:22:01
领域: cs.LG
Learning to optimize: A tutorial for continuous and mixed-integer optimization
Learning to Optimize (L2O) stands at the intersection of traditional optimization and machine learning, utilizing the capabilities of machine learning to enhance conventional optimization techniques. As real-world optimization problems frequently share common structures, L2O provides a tool to exploit these structures for better or faster solutions. This tutorial dives deep into L2O techniques, introducing how to accelerate optimization algorithms, promptly estimate the solutions, or even reshape the optimization problem itself, making it more adaptive to real-world applications. By considering the prerequisites for successful applications of L2O and the structure of the optimization problems at hand, this tutorial provides a comprehensive guide for practitioners and researchers alike.
Updated: 2024-05-24 06:21:01
标题: 学习优化:连续和混合整数优化的教程
摘要: Learning to Optimize (L2O) 处于传统优化和机器学习的交集,利用机器学习的能力来增强传统优化技术。由于现实世界中的优化问题通常共享共同的结构,L2O 提供了一种工具来利用这些结构以获得更好或更快的解决方案。本教程深入探讨了 L2O 技术,介绍了如何加速优化算法,迅速估计解决方案,甚至重塑优化问题本身,使其更适应现实世界的应用。通过考虑成功应用 L2O 的先决条件和手头优化问题的结构,本教程为从业者和研究人员提供了全面的指南。
更新时间: 2024-05-24 06:21:01
领域: math.OC,cs.LG,stat.ML
Coaching Copilot: Blended Form of an LLM-Powered Chatbot and a Human Coach to Effectively Support Self-Reflection for Leadership Growth
Chatbots' role in fostering self-reflection is now widely recognized, especially in inducing users' behavior change. While the benefits of 24/7 availability, scalability, and consistent responses have been demonstrated in contexts such as healthcare and tutoring to help one form a new habit, their utilization in coaching necessitating deeper introspective dialogue to induce leadership growth remains unexplored. This paper explores the potential of such a chatbot powered by recent Large Language Models (LLMs) in collaboration with professional coaches in the field of executive coaching. Through a design workshop with them and two weeks of user study involving ten coach-client pairs, we explored the feasibility and nuances of integrating chatbots to complement human coaches. Our findings highlight the benefits of chatbots' ubiquity and reasoning capabilities enabled by LLMs while identifying their limitations and design necessities for effective collaboration between human coaches and chatbots. By doing so, this work contributes to the foundation for augmenting one's self-reflective process with prevalent conversational agents through the human-in-the-loop approach.
Updated: 2024-05-24 06:20:56
标题: 教练副驾驶:由LLM驱动的聊天机器人与人类教练的混合形式,有效支持领导力成长的自我反思
摘要: 聊天机器人在促进自我反思方面的作用得到了广泛认可,尤其是在诱导用户行为改变方面。虽然在医疗保健和辅导等领域已经证明了24/7可用性、可扩展性和一致性响应的好处,以帮助形成新习惯,但它们在需要更深层次内省对话以诱导领导力增长的教练领域的利用尚未被探索。本文探讨了由最新的大型语言模型(LLMs)支持的这种聊天机器人与执行教练领域的专业教练合作的潜力。通过与他们进行设计研讨会和涉及十对教练-客户的两周用户研究,我们探讨了整合聊天机器人以补充人类教练的可行性和细微差别。我们的研究结果突出了聊天机器人通过LLMs实现的无处不在和推理能力带来的好处,同时识别了它们的限制和有效协作所需的设计要求。通过这样做,这项工作为通过人在回路中方法将自我反思过程与流行的对话代理相结合做出了贡献。
更新时间: 2024-05-24 06:20:56
领域: cs.HC,cs.AI
Learning Dynamical Systems by Leveraging Data from Similar Systems
We consider the problem of learning the dynamics of a linear system when one has access to data generated by an auxiliary system that shares similar (but not identical) dynamics, in addition to data from the true system. We use a weighted least squares approach, and provide finite sample error bounds of the learned model as a function of the number of samples and various system parameters from the two systems as well as the weight assigned to the auxiliary data. We show that the auxiliary data can help to reduce the intrinsic system identification error due to noise, at the price of adding a portion of error that is due to the differences between the two system models. We further provide a data-dependent bound that is computable when some prior knowledge about the systems, such as upper bounds on noise levels and model difference, is available. This bound can also be used to determine the weight that should be assigned to the auxiliary data during the model training stage.
Updated: 2024-05-24 06:19:00
标题: 通过利用类似系统的数据学习动态系统
摘要: 我们考虑这样一个问题:除了真实系统的数据之外,还能够获取由动态相似(但不完全相同)的辅助系统生成的数据时,如何学习线性系统的动态。我们使用加权最小二乘法,并给出了学习模型的有限样本误差界,该界是样本数量、两个系统的各种系统参数以及分配给辅助数据的权重的函数。我们表明,辅助数据可以帮助减少由噪声引起的固有系统辨识误差,但代价是增加一部分由两个系统模型之间差异导致的误差。我们进一步提供了一个数据相关的界限,当关于系统的某些先验知识(例如噪声水平和模型差异的上限)可用时,该界限是可计算的。这一界限还可用于确定在模型训练阶段应分配给辅助数据的权重。
更新时间: 2024-05-24 06:19:00
领域: stat.ML,cs.LG,cs.SY,eess.SY
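A minimal numpy sketch of such a weighted least-squares estimator for $x_{t+1} = A x_t + w_t$ is given below; the weight `q` on the auxiliary data and the variable names are our notation, not the paper's:

```python
# Weighted least squares mixing true-system and auxiliary-system data
# (a sketch under our notation; q in [0, 1] downweights auxiliary data).
import numpy as np

def wls_system_id(X, Xn, Xa, Xan, q):
    """X, Xn: states/next-states of the true system, shape (d, N);
    Xa, Xan: the same for the auxiliary system; returns A_hat."""
    G = X @ X.T + q * (Xa @ Xa.T)      # weighted Gram matrix
    C = Xn @ X.T + q * (Xan @ Xa.T)    # weighted cross-covariance
    return C @ np.linalg.pinv(G)
```

Setting `q = 0` recovers ordinary least squares on the true system alone, while larger `q` trades noise reduction against bias from the model difference, mirroring the trade-off the abstract describes.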
Learning Antenna Pointing Correction in Operations: Efficient Calibration of a Black Box
We propose an efficient offline pointing calibration method for operational antenna systems which does not require any downtime. Our approach minimizes the calibration effort and exploits technical signal information which is typically used for monitoring and control purposes in ground station operations. Using a standard antenna interface and data from an operational satellite contact, we come up with a robust strategy for training data set generation. On top of this, we learn the parameters of a suitable coordinate transform by means of linear regression. In our experiments, we show the usefulness of the method in a real-world setup.
Updated: 2024-05-24 06:17:05
标题: 在运行中学习天线指向校正:黑匣子的高效校准
摘要: 我们提出了一种高效的离线指向校准方法,适用于运行中的天线系统,不需要任何停机时间。我们的方法最大程度地减少了校准工作量,并利用了通常用于地面站操作监控和控制目的的技术信号信息。利用标准天线接口和操作卫星联系的数据,我们设计了一个稳健的训练数据集生成策略。除此之外,我们通过线性回归学习了适当坐标变换的参数。在我们的实验中,我们展示了该方法在真实环境中的实用性。
更新时间: 2024-05-24 06:17:05
领域: cs.LG
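As a sketch of the regression step under our own assumptions (an affine azimuth/elevation correction; the paper's coordinate transform may be parameterized differently):

```python
# Fitting a pointing-correction transform by linear regression on
# operational contact data (a hedged sketch; feature choice is ours).
import numpy as np

def fit_pointing_correction(commanded, observed):
    """commanded, observed: (N, 2) az/el pairs; returns an affine map W."""
    Phi = np.hstack([commanded, np.ones((len(commanded), 1))])  # bias term
    W, *_ = np.linalg.lstsq(Phi, observed, rcond=None)
    return W  # corrected = np.hstack([cmd, 1.0]) @ W
```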
Data Contamination Quiz: A Tool to Detect and Estimate Contamination in Large Language Models
We propose the Data Contamination Quiz (DCQ), a simple and effective approach to detect data contamination in large language models (LLMs) and estimate the amount of it. Specifically, we frame data contamination detection as a series of multiple-choice questions and devise a quiz format wherein three perturbed versions of each subsampled instance from a specific dataset partition (e.g., GSM8k test set) are created. These changes only include word-level perturbations. The generated perturbations, along with the original dataset instance, form the options in the DCQ, with an extra option accommodating the possibility of selecting none of the provided options. Given that the only distinguishing signal among the options is the exact wording with respect to the original dataset instance, an LLM, when tasked with identifying the original dataset instance, gravitates towards selecting the original one if it has been exposed to it in its pre-training phase -- a trait intrinsic to LLMs. While accounting for positional biases in LLMs, the quiz performance reveals the contamination level of the model being examined on the dataset partition to which the quiz pertains. Applying DCQ to various datasets with GPT-4 and GPT-3.5 -- while fully lacking access to pre-training data and model parameters -- we find that DCQ achieves state-of-the-art results, uncovers greater contamination/memorization levels compared to existing methods, and proficiently bypasses more safety filters, especially those set to avoid generating copyrighted contents.
Updated: 2024-05-24 06:14:09
标题: 数据污染测验:一种用于检测和估计大型语言模型中污染的工具
摘要: 我们提出了数据污染测验(DCQ),这是一种简单有效的方法,用于检测大型语言模型(LLMs)中的数据污染并估计其数量。具体而言,我们将数据污染检测构建为一系列多项选择题,并设计了一种测验格式:对来自特定数据集分区(例如GSM8k测试集)的每个子采样实例创建三个扰动版本,这些变化仅包括单词级的扰动。生成的扰动与原始数据集实例一起构成了DCQ中的选项,并附加一个允许不选择任何所提供选项的额外选项。鉴于各选项之间唯一的区分信号是相对于原始数据集实例的确切措辞,当LLM被要求识别原始数据集实例时,如果它在预训练阶段接触过该实例,就会倾向于选择原始实例--这是LLMs固有的特征。在考虑LLMs中的位置偏差后,测验表现揭示了被检测模型在该测验所对应数据集分区上的污染水平。将DCQ应用于使用GPT-4和GPT-3.5的各种数据集--尽管完全无法访问预训练数据和模型参数--我们发现DCQ取得了最先进的结果,相比现有方法揭示了更高的污染/记忆水平,并有效地绕过了更多的安全过滤器,特别是那些旨在避免生成受版权保护内容的过滤器。
更新时间: 2024-05-24 06:14:09
领域: cs.CL,cs.AI,cs.LG
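A toy sketch of constructing one DCQ item follows; the synonym table and the adjacent-word-swap fallback are illustrative stand-ins for the paper's word-level perturbations:

```python
# Building one DCQ item: three word-level perturbations of an original
# instance plus a "none of the above" option (illustrative only).
import random

SYNONYMS = {"quick": "fast", "happy": "glad", "big": "large"}

def perturb(text, rng):
    words = text.split()
    idx = [i for i, w in enumerate(words) if w.lower() in SYNONYMS]
    if idx:
        i = rng.choice(idx)
        words[i] = SYNONYMS[words[i].lower()]
    elif len(words) >= 2:  # fallback: swap two adjacent words
        i = rng.randrange(len(words) - 1)
        words[i], words[i + 1] = words[i + 1], words[i]
    return " ".join(words)

def make_quiz(original, seed=0):
    rng = random.Random(seed)
    options = [original] + [perturb(original, rng) for _ in range(3)]
    rng.shuffle(options)  # shuffling helps account for positional bias
    return options + ["None of the provided options."]
```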
Cooperative Backdoor Attack in Decentralized Reinforcement Learning with Theoretical Guarantee
The safety of decentralized reinforcement learning (RL) is a challenging problem since malicious agents can share their poisoned policies with benign agents. The paper investigates a cooperative backdoor attack in a decentralized reinforcement learning scenario. Differing from the existing methods that hide a whole backdoor attack behind their shared policies, our method decomposes the backdoor behavior into multiple components according to the state space of RL. Each malicious agent hides one component in its policy and shares its policy with the benign agents. When a benign agent learns all the poisoned policies, the backdoor attack is assembled in its policy. The theoretical proof is given to show that our cooperative method can successfully inject the backdoor into the RL policies of benign agents. Compared with the existing backdoor attacks, our cooperative method is more covert since the policy from each attacker only contains a component of the backdoor attack and is harder to detect. Extensive simulations are conducted based on Atari environments to demonstrate the efficiency and covertness of our method. To the best of our knowledge, this is the first paper presenting a provable cooperative backdoor attack in decentralized reinforcement learning.
Updated: 2024-05-24 06:13:31
标题: 去中心化强化学习中具有理论保证的合作后门攻击
摘要: 去中心化强化学习(RL)的安全性是一个具有挑战性的问题,因为恶意代理可以与良性代理共享其受污染的策略。本文研究了去中心化强化学习场景中的合作后门攻击。与将整个后门攻击隐藏在共享策略背后的现有方法不同,我们的方法根据RL的状态空间将后门行为分解为多个组件。每个恶意代理在其策略中隐藏一个组件,并与良性代理共享其策略。当一个良性代理学习了所有受污染的策略时,后门攻击就会在其策略中组装起来。理论证明表明,我们的合作方法可以成功地将后门注入到良性代理的RL策略中。与现有的后门攻击相比,我们的合作方法更加隐蔽,因为每个攻击者的策略只包含后门攻击的一个组件,更难以检测。基于Atari环境进行了大量模拟实验,以展示我们方法的效率和隐蔽性。据我们所知,这是第一篇提出在去中心化强化学习中可证明的合作后门攻击的论文。
更新时间: 2024-05-24 06:13:31
领域: cs.LG,cs.AI
TrojanRAG: Retrieval-Augmented Generation Can Be Backdoor Driver in Large Language Models
Large language models (LLMs) have raised concerns about potential security threats despite performing significantly well in Natural Language Processing (NLP). Backdoor attacks initially verified that LLMs can suffer substantial harm at all stages, but the cost and robustness of such attacks have been criticized. Attacking LLMs is inherently risky for security review, as well as prohibitively expensive. Besides, the continuous iteration of LLMs will degrade the robustness of backdoors. In this paper, we propose TrojanRAG, which employs a joint backdoor attack in the Retrieval-Augmented Generation, thereby manipulating LLMs in universal attack scenarios. Specifically, the adversary constructs elaborate target contexts and trigger sets. Multiple pairs of backdoor shortcuts are orthogonally optimized by contrastive learning, thus constraining the triggering conditions to a parameter subspace to improve the matching. To improve the recall of the RAG for the target contexts, we introduce a knowledge graph to construct structured data to achieve hard matching at a fine-grained level. Moreover, we normalize the backdoor scenarios in LLMs to analyze the real harm caused by backdoors from both attackers' and users' perspectives and further verify whether the context is a favorable tool for jailbreaking models. Extensive experimental results on truthfulness, language understanding, and harmfulness show that TrojanRAG exhibits versatile threats while maintaining retrieval capabilities on normal queries.
Updated: 2024-05-24 06:12:51
标题: 特洛伊木马RAG:检索增强生成可以成为大型语言模型中的后门驱动程序
摘要: 大型语言模型(LLMs)尽管在自然语言处理(NLP)中表现出色,但引发了关于潜在安全威胁的担忧。后门攻击最初验证了LLM在所有阶段都可能受到实质性危害,但此类攻击的成本和鲁棒性受到了批评。直接攻击LLMs在安全审查中具有固有风险,而且成本过高。此外,LLMs的持续迭代将降低后门的鲁棒性。在本文中,我们提出了TrojanRAG,它利用检索增强生成中的联合后门攻击,从而在通用攻击场景中操纵LLMs。具体而言,攻击者构建精心设计的目标上下文和触发集,并通过对比学习对多对后门捷径进行正交优化,从而将触发条件约束在一个参数子空间内以改善匹配。为了提高RAG对目标上下文的召回率,我们引入知识图来构建结构化数据,以在细粒度水平上实现硬匹配。此外,我们规范了LLMs中的后门场景,以从攻击者和用户双方的角度分析后门造成的实际危害,并进一步验证上下文是否是越狱模型的有利工具。关于真实性、语言理解和有害性的广泛实验结果表明,TrojanRAG在保持对正常查询的检索能力的同时展现出多样化的威胁。
更新时间: 2024-05-24 06:12:51
领域: cs.CR,cs.CL
Adversarial Attacks on Hidden Tasks in Multi-Task Learning
Deep learning models are susceptible to adversarial attacks, where slight perturbations to input data lead to misclassification. Adversarial attacks become increasingly effective with access to information about the targeted classifier. In the context of multi-task learning, where a single model learns multiple tasks simultaneously, attackers may aim to exploit vulnerabilities in specific tasks with limited information. This paper investigates the feasibility of attacking hidden tasks within multi-task classifiers, where model access regarding the hidden target task and labeled data for the hidden target task are not available, but model access regarding the non-target tasks is available. We propose a novel adversarial attack method that leverages knowledge from non-target tasks and the shared backbone network of the multi-task model to force the model to forget knowledge related to the target task. Experimental results on CelebA and DeepFashion datasets demonstrate the effectiveness of our method in degrading the accuracy of hidden tasks while preserving the performance of visible tasks, contributing to the understanding of adversarial vulnerabilities in multi-task classifiers.
Updated: 2024-05-24 06:11:30
标题: 对多任务学习中隐藏任务的对抗性攻击
摘要: 深度学习模型容易受到对抗性攻击的影响,即对输入数据进行轻微扰动会导致错误分类。对抗性攻击会随着对目标分类器的信息获取而变得越来越有效。在多任务学习的背景下,一个模型同时学习多个任务,攻击者可能会利用有限信息来利用特定任务的漏洞。本文调查了在多任务分类器中攻击隐藏任务的可行性,其中隐藏目标任务的模型访问和标记数据不可用,但非目标任务的模型访问是可用的。我们提出了一种新颖的对抗性攻击方法,利用非目标任务的知识和多任务模型的共享主干网络,迫使模型忘记与目标任务相关的知识。CelebA和DeepFashion数据集上的实验结果证明了我们的方法在降低隐藏任务的准确性同时保持可见任务性能方面的有效性,有助于理解多任务分类器中的对抗性漏洞。
更新时间: 2024-05-24 06:11:30
领域: cs.LG
Towards Real World Debiasing: A Fine-grained Analysis On Spurious Correlation
Spurious correlations in training data significantly hinder the generalization capability of machine learning models when faced with distribution shifts in real-world scenarios. To tackle the problem, numerous debias approaches have been proposed and benchmarked on datasets intentionally designed with severe biases. However, it remains to be asked: \textit{1. Do existing benchmarks really capture biases in the real world? 2. Can existing debias methods handle biases in the real world?} To answer the questions, we revisit biased distributions in existing benchmarks and real-world datasets, and propose a fine-grained framework for analyzing dataset bias by disentangling it into the magnitude and prevalence of bias. We observe and theoretically demonstrate that existing benchmarks poorly represent real-world biases. We further introduce two novel biased distributions to bridge this gap, forming a nuanced evaluation framework for real-world debiasing. Building upon these results, we evaluate existing debias methods with our evaluation framework. Results show that existing methods are incapable of handling real-world biases. Through in-depth analysis, we propose a simple yet effective approach that can be easily applied to existing debias methods, named Debias in Destruction (DiD). Empirical results demonstrate the superiority of DiD, improving the performance of existing methods on all types of biases within the proposed evaluation framework.
Updated: 2024-05-24 06:06:41
标题: 走向真实世界的去偏差化:对伪相关性进行细粒度分析
摘要: 训练数据中的虚假相关性显著阻碍了机器学习模型在面对真实世界情境中的分布偏移时的泛化能力。为了解决这个问题,已经提出了许多去偏方法,并在故意设计有严重偏见的数据集上进行了基准测试。然而,仍然有人问到:1. 现有基准是否真正捕捉到真实世界中的偏见?2. 现有去偏方法是否能处理真实世界中的偏见?为了回答这些问题,我们重新审视了现有基准和真实世界数据集中的偏见分布,并提出了一个细致的框架,通过将偏见分解为偏见的程度和普遍性来分析数据集的偏见。我们观察到并从理论上证明了现有基准不足以代表真实世界的偏见。我们进一步引入了两种新的偏见分布来弥合这一差距,形成了一个细致的真实世界去偏评估框架。基于这些结果,我们使用我们的评估框架评估了现有的去偏方法。结果显示,现有方法无法处理真实世界中的偏见。通过深入分析,我们提出了一个简单而有效的方法,可以轻松应用于现有的去偏方法,称为去偏破坏(DiD)。实证结果表明了DiD的优越性,在建议的评估框架内提高了现有方法在所有类型的偏见上的性能。
更新时间: 2024-05-24 06:06:41
领域: cs.LG,cs.CV
PAWS-VMK: A Unified Approach To Semi-Supervised Learning And Out-of-Distribution Detection
This paper describes PAWS-VMK, a prototypical deep learning approach that obtains state-of-the-art results for image classification tasks in both a semi-supervised learning (SSL) and out-of-distribution (OOD) detection context. We consider developments in the fields of SSL, OOD detection, and computer vision foundation models to introduce a number of innovations that connect the key ideas within these works to create PAWS-VMK. These innovations include (1) parametric von Mises-Fisher Stochastic Neighbour Embedding (vMF-SNE) to initialise a projection head for SSL using the high-quality embeddings of the foundation model; (2) the PAWS-MixMatch loss, that creates more compact embeddings and obtains higher accuracy in comparison to the consistency loss used in PAWS and (3) simple $k$-Means prototype selection (SKMPS), a simple technique that obtains competitive performance with more complex unsupervised label selection approaches. PAWS-VMK sets new benchmarks in semi-supervised learning for CIFAR-10 (99.2%) and CIFAR-100 (89.8%) with four labelled instances per class, and Food-101 (90.1%) with two labelled instances per class. We also observe that PAWS-VMK can efficiently detect OOD samples in a manner that is competitive with specialised methods specifically designed for this purpose, achieving 93.1/98.0 and 95.2/96.3 on the CIFAR-10 and CIFAR-100 OpenOOD benchmarks.
Updated: 2024-05-24 06:06:34
标题: PAWS-VMK:一种统一的半监督学习和分布外检测方法
摘要: 这篇论文描述了PAWS-VMK,这是一种原型深度学习方法,可以在半监督学习(SSL)和分布外(OOD)检测情境下获得图像分类任务的最先进结果。我们结合SSL、OOD检测和计算机视觉基础模型领域的进展,引入了一些创新,将这些工作中的关键思想联系起来,创建了PAWS-VMK。这些创新包括:(1)参数化von Mises-Fisher随机邻域嵌入(vMF-SNE),利用基础模型的高质量嵌入来初始化SSL的投影头;(2)PAWS-MixMatch损失,相对于PAWS中使用的一致性损失,可以创建更紧凑的嵌入并获得更高的准确性;(3)简单的$k$-Means原型选择(SKMPS),这是一种简单的技术,其性能可与更复杂的无监督标签选择方法相媲美。PAWS-VMK在每个类别有四个标记实例的情况下,为CIFAR-10(99.2%)和CIFAR-100(89.8%)的半监督学习设立了新的基准,并在每个类别有两个标记实例的情况下,为Food-101(90.1%)设立了新的基准。我们还观察到,PAWS-VMK可以高效地检测OOD样本,其性能与专门为此目的设计的方法相比具有竞争力,在CIFAR-10和CIFAR-100的OpenOOD基准上分别达到了93.1/98.0和95.2/96.3。
更新时间: 2024-05-24 06:06:34
领域: cs.CV,cs.LG
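As we read it, SKMPS clusters foundation-model embeddings per class and labels the instance nearest each centroid; a short sketch under that assumption:

```python
# Simple k-means prototype selection (SKMPS), as we understand it:
# pick k representative instances per class to hand to the annotator.
import numpy as np
from sklearn.cluster import KMeans

def skmps(embeddings, k):
    """embeddings: (N, D) foundation-model features for one class;
    returns indices of the k instances closest to the k centroids."""
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(embeddings)
    dists = np.linalg.norm(
        embeddings[:, None, :] - km.cluster_centers_[None, :, :], axis=-1)
    return [int(np.argmin(dists[:, c])) for c in range(k)]
```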
Modally Reduced Representation Learning of Multi-Lead ECG Signals through Simultaneous Alignment and Reconstruction
Electrocardiogram (ECG) signals, profiling the electrical activities of the heart, are used for a plethora of diagnostic applications. However, ECG systems require multiple leads or channels of signals to capture the complete view of the cardiac system, which limits their application in smartwatches and wearables. In this work, we propose a modally reduced representation learning method for ECG signals that is capable of generating channel-agnostic, unified representations for ECG signals. Through joint optimization of reconstruction and alignment, we ensure that the embeddings of the different channels contain an amalgamation of the overall information across channels while also retaining their specific information. On an independent test dataset, we generated highly correlated channel embeddings from different ECG channels, leading to a moderate approximation of the 12-lead signals from a single-channel embedding. Our generated embeddings can work as competent features for ECG signals for downstream tasks.
Updated: 2024-05-24 06:06:05
标题: 多导联心电图信号的模态减少表示学习:通过同时对齐和重建
摘要: 心电图(ECG)信号刻画了心脏的电活动,被广泛用于各类诊断应用。然而,ECG系统需要多个导联或信号通道来捕捉心脏系统的完整视图,这限制了它们在智能手表和可穿戴设备中的应用。在这项工作中,我们提出了一种用于ECG信号的模态减少表示学习方法,能够为ECG信号生成与通道无关的统一表示。通过重建和对齐的联合优化,我们确保不同通道的嵌入既包含跨通道整体信息的融合,又保留各通道的特定信息。在独立的测试数据集上,我们从不同ECG通道生成了高度相关的通道嵌入,从而能够从单一通道嵌入中适度近似出12导联信号。我们生成的嵌入可以作为ECG信号的有竞争力的特征用于下游任务。
更新时间: 2024-05-24 06:06:05
领域: eess.SP,cs.LG
ROSE: Register Assisted General Time Series Forecasting with Decomposed Frequency Learning
With the increasing collection of time series data from various domains, there arises a strong demand for general time series forecasting models pre-trained on a large number of time-series datasets to support a variety of downstream prediction tasks. Enabling general time series forecasting faces two challenges: how to obtain unified representations from multi-domain time series data, and how to capture domain-specific features from time series data across various domains for adaptive transfer in downstream tasks. To address these challenges, we propose a Register Assisted General Time Series Forecasting Model with Decomposed Frequency Learning (ROSE), a novel pre-trained model for time series forecasting. ROSE employs Decomposed Frequency Learning for the pre-training task, which decomposes coupled semantic and periodic information in time series with frequency-based masking and reconstruction to obtain unified representations across domains. We also equip ROSE with a Time Series Register, which learns to generate a register codebook to capture domain-specific representations during pre-training and enhances domain-adaptive transfer by selecting related register tokens on downstream tasks. After pre-training on large-scale time series data, ROSE achieves state-of-the-art forecasting performance on 8 real-world benchmarks. Remarkably, even in few-shot scenarios, it demonstrates competitive or superior performance compared to existing methods trained with full data.
Updated: 2024-05-24 06:01:09
标题: ROSE:注册辅助与分解频率学习的通用时间序列预测模型
摘要: 随着各个领域时间序列数据的不断增加收集,对于预先在大量时间序列数据集上训练的通用时间序列预测模型以支持各种下游预测任务需求日益增长。实现通用时间序列预测面临两个挑战:如何从多领域时间序列数据中获取统一的表示,并如何捕捉跨不同领域时间序列数据的领域特定特征,以便在下游任务中进行自适应转移。为解决这些挑战,我们提出了一种具有分解频率学习的注册辅助通用时间序列预测模型(ROSE),这是一种新颖的时间序列预测预训练模型。ROSE采用分解频率学习进行预训练任务,通过基于频率的遮蔽和重构来分解时间序列中的耦合语义和周期信息,以获取跨领域的统一表示。我们还为ROSE配备了时间序列注册器,该注册器学习生成一个注册码书,以捕获在预训练期间的领域特定表示,并通过在下游任务中选择相关的注册令牌来增强领域自适应转移。在大规模时间序列数据上进行预训练后,ROSE在8个现实世界基准测试中实现了最先进的预测性能。值得注意的是,即使在少样本场景中,与使用完整数据训练的现有方法相比,它也展示出具有竞争力或优越的性能。
更新时间: 2024-05-24 06:01:09
领域: cs.LG,stat.ML
Orchid: Flexible and Data-Dependent Convolution for Sequence Modeling
In the rapidly evolving field of deep learning, the demand for models that are both expressive and computationally efficient has never been more critical. This paper introduces Orchid, a novel architecture designed to address the quadratic complexity of traditional attention mechanisms without compromising the ability to capture long-range dependencies and in-context learning. At the core of this architecture lies a new data-dependent global convolution layer, which contextually adapts its kernel conditioned on input sequence using a dedicated conditioning neural network. We design two simple conditioning networks that maintain shift equivariance in our data-dependent convolution operation. The dynamic nature of the proposed convolution kernel grants Orchid high expressivity while maintaining quasilinear scalability for long sequences. We evaluate the proposed model across multiple domains, including language modeling and image classification, to highlight its performance and generality. Our experiments demonstrate that this architecture not only outperforms traditional attention-based architectures such as BERT and Vision Transformers with smaller model sizes, but also extends the feasible sequence length beyond the limitations of the dense attention layers. This achievement represents a significant step towards more efficient and scalable deep learning models for sequence modeling.
Updated: 2024-05-24 05:51:52
标题: Orchid:用于序列建模的灵活且数据相关的卷积
摘要: 在快速发展的深度学习领域,对既具有表达力又具有计算效率的模型的需求从未如此关键。本文介绍了Orchid,这是一种新颖的架构,旨在解决传统注意力机制的二次复杂度,同时不损害捕捉长距离依赖性和上下文学习的能力。这种架构的核心是一种新的数据相关全局卷积层,它通过专门的条件神经网络根据输入序列情况调整其内核。我们设计了两个简单的条件网络,以在我们的数据相关卷积操作中保持移位等变性。所提出的卷积核的动态特性赋予Orchid高表达力,同时在长序列上保持准线性可扩展性。我们在多个领域对所提出的模型进行评估,包括语言建模和图像分类,以突显其性能和通用性。我们的实验证明,这种架构不仅在模型尺寸较小的情况下优于传统基于注意力的架构(如BERT和Vision Transformers),而且将可行序列长度扩展到超出密集注意力层的限制。这一成就代表了迈向更高效和可扩展的序列建模深度学习模型的重要一步。
更新时间: 2024-05-24 05:51:52
领域: cs.LG
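A compact sketch of a data-dependent global convolution in this spirit is below; the conditioning network, shapes, and FFT-based application are our own simplifications rather than Orchid's exact architecture:

```python
# A data-dependent global convolution applied via FFT for quasilinear
# cost (our sketch; Orchid's conditioning nets are more elaborate).
import torch

def data_dependent_conv(x, cond_net):
    """x: (B, L, D) sequence; cond_net maps (B, D) -> (B, L) kernels."""
    k = cond_net(x.mean(dim=1))                 # kernel conditioned on input
    X = torch.fft.rfft(x, dim=1)                # (B, L//2 + 1, D)
    K = torch.fft.rfft(k, dim=1).unsqueeze(-1)  # broadcast over channels
    return torch.fft.irfft(X * K, n=x.shape[1], dim=1)

# usage with a toy conditioning network (hypothetical sizes):
# B, L, D = 4, 128, 64
# net = torch.nn.Linear(D, L)
# y = data_dependent_conv(torch.randn(B, L, D), net)
```

The FFT route is what keeps the global convolution quasilinear in sequence length, in contrast to the quadratic cost of dense attention.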
Soft Robust MDPs and Risk-Sensitive MDPs: Equivalence, Policy Gradient, and Sample Complexity
Robust Markov Decision Processes (MDPs) and risk-sensitive MDPs are both powerful tools for making decisions in the presence of uncertainties. Previous efforts have aimed to establish their connections, revealing equivalences in specific formulations. This paper introduces a new formulation for risk-sensitive MDPs, which assesses risk in a slightly different manner compared to the classical Markov risk measure (Ruszczy\'nski 2010), and establishes its equivalence with a class of soft robust MDP (RMDP) problems, including the standard RMDP as a special case. Leveraging this equivalence, we further derive the policy gradient theorem for both problems, proving gradient domination and global convergence of the exact policy gradient method under the tabular setting with direct parameterization. This forms a sharp contrast to the Markov risk measure, known to be potentially non-gradient-dominant (Huang et al. 2021). We also propose a sample-based offline learning algorithm, namely the robust fitted-Z iteration (RFZI), for a specific soft RMDP problem with a KL-divergence regularization term (or equivalently the risk-sensitive MDP with an entropy risk measure). We showcase its streamlined design and less stringent assumptions enabled by the equivalence, and analyze its sample complexity.
Updated: 2024-05-24 05:51:18
标题: 软鲁棒MDPs和风险敏感MDPs:等价性、策略梯度和样本复杂性
摘要: 鲁棒马尔可夫决策过程(MDPs)和风险敏感MDPs都是在存在不确定性的情况下做出决策的强大工具。先前的努力旨在建立它们之间的联系,揭示了特定公式中的等价性。本文介绍了一种新的风险敏感MDPs公式,与经典的马尔可夫风险测度(Ruszczy\'nski 2010)相比,以略有不同的方式评估风险,并建立其与一类软鲁棒MDP(RMDP)问题的等价性,其中标准RMDP是一个特例。利用这种等价性,我们进一步推导了两类问题的策略梯度定理,证明了在直接参数化的表格设置下精确策略梯度方法的梯度支配性和全局收敛性。这与已知可能不具有梯度支配性的马尔可夫风险测度形成鲜明对比(Huang等,2021)。我们还提出了一种基于样本的离线学习算法,即鲁棒拟合-Z迭代(RFZI),用于带有KL散度正则化项的特定软RMDP问题(或等价地,具有熵风险测度的风险敏感MDP)。我们展示了得益于该等价性的简化设计和更宽松的假设,并分析了其样本复杂性。
更新时间: 2024-05-24 05:51:18
领域: math.OC,cs.LG
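One way to see the flavor of the claimed equivalence, in our notation, is through the entropic risk measure and its Donsker-Varadhan variational form, which is exactly a KL-regularized (soft robust) worst case:

```latex
\rho_{\beta}(Z) \;=\; \frac{1}{\beta}\,\log \mathbb{E}_{P}\!\left[ e^{\beta Z} \right]
\;=\; \sup_{Q \ll P} \Big\{ \, \mathbb{E}_{Q}[Z]
  \;-\; \frac{1}{\beta}\, D_{\mathrm{KL}}(Q \,\|\, P) \Big\}.
```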
Defensive Unlearning with Adversarial Training for Robust Concept Erasure in Diffusion Models
Diffusion models (DMs) have achieved remarkable success in text-to-image generation, but they also pose safety risks, such as the potential generation of harmful content and copyright violations. The techniques of machine unlearning, also known as concept erasing, have been developed to address these risks. However, these techniques remain vulnerable to adversarial prompt attacks, which can prompt DMs post-unlearning to regenerate undesired images containing concepts (such as nudity) meant to be erased. This work aims to enhance the robustness of concept erasing by integrating the principle of adversarial training (AT) into machine unlearning, resulting in the robust unlearning framework referred to as AdvUnlearn. However, achieving this effectively and efficiently is highly nontrivial. First, we find that a straightforward implementation of AT compromises DMs' image generation quality post-unlearning. To address this, we develop a utility-retaining regularization on an additional retain set, optimizing the trade-off between concept erasure robustness and model utility in AdvUnlearn. Moreover, we identify the text encoder as a more suitable module for robustification compared to UNet, ensuring unlearning effectiveness. And the acquired text encoder can serve as a plug-and-play robust unlearner for various DM types. Empirically, we perform extensive experiments to demonstrate the robustness advantage of AdvUnlearn across various DM unlearning scenarios, including the erasure of nudity, objects, and style concepts. In addition to robustness, AdvUnlearn also achieves a balanced tradeoff with model utility. To our knowledge, this is the first work to systematically explore robust DM unlearning through AT, setting it apart from existing methods that overlook robustness in concept erasing. Codes are available at: https://github.com/OPTML-Group/AdvUnlearn
Updated: 2024-05-24 05:47:23
标题: 扩散模型中对抗训练的防御性遗忘以实现稳健概念消除
摘要: 扩散模型(DMs)在文本到图像生成方面取得了显著成功,但它们也存在安全风险,如可能生成有害内容和侵犯版权。机器取消学习技术,也称为概念消除,已经发展出来以解决这些风险。然而,这些技术仍然容易受到对抗提示攻击的影响,这可能导致DMs在取消学习后重新生成包含被删除概念(如裸露)的不受欢迎图像。本研究旨在通过将对抗训练(AT)原则整合到机器取消学习中,提高概念消除的鲁棒性,从而产生被称为AdvUnlearn的鲁棒取消学习框架。然而,要有效且高效地实现这一目标非常困难。首先,我们发现直接实施AT会损害DMs取消学习后的图像生成质量。为了解决这个问题,我们在一个额外的保留集上开发了一个保留实用性的正则化,优化了AdvUnlearn中概念消除鲁棒性和模型实用性之间的权衡。此外,我们发现文本编码器相比UNet更适合于鲁棒化模块,确保取消学习的有效性。获取的文本编码器可以作为各种DM类型的即插即用鲁棒取消学习器。在实证上,我们进行了大量实验,展示了AdvUnlearn在各种DM取消学习场景中的鲁棒优势,包括消除裸露、对象和风格概念。除了鲁棒性外,AdvUnlearn还实现了与模型实用性的平衡权衡。据我们所知,这是第一项通过AT系统地探索鲁棒DM取消学习的工作,使其与现有忽视概念消除鲁棒性的方法区别开来。源代码可在以下网址获取:https://github.com/OPTML-Group/AdvUnlearn
更新时间: 2024-05-24 05:47:23
领域: cs.CV,cs.CR
Cardinality Estimation on Hyper-relational Knowledge Graphs
Cardinality Estimation (CE) for a query is to estimate the number of results without execution, which is an effective index in query optimization. Recently, CE has achieved great success over knowledge graphs (KGs) that consist of triple facts. To more precisely represent facts, recent researchers propose hyper-relational KGs (HKGs) that represent a triple fact with qualifiers, where qualifiers provide additional context to the fact. However, existing CE methods over KGs achieve unsatisfactory performance on HKGs due to the complexity of qualifiers in HKGs. Also, there is only one dataset for HKG query cardinality estimation, i.e., WD50K-QE, which is not comprehensive and only covers limited patterns. The lack of querysets over HKGs also becomes a bottleneck for comprehensively investigating CE problems on HKGs. In this work, we first construct diverse and unbiased hyper-relational querysets over three popular HKGs for investigating CE. Besides, we also propose a novel qualifier-attached graph neural network (GNN) model that effectively incorporates qualifier information and adaptively combines outputs from multiple GNN layers, to accurately predict the cardinality. Our experiments illustrate that the proposed hyper-relational query encoder outperforms all state-of-the-art CE methods over three popular HKGs on the diverse and unbiased benchmark.
Updated: 2024-05-24 05:44:43
标题: 在超关系知识图上的基数估计
摘要: 查询的基数估计(CE)是在不执行的情况下估计结果数量,是查询优化中的一个有效指标。最近,CE在由三元事实组成的知识图(KGs)中取得了巨大成功。为了更准确地表示事实,当前研究人员提出了超关系KGs(HKGs)来表示具有限定词的三元事实,其中限定词为事实提供了额外的上下文。然而,由于HKG中限定词的复杂性,现有的CE方法在HKG上取得了不尽人意的表现。此外,HKG查询基数估计只有一个数据集,即WD50K-QE,这并不全面,只涵盖了有限的模式。HKG上的查询集的缺乏也成为全面研究HKG上CE问题的瓶颈。在这项工作中,我们首先构建了三个流行HKG上多样化和公正的超关系查询集,以便进行CE研究。此外,我们还提出了一种新颖的带有限定词的图神经网络(GNN)模型,有效地整合了限定词信息,并自适应地结合了多个GNN层的输出,以准确预测基数。我们的实验表明,所提出的超关系查询编码器在多样化和公正的基准上优于所有流行HKG上的最先进CE方法。
更新时间: 2024-05-24 05:44:43
领域: cs.LG
$i$REPO: $i$mplicit Reward Pairwise Difference based Empirical Preference Optimization
While astonishingly capable, large Language Models (LLM) can sometimes produce outputs that deviate from human expectations. Such deviations necessitate an alignment phase to prevent disseminating untruthful, toxic, or biased information. Traditional alignment methods based on reinforcement learning often struggle with the identified instability, whereas preference optimization methods are limited by their overfitting to pre-collected hard-label datasets. In this paper, we propose a novel LLM alignment framework named $i$REPO, which utilizes implicit Reward pairwise difference regression for Empirical Preference Optimization. Particularly, $i$REPO employs self-generated datasets labelled by empirical human (or AI annotator) preference to iteratively refine the aligned policy through a novel regression-based loss function. Furthermore, we introduce an innovative algorithm backed by theoretical guarantees for achieving optimal results under ideal assumptions and providing a practical performance-gap result without such assumptions. Experimental results with Phi-2 and Mistral-7B demonstrate that $i$REPO effectively achieves self-alignment using soft-label, self-generated responses and the logit of empirical AI annotators. Furthermore, our approach surpasses preference optimization baselines in evaluations using the Language Model Evaluation Harness and Multi-turn benchmarks.
Updated: 2024-05-24 05:42:11
标题: $i$REPO: 基于隐式奖励两两差异的经验偏好优化
摘要: 尽管大型语言模型(LLM)具有惊人的能力,但有时会产生与人类期望不符的输出。这种偏离使得需要进行对齐阶段,以防止传播不真实、有毒或有偏见的信息。基于强化学习的传统对齐方法通常难以应对已识别的不稳定性,而基于偏好优化方法受到它们对预先收集的硬标签数据集的过拟合的限制。在本文中,我们提出了一种名为$i$REPO的新颖LLM对齐框架,它利用隐式奖励成对差异回归进行经验偏好优化。特别地,$i$REPO利用由经验人类(或AI标注者)标记的自动生成数据集,通过一种基于回归的损失函数迭代地优化对齐策略。此外,我们引入了一种创新算法,通过理论保证实现在理想假设下的最佳结果,并在没有这些假设的情况下提供实际性能差距结果。通过Phi-2和Mistral-7B的实验结果表明,$i$REPO有效地利用软标签、自动生成的响应和经验AI标注者的logit实现了自对齐。此外,我们的方法在使用语言模型评估工具和多轮基准测试中超过了偏好优化基线。
更新时间: 2024-05-24 05:42:11
领域: cs.AI,cs.LG
Learning from True-False Labels via Multi-modal Prompt Retrieving
Weakly supervised learning has recently achieved considerable success in reducing annotation costs and label noise. Unfortunately, existing weakly supervised learning methods lack the ability to generate reliable labels via pre-trained vision-language models (VLMs). In this paper, we propose a novel weakly supervised labeling setting, namely True-False Labels (TFLs), which can achieve high accuracy when generated by VLMs. The TFL indicates whether an instance belongs to the label, which is randomly and uniformly sampled from the candidate label set. Specifically, we theoretically derive a risk-consistent estimator to explore and utilize the conditional probability distribution information of TFLs. Besides, we propose a convolutional-based Multi-modal Prompt Retrieving (MRP) method to bridge the gap between the knowledge of VLMs and target learning tasks. Experimental results demonstrate the effectiveness of the proposed TFL setting and MRP learning method. The code to reproduce the experiments is at https://github.com/Tranquilxu/TMP.
Updated: 2024-05-24 05:39:15
标题: 通过多模态提示检索从真假标签中学习
摘要: 弱监督学习最近在减少注释成本和标签噪音方面取得了相当大的成功。不幸的是,现有的弱监督学习方法在通过预训练的视觉-语言模型(VLMs)生成可靠标签方面能力不足。在本文中,我们提出了一种新颖的弱监督标签设置,即真假标签(TFLs),可以在由VLMs生成时实现高准确性。TFL指示一个实例是否属于从候选标签集中随机均匀抽样的标签。具体来说,我们从理论上推导了一个风险一致的估计器,以探索和利用TFLs的条件概率分布信息。此外,我们提出了一种基于卷积的多模态提示检索(MRP)方法,以弥合VLMs的知识和目标学习任务之间的差距。实验结果证明了所提出的TFL设置和MRP学习方法的有效性。重现实验的代码位于https://github.com/Tranquilxu/TMP。
更新时间: 2024-05-24 05:39:15
领域: cs.LG,cs.CV
Inverse Feasibility in Over-the-Air Federated Learning
We introduce the concept of inverse feasibility for linear forward models as a tool to enhance OTA FL algorithms. Inverse feasibility is defined as an upper bound on the condition number of the forward operator as a function of its parameters. We analyze an existing OTA FL model using this definition, identify areas for improvement, and propose a new OTA FL model. Numerical experiments illustrate the main implications of the theoretical results. The proposed framework, which is based on inverse problem theory, can potentially complement existing notions of security and privacy by providing additional desirable characteristics to networks.
Updated: 2024-05-24 05:29:46
标题: 逆向可行性在空中联邦学习中的应用
摘要: 我们引入了线性正向模型的反可行性概念,作为增强OTA FL算法的工具。反可行性被定义为正向算子的条件数的上界,作为其参数的函数。我们使用这个定义分析了一个现有的OTA FL模型,识别了需要改进的领域,并提出了一个新的OTA FL模型。数值实验展示了理论结果的主要含义。基于逆问题理论的提出的框架,可能通过为网络提供额外的理想特性,来补充现有的安全性和隐私概念。
更新时间: 2024-05-24 05:29:46
领域: stat.ML,cs.LG
iVideoGPT: Interactive VideoGPTs are Scalable World Models
World models empower model-based agents to interactively explore, reason, and plan within imagined environments for real-world decision-making. However, the high demand for interactivity poses challenges in harnessing recent advancements in video generative models for developing world models at scale. This work introduces Interactive VideoGPT (iVideoGPT), a scalable autoregressive transformer framework that integrates multimodal signals--visual observations, actions, and rewards--into a sequence of tokens, facilitating an interactive experience of agents via next-token prediction. iVideoGPT features a novel compressive tokenization technique that efficiently discretizes high-dimensional visual observations. Leveraging its scalable architecture, we are able to pre-train iVideoGPT on millions of human and robotic manipulation trajectories, establishing a versatile foundation that is adaptable to serve as interactive world models for a wide range of downstream tasks. These include action-conditioned video prediction, visual planning, and model-based reinforcement learning, where iVideoGPT achieves competitive performance compared with state-of-the-art methods. Our work advances the development of interactive general world models, bridging the gap between generative video models and practical model-based reinforcement learning applications.
Updated: 2024-05-24 05:29:12
标题: iVideoGPT:交互式VideoGPT是可伸缩的世界模型
摘要: 世界模型赋予基于模型的代理能力,使其能够在想象的环境中进行互动式探索、推理和规划,以进行实际决策。然而,互动性需求的增加在利用最近的视频生成模型的最新进展以实现规模化世界模型发展方面带来了挑战。本文介绍了交互式VideoGPT(iVideoGPT),这是一个可扩展的自回归变换器框架,将多模态信号--视觉观察、动作和奖励--整合到一系列令牌中,通过下一个令牌的预测来促进代理的互动体验。iVideoGPT具有一种新颖的压缩令牌化技术,可以有效离散化高维视觉观察。借助其可扩展的架构,我们能够在数百万人类和机器人操作轨迹上进行iVideoGPT的预训练,建立一个多功能的基础,可适用于各种下游任务的互动式世界模型。这些任务包括动作条件视频预测、视觉规划和基于模型的强化学习,在这些任务中,iVideoGPT与最先进的方法相比表现出竞争性能。我们的工作推动了互动通用世界模型的发展,弥合了生成视频模型与实际基于模型的强化学习应用之间的差距。
更新时间: 2024-05-24 05:29:12
领域: cs.CV,cs.LG,cs.RO
Leveraging Unknown Objects to Construct Labeled-Unlabeled Meta-Relationships for Zero-Shot Object Navigation
Zero-shot object navigation (ZSON) addresses the situation where an agent navigates to an unseen object that is not present in the training set. Previous works mainly train the agent using seen objects with known labels and ignore the seen objects without labels. In this paper, we introduce seen objects without labels, herein termed ``unknown objects'', into the training procedure to enrich the agent's knowledge base with distinguishable but previously overlooked information. Furthermore, we propose the label-wise meta-correlation module (LWMCM) to harness relationships among objects with and without labels, and obtain enhanced object information. Specifically, we propose a target feature generator (TFG) to generate the feature representation of the unlabeled target objects. Subsequently, the unlabeled object identifier (UOI) module assesses whether the unlabeled target object appears in the current observation frame captured by the camera and produces an adapted target feature representation specific to the observed context. In the meta contrastive feature modifier (MCFM), the target features are modified by approaching the features of objects within the observation frame while distancing themselves from the features of unobserved objects. Finally, the meta object-graph learner (MOGL) module is utilized to calculate the relationships among objects based on the features. Experiments conducted on the AI2THOR and RoboTHOR platforms demonstrate the effectiveness of our proposed method.
Updated: 2024-05-24 05:26:18
标题: 利用未知对象构建标记-未标记的元关系,用于零样本目标导航
摘要: 零样本物体导航(ZSON)处理的是代理需要导航到训练集中不存在的未见过物体的情形。先前的研究主要使用具有已知标签的已见物体来训练代理,而忽略了没有标签的已见物体。在本文中,我们将没有标签的已见物体(称之为“未知物体”)引入训练过程,以丰富代理的知识库,纳入可区分但以前被忽视的信息。此外,我们提出了标签级元相关模块(LWMCM),以利用有标签和无标签物体之间的关系,并获得增强的物体信息。特别地,我们提出了目标特征生成器(TFG),用于生成未标记目标物体的特征表示。随后,未标记物体识别器(UOI)模块评估未标记目标物体是否出现在相机捕捉的当前观察帧中,并产生适应观察环境的特定目标特征表示。在元对比特征修改器(MCFM)中,目标特征通过接近观察帧内物体的特征而远离未观察到的物体的特征进行修改。最后,利用元对象图学习器(MOGL)模块基于特征计算物体之间的关系。在AI2THOR和RoboTHOR平台上进行的实验证明了我们提出的方法的有效性。
更新时间: 2024-05-24 05:26:18
领域: cs.CV,cs.AI,cs.RO
Critical Learning Periods Emerge Even in Deep Linear Networks
Critical learning periods are periods early in development where temporary sensory deficits can have a permanent effect on behavior and learned representations. Despite the radical differences between biological and artificial networks, critical learning periods have been empirically observed in both systems. This suggests that critical periods may be fundamental to learning and not an accident of biology. Yet, why exactly critical periods emerge in deep networks is still an open question, and in particular it is unclear whether the critical periods observed in both systems depend on particular architectural or optimization details. To isolate the key underlying factors, we focus on deep linear network models, and show that, surprisingly, such networks also display much of the behavior seen in biology and artificial networks, while being amenable to analytical treatment. We show that critical periods depend on the depth of the model and structure of the data distribution. We also show analytically and in simulations that the learning of features is tied to competition between sources. Finally, we extend our analysis to multi-task learning to show that pre-training on certain tasks can damage the transfer performance on new tasks, and show how this depends on the relationship between tasks and the duration of the pre-training stage. To the best of our knowledge, our work provides the first analytically tractable model that sheds light into why critical learning periods emerge in biological and artificial networks.
Updated: 2024-05-24 05:23:57
标题: 即使在深度线性网络中,关键学习时期仍然出现
摘要: 关键学习期是发育早期的一段时期,在此期间暂时性的感觉缺陷可能对行为和习得的表征产生永久影响。尽管生物网络和人工网络之间存在根本性差异,但关键学习期在这两种系统中均有经验观察到。这表明关键期可能是学习的基本特征,而非生物学上的偶然现象。然而,为什么深度网络中会出现关键学习期仍然是一个悬而未决的问题,尤其是不清楚在这两种系统中观察到的关键期是否取决于特定的结构或优化细节。为了分离关键的基础因素,我们专注于深度线性网络模型,并且展示出,令人惊讶的是,这样的网络也展示出生物和人工网络中看到的许多行为,同时可以进行分析处理。我们展示了关键期取决于模型的深度和数据分布的结构。我们还通过分析和模拟显示,特征的学习与来源之间的竞争有关。最后,我们将我们的分析扩展到多任务学习,以显示在某些任务上进行预训练可能会损害对新任务的迁移性能,并展示这取决于任务之间的关系和预训练阶段的持续时间。据我们所知,我们的工作提供了第一个可以进行分析处理的模型,阐明了为什么关键学习期会在生物和人工网络中出现。
更新时间: 2024-05-24 05:23:57
领域: cs.LG,cs.AI,q-bio.NC,stat.ML
AGS-GNN: Attribute-guided Sampling for Graph Neural Networks
We propose AGS-GNN, a novel attribute-guided sampling algorithm for Graph Neural Networks (GNNs) that exploits node features and connectivity structure of a graph while simultaneously adapting for both homophily and heterophily in graphs. (In homophilic graphs vertices of the same class are more likely to be connected, and vertices of different classes tend to be linked in heterophilic graphs.) While GNNs have been successfully applied to homophilic graphs, their application to heterophilic graphs remains challenging. The best-performing GNNs for heterophilic graphs do not fit the sampling paradigm, suffer high computational costs, and are not inductive. We employ samplers based on feature-similarity and feature-diversity to select subsets of neighbors for a node, and adaptively capture information from homophilic and heterophilic neighborhoods using dual channels. Currently, AGS-GNN is the only algorithm that we know of that explicitly controls homophily in the sampled subgraph through similar and diverse neighborhood samples. For diverse neighborhood sampling, we employ submodularity, which was not used in this context prior to our work. The sampling distribution is pre-computed and highly parallel, achieving the desired scalability. Using an extensive dataset consisting of 35 small ($\le$ 100K nodes) and large (>100K nodes) homophilic and heterophilic graphs, we demonstrate the superiority of AGS-GNN compared with current approaches in the literature. AGS-GNN achieves comparable test accuracy to the best-performing heterophilic GNNs, even outperforming methods that use the entire graph for node classification. AGS-GNN also converges faster than methods that sample neighborhoods randomly, and can be incorporated into existing GNN models that employ node or graph sampling.
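To make the dual-channel idea concrete, here is a minimal sketch (ours, not the AGS-GNN implementation) of the two sampler flavors the abstract names: a similarity channel that ranks neighbors by feature cosine similarity, and a diversity channel built on greedy submodular (facility-location) selection. All sizes and data are toy assumptions:

```python
import numpy as np

def unit(X):
    # Normalize vectors to unit length so dot products are cosine similarities.
    return X / (np.linalg.norm(X, axis=-1, keepdims=True) + 1e-12)

def similar_sample(x, neigh_feats, k):
    """Similarity channel: the k neighbors closest to the node in feature space."""
    sims = unit(neigh_feats) @ unit(x)
    return np.argsort(-sims)[:k]

def diverse_sample(neigh_feats, k):
    """Diversity channel: greedy facility-location selection, a submodular
    objective that repeatedly adds the neighbor least covered so far."""
    sim = unit(neigh_feats) @ unit(neigh_feats).T
    cover, chosen = np.zeros(len(neigh_feats)), []
    for _ in range(min(k, len(neigh_feats))):
        gains = np.maximum(sim - cover, 0.0).sum(axis=1)
        gains[chosen] = -np.inf          # never re-pick a chosen neighbor
        i = int(np.argmax(gains))
        chosen.append(i)
        cover = np.maximum(cover, sim[i])
    return np.array(chosen)

rng = np.random.default_rng(0)
x, neigh = rng.normal(size=8), rng.normal(size=(20, 8))
print(similar_sample(x, neigh, k=5), diverse_sample(neigh, k=5))
```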
Updated: 2024-05-24 05:15:46
标题: AGS-GNN:图神经网络的属性引导采样
摘要: 我们提出了AGS-GNN,这是一种新颖的属性引导采样算法,用于图神经网络(GNN),它利用了图的节点特征和连接结构,同时适应了图中同质性和异质性。在同质性图中,同一类别的顶点更有可能连接,而在异质性图中,不同类别的顶点往往相连。虽然GNN已成功应用于同质性图,但将其应用于异质性图仍具有挑战性。最适用于异质性图的GNN并不适合采样范式,计算成本高且不具有归纳性。我们使用基于特征相似性和特征多样性的采样器来选择节点的邻居子集,并通过双通道自适应地捕获同质性和异质性邻域的信息。目前,AGS-GNN是我们所知的唯一一个通过相似和不同邻域样本明确控制同质性的算法。对于不同邻域采样,我们使用了此前未在此上下文中使用的子模性。采样分布是预先计算的,高度并行,实现了期望的可扩展性。通过一个包含35个小(≤ 100K个节点)和大(> 100K个节点)同质性和异质性图的大量数据集,我们展示了AGS-GNN相对于文献中当前方法的优越性。AGS-GNN实现了与表现最佳的异质性GNN相当的测试准确性,甚至胜过使用整个图进行节点分类的方法。与随机采样邻域的方法相比,AGS-GNN也收敛更快,并可纳入已使用节点或图采样的现有GNN模型中。
更新时间: 2024-05-24 05:15:46
领域: cs.LG
Denoising LM: Pushing the Limits of Error Correction Models for Speech Recognition
Language models (LMs) have long been used to improve results of automatic speech recognition (ASR) systems, but they are unaware of the errors that ASR systems make. Error correction models are designed to fix ASR errors, however, they showed little improvement over traditional LMs mainly due to the lack of supervised training data. In this paper, we present Denoising LM (DLM), which is a $\textit{scaled}$ error correction model trained with vast amounts of synthetic data, significantly exceeding prior attempts meanwhile achieving new state-of-the-art ASR performance. We use text-to-speech (TTS) systems to synthesize audio, which is fed into an ASR system to produce noisy hypotheses, which are then paired with the original texts to train the DLM. DLM has several $\textit{key ingredients}$: (i) up-scaled model and data; (ii) usage of multi-speaker TTS systems; (iii) combination of multiple noise augmentation strategies; and (iv) new decoding techniques. With a Transformer-CTC ASR, DLM achieves 1.5% word error rate (WER) on $\textit{test-clean}$ and 3.3% WER on $\textit{test-other}$ on Librispeech, which to our knowledge are the best reported numbers in the setting where no external audio data are used and even match self-supervised methods which use external audio data. Furthermore, a single DLM is applicable to different ASRs, and greatly surpasses the performance of conventional LM-based beam-search rescoring. These results indicate that properly investigated error correction models have the potential to replace conventional LMs, holding the key to a new level of accuracy in ASR systems.
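A minimal sketch of the synthetic-data pipeline described above; `tts` and `asr` are hypothetical stand-ins for real text-to-speech and ASR systems, and the loop structure is our illustration rather than the authors' code:

```python
def make_dlm_pairs(texts, tts, asr, n_speakers=4):
    """Generate (noisy hypothesis, clean text) training pairs for a DLM."""
    pairs = []
    for text in texts:
        for spk in range(n_speakers):          # multi-speaker TTS adds acoustic diversity
            audio = tts(text, speaker=spk)     # synthesize speech from clean text
            hypothesis = asr(audio)            # ASR output contains realistic errors
            pairs.append((hypothesis, text))   # DLM learns: noisy input -> clean target
    return pairs
```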
Updated: 2024-05-24 05:05:12
标题: 去噪LM:推动语音识别错误校正模型的极限
摘要: 语言模型(LMs)长期以来一直用于改善自动语音识别(ASR)系统的结果,但它们对ASR系统产生的错误毫无察觉。错误校正模型旨在修复ASR错误,然而,主要由于缺乏监督训练数据,它们相比传统LM改进甚微。在本文中,我们提出了去噪LM(DLM),这是一个利用海量合成数据进行$\textit{规模化}$训练的错误校正模型,显著超越以往的尝试,同时实现了新的最先进的ASR性能。我们使用文本转语音(TTS)系统合成音频,将其输入ASR系统生成带噪假设,然后将其与原始文本配对来训练DLM。DLM具有几个$\textit{关键要素}$:(i)规模更大的模型和数据;(ii)使用多说话人TTS系统;(iii)结合多种噪声增强策略;以及(iv)新的解码技术。通过Transformer-CTC ASR,DLM在Librispeech的$\textit{test-clean}$上实现了1.5%的词错误率(WER),在$\textit{test-other}$上实现了3.3%的WER,据我们所知,这是在不使用外部音频数据的情况下报告的最佳数字,甚至与使用外部音频数据的自监督方法相当。此外,单个DLM适用于不同的ASR系统,并大大超越基于传统LM的束搜索重打分的性能。这些结果表明,经过充分研究的错误校正模型有潜力取代传统LM,掌握着将ASR系统精度提升到新水平的关键。
更新时间: 2024-05-24 05:05:12
领域: cs.LG,cs.CL,cs.SD,eess.AS
Nearly-Optimal Consensus Tolerating Adaptive Omissions: Why is a Lot of Randomness Needed?
We study the problem of reaching agreement in a synchronous distributed system by $n$ autonomous parties, when the communication links from/to faulty parties can omit messages. The faulty parties are selected and controlled by an adaptive, full-information, computationally unbounded adversary. We design a randomized algorithm that works in $O(\sqrt{n}\log^2 n)$ rounds and sends $O(n^2\log^3 n)$ communication bits, where the number of faulty parties is $\Theta(n)$. Our result is simultaneously tight for both these measures within polylogarithmic factors: due to the $\Omega(n^2)$ lower bound on communication by Abraham et al. (PODC'19) and $\Omega(\sqrt{n/\log n})$ lower bound on the number of rounds by Bar-Joseph and Ben-Or (PODC'98). We also quantify how much randomness is necessary and sufficient to reduce time complexity to a certain value, while keeping the communication complexity (nearly) optimal. We prove that no MC algorithm can work in less than $\Omega(\frac{n^2}{\max\{R,n\}\log n})$ rounds if it uses less than $O(R)$ calls to a random source, assuming a constant fraction of faulty parties. This can be contrasted with a long line of work on consensus against an {\em adversary limited to polynomial computation time}, thus unable to break cryptographic primitives, culminating in a work by Ghinea et al. (EUROCRYPT'22), where an optimal $O(r)$-round solution with probability $1-(cr)^{-r}$ is given. Our lower bound strictly separates these two regimes, by excluding such results if the adversary is computationally unbounded. On the upper bound side, we show that for $R\in\tilde{O}(n^{3/2})$ there exists an algorithm solving consensus in $\tilde{O}(\frac{n^2}{R})$ rounds with high probability, where tilde notation hides a polylogarithmic factor. The communication complexity of the algorithm does not depend on the amount of randomness $R$ and stays optimal within polylogarithmic factor.
Updated: 2024-05-24 04:59:32
标题: 几乎最佳的容忍自适应遗漏的共识:为什么需要大量的随机性?
摘要: 我们研究了由$n$个自治方组成的同步分布式系统中达成一致的问题,其中来自/去往故障方的通信链路可能丢失消息。这些故障方由一个自适应的、掌握全部信息的、计算能力无界的对手选择和控制。我们设计了一个随机算法,在$O(\sqrt{n}\log^2 n)$轮内运行,并发送$O(n^2\log^3 n)$个通信比特,其中故障方的数量为$\Theta(n)$。在多对数因子范围内,我们的结果在这两个度量上同时是紧的:这源于Abraham等人(PODC'19)给出的通信$\Omega(n^2)$下界,以及Bar-Joseph和Ben-Or(PODC'98)给出的轮数$\Omega(\sqrt{n/\log n})$下界。我们还量化了在保持通信复杂度(几乎)最优的前提下,将时间复杂度降低到某个值所必需且充分的随机性数量。我们证明,假设故障方占恒定比例,任何对随机源调用次数少于$O(R)$的MC算法都无法在少于$\Omega(\frac{n^2}{\max\{R,n\}\log n})$轮内完成。这与一长串针对{\em 计算时间受限于多项式}(因而无法破解密码学原语)的对手的共识研究形成对比,该系列工作以Ghinea等人(EUROCRYPT'22)为高潮,给出了成功概率为$1-(cr)^{-r}$的最优$O(r)$轮解决方案。我们的下界排除了在对手计算能力无界时取得此类结果的可能性,从而严格区分了这两种情形。在上界方面,我们证明对于$R\in\tilde{O}(n^{3/2})$,存在一个算法能以高概率在$\tilde{O}(\frac{n^2}{R})$轮内解决共识问题,其中tilde符号隐藏多对数因子。该算法的通信复杂度不依赖于随机性数量$R$,并在多对数因子范围内保持最优。
更新时间: 2024-05-24 04:59:32
领域: cs.DC,cs.CR,cs.DS
ConfusionPrompt: Practical Private Inference for Online Large Language Models
State-of-the-art large language models (LLMs) are commonly deployed as online services, necessitating users to transmit informative prompts to cloud servers, thus engendering substantial privacy concerns. In response, we present ConfusionPrompt, a novel private LLM inference framework designed to obfuscate the server by: (i) decomposing the prompt into sub-prompts, and (ii) generating pseudo prompts along with the genuine sub-prompts as input to the online LLM. Eventually, the returned responses can be recomposed by the user to obtain the final whole response. Such a design endows our framework with advantages over previous protocols: (i) it can be seamlessly integrated with existing black-box LLMs, and (ii) it achieves a significantly better privacy-utility trade-off than existing text perturbation-based methods. We develop a $(\lambda, \mu, \rho)$-privacy model to formulate the requirement for a privacy-preserving group of prompts, and provide a complexity analysis, affirming ConfusionPrompt's efficiency. Our empirical evaluation reveals that our method offers significantly higher utility compared to local inference methods using open-source models and perturbation-based techniques, while also requiring much less memory than open-source LLMs.
Updated: 2024-05-24 04:57:36
标题: ConfusionPrompt:针对在线大型语言模型的实用私密推断
摘要: 最新的大型语言模型(LLMs)通常作为在线服务部署,需要用户向云服务器传送信息丰富的提示,从而引发了重大的隐私问题。作为回应,我们提出了ConfusionPrompt,这是一个新颖的私密LLM推断框架,旨在通过以下方式迷惑服务器:(i)将提示分解为子提示,以及(ii)生成伪提示并连同真实子提示一起作为在线LLM的输入。最终,用户可以重新组合返回的响应以获得完整的最终响应。这种设计使我们的框架相对于先前的协议具有以下优势:(i)它可以与现有的黑盒LLM无缝集成;(ii)它比现有的基于文本扰动的方法实现了明显更好的隐私-效用权衡。我们开发了一个$(\lambda, \mu, \rho)$-隐私模型来刻画对隐私保护提示组的要求,并提供了复杂性分析,证实了ConfusionPrompt的效率。我们的实证评估表明,与使用开源模型的本地推断方法和基于扰动的技术相比,我们的方法提供了显著更高的效用,同时比开源LLM需要少得多的内存。
更新时间: 2024-05-24 04:57:36
领域: cs.CR,cs.AI,I.2.7
How Culturally Aware are Vision-Language Models?
An image is often said to be worth a thousand words, and certain images can tell rich and insightful stories. Can these stories be told via image captioning? Images from folklore genres, such as mythology, folk dance, cultural signs, and symbols, are vital to every culture. Our research compares the performance of four popular vision-language models (GPT-4V, Gemini Pro Vision, LLaVA, and OpenFlamingo) in identifying culturally specific information in such images and creating accurate and culturally sensitive image captions. We also propose a new evaluation metric, Cultural Awareness Score (CAS), dedicated to measuring the degree of cultural awareness in image captions. We provide a dataset MOSAIC-1.5k, labeled with ground truth for images containing cultural background and context, as well as a labeled dataset with assigned Cultural Awareness Scores that can be used with unseen data. Creating culturally appropriate image captions is valuable for scientific research and can be beneficial for many practical applications. We envision that our work will promote a deeper integration of cultural sensitivity in AI applications worldwide. By making the dataset and Cultural Awareness Score available to the public, we aim to facilitate further research in this area, encouraging the development of more culturally aware AI systems that respect and celebrate global diversity.
Updated: 2024-05-24 04:45:14
标题: 视觉语言模型对文化的认识有多深?
摘要: 一幅图像常常被说成价值千言,某些图像能够讲述丰富而有见地的故事。这些故事能够通过图像标题来讲述吗?来自民俗流派的图像,如神话、民间舞蹈、文化符号和象征,对每个文化都至关重要。我们的研究比较了四种流行的视觉语言模型(GPT-4V、Gemini Pro Vision、LLaVA和OpenFlamingo)在识别这些图像中的文化特定信息并创建准确且具有文化敏感性的图像标题方面的表现。我们还提出了一个新的评估指标,即文化意识分数(CAS),专门用于衡量图像标题中的文化意识程度。我们提供了一个名为MOSAIC-1.5k的数据集,其中标记有包含文化背景和语境的图像的真实信息,以及一个带有指定文化意识分数的标记数据集,可用于未知数据。创建具有文化适应性的图像标题对于科学研究是有价值的,对许多实际应用也是有益的。我们设想我们的工作将促进全球范围内对人工智能应用中文化敏感性的更深入融合。通过向公众提供数据集和文化意识分数,我们旨在促进该领域的进一步研究,鼓励开发更加尊重和庆祝全球多样性的文化意识AI系统。
更新时间: 2024-05-24 04:45:14
领域: cs.CV,cs.AI,cs.CL,cs.LG
Decoding at the Speed of Thought: Harnessing Parallel Decoding of Lexical Units for LLMs
Large language models have demonstrated exceptional capability in natural language understanding and generation. However, their generation speed is limited by the inherently sequential nature of their decoding process, posing challenges for real-time applications. This paper introduces Lexical Unit Decoding (LUD), a novel decoding methodology implemented in a data-driven manner, accelerating the decoding process without sacrificing output quality. The core of our approach is the observation that a pre-trained language model can confidently predict multiple contiguous tokens, forming the basis for a \textit{lexical unit}, in which these contiguous tokens could be decoded in parallel. Extensive experiments validate that our method substantially reduces decoding time while maintaining generation quality, i.e., a 33\% speed-up on natural language generation with no quality loss, and a 30\% speed-up on code generation with a negligible quality loss of 3\%. Distinctively, LUD requires no auxiliary models and does not require changes to existing architectures. It can also be integrated with other decoding acceleration methods, thus achieving an even more pronounced inference efficiency boost. We posit that the foundational principles of LUD could define a new decoding paradigm for future language models, enhancing their applicability for a broader spectrum of applications. All code is publicly available at https://github.com/tjunlp-lab/Lexical-Unit-Decoding-LUD-. Keywords: Parallel Decoding, Lexical Unit Decoding, Large Language Model
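A minimal sketch (our reading of the idea, not the released implementation) of lexical-unit-style parallel acceptance: given confidence scores for a run of tokens proposed in one pass, accept the longest contiguous prefix whose confidence stays above a threshold; the threshold value is an assumption:

```python
def accept_lexical_unit(confidences, threshold=0.9):
    """confidences: max softmax probability at each proposed token position.
    Returns how many contiguous tokens to accept in parallel this step."""
    accepted = 0
    for p in confidences:
        if p < threshold:
            break
        accepted += 1
    return max(accepted, 1)   # always emit at least one token to make progress

print(accept_lexical_unit([0.98, 0.95, 0.91, 0.42]))  # -> 3: one "lexical unit"
```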
Updated: 2024-05-24 04:35:13
标题: 思维速度解码:利用LLMs的词汇单元并行解码
摘要: 大型语言模型已经展示了在自然语言理解和生成方面的出色能力。然而,它们的生成速度受到了解码过程本质上的顺序性的限制,这给实时应用带来了挑战。本文介绍了一种新的解码方法:词汇单元解码(LUD),这种解码方法以数据驱动的方式实现,加速了解码过程而不牺牲输出质量。我们方法的核心观察是,一个经过预训练的语言模型可以自信地预测多个连续的标记,形成一个\textit{词汇单元},在这个词汇单元中,这些连续的标记可以并行解码。大量实验证实了我们的方法显著减少了解码时间,同时保持生成质量,即在自然语言生成方面加快了33\%,没有质量损失,在代码生成方面加快了30%,质量损失仅为3%。与众不同的是,LUD不需要辅助模型,也不需要对现有体系结构进行更改。它还可以与其他解码加速方法集成,从而实现更明显的推理效率提升。我们认为,LUD的基本原理可能定义了未来语言模型的新解码范式,增强了它们在更广泛应用领域的适用性。所有代码将会公开在https://github.com/tjunlp-lab/Lexical-Unit-Decoding-LUD-。关键词:并行解码,词汇单元解码,大型语言模型
更新时间: 2024-05-24 04:35:13
领域: cs.CL,cs.AI
Cross-Task Defense: Instruction-Tuning LLMs for Content Safety
Recent studies reveal that Large Language Models (LLMs) face challenges in balancing safety with utility, particularly when processing long texts for NLP tasks like summarization and translation. Despite defenses against malicious short questions, the ability of LLMs to safely handle dangerous long content, such as manuals teaching illicit activities, remains unclear. Our work aims to develop robust defenses for LLMs in processing malicious documents alongside benign NLP task queries. We introduce a defense dataset comprised of safety-related examples and propose single-task and mixed-task losses for instruction tuning. Our empirical results demonstrate that LLMs can significantly enhance their capacity to safely manage dangerous content with appropriate instruction tuning. Additionally, strengthening the defenses of tasks most susceptible to misuse is effective in protecting LLMs against processing harmful information. We also observe that trade-offs between utility and safety exist in defense strategies, where Llama2, utilizing our proposed approach, displays a significantly better balance compared to Llama1.
Updated: 2024-05-24 04:14:32
标题: 跨任务防御:对LLM进行指令微调以保障内容安全
摘要: 最近的研究表明,大型语言模型(LLMs)在平衡安全性和效用方面面临挑战,特别是在处理用于摘要和翻译等NLP任务的长文本时。尽管有针对恶意短问题的防御措施,但LLMs安全处理危险长内容(例如教授非法活动的手册)的能力仍不清楚。我们的工作旨在为LLMs开发强大的防御措施,使其在处理恶意文档的同时仍能处理良性的NLP任务查询。我们引入了一个由安全相关示例组成的防御数据集,并提出了用于指令微调的单任务和混合任务损失。我们的实证结果表明,通过适当的指令微调,LLMs可以显著增强安全处理危险内容的能力。此外,加强最容易被滥用的任务的防御,可以有效保护LLMs免受有害信息处理的影响。我们还观察到,防御策略中存在效用和安全性之间的权衡,其中采用我们所提方法的Llama2与Llama1相比显示出明显更好的平衡。
更新时间: 2024-05-24 04:14:32
领域: cs.CL,cs.CR
A Simple Solution for Homomorphic Evaluation on Large Intervals
Homomorphic encryption (HE) is a promising technique used for privacy-preserving computation. Since HE schemes only support primitive polynomial operations, homomorphic evaluation of polynomial approximations for non-polynomial functions plays an important role in privacy-preserving machine learning. In this paper, we introduce a simple solution to approximating any function, one that may have been overlooked by researchers: simply using neural networks for regression. By searching for suitable hyperparameters, neural networks can achieve near-optimal computation depth for a given function with fixed precision, thereby reducing the modulus consumed. There are three main reasons why we choose neural networks for homomorphic evaluation of polynomial approximations. Firstly, neural networks with polynomial activation functions can be used to approximate whatever functions are needed in an encrypted state. This means that we can use one unified process to compute any polynomial approximation, such as that of Sigmoid or of ReLU. Secondly, by carefully finding an appropriate architecture, neural networks can efficiently evaluate a polynomial using near-optimal multiplicative depth, which consumes less modulus and therefore requires less ciphertext refreshing. Finally, as popular tools, neural networks come with many well-studied techniques that can conveniently serve our solution. Experiments showed that our method can be used for approximation of various functions. We apply our method to the evaluation of the Sigmoid function on the large intervals $[-30, +30]$, $[-50, +50]$, and $[-70, +70]$, respectively.
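A minimal sketch of the core idea, assuming PyTorch and illustrative hyperparameters (width, depth, learning rate, and step count are ours, not the paper's): a network whose only nonlinearity is squaring is itself a polynomial, so two squaring layers give a degree-4 polynomial, i.e., multiplicative depth 2 under HE. Deeper stacks of squarings raise the degree and improve the fit:

```python
import torch

class Square(torch.nn.Module):
    def forward(self, t):
        return t * t   # the only nonlinearity: the whole net stays polynomial

torch.manual_seed(0)
x = torch.linspace(-30.0, 30.0, 2048).unsqueeze(1)
y = torch.sigmoid(x)
x_scaled = x / 30.0   # rescale inputs to [-1, 1]; a linear layer can absorb this

net = torch.nn.Sequential(
    torch.nn.Linear(1, 16), Square(),
    torch.nn.Linear(16, 16), Square(),   # two squarings -> degree-4 polynomial,
    torch.nn.Linear(16, 1),              # i.e., multiplicative depth 2 under HE
)
opt = torch.optim.Adam(net.parameters(), lr=1e-2)
for _ in range(3000):
    opt.zero_grad()
    loss = torch.mean((net(x_scaled) - y) ** 2)
    loss.backward()
    opt.step()
print(f"MSE of the polynomial surrogate on [-30, 30]: {loss.item():.5f}")
```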
Updated: 2024-05-24 04:13:22
标题: 大区间上同态评估的一个简单解决方案
摘要: 同态加密(HE)是一种有前途的隐私保护计算技术。由于HE方案只支持基本的多项式运算,非多项式函数的多项式逼近的同态评估在隐私保护机器学习中起着重要作用。在本文中,我们提出了一种逼近任意函数的简单方案,它可能被研究人员所忽视:只需使用神经网络进行回归。通过搜索合适的超参数,神经网络可以在固定精度下为给定函数实现接近最优的计算深度,从而减少消耗的模数。 我们选择神经网络进行多项式逼近的同态评估主要有三个原因。首先,具有多项式激活函数的神经网络可用于逼近加密状态下所需的任何函数。这意味着我们可以通过一个统一的过程计算任何多项式逼近,例如Sigmoid或ReLU的逼近。其次,通过仔细寻找合适的结构,神经网络可以使用接近最优的乘法深度高效地评估多项式,从而消耗更少的模数,因此需要更少的密文刷新。最后,作为流行工具,神经网络具有许多经过深入研究的技术,可以方便地服务于我们的方案。 实验证明我们的方法可用于各种函数的逼近。我们将该方法应用于Sigmoid函数在大区间$[-30,+30]$、$[-50,+50]$和$[-70,+70]$上的评估。
更新时间: 2024-05-24 04:13:22
领域: cs.CR
Indexed Minimum Empirical Divergence-Based Algorithms for Linear Bandits
The Indexed Minimum Empirical Divergence (IMED) algorithm is a highly effective approach that offers a stronger theoretical guarantee of the asymptotic optimality compared to the Kullback--Leibler Upper Confidence Bound (KL-UCB) algorithm for the multi-armed bandit problem. Additionally, it has been observed to empirically outperform UCB-based algorithms and Thompson Sampling. Despite its effectiveness, the generalization of this algorithm to contextual bandits with linear payoffs has remained elusive. In this paper, we present novel linear versions of the IMED algorithm, which we call the family of LinIMED algorithms. We demonstrate that LinIMED provides a $\widetilde{O}(d\sqrt{T})$ upper regret bound where $d$ is the dimension of the context and $T$ is the time horizon. Furthermore, extensive empirical studies reveal that LinIMED and its variants outperform widely-used linear bandit algorithms such as LinUCB and Linear Thompson Sampling in some regimes.
Updated: 2024-05-24 04:11:58
标题: 基于索引最小经验散度的线性赌博机算法
摘要: 索引最小经验散度(IMED)算法是一种非常有效的方法,在多臂老虎机问题上,与Kullback--Leibler上置信界(KL-UCB)算法相比,它对渐近最优性提供了更强的理论保证。此外,据观察,它在实践中优于基于UCB的算法和汤普森采样。尽管效果显著,将该算法推广到具有线性回报的情境老虎机问题一直悬而未决。在本文中,我们提出了IMED算法的新线性版本,称为LinIMED算法系列。我们证明LinIMED具有$\widetilde{O}(d\sqrt{T})$的遗憾上界,其中$d$是情境的维度,$T$是时间跨度。此外,大量实证研究表明,在某些情形下,LinIMED及其变体优于LinUCB和线性汤普森采样等广泛使用的线性老虎机算法。
更新时间: 2024-05-24 04:11:58
领域: cs.LG,cs.IT,math.IT
BlockFusion: Expandable 3D Scene Generation using Latent Tri-plane Extrapolation
We present BlockFusion, a diffusion-based model that generates 3D scenes as unit blocks and seamlessly incorporates new blocks to extend the scene. BlockFusion is trained using datasets of 3D blocks that are randomly cropped from complete 3D scene meshes. Through per-block fitting, all training blocks are converted into the hybrid neural fields: with a tri-plane containing the geometry features, followed by a Multi-layer Perceptron (MLP) for decoding the signed distance values. A variational auto-encoder is employed to compress the tri-planes into the latent tri-plane space, on which the denoising diffusion process is performed. Diffusion applied to the latent representations allows for high-quality and diverse 3D scene generation. To expand a scene during generation, one needs only to append empty blocks to overlap with the current scene and extrapolate existing latent tri-planes to populate new blocks. The extrapolation is done by conditioning the generation process with the feature samples from the overlapping tri-planes during the denoising iterations. Latent tri-plane extrapolation produces semantically and geometrically meaningful transitions that harmoniously blend with the existing scene. A 2D layout conditioning mechanism is used to control the placement and arrangement of scene elements. Experimental results indicate that BlockFusion is capable of generating diverse, geometrically consistent and unbounded large 3D scenes with unprecedented high-quality shapes in both indoor and outdoor scenarios.
Updated: 2024-05-24 03:56:20
标题: BlockFusion: 使用潜在三平面外推扩展的可扩展3D场景生成
摘要: 我们提出了BlockFusion,这是一个基于扩散的模型,将3D场景生成为单位块,并无缝地将新块整合到场景中。BlockFusion使用从完整3D场景网格中随机裁剪的3D块数据集进行训练。通过逐块拟合,所有训练块被转换为混合神经场:其中包含几何特征的三平面,然后是用于解码有符号距离值的多层感知器(MLP)。变分自动编码器被用来将三平面压缩到潜在的三平面空间,然后在该空间上进行去噪扩散过程。应用于潜在表示的扩散允许生成高质量和多样化的3D场景。在生成过程中扩展场景时,只需添加空块以与当前场景重叠,并外推现有的潜在三平面以填充新块。外推是通过在去噪迭代期间使用重叠三平面的特征样本来进行生成过程的条件化。潜在三平面外推产生语义和几何上有意义的过渡,与现有场景和谐融合。使用2D布局调整机制来控制场景元素的放置和排列。实验结果表明,BlockFusion能够生成多样化、几何一致且边界无限的大型3D场景,无论是在室内还是室外场景中,形状质量均具有前所未有的高质量。
更新时间: 2024-05-24 03:56:20
领域: cs.CV,cs.AI,cs.GR
Efficient Reinforcement Learning via Large Language Model-based Search
Reinforcement Learning (RL) suffers from sample inefficiency in sparse reward domains, and the problem is pronounced if there are stochastic transitions. To improve the sample efficiency, reward shaping is a well-studied approach to introduce intrinsic rewards that can help the RL agent converge to an optimal policy faster. However, designing a useful reward shaping function specific to each problem is challenging, even for domain experts. They would either have to rely on task-specific domain knowledge or provide an expert demonstration independently for each task. Given that Large Language Models (LLMs) have rapidly gained prominence across a multitude of natural language tasks, we aim to answer the following question: Can we leverage LLMs to construct a reward shaping function that can boost the sample efficiency of an RL agent? In this work, we aim to leverage off-the-shelf LLMs to generate a guide policy by solving a simpler deterministic abstraction of the original problem that can then be used to construct the reward shaping function for the downstream RL agent. Given the ineffectiveness of directly prompting LLMs, we propose MEDIC: a framework that augments LLMs with a Model-based feEDback critIC, which verifies LLM-generated outputs, to generate a possibly sub-optimal but valid plan for the abstract problem. Our experiments across domains from the BabyAI environment suite show 1) the effectiveness of augmenting LLMs with MEDIC, 2) a significant improvement in the sample complexity of PPO and A2C-based RL agents when guided by our LLM-generated plan, and finally, 3) pave the way for further exploration of how these models can be used to augment existing RL pipelines.
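One standard way to turn a verified plan into a shaping function, shown here as a hedged sketch (potential-based shaping is our illustrative choice and not necessarily MEDIC's exact construction; `plan_progress` is a hypothetical helper):

```python
# Minimal sketch: derive a shaping reward from a plan, e.g. one produced and
# verified via an LLM-plus-critic loop for a simpler abstraction of the task.
# `plan_progress` maps a state to the number of plan steps already completed;
# potential-based shaping of this form preserves the optimal policy
# (Ng et al., 1999).
def shaped_reward(env_reward, state, next_state, plan_progress, gamma=0.99):
    phi_s, phi_next = plan_progress(state), plan_progress(next_state)
    return env_reward + gamma * phi_next - phi_s

# Toy usage: states are integers, the plan is "reach 3 milestones in order".
progress = lambda s: min(s, 3)
print(shaped_reward(0.0, 1, 2, progress))   # positive bonus for plan progress
```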
Updated: 2024-05-24 03:53:57
标题: 通过大型语言模型搜索实现高效强化学习
摘要: 强化学习(RL)在稀疏奖励领域中存在样本效率低的问题,如果存在随机转换,则问题尤为突出。为了提高样本效率,奖励塑造是一种经过深入研究的方法,可以引入内在奖励,帮助RL代理更快地收敛到最优策略。然而,为每个问题设计一个有用的奖励塑造函数具有挑战性,即使对于领域专家也是如此。他们可能不得不依赖于特定任务领域知识或为每个任务提供专家演示。鉴于大型语言模型(LLMs)在自然语言任务中迅速获得重要地位,我们的目标是回答以下问题:我们是否可以利用LLMs构建一个奖励塑造函数,可以提高RL代理的样本效率?在这项工作中,我们旨在利用现成的LLMs通过解决原始问题的更简单的确定性抽象来生成指导策略,然后可以用它来构建下游RL代理的奖励塑造函数。鉴于直接提示LLMs的无效性,我们提出了MEDIC:一种将LLMs与基于模型的反馈评论员相结合的框架,用于验证LLM生成的输出,生成可能次优但有效的抽象问题计划。我们在BabyAI环境套件的各个领域进行的实验表明:1)增强LLMs与MEDIC的有效性,2)在我们生成的LLM计划的指导下,PPO和基于A2C的RL代理的样本复杂性显著提高,最后,3)为进一步探索这些模型如何被用于增强现有RL管道铺平了道路。
更新时间: 2024-05-24 03:53:57
领域: cs.LG,cs.AI
EasyRL4Rec: An Easy-to-use Library for Reinforcement Learning Based Recommender Systems
Reinforcement Learning (RL)-Based Recommender Systems (RSs) have gained rising attention for their potential to enhance long-term user engagement. However, research in this field faces challenges, including the lack of user-friendly frameworks, inconsistent evaluation metrics, and difficulties in reproducing existing studies. To tackle these issues, we introduce EasyRL4Rec, an easy-to-use code library designed specifically for RL-based RSs. This library provides lightweight and diverse RL environments based on five public datasets and includes core modules with rich options, simplifying model development. It provides unified evaluation standards focusing on long-term outcomes and offers tailored designs for state modeling and action representation for recommendation scenarios. Furthermore, we share our findings from insightful experiments with current methods. EasyRL4Rec seeks to facilitate the model development and experimental process in the domain of RL-based RSs. The library is available for public use.
Updated: 2024-05-24 03:45:21
标题: EasyRL4Rec:一个用于基于强化学习的推荐系统的易于使用的库
摘要: 基于强化学习的推荐系统(RSs)因其增强长期用户参与度的潜力而受到关注。然而,这一领域的研究面临挑战,包括缺乏用户友好的框架、评估指标不一致以及难以重现现有研究。为了解决这些问题,我们介绍了EasyRL4Rec,这是一个专门为基于强化学习的RSs设计的易于使用的代码库。该库提供基于五个公共数据集的轻量级和多样化的强化学习环境,并包括具有丰富选项的核心模块,简化了模型开发过程。它提供了关注长期结果的统一评估标准,并为推荐场景的状态建模和行为表示提供了定制设计。此外,我们分享了与当前方法的有见地的实验结果。EasyRL4Rec旨在促进基于强化学习的RSs领域中的模型开发和实验过程。该库可供公众使用。
更新时间: 2024-05-24 03:45:21
领域: cs.IR,cs.LG
UPS: Efficiently Building Foundation Models for PDE Solving via Cross-Modal Adaptation
We present Unified PDE Solvers (UPS), a data- and compute-efficient approach to developing unified neural operators for diverse families of spatiotemporal PDEs from various domains, dimensions, and resolutions. UPS embeds different PDEs into a shared representation space and processes them using a FNO-transformer architecture. Rather than training the network from scratch, which is data-demanding and computationally expensive, we warm-start the transformer from pretrained LLMs and perform explicit alignment to reduce the modality gap while improving data and compute efficiency. The cross-modal UPS achieves state-of-the-art results on a wide range of 1D and 2D PDE families from PDEBench, outperforming existing unified models using 4 times less data and 26 times less compute. Meanwhile, it is capable of few-shot transfer to unseen PDE families and coefficients.
Updated: 2024-05-24 03:44:20
标题: UPS:通过跨模态适应高效构建偏微分方程求解的基础模型
摘要: 我们提出了统一的PDE求解器(UPS),这是一种数据和计算有效的方法,用于开发各种领域、维度和分辨率的时空PDE的统一神经算子。UPS将不同的PDE嵌入到共享表示空间中,并使用FNO-transformer架构进行处理。我们不是从头开始训练网络,这需要大量数据和计算资源,而是从预训练的LLMs开始热启动transformer,并进行显式对齐以减小模态差距,同时提高数据和计算效率。跨模态的UPS在来自PDEBench的广泛1D和2D PDE族中取得了最先进的结果,优于现有的统一模型,数据使用量减少了4倍,计算量减少了26倍。同时,它能够在未见过的PDE族和系数上进行少样本迁移。
更新时间: 2024-05-24 03:44:20
领域: cs.LG
Timely Fusion of Surround Radar/Lidar for Object Detection in Autonomous Driving Systems
Fusing Radar and Lidar sensor data can fully utilize their complementary advantages and provide more accurate reconstruction of the surroundings for autonomous driving systems. Surround Radar/Lidar can provide 360-degree view sampling at minimal cost, making them promising sensing hardware solutions for autonomous driving systems. However, due to intrinsic physical constraints, the rotating speed of surround Radar, and thus the frequency at which it generates Radar data frames, is much lower than that of surround Lidar. Existing Radar/Lidar fusion methods have to work at the low frequency of surround Radar, which cannot meet the high responsiveness requirement of autonomous driving systems. This paper develops techniques to fuse surround Radar/Lidar at a working frequency limited only by the faster surround Lidar instead of the slower surround Radar, based on the state-of-the-art object detection model MVDNet. The basic idea of our approach is simple: we let MVDNet work with temporally unaligned data from Radar/Lidar, so that fusion can take place at any time when a new Lidar data frame arrives, instead of waiting for the slow Radar data frame. However, directly applying MVDNet to temporally unaligned Radar/Lidar data greatly degrades its object detection accuracy. The key insight revealed in this paper is that we can achieve high output frequency with little accuracy loss by enhancing the training procedure to explore the temporal redundancy in MVDNet so that it can tolerate the temporal unalignment of input data. We explore several different ways of training enhancement and compare them quantitatively with experiments.
Updated: 2024-05-24 03:40:22
标题: 自动驾驶系统中环绕雷达/激光雷达的及时融合用于目标检测
摘要: 融合雷达和激光雷达传感器数据可以充分利用二者互补的优势,为自动驾驶系统提供更准确的周围环境重建。环绕雷达/激光雷达能以最低成本提供360度视野采样,是自动驾驶系统中有前景的感知硬件方案。然而,受固有物理约束限制,环绕雷达的旋转速度以及由此决定的雷达数据帧生成频率远低于环绕激光雷达。现有的雷达/激光雷达融合方法只能在环绕雷达的低频率下工作,无法满足自动驾驶系统的高响应性要求。本文基于最先进的目标检测模型MVDNet,开发了融合环绕雷达/激光雷达的技术,使工作频率仅受较快的环绕激光雷达限制,而非较慢的环绕雷达。我们方法的基本思想很简单:让MVDNet处理来自雷达/激光雷达的时间不对齐数据,这样每当新的激光雷达数据帧到达时即可进行融合,而无需等待较慢的雷达数据帧。然而,直接将MVDNet应用于时间不对齐的雷达/激光雷达数据会严重降低其目标检测精度。本文揭示的关键信息是:通过增强训练过程以挖掘MVDNet中的时间冗余,使其能够容忍输入数据的时间不对齐,我们可以在几乎不损失精度的情况下实现高输出频率。我们探索了几种不同的训练增强方式,并通过实验进行了定量比较。
更新时间: 2024-05-24 03:40:22
领域: cs.CV,cs.AI
An Evaluation of Estimative Uncertainty in Large Language Models
Words of estimative probability (WEPs), such as ''maybe'' or ''probably not'' are ubiquitous in natural language for communicating estimative uncertainty, compared with direct statements involving numerical probability. Human estimative uncertainty, and its calibration with numerical estimates, has long been an area of study -- including by intelligence agencies like the CIA. This study compares estimative uncertainty in commonly used large language models (LLMs) like GPT-4 and ERNIE-4 to that of humans, and to each other. Here we show that LLMs like GPT-3.5 and GPT-4 align with human estimates for some, but not all, WEPs presented in English. Divergence is also observed when the LLM is presented with gendered roles and Chinese contexts. Further study shows that an advanced LLM like GPT-4 can consistently map between statistical and estimative uncertainty, but a significant performance gap remains. The results contribute to a growing body of research on human-LLM alignment.
Updated: 2024-05-24 03:39:31
标题: 大型语言模型中估计不确定性的评估
摘要: 与直接给出数值概率的陈述相比,诸如“可能”或“大概不会”之类的估计概率词(WEPs)在自然语言中被广泛用于传达估计不确定性。人类的估计不确定性及其与数值估计的校准长期以来一直是一个研究领域,中央情报局(CIA)等情报机构也参与其中。本研究将GPT-4和ERNIE-4等常用大型语言模型(LLMs)中的估计不确定性与人类进行比较,并相互比较。我们发现,GPT-3.5和GPT-4等LLMs与人类对部分(但并非全部)英文WEPs的估计一致。当LLM被呈现性别化角色和中文语境时也观察到分歧。进一步研究表明,GPT-4等先进LLM可以在统计不确定性和估计不确定性之间进行一致的映射,但仍存在显著的性能差距。这些结果为日益增长的人类-LLM对齐研究作出了贡献。
更新时间: 2024-05-24 03:39:31
领域: cs.CL,cs.AI,cs.HC
TrojanForge: Adversarial Hardware Trojan Examples with Reinforcement Learning
The Hardware Trojan (HT) problem can be thought of as a continuous game between attackers and defenders, each striving to outsmart the other by leveraging any available means for an advantage. Machine Learning (ML) has recently been key in advancing HT research. Various novel techniques, such as Reinforcement Learning (RL) and Graph Neural Networks (GNNs), have shown HT insertion and detection capabilities. HT insertion with ML techniques, specifically, has seen a spike in research activity due to the shortcomings of conventional HT benchmarks and the inherent human design bias that occurs when we create them. This work continues this innovation by presenting a tool called "TrojanForge", capable of generating HT adversarial examples that defeat HT detectors; demonstrating the capabilities of GAN-like adversarial tools for automatic HT insertion. We introduce an RL environment where the RL insertion agent interacts with HT detectors in an insertion-detection loop where the agent collects rewards based on its success in bypassing HT detectors. Our results show that this process leads to inserted HTs that evade various HT detectors, achieving high attack success percentages. This tool provides insight into why HT insertion fails in some instances and how we can leverage this knowledge in defense.
Updated: 2024-05-24 03:37:32
标题: TrojanForge:基于强化学习的对抗性硬件木马示例
摘要: 硬件木马(HT)问题可以被看作攻击者和防御者之间的持续博弈,双方都在努力利用一切可用手段来智胜对方。机器学习(ML)最近在推进HT研究方面发挥了关键作用。强化学习(RL)和图神经网络(GNNs)等多种新技术已经展示了HT插入和检测的能力。具体而言,由于传统HT基准的不足以及人工构建基准时固有的设计偏差,基于ML的HT插入研究活动激增。本研究延续这一创新,提出了一种名为“TrojanForge”的工具,能够生成可击败HT检测器的HT对抗样本,展示了类GAN对抗工具在自动HT插入方面的能力。我们引入了一个RL环境,RL插入代理在插入-检测循环中与HT检测器交互,并根据其绕过HT检测器的成功程度收集奖励。我们的结果表明,这一过程产生的插入HT能够逃避多种HT检测器,实现了很高的攻击成功率。该工具为我们揭示了HT插入在某些情况下失败的原因,以及如何在防御中利用这些知识。
更新时间: 2024-05-24 03:37:32
领域: cs.CR,cs.AR,cs.LG,B.8.1
Towards Geometry-Aware Pareto Set Learning for Neural Multi-Objective Combinatorial Optimization
Multi-objective combinatorial optimization (MOCO) problems are prevalent in various real-world applications. Most existing neural MOCO methods rely on problem decomposition to transform an MOCO problem into a series of singe-objective combinatorial optimization (SOCO) problems. However, these methods often approximate partial regions of the Pareto front and spend excessive time on diversity enhancement because of ambiguous decomposition and time-consuming precise hypervolume calculation. To address these limitations, we design a Geometry-Aware Pareto set Learning algorithm named GAPL, which provides a novel geometric perspective for neural MOCO via a Pareto attention model based on hypervolume expectation maximization. In addition, we propose a hypervolume residual update strategy to enable the Pareto attention model to capture both local and non-local information of the Pareto set/front. We also design a novel inference approach to further improve quality of the solution set and speed up hypervolume calculation. Experimental results on three classic MOCO problems demonstrate that our GAPL outperforms several state-of-the-art baselines via superior decomposition and efficient diversity enhancement.
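Since hypervolume is central to the method above, here is a minimal exact computation for the two-objective minimization case (our sketch; GAPL's contribution is precisely avoiding repeated exact computations like this via hypervolume expectation maximization):

```python
def hypervolume_2d(points, ref):
    """Exact hypervolume dominated by a set of 2D points under minimization,
    measured against reference point `ref` (worse than every point)."""
    pts = sorted({tuple(map(float, p)) for p in points})   # sort by f1 ascending
    front, best_f2 = [], float("inf")
    for f1, f2 in pts:                 # keep the non-dominated staircase
        if f2 < best_f2:
            front.append((f1, f2))
            best_f2 = f2
    hv, prev_f2 = 0.0, ref[1]
    for f1, f2 in front:               # sum the axis-aligned slabs
        hv += (ref[0] - f1) * (prev_f2 - f2)
        prev_f2 = f2
    return hv

print(hypervolume_2d([(1, 3), (2, 2), (3, 1)], ref=(4, 4)))  # -> 6.0
```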
Updated: 2024-05-24 03:36:10
标题: 面向神经多目标组合优化的几何感知帕累托集学习
摘要: 多目标组合优化(MOCO)问题在各种实际应用中很普遍。大多数现有的神经MOCO方法依赖于问题分解,将MOCO问题转化为一系列单目标组合优化(SOCO)问题。然而,由于模糊的分解和耗时的精确超体积计算,这些方法通常只能近似地表示帕累托前沿的部分区域,并花费过多时间在多样性增强上。为了解决这些限制,我们设计了一种名为GAPL的几何感知帕累托集学习算法,通过基于超体积期望最大化的帕累托注意力模型为神经MOCO提供了一种新颖的几何视角。此外,我们提出了一种超体积残差更新策略,使帕累托注意力模型能够捕捉帕累托集/前沿的局部和非局部信息。我们还设计了一种新颖的推理方法,进一步提高解集的质量并加快超体积计算的速度。对三个经典的MOCO问题的实验结果表明,我们的GAPL通过优越的分解和高效的多样性增强,优于几个最先进的基准方法。
更新时间: 2024-05-24 03:36:10
领域: cs.LG,cs.AI
DecisionNCE: Embodied Multimodal Representations via Implicit Preference Learning
Multimodal pretraining is an effective strategy for the trinity of goals of representation learning in autonomous robots: 1) extracting both local and global task progressions; 2) enforcing temporal consistency of visual representation; 3) capturing trajectory-level language grounding. Most existing methods approach these via separate objectives, which often reach sub-optimal solutions. In this paper, we propose a universal unified objective that can simultaneously extract meaningful task progression information from image sequences and seamlessly align them with language instructions. We discover that via implicit preferences, where a visual trajectory inherently aligns better with its corresponding language instruction than mismatched pairs, the popular Bradley-Terry model can transform into representation learning through proper reward reparameterizations. The resulted framework, DecisionNCE, mirrors an InfoNCE-style objective but is distinctively tailored for decision-making tasks, providing an embodied representation learning framework that elegantly extracts both local and global task progression features, with temporal consistency enforced through implicit time contrastive learning, while ensuring trajectory-level instruction grounding via multimodal joint encoding. Evaluation on both simulated and real robots demonstrates that DecisionNCE effectively facilitates diverse downstream policy learning tasks, offering a versatile solution for unified representation and reward learning. Project Page: https://2toinf.github.io/DecisionNCE/
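A minimal sketch of an InfoNCE-style trajectory-language alignment loss of the kind the abstract mirrors (our simplification: batch-diagonal positives and symmetric cross-entropy; the temperature and embedding sizes are assumptions, not DecisionNCE's actual configuration):

```python
import torch
import torch.nn.functional as F

def infonce_loss(traj_emb, lang_emb, temperature=0.07):
    """Matched (trajectory, instruction) pairs sit on the diagonal and act as
    positives; every other pairing in the batch serves as a negative."""
    traj = F.normalize(traj_emb, dim=-1)           # (B, D)
    lang = F.normalize(lang_emb, dim=-1)           # (B, D)
    logits = traj @ lang.t() / temperature         # (B, B) similarity matrix
    labels = torch.arange(traj.size(0))
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.t(), labels))

loss = infonce_loss(torch.randn(8, 128), torch.randn(8, 128))
```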
Updated: 2024-05-24 03:31:50
标题: DecisionNCE:通过隐式偏好学习实现具身多模态表示
摘要: 多模态预训练是实现自主机器人表示学习三位一体目标的有效策略:1)提取局部和全局任务进展;2)强化视觉表示的时间一致性;3)捕捉轨迹级别的语言基础。大多数现有方法通过各自独立的目标来实现这些目标,通常只能达到次优解。在本文中,我们提出了一个通用统一目标,可以同时从图像序列中提取有意义的任务进展信息,并将其与语言指令无缝对齐。我们发现,借助隐式偏好,即一条视觉轨迹与其对应的语言指令天然比不匹配的配对对齐得更好,流行的Bradley-Terry模型可以通过适当的奖励重参数化转化为表示学习。所得到的框架DecisionNCE在形式上类似InfoNCE目标,但专门为决策任务定制,提供了一个具身表示学习框架,可以优雅地提取局部和全局任务进展特征,通过隐式时间对比学习强化时间一致性,同时通过多模态联合编码确保轨迹级别的指令基础。对模拟和真实机器人的评估表明,DecisionNCE有效地促进了各种下游策略学习任务,为统一的表示和奖励学习提供了多功能解决方案。项目页面:https://2toinf.github.io/DecisionNCE/
更新时间: 2024-05-24 03:31:50
领域: cs.RO,cs.AI,cs.CL,cs.CV,cs.LG
RFLPA: A Robust Federated Learning Framework against Poisoning Attacks with Secure Aggregation
Federated learning (FL) allows multiple devices to train a model collaboratively without sharing their data. Despite its benefits, FL is vulnerable to privacy leakage and poisoning attacks. To address the privacy concern, secure aggregation (SecAgg) is often used to obtain the aggregation of gradients on the server without inspecting individual user updates. Unfortunately, existing defense strategies against poisoning attacks rely on the analysis of local updates in plaintext, making them incompatible with SecAgg. To reconcile the conflicts, we propose a robust federated learning framework against poisoning attacks (RFLPA) based on the SecAgg protocol. Our framework computes the cosine similarity between local updates and server updates to conduct robust aggregation. Furthermore, we leverage verifiable packed Shamir secret sharing to achieve a reduced communication cost of $O(M+N)$ per user, and design a novel dot-product aggregation algorithm to resolve the issue of increased information leakage. Our experimental results show that RFLPA significantly reduces communication and computation overhead by over $75\%$ compared to the state-of-the-art method, BREA, while maintaining competitive accuracy.
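In the clear, the cosine-similarity aggregation rule can be sketched as follows (our plaintext illustration; the point of RFLPA is to run this kind of computation under secure aggregation so the server never sees individual updates):

```python
import numpy as np

def robust_aggregate(client_updates, server_update):
    """Down-weight or drop client updates misaligned with the server update."""
    s = server_update / (np.linalg.norm(server_update) + 1e-12)
    weights = np.array([max(float(u @ s / (np.linalg.norm(u) + 1e-12)), 0.0)
                        for u in client_updates])
    if weights.sum() == 0.0:
        return server_update           # fall back if every update looks hostile
    weights /= weights.sum()
    return sum(w * u for w, u in zip(weights, client_updates))

updates = [np.array([1.0, 0.0]), np.array([0.9, 0.1]), np.array([-1.0, 0.0])]
print(robust_aggregate(updates, server_update=np.array([1.0, 0.0])))
```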
Updated: 2024-05-24 03:31:10
标题: RFLPA:一种基于安全聚合的抗毒化攻击鲁棒联邦学习框架
摘要: 联邦学习(FL)允许多个设备在不共享数据的情况下共同训练模型。尽管具有诸多优点,FL容易受到隐私泄露和毒化攻击的威胁。为了解决隐私问题,通常使用安全聚合(SecAgg)来在服务器上获取梯度的聚合,而无需检查个别用户的更新。不幸的是,现有的防御策略依赖于明文本地更新的分析,因此与SecAgg不兼容。为了解决这些冲突,我们提出了一种基于SecAgg协议的抗毒化攻击的强大联邦学习框架(RFLPA)。我们的框架计算本地更新和服务器更新之间的余弦相似度,以进行强大的聚合。此外,我们利用可验证的打包Shamir秘密共享来实现每个用户的通信成本降低到$O(M+N)$,并设计了一种新颖的点积聚合算法来解决信息泄露增加的问题。我们的实验结果显示,与现有最先进的方法BREA相比,RFLPA在保持竞争性准确性的同时,将通信和计算开销显著减少了超过75%。
更新时间: 2024-05-24 03:31:10
领域: cs.CR,cs.AI,E.4
TACO: Temporal Latent Action-Driven Contrastive Loss for Visual Reinforcement Learning
Despite recent progress in reinforcement learning (RL) from raw pixel data, sample inefficiency continues to present a substantial obstacle. Prior works have attempted to address this challenge by creating self-supervised auxiliary tasks, aiming to enrich the agent's learned representations with control-relevant information for future state prediction. However, these objectives are often insufficient to learn representations that can represent the optimal policy or value function, and they often consider tasks with small, abstract discrete action spaces and thus overlook the importance of action representation learning in continuous control. In this paper, we introduce TACO: Temporal Action-driven Contrastive Learning, a simple yet powerful temporal contrastive learning approach that facilitates the concurrent acquisition of latent state and action representations for agents. TACO simultaneously learns a state and an action representation by optimizing the mutual information between representations of current states paired with action sequences and representations of the corresponding future states. Theoretically, TACO can be shown to learn state and action representations that encompass sufficient information for control, thereby improving sample efficiency. For online RL, TACO achieves 40% performance boost after one million environment interaction steps on average across nine challenging visual continuous control tasks from Deepmind Control Suite. In addition, we show that TACO can also serve as a plug-and-play module adding to existing offline visual RL methods to establish the new state-of-the-art performance for offline visual RL across offline datasets with varying quality.
Updated: 2024-05-24 03:27:54
标题: TACO:用于视觉强化学习的时间潜在动作驱动对比损失
摘要: 尽管最近在从原始像素数据中进行强化学习(RL)方面取得了进展,但样本效率仍然是一个重大障碍。先前的工作尝试通过创建自监督辅助任务来解决这一挑战,旨在用与控制相关的信息丰富代理学习到的表示,以用于未来状态预测。然而,这些目标通常不足以学习能够表示最优策略或值函数的表示,并且它们通常考虑具有小型、抽象的离散动作空间的任务,因此忽视了连续控制中动作表示学习的重要性。在本文中,我们介绍了TACO:时间动作驱动对比学习,这是一种简单而强大的时间对比学习方法,有助于代理同时获取潜在状态和动作表示。TACO通过优化当前状态与动作序列配对的表示和相应未来状态的表示之间的互信息,来同时学习状态和动作表示。从理论上讲,TACO可以被证明学习到包含足够控制信息的状态和动作表示,从而提高样本效率。对于在线RL,在Deepmind控制套件中的九个具有挑战性的视觉连续控制任务上,TACO在一百万个环境交互步骤后平均实现了40%的性能提升。此外,我们展示了TACO还可以作为即插即用模块添加到现有的离线视觉RL方法中,在质量各异的离线数据集上建立离线视觉RL的最新最优性能。
更新时间: 2024-05-24 03:27:54
领域: cs.LG,cs.AI
Transformers for Image-Goal Navigation
Visual perception and navigation have emerged as major focus areas in the field of embodied artificial intelligence. We consider the task of image-goal navigation, where an agent is tasked to navigate to a goal specified by an image, relying only on images from an onboard camera. This task is particularly challenging since it demands robust scene understanding, goal-oriented planning and long-horizon navigation. Most existing approaches typically learn navigation policies reliant on recurrent neural networks trained via online reinforcement learning. However, training such policies requires substantial computational resources and time, and the performance of these models is not reliable on long-horizon navigation. In this work, we present a generative Transformer-based model that jointly models image goals, camera observations and the robot's past actions to predict future actions. We use state-of-the-art perception models and navigation policies to learn robust goal-conditioned policies without the need for real-time interaction with the environment. Our model demonstrates capability in capturing and associating visual information across long time horizons, helping in effective navigation. NOTE: This work was submitted as part of a Master's Capstone Project and must be treated as such. This is still an early work in progress and not the final version.
Updated: 2024-05-24 03:25:08
标题: 用于图像目标导航的Transformer
摘要: 视觉感知和导航已成为具身人工智能领域的主要关注点。我们考虑图像目标导航任务,其中代理被要求导航到由图像指定的目标,且仅依赖机载摄像头的图像。这一任务特别具有挑战性,因为它要求强大的场景理解、目标导向规划和长时程导航。大多数现有方法通常通过在线强化学习训练依赖循环神经网络的导航策略。然而,训练此类策略需要大量计算资源和时间,而且这些模型在长时程导航上的性能并不可靠。在这项工作中,我们提出了一种基于生成式Transformer的模型,联合建模图像目标、摄像头观察和机器人过去的动作,以预测未来动作。我们使用最先进的感知模型和导航策略来学习强大的目标条件策略,而无需与环境实时交互。我们的模型展示了在长时间范围内捕捉和关联视觉信息的能力,有助于有效导航。 注:这项工作是作为硕士毕业项目的一部分提交的,必须如此对待。这仍是一个早期的进行中工作,并非最终版本。
更新时间: 2024-05-24 03:25:08
领域: cs.RO,cs.CV,cs.LG,I.2.9; I.2.10; I.4.9
Repeat-Aware Neighbor Sampling for Dynamic Graph Learning
Dynamic graph learning equips the edges with time attributes and allows multiple links between two nodes, which is a crucial technology for understanding evolving data scenarios like traffic prediction and recommendation systems. Existing works capture the evolving patterns mainly by relying on the most recent neighbor sequences. However, we argue that whether two nodes will interact with each other in the future is highly correlated with the same interactions that happened in the past. Only considering the recent neighbors overlooks the phenomenon of repeat behavior and fails to accurately capture the temporal evolution of interactions. To fill this gap, this paper presents RepeatMixer, which considers evolving patterns of first- and high-order repeat behavior in the neighbor sampling strategy and temporal information learning. Firstly, we define the first-order repeat-aware nodes of the source node as the destination nodes that have interacted historically and extend this concept to high orders as nodes in the destination node's high-order neighbors. Then, we extract neighbors of the source node that interacted before the appearance of repeat-aware nodes with a sliding-window strategy as its neighbor sequence. Next, we leverage both the first- and high-order neighbor sequences of source and destination nodes to learn temporal patterns of interactions via an MLP-based encoder. Furthermore, considering the varying temporal patterns on different orders, we introduce a time-aware aggregation mechanism that adaptively aggregates the temporal representations from different orders based on the significance of their interaction time sequences. Experimental results demonstrate the superiority of RepeatMixer over state-of-the-art models in link prediction tasks, underscoring the effectiveness of the proposed repeat-aware neighbor sampling strategy.
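A minimal sketch (our reading of the definitions, on toy timestamped events) of extracting first-order repeat-aware nodes and the sliding-window neighbor sequence that precedes each repeat:

```python
# Events are (src, dst, t) tuples. First-order repeat-aware nodes of `src` are
# destinations it has already interacted with; for each repeat interaction, a
# sliding window collects the neighbors seen just before it.
def repeat_aware_sequences(events, src, window=5):
    history, sequences, seen = [], [], set()
    for s, d, t in sorted(events, key=lambda e: e[2]):
        if s != src:
            continue
        if d in seen:                          # a repeat interaction
            sequences.append(list(history[-window:]))
        seen.add(d)
        history.append((d, t))
    return sequences

events = [("u", "a", 1), ("u", "b", 2), ("u", "c", 3), ("u", "a", 4)]
print(repeat_aware_sequences(events, "u", window=2))  # neighbors before "a" repeats
```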
Updated: 2024-05-24 03:24:29
标题: 动态图学习中的重复感知邻居采样
摘要: 动态图学习为边赋予时间属性,并允许两个节点之间存在多条链接,这是理解交通预测和推荐系统等不断演化的数据场景的关键技术。现有研究主要依赖最近的邻居序列来获取演化模式。然而,我们认为两个节点未来是否会发生交互,与它们过去发生过的相同交互高度相关。仅考虑最近的邻居会忽视重复行为现象,无法准确捕捉交互的时间演化。为填补这一空白,本文提出了RepeatMixer,在邻居采样策略和时间信息学习中考虑一阶和高阶重复行为的演化模式。首先,我们将源节点的一阶重复感知节点定义为历史上与其交互过的目标节点,并将该概念推广到高阶,即目标节点的高阶邻居中的节点。然后,我们采用滑动窗口策略,提取在重复感知节点出现之前与源节点交互的邻居作为其邻居序列。接下来,我们利用源节点和目标节点的一阶和高阶邻居序列,通过基于MLP的编码器学习交互的时间模式。此外,考虑到不同阶上时间模式的差异,我们引入了一种时间感知聚合机制,根据交互时间序列的重要性自适应地聚合不同阶的时间表示。实验结果表明,RepeatMixer在链路预测任务中优于最先进的模型,凸显了所提出的重复感知邻居采样策略的有效性。
更新时间: 2024-05-24 03:24:29
领域: cs.LG,cs.AI,cs.SI
FreezeAsGuard: Mitigating Illegal Adaptation of Diffusion Models via Selective Tensor Freezing
Text-to-image diffusion models can be fine-tuned in custom domains to adapt to specific user preferences, but such unconstrained adaptability has also been utilized for illegal purposes, such as forging public figures' portraits and duplicating copyrighted artworks. Most existing work focuses on detecting the illegally generated contents, but cannot prevent or mitigate illegal adaptations of diffusion models. Other schemes of model unlearning and reinitialization, similarly, cannot prevent users from relearning the knowledge of illegal model adaptation with custom data. In this paper, we present FreezeAsGuard, a new technique that addresses these limitations and enables irreversible mitigation of illegal adaptations of diffusion models. The basic approach is that the model publisher selectively freezes tensors in pre-trained diffusion models that are critical to illegal model adaptations, to mitigate the fine-tuned model's representation power in illegal domains while minimizing the impact on legal model adaptations in other domains. Such tensor freezing can be enforced via APIs provided by the model publisher for fine-tuning, and can motivate user adoption due to its computational savings. Experimental results with datasets in multiple domains show that FreezeAsGuard provides stronger power in mitigating illegal model adaptations of generating fake public figures' portraits, while having minimal impact on model adaptation in other legal domains. The source code is available at: https://github.com/pittisl/FreezeAsGuard/
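Mechanically, selective tensor freezing can be sketched in a few lines (our illustration; how FreezeAsGuard scores which tensors are critical to illegal adaptations is the paper's contribution and is not shown here):

```python
import torch

def freeze_tensors(model: torch.nn.Module, frozen_names: set):
    """Exclude the named parameter tensors from fine-tuning."""
    for name, param in model.named_parameters():
        param.requires_grad = name not in frozen_names
    # the optimizer then only sees the remaining trainable parameters
    return [p for p in model.parameters() if p.requires_grad]

# Toy usage with an illustrative model and an assumed critical-tensor set.
model = torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.Linear(4, 2))
trainable = freeze_tensors(model, frozen_names={"0.weight"})
opt = torch.optim.AdamW(trainable, lr=1e-4)
```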
Updated: 2024-05-24 03:23:51
标题: FreezeAsGuard: 通过选择性张量冻结减轻扩散模型的非法适应
摘要: 文本到图像扩散模型可以在自定义领域进行微调,以适应特定用户偏好,但这种无约束的适应性也被用于非法目的,例如伪造公众人物的肖像和复制受版权保护的艺术作品。大多数现有工作集中在检测非法生成的内容,但无法阻止或减轻对扩散模型的非法适应。同样,其他模型遗忘和重新初始化方案也无法阻止用户通过自定义数据重新学习非法模型适应的知识。在本文中,我们提出了一种新技术FreezeAsGuard,解决了这些限制,并实现了对扩散模型非法适应的不可逆缓解。基本方法是,模型发布者有选择地冻结预训练扩散模型中对非法模型适应至关重要的张量,以减轻在非法领域微调的模型表示能力,同时最小化对其他领域合法模型适应的影响。这种张量冻结可以通过模型发布者提供的用于微调的API强制执行,可以激励用户采用,因为它节省了计算资源。使用多个领域的数据集进行的实验结果显示,FreezeAsGuard在减轻生成虚假公众人物肖像的非法模型适应方面具有更强的能力,同时对其他合法领域的模型适应的影响最小化。源代码可在以下网址找到:https://github.com/pittisl/FreezeAsGuard/
更新时间: 2024-05-24 03:23:51
领域: cs.LG,cs.AI,cs.CR,cs.CV
Diffusion Actor-Critic with Entropy Regulator
Reinforcement learning (RL) has proven highly effective in addressing complex decision-making and control tasks. However, in most traditional RL algorithms, the policy is typically parameterized as a diagonal Gaussian distribution with learned mean and variance, which constrains their capability to acquire complex policies. In response to this problem, we propose an online RL algorithm termed diffusion actor-critic with entropy regulator (DACER). This algorithm conceptualizes the reverse process of the diffusion model as a novel policy function and leverages the capability of the diffusion model to fit multimodal distributions, thereby enhancing the representational capacity of the policy. Since the distribution of the diffusion policy lacks an analytical expression, its entropy cannot be determined analytically. To mitigate this, we propose a method to estimate the entropy of the diffusion policy utilizing Gaussian mixture model. Building on the estimated entropy, we can learn a parameter $\alpha$ that modulates the degree of exploration and exploitation. Parameter $\alpha$ will be employed to adaptively regulate the variance of the added noise, which is applied to the action output by the diffusion model. Experimental trials on MuJoCo benchmarks and a multimodal task demonstrate that the DACER algorithm achieves state-of-the-art (SOTA) performance in most MuJoCo control tasks while exhibiting a stronger representational capacity of the diffusion policy.
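A minimal sketch of the entropy-estimation step for a policy that can only be sampled, such as a diffusion policy (our illustration with scikit-learn; the component count and sample sizes are assumptions):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def estimate_policy_entropy(action_samples, n_components=4, n_mc=4096, seed=0):
    """Fit a Gaussian mixture to action samples, then estimate the entropy
    H = -E[log p] by Monte Carlo under the fitted model."""
    gmm = GaussianMixture(n_components=n_components, random_state=seed)
    gmm.fit(action_samples)
    x, _ = gmm.sample(n_mc)
    return float(-gmm.score_samples(x).mean())   # score_samples = log density

actions = np.random.default_rng(0).normal(size=(2000, 2))
print(estimate_policy_entropy(actions))  # near the 2D Gaussian entropy ~2.84
```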
Updated: 2024-05-24 03:23:27
标题: 扩散演员-评论家与熵调节器
摘要: 强化学习(RL)在解决复杂的决策和控制任务方面已被证明非常有效。然而,在大多数传统的RL算法中,策略通常被参数化为带有学习均值和方差的对角高斯分布,这限制了它们获取复杂策略的能力。针对这个问题,我们提出了一种名为扩散演员-评论家与熵调节器(DACER)的在线RL算法。该算法将扩散模型的逆向过程构想为一种新颖的策略函数,并利用扩散模型适应多模态分布的能力,从而增强了策略的表征能力。由于扩散策略的分布缺乏解析表达式,其熵无法通过解析方法确定。为了缓解这一问题,我们提出了一种利用高斯混合模型估计扩散策略熵的方法。基于估计的熵,我们可以学习一个调节探索和利用程度的参数α。参数α将被用来自适应调节添加到扩散模型输出的动作的噪声的方差。在MuJoCo基准测试和多模态任务上的实验验证表明,DACER算法在大多数MuJoCo控制任务中表现出最先进的性能,同时展示了扩散策略的更强表征能力。
更新时间: 2024-05-24 03:23:27
领域: cs.LG,cs.AI
CG-FedLLM: How to Compress Gradients in Federated Fine-tuning for Large Language Models
The success of current Large-Language Models (LLMs) hinges on extensive training data that is collected and stored centrally, called Centralized Learning (CL). However, such a collection manner poses a privacy threat, and one potential solution is Federated Learning (FL), which transfers gradients, not raw data, among clients. Unlike traditional networks, FL for LLMs incurs significant communication costs due to their tremendous parameters. This study introduces an innovative approach to compress gradients to improve communication efficiency during LLM FL, formulating the new FL pipeline named CG-FedLLM. This approach integrates an encoder on the client side to acquire the compressed gradient features and a decoder on the server side to reconstruct the gradients. We also developed a novel training strategy that comprises Temporal-ensemble Gradient-Aware Pre-training (TGAP) to identify characteristic gradients of the target model and Federated AutoEncoder-Involved Fine-tuning (FAF) to compress gradients adaptively. Extensive experiments confirm that our approach reduces communication costs and improves performance (e.g., average 3 points increment compared with traditional CL- and FL-based fine-tuning with LlaMA on a well-recognized benchmark, C-Eval). This improvement is because our encoder-decoder, trained via TGAP and FAF, can filter gradients while selectively preserving critical features. Furthermore, we present a series of experimental analyses focusing on the signal-to-noise ratio, compression rate, and robustness within this privacy-centric framework, providing insight into developing more efficient and secure LLMs.
Updated: 2024-05-24 03:17:41
标题: CG-FedLLM:如何在大型语言模型的联邦微调中压缩梯度
摘要: 目前大型语言模型(LLMs)的成功取决于集中收集和存储的大量训练数据,称为集中式学习(CL)。然而,这种收集方式存在隐私威胁,一种潜在的解决方案是联邦学习(FL),它在客户端之间传输梯度而非原始数据。与传统网络不同,LLMs的FL会产生显著的通信成本,因为其参数庞大。本研究引入了一种创新方法,用于压缩梯度以提高LLM FL期间的通信效率,形成了名为CG-FedLLM的新FL管道。该方法在客户端集成了一个编码器来获取压缩梯度特征,并在服务器端集成了一个解码器来重建梯度。我们还开发了一个新颖的训练策略,包括Temporal-ensemble Gradient-Aware Pre-training(TGAP)来识别目标模型的特征梯度,以及Federated AutoEncoder-Involved Fine-tuning(FAF)来自适应地压缩梯度。大量实验证实,我们的方法减少了通信成本并提高了性能(例如,在一个公认的基准测试C-Eval上,与传统的CL和FL基础上使用LlaMA进行微调相比,平均提高了3个点)。这种改进是因为我们通过TGAP和FAF训练的编码器-解码器可以在选择性保留关键特征的同时过滤梯度。此外,我们提供了一系列实验分析,重点关注隐私中心框架中的信噪比、压缩率和稳健性,为开发更高效和安全的LLMs提供了见解。
更新时间: 2024-05-24 03:17:41
领域: cs.LG,cs.AI,cs.DC
Athena: Efficient Block-Wise Post-Training Quantization for Large Language Models Using Second-Order Matrix Derivative Information
Large Language Models (LLMs) have significantly advanced natural language processing tasks such as machine translation, text generation, and sentiment analysis. However, their large size, often consisting of billions of parameters, poses challenges for storage, computation, and deployment, particularly in resource-constrained environments like mobile devices and edge computing platforms. Effective compression and quantization techniques are crucial for addressing these issues, reducing memory footprint and computational requirements without significantly compromising performance. Traditional methods that uniformly map parameters to compressed spaces fail to account for the uneven distribution of parameters, leading to substantial accuracy loss. In this work, we propose Athena, a novel algorithm for efficient block-wise post-training quantization of LLMs. Athena leverages Second-Order Matrix Derivative Information to guide the quantization process using the curvature information of the loss landscape. By grouping parameters by columns or rows and iteratively optimizing the quantization process, Athena updates the model parameters and Hessian matrix to achieve significant compression while maintaining high accuracy. This makes Athena a practical solution for deploying LLMs in various settings.
Updated: 2024-05-24 03:14:29
标题: Athena:利用二阶矩阵导数信息对大型语言模型进行高效的逐块训练后量化
摘要: 大型语言模型(LLMs)显著推动了自然语言处理任务,如机器翻译、文本生成和情感分析。然而,它们通常包含数十亿参数,其巨大大小给存储、计算和部署带来挑战,尤其是在资源受限的环境中,如移动设备和边缘计算平台。有效的压缩和量化技术对于解决这些问题至关重要,可以在不显著影响性能的情况下减少内存占用和计算需求。传统方法将参数均匀映射到压缩空间无法考虑参数的不均匀分布,导致严重的精度损失。在本研究中,我们提出了一种名为Athena的新算法,用于对LLMs进行高效的基于块的训练后量化。Athena利用二阶矩阵导数信息,通过损失景观的曲率信息指导量化过程。通过按列或行对参数进行分组,并通过迭代优化量化过程,Athena更新模型参数和Hessian矩阵以实现显著的压缩同时保持高精度。这使得Athena成为在各种环境中部署LLMs的实际解决方案。
更新时间: 2024-05-24 03:14:29
领域: cs.LG,cs.AI,cs.CL
Learning the Distribution Map in Reverse Causal Performative Prediction
In numerous predictive scenarios, the predictive model affects the sampling distribution; for example, job applicants often meticulously craft their resumes to navigate through a screening system. Such shifts in distribution are particularly prevalent in the realm of social computing, yet the strategies to learn these shifts from data remain remarkably limited. Inspired by a microeconomic model that adeptly characterizes agents' behavior within labor markets, we introduce a novel approach to learn the distribution shift. Our method is predicated on a reverse causal model, wherein the predictive model instigates a distribution shift exclusively through a finite set of agents' actions. Within this framework, we employ a microfoundation model for the agents' actions and develop a statistically justified methodology to learn the distribution shift map, which we demonstrate to be effective in minimizing the performative prediction risk.
Updated: 2024-05-24 03:12:13
标题: 学习逆因果执行预测中的分布映射
摘要: 在许多预测场景中,预测模型会影响采样分布;例如,求职者经常精心制作简历以通过筛选系统。这种分布的转变在社交计算领域尤为普遍,然而,从数据中学习这些转变的策略仍然非常有限。受到一个能够精准刻画劳动力市场中代理人行为的微观经济模型的启发,我们引入了一种新颖的方法来学习分布转变。我们的方法建立在一个反向因果模型的基础上,其中预测模型仅通过有限的一组代理行为引发分布转变。在这个框架内,我们采用了一个代理行为的微观基础模型,并开发了一个有统计依据的方法来学习分布转变映射,我们证明这种方法在最小化执行(performative)预测风险方面是有效的。
更新时间: 2024-05-24 03:12:13
领域: stat.ML,cs.LG
ProDAG: Projection-induced variational inference for directed acyclic graphs
Directed acyclic graph (DAG) learning is a rapidly expanding field of research. Though the field has witnessed remarkable advances over the past few years, it remains statistically and computationally challenging to learn a single (point estimate) DAG from data, let alone provide uncertainty quantification. Our article addresses the difficult task of quantifying graph uncertainty by developing a variational Bayes inference framework based on novel distributions that have support directly on the space of DAGs. The distributions, which we use to form our prior and variational posterior, are induced by a projection operation, whereby an arbitrary continuous distribution is projected onto the space of sparse weighted acyclic adjacency matrices (matrix representations of DAGs) with probability mass on exact zeros. Though the projection constitutes a combinatorial optimization problem, it is solvable at scale via recently developed techniques that reformulate acyclicity as a continuous constraint. We empirically demonstrate that our method, ProDAG, can deliver accurate inference, and often outperforms existing state-of-the-art alternatives.
Updated: 2024-05-24 03:04:28
标题: ProDAG: 基于投影的有向无环图的变分推断
摘要: 有向无环图(DAG)学习是一个快速发展的研究领域。虽然在过去几年中,该领域取得了显著进展,但从数据中学习单个(点估计)DAG仍然在统计和计算上具有挑战性,更不用说提供不确定性量化了。我们的文章通过开发基于新颖分布的变分贝叶斯推断框架来解决量化图的不确定性这一困难任务,这些分布的支撑集直接位于DAG空间上。我们用这些分布构建先验和变分后验;它们由投影操作诱导:任意连续分布被投影到稀疏加权无环邻接矩阵(DAG的矩阵表示)空间上,且概率质量集中在精确的零点上。尽管该投影构成一个组合优化问题,但借助最近提出的将无环性重新表述为连续约束的技术,它可以被大规模求解。我们在实证中证明,我们的方法ProDAG能够提供准确的推断,并且通常优于现有的最先进替代方案。
更新时间: 2024-05-24 03:04:28
领域: stat.ML,cs.LG
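For intuition about the projection described above, here is a toy numpy/scipy sketch: a continuous matrix is pulled toward the set of sparse weighted acyclic adjacency matrices by gradient descent on a squared distance plus the NOTEARS-style acyclicity penalty h(W) = tr(exp(W∘W)) - d, with a final soft-threshold producing exact zeros. The penalty weight, step size, and threshold are illustrative guesses, not the authors' procedure.

    # Illustrative projection toward a sparse weighted acyclic matrix
    # (NOTEARS-style continuous acyclicity constraint; not ProDAG's algorithm).
    import numpy as np
    from scipy.linalg import expm

    def project_to_dag(Z, lam=5.0, thresh=0.05, lr=0.05, iters=500):
        d = Z.shape[0]
        W = Z.copy()
        for _ in range(iters):
            E = expm(W * W)                    # matrix exponential of W∘W
            grad_h = 2 * W * E.T               # gradient of tr(exp(W∘W))
            grad = 2 * (W - Z) + lam * grad_h  # distance + acyclicity terms
            W -= lr * grad
        W[np.abs(W) < thresh] = 0.0            # probability mass on exact zeros
        np.fill_diagonal(W, 0.0)
        return W

    rng = np.random.default_rng(1)
    W_dag = project_to_dag(rng.normal(scale=0.3, size=(5, 5)))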
A Dataset for Research on Water Sustainability
Freshwater scarcity is a global problem that requires collective efforts across all industry sectors. Nevertheless, a lack of access to operational water footprint data bars many applications from exploring optimization opportunities hidden within the temporal and spatial variations. To break this barrier to research in water sustainability, we build a dataset of the operational direct water usage of cooling systems and the indirect water embedded in electricity generation. Our dataset consists of the hourly water efficiency of major U.S. cities and states from 2019 to 2023. We also offer cooling system models that capture the impact of weather on water efficiency. We present a preliminary analysis of our dataset and discuss three potential applications that can benefit from it. Our dataset is publicly available at the Open Science Framework (OSF).
Updated: 2024-05-24 02:59:52
标题: 一个用于研究水资源可持续性的数据集
摘要: 淡水短缺是一个全球性问题,需要各行业部门共同努力。然而,由于缺乏运行层面的水足迹数据,许多应用无法探索隐藏在时间和空间变化中的优化机会。为了突破这一障碍,我们建立了一个数据集,涵盖冷却系统中的直接用水和电力生产中蕴含的间接用水。我们的数据集包括2019年至2023年美国主要城市和州的每小时用水效率。我们还提供了捕捉天气对水效率影响的冷却系统模型。我们对数据集进行了初步分析,并讨论了三个可以从中受益的潜在应用。我们的数据集可以在Open Science Framework(OSF)上公开获取。
更新时间: 2024-05-24 02:59:52
领域: cs.LG,cs.AI,cs.CY,cs.PF
EconAgent: Large Language Model-Empowered Agents for Simulating Macroeconomic Activities
The advent of artificial intelligence has led to a growing emphasis on data-driven modeling in macroeconomics, with agent-based modeling (ABM) emerging as a prominent bottom-up simulation paradigm. In ABM, agents (e.g., households, firms) interact within a macroeconomic environment, collectively generating market dynamics. Existing agent modeling typically employs predetermined rules or learning-based neural networks for decision-making. However, customizing each agent presents significant challenges, complicating the modeling of agent heterogeneity. Additionally, the influence of multi-period market dynamics and multifaceted macroeconomic factors are often overlooked in decision-making processes. In this work, we introduce EconAgent, a large language model-empowered agent with human-like characteristics for macroeconomic simulation. We first construct a simulation environment that incorporates various market dynamics driven by agents' decisions regarding work and consumption. Through the perception module, we create heterogeneous agents with distinct decision-making mechanisms. Furthermore, we model the impact of macroeconomic trends using a memory module, which allows agents to reflect on past individual experiences and market dynamics. Simulation experiments show that EconAgent can make realistic decisions, leading to more reasonable macroeconomic phenomena compared to existing rule-based or learning-based agents. Our codes are released at https://github.com/tsinghua-fib-lab/ACL24-EconAgent.
Updated: 2024-05-24 02:53:59
标题: EconAgent: 大型语言模型增强代理模拟宏观经济活动
摘要: 人工智能的出现导致宏观经济学中对数据驱动建模的重视不断增加,代理人模型(ABM)作为一种突出的自底向上模拟范式逐渐崭露头角。在代理人模型中,代理人(如家庭、公司)在宏观经济环境中相互作用,共同生成市场动态。现有的代理人建模通常采用预定规则或基于学习的神经网络进行决策。然而,定制每个代理人存在重大挑战,使代理人异质性的建模变得复杂。此外,在决策过程中,经常忽视多期市场动态和多方面宏观经济因素的影响。在这项工作中,我们介绍了EconAgent,这是一个具有类人特征的大型语言模型强化代理人,用于宏观经济模拟。我们首先构建了一个模拟环境,其中包含由代理人关于工作和消费决策驱动的各种市场动态。通过感知模块,我们创建了具有不同决策机制的异质代理人。此外,我们使用记忆模块对宏观经济趋势的影响进行建模,使代理人能够反思过去的个人经历和市场动态。模拟实验表明,与现有基于规则或学习的代理人相比,EconAgent能够做出现实决策,导致更加合理的宏观经济现象。我们的代码已发布在https://github.com/tsinghua-fib-lab/ACL24-EconAgent。
更新时间: 2024-05-24 02:53:59
领域: cs.AI
Facilitating Battery Swapping Services for Freight Trucks with Spatial-Temporal Demand Prediction
Electrifying heavy-duty trucks offers a substantial opportunity to curtail carbon emissions, advancing toward a carbon-neutral future. However, the inherent challenges of limited battery energy and the sheer weight of heavy-duty trucks lead to reduced mileage and prolonged charging durations. Consequently, battery-swapping services emerge as an attractive solution for these trucks. This paper employs a two-fold approach to investigate the potential and enhance the efficacy of such services. Firstly, spatial-temporal demand prediction models are adopted to predict the traffic patterns for the upcoming hours. Subsequently, the prediction guides an optimization module for efficient battery allocation and deployment. Analyzing the heavy-duty truck data on a highway network spanning over 2,500 miles, our model and analysis underscore the value of prediction/machine learning in facilitating future decision-makings. In particular, we find that the initial phase of implementing battery-swapping services favors mobile battery-swapping stations, but as the system matures, fixed-location stations are preferred.
Updated: 2024-05-24 02:44:43
标题: 基于时空需求预测促进货运卡车的电池更换服务
摘要: 将重型卡车电气化为减少碳排放提供了重要机遇,有助于向碳中和的未来迈进。然而,电池能量有限和重型卡车自重巨大这些固有挑战导致续航里程缩短、充电时间延长。因此,电池更换服务成为这些卡车的一个吸引人的解决方案。本文采用两方面方法来研究和增强这种服务的潜力和效果。首先,采用时空需求预测模型来预测未来几小时的交通模式。随后,预测结果指导一个优化模块,用于高效的电池分配和部署。通过分析跨越2500英里的公路网络上的重型卡车数据,我们的模型和分析强调了预测/机器学习在促进未来决策制定中的价值。特别是,我们发现实施电池更换服务的初始阶段偏向于移动电池更换站,但随着系统成熟,固定位置站更受青睐。
更新时间: 2024-05-24 02:44:43
领域: eess.SY,cs.AI,cs.SY,90B06, 68T07
A Solution-based LLM API-using Methodology for Academic Information Seeking
Applying large language models (LLMs) for academic API usage shows promise in reducing researchers' academic information seeking efforts. However, current LLM API-using methods struggle with complex API coupling commonly encountered in academic queries. To address this, we introduce SoAy, a solution-based LLM API-using methodology for academic information seeking. It uses code with a solution as the reasoning method, where a solution is a pre-constructed API calling sequence. The addition of the solution reduces the difficulty for the model to understand the complex relationships between APIs. Code improves the efficiency of reasoning. To evaluate SoAy, we introduce SoAyBench, an evaluation benchmark accompanied by SoAyEval, built upon a cloned environment of APIs from AMiner. Experimental results demonstrate a 34.58-75.99\% performance improvement compared to state-of-the-art LLM API-based baselines. All datasets, codes, tuned models, and deployed online services are publicly accessible at https://github.com/RUCKBReasoning/SoAy.
Updated: 2024-05-24 02:44:14
标题: 一个基于解决方案的用于学术信息检索的LLM API使用方法论
摘要: 将大型语言模型(LLMs)应用于学术API使用显示出在减少研究人员学术信息获取努力方面具有潜力。然而,目前的LLM API使用方法在学术查询中常遇到的复杂API耦合方面存在困难。为了解决这个问题,我们介绍了SoAy,一种基于解决方案的LLM API使用方法,用于学术信息获取。它使用带有解决方案的代码作为推理方法,其中解决方案是预先构建的API调用序列。解决方案的添加减少了模型理解API之间复杂关系的难度,代码则提高了推理的效率。为了评估SoAy,我们介绍了SoAyBench,一个建立在AMiner API克隆环境之上并配有SoAyEval的评估基准。实验结果显示,与最先进的基于LLM API的基线相比,性能提升了34.58-75.99\%。所有数据集、代码、调整过的模型和部署的在线服务都可以通过https://github.com/RUCKBReasoning/SoAy公开访问。
更新时间: 2024-05-24 02:44:14
领域: cs.CL,cs.AI,cs.SE
HTN-Based Tutors: A New Intelligent Tutoring Framework Based on Hierarchical Task Networks
Intelligent tutors have shown success in delivering a personalized and adaptive learning experience. However, there exist challenges regarding the granularity of knowledge in existing frameworks and the resulting instructions they can provide. To address these issues, we propose HTN-based tutors, a new intelligent tutoring framework that represents expert models using Hierarchical Task Networks (HTNs). Like other tutoring frameworks, it allows flexible encoding of different problem-solving strategies while providing the additional benefit of a hierarchical knowledge organization. We leverage the latter to create tutors that can adapt the granularity of their scaffolding. This organization also aligns well with the compositional nature of skills.
Updated: 2024-05-24 02:38:22
标题: 基于HTN的导师:基于分层任务网络的新智能辅导框架
摘要: 智能导师在提供个性化和自适应学习体验方面取得成功。然而,现有框架中存在关于知识粒度和由此产生的指导的挑战。为了解决这些问题,我们提出了基于HTN的导师,这是一个新的智能辅导框架,使用层次任务网络(HTNs)来表示专家模型。与其他辅导框架一样,它允许灵活地编码不同的问题解决策略,同时提供了层次化知识组织的额外好处。我们利用后者来创建可以调整其支撑粒度的导师。这种组织也与技能的组成性质相吻合。
更新时间: 2024-05-24 02:38:22
领域: cs.AI,cs.HC
From Frege to chatGPT: Compositionality in language, cognition, and deep neural networks
Compositionality has long been considered a key explanatory property underlying human intelligence: arbitrary concepts can be composed into novel complex combinations, permitting the acquisition of an open ended, potentially infinite expressive capacity from finite learning experiences. Influential arguments have held that neural networks fail to explain this aspect of behavior, leading many to dismiss them as viable models of human cognition. Over the last decade, however, modern deep neural networks (DNNs), which share the same fundamental design principles as their predecessors, have come to dominate artificial intelligence, exhibiting the most advanced cognitive behaviors ever demonstrated in machines. In particular, large language models (LLMs), DNNs trained to predict the next word on a large corpus of text, have proven capable of sophisticated behaviors such as writing syntactically complex sentences without grammatical errors, producing cogent chains of reasoning, and even writing original computer programs -- all behaviors thought to require compositional processing. In this chapter, we survey recent empirical work from machine learning for a broad audience in philosophy, cognitive science, and neuroscience, situating recent breakthroughs within the broader context of philosophical arguments about compositionality. In particular, our review emphasizes two approaches to endowing neural networks with compositional generalization capabilities: (1) architectural inductive biases, and (2) metalearning, or learning to learn. We also present findings suggesting that LLM pretraining can be understood as a kind of metalearning, and can thereby equip DNNs with compositional generalization abilities in a similar way. We conclude by discussing the implications that these findings may have for the study of compositionality in human cognition and by suggesting avenues for future research.
Updated: 2024-05-24 02:36:07
标题: 从弗雷格到chatGPT:语言、认知和深度神经网络中的组合性
摘要: 组合性长期以来被认为是人类智能的一个关键解释属性:任意概念可以组合成新颖的复杂组合,从有限的学习经验中获得开放式、潜在无限的表达能力。有影响力的论点认为神经网络无法解释这一行为特征,导致许多人否认它们可以作为人类认知的可行模型。然而,在过去的十年里,现代深度神经网络(DNNs)与其前身共享相同的基本设计原则,却已经成为人工智能的主导,展示出迄今为止在机器中所展示的最先进的认知行为。特别是,大型语言模型(LLMs),即在大量文本语料库上训练以预测下一个单词的DNNs,已被证明能够展现出复杂行为,如撰写句法复杂且没有语法错误的句子、产生连贯的推理链,甚至编写原创的计算机程序,而这些行为都被认为需要组合处理。在本章中,我们面向哲学、认知科学和神经科学中的广泛受众综述了来自机器学习的最新实证研究,将最新的突破置于有关组合性的哲学论点的更广泛背景下。具体来说,我们的综述强调了两种赋予神经网络组合泛化能力的方法:(1)架构归纳偏置,和(2)元学习,即学会学习。我们还提出了一些发现,表明LLM的预训练可以被理解为一种元学习,从而以类似的方式为DNNs提供组合泛化能力。最后,我们讨论了这些发现对人类认知中组合性研究可能产生的影响,并提出了未来研究的方向。
更新时间: 2024-05-24 02:36:07
领域: cs.NE,cs.AI,cs.LG
Are You Copying My Prompt? Protecting the Copyright of Vision Prompt for VPaaS via Watermark
Visual Prompt Learning (VPL) differs from traditional fine-tuning methods in reducing significant resource consumption by avoiding updating pre-trained model parameters. Instead, it focuses on learning an input perturbation, a visual prompt, added to downstream task data for making predictions. Since learning generalizable prompts requires expert design and creation, which is technically demanding and time-consuming in the optimization process, developers of Visual Prompts as a Service (VPaaS) have emerged. These developers profit by providing well-crafted prompts to authorized customers. However, a significant drawback is that prompts can be easily copied and redistributed, threatening the intellectual property of VPaaS developers. Hence, there is an urgent need for technology to protect the rights of VPaaS developers. To this end, we present a method named \textbf{WVPrompt} that employs visual prompt watermarking in a black-box way. WVPrompt consists of two parts: prompt watermarking and prompt verification. Specifically, it utilizes a poison-only backdoor attack method to embed a watermark into the prompt and then employs a hypothesis-testing approach for remote verification of prompt ownership. Extensive experiments have been conducted on three well-known benchmark datasets using three popular pre-trained models: RN50, BIT-M, and Instagram. The experimental results demonstrate that WVPrompt is efficient, harmless, and robust to various adversarial operations.
Updated: 2024-05-24 02:31:03
标题: 你在抄袭我的提示吗?通过水印保护VPaaS的视觉提示版权
摘要: 视觉提示学习(VPL)与传统的微调方法不同,它通过避免更新预训练模型参数来减少显著的资源消耗。相反,它专注于学习一种输入扰动,即一种视觉提示,添加到下游任务数据中以进行预测。由于学习可泛化的提示需要专业设计和创建,这在优化过程中技术要求高且耗时,因此出现了Visual Prompts as a Service(VPaaS)的开发者。这些开发者通过为授权客户提供精心设计的提示而获利。然而,一个重要的缺点是提示很容易被复制和重新分发,威胁到VPaaS开发者的知识产权。因此,迫切需要技术保护VPaaS开发者的权利。为此,我们提出了一种名为\textbf{WVPrompt}的方法,采用黑盒方式进行视觉提示水印处理。WVPrompt由两部分组成:提示水印嵌入和提示验证。具体来说,它利用一种仅投毒(poison-only)后门攻击方法将水印嵌入提示中,然后采用假设检验方法对提示所有权进行远程验证。在三个知名基准数据集上进行了大量实验,使用了三种流行的预训练模型:RN50、BIT-M和Instagram。实验结果表明,WVPrompt高效、无害,并且对各种对抗操作具有强大的鲁棒性。
更新时间: 2024-05-24 02:31:03
领域: cs.CR,cs.CV
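The "hypothesis-testing approach for remote verification" above can be illustrated with a one-sided binomial test: if the suspect service predicts the backdoor target label on watermark triggers far more often than chance, ownership is asserted. `query_service`, the chance level p0, and the significance level are hypothetical placeholders, not the paper's exact protocol.

    # Hedged sketch of remote ownership verification via hypothesis testing.
    from scipy.stats import binomtest

    def verify_ownership(trigger_inputs, target_label, query_service,
                         p0=0.1, alpha=0.01):
        hits = sum(query_service(x) == target_label for x in trigger_inputs)
        # H0: the service hits the target label at most at chance rate p0
        result = binomtest(hits, n=len(trigger_inputs), p=p0,
                           alternative='greater')
        return result.pvalue < alpha   # True => evidence the prompt was copied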
ProtFAD: Introducing function-aware domains as implicit modality towards protein function perception
Protein function prediction is currently achieved by encoding its sequence or structure, where the sequence-to-function transcendence and high-quality structural data scarcity lead to obvious performance bottlenecks. Protein domains are "building blocks" of proteins that are functionally independent, and their combinations determine the diverse biological functions. However, most existing studies have yet to thoroughly explore the intricate functional information contained in the protein domains. To fill this gap, we propose a synergistic integration approach for a function-aware domain representation, and a domain-joint contrastive learning strategy to distinguish different protein functions while aligning the modalities. Specifically, we associate domains with the GO terms as function priors to pre-train domain embeddings. Furthermore, we partition proteins into multiple sub-views based on continuous joint domains for contrastive training under the supervision of a novel triplet InfoNCE loss. Our approach significantly and comprehensively outperforms the state-of-the-art methods on various benchmarks, and clearly differentiates proteins carrying distinct functions compared to the competitor.
Updated: 2024-05-24 02:26:45
标题: ProtFAD:引入功能感知域作为隐性模态,以提高蛋白质功能感知
摘要: 蛋白质功能预测目前通过对其序列或结构进行编码来实现,而序列到功能的跨越难题和高质量结构数据的稀缺导致了明显的性能瓶颈。蛋白质结构域是蛋白质的“构建块”,在功能上是独立的,它们的组合决定了多样化的生物功能。然而,大多数现有研究尚未彻底探索蛋白质结构域中包含的复杂功能信息。为了填补这一空白,我们提出了一种功能感知的结构域表示的协同集成方法,以及一种在对齐模态的同时区分不同蛋白质功能的域联合对比学习策略。具体地,我们将结构域与GO术语关联作为功能先验,以预训练域嵌入。此外,我们基于连续联合结构域将蛋白质划分为多个子视图,在一种新颖的三元组InfoNCE损失的监督下进行对比训练。我们的方法在各种基准测试中显著而全面地优于最先进的方法,并且与竞争方法相比能清楚地区分携带不同功能的蛋白质。
更新时间: 2024-05-24 02:26:45
领域: q-bio.BM,cs.LG
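For readers unfamiliar with InfoNCE, here is the standard symmetric form of the loss as a PyTorch sketch; the paper's triplet variant is not reproduced here, and the temperature value is a common default rather than the authors' setting.

    # Standard symmetric InfoNCE between two aligned batches of embeddings
    # (e.g. sub-view and domain representations); illustrative only.
    import torch
    import torch.nn.functional as F

    def info_nce(a, b, temperature=0.07):
        a, b = F.normalize(a, dim=-1), F.normalize(b, dim=-1)
        logits = a @ b.t() / temperature        # pairwise similarities
        targets = torch.arange(a.size(0))       # positives on the diagonal
        return 0.5 * (F.cross_entropy(logits, targets) +
                      F.cross_entropy(logits.t(), targets))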
EHRMamba: Towards Generalizable and Scalable Foundation Models for Electronic Health Records
Transformers have significantly advanced the modeling of Electronic Health Records (EHR), yet their deployment in real-world healthcare is limited by several key challenges. Firstly, the quadratic computational cost and insufficient context length of these models pose significant obstacles for hospitals in processing the extensive medical histories typical in EHR data. Additionally, existing models employ separate finetuning for each clinical task, complicating maintenance in healthcare environments. Moreover, these models focus exclusively on either clinical prediction or EHR forecasting, lacking the flexibility to perform well across both. To overcome these limitations, we introduce EHRMamba, a robust foundation model built on the Mamba architecture. EHRMamba can process sequences up to four times longer than previous models due to its linear computational cost. We also introduce a novel approach to Multitask Prompted Finetuning (MTF) for EHR data, which enables EHRMamba to simultaneously learn multiple clinical tasks in a single finetuning phase, significantly enhancing deployment and cross-task generalization. Furthermore, our model leverages the HL7 FHIR data standard to simplify integration into existing hospital systems. Alongside EHRMamba, we open-source Odyssey, a toolkit designed to support the development and deployment of EHR foundation models, with an emphasis on data standardization and interpretability. Our evaluations on the MIMIC-IV dataset demonstrate that EHRMamba advances state-of-the-art performance across 6 major clinical tasks and excels in EHR forecasting, marking a significant leap forward in the field.
Updated: 2024-05-24 02:22:21
标题: EHRMamba:面向电子健康记录的通用可扩展基础模型
摘要: Transformer已经显著推进了电子健康记录(EHR)建模,但它们在现实医疗环境中的部署受到几个关键挑战的限制。首先,这些模型的二次计算成本和不足的上下文长度给医院在处理EHR数据中典型的漫长病史时带来了显著障碍。此外,现有模型为每个临床任务单独进行微调,使其在医疗环境中的维护变得复杂。而且,这些模型只专注于临床预测或EHR预测二者之一,缺乏在两者上都表现良好的灵活性。为了克服这些限制,我们介绍了EHRMamba,这是一个基于Mamba架构构建的强大基础模型。由于其线性计算成本,EHRMamba可以处理比以往模型长四倍的序列。我们还引入了一种新颖的多任务提示微调(MTF)方法用于EHR数据,这使得EHRMamba能够在单次微调阶段同时学习多个临床任务,极大地增强了部署和跨任务泛化能力。此外,我们的模型利用HL7 FHIR数据标准简化了与现有医院系统的集成。除了EHRMamba,我们还开源了Odyssey,这是一个旨在支持EHR基础模型开发和部署的工具包,强调数据标准化和可解释性。我们在MIMIC-IV数据集上的评估表明,EHRMamba在6个主要临床任务中达到了最先进的性能,并在EHR预测方面表现出色,标志着该领域的重大飞跃。
更新时间: 2024-05-24 02:22:21
领域: cs.LG
Spatio-temporal Value Semantics-based Abstraction for Dense Deep Reinforcement Learning
Intelligent Cyber-Physical Systems (ICPS) represent a specialized form of Cyber-Physical System (CPS) that incorporates intelligent components, notably Convolutional Neural Networks (CNNs) and Deep Reinforcement Learning (DRL), to undertake multifaceted tasks encompassing perception, decision-making, and control. The utilization of DRL for decision-making facilitates dynamic interaction with the environment, generating control actions aimed at maximizing cumulative rewards. Nevertheless, the inherent uncertainty of the operational environment and the intricate nature of ICPS necessitate exploration within complex and dynamic state spaces during the learning phase. DRL confronts challenges in terms of efficiency, generalization capabilities, and data scarcity during decision-making process. In response to these challenges, we propose an innovative abstract modeling approach grounded in spatial-temporal value semantics, capturing the evolution in the distribution of semantic value across time and space. A semantics-based abstraction is introduced to construct an abstract Markov Decision Process (MDP) for the DRL learning process. Furthermore, optimization techniques for abstraction are delineated, aiming to refine the abstract model and mitigate semantic gaps between abstract and concrete states. The efficacy of the abstract modeling is assessed through the evaluation and analysis of the abstract MDP model using PRISM. A series of experiments are conducted, involving diverse scenarios such as lane-keeping, adaptive cruise control, and intersection crossroad assistance, to demonstrate the effectiveness of our abstracting approach.
Updated: 2024-05-24 02:21:10
标题: 基于时空价值语义的密集深度强化学习抽象化
摘要: 智能网络物理系统(ICPS)代表了一种特殊形式的网络物理系统(CPS),其中包括智能组件,尤其是卷积神经网络(CNN)和深度强化学习(DRL),以执行包括感知、决策和控制在内的多方面任务。利用DRL进行决策有助于与环境动态互动,生成旨在最大化累积奖励的控制动作。然而,操作环境的固有不确定性和ICPS的复杂性要求在学习阶段在复杂和动态的状态空间内进行探索。在决策过程中,DRL面临效率、泛化能力和数据稀缺性方面的挑战。针对这些挑战,我们提出了一种基于时空价值语义的创新抽象建模方法,捕捉了语义值在时间和空间上的分布演变。引入基于语义的抽象,构建了用于DRL学习过程的抽象马尔可夫决策过程(MDP)。此外,还详细阐述了抽象优化技术,旨在完善抽象模型并减少抽象和具体状态之间的语义差距。通过使用PRISM对抽象MDP模型进行评估和分析,评估了抽象建模的有效性。进行了一系列实验,涉及车道保持、自适应巡航控制和十字路口辅助等各种场景,以展示我们的抽象方法的有效性。
更新时间: 2024-05-24 02:21:10
领域: cs.LG,cs.AI,68N30,D.2.4
Scoring Intervals using Non-Hierarchical Transformer For Automatic Piano Transcription
The neural semi-Markov Conditional Random Field (semi-CRF) framework has demonstrated promise for event-based piano transcription. In this framework, all events (notes or pedals) are represented as closed intervals tied to specific event types. The neural semi-CRF approach requires an interval scoring matrix that assigns a score for every candidate interval. However, designing an efficient and expressive architecture for scoring intervals is not trivial. In this paper, we introduce a simple method for scoring intervals using scaled inner product operations that resemble how attention scoring is done in transformers. We show theoretically that, due to the special structure from encoding the non-overlapping intervals, under a mild condition, the inner product operations are expressive enough to represent an ideal scoring matrix that can yield the correct transcription result. We then demonstrate that an encoder-only non-hierarchical transformer backbone, operating only on a low-time-resolution feature map, is capable of transcribing piano notes and pedals with high accuracy and time precision. The experiment shows that our approach achieves the new state-of-the-art performance across all subtasks in terms of the F1 measure on the Maestro dataset.
Updated: 2024-05-24 02:20:54
标题: 使用非层次Transformer对区间评分以实现自动钢琴转录
摘要: 神经半马尔可夫条件随机场(半CRF)框架已显示出在基于事件的钢琴转录中具有潜力。在该框架中,所有事件(音符或踏板)都被表示为与特定事件类型相关联的闭合区间。神经半CRF方法需要一个为每个候选区间分配分数的区间评分矩阵。然而,为区间评分设计一个高效且表现力强的架构并非易事。在本文中,我们介绍了一种使用缩放内积操作对区间进行评分的简单方法,类似于Transformer中计算注意力分数的方式。我们从理论上证明,由于编码非重叠区间带来的特殊结构,在一个温和的条件下,内积操作具有足够的表达能力来表示一个能产生正确转录结果的理想评分矩阵。随后,我们证明了一个仅在低时间分辨率特征图上操作的仅编码器非层次Transformer骨干网络能够以高准确度和时间精度转录钢琴音符和踏板。实验表明,我们的方法在Maestro数据集的所有子任务上,以F1度量衡量,实现了新的最先进性能。
更新时间: 2024-05-24 02:20:54
领域: cs.SD,cs.LG,eess.AS
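The scaled-inner-product interval scoring described above reduces to a very small computation; a minimal sketch follows (dimensions are illustrative, and the semi-CRF decoding on top of the score matrix is omitted):

    # Attention-style interval scoring: interval (i, j) gets the scaled inner
    # product of a "start" vector at frame i and an "end" vector at frame j.
    import torch

    T, D, H = 200, 512, 128               # frames, feature dim, head dim
    feats = torch.randn(T, D)             # low-time-resolution feature map
    W_start, W_end = torch.randn(D, H), torch.randn(D, H)

    q = feats @ W_start                   # (T, H) start representations
    k = feats @ W_end                     # (T, H) end representations
    scores = (q @ k.t()) / H ** 0.5       # scores[i, j] for interval [i, j]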
Model-based Reinforcement Learning for Parameterized Action Spaces
We propose a novel model-based reinforcement learning algorithm -- Dynamics Learning and predictive control with Parameterized Actions (DLPA) -- for Parameterized Action Markov Decision Processes (PAMDPs). The agent learns a parameterized-action-conditioned dynamics model and plans with a modified Model Predictive Path Integral control. We theoretically quantify the difference between the generated trajectory and the optimal trajectory during planning in terms of the value they achieved through the lens of Lipschitz Continuity. Our empirical results on several standard benchmarks show that our algorithm achieves superior sample efficiency and asymptotic performance than state-of-the-art PAMDP methods.
Updated: 2024-05-24 02:15:42
标题: 基于模型的参数化动作空间强化学习
摘要: 我们提出了一种新颖的基于模型的强化学习算法——具有参数化动作的动态学习和预测控制(DLPA)——用于参数化动作马尔可夫决策过程(PAMDPs)。代理学习一个参数化动作条件的动态模型,并使用修改的模型预测路径积分控制进行规划。我们在理论上量化了规划过程中生成轨迹与最优轨迹之间的差异,从Lipschitz连续性的角度衡量它们所实现的价值。我们在几个标准基准测试上的实证结果表明,我们的算法比最先进的PAMDP方法具有更优越的样本效率和渐近性能。
更新时间: 2024-05-24 02:15:42
领域: cs.LG,cs.AI
Online Prompt Pricing based on Combinatorial Multi-Armed Bandit and Hierarchical Stackelberg Game
Generation models have shown promising performance in various tasks, making trading around machine learning models possible. In this paper, we aim at a novel prompt trading scenario, prompt bundle trading (PBT) system, and propose an online pricing mechanism. Based on the combinatorial multi-armed bandit (CMAB) and three-stage hierarchical Stackelberg (HS) game, our pricing mechanism considers the profits of the consumer, platform, and seller, simultaneously achieving the profit satisfaction of these three participants. We break down the pricing issue into two steps, namely unknown category selection and incentive strategy optimization. The former step is to select a set of categories with the highest qualities, and the latter is to derive the optimal strategy for each participant based on the chosen categories. Unlike the existing fixed pricing mode, the PBT pricing mechanism we propose is more flexible and diverse, which is more in accord with the transaction needs of real-world scenarios. We test our method on a simulated text-to-image dataset. The experimental results demonstrate the effectiveness of our algorithm, which provides a feasible price-setting standard for the prompt marketplaces.
Updated: 2024-05-24 02:13:46
标题: 基于组合多臂老虎机和分层斯塔克尔贝格博弈的在线提示定价
摘要: 生成模型在各种任务中表现出了良好的性能,使得围绕机器学习模型的交易成为可能。本文旨在提出一种新颖的提示交易场景,即提示捆绑交易(PBT)系统,并提出一个在线定价机制。基于组合多臂老虎机(CMAB)和三阶段层次斯塔克尔贝格(HS)博弈,我们的定价机制同时考虑了消费者、平台和卖家的利润,实现了这三个参与者的利润满意度。我们将定价问题分解为两个步骤,即未知类别选择和激励策略优化。前一步是选择具有最高质量的一组类别,后一步是根据选择的类别为每个参与者制定最佳策略。与现有的固定定价模式不同,我们提出的PBT定价机制更灵活多样,更符合现实场景的交易需求。我们在模拟的文本到图像数据集上测试了我们的方法。实验结果证明了我们算法的有效性,为提示市场提供了可行的定价标准。
更新时间: 2024-05-24 02:13:46
领域: cs.AI,cs.LG
Machine Unlearning in Large Language Models
Machine unlearning, a novel area within artificial intelligence, focuses on addressing the challenge of selectively forgetting or reducing undesirable knowledge or behaviors in machine learning models, particularly in the context of large language models (LLMs). This paper introduces a methodology to align LLMs, such as Open Pre-trained Transformer Language Models, with ethical, privacy, and safety standards by leveraging the gradient ascent algorithm for knowledge unlearning. Our approach aims to selectively erase or modify learned information in LLMs, targeting harmful responses and copyrighted content. This paper presents a dual-pronged approach to enhance the ethical and safe behavior of large language models (LLMs) by addressing the issues of harmful responses and copyrighted content. To mitigate harmful responses, we applied gradient ascent on the PKU dataset, achieving a 75\% reduction in harmful responses for Open Pre-trained Transformer Language Models (OPT1.3b and OPT2.7b) \citet{zhang2022opt} while retaining previous knowledge using the TruthfulQA dataset \citet{DBLP:journals/corr/abs-2109-07958}. For handling copyrighted content, we constructed a custom dataset based on the Lord of the Rings corpus and aligned LLMs (OPT1.3b and OPT2.7b) \citet{zhang2022opt} through LoRA: Low-Rank Adaptation of Large Language Models \citet{DBLP:journals/corr/abs-2106-09685} finetuning. Subsequently, we employed gradient ascent to unlearn the Lord of the Rings content, resulting in a remarkable reduction in the presence of copyrighted material. To maintain a diverse knowledge base, we utilized the Book Corpus dataset. Additionally, we propose a new evaluation technique for assessing the effectiveness of harmful unlearning.
Updated: 2024-05-24 02:12:51
标题: 大型语言模型中的机器遗忘
摘要: 机器遗忘是人工智能领域内的一个新领域,专注于解决在机器学习模型中有选择性地遗忘或减少不良知识或行为的挑战,特别是在大型语言模型(LLMs)的背景下。本文介绍了一种方法论,通过利用梯度上升算法实现对LLMs(如Open Pre-trained Transformer语言模型)与伦理、隐私和安全标准的对齐,实现知识遗忘。我们的方法旨在有选择性地消除或修改LLMs中学到的信息,针对有害响应和受版权保护内容。本文提出了一个双管齐下的方法,通过解决有害响应和版权内容的问题,增强大型语言模型(LLMs)的伦理和安全行为。为了减轻有害响应,我们在PKU数据集上应用了梯度上升算法,实现了Open Pre-trained Transformer语言模型(OPT1.3b和OPT2.7b)\citet{zhang2022opt}有害响应减少75\%,同时利用TruthfulQA数据集\citet{DBLP:journals/corr/abs-2109-07958}保留了先前的知识。为了处理受版权保护的内容,我们基于《指环王》语料库构建了一个定制数据集,并通过LoRA(大型语言模型的低秩调整)\citet{DBLP:journals/corr/abs-2106-09685}微调来对齐LLMs(OPT1.3b和OPT2.7b)\citet{zhang2022opt}。随后,我们采用梯度上升算法来遗忘《指环王》内容,使受版权材料的存在大大减少。为了保持多样化的知识库,我们利用了Book Corpus数据集。此外,我们提出了一种用于评估有害遗忘效果的新评估技术。
更新时间: 2024-05-24 02:12:51
领域: cs.CL,cs.AI
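A minimal sketch of the gradient-ascent unlearning step described above: ascend on the forget batch while optionally descending on a retain batch to preserve prior knowledge. The 0.1 retain weight, the gradient clipping, and the combined objective are our assumptions, not the paper's exact recipe.

    # Hedged sketch of gradient-ascent unlearning (illustrative assumptions).
    import torch

    def unlearn_step(model, optimizer, forget_batch, retain_batch, loss_fn):
        optimizer.zero_grad()
        x_f, y_f = forget_batch
        x_r, y_r = retain_batch
        # negative sign = gradient ascent on the content to be forgotten
        loss = -loss_fn(model(x_f), y_f) + 0.1 * loss_fn(model(x_r), y_r)
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        optimizer.step()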
Enhancing Learning with Label Differential Privacy by Vector Approximation
Label differential privacy (DP) is a framework that protects the privacy of labels in training datasets, while the feature vectors are public. Existing approaches protect the privacy of labels by flipping them randomly, and then train a model to make the output approximate the privatized label. However, as the number of classes $K$ increases, stronger randomization is needed, thus the performances of these methods become significantly worse. In this paper, we propose a vector approximation approach, which is easy to implement and introduces little additional computational overhead. Instead of flipping each label into a single scalar, our method converts each label into a random vector with $K$ components, whose expectations reflect class conditional probabilities. Intuitively, vector approximation retains more information than scalar labels. A brief theoretical analysis shows that the performance of our method only decays slightly with $K$. Finally, we conduct experiments on both synthesized and real datasets, which validate our theoretical analysis as well as the practical performance of our method.
Updated: 2024-05-24 02:08:45
标题: 通过向量近似提高标签差分隐私的学习效果
摘要: 标签差分隐私(DP)是一种保护训练数据集中标签隐私的框架,同时特征向量是公开的。现有方法通过随机翻转标签来保护标签的隐私,然后训练模型使输出近似于私有化标签。然而,随着类别数K的增加,需要更强的随机化,因此这些方法的性能显著变差。在本文中,我们提出了一种向量逼近方法,易于实施并且引入了少量额外的计算开销。我们的方法不是将每个标签翻转成单个标量,而是将每个标签转换成具有K个分量的随机向量,其期望反映类别条件概率。直观地说,向量逼近保留了比标量标签更多的信息。简要的理论分析显示,我们的方法的性能随K值仅略微下降。最后,我们在合成和真实数据集上进行实验,验证了我们的理论分析以及我们方法的实际性能。
更新时间: 2024-05-24 02:08:45
领域: cs.LG
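One concrete way to realize "a random vector whose expectation reflects the class-conditional information" is a debiased randomized-response construction, sketched below. The flip probabilities p and q (which would set the privacy level) and this particular mechanism are illustrative assumptions, not necessarily the paper's.

    # Each label becomes a random K-vector whose expectation is exactly the
    # one-hot label: E[(b_k - q) / (p - q)] = 1 if k == y else 0.
    import numpy as np

    def privatize_label(y, K, p=0.8, q=0.2, rng=np.random.default_rng()):
        probs = np.full(K, q)
        probs[y] = p                          # the true class fires more often
        bits = rng.random(K) < probs          # independent noisy components
        return (bits.astype(float) - q) / (p - q)

    z = privatize_label(y=3, K=10)            # random vector with mean e_3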
Deep Activity Model: A Generative Approach for Human Mobility Pattern Synthesis
Human mobility significantly impacts various aspects of society, including transportation, urban planning, and public health. The increasing availability of diverse mobility data and advancements in deep learning have revolutionized mobility modeling. Existing deep learning models, however, mainly study spatio-temporal patterns using trajectories and often fall short in capturing the underlying semantic interdependency among activities. Moreover, they are also constrained by the data source. These two factors thereby limit their realism and adaptability, respectively. Meanwhile, traditional activity-based models (ABMs) in transportation modeling rely on rigid assumptions and are costly and time-consuming to calibrate, making them difficult to adapt and scale to new regions, especially those regions with limited amount of required conventional travel data. To address these limitations, we develop a novel generative deep learning approach for human mobility modeling and synthesis, using ubiquitous and open-source data. Additionally, the model can be fine-tuned with local data, enabling adaptable and accurate representations of mobility patterns across different regions. The model is evaluated on a nationwide dataset of the United States, where it demonstrates superior performance in generating activity chains that closely follow ground truth distributions. Further tests using state- or city-specific datasets from California, Washington, and Mexico City confirm its transferability. This innovative approach offers substantial potential to advance mobility modeling research, especially in generating human activity chains as input for downstream activity-based mobility simulation models and providing enhanced tools for urban planners and policymakers.
Updated: 2024-05-24 02:04:10
标题: 深层活动模型:一种用于人类移动模式合成的生成方法
摘要: 人类流动显著影响社会的各个方面,包括交通、城市规划和公共卫生。不断增加的多样化流动数据的可用性和深度学习的进展已经彻底改变了流动建模。然而,现有的深度学习模型主要使用轨迹研究时空模式,往往无法捕捉活动之间的基本语义相互依存关系。此外,它们还受数据来源的限制。这两个因素分别限制了它们的现实性和适应性。与此同时,在交通建模中,传统的基于活动的模型(ABMs)依赖于刚性假设,校准成本高且耗时,使其难以适应和扩展到新的地区,特别是那些所需常规出行数据有限的地区。为了解决这些限制,我们开发了一种新颖的生成式深度学习方法,利用普遍可得的开源数据进行人类流动建模和合成。此外,该模型可以通过本地数据进行微调,实现对不同地区流动模式的适应性和准确表示。该模型在美国的全国数据集上进行了评估,在生成紧密符合真实分布的活动链方面表现出卓越性能。使用来自加利福尼亚、华盛顿和墨西哥城的州或城市特定数据集进行的进一步测试证实了其可迁移性。这种创新方法为推进流动建模研究提供了巨大潜力,特别是在生成人类活动链作为下游基于活动的流动模拟模型的输入,以及为城市规划者和政策制定者提供增强工具方面。
更新时间: 2024-05-24 02:04:10
领域: cs.LG,cs.AI
CulturePark: Boosting Cross-cultural Understanding in Large Language Models
Cultural bias is pervasive in many large language models (LLMs), largely due to the deficiency of data representative of different cultures. Typically, cultural datasets and benchmarks are constructed either by extracting subsets of existing datasets or by aggregating from platforms such as Wikipedia and social media. However, these approaches are highly dependent on real-world data and human annotations, making them costly and difficult to scale. Inspired by cognitive theories on social communication, this paper introduces CulturePark, an LLM-powered multi-agent communication framework for cultural data collection. CulturePark simulates cross-cultural human communication with LLM-based agents playing roles in different cultures. It generates high-quality cross-cultural dialogues encapsulating human beliefs, norms, and customs. Using CulturePark, we generated 41,000 cultural samples to fine-tune eight culture-specific LLMs. We evaluated these models across three downstream tasks: content moderation, cultural alignment, and cultural education. Results show that for content moderation, our GPT-3.5-based models either match or outperform GPT-4 on datasets. Regarding cultural alignment, our models surpass GPT-4 on Hofstede's VSM 13 framework. Furthermore, for cultural education of human participants, our models demonstrate superior outcomes in both learning efficacy and user experience compared to GPT-4. CulturePark proves an important step in addressing cultural bias and advancing the democratization of AI, highlighting the critical role of culturally inclusive data in model training.
Updated: 2024-05-24 01:49:02
标题: 文化公园:在大型语言模型中促进跨文化理解
摘要: 文化偏见在许多大型语言模型(LLMs)中普遍存在,主要是由于代表不同文化的数据不足。通常,文化数据集和基准是通过从现有数据集中提取子集或从维基百科和社交媒体等平台聚合构建的。然而,这些方法高度依赖于现实世界数据和人类注释,使其成本高昂且难以扩展。受社会交流认知理论的启发,本文介绍了CulturePark,这是一个由LLM驱动的多代理通信框架,用于文化数据收集。CulturePark模拟跨文化人类交流,LLM代理扮演不同文化中的角色。它生成包含人类信仰、规范和习俗的高质量跨文化对话。使用CulturePark,我们生成了41,000个文化样本,以微调八个特定文化的LLM。我们在三个下游任务中评估了这些模型:内容管理、文化对齐和文化教育。结果显示,在内容管理方面,我们基于GPT-3.5的模型在数据集上要么与GPT-4相匹敌,要么表现更好。关于文化对齐,我们的模型在霍夫斯泰德的VSM 13框架上超越了GPT-4。此外,对于人类参与者的文化教育,我们的模型在学习效果和用户体验方面均表现出比GPT-4更优异的结果。CulturePark是解决文化偏见和推动人工智能民主化的重要一步,突出了在模型训练中文化包容性数据的关键作用。
更新时间: 2024-05-24 01:49:02
领域: cs.AI,cs.CL,cs.MA
Principled Penalty-based Methods for Bilevel Reinforcement Learning and RLHF
Bilevel optimization has been recently applied to many machine learning tasks. However, their applications have been restricted to the supervised learning setting, where static objective functions with benign structures are considered. But bilevel problems such as incentive design, inverse reinforcement learning (RL), and RL from human feedback (RLHF) are often modeled as dynamic objective functions that go beyond the simple static objective structures, which pose significant challenges of using existing bilevel solutions. To tackle this new class of bilevel problems, we introduce the first principled algorithmic framework for solving bilevel RL problems through the lens of penalty formulation. We provide theoretical studies of the problem landscape and its penalty-based (policy) gradient algorithms. We demonstrate the effectiveness of our algorithms via simulations in the Stackelberg Markov game, RL from human feedback and incentive design.
Updated: 2024-05-24 01:47:54
标题: 基于原则的惩罚方法用于双层强化学习和RLHF
摘要: 双层优化最近被应用于许多机器学习任务中。然而,它们的应用一直局限于监督学习设置,其中考虑的是具有良性结构的静态目标函数。但是,激励设计、逆强化学习(RL)和基于人类反馈的RL(RLHF)等双层问题通常被建模为超越简单静态目标结构的动态目标函数,这给使用现有双层解决方案带来了重大挑战。为了解决这类新的双层问题,我们从惩罚形式化的视角引入了第一个求解双层RL问题的原则性算法框架。我们对问题的优化景观及其基于惩罚的(策略)梯度算法进行了理论研究。通过在Stackelberg马尔可夫博弈、基于人类反馈的RL和激励设计中进行模拟,我们展示了我们算法的有效性。
更新时间: 2024-05-24 01:47:54
领域: cs.LG,math.OC,stat.ML
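For orientation, the generic penalty reformulation that this line of work builds on (our notation, not necessarily the authors') turns the nested problem into a single-level one:

    \min_{x,\, y} \; f(x, y) \;+\; \lambda \Big( g(x, y) \;-\; \min_{y'} g(x, y') \Big), \qquad \lambda > 0,

where f is the upper-level (leader) objective, g is the lower-level (follower) objective, and the bracketed term vanishes exactly when y is lower-level optimal; as \lambda grows, stationary points of the penalized problem approach those of the original bilevel problem.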
Intelligent Go-Explore: Standing on the Shoulders of Giant Foundation Models
Go-Explore is a powerful family of algorithms designed to solve hard-exploration problems, built on the principle of archiving discovered states, and iteratively returning to and exploring from the most promising states. This approach has led to superhuman performance across a wide variety of challenging problems including Atari games and robotic control, but requires manually designing heuristics to guide exploration, which is time-consuming and infeasible in general. To resolve this, we propose Intelligent Go-Explore (IGE) which greatly extends the scope of the original Go-Explore by replacing these heuristics with the intelligence and internalized human notions of interestingness captured by giant foundation models (FMs). This provides IGE with a human-like ability to instinctively identify how interesting or promising any new state is (e.g. discovering new objects, locations, or behaviors), even in complex environments where heuristics are hard to define. Moreover, IGE offers the exciting and previously impossible opportunity to recognize and capitalize on serendipitous discoveries that cannot be predicted ahead of time. We evaluate IGE on a range of language-based tasks that require search and exploration. In Game of 24, a multistep mathematical reasoning problem, IGE reaches 100% success rate 70.8% faster than the best classic graph search baseline. Next, in BabyAI-Text, a challenging partially observable gridworld, IGE exceeds the previous SOTA with orders of magnitude fewer online samples. Finally, in TextWorld, we show the unique ability of IGE to succeed in settings requiring long-horizon exploration where prior SOTA FM agents like Reflexion completely fail. Overall, IGE combines the tremendous strengths of FMs and the powerful Go-Explore algorithm, opening up a new frontier of research into creating more generally capable agents with impressive exploration capabilities.
Updated: 2024-05-24 01:45:27
标题: 智能 Go-Explore:站在巨大基础模型的肩膀上
摘要: Go-Explore是一个强大的算法家族,旨在解决难以探索的问题,其建立在存档已发现状态并迭代返回和探索最有前途状态的原则上。这种方法已经在包括Atari游戏和机器人控制在内的各种具有挑战性的问题上取得了超越人类的表现,但需要手动设计启发式来引导探索,这在一般情况下是耗时且不可行的。为了解决这个问题,我们提出了智能Go-Explore(IGE),它用巨型基础模型(FMs)所捕捉的智能及其内化的人类“有趣性”概念取代这些启发式,从而大大扩展了原始Go-Explore的范围。这为IGE提供了类似于人类的能力,可以本能地识别任何新状态的有趣或有前途之处(例如,发现新对象、位置或行为),即使在难以定义启发式的复杂环境中也是如此。此外,IGE还提供了令人兴奋且以前不可能的机会,即识别和利用无法提前预测的偶然发现。我们在一系列需要搜索和探索的基于语言的任务中评估了IGE的性能。在多步数学推理问题24点游戏中,IGE达到100%的成功率,比最佳经典图搜索基线快70.8%。接下来,在具有挑战性的部分可观察网格世界BabyAI-Text中,IGE以少几个数量级的在线样本超越了此前的SOTA。最后,在TextWorld中,我们展示了IGE在需要长程探索的环境中取得成功的独特能力,而此前的SOTA基础模型代理(如Reflexion)在这些环境中完全失败。总的来说,IGE结合了FMs的巨大优势和强大的Go-Explore算法,开辟了一个新的研究前沿,旨在创造具有出色探索能力、更加通用的代理。
更新时间: 2024-05-24 01:45:27
领域: cs.LG,cs.AI,cs.CL
Tackling Prevalent Conditions in Unsupervised Combinatorial Optimization: Cardinality, Minimum, Covering, and More
Combinatorial optimization (CO) is naturally discrete, making machine learning based on differentiable optimization inapplicable. Karalias & Loukas (2020) adapted the probabilistic method to incorporate CO into differentiable optimization. Their work ignited the research on unsupervised learning for CO, composed of two main components: probabilistic objectives and derandomization. However, each component confronts unique challenges. First, deriving objectives under various conditions (e.g., cardinality constraints and minimum) is nontrivial. Second, the derandomization process is underexplored, and the existing derandomization methods are either random sampling or naive rounding. In this work, we aim to tackle prevalent (i.e., commonly involved) conditions in unsupervised CO. First, we concretize the targets for objective construction and derandomization with theoretical justification. Then, for various conditions commonly involved in different CO problems, we derive nontrivial objectives and derandomization to meet the targets. Finally, we apply the derivations to various CO problems. Via extensive experiments on synthetic and real-world graphs, we validate the correctness of our derivations and show our empirical superiority w.r.t. both optimization quality and speed.
Updated: 2024-05-24 01:44:42
标题: 解决无监督组合优化中的普遍条件:基数、最小值、覆盖等问题
摘要: 组合优化(CO)在本质上是离散的,这使得基于可微优化的机器学习无法应用。Karalias & Loukas (2020) 将概率方法加以改造,使CO能够纳入可微优化。他们的工作引发了关于CO的无监督学习的研究,由两个主要组成部分组成:概率目标和去随机化。然而,每个组成部分都面临着独特的挑战。首先,在各种条件下(例如,基数约束和最小值)推导目标是非平凡的。其次,去随机化过程尚未被充分探讨,现有的去随机化方法要么是随机抽样,要么是朴素舍入。在这项工作中,我们旨在解决无监督CO中普遍涉及的条件。首先,我们在理论依据下具体化了目标构建与去随机化所要达成的目标。然后,针对不同CO问题中常见的各种条件,我们推导出非平凡的目标和去随机化以满足这些目标。最后,我们将推导应用于各种CO问题。通过在合成和真实世界图上进行大量实验,我们验证了我们推导的正确性,并展示了我们在优化质量和速度方面的经验优势。
更新时间: 2024-05-24 01:44:42
领域: cs.LG,math.OC
Bayesian Vector AutoRegression with Factorised Granger-Causal Graphs
We study the problem of automatically discovering Granger causal relations from observational multivariate time-series data. Vector autoregressive (VAR) models have been time-tested for this problem, including Bayesian variants and more recent developments using deep neural networks. Most existing VAR methods for Granger causality use sparsity-inducing penalties/priors or post-hoc thresholds to interpret their coefficients as Granger causal graphs. Instead, we propose a new Bayesian VAR model with a hierarchical factorised prior distribution over binary Granger causal graphs, separately from the VAR coefficients. We develop an efficient algorithm to infer the posterior over binary Granger causal graphs. Comprehensive experiments on synthetic, semi-synthetic, and climate data show that our method is more uncertainty aware, has fewer hyperparameters, and achieves better performance than competing approaches, especially in low-data regimes where there are fewer observations.
Updated: 2024-05-24 01:40:45
标题: 具有因子化Granger因果图的贝叶斯向量自回归模型
摘要: 我们研究从观测到的多变量时间序列数据中自动发现Granger因果关系的问题。向量自回归(VAR)模型在这一问题上久经考验,包括贝叶斯变体和使用深度神经网络的最新进展。大多数现有的用于Granger因果关系的VAR方法使用诱导稀疏性的惩罚/先验或事后阈值,将其系数解释为Granger因果图。与之不同,我们提出了一种新的贝叶斯VAR模型,其对二值Granger因果图具有层次化的因子化先验分布,并与VAR系数分离。我们开发了一种高效的算法来推断二值Granger因果图的后验分布。在合成、半合成和气候数据上的全面实验表明,我们的方法对不确定性更敏感,超参数更少,并且性能优于竞争方法,尤其是在观测较少的低数据情形下。
更新时间: 2024-05-24 01:40:45
领域: cs.LG,stat.ML
PDE Control Gym: A Benchmark for Data-Driven Boundary Control of Partial Differential Equations
Over the last decade, data-driven methods have surged in popularity, emerging as valuable tools for control theory. As such, neural network approximations of control feedback laws, system dynamics, and even Lyapunov functions have attracted growing attention. With the ascent of learning based control, the need for accurate, fast, and easy-to-use benchmarks has increased. In this work, we present the first learning-based environment for boundary control of PDEs. In our benchmark, we introduce three foundational PDE problems - a 1D transport PDE, a 1D reaction-diffusion PDE, and a 2D Navier-Stokes PDE - whose solvers are bundled in an user-friendly reinforcement learning gym. With this gym, we then present the first set of model-free, reinforcement learning algorithms for solving this series of benchmark problems, achieving stability, although at a higher cost compared to model-based PDE backstepping. With the set of benchmark environments and detailed examples, this work significantly lowers the barrier to entry for learning-based PDE control - a topic largely unexplored by the data-driven control community. The entire benchmark is available on Github along with detailed documentation and the presented reinforcement learning models are open sourced.
Updated: 2024-05-24 01:40:41
标题: PDE Control Gym:面向偏微分方程边界控制的数据驱动基准
摘要: 在过去的十年中,数据驱动方法的流行度迅速增长,成为控制理论中宝贵的工具。因此,控制反馈律、系统动力学甚至Lyapunov函数的神经网络逼近引起了越来越多的关注。随着基于学习的控制的崛起,对准确、快速和易于使用的基准的需求也在增加。在这项工作中,我们提出了第一个用于PDE边界控制的基于学习的环境。在我们的基准中,我们引入了三个基础PDE问题:一维输运PDE、一维反应扩散PDE和二维Navier-Stokes PDE,它们的求解器被封装在一个用户友好的强化学习gym环境中。借助这个gym环境,我们随后提出了第一组用于求解这一系列基准问题的无模型强化学习算法,实现了稳定性,尽管与基于模型的PDE反步法相比成本更高。通过这组基准环境和详细示例,这项工作显著降低了基于学习的PDE控制的入门门槛,而这是数据驱动控制社区在很大程度上尚未探索的主题。整个基准连同详细文档一起发布在GitHub上,文中提出的强化学习模型也已开源。
更新时间: 2024-05-24 01:40:41
领域: eess.SY,cs.AI,cs.CE,cs.LG,cs.SY,math.OC
Better Membership Inference Privacy Measurement through Discrepancy
Membership Inference Attacks have emerged as a dominant method for empirically measuring privacy leakage from machine learning models. Here, privacy is measured by the {\em{advantage}} or gap between a score or a function computed on the training and the test data. A major barrier to the practical deployment of these attacks is that they do not scale to large well-generalized models -- either the advantage is relatively low, or the attack involves training multiple models which is highly compute-intensive. In this work, inspired by discrepancy theory, we propose a new empirical privacy metric that is an upper bound on the advantage of a family of membership inference attacks. We show that this metric does not involve training multiple models, can be applied to large Imagenet classification models in-the-wild, and has higher advantage than existing metrics on models trained with more recent and sophisticated training recipes. Motivated by our empirical results, we also propose new membership inference attacks tailored to these training losses.
Updated: 2024-05-24 01:33:22
标题: 更好的成员推理隐私测量方法:通过差异性
摘要: 成员推理攻击已经成为实证衡量机器学习模型隐私泄漏的主要方法。在这里,隐私是通过在训练数据和测试数据上计算的分数或函数之间的优势(advantage)或差距来衡量的。这些攻击实际部署的一个主要障碍是它们无法扩展到大型且泛化良好的模型:要么优势相对较低,要么攻击涉及训练多个模型,计算量极大。在这项工作中,受差异(discrepancy)理论的启发,我们提出了一个新的经验隐私度量,它是一族成员推理攻击的优势的上界。我们展示了这个度量不涉及训练多个模型,可以应用于实际环境中的大规模Imagenet分类模型,并且在用较新、较复杂的训练配方训练的模型上比现有度量具有更高的优势。受到我们实证结果的启发,我们还提出了针对这些训练损失量身定制的新成员推理攻击。
更新时间: 2024-05-24 01:33:22
领域: cs.LG
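To ground the "advantage" terminology above, here is a generic numpy illustration: for a fixed score (e.g. negative per-example loss), the best threshold attack's advantage equals the largest gap between the member and non-member score distributions, a Kolmogorov-Smirnov-type quantity. This illustrates the measured quantity only; it is not the paper's proposed discrepancy-based bound.

    # Best-threshold membership-inference advantage (generic illustration).
    import numpy as np

    def threshold_advantage(train_scores, test_scores):
        thresholds = np.union1d(train_scores, test_scores)
        adv = 0.0
        for t in thresholds:
            tpr = np.mean(train_scores >= t)   # members flagged as members
            fpr = np.mean(test_scores >= t)    # non-members flagged as members
            adv = max(adv, tpr - fpr)
        return adv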
An Approximate Dynamic Programming Framework for Occlusion-Robust Multi-Object Tracking
In this work, we consider data association problems involving multi-object tracking (MOT). In particular, we address the challenges arising from object occlusions. We propose a framework called approximate dynamic programming track (ADPTrack), which applies dynamic programming principles to improve an existing method called the base heuristic. Given a set of tracks and the next target frame, the base heuristic extends the tracks by matching them to the objects of this target frame directly. In contrast, ADPTrack first processes a few subsequent frames and applies the base heuristic starting from the next target frame to obtain tentative tracks. It then leverages the tentative tracks to match the objects of the target frame. This tends to reduce the occlusion-based errors and leads to an improvement over the base heuristic. When tested on the MOT17 video dataset, the proposed method demonstrates a 0.7% improvement in the association accuracy (IDF1 metric) over a state-of-the-art method that is used as the base heuristic. It also obtains improvements with respect to all the other standard metrics. Empirically, we found that the improvements are particularly pronounced in scenarios where the video data is obtained by fixed-position cameras.
Updated: 2024-05-24 01:27:14
标题: 一个用于抗遮挡的多目标跟踪的近似动态规划框架
摘要: 在这项工作中,我们考虑涉及多目标跟踪(MOT)的数据关联问题,特别是解决由目标遮挡带来的挑战。我们提出了一个名为近似动态规划跟踪(ADPTrack)的框架,该框架应用动态规划原理来改进一种被称为基础启发式(base heuristic)的现有方法。给定一组轨迹和下一个目标帧,基础启发式通过直接将这些轨迹与该目标帧的对象进行匹配来扩展轨迹。相比之下,ADPTrack首先处理随后的若干帧,并从下一个目标帧开始应用基础启发式以获得暂定轨迹,然后利用这些暂定轨迹来匹配目标帧的对象。这往往能减少由遮挡引起的错误,从而相对基础启发式有所改进。在MOT17视频数据集上测试时,所提出的方法在关联准确性(IDF1指标)上比用作基础启发式的最先进方法提高了0.7%,并且在所有其他标准指标上也有改进。经验上,我们发现在视频数据由固定位置摄像机获取的场景中,这些改进尤为显著。
更新时间: 2024-05-24 01:27:14
领域: cs.CV,cs.AI
Exploring the Evolution of Hidden Activations with Live-Update Visualization
Monitoring the training of neural networks is essential for identifying potential data anomalies, enabling timely interventions and conserving significant computational resources. Apart from the commonly used metrics such as losses and validation accuracies, the hidden representation could give more insight into the model progression. To this end, we introduce SentryCam, an automated, real-time visualization tool that reveals the progression of hidden representations during training. Our results show that this visualization offers a more comprehensive view of the learning dynamics compared to basic metrics such as loss and accuracy over various datasets. Furthermore, we show that SentryCam could facilitate detailed analysis such as task transfer and catastrophic forgetting to a continual learning setting. The code is available at https://github.com/xianglinyang/SentryCam.
Updated: 2024-05-24 01:23:20
标题: 探究隐含激活的演变:通过实时更新可视化进行研究
摘要: 监控神经网络的训练对于识别潜在的数据异常、及时干预和节约大量计算资源至关重要。除了常用的指标如损失和验证准确性外,隐藏表示可以提供更多关于模型进展的见解。为此,我们引入了SentryCam,一种自动化、实时的可视化工具,可以展示训练过程中隐藏表示的进展。我们的结果表明,与基本指标如损失和准确性相比,这种可视化提供了更全面的学习动态视图,适用于各种数据集。此外,我们展示了SentryCam可以促进诸如任务转移和灾难性遗忘等详细分析,将其应用于持续学习设置。代码可在https://github.com/xianglinyang/SentryCam 上获得。
更新时间: 2024-05-24 01:23:20
领域: cs.LG
Parsimony or Capability? Decomposition Delivers Both in Long-term Time Series Forecasting
Long-term time series forecasting (LTSF) represents a critical frontier in time series analysis, characterized by extensive input sequences, as opposed to the shorter spans typical of traditional approaches. While longer sequences inherently offer richer information for enhanced predictive precision, prevailing studies often respond by escalating model complexity. These intricate models can inflate into millions of parameters, resulting in prohibitive parameter scales. Our study demonstrates, through both analytical and empirical evidence, that decomposition is key to containing excessive model inflation while achieving uniformly superior and robust results across various datasets. Remarkably, by tailoring decomposition to the intrinsic dynamics of time series data, our proposed model outperforms existing benchmarks, using over 99 \% fewer parameters than the majority of competing methods. Through this work, we aim to unleash the power of a restricted set of parameters by capitalizing on domain characteristics--a timely reminder that in the realm of LTSF, bigger is not invariably better.
Updated: 2024-05-24 01:17:03
标题: “简洁还是能力?分解方法在长期时间序列预测中同时实现两者”
摘要: 长期时间序列预测(LTSF)代表了时间序列分析中的一个关键前沿,其特点是输入序列很长,而传统方法通常只处理较短的时间跨度。尽管较长的序列本身为提升预测精度提供了更丰富的信息,但现有研究往往以提高模型复杂度作为回应。这些复杂的模型可能膨胀到数百万个参数,导致参数规模过大。我们的研究通过分析和实证证据表明,分解是遏制模型过度膨胀的关键,同时能在各种数据集上取得一致更优且稳健的结果。值得注意的是,通过针对时间序列数据的固有动态来定制分解,我们提出的模型超越了现有基准,所用参数比大多数竞争方法少99%以上。通过这项工作,我们旨在借助领域特征释放一组受限参数的力量,这也是一个及时的提醒:在LTSF领域,并非越大就一定越好。
更新时间: 2024-05-24 01:17:03
领域: cs.LG
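A minimal decomposition-based forecaster in the spirit of the abstract above, written as a DLinear-style PyTorch sketch: a moving-average trend plus the residual seasonal part, each mapped to the horizon by its own linear layer. The kernel size and the exact split are assumptions; the paper's model may decompose differently.

    # DLinear-style decomposition forecaster (illustrative, not the paper's).
    import torch
    import torch.nn as nn

    class DecompForecast(nn.Module):
        def __init__(self, lookback, horizon, kernel=25):
            super().__init__()
            self.avg = nn.AvgPool1d(kernel, stride=1, padding=kernel // 2,
                                    count_include_pad=False)
            self.trend_head = nn.Linear(lookback, horizon)
            self.season_head = nn.Linear(lookback, horizon)

        def forward(self, x):                 # x: (batch, lookback)
            trend = self.avg(x.unsqueeze(1)).squeeze(1)[..., :x.size(-1)]
            season = x - trend                # residual after the moving average
            return self.trend_head(trend) + self.season_head(season)

    y = DecompForecast(lookback=96, horizon=24)(torch.randn(8, 96))

Note the tiny parameter count: two linear maps per component, rather than a deep sequence model.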
Beyond the noise: intrinsic dimension estimation with optimal neighbourhood identification
The Intrinsic Dimension (ID) is a key concept in unsupervised learning and feature selection, as it is a lower bound to the number of variables which are necessary to describe a system. However, in almost any real-world dataset the ID depends on the scale at which the data are analysed. Quite typically at a small scale, the ID is very large, as the data are affected by measurement errors. At large scale, the ID can also be erroneously large, due to the curvature and the topology of the manifold containing the data. In this work, we introduce an automatic protocol to select the sweet spot, namely the correct range of scales in which the ID is meaningful and useful. This protocol is based on imposing that for distances smaller than the correct scale the density of the data is constant. Since to estimate the density it is necessary to know the ID, this condition is imposed self-consistently. We illustrate the usefulness and robustness of this procedure by benchmarks on artificial and real-world datasets.
Updated: 2024-05-24 01:08:05
标题: 超越噪音:通过最佳邻域识别进行内在维度估计
摘要: 内在维度(ID)是无监督学习和特征选择中的关键概念,因为它是描述一个系统所需变量数量的下界。然而,在几乎所有真实世界的数据集中,ID都取决于数据分析的尺度。通常在小尺度下,由于数据受到测量误差的影响,ID会非常大。在大尺度下,由于包含数据的流形的曲率和拓扑结构,ID也可能被错误地估计得很大。在这项工作中,我们引入了一个自动化协议来选择最佳尺度区间(sweet spot),即ID有意义且有用的正确尺度范围。该协议基于如下要求:对于小于正确尺度的距离,数据的密度应是恒定的。由于估计密度需要知道ID,这一条件是以自洽的方式施加的。通过在人工和真实数据集上的基准测试,我们展示了该方法的实用性和稳健性。
更新时间: 2024-05-24 01:08:05
领域: stat.ML,cs.LG,math.ST,stat.CO,stat.ME,stat.TH
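For context, a compact sketch of the classic two-nearest-neighbour (TwoNN) ID estimator that this kind of scale analysis typically builds on: under locally constant density, the ratio mu = r2/r1 of each point's two nearest-neighbour distances yields the maximum-likelihood estimate d = N / sum_i log(mu_i). Using TwoNN here is our illustrative assumption; the paper's estimator and protocol may differ.

    # TwoNN intrinsic-dimension estimate (Facco et al.-style MLE).
    import numpy as np
    from scipy.spatial import cKDTree

    def twonn_id(X):
        dist, _ = cKDTree(X).query(X, k=3)   # self, 1st and 2nd neighbours
        mu = dist[:, 2] / dist[:, 1]         # ratio of the two NN distances
        return len(X) / np.sum(np.log(mu))

    X = np.random.default_rng(0).normal(size=(2000, 3))  # true ID = 3
    print(twonn_id(X))                                   # ~3 up to noise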
DEBATE: Devil's Advocate-Based Assessment and Text Evaluation
As natural language generation (NLG) models have become prevalent, systematically assessing the quality of machine-generated texts has become increasingly important. Recent studies introduce LLM-based evaluators that operate as reference-free metrics, demonstrating their capability to adeptly handle novel tasks. However, these models generally rely on a single-agent approach, which, we argue, introduces an inherent limit to their performance. This is because there exist biases in LLM agent's responses, including preferences for certain text structure or content. In this work, we propose DEBATE, an NLG evaluation framework based on multi-agent scoring system augmented with a concept of Devil's Advocate. Within the framework, one agent is instructed to criticize other agents' arguments, potentially resolving the bias in LLM agent's answers. DEBATE substantially outperforms the previous state-of-the-art methods in two meta-evaluation benchmarks in NLG evaluation, SummEval and TopicalChat. We also show that the extensiveness of debates among agents and the persona of an agent can influence the performance of evaluators.
Updated: 2024-05-24 01:06:41
标题: DEBATE:基于魔鬼辩护人的评估与文本评价
摘要: 随着自然语言生成(NLG)模型变得普遍,系统评估机器生成文本的质量变得日益重要。最近的研究引入了基于LLM的评估器,作为无参考度量,展示了它们处理新任务的能力。然而,这些模型通常依赖于单一代理方法,我们认为,这会限制它们的性能。这是因为LLM代理的回应中存在偏见,包括对特定文本结构或内容的偏好。在这项工作中,我们提出了一个基于多代理评分系统的NLG评估框架DEBATE,增加了“魔鬼的辩护人”概念。在这个框架内,一个代理被指示批评其他代理的论点,可能解决LLM代理回答中的偏见。DEBATE在NLG评估的两个元评估基准SummEval和TopicalChat中显著优于先前的最先进方法。我们还展示了代理之间辩论的广泛程度和代理的个性可以影响评估器的性能。
更新时间: 2024-05-24 01:06:41
领域: cs.CL,cs.AI
OptLLM: Optimal Assignment of Queries to Large Language Models
Large Language Models (LLMs) have garnered considerable attention owing to their remarkable capabilities, leading to an increasing number of companies offering LLMs as services. Different LLMs achieve different performance at different costs. A challenge for users lies in choosing the LLMs that best fit their needs, balancing cost and performance. In this paper, we propose a framework for addressing the cost-effective query allocation problem for LLMs. Given a set of input queries and candidate LLMs, our framework, named OptLLM, provides users with a range of optimal solutions to choose from, aligning with their budget constraints and performance preferences, including options for maximizing accuracy and minimizing cost. OptLLM predicts the performance of candidate LLMs on each query using a multi-label classification model with uncertainty estimation and then iteratively generates a set of non-dominated solutions by destructing and reconstructing the current solution. To evaluate the effectiveness of OptLLM, we conduct extensive experiments on various types of tasks, including text classification, question answering, sentiment analysis, reasoning, and log parsing. Our experimental results demonstrate that OptLLM substantially reduces costs by 2.40% to 49.18% while achieving the same accuracy as the best LLM. Compared to other multi-objective optimization algorithms, OptLLM improves accuracy by 2.94% to 69.05% at the same cost or saves costs by 8.79% and 95.87% while maintaining the highest attainable accuracy.
Updated: 2024-05-24 01:05:37
标题: OptLLM:将查询最优分配到大型语言模型
摘要: 大型语言模型(LLMs)由于其卓越的能力而引起了相当大的关注,导致越来越多的公司将LLMs作为服务提供。不同的LLMs在不同的成本下实现了不同的性能。用户面临的挑战在于选择最适合他们需求的LLMs,平衡成本和性能。在本文中,我们提出了一个框架来解决LLMs的成本效益查询分配问题。给定一组输入查询和候选LLMs,我们的框架,命名为OptLLM,为用户提供了一系列最佳解决方案可供选择,符合其预算限制和性能偏好,包括最大化准确性和最小化成本的选项。OptLLM使用具有不确定性估计的多标签分类模型来预测每个查询上候选LLMs的性能,然后通过破坏和重建当前解决方案来迭代生成一组非支配解。为了评估OptLLM的有效性,我们在各种类型的任务上进行了广泛的实验,包括文本分类、问答、情感分析、推理和日志解析。我们的实验结果表明,OptLLM在保持与最佳LLM相同准确性的同时,将成本大幅降低了2.40%至49.18%。与其他多目标优化算法相比,OptLLM在相同成本下提高了2.94%至69.05%的准确性,或者在保持最高可达准确性的同时节省了8.79%至95.87%的成本。
更新时间: 2024-05-24 01:05:37
领域: cs.SE,cs.CL,cs.LG
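The "set of non-dominated solutions" above is a Pareto front over (cost, predicted accuracy); below is a tiny sketch of the filtering step. The candidate list stands in for OptLLM's destruct-and-reconstruct search, which is not reproduced here.

    # Keep the (cost, accuracy) allocations no other allocation beats on both
    # axes: lower cost and higher accuracy.
    def pareto_front(candidates):
        """candidates: list of (cost, accuracy) tuples."""
        front = []
        for c, a in candidates:
            dominated = any(c2 <= c and a2 >= a and (c2, a2) != (c, a)
                            for c2, a2 in candidates)
            if not dominated:
                front.append((c, a))
        return sorted(front)

    print(pareto_front([(1.0, 0.70), (2.0, 0.72), (1.5, 0.69), (3.0, 0.80)]))
    # -> [(1.0, 0.70), (2.0, 0.72), (3.0, 0.80)]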
Executable Code Actions Elicit Better LLM Agents
Large Language Model (LLM) agents, capable of performing a broad range of actions, such as invoking tools and controlling robots, show great potential in tackling real-world challenges. LLM agents are typically prompted to produce actions by generating JSON or text in a pre-defined format, which is usually limited by constrained action space (e.g., the scope of pre-defined tools) and restricted flexibility (e.g., inability to compose multiple tools). This work proposes to use executable Python code to consolidate LLM agents' actions into a unified action space (CodeAct). Integrated with a Python interpreter, CodeAct can execute code actions and dynamically revise prior actions or emit new actions upon new observations through multi-turn interactions. Our extensive analysis of 17 LLMs on API-Bank and a newly curated benchmark shows that CodeAct outperforms widely used alternatives (up to 20% higher success rate). The encouraging performance of CodeAct motivates us to build an open-source LLM agent that interacts with environments by executing interpretable code and collaborates with users using natural language. To this end, we collect an instruction-tuning dataset CodeActInstruct that consists of 7k multi-turn interactions using CodeAct. We show that it can be used with existing data to improve models in agent-oriented tasks without compromising their general capability. CodeActAgent, finetuned from Llama2 and Mistral, is integrated with Python interpreter and uniquely tailored to perform sophisticated tasks (e.g., model training) using existing libraries and autonomously self-debug.
Updated: 2024-05-24 01:05:14
标题: 可执行代码操作引发更好的LLM代理
摘要: 大型语言模型(LLM)代理能够执行广泛的动作,例如调用工具和控制机器人,在解决现实世界挑战方面表现出巨大潜力。LLM代理通常通过以预定义格式生成JSON或文本来产生动作,这通常受限于受约束的动作空间(例如,预定义工具的范围)和受限的灵活性(例如,无法组合多个工具)。本文提出使用可执行的Python代码将LLM代理的动作整合到统一的动作空间中(CodeAct)。与Python解释器集成后,CodeAct可以执行代码动作,并通过多轮交互在获得新观察时动态修订先前的动作或发出新的动作。我们在API-Bank和一个新整理的基准上对17个LLM进行了广泛分析,结果表明CodeAct优于广泛使用的替代方案(成功率最高高出20%)。CodeAct令人鼓舞的表现激励我们构建一个开源的LLM代理,它通过执行可解释的代码与环境交互,并使用自然语言与用户协作。为此,我们收集了一个指令微调数据集CodeActInstruct,包含7,000个使用CodeAct的多轮交互。我们展示了它可以与现有数据一起使用,以提升模型在面向代理的任务上的表现,而不损害其通用能力。CodeActAgent基于Llama2和Mistral微调而成,集成了Python解释器,并经过专门定制,能够利用现有库执行复杂任务(例如模型训练)并自主进行自我调试。
更新时间: 2024-05-24 01:05:14
领域: cs.CL,cs.AI
Benchmarking Hierarchical Image Pyramid Transformer for the classification of colon biopsies and polyps in histopathology images
Training neural networks with high-quality pixel-level annotation in histopathology whole-slide images (WSI) is an expensive process due to the gigapixel resolution of WSIs. However, recent advances in self-supervised learning have shown that highly descriptive image representations can be learned without the need for annotations. We investigate the application of the recent Hierarchical Image Pyramid Transformer (HIPT) model to the specific task of classification of colorectal biopsies and polyps. After evaluating the effectiveness of TCGA-learned features in the original HIPT model, we incorporate colon biopsy image information into HIPT's pretraining using two distinct strategies: (1) fine-tuning HIPT from the existing TCGA weights and (2) pretraining HIPT from random weight initialization. We compare the performance of these pretraining regimes on two colorectal biopsy classification tasks: binary and multiclass classification.
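The two pretraining regimes amount to a different initialization before training on colon biopsy data, sketched below with a stand-in backbone; the checkpoint path hipt_tcga.pt is hypothetical, since the real HIPT is a hierarchical ViT whose TCGA weights ship separately.

```python
import os
import torch
import torch.nn as nn

# Stand-in for the HIPT backbone; the real model is a hierarchical ViT.
backbone = nn.Sequential(nn.Linear(384, 192), nn.GELU(), nn.Linear(192, 2))

def initialize(model, tcga_ckpt="hipt_tcga.pt", from_tcga=True):
    if from_tcga and os.path.exists(tcga_ckpt):
        # Regime 1: start from TCGA-pretrained weights, then fine-tune
        # on colon biopsy images.
        model.load_state_dict(torch.load(tcga_ckpt))
    else:
        # Regime 2: random initialization, pretraining on colon data only.
        for m in model.modules():
            if isinstance(m, nn.Linear):
                nn.init.trunc_normal_(m.weight, std=0.02)
                nn.init.zeros_(m.bias)
    return model

model = initialize(backbone, from_tcga=False)  # random-init regime
```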
Updated: 2024-05-24 00:59:30
标题: 用于组织病理学图像中结肠活检和息肉分类的分层图像金字塔变换器基准测试
摘要: 在组织病理学全切片图像中,使用高质量的像素级标注来训练神经网络是一项昂贵的过程,因为全切片图像具有千兆像素的分辨率。然而,最近自监督学习的进展表明,可以学习到高度描述性的图像表示,而无需注释。我们研究了最近的分层图像金字塔变换器(HIPT)模型在结直肠活检和息肉分类特定任务中的应用。在评估原始HIPT模型中TCGA学习特征的有效性后,我们使用两种不同的策略将结肠活检图像信息引入HIPT的预训练中:(1)从现有的TCGA权重微调HIPT,(2)从随机权重初始化预训练HIPT。我们比较这些预训练方案在两个结直肠活检分类任务上的性能:二元分类和多类分类。
更新时间: 2024-05-24 00:59:30
领域: eess.IV,cs.AI,cs.CV
Scaling Law for Time Series Forecasting
A scaling law that rewards large datasets, complex models, and enhanced data granularity has been observed in various fields of deep learning. Yet, studies on time series forecasting have cast doubt on the scaling behaviors of deep learning methods in this domain: while more training data improves performance, more capable models do not always outperform less capable models, and longer input horizons may hurt performance for some models. We propose a theory of the scaling law for time series forecasting that can explain these seemingly abnormal behaviors. We take into account the impact of dataset size and model complexity, as well as time series data granularity, particularly focusing on the look-back horizon, an aspect that has been unexplored in previous theories. Furthermore, we empirically evaluate various models using a diverse set of time series forecasting datasets, which (1) verifies the validity of the scaling law on dataset size and model complexity within the realm of time series forecasting, and (2) validates our theoretical framework, particularly regarding the influence of the look-back horizon. We hope our findings may inspire new models targeting time series forecasting datasets of limited size, as well as large foundational datasets and models for time series forecasting in future works.\footnote{Code for our experiments will be made public at: \url{https://github.com/JingzheShi/ScalingLawForTimeSeriesForecasting}.}
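As a rough illustration of the quantity being theorized about, the snippet below fits a saturating power law $L(D) = E + B D^{-\alpha}$ to synthetic loss-versus-dataset-size points; this functional form is a common assumption in scaling-law studies, not a formula quoted from the paper.

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(D, E, B, alpha):
    # Loss as a saturating power law of dataset size D.
    return E + B * np.power(D, -alpha)

D = np.logspace(3, 7, 12)                            # dataset sizes
rng = np.random.default_rng(0)
loss = power_law(D, 0.05, 30.0, 0.4) * rng.normal(1.0, 0.02, D.size)

(E, B, alpha), _ = curve_fit(power_law, D, loss,
                             p0=(0.1, 10.0, 0.5), maxfev=10000)
print(f"irreducible loss E={E:.3f}, exponent alpha={alpha:.3f}")
```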
Updated: 2024-05-24 00:46:27
标题: 时间序列预测的缩放定律
摘要: 在深度学习的各个领域中观察到了一种奖励大型数据集、复杂模型和增强数据粒度的标度律。然而,关于时间序列预测的研究对于深度学习方法在时间序列预测中的标度行为产生了怀疑:虽然更多的训练数据会提高性能,但更强大的模型并不总是能胜过性能较差的模型,并且对于一些模型来说,更长的输入视野可能会损害性能。我们提出了一种适用于时间序列预测的标度律理论,可以解释这些表面上异常的行为。我们考虑了数据集大小和模型复杂性的影响,以及时间序列数据的粒度,特别关注了回顾视野这一方面,这在先前的理论中尚未被探索。此外,我们通过使用多样化的时间序列预测数据集对各种模型进行了实证评估,这既验证了数据集大小和模型复杂性在时间序列预测领域内的标度律的有效性,又验证了我们的理论框架,特别是关于回顾视野的影响。我们希望我们的研究成果能激发出针对有限规模时间序列预测数据集的新模型,以及未来工作中针对时间序列预测的大型基础数据集和模型。我们的实验代码将在以下网址公开:https://github.com/JingzheShi/ScalingLawForTimeSeriesForecasting。
更新时间: 2024-05-24 00:46:27
领域: cs.LG,cs.AI
Oil & Water? Diffusion of AI Within and Across Scientific Fields
This study empirically investigates claims of the increasing ubiquity of artificial intelligence (AI) within roughly 80 million research publications across 20 diverse scientific fields, by examining the change in scholarly engagement with AI from 1985 through 2022. We observe exponential growth, with AI-engaged publications increasing approximately thirteenfold (13x) across all fields, suggesting a dramatic shift from niche to mainstream. Moreover, we provide the first empirical examination of the distribution of AI-engaged publications across publication venues within individual fields, with results that reveal a broadening of AI engagement within disciplines. While this broadening engagement suggests a move toward greater disciplinary integration in every field, increased ubiquity is associated with a semantic tension between AI-engaged research and more traditional disciplinary research. Through an analysis of tens of millions of document embeddings, we observe a complex interplay between AI-engaged and non-AI-engaged research within and across fields, suggesting that increasing ubiquity is something of an oil-and-water phenomenon -- AI-engaged work is spreading out over fields, but not mixing well with non-AI-engaged work.
Updated: 2024-05-24 00:39:32
标题: 油与水?人工智能在科学领域内外的扩散
摘要: 这项研究考察了1985年至2022年间学术界与人工智能(AI)的互动变化,基于20个不同科学领域约8000万篇研究论文,实证检验了关于人工智能日益普及的说法。我们观察到指数级增长:AI参与的论文在所有领域总体增长约13倍,表明其从小众走向主流的戏剧性转变。此外,我们首次对各个领域内出版场所之间AI参与论文的分布进行了实证检验,结果显示AI参与正在各学科内部不断拓宽。尽管这种拓宽的参与暗示各领域都在朝着更深的学科整合方向发展,但普遍性的增强也伴随着AI参与研究与更传统的学科研究之间的语义张力。通过分析数千万个文档嵌入,我们观察到AI参与和非AI参与研究在各领域内部和跨领域之间的复杂互动,暗示普遍性的增加是一种油水不相融的现象——AI参与的研究正在向各个领域扩散,但与非AI参与的研究并没有很好地融合。
更新时间: 2024-05-24 00:39:32
领域: cs.DL,cs.AI
A Counterfactual Analysis of the Dishonest Casino
The dishonest casino is a well-known hidden Markov model (HMM) used in educational settings to introduce HMMs and graphical models. Here, a sequence of die rolls is observed, with the casino switching between a fair and a loaded die. Typically, the goal is to use the observed rolls to infer the pattern of fair and loaded dice, leading to filtering, smoothing, and Viterbi algorithms. This paper, however, explores how much of the winnings is attributable to the casino's cheating, a counterfactual question beyond the scope of HMM primitives. To address this, we introduce a structural causal model (SCM) consistent with the HMM and show that the expected winnings attributable to cheating (EWAC) can be bounded using linear programs (LPs). Through numerical experiments, we compute these bounds and develop intuition using benchmark SCMs based on independence, comonotonic, and counter-monotonic copulas. We show that tighter bounds are obtained with a time-homogeneity condition on the SCM, while looser bounds allow for an almost explicit LP solution. Domain-specific knowledge like pathwise monotonicity or counterfactual stability can be incorporated via linear constraints. Our work contributes to bounding counterfactuals in causal inference and is the first to develop LP bounds in a dynamic HMM setting, benefiting educational contexts where counterfactual inference is taught.
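For readers meeting the model for the first time, here is a minimal simulation of the dishonest casino HMM. The transition and emission probabilities are the usual textbook choices and the one-unit-payoff-per-six rule is an illustrative assumption; the paper's LP bounds on the counterfactual winnings are well beyond this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
P = np.array([[0.95, 0.05],    # fair:   stay fair / switch to loaded
              [0.10, 0.90]])   # loaded: switch to fair / stay loaded
FAIR = np.full(6, 1 / 6)
LOADED = np.array([0.1] * 5 + [0.5])   # the loaded die favors six

state, rolls, states = 0, [], []
for _ in range(1000):
    states.append(state)
    probs = FAIR if state == 0 else LOADED
    rolls.append(rng.choice(6, p=probs) + 1)
    state = rng.choice(2, p=P[state])

rolls, states = np.array(rolls), np.array(states)
winnings = (rolls == 6).astype(int)    # toy payoff: one unit per six
print("total winnings:", winnings.sum())
print("winnings rolled while loaded:", winnings[states == 1].sum())
```

Note that even the second printout is not the EWAC: rolls made with the loaded die could still have come up six under the fair die, which is exactly the counterfactual the SCM and LP machinery are built to bound.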
Updated: 2024-05-24 00:26:54
标题: 对不诚实赌场的反事实分析
摘要: 不诚实的赌场是一个著名的隐藏马尔可夫模型(HMM),常用于教学场景中介绍HMM和图模型。在这里,观察到一系列掷骰子的结果,赌场在公平骰子和灌铅骰子之间切换。通常,目标是利用观察到的掷骰结果推断公平和灌铅骰子的使用模式,从而引出滤波、平滑和维特比算法。然而,本文探讨了赌场作弊在赢利中所占的份额,这是一个超出HMM基本工具范围的反事实问题。为了解决这个问题,我们引入了一个与HMM一致的结构因果模型(SCM),并展示了可归因于作弊的预期赢利(EWAC)可以用线性规划(LP)来限定。通过数值实验,我们计算了这些界,并借助基于独立、共单调和反单调copula的基准SCM建立直观理解。我们展示了在SCM上施加时间齐次性条件可以得到更紧的界,而更宽松的界则允许几乎显式的LP解。路径单调性或反事实稳定性等领域特定知识可以通过线性约束加入。我们的工作有助于因果推断中反事实量的界定,并且是首个在动态HMM设置中推导LP界的工作,有益于教授反事实推断的教学场景。
更新时间: 2024-05-24 00:26:54
领域: cs.LG
Bayesian Optimization of Functions over Node Subsets in Graphs
We address the problem of optimizing over functions defined on node subsets in a graph. The optimization of such functions is often a non-trivial task given their combinatorial, black-box and expensive-to-evaluate nature. Although various algorithms have been introduced in the literature, most are either task-specific or computationally inefficient and only utilize information about the graph structure without considering the characteristics of the function. To address these limitations, we utilize Bayesian Optimization (BO), a sample-efficient black-box solver, and propose a novel framework for combinatorial optimization on graphs. More specifically, we map each $k$-node subset in the original graph to a node in a new combinatorial graph and adopt a local modeling approach to efficiently traverse the latter graph by progressively sampling its subgraphs using a recursive algorithm. Extensive experiments under both synthetic and real-world setups demonstrate the effectiveness of the proposed BO framework on various types of graphs and optimization tasks, where its behavior is analyzed in detail with ablation studies.
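The subset-to-node mapping can be made concrete with a small sketch: each $k$-node subset becomes a vertex of a combinatorial graph whose neighbors differ by swapping a single element, and the plain hill-climbing traversal below stands in for the paper's BO surrogate and recursive subgraph sampling; the toy objective is purely illustrative.

```python
import itertools
import random

random.seed(0)
n, k = 8, 3
nodes = list(range(n))

def objective(subset):
    # Toy black-box objective: prefer nodes near the "center" n/2.
    return -sum((v - n / 2) ** 2 for v in subset)

def neighbors(subset):
    # Edges of the combinatorial graph: swap one member for one non-member.
    for out_v, in_v in itertools.product(subset, set(nodes) - set(subset)):
        yield tuple(sorted((set(subset) - {out_v}) | {in_v}))

current = tuple(sorted(random.sample(nodes, k)))
best = (objective(current), current)
for _ in range(20):                    # local traversal of the new graph
    cand = max(neighbors(current), key=objective)
    if objective(cand) <= objective(current):
        break                          # local optimum reached
    current = cand
    best = max(best, (objective(current), current))

print("best subset:", best[1], "value:", best[0])
```

A BO version would replace the greedy argmax with an acquisition function evaluated under a surrogate model fitted to the subsets sampled so far.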
Updated: 2024-05-24 00:24:55
标题: 贝叶斯优化在图中节点子集函数上的应用
摘要: 我们研究在图的节点子集上定义的函数的优化问题。由于这些函数的组合性质、黑盒性质和昂贵的评估成本,优化这些函数通常并非易事。尽管文献中引入了各种算法,但大多数要么是特定于任务,要么计算效率低下,并且只利用了关于图结构的信息,而没有考虑函数的特性。为了解决这些限制,我们利用贝叶斯优化(BO),一种样本高效的黑盒求解器,并提出了一种新颖的图组合优化框架。更具体地说,我们将原始图中的每个$k$节点子集映射到一个新的组合图中的一个节点,并采用局部建模方法,通过递归算法逐步对其子图进行采样,从而高效地遍历后一个图。在合成和真实世界设置下进行的大量实验表明,所提出的BO框架在各种类型的图和优化任务上都表现出了有效性,其行为通过消融研究进行了详细分析。
更新时间: 2024-05-24 00:24:55
领域: cs.LG,stat.ML
STC-ViT: Spatio Temporal Continuous Vision Transformer for Weather Forecasting
Operational weather forecasting systems rely on computationally expensive physics-based models. Recently, transformer-based models have shown remarkable potential in weather forecasting, achieving state-of-the-art results. However, transformers are discrete models, which limits their ability to learn the continuous spatio-temporal features of the dynamical weather system. We address this issue with STC-ViT, a Spatio-Temporal Continuous Vision Transformer for weather forecasting. STC-ViT incorporates continuous-time Neural ODE layers with a multi-head attention mechanism to learn the continuous weather evolution over time. The attention mechanism is encoded as a differentiable function in the transformer architecture to model the complex weather dynamics. We evaluate STC-ViT against an operational Numerical Weather Prediction (NWP) model and several deep learning based weather forecasting models. STC-ViT performs competitively with current data-driven methods in global forecasting while being trained only on lower-resolution data and with less compute power.
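One plausible reading of the continuous formulation is to use multi-head attention as the dynamics function of a neural ODE over token features; the sketch below integrates those dynamics with a fixed-step Euler solver. Layer sizes and the solver choice are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class AttentionODE(nn.Module):
    """Token features evolve by dh/dt = Attention(LayerNorm(h))."""

    def __init__(self, dim=64, heads=4, steps=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.steps = steps

    def dynamics(self, h):
        z = self.norm(h)
        out, _ = self.attn(z, z, z)    # self-attention as the vector field
        return out

    def forward(self, h, t1=1.0):
        dt = t1 / self.steps
        for _ in range(self.steps):    # explicit Euler integration
            h = h + dt * self.dynamics(h)
        return h

tokens = torch.randn(2, 16, 64)        # (batch, patches, channels)
print(AttentionODE()(tokens).shape)    # torch.Size([2, 16, 64])
```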
Updated: 2024-05-24 00:19:33
标题: STC-ViT:用于天气预测的时空连续视觉变换器
摘要: 气象运营预报系统依赖于计算密集的基于物理的模型。最近,基于Transformer的模型在气象预报中表现出显著的潜力,取得了最先进的结果。然而,Transformers是离散模型,限制了它们学习动态天气系统连续时空特征的能力。我们提出了STC-ViT,一种用于天气预报的时空连续视觉Transformer。STC-ViT结合了连续时间神经ODE层和多头注意力机制,以学习随时间连续变化的天气演变。注意力机制被编码为Transformer架构中的可微函数,以建模复杂的天气动态。我们将STC-ViT与运营数值天气预报(NWP)模型和几种基于深度学习的天气预报模型进行了评估。STC-ViT在全球预测中表现出与当前数据驱动方法竞争力相当的水平,而且只是在较低分辨率数据和更少计算资源的情况下进行训练。
更新时间: 2024-05-24 00:19:33
领域: cs.LG
Quantifying the Gain in Weak-to-Strong Generalization
Recent advances in large language models have shown capabilities that are extraordinary and near-superhuman. These models operate with such complexity that reliably evaluating and aligning them proves challenging for humans. This leads to the natural question: can guidance from weak models (like humans) adequately direct the capabilities of strong models? In a recent and somewhat surprising work, Burns et al. (2023) empirically demonstrated that when strong models (like GPT-4) are finetuned using labels generated by weak supervisors (like GPT-2), the strong models outperform their weaker counterparts -- a phenomenon they term weak-to-strong generalization. In this work, we present a theoretical framework for understanding weak-to-strong generalization. Specifically, we show that the improvement in performance achieved by strong models over their weaker counterparts is quantified by the misfit error incurred by the strong model on labels generated by the weaker model. Our theory reveals several curious algorithmic insights. For instance, we can predict the amount by which the strong model will improve over the weak model, and also choose among different weak models to train the strong model, based on its misfit error. We validate our theoretical findings through various empirical assessments.
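Read loosely, the central claim can be written for the squared loss as follows, where $f_w$ is the weak supervisor and $f_s$ is the strong model fine-tuned on weakly generated labels; the precise statement and its conditions are in the paper, and this display is only a simplified paraphrase:
\[
\underbrace{\mathbb{E}\big[(f_w(x) - y)^2\big]}_{\text{weak model error}} \;-\; \underbrace{\mathbb{E}\big[(f_s(x) - y)^2\big]}_{\text{strong model error}} \;\approx\; \underbrace{\mathbb{E}\big[(f_s(x) - f_w(x))^2\big]}_{\text{misfit on weak labels}},
\]
so the more the strong model disagrees with the weak labels it was trained on, the larger its gain over the weak supervisor, which is what makes the misfit usable both for predicting improvement and for selecting among weak models.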
Updated: 2024-05-24 00:14:16
标题: 量化弱到强泛化的收益
摘要: 最近大型语言模型的进展显示出了非同寻常和几乎超人的能力。这些模型运行的复杂性如此之高,以至于对它们进行可靠评估和对齐对人类来说是具有挑战性的。这引出了一个自然的问题:弱模型(如人类)的指导是否能够充分引导强模型的能力?在最近的一项有些令人惊讶的研究中,Burns等人(2023)经验性地证明,当强模型(如GPT-4)使用由弱监督者(如GPT-2)生成的标签进行微调时,强模型表现出了优于其较弱对应物的能力——他们称之为弱到强的泛化现象。 在这项工作中,我们提出了一个理论框架来理解弱到强的泛化。具体来说,我们展示了强模型相对于其较弱对应物在性能上的提升可以由强模型在弱模型生成的标签上产生的失配误差(misfit error)来量化。我们的理论揭示了几个有趣的算法洞见。例如,我们可以预测强模型将比弱模型提高多少,还可以根据失配误差在不同的弱模型中进行选择来训练强模型。我们通过各种经验评估验证了我们的理论发现。
更新时间: 2024-05-24 00:14:16
领域: cs.LG,cs.AI
Towards Better Understanding of In-Context Learning Ability from In-Context Uncertainty Quantification
Predicting simple function classes has been widely used as a testbed for developing theory and understanding of the trained Transformer's in-context learning (ICL) ability. In this paper, we revisit the training of Transformers on linear regression tasks, and different from all the existing literature, we consider a bi-objective prediction task of predicting both the conditional expectation $\mathbb{E}[Y|X]$ and the conditional variance Var$(Y|X)$. This additional uncertainty quantification objective provides a handle to (i) better design out-of-distribution experiments to distinguish ICL from in-weight learning (IWL) and (ii) make a better separation between the algorithms with and without using the prior information of the training distribution. Theoretically, we show that the trained Transformer reaches near Bayes-optimum, suggesting the usage of the information of the training distribution. Our method can be extended to other cases. Specifically, with the Transformer's context window $S$, we prove a generalization bound of $\tilde{\mathcal{O}}(\sqrt{\min\{S, T\}/(n T)})$ on $n$ tasks with sequences of length $T$, providing sharper analysis compared to previous results of $\tilde{\mathcal{O}}(\sqrt{1/n})$. Empirically, we illustrate that while the trained Transformer behaves as the Bayes-optimal solution as a natural consequence of supervised training in distribution, it does not necessarily perform a Bayesian inference when facing task shifts, in contrast to the \textit{equivalence} between these two proposed in many existing literature. We also demonstrate the trained Transformer's ICL ability over covariates shift and prompt-length shift and interpret them as a generalization over a meta distribution.
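The bi-objective setup can be sketched directly from the description: each prompt is a random linear regression task with its own noise level, and every position carries both a conditional-mean target $\mathbb{E}[Y|X]$ and a conditional-variance target Var$(Y|X)$. Dimensions and the per-task noise model below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tasks, T, d = 4, 16, 5

for _ in range(n_tasks):
    w = rng.normal(size=d)                    # task-specific weight vector
    X = rng.normal(size=(T, d))
    sigma2 = 0.1 + 0.4 * rng.random()         # per-task noise variance
    y = X @ w + rng.normal(scale=np.sqrt(sigma2), size=T)
    cond_mean = X @ w                         # target 1: E[Y|X] = w.x
    cond_var = np.full(T, sigma2)             # target 2: Var(Y|X)
    # A Transformer would read the (x, y) prefix in context and be trained
    # to output both targets at every position.
    print(f"task noise Var(Y|X)={sigma2:.3f}, first mean target={cond_mean[0]:+.3f}")
```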
Updated: 2024-05-24 00:08:55
标题: 从上下文内不确定性量化出发,更好地理解上下文学习能力
摘要: 预测简单函数类别已被广泛用作开发理论和理解训练Transformer的上下文学习(ICL)能力的试验平台。在本文中,我们重新审视了Transformer在线性回归任务上的训练,与所有现有文献不同,我们考虑了一个双目标预测任务,即预测条件期望 $\mathbb{E}[Y|X]$ 和条件方差 Var$(Y|X)$。这个额外的不确定性量化目标提供了一个方法来(i)更好地设计超出分布的实验,以区分ICL和权重内学习(IWL),以及(ii)更好地区分使用训练分布的先验信息和不使用该信息的算法。理论上,我们展示了训练后的Transformer接近贝叶斯最优,表明利用训练分布的信息。我们的方法可以扩展到其他情况。具体来说,通过Transformer的上下文窗口$S$,我们证明了在长度为$T$的序列上进行$n$个任务的泛化界为$\tilde{\mathcal{O}}(\sqrt{\min\{S, T\}/(n T)})$,相比于以前的结果$\tilde{\mathcal{O}}(\sqrt{1/n})$,提供了更精确的分析。在实证方面,我们阐明了尽管经过监督训练的Transformer在分布中表现为贝叶斯最优解的自然结果,但当面对任务转移时,并不一定执行贝叶斯推断,与许多现有文献中提出的这两者之间的“等价性”形成对比。我们还展示了经过训练的Transformer在协变量转移和提示长度转移上的ICL能力,并将其解释为对元分布的泛化。
更新时间: 2024-05-24 00:08:55
领域: cs.LG,cs.CL,stat.ML