MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning
Large language models (LLMs), despite their remarkable progress across various general domains, encounter significant barriers in medicine and healthcare. This field faces unique challenges such as domain-specific terminologies and reasoning over specialized knowledge. To address these issues, we propose MedAgents, a novel multi-disciplinary collaboration framework for the medical domain. MedAgents leverages LLM-based agents in a role-playing setting that participate in a collaborative multi-round discussion, thereby enhancing LLM proficiency and reasoning capabilities. This training-free framework encompasses five critical steps: gathering domain experts, proposing individual analyses, summarising these analyses into a report, iterating over discussions until a consensus is reached, and ultimately making a decision. Our work focuses on the zero-shot setting, which is applicable in real-world scenarios. Experimental results on nine datasets (MedQA, MedMCQA, PubMedQA, and six subtasks from MMLU) establish that our proposed MedAgents framework excels at mining and harnessing the medical expertise within LLMs, as well as extending their reasoning abilities. Our code can be found at https://github.com/gersteinlab/MedAgents.
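A minimal sketch of the five steps described above, assuming only a generic call_llm(prompt) completion function (a hypothetical placeholder, not the authors' actual interface; all prompts are illustrative):

```python
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in any chat/completion client here")

def medagents(question: str, n_experts: int = 3, max_rounds: int = 4) -> str:
    # 1) Gather domain experts relevant to the question.
    experts = call_llm(
        f"List {n_experts} medical specialties relevant to: {question}"
    ).splitlines()[:n_experts]
    # 2) Each expert proposes an individual analysis (role-playing prompt).
    analyses = [call_llm(f"You are a {e}. Analyze: {question}") for e in experts]
    # 3) Summarise the individual analyses into a shared report.
    report = call_llm("Summarise into one report:\n" + "\n---\n".join(analyses))
    # 4) Iterate over the discussion until every expert agrees.
    for _ in range(max_rounds):
        votes = [call_llm(f"You are a {e}. Reply AGREE or REVISE:\n{report}")
                 for e in experts]
        if all("AGREE" in v.upper() for v in votes):
            break  # consensus reached
        report = call_llm("Revise the report given this feedback:\n" + "\n".join(votes))
    # 5) Make the final decision from the consensus report.
    return call_llm(f"Report:\n{report}\nAnswer the question: {question}")
```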
Updated: 2024-06-04 23:47:43
Categories: cs.CL,cs.AI
Networked Inequality: Preferential Attachment Bias in Graph Neural Network Link Prediction
Graph neural network (GNN) link prediction is increasingly deployed in citation, collaboration, and online social networks to recommend academic literature, collaborators, and friends. While prior research has investigated the dyadic fairness of GNN link prediction, the within-group (e.g., queer women) fairness and "rich get richer" dynamics of link prediction remain underexplored. However, these aspects have significant consequences for degree and power imbalances in networks. In this paper, we shed light on how degree bias in networks affects Graph Convolutional Network (GCN) link prediction. In particular, we theoretically uncover that GCNs with a symmetric normalized graph filter have a within-group preferential attachment bias. We validate our theoretical analysis on real-world citation, collaboration, and online social networks. We further bridge GCN's preferential attachment bias with unfairness in link prediction and propose a new within-group fairness metric. This metric quantifies disparities in link prediction scores within social groups, towards combating the amplification of degree and power disparities. Finally, we propose a simple training-time strategy to alleviate within-group unfairness, and we show that it is effective on citation, social, and credit networks.
Updated: 2024-06-04 23:47:06
Categories: cs.LG,cs.CY,cs.SI
ORACLE: Leveraging Mutual Information for Consistent Character Generation with LoRAs in Diffusion Models
Text-to-image diffusion models have recently taken center stage as pivotal tools in promoting visual creativity across an array of domains such as comic book artistry, children's literature, game development, and web design. These models harness the power of artificial intelligence to convert textual descriptions into vivid images, thereby enabling artists and creators to bring their imaginative concepts to life with unprecedented ease. However, one of the significant hurdles that persist is the challenge of maintaining consistency in character generation across diverse contexts. Variations in textual prompts, even if minor, can yield vastly different visual outputs, posing a considerable problem in projects that require a uniform representation of characters throughout. In this paper, we introduce a novel framework designed to produce consistent character representations from a single text prompt across diverse settings. Through both quantitative and qualitative analyses, we demonstrate that our framework outperforms existing methods in generating characters with consistent visual identities, underscoring its potential to transform creative industries. By addressing the critical challenge of character consistency, we not only enhance the practical utility of these models but also broaden the horizons for artistic and creative expression.
Updated: 2024-06-04 23:39:08
Categories: cs.CV,cs.LG
How many labelers do you have? A closer look at gold-standard labels
The construction of most supervised learning datasets revolves around collecting multiple labels for each instance, then aggregating the labels to form a type of "gold-standard". We question the wisdom of this pipeline by developing a (stylized) theoretical model of this process and analyzing its statistical consequences, showing how access to non-aggregated label information can make training well-calibrated models more feasible than it is with gold-standard labels. The entire story, however, is subtle, and the contrasts between aggregated and fuller label information depend on the particulars of the problem, where estimators that use aggregated information exhibit robust but slower rates of convergence, while estimators that can effectively leverage all labels converge more quickly if they have fidelity to (or can learn) the true labeling process. The theory makes several predictions for real-world datasets, including when non-aggregate labels should improve learning performance, which we test to corroborate the validity of our predictions.
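A toy illustration of what aggregation discards (numbers invented): with three labelers per example, the majority-vote gold standard hides exactly the ambiguity a calibrated model needs.

```python
import numpy as np

votes = np.array([[1, 1, 0],   # 3 examples x 3 labelers (toy data)
                  [1, 0, 0],
                  [1, 1, 1]])
gold = (votes.mean(axis=1) > 0.5).astype(int)  # aggregated gold-standard labels
soft = votes.mean(axis=1)                      # non-aggregated label frequencies
print(gold)  # [1 0 1] -- treats examples 0 and 2 as equally certain
print(soft)  # [0.667 0.333 1.0] -- exposes the ambiguity of examples 0 and 1
```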
Updated: 2024-06-04 23:23:11
Categories: math.ST,cs.HC,cs.LG,stat.TH
Harnessing the Power of Neural Operators with Automatically Encoded Conservation Laws
Neural operators (NOs) have emerged as effective tools for modeling complex physical systems in scientific machine learning. In NOs, a central characteristic is to learn the governing physical laws directly from data. In contrast to other machine learning applications, partial knowledge is often known a priori about the physical system at hand whereby quantities such as mass, energy and momentum are exactly conserved. Currently, NOs have to learn these conservation laws from data and can only approximately satisfy them due to finite training data and random noise. In this work, we introduce conservation law-encoded neural operators (clawNOs), a suite of NOs that endow inference with automatic satisfaction of such conservation laws. ClawNOs are built with a divergence-free prediction of the solution field, with which the continuity equation is automatically guaranteed. As a consequence, clawNOs are compliant with the most fundamental and ubiquitous conservation laws essential for correct physical consistency. As demonstrations, we consider a wide variety of scientific applications ranging from constitutive modeling of material deformation, incompressible fluid dynamics, to atmospheric simulation. ClawNOs significantly outperform the state-of-the-art NOs in learning efficacy, especially in small-data regimes.
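The divergence-free construction can be made concrete with a standard vector-calculus identity: predict a potential field and take its curl, so the continuity equation holds by construction (notation mine; the paper realizes this idea inside a neural operator):

```latex
\mathbf{u} = \nabla \times \mathbf{A}
\quad\Longrightarrow\quad
\nabla \cdot \mathbf{u} = \nabla \cdot (\nabla \times \mathbf{A}) \equiv 0,
```

so any network output $\mathbf{A}$ yields a field $\mathbf{u}$ that conserves mass exactly, independent of training data or noise.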
Updated: 2024-06-04 22:43:59
Categories: cs.LG,cs.CE,cs.NA,math.NA
Prototypical Self-Explainable Models Without Re-training
Explainable AI (XAI) has unfolded in two distinct research directions with, on the one hand, post-hoc methods that explain the predictions of a pre-trained black-box model and, on the other hand, self-explainable models (SEMs) which are trained directly to provide explanations alongside their predictions. While the latter is preferred in safety-critical scenarios, post-hoc approaches have received the majority of attention until now, owing to their simplicity and ability to explain base models without retraining. Current SEMs, instead, require complex architectures and heavily regularized loss functions, thus necessitating specific and costly training. To address this shortcoming and facilitate wider use of SEMs, we propose a simple yet efficient universal method called KMEx (K-Means Explainer), which can convert any existing pre-trained model into a prototypical SEM. The motivation behind KMEx is to enhance transparency in deep learning-based decision-making via class-prototype-based explanations that are diverse and trustworthy without retraining the base model. We compare models obtained from KMEx to state-of-the-art SEMs using an extensive qualitative evaluation to highlight the strengths and weaknesses of each model, further paving the way toward a more reliable and objective evaluation of SEMs (The code is available at https://github.com/SrishtiGautam/KMEx).
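A minimal sketch of the conversion as the abstract describes it: cluster a frozen encoder's embeddings per class with K-Means and classify by nearest prototype (the value of K and all names are illustrative; the actual implementation is in the linked repository):

```python
import numpy as np
from sklearn.cluster import KMeans

def fit_prototypes(emb: np.ndarray, labels: np.ndarray, k: int = 5):
    protos, proto_labels = [], []
    for c in np.unique(labels):
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(emb[labels == c])
        protos.append(km.cluster_centers_)       # k prototypes for class c
        proto_labels += [c] * k
    return np.vstack(protos), np.array(proto_labels)

def predict(emb: np.ndarray, protos: np.ndarray, proto_labels: np.ndarray):
    # The nearest prototype supplies both the prediction and its explanation.
    d2 = ((emb[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
    nearest = d2.argmin(axis=1)
    return proto_labels[nearest], nearest
```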
Updated: 2024-06-04 22:40:40
Categories: cs.LG,cs.AI
Linguistic Calibration of Long-Form Generations
Language models (LMs) may lead their users to make suboptimal downstream decisions when they confidently hallucinate. This issue can be mitigated by having the LM verbally convey the probability that its claims are correct, but existing models cannot produce long-form text with calibrated confidence statements. Through the lens of decision-making, we define linguistic calibration for long-form generations: an LM is linguistically calibrated if its generations enable its users to make calibrated probabilistic predictions. This definition enables a training framework where a supervised finetuning step bootstraps an LM to emit long-form generations with confidence statements such as "I estimate a 30% chance of..." or "I am certain that...", followed by a reinforcement learning step which rewards generations that enable a user to provide calibrated answers to related questions. We linguistically calibrate Llama 2 7B and find in automated and human evaluations of long-form generations that it is significantly more calibrated than strong finetuned factuality baselines with comparable accuracy. These findings generalize under significant domain shifts to scientific and biomedical questions and to an entirely held-out person biography generation task. Our results demonstrate that long-form generations may be calibrated end-to-end by constructing an objective in the space of the predictions that users make in downstream decision-making.
Updated: 2024-06-04 22:39:58
Categories: cs.LG,cs.AI,cs.CL,stat.ML
Randomized Geometric Algebra Methods for Convex Neural Networks
We introduce randomized algorithms to Clifford's Geometric Algebra, generalizing randomized linear algebra to hypercomplex vector spaces. This novel approach has many implications in machine learning, including training neural networks to global optimality via convex optimization. Additionally, we consider fine-tuning large language model (LLM) embeddings as a key application area, exploring the intersection of geometric algebra and modern AI techniques. In particular, we conduct a comparative analysis of the robustness of transfer learning via embeddings, such as OpenAI GPT models and BERT, using traditional methods versus our novel approach based on convex optimization. We test our convex optimization transfer learning method across a variety of case studies, employing different embeddings (GPT-4 and BERT embeddings) and different text classification datasets (IMDb, Amazon Polarity Dataset, and GLUE) with a range of hyperparameter settings. Our results demonstrate that convex optimization and geometric algebra not only enhance the performance of LLMs but also offer a more stable and reliable method of transfer learning via embeddings.
Updated: 2024-06-04 22:22:39
Categories: cs.LG,math.OC,stat.ML
Will we run out of data? Limits of LLM scaling based on human-generated data
We investigate the potential constraints on LLM scaling posed by the availability of public human-generated text data. We forecast the growing demand for training data based on current trends and estimate the total stock of public human text data. Our findings indicate that if current LLM development trends continue, models will be trained on datasets roughly equal in size to the available stock of public human text data between 2026 and 2032, or slightly earlier if models are overtrained. We explore how progress in language modeling can continue when human-generated text datasets cannot be scaled any further. We argue that synthetic data generation, transfer learning from data-rich domains, and data efficiency improvements might support further progress.
Updated: 2024-06-04 22:09:46
Categories: cs.LG,cs.AI,cs.CL,cs.CV,cs.CY
$\texttt{ACCORD}$: Closing the Commonsense Measurability Gap
We present $\texttt{ACCORD}$, a framework and benchmark suite for disentangling the commonsense grounding and reasoning abilities of large language models (LLMs) through controlled, multi-hop counterfactuals. $\texttt{ACCORD}$ introduces formal elements to commonsense reasoning to explicitly control and quantify reasoning complexity beyond the typical 1 or 2 hops. Uniquely, $\texttt{ACCORD}$ can automatically generate benchmarks of arbitrary reasoning complexity, and so it scales with future LLM improvements. Benchmarking state-of-the-art LLMs -- including GPT-4o (2024-05-13), Llama-3-70B-Instruct, and Mixtral-8x22B-Instruct-v0.1 -- shows performance degrading to random chance with only moderate scaling, leaving substantial headroom for improvement. We release a leaderboard of the benchmark suite tested in this work, as well as code for automatically generating more complex benchmarks.
Updated: 2024-06-04 22:08:24
Categories: cs.AI,cs.CL,cs.LG,I.2.0; I.2.7
Efficient Model-Stealing Attacks Against Inductive Graph Neural Networks
Graph Neural Networks (GNNs) are recognized as potent tools for processing real-world data organized in graph structures. Especially inductive GNNs, which enable the processing of graph-structured data without relying on predefined graph structures, are gaining importance in an increasingly wide variety of applications. As these networks demonstrate proficiency across a range of tasks, they become lucrative targets for model-stealing attacks where an adversary seeks to replicate the functionality of the targeted network. A large effort has been made to develop model-stealing attacks that focus on models trained with images and texts. However, little attention has been paid to GNNs trained on graph data. This paper introduces a novel method for unsupervised model-stealing attacks against inductive GNNs, based on graph contrastive learning and spectral graph augmentations to efficiently extract information from the target model. The proposed attack is thoroughly evaluated on six datasets. The results show that this approach demonstrates a higher level of efficiency compared to existing stealing attacks. More concretely, our attack outperforms the baseline on all benchmarks, achieving higher fidelity and downstream accuracy of the stolen model while requiring fewer queries sent to the target model.
Updated: 2024-06-04 22:08:09
Categories: cs.LG
The Illusion of State in State-Space Models
State-space models (SSMs) have emerged as a potential alternative architecture for building large language models (LLMs) compared to the previously ubiquitous transformer architecture. One theoretical weakness of transformers is that they cannot express certain kinds of sequential computation and state tracking (Merrill & Sabharwal, 2023), which SSMs are explicitly designed to address via their close architectural similarity to recurrent neural networks (RNNs). But do SSMs truly have an advantage (over transformers) in expressive power for state tracking? Surprisingly, the answer is no. Our analysis reveals that the expressive power of SSMs is limited very similarly to transformers: SSMs cannot express computation outside the complexity class $\mathsf{TC}^0$. In particular, this means they cannot solve simple state-tracking problems like permutation composition. It follows that SSMs are provably unable to accurately track chess moves with certain notation, evaluate code, or track entities in a long narrative. To supplement our formal analysis, we report experiments showing that Mamba-style SSMs indeed struggle with state tracking. Thus, despite its recurrent formulation, the "state" in an SSM is an illusion: SSMs have similar expressiveness limitations to non-recurrent models like transformers, which may fundamentally limit their ability to solve real-world state-tracking problems.
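Permutation composition, the abstract's canonical hard case, is easy to state concretely: the word problem for $S_5$ below is $\mathsf{NC}^1$-complete, so under the standard assumption $\mathsf{TC}^0 \neq \mathsf{NC}^1$, no fixed-depth SSM or transformer solves it at all sequence lengths. A toy generator for such state-tracking tests:

```python
from itertools import permutations
import random

def compose(perm_sequence):
    state = tuple(range(5))                      # identity permutation in S_5
    for p in perm_sequence:
        state = tuple(state[i] for i in p)       # apply one permutation
    return state

perms = list(permutations(range(5)))
seq = [random.choice(perms) for _ in range(32)]  # input to a sequence model
print(compose(seq))                              # ground-truth state to track
```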
Updated: 2024-06-04 22:05:45
Categories: cs.LG,cs.CC,cs.CL,cs.FL
Sparse and Structured Hopfield Networks
Modern Hopfield networks have enjoyed recent interest due to their connection to attention in transformers. Our paper provides a unified framework for sparse Hopfield networks by establishing a link with Fenchel-Young losses. The result is a new family of Hopfield-Fenchel-Young energies whose update rules are end-to-end differentiable sparse transformations. We reveal a connection between loss margins, sparsity, and exact memory retrieval. We further extend this framework to structured Hopfield networks via the SparseMAP transformation, which can retrieve pattern associations instead of a single pattern. Experiments on multiple instance learning and text rationalization demonstrate the usefulness of our approach.
Updated: 2024-06-04 22:04:40
Categories: cs.LG
Auditing Privacy Mechanisms via Label Inference Attacks
We propose reconstruction advantage measures to audit label privatization mechanisms. A reconstruction advantage measure quantifies the increase in an attacker's ability to infer the true label of an unlabeled example when provided with a private version of the labels in a dataset (e.g., aggregate of labels from different users or noisy labels output by randomized response), compared to an attacker that only observes the feature vectors, but may have prior knowledge of the correlation between features and labels. We consider two such auditing measures: one additive, and one multiplicative. These incorporate previous approaches taken in the literature on empirical auditing and differential privacy. The measures allow us to place a variety of proposed privatization schemes -- some differentially private, some not -- on the same footing. We analyze these measures theoretically under a distributional model which encapsulates reasonable adversarial settings. We also quantify their behavior empirically on real and simulated prediction tasks. Across a range of experimental settings, we find that differentially private schemes dominate or match the privacy-utility tradeoff of more heuristic approaches.
Updated: 2024-06-04 21:48:30
Categories: cs.LG,cs.CR
Calibrated and Conformal Propensity Scores for Causal Effect Estimation
Propensity scores are commonly used to estimate treatment effects from observational data. We argue that the probabilistic output of a learned propensity score model should be calibrated -- i.e., a predictive treatment probability of 90% should correspond to 90% of individuals being assigned to the treatment group -- and we propose simple recalibration techniques to ensure this property. We prove that calibration is a necessary condition for unbiased treatment effect estimation when using popular inverse propensity weighted and doubly robust estimators. We derive error bounds on causal effect estimates that directly relate to the quality of uncertainties provided by the probabilistic propensity score model and show that calibration strictly improves this error bound while also avoiding extreme propensity weights. We demonstrate improved causal effect estimation with calibrated propensity scores in several tasks including high-dimensional image covariates and genome-wide association studies (GWASs). Calibrated propensity scores improve the speed of GWAS analysis by more than two-fold by enabling the use of simpler models that are faster to train.
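A minimal sketch of the pipeline, using isotonic regression as one concrete recalibration choice (my assumption; the paper proposes its own simple techniques), followed by the inverse-propensity-weighted estimate:

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

def calibrated_ipw_ate(y, t, scores, calib_t, calib_scores):
    # Fit a monotone recalibration map on held-out (score, treatment) pairs.
    iso = IsotonicRegression(out_of_bounds="clip").fit(calib_scores, calib_t)
    e = np.clip(iso.predict(scores), 1e-3, 1 - 1e-3)  # calibrated propensities
    # Inverse-propensity-weighted estimate of E[Y(1)] - E[Y(0)].
    return np.mean(t * y / e - (1 - t) * y / (1 - e))
```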
Updated: 2024-06-04 21:40:48
Categories: stat.ME,cs.LG,I.2.m
Modelling Commonsense Commonalities with Multi-Facet Concept Embeddings
Concept embeddings offer a practical and efficient mechanism for injecting commonsense knowledge into downstream tasks. Their core purpose is often not to predict the commonsense properties of concepts themselves, but rather to identify commonalities, i.e., sets of concepts which share some property of interest. Such commonalities are the basis for inductive generalisation, hence high-quality concept embeddings can make learning easier and more robust. Unfortunately, standard embeddings primarily reflect basic taxonomic categories, making them unsuitable for finding commonalities that refer to more specific aspects (e.g., the colour of objects or the materials they are made of). In this paper, we address this limitation by explicitly modelling the different facets of interest when learning concept embeddings. We show that this leads to embeddings which capture a more diverse range of commonsense properties, and consistently improves results in downstream tasks such as ultra-fine entity typing and ontology completion.
Updated: 2024-06-04 21:36:42
Categories: cs.AI,cs.CL
Calibrated Regression Against An Adversary Without Regret
We are interested in probabilistic prediction in online settings in which data does not follow a probability distribution. Our work seeks to achieve two goals: (1) producing valid probabilities that accurately reflect model confidence; and (2) ensuring that traditional notions of performance (e.g., high accuracy) still hold. We introduce online algorithms guaranteed to achieve these goals on arbitrary streams of data points, including data chosen by an adversary. Specifically, our algorithms produce forecasts that are (1) calibrated -- i.e., an 80% confidence interval contains the true outcome 80% of the time -- and (2) have low regret relative to a user-specified baseline model. We implement a post-hoc recalibration strategy that provably achieves these goals in regression; previous algorithms applied to classification or achieved (1) but not (2). In the context of Bayesian optimization, an online model-based decision-making task in which the data distribution shifts over time, our method yields accelerated convergence to improved optima.
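One way to picture the post-hoc recalibration step: record the probability integral transforms (PITs) of past forecasts and remap nominal quantile levels to empirically calibrated ones. This is only the batch intuition, not the authors' regret-controlled online algorithm:

```python
import numpy as np

class QuantileRecalibrator:
    def __init__(self):
        self.pits = []                 # F_t(y_t) for each observed outcome y_t

    def update(self, pit: float):
        self.pits.append(pit)

    def recalibrate(self, p: float) -> float:
        # Map the nominal level p to the level whose empirical coverage is p.
        return float(np.quantile(self.pits, p)) if self.pits else p
```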
Updated: 2024-06-04 21:33:04
Categories: cs.LG
Language Models can Infer Action Semantics for Classical Planners from Environment Feedback
Classical planning approaches guarantee finding a set of actions that can achieve a given goal state when possible, but require an expert to specify logical action semantics that govern the dynamics of the environment. Researchers have shown that Large Language Models (LLMs) can be used to directly infer planning steps based on commonsense knowledge and minimal domain information alone, but such plans often fail on execution. We bring together the strengths of classical planning and LLM commonsense inference to perform domain induction, learning and validating action pre- and post-conditions based on closed-loop interactions with the environment itself. We propose PSALM, which leverages LLM inference to heuristically complete partial plans emitted by a classical planner given partial domain knowledge, as well as to infer the semantic rules of the domain in a logical language based on environment feedback after execution. Our analysis on 7 environments shows that with just one expert-curated example plan, using LLMs as heuristic planners and rule predictors achieves lower environment execution steps and environment resets than random exploration while simultaneously recovering the underlying ground truth action semantics of the domain.
Updated: 2024-06-04 21:29:56
Categories: cs.AI,cs.CL,cs.RO
Building Socially-Equitable Public Models
Public models offer predictions to a variety of downstream tasks and have played a crucial role in various AI applications, showcasing their proficiency in accurate predictions. However, the exclusive emphasis on prediction accuracy may not align with the diverse end objectives of downstream agents. Recognizing the public model's predictions as a service, we advocate for integrating the objectives of downstream agents into the optimization process. Concretely, to address performance disparities and foster fairness among heterogeneous agents in training, we propose a novel Equitable Objective. This objective, coupled with a policy gradient algorithm, is crafted to train the public model to produce a more equitable/uniform performance distribution across downstream agents, each with their unique concerns. Both theoretical analysis and empirical case studies have proven the effectiveness of our method in advancing performance equity across diverse downstream agents utilizing the public model for their decision-making. Codes and datasets are released at https://github.com/Ren-Research/Socially-Equitable-Public-Models.
Updated: 2024-06-04 21:27:43
Categories: cs.LG,cs.CY
Private Stochastic Convex Optimization with Heavy Tails: Near-Optimality from Simple Reductions
We study the problem of differentially private stochastic convex optimization (DP-SCO) with heavy-tailed gradients, where we assume a $k^{\text{th}}$-moment bound on the Lipschitz constants of sample functions rather than a uniform bound. We propose a new reduction-based approach that enables us to obtain the first optimal rates (up to logarithmic factors) in the heavy-tailed setting, achieving error $G_2 \cdot \frac 1 {\sqrt n} + G_k \cdot (\frac{\sqrt d}{n\epsilon})^{1 - \frac 1 k}$ under $(\epsilon, \delta)$-approximate differential privacy, up to a mild $\textup{polylog}(\frac{1}{\delta})$ factor, where $G_2^2$ and $G_k^k$ are the $2^{\text{nd}}$ and $k^{\text{th}}$ moment bounds on sample Lipschitz constants, nearly-matching a lower bound of [Lowy and Razaviyayn 2023]. We further give a suite of private algorithms in the heavy-tailed setting which improve upon our basic result under additional assumptions, including an optimal algorithm under a known-Lipschitz constant assumption, a near-linear time algorithm for smooth functions, and an optimal linear time algorithm for smooth generalized linear models.
Updated: 2024-06-04 21:26:29
Categories: cs.DS,cs.CR,cs.LG,stat.ML
Representation Surgery: Theory and Practice of Affine Steering
Language models often exhibit undesirable behavior, e.g., generating toxic or gender-biased text. In the case of neural language models, an encoding of the undesirable behavior is often present in the model's representations. Thus, one natural (and common) approach to prevent the model from exhibiting undesirable behavior is to steer the model's representations in a manner that reduces the probability of it generating undesirable text. This paper investigates the formal and empirical properties of steering functions, i.e., transformation of the neural language model's representations that alter its behavior. First, we derive two optimal, in the least-squares sense, affine steering functions under different constraints. Our theory provides justification for existing approaches and offers a novel, improved steering approach. Second, we offer a series of experiments that demonstrate the empirical effectiveness of the methods in mitigating bias and reducing toxic generation.
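An illustrative least-squares affine steering sketch: fit h' = Wh + b mapping representations that encode the unwanted behavior toward target representations, then apply the map at inference. This is my own minimal construction; the paper derives the actual optimal steering functions under specific constraints:

```python
import numpy as np

def fit_affine_steering(H_src: np.ndarray, H_tgt: np.ndarray):
    # Solve min_{W,b} ||[H_src, 1][W; b] - H_tgt||_F^2 by least squares.
    X = np.hstack([H_src, np.ones((len(H_src), 1))])
    sol, *_ = np.linalg.lstsq(X, H_tgt, rcond=None)
    return sol[:-1], sol[-1]          # W is (d x d), b is (d,)

def steer(h: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    return h @ W + b                  # steered representation
```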
Updated: 2024-06-04 21:26:07
Categories: cs.LG,cs.CL,cs.CY
Controllable Prompt Tuning For Balancing Group Distributional Robustness
Models trained on data composed of different groups or domains can suffer from severe performance degradation under distribution shifts. While recent methods have largely focused on optimizing the worst-group objective, this often comes at the expense of good performance on other groups. To address this problem, we introduce an optimization scheme to achieve good performance across groups and find a good solution for all without severely sacrificing performance on any of them. However, directly applying such optimization involves updating the parameters of the entire network, making it both computationally expensive and challenging. Thus, we introduce Controllable Prompt Tuning (CPT), which couples our approach with prompt-tuning techniques. On spurious correlation benchmarks, our procedures achieve state-of-the-art results across both transformer and non-transformer architectures, as well as unimodal and multimodal data, while requiring only 0.4% tunable parameters.
Updated: 2024-06-04 21:25:20
Categories: cs.LG
Disentangling Logic: The Role of Context in Large Language Model Reasoning Capabilities
This study intends to systematically disentangle pure logic reasoning and text understanding by investigating the contrast across abstract and contextualized logical problems from a comprehensive set of domains. We explore whether LLMs demonstrate genuine reasoning capabilities across various domains when the underlying logical structure remains constant. We focus on two main questions: (1) Can abstract logical problems alone accurately benchmark an LLM's reasoning ability in real-world scenarios, disentangled from contextual support in practical settings? (2) Does fine-tuning LLMs on abstract logic problems generalize to contextualized logic problems and vice versa? To investigate these questions, we focus on standard propositional logic, specifically propositional deductive and abductive logic reasoning. In particular, we construct instantiated datasets for deductive and abductive reasoning with 4 levels of difficulty, encompassing 12 distinct categories or domains based on the categorization of Wikipedia. Our experiments aim to provide insights into disentangling context in logical reasoning and the true reasoning capabilities of LLMs and their generalization potential. The code and dataset are available at: https://github.com/agiresearch/ContextHub.
Updated: 2024-06-04 21:25:06
Categories: cs.CL,cs.AI,cs.LG
Span-Based Optimal Sample Complexity for Weakly Communicating and General Average Reward MDPs
We study the sample complexity of learning an $\varepsilon$-optimal policy in an average-reward Markov decision process (MDP) under a generative model. For weakly communicating MDPs, we establish the complexity bound $\widetilde{O}(SA\frac{H}{\varepsilon^2} )$, where $H$ is the span of the bias function of the optimal policy and $SA$ is the cardinality of the state-action space. Our result is the first that is minimax optimal (up to log factors) in all parameters $S,A,H$, and $\varepsilon$, improving on existing work that either assumes uniformly bounded mixing times for all policies or has suboptimal dependence on the parameters. We also initiate the study of sample complexity in general (multichain) average-reward MDPs. We argue a new transient time parameter $B$ is necessary, establish an $\widetilde{O}(SA\frac{B + H}{\varepsilon^2})$ complexity bound, and prove a matching (up to log factors) minimax lower bound. Both results are based on reducing the average-reward MDP to a discounted MDP, which requires new ideas in the general setting. To optimally analyze this reduction, we develop improved bounds for $\gamma$-discounted MDPs, showing that $\widetilde{O}(SA\frac{H}{(1-\gamma)^2\varepsilon^2} )$ and $\widetilde{O}(SA\frac{B + H}{(1-\gamma)^2\varepsilon^2} )$ samples suffice to learn $\varepsilon$-optimal policies in weakly communicating and in general MDPs, respectively. Both these results circumvent the well-known minimax lower bound of $\widetilde{\Omega}(SA\frac{1}{(1-\gamma)^3\varepsilon^2} )$ for $\gamma$-discounted MDPs, and establish a quadratic rather than cubic horizon dependence for a fixed MDP instance.
Updated: 2024-06-04 21:22:45
Categories: cs.LG,cs.IT,math.IT,math.OC,stat.ML
Are More LLM Calls All You Need? Towards Scaling Laws of Compound Inference Systems
Many recent state-of-the-art results in language tasks were achieved using compound systems that perform multiple Language Model (LM) calls and aggregate their responses. However, there is little understanding of how the number of LM calls - e.g., when asking the LM to answer each question multiple times and taking a majority vote - affects such a compound system's performance. In this paper, we initiate the study of scaling properties of compound inference systems. We analyze, theoretically and empirically, how the number of LM calls affects the performance of Vote and Filter-Vote, two of the simplest compound system designs, which aggregate LM responses via majority voting, optionally applying LM filters. We find, surprisingly, that across multiple language tasks, the performance of both Vote and Filter-Vote can first increase but then decrease as a function of the number of LM calls. Our theoretical results suggest that this non-monotonicity is due to the diversity of query difficulties within a task: more LM calls lead to higher performance on "easy" queries, but lower performance on "hard" queries, and non-monotone behavior can emerge when a task contains both types of queries. This insight then allows us to compute, from a small number of samples, the number of LM calls that maximizes system performance, and define an analytical scaling model for both systems. Experiments show that our scaling model can accurately predict the performance of Vote and Filter-Vote systems and thus find the optimal number of LM calls to make.
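A minimal sketch of the two designs, assuming a generic call_lm(prompt) answer function (a hypothetical placeholder; the filter prompt is illustrative):

```python
from collections import Counter

def call_lm(prompt: str) -> str:
    raise NotImplementedError("plug in any LM client here")

def vote(prompt: str, k: int) -> str:
    answers = [call_lm(prompt) for _ in range(k)]
    return Counter(answers).most_common(1)[0][0]   # majority answer

def filter_vote(prompt: str, k: int) -> str:
    answers = [call_lm(prompt) for _ in range(k)]
    # LM filter: keep only answers the model itself judges plausible.
    kept = [a for a in answers
            if "YES" in call_lm(f"Is '{a}' a plausible answer to '{prompt}'? "
                                "Reply YES or NO.").upper()]
    return Counter(kept or answers).most_common(1)[0][0]
```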
Updated: 2024-06-04 21:20:33
Categories: cs.LG,cs.AI,cs.CL,cs.SY,eess.SY
Harnessing Density Ratios for Online Reinforcement Learning
The theories of offline and online reinforcement learning, despite having evolved in parallel, have begun to show signs of the possibility for a unification, with algorithms and analysis techniques for one setting often having natural counterparts in the other. However, the notion of density ratio modeling, an emerging paradigm in offline RL, has been largely absent from online RL, perhaps for good reason: the very existence and boundedness of density ratios relies on access to an exploratory dataset with good coverage, but the core challenge in online RL is to collect such a dataset without having one to start. In this work we show -- perhaps surprisingly -- that density ratio-based algorithms have online counterparts. Assuming only the existence of an exploratory distribution with good coverage, a structural condition known as coverability (Xie et al., 2023), we give a new algorithm (GLOW) that uses density ratio realizability and value function realizability to perform sample-efficient online exploration. GLOW addresses unbounded density ratios via careful use of truncation, and combines this with optimism to guide exploration. GLOW is computationally inefficient; we complement it with a more efficient counterpart, HyGLOW, for the Hybrid RL setting (Song et al., 2022) wherein online RL is augmented with additional offline data. HyGLOW is derived as a special case of a more general meta-algorithm that provides a provable black-box reduction from hybrid RL to offline RL, which may be of independent interest.
Updated: 2024-06-04 21:19:10
Categories: cs.LG,stat.ML
Event-horizon-scale Imaging of M87* under Different Assumptions via Deep Generative Image Priors
Reconstructing images from the Event Horizon Telescope (EHT) observations of M87*, the supermassive black hole at the center of the galaxy M87, depends on a prior to impose desired image statistics. However, given the impossibility of directly observing black holes, there is no clear choice for a prior. We present a framework for flexibly designing a range of priors, each bringing different biases to the image reconstruction. These priors can be weak (e.g., impose only basic natural-image statistics) or strong (e.g., impose assumptions of black-hole structure). Our framework uses Bayesian inference with score-based priors, which are data-driven priors arising from a deep generative model that can learn complicated image distributions. Using our Bayesian imaging approach with sophisticated data-driven priors, we can assess how visual features and uncertainty of reconstructed images change depending on the prior. In addition to simulated data, we image the real EHT M87* data and discuss how recovered features are influenced by the choice of prior.
Updated: 2024-06-04 21:08:07
Categories: astro-ph.IM,cs.LG,eess.IV
Copyright Traps for Large Language Models
Questions of fair use of copyright-protected content to train Large Language Models (LLMs) are being actively debated. Document-level inference has been proposed as a new task: inferring from black-box access to the trained model whether a piece of content has been seen during training. SOTA methods however rely on naturally occurring memorization of (part of) the content. While very effective against models that memorize significantly, we hypothesize--and later confirm--that they will not work against models that do not naturally memorize, e.g. medium-size 1B models. We here propose to use copyright traps, the inclusion of fictitious entries in original content, to detect the use of copyrighted materials in LLMs with a focus on models where memorization does not naturally occur. We carefully design a randomized controlled experimental setup, inserting traps into original content (books) and train a 1.3B LLM from scratch. We first validate that the use of content in our target model would be undetectable using existing methods. We then show, contrary to intuition, that even medium-length trap sentences repeated a significant number of times (100) are not detectable using existing methods. However, we show that longer sequences repeated a large number of times can be reliably detected (AUC=0.75) and used as copyright traps. Beyond copyright applications, our findings contribute to the study of LLM memorization: the randomized controlled setup enables us to draw causal relationships between memorization and certain sequence properties such as repetition in model training data and perplexity.
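A minimal sketch of planting a trap: repeat a synthetic trap sequence at random positions in a document before it enters the training corpus. The repeat count and insertion scheme are illustrative; the paper's randomized controlled setup differs in detail:

```python
import random

def plant_trap(document: str, trap: str, n_repeats: int = 1000, seed: int = 0) -> str:
    rng = random.Random(seed)
    paragraphs = document.split("\n\n")
    for _ in range(n_repeats):
        paragraphs.insert(rng.randrange(len(paragraphs) + 1), trap)
    return "\n\n".join(paragraphs)

# Detection then compares the trained model's perplexity on the trap
# sequence against reference sequences it has never seen.
```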
Updated: 2024-06-04 21:07:11
Categories: cs.CL,cs.CR
It's Not a Modality Gap: Characterizing and Addressing the Contrastive Gap
Multi-modal contrastive models such as CLIP achieve state-of-the-art performance in zero-shot classification by embedding input images and texts on a joint representational space. Recently, a modality gap has been reported in two-encoder contrastive models like CLIP, meaning that the image and text embeddings reside in disjoint areas of the latent space. Previous studies suggest that this gap exists due to 1) the cone effect, 2) mismatched pairs in the dataset, and 3) insufficient training. We show that, even when accounting for all these factors, and even when using the same modality, the contrastive loss actually creates a gap during training. As a result, we propose that the modality gap is inherent to the two-encoder contrastive loss and rename it the contrastive gap. We present evidence that attributes this contrastive gap to low uniformity in CLIP space, resulting in embeddings that occupy only a small portion of the latent space. To close the gap, we adapt the uniformity and alignment properties of unimodal contrastive loss to the multi-modal setting and show that simply adding these terms to the CLIP loss distributes the embeddings more uniformly in the representational space, closing the gap. In our experiments, we show that the modified representational space achieves better performance than default CLIP loss in downstream tasks such as zero-shot image classification and multi-modal arithmetic.
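A sketch of the proposed extra terms, using the alignment and uniformity losses of Wang and Isola (2020) that the abstract adapts; the weights are illustrative, and img/txt are assumed to be L2-normalized embedding batches:

```python
import torch

def alignment(img, txt):                       # pull matched pairs together
    return (img - txt).norm(dim=1).pow(2).mean()

def uniformity(z, t=2.0):                      # spread points over the sphere
    return torch.pdist(z, p=2).pow(2).mul(-t).exp().mean().log()

def loss_with_gap_terms(clip_loss, img, txt, w_a=1.0, w_u=1.0):
    return clip_loss + w_a * alignment(img, txt) \
                     + w_u * (uniformity(img) + uniformity(txt)) / 2
```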
Updated: 2024-06-04 20:53:32
Categories: cs.CV,cs.CL,cs.IR,cs.LG
LADI v2: Multi-label Dataset and Classifiers for Low-Altitude Disaster Imagery
ML-based computer vision models are promising tools for supporting emergency management operations following natural disasters. Aerial photographs taken from small manned and unmanned aircraft can be available soon after a disaster and provide valuable information from multiple perspectives for situational awareness and damage assessment applications. However, emergency managers often face challenges finding the most relevant photos among the tens of thousands that may be taken after an incident. While ML-based solutions could enable more effective use of aerial photographs, there is still a lack of training data for imagery of this type from multiple perspectives and for multiple hazard types. To address this, we present the LADI v2 (Low Altitude Disaster Imagery version 2) dataset, a curated set of about 10,000 disaster images captured in the United States by the Civil Air Patrol (CAP) in response to federally-declared emergencies (2015-2023) and annotated for multi-label classification by trained CAP volunteers. We also provide two pretrained baseline classifiers and compare their performance to state-of-the-art vision-language models in multi-label classification. The data and code are released publicly to support the development of computer vision models for emergency management research and applications.
Updated: 2024-06-04 20:51:04
Categories: cs.CV,cs.AI,cs.LG,68T45,J.2
MS-IMAP -- A Multi-Scale Graph Embedding Approach for Interpretable Manifold Learning
Deriving meaningful representations from complex, high-dimensional data in unsupervised settings is crucial across diverse machine learning applications. This paper introduces a framework for multi-scale graph network embedding based on spectral graph wavelets that employs a contrastive learning approach. A significant feature of the proposed embedding is its capacity to establish a correspondence between the embedding space and the input feature space which aids in deriving feature importance of the original features. We theoretically justify our approach and demonstrate that, in Paley-Wiener spaces on combinatorial graphs, the spectral graph wavelets operator offers greater flexibility and better control over smoothness properties compared to the Laplacian operator. We validate the effectiveness of our proposed graph embedding on a variety of public datasets through a range of downstream tasks, including clustering and unsupervised feature importance.
Updated: 2024-06-04 20:48:33
Categories: cs.LG
Diagnostic Digital Twin for Anomaly Detection in Floating Offshore Wind Energy
The demand for condition-based and predictive maintenance is rising across industries, especially for remote, high-value, and high-risk assets. In this article, the diagnostic digital twin concept is introduced, discussed, and implemented for a floating offshore turbine. A diagnostic digital twin is a virtual representation of an asset that combines real-time data and models to monitor damage, detect anomalies, and diagnose failures, thereby enabling condition-based and predictive maintenance. By applying diagnostic digital twins to offshore assets, unexpected failures can be alleviated, but the implementation can prove challenging. Here, a diagnostic digital twin is implemented for an operational floating offshore wind turbine. The asset is monitored through measurements. Unsupervised learning methods are employed to build a normal operation model, detect anomalies, and provide a fault diagnosis. Warnings and diagnoses are sent through text messages, and a more detailed diagnosis can be accessed in a virtual reality interface. The diagnostic digital twin successfully detected an anomaly with high confidence hours before a failure occurred. The paper concludes by discussing diagnostic digital twins in the broader context of offshore engineering. The presented approach can be generalized to other offshore assets to improve maintenance and increase the lifetime, efficiency, and sustainability of offshore assets.
Updated: 2024-06-04 20:45:20
领域: cs.LG,cs.AI,cs.ET,eess.SP
Cyclic Sparse Training: Is it Enough?
The success of iterative pruning methods in achieving state-of-the-art sparse networks has largely been attributed to improved mask identification and an implicit regularization induced by pruning. We challenge this hypothesis and instead posit that their repeated cyclic training schedules enable improved optimization. To verify this, we show that pruning at initialization is significantly boosted by repeated cyclic training, even outperforming standard iterative pruning methods. We conjecture that the dominant mechanism behind this improvement is a better exploration of the loss landscape, leading to a lower training loss. However, at high sparsity, repeated cyclic training alone is not enough for competitive performance. A strong coupling between the learnt parameter initialization and the mask seems to be required. Standard methods obtain this coupling via expensive pruning-training iterations, starting from a dense network. To achieve this with sparse training instead, we propose SCULPT-ing, i.e., repeated cyclic training of any sparse mask followed by a single pruning step to couple the parameters and the mask, which is able to match the performance of state-of-the-art iterative pruning methods in the high sparsity regime at reduced computational cost.
Updated: 2024-06-04 20:40:27
Domains: cs.LG,cs.CV
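The recipe in the Cyclic Sparse Training abstract above lends itself to a compact PyTorch sketch: repeated cyclic (warm-restart) training of a fixed sparse mask, then a single magnitude-pruning step that couples the learnt parameters with the mask. Everything concrete below (the linear stand-in model, the random 50% mask, SGD with cosine restarts, the 50% final pruning ratio) is an assumed placeholder rather than the authors' SCULPT-ing configuration.

    import torch, torch.nn as nn

    torch.manual_seed(0)
    model = nn.Linear(20, 2)                              # stand-in for a real network
    mask = (torch.rand_like(model.weight) < 0.5).float()  # assumed random sparse mask
    with torch.no_grad():
        model.weight *= mask                              # start from the sparse pattern
    X, y = torch.randn(256, 20), torch.randint(0, 2, (256,))
    loss_fn = nn.CrossEntropyLoss()

    def train_one_cycle(steps=50, lr_max=0.1):
        opt = torch.optim.SGD(model.parameters(), lr=lr_max)
        sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=steps)
        for _ in range(steps):                            # one cycle: restart + cosine decay
            opt.zero_grad()
            loss_fn(model(X), y).backward()
            model.weight.grad *= mask                     # masked weights stay at zero
            opt.step()
            sched.step()

    for cycle in range(5):                                # repeated cyclic training
        train_one_cycle()

    with torch.no_grad():                                 # single pruning step that couples
        survivors = model.weight.abs()[mask.bool()]       # parameters and mask
        thresh = survivors.kthvalue(max(1, int(0.5 * survivors.numel()))).values
        mask *= (model.weight.abs() >= thresh).float()
        model.weight *= mask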
Hyperbolic Benchmarking Unveils Network Topology-Feature Relationship in GNN Performance
Graph Neural Networks (GNNs) have excelled in predicting graph properties in various applications ranging from identifying trends in social networks to drug discovery and malware detection. With the abundance of new architectures and increased complexity, GNNs are becoming highly specialized when tested on a few well-known datasets. However, how the performance of GNNs depends on the topological and features properties of graphs is still an open question. In this work, we introduce a comprehensive benchmarking framework for graph machine learning, focusing on the performance of GNNs across varied network structures. Utilizing the geometric soft configuration model in hyperbolic space, we generate synthetic networks with realistic topological properties and node feature vectors. This approach enables us to assess the impact of network properties, such as topology-feature correlation, degree distributions, local density of triangles (or clustering), and homophily, on the effectiveness of different GNN architectures. Our results highlight the dependency of model performance on the interplay between network structure and node features, providing insights for model selection in various scenarios. This study contributes to the field by offering a versatile tool for evaluating GNNs, thereby assisting in developing and selecting suitable models based on specific data characteristics.
Updated: 2024-06-04 20:40:06
Domains: cs.LG
Improved context-sensitive transformer model for inland vessel trajectory prediction
Physics-related and model-based vessel trajectory prediction is highly accurate but requires specific knowledge of the vessel under consideration, which is not always practical. Machine learning-based trajectory prediction models do not require expert knowledge, but rely on the implicit knowledge extracted from massive amounts of data. Several deep learning (DL) methods for vessel trajectory prediction have recently been suggested. The DL models developed typically only process information about the (dis)location of vessels defined with respect to a global reference system. In the context of inland navigation, this can be problematic, since without knowledge of the limited navigable space, unrealistic trajectories are likely to be produced. If spatial constraints are introduced, e.g., by implementing an additional submodule to process map data, however, overall complexity increases. Instead of processing the vessel displacement information on the one hand and the spatial information on the other hand, this paper proposes merging both kinds of information. Here, fairway-related and navigation-related displacement information is used directly. In this way, the previously proposed context-sensitive Classification Transformer (CSCT) shows improved spatial awareness. Additionally, the CSCT is adapted to assess the model uncertainty by enabling dropout during inference. This approach is trained on different inland waterways to analyze its generalizability. As the improved CSCT obtains lower prediction errors and enables estimating the trustworthiness of each prediction, it is more suitable for safety-critical applications in inland navigation than previously developed models.
Updated: 2024-06-04 20:39:14
Domains: cs.LG
Diffusion Models With Learned Adaptive Noise
Diffusion models have gained traction as powerful algorithms for synthesizing high-quality images. Central to these algorithms is the diffusion process, a set of equations which maps data to noise in a way that can significantly affect performance. In this paper, we explore whether the diffusion process can be learned from data. Our work is grounded in Bayesian inference and seeks to improve log-likelihood estimation by casting the learned diffusion process as an approximate variational posterior that yields a tighter lower bound (ELBO) on the likelihood. A widely held assumption is that the ELBO is invariant to the noise process: our work dispels this assumption and proposes multivariate learned adaptive noise (MULAN), a learned diffusion process that applies noise at different rates across an image. Specifically, our method relies on a multivariate noise schedule that is a function of the data to ensure that the ELBO is no longer invariant to the choice of the noise schedule as in previous works. Empirically, MULAN sets a new state-of-the-art in density estimation on CIFAR-10 and ImageNet and reduces the number of training steps by 50%. Code is available at https://github.com/s-sahoo/MuLAN
Updated: 2024-06-04 20:38:49
Domains: cs.LG,cs.CV
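To make the "multivariate noise schedule that is a function of the data" from the abstract above concrete, here is a hedged torch sketch of one forward-diffusion step in which each coordinate receives its own log-SNR predicted from x_0. The small MLP schedule network and the parametrization alpha_bar = sigmoid(-gamma) are illustrative assumptions, not MULAN's actual architecture.

    import torch, torch.nn as nn

    class MultivariateSchedule(nn.Module):
        """Per-dimension log-SNR gamma(x0, t): a function of the data, so the
        ELBO is no longer invariant to the choice of noise schedule."""
        def __init__(self, dim):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(dim + 1, 64), nn.SiLU(),
                                     nn.Linear(64, dim))
        def forward(self, x0, t):
            h = torch.cat([x0, t.expand(x0.shape[0], 1)], dim=-1)
            return self.net(h)                   # gamma: (batch, dim)

    def forward_diffuse(x0, t, schedule):
        """Sample q(x_t | x_0) with elementwise alpha_bar = sigmoid(-gamma)."""
        gamma = schedule(x0, t)
        alpha_bar = torch.sigmoid(-gamma)        # in (0, 1), per coordinate
        eps = torch.randn_like(x0)
        x_t = alpha_bar.sqrt() * x0 + (1 - alpha_bar).sqrt() * eps
        return x_t, eps

    dim = 8
    sched = MultivariateSchedule(dim)
    x0 = torch.randn(4, dim)
    x_t, eps = forward_diffuse(x0, torch.tensor([[0.3]]), sched)
    print(x_t.shape)  # (4, 8): each coordinate was noised at its own rate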
Short-term Inland Vessel Trajectory Prediction with Encoder-Decoder Models
Accurate vessel trajectory prediction is necessary for safe and efficient navigation. Deep learning-based prediction models, especially encoder-decoders, are rarely applied to inland navigation specifically. Approaches from the maritime domain cannot be transferred directly to river navigation due to driving-behavior factors specific to inland waterways. Different encoder-decoder architectures, including a transformer encoder-decoder, are compared herein for predicting the next positions of inland vessels, given not only spatio-temporal information from AIS but also river-specific features. The results show that reformulating the regression task as a classification problem and including river-specific features yield the lowest displacement errors. The standard LSTM encoder-decoder outperforms the transformer encoder-decoder for the data considered, but is computationally more expensive. In this study, a transformer-based encoder-decoder model is applied to the problem of predicting ship trajectories for the first time. Here, a feature vector using the river-specific navigation context of the input parameters is established. Future studies can build on the proposed models, investigate improvements to the computationally more efficient transformer, e.g., through further hyper-parameter optimization, and use additional river-specific information in the context representation to further increase prediction accuracy.
Updated: 2024-06-04 20:37:30
Domains: cs.LG
Precise asymptotics of reweighted least-squares algorithms for linear diagonal networks
The classical iteratively reweighted least-squares (IRLS) algorithm aims to recover an unknown signal from linear measurements by performing a sequence of weighted least squares problems, where the weights are recursively updated at each step. Varieties of this algorithm have been shown to achieve favorable empirical performance and theoretical guarantees for sparse recovery and $\ell_p$-norm minimization. Recently, some preliminary connections have also been made between IRLS and certain types of non-convex linear neural network architectures that are observed to exploit low-dimensional structure in high-dimensional linear models. In this work, we provide a unified asymptotic analysis for a family of algorithms that encompasses IRLS, the recently proposed lin-RFM algorithm (which was motivated by feature learning in neural networks), and the alternating minimization algorithm on linear diagonal neural networks. Our analysis operates in a "batched" setting with i.i.d. Gaussian covariates and shows that, with appropriately chosen reweighting policy, the algorithm can achieve favorable performance in only a handful of iterations. We also extend our results to the case of group-sparse recovery and show that leveraging this structure in the reweighting scheme provably improves test error compared to coordinate-wise reweighting.
Updated: 2024-06-04 20:37:17
Domains: stat.ML,cs.LG
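For readers who want the baseline algorithm analyzed in the abstract above in concrete form, below is a minimal numpy implementation of classical IRLS for sparse recovery from underdetermined linear measurements. The epsilon-smoothed l1-style weights are one standard choice; the paper's batched setting and specific reweighting policies are not reproduced.

    import numpy as np

    def irls_sparse_recovery(A, y, iters=30, eps=1e-8):
        """Iteratively reweighted least squares for underdetermined Ax = y.

        Each step solves min sum_i w_i x_i^2 s.t. Ax = y, whose closed form is
        x = D A^T (A D A^T)^{-1} y with D = diag(1/w_i); the weights
        w_i = 1/(|x_i| + eps) are then refreshed, promoting sparsity (l1 flavor).
        """
        x = A.T @ np.linalg.solve(A @ A.T, y)        # least-norm initialization
        for _ in range(iters):
            D = np.diag(np.abs(x) + eps)             # D = W^{-1}
            x = D @ A.T @ np.linalg.solve(A @ D @ A.T, y)
        return x

    rng = np.random.default_rng(0)
    n, m, k = 60, 25, 4
    x_true = np.zeros(n)
    x_true[rng.choice(n, k, replace=False)] = rng.normal(size=k)
    A = rng.normal(size=(m, n)) / np.sqrt(m)         # i.i.d. Gaussian covariates
    y = A @ x_true
    x_hat = irls_sparse_recovery(A, y)
    print(np.linalg.norm(x_hat - x_true))            # should be near zero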
Lightweight CNN-BiLSTM based Intrusion Detection Systems for Resource-Constrained IoT Devices
Intrusion Detection Systems (IDSs) have played a significant role in detecting and preventing cyber-attacks within traditional computing systems. It is not surprising that the same technology is being applied to secure Internet of Things (IoT) networks from cyber threats. The limited computational resources available on IoT devices make it challenging to deploy conventional computing-based IDSs. The IDSs designed for IoT environments must also demonstrate high classification performance, utilize low-complexity models, and be of a small size. Despite significant progress in IoT-based intrusion detection, developing models that both achieve high classification performance and maintain reduced complexity remains challenging. In this study, we propose a hybrid CNN architecture composed of a lightweight CNN and bidirectional LSTM (BiLSTM) to enhance the performance of IDS on the UNSW-NB15 dataset. The proposed model is specifically designed to run onboard resource-constrained IoT devices and meet their computation capability requirements. Despite the complexity of designing a model that fits the requirements of IoT devices and achieves higher accuracy, our proposed model outperforms the existing research efforts in the literature by achieving an accuracy of 97.28% for binary classification and 96.91% for multiclassification.
Updated: 2024-06-04 20:36:21
Domains: cs.CR,cs.NI,eess.SP
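A skeletal PyTorch version of the architecture pattern named in the abstract above (a lightweight 1-D CNN front-end feeding a bidirectional LSTM and a classifier head) may help make the design concrete. All layer sizes, kernel widths, and the 42-feature input are assumptions for illustration, not the paper's tuned UNSW-NB15 configuration.

    import torch, torch.nn as nn

    class CNNBiLSTM(nn.Module):
        """Lightweight 1-D CNN front-end + BiLSTM + classification head."""
        def __init__(self, n_features=42, n_classes=2):
            super().__init__()
            self.cnn = nn.Sequential(                 # local pattern extraction
                nn.Conv1d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool1d(2),
            )
            self.bilstm = nn.LSTM(input_size=16, hidden_size=32,
                                  batch_first=True, bidirectional=True)
            self.head = nn.Linear(2 * 32, n_classes)  # 2x for both directions
        def forward(self, x):                         # x: (batch, n_features)
            h = self.cnn(x.unsqueeze(1))              # (batch, 16, n_features//2)
            h, _ = self.bilstm(h.transpose(1, 2))     # sequence over feature axis
            return self.head(h[:, -1])                # last step -> class logits

    model = CNNBiLSTM()
    logits = model(torch.randn(8, 42))
    print(logits.shape)  # (8, 2)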
Spatial and social situation-aware transformer-based trajectory prediction of autonomous systems
Autonomous transportation systems such as road vehicles or vessels require the consideration of the static and dynamic environment to move without collision. Anticipating the behavior of an agent in a given situation is required to adequately react to it in time. Developing deep learning-based models has become the dominant approach to motion prediction recently. The social environment is often considered through a CNN-LSTM-based sub-module processing a $\textit{social tensor}$ that includes information of the past trajectory of surrounding agents. For the proposed transformer-based trajectory prediction model, an alternative, computationally more efficient social tensor definition and processing is suggested. It considers the interdependencies between target and surrounding agents at each time step directly, instead of relying on information from the last hidden LSTM states of individually processed agents. A transformer-based sub-module, the Social Tensor Transformer, is integrated into the overall prediction model. It is responsible for enriching the target agent's dislocation features with social interaction information obtained from the social tensor. For awareness of spatial limitations, dislocation features are defined in relation to the navigable area. This replaces additional, computationally expensive map processing sub-modules. An ablation study shows that for longer prediction horizons, the deviation of the predicted trajectory from the ground truth is lower compared to a spatially and socially agnostic model. Even if the performance gain from a spatial-only to a spatially and socially context-sensitive model is small in terms of common error measures, visualizing the results shows that the proposed model is in fact able to predict reactions to surrounding agents and explicitly allows for interpretable behavior.
Updated: 2024-06-04 20:36:16
Domains: cs.LG
Fuzzy Convolution Neural Networks for Tabular Data Classification
Recently, convolution neural networks (CNNs) have attracted a great deal of attention due to their remarkable performance in various domains, particularly in image and text classification tasks. However, their application to tabular data classification remains underexplored. There are many fields, such as bioinformatics, finance, and medicine, where non-image data are prevalent. Adapting CNNs to classify non-image data remains highly challenging. This paper investigates the efficacy of CNNs for tabular data classification, aiming to bridge the gap between traditional machine learning approaches and deep learning techniques. We propose a novel framework, the fuzzy convolution neural network (FCNN), tailored specifically for tabular data to capture local patterns within feature vectors. In our approach, we map feature values to fuzzy memberships. The fuzzy membership vectors are converted into images that are used to train the CNN model. The trained CNN model is used to classify unknown feature vectors. To validate our approach, we generated six complex noisy data sets. We used randomly selected seventy percent of the samples from each data set for training and thirty percent for testing. The data sets were also classified using state-of-the-art machine learning algorithms such as the decision tree (DT), support vector machine (SVM), fuzzy neural network (FNN), Bayes classifier, and random forest (RF). Experimental results demonstrate that our proposed model can effectively learn meaningful representations from tabular data, achieving competitive or superior performance compared to existing methods. Overall, our findings suggest that the proposed FCNN model holds promise as a viable alternative for tabular data classification tasks, offering a fresh perspective and potentially unlocking new opportunities for leveraging deep learning in structured data analysis.
Updated: 2024-06-04 20:33:35
Domains: cs.LG,cs.AI,I.2.10,I.4.6
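The key data transformation in the FCNN abstract above, mapping feature values to fuzzy memberships and arranging the membership vectors as an image, can be sketched in a few lines of numpy. The Gaussian membership functions, their number, and the min-max normalization assumption below are illustrative choices, not necessarily the authors' exact design.

    import numpy as np

    def fuzzify(x, centers, width=0.5):
        """Gaussian membership of scalar x in each fuzzy set."""
        return np.exp(-((x - centers) ** 2) / (2 * width ** 2))

    def feature_vector_to_image(v, n_sets=8):
        """Map a feature vector (d,) to a (d, n_sets) membership 'image'.

        Row i holds feature i's membership in n_sets fuzzy sets whose centers
        tile [0, 1]; features are assumed min-max normalized beforehand.
        """
        centers = np.linspace(0.0, 1.0, n_sets)
        return np.stack([fuzzify(xi, centers) for xi in v])

    v = np.random.default_rng(0).uniform(size=10)   # one tabular row, 10 features
    img = feature_vector_to_image(v)                # (10, 8) image for the CNN
    print(img.shape, img.min() >= 0, img.max() <= 1)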
Discovering Dynamic Symbolic Policies with Genetic Programming
Artificial intelligence (AI) techniques are increasingly being applied to solve control problems. However, control systems developed in AI are often black-box methods, in that it is not clear how and why they generate their outputs. A lack of transparency can be problematic for control tasks in particular, because it complicates the identification of biases or errors, which in turn negatively influences the user's confidence in the system. To improve the interpretability and transparency in control systems, the black-box structure can be replaced with white-box symbolic policies described by mathematical expressions. Genetic programming offers a gradient-free method to optimise the structure of non-differentiable mathematical expressions. In this paper, we show that genetic programming can be used to discover symbolic control systems. This is achieved by learning a symbolic representation of a function that transforms observations into control signals. We consider both systems that implement static control policies without memory and systems that implement dynamic memory-based control policies. In the case of the latter, the discovered function becomes the state equation of a differential equation, which allows for evidence integration. Our results show that symbolic policies are discovered that perform comparably with black-box policies on a variety of control tasks. Furthermore, the additional value of the memory capacity in the dynamic policies is demonstrated in experiments where static policies fall short. Overall, we demonstrate that white-box symbolic policies can be optimised with genetic programming, while offering interpretability and transparency that black-box models lack.
Updated: 2024-06-04 20:33:29
Domains: cs.NE,cs.LG
Adaptive Preference Scaling for Reinforcement Learning with Human Feedback
Reinforcement learning from human feedback (RLHF) is a prevalent approach to align AI systems with human values by learning rewards from human preference data. Due to various reasons, however, such data typically takes the form of rankings over pairs of trajectory segments, which fails to capture the varying strengths of preferences across different pairs. In this paper, we propose a novel adaptive preference loss, underpinned by distributionally robust optimization (DRO), designed to address this uncertainty in preference strength. By incorporating an adaptive scaling parameter into the loss for each pair, our method increases the flexibility of the reward function. Specifically, it assigns small scaling parameters to pairs with ambiguous preferences, leading to more comparable rewards, and large scaling parameters to those with clear preferences for more distinct rewards. Computationally, our proposed loss function is strictly convex and univariate with respect to each scaling parameter, enabling its efficient optimization through a simple second-order algorithm. Our method is versatile and can be readily adapted to various preference optimization frameworks, including direct preference optimization (DPO). Our experiments with robotic control and natural language generation with large language models (LLMs) show that our method not only improves policy performance but also aligns reward function selection more closely with policy optimization, simplifying the hyperparameter tuning process.
Updated: 2024-06-04 20:33:22
Domains: cs.LG,cs.AI
Dynamic and Adaptive Feature Generation with LLM
The representation of the feature space is a crucial environment in which data points get vectorized and embedded for subsequent modeling. Thus the efficacy of machine learning (ML) algorithms is closely related to the quality of feature engineering. As one of the most important techniques, feature generation transforms raw data into an optimized feature space conducive to model training and further refines the space. Despite the advancements in automated feature engineering and feature generation, current methodologies often suffer from three fundamental issues: lack of explainability, limited applicability, and inflexible strategy. These shortcomings frequently hinder and limit the deployment of ML models across varied scenarios. Our research introduces a novel approach adopting large language models (LLMs) and feature-generating prompts to address these challenges. We propose a dynamic and adaptive feature generation method that enhances the interpretability of the feature generation process. Our approach broadens the applicability across various data types and tasks and offers advantages in strategic flexibility. A broad range of experiments showcases that our approach is significantly superior to existing methods.
Updated: 2024-06-04 20:32:14
Domains: cs.LG,cs.AI
Enhancing Jailbreak Attack Against Large Language Models through Silent Tokens
Along with the remarkable successes of large language models (LLMs), recent research has also started to explore their security threats, including jailbreaking attacks. Attackers carefully craft jailbreaking prompts such that a target LLM will respond to the harmful question. Existing jailbreaking attacks require either human experts or complicated algorithms to craft jailbreaking prompts. In this paper, we introduce BOOST, a simple attack that leverages only the eos tokens. We demonstrate that rather than constructing complicated jailbreaking prompts, the attacker can simply append a few eos tokens to the end of a harmful question. This bypasses the safety alignment of LLMs and leads to successful jailbreaking attacks. We further apply BOOST to four representative jailbreak methods and show that the attack success rates of these methods can be significantly enhanced by simply adding eos tokens to the prompt. To understand this simple but novel phenomenon, we conduct empirical analyses. Our analysis reveals that adding eos tokens makes the target LLM believe the input is much less harmful, and eos tokens have low attention values and do not affect the LLM's understanding of the harmful questions, leading the model to actually respond to the questions. Our findings uncover how fragile an LLM is against jailbreak attacks, motivating the development of strong safety alignment approaches.
Updated: 2024-06-04 20:29:48
Domains: cs.AI
Multi-layer Learnable Attention Mask for Multimodal Tasks
While the Self-Attention mechanism in the Transformer model has proven to be effective in many domains, we observe that it is less effective in more diverse settings (e.g. multimodality) due to the varying granularity of each token and the high computational demands of lengthy sequences. To address the challenges, we introduce the Learnable Attention Mask (LAM), strategically designed to globally regulate attention maps and prioritize critical tokens within the sequence. Leveraging the Self-Attention module in a BERT-like transformer network, our approach adeptly captures associations between tokens. The extension of the LAM to a multi-layer version accommodates the varied information aspects embedded at each layer of the Transformer network. Comprehensive experimental validation on various datasets, such as MADv2, QVHighlights, ImageNet 1K, and MSRVTT, demonstrates the efficacy of the LAM, exemplifying its ability to enhance model performance while mitigating redundant computations. This pioneering approach presents a significant advancement in enhancing the understanding of complex scenarios, such as in movie understanding.
Updated: 2024-06-04 20:28:02
Domains: cs.CV,cs.AI,cs.LG,cs.MM
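A hedged torch sketch of the basic mechanism suggested by the abstract above: a learnable additive mask on the self-attention logits that can globally suppress or promote token pairs, with one such mask per layer in the multi-layer variant. The single-head module and zero initialization are assumptions, not the paper's exact parametrization.

    import torch, torch.nn as nn
    import torch.nn.functional as F

    class MaskedSelfAttention(nn.Module):
        """Single-head self-attention with a learnable additive attention mask."""
        def __init__(self, dim, max_len):
            super().__init__()
            self.qkv = nn.Linear(dim, 3 * dim)
            # Learnable logit offsets: negative entries suppress a token pair,
            # positive entries promote it, applied globally before the softmax.
            self.attn_mask = nn.Parameter(torch.zeros(max_len, max_len))
            self.scale = dim ** -0.5
        def forward(self, x):                       # x: (batch, seq, dim)
            q, k, v = self.qkv(x).chunk(3, dim=-1)
            n = x.shape[1]
            scores = q @ k.transpose(-2, -1) * self.scale + self.attn_mask[:n, :n]
            return F.softmax(scores, dim=-1) @ v

    attn = MaskedSelfAttention(dim=32, max_len=128)
    out = attn(torch.randn(2, 50, 32))
    print(out.shape)  # (2, 50, 32)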
Aligning Large Language Models via Fine-grained Supervision
Pre-trained large-scale language models (LLMs) excel at producing coherent articles, yet their outputs may be untruthful, toxic, or fail to align with user expectations. Current approaches focus on using reinforcement learning with human feedback (RLHF) to improve model alignment, which works by transforming coarse human preferences of LLM outputs into a feedback signal that guides the model learning process. However, because this approach operates on sequence-level feedback, it lacks the precision to identify the exact parts of the output affecting user preferences. To address this gap, we propose a method to enhance LLM alignment through fine-grained token-level supervision. Specifically, we ask annotators to minimally edit less preferred responses within the standard reward modeling dataset to make them more favorable, ensuring changes are made only where necessary while retaining most of the original content. The refined dataset is used to train a token-level reward model, which is then used for training our fine-grained Proximal Policy Optimization (PPO) model. Our experiment results demonstrate that this approach can achieve up to an absolute improvement of $5.1\%$ in LLM performance, in terms of win rate against the reference model, compared with the traditional PPO model.
Updated: 2024-06-04 20:21:45
Domains: cs.CL,cs.AI,cs.LG
Scalable Online Exploration via Coverability
Exploration is a major challenge in reinforcement learning, especially for high-dimensional domains that require function approximation. We propose exploration objectives -- policy optimization objectives that enable downstream maximization of any reward function -- as a conceptual framework to systematize the study of exploration. Within this framework, we introduce a new objective, $L_1$-Coverage, which generalizes previous exploration schemes and supports three fundamental desiderata: 1. Intrinsic complexity control. $L_1$-Coverage is associated with a structural parameter, $L_1$-Coverability, which reflects the intrinsic statistical difficulty of the underlying MDP, subsuming Block and Low-Rank MDPs. 2. Efficient planning. For a known MDP, optimizing $L_1$-Coverage efficiently reduces to standard policy optimization, allowing flexible integration with off-the-shelf methods such as policy gradient and Q-learning approaches. 3. Efficient exploration. $L_1$-Coverage enables the first computationally efficient model-based and model-free algorithms for online (reward-free or reward-driven) reinforcement learning in MDPs with low coverability. Empirically, we find that $L_1$-Coverage effectively drives off-the-shelf policy optimization algorithms to explore the state space.
Updated: 2024-06-04 20:12:47
Domains: cs.LG,math.OC,stat.ML
Story Generation from Visual Inputs: Techniques, Related Tasks, and Challenges
Creating engaging narratives from visual data is crucial for automated digital media consumption, assistive technologies, and interactive entertainment. This survey covers methodologies used in the generation of these narratives, focusing on their principles, strengths, and limitations. The survey also covers tasks related to automatic story generation, such as image and video captioning, and visual question answering, as well as story generation without visual inputs. These tasks share common challenges with visual story generation and have served as inspiration for the techniques used in the field. We analyze the main datasets and evaluation metrics, providing a critical perspective on their limitations.
Updated: 2024-06-04 20:07:58
Domains: cs.CV,cs.AI,I.2.7; I.2.10
Optimizing for ROC Curves on Class-Imbalanced Data by Training over a Family of Loss Functions
Although binary classification is a well-studied problem in computer vision, training reliable classifiers under severe class imbalance remains a challenging problem. Recent work has proposed techniques that mitigate the effects of training under imbalance by modifying the loss functions or optimization methods. While this work has led to significant improvements in the overall accuracy in the multi-class case, we observe that slight changes in hyperparameter values of these methods can result in highly variable performance in terms of Receiver Operating Characteristic (ROC) curves on binary problems with severe imbalance. To reduce the sensitivity to hyperparameter choices and train more general models, we propose training over a family of loss functions, instead of a single loss function. We develop a method for applying Loss Conditional Training (LCT) to an imbalanced classification problem. Extensive experiment results, on both CIFAR and Kaggle competition datasets, show that our method improves model performance and is more robust to hyperparameter choices. Code is available at https://github.com/klieberman/roc_lct.
Updated: 2024-06-04 20:03:25
Domains: cs.LG,cs.CV
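The "training over a family of loss functions" idea above can be sketched with Loss Conditional Training: each batch samples a loss hyperparameter and the network is conditioned on it, so a single model covers the whole family. The focal-loss family, the range of gamma, and the naive input-concatenation conditioning below are all assumptions for illustration; the paper's family and conditioning scheme may differ.

    import torch, torch.nn as nn

    net = nn.Sequential(nn.Linear(20 + 1, 64), nn.ReLU(), nn.Linear(64, 1))

    def focal_loss(logits, y, gamma):
        """Binary focal loss; gamma = 0 recovers plain BCE."""
        p = torch.sigmoid(logits).squeeze(-1)
        pt = torch.where(y == 1, p, 1 - p)            # prob of the true class
        return (-(1 - pt) ** gamma * torch.log(pt.clamp_min(1e-8))).mean()

    X = torch.randn(512, 20)
    y = (torch.rand(512) < 0.05).float()              # severe class imbalance
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)

    for step in range(200):
        gamma = torch.rand(()) * 4                    # sample a loss from the family
        x_cond = torch.cat([X, gamma.expand(X.shape[0], 1)], dim=1)  # condition on it
        loss = focal_loss(net(x_cond), y, gamma)
        opt.zero_grad()
        loss.backward()
        opt.step()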
Measuring Stochastic Data Complexity with Boltzmann Influence Functions
Estimating the uncertainty of a model's prediction on a test point is a crucial part of ensuring reliability and calibration under distribution shifts. A minimum description length approach to this problem uses the predictive normalized maximum likelihood (pNML) distribution, which considers every possible label for a data point, and decreases confidence in a prediction if other labels are also consistent with the model and training data. In this work we propose IF-COMP, a scalable and efficient approximation of the pNML distribution that linearizes the model with a temperature-scaled Boltzmann influence function. IF-COMP can be used to produce well-calibrated predictions on test points as well as measure complexity in both labelled and unlabelled settings. We experimentally validate IF-COMP on uncertainty calibration, mislabel detection, and OOD detection tasks, where it consistently matches or beats strong baseline methods.
Updated: 2024-06-04 20:01:39
Domains: cs.LG
DPDR: Gradient Decomposition and Reconstruction for Differentially Private Deep Learning
Differentially Private Stochastic Gradient Descent (DP-SGD) is a prominent paradigm for preserving privacy in deep learning. It ensures privacy by perturbing gradients with random noise calibrated to their entire norm at each training step. However, this perturbation suffers from sub-optimal performance: it repeatedly wastes privacy budget on the general converging direction shared among gradients from different batches, which we refer to as common knowledge, yet yields little information gain. Motivated by this, we propose a differentially private training framework with early gradient decomposition and reconstruction (DPDR), which enables more efficient use of the privacy budget. In essence, it boosts model utility by focusing on incremental information protection and recycling the privatized common knowledge learned from previous gradients at early training steps. Concretely, DPDR incorporates three steps. First, it disentangles common knowledge and incremental information in current gradients by decomposing them based on previous noisy gradients. Second, most of the privacy budget is spent on protecting incremental information for higher information gain. Third, the model is updated with the gradient reconstructed from recycled common knowledge and noisy incremental information. Theoretical analysis and extensive experiments show that DPDR outperforms state-of-the-art baselines on both convergence rate and accuracy.
Updated: 2024-06-04 19:57:47
Domains: cs.CR,cs.LG
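The decomposition-and-reconstruction flow described in the DPDR abstract above can be sketched as follows: project the current gradient onto the direction of the previous privatized gradient (recycled common knowledge), then clip and noise only the orthogonal incremental part. Note that in this toy numpy sketch the projection coefficient is computed from the raw gradient, which a rigorous privacy analysis would also need to account for; the clip norm and noise scale are placeholders, not a calibrated privacy budget.

    import numpy as np

    rng = np.random.default_rng(0)

    def dpdr_step(g, g_prev_private, clip=1.0, sigma=1.0):
        """Decompose g, privatize only the incremental (orthogonal) part."""
        u = g_prev_private / (np.linalg.norm(g_prev_private) + 1e-12)
        common = (g @ u) * u                  # component along recycled direction
        incremental = g - common              # new information this step
        norm = np.linalg.norm(incremental)
        incremental = incremental * min(1.0, clip / (norm + 1e-12))  # clip
        incremental += rng.normal(scale=sigma * clip, size=g.shape)  # Gaussian noise
        return common + incremental           # reconstructed private gradient

    g_prev = rng.normal(size=100)             # previous noisy gradient (given)
    g = 0.9 * g_prev + 0.1 * rng.normal(size=100)
    print(np.linalg.norm(dpdr_step(g, g_prev) - g))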
Tolerant Algorithms for Learning with Arbitrary Covariate Shift
We study the problem of learning under arbitrary distribution shift, where the learner is trained on a labeled set from one distribution but evaluated on a different, potentially adversarially generated test distribution. We focus on two frameworks: PQ learning [Goldwasser, A. Kalai, Y. Kalai, Montasser NeurIPS 2020], allowing abstention on adversarially generated parts of the test distribution, and TDS learning [Klivans, Stavropoulos, Vasilyan COLT 2024], permitting abstention on the entire test distribution if distribution shift is detected. All prior known algorithms either rely on learning primitives that are computationally hard even for simple function classes, or end up abstaining entirely even in the presence of a tiny amount of distribution shift. We address both these challenges for natural function classes, including intersections of halfspaces and decision trees, and standard training distributions, including Gaussians. For PQ learning, we give efficient learning algorithms, while for TDS learning, our algorithms can tolerate moderate amounts of distribution shift. At the core of our approach is an improved analysis of spectral outlier-removal techniques from learning with nasty noise. Our analysis can (1) handle arbitrarily large fraction of outliers, which is crucial for handling arbitrary distribution shifts, and (2) obtain stronger bounds on polynomial moments of the distribution after outlier removal, yielding new insights into polynomial regression under distribution shifts. Lastly, our techniques lead to novel results for tolerant testable learning [Rubinfeld and Vasilyan STOC 2023], and learning with nasty noise.
Updated: 2024-06-04 19:50:05
Domains: cs.DS,cs.LG
Statistical Guarantees of Group-Invariant GANs
Group-invariant generative adversarial networks (GANs) are a type of GANs in which the generators and discriminators are hardwired with group symmetries. Empirical studies have shown that these networks are capable of learning group-invariant distributions with significantly improved data efficiency. In this study, we aim to rigorously quantify this improvement by analyzing the reduction in sample complexity for group-invariant GANs. Our findings indicate that when learning group-invariant distributions, the number of samples required for group-invariant GANs decreases proportionally by a factor of the group size. Importantly, this sample complexity reduction cannot be achieved merely through data augmentation due to the probabilistic dependence of augmented data. Numerical results substantiate our theory and highlight the stark contrast between learning with group-invariant GANs and using data augmentation. This work presents the first statistical performance guarantees for group-invariant generative models, specifically for GANs, and it may shed light on the study of other generative models with group symmetries.
Updated: 2024-06-04 19:48:43
Domains: stat.ML,cs.LG
Long Range Propagation on Continuous-Time Dynamic Graphs
Learning Continuous-Time Dynamic Graphs (C-TDGs) requires accurately modeling spatio-temporal information on streams of irregularly sampled events. While many methods have been proposed recently, we find that most message passing-, recurrent- or self-attention-based methods perform poorly on long-range tasks. These tasks require correlating information that occurred "far" away from the current event, either spatially (higher-order node information) or along the time dimension (events occurred in the past). To address long-range dependencies, we introduce Continuous-Time Graph Anti-Symmetric Network (CTAN). Grounded within the ordinary differential equations framework, our method is designed for efficient propagation of information. In this paper, we show how CTAN's (i) long-range modeling capabilities are substantiated by theoretical findings and how (ii) its empirical performance on synthetic long-range benchmarks and real-world benchmarks is superior to other methods. Our results motivate CTAN's ability to propagate long-range information in C-TDGs as well as the inclusion of long-range tasks as part of temporal graph models evaluation.
Updated: 2024-06-04 19:42:19
Domains: cs.LG
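CTAN's grounding in ordinary differential equations can be illustrated with the kind of update used by antisymmetric graph networks: a forward-Euler step whose recurrent weights are antisymmetrized as W - W^T, a construction known to aid long-range propagation by keeping the Jacobian spectrum near the imaginary axis. The event handling and temporal encoding of C-TDGs are omitted; this torch sketch is an assumption-laden illustration, not the paper's model.

    import torch

    def antisymmetric_graph_step(H, A_hat, W, V, X, eps=0.1, gamma=0.1):
        """One Euler step of h' = tanh((W - W^T - gamma*I) h + neighbors + input).

        H: node states (n, d); A_hat: normalized adjacency (n, n);
        W: (d, d) recurrent weights (antisymmetrized); V: (f, d) input weights;
        X: node/event features (n, f).
        """
        W_a = W - W.T - gamma * torch.eye(W.shape[0])
        return H + eps * torch.tanh(H @ W_a + A_hat @ H + X @ V)

    n, d, f = 5, 8, 3
    H = torch.zeros(n, d)
    A_hat = torch.eye(n)                      # placeholder normalized adjacency
    W, V, X = torch.randn(d, d), torch.randn(f, d), torch.randn(n, f)
    for _ in range(100):                      # many steps, states evolve stably
        H = antisymmetric_graph_step(H, A_hat, W, V, X)
    print(H.norm())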
CAMP: Compiler and Allocator-based Heap Memory Protection
The heap is a critical and widely used component of many applications. Due to its dynamic nature, combined with the complexity of heap management algorithms, it is also a frequent target for security exploits. To enhance the heap's security, various heap protection techniques have been introduced, but they either introduce significant runtime overhead or have limited protection. We present CAMP, a new sanitizer for detecting and capturing heap memory corruption. CAMP leverages a compiler and a customized memory allocator. The compiler adds boundary-checking and escape-tracking instructions to the target program, while the memory allocator tracks memory ranges, coordinates with the instrumentation, and neutralizes dangling pointers. With the novel error detection scheme, CAMP enables various compiler optimization strategies and thus eliminates redundant and unnecessary check instrumentation. This design minimizes runtime overhead without sacrificing security guarantees. Our evaluation and comparison of CAMP with existing tools, using both real-world applications and SPEC CPU benchmarks, show that it provides even better heap corruption detection capability with lower runtime overhead.
Updated: 2024-06-04 19:37:41
Domains: cs.CR,cs.SE
Synthetic Data Outliers: Navigating Identity Disclosure
Multiple synthetic data generation models have emerged, among which deep learning models have become the vanguard due to their ability to capture the underlying characteristics of the original data. However, the resemblance of the synthetic to the original data raises important questions on the protection of individuals' privacy. As synthetic data is perceived as a means to fully protect personal information, most current related work disregards the impact of re-identification risk. In particular, limited attention has been given to exploring outliers, despite their privacy relevance. In this work, we analyze the privacy of synthetic data with respect to outliers. Our main findings suggest that outlier re-identification via linkage attacks is feasible and easily achieved. Furthermore, additional safeguards such as differential privacy can prevent re-identification, albeit at the expense of the data utility.
Updated: 2024-06-04 19:35:44
Domains: cs.LG,cs.CR
GP+: A Python Library for Kernel-based learning via Gaussian Processes
In this paper we introduce GP+, an open-source library for kernel-based learning via Gaussian processes (GPs) which are powerful statistical models that are completely characterized by their parametric covariance and mean functions. GP+ is built on PyTorch and provides a user-friendly and object-oriented tool for probabilistic learning and inference. As we demonstrate with a host of examples, GP+ has a few unique advantages over other GP modeling libraries. We achieve these advantages primarily by integrating nonlinear manifold learning techniques with GPs' covariance and mean functions. As part of introducing GP+, in this paper we also make methodological contributions that (1) enable probabilistic data fusion and inverse parameter estimation, and (2) equip GPs with parsimonious parametric mean functions which span mixed feature spaces that have both categorical and quantitative variables. We demonstrate the impact of these contributions in the context of Bayesian optimization, multi-fidelity modeling, sensitivity analysis, and calibration of computer models.
Updated: 2024-06-04 19:34:51
Domains: cs.LG,stat.ML
Proof-of-Learning with Incentive Security
Most concurrent blockchain systems rely heavily on the Proof-of-Work (PoW) or Proof-of-Stake (PoS) mechanisms for decentralized consensus and security assurance. However, the substantial energy expenditure stemming from computationally intensive yet meaningless tasks has raised considerable concerns surrounding traditional PoW approaches. The PoS mechanism, while free of energy consumption, is subject to security and economic issues. Addressing these issues, the paradigm of Proof-of-Useful-Work (PoUW) seeks to employ challenges of practical significance as PoW, thereby imbuing energy consumption with tangible value. While previous efforts in Proof of Learning (PoL) explored the utilization of deep learning model training SGD tasks as PoUW challenges, recent research has revealed its vulnerabilities to adversarial attacks and the theoretical hardness of crafting a byzantine-secure PoL mechanism. In this paper, we introduce the concept of incentive-security, which incentivizes rational provers to behave honestly for their own best interest, bypassing the existing hardness to design a PoL mechanism with computational efficiency, a provable incentive-security guarantee and controllable difficulty. Particularly, our work is secure against two attacks on the recent work of Jia et al. [2021], and also improves the computational overhead from $\Theta(1)$ to $O(\frac{\log E}{E})$. Furthermore, while most recent research assumes trusted problem providers and verifiers, our design also guarantees frontend incentive-security even when problem providers are untrusted, and verifier incentive-security that bypasses the Verifier's Dilemma. By incorporating ML training into blockchain consensus mechanisms with provable guarantees, our research not only proposes an eco-friendly solution for blockchain systems, but also provides a proposal for a completely decentralized computing power market in the new AI age.
Updated: 2024-06-04 19:34:30
Domains: cs.CR,cs.AI,cs.ET,cs.GT,cs.LG
Improved Convergence of Score-Based Diffusion Models via Prediction-Correction
Score-based generative models (SGMs) are powerful tools to sample from complex data distributions. Their underlying idea is to (i) run a forward process for time $T_1$ by adding noise to the data, (ii) estimate its score function, and (iii) use such estimate to run a reverse process. As the reverse process is initialized with the stationary distribution of the forward one, the existing analysis paradigm requires $T_1\to\infty$. This is however problematic: from a theoretical viewpoint, for a given precision of the score approximation, the convergence guarantee fails as $T_1$ diverges; from a practical viewpoint, a large $T_1$ increases computational costs and leads to error propagation. This paper addresses the issue by considering a version of the popular predictor-corrector scheme: after running the forward process, we first estimate the final distribution via an inexact Langevin dynamics and then revert the process. Our key technical contribution is to provide convergence guarantees which require to run the forward process only for a fixed finite time $T_1$. Our bounds exhibit a mild logarithmic dependence on the input dimension and the subgaussian norm of the target distribution, have minimal assumptions on the data, and require only to control the $L^2$ loss on the score approximation, which is the quantity minimized in practice.
Updated: 2024-06-04 19:24:50
Domains: cs.LG,math.ST,stat.ML,stat.TH
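The corrector phase described above, estimating the distribution at time $T_1$ via inexact Langevin dynamics before reverting the process, is easy to illustrate with numpy on a toy Gaussian target whose score is known in closed form (standing in for a learned score network). The step size and iteration count below are arbitrary placeholders.

    import numpy as np

    rng = np.random.default_rng(0)

    def langevin_correct(x, score, eta=0.05, steps=200):
        """Inexact Langevin dynamics: x <- x + eta*score(x) + sqrt(2*eta)*xi.

        Used as the 'corrector': it pulls samples toward the target of the
        forward process at time T_1 before the reverse process is started.
        """
        for _ in range(steps):
            x = x + eta * score(x) + np.sqrt(2 * eta) * rng.normal(size=x.shape)
        return x

    # Toy target: N(0, s2) with known score -x/s2 (stand-in for a learned score).
    s2 = 2.0
    score = lambda x: -x / s2
    x0 = rng.uniform(-10, 10, size=5000)        # badly initialized samples
    x1 = langevin_correct(x0, score)
    print(x0.var(), x1.var())                   # variance moves toward s2 = 2.0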
GEFL: Extended Filtration Learning for Graph Classification
Extended persistence is a technique from topological data analysis to obtain global multiscale topological information from a graph. This includes information about connected components and cycles that are captured by the so-called persistence barcodes. We introduce extended persistence into a supervised learning framework for graph classification. Global topological information, in the form of a barcode with four different types of bars and their explicit cycle representatives, is combined into the model by the readout function which is computed by extended persistence. The entire model is end-to-end differentiable. We use a link-cut tree data structure and parallelism to lower the complexity of computing extended persistence, obtaining a speedup of more than 60x over the state-of-the-art for extended persistence computation. This makes extended persistence feasible for machine learning. We show that, under certain conditions, extended persistence surpasses both the WL[1] graph isomorphism test and 0-dimensional barcodes in terms of expressivity because it adds more global (topological) information. In particular, arbitrarily long cycles can be represented, which is difficult for finite receptive field message passing graph neural networks. Furthermore, we show the effectiveness of our method on real world datasets compared to many existing recent graph representation learning methods.
Updated: 2024-06-04 19:18:05
Domains: cs.LG,cs.DS
Temporal Graph Learning Recurrent Neural Network for Traffic Forecasting
Accurate traffic flow forecasting is a crucial research topic in transportation management. However, it is a challenging problem due to rapidly changing traffic conditions, high nonlinearity of traffic flow, and complex spatial and temporal correlations of road networks. Most existing studies either try to capture the spatial dependencies between roads using the same semantic graph over different time steps, or assume all sensors on the roads are equally likely to be connected regardless of the distance between them. However, we observe that the spatial dependencies between roads indeed change over time, and two distant roads are not likely to be helpful to each other when predicting the traffic flow, both of which limit the performance of existing studies. In this paper, we propose Temporal Graph Learning Recurrent Neural Network (TGLRN) to address these problems. More precisely, to effectively model the nature of time series, we leverage Recurrent Neural Networks (RNNs) to dynamically construct a graph at each time step, thereby capturing the time-evolving spatial dependencies between roads (i.e., microscopic view). Simultaneously, we provide the Adaptive Structure Information to the model, ensuring that close and consecutive sensors are considered to be more important for predicting the traffic flow (i.e., macroscopic view). Furthermore, to endow TGLRN with robustness, we introduce an edge sampling strategy when constructing the graph at each time step, which eventually leads to further improvements on the model performance. Experimental results on four commonly used real-world benchmark datasets show the effectiveness of TGLRN.
Updated: 2024-06-04 19:08:40
Domains: cs.LG
Predicting AI Agent Behavior through Approximation of the Perron-Frobenius Operator
Predicting the behavior of AI-driven agents is particularly challenging without a preexisting model. In our paper, we address this by treating AI agents as nonlinear dynamical systems and adopting a probabilistic perspective to predict their statistical behavior using the Perron-Frobenius (PF) operator. We formulate the approximation of the PF operator as an entropy minimization problem, which can be solved by leveraging the Markovian property of the operator and decomposing its spectrum. Our data-driven methodology simultaneously approximates the PF operator to perform prediction of the evolution of the agents and also predicts the terminal probability density of AI agents, such as robotic systems and generative models. We demonstrate the effectiveness of our prediction model through extensive experiments on practical systems driven by AI algorithms.
Updated: 2024-06-04 19:06:49
Domains: cs.AI
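As a baseline for intuition about the object being approximated, the classical Ulam (histogram) estimator below builds a finite-dimensional Perron-Frobenius matrix from 1-D trajectory data and pushes a density forward to a terminal distribution. This is our own illustrative discretization; the paper instead fits the operator via entropy minimization and spectral decomposition.

```python
import numpy as np

def ulam_pf_matrix(trajectories, n_bins, lo, hi):
    """Row-stochastic approximation of the PF operator over a uniform
    partition of [lo, hi], estimated from observed cell-to-cell transitions."""
    edges = np.linspace(lo, hi, n_bins + 1)
    counts = np.zeros((n_bins, n_bins))
    for traj in trajectories:
        cells = np.clip(np.digitize(traj, edges) - 1, 0, n_bins - 1)
        for a, b in zip(cells[:-1], cells[1:]):
            counts[a, b] += 1.0
    rows = counts.sum(axis=1, keepdims=True)
    rows[rows == 0.0] = 1.0  # unvisited cells keep an all-zero row
    return counts / rows

def terminal_density(P, p0, n_steps):
    """Push an initial density forward: p_{t+1} = p_t P."""
    p = np.asarray(p0, dtype=float)
    for _ in range(n_steps):
        p = p @ P
    return p
```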
Self-Control of LLM Behaviors by Compressing Suffix Gradient into Prefix Controller
We propose Self-Control, a novel method utilizing suffix gradients to control the behavior of large language models (LLMs) without explicit human annotations. Given a guideline expressed as a suffix string and the model's self-assessment of adherence, Self-Control computes the gradient of this self-judgment with respect to the model's hidden states, directly influencing the auto-regressive generation process towards desired behaviors. To enhance efficiency, we introduce Self-Control_{prefix}, a compact module that encapsulates the learned representations from suffix gradients into a Prefix Controller, facilitating inference-time control for various LLM behaviors. Our experiments demonstrate Self-Control's efficacy across multiple domains, including emotional modulation, ensuring harmlessness, and enhancing complex reasoning. In particular, Self-Control_{prefix} enables plug-and-play control and jointly controls multiple attributes, improving model outputs without altering model parameters or increasing inference-time costs.
Updated: 2024-06-04 19:05:10
Domains: cs.CL,cs.AI
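A minimal PyTorch sketch of the core mechanic as we read it: take the gradient of the model's own compliance judgment with respect to its hidden states and nudge the states along it. The names judge_head, yes_id, and step_size are our assumptions about the interface, not the paper's API.

```python
import torch

def suffix_gradient_step(hidden, judge_head, yes_id, step_size=0.1):
    """hidden: (batch, seq, dim) hidden states of the generation;
    judge_head: module mapping hidden states to vocabulary logits for the
    self-assessment suffix; yes_id: token id signaling adherence."""
    hidden = hidden.detach().requires_grad_(True)
    logits = judge_head(hidden[:, -1, :])               # self-judgment position
    score = torch.log_softmax(logits, dim=-1)[:, yes_id].sum()
    (grad,) = torch.autograd.grad(score, hidden)
    return hidden + step_size * grad                    # steer toward adherence
```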
Conveyor: Efficient Tool-aware LLM Serving with Tool Partial Execution
The complexity of large language model (LLM) serving workloads has substantially increased due to the integration with external tool invocations, such as ChatGPT plugins. In this paper, we identify a new opportunity for efficient LLM serving for requests that trigger tools: tool partial execution alongside LLM decoding. To this end, we design Conveyor, an efficient LLM serving system optimized for handling requests involving external tools. We introduce a novel interface for tool developers to expose partial execution opportunities to the LLM serving system and a request scheduler that facilitates partial tool execution. Our results demonstrate that tool partial execution can improve request completion latency by up to 38.8%.
Updated: 2024-06-04 19:00:36
Domains: cs.CL,cs.DC,cs.LG
Optimal Rates for DP-SCO with a Single Epoch and Large Batches
The most common algorithms for differentially private (DP) machine learning (ML) are all based on stochastic gradient descent, for example, DP-SGD. These algorithms achieve DP by treating each gradient as an independent private query. However, this independence can cause us to overpay in privacy loss because we don't analyze the entire gradient trajectory. In this work, we propose a new DP algorithm, which we call Accelerated-DP-SRGD (DP stochastic recursive gradient descent), that enables us to break this independence and only pay for privacy in the gradient difference, i.e., in the new information at the current step. Our algorithm achieves the optimal DP-stochastic convex optimization (DP-SCO) error (up to polylog factors) using only a single epoch over the dataset, and converges at the Nesterov's accelerated rate. Our algorithm can be run in at most $\sqrt{n}$ batch gradient steps with batch size at least $\sqrt{n}$, unlike prior work which required $O(n)$ queries with mostly constant batch sizes. To achieve this, our algorithm combines three key ingredients, a variant of stochastic recursive gradients (SRG), accelerated gradient descent, and correlated noise generation from DP continual counting. Finally, we also show that our algorithm improves over existing SoTA on multi-class logistic regression on MNIST and CIFAR-10.
Updated: 2024-06-04 18:59:42
Domains: cs.LG,cs.CR
Posterior Inference on Shallow Infinitely Wide Bayesian Neural Networks under Weights with Unbounded Variance
From the classical and influential works of Neal (1996), it is known that the infinite width scaling limit of a Bayesian neural network with one hidden layer is a Gaussian process, when the network weights have bounded prior variance. Neal's result has been extended to networks with multiple hidden layers and to convolutional neural networks, also with Gaussian process scaling limits. The tractable properties of Gaussian processes then allow straightforward posterior inference and uncertainty quantification, considerably simplifying the study of the limit process compared to a network of finite width. Neural network weights with unbounded variance, however, pose unique challenges. In this case, the classical central limit theorem breaks down and it is well known that the scaling limit is an $\alpha$-stable process under suitable conditions. However, current literature is primarily limited to forward simulations under these processes and the problem of posterior inference under such a scaling limit remains largely unaddressed, unlike in the Gaussian process case. To this end, our contribution is an interpretable and computationally efficient procedure for posterior inference, using a conditionally Gaussian representation, that then allows full use of the Gaussian process machinery for tractable posterior inference and uncertainty quantification in the non-Gaussian regime.
Updated: 2024-06-04 18:54:55
Domains: stat.ML,cs.LG
Self-Trained Model for ECG Complex Delineation
Electrocardiogram (ECG) delineation plays a crucial role in assisting cardiologists with accurate diagnoses. Prior research studies have explored various methods, including the application of deep learning techniques, to achieve precise delineation. However, existing approaches face limitations primarily related to dataset size and robustness. In this paper, we introduce a dataset for ECG delineation and propose a novel self-trained method aimed at leveraging a vast amount of unlabeled ECG data. Our approach involves the pseudolabeling of unlabeled data using a neural network trained on our dataset. Subsequently, we train the model on the newly labeled samples to enhance the quality of delineation. We conduct experiments demonstrating that our dataset is a valuable resource for training robust models and that our proposed self-trained method improves the prediction quality of ECG delineation.
Updated: 2024-06-04 18:54:10
Domains: cs.LG,eess.SP
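A hedged sketch of the self-training recipe: pseudo-label the unlabeled pool with the current model, keep confident predictions, and retrain on real plus pseudo labels. The confidence threshold and the classification-style loaders are our simplifications of the paper's delineation setup.

```python
import torch

def self_training_round(model, labeled_loader, unlabeled_loader,
                        optimizer, loss_fn, threshold=0.9):
    # 1) Pseudo-label unlabeled ECG batches with the current model.
    model.eval()
    pseudo = []
    with torch.no_grad():
        for x in unlabeled_loader:
            probs = torch.softmax(model(x), dim=1)
            conf, y_hat = probs.max(dim=1)
            keep = conf > threshold          # keep only confident predictions
            if keep.any():
                pseudo.append((x[keep], y_hat[keep]))
    # 2) Fine-tune on the union of real and pseudo-labeled samples.
    model.train()
    for x, y in list(labeled_loader) + pseudo:
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()
    return model
```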
Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF
Reinforcement learning from human feedback (RLHF) has demonstrated great promise in aligning large language models (LLMs) with human preference. Depending on the availability of preference data, both online and offline RLHF are active areas of investigation. A key bottleneck is understanding how to incorporate uncertainty estimation in the reward function learned from the preference data for RLHF, regardless of how the preference data is collected. While the principles of optimism or pessimism under uncertainty are well-established in standard reinforcement learning (RL), a practically-implementable and theoretically-grounded form amenable to large language models is not yet available, as standard techniques for constructing confidence intervals become intractable under arbitrary policy parameterizations. In this paper, we introduce a unified approach to online and offline RLHF -- value-incentivized preference optimization (VPO) -- which regularizes the maximum-likelihood estimate of the reward function with the corresponding value function, modulated by a $\textit{sign}$ to indicate whether the optimism or pessimism is chosen. VPO also directly optimizes the policy with implicit reward modeling, and therefore shares a simpler RLHF pipeline similar to direct preference optimization. Theoretical guarantees of VPO are provided for both online and offline settings, matching the rates of their standard RL counterparts. Moreover, experiments on text summarization and dialog verify the practicality and effectiveness of VPO.
Updated: 2024-06-04 18:49:36
Domains: cs.LG,cs.AI,stat.ML
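The rough shape of the objective, under our reading of the abstract: a Bradley-Terry maximum-likelihood term on preference pairs plus a value regularizer whose sign selects optimism (online) or pessimism (offline). The weight lam and the sign convention are assumptions; the paper's estimator is more involved.

```python
import torch.nn.functional as F

def vpo_loss(r_chosen, r_rejected, value, lam=0.1, sign=1.0):
    """r_chosen / r_rejected: reward estimates on preferred / dispreferred
    responses; value: corresponding value-function estimates."""
    mle = -F.logsigmoid(r_chosen - r_rejected).mean()  # preference MLE term
    return mle + sign * lam * value.mean()             # signed value regularizer
```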
Extreme Compression of Adaptive Neural Images
Implicit Neural Representations (INRs) and Neural Fields are a novel paradigm for signal representation, from images and audio to 3D scenes and videos. The fundamental idea is to represent a signal as a continuous and differentiable neural network. This idea offers unprecedented benefits such as continuous resolution and memory efficiency, enabling new compression techniques. However, representing data as neural networks poses new challenges. For instance, given a 2D image as a neural network, how can we further compress such a neural image? In this work, we present a novel analysis on compressing neural fields, with a focus on images. We also introduce Adaptive Neural Images (ANI), an efficient neural representation that enables adaptation to different inference or transmission requirements. Our proposed method allows us to reduce the bits-per-pixel (bpp) of the neural image by 4x, without losing sensitive details or harming fidelity. We achieve this thanks to our successful implementation of 4-bit neural representations. Our work offers a new framework for developing compressed neural fields.
Updated: 2024-06-04 18:42:01
Domains: cs.CV,cs.AI,cs.GR,cs.MM
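The arithmetic behind the claimed 4x bpp reduction is easiest to see with a textbook uniform 4-bit quantizer applied to the weights of a neural image; the paper's exact quantization scheme may differ.

```python
import torch

def quantize_4bit(w, qmax=7):
    """Symmetric uniform quantization of one weight tensor to int4 codes in
    [-8, 7], stored with a single float scale per tensor."""
    scale = w.abs().max().clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(w / scale), -8, qmax).to(torch.int8)
    return q, scale

def dequantize_4bit(q, scale):
    return q.float() * scale
```

Relative to 16-bit weights, 4-bit codes cut storage by roughly 4x, matching the bpp reduction reported above.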
Window to Wall Ratio Detection using SegFormer
Window to Wall Ratios (WWR) are key to assessing the energy, daylight and ventilation performance of buildings. Studies have shown that window area has a large impact on building performance and simulation. However, data to set up these environmental models and simulations is typically not available. Instead, a standard 40% WWR is typically assumed for all buildings. This paper leverages existing computer vision window detection methods to predict the WWR of buildings from external street view images using semantic segmentation, demonstrating the potential for adapting established computer vision techniques to architectural applications.
Updated: 2024-06-04 18:36:11
Domains: cs.CV,cs.LG
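Once a facade has been segmented, the WWR itself is a pixel ratio; a minimal sketch, assuming dataset-specific class ids for window and wall:

```python
import numpy as np

def window_to_wall_ratio(seg, window_id=1, wall_id=2):
    """seg: (H, W) array of per-pixel class ids from a semantic segmentation
    model such as SegFormer; class ids are dataset-specific assumptions."""
    window = int((seg == window_id).sum())
    wall = int((seg == wall_id).sum())
    facade = window + wall
    return window / facade if facade > 0 else 0.0
```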
Efficient Exploration for LLMs
We present evidence of substantial benefit from efficient exploration in gathering human feedback to improve large language models. In our experiments, an agent sequentially generates queries while fitting a reward model to the feedback received. Our best-performing agent generates queries using double Thompson sampling, with uncertainty represented by an epistemic neural network. Our results demonstrate that efficient exploration enables high levels of performance with far fewer queries. Further, both uncertainty estimation and the choice of exploration scheme play critical roles.
Updated: 2024-06-04 18:35:09
Domains: cs.LG,cs.AI,cs.CL,stat.ME,stat.ML
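A minimal sketch of the double Thompson sampling query rule described here: two independent draws from the epistemic posterior each pick one completion, so the pair shown to the rater reflects posterior disagreement. The sample_reward_fn interface is our assumption.

```python
def double_thompson_query(candidates, sample_reward_fn):
    """candidates: generated completions; sample_reward_fn(): one reward
    function drawn from the posterior (e.g. one epistemic-network index)."""
    r1, r2 = sample_reward_fn(), sample_reward_fn()
    first = max(candidates, key=r1)
    rest = [c for c in candidates if c is not first]
    second = max(rest or candidates, key=r2)  # guard the single-candidate case
    return first, second
```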
Deep Stochastic Mechanics
This paper introduces a novel deep-learning-based approach for numerical simulation of a time-evolving Schrödinger equation inspired by stochastic mechanics and generative diffusion models. Unlike existing approaches, which exhibit computational complexity that scales exponentially in the problem dimension, our method allows us to adapt to the latent low-dimensional structure of the wave function by sampling from the Markovian diffusion. Depending on the latent dimension, our method may have far lower computational complexity in higher dimensions. Moreover, we propose novel equations for stochastic quantum mechanics, resulting in quadratic computational complexity with respect to the number of dimensions. Numerical simulations verify our theoretical findings and show a significant advantage of our method compared to other deep-learning-based approaches used for quantum mechanics.
Updated: 2024-06-04 18:30:51
Domains: cs.LG,quant-ph,stat.ML
Operational Latent Spaces
We investigate the construction of latent spaces through self-supervised learning to support semantically meaningful operations. Analogous to operational amplifiers, these "operational latent spaces" (OpLaS) not only demonstrate semantic structure such as clustering but also support common transformational operations with inherent semantic meaning. Some operational latent spaces are found to have arisen "unintentionally" in the progress toward some (other) self-supervised learning objective, in which unintended but still useful properties are discovered among the relationships of points in the space. Other spaces may be constructed "intentionally" by developers stipulating certain kinds of clustering or transformations intended to produce the desired structure. We focus on the intentional creation of operational latent spaces via self-supervised learning, including the introduction of rotation operators via a novel "FiLMR" layer, which can be used to enable ring-like symmetries found in some musical constructions.
Updated: 2024-06-04 18:25:15
Domains: cs.LG,eess.AS,I.2.4; J.5
iQRL -- Implicitly Quantized Representations for Sample-efficient Reinforcement Learning
Learning representations for reinforcement learning (RL) has shown much promise for continuous control. We propose an efficient representation learning method using only a self-supervised latent-state consistency loss. Our approach employs an encoder and a dynamics model to map observations to latent states and predict future latent states, respectively. We achieve high performance and prevent representation collapse by quantizing the latent representation such that the rank of the representation is empirically preserved. Our method, named iQRL: implicitly Quantized Reinforcement Learning, is straightforward, compatible with any model-free RL algorithm, and demonstrates excellent performance by outperforming other recently proposed representation learning methods in continuous control benchmarks from DeepMind Control Suite.
Updated: 2024-06-04 18:15:44
Domains: cs.LG
Toward Conversational Agents with Context and Time Sensitive Long-term Memory
There has recently been growing interest in conversational agents with long-term memory which has led to the rapid development of language models that use retrieval-augmented generation (RAG). Until recently, most work on RAG has focused on information retrieval from large databases of texts, like Wikipedia, rather than information from long-form conversations. In this paper, we argue that effective retrieval from long-form conversational data faces two unique problems compared to static database retrieval: 1) time/event-based queries, which requires the model to retrieve information about previous conversations based on time or the order of a conversational event (e.g., the third conversation on Tuesday), and 2) ambiguous queries that require surrounding conversational context to understand. To better develop RAG-based agents that can deal with these challenges, we generate a new dataset of ambiguous and time-based questions that build upon a recent dataset of long-form, simulated conversations, and demonstrate that standard RAG based approaches handle such questions poorly. We then develop a novel retrieval model which combines chained-of-table search methods, standard vector-database retrieval, and a prompting method to disambiguate queries, and demonstrate that this approach substantially improves over current methods at solving these tasks. We believe that this new dataset and more advanced RAG agent can act as a key benchmark and stepping stone towards effective memory augmented conversational agents that can be used in a wide variety of AI applications.
Updated: 2024-06-04 18:01:03
Domains: cs.CL,cs.LG
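A toy retriever illustrating how the two failure modes named above can be handled: filter the conversation store on time/event metadata first (so "the third conversation on Tuesday" is answerable), then rank survivors by embedding similarity. The (embedding, day, ordinal, text) schema is our illustration, not the paper's.

```python
def retrieve(query_vec, store, day=None, ordinal=None, k=3):
    """store: list of (embedding, day, ordinal, text) entries, where ordinal
    is the conversation's index within its day."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    pool = [e for e in store
            if (day is None or e[1] == day)
            and (ordinal is None or e[2] == ordinal)]
    pool.sort(key=lambda e: dot(e[0], query_vec), reverse=True)
    return [e[3] for e in pool[:k]]
```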
Symmetric Kernels with Non-Symmetric Data: A Data-Agnostic Learnability Bound
Kernel ridge regression (KRR) and Gaussian processes (GPs) are fundamental tools in statistics and machine learning with recent applications to highly over-parameterized deep neural networks. The ability of these tools to learn a target function is directly related to the eigenvalues of their kernel sampled on the input data. Targets having support on higher eigenvalues are more learnable. While kernels are often highly symmetric objects, the data is often not. Thus kernel symmetry seems to have little to no bearing on the above eigenvalues or learnability, making spectral analysis on real-world data challenging. Here, we show that, contrary to this common intuition, one may use eigenvalues and eigenfunctions associated with highly idealized data measures to bound learnability on realistic data. As a demonstration, we give a theoretical lower bound on the sample complexity of copying heads for kernels associated with generic transformers acting on natural language.
Updated: 2024-06-04 18:00:00
Domains: stat.ML,cond-mat.dis-nn,cs.LG
Neural Representations of Dynamic Visual Stimuli
Humans experience the world through constantly changing visual stimuli, where scenes can shift and move, change in appearance, and vary in distance. The dynamic nature of visual perception is a fundamental aspect of our daily lives, yet the large majority of research on object and scene processing, particularly using fMRI, has focused on static stimuli. While studies of static image perception are attractive due to their computational simplicity, they impose a strong non-naturalistic constraint on our investigation of human vision. In contrast, dynamic visual stimuli offer a more ecologically-valid approach but present new challenges due to the interplay between spatial and temporal information, making it difficult to disentangle the representations of stable image features and motion. To overcome this limitation -- given dynamic inputs, we explicitly decouple the modeling of static image representations and motion representations in the human brain. Three results demonstrate the feasibility of this approach. First, we show that visual motion information as optical flow can be predicted (or decoded) from brain activity as measured by fMRI. Second, we show that this predicted motion can be used to realistically animate static images using a motion-conditioned video diffusion model (where the motion is driven by fMRI brain activity). Third, we show prediction in the reverse direction: existing video encoders can be fine-tuned to predict fMRI brain activity from video imagery, and can do so more effectively than image encoders. This foundational work offers a novel, extensible framework for interpreting how the human brain processes dynamic visual information.
Updated: 2024-06-04 17:59:49
Domains: q-bio.NC,cs.AI,cs.CV
Learning to grok: Emergence of in-context learning and skill composition in modular arithmetic tasks
Large language models can solve tasks that were not present in the training set. This capability is believed to be due to in-context learning and skill composition. In this work, we study the emergence of in-context learning and skill composition in a collection of modular arithmetic tasks. Specifically, we consider a finite collection of linear modular functions $z = a \, x + b \, y \;\mathrm{mod}\; p$ labeled by the vector $(a, b) \in \mathbb{Z}_p^2$. We use some of these tasks for pre-training and the rest for out-of-distribution testing. We empirically show that a GPT-style transformer exhibits a transition from in-distribution to out-of-distribution generalization as the number of pre-training tasks increases. We find that the smallest model capable of out-of-distribution generalization requires two transformer blocks, while for deeper models, the out-of-distribution generalization phase is \emph{transient}, necessitating early stopping. Finally, we perform an interpretability study of the pre-trained models, revealing the highly structured representations in both phases; and discuss the learnt algorithm.
Updated: 2024-06-04 17:59:36
Domains: cs.LG,cond-mat.dis-nn,hep-th,stat.ML
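The task family is easy to reproduce; a sketch of the data generation with the pre-train/held-out split over (a, b) labels:

```python
import itertools
import random

def modular_tasks(p=29, num_pretrain=8, seed=0):
    """Each task is z = (a*x + b*y) mod p, labeled by (a, b) in Z_p^2; some
    tasks are used for pre-training, the rest held out for OOD testing."""
    rng = random.Random(seed)
    labels = [(a, b) for a in range(p) for b in range(p)]
    rng.shuffle(labels)
    pretrain, heldout = labels[:num_pretrain], labels[num_pretrain:]

    def examples(a, b):
        return [((x, y), (a * x + b * y) % p)
                for x, y in itertools.product(range(p), repeat=2)]

    return {ab: examples(*ab) for ab in pretrain}, heldout
```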
Robust and highly scalable estimation of directional couplings from time-shifted signals
The estimation of directed couplings between the nodes of a network from indirect measurements is a central methodological challenge in scientific fields such as neuroscience, systems biology and economics. Unfortunately, the problem is generally ill-posed due to the possible presence of unknown delays in the measurements. In this paper, we offer a solution of this problem by using a variational Bayes framework, where the uncertainty over the delays is marginalized in order to obtain conservative coupling estimates. To overcome the well-known overconfidence of classical variational methods, we use a hybrid-VI scheme where the (possibly flat or multimodal) posterior over the measurement parameters is estimated using a forward KL loss while the (nearly convex) conditional posterior over the couplings is estimated using the highly scalable gradient-based VI. In our ground-truth experiments, we show that the network provides reliable and conservative estimates of the couplings, greatly outperforming similar methods such as regression DCM.
Updated: 2024-06-04 17:58:33
Domains: cs.LG
To Believe or Not to Believe Your LLM
We explore uncertainty quantification in large language models (LLMs), with the goal of identifying when the uncertainty in responses to a given query is large. We simultaneously consider both epistemic and aleatoric uncertainties, where the former comes from the lack of knowledge about the ground truth (such as about facts or the language), and the latter comes from irreducible randomness (such as multiple possible answers). In particular, we derive an information-theoretic metric that makes it possible to reliably detect when only epistemic uncertainty is large, in which case the output of the model is unreliable. This condition can be computed solely from the model's outputs, obtained by special iterative prompting based on the previous responses. Such quantification, for instance, allows one to detect hallucinations (cases when epistemic uncertainty is high) in both single- and multi-answer responses. This is in contrast to many standard uncertainty quantification strategies (such as thresholding the log-likelihood of a response) where hallucinations in the multi-answer case cannot be detected. We conduct a series of experiments which demonstrate the advantage of our formulation. Further, our investigations shed some light on how the probabilities assigned to a given output by an LLM can be amplified by iterative prompting, which might be of independent interest.
Updated: 2024-06-04 17:58:18
Domains: cs.LG,cs.AI,cs.CL
Loki: Low-Rank Keys for Efficient Sparse Attention
Inference on large language models can be expensive in terms of the compute and memory costs involved, especially when long sequence lengths are used. In particular, the self-attention mechanism used in such models contributes significantly to these costs, which has resulted in several recent works that propose sparse attention approximations for inference. In this work, we propose to approximate the self-attention computation by focusing on the dimensionality of key vectors computed in the attention block. Our analysis reveals that the key vectors lie in a significantly lower-dimensional space, consistently across several datasets and models. Exploiting this observation, we propose Loki, a novel sparse attention method that ranks and selects tokens in the KV-cache based on attention scores computed in low-dimensional space. Our evaluations show that Loki is able to maintain the efficacy of the models better than other popular approximation methods, while speeding up the attention computation due to reduced data movement (load/store) and compute costs.
Updated: 2024-06-04 17:58:03
Domains: cs.LG
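A hedged single-query sketch of the mechanism: score cached keys in a low-dimensional subspace (projection learned offline, e.g. by PCA on keys), keep the top-k tokens, and compute exact attention only over those. P and topk stand in for details the abstract does not pin down.

```python
import torch

def loki_attention_scores(q, K, P, topk):
    """q: (d,) query; K: (n, d) cached keys; P: (d, r) low-rank projection."""
    approx = (K @ P) @ (P.T @ q)               # cheap scores in r dimensions
    idx = approx.topk(topk).indices            # survivors of the cheap pass
    exact = (K[idx] @ q) / q.shape[-1] ** 0.5  # exact scaled scores on survivors
    return idx, torch.softmax(exact, dim=-1)
```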
Parrot: Multilingual Visual Instruction Tuning
The rapid development of Multimodal Large Language Models (MLLMs) like GPT-4V has marked a significant step towards artificial general intelligence. Existing methods mainly focus on aligning vision encoders with LLMs through supervised fine-tuning (SFT) to endow LLMs with multimodal abilities, which causes MLLMs' inherent ability to handle multiple languages to deteriorate progressively as training evolves. We empirically find that imbalanced SFT datasets, primarily composed of English-centric image-text pairs, lead to significantly reduced performance in non-English languages. This is due to the failure to align the vision encoder and LLM with multilingual tokens during the SFT process. In this paper, we introduce Parrot, a novel method that utilizes textual guidance to drive visual token alignment at the language level. Parrot conditions the visual tokens on diverse language inputs and uses Mixture-of-Experts (MoE) to promote the alignment of multilingual tokens. Specifically, to enhance non-English visual token alignment, we compute cross-attention using the initial visual features and textual embeddings, the result of which is then fed into the MoE router to select the most relevant experts. The selected experts subsequently convert the initial visual tokens into language-specific visual tokens. Moreover, considering the current lack of benchmarks for evaluating multilingual capabilities within the field, we collect and make available a Massive Multilingual Multimodal Benchmark, named MMMB, which includes 6 languages, 15 categories, and 12,000 questions. Our method not only demonstrates state-of-the-art performance on multilingual MMBench and MMMB, but also excels across a broad range of multimodal tasks. Both the source code and the training dataset of Parrot will be made publicly available.
Updated: 2024-06-04 17:56:28
Domains: cs.CV,cs.AI,cs.CL,cs.LG
TopViewRS: Vision-Language Models as Top-View Spatial Reasoners
Top-view perspective denotes a typical way in which humans read and reason over different types of maps, and it is vital for localization and navigation of humans as well as of `non-human' agents, such as the ones backed by large Vision-Language Models (VLMs). Nonetheless, spatial reasoning capabilities of modern VLMs remain unattested and underexplored. In this work, we thus study their capability to understand and reason over spatial relations from the top view. The focus on top view also enables controlled evaluations at different granularity of spatial reasoning; we clearly disentangle different abilities (e.g., recognizing particular objects versus understanding their relative positions). We introduce the TopViewRS (Top-View Reasoning in Space) dataset, consisting of 11,384 multiple-choice questions with either realistic or semantic top-view map as visual input. We then use it to study and evaluate VLMs across 4 perception and reasoning tasks with different levels of complexity. Evaluation of 10 representative open- and closed-source VLMs reveals the gap of more than 50% compared to average human performance, and it is even lower than the random baseline in some cases. Although additional experiments show that Chain-of-Thought reasoning can boost model capabilities by 5.82% on average, the overall performance of VLMs remains limited. Our findings underscore the critical need for enhanced model capability in top-view spatial reasoning and set a foundation for further research towards human-level proficiency of VLMs in real-world multimodal tasks.
Updated: 2024-06-04 17:55:43
Domains: cs.CL,cs.CV,cs.LG
Mitigate Position Bias in Large Language Models via Scaling a Single Dimension
Large Language Models (LLMs) are increasingly applied in various real-world scenarios due to their excellent generalization capabilities and robust generative abilities. However, they exhibit position bias, also known as "lost in the middle", a phenomenon that is especially pronounced in long-context scenarios, indicating that the placement of key information in different positions of a prompt can significantly affect accuracy. This paper first explores the micro-level manifestations of position bias, concluding that attention weights are a micro-level expression of position bias. It further identifies that, in addition to position embeddings, the causal attention mask also contributes to position bias by creating position-specific hidden states. Based on these insights, we propose a method to mitigate position bias by scaling these positional hidden states. Experiments on the NaturalQuestions Multi-document QA, KV retrieval, LongBench and timeline reorder tasks, using various models including RoPE models, context-window-extended models, and Alibi models, demonstrate the effectiveness and generalizability of our approach. Our method can improve performance by up to 15.2% by modifying just one dimension of hidden states. Our code is available at https://aka.ms/PositionalHidden.
Updated: 2024-06-04 17:55:38
Domains: cs.CL,cs.LG
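The intervention itself is a one-liner once the positional dimension has been identified; which coordinate to scale and by what factor are determined empirically in the paper, so both are free parameters here.

```python
import torch

def scale_positional_dim(hidden_states, dim_index, factor):
    """hidden_states: (batch, seq_len, d_model); rescales the single
    coordinate found to carry positional information."""
    h = hidden_states.clone()
    h[..., dim_index] *= factor
    return h
```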
Enhancing predictive imaging biomarker discovery through treatment effect analysis
Identifying predictive biomarkers, which forecast individual treatment effectiveness, is crucial for personalized medicine and informs decision-making across diverse disciplines. These biomarkers are extracted from pre-treatment data, often within randomized controlled trials, and have to be distinguished from prognostic biomarkers, which are independent of treatment assignment. Our study focuses on the discovery of predictive imaging biomarkers, aiming to leverage pre-treatment images to unveil new causal relationships. Previous approaches relied on labor-intensive handcrafted or manually derived features, which may introduce biases. In response, we present a new task of discovering predictive imaging biomarkers directly from the pre-treatment images to learn relevant image features. We propose an evaluation protocol for this task to assess a model's ability to identify predictive imaging biomarkers and differentiate them from prognostic ones. It employs statistical testing and a comprehensive analysis of image feature attribution. We explore the suitability of deep learning models originally designed for estimating the conditional average treatment effect (CATE) for this task, which previously have been primarily assessed for the precision of CATE estimation, overlooking the evaluation of imaging biomarker discovery. Our proof-of-concept analysis demonstrates promising results in discovering and validating predictive imaging biomarkers from synthetic outcomes and real-world image datasets.
Updated: 2024-06-04 17:54:44
Domains: eess.IV,cs.AI,cs.CV,cs.LG
ReLUs Are Sufficient for Learning Implicit Neural Representations
Motivated by the growing theoretical understanding of neural networks that employ the Rectified Linear Unit (ReLU) as their activation function, we revisit the use of ReLU activation functions for learning implicit neural representations (INRs). Inspired by second order B-spline wavelets, we incorporate a set of simple constraints to the ReLU neurons in each layer of a deep neural network (DNN) to remedy the spectral bias. This in turn enables its use for various INR tasks. Empirically, we demonstrate that, contrary to popular belief, one can learn state-of-the-art INRs based on a DNN composed of only ReLU neurons. Next, by leveraging recent theoretical works which characterize the kinds of functions ReLU neural networks learn, we provide a way to quantify the regularity of the learned function. This offers a principled approach to selecting the hyperparameters in INR architectures. We substantiate our claims through experiments in signal representation, super resolution, and computed tomography, demonstrating the versatility and effectiveness of our method. The code for all experiments can be found at https://github.com/joeshenouda/relu-inrs.
Updated: 2024-06-04 17:51:08
Domains: eess.IV,cs.AI,cs.CV,cs.LG
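The B-spline connection is concrete: three constrained ReLUs already reproduce a second-order (linear) B-spline bump, the kind of localized atom the constraints above encourage in each layer. A NumPy sketch:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def bspline_bump(x):
    """Second-order B-spline (hat function supported on [0, 2]) written
    purely with ReLUs."""
    return relu(x) - 2.0 * relu(x - 1.0) + relu(x - 2.0)
```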
Can We Remove the Square-Root in Adaptive Gradient Methods? A Second-Order Perspective
Adaptive gradient optimizers like Adam(W) are the default training algorithms for many deep learning architectures, such as transformers. Their diagonal preconditioner is based on the gradient outer product which is incorporated into the parameter update via a square root. While these methods are often motivated as approximate second-order methods, the square root represents a fundamental difference. In this work, we investigate how the behavior of adaptive methods changes when we remove the root, i.e. strengthen their second-order motivation. Surprisingly, we find that such square-root-free adaptive methods close the generalization gap to SGD on convolutional architectures, while maintaining their root-based counterpart's performance on transformers. The second-order perspective also has practical benefits for the development of non-diagonal adaptive methods through the concept of preconditioner invariance. In contrast to root-based methods like Shampoo, the root-free counterparts do not require numerically unstable matrix root decompositions and inversions, thus work well in half precision. Our findings provide new insights into the development of adaptive methods and raise important questions regarding the currently overlooked role of adaptivity for their success.
Updated: 2024-06-04 17:47:41
Domains: cs.LG,math.OC
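The change under study amounts to a one-line edit of the Adam update, shown below with standard Adam bookkeeping. Note that dropping the square root changes the preconditioner's scale, so the learning rate is not transferable from the root-based method; hyperparameters here are the usual defaults, not values from the paper.

```python
import torch

@torch.no_grad()
def rootfree_adaptive_step(p, g, m, v, lr, b1=0.9, b2=0.999, eps=1e-8):
    """One in-place update: Adam-style moment tracking, but the step divides
    by the second moment v itself rather than sqrt(v)."""
    m.mul_(b1).add_(g, alpha=1 - b1)           # first moment, as in Adam
    v.mul_(b2).addcmul_(g, g, value=1 - b2)    # second moment, as in Adam
    p.add_(m / (v + eps), alpha=-lr)           # square-root-free preconditioning
```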
Block Transformer: Global-to-Local Language Modeling for Fast Inference
This paper presents the Block Transformer architecture which adopts hierarchical global-to-local modeling to autoregressive transformers to mitigate the inference bottlenecks of self-attention. To apply self-attention, the key-value (KV) cache of all previous sequences must be retrieved from memory at every decoding step. Thereby, this KV cache IO becomes a significant bottleneck in batch inference. We notice that these costs stem from applying self-attention on the global context, therefore we isolate the expensive bottlenecks of global modeling to lower layers and apply fast local modeling in upper layers. To mitigate the remaining costs in the lower layers, we aggregate input tokens into fixed size blocks and then apply self-attention at this coarse level. Context information is aggregated into a single embedding to enable upper layers to decode the next block of tokens, without global attention. Free of global attention bottlenecks, the upper layers can fully utilize the compute hardware to maximize inference throughput. By leveraging global and local modules, the Block Transformer architecture demonstrates 10-20x gains in inference throughput compared to vanilla transformers with equivalent perplexity. Our work introduces a new approach to optimize language model inference through novel application of global-to-local modeling. Code is available at https://github.com/itsnamgyu/block-transformer.
Updated: 2024-06-04 17:45:26
Domains: cs.CL,cs.AI,cs.LG
RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots
Recent advancements in Artificial Intelligence (AI) have largely been propelled by scaling. In Robotics, scaling is hindered by the lack of access to massive robot datasets. We advocate using realistic physical simulation as a means to scale environments, tasks, and datasets for robot learning methods. We present RoboCasa, a large-scale simulation framework for training generalist robots in everyday environments. RoboCasa features realistic and diverse scenes focusing on kitchen environments. We provide thousands of 3D assets across over 150 object categories and dozens of interactable furniture and appliances. We enrich the realism and diversity of our simulation with generative AI tools, such as object assets from text-to-3D models and environment textures from text-to-image models. We design a set of 100 tasks for systematic evaluation, including composite tasks generated by the guidance of large language models. To facilitate learning, we provide high-quality human demonstrations and integrate automated trajectory generation methods to substantially enlarge our datasets with minimal human burden. Our experiments show a clear scaling trend in using synthetically generated robot data for large-scale imitation learning and show great promise in harnessing simulation data in real-world tasks. Videos and open-source code are available at https://robocasa.ai/
Updated: 2024-06-04 17:41:31
标题: RoboCasa:通用机器人的大规模日常任务模拟
摘要: 人工智能(AI)最近的进展主要是通过扩展而实现的。在机器人领域,扩展受到缺乏大规模机器人数据集的阻碍。我们主张利用逼真的物理模拟作为扩展环境、任务和机器人学习方法数据集的手段。我们提出了RoboCasa,这是一个用于在日常环境中训练通用机器人的大规模模拟框架。RoboCasa包含逼真而多样化的场景,重点放在厨房环境上。我们提供了超过150个物体类别和数十个可互动家具和电器的数千个3D资源。我们利用生成式AI工具丰富了我们模拟的逼真性和多样性,例如从文本到3D模型的物体资源和从文本到图像模型的环境纹理。我们设计了一套包括通过大型语言模型指导生成的复合任务在内的100个任务进行系统评估。为了促进学习,我们提供高质量的人类演示,并集成了自动轨迹生成方法,以最小化人类负担大幅扩展我们的数据集。我们的实验表明,使用合成生成的机器人数据进行大规模模仿学习呈现出明显的扩展趋势,同时展示了在真实世界任务中利用模拟数据的巨大潜力。视频和开源代码可在https://robocasa.ai/上获得。
更新时间: 2024-06-04 17:41:31
领域: cs.RO,cs.AI,cs.LG
Uncertainty of Joint Neural Contextual Bandit
Contextual bandit learning is increasingly favored in modern large-scale recommendation systems. To better utilize the contextual information and available user or item features, neural networks have been integrated to enhance contextual bandit learning, which has triggered significant interest from both academia and industry. However, a major challenge arises when implementing a disjoint neural contextual bandit solution in large-scale recommendation systems, where each item or user may correspond to a separate bandit arm. The huge number of items to recommend poses a significant hurdle for real-world production deployment. This paper focuses on a joint neural contextual bandit solution which serves all recommended items in one single model. The output consists of a predicted reward $\mu$, an uncertainty $\sigma$ and a hyper-parameter $\alpha$ which balances exploitation and exploration, e.g., $\mu + \alpha \sigma$. The tuning of the parameter $\alpha$ is typically heuristic and complex in practice due to its stochastic nature. To address this challenge, we provide both theoretical analysis and experimental findings regarding the uncertainty $\sigma$ of the joint neural contextual bandit model. Our analysis reveals that $\alpha$ demonstrates an approximate square-root relationship with the size of the last hidden layer $F$ and an inverse square-root relationship with the amount of training data $N$, i.e., $\sigma \propto \sqrt{\frac{F}{N}}$. The experiments, conducted with real industrial data, align with the theoretical analysis, help in understanding model behaviors, and assist with hyper-parameter tuning during both offline training and online deployment.
Updated: 2024-06-04 17:38:24
Domains: cs.LG
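The abstract's two formulas, side by side, as a sketch: the ranking score used to trade off exploitation and exploration, and the derived scaling of the uncertainty with last-layer width F and training-set size N.

```python
import math

def ranking_score(mu, sigma, alpha):
    """Exploitation-exploration score: mu + alpha * sigma."""
    return mu + alpha * sigma

def sigma_scale(F, N):
    """Derived relation sigma proportional to sqrt(F / N); a starting point
    for re-tuning alpha when F or N changes."""
    return math.sqrt(F / N)
```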
Large Language Models Spot Phishing Emails with Surprising Accuracy: A Comparative Analysis of Performance
Phishing, a prevalent cybercrime tactic for decades, remains a significant threat in today's digital world. By leveraging clever social engineering elements and modern technology, cybercriminals target many individuals, businesses, and organizations to exploit trust and security. These attackers often disguise themselves in many trustworthy forms to appear as legitimate sources. By cleverly using psychological elements like urgency, fear, social proof, and other manipulative strategies, phishers can lure individuals into revealing sensitive and personalized information. Building on this pervasive issue within modern technology, this paper aims to analyze the effectiveness of 15 Large Language Models (LLMs) in detecting phishing attempts, specifically focusing on a randomized set of "419 Scam" emails. The objective is to determine which LLMs can accurately detect phishing emails by analyzing a text file containing email metadata based on predefined criteria. The experiment concluded that the following models, ChatGPT 3.5, GPT-3.5-Turbo-Instruct, and ChatGPT, were the most effective in detecting phishing emails.
Updated: 2024-06-04 17:37:08
Domains: cs.CL,cs.AI
V-Express: Conditional Dropout for Progressive Training of Portrait Video Generation
In the field of portrait video generation, the use of single images to generate portrait videos has become increasingly prevalent. A common approach involves leveraging generative models to enhance adapters for controlled generation. However, control signals (e.g., text, audio, reference image, pose, depth map, etc.) can vary in strength. Among these, weaker conditions often struggle to be effective due to interference from stronger conditions, posing a challenge in balancing these conditions. In our work on portrait video generation, we identified audio signals as particularly weak, often overshadowed by stronger signals such as facial pose and reference image. However, direct training with weak signals often leads to difficulties in convergence. To address this, we propose V-Express, a simple method that balances different control signals through the progressive training and the conditional dropout operation. Our method gradually enables effective control by weak conditions, thereby achieving generation capabilities that simultaneously take into account the facial pose, reference image, and audio. The experimental results demonstrate that our method can effectively generate portrait videos controlled by audio. Furthermore, a potential solution is provided for the simultaneous and effective use of conditions of varying strengths.
Updated: 2024-06-04 17:32:52
Domains: cs.CV,cs.AI
Fairness-Optimized Synthetic EHR Generation for Arbitrary Downstream Predictive Tasks
Among various aspects of ensuring the responsible design of AI tools for healthcare applications, addressing fairness concerns has been a key focus area. Specifically, given the wide spread of electronic health record (EHR) data and their huge potential to inform a wide range of clinical decision support tasks, improving fairness in this category of health AI tools is of key importance. While such a broad problem (that is, mitigating unfairness in EHR-based AI models) has been tackled using various methods, task- and model-agnostic methods are noticeably rare. In this study, we aimed to target this gap by presenting a new pipeline that generates synthetic EHR data, which is not only consistent with (faithful to) the real EHR data but can also reduce the fairness concerns (defined by the end-user) in the downstream tasks, when combined with the real data. We demonstrate the effectiveness of our proposed pipeline across various downstream tasks and two different EHR datasets. Our proposed pipeline can add a widely applicable and complementary tool to the existing toolbox of methods to address fairness in health AI applications such as those modifying the design of a downstream model. The codebase for our project is available at https://github.com/healthylaife/FairSynth
Updated: 2024-06-04 17:29:21
Domains: cs.LG
State-Constrained Zero-Sum Differential Games with One-Sided Information
We study zero-sum differential games with state constraints and one-sided information, where the informed player (Player 1) has a categorical payoff type unknown to the uninformed player (Player 2). The goal of Player 1 is to minimize his payoff without violating the constraints, while that of Player 2 is to violate the state constraints if possible, or to maximize the payoff otherwise. One example of the game is a man-to-man matchup in football. Without state constraints, Cardaliaguet (2007) showed that the value of such a game exists and is convex to the common belief of players. Our theoretical contribution is an extension of this result to games with state constraints and the derivation of the primal and dual subdynamic principles necessary for computing behavioral strategies. Different from existing works that are concerned about the scalability of no-regret learning in games with discrete dynamics, our study reveals the underlying structure of strategies for belief manipulation resulting from information asymmetry and state constraints. This structure will be necessary for scalable learning on games with continuous actions and long time windows. We use a simplified football game to demonstrate the utility of this work, where we reveal player positions and belief states in which the attacker should (or should not) play specific random deceptive moves to take advantage of information asymmetry, and compute how the defender should respond.
Updated: 2024-06-04 17:26:30
标题: 具有单边信息的状态约束零和微分博弈
摘要: 我们研究具有状态约束和单边信息的零和微分博弈,其中知情玩家(玩家1)拥有不知情玩家(玩家2)所不知道的分类支付类型。玩家1的目标是在不违反约束的前提下最小化其支付,而玩家2的目标是尽可能违反状态约束,否则就最大化支付。该博弈的一个例子是足球中的一对一对位。在没有状态约束的情况下,Cardaliaguet(2007)表明此类博弈的价值存在,并且关于玩家的共同信念是凸的。我们的理论贡献是将这一结果扩展到具有状态约束的博弈,并推导出计算行为策略所必需的原始与对偶子动态原理。与现有关注离散动态博弈中无悔学习可扩展性的工作不同,我们的研究揭示了由信息不对称和状态约束导致的信念操纵策略的潜在结构。这种结构对于在具有连续动作和较长时间窗口的博弈上进行可扩展学习是必要的。我们使用一个简化的足球博弈来展示这项工作的实用性:我们揭示了进攻者应该(或不应该)采取特定随机欺骗性动作以利用信息不对称的球员位置和信念状态,并计算防守者应如何应对。
更新时间: 2024-06-04 17:26:30
领域: cs.GT,cs.LG
Guiding a Diffusion Model with a Bad Version of Itself
The primary axes of interest in image-generating diffusion models are image quality, the amount of variation in the results, and how well the results align with a given condition, e.g., a class label or a text prompt. The popular classifier-free guidance approach uses an unconditional model to guide a conditional model, leading to simultaneously better prompt alignment and higher-quality images at the cost of reduced variation. These effects seem inherently entangled, and thus hard to control. We make the surprising observation that it is possible to obtain disentangled control over image quality without compromising the amount of variation by guiding generation using a smaller, less-trained version of the model itself rather than an unconditional model. This leads to significant improvements in ImageNet generation, setting record FIDs of 1.01 for 64x64 and 1.25 for 512x512, using publicly available networks. Furthermore, the method is also applicable to unconditional diffusion models, drastically improving their quality.
Updated: 2024-06-04 17:25:59
标题: 用模型自身的较差版本引导扩散模型
摘要: 图像生成扩散模型中人们关注的主要维度是图像质量、结果的变化量以及结果与给定条件(例如类标签或文本提示)的对齐程度。流行的无分类器引导方法使用一个无条件模型来引导条件模型,从而同时实现更好的提示对齐和更高质量的图像,但代价是变化量降低。这些效果似乎天然纠缠在一起,因而难以控制。我们得到一个令人惊讶的观察:通过使用模型自身的一个更小、训练程度更低的版本(而非无条件模型)来引导生成,可以在不损害变化量的情况下获得对图像质量的解耦控制。这使得ImageNet生成取得了显著改进,使用公开可用的网络,在64x64和512x512分辨率上分别创下1.01和1.25的FID纪录。此外,该方法也适用于无条件扩散模型,大幅提升了它们的质量。
更新时间: 2024-06-04 17:25:59
领域: cs.CV,cs.AI,cs.LG,cs.NE,stat.ML
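The guidance rule described in "Guiding a Diffusion Model with a Bad Version of Itself" has the same extrapolation form as classifier-free guidance, with the unconditional model replaced by a smaller, less-trained copy of the model itself. A minimal sketch follows; the toy denoisers and the weight value are placeholders.

```python
import torch

def autoguided_denoise(d_main, d_weak, x, sigma, w=2.0):
    """Extrapolate away from the weaker model's prediction: for w > 1 this
    pushes samples toward regions the stronger model handles better."""
    weak = d_weak(x, sigma)
    return weak + w * (d_main(x, sigma) - weak)

# stand-in denoisers for a large model and an under-trained small copy of it
d_main = lambda x, s: 0.9 * x
d_weak = lambda x, s: 0.5 * x
x = torch.randn(2, 3, 8, 8)
print(autoguided_denoise(d_main, d_weak, x, sigma=1.0).shape)
```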
VideoPoet: A Large Language Model for Zero-Shot Video Generation
We present VideoPoet, a language model capable of synthesizing high-quality video, with matching audio, from a large variety of conditioning signals. VideoPoet employs a decoder-only transformer architecture that processes multimodal inputs -- including images, videos, text, and audio. The training protocol follows that of Large Language Models (LLMs), consisting of two stages: pretraining and task-specific adaptation. During pretraining, VideoPoet incorporates a mixture of multimodal generative objectives within an autoregressive Transformer framework. The pretrained LLM serves as a foundation that can be adapted for a range of video generation tasks. We present empirical results demonstrating the model's state-of-the-art capabilities in zero-shot video generation, specifically highlighting VideoPoet's ability to generate high-fidelity motions. Project page: http://sites.research.google/videopoet/
Updated: 2024-06-04 17:25:20
标题: VideoPoet:一种用于零样本视频生成的大型语言模型
摘要: 我们提出了VideoPoet,这是一种能够从多种条件信号合成高质量视频并配以匹配音频的语言模型。VideoPoet采用仅解码器的Transformer架构,处理包括图像、视频、文本和音频在内的多模态输入。其训练协议遵循大型语言模型(LLMs)的方式,包括两个阶段:预训练和任务特定适配。在预训练期间,VideoPoet在自回归Transformer框架中融合了多种多模态生成目标。预训练的LLM作为一个基础,可以适配于各种视频生成任务。我们给出的实证结果展示了该模型在零样本视频生成方面的最先进能力,特别突出了VideoPoet生成高保真度运动的能力。项目页面:http://sites.research.google/videopoet/
更新时间: 2024-06-04 17:25:20
领域: cs.CV,cs.AI
Demystifying the Compression of Mixture-of-Experts Through a Unified Framework
Scaling large language models has revolutionized the performance across diverse domains, yet the continual growth in model size poses significant challenges for real-world deployment. The Mixture of Experts (MoE) approach addresses this by dynamically selecting and activating only a subset of experts, significantly reducing computational costs while maintaining high performance. However, MoE introduces potential redundancy (e.g., parameters) and extra costs (e.g., communication overhead). Despite numerous compression techniques developed for mitigating the redundancy in dense models, the compression of MoE remains under-explored. We first bridge this gap with a cutting-edge unified framework that not only seamlessly integrates mainstream compression methods but also helps systematically understand MoE compression. This framework approaches compression from two perspectives: Expert Slimming which compresses individual experts and Expert Trimming which removes structured modules. Within this framework, we explore the optimization space unexplored by existing methods, and further introduce aggressive Expert Trimming techniques, i.e., Layer Drop and Block Drop, to eliminate redundancy at larger scales. Based on these insights, we present a comprehensive recipe to guide practitioners in compressing MoE effectively. Extensive experimental results demonstrate the effectiveness of the compression methods under our framework and the proposed recipe, achieving a 6.05x speedup and only 20.0GB memory usage while maintaining over 92% of performance on Mixtral-8x7B.
Updated: 2024-06-04 17:18:40
标题: 《通过统一框架揭秘混合专家压缩》
摘要: 大规模语言模型的扩展已经在不同领域彻底改变了性能,然而模型大小的持续增长对实际部署提出了重大挑战。混合专家(MoE)方法通过动态选择并激活仅部分专家来解决这个问题,显著降低了计算成本同时保持高性能。然而,MoE引入了潜在的冗余(如参数)和额外成本(如通信开销)。尽管针对稠密模型缓解冗余开发了许多压缩技术,但MoE的压缩仍未得到充分探索。我们首先通过一个尖端的统一框架弥合了这一差距,该框架不仅无缝集成了主流压缩方法,还有助于系统地理解MoE的压缩。该框架从两个角度进行压缩:专家瘦身(Expert Slimming)压缩单个专家和专家修剪(Expert Trimming)删除结构化模块。在此框架内,我们探索了现有方法尚未探索的优化空间,并进一步引入了激进的专家修剪技术,即层丢弃和块丢弃,以在更大规模上消除冗余。基于这些见解,我们提出了一套全面的指南,指导从业者有效压缩MoE。广泛的实验结果表明,在我们的框架和提议的指南下,压缩方法的有效性,实现了6.05倍的加速和仅20.0GB的内存使用量,同时在Mixtral-8x7B上保持了超过92%的性能。
更新时间: 2024-06-04 17:18:40
领域: cs.LG,cs.AI
Momentum Particle Maximum Likelihood
Maximum likelihood estimation (MLE) of latent variable models is often recast as the minimization of a free energy functional over an extended space of parameters and probability distributions. This perspective was recently combined with insights from optimal transport to obtain novel particle-based algorithms for fitting latent variable models to data. Drawing inspiration from prior works which interpret `momentum-enriched' optimization algorithms as discretizations of ordinary differential equations, we propose an analogous dynamical-systems-inspired approach to minimizing the free energy functional. The result is a dynamical system that blends elements of Nesterov's Accelerated Gradient method, the underdamped Langevin diffusion, and particle methods. Under suitable assumptions, we prove that the continuous-time system minimizes the functional. By discretizing the system, we obtain a practical algorithm for MLE in latent variable models. The algorithm outperforms existing particle methods in numerical experiments and compares favourably with other MLE algorithms.
Updated: 2024-06-04 17:17:53
标题: 动量粒子最大似然
摘要: 潜变量模型的最大似然估计(MLE)通常被重新表述为在参数和概率分布的扩展空间上最小化一个自由能泛函。最近,这一视角与最优输运的洞见相结合,得到了用于将潜变量模型拟合到数据的新颖的基于粒子的算法。受先前将“动量增强”优化算法解释为常微分方程离散化的工作的启发,我们提出了一种类似的、受动力系统启发的方法来最小化自由能泛函。其结果是一个融合了Nesterov加速梯度法、欠阻尼Langevin扩散和粒子方法元素的动力系统。在适当的假设下,我们证明了该连续时间系统最小化了泛函。通过对该系统进行离散化,我们得到了一个用于潜变量模型MLE的实用算法。该算法在数值实验中优于现有的粒子方法,并与其他MLE算法相比表现良好。
更新时间: 2024-06-04 17:17:53
领域: cs.LG
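As a rough illustration of the kind of dynamics the Momentum Particle Maximum Likelihood entry describes: latent-variable particles carrying momentum with underdamped-Langevin-style noise, and a parameter update driven by the particle average. The sketch below is schematic; the specific discretization, step sizes, noise scaling, and the toy model are assumptions, not the paper's derived algorithm.

```python
import numpy as np

def momentum_particle_step(theta, Z, V, grad_theta, grad_z,
                           eta=0.05, gamma=0.9, temp=1.0,
                           rng=np.random.default_rng(0)):
    """One schematic step: particles Z carry velocities V (momentum + noise),
    while theta follows the particle-averaged gradient of the log-joint."""
    V = gamma * V + eta * grad_z(theta, Z) \
        + np.sqrt(2 * eta * temp * (1 - gamma)) * rng.standard_normal(Z.shape)
    Z = Z + V
    theta = theta + eta * grad_theta(theta, Z).mean()
    return theta, Z, V

# toy Gaussian latent model: log p(z | theta) is proportional to -(z - theta)^2 / 2
grad_z = lambda th, Z: -(Z - th)
grad_theta = lambda th, Z: Z - th
theta, Z, V = 0.0, np.random.default_rng(1).normal(3.0, 1.0, 100), np.zeros(100)
for _ in range(200):
    theta, Z, V = momentum_particle_step(theta, Z, V, grad_theta, grad_z)
print(round(theta, 2))  # theta and the particle cloud have converged toward a common value
```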
Dropout MPC: An Ensemble Neural MPC Approach for Systems with Learned Dynamics
Neural networks are increasingly being used in the context of data-driven control, as an approximate model of the true system dynamics. Model Predictive Control (MPC) adopts this practice, leading to neural MPC strategies. This raises a question of whether the trained neural network has converged and generalized in a way that the learned model encapsulates an accurate approximation of the true dynamic model of the system, thus making it a reliable choice for model-based control, especially for disturbed and uncertain systems. To tackle that, we propose Dropout MPC, a novel sampling-based ensemble neural MPC algorithm that employs the Monte-Carlo dropout technique on the learned system model. The closed loop is based on an ensemble of predictive controllers, that are used simultaneously at each time-step for trajectory optimization. Each member of the ensemble influences the control input, based on a weighted voting scheme, thus by employing different realizations of the learned system dynamics, neural control becomes more reliable by design. An additional strength of the method is that it offers by design a way to estimate future uncertainty, leading to cautious control. While the method aims in general at uncertain systems with complex dynamics, where models derived from first principles are hard to infer, to showcase the application we utilize data gathered in the laboratory from a real mobile manipulator and employ the proposed algorithm for the navigation of the robot in simulation.
Updated: 2024-06-04 17:15:25
标题: Dropout MPC:一种面向具有学习动态的系统的集成神经MPC方法
摘要: 最近,神经网络越来越多地被用于数据驱动控制,作为真实系统动态的近似模型。模型预测控制(MPC)采用了这种做法,由此产生了神经MPC策略。这引出了一个问题:训练好的神经网络是否已经收敛并具备泛化能力,使学习到的模型能够准确逼近系统的真实动态模型,从而成为基于模型的控制的可靠选择,特别是对于受扰动和不确定性影响的系统。为了解决这个问题,我们提出了Dropout MPC,这是一种新颖的基于采样的集成神经MPC算法,在学习到的系统模型上采用蒙特卡洛dropout技术。闭环控制基于一组预测控制器,它们在每个时间步被同时用于轨迹优化。集成中的每个成员根据加权投票方案影响控制输入;通过采用学习到的系统动态的不同实现,神经控制在设计上变得更加可靠。该方法的另一个优势是,它在设计上提供了一种估计未来不确定性的途径,从而实现谨慎的控制。虽然该方法总体上面向难以从第一性原理推导模型的、具有复杂动态的不确定系统,但为了展示其应用,我们利用在实验室中从一台真实移动操作机器人收集的数据,并在仿真中使用所提出的算法为该机器人导航。
更新时间: 2024-06-04 17:15:25
领域: eess.SY,cs.AI,cs.LG,cs.RO,cs.SY
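A toy sketch of the ensemble idea in Dropout MPC: sample a few dropout realizations of a learned model, score candidate action sequences under each member, combine the costs with voting weights, and read ensemble disagreement as an uncertainty estimate. The one-dimensional dynamics, the uniform weights, and the terminal cost are placeholders, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def member_dynamics(state, action, mask):
    """Stand-in for a learned model evaluated under one fixed dropout mask."""
    return state + mask * action  # toy 1-D dynamics

def dropout_mpc(state, goal, n_members=8, n_candidates=64, horizon=5):
    masks = (rng.random(n_members) > 0.2).astype(float)   # sampled dropout realizations
    plans = rng.uniform(-1.0, 1.0, (n_candidates, horizon))
    costs = np.zeros((n_members, n_candidates))
    for m in range(n_members):
        for c in range(n_candidates):
            s = state
            for a in plans[c]:
                s = member_dynamics(s, a, masks[m])
            costs[m, c] = (s - goal) ** 2                  # terminal cost per member
    weights = np.full(n_members, 1.0 / n_members)          # voting weights (uniform here)
    best = int((weights @ costs).argmin())                 # weighted vote over candidates
    return plans[best, 0], costs[:, best].std()            # first action + disagreement

action, uncertainty = dropout_mpc(state=0.0, goal=1.0)
print(action, uncertainty)
```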
Kolmogorov-Arnold Networks for Time Series: Bridging Predictive Power and Interpretability
Kolmogorov-Arnold Networks (KAN) is a groundbreaking model recently proposed by the MIT team, representing a revolutionary approach with the potential to be a game-changer in the field. This innovative concept has rapidly garnered worldwide interest within the AI community. Inspired by the Kolmogorov-Arnold representation theorem, KAN utilizes spline-parametrized univariate functions in place of traditional linear weights, enabling them to dynamically learn activation patterns and significantly enhancing interpretability. In this paper, we explore the application of KAN to time series forecasting and propose two variants: T-KAN and MT-KAN. T-KAN is designed to detect concept drift within time series and can explain the nonlinear relationships between predictions and previous time steps through symbolic regression, making it highly interpretable in dynamically changing environments. MT-KAN, on the other hand, improves predictive performance by effectively uncovering and leveraging the complex relationships among variables in multivariate time series. Experiments validate the effectiveness of these approaches, demonstrating that T-KAN and MT-KAN significantly outperform traditional methods in time series forecasting tasks, not only enhancing predictive accuracy but also improving model interpretability. This research opens new avenues for adaptive forecasting models, highlighting the potential of KAN as a powerful and interpretable tool in predictive analytics.
Updated: 2024-06-04 17:14:31
标题: 科尔莫哥洛夫-阿诺德网络用于时间序列:搭建预测能力与可解释性的桥梁
摘要: Kolmogorov-Arnold Networks(KAN)是由麻省理工团队最近提出的一个开创性模型,代表着一种有望改变该领域格局的革命性方法。这一创新概念迅速引起了全球人工智能社区的兴趣。受Kolmogorov-Arnold表示定理的启发,KAN利用样条参数化的一元函数代替传统的线性权重,使其能够动态学习激活模式并显著增强可解释性。在本文中,我们探讨了KAN在时间序列预测中的应用,并提出了两种变体:T-KAN和MT-KAN。T-KAN旨在检测时间序列中的概念漂移,并通过符号回归解释预测与先前时间步之间的非线性关系,使其在动态变化的环境中具有高度可解释性。另一方面,MT-KAN通过有效地发现和利用多变量时间序列中的复杂关系,提高了预测性能。实验证实了这些方法的有效性,表明T-KAN和MT-KAN在时间序列预测任务中明显优于传统方法,不仅提高了预测准确性,还提高了模型的可解释性。这项研究为自适应预测模型开辟了新的途径,突显了KAN作为预测分析中一个强大且可解释的工具的潜力。
更新时间: 2024-06-04 17:14:31
领域: cs.LG,cs.AI
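The core mechanism T-KAN and MT-KAN inherit from KANs, learnable univariate functions on edges instead of scalar weights, can be sketched in a few lines. A Gaussian-RBF basis stands in for the B-spline parameterization here purely for brevity; everything else (basis size, grid range) is an assumption.

```python
import numpy as np

class KANEdge:
    """One learnable univariate function phi(x) = sum_k c_k * exp(-(x - t_k)^2),
    standing in for a spline-parameterized KAN edge."""
    def __init__(self, n_basis=8, lo=-3.0, hi=3.0, rng=np.random.default_rng(0)):
        self.centers = np.linspace(lo, hi, n_basis)
        self.coefs = 0.1 * rng.standard_normal(n_basis)  # trainable in a real model

    def __call__(self, x):
        return np.exp(-(x[:, None] - self.centers) ** 2) @ self.coefs

# a Kolmogorov-Arnold-style unit: sum univariate functions of each lagged input
edges = [KANEdge(rng=np.random.default_rng(i)) for i in range(3)]
x = np.random.default_rng(42).standard_normal((16, 3))   # 16 windows, 3 lags
y = sum(edge(x[:, i]) for i, edge in enumerate(edges))   # one-step-ahead feature
print(y.shape)  # (16,)
```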
Comparing Graph Transformers via Positional Encodings
The distinguishing power of graph transformers is closely tied to the choice of positional encoding: features used to augment the base transformer with information about the graph. There are two primary types of positional encoding: absolute positional encodings (APEs) and relative positional encodings (RPEs). APEs assign features to each node and are given as input to the transformer. RPEs instead assign a feature to each pair of nodes, e.g., graph distance, and are used to augment the attention block. A priori, it is unclear which method is better for maximizing the power of the resulting graph transformer. In this paper, we aim to understand the relationship between these different types of positional encodings. Interestingly, we show that graph transformers using APEs and RPEs are equivalent in terms of distinguishing power. In particular, we demonstrate how to interchange APEs and RPEs while maintaining their distinguishing power in terms of graph transformers. Based on our theoretical results, we provide a study on several APEs and RPEs (including the resistance distance and the recently introduced stable and expressive positional encoding (SPE)) and compare their distinguishing power in terms of transformers. We believe our work will help navigate the huge number of choices of positional encoding and will provide guidance on the future design of positional encodings for graph transformers.
Updated: 2024-06-04 17:11:02
标题: 通过位置编码比较图变换器
摘要: 图变换器的区分能力与位置编码的选择密切相关:位置编码是用图的信息来增强基础变换器的特征。位置编码主要有两类:绝对位置编码(APEs)和相对位置编码(RPEs)。APEs为每个节点分配特征,并作为输入提供给变换器;RPEs则为每对节点分配一个特征(例如图距离),用于增强注意力模块。先验地看,并不清楚哪种方法能更好地最大化所得图变换器的能力。在本文中,我们旨在理解这些不同类型位置编码之间的关系。有趣的是,我们证明了使用APEs和使用RPEs的图变换器在区分能力上是等价的。特别地,我们展示了如何在保持图变换器区分能力的同时在APEs与RPEs之间相互转换。基于我们的理论结果,我们对若干APEs和RPEs(包括电阻距离和最近提出的稳定且富有表达力的位置编码(SPE))进行了研究,并比较了它们在变换器意义下的区分能力。我们相信这项工作将有助于在数量庞大的位置编码选择中进行取舍,并为未来图变换器位置编码的设计提供指导。
更新时间: 2024-06-04 17:11:02
领域: cs.LG
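The APE/RPE distinction drawn in "Comparing Graph Transformers via Positional Encodings" is easy to see in code: APEs enter as extra node features, RPEs as a per-pair bias on the attention logits. A toy single-head example follows, where random tensors stand in for, e.g., Laplacian eigenvector encodings and shortest-path distances.

```python
import torch

torch.manual_seed(0)
n, d = 6, 16                                  # nodes, embedding dim
X = torch.randn(n, d)                         # base node features

# APE: positional features are added to the node inputs
ape = torch.randn(n, d)                       # stand-in for eigenvector encodings
X_in = X + ape

# RPE: a pairwise feature (e.g., graph distance) biases the attention logits
dist = torch.randint(0, 4, (n, n)).float()    # stand-in for shortest-path distances
logits = (X_in @ X_in.T) / d ** 0.5 - dist    # subtract: farther pairs attend less
attn = torch.softmax(logits, dim=-1)
print(attn.shape)                             # (6, 6) attention using both encodings
```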
Sample Complexity of Algorithm Selection Using Neural Networks and Its Applications to Branch-and-Cut
Data-driven algorithm design is a paradigm that uses statistical and machine learning techniques to select from a class of algorithms for a computational problem an algorithm that has the best expected performance with respect to some (unknown) distribution on the instances of the problem. We build upon recent work in this line of research by considering the setup where, instead of selecting a single algorithm that has the best performance, we allow the possibility of selecting an algorithm based on the instance to be solved, using neural networks. In particular, given a representative sample of instances, we learn a neural network that maps an instance of the problem to the most appropriate algorithm for that instance. We formalize this idea and derive rigorous sample complexity bounds for this learning problem, in the spirit of recent work in data-driven algorithm design. We then apply this approach to the problem of making good decisions in the branch-and-cut framework for mixed-integer optimization (e.g., which cut to add?). In other words, the neural network will take as input a mixed-integer optimization instance and output a decision that will result in a small branch-and-cut tree for that instance. Our computational results provide evidence that our particular way of using neural networks for cut selection can make a significant impact in reducing branch-and-cut tree sizes, compared to previous data-driven approaches.
Updated: 2024-06-04 17:05:20
标题: 使用神经网络进行算法选择的样本复杂度及其在分支切割中的应用
摘要: 数据驱动的算法设计是这样一种范式:利用统计和机器学习技术,从某个计算问题的一类算法中选出在问题实例的某个(未知)分布下期望性能最佳的算法。我们在这一研究方向近期工作的基础上,考虑如下设定:不再选择单个性能最佳的算法,而是允许使用神经网络根据待求解的实例来选择算法。具体来说,给定实例的代表性样本,我们学习一个将问题实例映射到最适合该实例的算法的神经网络。我们将这一想法形式化,并本着近期数据驱动算法设计工作的精神,为该学习问题推导出严格的样本复杂度界。随后,我们将这种方法应用于在混合整数优化的分支切割框架中做出良好决策的问题(例如,应添加哪个割?)。换言之,神经网络以一个混合整数优化实例作为输入,输出一个能使该实例的分支切割树较小的决策。我们的计算结果表明,与以往的数据驱动方法相比,我们这种利用神经网络进行割选择的特定方式可以显著减小分支切割树的规模。
更新时间: 2024-06-04 17:05:20
领域: cs.LG,math.OC
Ai-Sampler: Adversarial Learning of Markov kernels with involutive maps
Markov chain Monte Carlo methods have become popular in statistics as versatile techniques to sample from complicated probability distributions. In this work, we propose a method to parameterize and train transition kernels of Markov chains to achieve efficient sampling and good mixing. This training procedure minimizes the total variation distance between the stationary distribution of the chain and the empirical distribution of the data. Our approach leverages involutive Metropolis-Hastings kernels constructed from reversible neural networks that ensure detailed balance by construction. We find that reversibility also implies $C_2$-equivariance of the discriminator function which can be used to restrict its function space.
Updated: 2024-06-04 17:00:14
标题: Ai-Sampler:利用对合映射进行马尔可夫核的对抗学习
摘要: 马尔可夫链蒙特卡洛方法作为从复杂概率分布中抽样的多功能技术,已在统计学中流行起来。在本研究中,我们提出了一种参数化并训练马尔可夫链转移核的方法,以实现高效的抽样和良好的混合。该训练过程最小化链的平稳分布与数据经验分布之间的总变差距离。我们的方法利用由可逆神经网络构建的对合Metropolis-Hastings核,其构造本身即保证细致平衡。我们发现,可逆性还意味着判别器函数的$C_2$-等变性,这可用于限制其函数空间。
更新时间: 2024-06-04 17:00:14
领域: cs.LG,stat.ML
COMQ: A Backpropagation-Free Algorithm for Post-Training Quantization
Post-training quantization (PTQ) has emerged as a practical approach to compress large neural networks, making them highly efficient for deployment. However, effectively reducing these models to their low-bit counterparts without compromising the original accuracy remains a key challenge. In this paper, we propose an innovative PTQ algorithm termed COMQ, which sequentially conducts coordinate-wise minimization of the layer-wise reconstruction errors. We consider the widely used integer quantization, where every quantized weight can be decomposed into a shared floating-point scalar and an integer bit-code. Within a fixed layer, COMQ treats all the scaling factor(s) and bit-codes as the variables of the reconstruction error. Every iteration improves this error along a single coordinate while keeping all other variables constant. COMQ is easy to use and requires no hyper-parameter tuning. It instead involves only dot products and rounding operations. We update these variables in a carefully designed greedy order, significantly enhancing the accuracy. COMQ achieves remarkable results in quantizing 4-bit Vision Transformers, with a negligible loss of less than 1% in Top-1 accuracy. In 4-bit INT quantization of convolutional neural networks, COMQ maintains near-lossless accuracy with a minimal drop of merely 0.3% in Top-1 accuracy.
Updated: 2024-06-04 16:57:16
标题: COMQ:一种用于后训练量化的无反向传播算法
摘要: 训练后量化(PTQ)已经成为一种实用的方法,用于压缩大型神经网络,使它们在部署时高效。然而,在不影响原始准确性的情况下将这些模型有效压缩为低比特版本仍然是一个关键挑战。在本文中,我们提出了一种创新的PTQ算法,称为COMQ,该算法对逐层重构误差依次进行坐标式最小化。我们考虑广泛使用的整数量化,其中每个量化权重可以分解为一个共享的浮点标量和一个整数位编码。在固定层内,COMQ将所有缩放因子和位编码视为重构误差的变量。每次迭代沿单个坐标改进该误差,同时保持所有其他变量不变。COMQ易于使用,无需调整超参数,只涉及点积和舍入运算。我们按照精心设计的贪婪顺序更新这些变量,显著提高了准确性。COMQ在对4比特视觉变换器进行量化方面取得了显著的结果,Top-1准确度仅损失不到1%。在卷积神经网络的4比特INT量化中,COMQ保持接近无损的准确度,Top-1准确度仅下降0.3%。
更新时间: 2024-06-04 16:57:16
领域: cs.LG,cs.CV
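The COMQ abstract's description of coordinate-wise minimization of a layer's reconstruction error, using only dot products and rounding, maps onto a short sketch for a single weight column. The greedy update order, the per-column scale, and the sweep count below are simplifications of the paper's recipe, chosen only to show the coordinate-descent step.

```python
import numpy as np

def comq_column(X, w, scale, bits=4, n_sweeps=3):
    """Minimize ||X w - scale * X q||^2 over integer codes q, one coordinate
    at a time, holding all other coordinates fixed (dot products + rounding)."""
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    q = np.clip(np.round(w / scale), lo, hi)
    err = X @ w - scale * (X @ q)                    # current reconstruction error
    for _ in range(n_sweeps):
        for j in range(len(w)):
            err += scale * q[j] * X[:, j]            # remove coordinate j's term
            opt = X[:, j] @ err / (scale * (X[:, j] @ X[:, j]) + 1e-12)
            q[j] = np.clip(np.round(opt), lo, hi)    # best integer code for j
            err -= scale * q[j] * X[:, j]            # reinsert the updated term
    return q

rng = np.random.default_rng(0)
X, w = rng.standard_normal((256, 32)), rng.standard_normal(32)
scale = np.abs(w).max() / 7                          # simple symmetric 4-bit scale
q = comq_column(X, w, scale)
print(np.linalg.norm(X @ w - scale * (X @ q)) / np.linalg.norm(X @ w))
```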
A Temporal Kolmogorov-Arnold Transformer for Time Series Forecasting
Capturing complex temporal patterns and relationships within multivariate data streams is a difficult task. We propose the Temporal Kolmogorov-Arnold Transformer (TKAT), a novel attention-based architecture designed to address this task using Temporal Kolmogorov-Arnold Networks (TKANs). Inspired by the Temporal Fusion Transformer (TFT), TKAT emerges as a powerful encoder-decoder model tailored to handle tasks in which the observed part of the features is more important than the a priori known part. This new architecture combined the theoretical foundation of the Kolmogorov-Arnold representation with the power of transformers. TKAT aims to simplify the complex dependencies inherent in time series, making them more "interpretable". The use of transformer architecture in this framework allows us to capture long-range dependencies through self-attention mechanisms.
Updated: 2024-06-04 16:55:42
标题: 一个用于时间序列预测的时态科尔莫戈洛夫-阿诺德变换器
摘要: 捕捉多变量数据流中复杂的时间模式和关系是一项困难的任务。我们提出了时间 Kolmogorov-Arnold 变换器(TKAT),这是一种新颖的基于注意力机制的架构,旨在利用时间 Kolmogorov-Arnold 网络(TKANs)来解决这一任务。受到时间融合变换器(TFT)的启发,TKAT 成为一个强大的编码器-解码器模型,专门用于处理观察到的特征部分比先验已知部分更重要的任务。这种新的架构结合了 Kolmogorov-Arnold 表示的理论基础和变换器的力量。TKAT 的目标是简化时间序列中固有的复杂依赖关系,使它们更加“可解释”。在这个框架中使用变换器架构使我们能够通过自注意力机制捕捉长程依赖关系。
更新时间: 2024-06-04 16:55:42
领域: cs.LG
How Do Neural Spoofing Countermeasures Detect Partially Spoofed Audio?
Partially manipulating a sentence can greatly change its meaning. Recent work shows that countermeasures (CMs) trained on partially spoofed audio can effectively detect such spoofing. However, the current understanding of the decision-making process of CMs is limited. We utilize Grad-CAM and introduce a quantitative analysis metric to interpret CMs' decisions. We find that CMs prioritize the artifacts of transition regions created when concatenating bona fide and spoofed audio. This focus differs from that of CMs trained on fully spoofed audio, which concentrate on the pattern differences between bona fide and spoofed parts. Our further investigation explains the varying nature of CMs' focus while making correct or incorrect predictions. These insights provide a basis for the design of CM models and the creation of datasets. Moreover, this work lays a foundation of interpretability in the field of partial spoofed audio detection that has not been well explored previously.
Updated: 2024-06-04 16:51:42
标题: 神经欺骗对策如何检测部分伪造的音频?
摘要: 部分篡改一个句子可能会极大地改变其含义。最近的研究表明,在部分篡改音频上训练的对抗措施(CMs)可以有效地检测此类篡改。然而,目前对CMs的决策过程的理解有限。我们利用Grad-CAM并引入了一个定量分析指标来解释CMs的决策。我们发现,CMs优先考虑在连接真实和伪造音频时创建的过渡区域的痕迹。这种关注重点不同于在完全篡改音频上训练的CMs,后者集中于真实和伪造部分之间的模式差异。我们进一步的研究解释了在做出正确或错误预测时,CMs关注点的变化性质。这些见解为CM模型的设计和数据集的创建提供了基础。此外,这项工作为先前尚未深入探讨的部分篡改音频检测领域的可解释性奠定了基础。
更新时间: 2024-06-04 16:51:42
领域: eess.AS,cs.AI,cs.SD
Hiding Text in Large Language Models: Introducing Unconditional Token Forcing Confusion
With the help of simple fine-tuning, one can artificially embed hidden text into large language models (LLMs). This text is revealed only when triggered by a specific query to the LLM. Two primary applications are LLM fingerprinting and steganography. In the context of LLM fingerprinting, a unique text identifier (fingerprint) is embedded within the model to verify licensing compliance. In the context of steganography, the LLM serves as a carrier for hidden messages that can be disclosed through a designated trigger. Our work demonstrates that embedding hidden text in the LLM via fine-tuning, though seemingly secure due to the vast number of potential triggers (any sequence of characters or tokens could serve as a trigger), is susceptible to extraction through analysis of the LLM's output decoding process. We propose a novel approach to extraction called Unconditional Token Forcing. It is premised on the hypothesis that iteratively feeding each token from the LLM's vocabulary into the model should reveal sequences with abnormally high token probabilities, indicating potential embedded text candidates. Additionally, our experiments show that when the first token of a hidden fingerprint is used as an input, the LLM not only produces an output sequence with high token probabilities, but also repetitively generates the fingerprint itself. We also present a method to hide text in such a way that it is resistant to Unconditional Token Forcing, which we named Unconditional Token Forcing Confusion.
Updated: 2024-06-04 16:49:06
标题: 在大型语言模型中隐藏文本:引入无条件令牌强制混淆
摘要: 借助简单的微调,可以人为地将隐藏文本嵌入大型语言模型(LLMs)中。只有在对LLM发出特定查询时,才会显示这些文本。LLM指纹识别和隐写术是两个主要应用领域。在LLM指纹识别的背景下,将一个唯一的文本标识符(指纹)嵌入模型中,以验证许可合规性。在隐写术的背景下,LLM作为隐藏信息的载体,可以通过指定的触发器来披露隐藏的信息。 我们的工作表明,通过微调在LLM中嵌入隐藏文本,虽然看似安全,因为潜在触发器的数量庞大(任何字符序列或令牌都可以作为触发器),但却容易通过分析LLM的输出解码过程来提取。我们提出了一种称为无条件令牌强制的新方法。它的基础是这样的假设:将LLM词汇表中的每个令牌迭代地馈送到模型中应该会显示出具有异常高令牌概率的序列,表明潜在的嵌入文本候选者。此外,我们的实验表明,当使用隐藏指纹的第一个令牌作为输入时,LLM不仅会产生具有高令牌概率的输出序列,还会重复生成指纹本身。我们还提出了一种隐藏文本的方法,使其对无条件令牌强制具有抗性,我们将其命名为无条件令牌强制混淆。
更新时间: 2024-06-04 16:49:06
领域: cs.CL,cs.CR
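The extraction loop the abstract calls Unconditional Token Forcing can be sketched model-agnostically: feed each vocabulary token as a one-token prompt, decode greedily, and flag continuations whose next-token probabilities are abnormally high. The threshold, sequence length, and the toy model below are assumptions; an actual attack would run this over a real LLM's vocabulary.

```python
import torch

def unconditional_token_forcing(lm, vocab_size, thresh=0.98, max_len=8):
    """lm(ids) -> (1, seq_len, vocab) logits. Returns candidate hidden sequences:
    greedy continuations decoded with suspiciously high average confidence."""
    suspects = []
    for tok in range(vocab_size):
        ids = torch.tensor([[tok]])
        confidences = []
        for _ in range(max_len):
            probs = torch.softmax(lm(ids)[0, -1], dim=-1)
            nxt = int(probs.argmax())
            confidences.append(float(probs[nxt]))
            ids = torch.cat([ids, torch.tensor([[nxt]])], dim=1)
        if sum(confidences) / max_len > thresh:   # abnormally confident: flag it
            suspects.append(ids[0].tolist())
    return suspects

toy_lm = lambda ids: torch.randn(1, ids.shape[1], 50)  # random LM: nothing gets flagged
print(unconditional_token_forcing(toy_lm, vocab_size=50))
```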
Inpainting Pathology in Lumbar Spine MRI with Latent Diffusion
Data driven models for automated diagnosis in radiology suffer from insufficient and imbalanced datasets due to low representation of pathology in a population and the cost of expert annotations. Datasets can be bolstered through data augmentation. However, even when utilizing a full suite of transformations during model training, typical data augmentations do not address variations in human anatomy. An alternative direction is to synthesize data using generative models, which can potentially craft datasets with specific attributes. While this holds promise, commonly used generative models such as Generative Adversarial Networks may inadvertently produce anatomically inaccurate features. On the other hand, diffusion models, which offer greater stability, tend to memorize training data, raising concerns about privacy and generative diversity. Alternatively, inpainting has the potential to augment data through directly inserting pathology in medical images. However, this approach introduces a new challenge: accurately merging the generated pathological features with the surrounding anatomical context. While inpainting is a well established method for addressing simple lesions, its application to pathologies that involve complex structural changes remains relatively unexplored. We propose an efficient method for inpainting pathological features onto healthy anatomy in MRI through voxelwise noise scheduling in a latent diffusion model. We evaluate the method's ability to insert disc herniation and central canal stenosis in lumbar spine sagittal T2 MRI, and it achieves superior Frechet Inception Distance compared to state-of-the-art methods.
Updated: 2024-06-04 16:47:47
标题: 在腰椎MRI中使用潜在扩散修复病变
摘要: 放射学中用于自动诊断的数据驱动模型受制于数据不足和不平衡的问题,这是由于人群中病理的低代表性和专家标注的高成本所致。可以通过数据增强来扩充数据集。然而,即使在模型训练过程中使用了完整的变换套件,典型的数据增强也无法涵盖人体解剖结构的变化。另一种方向是使用生成模型合成数据,从而有可能构建具有特定属性的数据集。虽然这种方法很有前景,但常用的生成模型(如生成对抗网络)可能会无意中产生解剖上不准确的特征。另一方面,扩散模型虽然更稳定,却倾向于记忆训练数据,引发了对隐私和生成多样性的担忧。此外,修补(inpainting)有潜力通过直接在医学图像中插入病理特征来增强数据。然而,这种方法带来了一个新挑战:将生成的病理特征与周围解剖背景准确融合。虽然修补是一种处理简单病变的成熟方法,但其在涉及复杂结构变化的病理上的应用仍相对未被探索。我们提出了一种高效的方法,通过潜在扩散模型中的逐体素噪声调度,将病理特征修补到MRI中的健康解剖结构上。我们评估了该方法在腰椎矢状T2 MRI中插入椎间盘突出和中央管狭窄的能力,与最先进的方法相比,它取得了更优的Frechet Inception Distance。
更新时间: 2024-06-04 16:47:47
领域: eess.IV,cs.CV,cs.LG
Meta-Designing Quantum Experiments with Language Models
Artificial Intelligence (AI) has the potential to significantly advance scientific discovery by finding solutions beyond human capabilities. However, these super-human solutions are often unintuitive and require considerable effort to uncover underlying principles, if possible at all. Here, we show how a code-generating language model trained on synthetic data can not only find solutions to specific problems but can create meta-solutions, which solve an entire class of problems in one shot and simultaneously offer insight into the underlying design principles. Specifically, for the design of new quantum physics experiments, our sequence-to-sequence transformer architecture generates interpretable Python code that describes experimental blueprints for a whole class of quantum systems. We discover general and previously unknown design rules for infinitely large classes of quantum states. The ability to automatically generate generalized patterns in readable computer code is a crucial step toward machines that help discover new scientific understanding -- one of the central aims of physics.
Updated: 2024-06-04 16:40:55
标题: 用语言模型进行元设计量子实验
摘要: 人工智能(AI)有潜力通过找到超越人类能力的解决方案来显著推动科学发现。然而,这些超人类的解决方案通常并不直观,即便有可能揭示其背后的原理,也需要付出大量努力。在这里,我们展示了一个在合成数据上训练的代码生成语言模型不仅可以找到特定问题的解决方案,还可以创建元解决方案,一次性解决一整类问题,并同时提供对底层设计原则的洞察。具体来说,在设计新的量子物理实验方面,我们的序列到序列Transformer架构生成可解释的Python代码,描述了一整类量子系统的实验蓝图。我们发现了适用于无限大的量子态类的一般性的、此前未知的设计规则。以可读的计算机代码自动生成泛化模式的能力,是朝着帮助发现新科学理解的机器迈出的关键一步--这是物理学的中心目标之一。
更新时间: 2024-06-04 16:40:55
领域: quant-ph,cs.LG
kNN Classification of Malware Data Dependency Graph Features
Feature resolution impacts the ability of classifiers to make explainable inferences when applied to malware classification. We explore classification based on features constructed from data dependency graphs, and present results from k-Nearest Neighbors (kNN) classifiers. Our study demonstrates that classification based on a novel feature representation not only yields high accuracy, but also increases explainability in inference, as features of data dependency are directly representative of program behavior. We present classification results using the Microsoft Kaggle 2015 malware dataset which was processed with a novel approach to feature extraction and representation. We show that non-parametric approaches to classification in the metric space are able to obtain classification accuracy of 87.5% when applied to multi-class classification in the Kaggle malware dataset. Additionally, similarity in the metric space can be calculated directly without prior training. Our results provide evidence that data dependency graphs accurately capture both semantic and structural information.
Updated: 2024-06-04 16:39:02
标题: 恶意软件数据依赖图特征的kNN分类
摘要: 特征分辨率影响分类器在应用于恶意软件分类时进行可解释推断的能力。我们探讨了基于数据依赖图构建的特征的分类,并展示了k-最近邻(kNN)分类器的结果。我们的研究表明,基于新颖特征表示的分类不仅能够获得高准确性,而且在推断中增加了可解释性,因为数据依赖的特征直接代表程序行为。我们展示了使用Microsoft Kaggle 2015恶意软件数据集进行的分类结果,该数据集经过一种新颖的特征提取和表示方法处理。我们展示了在度量空间中使用非参数方法进行分类时,能够在Kaggle恶意软件数据集的多类分类中获得87.5%的分类准确性。此外,在度量空间中的相似性可以直接计算,无需事先训练。我们的结果证明数据依赖图准确捕捉了语义和结构信息。
更新时间: 2024-06-04 16:39:02
领域: cs.CR,cs.AI,cs.LG
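Since the classifier in the malware entry above is plain kNN in a feature metric space, the modeling step itself is compact; the paper's contribution is the data-dependency-graph feature extraction, for which the random vectors below merely stand in.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.random((200, 64))                    # stand-ins for graph-derived feature vectors
y = rng.integers(0, 9, 200)                  # 9 classes, as in the Kaggle 2015 dataset

knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
knn.fit(X[:150], y[:150])                    # "training" is just storing the points
print(knn.score(X[150:], y[150:]))           # distances are computed directly at query time
```

This also illustrates the abstract's remark that similarity in the metric space is calculated directly without prior training: kNN defers all distance computation to query time.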
Landscape-Aware Growing: The Power of a Little LAG
Recently, there has been increasing interest in efficient pretraining paradigms for training Transformer-based models. Several recent approaches use smaller models to initialize larger models in order to save computation (e.g., stacking and fusion). In this work, we study the fundamental question of how to select the best growing strategy from a given pool of growing strategies. Prior works have extensively focused on loss- and/or function-preserving behavior at initialization or simply performance at the end of training. Instead, we identify that behavior at initialization can be misleading as a predictor of final performance and present an alternative perspective based on early training dynamics, which we call "landscape-aware growing (LAG)". We perform extensive analysis of correlation of the final performance with performance in the initial steps of training and find early and more accurate predictions of the optimal growing strategy (i.e., with only a small "lag" after initialization). This perspective also motivates an adaptive strategy for gradual stacking.
Updated: 2024-06-04 16:38:57
标题: 景观感知生长:一点LAG的力量
摘要: 最近,对于训练基于Transformer的模型的高效预训练范式越来越受到关注。最近的一些方法使用较小的模型来初始化较大的模型,以节省计算资源(例如,堆叠和融合)。在这项工作中,我们研究了如何从给定的增长策略池中选择最佳增长策略的基本问题。先前的研究主要关注于初始化时的损失和/或函数保持行为,或者简单地关注训练结束时的性能。相反,我们认为初始化时的行为可能会误导最终性能的预测,并提出了一种基于早期训练动态的替代观点,我们称之为“景观感知增长(LAG)”。我们对最终性能与训练初期性能之间的相关性进行了广泛分析,并发现了对最佳增长策略的早期和更准确的预测(即仅在初始化后有小的“滞后”)。这种观点也激发了一种逐步堆叠的自适应策略。
更新时间: 2024-06-04 16:38:57
领域: cs.LG,cs.CL
Pancreatic Tumor Segmentation as Anomaly Detection in CT Images Using Denoising Diffusion Models
Despite the advances in medicine, cancer has remained a formidable challenge. Particularly in the case of pancreatic tumors, characterized by their diversity and late diagnosis, early detection poses a significant challenge crucial for effective treatment. The advancement of deep learning techniques, particularly supervised algorithms, has significantly propelled pancreatic tumor detection in the medical field. However, supervised deep learning approaches necessitate extensive labeled medical images for training, yet acquiring such annotations is both limited and costly. Conversely, weakly supervised anomaly detection methods, requiring only image-level annotations, have garnered interest. Existing methodologies predominantly hinge on generative adversarial networks (GANs) or autoencoder models, which can pose complexity in training and, these models may face difficulties in accurately preserving fine image details. This research presents a novel approach to pancreatic tumor detection, employing weak supervision anomaly detection through denoising diffusion algorithms. By incorporating a deterministic iterative process of adding and removing noise along with classifier guidance, the method enables seamless translation of images between diseased and healthy subjects, resulting in detailed anomaly maps without requiring complex training protocols and segmentation masks. This study explores denoising diffusion models as a recent advancement over traditional generative models like GANs, contributing to the field of pancreatic tumor detection. Recognizing the low survival rates of pancreatic cancer, this study emphasizes the need for continued research to leverage diffusion models' efficiency in medical segmentation tasks.
Updated: 2024-06-04 16:38:11
标题: 胰腺肿瘤分割作为CT图像中异常检测的方法:使用去噪扩散模型
摘要: 尽管医学取得了进步,癌症仍然是一个严峻的挑战。特别是在胰腺肿瘤的情况下,其特点是多样性和晚期诊断,早期检测对有效治疗至关重要。深度学习技术的进步,特别是监督算法,显著推动了医学领域胰腺肿瘤的检测。然而,监督深度学习方法需要大量标记的医学图像进行训练,但获取这种注释既有限又昂贵。相比之下,仅需要图像级注释的弱监督异常检测方法引起了关注。现有方法主要依赖生成对抗网络(GANs)或自编码器模型,这可能导致训练复杂性,并且这些模型可能难以准确保留图像细节。本研究提出了一种通过去噪扩散算法进行弱监督异常检测的胰腺肿瘤检测新方法。通过结合添加和去除噪声的确定性迭代过程以及分类器指导,该方法能够在患病和健康受试者之间无缝转换图像,生成详细的异常图,而无需复杂的训练协议和分割掩模。本研究探讨了去噪扩散模型作为对GANs等传统生成模型的最新进展,为胰腺肿瘤检测领域做出贡献。鉴于胰腺癌的低生存率,本研究强调了继续研究以利用扩散模型在医学分割任务中的效率的必要性。
更新时间: 2024-06-04 16:38:11
领域: eess.IV,cs.AI,cs.CV,cs.LG
An Empirical Study into Clustering of Unseen Datasets with Self-Supervised Encoders
Can pretrained models generalize to new datasets without any retraining? We deploy pretrained image models on datasets they were not trained for, and investigate whether their embeddings form meaningful clusters. Our suite of benchmarking experiments use encoders pretrained solely on ImageNet-1k with either supervised or self-supervised training techniques, deployed on image datasets that were not seen during training, and clustered with conventional clustering algorithms. This evaluation provides new insights into the embeddings of self-supervised models, which prioritize different features to supervised models. Supervised encoders typically offer more utility than SSL encoders within the training domain, and vice-versa far outside of it, however, fine-tuned encoders demonstrate the opposite trend. Clustering provides a way to evaluate the utility of self-supervised learned representations orthogonal to existing methods such as kNN. Additionally, we find the silhouette score when measured in a UMAP-reduced space is highly correlated with clustering performance, and can therefore be used as a proxy for clustering performance on data with no ground truth labels. Our code implementation is available at https://github.com/scottclowe/zs-ssl-clustering/.
Updated: 2024-06-04 16:34:17
标题: 一个关于利用自监督编码器对未见数据集进行聚类的实证研究
摘要: 预训练模型能否在没有重新训练的情况下泛化到新数据集?我们在它们没有经过训练的数据集上部署预训练的图像模型,并研究它们的嵌入是否形成有意义的聚类。我们的一系列基准实验使用仅在ImageNet-1k上以监督或自监督训练技术预训练的编码器,将其部署在训练过程中未见过的图像数据集上,并使用传统的聚类算法进行聚类。这种评估为我们提供了对自监督模型嵌入的新见解:这些模型优先关注与监督模型不同的特征。监督编码器通常在训练领域内提供更多的效用,而自监督编码器在远离训练领域时提供更多效用;然而,微调编码器展现出相反的趋势。聚类提供了一种与kNN等现有方法正交的、评估自监督学习表示效用的方式。此外,我们发现,在UMAP降维空间中测量的轮廓分数与聚类性能高度相关,因此可以用作在没有真实标签的数据上衡量聚类性能的代理。我们的代码实现可在 https://github.com/scottclowe/zs-ssl-clustering/ 找到。
更新时间: 2024-06-04 16:34:17
领域: cs.LG,cs.AI,cs.CV
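The label-free evaluation recipe the clustering entry ends with, silhouette score measured in a UMAP-reduced space as a proxy for clustering quality, takes only a few lines with standard tooling (assuming the umap-learn package; the random embeddings stand in for frozen encoder outputs):

```python
import numpy as np
import umap                                   # umap-learn package
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

Z = np.random.default_rng(0).random((300, 128))          # frozen-encoder embeddings (stand-in)
Z2 = umap.UMAP(n_components=2, random_state=0).fit_transform(Z)
labels = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(Z2)
print(silhouette_score(Z2, labels))           # no ground-truth labels needed
```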
Meta-Learners for Partially-Identified Treatment Effects Across Multiple Environments
Estimating the conditional average treatment effect (CATE) from observational data is relevant for many applications such as personalized medicine. Here, we focus on the widespread setting where the observational data come from multiple environments, such as different hospitals, physicians, or countries. Furthermore, we allow for violations of standard causal assumptions, namely, overlap within the environments and unconfoundedness. To this end, we move away from point identification and focus on partial identification. Specifically, we show that current assumptions from the literature on multiple environments allow us to interpret the environment as an instrumental variable (IV). This allows us to adapt bounds from the IV literature for partial identification of CATE by leveraging treatment assignment mechanisms across environments. Then, we propose different model-agnostic learners (so-called meta-learners) to estimate the bounds that can be used in combination with arbitrary machine learning models. We further demonstrate the effectiveness of our meta-learners across various experiments using both simulated and real-world data. Finally, we discuss the applicability of our meta-learners to partial identification in instrumental variable settings, such as randomized controlled trials with non-compliance.
Updated: 2024-06-04 16:31:43
标题: 多环境下部分识别治疗效果的元学习者
摘要: 从观测数据中估计条件平均处理效应(CATE)在许多应用中是相关的,例如个性化医学。在这里,我们聚焦于普遍情况,即观测数据来自多个环境,如不同医院、医生或国家。此外,我们允许违反标准因果假设,即环境内的重叠性与无混杂性。为此,我们不再追求点识别,而是聚焦于部分识别。具体来说,我们展示了文献中关于多环境的现有假设使我们能够将环境解释为工具变量(IV)。这让我们得以借助跨环境的处理分配机制,调整IV文献中的界,对CATE进行部分识别。然后,我们提出了不同的与模型无关的学习器(所谓的元学习器)来估计这些界,它们可以与任意机器学习模型结合使用。我们进一步通过使用模拟和真实数据的各种实验展示了我们的元学习器的有效性。最后,我们讨论了我们的元学习器在工具变量设置(例如存在不依从的随机对照试验)中用于部分识别的适用性。
更新时间: 2024-06-04 16:31:43
领域: cs.LG,cs.AI,stat.ML
Click Without Compromise: Online Advertising Measurement via Per User Differential Privacy
Online advertising is a cornerstone of the Internet ecosystem, with advertising measurement playing a crucial role in optimizing efficiency. Ad measurement entails attributing desired behaviors, such as purchases, to ad exposures across various platforms, necessitating the collection of user activities across these platforms. As this practice faces increasing restrictions due to rising privacy concerns, safeguarding user privacy in this context is imperative. Our work is the first to formulate the real-world challenge of advertising measurement systems with real-time reporting of streaming data in advertising campaigns. We introduce Ads-BPC, a novel user-level differential privacy protection scheme for advertising measurement results. This approach optimizes global noise power and results in a non-identically distributed noise distribution that preserves differential privacy while enhancing measurement accuracy. Through experiments on both real-world advertising campaigns and synthetic datasets, Ads-BPC achieves a 25% to 50% increase in accuracy over existing streaming DP mechanisms applied to advertising measurement. This highlights our method's effectiveness in achieving superior accuracy alongside a formal privacy guarantee, thereby advancing the state-of-the-art in privacy-preserving advertising measurement.
Updated: 2024-06-04 16:31:19
标题: 点击而无须妥协:通过用户级差分隐私进行在线广告测量
摘要: 在线广告是互联网生态系统的基石,广告测量在优化效率中起着关键作用。广告测量涉及将期望的行为(如购买)归因于跨各种平台的广告曝光,因此需要收集这些平台上的用户活动。随着隐私关注度的上升,这种做法面临越来越多的限制,因此在这种情况下保护用户隐私至关重要。我们的工作首次将广告活动中带有流数据实时报告的广告测量系统这一现实挑战形式化。我们提出了Ads-BPC,一种新颖的用户级差分隐私保护方案,用于广告测量结果。该方法优化全局噪声功率,得到一个非同分布的噪声分配,在保护差分隐私的同时提升测量准确性。通过对真实广告活动和合成数据集的实验,Ads-BPC相比应用于广告测量的现有流式DP机制,准确性提高了25%至50%。这突显了我们的方法在提供正式隐私保证的同时实现更高准确性的效果,从而推动了隐私保护广告测量技术的发展。
更新时间: 2024-06-04 16:31:19
领域: cs.CR
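Ads-BPC itself optimizes a non-identically distributed noise allocation, which is beyond a short sketch; the standard baseline it improves on, user-level differential privacy for a count obtained by capping each user's contribution and adding noise calibrated to that cap, looks like this:

```python
import numpy as np

def user_level_dp_count(events_per_user, cap, epsilon, rng=np.random.default_rng(0)):
    """Bound each user's contribution to `cap` (user-level sensitivity = cap),
    then release the bounded count with Laplace noise of scale cap / epsilon."""
    bounded = sum(min(n, cap) for n in events_per_user)
    return bounded + rng.laplace(scale=cap / epsilon)

conversions = [3, 1, 5, 2]                    # attributed conversions per user
print(user_level_dp_count(conversions, cap=2, epsilon=1.0))
```

The cap is the design knob: a small cap lowers the noise scale but biases the count downward for heavy users, which is exactly the accuracy trade-off such schemes tune.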
Learning Image Priors through Patch-based Diffusion Models for Solving Inverse Problems
Diffusion models can learn strong image priors from underlying data distribution and use them to solve inverse problems, but the training process is computationally expensive and requires lots of data. Such bottlenecks prevent most existing works from being feasible for high-dimensional and high-resolution data such as 3D images. This paper proposes a method to learn an efficient data prior for the entire image by training diffusion models only on patches of images. Specifically, we propose a patch-based position-aware diffusion inverse solver, called PaDIS, where we obtain the score function of the whole image through scores of patches and their positional encoding and utilize this as the prior for solving inverse problems. First of all, we show that this diffusion model achieves an improved memory efficiency and data efficiency while still maintaining the capability to generate entire images via positional encoding. Additionally, the proposed PaDIS model is highly flexible and can be plugged in with different diffusion inverse solvers (DIS). We demonstrate that the proposed PaDIS approach enables solving various inverse problems in both natural and medical image domains, including CT reconstruction, deblurring, and superresolution, given only patch-based priors. Notably, PaDIS outperforms previous DIS methods trained on entire image priors in the case of limited training data, demonstrating the data efficiency of our proposed approach by learning patch-based prior.
Updated: 2024-06-04 16:30:37
标题: 通过基于补丁的扩散模型学习图像先验以解决反问题
摘要: 扩散模型可以从基础数据分布中学习强大的图像先验,并利用它们来解决逆问题,但训练过程计算成本高昂且需要大量数据。这种瓶颈使大多数现有工作难以应用于高维、高分辨率数据(如3D图像)。本文提出了一种方法,通过仅在图像块上训练扩散模型来学习整个图像的高效数据先验。具体而言,我们提出了一种基于图像块的位置感知扩散逆问题求解器,称为PaDIS,通过图像块的得分及其位置编码获得整个图像的评分函数,并将其用作解决逆问题的先验。首先,我们证明了这种扩散模型实现了更高的内存效率和数据效率,同时仍保持通过位置编码生成整个图像的能力。此外,所提出的PaDIS模型非常灵活,可以与不同的扩散逆问题求解器(DIS)配合使用。我们证明了所提出的PaDIS方法仅凭基于图像块的先验,就能解决自然和医学图像领域中的各种逆问题,包括CT重建、去模糊和超分辨率。值得注意的是,在训练数据有限的情况下,PaDIS优于先前在整图先验上训练的DIS方法,通过学习基于图像块的先验展示了我们所提方法的数据效率。
更新时间: 2024-06-04 16:30:37
领域: cs.CV,cs.AI
CPsyCoun: A Report-based Multi-turn Dialogue Reconstruction and Evaluation Framework for Chinese Psychological Counseling
Using large language models (LLMs) to assist psychological counseling is a significant but challenging task at present. Attempts have been made on improving empathetic conversations or acting as effective assistants in the treatment with LLMs. However, the existing datasets lack consulting knowledge, resulting in LLMs lacking professional consulting competence. Moreover, how to automatically evaluate multi-turn dialogues within the counseling process remains an understudied area. To bridge the gap, we propose CPsyCoun, a report-based multi-turn dialogue reconstruction and evaluation framework for Chinese psychological counseling. To fully exploit psychological counseling reports, a two-phase approach is devised to construct high-quality dialogues while a comprehensive evaluation benchmark is developed for the effective automatic evaluation of multi-turn psychological consultations. Competitive experimental results demonstrate the effectiveness of our proposed framework in psychological counseling. We open-source the datasets and model for future research at https://github.com/CAS-SIAT-XinHai/CPsyCoun
Updated: 2024-06-04 16:25:25
标题: CPsyCoun:基于报告的中文心理咨询多轮对话重建和评估框架
摘要: 目前利用大型语言模型(LLMs)辅助心理咨询是一个重要但具有挑战性的任务。目前已经尝试改进具有同理心的对话或将LLMs作为治疗中的有效助手。然而,现有数据集缺乏咨询知识,导致LLMs缺乏专业咨询能力。此外,如何在咨询过程中自动评估多轮对话仍是一个未被充分研究的领域。为了弥补这一差距,我们提出了CPsyCoun,一个基于报告的中文心理咨询多轮对话重建和评估框架。为了充分利用心理咨询报告,我们设计了一个两阶段方法来构建高质量对话,同时开发了一个综合评估基准,用于有效自动评估多轮心理咨询。竞争性实验结果表明我们提出的框架在心理咨询中的有效性。我们开放源数据集和模型,供未来研究使用,网址为https://github.com/CAS-SIAT-XinHai/CPsyCoun。
更新时间: 2024-06-04 16:25:25
领域: cs.CL,cs.AI,cs.CY
Machine learning Hubbard parameters with equivariant neural networks
Density-functional theory with extended Hubbard functionals (DFT+$U$+$V$) provides a robust framework to accurately describe complex materials containing transition-metal or rare-earth elements. It does so by mitigating self-interaction errors inherent to semi-local functionals which are particularly pronounced in systems with partially-filled $d$ and $f$ electronic states. However, achieving accuracy in this approach hinges upon the accurate determination of the on-site $U$ and inter-site $V$ Hubbard parameters. In practice, these are obtained either by semi-empirical tuning, requiring prior knowledge, or, more correctly, by using predictive but expensive first-principles calculations. Here, we present a machine learning model based on equivariant neural networks which uses atomic occupation matrices as descriptors, directly capturing the electronic structure, local chemical environment, and oxidation states of the system at hand. We target here the prediction of Hubbard parameters computed self-consistently with iterative linear-response calculations, as implemented in density-functional perturbation theory (DFPT), and structural relaxations. Remarkably, when trained on data from 11 materials spanning various crystal structures and compositions, our model achieves mean absolute relative errors of 3% and 5% for Hubbard $U$ and $V$ parameters, respectively. By circumventing computationally expensive DFT or DFPT self-consistent protocols, our model significantly expedites the prediction of Hubbard parameters with negligible computational overhead, while approaching the accuracy of DFPT. Moreover, owing to its robust transferability, the model facilitates accelerated materials discovery and design via high-throughput calculations, with relevance for various technological applications.
Updated: 2024-06-04 16:21:24
标题: 使用等变神经网络学习哈伯德参数
摘要: 带扩展Hubbard泛函的密度泛函理论(DFT+$U$+$V$)提供了一个稳健的框架,可以准确描述含有过渡金属或稀土元素的复杂材料。它通过减轻半局域泛函中固有的自相互作用误差来实现这一点,这种误差在具有部分填充$d$和$f$电子态的系统中尤为明显。然而,这种方法的准确性取决于在位$U$和位点间$V$ Hubbard参数的准确确定。在实践中,这些参数要么通过需要先验知识的半经验调参获得,要么(更严格地)通过具有预测能力但代价高昂的第一性原理计算获得。在这里,我们提出了一种基于等变神经网络的机器学习模型,该模型使用原子占据矩阵作为描述符,直接捕捉所研究系统的电子结构、局部化学环境和氧化态。我们的目标是预测按照密度泛函微扰理论(DFPT)中实现的迭代线性响应计算自洽得到的Hubbard参数,以及结构弛豫。值得注意的是,当在涵盖多种晶体结构和组成的11种材料的数据上训练时,我们的模型对Hubbard $U$和$V$参数分别达到了3%和5%的平均绝对相对误差。通过绕开计算昂贵的DFT或DFPT自洽流程,我们的模型以可忽略的计算开销显著加快了Hubbard参数的预测,同时接近DFPT的准确性。此外,由于其强大的可迁移性,该模型可通过高通量计算加速材料的发现与设计,与多种技术应用相关。
更新时间: 2024-06-04 16:21:24
领域: cond-mat.mtrl-sci,cs.LG,physics.chem-ph
Offline Bayesian Aleatoric and Epistemic Uncertainty Quantification and Posterior Value Optimisation in Finite-State MDPs
We address the challenge of quantifying Bayesian uncertainty and incorporating it in offline use cases of finite-state Markov Decision Processes (MDPs) with unknown dynamics. Our approach provides a principled method to disentangle epistemic and aleatoric uncertainty, and a novel technique to find policies that optimise Bayesian posterior expected value without relying on strong assumptions about the MDP's posterior distribution. First, we utilise standard Bayesian reinforcement learning methods to capture the posterior uncertainty in MDP parameters based on available data. We then analytically compute the first two moments of the return distribution across posterior samples and apply the law of total variance to disentangle aleatoric and epistemic uncertainties. To find policies that maximise posterior expected value, we leverage the closed-form expression for value as a function of policy. This allows us to propose a stochastic gradient-based approach for solving the problem. We illustrate the uncertainty quantification and Bayesian posterior value optimisation performance of our agent in simple, interpretable gridworlds and validate it through ground-truth evaluations on synthetic MDPs. Finally, we highlight the real-world impact and computational scalability of our method by applying it to the AI Clinician problem, which recommends treatment for patients in intensive care units and has emerged as a key use case of finite-state MDPs with offline data. We discuss the challenges that arise with Bayesian modelling of larger scale MDPs while demonstrating the potential to apply our methods rooted in Bayesian decision theory into the real world. We make our code available at https://github.com/filippovaldettaro/finite-state-mdps .
Updated: 2024-06-04 16:21:14
标题: 有限状态MDP中的离线贝叶斯随机与认知不确定性量化及后验值优化
摘要: 我们解决了在未知动态的有限状态马尔可夫决策过程(MDPs)的离线使用案例中量化贝叶斯不确定性并将其纳入的挑战。我们的方法提供了一种原则性的方法来区分认知不确定性和随机不确定性,并提出了一种新颖的技术,可以在不依赖于MDP后验分布的强假设的情况下找到优化贝叶斯后验期望值的策略。首先,我们利用标准的贝叶斯强化学习方法来基于可用数据捕捉MDP参数的后验不确定性。然后,我们通过分析计算跨后验样本的回报分布的前两个矩,并应用总方差定律来区分随机和认知不确定性。为了找到最大化后验期望值的策略,我们利用价值作为策略函数的封闭形式表达式。这使我们能够提出一种基于随机梯度的方法来解决问题。我们通过在简单、可解释的网格世界中展示我们的代理的不确定性量化和贝叶斯后验价值优化性能,并通过合成MDP的基准评估来验证它。最后,我们通过将方法应用于AI临床医师问题来突出我们的方法在现实世界中的影响和计算可扩展性,该问题推荐重症监护病房患者的治疗,已成为具有离线数据的有限状态MDPs的关键用例。我们讨论了在贝叶斯建模更大规模MDPs时出现的挑战,同时展示了将我们基于贝叶斯决策理论的方法应用于现实世界的潜力。我们的代码可以在https://github.com/filippovaldettaro/finite-state-mdps 上找到。
更新时间: 2024-06-04 16:21:14
领域: cs.LG
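The aleatoric/epistemic split via the law of total variance described in the finite-state MDP entry is worth seeing numerically: sample posterior MDPs, estimate the return distribution under each, and decompose. The synthetic returns below replace actual posterior sampling and rollouts, which are the paper's own machinery.

```python
import numpy as np

rng = np.random.default_rng(0)
model_means = rng.normal(1.0, 0.5, size=(50, 1))          # E[return] per posterior MDP sample
returns = rng.normal(model_means, 0.3, size=(50, 2000))   # rollouts within each sample

aleatoric = returns.var(axis=1).mean()   # E[Var(G | MDP)]: irreducible randomness
epistemic = returns.mean(axis=1).var()   # Var(E[G | MDP]): uncertainty about the MDP
print(aleatoric + epistemic, returns.var())  # law of total variance: the two agree
```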
A Generalized Apprenticeship Learning Framework for Modeling Heterogeneous Student Pedagogical Strategies
A key challenge in e-learning environments like Intelligent Tutoring Systems (ITSs) is to induce effective pedagogical policies efficiently. While Deep Reinforcement Learning (DRL) often suffers from sample inefficiency and reward function design difficulty, Apprenticeship Learning (AL) algorithms can overcome them. However, most AL algorithms cannot handle heterogeneity as they assume all demonstrations are generated with a homogeneous policy driven by a single reward function. Meanwhile, the AL algorithms that do consider heterogeneity often cannot generalize to large continuous state spaces and only work with discrete states. In this paper, we propose EM-EDM, a general expectation-maximization (EM) based AL framework to induce effective pedagogical policies from given optimal or near-optimal demonstrations, which are assumed to be driven by heterogeneous reward functions. We compare the effectiveness of the policies induced by our proposed EM-EDM against four AL-based baselines and two policies induced by DRL on two different but related tasks that involve pedagogical action prediction. Our overall results showed that, for both tasks, EM-EDM outperforms the four AL baselines across all performance metrics and the two DRL baselines. This suggests that EM-EDM can effectively model complex student pedagogical decision-making processes through the ability to manage a large, continuous state space and adapt to handle diverse and heterogeneous reward functions with very few given demonstrations.
Updated: 2024-06-04 16:14:55
标题: 一个用于建模异质学生教学策略的广义学徒学习框架
摘要: 在智能辅导系统(ITSs)等电子学习环境中,一个关键挑战是高效地归纳出有效的教学策略。虽然深度强化学习(DRL)经常面临样本效率低和奖励函数设计困难的问题,但学徒学习(AL)算法可以克服这些问题。然而,大多数AL算法无法处理异质性,因为它们假定所有演示都是由单一奖励函数驱动的同质策略生成的。而那些考虑异质性的AL算法往往无法推广到大的连续状态空间,并且仅适用于离散状态。在本文中,我们提出了EM-EDM,一个基于期望最大化(EM)的通用AL框架,用于从给定的最优或接近最优的演示中归纳出有效的教学策略,这些演示被假定由异质的奖励函数驱动。我们在涉及教学行为预测的两个不同但相关的任务上,比较了我们提出的EM-EDM所归纳的策略与四种基于AL的基线以及两种由DRL归纳的策略的有效性。总体结果显示,在两个任务中,EM-EDM在所有性能指标上均优于四种AL基线和两种DRL基线。这表明,EM-EDM能够凭借管理大型连续状态空间的能力,并在只有很少演示的情况下适应多样且异质的奖励函数,有效地建模复杂的学生教学决策过程。
更新时间: 2024-06-04 16:14:55
领域: cs.LG,cs.AI
RepCNN: Micro-sized, Mighty Models for Wakeword Detection
Always-on machine learning models require a very low memory and compute footprint. Their restricted parameter count limits the model's capacity to learn, and the effectiveness of the usual training algorithms to find the best parameters. Here we show that a small convolutional model can be better trained by first refactoring its computation into a larger redundant multi-branched architecture. Then, for inference, we algebraically re-parameterize the trained model into the single-branched form with fewer parameters for a lower memory footprint and compute cost. Using this technique, we show that our always-on wake-word detector model, RepCNN, provides a good trade-off between latency and accuracy during inference. RepCNN re-parameterized models are 43% more accurate than a uni-branch convolutional model while having the same runtime. RepCNN also meets the accuracy of complex architectures like BC-ResNet, while having 2x lesser peak memory usage and 10x faster runtime.
Updated: 2024-06-04 16:14:19
标题: RepCNN:微型、强大的唤醒词检测模型
摘要: 始终开启的机器学习模型需要非常低的内存和计算占用。它们受限的参数数量限制了模型的学习能力,以及通常的训练算法找到最佳参数的效果。我们展示了一个小型卷积模型可以通过首先将其计算重构为一个更大的冗余多分支架构而得到更好的训练。然后,在推断阶段,我们对训练好的模型进行代数重参数化,转换为参数更少的单分支形式,以降低内存占用和计算成本。使用这种技术,我们展示了我们的始终开启的唤醒词检测器模型RepCNN在推断过程中提供了延迟和准确性之间的良好平衡。RepCNN重参数化模型比单分支卷积模型更准确43%,同时具有相同的运行时间。RepCNN还在具有2倍更低峰值内存使用量和10倍更快运行时间的情况下,达到了像BC-ResNet这样复杂架构的准确性。
更新时间: 2024-06-04 16:14:19
领域: eess.AS,cs.AI,cs.LG
Representations as Language: An Information-Theoretic Framework for Interpretability
Large scale neural models show impressive performance across a wide array of linguistic tasks. Despite this they remain, largely, black-boxes - inducing vector-representations of their input that prove difficult to interpret. This limits our ability to understand what they learn, and when they learn it, or describe what kinds of representations generalise well out of distribution. To address this we introduce a novel approach to interpretability that looks at the mapping a model learns from sentences to representations as a kind of language in its own right. In doing so we introduce a set of information-theoretic measures that quantify how structured a model's representations are with respect to its input, and when during training that structure arises. Our measures are fast to compute, grounded in linguistic theory, and can predict which models will generalise best based on their representations. We use these measures to describe two distinct phases of training a transformer: an initial phase of in-distribution learning which reduces task loss, then a second stage where representations become robust to noise. Generalisation performance begins to increase during this second phase, drawing a link between generalisation and robustness to noise. Finally we look at how model size affects the structure of the representational space, showing that larger models ultimately compress their representations more than their smaller counterparts.
Updated: 2024-06-04 16:14:00
标题: 表征作为语言:可解释性的信息论框架
摘要: 大规模神经模型在各种语言任务中展现出令人印象深刻的性能。尽管如此,它们仍然在很大程度上是黑匣子-产生难以解释的输入向量表示。这限制了我们理解它们学到了什么,以及何时学到,或者描述哪种表征能很好地泛化到分布之外。为了解决这个问题,我们引入了一种新颖的可解释性方法,将模型从句子到表示的映射视为一种语言。通过这样做,我们引入了一组信息理论量度,用于量化模型的表示相对于其输入的结构化程度,以及在训练过程中何时出现这种结构。我们的量度计算速度快,基于语言理论,可以根据其表示来预测哪些模型将在泛化性能上表现最佳。我们使用这些量度描述了transformer训练的两个不同阶段:一个在分布学习中降低任务损失的初始阶段,然后是表示开始对噪音具有鲁棒性的第二阶段。在这第二阶段,泛化性能开始增加,将泛化性和对噪音的鲁棒性联系起来。最后,我们研究了模型大小如何影响表示空间的结构,显示较大的模型最终比较小的模型更有效地压缩它们的表示。
更新时间: 2024-06-04 16:14:00
领域: cs.CL,cs.AI
Reducing Bias in Federated Class-Incremental Learning with Hierarchical Generative Prototypes
Federated Learning (FL) aims at unburdening the training of deep models by distributing computation across multiple devices (clients) while safeguarding data privacy. On top of that, Federated Continual Learning (FCL) also accounts for data distribution evolving over time, mirroring the dynamic nature of real-world environments. In this work, we shed light on the Incremental and Federated biases that naturally emerge in FCL. While the former is a known problem in Continual Learning, stemming from the prioritization of recently introduced classes, the latter (i.e., the bias towards local distributions) remains relatively unexplored. Our proposal constrains both biases in the last layer by efficiently fine-tuning a pre-trained backbone using learnable prompts, resulting in clients that produce less biased representations and more biased classifiers. Therefore, instead of solely relying on parameter aggregation, we also leverage generative prototypes to effectively balance the predictions of the global model. Our method improves on the current State Of The Art, providing an average increase of +7.9% in accuracy.
Updated: 2024-06-04 16:12:27
标题: 利用分层生成原型减少联邦类增量学习中的偏差
摘要: 联邦学习(FL)旨在通过在多个设备(客户端)之间分配计算来减轻深度模型的训练负担,同时确保数据隐私。除此之外,联邦持续学习(FCL)还考虑到数据分布随时间演变,反映了现实世界环境的动态性质。在这项工作中,我们揭示了在FCL中自然产生的增量偏差和联邦偏差。前者是持续学习中已知的问题,源于对新近引入类别的优先处理;而后者(即偏向本地分布的偏差)仍相对未被探索。我们的方案通过使用可学习提示高效微调预训练的骨干网络,在最后一层约束这两种偏差,从而使客户端产生偏差更小的表示和偏差更大的分类器。因此,我们不仅仅依靠参数聚合,还利用生成原型有效平衡全局模型的预测。我们的方法改进了当前技术水平,提供了平均+7.9%的准确率提升。
更新时间: 2024-06-04 16:12:27
领域: cs.LG
A Sentiment Consolidation Framework for Meta-Review Generation
Modern natural language generation systems with Large Language Models (LLMs) exhibit the capability to generate a plausible summary of multiple documents; however, it is uncertain if they truly possess the capability of information consolidation to generate summaries, especially on documents with opinionated information. We focus on meta-review generation, a form of sentiment summarisation for the scientific domain. To make scientific sentiment summarization more grounded, we hypothesize that human meta-reviewers follow a three-layer framework of sentiment consolidation to write meta-reviews. Based on the framework, we propose novel prompting methods for LLMs to generate meta-reviews and evaluation metrics to assess the quality of generated meta-reviews. Our framework is validated empirically as we find that prompting LLMs based on the framework -- compared with prompting them with simple instructions -- generates better meta-reviews.
Updated: 2024-06-04 16:10:13
标题: 一个用于元评审生成的情感整合框架
摘要: 基于大型语言模型(LLMs)的现代自然语言生成系统展现出为多篇文档生成貌似合理的摘要的能力;然而,它们是否真正具备通过整合信息来生成摘要的能力仍不确定,尤其是在包含观点性信息的文档上。我们关注元评审生成,这是科学领域的一种情感总结形式。为了使科学情感总结更有依据,我们假设人类元评审者遵循一种三层情感整合框架来撰写元评审。基于这一框架,我们为LLMs提出了新的提示方法来生成元评审,并提出了评估所生成元评审质量的度量标准。我们的框架得到了实证验证:与使用简单指令提示相比,基于该框架提示LLMs能生成更好的元评审。
更新时间: 2024-06-04 16:10:13
领域: cs.CL,cs.AI
Explainable Deep Learning Analysis for Raga Identification in Indian Art Music
The task of Raga Identification is a very popular research problem in Music Information Retrieval. The few studies that have explored this task employed various approaches, such as signal processing, Machine Learning (ML) methods, and more recently Deep Learning (DL) based methods. However, a key question remains unanswered in all of these works: do these ML/DL methods learn and interpret Ragas in a manner similar to human experts? Besides, a significant roadblock in this research is the unavailability of an ample supply of rich, labeled datasets, on which these ML/DL based methods depend. In this paper, we introduce "Prasarbharti Indian Music" version-1 (PIM-v1), a novel dataset comprising 191 hours of meticulously labeled Hindustani Classical Music (HCM) recordings, which is the largest labeled dataset for HCM recordings to the best of our knowledge. Our approach involves conducting ablation studies to find the benchmark classification model for Automatic Raga Identification (ARI) using PIM-v1 dataset. We achieve a chunk-wise f1-score of 0.89 for a subset of 12 Raga classes. Subsequently, we employ model explainability techniques to evaluate the classifier's predictions, aiming to ascertain whether they align with human understanding of Ragas or are driven by arbitrary patterns. We validate the correctness of model's predictions by comparing the explanations given by two ExAI models with human expert annotations. Following this, we analyze explanations for individual test examples to understand the role of regions highlighted by explanations in correct or incorrect predictions made by the model.
Updated: 2024-06-04 16:06:51
标题: 可解释的深度学习分析用于印度艺术音乐中的拉嘎识别
摘要: Raga识别任务是音乐信息检索中一个非常热门的研究问题。探索过这一任务的少数研究采用了各种方法,如信号处理、机器学习(ML)方法,以及最近基于深度学习(DL)的方法。然而,所有这些工作中仍有一个关键问题没有得到回答:这些ML/DL方法是否以类似人类专家的方式学习和理解Raga?此外,这项研究的一个重要障碍是缺乏充足的高质量标注数据集,而这正是这些基于ML/DL的方法所依赖的。在本文中,我们介绍了“Prasarbharti Indian Music”版本1(PIM-v1),这是一个包含191小时精心标注的北印度古典音乐(HCM)录音的新数据集,据我们所知,这是迄今为止最大的HCM录音标注数据集。我们的方法包括进行消融研究,以找到使用PIM-v1数据集进行自动Raga识别(ARI)的基准分类模型。我们在12个Raga类别的子集上取得了0.89的分块f1分数。随后,我们采用模型可解释性技术来评估分类器的预测,旨在确定它们是与人类对Raga的理解一致,还是受任意模式驱动。我们通过将两个ExAI模型给出的解释与人类专家标注进行比较,验证了模型预测的正确性。在此之后,我们分析单个测试样例的解释,以理解解释所突出的区域在模型正确或错误预测中的作用。
更新时间: 2024-06-04 16:06:51
领域: eess.AS,cs.AI
Fast Decision Boundary based Out-of-Distribution Detector
Efficient and effective Out-of-Distribution (OOD) detection is essential for the safe deployment of AI systems. Existing feature space methods, while effective, often incur significant computational overhead due to their reliance on auxiliary models built from training features. In this paper, we propose a computationally-efficient OOD detector without using auxiliary models while still leveraging the rich information embedded in the feature space. Specifically, we detect OOD samples based on their feature distances to decision boundaries. To minimize computational cost, we introduce an efficient closed-form estimation, analytically proven to tightly lower bound the distance. Based on our estimation, we discover that In-Distribution (ID) features tend to be further from decision boundaries than OOD features. Additionally, ID and OOD samples are better separated when compared at equal deviation levels from the mean of training features. By regularizing the distances to decision boundaries based on feature deviation from the mean, we develop a hyperparameter-free, auxiliary model-free OOD detector. Our method matches or surpasses the effectiveness of state-of-the-art methods in extensive experiments while incurring negligible overhead in inference latency. Overall, our approach significantly improves the efficiency-effectiveness trade-off in OOD detection. Code is available at: https://github.com/litianliu/fDBD-OOD.
Updated: 2024-06-04 16:01:27
标题: 基于快速决策边界的超出分布检测器
摘要: 高效且有效的分布外(OOD)检测对于AI系统的安全部署至关重要。现有的特征空间方法虽然有效,但往往由于依赖从训练特征构建的辅助模型而产生显著的计算开销。在本文中,我们提出了一种无需辅助模型、计算高效的OOD检测器,同时仍能利用特征空间中蕴含的丰富信息。具体来说,我们根据样本特征到决策边界的距离来检测OOD样本。为了最小化计算成本,我们引入了一种高效的闭式估计,并通过分析证明它是该距离的一个紧的下界。基于这一估计,我们发现分布内(ID)特征通常比OOD特征离决策边界更远。此外,在相对于训练特征均值相同的偏差水平下比较时,ID样本与OOD样本能被更好地区分。通过依据特征相对均值的偏差对到决策边界的距离进行正则化,我们开发了一种无超参数、无辅助模型的OOD检测器。我们的方法在大量实验中达到或超越了最先进方法的有效性,同时推断延迟的开销可以忽略不计。总的来说,我们的方法显著改善了OOD检测中效率与有效性的权衡。代码可在以下网址获取:https://github.com/litianliu/fDBD-OOD。
更新时间: 2024-06-04 16:01:27
领域: cs.LG,eess.IV
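For orientation, here is a minimal sketch of the boundary-distance score described in the abstract above, assuming a plain linear classification head; the function name, the brute-force loop, and the exact form of the regularization are our own illustration, not the authors' released implementation (see the linked repository for that).

```python
import numpy as np

def fdbd_score(z, W, b, train_mean):
    """Sketch of a decision-boundary-based OOD score (our assumptions).

    z          : (d,) penultimate-layer feature of a test sample
    W, b       : (C, d), (C,) weights/biases of the linear classification head
    train_mean : (d,) mean of the training features

    For a linear head, the distance from z to the boundary between the
    predicted class c and another class j is exactly
        |(W[c] - W[j]) @ z + (b[c] - b[j])| / ||W[c] - W[j]||,
    which needs no auxiliary model. Deviation from the training mean is
    used as a regularizer, per the abstract's high-level description.
    """
    logits = W @ z + b
    c = int(np.argmax(logits))
    dists = []
    for j in range(W.shape[0]):
        if j == c:
            continue
        margin = logits[c] - logits[j]          # (W[c]-W[j]) @ z + (b[c]-b[j])
        dists.append(abs(margin) / (np.linalg.norm(W[c] - W[j]) + 1e-12))
    # Larger score => more ID-like: far from boundaries at a given deviation.
    return np.mean(dists) / (np.linalg.norm(z - train_mean) + 1e-12)
```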
Efficient Fine-tuning of Audio Spectrogram Transformers via Soft Mixture of Adapters
Mixture of Experts (MoE) architectures have recently been burgeoning due to their ability to scale a model's capacity while keeping the computational cost affordable. Furthermore, they can be applied to both Transformers and State Space Models, the current state-of-the-art models in numerous fields. While MoE has mostly been investigated for the pre-training stage, its use in parameter-efficient transfer learning settings is under-explored. To narrow this gap, this paper attempts to demystify the use of MoE for parameter-efficient fine-tuning of Audio Spectrogram Transformers on audio and speech downstream tasks. Specifically, we propose the Soft Mixture of Adapters (Soft-MoA). It exploits adapters as the experts and, leveraging the recent Soft MoE method, relies on a soft assignment between the input tokens and experts to keep the computational time limited. Extensive experiments across 4 benchmarks demonstrate that Soft-MoA outperforms the single-adapter method and performs on par with its dense MoA counterpart. We finally present ablation studies on key elements of Soft-MoA, showing for example that Soft-MoA achieves better scaling with more experts and ensures that all experts contribute to the computation of the output tokens, thus dispensing with the expert imbalance issue.
Updated: 2024-06-04 15:53:28
标题: 通过软适配器混合实现音频频谱变换器的高效微调
摘要: 混合专家(MoE)架构最近开始蓬勃发展,因为它们能够在保持可承受计算成本的同时扩展模型容量。此外,它们既可应用于Transformer,也可应用于状态空间模型,这两者是当前众多领域中最先进的模型。虽然对MoE的研究主要集中在预训练阶段,但其在参数高效迁移学习设置中的应用尚未得到充分探讨。为弥补这一差距,本文试图厘清MoE在将音频频谱Transformer参数高效微调到音频和语音下游任务中的应用。具体而言,我们提出了软适配器混合(Soft-MoA)。它以适配器作为专家,并借助最近的Soft MoE方法,通过输入令牌与专家之间的软分配来控制计算时间。在4个基准上的大量实验表明,Soft-MoA优于单适配器方法,并与密集MoA相当。最后,我们对Soft-MoA的关键要素进行了消融研究,例如表明Soft-MoA在专家更多时扩展性更好,并确保所有专家都参与输出令牌的计算,从而消除了专家失衡问题。
更新时间: 2024-06-04 15:53:28
领域: eess.AS,cs.AI
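As a rough illustration of the soft token-to-expert assignment described above, here is a minimal PyTorch sketch in which each expert is a bottleneck adapter and, for brevity, each expert owns a single slot; class and parameter names are our own assumptions, not the paper's code.

```python
import torch
import torch.nn as nn

class SoftMoA(nn.Module):
    """Minimal sketch of a soft mixture of adapters (our reading of the
    abstract): adapters are the experts, tokens are softly dispatched to
    one slot per expert, and slot outputs are softly combined back."""

    def __init__(self, dim, num_experts=4, bottleneck=64):
        super().__init__()
        self.phi = nn.Parameter(torch.randn(dim, num_experts))  # slot params
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, bottleneck), nn.ReLU(),
                          nn.Linear(bottleneck, dim))
            for _ in range(num_experts))

    def forward(self, x):                       # x: (batch, tokens, dim)
        logits = x @ self.phi                   # (batch, tokens, experts)
        dispatch = logits.softmax(dim=1)        # how much each token feeds a slot
        combine = logits.softmax(dim=2)         # how much each slot feeds a token
        slots = torch.einsum('btd,bte->bed', x, dispatch)   # one slot per expert
        outs = torch.stack(
            [f(slots[:, e]) for e, f in enumerate(self.experts)], dim=1)
        return x + torch.einsum('bed,bte->btd', outs, combine)  # residual add
```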
How Smooth Is Attention?
Self-attention and masked self-attention are at the heart of Transformers' outstanding success. Still, our mathematical understanding of attention, in particular of its Lipschitz properties - which are key when it comes to analyzing robustness and expressive power - is incomplete. We provide a detailed study of the Lipschitz constant of self-attention in several practical scenarios, discussing the impact of the sequence length $n$ and layer normalization on the local Lipschitz constant of both unmasked and masked self-attention. In particular, we show that for inputs of length $n$ in any compact set, the Lipschitz constant of self-attention is bounded by $\sqrt{n}$ up to a constant factor and that this bound is tight for reasonable sequence lengths. When the sequence length $n$ is too large for the previous bound to be tight, which we refer to as the mean-field regime, we provide an upper bound and a matching lower bound which are independent of $n$. Our mean-field framework for masked self-attention is novel and of independent interest. Our experiments on pretrained and randomly initialized BERT and GPT-2 support our theoretical findings.
Updated: 2024-06-04 15:51:36
标题: 注意力有多平滑?
摘要: 自注意力和掩蔽自注意力是Transformer取得卓越成功的核心。然而,我们对注意力的数学理解,特别是其利普希茨性质(这在分析鲁棒性和表达能力时至关重要)仍不完整。我们对若干实际场景中自注意力的利普希茨常数进行了详细研究,讨论了序列长度$n$和层归一化对未掩蔽和掩蔽自注意力的局部利普希茨常数的影响。特别地,我们证明对于任意紧集中长度为$n$的输入,自注意力的利普希茨常数在相差一个常数因子的意义下以$\sqrt{n}$为界,并且该界对于合理的序列长度是紧的。当序列长度$n$大到上述界不再紧时(我们称之为平均场区域),我们给出了与$n$无关的上界和匹配的下界。我们针对掩蔽自注意力的平均场框架是新颖的,并具有独立的研究价值。我们在预训练和随机初始化的BERT与GPT-2上的实验支持了我们的理论发现。
更新时间: 2024-06-04 15:51:36
领域: cs.LG
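For reference, a LaTeX restatement of the objects the abstract refers to, in our own notation; the paper's precise constants and assumptions differ.

```latex
% Self-attention on a sequence X = (x_1, ..., x_n), our paraphrase:
\[
  \mathrm{Attn}(X)_i \;=\; \sum_{j=1}^{n}
  \operatorname{softmax}_j\!\left(\frac{\langle Q x_i, K x_j\rangle}{\sqrt{d}}\right) V x_j ,
\]
\[
  \operatorname{Lip}(\mathrm{Attn}) \;\lesssim\; \sqrt{n}
  \quad\text{on compact input sets (tight for reasonable } n\text{)};
\]
% in the mean-field regime (very large n), the paper instead gives matching
% upper and lower bounds that are independent of n.
```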
Coresets for Multiple $\ell_p$ Regression
A coreset of a dataset with $n$ examples and $d$ features is a weighted subset of examples that is sufficient for solving downstream data analytic tasks. Nearly optimal constructions of coresets for least squares and $\ell_p$ linear regression with a single response are known in prior work. However, for multiple $\ell_p$ regression where there can be $m$ responses, there are no known constructions with size sublinear in $m$. In this work, we construct coresets of size $\tilde O(\varepsilon^{-2}d)$ for $p<2$ and $\tilde O(\varepsilon^{-p}d^{p/2})$ for $p>2$ independently of $m$ (i.e., dimension-free) that approximate the multiple $\ell_p$ regression objective at every point in the domain up to $(1\pm\varepsilon)$ relative error. If we only need to preserve the minimizer subject to a subspace constraint, we improve these bounds by an $\varepsilon$ factor for all $p>1$. All of our bounds are nearly tight. We give two application of our results. First, we settle the number of uniform samples needed to approximate $\ell_p$ Euclidean power means up to a $(1+\varepsilon)$ factor, showing that $\tilde\Theta(\varepsilon^{-2})$ samples for $p = 1$, $\tilde\Theta(\varepsilon^{-1})$ samples for $1 < p < 2$, and $\tilde\Theta(\varepsilon^{1-p})$ samples for $p>2$ is tight, answering a question of Cohen-Addad, Saulpic, and Schwiegelshohn. Second, we show that for $1<p<2$, every matrix has a subset of $\tilde O(\varepsilon^{-1}k)$ rows which spans a $(1+\varepsilon)$-approximately optimal $k$-dimensional subspace for $\ell_p$ subspace approximation, which is also nearly optimal.
Updated: 2024-06-04 15:50:42
标题: 多个$\ell_p$回归问题的核心集合
摘要: 一个包含$n$个示例和$d$个特征的数据集的核心集,是一个加权的示例子集,足以用于解决下游数据分析任务。先前的工作已给出针对单响应的最小二乘和$\ell_p$线性回归的近乎最优的核心集构造。然而,对于可能有$m$个响应的多重$\ell_p$回归,此前尚无规模关于$m$为次线性的构造。在这项工作中,我们构造了大小为$\tilde O(\varepsilon^{-2}d)$(当$p<2$)和$\tilde O(\varepsilon^{-p}d^{p/2})$(当$p>2$)的核心集,其规模与$m$无关(即无维度依赖),并且在定义域中的每个点上以$(1\pm\varepsilon)$的相对误差逼近多重$\ell_p$回归目标。如果我们只需在子空间约束下保留最小化解,则对于所有$p>1$,这些界还可再改进一个$\varepsilon$因子。我们的所有界几乎都是紧的。 我们给出了结果的两个应用。首先,我们确定了将$\ell_p$欧几里得幂均值逼近到$(1+\varepsilon)$因子所需的均匀样本数:对$p = 1$需要$\tilde\Theta(\varepsilon^{-2})$个样本,对$1 < p < 2$需要$\tilde\Theta(\varepsilon^{-1})$个样本,对$p>2$需要$\tilde\Theta(\varepsilon^{1-p})$个样本,且这些界是紧的,从而回答了Cohen-Addad、Saulpic和Schwiegelshohn的一个问题。其次,我们证明对于$1 < p < 2$,每个矩阵都存在一个由$\tilde O(\varepsilon^{-1}k)$行构成的子集,其张成的$k$维子空间对$\ell_p$子空间逼近是$(1+\varepsilon)$近似最优的,这一结果也几乎是最优的。
更新时间: 2024-06-04 15:50:42
领域: cs.DS,cs.LG,stat.ML
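To fix notation, the multiple $\ell_p$ regression objective and the coreset guarantee described above can be restated as follows (our paraphrase, with $a_i^\top$ the rows of $A$).

```latex
% A in R^{n x d}, B in R^{n x m}; the entrywise \ell_p objective:
\[
  \min_{X \in \mathbb{R}^{d\times m}} \lVert AX - B\rVert_p^p ,
  \qquad
  \lVert AX - B\rVert_p^p = \sum_{i=1}^{n} \lVert a_i^\top X - B_{i,:}\rVert_p^p .
\]
% A coreset is a sparse weight vector w such that, for every X,
\[
  \sum_{i=1}^{n} w_i\,\lVert a_i^\top X - B_{i,:}\rVert_p^p
  \;=\; (1\pm\varepsilon)\,\lVert AX - B\rVert_p^p ,
\]
% with support size \tilde O(\varepsilon^{-2} d) for p < 2 and
% \tilde O(\varepsilon^{-p} d^{p/2}) for p > 2, independent of m.
```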
Reweighted Solutions for Weighted Low Rank Approximation
Weighted low rank approximation (WLRA) is an important yet computationally challenging primitive, with applications ranging from statistical analysis and model compression to signal processing. To cope with the NP-hardness of this problem, prior work considers heuristics, bicriteria, or fixed-parameter tractable algorithms. In this work, we introduce a new relaxed solution to WLRA which outputs a matrix that is not necessarily low rank, but can be stored using very few parameters and gives provable approximation guarantees when the weight matrix has low rank. Our central idea is to use the weight matrix itself to reweight a low rank solution, which gives an extremely simple algorithm with remarkable empirical performance in applications to model compression and on synthetic datasets. Our algorithm also gives nearly optimal communication complexity bounds for a natural distributed version of this problem, for which we show matching communication lower bounds. Together, our communication complexity bounds show that the rank of the weight matrix provably parameterizes the communication complexity of WLRA. We also obtain the first relative error guarantees for feature selection with a weighted objective.
Updated: 2024-06-04 15:50:35
标题: 重新加权的解决方案用于加权低秩逼近
摘要: 加权低秩逼近(WLRA)是一种重要但计算上具有挑战性的基本方法,其应用范围包括统计分析、模型压缩和信号处理。为了应对这个问题的NP难度,先前的研究考虑启发式方法、双目标、或固定参数可解算法来解决这个问题。在这项工作中,我们介绍了一种新的放松解决方案,该解决方案输出一个不一定是低秩的矩阵,但可以使用非常少的参数存储,并在权重矩阵具有低秩时给出可证明的逼近保证。我们的核心思想是使用权重矩阵本身来重新加权低秩解决方案,从而得到一个极其简单的算法,在模型压缩和合成数据集的应用中表现出显著的经验性能。我们的算法还为与该问题相关的一个自然分布式问题提供了几乎最优的通信复杂度界限,我们展示了匹配通信下界。综合来看,我们的通信复杂度界限表明,权重矩阵的秩可确定WLRA的通信复杂度。我们还获得了第一个带有加权目标的特征选择的相对误差保证。
更新时间: 2024-06-04 15:50:35
领域: cs.DS,cs.LG,stat.ML
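For orientation, the standard WLRA objective, together with our reading of the relaxed output described above ("reweight a low rank solution"); the exact construction is in the paper.

```latex
% Standard WLRA objective (our restatement), with \circ the Hadamard product:
\[
  \min_{\operatorname{rank}(\hat A)\,\le\, k}\;
  \bigl\lVert\, W \circ (A - \hat A) \,\bigr\rVert_F^2 .
\]
% Our reading of the relaxed output: a matrix of the form W \circ H with
% H low rank, hence few parameters to store, even though the product itself
% need not be low rank.
```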
Don't Fine-Tune, Decode: Syntax Error-Free Tool Use via Constrained Decoding
Instruction-tuned large language models (LLMs) excel at many tasks but often fail to use external tools due to complicated and unfamiliar syntax constraints. While extensive fine-tuning and prompting can mitigate the issue, these approaches are expensive and hard to generalize. Furthermore, because syntax constraints are only learned implicitly during fine-tuning, models still make frequent syntax errors. Motivated by the fact that these constraints can be better satisfied explicitly with constrained decoding, we propose TOOLDEC, a decoding algorithm using finite state machines to force LLMs to follow tool syntax. Our experiments show that TOOLDEC eliminates all syntax errors, achieving significantly better performance on various base models and benchmarks. More surprisingly, when applied to generalist out-of-the-box LLMs such as Mistral-Instruct, TOOLDEC improves its accuracy in tool use from the initial 0% to an impressive 52%, matching the performance of specialized fine-tuned models such as ToolLLM.
Updated: 2024-06-04 15:50:22
标题: 不要微调,解码:通过受限解码实现无语法错误的工具使用
摘要: 指令微调的大型语言模型(LLMs)在许多任务上表现出色,但由于复杂且陌生的语法约束,常常无法正确使用外部工具。虽然大量微调和提示可以缓解这一问题,但这些方法成本高且难以泛化。此外,由于语法约束在微调过程中只是被隐式学习,模型仍然经常产生语法错误。受约束解码能够显式地更好满足这些约束这一事实的启发,我们提出了TOOLDEC,一种利用有限状态机强制LLMs遵循工具语法的解码算法。我们的实验显示,TOOLDEC消除了所有语法错误,在各种基础模型和基准测试上取得了显著更好的性能。更令人惊讶的是,当应用于通用的开箱即用LLMs(如Mistral-Instruct)时,TOOLDEC将其工具使用的准确率从最初的0%提升到令人印象深刻的52%,与ToolLLM等经过专门微调的模型性能相当。
更新时间: 2024-06-04 15:50:22
领域: cs.CL,cs.AI
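Below is a toy rendering of finite-state constrained decoding in the spirit of TOOLDEC; the two-state grammar, all names, and the greedy token choice are hypothetical simplifications of ours, not the paper's machine.

```python
import torch

def constrained_step(logits, state, fsm):
    """One decoding step: mask every token that would leave the grammar.
    logits: (vocab,) next-token logits; fsm.allowed(state) returns the set
    of token ids that are legal transitions out of `state`."""
    mask = torch.full_like(logits, float('-inf'))
    mask[list(fsm.allowed(state))] = 0.0
    token = int(torch.argmax(logits + mask))   # greedy for simplicity
    return token, fsm.next_state(state, token)

class ToyFSM:
    """Hypothetical two-state machine: after 'CALL(' only tool-name tokens
    are legal; after a tool name only the closing parenthesis is legal."""
    def __init__(self, tool_ids, close_id):
        self.tool_ids, self.close_id = set(tool_ids), close_id
    def allowed(self, state):
        return self.tool_ids if state == 'expect_name' else {self.close_id}
    def next_state(self, state, token):
        return 'expect_close' if state == 'expect_name' else 'done'
```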
Harnessing Neural Unit Dynamics for Effective and Scalable Class-Incremental Learning
Class-incremental learning (CIL) aims to train a model to learn new classes from non-stationary data streams without forgetting old ones. In this paper, we propose a new kind of connectionist model by tailoring neural unit dynamics that adapt the behavior of neural networks for CIL. In each training session, it introduces a supervisory mechanism to guide network expansion whose growth size is compactly commensurate with the intrinsic complexity of a newly arriving task. This constructs a near-minimal network while allowing the model to expand its capacity when cannot sufficiently hold new classes. At inference time, it automatically reactivates the required neural units to retrieve knowledge and leaves the remaining inactivated to prevent interference. We name our model AutoActivator, which is effective and scalable. To gain insights into the neural unit dynamics, we theoretically analyze the model's convergence property via a universal approximation theorem on learning sequential mappings, which is under-explored in the CIL community. Experiments show that our method achieves strong CIL performance in rehearsal-free and minimal-expansion settings with different backbones.
Updated: 2024-06-04 15:47:03
标题: 利用神经元单元动态实现有效且可扩展的类增量学习
摘要: 类增量学习(CIL)旨在让模型从非平稳数据流中学习新类别,同时不遗忘旧类别。本文提出了一种新型的连接主义模型,通过定制神经单元动态来调整神经网络在CIL中的行为。在每次训练会话中,它引入一个监督机制来指导网络扩展,其增长规模与新到任务的内在复杂性紧凑匹配。这样既构建了一个接近最小的网络,又允许模型在无法充分容纳新类别时扩展其容量。在推断时,它自动重新激活所需的神经单元以检索知识,其余单元保持未激活以防止干扰。我们将该模型命名为AutoActivator,它兼具有效性和可扩展性。为了深入理解神经单元动态,我们借助一个关于学习序列映射的通用逼近定理,对模型的收敛性质进行了理论分析,这在CIL社区中尚未得到充分探讨。实验表明,我们的方法在免重放(rehearsal-free)和最小扩展设置下,配合不同骨干网络均取得了强大的CIL性能。
更新时间: 2024-06-04 15:47:03
领域: cs.LG
Contextual Optimization under Covariate Shift: A Robust Approach by Intersecting Wasserstein Balls
In contextual optimization, a decision-maker observes historical samples of uncertain variables and associated concurrent covariates, without knowing their joint distribution. Given an additional covariate observation, the goal is to choose a decision that minimizes some operational costs. A prevalent issue here is covariate shift, where the marginal distribution of the new covariate differs from historical samples, leading to decision performance variations with nonparametric or parametric estimators. To address this, we propose a distributionally robust approach that uses an ambiguity set by the intersection of two Wasserstein balls, each centered on typical nonparametric or parametric distribution estimators. Computationally, we establish the tractable reformulation of this distributionally robust optimization problem. Statistically, we provide guarantees for our Wasserstein ball intersection approach under covariate shift by analyzing the measure concentration of the estimators. Furthermore, to reduce computational complexity, we employ a surrogate objective that maintains similar generalization guarantees. Through synthetic and empirical case studies on income prediction and portfolio optimization, we demonstrate the strong empirical performance of our proposed models.
Updated: 2024-06-04 15:46:41
标题: 在协变量转移下的上下文优化:通过相交的Wasserstein球实现鲁棒性方法
摘要: 在情境优化中,决策者观察不确定变量及其相伴协变量的历史样本,但并不知道它们的联合分布。在给定一个额外协变量观测的情况下,目标是选择一个使某些运营成本最小化的决策。这里一个普遍的问题是协变量偏移,即新协变量的边际分布与历史样本不同,导致使用非参数或参数估计器时决策性能发生变化。为了解决这个问题,我们提出了一种分布鲁棒的方法,使用由两个Wasserstein球的交集构成的模糊集,两个球分别以典型的非参数和参数分布估计器为中心。在计算上,我们建立了该分布鲁棒优化问题的可解重构。在统计上,通过分析估计器的测度集中性质,我们为协变量偏移下的Wasserstein球交集方法提供了保证。此外,为了降低计算复杂度,我们采用了一个保持类似泛化保证的替代目标。通过关于收入预测和投资组合优化的合成与实证案例研究,我们展示了所提模型强大的实证性能。
更新时间: 2024-06-04 15:46:41
领域: math.OC,cs.LG
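Schematically, the robust problem described above can be written as follows; the radii, centers, and cost function below are our notation, not the paper's.

```latex
% \hat P_np and \hat P_par: nonparametric and parametric estimates of the
% conditional distribution of \xi given the new covariate observation.
\[
  \min_{x\in\mathcal X}\;
  \sup_{\,Q\,\in\,\mathcal B_{\varepsilon_1}(\hat P_{\mathrm{np}})\,\cap\,\mathcal B_{\varepsilon_2}(\hat P_{\mathrm{par}})}
  \mathbb{E}_{\xi\sim Q}\bigl[c(x,\xi)\bigr],
\]
% where \mathcal B_\varepsilon(P) is the Wasserstein ball of radius
% \varepsilon centered at P; intersecting the two balls hedges against
% either estimator being unreliable under covariate shift.
```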
Differentially Private Decentralized Learning with Random Walks
The popularity of federated learning comes from the possibility of better scalability and the ability for participants to keep control of their data, improving data security and sovereignty. Unfortunately, sharing model updates also creates a new privacy attack surface. In this work, we characterize the privacy guarantees of decentralized learning with random walk algorithms, where a model is updated by traveling from one node to another along the edges of a communication graph. Using a recent variant of differential privacy tailored to the study of decentralized algorithms, namely Pairwise Network Differential Privacy, we derive closed-form expressions for the privacy loss between each pair of nodes, where the impact of the communication topology is captured by graph-theoretic quantities. Our results further reveal that random walk algorithms tend to yield better privacy guarantees than gossip algorithms for nodes that are close to each other. We supplement our theoretical results with empirical evaluation on synthetic and real-world graphs and datasets.
Updated: 2024-06-04 15:46:22
标题: 使用随机游走的差分隐私分散式学习
摘要: 联邦学习的流行性源于更好的可扩展性可能性以及参与者保持对其数据控制的能力,从而提高数据安全性和主权性。不幸的是,共享模型更新也会产生新的隐私攻击面。在这项工作中,我们对使用随机游走算法的分散式学习的隐私保证进行了表征,其中模型通过沿着通信图的边从一个节点移动到另一个节点来进行更新。利用最近针对分散式算法研究定制的差分隐私的变体,即成对网络差分隐私,我们推导出每对节点之间的隐私损失的闭合形式表达式,其中通信拓扑的影响由图论量捕获。我们的结果进一步揭示了随机游走算法倾向于为彼此接近的节点提供比八卦算法更好的隐私保证。我们通过对合成和真实世界图形和数据集的实证评估来补充我们的理论结果。
更新时间: 2024-06-04 15:46:22
领域: cs.LG,cs.CR
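As a toy sketch of the setting (our illustration of a random-walk decentralized update with locally noised gradients, not the paper's algorithm or its privacy accounting): a single model "token" walks the communication graph, and each visited node applies one clipped, Gaussian-noised gradient step.

```python
import numpy as np

def random_walk_dp_sgd(theta, nodes, neighbors, grad_fn, steps,
                       lr=0.1, clip=1.0, sigma=1.0, rng=None):
    """nodes[v] holds node v's local data; neighbors[v] lists its graph
    neighbors; grad_fn(theta, data) returns a local gradient. All names
    are hypothetical placeholders for this sketch."""
    rng = rng or np.random.default_rng(0)
    v = int(rng.integers(len(nodes)))            # starting node
    for _ in range(steps):
        g = grad_fn(theta, nodes[v])             # local gradient
        g = g / max(1.0, np.linalg.norm(g) / clip)   # clip sensitivity
        g = g + rng.normal(0.0, sigma * clip, size=g.shape)  # Gaussian noise
        theta = theta - lr * g
        v = neighbors[v][int(rng.integers(len(neighbors[v])))]  # random edge
    return theta
```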
Contextual Dynamic Pricing: Algorithms, Optimality, and Local Differential Privacy Constraints
We study the contextual dynamic pricing problem where a firm sells products to $T$ sequentially arriving consumers that behave according to an unknown demand model. The firm aims to maximize its revenue, i.e. minimize its regret over a clairvoyant that knows the model in advance. The demand model is a generalized linear model (GLM), allowing for a stochastic feature vector in $\mathbb R^d$ that encodes product and consumer information. We first show that the optimal regret upper bound is of order $\sqrt{dT}$, up to a logarithmic factor, improving upon existing upper bounds in the literature by a $\sqrt{d}$ factor. This sharper rate is materialised by two algorithms: a confidence bound-type (supCB) algorithm and an explore-then-commit (ETC) algorithm. A key insight of our theoretical result is an intrinsic connection between dynamic pricing and the contextual multi-armed bandit problem with many arms based on a careful discretization. We further study contextual dynamic pricing under the local differential privacy (LDP) constraints. In particular, we propose a stochastic gradient descent based ETC algorithm that achieves an optimal regret upper bound of order $d\sqrt{T}/\epsilon$, up to a logarithmic factor, where $\epsilon>0$ is the privacy parameter. The regret upper bounds with and without LDP constraints are accompanied by newly constructed minimax lower bounds, which further characterize the cost of privacy. Extensive numerical experiments and a real data application on online lending are conducted to illustrate the efficiency and practical value of the proposed algorithms in dynamic pricing.
Updated: 2024-06-04 15:44:10
标题: 情境动态定价:算法、最优性和局部差分隐私约束
摘要: 我们研究了上下文动态定价问题,其中一家公司向$T$个顺序到达的消费者销售产品,这些消费者根据一个未知的需求模型行事。该公司旨在最大化其收入,即在一个事先知晓模型的预见者面前最小化后悔。需求模型是一个广义线性模型(GLM),允许在$\mathbb R^d$中具有随机特征向量,编码产品和消费者信息。我们首先表明,最优后悔上限的阶数为$\sqrt{dT}$,加上对数因子,相对于文献中现有的上限,提高了一个$\sqrt{d}$的因子。这种更尖锐的速率通过两种算法实现:置信度边界类型(supCB)算法和探索-然后-承诺(ETC)算法。我们理论结果的一个关键见解是动态定价与基于仔细离散化的具有许多臂的上下文多臂老虎机问题之间的内在联系。我们进一步研究了在局部差分隐私(LDP)约束下的上下文动态定价。特别地,我们提出了一个基于随机梯度下降的ETC算法,该算法实现了一个优化的后悔上限,阶数为$d\sqrt{T}/\epsilon$,加上对数因子,其中$\epsilon>0$是隐私参数。具有和没有LDP约束的后悔上限伴随着新构建的极小值下限,进一步刻画了隐私的成本。进行了广泛的数值实验和在线借贷的实际应用,以说明所提算法在动态定价中的效率和实际价值。
更新时间: 2024-06-04 15:44:10
领域: cs.LG,math.ST,stat.ME,stat.TH
ENOT: Expectile Regularization for Fast and Accurate Training of Neural Optimal Transport
We present a new approach to the Neural Optimal Transport (NOT) training procedure, capable of accurately and efficiently estimating the optimal transportation plan via specific regularization on the dual Kantorovich potentials. The main bottleneck of existing NOT solvers is the procedure of finding a near-exact approximation of the conjugate operator (i.e., the c-transform), which is done either by optimizing over non-convex max-min objectives or by computationally intensive fine-tuning of the initial approximated prediction. We resolve both issues by proposing a new, theoretically justified loss in the form of expectile regularization, which enforces binding conditions on the learning process of the dual potentials. Such a regularization provides an upper-bound estimate over the distribution of possible conjugate potentials and makes the learning stable, completely eliminating the need for additional extensive fine-tuning. The proposed method, called Expectile-Regularised Neural Optimal Transport (ENOT), outperforms previous state-of-the-art approaches on the established Wasserstein-2 benchmark tasks by a large margin (up to a 3-fold improvement in quality and up to a 10-fold improvement in runtime). Moreover, we showcase the performance of ENOT for varying cost functions on different tasks, such as image generation, demonstrating the robustness of the proposed algorithm.
Updated: 2024-06-04 15:41:11
标题: ENOT:期望值正则化用于神经最优输运的快速准确训练
摘要: 我们提出了一种新的神经最优传输(NOT)训练方法,能够通过对对偶Kantorovich势施加特定正则化,准确且高效地估计最优传输方案。现有NOT求解器的主要瓶颈在于寻找共轭算子(即c-变换)的近似精确解,这一过程要么通过优化非凸的最大-最小目标,要么通过对初始近似预测进行计算密集的微调来完成。我们通过提出一种新的、具有理论依据的expectile正则化形式的损失解决了这两个问题,该正则化对对偶势的学习过程施加约束条件。这种正则化给出了可能的共轭势分布的上界估计,并使学习过程稳定,完全消除了额外大量微调的需要。所提出的方法,称为Expectile-Regularised Neural Optimal Transport(ENOT),在公认的Wasserstein-2基准任务上大幅超越了先前的最先进方法(质量最高提升3倍,运行时间最高提升10倍)。此外,我们展示了ENOT在不同任务(如图像生成)上针对不同成本函数的性能,体现了所提算法的鲁棒性。
更新时间: 2024-06-04 15:41:11
领域: cs.LG,cs.AI
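The expectile loss that the regularizer builds on has a simple closed form; below is a generic PyTorch sketch. How the loss attaches to the dual potentials is specific to the paper and not reproduced here.

```python
import torch

def expectile_loss(u, tau=0.9):
    """Asymmetric squared loss |tau - 1[u < 0]| * u**2. With u defined as
    (target - estimate) and tau > 0.5, positive residuals are penalized
    more heavily, so the minimizing estimate tracks an upper expectile."""
    weight = torch.where(u < 0,
                         torch.full_like(u, 1.0 - tau),
                         torch.full_like(u, tau))
    return (weight * u.pow(2)).mean()
```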
Temporal Difference Learning with Compressed Updates: Error-Feedback meets Reinforcement Learning
In large-scale distributed machine learning, recent works have studied the effects of compressing gradients in stochastic optimization to alleviate the communication bottleneck. These works have collectively revealed that stochastic gradient descent (SGD) is robust to structured perturbations such as quantization, sparsification, and delays. Perhaps surprisingly, despite the surge of interest in multi-agent reinforcement learning, almost nothing is known about the analogous question: Are common reinforcement learning (RL) algorithms also robust to similar perturbations? We investigate this question by studying a variant of the classical temporal difference (TD) learning algorithm with a perturbed update direction, where a general compression operator is used to model the perturbation. Our work makes three important technical contributions. First, we prove that compressed TD algorithms, coupled with an error-feedback mechanism used widely in optimization, exhibit the same non-asymptotic theoretical guarantees as their SGD counterparts. Second, we show that our analysis framework extends seamlessly to nonlinear stochastic approximation schemes that subsume Q-learning. Third, we prove that for multi-agent TD learning, one can achieve linear convergence speedups with respect to the number of agents while communicating just $\tilde{O}(1)$ bits per iteration. Notably, these are the first finite-time results in RL that account for general compression operators and error-feedback in tandem with linear function approximation and Markovian sampling. Our proofs hinge on the construction of novel Lyapunov functions that capture the dynamics of a memory variable introduced by error-feedback.
Updated: 2024-06-04 15:40:42
标题: 压缩更新的时间差异学习:错误反馈遇见强化学习
摘要: 在大规模分布式机器学习中,近期工作研究了在随机优化中压缩梯度以缓解通信瓶颈的影响。这些工作共同表明,随机梯度下降(SGD)对量化、稀疏化和延迟等结构化扰动是稳健的。也许令人惊讶的是,尽管多智能体强化学习备受关注,对类似问题却几乎一无所知:常见的强化学习(RL)算法是否同样对此类扰动稳健?我们通过研究经典时间差分(TD)学习算法的一个更新方向受扰动的变体来探讨这一问题,其中使用一般压缩算子来刻画扰动。我们的工作有三个重要的技术贡献。首先,我们证明压缩TD算法结合优化中广泛使用的误差反馈机制,具有与其SGD对应算法相同的非渐近理论保证。其次,我们展示了我们的分析框架可无缝推广到涵盖Q-learning的非线性随机逼近方案。第三,我们证明对于多智能体TD学习,可以在智能体数量上实现线性的收敛加速,而每次迭代仅需通信$\tilde{O}(1)$比特。值得注意的是,这是RL领域首个在线性函数逼近和马尔可夫采样设定下,同时考虑一般压缩算子与误差反馈的有限时间结果。我们的证明依赖于构造新型Lyapunov函数,以刻画误差反馈引入的记忆变量的动态。
更新时间: 2024-06-04 15:40:42
领域: cs.LG,cs.AI,cs.SY,eess.SY,math.OC
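The error-feedback mechanism referenced above is standard and easy to state; here is a self-contained sketch with a top-k compressor (our generic illustration; the paper couples it with TD updates and Markovian sampling).

```python
import numpy as np

def top_k(x, k):
    """Keep the k largest-magnitude entries (one common compressor)."""
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]
    out[idx] = x[idx]
    return out

def error_feedback_step(theta, update, memory, k, lr=0.1):
    """One compressed step: compress update-plus-residual, transmit only the
    compressed part, and remember what was lost for the next iteration."""
    corrected = update + memory          # add back previously lost mass
    compressed = top_k(corrected, k)     # this is all that gets communicated
    memory = corrected - compressed      # residual carried forward
    return theta + lr * compressed, memory
```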
IterMask2: Iterative Unsupervised Anomaly Segmentation via Spatial and Frequency Masking for Brain Lesions in MRI
Unsupervised anomaly segmentation approaches to pathology segmentation train a model on images of healthy subjects, that they define as the 'normal' data distribution. At inference, they aim to segment any pathologies in new images as 'anomalies', as they exhibit patterns that deviate from those in 'normal' training data. Prevailing methods follow the 'corrupt-and-reconstruct' paradigm. They intentionally corrupt an input image, reconstruct it to follow the learned 'normal' distribution, and subsequently segment anomalies based on reconstruction error. Corrupting an input image, however, inevitably leads to suboptimal reconstruction even of normal regions, causing false positives. To alleviate this, we propose a novel iterative spatial mask-refining strategy IterMask2. We iteratively mask areas of the image, reconstruct them, and update the mask based on reconstruction error. This iterative process progressively adds information about areas that are confidently normal as per the model. The increasing content guides reconstruction of nearby masked areas, improving reconstruction of normal tissue under these areas, reducing false positives. We also use high-frequency image content as an auxiliary input to provide additional structural information for masked areas. This further improves reconstruction error of normal in comparison to anomalous areas, facilitating segmentation of the latter. We conduct experiments on several brain lesion datasets and demonstrate effectiveness of our method. Code is available at: https://github.com/ZiyunLiang/IterMasks2
Updated: 2024-06-04 15:39:49
标题: IterMask2:通过空间和频率掩蔽进行脑部病变MRI图像的迭代式无监督异常分割
摘要: 用于病理分割的无监督异常分割方法在健康受试者的图像上训练模型,并将这些图像定义为'正常'数据分布。在推断时,这些方法旨在将新图像中的任何病理分割为'异常',因为它们表现出偏离'正常'训练数据模式的特征。现有方法遵循'破坏-重建'范式:有意破坏输入图像,将其重建为符合学到的'正常'分布,随后基于重建误差分割异常。然而,破坏输入图像不可避免地导致即使正常区域也只能次优重建,从而产生假阳性。为缓解这一问题,我们提出了一种新颖的迭代空间掩码精化策略IterMask2。我们迭代地掩蔽图像区域、重建它们,并基于重建误差更新掩码。这一迭代过程逐步加入模型确信为正常的区域的信息;不断增加的内容引导邻近掩蔽区域的重建,改善这些区域下方正常组织的重建,减少假阳性。我们还使用高频图像内容作为辅助输入,为掩蔽区域提供额外的结构信息。这进一步拉大了正常区域与异常区域重建误差的差距,便于分割后者。我们在多个脑部病变数据集上进行实验,证明了方法的有效性。代码可在以下链接找到:https://github.com/ZiyunLiang/IterMasks2
更新时间: 2024-06-04 15:39:49
领域: eess.IV,cs.CV,cs.LG
RoutePlacer: An End-to-End Routability-Aware Placer with Graph Neural Network
Placement is a critical and challenging step of modern chip design, with routability being an essential indicator of placement quality. Current routability-oriented placers typically apply an iterative two-stage approach, wherein the first stage generates a placement solution, and the second stage provides non-differentiable routing results to heuristically improve the solution quality. This method hinders jointly optimizing the routability aspect during placement. To address this problem, this work introduces RoutePlacer, an end-to-end routability-aware placement method. It trains RouteGNN, a customized graph neural network, to efficiently and accurately predict routability by capturing and fusing geometric and topological representations of placements. Well-trained RouteGNN then serves as a differentiable approximation of routability, enabling end-to-end gradient-based routability optimization. In addition, RouteGNN can improve two-stage placers as a plug-and-play alternative to external routers. Our experiments on DREAMPlace, an open-source AI4EDA platform, show that RoutePlacer can reduce Total Overflow by up to 16% while maintaining routed wirelength, compared to the state-of-the-art; integrating RouteGNN within two-stage placers leads to a 44% reduction in Total Overflow without compromising wirelength.
Updated: 2024-06-04 15:39:41
标题: RoutePlacer: 一种具有图神经网络的端到端可布线感知布局器
摘要: 布局(placement)是现代芯片设计中关键且具有挑战性的一步,可布线性是布局质量的重要指标。目前面向可布线性的布局器通常采用迭代的两阶段方法:第一阶段生成布局解,第二阶段提供不可微的布线结果,以启发式方式改善解的质量。这种方法妨碍了在布局过程中对可布线性进行联合优化。为解决这一问题,本文提出了RoutePlacer,一种端到端的可布线性感知布局方法。它训练RouteGNN,一种定制的图神经网络,通过捕获并融合布局的几何与拓扑表示,高效且准确地预测可布线性。训练良好的RouteGNN随后可作为可布线性的可微近似,从而实现端到端的基于梯度的可布线性优化。此外,RouteGNN可作为外部布线器的即插即用替代品来改进两阶段布局器。我们在开源AI4EDA平台DREAMPlace上的实验表明,与最先进方法相比,RoutePlacer可在保持布线线长的同时将总溢出量(Total Overflow)最多减少16%;将RouteGNN集成到两阶段布局器中,可在不损失线长的情况下将总溢出量减少44%。
更新时间: 2024-06-04 15:39:41
领域: cs.LG,cs.AI,cs.NI
Representing Piecewise-Linear Functions by Functions with Minimal Arity
Any continuous piecewise-linear function $F\colon \mathbb{R}^{n}\to \mathbb{R}$ can be represented as a linear combination of $\max$ functions of at most $n+1$ affine-linear functions. In our previous paper [``Representing piecewise linear functions by functions with small arity'', AAECC, 2023], we showed that this upper bound of $n+1$ arguments is tight. In the present paper, we extend this result by establishing a correspondence between the function $F$ and the minimal number of arguments that are needed in any such decomposition. We show that the tessellation of the input space $\mathbb{R}^{n}$ induced by the function $F$ has a direct connection to the number of arguments in the $\max$ functions.
Updated: 2024-06-04 15:39:08
标题: 用具有最小元数的函数表示分段线性函数
摘要: 任何连续分段线性函数$F\colon \mathbb{R}^{n}\to \mathbb{R}$都可以表示为至多$n+1$个仿射线性函数的$\max$函数的线性组合。在我们之前的论文[``通过小元数函数表示分段线性函数'', AAECC, 2023]中,我们证明了这个$n+1$参数的上界是紧的。在本文中,我们通过建立函数$F$与任何这种分解所需的最小参数数量之间的对应关系来扩展这一结果。我们表明由函数$F$诱导的输入空间$\mathbb{R}^{n}$的镶嵌与$\max$函数中参数数量之间有直接联系。
更新时间: 2024-06-04 15:39:08
领域: cs.DM,cs.LG,cs.SC
Approximate Nearest Neighbor Search with Window Filters
We define and investigate the problem of $\textit{c-approximate window search}$: approximate nearest neighbor search where each point in the dataset has a numeric label, and the goal is to find nearest neighbors to queries within arbitrary label ranges. Many semantic search problems, such as image and document search with timestamp filters, or product search with cost filters, are natural examples of this problem. We propose and theoretically analyze a modular tree-based framework for transforming an index that solves the traditional c-approximate nearest neighbor problem into a data structure that solves window search. On standard nearest neighbor benchmark datasets equipped with random label values, adversarially constructed embeddings, and image search embeddings with real timestamps, we obtain up to a $75\times$ speedup over existing solutions at the same level of recall.
Updated: 2024-06-04 15:38:08
标题: 用窗口滤波器进行近似最近邻搜索
摘要: 我们定义并研究了$\textit{c-近似窗口搜索}$问题:近似最近邻搜索,其中数据集中的每个点都有一个数字标签,目标是在任意标签范围内找到查询的最近邻。许多语义搜索问题,如带有时间戳过滤器的图像和文档搜索,或带有成本过滤器的产品搜索,都是这个问题的自然例子。我们提出并理论上分析了一个基于模块化树的框架,用于将解决传统c-近似最近邻问题的索引转化为解决窗口搜索的数据结构。在配备随机标签值、对抗性构建的嵌入和带有实际时间戳的图像搜索嵌入的标准最近邻基准数据集上,我们在相同召回水平下获得了高达75倍的速度提升。
更新时间: 2024-06-04 15:38:08
领域: cs.DS,cs.IR,cs.LG
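Below is a toy rendering of the tree-based transformation described above (our sketch, not the paper's data structure): sort the points by label, attach an index to every node of a balanced tree over the sorted order, cover a query window with O(log n) canonical nodes, and merge their per-node results. Brute force stands in for the per-node ANN index.

```python
import numpy as np

class Node:
    def __init__(self, pts, lo, hi, leaf=32):
        self.lo, self.hi = lo, hi          # slice of the label-sorted array
        self.block = pts[lo:hi]            # stand-in for a prebuilt ANN index
        mid = (lo + hi) // 2
        self.kids = [] if hi - lo <= leaf else [Node(pts, lo, mid, leaf),
                                                Node(pts, mid, hi, leaf)]

    def cover(self, lo, hi):
        """Canonical nodes whose ranges tile the query range [lo, hi)."""
        if hi <= self.lo or self.hi <= lo:
            return []
        if (lo <= self.lo and self.hi <= hi) or not self.kids:
            return [self]
        return [n for c in self.kids for n in c.cover(lo, hi)]

def window_query(root, labels_sorted, q, lab_lo, lab_hi, k=5):
    lo = np.searchsorted(labels_sorted, lab_lo, side='left')
    hi = np.searchsorted(labels_sorted, lab_hi, side='right')
    cands = []
    for node in root.cover(lo, hi):
        d = np.linalg.norm(node.block - q, axis=1)   # brute force per node
        for i in np.argsort(d)[:k]:
            j = node.lo + int(i)
            if lo <= j < hi:                         # filter leaf overhang
                cands.append((float(d[i]), j))
    return sorted(cands)[:k]
```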
Structure-based Drug Design Benchmark: Do 3D Methods Really Dominate?
Currently, the field of structure-based drug design is dominated by three main types of algorithms: search-based algorithms, deep generative models, and reinforcement learning. While existing works have typically focused on comparing models within a single algorithmic category, cross-algorithm comparisons remain scarce. In this paper, to fill the gap, we establish a benchmark to evaluate the performance of sixteen models across these different algorithmic foundations by assessing the pharmaceutical properties of the generated molecules and their docking affinities with specified target proteins. We highlight the unique advantages of each algorithmic approach and offer recommendations for the design of future SBDD models. We emphasize that 1D/2D ligand-centric drug design methods can be used in SBDD by treating the docking function as a black-box oracle, which is typically neglected. The empirical results show that 1D/2D methods achieve competitive performance compared with 3D-based methods that use the 3D structure of the target protein explicitly. Also, AutoGrow4, a 2D molecular graph-based genetic algorithm, dominates SBDD in terms of optimization ability. The relevant code is available in https://github.com/zkysfls/2024-sbdd-benchmark.
Updated: 2024-06-04 15:37:14
标题: 基于结构的药物设计基准:3D方法真的占主导地位吗?
摘要: 目前,基于结构的药物设计领域主要由三种主要类型的算法主导:基于搜索的算法,深度生成模型和强化学习。尽管现有的研究通常集中在比较单一算法类别内的模型,但跨算法比较仍然很少见。在本文中,为了填补这一空白,我们建立了一个基准来评估十六种模型在这些不同算法基础上的性能,通过评估生成分子的药物性能和它们与指定目标蛋白的对接亲和力。我们强调了每种算法方法的独特优势,并为未来的SBDD模型设计提供了建议。我们强调,1D/2D配体中心的药物设计方法可以通过将对接功能视为黑箱预测器来用于SBDD,这通常被忽略。实证结果表明,与明确使用目标蛋白的3D结构的3D方法相比,1D/2D方法实现了竞争性能。此外,AutoGrow4,一种基于2D分子图的遗传算法,在优化能力方面主导了SBDD。相关代码可在https://github.com/zkysfls/2024-sbdd-benchmark。
更新时间: 2024-06-04 15:37:14
领域: cs.LG,cs.AI,q-bio.QM
By Fair Means or Foul: Quantifying Collusion in a Market Simulation with Deep Reinforcement Learning
In the rapidly evolving landscape of eCommerce, Artificial Intelligence (AI) based pricing algorithms, particularly those utilizing Reinforcement Learning (RL), are becoming increasingly prevalent. This rise has led to an inextricable pricing situation with the potential for market collusion. Our research employs an experimental oligopoly model of repeated price competition, systematically varying the environment to cover scenarios from basic economic theory to subjective consumer demand preferences. We also introduce a novel demand framework that enables the implementation of various demand models, allowing for a weighted blending of different models. In contrast to existing research in this domain, we aim to investigate the strategies and emerging pricing patterns developed by the agents, which may lead to a collusive outcome. Furthermore, we investigate a scenario where agents cannot observe their competitors' prices. Finally, we provide a comprehensive legal analysis across all scenarios. Our findings indicate that RL-based AI agents converge to a collusive state characterized by the charging of supracompetitive prices, without necessarily requiring inter-agent communication. Implementing alternative RL algorithms, altering the number of agents or simulation settings, and restricting the scope of the agents' observation space does not significantly impact the collusive market outcome behavior.
Updated: 2024-06-04 15:35:08
标题: 用公平手段或卑鄙手段:利用深度强化学习量化市场模拟中的勾结
摘要: 在电子商务快速发展的背景下,基于人工智能(AI)的定价算法,特别是利用强化学习(RL)的算法,正在变得越来越普遍。这种增长导致了一个不可分割的定价情况,可能导致市场勾结。我们的研究采用了一个实验性寡头垄断模型,重复定价竞争,系统地变化环境以涵盖从基本经济理论到主观消费者需求偏好的各种情景。我们还引入了一个新颖的需求框架,使各种需求模型的实施成为可能,允许对不同模型进行加权混合。与该领域现有研究相反,我们的目标是调查代理商制定的策略和新兴的定价模式,这可能导致勾结结果。此外,我们调查了一种情况,其中代理商无法观察到竞争对手的价格。最后,我们在所有情况下提供了全面的法律分析。我们的研究结果表明,基于RL的AI代理商会收敛到一个以收取超竞争价格为特征的勾结状态,而不一定需要代理商之间的交流。实施替代的RL算法,改变代理商数量或模拟设置,并限制代理商的观察空间的范围并不会显著影响勾结市场结果行为。
更新时间: 2024-06-04 15:35:08
领域: cs.LG,cs.AI
Bringing motion taxonomies to continuous domains via GPLVM on hyperbolic manifolds
Human motion taxonomies serve as high-level hierarchical abstractions that classify how humans move and interact with their environment. They have proven useful to analyse grasps, manipulation skills, and whole-body support poses. Despite substantial efforts devoted to design their hierarchy and underlying categories, their use remains limited. This may be attributed to the lack of computational models that fill the gap between the discrete hierarchical structure of the taxonomy and the high-dimensional heterogeneous data associated to its categories. To overcome this problem, we propose to model taxonomy data via hyperbolic embeddings that capture the associated hierarchical structure. We achieve this by formulating a novel Gaussian process hyperbolic latent variable model that incorporates the taxonomy structure through graph-based priors on the latent space and distance-preserving back constraints. We validate our model on three different human motion taxonomies to learn hyperbolic embeddings that faithfully preserve the original graph structure. We show that our model properly encodes unseen data from existing or new taxonomy categories, and outperforms its Euclidean and VAE-based counterparts. Finally, through proof-of-concept experiments, we show that our model may be used to generate realistic trajectories between the learned embeddings.
Updated: 2024-06-04 15:34:57
标题: 通过GPLVM在双曲流形上将运动分类引入连续域
摘要: 人类运动分类法是一种高层次的层级抽象,用于对人类如何移动以及如何与环境交互进行分类。它们已被证明对分析抓取、操纵技能和全身支撑姿态非常有用。尽管在设计其层级结构和基础类别方面投入了大量工作,它们的使用仍然有限。这可能归因于缺乏能够填补分类法的离散层级结构与其类别所关联的高维异质数据之间差距的计算模型。为了克服这一问题,我们提出通过能够刻画相应层级结构的双曲嵌入来建模分类法数据。为此,我们构建了一种新颖的高斯过程双曲潜变量模型,通过潜空间上基于图的先验和保距反向约束将分类法结构纳入模型。我们在三种不同的人类运动分类法上验证了我们的模型,学习到忠实保留原始图结构的双曲嵌入。我们展示了模型能够正确编码来自已有或新的分类法类别的未见数据,并优于其欧几里得和基于VAE的对应方法。最后,通过概念验证实验,我们展示了该模型可用于在学习到的嵌入之间生成真实的轨迹。
更新时间: 2024-06-04 15:34:57
领域: cs.RO,cs.LG
Nearly Minimax Optimal Regret for Multinomial Logistic Bandit
In this paper, we study the contextual multinomial logit (MNL) bandit problem in which a learning agent sequentially selects an assortment based on contextual information, and user feedback follows an MNL choice model. There has been a significant discrepancy between lower and upper regret bounds, particularly regarding the feature dimension $d$ and the maximum assortment size $K$. Additionally, the variation in reward structures between these bounds complicates the quest for optimality. Under uniform rewards, where all items have the same expected reward, we establish a regret lower bound of $\Omega(d\sqrt{\smash[b]{T/K}})$ and propose a constant-time algorithm, OFU-MNL+, that achieves a matching upper bound of $\tilde{O}(d\sqrt{\smash[b]{T/K}})$. Under non-uniform rewards, we prove a lower bound of $\Omega(d\sqrt{T})$ and an upper bound of $\tilde{O}(d\sqrt{T})$, also achievable by OFU-MNL+. Our empirical studies support these theoretical findings. To the best of our knowledge, this is the first work in the contextual MNL bandit literature to prove minimax optimality -- for either uniform or non-uniform reward setting -- and to propose a computationally efficient algorithm that achieves this optimality up to logarithmic factors.
Updated: 2024-06-04 15:34:55
标题: 多项式逻辑赌博机的近似极小化遗憾
摘要: 在本文中,我们研究上下文多项Logit(MNL)赌博机问题,其中学习代理根据上下文信息顺序选择一个商品组合,用户反馈遵循MNL选择模型。现有的遗憾下界与上界之间存在显著差距,尤其是在特征维度$d$和最大组合大小$K$方面。此外,这些界之间奖励结构的差异使得对最优性的探寻更加复杂。在均匀奖励设置下(所有物品的期望奖励相同),我们建立了$\Omega(d\sqrt{\smash[b]{T/K}})$的遗憾下界,并提出了一个常数时间算法OFU-MNL+,其达到匹配的上界$\tilde{O}(d\sqrt{\smash[b]{T/K}})$。在非均匀奖励设置下,我们证明了$\Omega(d\sqrt{T})$的下界和$\tilde{O}(d\sqrt{T})$的上界,后者同样可由OFU-MNL+达到。我们的实证研究支持这些理论发现。据我们所知,这是上下文MNL赌博机文献中首个证明极小极大最优性(无论在均匀还是非均匀奖励设置下),并提出在对数因子范围内达到该最优性的计算高效算法的工作。
更新时间: 2024-06-04 15:34:55
领域: stat.ML,cs.LG
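For reference, the MNL choice model underlying the feedback, in standard form and our notation: given an offered assortment $S$ with $|S| \le K$ and feature vectors $x_i \in \mathbb{R}^d$, the user picks item $i$ (or nothing) with probability

```latex
\[
  \mathbb{P}(i \mid S) =
  \frac{\exp(x_i^\top \theta^*)}{1 + \sum_{j\in S}\exp(x_j^\top \theta^*)},
  \qquad
  \mathbb{P}(\text{no purchase} \mid S) =
  \frac{1}{1 + \sum_{j\in S}\exp(x_j^\top \theta^*)} .
\]
```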
ExGRG: Explicitly-Generated Relation Graph for Self-Supervised Representation Learning
Self-supervised Learning (SSL) has emerged as a powerful technique in pre-training deep learning models without relying on expensive annotated labels, instead leveraging embedded signals in unlabeled data. While SSL has shown remarkable success in computer vision tasks through intuitive data augmentation, its application to graph-structured data poses challenges due to the semantic-altering and counter-intuitive nature of graph augmentations. Addressing this limitation, this paper introduces a novel non-contrastive SSL approach to Explicitly Generate a compositional Relation Graph (ExGRG) instead of relying solely on the conventional augmentation-based implicit relation graph. ExGRG offers a framework for incorporating prior domain knowledge and online extracted information into the SSL invariance objective, drawing inspiration from the Laplacian Eigenmap and Expectation-Maximization (EM). Employing an EM perspective on SSL, our E-step involves relation graph generation to identify candidates to guide the SSL invariance objective, and M-step updates the model parameters by integrating the derived relational information. Extensive experimentation on diverse node classification datasets demonstrates the superiority of our method over state-of-the-art techniques, affirming ExGRG as an effective adoption of SSL for graph representation learning.
Updated: 2024-06-04 15:30:15
标题: ExGRG:自监督表示学习的显式生成关系图
摘要: 自监督学习(SSL)已成为一种强大的技术,无需依赖昂贵的标注标签,而是利用未标注数据中蕴含的信号来预训练深度学习模型。虽然SSL通过直观的数据增强在计算机视觉任务中取得了显著成功,但由于图增强会改变语义且常常违反直觉,其在图结构数据上的应用面临挑战。为解决这一限制,本文提出了一种新颖的非对比SSL方法,显式生成组合关系图(ExGRG),而不是仅依赖传统的基于增强的隐式关系图。ExGRG提供了一个框架,可将先验领域知识和在线提取的信息整合进SSL不变性目标,其灵感来自拉普拉斯特征映射和期望最大化(EM)。从EM视角看待SSL,我们的E步涉及生成关系图,以识别用于指导SSL不变性目标的候选样本;M步则通过整合所得的关系信息来更新模型参数。在多种节点分类数据集上的大量实验表明,我们的方法优于最先进的技术,证实了ExGRG是SSL在图表示学习中的一种有效应用。
更新时间: 2024-06-04 15:30:15
领域: cs.LG,cs.AI
Semi-Supervised Learning guided by the Generalized Bayes Rule under Soft Revision
We provide a theoretical and computational investigation of the Gamma-Maximin method with soft revision, which was recently proposed as a robust criterion for pseudo-label selection (PLS) in semi-supervised learning. In contrast to traditional methods for PLS, we use credal sets of priors ("generalized Bayes") to represent the epistemic modeling uncertainty. These are then updated by the Gamma-Maximin method with soft revision. We eventually select pseudo-labeled data that are most likely in light of the least favorable distribution from the so-updated credal set. We formalize the task of finding optimal pseudo-labeled data w.r.t. the Gamma-Maximin method with soft revision as an optimization problem. A concrete implementation for the class of logistic models then allows us to compare the predictive power of the method with competing approaches. It is observed that the Gamma-Maximin method with soft revision can achieve very promising results, especially when the proportion of labeled data is low.
Updated: 2024-06-04 15:28:34
标题: 受软修订指导的广义贝叶斯规则下的半监督学习
摘要: 我们对带软修正的Gamma-Maximin方法进行了理论和计算研究,该方法最近被提出,作为半监督学习中伪标签选择(PLS)的稳健准则。与传统的PLS方法不同,我们使用先验的可信集(credal sets,'广义贝叶斯')来表示认知建模的不确定性。然后,这些可信集通过带软修正的Gamma-Maximin方法进行更新。最终,我们从如此更新后的可信集中,选取在最不利分布下最有可能的伪标注数据。我们将寻找在带软修正的Gamma-Maximin方法意义下最优伪标注数据的任务形式化为一个优化问题。随后针对逻辑斯蒂模型类的具体实现使我们能够比较该方法与竞争方法的预测能力。观察表明,带软修正的Gamma-Maximin方法可以取得非常有前景的结果,尤其是在标注数据比例较低时。
更新时间: 2024-06-04 15:28:34
领域: stat.ML,cs.AI,cs.LG,math.ST,stat.ME,stat.TH,62C12 62C10,I.2.6; G.3
A Multi-Perspective Analysis of Memorization in Large Language Models
Large Language Models (LLMs), trained on massive corpora with billions of parameters, show unprecedented performance in various fields. Though surprised by their excellent performance, researchers also noticed some special behaviors of those LLMs. One of those behaviors is memorization, in which LLMs can generate the same content used to train them. Though previous research has discussed memorization, the memorization of LLMs still lacks explanation, especially the cause of memorization and the dynamics of generating it. In this research, we comprehensively discussed memorization from various perspectives and extended the discussion scope to not only the memorized content but also less-memorized and unmemorized content. Through various studies, we found that: (1) Through experiments, we revealed the relation of memorization between model size, continuation size, and context size. Further, we showed how unmemorized sentences transition to memorized sentences. (2) Through embedding analysis, we showed the distribution and decoding dynamics across model size in embedding space for sentences with different memorization scores. (3) An analysis over n-gram statistics and entropy decoding dynamics discovered a boundary effect when the model starts to generate memorized sentences or unmemorized sentences. (4) We trained a Transformer model to predict the memorization of different models, showing that it is possible to predict memorization by context.
Updated: 2024-06-04 15:28:20
标题: 大型语言模型中记忆的多角度分析
摘要: 大型语言模型(LLMs)在庞大语料库上训练,拥有数十亿参数,在各个领域展现出前所未有的性能。尽管研究人员对其出色表现感到惊讶,但也注意到这些LLMs的一些特殊行为,其中之一便是记忆:LLMs可以生成与训练数据完全相同的内容。尽管先前的研究讨论过记忆,但LLMs的记忆仍然缺乏解释,尤其是记忆的成因及其生成过程的动态。在这项研究中,我们从多个角度全面讨论了记忆,并将讨论范围扩展到不仅包括被记忆的内容,还包括较少被记忆和未被记忆的内容。通过各种研究,我们发现:(1)通过实验,我们揭示了记忆与模型规模、续写长度和上下文长度之间的关系,并展示了未被记忆的句子如何转变为被记忆的句子。(2)通过嵌入分析,我们展示了具有不同记忆分数的句子在嵌入空间中随模型规模变化的分布和解码动态。(3)对n-gram统计和熵解码动态的分析发现,当模型开始生成被记忆或未被记忆的句子时存在边界效应。(4)我们训练了一个Transformer模型来预测不同模型的记忆情况,表明可以通过上下文预测记忆。
更新时间: 2024-06-04 15:28:20
领域: cs.CL,cs.AI
Improved Modelling of Federated Datasets using Mixtures-of-Dirichlet-Multinomials
In practice, training using federated learning can be orders of magnitude slower than standard centralized training. This severely limits the amount of experimentation and tuning that can be done, making it challenging to obtain good performance on a given task. Server-side proxy data can be used to run training simulations, for instance for hyperparameter tuning. This can greatly speed up the training pipeline by reducing the number of tuning runs to be performed overall on the true clients. However, it is challenging to ensure that these simulations accurately reflect the dynamics of the real federated training. In particular, the proxy data used for simulations often comes as a single centralized dataset without a partition into distinct clients, and partitioning this data in a naive way can lead to simulations that poorly reflect real federated training. In this paper we address the challenge of how to partition centralized data in a way that reflects the statistical heterogeneity of the true federated clients. We propose a fully federated, theoretically justified, algorithm that efficiently learns the distribution of the true clients and observe improved server-side simulations when using the inferred distribution to create simulated clients from the centralized data.
Updated: 2024-06-04 15:27:53
标题: 使用狄利克雷-多项式混合模型改进联合数据集建模
摘要: 在实践中,使用联邦学习进行训练可能比标准的集中式训练慢几个数量级。这严重限制了可进行的实验和调参数量,使得在给定任务上获得良好性能变得困难。服务器端代理数据可用于运行训练模拟,例如用于超参数调优;这可以减少需要在真实客户端上执行的调参轮数,从而大大加快训练流程。然而,确保这些模拟准确反映真实联邦训练的动态并非易事。特别是,用于模拟的代理数据通常是单个集中式数据集,没有划分为不同客户端;以朴素方式划分这些数据可能导致模拟难以反映真实的联邦训练。在本文中,我们解决了如何以反映真实联邦客户端统计异质性的方式划分集中式数据这一挑战。我们提出了一种完全联邦化、具有理论依据的算法,能够高效地学习真实客户端的分布;当使用推断出的分布从集中式数据构造模拟客户端时,我们观察到服务器端模拟得到了改进。
更新时间: 2024-06-04 15:27:53
领域: cs.LG,cs.DC
Fair Wasserstein Coresets
Data distillation and coresets have emerged as popular approaches to generate a smaller representative set of samples for downstream learning tasks to handle large-scale datasets. At the same time, machine learning is being increasingly applied to decision-making processes at a societal level, making it imperative for modelers to address inherent biases towards subgroups present in the data. While current approaches focus on creating fair synthetic representative samples by optimizing local properties relative to the original samples, their impact on downstream learning processes has yet to be explored. In this work, we present fair Wasserstein coresets (FWC), a novel coreset approach which generates fair synthetic representative samples along with sample-level weights to be used in downstream learning tasks. FWC uses an efficient majority minimization algorithm to minimize the Wasserstein distance between the original dataset and the weighted synthetic samples while enforcing demographic parity. We show that an unconstrained version of FWC is equivalent to Lloyd's algorithm for k-medians and k-means clustering. Experiments conducted on both synthetic and real datasets show that FWC: (i) achieves a competitive fairness-utility tradeoff in downstream models compared to existing approaches, (ii) improves downstream fairness when added to the existing training data and (iii) can be used to reduce biases in predictions from large language models (GPT-3.5 and GPT-4).
Updated: 2024-06-04 15:21:50
标题: 公平Wasserstein核心集
摘要: 数据蒸馏和核心集已成为生成较小代表性样本集、以便下游学习任务处理大规模数据集的流行方法。与此同时,机器学习正日益应用于社会层面的决策过程,这使得建模者必须处理数据中存在的针对子群体的固有偏见。尽管当前方法侧重于通过优化相对于原始样本的局部属性来创建公平的合成代表性样本,但它们对下游学习过程的影响尚未得到探讨。在这项工作中,我们提出了公平Wasserstein核心集(FWC),这是一种新颖的核心集方法,可生成公平的合成代表性样本以及用于下游学习任务的样本级权重。FWC使用一种高效的majority minimization算法,在强制满足人口统计均等(demographic parity)的同时,最小化原始数据集与加权合成样本之间的Wasserstein距离。我们证明FWC的无约束版本等价于k-中值和k-均值聚类的Lloyd算法。在合成和真实数据集上的实验表明,FWC:(i)与现有方法相比,在下游模型中实现了有竞争力的公平-效用权衡;(ii)加入现有训练数据时可改善下游公平性;(iii)可用于减少大型语言模型(GPT-3.5和GPT-4)预测中的偏见。
更新时间: 2024-06-04 15:21:50
领域: stat.ML,cs.CY,cs.LG
Identifying Equivalent Training Dynamics
Study of the nonlinear evolution deep neural network (DNN) parameters undergo during training has uncovered regimes of distinct dynamical behavior. While a detailed understanding of these phenomena has the potential to advance improvements in training efficiency and robustness, the lack of methods for identifying when DNN models have equivalent dynamics limits the insight that can be gained from prior work. Topological conjugacy, a notion from dynamical systems theory, provides a precise definition of dynamical equivalence, offering a possible route to address this need. However, topological conjugacies have historically been challenging to compute. By leveraging advances in Koopman operator theory, we develop a framework for identifying conjugate and non-conjugate training dynamics. To validate our approach, we demonstrate that it can correctly identify a known equivalence between online mirror descent and online gradient descent. We then utilize it to: identify non-conjugate training dynamics between shallow and wide fully connected neural networks; characterize the early phase of training dynamics in convolutional neural networks; uncover non-conjugate training dynamics in Transformers that do and do not undergo grokking. Our results, across a range of DNN architectures, illustrate the flexibility of our framework and highlight its potential for shedding new light on training dynamics.
Updated: 2024-06-04 15:20:15
标题: 识别等效训练动力学
摘要: 对深度神经网络(DNN)参数在训练过程中经历的非线性演化的研究,揭示了具有不同动力学行为的多个区域。尽管对这些现象的详细理解有望推动训练效率和稳健性的改进,但由于缺乏判定DNN模型何时具有等价动力学的方法,从已有工作中能获得的洞见受到限制。拓扑共轭是动力系统理论中的一个概念,为动力学等价提供了精确定义,为满足这一需求提供了可能途径。然而,拓扑共轭在历史上一直难以计算。借助Koopman算子理论的进展,我们开发了一个用于识别共轭与非共轭训练动力学的框架。为验证该方法,我们证明它能够正确识别在线镜像下降与在线梯度下降之间已知的等价性。随后,我们利用它:识别浅而宽的全连接神经网络之间的非共轭训练动力学;刻画卷积神经网络训练动力学的早期阶段;发现发生与未发生grokking的Transformer中的非共轭训练动力学。我们的结果横跨一系列DNN架构,展示了该框架的灵活性,并突显了它为训练动力学提供新见解的潜力。
更新时间: 2024-06-04 15:20:15
领域: cs.LG,cs.AI,math.DS
Alternative Methods to SHAP Derived from Properties of Kernels: A Note on Theoretical Analysis
This study first derives a general and analytical expression of AFA (Additive Feature Attribution) in terms of the kernel in LIME (Local Interpretable Model-agnostic Explanations). Then, we propose some new AFAs that have appropriate properties of kernels or that coincide with the LS prenucleolus in cooperative game theory. We also revisit existing AFAs such as SHAP (SHapley Additive exPlanations) and re-examine the properties of their kernels.
Updated: 2024-06-04 15:16:28
标题: 基于核函数特性的SHAP的替代方法:理论分析注解
摘要: 本研究首先基于LIME(Local Interpretable Model-agnostic Explanations)中的核,推导出AFA(Additive Feature Attribution)的一般解析表达式。然后,我们提出了一些新的AFA,它们具有合适的核性质,或与合作博弈论中的LS预核仁(LS prenucleolus)相一致。我们还重新审视了现有的AFA,如SHAP(SHapley Additive exPlanations),并重新检查了其核的性质。
更新时间: 2024-06-04 15:16:28
领域: cs.LG
Inhomogeneous graph trend filtering via a l2,0 cardinality penalty
We study the estimation of piecewise-smooth signals over a graph. We propose an $\ell_{2,0}$-norm penalized Graph Trend Filtering (GTF) model to estimate piecewise-smooth graph signals that exhibit inhomogeneous levels of smoothness across the nodes. We prove that the proposed GTF model is simultaneously a k-means clustering of the signal over the nodes and a minimum graph cut on the edges of the graph, where the clustering and the cut share the same assignment matrix. We propose two methods to solve the proposed GTF model: a spectral decomposition method and a method based on simulated annealing. In experiments on synthetic and real-world datasets, we show that the proposed GTF model achieves better performance than existing approaches on the tasks of denoising, support recovery, and semi-supervised classification. We also show that the proposed GTF model can be solved more efficiently than existing models on datasets with a large edge set.
Updated: 2024-06-04 15:13:12
标题: 通过L2,0基数惩罚的非均匀图趋势过滤
摘要: 我们研究了在图上估计分段平滑信号。我们提出了一个$\ell_{2,0}$-范数惩罚的图趋势滤波(GTF)模型,用于估计在节点上表现出不均匀平滑水平的分段平滑图信号。我们证明了所提出的GTF模型同时是信号在节点上的k均值聚类和图的最小割,其中聚类和割共享相同的分配矩阵。我们提出了两种方法来解决所提出的GTF模型:一种是基于谱分解的方法,另一种是基于模拟退火的方法。在合成和真实数据集上的实验中,我们展示了所提出的GTF模型在去噪、支持恢复和半监督分类任务上与现有方法相比具有更好的性能。我们还展示了所提出的GTF模型可以比现有模型更高效地解决具有大量边集的数据集。
更新时间: 2024-06-04 15:13:12
领域: cs.LG,cs.SI,stat.ML,65F50, 68U01, 68R01,G.1.6; G.1.10
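In our notation (scalar signals for simplicity; the vector case replaces the indicator with $\mathbf{1}[\lVert\beta_i-\beta_j\rVert_2>0]$), the penalized estimator described above reads:

```latex
\[
  \hat\beta \;=\; \arg\min_{\beta\in\mathbb{R}^{|V|}}
  \ \frac{1}{2}\lVert y-\beta\rVert_2^2
  \;+\; \lambda \sum_{(i,j)\in E} \mathbf{1}\bigl[\beta_i \neq \beta_j\bigr].
\]
% The penalty counts edges across which the estimate changes, which is what
% makes the minimizer simultaneously a clustering of the nodes and a cut of
% the edge set, as the abstract states.
```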
Chronosymbolic Learning: Efficient CHC Solving with Symbolic Reasoning and Inductive Learning
Solving Constrained Horn Clauses (CHCs) is a fundamental challenge behind a wide range of verification and analysis tasks. Data-driven approaches show great promise in improving CHC solving without the painstaking manual effort of creating and tuning various heuristics. However, a large performance gap exists between data-driven CHC solvers and symbolic reasoning-based solvers. In this work, we develop a simple but effective framework, "Chronosymbolic Learning", which unifies symbolic information and numerical data points to solve a CHC system efficiently. We also present a simple instance of Chronosymbolic Learning with a data-driven learner and a BMC-styled reasoner. Despite its relative simplicity, experimental results show the efficacy and robustness of our tool. It outperforms state-of-the-art CHC solvers on a dataset consisting of 288 benchmarks, including many instances with non-linear integer arithmetics.
Updated: 2024-06-04 15:11:50
标题: 时间符号学习:利用符号推理和归纳学习高效解决CHC问题
摘要: 解决约束Horn子句(CHCs)是一系列验证和分析任务背后的基本挑战。数据驱动方法在改进CHC求解方面显示出巨大潜力,而无需费力地手动创建和调整各种启发式。然而,数据驱动的CHC求解器与基于符号推理的求解器之间存在着较大的性能差距。在这项工作中,我们开发了一个简单但有效的框架,“Chronosymbolic Learning”,它将符号信息和数值数据点统一起来,以有效解决CHC系统。我们还提出了一个简单的Chronosymbolic Learning实例,其中包含一个数据驱动的学习器和一个类似BMC的推理器。尽管相对简单,实验结果显示了我们工具的有效性和稳健性。它在一个包含288个基准测试的数据集上表现优于最先进的CHC求解器,其中包括许多具有非线性整数算术的实例。
更新时间: 2024-06-04 15:11:50
领域: cs.LO,cs.AI,cs.PL
The Scandinavian Embedding Benchmarks: Comprehensive Assessment of Multilingual and Monolingual Text Embedding
The evaluation of English text embeddings has transitioned from evaluating a handful of datasets to broad coverage across many tasks through benchmarks such as MTEB. However, this is not the case for multilingual text embeddings due to a lack of available benchmarks. To address this problem, we introduce the Scandinavian Embedding Benchmark (SEB). SEB is a comprehensive framework that enables text embedding evaluation for Scandinavian languages across 24 tasks, 10 subtasks, and 4 task categories. Building on SEB, we evaluate more than 26 models, uncovering significant performance disparities between public and commercial solutions not previously captured by MTEB. We open-source SEB and integrate it with MTEB, thus bridging the text embedding evaluation gap for Scandinavian languages.
Updated: 2024-06-04 15:11:27
标题: 《斯堪的纳维亚嵌入基准:多语言和单语文本嵌入的全面评估》
摘要: 英文文本嵌入的评估已经从评估少数数据集转变为通过MTEB等基准对许多任务进行广泛覆盖。然而,由于缺乏可用的基准测试,多语言文本嵌入并非如此。为了解决这个问题,我们引入了北欧语言嵌入基准(SEB)。SEB是一个全面的框架,可以对北欧语言的文本嵌入进行评估,涵盖24个任务、10个子任务和4个任务类别。在SEB的基础上,我们评估了超过26个模型,发现了公共和商业解决方案之间的显著性能差异,这是先前未被MTEB捕获的。我们开源了SEB并将其与MTEB集成,从而弥合了北欧语言的文本嵌入评估差距。
更新时间: 2024-06-04 15:11:27
领域: cs.CL,cs.AI
Majority Vote for Distributed Differentially Private Sign Selection
Privacy-preserving data analysis has become more prevalent in recent years. In this study, we propose a distributed group differentially private Majority Vote mechanism, for the sign selection problem in a distributed setup. To achieve this, we apply the iterative peeling to the stability function and use the exponential mechanism to recover the signs. For enhanced applicability, we study the private sign selection for mean estimation and linear regression problems, in distributed systems. Our method recovers the support and signs with the optimal signal-to-noise ratio as in the non-private scenario, which is better than contemporary works of private variable selections. Moreover, the sign selection consistency is justified by theoretical guarantees. Simulation studies are conducted to demonstrate the effectiveness of the proposed method.
Updated: 2024-06-04 15:11:25
标题: 分布式差分隐私符号选择的多数投票
摘要: 隐私保护数据分析近年来变得日益普遍。在这项研究中,我们针对分布式环境中的符号选择问题,提出了一种分布式群体差分隐私多数投票机制。为实现这一目标,我们将迭代剥离(iterative peeling)应用于稳定性函数,并使用指数机制来恢复符号。为增强适用性,我们研究了分布式系统中均值估计和线性回归问题的私有符号选择。我们的方法以与非私有场景相同的最优信噪比恢复支持集和符号,优于同期的私有变量选择工作。此外,符号选择的一致性得到了理论保证。我们进行了模拟研究以展示所提方法的有效性。
更新时间: 2024-06-04 15:11:25
领域: cs.CR,cs.LG,stat.ME,stat.ML
GrootVL: Tree Topology is All You Need in State Space Model
The state space models, employing recursively propagated features, demonstrate strong representation capabilities comparable to Transformer models and superior efficiency. However, constrained by the inherent geometric constraints of sequences, it still falls short in modeling long-range dependencies. To address this issue, we propose the GrootVL network, which first dynamically generates a tree topology based on spatial relationships and input features. Then, feature propagation is performed based on this graph, thereby breaking the original sequence constraints to achieve stronger representation capabilities. Additionally, we introduce a linear complexity dynamic programming algorithm to enhance long-range interactions without increasing computational cost. GrootVL is a versatile multimodal framework that can be applied to both visual and textual tasks. Extensive experiments demonstrate that our method significantly outperforms existing structured state space models on image classification, object detection and segmentation. Besides, by fine-tuning large language models, our approach achieves consistent improvements in multiple textual tasks at minor training cost.
Updated: 2024-06-04 15:09:29
标题: GrootVL:树拓扑是状态空间模型中所需的一切
摘要: 状态空间模型采用递归传播特征,展示了与Transformer模型相当的强大表示能力和优越的效率。然而,受序列固有的几何约束的限制,它仍然在建模长距离依赖方面存在不足。为了解决这个问题,我们提出了GrootVL网络,首先根据空间关系和输入特征动态生成树拓扑结构。然后,基于该图进行特征传播,从而打破原始序列的约束,实现更强的表示能力。此外,我们引入了一个线性复杂度的动态规划算法,以增强长距离交互作用,而不增加计算成本。GrootVL是一个多功能的多模态框架,可应用于视觉和文本任务。大量实验证明,我们的方法在图像分类、对象检测和分割等任务上明显优于现有的结构化状态空间模型。此外,通过对大型语言模型进行微调,我们的方法在多个文本任务中实现了一致的改进,而训练成本较低。
更新时间: 2024-06-04 15:09:29
领域: cs.LG,cs.CV
Multiple Choice Questions and Large Languages Models: A Case Study with Fictional Medical Data
Large Language Models (LLMs) like ChatGPT demonstrate significant potential in the medical field, often evaluated using multiple-choice questions (MCQs) similar to those found on the USMLE. Despite their prevalence in medical education, MCQs have limitations that might be exacerbated when assessing LLMs. To evaluate the effectiveness of MCQs in assessing the performance of LLMs, we developed a fictional medical benchmark focused on a non-existent gland, the Glianorex. This approach allowed us to isolate the knowledge of the LLM from its test-taking abilities. We used GPT-4 to generate a comprehensive textbook on the Glianorex in both English and French and developed corresponding multiple-choice questions in both languages. We evaluated various open-source, proprietary, and domain-specific LLMs using these questions in a zero-shot setting. The models achieved average scores around 67%, with minor performance differences between larger and smaller models. Performance was slightly higher in English than in French. Fine-tuned medical models showed some improvement over their base versions in English but not in French. The uniformly high performance across models suggests that traditional MCQ-based benchmarks may not accurately measure LLMs' clinical knowledge and reasoning abilities, instead highlighting their pattern recognition skills. This study underscores the need for more robust evaluation methods to better assess the true capabilities of LLMs in medical contexts.
Updated: 2024-06-04 15:08:56
标题: 多项选择题和大型语言模型:虚构医疗数据案例研究
摘要: 大型语言模型(LLMs)如ChatGPT在医学领域展现出巨大潜力,常常使用类似于美国医师执照考试(USMLE)中的多项选择题(MCQs)进行评估。尽管MCQs在医学教育中很常见,但在评估LLMs时可能存在一些限制。为了评估MCQs在评估LLMs性能方面的有效性,我们开发了一个关于一个不存在的腺体——Glianorex的虚构医学基准。这种方法使我们能够将LLMs的知识与其应试能力分离开来。我们使用GPT-4生成了一本关于Glianorex的综合教科书,分别用英语和法语编写,并开发了对应的多项选择题。我们在零样本设置中使用这些问题评估了各种开源、专有和领域特定的LLMs。这些模型的平均得分约为67%,较大模型和较小模型之间的性能差异很小。英语的表现略高于法语。在英语中,微调的医学模型在基础版本上有所改进,但在法语中没有。跨模型的统一高性能表明,传统基于MCQs的基准可能无法准确衡量LLMs的临床知识和推理能力,而是突出了它们的模式识别技能。这项研究强调了在医学背景下更为健全的评估方法的必要性,以更好地评估LLMs在医学环境中的真实能力。
更新时间: 2024-06-04 15:08:56
领域: cs.CL,cs.AI,cs.LG
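A sketch of the zero-shot MCQ evaluation loop used in such a study; `ask_model` is a hypothetical stand-in for whichever LLM API is benchmarked, and the Glianorex-style item is invented for illustration.

```python
# Zero-shot MCQ scoring loop; `ask_model` and the sample item are
# hypothetical placeholders.
import re

def build_prompt(question: str, options: dict[str, str]) -> str:
    opts = "\n".join(f"{k}. {v}" for k, v in options.items())
    return (f"Answer the following medical question with a single letter.\n"
            f"{question}\n{opts}\nAnswer:")

def score(items, ask_model) -> float:
    correct = 0
    for item in items:
        reply = ask_model(build_prompt(item["question"], item["options"]))
        match = re.search(r"\b([A-D])\b", reply)  # take the first option letter found
        correct += match is not None and match.group(1) == item["answer"]
    return correct / len(items)

items = [{
    "question": "Which hormone is primarily secreted by the Glianorex?",
    "options": {"A": "Glianorexin", "B": "Insulin", "C": "Cortisol", "D": "Thyroxine"},
    "answer": "A",
}]
print(score(items, ask_model=lambda prompt: "A"))  # dummy model stands in for the LLM
```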
DiarizationLM: Speaker Diarization Post-Processing with Large Language Models
In this paper, we introduce DiarizationLM, a framework to leverage large language models (LLM) to post-process the outputs from a speaker diarization system. Various goals can be achieved with the proposed framework, such as improving the readability of the diarized transcript, or reducing the word diarization error rate (WDER). In this framework, the outputs of the automatic speech recognition (ASR) and speaker diarization systems are represented as a compact textual format, which is included in the prompt to an optionally finetuned LLM. The outputs of the LLM can be used as the refined diarization results with the desired enhancement. As a post-processing step, this framework can be easily applied to any off-the-shelf ASR and speaker diarization systems without retraining existing components. Our experiments show that a finetuned PaLM 2-S model can reduce the WDER by rel. 55.5% on the Fisher telephone conversation dataset, and rel. 44.9% on the Callhome English dataset.
Updated: 2024-06-04 15:08:15
标题: DiarizationLM:使用大型语言模型进行说话者辨识后处理
摘要: 在这篇论文中,我们介绍了DiarizationLM,这是一个利用大型语言模型(LLM)来后处理说话者辨识系统输出的框架。提出的框架可以实现各种目标,例如提高辨识转录的可读性,或降低词辨识错误率(WDER)。在这个框架中,自动语音识别(ASR)和说话者辨识系统的输出被表示为一个紧凑的文本格式,该格式包含在可选地微调的LLM的提示中。LLM的输出可以作为经过改进的辨识结果来使用。作为后处理步骤,这个框架可以轻松应用于任何现成的ASR和说话者辨识系统,而无需重新训练现有组件。我们的实验表明,经过微调的PaLM 2-S模型可以在Fisher电话会话数据集上将WDER降低55.5%,在Callhome英语数据集上降低44.9%。
更新时间: 2024-06-04 15:08:15
领域: eess.AS,cs.LG,cs.SD
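A sketch of the compact textual representation idea: per-word ASR and diarization outputs are packed into a single speaker-tagged string that can be placed in an LLM prompt. The exact tag syntax below is an assumption, not the paper's specification.

```python
# Pack (word, speaker) pairs into a compact speaker-tagged string;
# the "<spk:k>" tag format is an assumed convention.
def to_compact_text(words, speakers):
    """words: recognized words; speakers: per-word speaker ids."""
    parts, current = [], None
    for word, spk in zip(words, speakers):
        if spk != current:               # open a new speaker turn
            parts.append(f"<spk:{spk}>")
            current = spk
        parts.append(word)
    return " ".join(parts)

words = ["good", "morning", "hi", "how", "are", "you"]
speakers = [1, 1, 2, 2, 2, 2]
compact = to_compact_text(words, speakers)
print(compact)  # <spk:1> good morning <spk:2> hi how are you

# The LLM would then be prompted to emit a corrected version of this string.
prompt = compact + " --> "
```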
PASOA- PArticle baSed Bayesian Optimal Adaptive design
We propose a new procedure named PASOA for Bayesian experimental design, which performs sequential design optimization while simultaneously providing accurate estimates of successive posterior distributions for parameter inference. The sequential design process is carried out via a contrastive estimation principle, using stochastic optimization and Sequential Monte Carlo (SMC) samplers to maximise the Expected Information Gain (EIG). Since larger information gains are obtained for larger distances between successive posterior distributions, this EIG objective may worsen classical SMC performance. To handle this issue, we propose tempering to obtain both a large information gain and accurate SMC sampling, which we show is crucial for performance. This novel combination of stochastic optimization and tempered SMC allows us to jointly handle design optimization and parameter inference. We prove that the obtained optimal design estimators benefit from a consistency property. Numerical experiments confirm the potential of the approach, which outperforms other recent existing procedures.
Updated: 2024-06-04 15:01:50
标题: PASOA-基于粒子的贝叶斯最优自适应设计
摘要: 我们提出了一个名为PASOA的新程序,用于贝叶斯实验设计,通过同时提供参数推断的连续后验分布的准确估计,执行顺序设计优化。顺序设计过程通过对比估计原则进行,使用随机优化和顺序蒙特卡洛(SMC)取样器来最大化预期信息增益(EIG)。由于在连续后验分布之间的距离越大,获得的信息增益就越大,这一EIG目标可能会降低经典SMC的性能。为了解决这个问题,提出了调温方法,旨在既获得大的信息增益,又获得准确的SMC取样,我们证明这对性能至关重要。这种随机优化和调温SMC的新颖组合允许同时处理设计优化和参数推断。我们提供了一种证明,表明获得的最优设计估计值具有一些一致性属性。数值实验证实了该方法的潜力,优于其他最近存在的程序。
更新时间: 2024-06-04 15:01:50
领域: stat.ML,cs.LG,stat.CO,stat.ME
Learning to Edit Visual Programs with Self-Supervision
We design a system that learns how to edit visual programs. Our edit network consumes a complete input program and a visual target. From this input, we task our network with predicting a local edit operation that could be applied to the input program to improve its similarity to the target. In order to apply this scheme for domains that lack program annotations, we develop a self-supervised learning approach that integrates this edit network into a bootstrapped finetuning loop along with a network that predicts entire programs in one-shot. Our joint finetuning scheme, when coupled with an inference procedure that initializes a population from the one-shot model and evolves members of this population with the edit network, helps to infer more accurate visual programs. Over multiple domains, we experimentally compare our method against the alternative of using only the one-shot model, and find that even under equal search-time budgets, our editing-based paradigm provides significant advantages.
Updated: 2024-06-04 14:59:38
标题: 学习使用自监督来编辑视觉程序
摘要: 我们设计了一个系统,该系统学习如何编辑视觉程序。我们的编辑网络接收完整的输入程序和一个视觉目标。从这个输入中,我们要求网络预测一个局部编辑操作,该操作可应用于输入程序以改善其与目标的相似度。为了将这个方案应用于缺乏程序注释的领域,我们开发了一种自监督学习方法,将这个编辑网络整合到一个自举微调循环中,该循环还包括一个可以一次性预测整个程序的网络。我们的联合微调方案,当与初始化一个来自一次性模型的种群并用编辑网络演化这个种群的成员的推理过程相结合时,有助于推断出更准确的视觉程序。在多个领域中,我们通过实验将我们的方法与仅使用一次性模型的替代方法进行了比较,并发现即使在相等的搜索时间预算下,我们基于编辑的范式也提供了显著的优势。
更新时间: 2024-06-04 14:59:38
领域: cs.CV,cs.AI,cs.GR,cs.LG
Kirigami: large convolutional kernels improve deep learning-based RNA secondary structure prediction
We introduce a novel fully convolutional neural network (FCN) architecture for predicting the secondary structure of ribonucleic acid (RNA) molecules. Interpreting RNA structures as weighted graphs, we employ deep learning to estimate the probability of base pairing between nucleotide residues. Unique to our model are its massive 11-pixel kernels, which we argue provide a distinct advantage for FCNs on the specialized domain of RNA secondary structures. On a widely adopted, standardized test set comprised of 1,305 molecules, the accuracy of our method exceeds that of current state-of-the-art (SOTA) secondary structure prediction software, achieving a Matthews Correlation Coefficient (MCC) 11-40% higher than that of other leading methods on overall structures and 58-400% higher on pseudoknots specifically.
Updated: 2024-06-04 14:58:10
标题: 剪纸艺术:大型卷积核改进基于深度学习的RNA二级结构预测
摘要: 我们介绍了一种新颖的全卷积神经网络(FCN)架构,用于预测核糖核酸(RNA)分子的二级结构。将RNA结构解释为加权图形,我们利用深度学习来估计核苷酸残基之间的碱基配对概率。我们模型的独特之处在于其庞大的11像素卷积核,我们认为这为FCNs在RNA二级结构的专业领域提供了明显优势。在一个广泛采用的、包含1,305个分子的标准化测试集上,我们方法的准确性超过了当前最先进的二级结构预测软件,实现了比其他领先方法整体结构高11-40%的马修斯相关系数(MCC),特别在伪结上高出58-400%。
更新时间: 2024-06-04 14:58:10
领域: q-bio.BM,cs.AI
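A minimal sketch of the large-kernel idea: a residual FCN block over the pairwise L x L representation of an RNA sequence using 11x11 convolutions; channel sizes and depth are illustrative, not the paper's configuration.

```python
# Residual FCN block with an 11x11 kernel over a pairwise map;
# sizes are illustrative assumptions.
import torch
import torch.nn as nn

class LargeKernelBlock(nn.Module):
    def __init__(self, channels: int = 32):
        super().__init__()
        # padding=5 preserves the L x L map size with an 11x11 kernel
        self.conv = nn.Conv2d(channels, channels, kernel_size=11, padding=5)
        self.norm = nn.BatchNorm2d(channels)
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(self.norm(self.conv(x))) + x  # residual connection

# toy pairwise input: batch 1, 32 channels, 64x64 base-pairing features
x = torch.randn(1, 32, 64, 64)
print(LargeKernelBlock()(x).shape)  # torch.Size([1, 32, 64, 64])
```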
Dynamical Survival Analysis with Controlled Latent States
We consider the task of learning individual-specific intensities of counting processes from a set of static variables and irregularly sampled time series. We introduce a novel modeling approach in which the intensity is the solution to a controlled differential equation. We first design a neural estimator by building on neural controlled differential equations. We then show that our model can be linearized in the signature space under sufficient regularity conditions, yielding a signature-based estimator which we call CoxSig. We provide theoretical learning guarantees for both estimators, before showcasing the performance of our models on a vast array of simulated and real-world datasets from finance, predictive maintenance and food supply chain management.
Updated: 2024-06-04 14:57:58
标题: 动态存活分析与控制潜在状态
摘要: 我们考虑从一组静态变量和不规则采样的时间序列中学习计数过程的个体特定强度的任务。我们引入了一种新颖的建模方法,其中强度是受控微分方程的解。我们首先设计了一个神经估计器,通过神经受控微分方程的构建来实现。其次,我们展示了在足够的正则条件下,我们的模型可以在签名空间中线性化,得到一个基于签名的估计器,我们称之为CoxSig。我们为这两个估计器提供了理论学习保证,然后展示了我们的模型在金融、预测性维护和食品供应链管理等各种模拟和真实数据集上的性能。
更新时间: 2024-06-04 14:57:58
领域: stat.ML,cs.LG
XRec: Large Language Models for Explainable Recommendation
Recommender systems help users navigate information overload by providing personalized recommendations aligned with their preferences. Collaborative Filtering (CF) is a widely adopted approach, but while advanced techniques like graph neural networks (GNNs) and self-supervised learning (SSL) have enhanced CF models for better user representations, they often lack the ability to provide explanations for the recommended items. Explainable recommendations aim to address this gap by offering transparency and insights into the recommendation decision-making process, enhancing users' understanding. This work leverages the language capabilities of Large Language Models (LLMs) to push the boundaries of explainable recommender systems. We introduce a model-agnostic framework called XRec, which enables LLMs to provide comprehensive explanations for user behaviors in recommender systems. By integrating collaborative signals and designing a lightweight collaborative adaptor, the framework empowers LLMs to understand complex patterns in user-item interactions and gain a deeper understanding of user preferences. Our extensive experiments demonstrate the effectiveness of XRec, showcasing its ability to generate comprehensive and meaningful explanations that outperform baseline approaches in explainable recommender systems. We open-source our model implementation at https://github.com/HKUDS/XRec.
Updated: 2024-06-04 14:55:14
标题: XRec:可解释推荐的大型语言模型
摘要: 推荐系统帮助用户在信息过载的情况下提供个性化推荐,与他们的偏好相符。协同过滤(CF)是一种广泛采用的方法,但是虽然像图神经网络(GNN)和自监督学习(SSL)这样的先进技术已经增强了CF模型以获得更好的用户表示,但它们通常缺乏为推荐项目提供解释的能力。可解释的推荐旨在通过提供透明性和洞察力来解决这一差距,增强用户的理解。这项工作利用大型语言模型(LLM)的语言能力来推动可解释的推荐系统的边界。我们引入了一个名为XRec的模型无关框架,使LLM能够为推荐系统中用户行为提供全面的解释。通过整合协作信号并设计一个轻量级的协作适配器,该框架使LLM能够理解用户与项目的互动中的复杂模式,并深入了解用户偏好。我们的大量实验证明了XRec的有效性,展示了其能够生成比可解释推荐系统中基线方法更全面和有意义的解释。我们在https://github.com/HKUDS/XRec上开源我们的模型实现。
更新时间: 2024-06-04 14:55:14
领域: cs.IR,cs.AI,cs.CL
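A sketch of what a lightweight collaborative adaptor could look like: pretrained CF embeddings are projected into the LLM's hidden size so they can be injected as soft tokens alongside the prompt. All dimensions and the two-layer design are assumptions, not XRec's actual architecture.

```python
# Hypothetical collaborative adaptor: CF embeddings -> LLM soft tokens.
import torch
import torch.nn as nn

class CollabAdaptor(nn.Module):
    def __init__(self, cf_dim: int = 64, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(cf_dim, llm_dim), nn.GELU(), nn.Linear(llm_dim, llm_dim)
        )

    def forward(self, user_emb, item_emb):
        # one soft token per collaborative signal (user, item)
        return torch.stack([self.proj(user_emb), self.proj(item_emb)], dim=1)

user_emb = torch.randn(8, 64)   # batch of GNN user embeddings
item_emb = torch.randn(8, 64)   # batch of GNN item embeddings
soft_tokens = CollabAdaptor()(user_emb, item_emb)
print(soft_tokens.shape)  # torch.Size([8, 2, 4096]) -> prepended to prompt embeddings
```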
An Empirical Analysis on Large Language Models in Debate Evaluation
In this study, we investigate the capabilities and inherent biases of advanced large language models (LLMs) such as GPT-3.5 and GPT-4 in the context of debate evaluation. We find that LLM performance exceeds that of humans and surpasses state-of-the-art methods fine-tuned on extensive datasets for debate evaluation. We additionally explore and analyze biases present in LLMs, including positional bias, lexical bias, and order bias, which may affect their evaluative judgments. Our findings reveal a consistent bias in both GPT-3.5 and GPT-4 towards the second candidate response presented, attributed to prompt design. We also uncover lexical biases in both GPT-3.5 and GPT-4, especially when label sets carry connotations such as numerical or sequential ones, highlighting the critical need for careful label verbalizer selection in prompt design. Additionally, our analysis indicates a tendency of both models to favor the debate's concluding side as the winner, suggesting an end-of-discussion bias.
Updated: 2024-06-04 14:51:25
标题: 对辩论评价中大型语言模型的实证分析
摘要: 在这项研究中,我们调查了GPT-3.5和GPT-4等先进大型语言模型(LLMs)在辩论评估领域的能力和固有偏见。我们发现LLM的表现超过了人类,并超过了在大量数据集上进行微调的最先进方法在辩论评估中的表现。我们另外探讨并分析LLMs中存在的偏见,包括位置偏见、词汇偏见、顺序偏见,这些可能影响他们的评价判断。我们的发现显示,无论是GPT-3.5还是GPT-4,在倾向第二个候选人回应的偏见方面保持一致,这可以归因于提示设计。我们还发现了GPT-3.5和GPT-4中的词汇偏见,尤其是当标签集携带数字或序列等内涵时,突显了在提示设计中仔细选择标签说出者的迫切需要。此外,我们的分析表明,这两种模型都倾向于将辩论的结论方作为胜利者,这表明了一种结束讨论的偏见。
更新时间: 2024-06-04 14:51:25
领域: cs.CL,cs.AI
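Positional bias of the kind reported above can be measured by judging each debate twice with the candidate order swapped; a judge that picks the same slot both times is following position rather than content. `judge` below is a hypothetical LLM call returning "first" or "second".

```python
# Order-swap probe for positional bias; `judge` is a hypothetical LLM call.
def positional_bias_rate(debates, judge) -> float:
    """Fraction of debates where the verdict follows the slot, not the content."""
    hits = 0
    for side_a, side_b in debates:
        v1 = judge(side_a, side_b)      # A presented first
        v2 = judge(side_b, side_a)      # order swapped
        hits += (v1 == v2)              # same slot picked twice => position-driven
    return hits / len(debates)

always_second = lambda a, b: "second"   # a maximally position-biased judge
print(positional_bias_rate([("pro text", "con text")] * 10, always_second))  # 1.0
```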
NewsBench: A Systematic Evaluation Framework for Assessing Editorial Capabilities of Large Language Models in Chinese Journalism
We present NewsBench, a novel evaluation framework to systematically assess the editorial capabilities of Large Language Models (LLMs) in Chinese journalism. Our benchmark dataset focuses on four facets of writing proficiency and six facets of safety adherence, and comprises 1,267 carefully and manually designed test samples, in the form of multiple-choice and short-answer questions, covering five editorial tasks across 24 news domains. To measure performance, we propose GPT-4-based automatic evaluation protocols that assess LLM generations to short-answer questions in terms of writing proficiency and safety adherence; both protocols are validated by high correlations with human evaluations. Based on this systematic evaluation framework, we conduct a comprehensive analysis of ten popular LLMs that can handle Chinese. The experimental results highlight GPT-4 and ERNIE Bot as top performers, yet reveal a relative deficiency in journalistic safety adherence in creative writing tasks. Our findings also underscore the need for enhanced ethical guidance in machine-generated journalistic content, marking a step forward in aligning LLMs with journalistic standards and safety considerations.
Updated: 2024-06-04 14:50:58
标题: NewsBench:用于评估大型语言模型在中国新闻业编辑能力的系统评估框架
摘要: 我们提出了NewsBench,一个新颖的评估框架,用于系统评估大型语言模型(LLMs)在中国新闻编辑能力方面的能力。我们构建的基准数据集集中在写作能力的四个方面和安全依从的六个方面,并包括手工设计的1,267个测试样本,涵盖了24个新闻领域中五项编辑任务的多项选择题和简答题。为了衡量性能,我们提出了基于不同GPT-4的自动评估协议,用于评估LLM生成的简答题在写作能力和安全依从方面,两者都得到了与人类评估的高相关性的验证。基于这一系统评估框架,我们对十种能处理中文的流行LLM进行了全面分析。实验结果突出了GPT-4和ERNIE Bot作为表现最佳的模型,但在创意写作任务中揭示了相对缺乏新闻安全依从性。我们的发现还强调了在机器生成的新闻内容中加强道德指导的必要性,这标志着在使LLMs符合新闻标准和安全考虑方面迈出了一步。
更新时间: 2024-06-04 14:50:58
领域: cs.CL,cs.AI
Improving Transformers with Dynamically Composable Multi-Head Attention
Multi-Head Attention (MHA) is a key component of Transformer. In MHA, attention heads work independently, causing problems such as low-rank bottleneck of attention score matrices and head redundancy. We propose Dynamically Composable Multi-Head Attention (DCMHA), a parameter and computation efficient attention architecture that tackles the shortcomings of MHA and increases the expressive power of the model by dynamically composing attention heads. At the core of DCMHA is a $\it{Compose}$ function that transforms the attention score and weight matrices in an input-dependent way. DCMHA can be used as a drop-in replacement of MHA in any transformer architecture to obtain the corresponding DCFormer. DCFormer significantly outperforms Transformer on different architectures and model scales in language modeling, matching the performance of models with ~1.7x-2.0x compute. For example, DCPythia-6.9B outperforms open source Pythia-12B on both pretraining perplexity and downstream task evaluation. The code and models are available at https://github.com/Caiyun-AI/DCFormer.
Updated: 2024-06-04 14:49:36
标题: 通过动态可组合的多头注意力机制改进Transformer
摘要: 多头注意力(MHA)是Transformer的一个关键组件。在MHA中,注意力头独立工作,导致注意力分数矩阵的低秩瓶颈和头部冗余等问题。我们提出了动态可组合多头注意力(DCMHA),这是一种参数和计算高效的注意力架构,可以解决MHA的缺点,并通过动态组合注意力头来增加模型的表达能力。DCMHA的核心是一个$\it{Compose}$函数,以一种输入相关的方式转换注意力分数和权重矩阵。DCMHA可以作为任何Transformer架构中MHA的一个插件替代,从而获得对应的DCFormer。在语言建模中,DCFormer在不同架构和模型规模上明显优于Transformer,与计算量约为1.7-2.0倍的模型性能相匹配。例如,DCPythia-6.9B在预训练困惑度和下游任务评估方面均优于开源Pythia-12B。代码和模型可在https://github.com/Caiyun-AI/DCFormer上获得。
更新时间: 2024-06-04 14:49:36
领域: cs.LG,cs.CL
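A heavily simplified sketch of the head-composition idea: attention scores are mixed across heads with a static matrix plus a per-query, low-rank, input-dependent term. The actual Compose function in DCFormer uses additional projections and gating, so treat this as an illustrative reduction rather than the paper's method.

```python
# Simplified, input-dependent head composition (illustrative reduction).
import torch
import torch.nn as nn

class SimpleCompose(nn.Module):
    """Mix attention scores across heads with a per-query, low-rank map."""
    def __init__(self, d_model: int, n_heads: int, rank: int = 2):
        super().__init__()
        self.rank = rank
        self.static_mix = nn.Parameter(torch.eye(n_heads))   # base: keep heads as-is
        self.dyn = nn.Linear(d_model, 2 * n_heads * rank)    # input-dependent part

    def forward(self, scores, query):
        # scores: (B, H, Tq, Tk); query: (B, Tq, d_model)
        B, H, Tq, _ = scores.shape
        uv = self.dyn(query).view(B, Tq, 2, H, self.rank)
        u, v = uv[:, :, 0], uv[:, :, 1]                      # (B, Tq, H, r) each
        mix = self.static_mix + torch.einsum("bqir,bqjr->bqij", u, v)
        # output head i at query q is a dynamic mixture of all input heads j
        return torch.einsum("bqij,bjqk->biqk", mix, scores)

scores = torch.randn(2, 4, 8, 8)        # (batch, heads, queries, keys)
query = torch.randn(2, 8, 32)
print(SimpleCompose(32, 4)(scores, query).shape)  # torch.Size([2, 4, 8, 8])
```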
SuperGaussian: Repurposing Video Models for 3D Super Resolution
We present a simple, modular, and generic method that upsamples coarse 3D models by adding geometric and appearance details. While generative 3D models now exist, they do not yet match the quality of their counterparts in image and video domains. We demonstrate that it is possible to directly repurpose existing (pretrained) video models for 3D super-resolution and thus sidestep the problem of the shortage of large repositories of high-quality 3D training models. We describe how to repurpose video upsampling models, which are not 3D consistent, and combine them with 3D consolidation to produce 3D-consistent results. As output, we produce high quality Gaussian Splat models, which are object centric and effective. Our method is category agnostic and can be easily incorporated into existing 3D workflows. We evaluate our proposed SuperGaussian on a variety of 3D inputs, which are diverse both in terms of complexity and representation (e.g., Gaussian Splats or NeRFs), and demonstrate that our simple method significantly improves the fidelity of the final 3D models. Check our project website for details: supergaussian.github.io
Updated: 2024-06-04 14:47:45
标题: 超高斯:将视频模型重新用于3D超分辨率
摘要: 我们提出了一种简单、模块化和通用的方法,通过添加几何和外观细节来提高粗糙的3D模型的分辨率。虽然现在存在生成式3D模型,但它们的质量尚不及图像和视频领域的对应物。我们证明,可以直接重新利用现有(预训练的)视频模型进行3D超分辨率,从而避开高质量3D训练模型的短缺问题。我们描述了如何重新利用视频上采样模型,这些模型不是3D一致的,并将它们与3D整合结合以产生3D一致的结果。作为输出,我们生成了高质量的高斯斑点模型,这些模型是以物体为中心的且有效的。我们的方法是类别无关的,并且可以轻松地融入现有的3D工作流程中。我们对各种3D输入(在复杂性和表示方面都不同,例如高斯斑点或NeRFs)评估了我们提出的SuperGaussian,并证明我们的简单方法显著提高了最终3D模型的保真度。查看我们的项目网站获取详细信息:supergaussian.github.io
更新时间: 2024-06-04 14:47:45
领域: cs.CV,cs.AI
Finding NeMo: Localizing Neurons Responsible For Memorization in Diffusion Models
Diffusion models (DMs) produce very detailed and high-quality images. Their power results from extensive training on large amounts of data, usually scraped from the internet without proper attribution or consent from content creators. Unfortunately, this practice raises privacy and intellectual property concerns, as DMs can memorize and later reproduce their potentially sensitive or copyrighted training images at inference time. Prior efforts prevent this issue by either changing the input to the diffusion process, thereby preventing the DM from generating memorized samples during inference, or removing the memorized data from training altogether. While those are viable solutions when the DM is developed and deployed in a secure and constantly monitored environment, they hold the risk of adversaries circumventing the safeguards and are not effective when the DM itself is publicly released. To solve the problem, we introduce NeMo, the first method to localize memorization of individual data samples down to the level of neurons in DMs' cross-attention layers. Through our experiments, we make the intriguing finding that in many cases, single neurons are responsible for memorizing particular training samples. By deactivating these memorization neurons, we can avoid the replication of training data at inference time, increase the diversity in the generated outputs, and mitigate the leakage of private and copyrighted data. In this way, our NeMo contributes to a more responsible deployment of DMs.
Updated: 2024-06-04 14:45:47
标题: 寻找NeMo:在扩散模型中定位负责记忆的神经元
摘要: 扩散模型(DMs)产生非常详细和高质量的图像。它们的强大源于对大量数据进行广泛训练,通常是从互联网上获取,而未经内容创建者的适当归属或同意。不幸的是,这种做法引发了隐私和知识产权问题,因为DMs可以在推理时记忆并复制潜在敏感或受版权保护的训练图像。先前的努力通过改变扩散过程的输入,从而防止DM在推理过程中生成记忆样本,或完全删除训练中的记忆数据来解决这个问题。尽管在DM在安全和持续监控的环境中开发和部署时这些是可行的解决方案,但存在对手绕过安全措施的风险,并且当DM本身公开发布时并不有效。为了解决这个问题,我们介绍了NeMo,这是第一种将单个数据样本的记忆定位到DM交叉关注层神经元级别的方法。通过我们的实验证明,令人感兴趣的发现是,在许多情况下,单个神经元负责记忆特定的训练样本。通过停用这些记忆神经元,我们可以避免在推理时复制训练数据,增加生成输出的多样性,并减轻私人和受版权保护数据的泄露。通过这种方式,我们的NeMo有助于更负责任地部署DMs。
更新时间: 2024-06-04 14:45:47
领域: cs.LG,cs.AI
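A sketch of the deactivation step: once memorization neurons are localized, they can be zeroed at inference time with a forward hook. The layer and neuron indices below are placeholders for what a localization method like NeMo would identify.

```python
# Zero selected output channels of a layer with a PyTorch forward hook;
# layer choice and neuron ids are hypothetical placeholders.
import torch
import torch.nn as nn

def deactivate_neurons(module: nn.Module, neuron_ids: list[int]):
    def hook(mod, inputs, output):
        output[..., neuron_ids] = 0.0   # zero the flagged channels
        return output
    return module.register_forward_hook(hook)

layer = nn.Linear(16, 16)
handle = deactivate_neurons(layer, neuron_ids=[3, 7])

out = layer(torch.randn(2, 16))
print(out[:, [3, 7]])   # all zeros while the hook is active
handle.remove()         # restore normal behavior
```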
Blockchains for Internet of Things: Fundamentals, Applications, and Challenges
Internet of Things (IoT) services necessitate the storage, transmission, and analysis of diverse data for inference, autonomy, and control. Blockchains, with their inherent properties of decentralization and security, offer efficient database solutions for these devices through consensus-based data sharing. However, it is essential to recognize that not every blockchain system is suitable for every IoT application, and some are better avoided where privacy is a concern. For example, public blockchains are not suitable for storing sensitive data. This paper presents a detailed review of three distinct blockchains tailored for enhancing IoT applications. We initially delve into the foundational aspects of the three blockchain systems, highlighting their strengths, limitations, and implementation needs. Additionally, we discuss the security issues in different blockchains. Subsequently, we explore the blockchain's application in three pivotal IoT areas: edge AI, communications, and healthcare. We underscore potential challenges and future directions for integrating different blockchains in IoT. Ultimately, this paper aims to offer a comprehensive perspective on the synergies between blockchains and the IoT ecosystem, highlighting the opportunities and complexities involved.
Updated: 2024-06-04 14:45:27
标题: 区块链技术在物联网中的应用:基础知识、应用和挑战
摘要: 物联网(IoT)服务需要存储、传输和分析各种数据,用于推理、自主和控制。区块链具有分散化和安全性等固有特性,通过基于共识的数据共享为这些设备提供高效的数据库解决方案。然而,必须认识到并非每种区块链系统都适用于特定的物联网应用,有些可能在隐私问题上更有益。例如,公共区块链不适合存储敏感数据。本文详细审查了三种专为增强物联网应用而设计的不同区块链。我们首先深入探讨了三个区块链系统的基础方面,突出它们的优势、局限性和实施需求。此外,我们讨论了不同区块链的安全问题。随后,我们探讨了区块链在三个关键的物联网领域的应用:边缘人工智能、通信和医疗保健。我们强调了整合不同区块链在物联网中的潜在挑战和未来发展方向。最终,本文旨在提供一个全面的视角,突出区块链与物联网生态系统之间的协同作用,突显涉及的机遇和复杂性。
更新时间: 2024-06-04 14:45:27
领域: cs.CR,cs.NI
Temporal Graph Rewiring with Expander Graphs
Evolving relations in real-world networks are often modelled by temporal graphs. Graph rewiring techniques have been utilised on Graph Neural Networks (GNNs) to improve expressiveness and increase model performance. In this work, we propose Temporal Graph Rewiring (TGR), the first approach for graph rewiring on temporal graphs. TGR enables communication between temporally distant nodes in a continuous time dynamic graph by utilising expander graph propagation to construct a message passing highway for message passing between distant nodes. Expander graphs are suitable candidates for rewiring as they help overcome the oversquashing problem often observed in GNNs. On the public tgbl-wiki benchmark, we show that TGR improves the performance of a widely used TGN model by a significant margin. Our code repository is accessible at https://anonymous.4open.science/r/TGR-254C.
Updated: 2024-06-04 14:39:51
标题: 使用扩展图进行时序图重连
摘要: 现实世界网络中的演化关系通常通过时间图来建模。图重连技术已经被应用于图神经网络(GNNs)以提高表现力并增加模型性能。在这项工作中,我们提出了时间图重连(TGR),这是第一种用于时间图的图重连方法。TGR通过利用扩展图传播构建一个消息传递高速公路,使连续时间动态图中的时间上相距较远的节点之间能够进行通信。扩展图是重连的合适候选者,因为它们有助于克服GNNs中经常观察到的过度压缩问题。在公开的tgbl-wiki基准测试中,我们展示了TGR显著提高了广泛使用的TGN模型的性能。我们的代码存储库可在https://anonymous.4open.science/r/TGR-254C访问。
更新时间: 2024-06-04 14:39:51
领域: cs.LG,cs.AI,cs.SI,stat.ML
Editing Factual Knowledge and Explanatory Ability of Medical Large Language Models
Model editing aims to precisely alter the behaviors of large language models (LLMs) in relation to specific knowledge, while leaving unrelated knowledge intact. This approach has proven effective in addressing issues of hallucination and outdated information in LLMs. However, the potential of using model editing to modify knowledge in the medical field remains largely unexplored, even though resolving hallucination is a pressing need in this area. Our observations indicate that current methods face significant challenges in dealing with specialized and complex knowledge in medical domain. Therefore, we propose MedLaSA, a novel Layer-wise Scalable Adapter strategy for medical model editing. MedLaSA harnesses the strengths of both adding extra parameters and locate-then-edit methods for medical model editing. We utilize causal tracing to identify the association of knowledge in neurons across different layers, and generate a corresponding scale set from the association value for each piece of knowledge. Subsequently, we incorporate scalable adapters into the dense layers of LLMs. These adapters are assigned scaling values based on the corresponding specific knowledge, which allows for the adjustment of the adapter's weight and rank. The more similar the content, the more consistent the scale between them. This ensures precise editing of semantically identical knowledge while avoiding impact on unrelated knowledge. To evaluate the editing impact on the behaviours of LLMs, we propose two model editing studies for medical domain: (1) editing factual knowledge for medical specialization and (2) editing the explanatory ability for complex knowledge. We build two novel medical benchmarking datasets and introduce a series of challenging and comprehensive metrics. Extensive experiments on medical LLMs demonstrate the editing efficiency of MedLaSA, without affecting unrelated knowledge.
Updated: 2024-06-04 14:38:34
标题: 编辑医学大型语言模型的事实知识和解释能力
摘要: 模型编辑旨在精确地改变大型语言模型(LLMs)的行为,以涉及特定知识,同时保持不相关知识不变。这种方法已被证明在处理LLMs中的幻觉和过时信息方面是有效的。然而,利用模型编辑来修改医学领域的知识的潜力仍然大部分未被探索,尽管在这个领域解决幻觉是一个迫切的需求。我们的观察表明,目前的方法在处理医学领域的专业化和复杂知识方面面临着重大挑战。因此,我们提出了MedLaSA,一种新颖的用于医学模型编辑的逐层可扩展适配器策略。MedLaSA利用了添加额外参数和找到-然后编辑方法的优势,用于医学模型编辑。我们利用因果追踪来识别不同层中神经元之间知识的关联,并为每个知识片段生成相应的比例集。随后,我们将可扩展适配器整合到LLMs的密集层中。这些适配器根据相应的具体知识被分配缩放值,从而允许调整适配器的权重和等级。内容越相似,它们之间的比例就越一致。这确保了对语义上相同的知识进行精确编辑,同时避免影响不相关的知识。为了评估对LLMs行为的编辑影响,我们提出了两个医学领域的模型编辑研究:(1)编辑医学专业化的事实知识和(2)编辑复杂知识的解释能力。我们构建了两个新颖的医学基准数据集,并引入了一系列具有挑战性和全面性的指标。对医学LLMs的大量实验表明了MedLaSA的编辑效率,而不会影响不相关的知识。
更新时间: 2024-06-04 14:38:34
领域: cs.CL,cs.AI
Using Self-supervised Learning Can Improve Model Fairness
Self-supervised learning (SSL) has become the de facto training paradigm of large models, where pre-training is followed by supervised fine-tuning using domain-specific data and labels. Despite demonstrating comparable performance with supervised methods, comprehensive efforts to assess SSL's impact on machine learning fairness (i.e., performing equally on different demographic breakdowns) are lacking. Hypothesizing that SSL models would learn more generic, hence less biased, representations, this study explores the impact of pre-training and fine-tuning strategies on fairness. We introduce a fairness assessment framework for SSL, comprising five stages: defining dataset requirements, pre-training, fine-tuning with gradual unfreezing, assessing representation similarity conditioned on demographics, and establishing domain-specific evaluation processes. We evaluate our method's generalizability on three real-world human-centric datasets (i.e., MIMIC, MESA, and GLOBEM) by systematically comparing hundreds of SSL and fine-tuned models on various dimensions, spanning from intermediate representations to appropriate evaluation metrics. Our findings demonstrate that SSL can significantly improve model fairness while maintaining performance on par with supervised methods, exhibiting up to a 30% increase in fairness with minimal loss in performance through self-supervision. We posit that such differences can be attributed to representation dissimilarities found between the best- and the worst-performing demographics across models: up to 13x greater for protected attributes with larger performance discrepancies between segments.
Updated: 2024-06-04 14:38:30
标题: 使用自监督学习可以提高模型的公平性
摘要: 自我监督学习(SSL)已成为大型模型的事实训练范式,其中预训练后使用特定领域数据和标签进行监督微调。尽管表现出与监督方法相当的性能,但缺乏全面评估SSL对机器学习公平性(即在不同人口统计数据上表现相同)的影响的努力。假设SSL模型将学习更通用、因此更少偏见的表示,本研究探讨了预训练和微调策略对公平性的影响。我们引入了一个针对SSL的公平性评估框架,包括五个阶段:定义数据集要求,预训练,逐步解除冻结微调,根据人口统计条件评估表示相似性,建立特定领域的评估流程。我们通过系统比较数百个SSL和微调模型在各种维度上的表现,从中间表示到适当的评估指标,评估我们方法在三个真实世界以人为中心的数据集(即MIMIC、MESA和GLOBEM)上的泛化能力。我们的研究结果表明,SSL可以显著提高模型的公平性,同时保持与监督方法相当的性能-通过自我监督可以实现公平性增加高达30%,同时最小化性能损失。我们认为,这种差异可以归因于模型之间在最佳和最差表现人口统计数据之间发现的表示不相似性,对于具有较大性能差异的受保护属性,模型之间的性能差异最高可达13倍。
更新时间: 2024-06-04 14:38:30
领域: cs.LG
The complexity of approximate (coarse) correlated equilibrium for incomplete information games
We study the iteration complexity of decentralized learning of approximate correlated equilibria in incomplete information games. On the negative side, we prove that in $\mathit{extensive}$-$\mathit{form}$ $\mathit{games}$, assuming $\mathsf{PPAD} \not\subset \mathsf{TIME}(n^{\mathsf{polylog}(n)})$, any polynomial-time learning algorithms must take at least $2^{\log_2^{1-o(1)}(|\mathcal{I}|)}$ iterations to converge to the set of $\epsilon$-approximate correlated equilibria, where $|\mathcal{I}|$ is the number of nodes in the game and $\epsilon > 0$ is an absolute constant. This nearly matches, up to the $o(1)$ term, the algorithms of [PR'24, DDFG'24] for learning $\epsilon$-approximate correlated equilibrium, and resolves an open question of Anagnostides, Kalavasis, Sandholm, and Zampetakis [AKSZ'24]. Our lower bound holds even for the easier solution concept of $\epsilon$-approximate $\mathit{coarse}$ correlated equilibrium. On the positive side, we give uncoupled dynamics that reach $\epsilon$-approximate correlated equilibria of a $\mathit{Bayesian}$ $\mathit{game}$ in polylogarithmic iterations, without any dependence on the number of types. This demonstrates a separation between Bayesian games and extensive-form games.
Updated: 2024-06-04 14:35:27
标题: 不完全信息博弈中近似(粗略)相关均衡的复杂性
摘要: 我们研究了分布式学习不完全信息博弈中近似相关均衡的迭代复杂性。 在负面方面,我们证明在$\mathit{广义形式}$ $\mathit{博弈}$中,假设$\mathsf{PPAD} \not\subset \mathsf{TIME}(n^{\mathsf{polylog}(n)})$,任何多项式时间学习算法都必须至少花费$2^{\log_2^{1-o(1)}(|\mathcal{I}|)}$次迭代才能收敛到$\epsilon$-近似相关均衡集合,其中$|\mathcal{I}|$是博弈中的节点数,$\epsilon > 0$是一个绝对常数。这几乎与[PR'24, DDFG'24]中学习$\epsilon$-近似相关均衡的算法匹配,解决了Anagnostides、Kalavasis、Sandholm和Zampetakis [AKSZ'24]的一个悬而未决的问题。我们的下界甚至适用于更容易的解决概念$\epsilon$-近似$\mathit{粗略}$相关均衡。 在积极方面,我们提供了达到$\epsilon$-近似相关均衡的$\mathit{贝叶斯}$ $\mathit{博弈}$的无耦合动态,在对数多项式次迭代中实现,而不依赖于类型数量。这表明了贝叶斯博弈和广义形式博弈之间的差异。
更新时间: 2024-06-04 14:35:27
领域: cs.GT,cs.AI,cs.DS,cs.LG
Language Models Do Hard Arithmetic Tasks Easily and Hardly Do Easy Arithmetic Tasks
The ability (and inability) of large language models (LLMs) to perform arithmetic tasks has been the subject of much theoretical and practical debate. We show that LLMs are frequently able to correctly and confidently predict the first digit of n-digit by m-digit multiplication tasks without using chain-of-thought reasoning, even though these tasks require compounding operations to solve. Simultaneously, LLMs in practice often fail to correctly or confidently predict the last digit of an n-digit by m-digit multiplication, a task equivalent to 1-digit by 1-digit multiplication, which can be easily learned or memorized. We show that the latter task can be solved more robustly when the LLM is conditioned on all of the correct higher-order digits, which on average increases the confidence of the correct last digit on 5-digit by 5-digit multiplication tasks by over 230% (0.13 to 0.43) for Llama 2-13B and by 150% (0.22 to 0.55) for Mistral-7B.
Updated: 2024-06-04 14:34:39
标题: 语言模型很容易完成困难的算术任务,但几乎不会完成简单的算术任务。
摘要: 大型语言模型(LLMs)执行算术任务的能力(和无能力)一直是理论和实践上的热门话题。我们展示了LLMs经常能够在不使用连续思考推理的情况下,正确而自信地预测n位乘m位乘法任务的第一个数字,尽管这些任务需要复合操作来解决。与此同时,LLMs在实践中经常无法正确或自信地预测n位乘m位乘法的最后一个数字,这是一个等同于1位乘1位乘法的任务,可以轻松学习或记忆。我们展示了当LLM受到所有正确的高位数字的条件约束时,后一种任务可以更可靠地解决,平均增加了Llama 2-13B在5位乘5位乘法任务上的正确最后数字的置信度超过230%(从0.13到0.43),Mistral-7B增加了150%(从0.22到0.55)。
更新时间: 2024-06-04 14:34:39
领域: cs.LG,cs.AI,cs.CL
Social Choice Should Guide AI Alignment in Dealing with Diverse Human Feedback
Foundation models such as GPT-4 are fine-tuned to avoid unsafe or otherwise problematic behavior, such as helping to commit crimes or producing racist text. One approach to fine-tuning, called reinforcement learning from human feedback, learns from humans' expressed preferences over multiple outputs. Another approach is constitutional AI, in which the input from humans is a list of high-level principles. But how do we deal with potentially diverging input from humans? How can we aggregate the input into consistent data about "collective" preferences or otherwise use it to make collective choices about model behavior? In this paper, we argue that the field of social choice is well positioned to address these questions, and we discuss ways forward for this agenda, drawing on discussions in a recent workshop on Social Choice for AI Ethics and Safety held in Berkeley, CA, USA in December 2023.
Updated: 2024-06-04 14:34:38
标题: 社会选择应指导人工智能在处理多样化人类反馈方面的对齐
摘要: 基于社会选择的方法用于调优基础模型,比如GPT-4,以避免不安全或其他有问题的行为,比如帮助犯罪或产生种族主义文本。一种调优方法是称为从人类反馈中进行强化学习,从人类对多个输出的表达偏好中学习。另一种方法是宪法人工智能,其中来自人类的输入是高级原则的列表。但是,我们如何处理潜在的来自人类的不同输入?我们如何将输入聚合成关于“集体”偏好的一致数据,或者以其他方式将其用于对模型行为作出集体选择?在本文中,我们认为社会选择领域有望解决这些问题,并讨论了前进的途径,借鉴了2023年12月在美国加利福尼亚州伯克利举行的一次关于人工智能伦理和安全的社会选择研讨会的讨论。
更新时间: 2024-06-04 14:34:38
领域: cs.LG,cs.AI,cs.CL,cs.CY,cs.GT,68T01, 68T50, 91B14, 91B12,I.2.0; I.2.7; K.4.2; I.2.m; J.4
FedDr+: Stabilizing Dot-regression with Global Feature Distillation for Federated Learning
Federated Learning (FL) has emerged as a pivotal framework for developing effective global models (global FL) or personalized models (personalized FL) across clients with heterogeneous, non-IID data distributions. A key challenge in FL is client drift, where data heterogeneity impedes the aggregation of scattered knowledge. Recent studies have tackled the client drift issue by identifying significant divergence in the last classifier layer. To mitigate this divergence, strategies such as freezing the classifier weights and aligning the feature extractor accordingly have proven effective. Although the local alignment between classifier and feature extractor has been studied as a crucial factor in FL, we observe that it may lead the model to overemphasize the observed classes within each client. Thus, our objectives are twofold: (1) enhancing local alignment while (2) preserving the representation of unseen class samples. This approach aims to effectively integrate knowledge from individual clients, thereby improving performance for both global and personalized FL. To achieve this, we introduce a novel algorithm named FedDr+, which empowers local model alignment using a dot-regression loss. FedDr+ freezes the classifier as a simplex ETF to align the features, and improves aggregated global models by employing a feature distillation mechanism to retain information about unseen/missing classes. Consequently, we provide empirical evidence demonstrating that our algorithm surpasses existing methods that use a frozen classifier to boost alignment across diverse data distributions.
Updated: 2024-06-04 14:34:13
标题: FedDr+: 使用全局特征蒸馏稳定化点回归,用于联邦学习
摘要: 联邦学习(FL)已经成为开发有效的全局模型(全局FL)或个性化模型(个性化FL)的关键框架,这些模型跨越具有异质、非iid数据分布的客户端。FL面临的一个关键挑战是客户端漂移,即数据的异质性阻碍了分散知识的聚合。最近的研究通过识别最后分类器层中的显著差异来解决客户端漂移问题。为了缓解这种差异,冻结分类器权重并相应地对齐特征提取器等策略已被证明是有效的。尽管分类器和特征提取器之间的本地对齐被研究为FL中的一个关键因素,但我们观察到这可能导致模型过分强调每个客户端中观察到的类。因此,我们的目标是双重的:(1)增强本地对齐,同时(2)保留未见类样本的表示。这种方法旨在有效地整合来自个体客户端的知识,从而提高全局和个性化FL的性能。为实现这一目标,我们引入了一种名为FedDr+的新算法,该算法利用点回归损失增强本地模型对齐。FedDr+将分类器冻结为一个简单的ETF以对齐特征,并通过采用特征蒸馏机制改进聚合的全局模型,以保留关于未见/缺失类的信息。因此,我们提供了实证证据表明,我们的算法超越了使用冻结分类器来增强跨多样化分布的对齐的现有方法。
更新时间: 2024-06-04 14:34:13
领域: cs.CV,cs.AI,cs.DC,cs.LG
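A sketch of the two ingredients named above, assuming the standard simplex-ETF construction and a cosine form of the dot-regression loss; dimensions are illustrative and this is not the paper's exact formulation.

```python
# Frozen simplex-ETF classifier + dot-regression loss (assumed forms).
import torch
import torch.nn.functional as F

def simplex_etf(d: int, k: int) -> torch.Tensor:
    """k equiangular class prototypes in R^d (requires d >= k)."""
    p = torch.linalg.qr(torch.randn(d, k)).Q            # orthonormal columns
    center = torch.eye(k) - torch.ones(k, k) / k
    return (k / (k - 1)) ** 0.5 * p @ center            # (d, k), kept frozen

def dot_regression_loss(features, labels, etf):
    f = F.normalize(features, dim=1)                    # (B, d)
    w = F.normalize(etf, dim=0)                         # (d, K)
    cos_to_own = (f * w[:, labels].T).sum(dim=1)        # cosine to own prototype
    return 0.5 * ((cos_to_own - 1.0) ** 2).mean()       # pull it toward 1 only

etf = simplex_etf(d=128, k=10)                          # same frozen ETF on every client
feats = torch.randn(32, 128, requires_grad=True)
loss = dot_regression_loss(feats, torch.randint(0, 10, (32,)), etf)
loss.backward()
print(float(loss))
```

Unlike cross-entropy, this loss has no push-away term for other prototypes, which is the property the paper leverages to avoid overemphasizing locally observed classes.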
Label-wise Aleatoric and Epistemic Uncertainty Quantification
We present a novel approach to uncertainty quantification in classification tasks based on label-wise decomposition of uncertainty measures. This label-wise perspective allows uncertainty to be quantified at the individual class level, thereby improving cost-sensitive decision-making and helping understand the sources of uncertainty. Furthermore, it allows to define total, aleatoric, and epistemic uncertainty on the basis of non-categorical measures such as variance, going beyond common entropy-based measures. In particular, variance-based measures address some of the limitations associated with established methods that have recently been discussed in the literature. We show that our proposed measures adhere to a number of desirable properties. Through empirical evaluation on a variety of benchmark data sets -- including applications in the medical domain where accurate uncertainty quantification is crucial -- we establish the effectiveness of label-wise uncertainty quantification.
Updated: 2024-06-04 14:33:23
标题: 标签式的Aleatoric和Epistemic不确定性量化
摘要: 我们提出了一种新颖的分类任务中不确定性量化方法,基于标签分解不确定性度量。这种标签分解的视角允许在个别类别水平上量化不确定性,从而改善成本敏感的决策制定并帮助理解不确定性的来源。此外,它允许根据方差等非分类度量来定义总体、偶然和认知不确定性,超越常见的基于熵的度量。特别是,基于方差的度量解决了最近文献中讨论的一些与已建立方法相关的限制。我们展示了我们提出的度量符合一些理想特性。通过对各种基准数据集的经验评估,包括在医疗领域中准确量化不确定性至关重要的应用,我们建立了标签分解不确定性量化的有效性。
更新时间: 2024-06-04 14:33:23
领域: cs.LG,stat.ML
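The variance-based, label-wise decomposition has a compact form: for each class k, the law of total variance splits Var(y_k) into E_theta[p_k(1-p_k)] (aleatoric) plus Var_theta(p_k) (epistemic). A minimal sketch, estimated from an ensemble:

```python
# Label-wise variance decomposition from ensemble class probabilities.
import numpy as np

def labelwise_uncertainty(probs: np.ndarray):
    """probs: (M, N, K) class probabilities from M ensemble members."""
    aleatoric = (probs * (1.0 - probs)).mean(axis=0)  # E_theta[p_k (1 - p_k)]
    epistemic = probs.var(axis=0)                     # Var_theta(p_k)
    return aleatoric + epistemic, aleatoric, epistemic

rng = np.random.default_rng(0)
probs = rng.dirichlet([2.0, 2.0, 2.0], size=(5, 4))   # M=5 members, N=4, K=3
total, alea, epis = labelwise_uncertainty(probs)
mean_p = probs.mean(axis=0)
print(np.allclose(total, mean_p * (1 - mean_p)))      # True: law of total variance
print(epis[0])                                        # per-class epistemic, sample 0
```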
System-Aware Neural ODE Processes for Few-Shot Bayesian Optimization
We consider the problem of optimizing initial conditions and timing in dynamical systems governed by unknown ordinary differential equations (ODEs), where evaluating different initial conditions is costly and there are constraints on observation times. To identify the optimal conditions within several trials, we introduce a few-shot Bayesian Optimization (BO) framework based on the system's prior information. At the core of our approach is the System-Aware Neural ODE Processes (SANODEP), an extension of Neural ODE Processes (NODEP) designed to meta-learn ODE systems from multiple trajectories using a novel context embedding block. Additionally, we propose a multi-scenario loss function specifically for optimization purposes. Our two-stage BO framework effectively incorporates search space constraints, enabling efficient optimization of both initial conditions and observation timings. We conduct extensive experiments showcasing SANODEP's potential for few-shot BO. We also explore SANODEP's adaptability to varying levels of prior information, highlighting the trade-off between prior flexibility and model fitting accuracy.
Updated: 2024-06-04 14:28:36
标题: Few-Shot Bayesian Optimization的System-Aware神经ODE过程
摘要: 我们考虑了在由未知常微分方程(ODEs)控制的动力系统中优化初始条件和时机的问题,其中评估不同初始条件成本高昂,并且存在对观测时间的约束。为了在几次试验中确定最佳条件,我们引入了一个基于系统先验信息的少样本贝叶斯优化(BO)框架。我们方法的核心是System-Aware Neural ODE Processes(SANODEP),这是Neural ODE Processes(NODEP)的扩展,旨在通过一种新颖的上下文嵌入块从多条轨迹中元学习ODE系统。此外,我们提出了一个专门用于优化目的的多场景损失函数。我们的两阶段BO框架有效地整合了搜索空间约束,实现了对初始条件和观测时机的高效优化。我们进行了大量实验,展示了SANODEP在少样本BO中的潜力。我们还探讨了SANODEP对不同先验信息水平的适应性,突显了在先验灵活性和模型拟合准确性之间的权衡。
更新时间: 2024-06-04 14:28:36
领域: cs.LG
On the Identifiability of Switching Dynamical Systems
The identifiability of latent variable models has received increasing attention due to its relevance in interpretability and out-of-distribution generalisation. In this work, we study the identifiability of Switching Dynamical Systems, taking an initial step toward extending identifiability analysis to sequential latent variable models. We first prove the identifiability of Markov Switching Models, which commonly serve as the prior distribution for the continuous latent variables in Switching Dynamical Systems. We present identification conditions for first-order Markov dependency structures, whose transition distribution is parametrised via non-linear Gaussians. We then establish the identifiability of the latent variables and non-linear mappings in Switching Dynamical Systems up to affine transformations, by leveraging identifiability analysis techniques from identifiable deep latent variable models. We finally develop estimation algorithms for identifiable Switching Dynamical Systems. Throughout empirical studies, we demonstrate the practicality of identifiable Switching Dynamical Systems for segmenting high-dimensional time series such as videos, and showcase the use of identifiable Markov Switching Models for regime-dependent causal discovery in climate data.
Updated: 2024-06-04 14:28:29
标题: 关于切换动态系统可识别性的研究
摘要: 潜变量模型的可识别性因其在可解释性和超出分布泛化方面的相关性而受到越来越多的关注。在这项工作中,我们研究了切换动态系统的可识别性,为将可识别性分析扩展到序列潜变量模型迈出了第一步。我们首先证明了马尔可夫切换模型的可识别性,这些模型通常作为切换动态系统中连续潜变量的先验分布。我们提出了一阶马尔可夫依赖结构的识别条件,其转移分布通过非线性高斯参数化。然后,通过利用可识别的深层潜变量模型的可识别性分析技术,我们建立了切换动态系统中潜变量和非线性映射的可识别性,直到仿射变换。最后,我们开发了可识别切换动态系统的估计算法。通过实证研究,我们展示了可识别切换动态系统在分割视频等高维时间序列方面的实用性,并展示了可识别马尔可夫切换模型在气候数据中基于制度的因果发现的应用。
更新时间: 2024-06-04 14:28:29
领域: stat.ML,cs.LG
CIF-Bench: A Chinese Instruction-Following Benchmark for Evaluating the Generalizability of Large Language Models
The advancement of large language models (LLMs) has enhanced the ability to generalize across a wide range of unseen natural language processing (NLP) tasks through instruction-following. Yet, their effectiveness often diminishes in low-resource languages like Chinese, exacerbated by biased evaluations from data leakage, casting doubt on their true generalizability to new linguistic territories. In response, we introduce the Chinese Instruction-Following Benchmark (CIF-Bench), designed to evaluate the zero-shot generalizability of LLMs to the Chinese language. CIF-Bench comprises 150 tasks and 15,000 input-output pairs, developed by native speakers to test complex reasoning and Chinese cultural nuances across 20 categories. To mitigate data contamination, we release only half of the dataset publicly, with the remainder kept private, and introduce diversified instructions to minimize score variance, totaling 45,000 data instances. Our evaluation of 28 selected LLMs reveals a noticeable performance gap, with the best model scoring only 52.9%, highlighting the limitations of LLMs in less familiar language and task contexts. This work not only uncovers the current limitations of LLMs in handling Chinese language tasks but also sets a new standard for future LLM generalizability research, pushing towards the development of more adaptable, culturally informed, and linguistically diverse models.
Updated: 2024-06-04 14:26:30
标题: CIF-Bench:一个用于评估大型语言模型泛化能力的中文指令遵循基准
摘要: 大语言模型(LLMs)的进步增强了通过遵循指令在一系列未见的自然语言处理(NLP)任务中泛化的能力。然而,在低资源语言(如中文)中,它们的有效性通常会减弱,加剧了由数据泄漏引起的评估偏见,对它们对新语言领域的真正泛化能力产生了怀疑。为此,我们引入了中文遵循指令基准(CIF-Bench),旨在评估LLMs对中文语言的零样本泛化能力。CIF-Bench包括150个任务和15,000个输入输出对,由母语者开发,用于测试20个类别中的复杂推理和中文文化细微差别。为减少数据污染,我们仅公开了数据集的一半,其余部分保持私有,并引入了多样化的指令以减少得分方差,总计45,000个数据实例。我们对28个选定的LLMs进行评估,发现明显的性能差距,最佳模型得分仅为52.9%,突显了LLMs在不太熟悉的语言和任务环境中的局限性。这项工作不仅揭示了LLMs在处理中文任务中当前的局限性,还为未来LLMs泛化性研究设定了新的标准,推动朝着开发更具适应性、具有文化信息和语言多样性的模型发展。
更新时间: 2024-06-04 14:26:30
领域: cs.CL,cs.AI
Evaluating ChatGPT as a Recommender System: A Rigorous Approach
Large Language Models (LLMs) have recently shown impressive abilities in handling various natural language-related tasks. Among different LLMs, current studies have assessed ChatGPT's superior performance across manifold tasks, especially under zero/few-shot prompting conditions. Given such successes, the Recommender Systems (RSs) research community has started investigating its potential applications within the recommendation scenario. However, although various methods have been proposed to integrate ChatGPT's capabilities into RSs, current research struggles to comprehensively evaluate such models while considering the peculiarities of generative models. Evaluations often ignore hallucinations, duplicates, and recommendations outside the closed item domain, focusing solely on accuracy metrics and neglecting the impact on beyond-accuracy facets. To bridge this gap, we propose a robust evaluation pipeline to assess ChatGPT's ability as an RS and post-process ChatGPT recommendations to account for these aspects. Through this pipeline, we investigate ChatGPT-3.5 and ChatGPT-4 performance in the recommendation task under the zero-shot condition employing a role-playing prompt. We analyze the model's functionality in three settings: Top-N recommendation, cold-start recommendation, and re-ranking of a list of recommendations, and in three domains: movies, music, and books. The experiments reveal that ChatGPT exhibits higher accuracy than the baselines on the books domain. It also excels in re-ranking and cold-start scenarios while maintaining reasonable beyond-accuracy metrics. Furthermore, we measure the similarity between the ChatGPT recommendations and those of other recommenders, providing insights about how ChatGPT could be categorized in the realm of recommender systems. The evaluation pipeline is publicly released for future research.
Updated: 2024-06-04 14:25:45
标题: 评估ChatGPT作为推荐系统:一种严谨的方法
摘要: 最近,大型语言模型(LLMs)已经展示出在处理各种自然语言相关任务方面的令人印象深刻的能力。在不同的LLMs中,当前研究评估了ChatGPT在多任务中的卓越表现,特别是在零/少次提示条件下。鉴于这样的成功,推荐系统(RSs)研究界开始研究其在推荐场景中的潜在应用。然而,尽管已经提出了各种方法来整合ChatGPT的能力到RSs中,但当前研究在考虑生成模型的特殊性时很难全面评估这些模型。评估通常不考虑幻觉、重复和超出封闭领域的推荐,而仅专注于准确性指标,忽视了对超出准确性的影响。为了弥合这一差距,我们提出了一个强大的评估流程,以评估ChatGPT作为RS的能力,并对ChatGPT的推荐进行后处理以考虑这些方面。通过这个流程,我们研究了ChatGPT-3.5和ChatGPT-4在零次条件下使用角色扮演提示进行推荐任务时的表现。我们分析了模型在三种设置下的功能性:Top-N推荐、冷启动推荐和重新排列推荐列表,并在三个领域中进行:电影、音乐和书籍。实验表明,ChatGPT在书籍领域的准确性比基线更高。它还在重新排列和冷启动场景中表现出色,同时保持合理的超出准确性指标。此外,我们衡量了ChatGPT推荐与其他推荐系统之间的相似性,从而提供了关于ChatGPT如何被归类于推荐系统领域的见解。这个评估流程已经公开发布供未来研究使用。
更新时间: 2024-06-04 14:25:45
领域: cs.IR,cs.AI,cs.CL
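A sketch of the post-processing step such a pipeline needs before computing metrics: dropping hallucinated items (absent from the catalog), duplicates, and items the user has already consumed. The matching-by-lowercased-title heuristic is an assumption.

```python
# Clean raw LLM recommendations before scoring; title matching is a
# simplifying assumption.
def postprocess(recommendations, catalog, user_history):
    known = {title.lower() for title in catalog}
    seen = {title.lower() for title in user_history}
    cleaned, used = [], set()
    for title in recommendations:
        key = title.strip().lower()
        if key not in known:             # hallucination: item not in the catalog
            continue
        if key in used or key in seen:   # duplicate / already consumed
            continue
        used.add(key)
        cleaned.append(title.strip())
    return cleaned

recs = ["Dune", "dune", "The Hobbit", "A Book That Does Not Exist"]
print(postprocess(recs, catalog=["Dune", "The Hobbit"], user_history=["The Hobbit"]))
# ['Dune']
```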
LlamaCare: A Large Medical Language Model for Enhancing Healthcare Knowledge Sharing
Large language models (LLMs) have shown amazing capabilities in knowledge memorization and presentation. However, when it comes to domain-specific knowledge and downstream tasks such as medicine, general LLMs are often unable to give precise answers. In addition, when people want LLMs to answer classification questions, they usually go through instruction tuning first; however, LLMs do not always give a direct index of the category after instruction tuning. In this paper, we propose LlamaCare, a fine-tuned medical language model, and Extended Classification Integration (ECI), a module to handle the classification problems of LLMs. Our contributions are: (i) We fine-tuned a large language model on medical knowledge with very low carbon emissions and achieved performance similar to ChatGPT using a 24 GB GPU. (ii) We solved the problem of redundant categorical answers and improved the performance of LLMs by proposing a new module called Extended Classification Integration. (iii) We released our processed data for one-shot and few-shot training on benchmarks such as PubMedQA and USMLE Steps 1-3. Our method achieves results close to the state-of-the-art model on benchmarks while using fewer GPU resources than LLMs with the same number of parameters. Our models, codes, and datasets can be found at https://github.com/Stephen-SMJ/LLamaCare
Updated: 2024-06-04 14:24:53
标题: “LlamaCare:用于增进医疗知识共享的大型医学语言模型”
摘要: 大语言模型(LLMs)在知识记忆和表现方面展现出了令人惊叹的能力。然而,当涉及特定领域知识和下游任务(如医学)时,通常通用LLMs无法给出精确答案。此外,当人们希望LLMs回答分类问题时,他们通常会首先进行指导调整,但是LLMs在指导调整后并不总是直接给出分类的指数。在本文中,我们提出了LlamaCare,一个经过精细调整的医学语言模型,以及Extended Classification Integration(ECI),一个用于处理LLMs分类问题的模块。我们的贡献包括:(i)我们使用非常低的碳排放量对医学知识的大语言模型进行了精细调整,并通过24G GPU实现了与ChatGPT相似的性能。(ii)我们通过提出一个名为Extended Classification Integration的新模块,解决了冗余的分类答案问题,并提高了LLMs的性能。(iii)我们发布了我们处理过的数据,用于一次性和少量训练,适用于一些基准测试,如PubMedQA和USMLE 1-3步骤。我们的方法在基准测试中取得了与最先进模型接近的效果,同时与具有相同参数数量的LLMs相比,花费更少的GPU资源。我们的模型、代码和数据集可以在https://github.com/Stephen-SMJ/LLamaCare找到。
更新时间: 2024-06-04 14:24:53
领域: cs.CL,cs.AI
CADE: Cosine Annealing Differential Evolution for Spiking Neural Network
Spiking neural networks (SNNs) have gained prominence for their potential in neuromorphic computing and energy-efficient artificial intelligence, yet optimizing them remains a formidable challenge for gradient-based methods due to their discrete, spike-based computation. This paper attempts to tackle the challenges by introducing Cosine Annealing Differential Evolution (CADE), designed to modulate the mutation factor (F) and crossover rate (CR) of differential evolution (DE) for the SNN model, i.e., Spiking Element Wise (SEW) ResNet. Extensive empirical evaluations were conducted to analyze CADE. CADE showed a balance in exploring and exploiting the search space, resulting in accelerated convergence and improved accuracy compared to existing gradient-based and DE-based methods. Moreover, an initialization method based on a transfer learning setting was developed, pretraining on a source dataset (i.e., CIFAR-10) and fine-tuning the target dataset (i.e., CIFAR-100), to improve population diversity. It was found to further enhance CADE for SNN. Remarkably, CADE elevates the performance of the highest accuracy SEW model by an additional 0.52 percentage points, underscoring its effectiveness in fine-tuning and enhancing SNNs. These findings emphasize the pivotal role of a scheduler for F and CR adjustment, especially for DE-based SNN. Source Code on Github: https://github.com/Tank-Jiang/CADE4SNN.
Updated: 2024-06-04 14:24:35
标题: CADE:余弦退火差分进化用于脉冲神经网络
摘要: 脉冲神经网络(SNN)因其在神经形态计算和节能人工智能方面的潜力而备受关注,然而,由于其离散的基于脉冲的计算,对其进行优化仍然是梯度下降方法面临的巨大挑战。本文尝试通过引入余弦退火差分进化(CADE)来解决这些挑战,CADE旨在调节差分进化(DE)模型中的突变因子(F)和交叉率(CR),用于SNN模型,即脉冲元素智能(SEW)ResNet。进行了大量实证评估来分析CADE。CADE显示出在探索和利用搜索空间方面的平衡,导致收敛加速和提高准确性,相较于现有的基于梯度和基于DE的方法。此外,开发了一种基于迁移学习设置的初始化方法,对源数据集(即CIFAR-10)进行预训练并对目标数据集(即CIFAR-100)进行微调,以提高种群多样性。发现这进一步增强了SNN的CADE。值得注意的是,CADE将最高准确率SEW模型的性能提高了额外的0.52个百分点,突显了其在微调和增强SNN方面的有效性。这些发现强调了对F和CR调整的调度程序在DE-based SNN中的关键作用。Github上的源代码:https://github.com/Tank-Jiang/CADE4SNN。
更新时间: 2024-06-04 14:24:35
领域: cs.NE,cs.AI,cs.CV
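A minimal sketch of the scheduling idea: a standard DE/rand/1 loop whose mutation factor F and crossover rate CR are cosine-annealed from exploratory to exploitative values. The bounds and the sphere objective are illustrative, and the transfer-learning initialization is omitted.

```python
# Cosine-annealed differential evolution (DE/rand/1) on a toy objective.
import numpy as np

def cosine_anneal(t, T, hi, lo):
    return lo + 0.5 * (hi - lo) * (1 + np.cos(np.pi * t / T))

def cade(objective, dim=10, pop=20, T=200, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(-5, 5, size=(pop, dim))
    fit = np.apply_along_axis(objective, 1, X)
    for t in range(T):
        F = cosine_anneal(t, T, hi=0.9, lo=0.4)    # mutation factor decays
        CR = cosine_anneal(t, T, hi=0.9, lo=0.1)   # crossover rate decays
        for i in range(pop):
            a, b, c = X[rng.choice([j for j in range(pop) if j != i], 3, replace=False)]
            mutant = a + F * (b - c)               # DE/rand/1 mutation
            cross = rng.random(dim) < CR
            cross[rng.integers(dim)] = True        # keep at least one mutant gene
            trial = np.where(cross, mutant, X[i])
            f_trial = objective(trial)
            if f_trial <= fit[i]:                  # greedy selection
                X[i], fit[i] = trial, f_trial
    return X[fit.argmin()], fit.min()

best_x, best_f = cade(lambda x: float((x ** 2).sum()))
print(best_f)  # approaches 0 on the sphere function
```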
AMOSL: Adaptive Modality-wise Structure Learning in Multi-view Graph Neural Networks For Enhanced Unified Representation
While Multi-view Graph Neural Networks (MVGNNs) excel at leveraging diverse modalities for learning object representations, existing methods assume identical local topology structures across modalities, overlooking real-world discrepancies. This leaves MVGNNs struggling with modality fusion and representation denoising. To address these issues, we propose adaptive modality-wise structure learning (AMoSL). AMoSL captures node correspondences between modalities via optimal transport and learns them jointly with graph embeddings. To enable efficient end-to-end training, we employ an efficient solution for the resulting complex bilevel optimization problem. Furthermore, AMoSL adapts to downstream tasks through unsupervised learning on inter-modality distances. The effectiveness of AMoSL is demonstrated by its ability to train more accurate graph classifiers on six benchmark datasets.
Updated: 2024-06-04 14:24:30
标题: AMOSL:多视图图神经网络中的自适应模态结构学习,以增强统一表示
摘要: 多视图图神经网络(MVGNNs)擅长利用多样的模态来学习对象表示,现有方法假设不同模态之间具有相同的局部拓扑结构,忽视了现实世界中的差异。这导致MVGNNs在模态融合和表示去噪方面存在困难。为了解决这些问题,我们提出了自适应模态结构学习(AMoSL)。AMoSL通过最优传输捕捉模态之间的节点对应关系,并与图嵌入一起进行联合学习。为了实现高效的端到端训练,我们采用了一个高效的解决方案来解决由此产生的复杂双层优化问题。此外,AMoSL通过对模态间距离进行无监督学习,适应下游任务。通过在六个基准数据集上训练更准确的图分类器,证明了AMoSL的有效性。
更新时间: 2024-06-04 14:24:30
领域: cs.LG
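A sketch of recovering soft node correspondences between two modalities with entropic optimal transport (the Sinkhorn iteration); the squared-distance cost over node features is an assumed stand-in for whatever cost AMoSL actually derives.

```python
# Soft node correspondence via entropic OT (Sinkhorn); cost is assumed.
import numpy as np

def sinkhorn(cost, eps=0.1, iters=200):
    n, m = cost.shape
    K = np.exp(-cost / eps)
    a, b = np.ones(n) / n, np.ones(m) / m   # uniform node masses
    v = np.ones(m)
    for _ in range(iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]      # transport plan (n, m)

rng = np.random.default_rng(0)
feat_a = rng.normal(size=(6, 16))                         # nodes of modality A
feat_b = feat_a[[2, 0, 1, 5, 4, 3]] + 0.01 * rng.normal(size=(6, 16))  # permuted view
cost = ((feat_a[:, None, :] - feat_b[None, :, :]) ** 2).sum(-1)
cost /= cost.max()                                        # normalize scale
plan = sinkhorn(cost)
print(plan.argmax(axis=1))                                # recovers [1 2 0 5 4 3]
```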
Can Watermarks Survive Translation? On the Cross-lingual Consistency of Text Watermark for Large Language Models
Text watermarking technology aims to tag and identify content produced by large language models (LLMs) to prevent misuse. In this study, we introduce the concept of cross-lingual consistency in text watermarking, which assesses the ability of text watermarks to maintain their effectiveness after being translated into other languages. Preliminary empirical results from two LLMs and three watermarking methods reveal that current text watermarking technologies lack consistency when texts are translated into various languages. Based on this observation, we propose a Cross-lingual Watermark Removal Attack (CWRA) to bypass watermarking by first obtaining a response from an LLM in a pivot language, which is then translated into the target language. CWRA can effectively remove watermarks, decreasing the AUCs to a random-guessing level without performance loss. Furthermore, we analyze two key factors that contribute to the cross-lingual consistency in text watermarking and propose X-SIR as a defense method against CWRA. Code: https://github.com/zwhe99/X-SIR.
Updated: 2024-06-04 14:24:15
标题: 水印能够在翻译过程中保留吗?关于大型语言模型文本水印的跨语言一致性
摘要: 文本水印技术旨在标记和识别由大型语言模型(LLMs)生成的内容,以防止滥用。在本研究中,我们介绍了跨语言一致性概念在文本水印中的应用,该概念评估了文本水印在被翻译成其他语言后保持有效性的能力。来自两个LLMs和三种水印方法的初步实证结果显示,当前的文本水印技术在文本被翻译成各种语言时缺乏一致性。基于这一观察,我们提出了一种跨语言水印移除攻击(CWRA),通过首先在一个中间语言中从LLM获取响应,然后将其翻译成目标语言来绕过水印。CWRA可以有效地删除水印,将AUC降低到随机猜测水平而无性能损失。此外,我们分析了影响文本水印跨语言一致性的两个关键因素,并提出X-SIR作为对抗CWRA的防御方法。源代码:https://github.com/zwhe99/X-SIR。
更新时间: 2024-06-04 14:24:15
领域: cs.CL,cs.AI
Flash Diffusion: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation
In this paper, we propose an efficient, fast, and versatile distillation method to accelerate the generation of pre-trained diffusion models: Flash Diffusion. The method reaches state-of-the-art performances in terms of FID and CLIP-Score for few steps image generation on the COCO2014 and COCO2017 datasets, while requiring only several GPU hours of training and fewer trainable parameters than existing methods. In addition to its efficiency, the versatility of the method is also exposed across several tasks such as text-to-image, inpainting, face-swapping, super-resolution and using different backbones such as UNet-based denoisers (SD1.5, SDXL) or DiT (Pixart-$\alpha$), as well as adapters. In all cases, the method allowed to reduce drastically the number of sampling steps while maintaining very high-quality image generation. The official implementation is available at https://github.com/gojasper/flash-diffusion.
Updated: 2024-06-04 14:23:27
标题: 闪存扩散:加速任何有条件扩散模型进行少步图像生成
摘要: 在这篇论文中,我们提出了一种高效、快速、多功能的蒸馏方法,用于加速预训练扩散模型的生成:Flash Diffusion。该方法在COCO2014和COCO2017数据集上通过少量步骤图像生成达到了最先进的FID和CLIP-Score表现,同时只需要几个GPU小时的训练时间和比现有方法更少的可训练参数。除了效率之外,该方法的多功能性也在多个任务中展现出来,如文本到图像、修复、人脸交换、超分辨率以及使用不同的骨干网络,如基于UNet的去噪器(SD1.5、SDXL)或DiT(Pixart-$\alpha$),以及适配器。在所有情况下,该方法可以大大减少采样步骤的数量,同时保持非常高质量的图像生成。官方实现可在https://github.com/gojasper/flash-diffusion 上找到。
更新时间: 2024-06-04 14:23:27
领域: cs.CV,cs.AI,cs.LG
Progressive Confident Masking Attention Network for Audio-Visual Segmentation
Audio and visual signals typically occur simultaneously, and humans possess an innate ability to correlate and synchronize information from these two modalities. Recently, a challenging problem known as Audio-Visual Segmentation (AVS) has emerged, which aims to produce segmentation maps for sounding objects within a scene. However, the methods proposed so far have not sufficiently integrated audio and visual information, and their computational costs have been extremely high. Additionally, the outputs of different stages have not been fully utilized. To facilitate this research, we introduce a novel Progressive Confident Masking Attention Network (PMCANet). It leverages attention mechanisms to uncover the intrinsic correlations between audio signals and visual frames. Furthermore, we design an efficient and effective cross-attention module that enhances semantic perception by selecting query tokens. This selection is determined through confidence-driven units based on the network's multi-stage predictive outputs. Experiments demonstrate that our network outperforms other AVS methods while requiring fewer computational resources.
Updated: 2024-06-04 14:21:41
标题: 渐进式自信掩蔽关注网络用于音视频分割
摘要: 声音和视觉信号通常同时发生,人类具有从这两种模态中协调和同步信息的天生能力。最近,出现了一个挑战性问题,被称为音频-视觉分割(AVS),旨在为场景中的发声对象生成分割地图。然而,迄今为止提出的方法并没有充分整合音频和视觉信息,并且计算成本极高。此外,不同阶段的输出尚未充分利用。为了促进这项研究,我们引入了一种新颖的渐进自信掩蔽注意力网络(PMCANet)。它利用注意机制来揭示音频信号和视觉帧之间的内在相关性。此外,我们设计了一个高效和有效的交叉注意力模块,通过选择查询标记来增强语义感知。这个选择是通过基于网络多阶段预测输出的置信驱动单元来确定的。实验表明,我们的网络在需要更少计算资源的情况下优于其他AVS方法。
更新时间: 2024-06-04 14:21:41
领域: cs.CV,cs.AI,cs.LG,cs.MM
Keyword-Guided Adaptation of Automatic Speech Recognition
Automatic Speech Recognition (ASR) technology has made significant progress in recent years, providing accurate transcription across various domains. However, some challenges remain, especially in noisy environments and specialized jargon. In this paper, we propose a novel approach for improved jargon word recognition by contextual biasing Whisper-based models. We employ a keyword spotting model that leverages the Whisper encoder representation to dynamically generate prompts for guiding the decoder during the transcription process. We introduce two approaches to effectively steer the decoder towards these prompts: KG-Whisper, which is aimed at fine-tuning the Whisper decoder, and KG-Whisper-PT, which learns a prompt prefix. Our results show a significant improvement in the recognition accuracy of specified keywords and in reducing the overall word error rates. Specifically, in unseen language generalization, we demonstrate an average WER improvement of 5.1% over Whisper.
Updated: 2024-06-04 14:20:38
Fields: eess.AS,cs.LG,cs.SD
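A rough sketch of the keyword-spotting-to-prompt pipeline described above, assuming (hypothetically) that keywords are flagged by cosine similarity between encoder frames and keyword embeddings; the paper's spotting model, threshold, and prompt format are not specified in the abstract.

```python
import torch
import torch.nn.functional as F

def spot_keywords(encoder_states, keyword_embs, keywords, threshold=0.6):
    """Flag keywords whose embedding is close to some encoder frame.
    encoder_states: (T, D) Whisper encoder outputs; keyword_embs: (K, D)
    embeddings of candidate jargon words. Illustrative only."""
    sims = F.normalize(encoder_states, dim=-1) @ F.normalize(keyword_embs, dim=-1).T
    hits = sims.max(dim=0).values > threshold     # best match per keyword
    return [kw for kw, h in zip(keywords, hits) if h]

def build_prompt(detected):
    # Detected keywords become a decoder prompt/prefix that biases decoding.
    return "Vocabulary: " + ", ".join(detected) if detected else ""
```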
Incorporating Navigation Context into Inland Vessel Trajectory Prediction: A Gaussian Mixture Model and Transformer Approach
Using data sources beyond the Automatic Identification System (AIS) to represent the context a vessel is navigating in, and thereby improve situational awareness, is still rare in machine learning approaches to vessel trajectory prediction (VTP). In inland shipping, where vessel movement is constrained within fairways, navigational context information is indispensable. In this contribution targeting inland VTP, Gaussian Mixture Models (GMMs) are applied to a fused dataset of AIS and discharge measurements to generate multi-modal distribution curves capturing typical lateral vessel positioning in the fairway and dislocation speeds along the waterway. By sampling the probability density curves of the GMMs, feature vectors are derived which are used, together with spatio-temporal vessel features and fairway geometries, as input to a VTP transformer model. Incorporating these distribution features of both the current and forthcoming navigation context improves prediction accuracy, and the model is shown to outperform a previously proposed transformer model for inland VTP. The novelty lies in the provision of preprocessed, statistics-based features representing the conditioned spatial context, rather than relying on the model to extract relevant features for the VTP task from contextual data. Oversimplifying the complexity of inland navigation patterns by assuming a single typical route or selecting specific clusters prior to model application is avoided by giving the model access to the entire distribution information. The methodology's generalizability is demonstrated using data from three distinct river sections. It can be integrated into an interaction-aware prediction framework, where insights into how the actual vessel behavior is positioned within the overall distribution at the current location and discharge can enhance trajectory prediction accuracy.
Updated: 2024-06-04 14:20:10
Fields: cs.LG
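The GMM-derived context features lend themselves to a short sketch: fit a mixture to observed lateral positions and sample its density on a fixed grid to obtain a feature vector. This is a minimal illustration with synthetic data; the paper additionally fuses discharge measurements and models dislocation speeds.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic lateral positions (metres from fairway centerline) with two
# typical traffic lanes; stand-ins for real AIS-derived observations.
rng = np.random.default_rng(0)
lateral_pos = np.concatenate([rng.normal(-15, 3, 500), rng.normal(20, 5, 500)])

gmm = GaussianMixture(n_components=2, random_state=0).fit(lateral_pos.reshape(-1, 1))

grid = np.linspace(-50, 50, 64).reshape(-1, 1)    # fixed evaluation grid
density = np.exp(gmm.score_samples(grid))         # multi-modal density curve
feature_vector = density / density.sum()          # normalized, fed to the transformer
```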
Cluster-Aware Similarity Diffusion for Instance Retrieval
Diffusion-based re-ranking is a common method used for retrieving instances by performing similarity propagation in a nearest neighbor graph. However, existing techniques that construct the affinity graph based on pairwise instances can lead to the propagation of misinformation from outliers and other manifolds, resulting in inaccurate results. To overcome this issue, we propose a novel Cluster-Aware Similarity (CAS) diffusion for instance retrieval. The primary concept of CAS is to conduct similarity diffusion within local clusters, which explicitly reduces the influence from other manifolds. To obtain a symmetrical and smooth similarity matrix, our Bidirectional Similarity Diffusion strategy introduces an inverse constraint term to the optimization objective of local cluster diffusion. Additionally, we have optimized a Neighbor-guided Similarity Smoothing approach to ensure similarity consistency among the local neighbors of each instance. Evaluations in instance retrieval and object re-identification validate the effectiveness of the proposed CAS; our code is publicly available.
Updated: 2024-06-04 14:19:50
Fields: cs.LG,cs.CV
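A sketch of similarity diffusion restricted to one local cluster, using the classic manifold-ranking update $f \leftarrow \alpha A f + (1-\alpha) y$; CAS's bidirectional inverse constraint and neighbor-guided smoothing are not reproduced here.

```python
import numpy as np

def cluster_diffusion(S, cluster_idx, y, alpha=0.9, iters=20):
    """Propagate query similarities only within one local cluster.
    S: (n, n) affinity matrix; cluster_idx: indices of cluster members;
    y: (n,) initial query-similarity scores. A sketch, not CAS itself."""
    A = S[np.ix_(cluster_idx, cluster_idx)]
    d = A.sum(axis=1)
    A = A / np.sqrt(np.outer(d, d) + 1e-12)    # symmetric normalization
    f = y[cluster_idx].copy()
    for _ in range(iters):
        f = alpha * A @ f + (1 - alpha) * y[cluster_idx]
    out = np.zeros_like(y)
    out[cluster_idx] = f                       # scores outside the cluster stay zero
    return out
```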
Exploring Effects of Hyperdimensional Vectors for Tsetlin Machines
Tsetlin machines (TMs) have been successful in several application domains, operating with high efficiency on Boolean representations of the input data. However, Booleanizing complex data structures such as sequences, graphs, images, signal spectra, chemical compounds, and natural language is not trivial. In this paper, we propose a hypervector (HV) based method for expressing arbitrarily large sets of concepts associated with any input data. Using a hyperdimensional space to build vectors drastically expands the capacity and flexibility of the TM. We demonstrate how images, chemical compounds, and natural language text are encoded according to the proposed method, and how the resulting HV-powered TM can achieve significantly higher accuracy and faster learning on well-known benchmarks. Our results open up a new research direction for TMs, namely how to expand and exploit the benefits of operating in hyperspace, including new booleanization strategies, optimization of TM inference and learning, as well as new TM applications.
Updated: 2024-06-04 14:16:52
Fields: cs.LG,cs.AI
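The hypervector booleanization idea above can be illustrated in a few lines: bundle random bipolar hypervectors for the concepts present in an input, then threshold the bundle into a Boolean vector a Tsetlin machine can consume. The codebook and encoder below follow generic hyperdimensional-computing conventions, not the paper's exact per-modality encoders.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 10_000                                   # hyperdimensional space

def random_hv():
    return rng.choice([-1, 1], size=D)       # bipolar atomic hypervector

# Hypothetical concept codebook, e.g. for chemical-compound features.
codebook = {c: random_hv() for c in ["ring", "aromatic", "chlorine"]}

def encode(concepts):
    """Bundle (element-wise sum) the concept hypervectors and booleanize
    by thresholding at zero, yielding a TM-ready Boolean input."""
    bundle = np.sum([codebook[c] for c in concepts], axis=0)
    return (bundle > 0).astype(np.uint8)

x = encode(["ring", "chlorine"])
```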
Understanding Heterophily for Graph Neural Networks
Graphs with heterophily have been regarded as challenging scenarios for Graph Neural Networks (GNNs), where nodes are connected with dissimilar neighbors through various patterns. In this paper, we present theoretical understandings of the impacts of different heterophily patterns for GNNs by incorporating the graph convolution (GC) operations into fully connected networks via the proposed Heterophilous Stochastic Block Models (HSBM), a general random graph model that can accommodate diverse heterophily patterns. Firstly, we show that by applying a GC operation, the separability gains are determined by two factors, i.e., the Euclidean distance of the neighborhood distributions and $\sqrt{\mathbb{E}\left[\operatorname{deg}\right]}$, where $\mathbb{E}\left[\operatorname{deg}\right]$ is the averaged node degree. It reveals that the impact of heterophily on classification needs to be evaluated alongside the averaged node degree. Secondly, we show that the topological noise has a detrimental impact on separability, which is equivalent to degrading $\mathbb{E}\left[\operatorname{deg}\right]$. Finally, when applying multiple GC operations, we show that the separability gains are determined by the normalized distance of the $l$-powered neighborhood distributions. It indicates that the nodes still possess separability as $l$ goes to infinity in a wide range of regimes. Extensive experiments on both synthetic and real-world data verify the effectiveness of our theory.
Updated: 2024-06-04 14:15:55
Fields: cs.LG,stat.ML
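The first theoretical result above reduces to a simple quantity, sketched below: the separability gain after one graph convolution scales with the Euclidean distance between the classes' neighborhood distributions times $\sqrt{\mathbb{E}\left[\operatorname{deg}\right]}$. The function illustrates these two factors only; it is not the paper's formal bound.

```python
import numpy as np

def separability_gain(neigh_dist_0, neigh_dist_1, degrees):
    """Proxy for post-GC separability: distance between the two classes'
    neighborhood distributions, scaled by sqrt of the average degree."""
    dist = np.linalg.norm(neigh_dist_0 - neigh_dist_1)
    return dist * np.sqrt(degrees.mean())

# Example: neighbors drawn 70/30 vs 30/70 from the two classes; the same
# heterophily pattern yields a larger gain on a denser graph.
gain = separability_gain(np.array([0.7, 0.3]), np.array([0.3, 0.7]),
                         degrees=np.array([8, 12, 10]))
```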
More PAC-Bayes bounds: From bounded losses, to losses with general tail behaviors, to anytime validity
In this paper, we present new high-probability PAC-Bayes bounds for different types of losses. Firstly, for losses with a bounded range, we recover a strengthened version of Catoni's bound that holds uniformly for all parameter values. This leads to new fast-rate and mixed-rate bounds that are interpretable and tighter than previous bounds in the literature. In particular, the fast-rate bound is equivalent to the Seeger--Langford bound. Secondly, for losses with more general tail behaviors, we introduce two new parameter-free bounds: a PAC-Bayes Chernoff analogue when the loss' cumulant generating function is bounded, and a bound when the loss' second moment is bounded. These two bounds are obtained using a new technique based on a discretization of the space of possible events for the "in probability" parameter optimization problem. This technique is both simpler and more general than previous approaches optimizing over a grid on the parameters' space. Finally, using a simple technique that is applicable to any existing bound, we extend all previous results to anytime-valid bounds.
Updated: 2024-06-04 14:09:44
Fields: stat.ML,cs.LG
Linguistic Fingerprint in Transformer Models: How Language Variation Influences Parameter Selection in Irony Detection
This paper explores the correlation between linguistic diversity, sentiment analysis and transformer model architectures. We aim to investigate how different English variations impact transformer-based models for irony detection. To conduct our study, we used the EPIC corpus to extract five diverse English variation-specific datasets and applied the KEN pruning algorithm on five different architectures. Our results reveal several similarities between optimal subnetworks, which provide insights into the linguistic variations that share strong resemblances and those that exhibit greater dissimilarities. We discovered that optimal subnetworks across models share at least 60% of their parameters, emphasizing the significance of parameter values in capturing and interpreting linguistic variations. This study highlights the inherent structural similarities between models trained on different variants of the same language and also the critical role of parameter values in capturing these nuances.
Updated: 2024-06-04 14:09:36
Fields: cs.CL,cs.AI
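The reported parameter sharing can be measured with a simple overlap statistic between pruning masks, sketched below under the assumption that each optimal subnetwork is represented as a boolean keep-mask; the paper's exact metric may differ.

```python
import numpy as np

def shared_parameter_fraction(mask_a, mask_b):
    """Fraction of retained parameters that two pruned subnetworks share.
    mask_a, mask_b: boolean keep-masks (stand-ins for KEN-pruned
    subnetworks of two English variations)."""
    kept_a, kept_b = mask_a.astype(bool), mask_b.astype(bool)
    return (kept_a & kept_b).sum() / max(kept_a.sum(), 1)

rng = np.random.default_rng(0)
overlap = shared_parameter_fraction(rng.random(10_000) > 0.5,
                                    rng.random(10_000) > 0.5)
```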
Polynomial-Augmented Neural Networks (PANNs) with Weak Orthogonality Constraints for Enhanced Function and PDE Approximation
We present polynomial-augmented neural networks (PANNs), a novel machine learning architecture that combines deep neural networks (DNNs) with a polynomial approximant. PANNs combine the strengths of DNNs (flexibility and efficiency in higher-dimensional approximation) with those of polynomial approximation (rapid convergence rates for smooth functions). To aid in both stable training and enhanced accuracy over a variety of problems, we present (1) a family of orthogonality constraints that impose mutual orthogonality between the polynomial and the DNN within a PANN; (2) a simple basis pruning approach to combat the curse of dimensionality introduced by the polynomial component; and (3) an adaptation of a polynomial preconditioning strategy to both DNNs and polynomials. We test the resulting architecture for its polynomial reproduction properties, ability to approximate both smooth functions and functions of limited smoothness, and as a method for the solution of partial differential equations (PDEs). Through these experiments, we demonstrate that PANNs offer superior approximation properties to DNNs for both regression and the numerical solution of PDEs, while also offering enhanced accuracy over both polynomial and DNN-based regression (each) when regressing functions with limited smoothness.
Updated: 2024-06-04 14:06:15
Fields: cs.LG,68T07, 68U99, 65N99
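One way to read the architecture above is as a sum of two components trained with an orthogonality penalty. The sketch below uses a squared batch inner product as that penalty, which is only one plausible member of the family of weak orthogonality constraints the paper proposes; all signatures are assumptions.

```python
import torch
import torch.nn.functional as F

def pann_loss(x, y, dnn, poly_basis, coeffs, lam=1e-2):
    """Sketch of a PANN objective: prediction = DNN(x) + polynomial(x),
    plus a penalty pushing the two components toward mutual orthogonality
    over the batch. poly_basis: callable x -> (B, M) polynomial features;
    coeffs: (M, 1) trainable polynomial coefficients."""
    p = poly_basis(x) @ coeffs        # polynomial component, (B, 1)
    n = dnn(x)                        # DNN component, (B, 1)
    fit = F.mse_loss(p + n, y)
    ortho = (p * n).mean() ** 2       # batch inner product driven to ~0
    return fit + lam * ortho
```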
Towards Neural Architecture Search for Transfer Learning in 6G Networks
The future 6G network is envisioned to be AI-native, and as such, ML models will be pervasive in support of optimizing performance, reducing energy consumption, and coping with increasing complexity and heterogeneity. A key challenge is automating the process of finding optimal model architectures satisfying stringent requirements stemming from varying tasks, dynamicity, and the resources available at the infrastructure and deployment positions. In this paper, we describe and review the state of the art in Neural Architecture Search and Transfer Learning and their applicability in networking. Further, we identify open research challenges and set directions with a specific focus on three main requirements with elements unique to the future network, namely combining NAS and TL, multi-objective search, and tabular data. Finally, we outline and discuss both near-term and long-term work ahead.
Updated: 2024-06-04 14:01:03
Fields: cs.NI,cs.AI,cs.LG
Extended Mind Transformers
Pre-trained language models demonstrate general intelligence and common sense, but long inputs quickly become a bottleneck for memorizing information at inference time. We resurface a simple method, Memorizing Transformers (Wu et al., 2022), that gives the model access to a bank of pre-computed memories. We show that it is possible to fix many of the shortcomings of the original method, such as the need for fine-tuning, by critically assessing how positional encodings should be updated for the keys and values retrieved. This intuitive method uses the model's own key/query system to select and attend to the most relevant memories at each generation step, rather than using external embeddings. We demonstrate the importance of external information being retrieved in a majority of decoder layers, contrary to previous work. We open source a new counterfactual long-range retrieval benchmark, and show that Extended Mind Transformers outperform today's state of the art by 6% on average.
Updated: 2024-06-04 14:00:25
Fields: cs.LG,cs.CL
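The retrieval mechanism above admits a compact single-head sketch: each query selects its top-k memories with the model's own key/query dot products and attends jointly over local context and retrieved memories. Positional-encoding handling for retrieved keys, which the paper identifies as critical, and causal masking are omitted here.

```python
import torch
import torch.nn.functional as F

def attend_with_memories(q, k, v, mem_k, mem_v, top_k=8):
    """Single-head sketch of memory-augmented attention.
    q, k, v: (T, D) local context; mem_k, mem_v: (M, D) pre-computed memory bank."""
    T, D = q.shape
    idx = (q @ mem_k.T).topk(top_k, dim=-1).indices          # (T, top_k) per-query retrieval
    keys = torch.cat([k.unsqueeze(0).expand(T, -1, -1), mem_k[idx]], dim=1)
    vals = torch.cat([v.unsqueeze(0).expand(T, -1, -1), mem_v[idx]], dim=1)
    scores = (q.unsqueeze(1) @ keys.transpose(1, 2)).squeeze(1) / D ** 0.5
    attn = F.softmax(scores, dim=-1)                          # (T, T_ctx + top_k)
    return (attn.unsqueeze(1) @ vals).squeeze(1)              # (T, D)
```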
Practical Performance Guarantees for Pipelined DNN Inference
We optimize pipeline parallelism for deep neural network (DNN) inference by partitioning model graphs into $k$ stages and minimizing the running time of the bottleneck stage, including communication. We give practical and effective algorithms for this NP-hard problem, but our emphasis is on tackling the practitioner's dilemma of deciding when a solution is good enough. To this end, we design novel mixed-integer programming (MIP) relaxations for proving lower bounds. Applying these methods to a diverse testbed of 369 production models, for $k \in \{2, 4, 8, 16, 32, 64\}$, we empirically show that these lower bounds are strong enough to be useful in practice. Our lower bounds are substantially stronger than standard combinatorial bounds. For example, evaluated via geometric means across a production testbed with $k = 16$ pipeline stages, our MIP formulations raise the lower bound from 0.4598 to 0.9452, expressed as a fraction of the best partition found. In other words, our improved lower bounds close the optimality gap by a factor of 9.855x.
Updated: 2024-06-04 13:58:30
Fields: cs.LG,cs.DC
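For intuition, the bottleneck-minimization objective can be solved exactly in the simplified case of a chain of layers with integer costs and no communication, via binary search over the bottleneck plus a greedy feasibility check; the paper itself handles general model graphs with communication and proves MIP-based lower bounds.

```python
def min_bottleneck_chain(costs, k):
    """Partition a chain of layer costs into at most k contiguous stages,
    minimizing the bottleneck stage cost. Integer costs assumed."""
    lo, hi = max(costs), sum(costs)

    def feasible(cap):
        stages, cur = 1, 0
        for c in costs:
            if cur + c > cap:                # start a new stage
                stages, cur = stages + 1, c
            else:
                cur += c
        return stages <= k

    while lo < hi:                           # smallest cap that fits in k stages
        mid = (lo + hi) // 2
        if feasible(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo

print(min_bottleneck_chain([4, 2, 7, 1, 5, 3], k=3))   # -> 8
```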
On Affine Homotopy between Language Encoders
Pre-trained language encoders -- functions that represent text as vectors -- are an integral component of many NLP tasks. We tackle a natural question in language encoder analysis: What does it mean for two encoders to be similar? We contend that a faithful measure of similarity needs to be intrinsic, that is, task-independent, yet still be informative of extrinsic similarity -- the performance on downstream tasks. It is common to consider two encoders similar if they are homotopic, i.e., if they can be aligned through some transformation. In this spirit, we study the properties of affine alignment of language encoders and its implications on extrinsic similarity. We find that while affine alignment is fundamentally an asymmetric notion of similarity, it is still informative of extrinsic similarity. We confirm this on datasets of natural language representations. Beyond providing useful bounds on extrinsic similarity, affine intrinsic similarity also allows us to begin uncovering the structure of the space of pre-trained encoders by defining an order over them.
Updated: 2024-06-04 13:58:28
Fields: cs.CL,cs.LG
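Affine alignment itself is easy to instantiate: fit the least-squares affine map from one encoder's representations to another's and read off the residual. Note the asymmetry the paper discusses: aligning X to Y generally differs from aligning Y to X.

```python
import numpy as np

def fit_affine_alignment(X, Y):
    """Least-squares affine map (W, b) taking encoder A's representations X
    onto encoder B's representations Y, i.e. min ||X W + b - Y||_F.
    A minimal instance of affine alignment, not the paper's full analysis."""
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])    # append bias column
    coef, *_ = np.linalg.lstsq(X1, Y, rcond=None)
    W, b = coef[:-1], coef[-1]
    residual = np.linalg.norm(X @ W + b - Y)         # alignment error
    return W, b, residual
```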
Continual Unsupervised Out-of-Distribution Detection
Deep learning models excel when the data distribution during training aligns with testing data. Yet, their performance diminishes when faced with out-of-distribution (OOD) samples, leading to great interest in the field of OOD detection. Current approaches typically assume that OOD samples originate from an unconcentrated distribution complementary to the training distribution. While this assumption is appropriate in the traditional unsupervised OOD (U-OOD) setting, it proves inadequate when considering the place of deployment of the underlying deep learning model. To better reflect this real-world scenario, we introduce the novel setting of continual U-OOD detection. To tackle this new setting, we propose a method that starts from a U-OOD detector, which is agnostic to the OOD distribution, and slowly updates during deployment to account for the actual OOD distribution. Our method uses a new U-OOD scoring function that combines the Mahalanobis distance with a nearest-neighbor approach. Furthermore, we design a confidence-scaled few-shot OOD detector that outperforms previous methods. We show our method greatly improves upon strong baselines from related fields.
Updated: 2024-06-04 13:57:34
Fields: cs.CV,cs.LG
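A minimal sketch of a score combining the Mahalanobis distance with a nearest-neighbor term, as described above; the combination rule (a plain sum here) and the choice of k are assumptions rather than the paper's exact scoring function.

```python
import numpy as np

def uood_score(x, train_feats, mean, cov_inv, k=10):
    """Higher score = more likely out-of-distribution.
    x: (D,) test feature; train_feats: (N, D) in-distribution features;
    mean, cov_inv: Gaussian fit of the training features."""
    d = x - mean
    maha = float(np.sqrt(d @ cov_inv @ d))               # global Gaussian term
    dists = np.linalg.norm(train_feats - x, axis=1)
    knn = float(np.sort(dists)[k - 1])                   # local k-NN term
    return maha + knn
```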
Technical Language Processing for Telecommunications Specifications
Large Language Models (LLMs) are continuously being applied in a more diverse set of contexts. At their current state, however, even state-of-the-art LLMs such as Generative Pre-Trained Transformer 4 (GPT-4) have challenges when extracting information from real-world technical documentation without heavy preprocessing. One such area with real-world technical documentation is telecommunications engineering, which could greatly benefit from domain-specific LLMs. The unique format and overall structure of telecommunications internal specifications differ greatly from standard English, and thus it is evident that the application of out-of-the-box Natural Language Processing (NLP) tools is not a viable option. In this article, we outline the limitations of out-of-the-box NLP tools for processing technical information generated by telecommunications experts, and expand the concept of Technical Language Processing (TLP) to the telecommunication domain. Additionally, we explore the effect of domain-specific LLMs in the work of Specification Engineers, emphasizing the potential benefits of adopting domain-specific LLMs to speed up the training of experts in different telecommunications fields.
Updated: 2024-06-04 13:57:22
Fields: cs.CL,cs.AI
Rethinking the Vulnerabilities of Face Recognition Systems: From a Practical Perspective
Face Recognition Systems (FRS) have increasingly integrated into critical applications, including surveillance and user authentication, highlighting their pivotal role in modern security systems. Recent studies have revealed vulnerabilities in FRS to adversarial (e.g., adversarial patch attacks) and backdoor attacks (e.g., training data poisoning), raising significant concerns about their reliability and trustworthiness. Previous studies primarily focus on traditional adversarial or backdoor attacks, overlooking the resource-intensive or privileged-manipulation nature of such threats, thus limiting their practical generalization, stealthiness, universality, and robustness. Correspondingly, in this paper, we delve into the inherent vulnerabilities in FRS through user studies and preliminary explorations. By exploiting these vulnerabilities, we identify a novel attack, a facial identity backdoor attack dubbed FIBA, which unveils a potentially more devastating threat against FRS: an enrollment-stage backdoor attack. FIBA circumvents the limitations of traditional attacks, enabling broad-scale disruption by allowing any attacker donning a specific trigger to bypass these systems. This implies that after a single poisoned example is inserted into the database, the corresponding trigger becomes a universal key for any attackers to spoof the FRS. This strategy essentially challenges conventional attacks by initiating at the enrollment stage, dramatically transforming the threat landscape by poisoning the feature database rather than the training data.
Updated: 2024-06-04 13:56:27
Fields: cs.CR
MEDIQ: Question-Asking LLMs for Adaptive and Reliable Clinical Reasoning
In high-stakes domains like clinical reasoning, AI assistants powered by large language models (LLMs) are yet to be reliable and safe. We identify a key obstacle towards reliability: existing LLMs are trained to answer any question, even with incomplete context in the prompt or insufficient parametric knowledge. We propose to change this paradigm to develop more careful LLMs that ask follow-up questions to gather necessary and sufficient information and respond reliably. We introduce MEDIQ, a framework to simulate realistic clinical interactions, which incorporates a Patient System and an adaptive Expert System. The Patient may provide incomplete information in the beginning; the Expert refrains from making diagnostic decisions when unconfident, and instead elicits missing details from the Patient via follow-up questions. To evaluate MEDIQ, we convert MEDQA and CRAFT-MD -- medical benchmarks for diagnostic question answering -- into an interactive setup. We develop a reliable Patient system and prototype several Expert systems, first showing that directly prompting state-of-the-art LLMs to ask questions degrades the quality of clinical reasoning, indicating that adapting LLMs to interactive information-seeking settings is nontrivial. We then augment the Expert with a novel abstention module to better estimate model confidence and decide whether to ask more questions, thereby improving diagnostic accuracy by 20.3%; however, performance still lags compared to an (unrealistic in practice) upper bound when full information is given upfront. Further analyses reveal that interactive performance can be improved by filtering irrelevant contexts and reformatting conversations. Overall, our paper introduces a novel problem towards LLM reliability, a novel MEDIQ framework, and highlights important future directions to extend the information-seeking abilities of LLM assistants in critical domains.
Updated: 2024-06-04 13:55:05
Fields: cs.CL,cs.AI
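The Expert's abstention behavior can be sketched as a simple loop: answer only once estimated confidence clears a threshold, otherwise ask a follow-up question. The `answer_fn`, `confidence_fn`, and `question_fn` callables are hypothetical LLM wrappers; MEDIQ's abstention module estimates confidence differently.

```python
def expert_loop(patient_reply, answer_fn, confidence_fn, question_fn,
                tau=0.8, max_turns=5):
    """Abstention-style Expert loop over an information-seeking dialogue.
    patient_reply: callable question -> patient answer (the Patient system)."""
    context = []
    for _ in range(max_turns):
        if confidence_fn(context) >= tau:
            return answer_fn(context)            # confident enough: diagnose
        q = question_fn(context)                 # unconfident: elicit details
        context.append((q, patient_reply(q)))
    return answer_fn(context)                    # question budget exhausted
```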
A Survey of Transformer Enabled Time Series Synthesis
Generative AI has received much attention in the image and language domains, with the transformer neural network continuing to dominate the state of the art. Application of these models to time series generation is less explored, however, and is of great utility to machine learning, privacy preservation, and explainability research. The present survey identifies this gap at the intersection of the transformer, generative AI, and time series data, and reviews works in this sparsely populated subdomain. The reviewed works show great variety in approach, and have not yet converged on a conclusive answer to the problems the domain poses. GANs, diffusion models, state space models, and autoencoders were all encountered alongside or surrounding the transformers that originally motivated the survey. While the domain is too open to offer conclusive insights, the works surveyed are quite suggestive, and we provide several recommendations for best practice and suggestions of valuable future work.
Updated: 2024-06-04 13:52:42
Fields: cs.LG,cs.AI
Trust the Model Where It Trusts Itself -- Model-Based Actor-Critic with Uncertainty-Aware Rollout Adaption
Dyna-style model-based reinforcement learning (MBRL) combines model-free agents with predictive transition models through model-based rollouts. This combination raises a critical question: 'When to trust your model?'; i.e., which rollout length results in the model providing useful data? Janner et al. (2019) address this question by gradually increasing rollout lengths throughout the training. While theoretically tempting, uniform model accuracy is a fallacy that collapses at the latest when extrapolating. Instead, we propose asking the question 'Where to trust your model?'. Using inherent model uncertainty to consider local accuracy, we obtain the Model-Based Actor-Critic with Uncertainty-Aware Rollout Adaption (MACURA) algorithm. We propose an easy-to-tune rollout mechanism and demonstrate substantial improvements in data efficiency and performance compared to state-of-the-art deep MBRL methods on the MuJoCo benchmark.
Updated: 2024-06-04 13:51:10
Fields: cs.LG
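The "where to trust your model" idea can be sketched with an ensemble world model whose disagreement acts as a local uncertainty signal terminating the rollout; the stopping rule and constants below are illustrative assumptions, not MACURA's actual criterion.

```python
import numpy as np

def adaptive_rollout(ensemble, policy, s, max_len=20, kappa=1.5):
    """Model-based rollout that stops where the model stops trusting itself.
    ensemble: list of dynamics models m(s, a) -> next state; disagreement
    among them serves as the local uncertainty proxy."""
    traj, base = [], None
    for _ in range(max_len):
        a = policy(s)
        preds = np.stack([m(s, a) for m in ensemble])    # (E, state_dim)
        unc = preds.std(axis=0).mean()                   # ensemble disagreement
        base = unc if base is None else 0.9 * base + 0.1 * unc
        if unc > kappa * base:
            break                                        # model no longer trusted here
        traj.append((s, a))
        s = preds.mean(axis=0)
    return traj
```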
PeFAD: A Parameter-Efficient Federated Framework for Time Series Anomaly Detection
With the proliferation of mobile sensing techniques, huge amounts of time series data are generated and accumulated in various domains, fueling plenty of real-world applications. In this setting, time series anomaly detection is practically important. It endeavors to identify deviant samples from the normal sample distribution in time series. Existing approaches generally assume that all the time series is available at a central location. However, we are witnessing the decentralized collection of time series due to the deployment of various edge devices. To bridge the gap between the decentralized time series data and the centralized anomaly detection algorithms, and motivated by increasing privacy concerns, we propose a Parameter-efficient Federated Anomaly Detection framework named PeFAD. PeFAD for the first time employs the pre-trained language model (PLM) as the body of the client's local model, which can benefit from its cross-modality knowledge transfer capability. To reduce the communication overhead and local model adaptation cost, we propose a parameter-efficient federated training module such that clients only need to fine-tune small-scale parameters and transmit them to the server for update. PeFAD utilizes a novel anomaly-driven mask selection strategy to mitigate the impact of neglected anomalies during training. A knowledge distillation operation on a synthetic privacy-preserving dataset that is shared by all the clients is also proposed to address the data heterogeneity issue across clients. We conduct extensive evaluations on four real datasets, where PeFAD outperforms existing state-of-the-art baselines by up to 28.74%.
Updated: 2024-06-04 13:51:08
Fields: cs.LG,cs.DB,cs.DC
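The parameter-efficient federated training module reduces, in sketch form, to FedAvg over small adapter weights while the PLM body stays frozen. `client.finetune` is a hypothetical client API; the anomaly-driven masking and distillation components are omitted.

```python
import torch

def federated_round(server_adapter, clients, local_steps=1):
    """One parameter-efficient federated round: each client fine-tunes only
    a small adapter on top of the frozen pre-trained language model and
    sends just those weights; the server averages them (FedAvg)."""
    updates = []
    for client in clients:
        adapter = {k: v.clone() for k, v in server_adapter.items()}
        client.finetune(adapter, steps=local_steps)      # PLM body stays frozen
        updates.append(adapter)                          # only small params transmitted
    return {k: torch.stack([u[k] for u in updates]).mean(dim=0)
            for k in server_adapter}
```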
Generative Conditional Distributions by Neural (Entropic) Optimal Transport
Learning conditional distributions is challenging because the desired outcome is not a single distribution but multiple distributions that correspond to multiple instances of the covariates. We introduce a novel neural entropic optimal transport method designed to effectively learn generative models of conditional distributions, particularly in scenarios characterized by limited sample sizes. Our method relies on the minimax training of two neural networks: a generative network parametrizing the inverse cumulative distribution functions of the conditional distributions and another network parametrizing the conditional Kantorovich potential. To prevent overfitting, we regularize the objective function by penalizing the Lipschitz constant of the network output. Our experiments on real-world datasets show the effectiveness of our algorithm compared to state-of-the-art conditional distribution learning techniques. Our implementation can be found at https://github.com/nguyenngocbaocmt02/GENTLE.
Updated: 2024-06-04 13:45:35
Fields: cs.LG,cs.AI,stat.ML
Fast and Secure Decentralized Optimistic Rollups Using Setchain
Modern blockchains face a scalability challenge due to the intrinsic throughput limitations of consensus protocols. Layer 2 optimistic rollups (L2) are a faster alternative that offer the same interface in terms of smart contract development and user interaction. Optimistic rollups perform most computations offchain and make light use of an underlying blockchain (L1) to guarantee correct behavior, implementing a cheaper blockchain on a blockchain solution. With optimistic rollups, a sequencer calculates offchain batches of L2 transactions and commits batches (compressed or hashed) to the L1 blockchain. The use of hashes requires a data service to translate hashes into their corresponding batches. Current L2 implementations consist of a centralized sequencer (central authority) and an optional data availability committee (DAC). In this paper, we propose a decentralized L2 optimistic rollup based on Setchain, a decentralized Byzantine-tolerant implementation of sets. The main contribution is a fully decentralized "arranger" where arrangers are a formal definition combining sequencers and DACs. We prove our implementation correct and show empirical evidence that our solution scales. A final contribution is a system of incentives (payments) for servers that implement the sequencer and data availability committee protocols correctly, and a fraud-proof mechanism to detect violations of the protocol.
Updated: 2024-06-04 13:45:12
Fields: cs.CR,cs.DC,cs.LO
An Independence-promoting Loss for Music Generation with Language Models
Music generation schemes using language modeling rely on a vocabulary of audio tokens, generally provided as codes in a discrete latent space learnt by an auto-encoder. Multi-stage quantizers are often employed to produce these tokens, therefore the decoding strategy used for token prediction must be adapted to account for multiple codebooks: either it should model the joint distribution over all codebooks, or fit the product of the codebook marginal distributions. Modelling the joint distribution requires a costly increase in the number of auto-regressive steps, while fitting the product of the marginals yields an inexact model unless the codebooks are mutually independent. In this work, we introduce an independence-promoting loss to regularize the auto-encoder used as the tokenizer in language models for music generation. The proposed loss is a proxy for mutual information based on the maximum mean discrepancy principle, applied in reproducible kernel Hilbert spaces. Our criterion is simple to implement and train, and it is generalizable to other multi-stream codecs. We show that it reduces the statistical dependence between codebooks during auto-encoding. This leads to an increase in the generated music quality when modelling the product of the marginal distributions, while generating audio much faster than the joint distribution model.
Updated: 2024-06-04 13:44:39
Fields: cs.SD,cs.AI,cs.LG,eess.AS
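An independence-promoting penalty in the spirit of the paper can be sketched by comparing joint samples of two codebook streams against samples with one stream shuffled (an empirical stand-in for the product of marginals) under a maximum-mean-discrepancy criterion; the paper's exact kernel-space formulation may differ.

```python
import torch

def rbf_mmd2(x, y, sigma=1.0):
    """Biased squared MMD with an RBF kernel."""
    def k(a, b):
        return torch.exp(-torch.cdist(a, b) ** 2 / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

def independence_penalty(z1, z2):
    """Penalize statistical dependence between two codebook streams.
    z1, z2: (B, D) latent features of two codebooks for the same inputs."""
    joint = torch.cat([z1, z2], dim=1)
    perm = torch.randperm(z2.size(0))
    product = torch.cat([z1, z2[perm]], dim=1)   # shuffling breaks the coupling
    return rbf_mmd2(joint, product)              # ~0 iff streams are independent
```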
Neural Thermodynamic Integration: Free Energies from Energy-based Diffusion Models
Thermodynamic integration (TI) offers a rigorous method for estimating free-energy differences by integrating over a sequence of interpolating conformational ensembles. However, TI calculations are computationally expensive and typically limited to coupling a small number of degrees of freedom due to the need to sample numerous intermediate ensembles with sufficient conformational-space overlap. In this work, we propose to perform TI along an alchemical pathway represented by a trainable neural network, which we term Neural TI. Critically, we parametrize a time-dependent Hamiltonian interpolating between the interacting and non-interacting systems, and optimize its gradient using a denoising-diffusion objective. The ability of the resulting energy-based diffusion model to sample all intermediate ensembles, allows us to perform TI from a single reference calculation. We apply our method to Lennard-Jones fluids, where we report accurate calculations of the excess chemical potential, demonstrating that Neural TI is capable of coupling hundreds of degrees of freedom at once.
Updated: 2024-06-04 13:42:42
Fields: cond-mat.stat-mech,cs.LG
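For reference, classical TI computes the free-energy difference as $\Delta F = \int_0^1 \langle \partial H / \partial \lambda \rangle_\lambda \, d\lambda$, approximated below by trapezoidal quadrature over precomputed ensemble averages; Neural TI's contribution is replacing the many intermediate simulations with samples from one learned diffusion model.

```python
import numpy as np

def thermodynamic_integration(mean_dH_dlam, lambdas):
    """Plain TI: integrate the ensemble average of dH/dlambda over the
    coupling parameter by trapezoidal quadrature."""
    return np.trapz(mean_dH_dlam, lambdas)

lambdas = np.linspace(0.0, 1.0, 11)
mean_dH_dlam = np.exp(-lambdas)        # placeholder ensemble averages
delta_F = thermodynamic_integration(mean_dH_dlam, lambdas)
```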
In value-based deep reinforcement learning, a pruned network is a good network
Recent work has shown that deep reinforcement learning agents have difficulty in effectively using their network parameters. We leverage prior insights into the advantages of sparse training techniques and demonstrate that gradual magnitude pruning enables value-based agents to maximize parameter effectiveness. This results in networks that yield dramatic performance improvements over traditional networks, using only a small fraction of the full network parameters.
Updated: 2024-06-04 13:42:06
Fields: cs.LG,cs.AI
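Gradual magnitude pruning is commonly implemented with the cubic sparsity ramp of Zhu & Gupta (2017) followed by thresholding the smallest-magnitude weights, as sketched below; whether the paper uses exactly this schedule is an assumption.

```python
import numpy as np

def sparsity_schedule(step, start, end, final_sparsity):
    """Cubic ramp from 0 to final_sparsity between `start` and `end` steps."""
    if step < start:
        return 0.0
    t = min(1.0, (step - start) / (end - start))
    return final_sparsity * (1 - (1 - t) ** 3)

def prune_by_magnitude(weights, sparsity):
    """Boolean mask keeping the largest-magnitude weights at the given sparsity."""
    k = int(sparsity * weights.size)
    if k == 0:
        return np.ones_like(weights, dtype=bool)
    thresh = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    return np.abs(weights) > thresh
```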
Disentangled Representation via Variational AutoEncoder for Continuous Treatment Effect Estimation
Continuous treatment effect estimation holds significant practical importance across various decision-making and assessment domains, such as healthcare and the military. However, current methods for estimating dose-response curves hinge on balancing the entire representation by treating all covariates as confounding variables. Although various approaches disentangle covariates into different factors for treatment effect estimation, they are confined to binary treatment settings. Moreover, observational data are often tainted with non-causal noise information that is imperceptible to humans. Hence, in this paper, we propose a novel Dose-Response curve estimator via Variational AutoEncoder (DRVAE) with disentangled covariate representations. Our model is dedicated to disentangling covariates into instrumental factors, confounding factors, adjustment factors, and external noise factors, thereby facilitating the estimation of treatment effects under continuous treatment settings by balancing the disentangled confounding factors. Extensive results on synthetic and semi-synthetic datasets demonstrate that our model outperforms the current state-of-the-art methods.
Updated: 2024-06-04 13:41:07
Fields: cs.LG
Effects of Exponential Gaussian Distribution on (Double Sampling) Randomized Smoothing
Randomized Smoothing (RS) is currently a scalable certified defense method providing robustness certification against adversarial examples. Although significant progress has been achieved in providing defenses against $\ell_p$ adversaries, the interaction between the smoothing distribution and the robustness certification still remains vague. In this work, we comprehensively study the effect of two families of distributions, named Exponential Standard Gaussian (ESG) and Exponential General Gaussian (EGG) distributions, on Randomized Smoothing and Double Sampling Randomized Smoothing (DSRS). We derive an analytic formula for ESG's certified radius, which converges to the origin formula of RS as the dimension $d$ increases. Additionally, we prove that EGG can provide tighter constant factors than DSRS in providing $\Omega(\sqrt{d})$ lower bounds of $\ell_2$ certified radius, and thus further addresses the curse of dimensionality in RS. Our experiments on real-world datasets confirm our theoretical analysis of the ESG distributions, that they provide almost the same certification under different exponents $\eta$ for both RS and DSRS. In addition, EGG
Updated: 2024-06-04 13:41:00
Fields: cs.LG
Dynamics Harmonic Analysis of Robotic Systems: Application in Data-Driven Koopman Modelling
We introduce the use of harmonic analysis to decompose the state space of symmetric robotic systems into orthogonal isotypic subspaces. These are lower-dimensional spaces that capture distinct, symmetric, and synergistic motions. For linear dynamics, we characterize how this decomposition leads to a subdivision of the dynamics into independent linear systems on each subspace, a property we term dynamics harmonic analysis (DHA). To exploit this property, we use Koopman operator theory to propose an equivariant deep-learning architecture that leverages the properties of DHA to learn a global linear model of the system dynamics. Our architecture, validated on synthetic systems and the dynamics of locomotion of a quadrupedal robot, exhibits enhanced generalization, sample efficiency, and interpretability, with fewer trainable parameters and computational costs.
Updated: 2024-06-04 13:39:17
Fields: cs.RO,cs.AI,cs.LG,cs.SY,eess.SY,43-08
Mixtures of Experts Unlock Parameter Scaling for Deep RL
The recent rapid progress in (self) supervised learning models is in large part predicted by empirical scaling laws: a model's performance scales proportionally to its size. Analogous scaling laws remain elusive for reinforcement learning domains, however, where increasing the parameter count of a model often hurts its final performance. In this paper, we demonstrate that incorporating Mixture-of-Expert (MoE) modules, and in particular Soft MoEs (Puigcerver et al., 2023), into value-based networks results in more parameter-scalable models, evidenced by substantial performance increases across a variety of training regimes and model sizes. This work thus provides strong empirical evidence towards developing scaling laws for reinforcement learning.
Updated: 2024-06-04 13:36:10
Fields: cs.LG,cs.AI
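A minimal Soft MoE layer (after Puigcerver et al., 2023) of the kind inserted into the value networks above: slots are softmax-weighted mixes of all tokens, experts process their slots, and tokens recombine the slot outputs. Dimensions and the expert MLPs below are placeholder choices.

```python
import torch
import torch.nn as nn

class SoftMoE(nn.Module):
    """Sketch of a Soft MoE layer; not the paper's exact configuration."""
    def __init__(self, dim, n_experts=4, slots_per_expert=1):
        super().__init__()
        self.n_slots = n_experts * slots_per_expert
        self.phi = nn.Parameter(torch.randn(dim, self.n_slots) * dim ** -0.5)
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
             for _ in range(n_experts)])

    def forward(self, x):                       # x: (B, T, D)
        logits = x @ self.phi                   # (B, T, S)
        dispatch = logits.softmax(dim=1)        # normalize over tokens
        combine = logits.softmax(dim=2)         # normalize over slots
        slots = dispatch.transpose(1, 2) @ x    # (B, S, D) slot inputs
        chunks = slots.chunk(len(self.experts), dim=1)
        outs = torch.cat([e(c) for e, c in zip(self.experts, chunks)], dim=1)
        return combine @ outs                   # (B, T, D)
```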
CantonMT: Cantonese to English NMT Platform with Fine-Tuned Models Using Synthetic Back-Translation Data
Neural Machine Translation (NMT) for low-resource languages is still a challenging task in front of NLP researchers. In this work, we deploy a standard data augmentation methodology, back-translation, to a new language translation direction, Cantonese-to-English. We present the models we fine-tuned using the limited amount of real data and the synthetic data we generated using back-translation, including OpusMT, NLLB, and mBART. We carried out automatic evaluation using a range of different metrics, including lexical-based and embedding-based ones. Furthermore, we create a user-friendly interface for the models included in this CantonMT research project and make it available to facilitate Cantonese-to-English MT research. Researchers can add more models to this platform via our open-source CantonMT toolkit: https://github.com/kenrickkung/CantoneseTranslation.
Updated: 2024-06-04 13:31:20
Fields: cs.CL,cs.AI
Node-Level Topological Representation Learning on Point Clouds
Topological Data Analysis (TDA) allows us to extract powerful topological and higher-order information on the global shape of a data set or point cloud. Tools like Persistent Homology or the Euler Transform give a single complex description of the global structure of the point cloud. However, common machine learning applications like classification require point-level information and features to be available. In this paper, we bridge this gap and propose a novel method to extract node-level topological features from complex point clouds using discrete variants of concepts from algebraic topology and differential geometry. We verify the effectiveness of these topological point features (TOPF) on both synthetic and real-world data and study their robustness under noise.
Updated: 2024-06-04 13:29:12
Fields: math.AT,cs.CG,cs.LG
Text Embedding Inversion Security for Multilingual Language Models
Textual data is often represented as real-numbered embeddings in NLP, particularly with the popularity of large language models (LLMs) and Embeddings as a Service (EaaS). However, storing sensitive information as embeddings can be susceptible to security breaches, as research shows that text can be reconstructed from embeddings, even without knowledge of the underlying model. While defence mechanisms have been explored, these are exclusively focused on English, leaving other languages potentially exposed to attacks. This work explores LLM security through multilingual embedding inversion. We define the problem of black-box multilingual and cross-lingual inversion attacks, and explore their potential implications. Our findings suggest that multilingual LLMs may be more vulnerable to inversion attacks, in part because English-based defences may be ineffective. To alleviate this, we propose a simple masking defense effective for both monolingual and multilingual models. This study is the first to investigate multilingual inversion attacks, shedding light on the differences in attacks and defenses across monolingual and multilingual settings.
Updated: 2024-06-04 13:28:10
Fields: cs.CL,cs.AI,cs.CR
TreePIR: Efficient Private Retrieval of Merkle Proofs via Tree Colorings with Fast Indexing and Zero Storage Overhead
A Batch Private Information Retrieval (batch-PIR) scheme allows a client to retrieve multiple data items from a database without revealing them to the storage server(s). Most existing approaches for batch-PIR are based on batch codes, in particular, probabilistic batch codes (PBC) (Angel et al. S&P'18), which incur large storage overheads. In this work, we show that zero storage overhead is achievable for tree-shaped databases. In particular, we develop TreePIR, a novel approach tailor-made for private retrieval of the set of nodes along an arbitrary root-to-leaf path in a Merkle tree with no storage redundancy. Trees of this type have been widely implemented in many real-world systems such as Amazon DynamoDB, Google's Certificate Transparency, and blockchains. The tree nodes along a root-to-leaf path form the well-known Merkle proof. TreePIR, which employs a novel tree coloring, outperforms PBC, a fundamental component in state-of-the-art batch-PIR schemes (Angel et al. S&P'18, Mughees-Ren S&P'23, Liu et al. S&P'24), in all metrics, achieving $3\times$ lower total storage and $1.5$-$2\times$ lower computation and communication costs. Most notably, TreePIR has $8$-$160\times$ lower setup time and its polylog-complexity indexing algorithm is $19$-$160\times$ faster than PBC for trees of $2^{10}$-$2^{24}$ leaves.
Updated: 2024-06-04 13:24:18
Fields: cs.DS,cs.CR,math.CO,05C05, 05C15, 05C85, 05C90, G.2.2; F.2.0; E.1
Solving Partial Differential Equations in Different Domains by Operator Learning method Based on Boundary Integral Equations
This article explores operator learning models that can deduce solutions to partial differential equations (PDEs) on arbitrary domains without requiring retraining. We introduce two innovative models rooted in boundary integral equations (BIEs): the Boundary Integral Type Deep Operator Network (BI-DeepONet) and the Boundary Integral Trigonometric Deep Operator Neural Network (BI-TDONet), which are crafted to address PDEs across diverse domains. Once fully trained, these BIE-based models adeptly predict the solutions of PDEs in any domain without the need for additional training. BI-TDONet notably enhances its performance by employing the singular value decomposition (SVD) of bounded linear operators, allowing for the efficient distribution of input functions across its modules. Furthermore, to tackle the issue of function sampling values that do not effectively capture oscillatory and impulse signal characteristics, trigonometric coefficients are utilized as both inputs and outputs in BI-TDONet. Our numerical experiments robustly support and confirm the efficacy of this theoretical framework.
Updated: 2024-06-04 13:19:06
Fields: math-ph,cs.LG,math.MP
Learning-Rate-Free Stochastic Optimization over Riemannian Manifolds
In recent years, interest in gradient-based optimization over Riemannian manifolds has surged. However, a significant challenge lies in the reliance on hyperparameters, especially the learning rate, which requires meticulous tuning by practitioners to ensure convergence at a suitable rate. In this work, we introduce innovative learning-rate-free algorithms for stochastic optimization over Riemannian manifolds, eliminating the need for hand-tuning and providing a more robust and user-friendly approach. We establish high probability convergence guarantees that are optimal, up to logarithmic factors, compared to the best-known optimally tuned rate in the deterministic setting. Our approach is validated through numerical experiments, demonstrating competitive performance against learning-rate-dependent algorithms.
Updated: 2024-06-04 13:17:24
Fields: cs.LG,math.OC
Autonomous Adaptive Security Framework for 5G-Enabled IoT
In IoT-based critical sectors, 5G can provide more rapid connection speeds, lower latency, faster downloads, and the capability to connect more devices due to the introduction of new dynamics such as softwarization and virtualization. These same dynamics, however, increase the vulnerability of 5G-enabled IoT networks to security threats. Consequently, adaptive cybersecurity solutions need to be developed for 5G-enabled IoT applications to protect them against potential cyber-attacks. This task specifies new adaptive strategies of security intelligence with associated scenarios to meet the challenges of 5G-IoT characteristics. In this task we have also developed an autonomous adaptive security framework which can protect 5G-enabled IoT dynamically and autonomously. The framework is based on a closed feedback loop of advanced analytics to monitor, analyse, and adapt to evolving threats to 5G-enabled IoT applications.
Updated: 2024-06-04 13:17:04
Fields: cs.CR,cs.DC
How to Explore with Belief: State Entropy Maximization in POMDPs
Recent works have studied *state entropy maximization* in reinforcement learning, in which the agent's objective is to learn a policy inducing high entropy over states visitation (Hazan et al., 2019). They typically assume full observability of the state of the system, so that the entropy of the observations is maximized. In practice, the agent may only get *partial* observations, e.g., a robot perceiving the state of a physical space through proximity sensors and cameras. A significant mismatch between the entropy over observations and true states of the system can arise in those settings. In this paper, we address the problem of entropy maximization over the *true states* with a decision policy conditioned on partial observations *only*. The latter is a generalization of POMDPs, which is intractable in general. We develop a memory and computationally efficient *policy gradient* method to address a first-order relaxation of the objective defined on *belief* states, providing various formal characterizations of approximation gaps, the optimization landscape, and the *hallucination* problem. This paper aims to generalize state entropy maximization to more realistic domains that meet the challenges of applications.
Updated: 2024-06-04 13:16:34
标题: 如何在信念的基础上进行探索:POMDPs中的状态熵最大化
摘要: 最近的研究探讨了在强化学习中的*状态熵最大化*,其中代理的目标是学习一种政策,使状态访问的熵较高(Hazan等人,2019)。他们通常假设对系统状态的完全可观测性,以便最大化观测的熵。在实践中,代理可能只能获得*部分*观测,例如,一个通过接近传感器和摄像头感知物理空间状态的机器人。在这些情况下,观测和系统真实状态之间可能存在显著的不匹配。本文解决了在仅基于部分观测条件下对*真实状态*进行熵最大化的问题。后者是对POMDPs的泛化,在一般情况下是难以处理的。我们开发了一种记忆和计算效率高的*策略梯度*方法,以解决对*信念*状态定义的目标的第一阶松弛问题,并提供了各种形式上的近似差距、优化景观和*幻觉*问题的特征化。本文旨在将状态熵最大化推广到更符合应用挑战的更现实领域。
更新时间: 2024-06-04 13:16:34
领域: cs.LG,cs.AI
Smaller Batches, Bigger Gains? Investigating the Impact of Batch Sizes on Reinforcement Learning Based Real-World Production Scheduling
Production scheduling is an essential task in manufacturing, with Reinforcement Learning (RL) emerging as a key solution. In a previous work, RL was utilized to solve an extended permutation flow shop scheduling problem (PFSSP) for a real-world production line with two stages, linked by a central buffer. The RL agent was trained to sequence equally sized product batches to minimize setup efforts and idle times. However, the substantial impact caused by varying the size of these product batches has not yet been explored. In this follow-up study, we investigate the effects of varying batch sizes, exploring both the quality of solutions and the training dynamics of the RL agent. The results demonstrate that it is possible to methodically identify reasonable boundaries for the batch size. These boundaries are determined on one side by the increasing sample complexity associated with smaller batch sizes, and on the other side by the decreasing flexibility of the agent when dealing with larger batch sizes. This provides the practitioner with the ability to make an informed decision regarding the selection of an appropriate batch size. Moreover, we introduce and investigate two new curriculum learning strategies to enable the training with small batch sizes. The findings of this work offer the potential for application in several industrial use cases with comparable scheduling problems.
Updated: 2024-06-04 13:16:08
标题: 更小的批次,更大的收益?研究批次大小对基于强化学习的现实生产调度的影响
摘要: 生产调度是制造业中的一个关键任务,强化学习(RL)被认为是一个重要的解决方案。在之前的一项工作中,RL被应用于解决一个具有两个阶段、通过中央缓冲区连接的实际生产线上的扩展排列流水车间调度问题(PFSSP)。RL代理被训练为按顺序排列大小相同的产品批次,以最小化设置工作量和空闲时间。然而,尚未探讨由于这些产品批次大小变化而带来的重大影响。在这项后续研究中,我们研究了不同批次大小的影响,探讨了解决方案的质量和RL代理的训练动态。结果表明,可以系统地确定批次大小的合理界限。这些界限一方面由于较小批次大小所带来的增加的样本复杂性而确定,另一方面由于处理更大批次大小时代理的灵活性减少而确定。这使从业者能够就选择适当的批次大小做出明智的决策。此外,我们引入并研究了两种新的课程学习策略,以便使用小批次大小进行训练。这项工作的发现为在几个具有相似调度问题的工业用例中的应用提供了潜力。
更新时间: 2024-06-04 13:16:08
领域: cs.LG
Composite Quantile Regression With XGBoost Using the Novel Arctan Pinball Loss
This paper explores the use of XGBoost for composite quantile regression. XGBoost is a highly popular model renowned for its flexibility, efficiency, and capability to deal with missing data. The optimization uses a second order approximation of the loss function, complicating the use of loss functions with a zero or vanishing second derivative. Quantile regression -- a popular approach to obtain conditional quantiles when point estimates alone are insufficient -- unfortunately uses such a loss function, the pinball loss. Existing workarounds are typically inefficient and can result in severe quantile crossings. In this paper, we present a smooth approximation of the pinball loss, the arctan pinball loss, that is tailored to the needs of XGBoost. Specifically, contrary to other smooth approximations, the arctan pinball loss has a relatively large second derivative, which makes it more suitable to use in the second order approximation. Using this loss function enables the simultaneous prediction of multiple quantiles, which is more efficient and results in far fewer quantile crossings.
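XGBoost's custom-objective API only needs the per-sample gradient and Hessian, so a smooth pinball surrogate can be plugged in directly. The sketch below replaces the pinball's indicator with an arctan, which is the idea the abstract describes; the formula is our reconstruction and the paper's precise loss may contain additional terms, and this sketch handles a single quantile where the composite method predicts several at once.

```python
import numpy as np
import xgboost as xgb

def arctan_pinball_objective(tau, s=0.1):
    """Custom objective for quantile level tau with a smooth pinball surrogate.

    Assumed form (our reconstruction): L(u) = u * (tau - 1/2 + arctan(u/s)/pi),
    with u = y - y_hat. Its second derivative, (2/(pi*s)) / (1 + (u/s)^2)^2,
    is strictly positive and of order 1/s, i.e. "relatively large" as the
    abstract requires for XGBoost's second-order approximation.
    """
    def objective(preds, dtrain):
        u = dtrain.get_label() - preds
        z = u / s
        dL_du = (tau - 0.5 + np.arctan(z) / np.pi) + u / (np.pi * s * (1 + z**2))
        grad = -dL_du                                # derivative w.r.t. preds
        hess = (2.0 / (np.pi * s)) / (1 + z**2) ** 2
        return grad, hess
    return objective

# Toy usage for the 0.9 quantile.
rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.standard_normal(500)
dtrain = xgb.DMatrix(X, label=y)
booster = xgb.train({"max_depth": 3, "eta": 0.1}, dtrain,
                    num_boost_round=100, obj=arctan_pinball_objective(0.9))
```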
Updated: 2024-06-04 13:13:29
标题: 使用新型反正切分位损失的XGBoost复合分位回归
摘要: 本文探讨了在复合分位数回归中使用XGBoost的方法。XGBoost是一个非常流行的模型,以其灵活性、效率和处理缺失数据的能力而闻名。优化过程使用了损失函数的二阶近似,这增加了使用具有零或逐渐减少二阶导数的损失函数的复杂性。分位数回归是一种流行的方法,用于在仅使用点估计不足时获得条件分位数,不幸的是,它使用了这样一种损失函数,即pinball损失。现有的解决方法通常效率低下,可能导致严重的分位数交叉。在本文中,我们提出了一个平滑近似的pinball损失,即arctan pinball损失,专门针对XGBoost的需求。与其他平滑近似相反,arctan pinball损失具有相对较大的二阶导数,这使得它更适合在二阶近似中使用。使用这种损失函数可以同时预测多个分位数,这更有效率,导致更少的分位数交叉。
更新时间: 2024-06-04 13:13:29
领域: stat.ML,cs.LG
Astral: training physics-informed neural networks with error majorants
The primal approach to physics-informed learning is residual minimization. We argue that the residual is, at best, an indirect measure of the error of the approximate solution and propose to train with an error majorant instead. Since the error majorant provides a direct upper bound on the error, one can reliably estimate how close a PiNN is to the exact solution and stop the optimization process when the desired accuracy is reached. We call the loss function associated with the error majorant $\textbf{Astral}$: neur$\textbf{A}$l a po$\textbf{ST}$erio$\textbf{RI}$ function$\textbf{A}$l Loss. To compare Astral and residual loss functions, we illustrate how error majorants can be derived for various PDEs and conduct experiments with diffusion equations (including anisotropic and in the L-shaped domain), convection-diffusion equation, temporal discretization of Maxwell's equation, and magnetostatics problem. The results indicate that Astral loss is competitive to the residual loss, typically leading to faster convergence and lower error (e.g., for Maxwell's equations, we observe an order of magnitude better relative error and training time). We also report that the error estimate obtained with Astral loss is usually tight enough to be informative, e.g., for a highly anisotropic equation, on average, Astral overestimates error by a factor of $1.5$, and for convection-diffusion by a factor of $1.7$.
Updated: 2024-06-04 13:11:49
标题: 星体:使用误差主量训练物理信息神经网络
摘要: 物理启示学习的原始方法是残差最小化。我们认为残差最多是对近似解误差的间接衡量,并建议使用误差主量来训练。由于误差主量提供了误差的直接上界,因此可以可靠地估计PiNN与精确解的接近程度,并在达到所需精度时停止优化过程。我们称与误差主量相关的损失函数为$\textbf{Astral}$(neur$\textbf{A}$l a po$\textbf{ST}$erio$\textbf{RI}$ function$\textbf{A}$l Loss)。为了比较Astral和残差损失函数,我们说明了如何推导各种PDE的误差主量,并进行了扩散方程(包括各向异性和L形域)、对流扩散方程、Maxwell方程的时间离散化以及磁静力学问题的实验。结果表明Astral损失与残差损失竞争力强,通常导致更快的收敛和更低的误差(例如,对于Maxwell方程,我们观察到相对误差和训练时间提高了一个数量级)。我们还报告称,使用Astral损失获得的误差估计通常足够紧凑,例如对于高度各向异性的方程,平均而言,Astral对误差的估计超过了1.5倍,对于对流扩散则超过了1.7倍。
更新时间: 2024-06-04 13:11:49
领域: physics.comp-ph,cs.AI,cs.LG,cs.NA,math.NA
An Axiomatic Approach to Loss Aggregation and an Adapted Aggregating Algorithm
Supervised learning has gone beyond the expected risk minimization framework. Central to most of these developments is the introduction of more general aggregation functions for losses incurred by the learner. In this paper, we turn towards online learning under expert advice. Via easily justified assumptions we characterize a set of reasonable loss aggregation functions as quasi-sums. Based upon this insight, we suggest a variant of the Aggregating Algorithm tailored to these more general aggregation functions. This variant inherits most of the nice theoretical properties of the AA, such as recovery of Bayes' updating and a time-independent bound on quasi-sum regret. Finally, we argue that generalized aggregations express the attitude of the learner towards losses.
Updated: 2024-06-04 13:11:01
标题: 一个公理化方法来进行损失聚合和一个适应的聚合算法
摘要: 监督学习已经超出了预期的风险最小化框架。在这些发展的大部分中,最核心的是引入更一般的损失聚合函数,用于评估学习者所承担的损失。在本文中,我们转向专家建议下的在线学习。通过容易证明的假设,我们将一组合理的损失聚合函数表征为准和。基于这一见解,我们提出了一个适用于这些更一般聚合函数的Aggregating Algorithm的变种。这个变种继承了AA的大部分良好的理论性质,比如贝叶斯更新的恢复和对准和遗憾的时间无关界限。最后,我们认为广义聚合表达了学习者对损失的态度。
更新时间: 2024-06-04 13:11:01
领域: cs.LG
How to discretize continuous state-action spaces in Q-learning: A symbolic control approach
Q-learning is widely recognized as an effective approach for synthesizing controllers to achieve specific goals. However, handling challenges posed by continuous state-action spaces remains an ongoing research focus. This paper presents a systematic analysis that highlights a major drawback in space discretization methods. To address this challenge, the paper proposes a symbolic model that represents behavioral relations, such as alternating simulation from abstraction to the controlled system. This relation allows for seamless application of the synthesized controller based on abstraction to the original system. Introducing a novel Q-learning technique for symbolic models, the algorithm yields two Q-tables encoding optimal policies. Theoretical analysis demonstrates that these Q-tables serve as both upper and lower bounds on the Q-values of the original system with continuous spaces. Additionally, the paper explores the correlation between the parameters of the space abstraction and the loss in Q-values. The resulting algorithm facilitates achieving optimality within an arbitrary accuracy, providing control over the trade-off between accuracy and computational complexity. The obtained results provide valuable insights for selecting appropriate learning parameters and refining the controller. The engineering relevance of the proposed Q-learning based symbolic model is illustrated through two case studies.
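As a point of reference for the discretization the paper analyzes, here is a plain grid-abstraction Q-learning sketch. The paper's contribution, two Q-tables that provably upper- and lower-bound the continuous-space Q-values via an alternating-simulation relation, is noted in the comments but not reproduced; the environment interface (env_reset/env_step) is hypothetical.

```python
import numpy as np

def make_discretizer(lows, highs, bins):
    """Uniform grid abstraction: map a continuous state to a cell index."""
    lows, highs = np.asarray(lows, float), np.asarray(highs, float)
    def discretize(s):
        ratios = (np.asarray(s, float) - lows) / (highs - lows)
        return tuple(np.clip((ratios * bins).astype(int), 0, bins - 1))
    return discretize

def q_learning(env_reset, env_step, discretize, n_actions,
               episodes=500, alpha=0.1, gamma=0.99, eps=0.1, seed=0):
    # The paper refines this baseline by maintaining *two* Q-tables that
    # bound the continuous-space Q-values from above and below (not shown).
    rng = np.random.default_rng(seed)
    Q = {}
    for _ in range(episodes):
        s, done = discretize(env_reset()), False
        while not done:
            q_s = Q.setdefault(s, np.zeros(n_actions))
            a = rng.integers(n_actions) if rng.random() < eps else int(q_s.argmax())
            s_next_raw, reward, done = env_step(a)
            s_next = discretize(s_next_raw)
            q_next = Q.setdefault(s_next, np.zeros(n_actions))
            q_s[a] += alpha * (reward + gamma * q_next.max() - q_s[a])
            s = s_next
    return Q
```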
Updated: 2024-06-04 13:07:40
标题: 如何在Q-learning中对连续状态-动作空间进行离散化:一种符号控制方法
摘要: Q学习被广泛认为是合成控制器以实现特定目标的有效方法。然而,处理连续状态-动作空间所带来的挑战仍然是持续的研究重点。本文提出了一项系统分析,突显了空间离散化方法的一个主要缺点。为了解决这一挑战,本文提出了一个代表行为关系的符号模型,例如从抽象到受控系统的交替模拟。这种关系允许基于抽象的合成控制器无缝应用到原始系统上。引入一种新颖的用于符号模型的Q学习技术,该算法产生两个编码最优策略的Q表。理论分析表明,这些Q表既作为原始连续空间系统的Q值的上限也作为下限。此外,本文探讨了空间抽象参数与Q值损失之间的相关性。所得算法有助于在任意精度范围内实现最优性,并提供了在精度和计算复杂度之间权衡的控制。所得结果为选择适当的学习参数和优化控制器提供了宝贵的见解。提出的基于Q学习的符号模型通过两个案例研究展示了工程相关性。
更新时间: 2024-06-04 13:07:40
领域: eess.SY,cs.AI,cs.SY,math.DS
A Study of Optimizations for Fine-tuning Large Language Models
Fine-tuning large language models is a popular choice among users trying to adapt them for specific applications. However, fine-tuning these models is a demanding task because the user has to examine several factors, such as resource budget, runtime, model size and context length among others. A specific challenge is that fine-tuning is memory intensive, imposing constraints on the required hardware memory and context length of training data that can be handled. In this work, we share a detailed study on a variety of fine-tuning optimizations across different fine-tuning scenarios. In particular, we assess Gradient Checkpointing, Low Rank Adaptation, DeepSpeed's ZeRO Redundancy Optimizer and Flash Attention. With a focus on memory and runtime, we examine the impact of different optimization combinations on GPU memory usage and execution runtime during fine-tuning phase. We provide recommendations on the best default optimizations for balancing memory and runtime across diverse model sizes. We share effective strategies for fine-tuning very large models with tens or hundreds of billions of parameters and enabling large context lengths during fine-tuning. Furthermore, we propose the appropriate optimization mixtures for fine-tuning under GPU resource limitations.
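The four optimizations assessed in the study compose in a few lines with common Hugging Face tooling. The sketch below assumes recent transformers/peft/deepspeed versions and a hypothetical checkpoint name; exact flag names vary across library releases.

```python
import torch
from transformers import AutoModelForCausalLM, TrainingArguments
from peft import LoraConfig, get_peft_model

# Hypothetical checkpoint; swap in the model you are fine-tuning.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",   # Flash Attention
)
model.gradient_checkpointing_enable()           # trade compute for memory

# Low Rank Adaptation: train small adapter matrices instead of full weights.
model = get_peft_model(model, LoraConfig(task_type="CAUSAL_LM", r=16,
                                         lora_alpha=32,
                                         target_modules=["q_proj", "v_proj"]))

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    deepspeed={                                 # ZeRO: shard optimizer/gradients
        "zero_optimization": {"stage": 2},
        "bf16": {"enabled": True},
        "train_micro_batch_size_per_gpu": 1,
    },
)
```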
Updated: 2024-06-04 13:05:47
标题: 对大型语言模型进行微调的优化研究
摘要: 调整大型语言模型是用户尝试为特定应用程序自定义它们的流行选择。然而,微调这些模型是一项具有挑战性的任务,因为用户必须考虑多个因素,例如资源预算、运行时间、模型大小和上下文长度等。一个特定的挑战是微调需要大量内存,对所需硬件内存和训练数据的上下文长度施加了限制。在这项工作中,我们分享了对不同微调场景中各种微调优化的详细研究。特别是,我们评估了梯度检查点、低秩适应、DeepSpeed的ZeRO冗余优化器和Flash Attention。着重关注内存和运行时间,我们检查不同优化组合对GPU内存使用和微调阶段执行运行时间的影响。我们提供了关于在不同模型大小之间平衡内存和运行时间的最佳默认优化的建议。我们分享了微调数十亿或数百亿参数的非常大模型和在微调过程中实现大上下文长度的有效策略。此外,我们提出了在GPU资源限制下微调的适当优化混合方案。
更新时间: 2024-06-04 13:05:47
领域: cs.LG
Vertical Federated Learning for Effectiveness, Security, Applicability: A Survey
Vertical Federated Learning (VFL) is a privacy-preserving distributed learning paradigm where different parties collaboratively learn models using partitioned features of shared samples, without leaking private data. Recent research has shown promising results addressing various challenges in VFL, highlighting its potential for practical applications in cross-domain collaboration. However, the corresponding research is scattered and lacks organization. To advance VFL research, this survey offers a systematic overview of recent developments. First, we provide a history and background introduction, along with a summary of the general training protocol of VFL. We then revisit the taxonomy in recent reviews and analyze limitations in-depth. For a comprehensive and structured discussion, we synthesize recent research from three fundamental perspectives: effectiveness, security, and applicability. Finally, we discuss several critical future research directions in VFL, which will facilitate the developments in this field. We provide a collection of research lists and periodically update them at https://github.com/shentt67/VFL_Survey.
Updated: 2024-06-04 13:04:53
标题: 纵向联邦学习的效果、安全性和适用性:一项调查
摘要: 垂直联邦学习(VFL)是一种保护隐私的分布式学习范式,不同方参与者共同使用共享样本的分区特征学习模型,而不泄露私人数据。最近的研究已经显示出在VFL中解决各种挑战的有希望的结果,凸显了其在跨领域合作中的实际应用潜力。然而,相关研究分散且缺乏组织。为了推进VFL研究,本调查提供了最近发展的系统概述。首先,我们提供历史和背景介绍,以及VFL的一般训练协议概述。然后,我们重新审视了最近评论中的分类法,并深入分析了其局限性。为了进行全面和有组织的讨论,我们从三个基本角度综合了最近的研究成果:有效性、安全性和适用性。最后,我们讨论了VFL中几个关键的未来研究方向,这将促进该领域的发展。我们提供了一系列研究列表,并定期更新它们的地址为https://github.com/shentt67/VFL_Survey。
更新时间: 2024-06-04 13:04:53
领域: cs.LG,cs.CR
Towards Supervised Performance on Speaker Verification with Self-Supervised Learning by Leveraging Large-Scale ASR Models
Recent advancements in Self-Supervised Learning (SSL) have shown promising results in Speaker Verification (SV). However, narrowing the performance gap with supervised systems remains an ongoing challenge. Several studies have observed that speech representations from large-scale ASR models contain valuable speaker information. This work explores the limitations of fine-tuning these models for SV using an SSL contrastive objective in an end-to-end approach. Then, we propose a framework to learn speaker representations in an SSL context by fine-tuning a pre-trained WavLM with a supervised loss using pseudo-labels. Initial pseudo-labels are derived from an SSL DINO-based model and are iteratively refined by clustering the model embeddings. Our method achieves 0.99% EER on VoxCeleb1-O, establishing the new state-of-the-art on self-supervised SV. As this performance is close to our supervised baseline of 0.94% EER, this contribution is a step towards supervised performance on SV with SSL.
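The label-refinement loop can be summarized in a few lines. The sketch below uses an embed_fn standing in for the DINO-based (and later fine-tuned WavLM) encoder and k-means in place of whatever clustering the paper specifies, with the supervised fine-tuning step left as a comment.

```python
import numpy as np
from sklearn.cluster import KMeans

def refine_pseudo_labels(embed_fn, utterances, n_speakers, n_iters=3):
    """Iterative pseudo-label refinement by clustering embeddings (sketch).

    embed_fn is a stand-in: initially a DINO-based SSL model, afterwards the
    WavLM model fine-tuned on the current labels (fine-tuning omitted here).
    """
    labels = None
    for _ in range(n_iters):
        X = embed_fn(utterances)                            # (N, D) embeddings
        X = X / np.linalg.norm(X, axis=1, keepdims=True)    # cosine geometry
        labels = KMeans(n_clusters=n_speakers, n_init=10,
                        random_state=0).fit_predict(X)
        # ...fine-tune the encoder on (utterances, labels) with a supervised
        # speaker-classification loss here, so the next pass embeds better...
    return labels
```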
Updated: 2024-06-04 12:58:19
标题: 朝向利用大规模ASR模型通过自监督学习在说话者验证中实现监督性能
摘要: 最近自监督学习(SSL)在说话者验证(SV)方面取得了有希望的成果。然而,缩小与监督系统的性能差距仍然是一个持续的挑战。一些研究观察到,来自大规模ASR模型的语音表示包含有价值的说话者信息。本文探讨了使用SSL对这些模型进行微调以进行SV的限制,采用端到端方法的对比目标。然后,我们提出了一个框架,通过使用伪标签对预训练的WavLM进行微调,以在SSL上下文中学习说话者表示,使用有监督的损失。初始伪标签是从一个基于SSL DINO模型推导出来的,并通过对模型嵌入进行聚类来迭代地完善。我们的方法在VoxCeleb1-O数据集上实现了0.99%的EER,建立了自监督SV的新的最先进技术。由于该性能接近我们的监督基线0.94%的EER,这一成果是朝着在SV上实现SSL监督性能的一步。
更新时间: 2024-06-04 12:58:19
领域: eess.AS,cs.LG,cs.SD
VITS: Variational Inference Thompson Sampling for contextual bandits
In this paper, we introduce and analyze a variant of the Thompson sampling (TS) algorithm for contextual bandits. At each round, traditional TS requires samples from the current posterior distribution, which is usually intractable. To circumvent this issue, approximate inference techniques can be used to provide samples with distribution close to the posteriors. However, current approximate techniques either yield poor estimates (Laplace approximation) or are computationally expensive (MCMC methods, ensemble sampling, ...). In this paper, we propose a new algorithm, Variational Inference Thompson Sampling (VITS), based on Gaussian Variational Inference. This scheme provides powerful posterior approximations which are easy to sample from, and is computationally efficient, making it an ideal choice for TS. In addition, we show that VITS achieves a sub-linear regret bound of the same order in the dimension and number of rounds as traditional TS for linear contextual bandits. Finally, we demonstrate experimentally the effectiveness of VITS on both synthetic and real-world datasets.
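A minimal sketch of the idea for a linear-Gaussian bandit: keep a diagonal Gaussian variational posterior, sample from it to pick arms (Thompson sampling), and update it by gradient ascent on the ELBO. The variational family, step sizes, and update schedule are our assumptions; VITS's exact updates may differ.

```python
import numpy as np

class VITSLinearBandit:
    """Thompson sampling from a diagonal Gaussian variational posterior
    q(theta) = N(mu, diag(exp(log_var))), with a unit Gaussian prior (sketch)."""

    def __init__(self, dim, lr=0.5, noise_var=0.25):
        self.mu = np.zeros(dim)
        self.log_var = np.zeros(dim)
        self.lr, self.noise_var, self.data = lr, noise_var, []

    def choose(self, arm_features, rng):
        theta = self.mu + np.exp(0.5 * self.log_var) * rng.standard_normal(self.mu.size)
        return int(np.argmax(arm_features @ theta))

    def update(self, x, r, n_steps=25):
        self.data.append((x, r))
        step = self.lr / len(self.data)
        for _ in range(n_steps):  # gradient ascent on the ELBO
            g_mu = -self.mu.copy()                     # prior term of the KL
            g_lv = 0.5 * (1.0 - np.exp(self.log_var))  # entropy + prior terms
            for xi, ri in self.data:                   # expected log-likelihood
                g_mu += (ri - xi @ self.mu) / self.noise_var * xi
                g_lv += -0.5 * np.exp(self.log_var) * xi**2 / self.noise_var
            self.mu += step * g_mu
            self.log_var += step * g_lv
```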
Updated: 2024-06-04 12:57:53
标题: VITS:用于上下文多臂赌博机的变分推断汤普森采样
摘要: 在本文中,我们介绍并分析了一种Thompson采样(TS)算法的变种,用于上下文多臂赌博机。在每一轮中,传统的TS需要从当前后验分布中获取样本,这通常是难以计算的。为了规避这个问题,可以使用近似推断技术并提供接近后验分布的样本。然而,目前的近似技术要么导致估计不准确(拉普拉斯近似),要么在计算上代价高昂(MCMC方法,集成采样等)。在本文中,我们提出了一种新算法,Varational Inference Thompson sampling VITS,基于高斯变分推断。这种方案提供了强大的后验近似,易于从中抽样,并且在计算上效率高,使其成为TS的理想选择。此外,我们证明VITS对于线性上下文多臂赌博机实现了与传统TS在维度和轮数上相同阶数的次线性后悔界。最后,我们实验证明了VITS在合成和真实世界数据集上的有效性。
更新时间: 2024-06-04 12:57:53
领域: stat.ML,cs.LG
Posterior Sampling-Based Bayesian Optimization with Tighter Bayesian Regret Bounds
Among various acquisition functions (AFs) in Bayesian optimization (BO), Gaussian process upper confidence bound (GP-UCB) and Thompson sampling (TS) are well-known options with established theoretical properties regarding Bayesian cumulative regret (BCR). Recently, it has been shown that a randomized variant of GP-UCB achieves a tighter BCR bound compared with GP-UCB, which we call the tighter BCR bound for brevity. Inspired by this study, this paper first shows that TS achieves the tighter BCR bound. On the other hand, GP-UCB and TS often practically suffer from manual hyperparameter tuning and over-exploration issues, respectively. Therefore, we analyze yet another AF called a probability of improvement from the maximum of a sample path (PIMS). We show that PIMS achieves the tighter BCR bound and avoids the hyperparameter tuning, unlike GP-UCB. Furthermore, we demonstrate a wide range of experiments, focusing on the effectiveness of PIMS that mitigates the practical issues of GP-UCB and TS.
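A sketch of the PIMS acquisition over a finite candidate set, built from the abstract's description: draw one posterior sample path, take its maximum as a randomized incumbent, and maximize the probability of improving on it. Note that no beta-style hyperparameter appears, which is the tuning-free property the paper emphasizes.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def pims_next_point(gp: GaussianProcessRegressor, candidates, seed=0):
    # 1) Maximum of one posterior sample path acts as a randomized incumbent.
    path = gp.sample_y(candidates, n_samples=1, random_state=seed).ravel()
    incumbent = path.max()
    # 2) Pick the candidate maximizing the probability of improvement over it.
    mu, sigma = gp.predict(candidates, return_std=True)
    pi = norm.sf((incumbent - mu) / np.maximum(sigma, 1e-12))
    return candidates[np.argmax(pi)]
```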
Updated: 2024-06-04 12:56:46
标题: 使用更紧密的贝叶斯遗憾界限的后验采样贝叶斯优化
摘要: 在贝叶斯优化(BO)中,高斯过程上置信界(GP-UCB)和汤普森抽样(TS)是众所周知的获取函数(AFs)选项,具有关于贝叶斯累积遗憾(BCR)的已建立的理论性质。最近,已经证明GP-UCB的随机化变体实现了比GP-UCB更紧密的BCR边界,我们简称为更紧密的BCR边界。受这项研究的启发,本文首先展示了TS实现了更紧密的BCR边界。另一方面,GP-UCB和TS在实践中经常受到手动超参数调整和过度探索的问题。因此,我们分析了另一种名为样本路径最大值的改进概率(PIMS)的AF。我们展示了PIMS实现了更紧密的BCR边界,并且避免了超参数调整,不像GP-UCB。此外,我们展示了一系列广泛的实验,重点关注PIMS的有效性,缓解了GP-UCB和TS的实际问题。
更新时间: 2024-06-04 12:56:46
领域: cs.LG,stat.ML
Test-Time Regret Minimization in Meta Reinforcement Learning
Meta reinforcement learning sets a distribution over a set of tasks on which the agent can train at will, then is asked to learn an optimal policy for any test task efficiently. In this paper, we consider a finite set of tasks modeled through Markov decision processes with various dynamics. We assume that a long training phase has been completed, from which the set of tasks is perfectly recovered, and we focus on regret minimization against the optimal policy in the unknown test task. Under a separation condition that states the existence of a state-action pair revealing a task against another, Chen et al. (2022) show that $O(M^2 \log(H))$ regret can be achieved, where $M, H$ are the number of tasks in the set and test episodes, respectively. In our first contribution, we demonstrate that the latter rate is nearly optimal by developing a novel lower bound for test-time regret minimization under separation, showing that a linear dependence with $M$ is unavoidable. Then, we present a family of stronger yet reasonable assumptions beyond separation, which we call strong identifiability, enabling algorithms achieving fast rates $\log (H)$ and sublinear dependence with $M$ simultaneously. Our paper provides a new understanding of the statistical barriers of test-time regret minimization and when fast rates can be achieved.
Updated: 2024-06-04 12:56:10
标题: 元强化学习中的测试时间遗憾最小化
摘要: 元强化学习在一个任务集合上设置了一个分布,代理可以自由训练,然后要求在任何测试任务上高效地学习最优策略。本文考虑了通过各种动态的马尔可夫决策过程建模的有限任务集合。我们假设经历了一个长时间的训练阶段,从中完美恢复了任务集合,并专注于相对于未知测试任务中最优策略的后悔最小化。在一个分离条件下,该条件表明存在一个状态-动作对可以区分一个任务和另一个任务,陈等人(2022年)表明可以实现$O(M^2 \log(H))$的后悔,其中$M,H$分别是任务集合和测试周期的数量。在我们的第一个贡献中,我们通过开发一个新的下界证明了后悔最小化的近乎最优率,展示了与$M$的线性依赖是不可避免的。然后,我们提出了一系列更强大但合理的假设,超越了分离,我们称之为强可识别性,使算法能够同时实现快速率$\log (H)$和对$M$的次线性依赖。我们的论文提供了对测试时间后悔最小化的统计障碍以及何时可以实现快速率的新理解。
更新时间: 2024-06-04 12:56:10
领域: cs.LG
A KL-based Analysis Framework with Applications to Non-Descent Optimization Methods
We propose a novel analysis framework for non-descent-type optimization methodologies in nonconvex scenarios based on the Kurdyka-Lojasiewicz property. Our framework allows covering a broad class of algorithms, including those commonly employed in stochastic and distributed optimization. Specifically, it enables the analysis of first-order methods that lack a sufficient descent property and do not require access to full (deterministic) gradient information. We leverage this framework to establish, for the first time, iterate convergence and the corresponding rates for the decentralized gradient method and federated averaging under mild assumptions. Furthermore, based on the new analysis techniques, we show the convergence of the random reshuffling and stochastic gradient descent method without necessitating typical a priori bounded iterates assumptions.
Updated: 2024-06-04 12:49:46
标题: 一个基于KL的分析框架及其在非下降优化方法中的应用
摘要: 我们提出了一个基于Kurdyka-Lojasiewicz性质的非凸优化场景中非下降类型优化方法的新型分析框架。我们的框架允许涵盖一类广泛的算法,包括那些常用于随机和分布式优化的算法。具体来说,它使得能够分析缺乏充分下降性质且不需要访问完整(确定性)梯度信息的一阶方法。我们利用这一框架首次建立了去中心化梯度方法和联邦平均在温和假设下的迭代收敛性和相应的收敛速率。此外,基于新的分析技术,我们展示了随机重排和随机梯度下降方法的收敛性,而无需典型的先验有界迭代假设。
更新时间: 2024-06-04 12:49:46
领域: math.OC,cs.LG,90C06, 90C26, 90C30
Graph Neural Networks Do Not Always Oversmooth
Graph neural networks (GNNs) have emerged as powerful tools for processing relational data in applications. However, GNNs suffer from the problem of oversmoothing, the property that the features of all nodes exponentially converge to the same vector over layers, prohibiting the design of deep GNNs. In this work we study oversmoothing in graph convolutional networks (GCNs) by using their Gaussian process (GP) equivalence in the limit of infinitely many hidden features. By generalizing methods from conventional deep neural networks (DNNs), we can describe the distribution of features at the output layer of deep GCNs in terms of a GP: as expected, we find that typical parameter choices from the literature lead to oversmoothing. The theory, however, allows us to identify a new, nonoversmoothing phase: if the initial weights of the network have sufficiently large variance, GCNs do not oversmooth, and node features remain informative even at large depth. We demonstrate the validity of this prediction in finite-size GCNs by training a linear classifier on their output. Moreover, using the linearization of the GCN GP, we generalize the concept of propagation depth of information from DNNs to GCNs. This propagation depth diverges at the transition between the oversmoothing and non-oversmoothing phase. We test the predictions of our approach and find good agreement with finite-size GCNs. Initializing GCNs near the transition to the non-oversmoothing phase, we obtain networks which are both deep and expressive.
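The two phases can be glimpsed in a toy forward pass. The sketch below propagates features through a deep tanh GCN with the symmetric normalized filter and compares the mean pairwise cosine similarity (close to 1 means collapsed features) for a small and a large initial weight variance; it is a qualitative illustration, not the paper's GP computation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, depth = 100, 16, 50
A = (rng.random((n, n)) < 0.05).astype(float)
A = np.maximum(A, A.T) + np.eye(n)             # undirected graph + self-loops
Dinv_sqrt = np.diag(A.sum(1) ** -0.5)
A_hat = Dinv_sqrt @ A @ Dinv_sqrt              # symmetric normalized filter
X0 = rng.standard_normal((n, d))

for sigma_w in (0.5, 3.0):                     # small vs large init variance
    X = X0
    for _ in range(depth):
        W = rng.standard_normal((d, d)) * sigma_w / np.sqrt(d)
        X = np.tanh(A_hat @ X @ W)
    Xn = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)
    # Larger init variance should keep the cosine away from 1 (non-oversmoothing).
    print(f"init std {sigma_w}: mean pairwise cosine = {(Xn @ Xn.T).mean():.3f}")
```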
Updated: 2024-06-04 12:47:13
标题: 图神经网络并非总是过度平滑
摘要: 图神经网络(GNNs)已经成为处理应用中的关联数据的强大工具。然而,GNNs存在过度平滑的问题,即所有节点的特征在层之间指数地收敛到相同的向量,从而阻止了深度GNNs的设计。在这项工作中,我们通过在无限多个隐藏特征的极限下使用它们的高斯过程(GP)等价性来研究图卷积网络(GCNs)中的过度平滑问题。通过将传统深度神经网络(DNNs)的方法进行泛化,我们可以描述深度GCNs输出层特征的分布,即作为一个GP:如预期的那样,我们发现文献中的典型参数选择会导致过度平滑。然而,该理论使我们能够确定一个新的、非过度平滑的阶段:如果网络的初始权重具有足够大的方差,GCNs不会过度平滑,节点特征即使在较大深度时仍然具有信息量。我们通过在有限大小的GCNs上训练线性分类器来证明这一预测的有效性。此外,通过对GCN GP的线性化,我们将信息传播深度的概念从DNNs推广到GCNs。这种信息传播深度在过度平滑和非过度平滑阶段的转变处发散。我们测试了我们方法的预测,并发现与有限大小的GCNs有很好的一致性。通过在接近非过度平滑阶段的转变处初始化GCNs,我们获得了既深又具有表现力的网络。
更新时间: 2024-06-04 12:47:13
领域: stat.ML,cond-mat.dis-nn,cs.LG
Analyzing the Benefits of Prototypes for Semi-Supervised Category Learning
Categories can be represented at different levels of abstraction, from prototypes focused on the most typical members to remembering all observed exemplars of the category. These representations have been explored in the context of supervised learning, where stimuli are presented with known category labels. We examine the benefits of prototype-based representations in a less-studied domain: semi-supervised learning, where agents must form unsupervised representations of stimuli before receiving category labels. We study this problem in a Bayesian unsupervised learning model called a variational auto-encoder, and we draw on recent advances in machine learning to implement a prior that encourages the model to use abstract prototypes to represent data. We apply this approach to image datasets and show that forming prototypes can improve semi-supervised category learning. Additionally, we study the latent embeddings of the models and show that these prototypes allow the models to form clustered representations without supervision, contributing to their success in downstream categorization performance.
Updated: 2024-06-04 12:47:11
标题: 分析原型在半监督类别学习中的益处
摘要: 类别可以以不同的抽象级别表示,从侧重于最典型成员的原型到记住该类别的所有观察到的实例。这些表示在监督学习的背景下已被研究,其中刺激物被呈现为已知的类别标签。我们研究了原型表示在一个较少研究的领域中的好处:半监督学习,其中代理必须在接收类别标签之前形成刺激的无监督表示。我们在一个称为变分自动编码器的贝叶斯无监督学习模型中研究了这个问题,并借鉴了机器学习的最新进展,实现了一个鼓励模型使用抽象原型来表示数据的先验。我们将这种方法应用于图像数据集,并展示了形成原型可以改进半监督类别学习。此外,我们研究了模型的潜在嵌入,并展示了这些原型允许模型形成聚类表示而无需监督,从而有助于它们在下游分类性能中的成功。
更新时间: 2024-06-04 12:47:11
领域: cs.LG,I.2; I.5
PROXYQA: An Alternative Framework for Evaluating Long-Form Text Generation with Large Language Models
Large Language Models (LLMs) have succeeded remarkably in understanding long-form contents. However, exploring their capability for generating long-form contents, such as reports and articles, has been relatively unexplored and inadequately assessed by existing benchmarks. The prevalent evaluation methods, which predominantly rely on crowdsourcing, are recognized for their labor-intensive nature and lack of efficiency, whereas automated metrics, such as the ROUGE score, demonstrate discordance with human judgment criteria. In this paper, we propose ProxyQA, an innovative framework dedicated to assessing long-text generation. ProxyQA comprises in-depth human-curated meta-questions spanning various domains, each accompanied by specific proxy-questions with pre-annotated answers. LLMs are tasked to generate extensive content in response to these meta-questions, by engaging an evaluator and incorporating the generated texts as contextual background, ProxyQA assesses the generated content's quality through the evaluator's accuracy in addressing the proxy-questions. We examine multiple LLMs, emphasizing ProxyQA's demanding nature as a high-quality assessment tool. Human evaluation demonstrates that the proxy-question method is notably self-consistent and aligns closely with human evaluative standards. The dataset and leaderboard is available at \url{https://proxy-qa.com}.
Updated: 2024-06-04 12:46:47
标题: PROXYQA: 一种用于评估大型语言模型生成长篇文本的替代框架
摘要: 大型语言模型(LLMs)在理解长篇内容方面取得了显著成功。然而,探索它们在生成报告和文章等长篇内容方面的能力相对较少研究,并且现有基准测试不够充分。目前主要依赖众包的评估方法被认为劳动密集且缺乏效率,而自动化指标(如ROUGE分数)与人类判断标准存在不一致。在本文中,我们提出了ProxyQA,一个专门用于评估长文本生成的创新框架。ProxyQA包括深入人工筛选的跨多个领域的元问题,每个问题都附带具体的代理问题和预先注释的答案。LLMs被要求针对这些元问题生成大量内容,通过与评估者互动并将生成的文本作为背景,ProxyQA通过评估者在回答代理问题时的准确性来评估生成内容的质量。我们检验了多个LLMs,强调了ProxyQA作为高质量评估工具的苛刻性质。人类评估表明,代理问题方法在自洽性方面显著,并与人类评估标准密切相关。数据集和排行榜可在\url{https://proxy-qa.com}上找到。
更新时间: 2024-06-04 12:46:47
领域: cs.CL,cs.AI
Into the Unknown: Self-Learning Large Language Models
We address the main problem of self-learning LLMs: the question of what to learn. We propose a self-learning LLM framework that enables an LLM to independently learn previously unknown knowledge through self-assessment of its own hallucinations. Using the hallucination score, we introduce a new concept of Points in the Unknown (PiUs), along with one extrinsic and three intrinsic methods for automatic PiUs identification. It facilitates the creation of a self-learning loop that focuses exclusively on the knowledge gap in Points in the Unknown, resulting in a reduced hallucination score. We also developed evaluation metrics for gauging an LLM's self-learning capability. Our experiments revealed that 7B-Mistral models that have been fine-tuned or aligned and RWKV5-Eagle are capable of self-learning considerably well. Our self-learning concept allows more efficient LLM updates and opens new perspectives for knowledge exchange. It may also increase public trust in AI.
Updated: 2024-06-04 12:44:46
标题: 走向未知:自学习大型语言模型
摘要: 我们解决了自学习LLM的主要问题:即学习什么的问题。我们提出了一个自学习LLM框架,使LLM能够通过自我评估其自身幻觉来独立学习先前未知的知识。利用幻觉得分,我们引入了一个新概念——未知点数(PiUs),以及一个外在方法和三种内在方法来自动识别PiUs。它促进了一个自学习循环的创建,专注于未知点数中的知识差距,从而减少幻觉得分。我们还开发了用于评估LLM自学习能力的评估指标。我们的实验表明,经过微调或对齐的7B-Mistral模型和RWKV5-Eagle能够自学得相当好。我们的自学习概念可以使LLM更新更加高效,并为知识交流打开新的视角。这也可能增加公众对人工智能的信任。
更新时间: 2024-06-04 12:44:46
领域: cs.AI
Unifying the Perspectives of NLP and Software Engineering: A Survey on Language Models for Code
In this work we systematically review the recent advancements in software engineering with language models, covering 70+ models, 40+ evaluation tasks, 180+ datasets, and 900 related works. We break down code processing models into general language models represented by the GPT family and specialized models that are specifically pretrained on code, often with tailored objectives. We discuss the relations and differences between these models, and highlight the historical transition of code modeling from statistical models and RNNs to pretrained Transformers and LLMs, which is exactly the same course that had been taken by NLP. We also go beyond programming and review LLMs' application in other software engineering activities including requirement engineering, testing, deployment, and operations in an endeavor to provide a global view of NLP in SE. We identify key challenges and potential future directions in this domain, and keep the survey open and updated on GitHub at https://github.com/codefuse-ai/Awesome-Code-LLM.
Updated: 2024-06-04 12:39:47
标题: 将自然语言处理和软件工程的观点统一起来:关于代码语言模型的调查
摘要: 在这项工作中,我们系统地回顾了软件工程中基于语言模型的最新进展,涵盖了70多种模型、40多种评估任务、180多个数据集和900多个相关工作。我们将代码处理模型分解为通用语言模型(代表为GPT系列)和专门预训练于代码的专用模型,通常具有定制目标。我们讨论了这些模型之间的关系和差异,并强调了代码建模从统计模型和RNNs到预训练的Transformers和LLMs的历史转变,这正是NLP所经历的相同过程。我们还超越编程,回顾了LLMs在其他软件工程活动中的应用,包括需求工程、测试、部署和运营,以提供NLP在SE中的全局视图。我们确定了这一领域的关键挑战和潜在未来方向,并在GitHub上保持对调查的开放和更新,网址为https://github.com/codefuse-ai/Awesome-Code-LLM。
更新时间: 2024-06-04 12:39:47
领域: cs.CL,cs.AI,cs.SE
Differentially private exact recovery for stochastic block models
Stochastic block models (SBMs) are a very commonly studied network model for community detection algorithms. In the standard form of an SBM, the $n$ vertices (or nodes) of a graph are generally divided into multiple pre-determined communities (or clusters). Connections between pairs of vertices are generated randomly and independently with pre-defined probabilities, which depend on the communities containing the two nodes. A fundamental problem in SBMs is the recovery of the community structure, and sharp information-theoretic bounds are known for recoverability for many versions of SBMs. Our focus here is the recoverability problem in SBMs when the network is private. Under the edge differential privacy model, we derive conditions for exact recoverability in three different versions of SBMs, namely Asymmetric SBM (when communities have non-uniform sizes), General Structure SBM (with outliers), and Censored SBM (with edge features). Our private algorithms have polynomial running time w.r.t. the input graph's size, and match the recovery thresholds of the non-private setting when $\epsilon\rightarrow\infty$. In contrast, the previous best results for recoverability in SBMs only hold for the symmetric case (equal size communities), and run in quasi-polynomial time, or in polynomial time with recovery thresholds being tight up to some constants from the non-private settings.
Updated: 2024-06-04 12:38:05
标题: 隐私保护的随机块模型精确恢复
摘要: 随机块模型(SBMs)是一个非常常见的用于社区检测算法的网络模型。在标准的SBM形式中,图的n个顶点(或节点)通常被划分为多个预先确定的社区(或簇)。顶点对之间的连接是根据包含这两个节点的社区的预定义概率随机独立生成的。SBMs中的一个基本问题是社区结构的恢复,对于许多版本的SBMs已知具有尖锐的信息论界限用于可恢复性。 我们在这里关注的是在网络是私有的情况下SBMs中的恢复问题。在边界差分隐私模型下,我们推导出在三种不同版本的SBMs中的准确可恢复性的条件,即非对称SBM(当社区大小不均匀时)、一般结构SBM(带有异常值)和被审查的SBM(带有边特征)。我们的私有算法相对于输入图的大小具有多项式运行时间,并且在ε→∞时与非私有设置的恢复阈值相匹配。相比之下,以前在SBMs中的可恢复性的最佳结果仅适用于对称情况(相等大小的社区),并且以准多项式时间运行,或者在多项式时间内,恢复阈值与非私有设置的一些常数紧密匹配。
更新时间: 2024-06-04 12:38:05
领域: cs.CR,cs.AI,cs.DS
In-Context Unlearning: Language Models as Few Shot Unlearners
Machine unlearning, the study of efficiently removing the impact of specific training instances on a model, has garnered increased attention in recent years due to regulatory guidelines such as the \emph{Right to be Forgotten}. Achieving precise unlearning typically involves fully retraining the model and is computationally infeasible in case of very large models such as Large Language Models (LLMs). To this end, recent work has proposed several algorithms which approximate the removal of training data without retraining the model. These algorithms crucially rely on access to the model parameters in order to update them, an assumption that may not hold in practice due to computational constraints or having only query access to the LLMs. In this work, we propose a new class of unlearning methods for LLMs called ``In-Context Unlearning.'' This method unlearns instances from the model by simply providing specific kinds of inputs in context, without the need to update model parameters. To unlearn specific training instances, we present these instances to the LLMs at inference time along with labels that differ from their ground truth. Our experimental results demonstrate that in-context unlearning performs on par with, or in some cases outperforms other state-of-the-art methods that require access to model parameters, effectively removing the influence of specific instances on the model while preserving test accuracy.
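The core recipe needs no access to weights, only a prompt. Below is a sketch of one plausible prompt construction for sentiment classification (the template and label names are ours): the forget instances appear with flipped labels, followed by correctly labelled examples, then the query.

```python
def icul_prompt(forget_points, context_points, query_text):
    """Build an In-Context Unlearning prompt: forget instances with flipped
    labels, then correctly labelled examples, then the query (template ours)."""
    blocks = []
    for text, label in forget_points:
        flipped = "negative" if label == "positive" else "positive"
        blocks.append(f"Review: {text}\nSentiment: {flipped}")
    for text, label in context_points:
        blocks.append(f"Review: {text}\nSentiment: {label}")
    blocks.append(f"Review: {query_text}\nSentiment:")
    return "\n\n".join(blocks)

print(icul_prompt(
    forget_points=[("A wonderful, heartfelt film.", "positive")],
    context_points=[("Dull and far too long.", "negative"),
                    ("A triumph from start to finish.", "positive")],
    query_text="The plot meanders but the acting shines.",
))
```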
Updated: 2024-06-04 12:35:56
标题: 背景下的去学习:语言模型作为少样本去学习者
摘要: 机器遗忘,即有效消除特定训练实例对模型的影响的研究,在近年来受到了越来越多的关注,这主要是由于《被遗忘权》等监管指导方针的出现。实现精确的遗忘通常涉及完全重新训练模型,在非常大的模型(如大型语言模型)的情况下,这在计算上是不可行的。为此,最近的工作提出了几种算法,这些算法近似地删除训练数据,而无需重新训练模型。这些算法关键地依赖于访问模型参数,以便更新它们,然而在实践中,由于计算约束或仅具有对LLMs的查询访问权限,这种假设可能不成立。在这项工作中,我们提出了一种新的用于LLMs的遗忘方法的类别,称为“上下文遗忘”。这种方法通过在上下文中提供特定类型的输入而从模型中遗忘实例,无需更新模型参数。为了遗忘特定的训练实例,我们在推理时将这些实例与与其地面真实值不同的标签一起呈现给LLMs。我们的实验结果表明,在某些情况下,上下文遗忘的性能与其他需要访问模型参数的最新方法相当,甚至有时表现更好,有效地消除了特定实例对模型的影响,同时保持测试准确性。
更新时间: 2024-06-04 12:35:56
领域: cs.LG,cs.AI,cs.CR
Privacy Attacks in Decentralized Learning
Decentralized Gradient Descent (D-GD) allows a set of users to perform collaborative learning without sharing their data by iteratively averaging local model updates with their neighbors in a network graph. The absence of direct communication between non-neighbor nodes might lead to the belief that users cannot infer precise information about the data of others. In this work, we demonstrate the opposite, by proposing the first attack against D-GD that enables a user (or set of users) to reconstruct the private data of other users outside their immediate neighborhood. Our approach is based on a reconstruction attack against the gossip averaging protocol, which we then extend to handle the additional challenges raised by D-GD. We validate the effectiveness of our attack on real graphs and datasets, showing that the number of users compromised by a single or a handful of attackers is often surprisingly large. We empirically investigate some of the factors that affect the performance of the attack, namely the graph topology, the number of attackers, and their position in the graph.
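The gossip-averaging building block of the attack fits in a few lines: if the mixing matrix W is public, every value an attacker observes at round t is a known linear function of the private inputs, so enough observations yield a solvable linear system. The sketch below assumes noiseless gossip averaging on a ring; the paper's attack against full D-GD handles considerably more.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 8, 12
A = np.eye(n)                                  # ring graph with self-loops
for i in range(n):
    A[i, (i - 1) % n] = A[i, (i + 1) % n] = 1.0
W = A / A.sum(1, keepdims=True)                # public gossip-averaging matrix

x0 = rng.standard_normal(n)                    # private initial values
observed = [0, 1, n - 1]                       # attacker node 0 and its neighbors

rows, obs = [], []
x, Wt = x0.copy(), np.eye(n)
for t in range(T):
    x, Wt = W @ x, W @ Wt                      # one gossip round
    for i in observed:
        rows.append(Wt[i])                     # x_i(t) = (W^t)_i . x(0)
        obs.append(x[i])

# Least-squares reconstruction of *everyone's* private inputs, including
# nodes far outside the attacker's neighborhood.
x0_hat, *_ = np.linalg.lstsq(np.array(rows), np.array(obs), rcond=None)
print("max reconstruction error:", np.abs(x0_hat - x0).max())
```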
Updated: 2024-06-04 12:34:25
标题: 去中心化学习中的隐私攻击
摘要: Decentralized Gradient Descent(D-GD)允许一组用户在网络图中通过迭代地将本地模型更新与邻居平均来进行协作学习,而无需共享他们的数据。非邻居节点之间的直接通信缺失可能会导致用户无法推断有关他人数据的精确信息。在这项工作中,我们通过提出针对D-GD的第一次攻击来展示相反的情况,使用户(或一组用户)能够重建其直接邻居以外其他用户的私人数据。我们的方法基于对八卦平均协议的重建攻击,然后扩展以处理D-GD带来的额外挑战。我们在真实图形和数据集上验证了我们攻击的有效性,显示单个或少数攻击者妨碍的用户数量通常令人惊讶地大。我们经验性地研究了影响攻击性能的一些因素,即图形拓扑结构、攻击者数量及其在图中的位置。
更新时间: 2024-06-04 12:34:25
领域: cs.LG,cs.CR
Reinforcement Learning with Lookahead Information
We study reinforcement learning (RL) problems in which agents observe the reward or transition realizations at their current state before deciding which action to take. Such observations are available in many applications, including transactions, navigation and more. When the environment is known, previous work shows that this lookahead information can drastically increase the collected reward. However, outside of specific applications, existing approaches for interacting with unknown environments are not well-adapted to these observations. In this work, we close this gap and design provably-efficient learning algorithms able to incorporate lookahead information. To achieve this, we perform planning using the empirical distribution of the reward and transition observations, in contrast to vanilla approaches that only rely on estimated expectations. We prove that our algorithms achieve tight regret versus a baseline that also has access to lookahead information - linearly increasing the amount of collected reward compared to agents that cannot handle lookahead information.
Updated: 2024-06-04 12:29:51
标题: 具有前瞻信息的强化学习
摘要: 我们研究了强化学习(RL)问题,在这些问题中,代理在决定采取哪种行动之前,会观察到其当前状态下的奖励或转移实现。这样的观察在许多应用中都是可用的,包括交易、导航等。在环境已知的情况下,先前的研究表明,这种前瞻信息可以显著增加收集的奖励。然而,在特定应用之外,现有的与未知环境交互的方法并不适用于这些观察。在这项工作中,我们填补了这一空白,并设计了可以纳入前瞻信息的经过证明有效的学习算法。为了实现这一目标,我们使用奖励和转移观察的经验分布进行规划,与仅依赖于估计期望的普通方法形成对比。我们证明了我们的算法在与也具有前瞻信息访问权限的基线相比的紧密后悔方面取得了成功-与不能处理前瞻信息的代理相比,线性增加了收集到的奖励量。
更新时间: 2024-06-04 12:29:51
领域: cs.LG,stat.ML
Fine-Grained Image-Text Alignment in Medical Imaging Enables Explainable Cyclic Image-Report Generation
To address these issues, we propose a novel Adaptive patch-word Matching (AdaMatch) model to correlate chest X-ray (CXR) image regions with words in medical reports and apply it to CXR-report generation to provide explainability for the generation process. AdaMatch exploits the fine-grained relation between adaptive patches and words to provide explanations of specific image regions with corresponding words. To capture the abnormal regions of varying sizes and positions, we introduce the Adaptive Patch extraction (AdaPatch) module to acquire the adaptive patches for these regions adaptively. In order to provide explicit explainability for CXR-report generation task, we propose an AdaMatch-based bidirectional large language model for Cyclic CXR-report generation (AdaMatch-Cyclic). It employs the AdaMatch to obtain the keywords for CXR images and `keypatches' for medical reports as hints to guide CXR-report generation. Extensive experiments on two publicly available CXR datasets prove the effectiveness of our method and its superior performance to existing methods.
Updated: 2024-06-04 12:27:38
标题: 细粒度医学影像中的图像文本对齐促进可解释的循环图像报告生成
摘要: 为了解决这些问题,我们提出了一种新颖的自适应patch-word匹配(AdaMatch)模型,用于将胸部X光(CXR)图像区域与医学报告中的单词相关联,并将其应用于CXR-报告生成,以提供对生成过程的可解释性。AdaMatch利用自适应patch和单词之间的细粒度关系,以提供与相应单词相关的特定图像区域的解释。为了捕获不同大小和位置的异常区域,我们引入了自适应patch提取(AdaPatch)模块,以自适应地获取这些区域的自适应patch。为了为CXR-报告生成任务提供明确的可解释性,我们提出了一种基于AdaMatch的双向大语言模型用于循环CXR-报告生成(AdaMatch-Cyclic)。它利用AdaMatch来获取CXR图像的关键词和医学报告的“关键patch”作为提示,以指导CXR-报告生成。对两个公开可用的CXR数据集进行的大量实验证明了我们方法的有效性及其优越性能相对于现有方法。
更新时间: 2024-06-04 12:27:38
领域: cs.CV,cs.AI,cs.CL
MidiCaps -- A large-scale MIDI dataset with text captions
Generative models guided by text prompts are increasingly becoming more popular. However, no text-to-MIDI models currently exist, mostly due to the lack of a captioned MIDI dataset. This work aims to enable research that combines LLMs with symbolic music by presenting the first large-scale MIDI dataset with text captions that is openly available: MidiCaps. MIDI (Musical Instrument Digital Interface) files are a widely used format for encoding musical information. Their structured format captures the nuances of musical composition and has practical applications by music producers, composers, musicologists, as well as performers. Inspired by recent advancements in captioning techniques applied to various domains, we present a large-scale curated dataset of over 168k MIDI files accompanied by textual descriptions. Each MIDI caption succinctly describes the musical content, encompassing tempo, chord progression, time signature, instruments present, genre and mood; thereby facilitating multi-modal exploration and analysis. The dataset contains a mix of various genres, styles, and complexities, offering a rich source for training and evaluating models for tasks such as music information retrieval, music understanding and cross-modal translation. We provide detailed statistics about the dataset and have assessed the quality of the captions in an extensive listening study. We anticipate that this resource will stimulate further research in the intersection of music and natural language processing, fostering advancements in both fields.
Updated: 2024-06-04 12:21:55
标题: MidiCaps -- 一个带有文本说明的大规模MIDI数据集
摘要: 由文本提示指导的生成模型越来越受到欢迎。然而,目前并没有文本到MIDI模型,主要是因为缺乏带有字幕的MIDI数据集。本研究旨在通过提供第一个大规模带有文本字幕的MIDI数据集MidiCaps,促进将LLMs与符号音乐结合的研究。MIDI(音乐乐器数字接口)文件是编码音乐信息的广泛使用格式。它们的结构化格式捕捉了音乐作品的微妙之处,并被音乐制作人、作曲家、音乐学家以及表演者广泛应用。受最近在各个领域应用字幕技术的进展的启发,我们呈现了一个由超过168,000个MIDI文件和文本描述组成的大规模策划数据集。每个MIDI字幕简要描述了音乐内容,包括速度、和弦进行、拍号、乐器、流派和情绪;从而促进了多模态探索和分析。该数据集包含各种流派、风格和复杂性的混合,为训练和评估用于音乐信息检索、音乐理解和跨模态翻译等任务的模型提供了丰富的资源。我们提供了有关数据集的详细统计信息,并通过广泛的听力研究评估了字幕的质量。我们预计这一资源将刺激音乐和自然语言处理交叉领域的进一步研究,促进两个领域的进步。
更新时间: 2024-06-04 12:21:55
领域: eess.AS,cs.LG,cs.MM,cs.SD
PuFace: Defending against Facial Cloaking Attacks for Facial Recognition Models
The recently proposed facial cloaking attacks add invisible perturbation (cloaks) to facial images to protect users from being recognized by unauthorized facial recognition models. However, we show that the "cloaks" are not robust enough and can be removed from images. This paper introduces PuFace, an image purification system leveraging the generalization ability of neural networks to diminish the impact of cloaks by pushing the cloaked images towards the manifold of natural (uncloaked) images before the training process of facial recognition models. Specifically, we devise a purifier that takes all the training images including both cloaked and natural images as input and generates the purified facial images close to the manifold where natural images lie. To meet the defense goal, we propose to train the purifier on particularly amplified cloaked images with a loss function that combines image loss and feature loss. Our empirical experiment shows PuFace can effectively defend against two state-of-the-art facial cloaking attacks and reduces the attack success rate from 69.84\% to 7.61\% on average without degrading the normal accuracy for various facial recognition models. Moreover, PuFace is a model-agnostic defense mechanism that can be applied to any facial recognition model without modifying the model structure.
Updated: 2024-06-04 12:19:09
标题: PuFace:针对面部遮挡攻击的人脸识别模型的防御
摘要: 最近提出的面部遮蔽攻击是向面部图像添加不可见的扰动(遮蔽),以保护用户免受未经授权的面部识别模型的识别。然而,我们发现这些“遮蔽”并不够稳健,可以从图像中移除。 本文介绍了PuFace,一种利用神经网络的泛化能力的图像净化系统,通过将遮蔽图像推向自然(未遮蔽)图像的流形,减少遮蔽的影响,在面部识别模型的训练过程之前。具体来说,我们设计了一个净化器,将所有训练图像(包括遮蔽和自然图像)作为输入,并生成接近自然图像所在流形的净化面部图像。为了实现防御目标,我们建议在特别放大的遮蔽图像上训练净化器,使用结合图像损失和特征损失的损失函数。我们的实证实验证明,PuFace能够有效抵御两种最先进的面部遮蔽攻击,并将攻击成功率从69.84%平均降低到7.61%,而不降低各种面部识别模型的正常准确率。此外,PuFace是一种与模型无关的防御机制,可应用于任何面部识别模型,无需修改模型结构。
更新时间: 2024-06-04 12:19:09
领域: cs.CV,cs.AI,cs.CR
Modeling Emotional Trajectories in Written Stories Utilizing Transformers and Weakly-Supervised Learning
Telling stories is an integral part of human communication which can evoke emotions and influence the affective states of the audience. Automatically modeling emotional trajectories in stories has thus attracted considerable scholarly interest. However, as most existing works have been limited to unsupervised dictionary-based approaches, there is no benchmark for this task. We address this gap by introducing continuous valence and arousal labels for an existing dataset of children's stories originally annotated with discrete emotion categories. We collect additional annotations for this data and map the categorical labels to the continuous valence and arousal space. For predicting the thus obtained emotionality signals, we fine-tune a DeBERTa model and improve upon this baseline via a weakly supervised learning approach. The best configuration achieves a Concordance Correlation Coefficient (CCC) of $.8221$ for valence and $.7125$ for arousal on the test set, demonstrating the efficacy of our proposed approach. A detailed analysis shows the extent to which the results vary depending on factors such as the author, the individual story, or the section within the story. In addition, we uncover the weaknesses of our approach by investigating examples that prove to be difficult to predict.
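For reference, the Concordance Correlation Coefficient used to report the results combines correlation with agreement in mean and scale; a small implementation:

```python
import numpy as np

def concordance_correlation_coefficient(y_true, y_pred):
    """CCC = 2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))^2)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    cov = np.cov(y_true, y_pred, bias=True)[0, 1]
    return 2 * cov / (y_true.var() + y_pred.var()
                      + (y_true.mean() - y_pred.mean()) ** 2)

# Perfect agreement gives 1.0; a constant shift is penalized, unlike Pearson r.
print(concordance_correlation_coefficient([0.1, 0.4, 0.8], [0.1, 0.4, 0.8]))  # 1.0
print(concordance_correlation_coefficient([0.1, 0.4, 0.8], [0.6, 0.9, 1.3]))  # < 1
```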
Updated: 2024-06-04 12:17:16
标题: 使用Transformer和弱监督学习对书面故事中的情绪轨迹建模
摘要: 讲故事是人类交流中不可或缺的一部分,可以唤起情感并影响听众的情感状态。因此,自动建模故事中的情感轨迹引起了相当大的学术兴趣。然而,由于大多数现有作品仅限于无监督的基于词典的方法,因此尚无此任务的基准。我们通过为原始标注为离散情绪类别的儿童故事数据集引入连续的valence和arousal标签来填补这一空白。我们为该数据收集了额外的注释,并将分类标签映射到连续的valence和arousal空间。为了预测因此获得的情感信号,我们通过弱监督学习方法微调了一个DeBERTa模型,并改进了基线。最佳配置在测试集上实现了$.8221$的valence和$.7125$的arousal的Concordance Correlation Coefficient(CCC),证明了我们提出的方法的有效性。详细分析显示了结果在作者、个别故事或故事中的部分等因素下变化的程度。此外,通过调查难以预测的例子,我们揭示了我们方法的弱点。
更新时间: 2024-06-04 12:17:16
领域: cs.CL,cs.AI
On Universally Optimal Algorithms for A/B Testing
We study the problem of best-arm identification with fixed budget in stochastic multi-armed bandits with Bernoulli rewards. For the problem with two arms, also known as the A/B testing problem, we prove that there is no algorithm that (i) performs as well as the algorithm sampling each arm equally (referred to as the {\it uniform sampling} algorithm) in all instances, and that (ii) strictly outperforms uniform sampling on at least one instance. In short, there is no algorithm better than the uniform sampling algorithm. To establish this result, we first introduce the natural class of {\it consistent} and {\it stable} algorithms, and show that any algorithm that performs as well as the uniform sampling algorithm in all instances belongs to this class. The proof then proceeds by deriving a lower bound on the error rate satisfied by any consistent and stable algorithm, and by showing that the uniform sampling algorithm matches this lower bound. Our results provide a solution to the two open problems presented in \citep{qin2022open}. For the general problem with more than two arms, we provide a first set of results. We characterize the asymptotic error rate of the celebrated Successive Rejects (SR) algorithm \citep{audibert2010best} and show that, surprisingly, the uniform sampling algorithm outperforms the SR algorithm in some instances.
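For concreteness, here is a sketch of the Successive Rejects algorithm whose asymptotic error rate the paper characterizes; uniform sampling simply pulls every arm budget/K times and recommends the best empirical mean. The sketch assumes the budget comfortably exceeds the number of arms.

```python
import numpy as np

def successive_rejects(pull, n_arms, budget):
    """Successive Rejects for fixed-budget best-arm identification.

    Splits the budget into n_arms - 1 phases and discards the empirically
    worst arm after each phase; pull(a) returns a (Bernoulli) reward.
    """
    log_bar = 0.5 + sum(1.0 / i for i in range(2, n_arms + 1))
    active = list(range(n_arms))
    counts, sums, n_prev = np.zeros(n_arms), np.zeros(n_arms), 0
    for k in range(1, n_arms):
        n_k = int(np.ceil((budget - n_arms) / (log_bar * (n_arms + 1 - k))))
        for a in active:
            for _ in range(n_k - n_prev):
                sums[a] += pull(a)
                counts[a] += 1
        n_prev = n_k
        active.remove(min(active, key=lambda a: sums[a] / counts[a]))
    return active[0]

rng = np.random.default_rng(0)
means = [0.5, 0.45, 0.4]
print(successive_rejects(lambda a: float(rng.random() < means[a]), 3, 3000))
```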
Updated: 2024-06-04 12:14:46
标题: 关于A/B测试的普遍最优算法
摘要: 我们研究了在具有伯努利奖励的随机多臂老虎机中固定预算下的最佳臂识别问题。对于两臂问题,也被称为A/B测试问题,我们证明不存在任何算法能够在所有实例中像均匀抽样算法(称为{\it 均匀抽样}算法)那样表现良好,并且至少在一个实例上明显优于均匀抽样。简言之,不存在比均匀抽样算法更好的算法。为了得出这个结果,我们首先引入了{\it 一致}和{\it 稳定}算法的自然类,并展示了任何在所有实例中像均匀抽样算法那样表现良好的算法都属于这个类。证明随后通过推导任何一致且稳定算法满足的错误率下界,并展示均匀抽样算法与此下界相匹配。我们的结果解决了\citep{qin2022open}中提出的两个未解问题。对于多于两个臂的一般问题,我们提供了一系列结果。我们表征了著名的Successive Rejects (SR)算法\citep{audibert2010best}的渐近错误率,并展示了令人惊讶的是,在某些实例中均匀抽样算法优于SR算法。
更新时间: 2024-06-04 12:14:46
领域: stat.ML,cs.LG
Description Boosting for Zero-Shot Entity and Relation Classification
Zero-shot entity and relation classification models leverage available external information of unseen classes -- e.g., textual descriptions -- to annotate input text data. Thanks to the minimum data requirement, Zero-Shot Learning (ZSL) methods have high value in practice, especially in applications where labeled data is scarce. Even though recent research in ZSL has demonstrated significant results, our analysis reveals that those methods are sensitive to the provided textual descriptions of entities (or relations). Even a minor modification of descriptions can lead to a change in the decision boundary between entity (or relation) classes. In this paper, we formally define the problem of identifying effective descriptions for zero-shot inference. We propose a strategy for generating variations of an initial description, a heuristic for ranking them, and an ensemble method capable of boosting the predictions of zero-shot models through description enhancement. Empirical results on four different entity and relation classification datasets show that our proposed method outperforms existing approaches and achieves new SOTA results on these datasets under the ZSL settings. The source code of the proposed solutions and the evaluation framework are open-sourced.
Updated: 2024-06-04 12:09:44
标题: 零样本实体与关系分类的描述增强
摘要: 零-shot实体和关系分类模型利用可用的外部信息来注释输入文本数据中看不见的类别,例如文本描述。由于最低的数据需求,零-shot学习(ZSL)方法在实践中具有很高的价值,特别是在标记数据稀缺的应用中。尽管最近ZSL领域的研究取得了显著的结果,但我们的分析显示这些方法对实体(或关系)的提供的文本描述非常敏感。甚至对描述的轻微修改都可能导致实体(或关系)类之间的决策边界发生变化。在本文中,我们正式定义了识别零-shot推断有效描述的问题。我们提出了一种生成初始描述变体的策略,一种对其进行排名的启发式方法,以及一种能够通过描述增强来提升零-shot模型预测的集成方法。在四个不同的实体和关系分类数据集上的实证结果表明,我们提出的方法优于现有方法,并在这些数据集上在ZSL设置下取得了新的SOTA结果。所提出解决方案的源代码和评估框架已经开放源代码。
更新时间: 2024-06-04 12:09:44
领域: cs.CL,cs.IR,cs.LG
A Framework for Neurosymbolic Robot Action Planning using Large Language Models
Symbolic task planning is a widely used approach to enforce robot autonomy due to its ease of understanding and deployment in robot architectures. However, techniques for symbolic task planning are difficult to scale in real-world, human-robot collaboration scenarios because of the poor performance in complex planning domains or when frequent re-planning is needed. We present a framework, Teriyaki, specifically aimed at bridging the gap between symbolic task planning and machine learning approaches. The rationale is training Large Language Models (LLMs), namely GPT-3, into a neurosymbolic task planner compatible with the Planning Domain Definition Language (PDDL), and then leveraging its generative capabilities to overcome a number of limitations inherent to symbolic task planners. Potential benefits include (i) better scalability insofar as planning domain complexity increases, since LLMs' response time linearly scales with the combined length of the input and the output, and (ii) the ability to synthesize a plan action-by-action instead of end-to-end, making each action available for execution as soon as it is generated instead of waiting for the whole plan to be available, which in turn enables concurrent planning and execution. Recently, significant efforts have been devoted by the research community to evaluate the cognitive capabilities of LLMs, with mixed success. Instead, with Teriyaki we aim to provide an overall planning performance comparable to traditional planners in specific planning domains, while leveraging LLMs capabilities to build a look-ahead predictive planning model. Preliminary results in selected domains show that our method can: (i) solve 95.5% of problems in a test data set of 1,000 samples; (ii) produce plans up to 13.5% shorter than a traditional symbolic planner; (iii) reduce average overall waiting times for plan availability by up to 61.4%.
Updated: 2024-06-04 12:03:28
标题: 一个利用大型语言模型进行神经符号机器人行动规划的框架
摘要: Symbolic task planning是一种广泛应用的方法,用于强化机器人的自主性,因为它易于理解并在机器人架构中部署。然而,由于在复杂规划领域或需要频繁重新规划时性能较差,符号任务规划的技术很难在现实世界的人机协作场景中扩展。我们提出了一个名为Teriyaki的框架,专门旨在弥合符号任务规划和机器学习方法之间的差距。其理念是将大型语言模型(LLMs),即GPT-3,训练成与规划域定义语言(PDDL)兼容的神经符号任务规划器,然后利用其生成能力克服符号任务规划器固有的一些限制。潜在的好处包括:(i)随着规划域复杂性的增加,响应时间与输入和输出的长度之和成正比,从而在可伸缩性方面表现更好;(ii)能够逐步合成计划行动,而不是端到端,使得每个行动在生成后立即可供执行,而不是等待整个计划可用,从而实现并发规划和执行。最近,研究界致力于评估LLMs的认知能力,取得了一些成功。与此相反,通过Teriyaki,我们的目标是在特定规划领域提供与传统规划器可比拟的整体规划性能,同时利用LLMs的能力构建一个前瞻性预测规划模型。在选定的领域初步结果显示,我们的方法可以:(i)解决1,000个样本测试数据集中95.5%的问题;(ii)生成的计划比传统符号规划器短达13.5%;(iii)将计划的平均等待时间降低高达61.4%。
更新时间: 2024-06-04 12:03:28
领域: cs.AI,cs.LG,cs.RO,I.2.6; I.2.8; I.2.9
A Probabilistic Model behind Self-Supervised Learning
In self-supervised learning (SSL), representations are learned via an auxiliary task without annotated labels. A common task is to classify augmentations or different modalities of the data, which share semantic content (e.g. an object in an image) but differ in style (e.g. the object's location). Many approaches to self-supervised learning have been proposed, e.g. SimCLR, CLIP, and VicREG, which have recently gained much attention for their representations achieving downstream performance comparable to supervised learning. However, a theoretical understanding of self-supervised methods remains elusive. Addressing this, we present a generative latent variable model for self-supervised learning and show that several families of discriminative SSL, including contrastive methods, induce a comparable distribution over representations, providing a unifying theoretical framework for these methods. The proposed model also justifies connections drawn to mutual information and the use of a "projection head". Learning representations by fitting the model generatively (termed SimVAE) improves performance over discriminative and other VAE-based methods on simple image benchmarks and significantly narrows the gap between generative and discriminative representation learning in more complex settings. Importantly, as our analysis predicts, SimVAE outperforms self-supervised learning where style information is required, taking an important step toward understanding self-supervised methods and achieving task-agnostic representations.
Updated: 2024-06-04 11:59:20
标题: 一个概率模型背后的自监督学习
摘要: 在自监督学习(SSL)中,表示通过辅助任务学习,而无需注释标签。一个常见的任务是对数据的增强或不同模态进行分类,这些数据共享语义内容(例如图像中的对象),但在风格上有所不同(例如对象的位置)。已经提出了许多自监督学习的方法,例如SimCLR、CLIP和VicREG,这些方法近来因其表示实现下游性能与监督学习可比的特点而受到广泛关注。然而,对自监督方法的理论理解仍然模糊。为此,我们提出了一个用于自监督学习的生成潜变量模型,并展示了几个家族的辨别性SSL方法,包括对比方法,可以诱导出一个相似的表示分布,为这些方法提供了一个统一的理论框架。所提出的模型还证明了与互信息和“投影头”使用相关的连接。通过生成地拟合模型(称为SimVAE)学习表示,改善了简单图像基准上的性能,并在更复杂的设置中显著缩小了生成和辨别性表示学习之间的差距。重要的是,正如我们的分析预测的那样,SimVAE在需要风格信息的自监督学习中表现优异,迈出了理解自监督方法和实现任务无关表示的重要一步。
更新时间: 2024-06-04 11:59:20
领域: cs.LG,cs.AI,stat.ML
Verifiable Encodings for Secure Homomorphic Analytics
Homomorphic encryption, which enables the execution of arithmetic operations directly on ciphertexts, is a promising solution for protecting privacy of cloud-delegated computations on sensitive data. However, the correctness of the computation result is not ensured. We propose two error detection encodings and build authenticators that enable practical client-verification of cloud-based homomorphic computations under different trade-offs and without compromising on the features of the encryption algorithm. Our authenticators operate on top of trending ring learning with errors based fully homomorphic encryption schemes over the integers. We implement our solution in VERITAS, a ready-to-use system for verification of outsourced computations executed over encrypted data. We show that contrary to prior work VERITAS supports verification of any homomorphic operation and we demonstrate its practicality for various applications, such as ride-hailing, genomic-data analysis, encrypted search, and machine-learning training and inference.
Updated: 2024-06-04 11:58:08
标题: 可验证的编码用于安全同态分析
摘要: 同态加密使得可以直接在密文上执行算术运算,这是保护云委托计算敏感数据隐私的一种有前景的解决方案。然而,计算结果的正确性并不保证。我们提出了两种错误检测编码,并构建了认证器,可以在不牺牲加密算法特性的情况下,实现对基于云的同态计算的客户端验证。我们的认证器基于整数上基于环学习与错误的最新同态加密方案运行。我们在VERITAS中实现了我们的解决方案,这是一个用于验证在加密数据上执行的外包计算的即用系统。我们展示了与先前工作相反,VERITAS支持验证任何同态操作,并展示了它在各种应用中的实用性,如打车、基因数据分析、加密搜索以及机器学习训练和推断。
更新时间: 2024-06-04 11:58:08
领域: cs.CR
Power Mean Estimation in Stochastic Monte-Carlo Tree Search
Monte-Carlo Tree Search (MCTS) is a widely-used strategy for online planning that combines Monte-Carlo sampling with forward tree search. Its success relies on the Upper Confidence bound for Trees (UCT) algorithm, an extension of the UCB method for multi-arm bandits. However, the theoretical foundation of UCT is incomplete due to an error in the logarithmic bonus term for action selection, leading to the development of Fixed-Depth-MCTS with a polynomial exploration bonus to balance exploration and exploitation~\citep{shah2022journal}. Both UCT and Fixed-Depth-MCTS suffer from biased value estimation: the weighted sum underestimates the optimal value, while the maximum valuation overestimates it~\citep{coulom2006efficient}. The power mean estimator offers a balanced solution, lying between the average and maximum values. Power-UCT~\citep{dam2019generalized} incorporates this estimator for more accurate value estimates but its theoretical analysis remains incomplete. This paper introduces Stochastic-Power-UCT, an MCTS algorithm using the power mean estimator and tailored for stochastic MDPs. We analyze its polynomial convergence in estimating root node values and show that it shares the same convergence rate of $\mathcal{O}(n^{-1/2})$, where $n$ is the number of visited trajectories, as Fixed-Depth-MCTS, with the latter being a special case of the former. Our theoretical results are validated with empirical tests across various stochastic MDP environments.
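The estimator itself is a one-liner and interpolates between the two backup rules whose biases the abstract contrasts; a small sketch:

```python
import numpy as np

def power_mean(values, weights, p):
    """Weighted power mean (sum_i w_i * x_i^p)^(1/p) for positive values.

    p = 1 is the weighted average (which underestimates the optimal value);
    p -> infinity approaches the maximum (which overestimates it)."""
    values = np.asarray(values, float)
    weights = np.asarray(weights, float)
    weights = weights / weights.sum()
    return float(weights @ values**p) ** (1.0 / p)

q_values, visits = [1.0, 2.0, 5.0], [10, 5, 1]  # visit counts act as weights
for p in (1, 2, 8, 32):
    print(f"p={p:>2}: {power_mean(q_values, visits, p):.3f}")  # mean -> max
```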
Updated: 2024-06-04 11:56:37
标题: 随机蒙特卡罗树搜索中的功率均值估计
摘要: 蒙特卡洛树搜索(MCTS)是一种广泛使用的在线规划策略,它将蒙特卡洛采样与前向树搜索结合起来。它的成功依赖于树的上置信界(UCT)算法,这是多臂老虎机的UCB方法的扩展。然而,UCT的理论基础是不完整的,因为在动作选择中存在对数奖励项的错误,导致开发了具有多项式探索奖励的固定深度MCTS来平衡探索和开发。无论是UCT还是固定深度MCTS都存在值估计偏差:加权和低估了最优值,而最大估值则高估了最优值。功率平均估计器提供了一个平衡的解决方案,介于平均值和最大值之间。功率-UCT将这个估计器纳入,以获得更准确的值估计,但其理论分析仍然不完整。本文介绍了随机功率-UCT,这是一种使用功率平均估计器并专门针对随机MDP的MCTS算法。我们分析了其在估计根节点值方面的多项式收敛性,并展示了它与访问轨迹数量n的收敛速度相同,即O(n ^ -1/2),其中后者是前者的一个特例。我们的理论结果通过对各种随机MDP环境的实证测试进行了验证。
更新时间: 2024-06-04 11:56:37
领域: cs.AI
On the Limitations of Fractal Dimension as a Measure of Generalization
Bounding and predicting the generalization gap of overparameterized neural networks remains a central open problem in theoretical machine learning. Neural network optimization trajectories have been proposed to possess fractal structure, leading to bounds and generalization measures based on notions of fractal dimension on these trajectories. Prominently, both the Hausdorff dimension and the persistent homology dimension have been proposed to correlate with generalization gap, thus serving as a measure of generalization. This work performs an extended evaluation of these topological generalization measures. We demonstrate that fractal dimension fails to predict generalization of models trained from poor initializations. We further identify that the $\ell^2$ norm of the final parameter iterate, one of the simplest complexity measures in learning theory, correlates more strongly with the generalization gap than these notions of fractal dimension. Finally, our study reveals the intriguing manifestation of model-wise double descent in persistent homology-based generalization measures. This work lays the ground for a deeper investigation of the causal relationships between fractal geometry, topological data analysis, and neural network optimization.
Updated: 2024-06-04 11:56:19
标题: 关于分形维度作为泛化度量的局限性
摘要: 对过参数化神经网络的泛化差距进行界定和预测,仍然是理论机器学习中的一个核心未决问题。有研究提出神经网络的优化轨迹具有分形结构,并由此基于这些轨迹上的分形维度概念构建了界和泛化度量。其中,Hausdorff维度和持续同调维度都被提出与泛化差距相关,从而可作为泛化度量。本工作对这些拓扑泛化度量进行了扩展评估。我们证明,分形维度无法预测从较差初始化训练的模型的泛化能力。我们进一步发现,最终参数迭代的$\ell^2$范数(学习理论中最简单的复杂度度量之一)与泛化差距的相关性强于这些分形维度概念。最后,我们的研究揭示了基于持续同调的泛化度量中随模型规模变化的双下降(model-wise double descent)现象。这项工作为更深入地探讨分形几何、拓扑数据分析与神经网络优化之间的因果关系奠定了基础。
更新时间: 2024-06-04 11:56:19
领域: cs.LG,cs.AI,math.DS,stat.ML
MALIBO: Meta-learning for Likelihood-free Bayesian Optimization
Bayesian optimization (BO) is a popular method to optimize costly black-box functions. While traditional BO optimizes each new target task from scratch, meta-learning has emerged as a way to leverage knowledge from related tasks to optimize new tasks faster. However, existing meta-learning BO methods rely on surrogate models that suffer from scalability issues and are sensitive to observations with different scales and noise types across tasks. Moreover, they often overlook the uncertainty associated with task similarity. This leads to unreliable task adaptation when only limited observations are obtained or when the new tasks differ significantly from the related tasks. To address these limitations, we propose a novel meta-learning BO approach that bypasses the surrogate model and directly learns the utility of queries across tasks. Our method explicitly models task uncertainty and includes an auxiliary model to enable robust adaptation to new tasks. Extensive experiments show that our method demonstrates strong anytime performance and outperforms state-of-the-art meta-learning BO methods in various benchmarks.
Updated: 2024-06-04 11:54:09
标题: MALIBO:基于元学习的无似然贝叶斯优化
摘要: 贝叶斯优化(BO)是一种优化昂贵黑盒函数的流行方法。传统的BO会从头开始优化每个新的目标任务,而元学习已经成为一种利用相关任务知识来更快优化新任务的方法。然而,现有的元学习BO方法依赖于替代模型,这类模型存在可扩展性问题,并且对任务间尺度和噪声类型不同的观测十分敏感。此外,它们经常忽视与任务相似性相关的不确定性。这导致在仅获得有限观测或新任务与相关任务差异显著时,任务适应不可靠。为了解决这些限制,我们提出了一种新颖的元学习BO方法,绕过替代模型,直接学习跨任务查询的效用。我们的方法显式地建模任务不确定性,并引入一个辅助模型以实现对新任务的稳健适应。大量实验表明,我们的方法展现出强劲的任意时刻(anytime)性能,并在多种基准测试中优于最先进的元学习BO方法。
更新时间: 2024-06-04 11:54:09
领域: cs.LG,stat.ML
Looks Too Good To Be True: An Information-Theoretic Analysis of Hallucinations in Generative Restoration Models
The pursuit of high perceptual quality in image restoration has driven the development of revolutionary generative models, capable of producing results often visually indistinguishable from real data. However, as their perceptual quality continues to improve, these models also exhibit a growing tendency to generate hallucinations - realistic-looking details that do not exist in the ground truth images. The presence of hallucinations introduces uncertainty regarding the reliability of the models' predictions, raising major concerns about their practical application. In this paper, we employ information-theoretic tools to investigate this phenomenon, revealing a fundamental tradeoff between uncertainty and perception. We rigorously analyze the relationship between these two factors, proving that the global minimal uncertainty in generative models grows in tandem with perception. In particular, we define the inherent uncertainty of the restoration problem and show that attaining perfect perceptual quality entails at least twice this uncertainty. Additionally, we establish a relation between mean squared-error distortion, uncertainty and perception, through which we prove the aforementioned uncertainty-perception tradeoff induces the well-known perception-distortion tradeoff. This work uncovers fundamental limitations of generative models in achieving both high perceptual quality and reliable predictions for image restoration. We demonstrate our theoretical findings through an analysis of single image super-resolution algorithms. Our work aims to raise awareness among practitioners about this inherent tradeoff, empowering them to make informed decisions and potentially prioritize safety over perceptual performance.
Updated: 2024-06-04 11:53:44
标题: 看起来太美好以至于难以置信:生成恢复模型中幻觉的信息论分析
摘要: 图像恢复中对高感知质量的追求推动了具有革命性的生成模型的发展,这些模型能够产生与真实数据在视觉上难以区分的结果。然而,随着它们的感知质量不断提高,这些模型也表现出日益增强的生成幻觉的倾向,即看起来逼真、但在真实图像中并不存在的细节。幻觉的存在给模型预测的可靠性带来了不确定性,引发了对其实际应用的重大担忧。在本文中,我们利用信息论工具研究了这一现象,揭示了不确定性和感知之间的根本权衡。我们严格分析了这两个因素之间的关系,证明了生成模型中的全局最小不确定性与感知一起增长。特别是,我们定义了恢复问题的固有不确定性,并展示了实现完美感知质量至少需要两倍于此的不确定性。此外,我们建立了均方误差失真、不确定性和感知三者之间的关系,并据此证明了前述不确定性-感知权衡会引发众所周知的感知-失真权衡。这项工作揭示了生成模型在图像恢复中同时实现高感知质量和可靠预测方面的基本限制。我们通过对单图像超分辨率算法的分析展示了我们的理论发现。我们的工作旨在提高从业者对这种固有权衡的认识,使他们能够做出明智的决策,并可能将安全置于感知性能之上。
更新时间: 2024-06-04 11:53:44
领域: cs.LG,cs.AI,cs.CV,eess.IV
Unraveling and Mitigating Retriever Inconsistencies in Retrieval-Augmented Large Language Models
Although Retrieval-Augmented Large Language Models (RALMs) demonstrate their superiority in terms of factuality, they do not consistently outperform the original retrieval-free Language Models (LMs). Our experiments reveal that this example-level performance inconsistency exists not only between retrieval-augmented and retrieval-free LM but also among different retrievers. To understand this phenomenon, we investigate the degeneration behavior of RALMs and theoretically decompose it into four categories. Further analysis based on our decomposition reveals that the innate difference in knowledge sources and the unpredictable degeneration of the reader model contribute most to the inconsistency. Drawing from our analysis, we introduce Ensemble of Retrievers (EoR), a trainable framework that can adaptively retrieve from different knowledge sources and effectively decrease unpredictable reader errors. Our experiments on Open Domain Question Answering show that EoR substantially improves performance over the RALM with a single retriever by considerably reducing inconsistent behaviors.
Updated: 2024-06-04 11:51:53
标题: 解开和减轻在检索增强的大型语言模型中的检索不一致性
摘要: 尽管检索增强型大型语言模型(RALMs)在事实性方面表现出优越性,但它们并不始终优于原始的无检索语言模型(LMs)。我们的实验揭示,这种示例级性能的不一致性不仅存在于检索增强和无检索LM之间,还存在于不同的检索器之间。为了理解这一现象,我们调查了RALMs的退化行为,并在理论上将其分解为四类。基于该分解的进一步分析揭示,知识来源的固有差异和阅读器(reader)模型的不可预测退化最大程度地导致了不一致性。借鉴我们的分析,我们提出了检索器集成(Ensemble of Retrievers, EoR),这是一个可训练的框架,能够自适应地从不同知识源检索,并有效减少不可预测的阅读器错误。我们在开放领域问答实验中发现,EoR通过显著减少不一致行为,大幅提高了性能,明显优于使用单一检索器的RALM。
更新时间: 2024-06-04 11:51:53
领域: cs.AI,cs.CL
Triadic-OCD: Asynchronous Online Change Detection with Provable Robustness, Optimality, and Convergence
The primary goal of online change detection (OCD) is to promptly identify changes in the data stream. The OCD problem finds a wide variety of applications in diverse areas, e.g., security detection in smart grids and intrusion detection in communication networks. Prior research usually assumes precise knowledge of the system parameters. Nevertheless, this presumption often proves unattainable in practical scenarios due to factors such as estimation errors, system updates, etc. This paper makes a first attempt to develop a triadic-OCD framework with certifiable robustness, provable optimality, and guaranteed convergence. In addition, the proposed triadic-OCD algorithm can be realized in a fully asynchronous distributed manner, easing the necessity of transmitting the data to a single server. This asynchronous mechanism can also mitigate the straggler issue faced by traditional synchronous algorithms. Moreover, the non-asymptotic convergence property of Triadic-OCD is theoretically analyzed, and its iteration complexity to achieve an $\epsilon$-optimal point is derived. Extensive experiments have been conducted to elucidate the effectiveness of the proposed method.
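For context on the underlying task, the sketch below implements a textbook CUSUM detector for a known Gaussian mean shift; it is a baseline illustration of OCD only, not the triadic-OCD algorithm, whose robustness and asynchronous machinery do not fit in a few lines.

    import numpy as np

    def cusum_detect(stream, mu0, mu1, sigma, threshold):
        # Classic CUSUM for a mean shift mu0 -> mu1 in Gaussian noise: accumulate
        # the per-sample log-likelihood ratio, clip at zero, and declare a change
        # once the statistic crosses the threshold.
        s = 0.0
        for t, x in enumerate(stream):
            llr = ((x - mu0) ** 2 - (x - mu1) ** 2) / (2.0 * sigma ** 2)
            s = max(0.0, s + llr)
            if s > threshold:
                return t  # detection time
        return None  # no change declared

    rng = np.random.default_rng(0)
    data = np.concatenate([rng.normal(0, 1, 500), rng.normal(1, 1, 500)])  # shift at t=500
    print(cusum_detect(data, mu0=0.0, mu1=1.0, sigma=1.0, threshold=20.0))

Note that this baseline presumes exact knowledge of the pre- and post-change parameters, which is precisely the assumption the paper argues is unattainable in practice.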
Updated: 2024-06-04 11:40:50
标题: Triadic-OCD:具有可证明的鲁棒性、最优性和收敛性的异步在线变化检测
摘要: 在线变化检测(OCD)的主要目标是及时识别数据流中的变化。OCD问题在各种领域中有着广泛的应用,例如智能电网中的安全检测和通信网络中的入侵检测。先前的研究通常假设对系统参数具有精确的知识。然而,在实际场景中,由于估计误差、系统更新等因素,这种假设通常难以成立。本文首次尝试开发一个具有可证明的鲁棒性、可证明的最优性和有保证的收敛性的triadic-OCD框架。此外,所提出的triadic-OCD算法可以以完全异步分布式的方式实现,减轻了向单个服务器传输数据的必要性。这种异步机制还可以缓解传统同步算法面临的落伍者(straggler)问题。此外,本文在理论上分析了Triadic-OCD的非渐近收敛性质,并推导了其达到$\epsilon$-最优点的迭代复杂度。我们进行了大量实验,以阐明所提出方法的有效性。
更新时间: 2024-06-04 11:40:50
领域: stat.ML,cs.AI,cs.LG
Fingerprint Matching with Localized Deep Representation
Compared to minutia-based fingerprint representations, fixed-length representations are attractive due to simple and efficient matching. However, fixed-length fingerprint representations are limited in accuracy when matching fingerprints with different visible areas, which can occur due to different finger poses or acquisition methods. To address this issue, we propose a localized deep representation of fingerprint, named LDRF. By focusing on the discriminative characteristics within local regions, LDRF provides a more robust and accurate fixed-length representation for fingerprints with variable visible areas. LDRF can be adapted to retain information within any valid area, making it highly flexible. The matching scores produced by LDRF also exhibit intuitive statistical characteristics, which led us to propose a matching score normalization technique to mitigate the uncertainty in the cases of very small overlapping area. With this new technique, we can maintain a high level of accuracy and reliability in our fingerprint matching, even as the size of the database grows rapidly. Our experimental results on 21 datasets containing over 140K fingerprints of various finger poses and impression types show that LDRF outperforms other fixed-length representations and is robust to sensing technologies and impression types. Besides, the proposed matching score normalization effectively reduces the false match rate (FMR) in large-scale identification experiments comprising over 5.11 million fingerprints. Specifically, this technique results in a reduction of two orders of magnitude compared to matching without matching score normalization and five orders of magnitude compared to prior works.
Updated: 2024-06-04 11:39:34
标题: 指纹匹配与局部深度表示
摘要: 与基于细节点(minutiae)的指纹表示相比,固定长度表示因匹配简单高效而具有吸引力。然而,当匹配具有不同可见区域的指纹时,固定长度指纹表示在准确性上存在局限性,这可能是由于不同的手指姿势或采集方法引起的。为了解决这个问题,我们提出了一种名为LDRF的指纹局部深度表示。通过专注于局部区域内的区分特征,LDRF为可见区域可变的指纹提供了更稳健和准确的固定长度表示。LDRF可以适应保留任何有效区域内的信息,使其具有高度灵活性。由LDRF产生的匹配分数还展现出直观的统计特征,这促使我们提出了一种匹配分数归一化技术,以减轻重叠区域非常小的情况下的不确定性。通过这种新技术,即使数据库规模迅速增长,我们仍可以保持高水平的准确性和可靠性。我们在包含超过140K枚不同手指姿势和印象类型指纹的21个数据集上的实验结果表明,LDRF优于其他固定长度表示,并且对传感技术和印象类型具有鲁棒性。此外,所提出的匹配分数归一化在包含超过511万枚指纹的大规模识别实验中有效地降低了错误匹配率(FMR)。具体而言,与不进行匹配分数归一化相比,这种技术将其降低了两个数量级,与之前的工作相比降低了五个数量级。
更新时间: 2024-06-04 11:39:34
领域: cs.CV,cs.AI
Riemannian coordinate descent algorithms on matrix manifolds
Many machine learning applications are naturally formulated as optimization problems on Riemannian manifolds. The main idea behind Riemannian optimization is to maintain the feasibility of the variables while moving along a descent direction on the manifold. This results in updating all the variables at every iteration. In this work, we provide a general framework for developing computationally efficient coordinate descent (CD) algorithms on matrix manifolds that allows updating only a few variables at every iteration while adhering to the manifold constraint. In particular, we propose CD algorithms for various manifolds such as Stiefel, Grassmann, (generalized) hyperbolic, symplectic, and symmetric positive (semi)definite. While the cost per iteration of the proposed CD algorithms is low, we further develop a more efficient variant via a first-order approximation of the objective function. We analyze their convergence and complexity, and empirically illustrate their efficacy in several applications.
Updated: 2024-06-04 11:37:11
标题: 矩阵流形上的黎曼坐标下降算法
摘要: 许多机器学习应用自然地被表述为黎曼流形上的优化问题。黎曼优化的主要思想是在沿流形上的下降方向移动的同时保持变量的可行性。这导致每次迭代都要更新所有变量。在这项工作中,我们提供了一个通用框架,用于在矩阵流形上开发计算高效的坐标下降(CD)算法,该类算法在遵守流形约束的同时,每次迭代仅更新少量变量。具体而言,我们提出了针对多种流形的CD算法,如Stiefel流形、Grassmann流形、(广义)双曲流形、辛流形以及对称正(半)定矩阵流形。虽然所提出的CD算法每次迭代的成本很低,我们还通过对目标函数的一阶近似进一步开发了一种更高效的变体。我们分析了它们的收敛性和复杂度,并在若干应用中实证展示了它们的有效性。
更新时间: 2024-06-04 11:37:11
领域: math.OC,cs.LG,stat.ML
FedMKT: Federated Mutual Knowledge Transfer for Large and Small Language Models
Recent research in federated large language models (LLMs) has primarily focused on enabling clients to fine-tune their locally deployed homogeneous LLMs collaboratively or on transferring knowledge from server-based LLMs to small language models (SLMs) at downstream clients. However, a significant gap remains in the simultaneous mutual enhancement of both the server's LLM and clients' SLMs. To bridge this gap, we propose FedMKT, a parameter-efficient federated mutual knowledge transfer framework for large and small language models. This framework is designed to adaptively transfer knowledge from the server's LLM to clients' SLMs while concurrently enriching the LLM with clients' unique domain insights. We facilitate token alignment using minimum edit distance (MinED) and then selective mutual knowledge transfer between client-side SLMs and a server-side LLM, aiming to collectively enhance their performance. Through extensive experiments across three distinct scenarios, heterogeneous, homogeneous, and one-to-one, we evaluate the effectiveness of FedMKT using various public LLMs and SLMs on a range of NLP text generation tasks. Empirical results demonstrate significant performance improvements in clients' SLMs with the aid of the LLM. Furthermore, the LLM optimized by FedMKT achieves a performance comparable to that achieved through direct fine-tuning based on clients' data, highlighting the effectiveness and adaptability of FedMKT.
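The token-alignment step rests on minimum edit distance; a standard Levenshtein sketch follows, with a hypothetical vocabulary-matching usage (how FedMKT actually applies MinED across two tokenizers' vocabularies is simplified away here).

    def min_edit_distance(a: str, b: str) -> int:
        # Standard dynamic-programming Levenshtein distance between two strings,
        # counting insertions, deletions, and substitutions (rolling 1-D array).
        m, n = len(a), len(b)
        dp = list(range(n + 1))
        for i in range(1, m + 1):
            prev, dp[0] = dp[0], i
            for j in range(1, n + 1):
                cur = dp[j]
                dp[j] = min(dp[j] + 1,                          # deletion
                            dp[j - 1] + 1,                      # insertion
                            prev + (a[i - 1] != b[j - 1]))      # substitution/match
                prev = cur
        return dp[n]

    # Hypothetical alignment: map an SLM token to the closest LLM-vocabulary token.
    llm_vocab = ["transfer", "transform", "translate"]
    print(min(llm_vocab, key=lambda t: min_edit_distance("transfered", t)))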
Updated: 2024-06-04 11:36:09
标题: FedMKT:大型和小型语言模型的联合相互知识传递
摘要: 最近在联邦式大规模语言模型(LLMs)领域的研究主要集中在使客户能够协作地对其本地部署的同质LLMs进行微调,或者将知识从基于服务器的LLMs转移到下游客户的小语言模型(SLMs)。然而,仍然存在一个显著的差距,即同时相互增强服务器的LLM和客户的SLMs。为了弥合这一差距,我们提出了FedMKT,这是一个用于大型和小型语言模型的参数高效的联邦相互知识转移框架。该框架旨在自适应地将知识从服务器的LLM转移到客户的SLMs,同时通过客户独特的领域见解丰富LLM。我们利用最小编辑距离(MinED)进行标记对齐,然后在客户端SLMs和服务器端LLM之间进行选择性相互知识转移,旨在共同提升它们的性能。通过在三种不同场景下(异构、同质和一对一)进行广泛实验,我们评估了FedMKT在各种NLP文本生成任务上使用各种公共LLMs和SLMs的有效性。实证结果表明,在LLM的帮助下,客户的SLMs表现出显著的性能改进。此外,通过FedMKT优化的LLM实现了与基于客户数据直接微调所实现的性能相当的效果,突出了FedMKT的有效性和适应性。
更新时间: 2024-06-04 11:36:09
领域: cs.CL,cs.AI
SMCL: Saliency Masked Contrastive Learning for Long-tailed Recognition
Real-world data often follow a long-tailed distribution with a high imbalance in the number of samples between classes. The problem with training from imbalanced data is that some background features, common to all classes, can be unobserved in classes with scarce samples. As a result, this background correlates to biased predictions into "major" classes. In this paper, we propose saliency masked contrastive learning, a new method that uses saliency masking and contrastive learning to mitigate the problem and improve the generalizability of a model. Our key idea is to mask the important part of an image using saliency detection and use contrastive learning to move the masked image towards minor classes in the feature space, so that background features present in the masked image are no longer correlated with the original class. Experiment results show that our method achieves state-of-the-art level performance on benchmark long-tailed datasets.
Updated: 2024-06-04 11:33:40
标题: SMCL:用于长尾识别的显著性掩蔽对比学习
摘要: 实际世界的数据通常遵循长尾分布,不同类别之间样本数量存在严重不平衡。从不平衡数据中进行训练的问题在于,一些背景特征,对所有类别都普遍存在,可能在样本稀缺的类别中没有观察到。因此,这些背景特征会导致对“主要”类别的预测出现偏差。本文提出了一种新方法,即显著性掩蔽对比学习,利用显著性掩蔽和对比学习来缓解这一问题,并提高模型的泛化能力。我们的关键思想是使用显著性检测掩蔽图像的重要部分,并利用对比学习将掩蔽图像移至特征空间中的次要类别,以使掩蔽图像中的背景特征不再与原始类别相关联。实验结果表明,我们的方法在基准长尾数据集上实现了最先进的性能水平。
更新时间: 2024-06-04 11:33:40
领域: cs.CV,cs.LG
Exploring Precision and Recall to assess the quality and diversity of LLMs
We introduce a novel evaluation framework for Large Language Models (LLMs) such as \textsc{Llama-2} and \textsc{Mistral}, focusing on importing Precision and Recall metrics from image generation to text generation. This approach allows for a nuanced assessment of the quality and diversity of generated text without the need for aligned corpora. By conducting a comprehensive evaluation of state-of-the-art language models, the study reveals new insights into their performance on open-ended generation tasks, which are not adequately captured by traditional benchmarks. The findings highlight a trade-off between the quality and diversity of generated samples, particularly when models are fine-tuned on instruction dataset or with human feedback. This work extends the toolkit for distribution-based NLP evaluation, offering insights into the practical capabilities and challenges that current LLMs face in generating diverse and high-quality text. We release our code and data.
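One common way to port Precision and Recall from image generation is the k-NN manifold estimator of Kynkäänniemi et al.; the sketch below applies that style of estimator to embedding vectors, with random arrays standing in for reference and generated text embeddings (the embedding choice and k are assumptions, not necessarily the paper's exact protocol).

    import numpy as np

    def knn_radii(X, k):
        # Distance from each point to its k-th nearest neighbour within X
        # (column 0 of the sorted distances is the zero self-distance).
        d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        return np.sort(d, axis=1)[:, k]

    def precision_recall(real, gen, k=3):
        r_real, r_gen = knn_radii(real, k), knn_radii(gen, k)
        d = np.linalg.norm(gen[:, None, :] - real[None, :, :], axis=-1)
        precision = np.mean((d <= r_real[None, :]).any(axis=1))  # quality: gen near real
        recall = np.mean((d.T <= r_gen[None, :]).any(axis=1))    # diversity: real covered
        return precision, recall

    rng = np.random.default_rng(0)
    real = rng.normal(0.0, 1, (200, 16))   # stand-in for reference text embeddings
    gen = rng.normal(0.2, 1, (200, 16))    # stand-in for model-generated embeddings
    print(precision_recall(real, gen))

Under this estimator, precision drops when generations drift off the reference manifold (quality), while recall drops when they cover only part of it (diversity), which is the trade-off the abstract describes.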
Updated: 2024-06-04 11:33:27
标题: 探讨精确度和召回率以评估LLMs的质量和多样性
摘要: 我们引入了一个新颖的评估框架,用于评估大型语言模型(LLMs),如\textsc{Llama-2}和\textsc{Mistral},重点关注将图像生成的精确度和召回率指标引入到文本生成中。这种方法允许对生成文本的质量和多样性进行细致评估,而无需对齐语料库。通过对最先进的语言模型进行全面评估,该研究揭示了它们在开放式生成任务上的性能,这是传统基准无法充分捕捉的。研究结果突显了在模型在指导数据集或人类反馈上进行微调时,生成样本的质量和多样性之间存在权衡。这项工作扩展了基于分布的自然语言处理评估工具包,为当前LLMs在生成多样化和高质量文本方面面临的实际能力和挑战提供了洞察。我们发布了我们的代码和数据。
更新时间: 2024-06-04 11:33:27
领域: cs.CL,cs.LG
Mask-based Invisible Backdoor Attacks on Object Detection
Deep learning models have achieved unprecedented performance in the domain of object detection, resulting in breakthroughs in areas such as autonomous driving and security. However, deep learning models are vulnerable to backdoor attacks. These attacks prompt models to behave similarly to standard models without a trigger; however, they act maliciously upon detecting a predefined trigger. Despite extensive research on backdoor attacks in image classification, their application to object detection remains relatively underexplored. Given the widespread application of object detection in critical real-world scenarios, the sensitivity and potential impact of these vulnerabilities cannot be overstated. In this study, we propose an effective invisible backdoor attack on object detection utilizing a mask-based approach. Three distinct attack scenarios were explored for object detection: object disappearance, object misclassification, and object generation attack. Through extensive experiments, we comprehensively examined the effectiveness of these attacks and tested certain defense methods to determine effective countermeasures. Code will be available at https://github.com/jeongjin0/invisible-backdoor-object-detection
Updated: 2024-06-04 11:28:42
标题: 基于遮罩的目标检测隐形后门攻击
摘要: 深度学习模型在目标检测领域取得了前所未有的性能,导致在自动驾驶和安全等领域取得了突破。然而,深度学习模型容易受到后门攻击的影响。这些攻击会促使模型在没有触发器的情况下表现出与标准模型类似的行为,但一旦检测到预定义的触发器,它们会表现出恶意行为。尽管针对图像分类中后门攻击进行了大量研究,但对目标检测的应用仍相对未被充分探索。鉴于目标检测在关键的现实场景中具有广泛的应用,这些漏洞的敏感性和潜在影响不容忽视。在本研究中,我们提出了一种利用基于掩码的方法对目标检测进行有效的隐形后门攻击。对目标检测进行了三种不同的攻击场景的探索:目标消失、目标误分类和目标生成攻击。通过大量实验,我们全面考察了这些攻击的有效性,并测试了一些防御方法来确定有效的对策。代码将在https://github.com/jeongjin0/invisible-backdoor-object-detection 上提供。
更新时间: 2024-06-04 11:28:42
领域: cs.CV,cs.AI,cs.CR,I.4.8
Investigating the Impact of Model Instability on Explanations and Uncertainty
Explainable AI methods facilitate the understanding of model behaviour, yet, small, imperceptible perturbations to inputs can vastly distort explanations. As these explanations are typically evaluated holistically, before model deployment, it is difficult to assess when a particular explanation is trustworthy. Some studies have tried to create confidence estimators for explanations, but none have investigated an existing link between uncertainty and explanation quality. We artificially simulate epistemic uncertainty in text input by introducing noise at inference time. In this large-scale empirical study, we insert different levels of noise perturbations and measure the effect on the output of pre-trained language models and different uncertainty metrics. Realistic perturbations have minimal effect on performance and explanations, yet masking has a drastic effect. We find that high uncertainty doesn't necessarily imply low explanation plausibility; the correlation between the two metrics can be moderately positive when noise is exposed during the training process. This suggests that noise-augmented models may be better at identifying salient tokens when uncertain. Furthermore, when predictive and epistemic uncertainty measures are over-confident, the robustness of a saliency map to perturbation can indicate model stability issues. Integrated Gradients shows the overall greatest robustness to perturbation, while still showing model-specific patterns in performance; however, this phenomenon is limited to smaller Transformer-based language models.
Updated: 2024-06-04 11:18:03
标题: 调查模型不稳定性对解释和不确定性的影响
摘要: 可解释的人工智能方法有助于理解模型行为,然而,对输入进行微小、不可察觉的扰动可能会大大扭曲解释。由于这些解释通常在模型部署之前进行整体评估,因此很难判断某个特定解释是否可信。一些研究尝试创建解释的置信度估计器,但没有研究不确定性与解释质量之间的现有联系。我们通过在推断时引入噪声来人工模拟文本输入中的认知不确定性。在这项大规模的实证研究中,我们插入不同级别的噪声扰动,并测量其对预训练语言模型输出和不同不确定性指标的影响。现实扰动对性能和解释几乎没有影响,但遮蔽却产生了显著影响。我们发现高不确定性并不一定意味着解释的合理性较低;当在训练过程中暴露噪声时,这两个指标之间的相关性可能是适度正向的。这表明,噪声增强模型可能更擅长在不确定时识别显著标记。此外,当预测和认知不确定性度量过于自信时,显著性图对扰动的稳健性可以提示模型稳定性问题。积分梯度(Integrated Gradients)显示出对扰动的整体最强稳健性,同时在性能上仍呈现出模型特定的模式;然而,这种现象局限于较小的基于Transformer的语言模型。
更新时间: 2024-06-04 11:18:03
领域: cs.LG,cs.CL
SLTrain: a sparse plus low-rank approach for parameter and memory efficient pretraining
Large language models (LLMs) have shown impressive capabilities across various tasks. However, training LLMs from scratch requires significant computational power and extensive memory capacity. Recent studies have explored low-rank structures on weights for efficient fine-tuning in terms of parameters and memory, either through low-rank adaptation or factorization. While effective for fine-tuning, low-rank structures are generally less suitable for pretraining because they restrict parameters to a low-dimensional subspace. In this work, we propose to parameterize the weights as a sum of low-rank and sparse matrices for pretraining, which we call SLTrain. The low-rank component is learned via matrix factorization, while for the sparse component, we employ a simple strategy of uniformly selecting the sparsity support at random and learning only the non-zero entries with the fixed support. While being simple, the random fixed-support sparse learning strategy significantly enhances pretraining when combined with low-rank learning. Our results show that SLTrain adds minimal extra parameters and memory costs compared to pretraining with low-rank parameterization, yet achieves substantially better performance, which is comparable to full-rank training. Remarkably, when combined with quantization and per-layer updates, SLTrain can reduce memory requirements by up to 73% when pretraining the LLaMA 7B model.
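A minimal PyTorch sketch of a sparse-plus-low-rank linear layer follows; the rank, density, and initialization are illustrative choices, and the dense scatter of the sparse factor favors clarity over the memory savings an actual implementation would pursue.

    import torch
    import torch.nn as nn

    class SparsePlusLowRankLinear(nn.Module):
        # W = B @ A + S: a trainable low-rank factorization plus a sparse matrix
        # whose support is chosen uniformly at random and then frozen, so only
        # the non-zero entries of S are learned.
        def __init__(self, d_in, d_out, rank=32, density=0.03):
            super().__init__()
            self.A = nn.Parameter(torch.randn(rank, d_in) / d_in ** 0.5)
            self.B = nn.Parameter(torch.zeros(d_out, rank))
            n_nz = int(density * d_in * d_out)
            flat = torch.randperm(d_in * d_out)[:n_nz]      # fixed random support
            self.register_buffer("rows", flat // d_in)
            self.register_buffer("cols", flat % d_in)
            self.s_vals = nn.Parameter(torch.zeros(n_nz))   # trainable non-zeros

        def forward(self, x):
            low_rank = (x @ self.A.T) @ self.B.T
            S = x.new_zeros(self.B.shape[0], self.A.shape[1])
            S[self.rows, self.cols] = self.s_vals           # scatter: clarity over speed
            return low_rank + x @ S.T

    layer = SparsePlusLowRankLinear(256, 256)
    print(layer(torch.randn(4, 256)).shape)  # torch.Size([4, 256])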
Updated: 2024-06-04 11:14:21
标题: SLTrain:一种稀疏加低秩方法,用于参数和内存高效的预训练
摘要: 大型语言模型(LLMs)在各种任务中展现出令人印象深刻的能力。然而,从头开始训练LLMs需要大量的计算能力和广泛的内存容量。最近的研究已经探索了权重上的低秩结构,以实现参数和内存的高效微调,无论是通过低秩适应还是因子分解。虽然对于微调来说很有效,但低秩结构通常不太适合预训练,因为它们限制了参数到一个低维子空间。在这项工作中,我们提出将权重参数化为低秩矩阵和稀疏矩阵的和进行预训练,我们将其称为SLTrain。低秩部分通过矩阵因子分解学习,而对于稀疏部分,我们采用一个简单的策略,即随机选择稀疏支持并仅学习固定支持中的非零条目。尽管简单,但随机固定支持稀疏学习策略与低秩学习结合时显著增强了预训练效果。我们的结果表明,与低秩参数化的预训练相比,SLTrain增加了最小额外参数和内存成本,但实现了明显更好的性能,可与全秩训练相媲美。值得注意的是,与量化和每层更新结合时,SLTrain可以将预训练LLaMA 7B模型的内存需求降低高达73%。
更新时间: 2024-06-04 11:14:21
领域: cs.LG
Rectifying Reinforcement Learning for Reward Matching
The Generative Flow Network (GFlowNet) is a probabilistic framework in which an agent learns a stochastic policy and flow functions to sample objects with probability proportional to an unnormalized reward function. GFlowNets share a strong resemblance to reinforcement learning (RL), that typically aims to maximize reward, due to their sequential decision-making processes. Recent works have studied connections between GFlowNets and maximum entropy (MaxEnt) RL, which modifies the standard objective of RL agents by learning an entropy-regularized objective. However, a critical theoretical gap persists: despite the apparent similarities in their sequential decision-making nature, a direct link between GFlowNets and standard RL has yet to be discovered, while bridging this gap could further unlock the potential of both fields. In this paper, we establish a new connection between GFlowNets and policy evaluation for a uniform policy. Surprisingly, we find that the resulting value function for the uniform policy has a close relationship to the flows in GFlowNets. Leveraging these insights, we further propose a novel rectified policy evaluation (RPE) algorithm, which achieves the same reward-matching effect as GFlowNets, offering a new perspective. We compare RPE, MaxEnt RL, and GFlowNets in a number of benchmarks, and show that RPE achieves competitive results compared to previous approaches. This work sheds light on the previously unexplored connection between (non-MaxEnt) RL and GFlowNets, potentially opening new avenues for future research in both fields.
Updated: 2024-06-04 11:11:53
标题: 纠正奖励匹配的强化学习
摘要: 生成流网络(GFlowNet)是一个概率框架,其中代理通过学习随机策略和流函数来采样对象,其概率与未归一化奖励函数成比例。由于同样具有顺序决策过程,GFlowNets与通常旨在最大化奖励的强化学习(RL)有很强的相似性。最近的研究探讨了GFlowNets与最大熵(MaxEnt)RL之间的联系,后者通过学习熵正则化目标修改了RL代理的标准目标。然而,一个关键的理论差距仍然存在:尽管它们在顺序决策性质上表现出明显的相似性,但尚未发现GFlowNets与标准RL之间的直接联系,而弥合这一差距可能进一步释放两个领域的潜力。在本文中,我们建立了GFlowNets与均匀策略的策略评估之间的新联系。令人惊讶的是,我们发现均匀策略对应的值函数与GFlowNets中的流之间存在密切关系。利用这些洞见,我们进一步提出了一种新颖的校正策略评估(RPE)算法,它实现了与GFlowNets相同的奖励匹配效果,提供了一种新的视角。我们在多个基准测试中比较了RPE、MaxEnt RL和GFlowNets,并展示了RPE相对于先前方法取得了具有竞争力的结果。这项工作揭示了(非MaxEnt)RL和GFlowNets之间以前未被探索的联系,可能为两个领域的未来研究开辟新的途径。
更新时间: 2024-06-04 11:11:53
领域: cs.LG
ICC: Quantifying Image Caption Concreteness for Multimodal Dataset Curation
Web-scale training on paired text-image data is becoming increasingly central to multimodal learning, but is challenged by the highly noisy nature of datasets in the wild. Standard data filtering approaches succeed in removing mismatched text-image pairs, but permit semantically related but highly abstract or subjective text. These approaches lack the fine-grained ability to isolate the most concrete samples that provide the strongest signal for learning in a noisy dataset. In this work, we propose a new metric, image caption concreteness, that evaluates caption text without an image reference to measure its concreteness and relevancy for use in multimodal learning. Our approach leverages strong foundation models for measuring visual-semantic information loss in multimodal representations. We demonstrate that this strongly correlates with human evaluation of concreteness in both single-word and sentence-level texts. Moreover, we show that curation using ICC complements existing approaches: It succeeds in selecting the highest quality samples from multimodal web-scale datasets to allow for efficient training in resource-constrained settings.
Updated: 2024-06-04 11:08:42
标题: ICC:量化多模态数据集策划中图像标题的具体性
摘要: 在多模态学习中,基于成对文本-图像数据的Web规模训练变得越来越重要,但受到野外数据集极其嘈杂的挑战。标准数据过滤方法成功地去除了不匹配的文本-图像对,但允许语义相关但高度抽象或主观的文本通过。这些方法缺乏细粒度的能力来分离出在嘈杂数据集中提供最强学习信号的最具体样本。在这项工作中,我们提出了一个新的度量标准,即图像标题具体性,它在不参考图像的情况下评估标题文本,以衡量其用于多模态学习的具体性和相关性。我们的方法利用强大的基础模型来衡量多模态表示中的视觉-语义信息损失。我们展示了这与人类对单词级和句子级文本具体性的评估具有很强的相关性。此外,我们展示了使用ICC进行数据策划可补充现有方法:它成功地从多模态Web规模数据集中选出最高质量的样本,以便在资源受限的环境中进行高效训练。
更新时间: 2024-06-04 11:08:42
领域: cs.LG,cs.CV
Why Only Text: Empowering Vision-and-Language Navigation with Multi-modal Prompts
Current Vision-and-Language Navigation (VLN) tasks mainly employ textual instructions to guide agents. However, being inherently abstract, the same textual instruction can be associated with different visual signals, causing severe ambiguity and limiting the transfer of prior knowledge in the vision domain from the user to the agent. To fill this gap, we propose Vision-and-Language Navigation with Multi-modal Prompts (VLN-MP), a novel task augmenting traditional VLN by integrating both natural language and images in instructions. VLN-MP not only maintains backward compatibility by effectively handling text-only prompts but also consistently shows advantages with different quantities and relevance of visual prompts. Possible forms of visual prompts include both exact and similar object images, providing adaptability and versatility in diverse navigation scenarios. To evaluate VLN-MP under a unified framework, we implement a new benchmark that offers: (1) a training-free pipeline to transform textual instructions into multi-modal forms with landmark images; (2) diverse datasets with multi-modal instructions for different downstream tasks; (3) a novel module designed to process various image prompts for seamless integration with state-of-the-art VLN models. Extensive experiments on four VLN benchmarks (R2R, RxR, REVERIE, CVDN) show that incorporating visual prompts significantly boosts navigation performance. While maintaining efficiency with text-only prompts, VLN-MP enables agents to navigate in the pre-explore setting and outperform text-based models, showing its broader applicability.
Updated: 2024-06-04 11:06:13
标题: 为什么只有文本:利用多模态提示提升视觉与语言导航
摘要: 当前的视觉与语言导航(VLN)任务主要采用文本指令来引导智能体。然而,由于其本质上是抽象的,同样的文本指令可能与不同的视觉信号相关联,导致严重的歧义并限制了用户向智能体传递视觉领域的先前知识。为填补这一空白,我们提出了一种新颖的任务,即具有多模态提示的视觉与语言导航(VLN-MP),通过将自然语言和图像整合到指令中,对传统的VLN进行增强。VLN-MP不仅通过有效处理仅文本提示来维持向后兼容性,而且在不同数量和相关性的视觉提示下一贯显示出优势。视觉提示的可能形式包括精确和相似的物体图像,提供了在各种导航场景下的适应性和多功能性。为了在统一框架下评估VLN-MP,我们实现了一个新的基准,提供了:(1)一个无需训练的管道,将文本指令转换为带有地标图像的多模态形式;(2)不同下游任务的多模态指令的多样数据集;(3)一个专门设计用于处理各种图像提示,并与最先进的VLN模型无缝集成的新模块。对四个VLN基准(R2R、RxR、REVERIE、CVDN)进行的大量实验表明,整合视觉提示显著提升了导航性能。在保持仅文本提示效率的同时,VLN-MP使智能体能够在预探索环境中导航,并在性能上胜过基于文本的模型,展示了其更广泛的适用性。
更新时间: 2024-06-04 11:06:13
领域: cs.CV,cs.AI,cs.CL
On the Semantic Latent Space of Diffusion-Based Text-to-Speech Models
The incorporation of Denoising Diffusion Models (DDMs) in the Text-to-Speech (TTS) domain is rising, providing great value in synthesizing high quality speech. Although they exhibit impressive audio quality, the extent of their semantic capabilities is unknown, and controlling their synthesized speech's vocal properties remains a challenge. Inspired by recent advances in image synthesis, we explore the latent space of frozen TTS models, which is composed of the latent bottleneck activations of the DDM's denoiser. We identify that this space contains rich semantic information, and outline several novel methods for finding semantic directions within it, both supervised and unsupervised. We then demonstrate how these enable off-the-shelf audio editing, without any further training, architectural changes or data requirements. We present evidence of the semantic and acoustic qualities of the edited audio, and provide supplemental samples: https://latent-analysis-grad-tts.github.io/speech-samples/.
Updated: 2024-06-04 11:03:57
标题: 基于扩散的文本转语音模型的语义潜空间
摘要: Denoising Diffusion Models(DDMs)在文本转语音(TTS)领域的应用正在增加,为合成高质量语音提供了巨大价值。尽管它们展示出令人印象深刻的音频质量,但它们的语义能力的程度尚不清楚,并且控制它们合成的语音的声音特性仍然是一个挑战。受到图像合成的最新进展的启发,我们探索了冻结TTS模型的潜在空间,该空间由DDM的去噪器的潜在瓶颈激活组成。我们确定这个空间包含丰富的语义信息,并概述了几种在其中找到语义方向的新方法,包括监督和无监督。然后,我们演示了这些方法如何实现即插即用的音频编辑,无需进一步的训练、架构变更或数据要求。我们展示了编辑后音频的语义和声学特性的证据,并提供了补充样本:https://latent-analysis-grad-tts.github.io/speech-samples/。
更新时间: 2024-06-04 11:03:57
领域: cs.SD,cs.CL,cs.LG,eess.AS
Query-Enhanced Adaptive Semantic Path Reasoning for Inductive Knowledge Graph Completion
Conventional Knowledge graph completion (KGC) methods aim to infer missing information in incomplete Knowledge Graphs (KGs) by leveraging existing information, which struggle to perform effectively in scenarios involving emerging entities. Inductive KGC methods can handle the emerging entities and relations in KGs, offering greater dynamic adaptability. While existing inductive KGC methods have achieved some success, they also face challenges, such as susceptibility to noisy structural information during reasoning and difficulty in capturing long-range dependencies in reasoning paths. To address these challenges, this paper proposes the Query-Enhanced Adaptive Semantic Path Reasoning (QASPR) framework, which simultaneously captures both the structural and semantic information of KGs to enhance the inductive KGC task. Specifically, the proposed QASPR employs a query-dependent masking module to adaptively mask noisy structural information while retaining important information closely related to the targets. Additionally, QASPR introduces a global semantic scoring module that evaluates both the individual contributions and the collective impact of nodes along the reasoning path within KGs. The experimental results demonstrate that QASPR achieves state-of-the-art performance.
Updated: 2024-06-04 11:02:15
标题: 用于归纳式知识图谱补全的查询增强自适应语义路径推理
摘要: 传统的知识图谱补全(KGC)方法旨在利用现有信息推断不完整知识图谱(KGs)中的缺失信息,但在涉及新兴实体的情景中往往难以有效执行。归纳式KGC方法可以处理知识图谱中的新兴实体和关系,提供更大的动态适应性。虽然现有的归纳式KGC方法取得了一定的成功,但也面临挑战,比如在推理过程中容易受到嘈杂的结构信息的影响,难以捕捉推理路径中的长距离依赖关系。为了解决这些挑战,本文提出了查询增强自适应语义路径推理(QASPR)框架,同时捕捉KGs的结构和语义信息以增强归纳式KGC任务。具体来说,所提出的QASPR采用了一个依赖于查询的屏蔽模块,自适应地屏蔽嘈杂的结构信息,同时保留与目标密切相关的重要信息。此外,QASPR引入了一个全局语义评分模块,评估KGs中推理路径上节点的个体贡献和整体影响。实验结果表明,QASPR取得了最先进的性能。
更新时间: 2024-06-04 11:02:15
领域: cs.AI
The Deep Latent Space Particle Filter for Real-Time Data Assimilation with Uncertainty Quantification
In Data Assimilation, observations are fused with simulations to obtain an accurate estimate of the state and parameters for a given physical system. Combining data with a model, however, while accurately estimating uncertainty, is computationally expensive and infeasible to run in real-time for complex systems. Here, we present a novel particle filter methodology, the Deep Latent Space Particle filter or D-LSPF, that uses neural network-based surrogate models to overcome this computational challenge. The D-LSPF enables filtering in the low-dimensional latent space obtained using Wasserstein AEs with modified vision transformer layers for dimensionality reduction and transformers for parameterized latent space time stepping. As we demonstrate on three test cases, including leak localization in multi-phase pipe flow and seabed identification for fully nonlinear water waves, the D-LSPF runs orders of magnitude faster than a high-fidelity particle filter and 3-5 times faster than alternative methods while being up to an order of magnitude more accurate. The D-LSPF thus enables real-time data assimilation with uncertainty quantification for physical systems.
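Stripped to its core, the assimilation cycle is a bootstrap particle filter run on latent states; in the schematic below, step_latent and decode are placeholders for the learned surrogate time-stepper and decoder, and the toy usage only demonstrates shapes.

    import numpy as np

    def latent_particle_filter_step(particles, step_latent, decode, obs, obs_std, rng):
        # One assimilation cycle in the learned latent space: propagate latent
        # particles with the surrogate, weight them by how well their decoded
        # states match the observation, then resample.
        particles = step_latent(particles)                  # (n, latent_dim)
        resid = decode(particles) - obs                     # compare in data space
        loglik = -0.5 * np.sum((resid / obs_std) ** 2, axis=-1)
        w = np.exp(loglik - loglik.max())
        w /= w.sum()
        idx = rng.choice(len(particles), size=len(particles), p=w)
        return particles[idx]

    # Toy usage with identity-like surrogates, just to show the data flow.
    rng = np.random.default_rng(0)
    z = rng.normal(size=(100, 8))
    z = latent_particle_filter_step(z, lambda p: p + 0.01, lambda p: p, z[0], 0.5, rng)

The speedup claimed in the abstract comes from step_latent and decode being cheap neural surrogates in a low-dimensional latent space, rather than a full-order physics solver.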
Updated: 2024-06-04 10:59:54
标题: 深层潜空间粒子滤波器用于实时数据同化与不确定性量化
摘要: 在数据同化中,观测数据与模拟数据融合,以获得给定物理系统状态和参数的准确估计。然而,将数据与模型结合,虽然可以准确估计不确定性,但在计算上十分昂贵,对于复杂系统无法实时运行。在这里,我们提出了一种新颖的粒子滤波方法,即Deep Latent Space Particle filter(D-LSPF),它使用基于神经网络的替代模型来克服这种计算挑战。D-LSPF在低维潜空间中进行滤波:该潜空间由Wasserstein自编码器获得,其中采用改进的视觉Transformer层进行降维,并使用Transformer进行参数化的潜空间时间推进。正如我们在三个测试案例中展示的那样,包括多相管道流中的泄漏定位和完全非线性水波的海床识别,D-LSPF比高保真度的粒子滤波快几个数量级,比替代方法快3-5倍,同时精度最高可提高一个数量级。因此,D-LSPF实现了物理系统的实时数据同化和不确定性量化。
更新时间: 2024-06-04 10:59:54
领域: cs.CE,cs.AI,cs.LG
E-ICL: Enhancing Fine-Grained Emotion Recognition through the Lens of Prototype Theory
In-context learning (ICL) achieves remarkable performance in various domains such as knowledge acquisition, commonsense reasoning, and semantic understanding. However, its performance significantly deteriorates for emotion detection tasks, especially fine-grained emotion recognition. The underlying reasons for this remain unclear. In this paper, we identify the reasons behind ICL's poor performance from the perspective of prototype theory and propose a method to address this issue. Specifically, we conduct extensive pilot experiments and find that ICL conforms to the prototype theory on fine-grained emotion recognition. Based on this theory, we uncover the following deficiencies in ICL: (1) It relies on prototypes (example-label pairs) that are semantically similar but emotionally inaccurate to predict emotions. (2) It is prone to interference from irrelevant categories, affecting the accuracy and robustness of the predictions. To address these issues, we propose an Emotion Context Learning method (E-ICL) on fine-grained emotion recognition. E-ICL relies on more emotionally accurate prototypes to predict categories by referring to emotionally similar examples with dynamic labels. Simultaneously, E-ICL employs an exclusionary emotion prediction strategy to avoid interference from irrelevant categories, thereby increasing its accuracy and robustness. Note that the entire process is accomplished with the assistance of a plug-and-play emotion auxiliary model, without additional training. Experiments on the fine-grained emotion datasets EDOS, Empathetic-Dialogues, EmpatheticIntent, and GoEmotions show that E-ICL achieves superior emotion prediction performance. Furthermore, even when the emotion auxiliary model used is lower than 10% of the LLMs, E-ICL can still boost the performance of LLMs by over 4% on multiple datasets.
Updated: 2024-06-04 10:59:43
标题: E-ICL: 通过原型理论提升细粒度情感识别
摘要: 上下文学习(ICL)在各个领域(如知识获取、常识推理和语义理解)中取得了显著的表现。然而,在情感检测任务中,特别是细粒度情感识别方面,其性能明显下降。目前尚不清楚造成这种情况的根本原因。本文从原型理论的角度识别了ICL性能不佳的原因,并提出了解决这一问题的方法。具体来说,我们进行了大量的试点实验,发现ICL在细粒度情感识别上符合原型理论。基于这一理论,我们揭示了ICL存在以下不足之处:(1)它依赖于语义上相似但情感不准确的原型(示例-标签对)来预测情绪。(2)它容易受到无关类别的干扰,影响预测的准确性和稳健性。为了解决这些问题,我们提出了一种面向细粒度情感识别的情感上下文学习方法(E-ICL)。E-ICL依赖于情感上更准确的原型,通过参考具有动态标签的情感相似示例来预测类别。同时,E-ICL采用排除性情感预测策略,避免受到无关类别的干扰,从而提高了其准确性和稳健性。需要注意的是,整个过程借助即插即用的情感辅助模型完成,无需额外训练。在细粒度情感数据集EDOS、Empathetic-Dialogues、EmpatheticIntent和GoEmotions上的实验结果显示,E-ICL实现了优越的情感预测性能。此外,即使所使用的情感辅助模型规模不足LLMs的10%,E-ICL仍然可以在多个数据集上将LLMs的性能提升超过4%。
更新时间: 2024-06-04 10:59:43
领域: cs.LG,cs.AI
Can CLIP help CLIP in learning 3D?
In this study, we explore an alternative approach to enhance contrastive text-image-3D alignment in the absence of textual descriptions for 3D objects. We introduce two unsupervised methods, $I2I$ and $(I2L)^2$, which leverage CLIP knowledge about textual and 2D data to compute the neural perceived similarity between two 3D samples. We employ the proposed methods to mine 3D hard negatives, establishing a multimodal contrastive pipeline with hard negative weighting via a custom loss function. We train on different configurations of the proposed hard negative mining approach, and we evaluate the accuracy of our models in 3D classification and on the cross-modal retrieval benchmark, testing image-to-shape and shape-to-image retrieval. Results demonstrate that our approach, even without explicit text alignment, achieves comparable or superior performance on zero-shot and standard 3D classification, while significantly improving both image-to-shape and shape-to-image retrieval compared to previous methods.
Updated: 2024-06-04 10:57:59
标题: CLIP能帮助CLIP学习3D吗?
摘要: 在这项研究中,我们探讨了一种在缺乏3D对象文本描述的情况下增强文本-图像-3D对比对齐的替代方法。我们介绍了两种无监督方法,$I2I$ 和 $(I2L)^2$,利用CLIP关于文本和2D数据的知识来计算两个3D样本之间的神经感知相似性。我们采用所提出的方法来挖掘3D难例,通过自定义损失函数建立一个带有难例加权的多模态对比流水线。我们在所提出的难例挖掘方法的不同配置上进行训练,并在3D分类和跨模态检索基准上评估我们模型的准确性,测试图像到形状和形状到图像的检索。结果表明,即使没有显式的文本对齐,我们的方法在零样本和标准3D分类上也能达到可比甚至更优的性能,同时与先前的方法相比显著改善了图像到形状和形状到图像的检索。
更新时间: 2024-06-04 10:57:59
领域: cs.CV,cs.AI
Denoising Autoregressive Representation Learning
In this paper, we explore a new generative approach for learning visual representations. Our method, DARL, employs a decoder-only Transformer to predict image patches autoregressively. We find that training with Mean Squared Error (MSE) alone leads to strong representations. To enhance the image generation ability, we replace the MSE loss with the diffusion objective by using a denoising patch decoder. We show that the learned representation can be improved by using tailored noise schedules and longer training in larger models. Notably, the optimal schedule differs significantly from the typical ones used in standard image diffusion models. Overall, despite its simple architecture, DARL delivers performance remarkably close to state-of-the-art masked prediction models under the fine-tuning protocol. This marks an important step towards a unified model capable of both visual perception and generation, effectively combining the strengths of autoregressive and denoising diffusion models.
Updated: 2024-06-04 10:47:02
标题: 去噪自回归表示学习
摘要: 在这篇论文中,我们探索了一种学习视觉表示的新生成方法。我们的方法DARL采用仅解码器的Transformer来自回归地预测图像块。我们发现,仅使用均方误差(MSE)训练就能得到强大的表示。为了增强图像生成能力,我们使用去噪图像块解码器,将MSE损失替换为扩散目标。我们展示了通过使用定制的噪声调度以及在更大模型上进行更长时间的训练,可以改进所学习的表示。值得注意的是,最优调度与标准图像扩散模型中使用的典型调度有显著差异。总体而言,尽管其结构简单,DARL在微调协议下的性能与最先进的掩码预测模型非常接近。这标志着朝向一个能够同时进行视觉感知和生成的统一模型迈出了重要一步,有效地结合了自回归和去噪扩散模型的优势。
更新时间: 2024-06-04 10:47:02
领域: cs.LG,cs.CV
HLOB -- Information Persistence and Structure in Limit Order Books
We introduce a novel large-scale deep learning model for Limit Order Book mid-price changes forecasting, and we name it `HLOB'. This architecture (i) exploits the information encoded by an Information Filtering Network, namely the Triangulated Maximally Filtered Graph, to unveil deeper and non-trivial dependency structures among volume levels; and (ii) guarantees deterministic design choices to handle the complexity of the underlying system by drawing inspiration from the groundbreaking class of Homological Convolutional Neural Networks. We test our model against 9 state-of-the-art deep learning alternatives on 3 real-world Limit Order Book datasets, each including 15 stocks traded on the NASDAQ exchange, and we systematically characterize the scenarios where HLOB outperforms state-of-the-art architectures. Our approach sheds new light on the spatial distribution of information in Limit Order Books and on its degradation over increasing prediction horizons, narrowing the gap between microstructural modeling and deep learning-based forecasting in high-frequency financial markets.
Updated: 2024-06-04 10:42:46
标题: HLOB -- 限价订单簿中的信息持久性和结构
摘要: 我们引入了一种新型的大规模深度学习模型,用于预测限价订单簿中间价(mid-price)的变化,并将其命名为`HLOB'。该架构(i)利用信息过滤网络(即三角化最大过滤图,Triangulated Maximally Filtered Graph)编码的信息,揭示成交量水平之间更深层次、非平凡的依赖结构;(ii)通过借鉴开创性的同调卷积神经网络类别,保证了处理底层系统复杂性的确定性设计选择。我们在3个真实的限价订单簿数据集上,将我们的模型与9种最先进的深度学习替代方案进行了对比测试,每个数据集包括在纳斯达克交易的15只股票,并系统地刻画了HLOB胜过最先进架构的情景。我们的方法揭示了限价订单簿中信息的空间分布以及该信息随预测时间范围增加而衰减的规律,缩小了高频金融市场中微观结构建模与基于深度学习的预测之间的差距。
更新时间: 2024-06-04 10:42:46
领域: q-fin.TR,cs.LG
On the Recoverability of Causal Relations from Temporally Aggregated I.I.D. Data
We consider the effect of temporal aggregation on instantaneous (non-temporal) causal discovery in a general setting. This is motivated by the observation that the true causal time lag is often considerably shorter than the observational interval. This discrepancy leads to high aggregation, causing time-delay causality to vanish and instantaneous dependence to manifest. Although we expect such instantaneous dependence to be consistent with the true causal relation in a certain sense, so that the discovery results remain meaningful, it is unclear what type of consistency we need and when such consistency will be satisfied. We formally propose functional consistency and conditional independence consistency, corresponding to functional causal model-based methods and conditional independence-based methods respectively, and provide the conditions under which these consistencies hold. We show theoretically and experimentally that causal discovery results may be seriously distorted by aggregation, especially in the completely nonlinear case, and we also find that causal relationships remain recoverable from aggregated data given partial linearity or an appropriate prior. Our findings suggest the community should take a cautious and meticulous approach when interpreting causal discovery results from such data, and show why and when aggregation will distort the performance of causal discovery methods.
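The core phenomenon is easy to reproduce; in the small simulation below (the coefficients and aggregation factor are arbitrary), a lag-1 causal effect from X to Y survives heavy temporal aggregation only as instantaneous dependence.

    import numpy as np

    rng = np.random.default_rng(0)
    T, k = 200_000, 50                        # fine-grained length, aggregation factor
    x = rng.standard_normal(T)
    y = np.empty(T)
    y[0] = 0.0
    y[1:] = 0.8 * x[:-1] + 0.6 * rng.standard_normal(T - 1)  # X -> Y at lag 1

    Xa = x.reshape(-1, k).mean(axis=1)        # temporally aggregated observations
    Ya = y.reshape(-1, k).mean(axis=1)

    print(np.corrcoef(Xa[:-1], Ya[1:])[0, 1])  # cross-lag correlation: near zero
    print(np.corrcoef(Xa, Ya)[0, 1])           # instantaneous correlation: strong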
Updated: 2024-06-04 10:35:16
标题: 关于从时间聚合的独立同分布数据中因果关系的可恢复性
摘要: 我们考虑在一般设置下,时间聚合对瞬时(非时间)因果发现的影响。这源于一个观察:真实的因果时间滞后往往明显短于观测间隔。这种差异导致高度聚合,使时间延迟因果性消失,瞬时依赖性显现。尽管我们期望这种瞬时依赖性在某种意义上与真实的因果关系一致,以使发现结果有意义,但目前尚不清楚我们需要何种类型的一致性以及何时才能满足这种一致性。我们以形式化的方式提出了函数一致性和条件独立性一致性,分别对应于基于函数因果模型的方法和基于条件独立性的方法,并给出了这些一致性得以保持的条件。我们在理论上和实验上表明,因果发现结果在完全非线性情况下可能会受到聚合的严重扭曲,但我们也发现,如果具有部分线性性或适当的先验,仍然可以从聚合数据中恢复因果关系。我们的研究结果表明,在解释来自此类数据的因果发现结果时,社区应采取谨慎而细致的态度;我们也说明了聚合为何以及何时会损害因果发现方法的性能。
更新时间: 2024-06-04 10:35:16
领域: stat.ML,cs.LG
Fast and Scalable Multi-Kernel Encoder Classifier
This paper introduces a new kernel-based classifier by viewing kernel matrices as generalized graphs and leveraging recent progress in graph embedding techniques. The proposed method facilitates fast and scalable kernel matrix embedding, and seamlessly integrates multiple kernels to enhance the learning process. Our theoretical analysis offers a population-level characterization of this approach using random variables. Empirically, our method demonstrates superior running time compared to standard approaches such as support vector machines and two-layer neural network, while achieving comparable classification accuracy across various simulated and real datasets.
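A hedged sketch of one plausible reading of this pipeline: treat each kernel matrix as a weighted graph, embed it spectrally, and concatenate the per-kernel embeddings as classifier features (an illustration of the idea, not the authors' exact construction).

    import numpy as np

    def kernel_spectral_embedding(K, d):
        # View the kernel matrix as a weighted adjacency matrix and embed each
        # sample with its top-d scaled eigenvectors (adjacency spectral embedding).
        vals, vecs = np.linalg.eigh(K)
        top = np.argsort(vals)[::-1][:d]
        return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))

    def multi_kernel_features(kernels, d=8):
        # Concatenate per-kernel embeddings so a downstream linear classifier can
        # draw on several kernels at once.
        return np.concatenate([kernel_spectral_embedding(K, d) for K in kernels], axis=1)

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))
    K_lin = X @ X.T                                                   # linear kernel
    K_rbf = np.exp(-0.5 * np.sum((X[:, None] - X[None]) ** 2, -1))    # RBF kernel
    print(multi_kernel_features([K_lin, K_rbf]).shape)                # (100, 16)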
Updated: 2024-06-04 10:34:40
标题: 快速可扩展的多核编码器分类器
摘要: 本文通过将核矩阵视为广义图,并利用图嵌入技术的最新进展,提出了一种新的基于核的分类器。所提出的方法支持快速且可扩展的核矩阵嵌入,并无缝集成多个核以增强学习过程。我们的理论分析利用随机变量给出了该方法在总体(population)层面的刻画。实证方面,与支持向量机和两层神经网络等标准方法相比,我们的方法在各种模拟和真实数据集上展示出更优的运行时间,同时达到了可比的分类准确率。
更新时间: 2024-06-04 10:34:40
领域: cs.LG
DNCs Require More Planning Steps
Many recent works use machine learning models to solve various complex algorithmic problems. However, these models attempt to reach a solution without considering the problem's required computational complexity, which can be detrimental to their ability to solve it correctly. In this work we investigate the effect of computational time and memory on generalization of implicit algorithmic solvers. To do so, we focus on the Differentiable Neural Computer (DNC), a general problem solver that also lets us reason directly about its usage of time and memory. In this work, we argue that the number of planning steps the model is allowed to take, which we call "planning budget", is a constraint that can cause the model to generalize poorly and hurt its ability to fully utilize its external memory. We evaluate our method on Graph Shortest Path, Convex Hull, Graph MinCut and Associative Recall, and show how the planning budget can drastically change the behavior of the learned algorithm, in terms of learned time complexity, training time, stability and generalization to inputs larger than those seen during training.
Updated: 2024-06-04 10:31:03
标题: DNCs需要更多的规划步骤
摘要: 许多最近的研究使用机器学习模型来解决各种复杂的算法问题。然而,这些模型试图在不考虑问题所需计算复杂度的情况下求解,这可能有损其正确解决问题的能力。在这项工作中,我们研究了计算时间和内存对隐式算法求解器泛化能力的影响。为此,我们专注于可微分神经计算机(DNC),这是一种通用问题求解器,也让我们能够直接推理其对时间和内存的使用。在这项工作中,我们认为模型被允许采取的规划步骤数量(我们称之为“规划预算”)是一个约束,可能导致模型泛化能力差,并损害其充分利用外部内存的能力。我们在图最短路径、凸包、图最小割和联想回忆等问题上评估了我们的方法,并展示了规划预算如何显著改变所学算法的行为,包括学得的时间复杂度、训练时间、稳定性,以及对超出训练时所见规模的输入的泛化能力。
更新时间: 2024-06-04 10:31:03
领域: cs.LG
LLMs cannot find reasoning errors, but can correct them given the error location
While self-correction has shown promise in improving LLM outputs in terms of style and quality (e.g. Chen et al., 2023b; Madaan et al., 2023), recent attempts to self-correct logical or reasoning errors often cause correct answers to become incorrect, resulting in worse performances overall (Huang et al., 2023). In this paper, we show that poor self-correction performance stems from LLMs' inability to find logical mistakes, rather than their ability to correct a known mistake. Firstly, we benchmark several state-of-the-art LLMs on their mistake-finding ability and demonstrate that they generally struggle with the task, even in highly objective, unambiguous cases. Secondly, we test the correction abilities of LLMs -- separately from mistake finding -- using a backtracking setup that feeds ground truth mistake location information to the model. We show that this boosts downstream task performance across our 5 reasoning tasks, indicating that LLMs' correction abilities are robust. Finally, we show that it is possible to obtain mistake location information without ground truth labels or in-domain training data. We train a small classifier with out-of-domain data, which exhibits stronger mistake-finding performance than prompting a large model. We release our dataset of LLM-generated logical mistakes, BIG-Bench Mistake, to enable further research into locating LLM reasoning mistakes.
Updated: 2024-06-04 10:25:13
标题: LLMs无法发现推理错误,但在给定错误位置时能够修正它们
摘要: 尽管自我纠正在改善LLM输出的风格和质量方面表现出了潜力(例如Chen等人,2023b; Madaan等人,2023),但最近对逻辑或推理错误进行自我纠正的尝试往往会导致正确答案变为不正确,从而导致整体表现更差(Huang等人,2023)。在本文中,我们表明糟糕的自我纠正表现源于LLM无法发现逻辑错误,而非其纠正已知错误的能力。首先,我们在若干最先进的LLM上对发现错误的能力进行基准测试,并证明它们通常在该任务上遇到困难,即使在高度客观、无歧义的情况下也是如此。其次,我们使用一个回溯设置,将真实(ground-truth)的错误位置信息提供给模型,从而将LLM的纠错能力与发现错误的能力分开测试。我们表明这提升了我们5个推理任务中的下游任务表现,表明LLM的纠错能力是稳健的。最后,我们表明可以在没有真实标签或域内训练数据的情况下获得错误位置信息。我们使用域外数据训练了一个小分类器,其发现错误的表现比提示一个大模型更强。我们发布了LLM生成的逻辑错误数据集BIG-Bench Mistake,以促进对定位LLM推理错误的进一步研究。
更新时间: 2024-06-04 10:25:13
领域: cs.AI,cs.CL,cs.LG
Epistemic Uncertainty-Weighted Loss for Visual Bias Mitigation
Deep neural networks are highly susceptible to learning biases in visual data. While various methods have been proposed to mitigate such bias, the majority require explicit knowledge of the biases present in the training data in order to mitigate. We argue the relevance of exploring methods which are completely ignorant of the presence of any bias, but are capable of identifying and mitigating them. Furthermore, we propose using Bayesian neural networks with a predictive uncertainty-weighted loss function to dynamically identify potential bias in individual training samples and to weight them during training. We find a positive correlation between samples subject to bias and higher epistemic uncertainties. Finally, we show the method has potential to mitigate visual bias on a bias benchmark dataset and on a real-world face detection problem, and we consider the merits and weaknesses of our approach.
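A sketch of the idea follows, using MC-dropout as a cheap stand-in for a full Bayesian network and a simple up-weighting of high-epistemic-uncertainty samples; the specific weighting function is an assumption for illustration, not the paper's exact loss.

    import torch
    import torch.nn.functional as F

    def epistemic_weighted_loss(model, x, y, n_mc=8, eps=1e-8):
        # Estimate per-sample epistemic uncertainty as the variance of MC-dropout
        # predictions, then up-weight the loss of the most uncertain samples,
        # which (per the paper's observation) tend to be the bias-affected ones.
        model.train()  # keep dropout active during the MC passes
        probs = torch.stack([F.softmax(model(x), dim=-1) for _ in range(n_mc)])
        epistemic = probs.var(dim=0).sum(dim=-1)              # (batch,)
        nll = F.nll_loss(torch.log(probs.mean(0) + eps), y, reduction="none")
        weights = 1.0 + epistemic / (epistemic.mean() + eps)  # illustrative scheme
        return (weights.detach() * nll).mean()

    net = torch.nn.Sequential(torch.nn.Linear(10, 64), torch.nn.ReLU(),
                              torch.nn.Dropout(0.2), torch.nn.Linear(64, 3))
    loss = epistemic_weighted_loss(net, torch.randn(16, 10), torch.randint(0, 3, (16,)))
    print(loss)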
Updated: 2024-06-04 10:24:11
标题: 视觉偏见缓解的认知不确定性加权损失
摘要: 深度神经网络在视觉数据中容易受到学习偏见的影响。虽然已经提出了各种方法来减轻这种偏见,但大多数方法需要明确了解训练数据中存在的偏见才能进行减轻。我们认为,有必要探索那些对偏见的存在一无所知、却能够识别并减轻偏见的方法。此外,我们提出使用具有预测不确定性加权损失函数的贝叶斯神经网络,动态识别个别训练样本中的潜在偏见,并在训练过程中对它们进行加权。我们发现受偏见影响的样本与更高的认知不确定性之间存在正相关关系。最后,我们展示了该方法有潜力在偏见基准数据集和现实世界的人脸检测问题中减轻视觉偏见,并讨论了我们方法的优点和缺点。
更新时间: 2024-06-04 10:24:11
领域: cs.CV,cs.AI,cs.CY,cs.LG
On The Statistical Representation Properties Of The Perturb-Softmax And The Perturb-Argmax Probability Distributions
The Gumbel-Softmax probability distribution allows learning discrete tokens in generative learning, while the Gumbel-Argmax probability distribution is useful in learning discrete structures in discriminative learning. Despite the efforts invested in optimizing these probability models, their statistical properties are under-explored. In this work, we investigate their representation properties and determine for which families of parameters these probability distributions are complete, i.e., can represent any probability distribution, and minimal, i.e., can represent a probability distribution uniquely. We rely on convexity and differentiability to determine these statistical conditions and extend this framework to general probability models, such as Gaussian-Softmax and Gaussian-Argmax. We experimentally validate the qualities of these extensions, which enjoy a faster convergence rate. We conclude the analysis by identifying two sets of parameters that satisfy these assumptions and thus admit a complete and minimal representation. Our contribution is theoretical with supporting practical evaluation.
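For reference, the two perturbation models under study are easy to state in code; the sketch below gives the standard Perturb-Softmax (Gumbel-Softmax) and Perturb-Argmax (Gumbel-max) samplers.

    import torch

    def perturb_softmax(logits, tau=1.0):
        # Gumbel-Softmax: add i.i.d. Gumbel(0, 1) noise and apply a tempered
        # softmax, giving a differentiable relaxation of discrete sampling.
        g = -torch.log(-torch.log(torch.rand_like(logits).clamp_min(1e-20)))
        return torch.softmax((logits + g) / tau, dim=-1)

    def perturb_argmax(logits):
        # Gumbel-Argmax: the argmax of Gumbel-perturbed logits is an exact sample
        # from the categorical distribution softmax(logits) (Gumbel-max trick).
        g = -torch.log(-torch.log(torch.rand_like(logits).clamp_min(1e-20)))
        return torch.argmax(logits + g, dim=-1)

    logits = torch.tensor([1.0, 2.0, 0.5])
    print(perturb_softmax(logits, tau=0.5))
    print(perturb_argmax(logits))

The paper's question, roughly, is for which families of logits (and their Gaussian counterparts) such parameterizations can represent any distribution, and when they do so uniquely.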
Updated: 2024-06-04 10:22:12
标题: 关于扰动Softmax和扰动Argmax概率分布的统计表示特性
摘要: Gumbel-Softmax概率分布允许在生成式学习中学习离散词元,而Gumbel-Argmax概率分布在判别式学习中对学习离散结构很有用。尽管在优化这些概率模型方面已投入诸多努力,它们的统计特性仍未被充分探索。在这项工作中,我们研究了它们的表示特性,并确定这些概率分布对哪些参数族是完备的(即能表示任意概率分布)和最小的(即能唯一地表示某一概率分布)。我们依靠凸性和可微性来确定这些统计条件,并将该框架扩展到一般的概率模型,例如Gaussian-Softmax和Gaussian-Argmax。我们通过实验验证了这些扩展的质量,它们享有更快的收敛速度。最后,我们通过确定满足这些假设、从而具有完备且最小表示的两组参数来结束分析。我们的贡献是理论性的,并辅以实践评估的支持。
更新时间: 2024-06-04 10:22:12
领域: cs.LG,cs.AI,stat.ML
Audio Mamba: Selective State Spaces for Self-Supervised Audio Representations
Despite its widespread adoption as the prominent neural architecture, the Transformer has spurred several independent lines of work to address its limitations. One such approach is selective state space models, which have demonstrated promising results for language modelling. However, their feasibility for learning self-supervised, general-purpose audio representations is yet to be investigated. This work proposes Audio Mamba, a selective state space model for learning general-purpose audio representations from randomly masked spectrogram patches through self-supervision. Empirical results on ten diverse audio recognition downstream tasks show that the proposed models, pretrained on the AudioSet dataset, consistently outperform comparable self-supervised audio spectrogram transformer (SSAST) baselines by a considerable margin and demonstrate better performance in dataset size, sequence length and model size comparisons.
Updated: 2024-06-04 10:19:14
标题: 音频曼巴:自监督音频表示的选择性状态空间
摘要: 尽管Transformer已被广泛采用作为主要的神经架构,但它也引发了几条独立的研究线来解决其局限性。其中一种方法是选择性状态空间模型,已经在语言建模方面取得了有希望的结果。然而,它们在学习自监督的通用音频表示方面的可行性尚未得到研究。本文提出了一种名为Audio Mamba的选择性状态空间模型,通过自监督学习从随机屏蔽的频谱图块中学习通用音频表示。在十个不同的音频识别任务上的实证结果显示,该模型在AudioSet数据集上预训练后,始终比可比的自监督音频频谱变换器(SSAST)基线表现出明显优势,并在数据集大小、序列长度和模型大小比较中表现更好。
更新时间: 2024-06-04 10:19:14
领域: cs.SD,cs.AI,eess.AS
Towards Generalizability of Multi-Agent Reinforcement Learning in Graphs with Recurrent Message Passing
Graph-based environments pose unique challenges to multi-agent reinforcement learning. In decentralized approaches, agents operate within a given graph and make decisions based on partial or outdated observations. The size of the observed neighborhood limits the generalizability to different graphs and affects the reactivity of agents, the quality of the selected actions, and the communication overhead. This work focuses on generalizability and resolves the trade-off in observed neighborhood size with a continuous information flow in the whole graph. We propose a recurrent message-passing model that iterates with the environment's steps and allows nodes to create a global representation of the graph by exchanging messages with their neighbors. Agents receive the resulting learned graph observations based on their location in the graph. Our approach can be used in a decentralized manner at runtime and in combination with a reinforcement learning algorithm of choice. We evaluate our method across 1000 diverse graphs in the context of routing in communication networks and find that it enables agents to generalize and adapt to changes in the graph.
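One round of the recurrent message passing described might look like the hedged PyTorch sketch below; the paper's actual message, aggregation, and update functions are richer than these placeholder modules.

    import torch

    def recurrent_message_round(h, edge_index, msg_mlp, gru_cell):
        # One round: each node sums incoming messages from its neighbours and
        # updates its hidden state with a GRU cell; iterating such rounds
        # alongside environment steps lets information flow across the graph.
        src, dst = edge_index                              # (2, num_edges) tensor
        messages = msg_mlp(h[src])                         # message per edge
        agg = torch.zeros_like(h).index_add_(0, dst, messages)
        return gru_cell(agg, h)                            # new hidden states

    n, dim = 5, 16
    h = torch.randn(n, dim)
    edge_index = torch.tensor([[0, 1, 2, 3], [1, 2, 3, 4]])  # a simple path graph
    msg_mlp = torch.nn.Linear(dim, dim)
    gru_cell = torch.nn.GRUCell(dim, dim)
    print(recurrent_message_round(h, edge_index, msg_mlp, gru_cell).shape)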
Updated: 2024-06-04 10:16:33
标题: 基于循环消息传递迈向图上多智能体强化学习的泛化能力
摘要: 基于图的环境对多智能体强化学习提出了独特的挑战。在分散式方法中,智能体在给定的图中运行,并根据部分或过时的观察做出决策。观察到的邻域的大小限制了对不同图的泛化能力,影响了智能体的反应能力、所选择动作的质量和通信开销。本文侧重于泛化能力,并通过在整个图中实现连续信息流解决了观察到的邻域大小的权衡。我们提出了一种循环消息传递模型,该模型与环境的步骤进行迭代,允许节点通过与其邻居交换消息来创建图的全局表示。智能体根据其在图中的位置接收到所学习的图观察结果。我们的方法可以在运行时以分散式方式使用,并与所选择的强化学习算法结合使用。在通信网络中的路由背景下,我们在1000个不同的图中评估了我们的方法,并发现它使智能体能够泛化并适应图中的变化。
更新时间: 2024-06-04 10:16:33
领域: cs.MA,cs.AI
Soft Partitioning of Latent Space for Semantic Channel Equalization
Semantic channel equalization has emerged as a solution to address language mismatch in multi-user semantic communications. This approach aims to align the latent spaces of an encoder and a decoder which were not jointly trained and it relies on a partition of the semantic (latent) space into atoms based on the the semantic meaning. In this work we explore the role of the semantic space partition in scenarios where the task structure involves a one-to-many mapping between the semantic space and the action space. In such scenarios, partitioning based on hard inference results results in loss of information which degrades the equalization performance. We propose a soft criterion to derive the atoms of the partition which leverages the soft decoder's output and offers a more comprehensive understanding of the semantic space's structure. Through empirical validation, we demonstrate that soft partitioning yields a more descriptive and regular partition of the space, consequently enhancing the performance of the equalization algorithm.
Updated: 2024-06-04 10:15:42
标题: 潜空间软分区用于语义通道均衡
摘要: 语义通道均衡已经成为解决多用户语义通信中语言不匹配的解决方案。这种方法旨在对齐编码器和解码器的潜在空间,这两者并没有联合训练,它依赖于基于语义含义的语义(潜在)空间的原子划分。在这项工作中,我们探讨了语义空间划分在任务结构涉及语义空间和动作空间之间的一对多映射的情况下的作用。在这种情况下,基于硬推断结果的划分会导致信息丢失,从而降低均衡性能。我们提出了一个软准则来推导分区的原子,利用了软解码器的输出,并提供了对语义空间结构更全面的理解。通过实证验证,我们证明软划分产生了更具描述性和规则性的空间划分,从而提高了均衡算法的性能。
更新时间: 2024-06-04 10:15:42
领域: cs.LG,cs.IT,cs.MA,math.IT
One-Shot Federated Learning with Bayesian Pseudocoresets
Optimization-based techniques for federated learning (FL) often come with prohibitive communication cost, as high dimensional model parameters need to be communicated repeatedly between server and clients. In this paper, we follow a Bayesian approach allowing to perform FL with one-shot communication, by solving the global inference problem as a product of local client posteriors. For models with multi-modal likelihoods, such as neural networks, a naive application of this scheme is hampered, since clients will capture different posterior modes, causing a destructive collapse of the posterior on the server side. Consequently, we explore approximate inference in the function-space representation of client posteriors, hence suffering less or not at all from multi-modality. We show that distributed function-space inference is tightly related to learning Bayesian pseudocoresets and develop a tractable Bayesian FL algorithm on this insight. We show that this approach achieves prediction performance competitive to state-of-the-art while showing a striking reduction in communication cost of up to two orders of magnitude. Moreover, due to its Bayesian nature, our method also delivers well-calibrated uncertainty estimates.
Updated: 2024-06-04 10:14:39
标题: 基于贝叶斯伪核心集的单轮(One-Shot)联邦学习
摘要: 基于优化的联邦学习(FL)技术通常伴随着高昂的通信成本,因为高维模型参数需要在服务器和客户端之间重复传输。在本文中,我们采用贝叶斯方法,通过将全局推断问题表示为各客户端局部后验的乘积来求解,从而允许仅通过单轮通信进行FL。对于具有多模态似然的模型(如神经网络),直接应用此方案会受到阻碍,因为各客户端将捕获不同的后验模式,在服务器端导致后验的破坏性崩溃。因此,我们探索在客户端后验的函数空间表示中进行近似推断,从而较少乃至完全不受多模态性的影响。我们展示了分布式函数空间推断与学习贝叶斯伪核心集密切相关,并基于此见解开发了一种可行的贝叶斯FL算法。我们展示了这种方法在预测性能上达到了与最先进技术相竞争的水平,同时在通信成本上实现了高达两个数量级的显著降低。此外,由于其贝叶斯性质,我们的方法还提供了良好校准的不确定性估计。
更新时间: 2024-06-04 10:14:39
Categories: cs.LG,cs.AI,cs.DC,stat.ML
Latent Space Alignment for Semantic Channel Equalization
We relax the constraint of a shared language between agents in a semantic and goal-oriented communication system to explore the effect of language mismatch in distributed task solving. We propose a mathematical framework which provides a model of, and a measure for, the semantic distortion introduced in the communication when agents use distinct languages. We then propose a new approach to semantic channel equalization, whose effectiveness we demonstrate through numerical evaluations.
Updated: 2024-06-04 10:13:13
Categories: cs.LG,cs.CL,cs.IT,math.IT
AROMA: Preserving Spatial Structure for Latent PDE Modeling with Local Neural Fields
We present AROMA (Attentive Reduced Order Model with Attention), a framework designed to enhance the modeling of partial differential equations (PDEs) using local neural fields. Our flexible encoder-decoder architecture can obtain smooth latent representations of spatial physical fields from a variety of data types, including irregular-grid inputs and point clouds. This versatility eliminates the need for patching and allows efficient processing of diverse geometries. The sequential nature of our latent representation can be interpreted spatially and permits the use of a conditional transformer for modeling the temporal dynamics of PDEs. By employing a diffusion-based formulation, we achieve greater stability and enable longer rollouts compared to conventional MSE training. AROMA's superior performance in simulating 1D and 2D equations underscores the efficacy of our approach in capturing complex dynamical behaviors.
Updated: 2024-06-04 10:12:09
Categories: cs.LG
Branches: A Fast Dynamic Programming and Branch & Bound Algorithm for Optimal Decision Trees
Decision Tree Learning is a fundamental problem for Interpretable Machine Learning, yet it poses a formidable optimization challenge. Despite numerous efforts dating back to the early 1990s, practical algorithms have only recently emerged, primarily leveraging Dynamic Programming (DP) and Branch & Bound (B&B) techniques. These breakthroughs led to the development of two distinct approaches. Algorithms like DL8.5 and MurTree operate on the space of nodes (or branches); they are very fast, but do not penalise complex Decision Trees, i.e. they do not solve for sparsity. On the other hand, algorithms like OSDT and GOSDT operate on the space of Decision Trees; they solve for sparsity but at the expense of speed. In this work, we introduce Branches, a novel algorithm that integrates the strengths of both paradigms. Leveraging DP and B&B, Branches achieves exceptional speed while also solving for sparsity. Central to its efficiency is a novel analytical bound enabling substantial pruning of the search space. Theoretical analysis demonstrates that Branches has lower complexity compared to state-of-the-art methods, a claim validated through extensive empirical evaluation. Our results illustrate that Branches not only greatly outperforms existing approaches in terms of speed and number of iterations, it also consistently yields optimal Decision Trees.
Updated: 2024-06-04 10:11:46
Categories: cs.LG
KnowGPT: Knowledge Graph based Prompting for Large Language Models
Large Language Models (LLMs) have demonstrated remarkable capabilities in many real-world applications. Nonetheless, LLMs are often criticized for their tendency to produce hallucinations, wherein the models fabricate incorrect statements on tasks beyond their knowledge and perception. To alleviate this issue, researchers have explored leveraging the factual knowledge in knowledge graphs (KGs) to ground the LLM's responses in established facts and principles. However, most state-of-the-art LLMs are closed-source, making it challenging to develop a prompting framework that can efficiently and effectively integrate KGs into LLMs with hard prompts only. Generally, existing KG-enhanced LLMs usually suffer from three critical issues, including huge search space, high API costs, and laborious prompt engineering, that impede their widespread application in practice. To this end, we introduce a novel Knowledge Graph based PrompTing framework, namely KnowGPT, to enhance LLMs with domain knowledge. KnowGPT contains a knowledge extraction module to extract the most informative knowledge from KGs, and a context-aware prompt construction module to automatically convert extracted knowledge into effective prompts. Experiments on three benchmarks demonstrate that KnowGPT significantly outperforms all competitors. Notably, KnowGPT achieves a 92.6% accuracy on OpenbookQA leaderboard, comparable to human-level performance.
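A minimal sketch of the prompt-construction idea (the function and formatting below are hypothetical simplifications; KnowGPT's actual modules learn which triples to extract and how to phrase them):

```python
def triples_to_prompt(question, triples, max_triples=5):
    """Render extracted KG triples as facts prepended to the question."""
    facts = "\n".join(f"- {h} {r.replace('_', ' ')} {t}"
                      for h, r, t in triples[:max_triples])
    return f"Known facts:\n{facts}\n\nQuestion: {question}\nAnswer:"

triples = [("aspirin", "treats", "headache"), ("aspirin", "is_a", "NSAID")]
print(triples_to_prompt("Which class of drug treats headaches?", triples))
```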
Updated: 2024-06-04 10:10:37
Categories: cs.CL,cs.AI
Pragmatic Goal-Oriented Communications under Semantic-Effectiveness Channel Errors
In forthcoming AI-assisted 6G networks, integrating semantic, pragmatic, and goal-oriented communication strategies becomes imperative. This integration will enable sensing, transmission, and processing of exclusively pertinent task data, ensuring conveyed information possesses understandable, pragmatic semantic significance, aligning with destination needs and goals. Without doubt, no communication is error-free. Within this context, besides errors stemming from typical wireless communication dynamics, potential distortions between transmitter-intended and receiver-interpreted meanings can emerge due to limitations in semantic processing capabilities, as well as language and knowledge representation disparities between transmitters and receivers. The main contribution of this paper is twofold. First, it proposes and details a novel mathematical modeling of errors stemming from language mismatches at both semantic and effectiveness levels. Second, it provides a novel algorithmic solution to counteract these types of errors which leverages optimal transport theory. Our numerical results show the potential of the proposed mechanism to compensate for language mismatches, thereby enhancing the attainability of reliable communication under noisy communication environments.
Updated: 2024-06-04 10:10:22
Categories: cs.IT,cs.LG,math.IT
Improving the Validity of Decision Trees as Explanations
In classification and forecasting with tabular data, one often utilizes tree-based models. Those can be competitive with deep neural networks on tabular data and, under some conditions, explainable. The explainability depends on the depth of the tree and the accuracy in each leaf of the tree. We point out that decision trees containing leaves with unbalanced accuracy can provide misleading explanations. Low-accuracy leaves give less valid explanations, which could be interpreted as unfairness among subgroups utilizing these explanations. Here, we train a shallow tree with the objective of minimizing the maximum misclassification error across all leaf nodes. The shallow tree provides a global explanation, while the overall statistical performance of the shallow tree can become comparable to state-of-the-art methods (e.g., well-tuned XGBoost) by extending the leaves with further models.
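The diagnostic the paper starts from, unbalanced per-leaf accuracy, is easy to reproduce. The sketch below (synthetic data and a standard CART tree, not the paper's minimax training objective) measures the quantity the authors propose to minimize:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Per-leaf accuracy: a low-accuracy leaf yields a less valid explanation.
leaves = clf.apply(X)                    # leaf index of each sample
pred = clf.predict(X)
for leaf in np.unique(leaves):
    mask = leaves == leaf
    print(f"leaf {leaf}: n={mask.sum():4d}  acc={(pred[mask] == y[mask]).mean():.3f}")

worst = max((pred[leaves == l] != y[leaves == l]).mean() for l in np.unique(leaves))
print("max misclassification over leaves:", round(worst, 3))
```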
Updated: 2024-06-04 10:09:10
Categories: cs.LG,cs.AI,math.OC
Activation Addition: Steering Language Models Without Optimization
Reliably controlling the behavior of large language models is a pressing open problem. Existing methods include supervised finetuning, reinforcement learning from human feedback, prompt engineering and guided decoding. We instead investigate activation engineering: modifying activations at inference-time to predictably alter model behavior. We bias the forward pass with a 'steering vector' implicitly specified through natural language. Past work learned these steering vectors; our Activation Addition (ActAdd) method instead computes them by taking activation differences resulting from pairs of prompts. We demonstrate ActAdd on a range of LLMs (LLaMA-3, OPT, GPT-2, and GPT-J), obtaining SOTA on detoxification and negative-to-positive sentiment control. Our approach yields inference-time control over high-level properties of output like topic and sentiment while preserving performance on off-target tasks. ActAdd takes far less compute and implementation effort than finetuning or RLHF, allows users control through natural language, and its computational overhead (as a fraction of inference time) appears stable or improving over increasing model size.
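A minimal sketch of the core step, with a random toy network standing in for an LLM and random vectors standing in for prompt embeddings (everything here is a made-up stand-in; real ActAdd hooks a chosen layer of a transformer):

```python
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(16, 16)), rng.normal(size=(16, 16))

def hidden(x):
    return np.tanh(x @ W1)               # the layer we intervene on

def forward(x, steering=None):
    h = hidden(x)
    if steering is not None:
        h = h + steering                 # ActAdd: add the vector at inference
    return h @ W2

# Contrast pair of "prompts" (e.g. "love" vs "hate"), here random embeddings.
x_pos, x_neg = rng.normal(size=16), rng.normal(size=16)
steer = 4.0 * (hidden(x_pos) - hidden(x_neg))   # activation difference x coeff

x_test = rng.normal(size=16)
print("plain  :", np.round(forward(x_test)[:4], 2))
print("steered:", np.round(forward(x_test, steering=steer)[:4], 2))
```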
Updated: 2024-06-04 10:08:39
Categories: cs.CL,cs.LG
Learning the Hodgkin-Huxley Model with Operator Learning Techniques
We construct and compare three operator learning architectures, DeepONet, Fourier Neural Operator, and Wavelet Neural Operator, in order to learn the operator mapping a time-dependent applied current to the transmembrane potential of the Hodgkin-Huxley ionic model. The underlying non-linearity of the Hodgkin-Huxley dynamical system, the stiffness of its solutions, and the threshold dynamics depending on the intensity of the applied current, are some of the challenges to address when exploiting artificial neural networks to learn this class of complex operators. By properly designing these operator learning techniques, we demonstrate their ability to effectively address these challenges, achieving a relative L2 error as low as 1.4% in learning the solutions of the Hodgkin-Huxley ionic model.
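Of the three architectures, DeepONet is the quickest to write down. The untrained skeleton below (layer sizes and sensor count are arbitrary choices, not the paper's) maps a sampled applied current and a query time to a predicted transmembrane potential:

```python
import torch
import torch.nn as nn

class DeepONet(nn.Module):
    """Branch net encodes the applied current I(t) at m sensor points;
    trunk net encodes a query time t; output is their dot product."""
    def __init__(self, m=100, p=64):
        super().__init__()
        self.branch = nn.Sequential(nn.Linear(m, 128), nn.Tanh(), nn.Linear(128, p))
        self.trunk = nn.Sequential(nn.Linear(1, 128), nn.Tanh(), nn.Linear(128, p))

    def forward(self, current_samples, t):
        b = self.branch(current_samples)        # (batch, p)
        tr = self.trunk(t)                      # (batch, p)
        return (b * tr).sum(-1, keepdim=True)   # predicted V(t), (batch, 1)

model = DeepONet()
I = torch.randn(8, 100)      # 8 input currents, 100 sensors each
t = torch.rand(8, 1)         # query times
print(model(I, t).shape)     # torch.Size([8, 1])
```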
Updated: 2024-06-04 10:04:54
Categories: math.NA,cs.LG,cs.NA
Construction and Application of Materials Knowledge Graph in Multidisciplinary Materials Science via Large Language Model
Knowledge in materials science is widely dispersed across extensive scientific literature, posing significant challenges for efficient discovery and integration of new materials. Traditional methods, often reliant on costly and time-consuming experimental approaches, further complicate rapid innovation. Addressing these challenges, the integration of artificial intelligence with materials science has opened avenues for accelerating the discovery process, though it also demands precise annotation, data extraction, and traceability of information. To tackle these issues, this article introduces the Materials Knowledge Graph (MKG), which utilizes advanced natural language processing techniques, integrated with large language models, to extract and systematically organize a decade's worth of high-quality research into structured triples; the resulting graph contains 162,605 nodes and 731,772 edges. MKG categorizes information into comprehensive labels such as Name, Formula, and Application, structured around a meticulously designed ontology, thus enhancing data usability and integration. By implementing network-based algorithms, MKG not only facilitates efficient link prediction but also significantly reduces reliance on traditional experimental methods. This structured approach not only streamlines materials research but also lays the groundwork for more sophisticated science knowledge graphs.
Updated: 2024-06-04 10:04:05
Categories: cs.CL,cs.AI
Layer-2 Arbitrage: An Empirical Analysis of Swap Dynamics and Price Disparities on Rollups
This paper explores the dynamics of Decentralized Finance (DeFi) within the Layer-2 ecosystem, focusing on Automated Market Makers (AMM) and arbitrage on Ethereum rollups. We observe significant shifts in trading activity from Ethereum to rollups, with swaps on rollups happening 2-3 times more often, though, with lower trade volume. By examining the price differences between AMMs and centralized exchanges, we discover over 0.5 million unexploited arbitrage opportunities on rollups. Remarkably, we observe that these opportunities last, on average, 10 to 20 blocks, requiring adjustments to the LVR metrics to avoid double-counting arbitrage. Our results show that arbitrage in Arbitrum, Base, and Optimism pools ranges from 0.03% to 0.05% of trading volume, while in zkSync Era it oscillates around 0.25%, with the LVR metric overestimating arbitrage by a factor of five. Rollups offer not only lower gas fees, but also provide faster block production, leading to significant differences compared to the trading and arbitrage dynamics of Ethereum.
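The underlying price-gap computation for a constant-product pool is simple to sketch (toy numbers, a hypothetical fee, and no gas or slippage accounting, unlike the paper's LVR-adjusted analysis):

```python
def amm_price(x_res, y_res):
    """Marginal price of X in units of Y for an x * y = k pool."""
    return y_res / x_res

def arbitrage_profit(x_res, y_res, cex_price, fee=0.003):
    """Profit (in Y) from trading the pool to the CEX price and unwinding
    the position on the CEX."""
    k = x_res * y_res
    x_new = (k / cex_price) ** 0.5       # reserves where AMM price == CEX price
    dx = x_res - x_new                   # X taken from (or added to) the pool
    dy = y_res - k / x_new               # Y received from (or paid to) the pool
    gross = dx * cex_price + dy          # value both legs in Y
    return max(gross, 0.0) * (1 - fee)

print(amm_price(1_000, 2_000_000))                          # pool: 2000 Y per X
print(arbitrage_profit(1_000, 2_000_000, cex_price=2050.0)) # ~310 Y of profit
```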
Updated: 2024-06-04 10:03:23
Categories: cs.CR
Data-driven Energy Efficiency Modelling in Large-scale Networks: An Expert Knowledge and ML-based Approach
The energy consumption of mobile networks poses a critical challenge. Mitigating this concern necessitates the deployment and optimization of network energy-saving solutions, such as carrier shutdown, to dynamically manage network resources. Traditional optimization approaches encounter complexity due to factors like the large number of cells, stochastic traffic, channel variations, and intricate trade-offs. This paper introduces the simulated reality of communication networks (SRCON) framework, a novel, data-driven modeling paradigm that harnesses live network data and employs a blend of machine learning (ML)- and expert-based models. This mix of models accurately characterizes the functioning of network components, and predicts network energy efficiency and user equipment (UE) quality of service for any energy carrier shutdown configuration in a specific network. Distinguishing itself from existing methods, SRCON eliminates the reliance on expensive expert knowledge, drive testing, or incomplete maps for predicting network performance. This paper details the pipeline employed by SRCON to decompose the large network energy efficiency modeling problem into ML and expert-based submodels. It demonstrates how, by embracing stochasticity, and carefully crafting the relationship between such submodels, the overall computational complexity can be reduced and prediction accuracy enhanced. Results derived from real network data underscore the paradigm shift introduced by SRCON, showcasing significant gains over a state-of-the-art method used by an operator for network energy efficiency modeling. The reliability of this local, data-driven modeling of the network proves to be a key asset for network energy-saving optimization.
Updated: 2024-06-04 10:01:55
Categories: eess.SY,cs.LG,cs.SY
Towards Practical Non-Adversarial Distribution Matching
Distribution matching can be used to learn invariant representations with applications in fairness and robustness. Most prior works resort to adversarial matching methods but the resulting minimax problems are unstable and challenging to optimize. Non-adversarial likelihood-based approaches either require model invertibility, impose constraints on the latent prior, or lack a generic framework for distribution matching. To overcome these limitations, we propose a non-adversarial VAE-based matching method that can be applied to any model pipeline. We develop a set of alignment upper bounds for distribution matching (including a noisy bound) that have VAE-like objectives but with a different perspective. We carefully compare our method to prior VAE-based matching approaches both theoretically and empirically. Finally, we demonstrate that our novel matching losses can replace adversarial losses in standard invariant representation learning pipelines without modifying the original architectures -- thereby significantly broadening the applicability of non-adversarial matching methods.
Updated: 2024-06-04 10:00:47
Categories: cs.LG,stat.ML
Stein Random Feature Regression
In large-scale regression problems, random Fourier features (RFFs) have significantly enhanced the computational scalability and flexibility of Gaussian processes (GPs) by defining kernels through their spectral density, from which a finite set of Monte Carlo samples can be used to form an approximate low-rank GP. However, the efficacy of RFFs in kernel approximation and Bayesian kernel learning depends on the ability to tractably sample the kernel spectral measure and the quality of the generated samples. We introduce Stein random features (SRF), leveraging Stein variational gradient descent, which can be used to both generate high-quality RFF samples of known spectral densities as well as flexibly and efficiently approximate traditionally non-analytical spectral measure posteriors. SRFs require only the evaluation of log-probability gradients to perform both kernel approximation and Bayesian kernel learning that results in superior performance over traditional approaches. We empirically validate the effectiveness of SRFs by comparing them to baselines on kernel approximation and well-known GP regression problems.
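For reference, the plain Monte Carlo RFF construction that SRF replaces with SVGD-transported particles looks like this for an RBF kernel (a standard textbook sketch, not the paper's method):

```python
import numpy as np

def rff_features(X, n_feat, lengthscale, rng):
    """Random Fourier features for the RBF kernel: omega is sampled from
    the kernel's spectral density N(0, lengthscale^-2 I)."""
    d = X.shape[1]
    omega = rng.normal(scale=1.0 / lengthscale, size=(d, n_feat))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_feat)
    return np.sqrt(2.0 / n_feat) * np.cos(X @ omega + b)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
Z = rff_features(X, n_feat=2000, lengthscale=1.0, rng=rng)
K_approx = Z @ Z.T

sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K_exact = np.exp(-0.5 * sq)              # exact RBF kernel, lengthscale 1
print("max abs error:", np.abs(K_approx - K_exact).max())
```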
Updated: 2024-06-04 09:57:19
Categories: cs.LG,stat.ML
SaVeR: Optimal Data Collection Strategy for Safe Policy Evaluation in Tabular MDP
In this paper, we study safe data collection for the purpose of policy evaluation in tabular Markov decision processes (MDPs). In policy evaluation, we are given a \textit{target} policy and asked to estimate the expected cumulative reward it will obtain. Policy evaluation requires data and we are interested in the question of what \textit{behavior} policy should collect the data for the most accurate evaluation of the target policy. While prior work has considered behavior policy selection, in this paper, we additionally consider a safety constraint on the behavior policy. Namely, we assume there exists a known default policy that incurs a particular expected cost when run and we enforce that the cumulative cost of all behavior policies run is better than a constant factor of the cost that would be incurred had we always run the default policy. We first show that there exists a class of intractable MDPs where no safe oracle algorithm with knowledge about problem parameters can efficiently collect data and satisfy the safety constraints. We then define the tractability condition for an MDP such that a safe oracle algorithm can efficiently collect data and using that we prove the first lower bound for this setting. We then introduce an algorithm SaVeR for this problem that approximates the safe oracle algorithm and bound the finite-sample mean squared error of the algorithm while ensuring it satisfies the safety constraint. Finally, we show in simulations that SaVeR produces low MSE policy evaluation while satisfying the safety constraint.
Updated: 2024-06-04 09:54:55
Categories: cs.LG
Nonsmooth Implicit Differentiation: Deterministic and Stochastic Convergence Rates
We study the problem of efficiently computing the derivative of the fixed-point of a parametric nondifferentiable contraction map. This problem has wide applications in machine learning, including hyperparameter optimization, meta-learning and data poisoning attacks. We analyze two popular approaches: iterative differentiation (ITD) and approximate implicit differentiation (AID). A key challenge behind the nonsmooth setting is that the chain rule does not hold anymore. We build upon the work by Bolte et al. (2022), who prove linear convergence of nonsmooth ITD under a piecewise Lipschitz smooth assumption. In the deterministic case, we provide a linear rate for AID and an improved linear rate for ITD which closely match the ones for the smooth setting. We further introduce NSID, a new stochastic method to compute the implicit derivative when the contraction map is defined as the composition of an outer map and an inner map which is accessible only through a stochastic unbiased estimator. We establish rates for the convergence of NSID, encompassing the best available rates in the smooth setting. We also present illustrative experiments confirming our analysis.
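ITD itself is a one-liner with automatic differentiation: unroll the fixed-point iteration and backpropagate through it. A sketch on a smooth toy contraction follows (the paper's subject is precisely what happens when the map is nonsmooth, which this toy deliberately avoids):

```python
import torch

# Fixed point w(lam) of the contraction T(w) = 0.5*cos(w) + lam,
# differentiated w.r.t. lam by unrolling (ITD).
lam = torch.tensor(0.3, requires_grad=True)
w = torch.tensor(0.0)
for _ in range(50):
    w = 0.5 * torch.cos(w) + lam
w.backward()

# Check against the implicit-function theorem: dw/dlam = 1/(1 + 0.5*sin(w*)).
with torch.no_grad():
    print("ITD gradient     :", lam.grad.item())
    print("implicit gradient:", (1.0 / (1.0 + 0.5 * torch.sin(w))).item())
```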
Updated: 2024-06-04 09:53:01
Categories: stat.ML,cs.LG,math.OC
The Art of Deception: Robust Backdoor Attack using Dynamic Stacking of Triggers
The area of Machine Learning as a Service (MLaaS) is seeing increased adoption due to recent advancements in the AI (Artificial Intelligence) industry. However, this spike has prompted concerns regarding AI defense mechanisms, specifically regarding potential covert attacks from third-party providers that cannot be entirely trusted. Recent research has uncovered that auditory backdoors may use certain modifications as their initiating mechanism. DynamicTrigger is introduced as a methodology for carrying out dynamic backdoor attacks that use cleverly designed tweaks to ensure that corrupted samples are indistinguishable from clean ones. By utilizing fluctuating signal sampling rates and masking speaker identities through dynamic sound triggers (such as the clapping of hands), it is possible to deceive speech recognition systems (ASR). Our empirical testing demonstrates that DynamicTrigger is both potent and stealthy, achieving impressive success rates during covert attacks while maintaining exceptional accuracy with non-poisoned datasets.
Updated: 2024-06-04 09:50:47
Categories: cs.CR,cs.AI,cs.LG
Revisiting Differentially Private Hyper-parameter Tuning
We study the application of differential privacy in hyper-parameter tuning, a crucial process in machine learning involving selecting the best hyper-parameter from several candidates. Unlike many private learning algorithms, including the prevalent DP-SGD, the privacy implications of tuning remain insufficiently understood or often totally ignored. Recent works propose a generic private selection solution for the tuning process, yet a fundamental question persists: is this privacy bound tight? This paper provides an in-depth examination of this question. Initially, we provide studies affirming the current privacy analysis for private selection is indeed tight in general. However, when we specifically study the hyper-parameter tuning problem in a white-box setting, such tightness no longer holds. This is first demonstrated by applying privacy audit on the tuning process. Our findings underscore a substantial gap between current theoretical privacy bound and the empirical bound derived even under strong audit setups. This gap motivates our subsequent investigations. Our further study provides improved privacy results for private hyper-parameter tuning due to its distinct properties. Our results demonstrate broader applicability compared to prior analyses, which are limited to specific parameter configurations.
Updated: 2024-06-04 09:49:34
Categories: cs.LG,cs.CR
A rank decomposition for the topological classification of neural representations
Neural networks can be thought of as applying a transformation to an input dataset. The way in which they change the topology of such a dataset often holds practical significance for many tasks, particularly those demanding non-homeomorphic mappings for optimal solutions, such as classification problems. In this work, we leverage the fact that neural networks are equivalent to continuous piecewise-affine maps, whose rank can be used to pinpoint regions in the input space that undergo non-homeomorphic transformations, leading to alterations in the topological structure of the input dataset. Our approach enables us to make use of the relative homology sequence, with which one can study the homology groups of the quotient of a manifold $\mathcal{M}$ and a subset $A$, assuming some minimal properties on these spaces. As a proof of principle, we empirically investigate the presence of low-rank (topology-changing) affine maps as a function of network width and mean weight. We show that in randomly initialized narrow networks, there will be regions in which the (co)homology groups of a data manifold can change. As the width increases, the homology groups of the input manifold become more likely to be preserved. We end this part of our work by constructing highly non-random wide networks that do not have this property and relating this non-random regime to Dale's principle, which is a defining characteristic of biological neural networks. Finally, we study simple feedforward networks trained on MNIST, as well as on toy classification and regression tasks, and show that networks manipulate the topology of data differently depending on the continuity of the task they are trained on.
Updated: 2024-06-04 09:47:30
Categories: cs.LG,math.AT,q-bio.NC
Radar Spectra-Language Model for Automotive Scene Parsing
Radar sensors are low cost, long-range, and weather-resilient. Therefore, they are widely used for driver assistance functions, and are expected to be crucial for the success of autonomous driving in the future. In many perception tasks only pre-processed radar point clouds are considered. In contrast, radar spectra are a raw form of radar measurements and contain more information than radar point clouds. However, radar spectra are rather difficult to interpret. In this work, we aim to explore the semantic information contained in spectra in the context of automated driving, thereby moving towards better interpretability of radar spectra. To this end, we create a radar spectra-language model, allowing us to query radar spectra measurements for the presence of scene elements using free text. We overcome the scarcity of radar spectra data by matching the embedding space of an existing vision-language model (VLM). Finally, we explore the benefit of the learned representation for scene parsing, and obtain improvements in free space segmentation and object detection merely by injecting the spectra embedding into a baseline model.
Updated: 2024-06-04 09:45:04
Categories: cs.CV,cs.LG
Online Learning and Information Exponents: On The Importance of Batch size, and Time/Complexity Tradeoffs
We study the impact of the batch size $n_b$ on the iteration time $T$ of training two-layer neural networks with one-pass stochastic gradient descent (SGD) on multi-index target functions of isotropic covariates. We characterize the optimal batch size minimizing the iteration time as a function of the hardness of the target, as characterized by the information exponents. We show that performing gradient updates with large batches $n_b \lesssim d^{\frac{\ell}{2}}$ minimizes the training time without changing the total sample complexity, where $\ell$ is the information exponent of the target to be learned \citep{arous2021online} and $d$ is the input dimension. However, batch sizes beyond $n_b \gg d^{\frac{\ell}{2}}$ are detrimental for improving the time complexity of SGD. We provably overcome this fundamental limitation via a different training protocol, \textit{Correlation loss SGD}, which suppresses the auto-correlation terms in the loss function. We show that one can track the training progress by a system of low-dimensional ordinary differential equations (ODEs). Finally, we validate our theoretical results with numerical experiments.
Updated: 2024-06-04 09:44:49
Categories: stat.ML,cs.LG
Almost linear time differentially private release of synthetic graphs
In this paper, we give almost linear time and space algorithms to sample from an exponential mechanism with an $\ell_1$-score function defined over an exponentially large non-convex set. As a direct result, on input an $n$ vertex $m$ edges graph $G$, we present the \textit{first} $\widetilde{O}(m)$ time and $O(m)$ space algorithms for differentially privately outputting an $n$ vertex $O(m)$ edges synthetic graph that approximates all the cuts and the spectrum of $G$. These are the \emph{first} private algorithms for releasing synthetic graphs that nearly match this task's time and space complexity in the non-private setting while achieving the same (or better) utility as the previous works in the more practical sparse regime. Additionally, our algorithms can be extended to private graph analysis under continual observation.
Updated: 2024-06-04 09:44:24
Categories: cs.CR,cs.DS,cs.LG
Sliding down the stairs: how correlated latent variables accelerate learning with neural networks
Neural networks extract features from data using stochastic gradient descent (SGD). In particular, higher-order input cumulants (HOCs) are crucial for their performance. However, extracting information from the $p$th cumulant of $d$-dimensional inputs is computationally hard: the number of samples required to recover a single direction from an order-$p$ tensor (tensor PCA) using online SGD grows as $d^{p-1}$, which is prohibitive for high-dimensional inputs. This result raises the question of how neural networks extract relevant directions from the HOCs of their inputs efficiently. Here, we show that correlations between latent variables along the directions encoded in different input cumulants speed up learning from higher-order correlations. We show this effect analytically by deriving nearly sharp thresholds for the number of samples required by a single neuron to weakly-recover these directions using online SGD from a random start in high dimensions. Our analytical results are confirmed in simulations of two-layer neural networks and unveil a new mechanism for hierarchical learning in neural networks.
Updated: 2024-06-04 09:43:45
Categories: stat.ML,cond-mat.stat-mech,cs.LG,math.PR,math.ST,stat.TH
Continual Contrastive Spoken Language Understanding
Recently, neural networks have shown impressive progress across diverse fields, with speech processing being no exception. However, recent breakthroughs in this area require extensive offline training using large datasets and tremendous computing resources. Unfortunately, these models struggle to retain their previously acquired knowledge when learning new tasks continually, and retraining from scratch is almost always impractical. In this paper, we investigate the problem of learning sequence-to-sequence models for spoken language understanding in a class-incremental learning (CIL) setting and we propose COCONUT, a CIL method that relies on the combination of experience replay and contrastive learning. Through a modified version of the standard supervised contrastive loss applied only to the rehearsal samples, COCONUT preserves the learned representations by pulling closer samples from the same class and pushing away the others. Moreover, we leverage a multimodal contrastive loss that helps the model learn more discriminative representations of the new data by aligning audio and text features. We also investigate different contrastive designs to combine the strengths of the contrastive loss with teacher-student architectures used for distillation. Experiments on two established SLU datasets reveal the effectiveness of our proposed approach and significant improvements over the baselines. We also show that COCONUT can be combined with methods that operate on the decoder side of the model, resulting in further metrics improvements.
Updated: 2024-06-04 09:43:02
Categories: eess.AS,cs.AI
Learning Hamiltonian neural Koopman operator and simultaneously sustaining and discovering conservation law
Accurately finding and predicting dynamics from observational data with noise perturbations is of paramount significance but remains a major challenge. Here, for Hamiltonian mechanics, we propose the Hamiltonian Neural Koopman Operator (HNKO), which integrates knowledge of mathematical physics in learning the Koopman operator and makes it automatically sustain, and even discover, conservation laws. We demonstrate that the HNKO and its extension outperform alternatives on a number of representative physical systems, even with hundreds or thousands of degrees of freedom. Our results suggest that appropriately feeding prior knowledge of the underlying system and the mathematical theory to the learning framework can reinforce the capability of machine learning in solving physical problems.
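For intuition, the plain, non-neural version of the idea (fit a linear operator to snapshot data and inspect its spectrum) fits in a few lines. On a harmonic oscillator, a conservative system, the learned eigenvalues should sit on the unit circle; HNKO enforces this kind of structure rather than merely observing it. The integrator and dimensions below are arbitrary choices:

```python
import numpy as np

dt, steps = 0.01, 2000
X = np.zeros((2, steps))
X[:, 0] = [1.0, 0.0]                       # (q, p) for a harmonic oscillator
for k in range(steps - 1):                 # symplectic Euler integrator
    q, p = X[:, k]
    p = p - dt * q
    q = q + dt * p
    X[:, k + 1] = [q, p]

# Least-squares one-step operator (DMD): X_{t+1} ~= A X_t.
A = X[:, 1:] @ np.linalg.pinv(X[:, :-1])
print("eigenvalue moduli:", np.abs(np.linalg.eigvals(A)))   # ~1: conservative
```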
Updated: 2024-06-04 09:42:34
Categories: math-ph,cs.LG,math.MP
Piecewise Polynomial Regression of Tame Functions via Integer Programming
Tame functions are a class of nonsmooth, nonconvex functions, which feature in a wide range of applications: functions encountered in the training of deep neural networks with all common activations, value functions of mixed-integer programs, or wave functions of small molecules. We consider approximating tame functions with piecewise polynomial functions. We bound the quality of approximation of a tame function by a piecewise polynomial function with a given number of segments on any full-dimensional cube. We also present the first mixed-integer programming formulation of piecewise polynomial regression. Together, these can be used to estimate tame functions. We demonstrate promising computational results.
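To make the object concrete: for sorted 1-D data and a fixed number of segments, the regression the paper formulates as a MIP can also be solved by classic dynamic programming, as in the simplified stand-in below (the paper's MIP formulation additionally handles breakpoint selection in general settings and comes with approximation guarantees):

```python
import numpy as np

def fit_cost(x, y, deg):
    """Least-squares cost of one degree-`deg` polynomial on a segment."""
    if len(x) <= deg:
        return 0.0
    coef = np.polyfit(x, y, deg)
    return float(((np.polyval(coef, x) - y) ** 2).sum())

def segmented_polyfit(x, y, k, deg=2):
    """Optimal split of sorted 1-D data into k segments, each fit by a
    degree-`deg` polynomial (dynamic programming over split points)."""
    n = len(x)
    cost = [[fit_cost(x[i:j], y[i:j], deg) for j in range(n + 1)]
            for i in range(n)]
    dp = np.full((k + 1, n + 1), np.inf)
    dp[0][0] = 0.0
    for s in range(1, k + 1):
        for j in range(1, n + 1):
            dp[s][j] = min(dp[s - 1][i] + cost[i][j] for i in range(j))
    return dp[k][n]

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-2, 2, 80))
y = np.abs(x) + 0.05 * rng.normal(size=80)       # a simple tame function
print("SSE with 2 quadratic pieces:", round(segmented_polyfit(x, y, 2), 4))
```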
Updated: 2024-06-04 09:38:39
Categories: math.OC,cs.AI,cs.LG,math.ST,stat.TH
Synergetic Event Understanding: A Collaborative Approach to Cross-Document Event Coreference Resolution with Large Language Models
Cross-document event coreference resolution (CDECR) involves clustering event mentions across multiple documents that refer to the same real-world events. Existing approaches utilize fine-tuning of small language models (SLMs) like BERT to address the compatibility among the contexts of event mentions. However, due to the complexity and diversity of contexts, these models are prone to learning simple co-occurrences. Recently, large language models (LLMs) like ChatGPT have demonstrated impressive contextual understanding, yet they encounter challenges in adapting to specific information extraction (IE) tasks. In this paper, we propose a collaborative approach for CDECR, leveraging the capabilities of both a universally capable LLM and a task-specific SLM. The collaborative strategy begins with the LLM accurately and comprehensively summarizing events through prompting. Then, the SLM refines its learning of event representations based on these insights during fine-tuning. Experimental results demonstrate that our approach surpasses the performance of both the large and small language models individually, forming a complementary advantage. Across various datasets, our approach achieves state-of-the-art performance, underscoring its effectiveness in diverse scenarios.
Updated: 2024-06-04 09:35:47
Categories: cs.CL,cs.AI
Activation Bottleneck: Sigmoidal Neural Networks Cannot Forecast a Straight Line
A neural network has an activation bottleneck if one of its hidden layers has a bounded image. We show that networks with an activation bottleneck cannot forecast unbounded sequences such as straight lines, random walks, or any sequence with a trend: The difference between prediction and ground truth becomes arbitrary large, regardless of the training procedure. Widely-used neural network architectures such as LSTM and GRU suffer from this limitation. In our analysis, we characterize activation bottlenecks and explain why they prevent sigmoidal networks from learning unbounded sequences. We experimentally validate our findings and discuss modifications to network architectures which mitigate the effects of activation bottlenecks.
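The claim is easy to verify numerically: with a tanh hidden layer, the output magnitude is capped by the l1 norm of the output weights regardless of training, so any line eventually escapes. A minimal check with random weights (training would not change the bound):

```python
import numpy as np

rng = np.random.default_rng(0)
w1 = rng.normal(size=(1, 32))
w2 = rng.normal(size=32)

def net(x):
    return np.tanh(x * w1) @ w2        # |tanh| <= 1  =>  |output| <= sum(|w2|)

bound = np.abs(w2).sum()               # hard ceiling on any prediction
t = np.arange(0.0, 1000.0, 100.0)
line = 0.5 * t                         # unbounded target
pred = np.array([net(ti)[0] for ti in t])
print("output bound:", round(bound, 2))
print("target - prediction:", np.round(line - pred, 1))   # gap grows without limit
```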
Updated: 2024-06-04 09:34:08
Categories: cs.LG
NeoRL: Efficient Exploration for Nonepisodic RL
We study the problem of nonepisodic reinforcement learning (RL) for nonlinear dynamical systems, where the system dynamics are unknown and the RL agent has to learn from a single trajectory, i.e., without resets. We propose Nonepisodic Optimistic RL (NeoRL), an approach based on the principle of optimism in the face of uncertainty. NeoRL uses well-calibrated probabilistic models and plans optimistically w.r.t. the epistemic uncertainty about the unknown dynamics. Under continuity and bounded energy assumptions on the system, we provide a first-of-its-kind regret bound of $\mathcal{O}(\beta_T \sqrt{T \Gamma_T})$ for general nonlinear systems with Gaussian process dynamics. We compare NeoRL to other baselines on several deep RL environments and empirically demonstrate that NeoRL achieves the optimal average cost while incurring the least regret.
Updated: 2024-06-04 09:29:27
Categories: cs.LG
Optimality of Matrix Mechanism on $\ell_p^p$-metric
In this paper, we introduce the $\ell_p^p$-error metric (for $p \geq 2$) when answering linear queries under the constraint of differential privacy. We characterize such an error under $(\epsilon,\delta)$-differential privacy. Before this paper, a tight characterization of the hardness of privately answering linear queries was known under the $\ell_2^2$-error metric (Edmonds et al., STOC 2020) and the $\ell_p^2$-error metric for unbiased mechanisms (Nikolov and Tang, ITCS 2024). As a direct consequence of our results, we give tight bounds on answering prefix sum and parity queries under differential privacy for all constant $p$ in terms of the $\ell_p^p$ error, generalizing the bounds in Henzinger et al. (SODA 2023) for $p=2$.
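As a concrete instance of the mechanism being analyzed: factor the prefix-sum workload A as L.R, perturb R.x with Gaussian noise calibrated to R's sensitivity, and reconstruct with L. Under the familiar l2-squared error (the paper generalizes to l_p^p), the square-root factorization already beats both trivial factorizations. The sketch below only compares error formulas and is not calibrated to a specific (epsilon, delta):

```python
import numpy as np
from scipy.linalg import sqrtm

def mm_error(L, R, sigma=1.0):
    """Total expected squared error of the matrix mechanism for A = L @ R:
    release L @ (R @ x + delta*sigma*z), z ~ N(0, I), where delta is the
    max column norm of R (the L2 sensitivity of R @ x)."""
    delta = np.linalg.norm(R, axis=0).max()
    return (delta * sigma) ** 2 * np.linalg.norm(L, "fro") ** 2

n = 64
A = np.tril(np.ones((n, n)))            # workload: all n prefix sums
S = np.real(sqrtm(A))                   # lower-triangular root, A = S @ S

print("input  perturbation (L=A, R=I):", round(mm_error(A, np.eye(n))))
print("output perturbation (L=I, R=A):", round(mm_error(np.eye(n), A)))
print("square-root factorization     :", round(mm_error(S, S)))
```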
Updated: 2024-06-04 09:27:35
Categories: cs.CR,cs.LG
Dr. Strategy: Model-Based Generalist Agents with Strategic Dreaming
Model-based reinforcement learning (MBRL) has been a primary approach to ameliorating the sample efficiency issue as well as to making a generalist agent. However, there has not been much effort toward enhancing the dreaming strategy itself, so it remains an open question whether and how an agent can "dream better" in a more structured and strategic way. In this paper, inspired by the observation from cognitive science suggesting that humans use a spatial divide-and-conquer strategy in planning, we propose a new MBRL agent, called Dr. Strategy, which is equipped with a novel Dreaming Strategy. The proposed agent realizes a version of divide-and-conquer-like strategy in dreaming. This is achieved by learning a set of latent landmarks and then utilizing these to learn a landmark-conditioned highway policy. With the highway policy, the agent can first learn in the dream to move to a landmark, and from there it tackles the exploration and achievement task in a more focused way. In experiments, we show that the proposed model outperforms prior pixel-based MBRL methods in various visually complex and partially observable navigation tasks.
Updated: 2024-06-04 09:26:15
Categories: cs.LG
SimulTron: On-Device Simultaneous Speech to Speech Translation
Simultaneous speech-to-speech translation (S2ST) holds the promise of breaking down communication barriers and enabling fluid conversations across languages. However, achieving accurate, real-time translation through mobile devices remains a major challenge. We introduce SimulTron, a novel S2ST architecture designed to tackle this task. SimulTron is a lightweight direct S2ST model that uses the strengths of the Translatotron framework while incorporating key modifications for streaming operation, and an adjustable fixed delay. Our experiments show that SimulTron surpasses Translatotron 2 in offline evaluations. Furthermore, real-time evaluations reveal that SimulTron improves upon the performance achieved by Translatotron 1. Additionally, SimulTron achieves superior BLEU scores and latency compared to previous real-time S2ST methods on the MuST-C dataset. Significantly, we have successfully deployed SimulTron on a Pixel 7 Pro device, showing its potential for simultaneous S2ST on-device.
Updated: 2024-06-04 09:21:31
Categories: eess.AS,cs.CL,cs.LG,cs.SD
CondTSF: One-line Plugin of Dataset Condensation for Time Series Forecasting
Dataset condensation is a nascent technique that generates a small dataset that can be used in training deep neural networks to lower training costs. The objective of dataset condensation is to ensure that the model trained with the synthetic dataset can perform comparably to the model trained with the full dataset. However, existing methods predominantly concentrate on classification tasks, posing challenges in their adaptation to time series forecasting (TS-forecasting). This challenge arises from disparities in the evaluation of synthetic data. In classification, the synthetic data is considered well-distilled if the model trained with the full dataset and the model trained with the synthetic dataset yield identical labels for the same input, regardless of variations in output logits distribution. Conversely, in TS-forecasting, the effectiveness of synthetic data distillation is determined by the distance between predictions of the two models. The synthetic data is deemed well-distilled only when all data points within the predictions are similar. Consequently, TS-forecasting has a more rigorous evaluation methodology compared to classification. To mitigate this gap, we theoretically analyze the optimization objective of dataset condensation for TS-forecasting and propose a new one-line plugin of dataset condensation designated as Dataset Condensation for Time Series Forecasting (CondTSF) based on our analysis. Plugging CondTSF into previous dataset condensation methods facilitates a reduction in the distance between the predictions of the model trained with the full dataset and the model trained with the synthetic dataset, thereby enhancing performance. We conduct extensive experiments on eight commonly used time series datasets. CondTSF consistently improves the performance of all previous dataset condensation methods across all datasets, particularly at low condensing ratios.
Updated: 2024-06-04 09:18:20
Categories: cs.LG,cs.AI
Robust Energy Consumption Prediction with a Missing Value-Resilient Metaheuristic-based Neural Network in Mobile App Development
Energy consumption is a fundamental concern in mobile application development, bearing substantial significance for both developers and end-users. The main objective of this research is to propose a novel neural network-based framework, enhanced by a metaheuristic approach, to achieve robust energy prediction in the context of mobile app development. The metaheuristic approach here aims to achieve two goals: 1) identifying suitable learning algorithms and their corresponding hyperparameters, and 2) determining the optimal number of layers and neurons within each layer. Moreover, due to limitations in accessing certain aspects of a mobile phone, there might be missing data in the data set, and the proposed framework can handle this. In addition, we conducted an optimal algorithm selection strategy, employing 13 base and advanced metaheuristic algorithms, to identify the best algorithm based on accuracy and resistance to missing values. The representation in our proposed metaheuristic algorithm is variable-size, meaning that the length of the candidate solutions changes over time. We compared the algorithms based on the architecture found by each algorithm at different levels of missing values, accuracy, F-measure, and stability analysis. Additionally, we conducted a Wilcoxon signed-rank test for statistical comparison of the results. The extensive experiments show that our proposed approach significantly improves energy consumption prediction. Particularly, the JADE algorithm, a variant of Differential Evolution (DE), DE, and the Covariance Matrix Adaptation Evolution Strategy deliver superior results under various conditions and across different missing value levels.
Updated: 2024-06-04 09:14:20
Categories: cs.NE,cs.LG
Iteration Head: A Mechanistic Study of Chain-of-Thought
Chain-of-Thought (CoT) reasoning is known to improve Large Language Models both empirically and in terms of theoretical approximation power. However, our understanding of the inner workings and conditions of apparition of CoT capabilities remains limited. This paper helps fill this gap by demonstrating how CoT reasoning emerges in transformers in a controlled and interpretable setting. In particular, we observe the appearance of a specialized attention mechanism dedicated to iterative reasoning, which we coined "iteration heads". We track both the emergence and the precise working of these iteration heads down to the attention level, and measure the transferability of the CoT skills to which they give rise between tasks.
Updated: 2024-06-04 09:11:46
标题: 迭代头:思维链的机制研究
摘要: 思维链(CoT)推理被认为能够在经验和理论逼近能力两方面提升大型语言模型。然而,我们对CoT能力的内部运作和出现条件的理解仍然有限。本文通过在受控且可解释的环境中展示CoT推理如何在Transformer中出现来填补这一空白。特别地,我们观察到出现了一种专门用于迭代推理的注意力机制,我们将其称为“迭代头”。我们追踪了这些迭代头的出现和精确工作方式,一直深入到注意力层面,并测量了它们所产生的CoT技能在任务之间的可转移性。
更新时间: 2024-06-04 09:11:46
领域: cs.LG,cs.AI,cs.CL
Hyperbolic Active Learning for Semantic Segmentation under Domain Shift
We introduce a hyperbolic neural network approach to pixel-level active learning for semantic segmentation. Analysis of the data statistics leads to a novel interpretation of the hyperbolic radius as an indicator of data scarcity. In HALO (Hyperbolic Active Learning Optimization), for the first time, we propose the use of epistemic uncertainty as a data acquisition strategy, following the intuition of selecting data points that are the least known. The hyperbolic radius, complemented by the widely-adopted prediction entropy, effectively approximates epistemic uncertainty. We perform extensive experimental analysis based on two established synthetic-to-real benchmarks, i.e. GTAV $\rightarrow$ Cityscapes and SYNTHIA $\rightarrow$ Cityscapes. Additionally, we test HALO on Cityscapes $\rightarrow$ ACDC for domain adaptation under adverse weather conditions, and we benchmark both convolutional and attention-based backbones. HALO sets a new state-of-the-art in active learning for semantic segmentation under domain shift and it is the first active learning approach that surpasses the performance of supervised domain adaptation while using only a small portion of labels (i.e., 1%).
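A minimal sketch of the acquisition rule, assuming a Poincare-ball pixel embedding, an equal-weight sum of radius and entropy, and hypothetical names throughout; the paper's exact scoring may differ.

import torch

def halo_acquisition(probs, embed, top_frac=0.01, eps=1e-6):
    # probs: (N, C) softmax outputs; embed: (N, D) Poincare-ball pixel embeddings
    norm = embed.norm(dim=-1).clamp(max=1 - eps)
    radius = 2 * torch.atanh(norm)                        # hyperbolic distance to origin
    entropy = -(probs * (probs + eps).log()).sum(dim=-1)  # classic uncertainty term
    score = radius + entropy                              # equal weighting is an assumption
    k = max(1, int(top_frac * score.numel()))
    return score.topk(k).indices                          # pixels to send for labeling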
Updated: 2024-06-04 09:11:21
标题: 超几何主动学习在领域转移下语义分割中的应用
摘要: 我们引入了一种双曲神经网络方法,用于像素级主动学习语义分割。通过对数据统计的分析,我们得出了双曲半径作为数据稀缺指标的新解释。在HALO(双曲主动学习优化)中,我们首次提出使用认识不确定性作为数据采集策略,遵循选择最不了解的数据点的直觉。双曲半径,辅以被广泛采纳的预测熵,有效地近似认识不确定性。我们基于两个已建立的从合成到真实的基准进行了广泛的实验分析,即GTAV $\rightarrow$ Cityscapes和SYNTHIA $\rightarrow$ Cityscapes。此外,我们在Cityscape $\rightarrow$ ACDC上测试了HALO,用于在恶劣天气条件下的领域适应,并对卷积和基于注意力的骨干网络进行了基准测试。HALO在领域转移下的主动学习语义分割中树立了新的技术水平,并且是第一个在仅使用少部分标签(即1%)的情况下超越监督领域适应性性能的主动学习方法。
更新时间: 2024-06-04 09:11:21
领域: cs.CV,cs.AI
Surge Phenomenon in Optimal Learning Rate and Batch Size Scaling
In current deep learning tasks, Adam-style optimizers such as Adam, Adagrad, RMSProp, Adafactor, and Lion have been widely used as alternatives to SGD-style optimizers. These optimizers typically update model parameters using the sign of gradients, resulting in more stable convergence curves. The learning rate and the batch size are the most critical hyperparameters for optimizers, which require careful tuning to enable effective convergence. Previous research has shown that the optimal learning rate increases linearly with batch size (or follows similar rules) for SGD-style optimizers. However, this conclusion does not apply to Adam-style optimizers. In this paper, we elucidate the connection between optimal learning rates and batch sizes for Adam-style optimizers through both theoretical analysis and extensive experiments. First, we establish the scaling law between batch size and optimal learning rate in the sign-of-gradient case, and prove that the optimal learning rate first rises and then falls as the batch size increases. Moreover, the peak of this surge gradually moves toward larger batch sizes as training progresses. Second, we conducted experiments on various CV and NLP tasks and verified the correctness of the scaling law.
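A toy sweep, not the paper's experiment, showing how one would measure the empirically optimal learning rate per batch size for a sign-based update; whether and where the non-monotone peak appears depends on the noise scale and training horizon.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4096, 20)); w_true = rng.normal(size=20)
y = X @ w_true + 0.5 * rng.normal(size=4096)

def final_loss(lr, batch, steps=200):
    w = np.zeros(20)
    for _ in range(steps):
        idx = rng.integers(0, len(X), size=batch)
        grad = X[idx].T @ (X[idx] @ w - y[idx]) / batch
        w -= lr * np.sign(grad)              # sign-based update, as in Adam-style methods
    return np.mean((X @ w - y) ** 2)

for batch in [8, 32, 128, 512, 2048]:
    lrs = np.logspace(-4, 0, 25)
    best = lrs[int(np.argmin([final_loss(lr, batch) for lr in lrs]))]
    print(f"batch={batch:5d}  empirically best lr={best:.4f}")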
Updated: 2024-06-04 09:11:13
标题: 最优学习率与批量大小缩放中的激增现象
摘要: 在当前的深度学习任务中,像Adam、Adagrad、RMSProp、Adafactor和Lion这样的Adam风格优化器已被广泛用作SGD风格优化器的替代品。这些优化器通常使用梯度的符号来更新模型参数,从而得到更稳定的收敛曲线。学习率和批量大小是优化器最关键的超参数,需要仔细调整以实现有效的收敛。先前的研究表明,对于SGD风格优化器,最佳学习率会随批量大小线性增加或遵循类似规律。然而,这一结论并不适用于Adam风格优化器。在本文中,我们通过理论分析和大量实验阐明了Adam风格优化器的最佳学习率和批量大小之间的联系。首先,我们建立了梯度符号情形下批量大小和最佳学习率之间的缩放定律,并证明最佳学习率随着批量大小的增加先上升后下降。此外,随着训练的进行,激增的峰值会逐渐向更大的批量大小移动。其次,我们在各种CV和NLP任务上进行了实验,并验证了该缩放定律的正确性。
更新时间: 2024-06-04 09:11:13
领域: cs.LG
CityLight: A Universal Model Towards Real-world City-scale Traffic Signal Control Coordination
Traffic signal control (TSC) is a promising low-cost measure to enhance transportation efficiency without affecting existing road infrastructure. While various reinforcement learning-based TSC methods have been proposed and experimentally outperform conventional rule-based methods, none of them has been deployed in the real world. An essential gap lies in the oversimplification of the scenarios in terms of intersection heterogeneity and road network intricacy. To make TSC applicable in urban traffic management, we target TSC coordination in city-scale high-authenticity road networks, aiming to solve the three unique and important challenges: city-level scalability, heterogeneity of real-world intersections, and effective coordination among intricate neighbor connections. Since optimizing multiple agents in a parameter-sharing paradigm can boost the training efficiency and help achieve scalability, we propose our method, CityLight, based on the well-acknowledged optimization framework, parameter-sharing MAPPO. To ensure the unified policy network can learn to fit large-scale heterogeneous intersections and tackle the intricate between-neighbor coordination, CityLight proposes a universal representation module that consists of two key designs: heterogeneous intersection alignment and neighborhood impact alignment for coordination. To further boost coordination, CityLight adopts neighborhood-integrated rewards to transition from achieving local optimal to global optimal. Extensive experiments on datasets with hundreds to tens of thousands of real-world intersections and authentic traffic demands validate the surprising effectiveness and generalizability of CityLight, with an overall performance gain of 11.66% and a 22.59% improvement in transfer scenarios in terms of throughput.
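One plausible reading of the "neighborhood-integrated rewards" step, sketched below; the blending coefficient and the simple neighbor averaging are assumptions, since the abstract does not specify the exact reward design.

import numpy as np

def neighborhood_integrated_rewards(local_r, neighbors, alpha=0.2):
    # local_r: (N,) per-intersection rewards; neighbors: list of index lists
    blended = np.empty(len(local_r))
    for i, nbrs in enumerate(neighbors):
        nbr_mean = np.mean(local_r[nbrs]) if nbrs else 0.0
        # pull each agent from its local optimum toward coordinated behavior
        blended[i] = (1 - alpha) * local_r[i] + alpha * nbr_mean
    return blended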
Updated: 2024-06-04 09:10:14
标题: CityLight:通往现实世界城市规模交通信号控制协调的通用模型
摘要: 交通信号控制(TSC)是一种有前途的低成本措施,可以提高交通效率,而不会影响现有的道路基础设施。虽然已经提出了各种基于强化学习的TSC方法,并在实验中表现优于传统的基于规则的方法,但没有一种方法被部署到现实世界中。一个重要的差距在于对交叉口异质性和道路网络复杂性场景的过度简化。为了使TSC能够应用于城市交通管理,我们的目标是在城市规模高真实性的道路网络中协调TSC,旨在解决三个独特而重要的挑战:城市级可扩展性、现实世界交叉口的异质性以及复杂邻近连接之间的有效协调。由于在参数共享范式中优化多个代理可以提高训练效率并有助于实现可伸缩性,我们提出了基于众所周知的优化框架参数共享MAPPO的方法CityLight。为了确保统一的策略网络能够学习适应大规模异质交叉口并解决邻近协调之间的复杂性,CityLight提出了一个通用表示模块,包括两个关键设计:异质交叉口对齐和协调的邻近影响对齐。为了进一步提升协调性,CityLight采用邻近整合奖励,实现从实现局部最优到全局最优的过渡。对具有数百至数万个真实世界交叉口和真实交通需求的数据集进行了大量实验,验证了CityLight的惊人有效性和通用性,整体性能提升了11.66%,在吞吐量方面在转移场景中提高了22.59%。
更新时间: 2024-06-04 09:10:14
领域: eess.SY,cs.AI,cs.LG,cs.MA,cs.SY
EchoMamba4Rec: Harmonizing Bidirectional State Space Models with Spectral Filtering for Advanced Sequential Recommendation
Sequential recommendation aims to estimate dynamic user preferences and sequential dependencies among historical user behaviors. Attention-based models have proven effective for sequential recommendation, but they suffer from inference inefficiency due to the quadratic computational complexity of attention mechanisms, particularly for long-range behavior sequences. Inspired by the recent success of state space models (SSMs) in control theory, which provide a robust framework for modeling and controlling dynamic systems, we present EchoMamba4Rec. Control theory emphasizes the use of SSMs for managing long-range dependencies and maintaining inferential efficiency through structured state matrices. EchoMamba4Rec leverages these control relationships in sequential recommendation and integrates bi-directional processing with frequency-domain filtering to capture complex patterns and dependencies in user interaction data more effectively. Our model benefits from the ability of state space models (SSMs) to learn and perform parallel computations, significantly enhancing computational efficiency and scalability. It features a bi-directional Mamba module that incorporates both forward and reverse Mamba components, leveraging information from both past and future interactions. Additionally, a filter layer operates in the frequency domain using learnable Fast Fourier Transform (FFT) and learnable filters, followed by an inverse FFT to refine item embeddings and reduce noise. We also integrate Gate Linear Units (GLU) to dynamically control information flow, enhancing the model's expressiveness and training stability. Experimental results demonstrate that EchoMamba significantly outperforms existing models, providing more accurate and personalized recommendations.
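A minimal PyTorch sketch of the "learnable FFT filter followed by a GLU gate" step described above; the module name, initialization, and the projection-then-GLU layout are assumptions, not the paper's exact architecture.

import torch
import torch.nn as nn

class SpectralFilterLayer(nn.Module):
    def __init__(self, seq_len, dim):
        super().__init__()
        # one learnable complex coefficient per (frequency, channel)
        self.filt = nn.Parameter(0.02 * torch.randn(seq_len // 2 + 1, dim, 2))
        self.proj = nn.Linear(dim, 2 * dim)
        self.glu = nn.GLU(dim=-1)

    def forward(self, x):                        # x: (batch, seq_len, dim)
        spec = torch.fft.rfft(x, dim=1)          # to the frequency domain
        spec = spec * torch.view_as_complex(self.filt)
        x = torch.fft.irfft(spec, n=x.size(1), dim=1)  # back, item embeddings denoised
        return self.glu(self.proj(x))            # GLU gate controls information flow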
Updated: 2024-06-04 09:07:58
标题: EchoMamba4Rec: 将双向状态空间模型与谱滤波器协调,用于高级顺序推荐
摘要: 顺序推荐旨在估计动态用户偏好以及历史用户行为之间的顺序依赖关系。基于注意力的模型已被证明对于顺序推荐是有效的,但由于注意力机制的二次计算复杂度,特别是对于长程行为序列,它们存在推理效率低下的问题。受控制理论中状态空间模型(SSM)近来取得成功的启发(这类模型为建模和控制动态系统提供了稳健的框架),我们提出了EchoMamba4Rec。控制理论强调使用SSM来管理长程依赖关系,并通过结构化状态矩阵保持推理效率。EchoMamba4Rec在顺序推荐中利用这些控制关系,并将双向处理与频域滤波相结合,以更有效地捕获用户交互数据中的复杂模式和依赖关系。我们的模型受益于SSM学习和执行并行计算的能力,显著提高了计算效率和可扩展性。它具有一个双向Mamba模块,包括前向和反向Mamba组件,利用来自过去和未来交互的信息。此外,一个滤波层在频域中使用可学习的快速傅立叶变换(FFT)和可学习的滤波器进行操作,然后通过逆FFT来精炼物品嵌入并减少噪声。我们还整合了门控线性单元(GLU)来动态控制信息流,增强模型的表达能力和训练稳定性。实验结果表明,EchoMamba明显优于现有模型,提供更准确和个性化的推荐。
更新时间: 2024-06-04 09:07:58
领域: cs.LG,cs.AI
Mixed-Precision Over-The-Air Federated Learning via Approximated Computing
Over-the-Air Federated Learning (OTA-FL) has been extensively investigated as a privacy-preserving distributed learning mechanism. Realistic systems will see FL clients with diverse size, weight, and power configurations. A critical research gap in existing OTA-FL research is the assumption of homogeneous client computational bit precision. Indeed, many clients may exploit approximate computing (AxC), where bit precisions are adjusted for energy and computational efficiency. The dynamic distribution of bit precision updates amongst FL clients poses an open challenge for OTA-FL, as it is incompatible with the wireless modulation superposition space. Here, we propose an AxC-based OTA-FL framework of clients with multiple precisions, demonstrating the following innovations: (i) optimize the quantization-performance trade-off for both server and clients within the constraints of varying edge computing capabilities and learning accuracy requirements, and (ii) develop heterogeneous gradient resolution OTA-FL modulation schemes to ensure compatibility with physical layer OTA aggregation. Our findings indicate that we can design modulation schemes that enable AxC-based OTA-FL, achieving 50% faster and smoother server convergence and a performance enhancement for the lowest-precision clients compared to a homogeneous precision approach. This demonstrates the great potential of our AxC-based OTA-FL approach in heterogeneous edge computing environments.
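To illustrate the mixed-precision setting only (the paper's actual contribution, the OTA modulation design, is not reproduced here), a sketch of heterogeneous-precision clients quantizing their updates before aggregation; the quantizer and bit assignments are assumptions.

import numpy as np

def quantize(x, bits):
    # uniform symmetric quantizer standing in for an AxC client's precision
    scale = np.max(np.abs(x)) + 1e-12
    levels = 2 ** (bits - 1) - 1
    return np.round(x / scale * levels) / levels * scale

rng = np.random.default_rng(0)
grads = [rng.normal(size=1000) for _ in range(8)]
bits = [4, 4, 6, 6, 8, 8, 8, 16]                 # heterogeneous client precisions
aggregate = np.mean([quantize(g, b) for g, b in zip(grads, bits)], axis=0)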
Updated: 2024-06-04 09:07:45
标题: 混合精度空中联邦学习:通过近似计算实现
摘要: 空中联邦学习(OTA-FL)已被广泛研究作为一种保护隐私的分布式学习机制。现实系统将看到具有不同大小、权重和功率配置的FL客户端。现有OTA-FL研究中存在一个关键的研究空白,即假设客户端计算比特精度是均匀的。事实上,许多客户端可能利用近似计算(AxC),其中比特精度根据能量和计算效率进行调整。在FL客户端之间动态分配比特精度更新是OTA-FL面临的一个挑战,因为它在无线调制叠加空间中是不兼容的。 在这里,我们提出了一个基于AxC的OTA-FL客户端多精度框架,展示了以下创新:(i)在变化的边缘计算能力和学习准确性要求的约束条件下,优化服务器和客户端的量化-性能折衷,(ii)开发异构梯度分辨率OTA-FL调制方案,以确保与物理层OTA聚合的兼容性。我们的研究结果表明,我们可以设计调制方案,实现基于AxC的OTA-FL,该方法可以实现服务器收敛速度更快更平滑,并且相对于均匀精度方法,对于最低精度的客户端有性能提升。这证明了我们的基于AxC的OTA-FL方法在异构边缘计算环境中具有巨大潜力。
更新时间: 2024-06-04 09:07:45
领域: cs.LG,cs.AI
When to Sense and Control? A Time-adaptive Approach for Continuous-Time RL
Reinforcement learning (RL) excels in optimizing policies for discrete-time Markov decision processes (MDPs). However, various systems are inherently continuous in time, making discrete-time MDPs an inexact modeling choice. In many applications, such as greenhouse control or medical treatments, each interaction (measurement or switching of action) involves manual intervention and thus is inherently costly. Therefore, we generally prefer a time-adaptive approach with fewer interactions with the system. In this work, we formalize an RL framework, Time-adaptive Control & Sensing (TaCoS), that tackles this challenge by optimizing over policies that, in addition to the control input, also predict the duration for which it is applied. Our formulation results in an extended MDP that any standard RL algorithm can solve. We demonstrate that state-of-the-art RL algorithms trained on TaCoS drastically reduce the interaction amount over their discrete-time counterparts while retaining the same or improved performance, and exhibit robustness to the discretization frequency. Finally, we propose OTaCoS, an efficient model-based algorithm for our setting. We show that OTaCoS enjoys sublinear regret for systems with sufficiently smooth dynamics and empirically yields further sample-efficiency gains.
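A sketch of the extended MDP, assuming a gym-style environment returning a 4-tuple and a fixed cost per sense/control event; both are assumptions, not the paper's exact formulation.

class TimeAdaptiveWrapper:
    """Extended MDP: the agent chooses (action, hold_steps); the action is applied
    for that many base steps and each sense/control event pays a fixed cost."""

    def __init__(self, env, interaction_cost=0.1, max_hold=16):
        self.env, self.cost, self.max_hold = env, interaction_cost, max_hold

    def step(self, action, hold_steps):
        hold_steps = max(1, min(int(hold_steps), self.max_hold))
        total_reward, done, info = -self.cost, False, {}
        for _ in range(hold_steps):
            obs, r, done, info = self.env.step(action)   # gym-style 4-tuple assumed
            total_reward += r
            if done:
                break
        return obs, total_reward, done, info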
Updated: 2024-06-04 09:06:20
标题: 何时感知和控制?一种连续时间RL的时序自适应方法
摘要: 强化学习(RL)在优化离散时间马尔可夫决策过程(MDP)的策略方面表现出色。然而,许多系统在本质上是连续时间的,使得离散时间MDP成为一个不精确的建模选择。在许多应用中,例如温室控制或医疗治疗,每次交互(测量或动作切换)涉及手动干预,因此本质上是昂贵的。因此,我们通常更喜欢一种时间自适应的方法,与系统的交互次数较少。在这项工作中,我们形式化了一个RL框架,称为时间自适应控制与感知(TaCoS),通过优化策略来解决这一挑战,除了控制之外还预测其应用的持续时间。我们的公式化结果是一个扩展的MDP,任何标准的RL算法都可以解决。我们展示了在TaCoS上训练的最先进的RL算法大大减少了与其离散时间对应物的交互次数,同时保持了相同或改进的性能,并且在离散化频率上表现出鲁棒性。最后,我们提出了OTaCoS,这是一个适用于我们设置的高效的基于模型的算法。我们展示了OTaCoS对于具有足够平滑动力学的系统享有次线性的后悔,并在经验上导致进一步的样本效率增益。
更新时间: 2024-06-04 09:06:20
领域: cs.LG
Towards Practical Single-shot Motion Synthesis
Despite the recent advances in the so-called "cold start" generation from text prompts, the data and computing resources such methods require, as well as the ambiguities around intellectual property and privacy, raise counterarguments to their utility. An interesting and relatively unexplored alternative has been the introduction of unconditional synthesis from a single sample, which has led to interesting generative applications. In this paper we focus on single-shot motion generation and more specifically on accelerating the training time of a Generative Adversarial Network (GAN). In particular, we tackle the challenge of the GAN's equilibrium collapse when using mini-batch training by carefully annealing the weights of the loss functions that prevent mode collapse. Additionally, we perform statistical analysis in the generator and discriminator models to identify correlations between training stages and enable transfer learning. Our improved GAN achieves competitive quality and diversity on the Mixamo benchmark when compared to the original GAN architecture and a single-shot diffusion model, while training up to 6.8x faster than the former and 1.75x faster than the latter. Finally, we demonstrate the ability of our improved GAN to mix and compose motion with a single forward pass. Project page available at https://moverseai.github.io/single-shot.
Updated: 2024-06-04 09:02:14
标题: 迈向实用的单样本动作合成
摘要: 尽管所谓从文本提示“冷启动”生成的技术最近取得了进展,但其对数据和计算资源的需求,以及围绕知识产权和隐私问题的模糊性,对其实用性提出了一些反对观点。一个有趣且相对未被探索的替代方案是从单个样本进行无条件合成,这已经催生了有趣的生成应用。在本文中,我们专注于单样本动作生成,更具体地说是加速生成对抗网络(GAN)的训练。特别是,我们通过仔细退火防止模式崩溃的损失函数的权重,来应对GAN在小批量训练时的平衡崩溃挑战。此外,我们对生成器和判别器模型进行统计分析,以识别训练阶段之间的相关性并实现迁移学习。与原始GAN架构和单样本扩散模型相比,我们改进的GAN在Mixamo基准测试中实现了有竞争力的质量和多样性,同时训练时间比前者快多达6.8倍,比后者快1.75倍。最后,我们展示了改进的GAN通过单次前向传递混合和组合动作的能力。项目页面位于https://moverseai.github.io/single-shot。
更新时间: 2024-06-04 09:02:14
领域: cs.CV,cs.AI,cs.GR,cs.LG
Synergistic Integration of Coordinate Network and Tensorial Feature for Improving Neural Radiance Fields from Sparse Inputs
The multi-plane representation has been highlighted for its fast training and inference across static and dynamic neural radiance fields. This approach constructs relevant features via projection onto learnable grids and interpolating adjacent vertices. However, it has limitations in capturing low-frequency details and tends to overuse parameters for low-frequency features due to its bias toward fine details, despite its multi-resolution concept. This phenomenon leads to instability and inefficiency when training poses are sparse. In this work, we propose a method that synergistically integrates multi-plane representation with a coordinate-based MLP network known for strong bias toward low-frequency signals. The coordinate-based network is responsible for capturing low-frequency details, while the multi-plane representation focuses on capturing fine-grained details. We demonstrate that using residual connections between them seamlessly preserves their own inherent properties. Additionally, the proposed progressive training scheme accelerates the disentanglement of these two features. We demonstrate empirically that our proposed method outperforms baseline models for both static and dynamic NeRFs with sparse inputs, achieving comparable results with fewer parameters.
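A toy 2D PyTorch sketch of the residual combination of the two branches; the dimensions, initialization, and single-plane simplification are assumptions (real radiance fields use tri-plane 3D factorizations and view-dependent heads).

import torch
import torch.nn as nn
import torch.nn.functional as F

class CoordPlaneField(nn.Module):
    def __init__(self, feat_dim=16, res=64):
        super().__init__()
        self.plane = nn.Parameter(0.01 * torch.randn(1, feat_dim, res, res))
        self.coord_mlp = nn.Sequential(nn.Linear(2, 64), nn.ReLU(),
                                       nn.Linear(64, feat_dim))
        self.head = nn.Linear(feat_dim, 1)

    def forward(self, xy):                  # xy: (N, 2) coordinates in [-1, 1]^2
        low = self.coord_mlp(xy)            # low-frequency branch (spectral bias of MLPs)
        grid = xy.view(1, -1, 1, 2)
        fine = F.grid_sample(self.plane, grid, align_corners=True)   # (1, C, N, 1)
        fine = fine.squeeze(0).squeeze(-1).t()                       # (N, C)
        return self.head(low + fine)        # residual combination keeps both biases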
Updated: 2024-06-04 08:56:57
标题: 坐标网络与张量特征的协同集成,以改善稀疏输入下的神经辐射场
摘要: 多平面表示法因其在静态和动态神经辐射场中的快速训练和推理而受到关注。该方法通过投影到可学习网格并对相邻顶点进行插值来构建相关特征。然而,它在捕获低频细节方面存在局限性,并且尽管具有多分辨率的概念,由于其偏向精细细节,往往会为低频特征过度使用参数。这种现象导致在训练位姿稀疏时不稳定且低效。在这项工作中,我们提出了一种方法,将多平面表示与以对低频信号有强烈偏置著称的基于坐标的MLP网络协同结合。基于坐标的网络负责捕获低频细节,而多平面表示专注于捕获细粒度细节。我们证明,在两者之间使用残差连接可以无缝地保留各自固有的特性。此外,所提出的渐进式训练方案加速了这两种特征的解耦。我们通过实验证明,我们提出的方法在稀疏输入下的静态和动态NeRF上均优于基线模型,并以更少的参数取得了可比的结果。
更新时间: 2024-06-04 08:56:57
领域: cs.CV,cs.AI
Morphological Symmetries in Robotics
We present a comprehensive framework for studying and leveraging morphological symmetries in robotic systems. These are intrinsic properties of the robot's morphology, frequently observed in animal biology and robotics, which stem from the replication of kinematic structures and the symmetrical distribution of mass. We illustrate how these symmetries extend to the robot's state space and both proprioceptive and exteroceptive sensor measurements, resulting in the equivariance of the robot's equations of motion and optimal control policies. Thus, we recognize morphological symmetries as a relevant and previously unexplored physics-informed geometric prior, with significant implications for both data-driven and analytical methods used in modeling, control, estimation and design in robotics. For data-driven methods, we demonstrate that morphological symmetries can enhance the sample efficiency and generalization of machine learning models through data augmentation, or by applying equivariant/invariant constraints on the model's architecture. In the context of analytical methods, we employ abstract harmonic analysis to decompose the robot's dynamics into a superposition of lower-dimensional, independent dynamics. We substantiate our claims with both synthetic and real-world experiments conducted on bipedal and quadrupedal robots. Lastly, we introduce the repository MorphoSymm to facilitate the practical use of the theory and applications outlined in this work.
Updated: 2024-06-04 08:54:45
标题: 机器人学中的形态对称性
摘要: 我们提出了一个全面的框架,用于研究和利用机器人系统中的形态对称性。这些是机器人形态的固有属性,经常在动物生物学和机器人技术中观察到,源于运动结构的复制和质量的对称分布。我们阐明了这些对称性如何延伸到机器人的状态空间以及本体感知和外部感知传感器测量,导致机器人的运动方程和最优控制策略的等变性。因此,我们认识到形态对称性作为一个相关且以前未被探索的物理信息几何先验,对机器人建模、控制、估计和设计中使用的数据驱动和分析方法都有重要影响。对于数据驱动方法,我们展示了形态对称性如何通过数据增强或在模型架构上应用等变/不变约束,从而增强机器学习模型的样本效率和泛化能力。在分析方法的背景下,我们利用抽象谐波分析将机器人的动态分解为低维度、独立动态的叠加。我们通过在双足和四足机器人上进行的合成和现实世界实验来证实我们的观点。最后,我们介绍了MorphoSymm仓库,以促进在这项工作中概述的理论和应用的实际应用。
更新时间: 2024-06-04 08:54:45
领域: cs.RO,cs.AI,cs.SY,eess.SY
Learned Regularization for Inverse Problems: Insights from a Spectral Model
In this chapter we provide a theoretically founded investigation of state-of-the-art learning approaches for inverse problems from the point of view of spectral reconstruction operators. We give an extended definition of regularization methods and their convergence in terms of the underlying data distributions, which paves the way for future theoretical studies. Based on a simple spectral learning model previously introduced for supervised learning, we investigate some key properties of different learning paradigms for inverse problems, which can be formulated independently of specific architectures. In particular we investigate the regularization properties, bias, and critical dependence on training data distributions. Moreover, our framework allows to highlight and compare the specific behavior of the different paradigms in the infinite-dimensional limit.
Updated: 2024-06-04 08:49:01
标题: 学习正则化用于逆问题:来自谱模型的见解
摘要: 在本章中,我们从谱重建算子的角度,对逆问题的最新学习方法进行了有理论依据的考察。我们根据底层数据分布对正则化方法及其收敛性给出了扩展定义,这为未来的理论研究铺平了道路。基于先前为监督学习引入的简单谱学习模型,我们研究了逆问题不同学习范式的一些关键性质,这些性质可以独立于具体架构进行表述。特别是,我们研究了正则化性质、偏差以及对训练数据分布的关键依赖性。此外,我们的框架能够突出并比较不同范式在无限维极限下的特定行为。
更新时间: 2024-06-04 08:49:01
领域: math.NA,cs.LG,cs.NA,47A52, 65J22, 68T05
Quality Diversity through Human Feedback: Towards Open-Ended Diversity-Driven Optimization
Reinforcement Learning from Human Feedback (RLHF) has shown potential in qualitative tasks where easily defined performance measures are lacking. However, there are drawbacks when RLHF is commonly used to optimize for average human preferences, especially in generative tasks that demand diverse model responses. Meanwhile, Quality Diversity (QD) algorithms excel at identifying diverse and high-quality solutions but often rely on manually crafted diversity metrics. This paper introduces Quality Diversity through Human Feedback (QDHF), a novel approach that progressively infers diversity metrics from human judgments of similarity among solutions, thereby enhancing the applicability and effectiveness of QD algorithms in complex and open-ended domains. Empirical studies show that QDHF significantly outperforms state-of-the-art methods in automatic diversity discovery and matches the efficacy of QD with manually crafted diversity metrics on standard benchmarks in robotics and reinforcement learning. Notably, in open-ended generative tasks, QDHF substantially enhances the diversity of text-to-image generation from a diffusion model and is more favorably received in user studies. We conclude by analyzing QDHF's scalability, robustness, and quality of derived diversity metrics, emphasizing its strength in open-ended optimization tasks. Code and tutorials are available at https://liding.info/qdhf.
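One way the "diversity metric inferred from similarity judgments" step could look, sketched as triplet-based metric learning; the dimensions, optimizer, and training loop are hypothetical, and the paper's procedure may differ.

import torch
import torch.nn as nn
import torch.nn.functional as F

embed = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 8))
opt = torch.optim.Adam(embed.parameters(), lr=1e-3)

def triplet_step(anchor, similar, dissimilar, margin=1.0):
    # one update on a human judgment: anchor is closer to `similar` than to
    # `dissimilar`; the learned latent space then serves as the QD descriptor
    loss = F.triplet_margin_loss(embed(anchor), embed(similar),
                                 embed(dissimilar), margin=margin)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()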
Updated: 2024-06-04 08:39:33
标题: 质量多样性通过人类反馈:走向开放式多样性驱动优化
摘要: 基于人类反馈的强化学习(RLHF)在缺乏易于定义的性能指标的定性任务中展现出潜力。然而,当RLHF被用于优化平均人类偏好时存在缺点,尤其是在需要多样化模型响应的生成任务中。与此同时,质量多样性(QD)算法擅长识别多样化且高质量的解决方案,但通常依赖人工设计的多样性度量。本文介绍了通过人类反馈实现质量多样性(QDHF)的新方法,它从人类对解决方案之间相似性的判断中逐步推断多样性度量,从而增强了QD算法在复杂和开放领域中的适用性和有效性。实证研究表明,QDHF在自动多样性发现方面明显优于最先进的方法,并在机器人和强化学习的标准基准上与使用人工设计多样性度量的QD效果相当。值得注意的是,在开放式生成任务中,QDHF显著增强了扩散模型文本到图像生成的多样性,并在用户研究中更受欢迎。最后,我们分析了QDHF的可扩展性、稳健性以及所推断多样性度量的质量,强调其在开放式优化任务中的优势。代码和教程可在https://liding.info/qdhf获取。
更新时间: 2024-06-04 08:39:33
领域: cs.AI,cs.NE
UniOQA: A Unified Framework for Knowledge Graph Question Answering with Large Language Models
OwnThink stands as the most extensive Chinese open-domain knowledge graph introduced in recent times. Despite prior attempts in question answering over OwnThink (OQA), existing studies have faced limitations in model representation capabilities, posing challenges in further enhancing overall accuracy in question answering. In this paper, we introduce UniOQA, a unified framework that integrates two complementary parallel workflows. Unlike conventional approaches, UniOQA harnesses large language models (LLMs) for precise question answering and incorporates a direct-answer-prediction process as a cost-effective complement. Initially, to bolster representation capacity, we fine-tune an LLM to translate questions into the Cypher query language (CQL), tackling issues associated with restricted semantic understanding and hallucinations. Subsequently, we introduce the Entity and Relation Replacement algorithm to ensure the executability of the generated CQL. Concurrently, to augment overall accuracy in question answering, we further adapt the Retrieval-Augmented Generation (RAG) process to the knowledge graph. Ultimately, we optimize answer accuracy through a dynamic decision algorithm. Experimental findings illustrate that UniOQA notably advances SpCQL Logical Accuracy to 21.2% and Execution Accuracy to 54.9%, achieving the new state-of-the-art results on this benchmark. Through ablation experiments, we delve into the superior representation capacity of UniOQA and quantify its performance breakthrough.
Updated: 2024-06-04 08:36:39
标题: UniOQA:一个统一的框架,用于利用大型语言模型进行知识图谱问答
摘要: OwnThink是近期介绍的最广泛的中国开放领域知识图谱。尽管先前有关OwnThink的问答(OQA)的尝试,现有研究在模型表示能力方面存在局限性,进一步提高问答的整体准确性面临挑战。在本文中,我们介绍了UniOQA,这是一个统一的框架,集成了两个互补的并行工作流程。与传统方法不同,UniOQA利用大型语言模型(LLMs)进行精确的问答,并将直接答案预测过程作为一种经济有效的补充。最初,为了增强表示能力,我们对LLM进行微调,将问题翻译成Cypher查询语言(CQL),解决与受限的语义理解和幻觉相关的问题。随后,我们引入实体和关系替换算法,以确保生成的CQL的可执行性。同时,为了增加问答的整体准确性,我们进一步适应了检索增强生成(RAG)过程到知识图谱。最终,我们通过动态决策算法优化答案准确性。实验结果表明,UniOQA显著提高了SpCQL逻辑准确性达到21.2%,执行准确性达到54.9%,在这一基准上取得了新的最先进结果。通过消融实验,我们深入探讨了UniOQA的优越表示能力,并量化了其性能突破。
更新时间: 2024-06-04 08:36:39
领域: cs.CL,cs.AI
Kernel vs. Kernel: Exploring How the Data Structure Affects Neural Collapse
Recently, a vast amount of literature has focused on the "Neural Collapse" (NC) phenomenon, which emerges when training neural network (NN) classifiers beyond the zero training error point. The core component of NC is the decrease in the within class variability of the network's deepest features, dubbed as NC1. The theoretical works that study NC are typically based on simplified unconstrained features models (UFMs) that mask any effect of the data on the extent of collapse. In this paper, we provide a kernel-based analysis that does not suffer from this limitation. First, given a kernel function, we establish expressions for the traces of the within- and between-class covariance matrices of the samples' features (and consequently an NC1 metric). Then, we turn to focus on kernels associated with shallow NNs. First, we consider the NN Gaussian Process kernel (NNGP), associated with the network at initialization, and the complement Neural Tangent Kernel (NTK), associated with its training in the "lazy regime". Interestingly, we show that the NTK does not represent more collapsed features than the NNGP for prototypical data models. As NC emerges from training, we then consider an alternative to NTK: the recently proposed adaptive kernel, which generalizes NNGP to model the feature mapping learned from the training data. Contrasting our NC1 analysis for these two kernels enables gaining insights into the effect of data distribution on the extent of collapse, which are empirically aligned with the behavior observed with practical training of NNs.
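A small NumPy sketch of a trace-based NC1 proxy computed from a kernel matrix alone. The within/between-trace formulas are standard kernel-trick identities, but the paper's exact metric and normalization are not guaranteed to match.

import numpy as np

def nc1_from_kernel(K, labels):
    # K: (n, n) kernel matrix; labels: (n,) integer class labels
    labels = np.asarray(labels)
    classes = np.unique(labels)
    # Tr(within-class cov) per class: mean K_ii minus mean K_ij over the class block
    tr_within = np.mean([
        np.mean(np.diag(B)) - np.mean(B)
        for B in (K[np.ix_(labels == c, labels == c)] for c in classes)
    ])
    # Gram matrix of class means: <mu_c, mu_d> is the mean of the (c, d) kernel block
    M = np.array([[K[np.ix_(labels == c, labels == d)].mean() for d in classes]
                  for c in classes])
    tr_between = np.mean(np.diag(M) - 2 * M.mean(axis=1) + M.mean())
    return tr_within / (tr_between + 1e-12)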
Updated: 2024-06-04 08:33:56
标题: 核vs.核:探讨数据结构如何影响神经崩溃
摘要: 最近,大量文献集中讨论了“神经崩溃”(NC)现象,即在超越零训练误差点训练神经网络(NN)分类器时出现的现象。NC的核心组成部分是网络最深层特征的类内变异性降低,被称为NC1。研究NC的理论工作通常基于简化的无约束特征模型(UFM),掩盖了数据对崩溃程度的任何影响。本文提供了一种不受此限制的基于核的分析。首先,给定一个核函数,我们建立了样本特征的类内和类间协方差矩阵的迹的表达式(因此得到了一个NC1度量)。然后,我们转而关注与浅层NN相关的核。首先,我们考虑了与网络初始化相关的NN高斯过程核(NNGP),以及与其在“懒惰区域”训练相关的补充神经切线核(NTK)。有趣的是,我们展示了对于典型数据模型,NTK并不比NNGP代表更多崩溃特征。随着训练的进行,我们考虑了NTK的替代方案:最近提出的自适应核,将NNGP推广为对从训练数据中学习的特征映射进行建模。对这两种核进行NC1分析的对比,使我们能够洞察数据分布对崩溃程度的影响,这与实际训练神经网络观察到的行为相吻合。
更新时间: 2024-06-04 08:33:56
领域: cs.LG,cs.AI,cs.IT,math.IT,stat.ML
A Bayesian Approach to Online Planning
The combination of Monte Carlo tree search and neural networks has revolutionized online planning. As neural network approximations are often imperfect, we ask whether uncertainty estimates about the network outputs could be used to improve planning. We develop a Bayesian planning approach that facilitates such uncertainty quantification, inspired by classical ideas from the meta-reasoning literature. We propose a Thompson sampling based algorithm for searching the tree of possible actions, for which we prove the first (to our knowledge) finite time Bayesian regret bound, and propose an efficient implementation for a restricted family of posterior distributions. In addition we propose a variant of the Bayes-UCB method applied to trees. Empirically, we demonstrate that on the ProcGen Maze and Leaper environments, when the uncertainty estimates are accurate but the neural network output is inaccurate, our Bayesian approach searches the tree much more effectively. In addition, we investigate whether popular uncertainty estimation methods are accurate enough to yield significant gains in planning. Our code is available at: https://github.com/nirgreshler/bayesian-online-planning.
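A minimal sketch of Thompson sampling over a node's children with Gaussian value posteriors; the posterior family, its initialization from the network, and the conjugate update are illustrative assumptions.

import numpy as np

def thompson_select(children):
    # each child keeps a Gaussian posterior over its value, e.g. seeded by the network
    samples = [np.random.normal(c["mu"], c["sigma"]) for c in children]
    return int(np.argmax(samples))

def posterior_update(child, backed_up_value, obs_var=1.0):
    # conjugate Gaussian update after backing up one simulated return
    prec = 1.0 / child["sigma"] ** 2 + 1.0 / obs_var
    child["mu"] = (child["mu"] / child["sigma"] ** 2 + backed_up_value / obs_var) / prec
    child["sigma"] = prec ** -0.5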
Updated: 2024-06-04 08:33:17
标题: 在线规划的贝叶斯方法
摘要: 蒙特卡洛树搜索和神经网络的结合彻底改变了在线规划。由于神经网络逼近通常不完美,我们想知道关于网络输出的不确定性估计是否可以用来改进规划。我们开发了一种贝叶斯规划方法,可以促进这种不确定性量化,灵感来自于元推理文献中的经典思想。我们提出了一种基于汤普森抽样的算法来搜索可能行动的树,证明了我们所知的第一个有限时间贝叶斯遗憾界,并为一类后验分布提出了高效的实现。此外,我们提出了一种适用于树的贝叶斯-UCB方法的变体。在实证上,我们证明在ProcGen Maze和Leaper环境中,当不确定性估计准确但神经网络输出不准确时,我们的贝叶斯方法可以更有效地搜索树。此外,我们还调查了流行的不确定性估计方法是否足够准确以产生规划中的显著收益。我们的代码可在以下链接找到:https://github.com/nirgreshler/bayesian-online-planning。
更新时间: 2024-06-04 08:33:17
领域: cs.AI
Self-Pro: A Self-Prompt and Tuning Framework for Graph Neural Networks
Graphs have become an important modeling tool for web applications, and Graph Neural Networks (GNNs) have achieved great success in graph representation learning. However, the performance of traditional GNNs heavily relies on a large amount of supervision. Recently, ``pre-train, fine-tune'' has become the paradigm to address the issues of label dependency and poor generalization. However, the pre-training strategies vary for graphs with homophily and heterophily, and the objectives for various downstream tasks also differ. This leads to a gap between pretexts and downstream tasks, resulting in ``negative transfer'' and poor performance. Inspired by prompt learning in Natural Language Processing (NLP), many studies turn to bridge the gap and fully leverage the pre-trained model. However, existing methods for graph prompting are tailored to homophily, neglecting inherent heterophily on graphs. Meanwhile, most of them rely on the randomly initialized prompts, which negatively impact on the stability. Therefore, we propose Self-Prompt, a prompting framework for graphs based on the model and data itself. We first introduce asymmetric graph contrastive learning for pretext to address heterophily and align the objectives of pretext and downstream tasks. Then we reuse the component from pre-training phase as the self adapter and introduce self-prompts based on graph itself for task adaptation. Finally, we conduct extensive experiments on 11 benchmark datasets to demonstrate its superiority. We provide our codes at https://github.com/gongchenghua/Self-Pro.
Updated: 2024-06-04 08:31:15
标题: Self-Pro: 一种用于图神经网络的自我提示和调整框架
摘要: 图形已成为网络应用程序的重要建模工具,图神经网络(GNNs)在图表示学习方面取得了巨大成功。然而,传统GNNs的性能严重依赖于大量监督。最近,“预训练,微调”已成为解决标签依赖性和泛化不佳问题的范例。然而,针对具有同质性和异质性的图的预训练策略各不相同,各种下游任务的目标也不同。这导致了预文本和下游任务之间的差距,导致“负迁移”和性能不佳。受自然语言处理(NLP)中提示学习的启发,许多研究转向弥合这一差距,充分利用预训练模型。然而,现有的图提示方法专门针对同质性,忽视了图上固有的异质性。同时,大多数方法依赖于随机初始化的提示,这对稳定性产生负面影响。因此,我们提出了Self-Prompt,这是一种基于模型和数据本身的图提示框架。我们首先引入了用于预文本的不对称图对比学习,以解决异质性,并使预文本和下游任务的目标保持一致。然后,我们重复使用预训练阶段的组件作为自适应器,并引入基于图本身的自提示进行任务适应。最后,我们在11个基准数据集上进行了大量实验,以证明其优越性。我们在https://github.com/gongchenghua/Self-Pro上提供了我们的代码。
更新时间: 2024-06-04 08:31:15
领域: cs.LG,cs.AI
Asymptotics of feature learning in two-layer networks after one gradient-step
In this manuscript, we investigate the problem of how two-layer neural networks learn features from data, and improve over the kernel regime, after being trained with a single gradient descent step. Leveraging the insight from (Ba et al., 2022), we model the trained network by a spiked Random Features (sRF) model. Further building on recent progress on Gaussian universality (Dandi et al., 2023), we provide an exact asymptotic description of the generalization error of the sRF in the high-dimensional limit where the number of samples, the width, and the input dimension grow at a proportional rate. The resulting characterization for sRFs also captures closely the learning curves of the original network model. This enables us to understand how adapting to the data is crucial for the network to efficiently learn non-linear functions in the direction of the gradient -- where at initialization it can only express linear functions in this regime.
Updated: 2024-06-04 08:28:52
标题: 两层网络在一次梯度步骤后特征学习的渐近性
摘要: 在这篇论文中,我们研究了两层神经网络在经过单次梯度下降训练后,如何从数据中学习特征并超越核区域(kernel regime)。借鉴Ba等人(2022年)的见解,我们将训练后的网络建模为尖峰随机特征(sRF)模型。进一步基于高斯普适性的最新进展(Dandi等人,2023年),我们在样本数量、宽度和输入维度以成比例速率增长的高维极限下,给出了sRF泛化误差的精确渐近刻画。所得到的sRF刻画也紧密捕捉了原始网络模型的学习曲线。这使我们能够理解,为什么适应数据对于网络沿梯度方向高效学习非线性函数至关重要:在初始化时,网络在该区域只能表达线性函数。
更新时间: 2024-06-04 08:28:52
领域: stat.ML,cond-mat.dis-nn,cs.LG
MaskSR: Masked Language Model for Full-band Speech Restoration
Speech restoration aims at restoring high quality speech in the presence of a diverse set of distortions. Although several deep learning paradigms have been studied for this task, the power of the recently emerging language models has not been fully explored. In this paper, we propose MaskSR, a masked language model capable of restoring full-band 44.1 kHz speech jointly considering noise, reverb, clipping, and low bandwidth. MaskSR works with discrete acoustic tokens extracted using a pre-trained neural codec. During training, MaskSR is optimized to predict randomly masked tokens extracted from the high quality target speech, conditioned on the corrupted speech with various distortions. During inference, MaskSR reconstructs the target speech tokens with efficient iterative sampling. Extensive experiments show that MaskSR obtains competitive results on both the full-band speech restoration task and also on sub-tasks compared with a wide range of models.
Updated: 2024-06-04 08:23:57
标题: MaskSR: 用于全频段语音恢复的掩码语言模型
摘要: 语音恢复旨在在各种失真的情况下恢复高质量语音。尽管已经研究了几种深度学习范例用于这一任务,但最近出现的语言模型的潜力尚未完全探索。本文提出了MaskSR,一个能够恢复全频带44.1 kHz语音的遮罩语言模型,同时考虑噪声、混响、裁剪和低带宽。MaskSR使用预先训练的神经编解码器提取的离散声学令牌进行工作。在训练过程中,MaskSR被优化为在各种失真的情况下,根据损坏的语音条件下预测从高质量目标语音中提取的随机遮罩令牌。在推断过程中,MaskSR通过高效的迭代抽样重建目标语音令牌。大量实验证明,与广泛的模型相比,MaskSR在全频带语音恢复任务和子任务上都取得了有竞争力的结果。
更新时间: 2024-06-04 08:23:57
领域: cs.SD,cs.AI,cs.LG,eess.AS,eess.SP
Learning to Intervene on Concept Bottlenecks
While deep learning models often lack interpretability, concept bottleneck models (CBMs) provide inherent explanations via their concept representations. Moreover, they allow users to perform interventional interactions on these concepts by updating the concept values and thus correcting the predictive output of the model. Up to this point, these interventions were typically applied to the model just once and then discarded. To rectify this, we present concept bottleneck memory models (CB2Ms), which keep a memory of past interventions. Specifically, CB2Ms leverage a two-fold memory to generalize interventions to appropriate novel situations, enabling the model to identify errors and reapply previous interventions. This way, a CB2M learns to automatically improve model performance from a few initially obtained interventions. If no prior human interventions are available, a CB2M can detect potential mistakes of the CBM bottleneck and request targeted interventions. Our experimental evaluations on challenging scenarios like handling distribution shifts and confounded data demonstrate that CB2Ms are able to successfully generalize interventions to unseen data and can indeed identify wrongly inferred concepts. Hence, CB2Ms are a valuable tool for users to provide interactive feedback on CBMs, by guiding a user's interaction and requiring fewer interventions.
Updated: 2024-06-04 08:21:51
标题: 学习干预概念瓶颈
摘要: 尽管深度学习模型通常缺乏可解释性,但概念瓶颈模型(CBMs)通过其概念表示提供固有的解释。此外,它们允许用户通过更新概念值进行干预交互,并因此纠正模型的预测输出。到目前为止,这些干预通常仅应用一次,然后被丢弃。为了纠正这一点,我们提出了概念瓶颈记忆模型(CB2Ms),它们保留了过去干预的记忆。具体来说,CB2Ms利用双重记忆来将干预推广到适当的新情况,使模型能够识别错误并重新应用先前的干预。通过这种方式,CB2M学会自动从最初获得的少量干预中提高模型性能。如果没有先前的人类干预,CB2M可以检测CBM瓶颈的潜在错误,并请求有针对性的干预。我们在处理分布变化和混淆数据等具有挑战性的场景上进行的实验评估表明,CB2Ms能够成功将干预推广到未见数据,并确实能够识别错误推断的概念。因此,CB2Ms是用户为CBMs提供交互式反馈的有价值工具,通过引导用户的交互并要求更少的干预。
更新时间: 2024-06-04 08:21:51
领域: cs.LG,cs.AI
LIRE: listwise reward enhancement for preference alignment
Recently, tremendous strides have been made to align the generation of Large Language Models (LLMs) with human values to mitigate toxic or unhelpful content. Leveraging Reinforcement Learning from Human Feedback (RLHF) proves effective and is widely adopted by researchers. However, implementing RLHF is complex, and its sensitivity to hyperparameters renders achieving stable performance and scalability challenging. Furthermore, prevailing approaches to preference alignment primarily concentrate on pairwise comparisons, with limited exploration into multi-response scenarios, thereby overlooking the potential richness within the candidate pool. For the above reasons, we propose a new approach: Listwise Reward Enhancement for Preference Alignment (LIRE), a gradient-based reward optimization approach that incorporates the offline rewards of multiple responses into a streamlined listwise framework, thus eliminating the need for online sampling during training. LIRE is straightforward to implement, requiring minimal parameter tuning, and seamlessly aligns with the pairwise paradigm while naturally extending to multi-response scenarios. Moreover, we introduce a self-enhancement algorithm aimed at iteratively refining the reward during training. Our experiments demonstrate that LIRE consistently outperforms existing methods across several benchmarks on dialogue and summarization tasks, with good transferability to out-of-distribution data, assessed using proxy reward models and human annotators.
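The gist of a listwise objective of this kind, as a hedged sketch: the temperature, the detaching of the weights, and the assumption of length-normalized sequence log-probs are ours, not the paper's exact formulation.

import torch

def lire_loss(logps, rewards, tau=1.0):
    # logps: (B, K) sequence log-probs under the policy; rewards: (B, K) offline scores
    weights = torch.softmax(rewards / tau, dim=-1)         # listwise, no online sampling
    return -(weights.detach() * logps).sum(dim=-1).mean()  # push mass toward high reward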
Updated: 2024-06-04 08:21:05
标题: LIRE:列表式奖励增强以实现偏好对齐
摘要: 最近,在将大型语言模型(LLMs)的生成与人类价值观对齐、以减少有毒或无益内容方面已经取得了长足进展。利用基于人类反馈的强化学习(RLHF)被证明是有效的,并被研究者广泛采用。然而,实施RLHF很复杂,其对超参数的敏感性使得实现稳定的性能和可扩展性具有挑战性。此外,现有的偏好对齐方法主要集中在成对比较上,对多响应情景的探索有限,从而忽视了候选池中潜在的丰富性。基于上述原因,我们提出了一种新方法:用于偏好对齐的列表级奖励增强(LIRE),这是一种基于梯度的奖励优化方法,将多个响应的离线奖励整合到一个简化的列表级框架中,从而在训练过程中无需在线采样。LIRE易于实现,只需极少的参数调整,可以无缝对接成对范式,并自然扩展到多响应情景。此外,我们引入了一种自增强算法,旨在训练过程中迭代地改进奖励。我们的实验证明,LIRE在对话和摘要任务的多个基准上始终优于现有方法,并且在使用代理奖励模型和人工标注者评估时,对分布外数据具有良好的可迁移性。
更新时间: 2024-06-04 08:21:05
领域: cs.CL,cs.LG
Deep ReLU networks and high-order finite element methods II: Chebyshev emulation
We show expression rates and stability in Sobolev norms of deep feedforward ReLU neural networks (NNs) in terms of the number of parameters defining the NN for continuous, piecewise polynomial functions, on arbitrary, finite partitions $\mathcal{T}$ of a bounded interval $(a,b)$. Novel constructions of ReLU NN surrogates encoding function approximations in terms of Chebyshev polynomial expansion coefficients are developed which require fewer neurons than previous constructions. Chebyshev coefficients can be computed easily from the values of the function in the Clenshaw--Curtis points using the inverse fast Fourier transform. Bounds on expression rates and stability are obtained that are superior to those of constructions based on ReLU NN emulations of monomials as considered in [Opschoor, Petersen and Schwab, 2020] and [Montanelli, Yang and Du, 2021]. All emulation bounds are explicit in terms of the (arbitrary) partition of the interval, the target emulation accuracy and the polynomial degree in each element of the partition. ReLU NN emulation error estimates are provided for various classes of functions and norms, commonly encountered in numerical analysis. In particular, we show exponential ReLU emulation rate bounds for analytic functions with point singularities and develop an interface between Chebfun approximations and constructive ReLU NN emulations.
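The values-to-coefficients step mentioned above is the classical FFT of the mirrored Clenshaw--Curtis samples; a NumPy sketch (the interface is ours, the algorithm is standard):

import numpy as np

def chebyshev_coeffs(f, N):
    # values at the Clenshaw--Curtis (Chebyshev-Lobatto) points x_k = cos(pi k / N)
    v = f(np.cos(np.pi * np.arange(N + 1) / N))
    V = np.concatenate([v, v[-2:0:-1]])       # even extension of length 2N
    a = np.real(np.fft.fft(V))[:N + 1] / N
    a[0] /= 2.0; a[N] /= 2.0                  # endpoint coefficients are halved
    return a

a = chebyshev_coeffs(np.exp, 16)
x = np.linspace(-1.0, 1.0, 5)
print(np.max(np.abs(np.polynomial.chebyshev.chebval(x, a) - np.exp(x))))  # ~1e-16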
Updated: 2024-06-04 08:19:46
标题: 深度ReLU网络和高阶有限元方法 II:切比雪夫模拟
摘要: 我们展示了深度前馈ReLU神经网络(NN)对于有界区间$(a,b)$的任意有限分区$\mathcal{T}$上的连续分段多项式函数,以定义NN的参数数量表示的表达率和Sobolev范数下的稳定性。我们开发了一种新颖的ReLU NN替代构造,以Chebyshev多项式展开系数来编码函数逼近,所需神经元比以往的构造更少。Chebyshev系数可以利用快速傅立叶逆变换,从函数在Clenshaw--Curtis点处的取值轻松计算得到。所得到的表达率和稳定性界限优于[Opschoor, Petersen and Schwab, 2020]和[Montanelli, Yang and Du, 2021]中基于ReLU NN单项式模拟的构造。所有模拟界限都以区间的(任意)分区、目标模拟精度以及分区每个单元上的多项式次数显式表示。我们为数值分析中常见的各类函数和范数给出了ReLU NN模拟误差估计。特别地,我们展示了具有点奇异性的解析函数的指数ReLU模拟速率界限,并开发了Chebfun逼近与构造性ReLU NN模拟之间的接口。
更新时间: 2024-06-04 08:19:46
领域: math.NA,cs.LG,cs.NA,65N30, 41A05, 41A10, 41A25, 41A50
How Western, Educated, Industrialized, Rich, and Democratic is Social Computing Research?
Much of the research in social computing analyzes data from social media platforms, which may inherently carry biases. An overlooked source of such bias is the over-representation of WEIRD (Western, Educated, Industrialized, Rich, and Democratic) populations, which might not accurately mirror the global demographic diversity. We evaluated the dependence on WEIRD populations in research presented at the AAAI ICWSM conference; the only venue whose proceedings are fully dedicated to social computing research. We did so by analyzing 494 papers published from 2018 to 2022, which included full research papers, dataset papers and posters. After filtering out papers that analyze synthetic datasets or those lacking clear country of origin, we were left with 420 papers from which 188 participants in a crowdsourcing study with full manual validation extracted data for the WEIRD scores computation. This data was then used to adapt existing WEIRD metrics to be applicable for social media data. We found that 37% of these papers focused solely on data from Western countries. This percentage is significantly less than the percentages observed in research from CHI (76%) and FAccT (84%) conferences, suggesting a greater diversity of dataset origins within ICWSM. However, the studies at ICWSM still predominantly examine populations from countries that are more Educated, Industrialized, and Rich in comparison to those in FAccT, with a special note on the 'Democratic' variable reflecting political freedoms and rights. This points out the utility of social media data in shedding light on findings from countries with restricted political freedoms. Based on these insights, we recommend extensions of current "paper checklists" to include considerations about the WEIRD bias and call for the community to broaden research inclusivity by encouraging the use of diverse datasets from underrepresented regions.
Updated: 2024-06-04 08:17:47
标题: 社交计算研究有多西方化、受教育、工业化、富裕和民主?
摘要: 社会计算领域的许多研究分析来自社交媒体平台的数据,这些数据可能具有固有的偏见。一个被忽视的偏见来源是WEIRD(即西方、受教育、工业化、富裕和民主)人口的过度代表,这可能不能准确反映全球人口多样性。我们评估了在AAAI ICWSM会议上呈现的研究对WEIRD人口的依赖性;这是唯一一个其论文完全专注于社会计算研究的会议。我们通过分析2018年至2022年发表的494篇论文,其中包括完整的研究论文、数据集论文和海报,来进行评估。在筛选出分析合成数据集或缺乏明确原籍国的论文之后,我们还剩下420篇论文,从中有188名参与者进行了一项众包研究,并通过完全手动验证提取了WEIRD得分计算所需的数据。然后,这些数据被用来调整现有的WEIRD指标,以适用于社交媒体数据。我们发现,37%的这些论文仅关注来自西方国家的数据。这个百分比显著低于CHI(76%)和FAccT(84%)会议研究中观察到的百分比,表明ICWSM中存在更多样化的数据集来源。然而,ICWSM的研究仍主要考察相对于FAccT中的那些来自受教育、工业化和富裕的国家的人口,特别指出“民主”变量反映了政治自由和权利。这指出了社交媒体数据在揭示受限制政治自由国家的发现方面的实用性。基于这些见解,我们建议扩展当前的“论文清单”以包括对WEIRD偏见的考虑,并呼吁社区通过鼓励使用来自少数地区的多样化数据集来拓宽研究的包容性。
更新时间: 2024-06-04 08:17:47
领域: cs.HC,cs.AI
Timer: Generative Pre-trained Transformers Are Large Time Series Models
Deep learning has contributed remarkably to the advancement of time series analysis. Still, deep models can encounter performance bottlenecks in real-world data-scarce scenarios, which can be concealed due to the performance saturation with small models on current benchmarks. Meanwhile, large models have demonstrated great powers in these scenarios through large-scale pre-training. Continuous progress has been achieved with the emergence of large language models, exhibiting unprecedented abilities such as few-shot generalization, scalability, and task generality, which are however absent in small deep models. To change the status quo of training scenario-specific small models from scratch, this paper aims at the early development of large time series models (LTSM). During pre-training, we curate large-scale datasets with up to 1 billion time points, unify heterogeneous time series into single-series sequence (S3) format, and develop the GPT-style architecture toward LTSMs. To meet diverse application needs, we convert forecasting, imputation, and anomaly detection of time series into a unified generative task. The outcome of this study is a Time Series Transformer (Timer), which is generative pre-trained by next token prediction and adapted to various downstream tasks with promising capabilities as an LTSM. Code and datasets are available at: https://github.com/thuml/Large-Time-Series-Model.
Updated: 2024-06-04 08:08:59
标题: 计时器:生成式预训练Transformer是大型时间序列模型
摘要: 深度学习在时间序列分析方面取得了显著的进展。然而,在现实世界中数据稀缺的情况下,深度模型可能会遇到性能瓶颈,这可能是由于当前基准测试中小模型的性能饱和所掩盖的。与此同时,大型模型通过大规模预训练在这些情况下展现出了巨大的能力。随着大型语言模型的出现,取得了持续进展,展现出了前所未有的能力,如少样本泛化、可扩展性和任务的通用性,而这些在小型深度模型中却缺失。为了改变从头开始训练特定场景的小型模型的现状,本文旨在早期开发大型时间序列模型(LTSM)。在预训练过程中,我们整理了包含高达10亿个时间点的大规模数据集,将异构时间序列统一转换为单序列序列(S3)格式,并开发了面向LTSM的GPT风格架构。为了满足不同的应用需求,我们将时间序列的预测、插补和异常检测转化为统一的生成任务。本研究的结果是一个名为Time Series Transformer (Timer)的模型,通过下一个标记预测进行生成式预训练,并具有作为LTSM的有希望的能力,适用于各种下游任务。代码和数据集可在以下网址获取:https://github.com/thuml/Large-Time-Series-Model。
更新时间: 2024-06-04 08:08:59
领域: cs.LG,stat.ML
FightLadder: A Benchmark for Competitive Multi-Agent Reinforcement Learning
Recent advances in reinforcement learning (RL) heavily rely on a variety of well-designed benchmarks, which provide environmental platforms and consistent criteria to evaluate existing and novel algorithms. Specifically, in multi-agent RL (MARL), a plethora of benchmarks based on cooperative games have spurred the development of algorithms that improve the scalability of cooperative multi-agent systems. However, for the competitive setting, a lightweight and open-sourced benchmark with challenging gaming dynamics and visual inputs has not yet been established. In this work, we present FightLadder, a real-time fighting game platform, to empower competitive MARL research. Along with the platform, we provide implementations of state-of-the-art MARL algorithms for competitive games, as well as a set of evaluation metrics to characterize the performance and exploitability of agents. We demonstrate the feasibility of this platform by training a general agent that consistently defeats 12 built-in characters in single-player mode, and expose the difficulty of training a non-exploitable agent without human knowledge and demonstrations in two-player mode. FightLadder provides meticulously designed environments to address critical challenges in competitive MARL research, aiming to catalyze a new era of discovery and advancement in the field. Videos and code at https://sites.google.com/view/fightladder/home.
Updated: 2024-06-04 08:04:23
标题: FightLadder:竞争性多智能体强化学习基准
摘要: 最近强化学习(RL)方面的进展在很大程度上依赖于各种精心设计的基准测试,这些基准测试提供了环境平台和一致的评估标准,以评估现有和新颖的算法。具体而言,在多智能体RL(MARL)中,大量基于合作博弈的基准测试推动了提升合作多智能体系统可扩展性的算法的发展。然而,对于竞争环境,尚未建立一个具有挑战性游戏动态和视觉输入的轻量级开源基准。在这项工作中,我们提出了FightLadder,一个实时格斗游戏平台,以推动竞争性MARL研究。除了平台之外,我们还提供了用于竞争博弈的最先进MARL算法的实现,以及一组刻画智能体性能和可利用性的评估指标。我们通过训练一个在单人模式下能稳定击败12个内置角色的通用智能体,展示了该平台的可行性,并揭示了在双人模式下、不借助人类知识和演示来训练一个不可被利用的智能体的困难。FightLadder提供了精心设计的环境,以解决竞争性MARL研究中的关键挑战,旨在催化该领域新的发现和进步。视频和代码请参阅https://sites.google.com/view/fightladder/home。
更新时间: 2024-06-04 08:04:23
领域: cs.MA,cs.AI,cs.LG
LongSSM: On the Length Extension of State-space Models in Language Modelling
In this paper, we investigate the length-extension of state-space models (SSMs) in language modeling. Length extension involves training models on short sequences and testing them on longer ones. We show that state-space models trained with zero hidden states initialization have difficulty doing length extension. We explain this difficulty by pointing out the length extension is equivalent to polynomial extrapolation. Based on the theory, we propose a simple yet effective method - changing the hidden states initialization scheme - to improve the length extension. Moreover, our method shows that using long training sequence length is beneficial but not necessary to length extension. Changing the hidden state initialization enables the efficient training of long-memory model with a smaller training context length.
Updated: 2024-06-04 08:02:39
标题: LongSSM:关于语言建模中状态空间模型的长度扩展
摘要: 在这篇论文中,我们研究了状态空间模型(SSMs)在语言建模中的长度扩展。长度扩展涉及在短序列上训练模型,然后在更长序列上进行测试。我们展示了使用零隐藏状态初始化训练的状态空间模型在进行长度扩展时存在困难。我们通过指出长度扩展等同于多项式外推来解释这种困难。基于这一理论,我们提出了一种简单而有效的方法 - 改变隐藏状态初始化方案 - 以改善长度扩展。此外,我们的方法表明,使用较长的训练序列长度是有益的,但对于长度扩展并非必要。改变隐藏状态初始化使得能够有效地训练具有更小训练上下文长度的长记忆模型。
更新时间: 2024-06-04 08:02:39
领域: cs.CL,cs.AI,cs.LG,math.DS
MultiMax: Sparse and Multi-Modal Attention Learning
SoftMax is a ubiquitous ingredient of modern machine learning algorithms. It maps an input vector onto a probability simplex and reweights the input by concentrating the probability mass at large entries. Yet, as a smooth approximation to the Argmax function, a significant amount of probability mass is distributed to other, residual entries, leading to poor interpretability and noise. Although sparsity can be achieved by a family of SoftMax variants, they often require an alternative loss function and do not preserve multi-modality. We show that this trade-off between multi-modality and sparsity limits the expressivity of SoftMax as well as its variants. We provide a solution to this tension between objectives by proposing a piece-wise differentiable function, termed MultiMax, which adaptively modulates the output distribution according to input entry range. Through comprehensive analysis and evaluation, we show that MultiMax successfully produces a distribution that suppresses irrelevant entries while preserving multi-modality, with benefits in image classification, language modeling and machine translation. The code is available at https://github.com/ZhouYuxuanYX/MultiMax.
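An illustrative (non-learned) piecewise-linear modulator in the spirit of the abstract; in the paper the thresholds and slopes are presumably learnable and the exact parameterization may differ, so everything below is an assumption.

import torch

def multimax(x, t_low=-1.0, t_high=1.0, s_low=0.25, s_high=2.0):
    # damp entries below t_low (sparsity), amplify entries above t_high
    # (multi-modality), leave the middle band untouched, then normalize
    mod = torch.where(x < t_low, t_low + s_low * (x - t_low), x)
    mod = torch.where(x > t_high, t_high + s_high * (mod - t_high), mod)
    return torch.softmax(mod, dim=-1)

print(multimax(torch.tensor([3.0, 2.9, 0.0, -4.0])))  # two strong modes, tiny tail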
Updated: 2024-06-04 07:58:32
标题: MultiMax:稀疏与多模态注意力学习
摘要: SoftMax是现代机器学习算法中普遍存在的一个组成部分。它将输入向量映射到概率单纯形,并通过将概率质量集中在较大的条目上对输入进行重新加权。然而,作为Argmax函数的平滑近似,相当多的概率质量分布到其他残余的条目上,导致解释性差和噪音干扰。虽然一系列SoftMax变体可以实现稀疏性,但它们通常需要替代损失函数,并且无法保留多模态性。我们展示了这种多模态性和稀疏性之间的权衡限制了SoftMax及其变体的表现能力。我们通过提出一种分段可微函数MultiMax解决了这种目标之间的张力,该函数根据输入条目范围自适应调节输出分布。通过全面分析和评估,我们展示了MultiMax成功产生了一个抑制不相关条目而保留多模态性的分布,对图像分类、语言建模和机器翻译有益。代码可在https://github.com/ZhouYuxuanYX/MultiMax获取。
更新时间: 2024-06-04 07:58:32
领域: cs.LG,cs.AI
A Toolbox for Supporting Research on AI in Water Distribution Networks
Drinking water is a vital resource for humanity, and thus, Water Distribution Networks (WDNs) are considered critical infrastructures in modern societies. The operation of WDNs is subject to diverse challenges such as water leakages and contamination, cyber/physical attacks, high energy consumption during pump operation, etc. With model-based methods reaching their limits due to various uncertainty sources, AI methods offer promising solutions to those challenges. In this work, we introduce a Python toolbox for complex scenario modeling & generation such that AI researchers can easily access challenging problems from the drinking water domain. Besides providing a high-level interface for the easy generation of hydraulic and water quality scenario data, it also provides easy access to popular event detection benchmarks and an environment for developing control algorithms.
Updated: 2024-06-04 07:58:19
标题: 一个支持供水管网人工智能研究的工具箱
摘要: 饮用水是人类的重要资源,因此,供水管网(WDN)被认为是现代社会中的关键基础设施。WDN的运行面临诸多挑战,如漏水和污染、网络/物理攻击、泵运行时的高能耗等。由于各种不确定性来源使基于模型的方法达到极限,人工智能方法为应对这些挑战提供了有希望的解决方案。在这项工作中,我们介绍了一个用于复杂场景建模与生成的Python工具箱,使人工智能研究人员能够轻松接触饮用水领域的挑战性问题。除了提供一个用于轻松生成水力和水质场景数据的高级接口外,它还提供了对流行事件检测基准的便捷访问以及一个用于开发控制算法的环境。
更新时间: 2024-06-04 07:58:19
领域: cs.AI,cs.CE,cs.SY,eess.SY
Multi-target stain normalization for histology slides
Traditional staining normalization approaches, e.g. Macenko, typically rely on the choice of a single representative reference image, which may not adequately account for the diverse staining patterns of datasets collected in practical scenarios. In this study, we introduce a novel approach that leverages multiple reference images to enhance robustness against stain variation. Our method is parameter-free and can be adopted in existing computational pathology pipelines with no significant changes. We evaluate the effectiveness of our method through experiments using a deep-learning pipeline for automatic nuclei segmentation on colorectal images. Our results show that by leveraging multiple reference images, better results can be achieved when generalizing to external data, where the staining can widely differ from the training set.
Updated: 2024-06-04 07:57:34
标题: 组织学切片的多目标染色标准化
摘要: 传统的染色标准化方法,例如Macenko,通常依赖于选择一个代表性的参考图像,这可能无法充分考虑实际场景中收集的数据集的多样化染色模式。在本研究中,我们引入了一种新颖的方法,利用多个参考图像增强对染色变化的鲁棒性。我们的方法是无参数的,并且可以在现有的计算病理学流程中采用,无需进行重大更改。我们通过在结直肠图像上使用深度学习流程进行自动细胞核分割的实验来评估我们方法的有效性。我们的结果表明,通过利用多个参考图像,在推广到外部数据时可以获得更好的结果,其中染色可能与训练集大不相同。
更新时间: 2024-06-04 07:57:34
领域: eess.IV,cs.AI,cs.CV,68U10,I.4.0
ReLU-KAN: New Kolmogorov-Arnold Networks that Only Need Matrix Addition, Dot Multiplication, and ReLU
Limited by the complexity of basis function (B-spline) calculations, Kolmogorov-Arnold Networks (KAN) suffer from restricted parallel computing capability on GPUs. This paper proposes a novel ReLU-KAN implementation that inherits the core idea of KAN. By adopting ReLU (Rectified Linear Unit) and point-wise multiplication, we simplify the design of KAN's basis function and optimize the computation process for efficient CUDA computing. The proposed ReLU-KAN architecture can be readily implemented on existing deep learning frameworks (e.g., PyTorch) for both inference and training. Experimental results demonstrate that ReLU-KAN achieves a 20x speedup compared to traditional KAN with 4-layer networks. Furthermore, ReLU-KAN exhibits a more stable training process with superior fitting ability while preserving the "catastrophic forgetting avoidance" property of KAN. You can get the code in https://github.com/quiqi/relu_kan
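A sketch of the basis construction using only ReLU, pointwise multiplication, and a linear mix, following the bump formula suggested by the paper's title and abstract; the grid layout and constants below are assumptions.

import torch
import torch.nn as nn

class ReLUKANLayer(nn.Module):
    """Basis built from ReLU and pointwise multiplication only:
    R_i(x) = [relu(e_i - x) * relu(x - s_i)]^2 * 16 / (e_i - s_i)^4
    is a smooth bump supported on [s_i, e_i]; a linear layer then mixes them."""
    def __init__(self, in_dim, out_dim, grid=5, span=2):
        super().__init__()
        step = 1.0 / grid
        s = torch.arange(-span, grid) * step          # bump left edges
        e = s + (span + 1) * step                     # bump right edges
        self.register_buffer("s", s)
        self.register_buffer("e", e)
        self.register_buffer("norm", 16.0 / (e - s) ** 4)
        self.mix = nn.Linear(in_dim * len(s), out_dim)

    def forward(self, x):                             # x: (batch, in_dim), roughly in [0, 1]
        x = x.unsqueeze(-1)
        bump = torch.relu(self.e - x) * torch.relu(x - self.s)
        return self.mix((bump ** 2 * self.norm).flatten(1))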
Updated: 2024-06-04 07:54:31
标题: ReLU-KAN:仅需矩阵加法、点乘和ReLU的新科尔莫戈洛夫-阿诺德网络
摘要: 由于基函数(B-样条)计算的复杂性限制,Kolmogorov-Arnold网络(KAN)在GPU上受到并行计算能力的限制。本文提出了一种新颖的ReLU-KAN实现,继承了KAN的核心思想。通过采用ReLU(修正线性单元)和逐点乘法,我们简化了KAN基函数的设计,并优化了计算过程,以实现高效的CUDA计算。提出的ReLU-KAN架构可以方便地在现有的深度学习框架(如PyTorch)上进行推断和训练。实验结果表明,与具有4层网络的传统KAN相比,ReLU-KAN实现了20倍的加速。此外,ReLU-KAN表现出更稳定的训练过程,具有更好的拟合能力,同时保持了KAN的“避免灾难性遗忘”属性。您可以在https://github.com/quiqi/relu_kan 获取代码。
更新时间: 2024-06-04 07:54:31
领域: cs.LG,cs.NE
Ada-HGNN: Adaptive Sampling for Scalable Hypergraph Neural Networks
Hypergraphs serve as an effective model for depicting complex connections in various real-world scenarios, from social to biological networks. The development of Hypergraph Neural Networks (HGNNs) has emerged as a valuable method to manage the intricate associations in data, though scalability is a notable challenge due to memory limitations. In this study, we introduce a new adaptive sampling strategy specifically designed for hypergraphs, which tackles their unique complexities in an efficient manner. We also present a Random Hyperedge Augmentation (RHA) technique and an additional Multilayer Perceptron (MLP) module to improve the robustness and generalization capabilities of our approach. Thorough experiments with real-world datasets have proven the effectiveness of our method, markedly reducing computational and memory demands while maintaining performance levels akin to conventional HGNNs and other baseline models. This research paves the way for improving both the scalability and efficacy of HGNNs in extensive applications. We will also make our codebase publicly accessible.
Updated: 2024-06-04 07:53:42
标题: Ada-HGNN:可扩展超图神经网络的自适应采样
摘要: 超图作为一个有效的模型,可以描绘各种现实世界场景中的复杂连接,从社交到生物网络。超图神经网络(HGNNs)的发展已经成为管理数据中复杂关联的有价值的方法,尽管由于内存限制,可扩展性是一个显著的挑战。在这项研究中,我们引入了一种特别设计用于超图的新的自适应采样策略,以有效地应对它们的独特复杂性。我们还提出了一种随机超边增强(RHA)技术和一个额外的多层感知器(MLP)模块,以提高我们方法的鲁棒性和泛化能力。通过对真实世界数据集进行彻底的实验,证明了我们方法的有效性,显著降低了计算和内存需求,同时保持了与传统HGNNs和其他基准模型相当的性能水平。这项研究为在广泛应用中提高HGNNs的可扩展性和效力铺平了道路。我们还将公开我们的代码库。
更新时间: 2024-06-04 07:53:42
领域: cs.LG
Learning to Optimize for Reinforcement Learning
In recent years, by leveraging more data, computation, and diverse tasks, learned optimizers have achieved remarkable success in supervised learning, outperforming classical hand-designed optimizers. Reinforcement learning (RL) is essentially different from supervised learning, and in practice, these learned optimizers do not work well even in simple RL tasks. We investigate this phenomenon and identify two issues. First, the agent-gradient distribution is non-independent and identically distributed, leading to inefficient meta-training. Moreover, due to highly stochastic agent-environment interactions, the agent-gradients have high bias and variance, which increases the difficulty of learning an optimizer for RL. We propose pipeline training and a novel optimizer structure with a good inductive bias to address these issues, making it possible to learn an optimizer for reinforcement learning from scratch. We show that, although only trained in toy tasks, our learned optimizer can generalize to unseen complex tasks in Brax.
Updated: 2024-06-04 07:52:31
标题: 学习优化强化学习
摘要: 最近几年来,通过利用更多的数据、计算和多样化的任务,学习优化器在监督学习中取得了显著的成功,超越了经典的手动设计的优化器。强化学习与监督学习本质上不同,在实践中,即使在简单的强化学习任务中,这些学习优化器也不能很好地工作。我们调查了这一现象并确定了两个问题。首先,代理-梯度分布是非独立和同分布的,导致元训练效率低下。此外,由于代理-环境交互高度随机,代理梯度具有高偏差和方差,增加了为强化学习学习优化器的难度。我们提出了流水线训练和一个具有良好归纳偏差的新型优化器结构,以解决这些问题,使得从头开始学习强化学习优化器成为可能。我们展示了,尽管只在玩具任务中训练,我们学习到的优化器可以推广到Brax中看不见的复杂任务中。
更新时间: 2024-06-04 07:52:31
领域: cs.LG,cs.AI
PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling
In this study, we investigate whether attention-based information flow inside large language models (LLMs) is aggregated through noticeable patterns for long context processing. Our observations reveal that LLMs aggregate information through Pyramidal Information Funneling, where attention scatters widely in lower layers, progressively consolidates within specific contexts, and ultimately focuses on critical tokens (a.k.a. massive activation or attention sink) in higher layers. Motivated by these insights, we developed PyramidKV, a novel and effective KV cache compression method. This approach dynamically adjusts the KV cache size across different layers, allocating more cache in lower layers and less in higher ones, diverging from traditional methods that maintain a uniform KV cache size. Our experimental evaluations, utilizing the LongBench benchmark, show that PyramidKV matches the performance of models with a full KV cache while retaining only 12% of the KV cache, thus significantly reducing memory usage. In scenarios emphasizing memory efficiency, where only 0.7% of the KV cache is maintained, PyramidKV surpasses other KV cache compression techniques, achieving up to a 20.5 absolute accuracy improvement on TREC.
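A sketch of the two ingredients one can infer from the abstract: a per-layer cache budget that decreases with depth, and attention-based token selection within each layer. The taper shape and the top-k selection rule are assumptions.

import numpy as np

def pyramid_budgets(total_budget, n_layers, taper=4.0):
    # arithmetically decreasing share from bottom (broad attention) to top (few sinks)
    base = np.linspace(taper, 1.0, n_layers)
    return np.maximum(1, np.round(base / base.sum() * total_budget)).astype(int)

def compress_layer(keys, values, attn_scores, budget):
    keep = np.argsort(attn_scores)[-budget:]   # retain the most-attended tokens
    return keys[keep], values[keep]

print(pyramid_budgets(total_budget=1024, n_layers=8))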
Updated: 2024-06-04 07:51:30
标题: 金字塔KV:基于金字塔信息导流的动态KV缓存压缩
摘要: 在这项研究中,我们调查了大型语言模型(LLMs)内部基于注意力的信息流是否通过明显的模式进行汇总,以进行长文本处理。我们的观察表明,LLMs通过金字塔式信息漏斗汇总信息,其中注意力在较低层中广泛分散,逐渐在特定上下文中巩固,并最终集中于高层中的关键令牌(也称为大量激活或注意力池)。受到这些见解的启发,我们开发了PyramidKV,一种新颖且有效的KV缓存压缩方法。这种方法动态调整不同层级的KV缓存大小,在较低层分配更多缓存,在较高层分配更少缓存,与保持统一KV缓存大小的传统方法不同。我们利用LongBench基准测试进行的实验评估显示,PyramidKV与具有完整KV缓存的模型的性能相匹配,同时仅保留了KV缓存的12%,因此显着减少了内存使用。在强调内存效率的场景中,只保留了0.7%的KV缓存时,PyramidKV超越其他KV缓存压缩技术,在TREC上实现了高达20.5个绝对准确度的改进。
更新时间: 2024-06-04 07:51:30
领域: cs.CL,cs.AI
Preference Optimization for Molecule Synthesis with Conditional Residual Energy-based Models
Molecule synthesis through machine learning is one of the fundamental problems in drug discovery. Current data-driven strategies employ one-step retrosynthesis models and search algorithms to predict synthetic routes in a top-down manner. Despite their effective performance, these strategies face limitations in molecule synthetic route generation due to a greedy selection of the next molecule set without any lookahead. Furthermore, existing strategies cannot control the generation of synthetic routes based on criteria such as material costs, yields, and step count. In this work, we propose a general and principled framework via conditional residual energy-based models (EBMs) that focuses on the quality of the entire synthetic route under specific criteria. By incorporating an additional energy-based function into our probabilistic model, our proposed algorithm can enhance the quality of the most probable synthetic routes (those with higher probabilities) generated by various strategies in a plug-and-play fashion. Extensive experiments demonstrate that our framework can consistently boost performance across various strategies and outperforms the previous state-of-the-art top-1 accuracy by a margin of 2.5%. Code is available at https://github.com/SongtaoLiu0823/CREBM.
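The plug-and-play reranking described above can be illustrated with a short sketch: candidate routes from any base strategy are rescored by combining the base log-probability with a residual energy term, following p(route) ∝ p_base(route) · exp(-α·E(route)). The energy function, the weight alpha, and the route format are placeholders; the paper's conditional EBM is a learned model rather than a hand-written criterion.

```python
def rerank_routes(candidates, energy_fn, alpha=1.0):
    """Rerank synthetic routes with a residual energy term (sketch).

    candidates: list of (route, base_logprob) pairs from any existing
        retrosynthesis strategy.
    energy_fn: maps a route to a scalar energy encoding criteria such
        as material cost, yield, or step count (lower is better).
    """
    scored = [(r, lp - alpha * energy_fn(r)) for r, lp in candidates]
    return sorted(scored, key=lambda x: x[1], reverse=True)

# Toy example: energy = number of reaction steps (prefer shorter routes).
routes = [(["A->B", "B->C"], -1.2), (["A->C"], -1.5)]
print(rerank_routes(routes, energy_fn=len))
```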
Updated: 2024-06-04 07:49:30
标题: 使用条件残差能量模型进行分子合成的偏好优化
摘要: 通过机器学习进行分子合成是药物发现中的一个基本问题。当前的数据驱动策略采用一步反合成模型和搜索算法以自顶向下的方式预测合成路径。尽管这些策略表现出效果,但由于贪婪地选择下一个分子集合而没有任何前瞻,这些策略在分子合成路径生成方面面临限制。此外,现有策略无法基于可能的标准(如材料成本、产量和步骤计数)控制合成路径的生成。在这项工作中,我们提出了一个通用而有原则的框架,通过条件残差能量模型(EBMs)关注基于特定标准的整个合成路径的质量。通过将额外的基于能量的函数纳入我们的概率模型,我们提出的算法可以以即插即用的方式增强由各种策略生成的最有可能的合成路径(具有更高的概率)的质量。大量实验证明,我们的框架可以持续提升各种策略的性能,并将前沿的top-1准确率提高了2.5%。代码可在https://github.com/SongtaoLiu0823/CREBM 获取。
更新时间: 2024-06-04 07:49:30
领域: cs.LG,q-bio.BM
RefuteBench: Evaluating Refuting Instruction-Following for Large Language Models
The application scope of large language models (LLMs) is increasingly expanding. In practical use, users might provide feedback based on the model's output, hoping for a responsive model that can complete responses according to their feedback. Whether the model can appropriately respond to users' refuting feedback and consistently follow through with execution has not been thoroughly analyzed. In light of this, this paper proposes a comprehensive benchmark, RefuteBench, covering tasks such as question answering, machine translation, and email writing. The evaluation aims to assess whether models can positively accept feedback in the form of refuting instructions and whether they can consistently adhere to user demands throughout the conversation. We conduct evaluations on numerous LLMs and find that LLMs are stubborn, i.e., they are inclined toward their internal knowledge and often fail to comply with user feedback. Additionally, as the length of the conversation increases, models gradually forget the user's stated feedback and roll back to their own responses. We further propose recall-and-repeat prompting as a simple and effective way to enhance the model's responsiveness to feedback.
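A hypothetical illustration of recall-and-repeat prompting: before each new request, the prompt restates the user's accumulated feedback so the model re-applies it. The template below is an assumption; the paper's exact wording is not given in the abstract.

```python
def recall_and_repeat_prompt(history, feedback_list, new_request):
    """Build a prompt that recalls all prior user feedback and asks the
    model to repeat/apply it before answering (hypothetical template).
    """
    recall = "\n".join(f"- {fb}" for fb in feedback_list)
    return (
        f"{history}\n"
        "Before answering, restate and apply all of the user's earlier feedback:\n"
        f"{recall}\n"
        f"User: {new_request}\nAssistant:"
    )

prompt = recall_and_repeat_prompt(
    history="User: Translate 'Guten Tag'.\nAssistant: Good day.",
    feedback_list=["Always translate into French, not English."],
    new_request="Translate 'Danke'.",
)
print(prompt)
```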
Updated: 2024-06-04 07:48:51
标题: RefuteBench:评估大型语言模型的反驳指令遵循
摘要: 大型语言模型(LLMs)的应用范围越来越广泛。在实际使用中,用户可能会根据模型的输出提供反馈,希望得到一个响应灵敏的模型,可以根据他们的反馈完成回应。模型是否能够适当地回应用户的反驳性反馈,并始终按照执行进行分析尚未深入研究。鉴于此,本文提出了一个全面的基准测试,RefuteBench,涵盖了诸如问答、机器翻译和电子邮件撰写等任务。评估旨在评估模型是否能够积极接受反驳性指令形式的反馈,并且是否能够始终遵循用户在对话中的要求。我们对大量LLMs进行评估,发现LLMs往往固执己见,倾向于他们的内部知识,经常无法遵守用户的反馈。此外,随着对话长度的增加,模型逐渐忘记用户表达的反馈,并回归到他们自己的回应。我们进一步提出了一个召回和重复提示作为增强模型对反馈响应的简单而有效的方式。
更新时间: 2024-06-04 07:48:51
领域: cs.CL,cs.AI
Advancing Generalized Transfer Attack with Initialization Derived Bilevel Optimization and Dynamic Sequence Truncation
Transfer attacks generate significant interest for real-world black-box applications by crafting transferable adversarial examples through surrogate models. However, existing works essentially optimize a single-level objective w.r.t. the surrogate model directly, which always leads to poor interpretability of the attack mechanism and limited generalization performance over unknown victim models. In this work, we propose the BilEvel Transfer AttacK (BETAK) framework by establishing an initialization derived bilevel optimization paradigm, which explicitly reformulates the nested constraint relationship between the Upper-Level (UL) pseudo-victim attacker and the Lower-Level (LL) surrogate attacker. Algorithmically, we introduce the Hyper Gradient Response (HGR) estimation as an effective feedback for the transferability over pseudo-victim attackers, and propose the Dynamic Sequence Truncation (DST) technique to dynamically adjust the back-propagation path for HGR and reduce computational overhead simultaneously. Meanwhile, we conduct detailed algorithmic analysis and provide a convergence guarantee that accommodates the non-convexity of the LL surrogate attacker. Extensive evaluations demonstrate substantial improvement of BETAK (e.g., a 53.41% increase in attack success rate against IncRes-v2_ens) against different victims and defense methods in targeted and untargeted attack scenarios. The source code is available at https://github.com/callous-youth/BETAK.
Updated: 2024-06-04 07:45:27
标题: 推进广义传递攻击:基于初始化衍生的双层优化和动态序列截断
摘要: 转移攻击通过构建可在替代模型中生成可转移对抗样本,引起了对真实世界黑盒应用的极大兴趣。然而,现有工作基本上直接优化相对于替代模型的单层目标,这总是导致攻击机制的解释性差和在未知受害者模型上的有限泛化性能。在这项工作中,我们提出了BilEvel Transfer AttacK(BETAK)框架,通过建立一个源自双层优化范式的初始化,明确地重新构建上层(UL)伪受害者攻击者和下层(LL)替代攻击者之间嵌套约束关系。在算法上,我们引入了超梯度响应(HGR)估计作为对伪受害者攻击者的可转移性的有效反馈,并提出了动态序列截断(DST)技术,动态调整HGR的反向传播路径并同时减少计算开销。同时,我们进行了详细的算法分析,并提供了收敛保证,以支持LL替代攻击者的非凸性。广泛的评估表明,在针对性和非针对性攻击情景中,BETAK实现了显著的改进(例如,对IncRes-v2_ens的攻击成功率增加了53.41%),针对不同的受害者和防御方法。源代码可在https://github.com/callous-youth/BETAK找到。
更新时间: 2024-06-04 07:45:27
领域: cs.LG,cs.CR,cs.CV
Partial-Label Learning with a Reject Option
In real-world applications, one often encounters ambiguously labeled data, where different annotators assign conflicting class labels. Partial-label learning allows training classifiers in this weakly supervised setting, where state-of-the-art methods already show good predictive performance. However, even the best algorithms give incorrect predictions, which can have severe consequences when they impact actions or decisions. We propose a novel risk-consistent partial-label learning algorithm with a reject option, that is, the algorithm can reject unsure predictions. Extensive experiments on artificial and real-world datasets show that our method provides the best trade-off between the number and accuracy of non-rejected predictions when compared to our competitors, which use confidence thresholds for rejecting unsure predictions instead. When evaluated without the reject option, our nearest neighbor-based approach also achieves competitive prediction performance.
Updated: 2024-06-04 07:45:21
标题: 带有拒绝选项的部分标签学习
摘要: 在现实世界的应用中,人们经常会遇到标记模糊的数据,即不同的标注者分配冲突的类标签。部分标记学习允许在这种弱监督设置中训练分类器,其中最先进的方法已经表现出良好的预测性能。然而,即使是最好的算法也会给出错误的预测,当它们影响行动或决策时可能会产生严重后果。我们提出了一种新颖的风险一致的部分标记学习算法,其中包括一个拒绝选项,也就是说,该算法可以拒绝不确定的预测。对人工和真实世界数据集的大量实验表明,与使用置信度阈值拒绝不确定预测的竞争对手相比,我们的方法在非拒绝预测的数量和准确性之间提供了最佳的权衡。当在没有拒绝选项的情况下进行评估时,我们基于最近邻的方法也取得了竞争力的预测性能。
更新时间: 2024-06-04 07:45:21
领域: cs.LG,stat.ML
Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models
Large Language Models (LLMs) are often described as instances of foundation models - that is, models that transfer strongly across various tasks and conditions in a few-shot or zero-shot manner, while exhibiting scaling laws that predict function improvement when increasing the pre-training scale. These claims of excelling in different functions and tasks rely on measurements taken across various sets of standardized benchmarks showing high scores for such models. We demonstrate here a dramatic breakdown of the function and reasoning capabilities of state-of-the-art models trained at the largest available scales which claim strong function, using a simple, short, conventional common-sense problem formulated in concise natural language and easily solvable by humans. The breakdown is dramatic, as models also express strong overconfidence in their wrong solutions, while providing often nonsensical "reasoning"-like explanations akin to confabulations to justify and back up the validity of their clearly failed responses, making them sound plausible. Various standard interventions in an attempt to get the right solution, such as various types of enhanced prompting, or urging the models to reconsider the wrong solutions again by multi-step re-evaluation, fail. We take these initial observations to the scientific and technological community to stimulate urgent re-assessment of the claimed capabilities of the current generation of LLMs. Such re-assessment also requires common action to create standardized benchmarks that would allow proper detection of such basic reasoning deficits that obviously manage to remain undiscovered by current state-of-the-art evaluation procedures and benchmarks. Code for reproducing experiments in the paper and raw experiment data can be found at https://github.com/LAION-AI/AIW
Updated: 2024-06-04 07:43:33
标题: 《爱丽丝梦游仙境:最先进的大型语言模型中展示完全推理崩溃的简单任务》
摘要: 大型语言模型(LLMs)通常被描述为基础模型的实例-即,在少数显示或零射击方式下,在各种任务和条件之间强烈转移的模型,同时展示了预训练规模增加时预测功能改进的扩展规律。这些在不同功能和任务上表现出色的主张依赖于通过对各种标准基准集的测量所显示的高分数的模型。我们在这里展示了目前最大规模训练的最先进模型在简单、简短、常识问题中功能和推理能力的显著崩溃,这些问题用简洁的自然语言表达,易于人类解决。这种崩溃是戏剧性的,因为模型还表现出对错误解决方案的强烈过度自信,同时提供常常类似于胡言乱语的“推理”解释,以证明和支持其明显失败的响应的有效性,使其听起来似乎合理。尝试获取正确解决方案的各种标准干预措施,如各种类型的增强提示,或者敦促模型通过多步重新评估再次考虑错误解决方案,均失败。我们将这些初步观察结果提交给科学和技术界,以刺激对当前一代LLMs声称的能力进行紧急重新评估。这种重新评估还需要共同行动,以创建标准基准,从而允许对显然未被目前最先进的评估程序和基准发现的基本推理缺陷进行正确检测。可以在https://github.com/LAION-AI/AIW找到用于重现本文实验的代码和原始实验数据。
更新时间: 2024-06-04 07:43:33
领域: cs.LG,cs.AI,cs.CL
I've got the "Answer"! Interpretation of LLMs Hidden States in Question Answering
Interpretability and explainability of AI are becoming increasingly important in light of the rapid development of large language models (LLMs). This paper investigates the interpretation of LLMs in the context of knowledge-based question answering. The main hypothesis of the study is that correct and incorrect model behavior can be distinguished at the level of hidden states. The quantized models LLaMA-2-7B-Chat, Mistral-7B, and Vicuna-7B, together with the MuSeRC question-answering dataset, are used to test this hypothesis. The results of the analysis support the proposed hypothesis. We also identify the layers that have a negative effect on the model's behavior. As a practical application of the hypothesis, we propose additionally training such "weak" layers in order to improve the quality of the task solution.
Updated: 2024-06-04 07:43:12
标题: 我找到了“答案”!对LLMs中隐藏状态在问答中的解释
摘要: 人工智能的可解释性和可解释性正变得日益重要,尤其是考虑到大型语言模型(LLMs)的快速发展。本文探讨了在基于知识的问答环境中解释LLMs的问题。研究的主要假设是在隐藏状态的水平上可以区分正确和不正确的模型行为。采用了量子化模型LLaMA-2-7B-Chat、Mistral-7B、Vicuna-7B以及MuSeRC问答数据集来测试这一假设。分析结果支持了提出的假设。我们还确定了对模型行为产生负面影响的层。作为假设实际应用的前景,我们建议额外训练这些“弱”层,以提高任务解决方案的质量。
更新时间: 2024-06-04 07:43:12
领域: cs.CL,cs.AI
Graph Adversarial Diffusion Convolution
This paper introduces a min-max optimization formulation for the Graph Signal Denoising (GSD) problem. In this formulation, we first maximize the second term of GSD by introducing perturbations to the graph structure based on Laplacian distance and then minimize the overall loss of the GSD. By solving the min-max optimization problem, we derive a new variant of the Graph Diffusion Convolution (GDC) architecture, called Graph Adversarial Diffusion Convolution (GADC). GADC differs from GDC by incorporating an additional term that enhances robustness against adversarial attacks on the graph structure and noise in node features. Moreover, GADC improves the performance of GDC on heterophilic graphs. Extensive experiments demonstrate the effectiveness of GADC across various datasets. Code is available at https://github.com/SongtaoLiu0823/GADC.
Updated: 2024-06-04 07:43:04
标题: 图对抗性扩散卷积
摘要: 本文介绍了一个用于图信号去噪(GSD)问题的最小-最大优化公式。在这个公式中,我们首先通过引入基于拉普拉斯距离的图结构扰动来最大化GSD的第二项,然后最小化GSD的总损失。通过解决最小-最大优化问题,我们推导出了图扩散卷积(GDC)架构的一个新变种,称为图对抗性扩散卷积(GADC)。GADC通过引入一个额外的项,增强了对图结构的对抗攻击和节点特征中的噪声的鲁棒性,与GDC有所不同。此外,GADC提高了在异质图上的GDC的性能。大量实验证明了GADC在各种数据集上的有效性。代码可在https://github.com/SongtaoLiu0823/GADC获取。
更新时间: 2024-06-04 07:43:04
领域: cs.LG
Tabular and Deep Learning for the Whittle Index
The Whittle index policy is a heuristic that has shown remarkably good performance (with guaranteed asymptotic optimality) when applied to the class of problems known as Restless Multi-Armed Bandit Problems (RMABPs). In this paper we present QWI and QWINN, two reinforcement learning algorithms, respectively tabular and deep, to learn the Whittle index for the total discounted criterion. The key feature is the use of two time-scales, a faster one to update the state-action Q-values, and a relatively slower one to update the Whittle indices. In our main theoretical result we show that QWI, which is a tabular implementation, converges to the real Whittle indices. We then present QWINN, an adaptation of the QWI algorithm that uses neural networks to compute the Q-values on the faster time-scale, which is able to extrapolate information from one state to another and scales naturally to large state-space environments. For QWINN, we show that all local minima of the Bellman error are locally stable equilibria, which is the first result of its kind for DQN-based schemes. Numerical computations show that QWI and QWINN converge faster than the standard Q-learning algorithm, neural-network-based approximate Q-learning, and other state-of-the-art algorithms.
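The two-timescale scheme can be sketched as follows for a single restless arm: a fast Q-learning update under a passivity subsidy, and a slow update nudging the Whittle index toward the subsidy that makes the agent indifferent between acting and staying passive. The update form and reward shaping below are simplifications assumed for illustration, not the paper's exact algorithm.

```python
import numpy as np

def qwi_update(Q, whittle, s, a, r, s_next, lam_state,
               alpha=0.1, beta=0.01, gamma=0.95):
    """One two-timescale QWI step for a single restless arm (sketch).

    Q: [num_states, 2] state-action values (0 = passive, 1 = active).
    whittle: [num_states] current Whittle-index estimates.
    lam_state: reference state whose index is being calibrated.
    """
    subsidy = whittle[lam_state] if a == 0 else 0.0   # subsidy for passivity
    td_target = r + subsidy + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])          # fast timescale
    # Slow timescale: raise the index while activating still looks better.
    whittle[lam_state] += beta * (Q[lam_state, 1] - Q[lam_state, 0])
    return Q, whittle

Q, w = np.zeros((5, 2)), np.zeros(5)
Q, w = qwi_update(Q, w, s=0, a=1, r=1.0, s_next=2, lam_state=0)
```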
Updated: 2024-06-04 07:41:15
标题: 表格和深度学习用于Whittle指数
摘要: Whittle指数策略是一种启发式方法,在应用于被称为不安定多臂赌博问题(RMABPs)的问题类时表现出非常好的性能(具有保证的渐近最优性)。在本文中,我们提出了QWI和QWINN两种强化学习算法,分别为表格和深度学习,用于学习总折扣准则的Whittle指数。关键特征是使用两个时间尺度,一个更快地更新状态-动作Q值,一个相对较慢地更新Whittle指数。在我们的主要理论结果中,我们展示了QWI,它是一个表格实现,收敛到真实的Whittle指数。然后,我们提出了QWINN,这是QWI算法的一种适应,使用神经网络在更快的时间尺度上计算Q值,能够从一个状态推断信息到另一个状态,并且自然地适应大状态空间环境。对于QWINN,我们展示了贝尔曼误差的所有局部极小值都是局部稳定的均衡状态,这是基于DQN方案的第一种结果。数值计算表明,QWI和QWINN比标准的Q学习算法、基于神经网络的近似Q学习和其他最先进的算法收敛速度更快。
更新时间: 2024-06-04 07:41:15
领域: cs.AI,cs.LG
CAP: A Context-Aware Neural Predictor for NAS
Neural predictors are effective in accelerating the time-consuming performance evaluation stage in neural architecture search (NAS), owing to their direct estimation of unseen architectures. Despite this effectiveness, training a powerful neural predictor with fewer annotated architectures remains a huge challenge. In this paper, we propose a context-aware neural predictor (CAP) which only needs a few annotated architectures for training, based on the contextual information from the architectures. Specifically, the input architectures are encoded into graphs and the predictor infers the contextual structure around the nodes inside each graph. Then, enhanced by the proposed context-aware self-supervised task, the pre-trained predictor can obtain expressive and generalizable representations of architectures. Therefore, only a few annotated architectures are sufficient for training. Experimental results in different search spaces demonstrate the superior performance of CAP compared with state-of-the-art neural predictors. In particular, CAP can rank architectures precisely with a budget of only 172 annotated architectures in NAS-Bench-101. Moreover, CAP can help find promising architectures in both the NAS-Bench-101 and DARTS search spaces on the CIFAR-10 dataset, serving as a useful navigator for NAS to explore the search space efficiently.
Updated: 2024-06-04 07:37:47
标题: CAP: 一种用于NAS的上下文感知神经预测器
摘要: 神经预测器在神经架构搜索(NAS)中的性能评估阶段中起到了有效的促进作用,因为它们可以直接估计未见架构。尽管有效,但是用较少的已注释架构训练一个强大的神经预测器仍然是一个巨大的挑战。在本文中,我们提出了一种基于上下文的神经预测器(CAP),它只需要少量已注释架构用于训练,基于架构的上下文信息。具体而言,输入架构被编码成图,预测器推断每个图内节点周围的上下文结构。然后,通过提出的上下文感知自监督任务增强,预训练的预测器可以获得表现出色且具有泛化能力的架构表示。因此,只需要少量已注释架构进行训练。在不同的搜索空间中的实验结果表明,与最先进的神经预测器相比,CAP表现出优越的性能。特别是,在NAS-Bench-101中,CAP可以精确地在只有172个已注释架构的预算下对架构进行排名。此外,CAP可以帮助在CIFAR-10数据集上的NAS-Bench-101和DARTS搜索空间中找到有前途的架构,为NAS有效地探索搜索空间提供了一个有用的导航器。
更新时间: 2024-06-04 07:37:47
领域: cs.LG,cs.NE
MLIP: Efficient Multi-Perspective Language-Image Pretraining with Exhaustive Data Utilization
Contrastive Language-Image Pretraining (CLIP) has achieved remarkable success, leading to rapid advancements in multimodal studies. However, CLIP faces a notable challenge in terms of inefficient data utilization. It relies on a single contrastive supervision for each image-text pair during representation learning, disregarding a substantial amount of valuable information that could offer richer supervision. Additionally, the retention of non-informative tokens leads to increased computational demands and time costs, particularly in CLIP's ViT image encoder. To address these issues, we propose Multi-Perspective Language-Image Pretraining (MLIP). In MLIP, we leverage the frequency transform's sensitivity to both high- and low-frequency variations, which complements the spatial domain's sensitivity, limited to low-frequency variations only. By incorporating frequency transforms and token-level alignment, we expand CLIP's single supervision into multi-domain and multi-level supervision, enabling a more thorough exploration of informative image features. Additionally, we introduce a token merging method guided by comprehensive semantics from the frequency and spatial domains. This allows us to merge tokens into multi-granularity tokens with a controllable compression rate to accelerate CLIP. Extensive experiments validate the effectiveness of our design.
Updated: 2024-06-04 07:36:57
标题: MLIP:高效的多视角语言-图像预训练,充分利用数据
摘要: 对比语言-图像预训练(CLIP)取得了显著的成功,在多模态研究方面取得了快速进展。然而,CLIP在数据利用效率方面面临着一个明显的挑战。它依赖于表示学习期间每个图像-文本对的单一对比监督,忽视了大量有价值的信息,这些信息可以提供更丰富的监督。此外,保留非信息性令牌会增加计算需求和时间成本,尤其是在CLIP的ViT图像编码器中。为了解决这些问题,我们提出了多视角语言-图像预训练(MLIP)。在MLIP中,我们利用频率变换对高频和低频变化的敏感性,这与空间域对低频变化的敏感性相辅相成。通过整合频率变换和令牌级别的对齐,我们将CILP的单一监督扩展为多领域和多级别监督,实现了对信息丰富图像特征的更全面探索。此外,我们引入了一种由频率和空间领域的综合语义指导的令牌合并方法。这使我们能够将令牌合并为具有可控压缩率的多粒度令牌,以加速CLIP。广泛的实验证实了我们设计的有效性。
更新时间: 2024-06-04 07:36:57
领域: cs.CV,cs.AI
KG-FIT: Knowledge Graph Fine-Tuning Upon Open-World Knowledge
Knowledge Graph Embedding (KGE) techniques are crucial in learning compact representations of entities and relations within a knowledge graph, facilitating efficient reasoning and knowledge discovery. While existing methods typically focus either on training KGE models solely based on graph structure or fine-tuning pre-trained language models with classification data in KG, KG-FIT leverages LLM-guided refinement to construct a semantically coherent hierarchical structure of entity clusters. By incorporating this hierarchical knowledge along with textual information during the fine-tuning process, KG-FIT effectively captures both global semantics from the LLM and local semantics from the KG. Extensive experiments on the benchmark datasets FB15K-237, YAGO3-10, and PrimeKG demonstrate the superiority of KG-FIT over state-of-the-art pre-trained language model-based methods, achieving improvements of 14.4%, 13.5%, and 11.9% in the Hits@10 metric for the link prediction task, respectively. Furthermore, KG-FIT yields substantial performance gains of 12.6%, 6.7%, and 17.7% compared to the structure-based base models upon which it is built. These results highlight the effectiveness of KG-FIT in incorporating open-world knowledge from LLMs to significantly enhance the expressiveness and informativeness of KG embeddings.
Updated: 2024-06-04 07:35:32
标题: KG-FIT:基于开放世界知识的知识图微调
摘要: 知识图谱嵌入(KGE)技术在学习知识图谱中实体和关系的紧凑表示方面至关重要,有助于有效推理和知识发现。虽然现有方法通常专注于基于图结构训练KGE模型或使用分类数据在KG中微调预训练语言模型,但KG-FIT利用LLM引导的精炼来构建实体群集的语义连贯的层次结构。通过在微调过程中结合这种层次知识和文本信息,KG-FIT有效地捕捉了LLM的全局语义和KG的局部语义。在基准数据集FB15K-237、YAGO3-10和PrimeKG上进行的大量实验表明,KG-FIT相对于最先进的基于预训练语言模型的方法具有优势,在链接预测任务的Hits@10指标上分别实现了14.4%、13.5%和11.9%的改进。此外,与构建在其基础上的基于结构的基准模型相比,KG-FIT分别获得了12.6%、6.7%和17.7%的显着性能增益。这些结果突显了KG-FIT在整合来自LLM的开放世界知识以显著增强KG嵌入的表达能力和信息量方面的有效性。
更新时间: 2024-06-04 07:35:32
领域: cs.CL,cs.LG
PETRA: Parallel End-to-end Training with Reversible Architectures
Reversible architectures have been shown to perform on par with their non-reversible counterparts, and have been applied in deep learning for memory savings and generative modeling. In this work, we show how reversible architectures can solve challenges in parallelizing deep model training. We introduce PETRA, a novel alternative to backpropagation for parallelizing gradient computations. PETRA facilitates effective model parallelism by enabling stages (i.e., sets of layers) to compute independently on different devices, while only needing to communicate activations and gradients between each other. By decoupling the forward and backward passes and keeping a single updated version of the parameters, the need for weight stashing is also removed. We develop a custom autograd-like training framework for PETRA, and we demonstrate its effectiveness on CIFAR-10, ImageNet32, and ImageNet, achieving competitive accuracies comparable to backpropagation using ResNet-18, ResNet-34, and ResNet-50 models.
Updated: 2024-06-04 07:35:23
标题: PETRA:可逆架构的并行端到端训练
摘要: 可逆结构已被证明能够与非可逆结构相媲美,在深度学习中被应用于节省内存和生成建模。在这项工作中,我们展示了可逆结构如何解决并行化深度模型训练中的挑战。我们介绍了PETRA,一种新颖的用于并行化梯度计算的反向传播的替代方法。PETRA通过使阶段(即一组层)能够在不同设备上独立计算来促进有效的模型并行化,同时只需要在它们之间传递激活和梯度。通过分离前向和后向传递并保持参数的单个更新版本,也消除了对权重存储的需求。我们为PETRA开发了一个类似于自动微分的自定义训练框架,并展示了其在CIFAR-10、ImageNet32和ImageNet上的有效性,利用ResNet-18、ResNet-34和ResNet-50模型实现了与反向传播相当的竞争准确性。
更新时间: 2024-06-04 07:35:23
领域: cs.LG,stat.ML
Physics Inspired Criterion for Pruning-Quantization Joint Learning
Pruning-quantization joint learning always facilitates the deployment of deep neural networks (DNNs) on resource-constrained edge devices. However, most existing methods do not jointly learn a global criterion for pruning and quantization in an interpretable way. In this paper, we propose a novel physics-inspired criterion for pruning-quantization joint learning (PIC-PQ), which is explored from an analogy we first draw between elasticity dynamics (ED) and model compression (MC). Specifically, derived from Hooke's law in ED, we establish a linear relationship between the filters' importance distribution and the filter property (FP) by a learnable deformation scale in the physics-inspired criterion (PIC). Furthermore, we extend PIC with a relative shift variable for a global view. To ensure feasibility and flexibility, an available maximum bitwidth and a penalty factor are introduced in quantization bitwidth assignment. Experiments on image classification benchmarks demonstrate that PIC-PQ yields a good trade-off between accuracy and bit-operations (BOPs) compression ratio (e.g., a 54.96X BOPs compression ratio for ResNet56 on CIFAR10 with a 0.10% accuracy drop, and 53.24X for ResNet18 on ImageNet with a 0.61% accuracy drop). The code will be available at https://github.com/fanxxxxyi/PIC-PQ.
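A rough sketch of the physics-inspired criterion: filter importance is modeled as a linear (Hooke's-law-like) function of a filter property, and the least important filters are pruned. The choice of the L1 norm as the filter property and the scalar a, b parameters are illustrative assumptions; in the paper these scales are learnable and a relative shift variable provides a global view.

```python
import numpy as np

def pic_importance(filters, a, b):
    """Importance as a linear function of a filter property (sketch),
    mirroring Hooke's law F = k * x.

    filters: array [num_filters, ...] of conv weights.
    a, b: deformation scale and shift (plain floats here; learnable
    in the actual method). FP is assumed to be the L1 norm.
    """
    fp = np.abs(filters.reshape(len(filters), -1)).sum(axis=1)
    return a * fp + b

w = np.random.randn(8, 16, 3, 3)            # 8 filters
imp = pic_importance(w, a=0.5, b=0.1)
keep = np.argsort(imp)[len(imp) // 2:]      # keep the top half
print(sorted(keep))
```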
Updated: 2024-06-04 07:34:05
标题: 物理学启发的用于修剪-量化联合学习的标准
摘要: 修剪-量化联合学习总是有助于在资源受限的边缘设备上部署深度神经网络(DNNs)。然而,大多数现有方法并未以可解释的方式联合学习修剪和量化的全局标准。本文提出了一种新颖的受物理启发的修剪-量化联合学习标准(PIC-PQ),该标准是从弹性动力学(ED)与模型压缩(MC)之间的类比中首次提出的。具体地,我们从弹性动力学中的胡克定律推导出,在受物理启发的标准(PIC)中通过可学习的变形比例建立了滤波器重要性分布与滤波器属性(FP)之间的线性关系。此外,我们通过相对移位变量扩展了PIC以获得全局视图。为确保可行性和灵活性,在量化比特宽度分配中引入了可用的最大比特宽度和惩罚因子。在图像分类基准测试上的实验表明,PIC-PQ在准确性和比特操作(BOPs)压缩比之间取得了良好的平衡,例如在CIFAR10上的ResNet56中取得了54.96倍的BOPs压缩比,准确率下降0.10%,在ImageNet上的ResNet18中取得了53.24倍的BOPs压缩比,准确率下降0.61%。代码将在https://github.com/fanxxxxyi/PIC-PQ 上提供。
更新时间: 2024-06-04 07:34:05
领域: cs.LG,cs.CV
The Role of Learning Algorithms in Collective Action
Collective action in machine learning is the study of the control that a coordinated group can have over machine learning algorithms. While previous research has concentrated on assessing the impact of collectives against Bayes (sub-)optimal classifiers, this perspective is limited in that it does not account for the choice of learning algorithm. Since classifiers seldom behave like Bayes classifiers and are influenced by the choice of learning algorithms along with their inherent biases, in this work we initiate the study of how the choice of the learning algorithm plays a role in the success of a collective in practical settings. Specifically, we focus on distributionally robust optimization (DRO), popular for improving a worst group error, and on the ubiquitous stochastic gradient descent (SGD), due to its inductive bias for "simpler" functions. Our empirical results, supported by a theoretical foundation, show that the effective size and success of the collective are highly dependent on properties of the learning algorithm. This highlights the necessity of taking the learning algorithm into account when studying the impact of collective action in machine learning.
Updated: 2024-06-04 07:34:01
标题: 学习算法在集体行动中的作用
摘要: 机器学习中的集体行动是研究协调群体对机器学习算法的控制能力。虽然先前的研究集中于评估集体对贝叶斯(次)最优分类器的影响,但这种观点存在局限性,因为它没有考虑学习算法的选择。由于分类器很少像贝叶斯分类器那样行为,受到学习算法选择及其固有偏见的影响,因此在本研究中,我们开始研究学习算法的选择在实际情况下对集体成功的影响。具体而言,我们关注分布鲁棒优化(DRO),这在改进最坏组错误方面很受欢迎,以及普遍的随机梯度下降(SGD),因为它对“更简单”函数具有归纳偏好。我们的实证结果,得到理论基础的支持,表明集体的有效规模和成功程度高度依赖于学习算法的属性。这突显了在研究机器学习中集体行动的影响时,需要考虑学习算法的必要性。
更新时间: 2024-06-04 07:34:01
领域: cs.LG,cs.CY,stat.ML
Fortify the Shortest Stave in Attention: Enhancing Context Awareness of Large Language Models for Effective Tool Use
In this paper, we demonstrate that an inherent waveform pattern in the attention allocation of large language models (LLMs) significantly affects their performance in tasks demanding a high degree of context awareness, such as utilizing LLMs for tool use. Specifically, crucial information in the context is potentially overlooked by the model when it is positioned in the trough zone of the attention waveform, leading to decreased performance. To address this issue, we propose a novel inference method named Attention Buckets. It allows LLMs to process their input through multiple parallel processes. Each process utilizes a distinct base angle for the rotary position embedding, thereby creating a unique attention waveform. By compensating for an attention trough of a particular process with an attention peak of another process, our approach enhances LLMs' awareness of various contextual positions, thus mitigating the risk of overlooking crucial information. On the largest tool-use benchmark, our method elevates a 7B model to state-of-the-art performance, comparable to that of GPT-4. On other benchmarks and some RAG tasks, which also demand a thorough understanding of contextual content, Attention Buckets also exhibits notable performance enhancements.
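Schematically, the inference procedure can be sketched as running parallel forward passes with distinct rotary base angles and aggregating the resulting next-token distributions. The set_rope_base helper, the base values, and probability averaging are all assumptions for illustration; real implementations patch the rotary embedding module instead, and the paper's aggregation rule may differ (e.g., picking the most confident process).

```python
import torch

@torch.no_grad()
def attention_buckets_logits(model, input_ids, rope_bases=(10000, 17500, 25000)):
    """Run the same input through the model once per RoPE base angle and
    aggregate next-token distributions (sketch; HF-style interface assumed).
    """
    probs = []
    for base in rope_bases:
        model.set_rope_base(base)                # hypothetical helper
        logits = model(input_ids).logits[:, -1]  # last-position logits
        probs.append(torch.softmax(logits, dim=-1))
    return torch.stack(probs).mean(dim=0)        # assumed aggregation
```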
Updated: 2024-06-04 07:33:12
标题: 加强注意力中最短的支柱:增强大型语言模型的上下文感知能力,以实现有效的工具使用
摘要: 在这篇论文中,我们证明了大型语言模型(LLMs)在注意力分配中存在的固有波形模式显著影响它们在需要高度上下文意识的任务中的表现,比如利用LLMs进行工具使用。具体来说,当关键信息在上下文中位于注意力波形的低谷区域时,模型可能会忽略这些信息,导致性能下降。为了解决这个问题,我们提出了一种名为Attention Buckets的新型推理方法。它允许LLMs通过多个并行过程处理输入。每个过程利用不同的基角度进行旋转位置嵌入,从而创建独特的注意力波形。通过用一个过程的注意力低谷补偿另一个过程的注意力高峰,我们的方法增强了LLMs对各种上下文位置的意识,从而减轻了忽略关键信息的风险。在最大的工具使用基准测试中,我们的方法使一个7B模型达到了与GPT-4相媲美的最新性能水平。在其他基准测试和一些RAG任务中,这些任务也要求对上下文内容进行深入理解,Attention Buckets在性能方面也表现出明显的增强。
更新时间: 2024-06-04 07:33:12
领域: cs.CL,cs.AI,cs.LG
Causal Effect Identification in LiNGAM Models with Latent Confounders
We study the generic identifiability of causal effects in linear non-Gaussian acyclic models (LiNGAM) with latent variables. We consider the problem in two main settings: When the causal graph is known a priori, and when it is unknown. In both settings, we provide a complete graphical characterization of the identifiable direct or total causal effects among observed variables. Moreover, we propose efficient algorithms to certify the graphical conditions. Finally, we propose an adaptation of the reconstruction independent component analysis (RICA) algorithm that estimates the causal effects from the observational data given the causal graph. Experimental results show the effectiveness of the proposed method in estimating the causal effects.
Updated: 2024-06-04 07:30:27
标题: 在具有潜在混杂因素的LiNGAM模型中的因果效应识别
摘要: 我们研究了具有潜在变量的线性非高斯无环模型(LiNGAM)中因果效应的一般可识别性。我们在两个主要设置中考虑了这个问题:在因果图先验已知的情况下,以及在因果图未知的情况下。在这两种情境下,我们提供了可识别的观察变量之间的直接或总因果效应的完整图形特征化。此外,我们提出了有效的算法来验证图形条件。最后,我们提出了一种重建独立分量分析(RICA)算法的改进版,该算法从观测数据中估计因果效应,给定了因果图。实验结果显示了该方法在估计因果效应方面的有效性。
更新时间: 2024-06-04 07:30:27
领域: stat.ML,cs.LG,stat.ME
QROA: A Black-Box Query-Response Optimization Attack on LLMs
Large Language Models (LLMs) have surged in popularity in recent months, yet they possess concerning capabilities for generating harmful content when manipulated. This study introduces the Query-Response Optimization Attack (QROA), an optimization-based strategy designed to exploit LLMs through a black-box, query-only interaction. QROA adds an optimized trigger to a malicious instruction to compel the LLM to generate harmful content. Unlike previous approaches, QROA does not require access to the model's logit information or any other internal data and operates solely through the standard query-response interface of LLMs. Inspired by deep Q-learning and greedy coordinate descent, the method iteratively updates tokens to maximize a designed reward function. We tested our method on various LLMs such as Vicuna, Falcon, and Mistral, achieving an Attack Success Rate (ASR) over 80%. We also tested the method against Llama2-chat, the fine-tuned version of Llama2 designed to resist jailbreak attacks, achieving good ASR with a suboptimal initial trigger seed. This study demonstrates the feasibility of generating jailbreak attacks against deployed LLMs in the public domain using black-box optimization methods, enabling more comprehensive safety testing of LLMs.
Updated: 2024-06-04 07:27:36
标题: QROA:一种针对LLMs的黑盒查询响应优化攻击
摘要: 最近几个月,大型语言模型(LLMs)在流行度上迅速上升,然而当被操纵时,它们具有生成有害内容的令人担忧的能力。本研究介绍了查询-响应优化攻击(QROA),这是一种基于优化的策略,旨在通过黑盒、仅查询交互来利用LLMs。QROA将一个经过优化的触发器添加到恶意指令中,以迫使LLMs生成有害内容。与先前的方法不同,QROA不需要访问模型的logit信息或任何其他内部数据,仅通过LLMs的标准查询-响应接口进行操作。受深度Q学习和贪婪坐标下降的启发,该方法迭代更新令牌以最大化设计的奖励函数。我们在各种LLMs上测试了我们的方法,如Vicuna、Falcon和Mistral,取得了超过80\%的攻击成功率(ASR)。我们还测试了模型对抗Llama2-chat,这是Llama2的微调版本,旨在抵抗越狱攻击,使用次优的初始触发种子取得了良好的ASR。本研究证明了使用黑盒优化方法在公共领域对部署的LLMs生成越狱攻击的可行性,从而实现更全面的LLMs安全测试。
更新时间: 2024-06-04 07:27:36
领域: cs.CL,cs.LG
DFA-GNN: Forward Learning of Graph Neural Networks by Direct Feedback Alignment
Graph neural networks are recognized for their strong performance across various applications, with the backpropagation (BP) algorithm playing a central role in the development of most GNN models. However, despite its effectiveness, BP has limitations that challenge its biological plausibility and affect the efficiency, scalability, and parallelism of training neural networks for graph-based tasks. While several non-BP training algorithms, such as direct feedback alignment (DFA), have been successfully applied to fully-connected and convolutional network components for handling Euclidean data, directly adapting these non-BP frameworks to manage non-Euclidean graph data in GNN models presents significant challenges. These challenges primarily arise from the violation of the i.i.d. assumption in graph data and the difficulty of accessing prediction errors for all samples (nodes) within the graph. To overcome these obstacles, in this paper we propose DFA-GNN, a novel forward learning framework tailored for GNNs, with a case study on semi-supervised learning. The proposed method breaks the limitations of BP by using a dedicated forward training mechanism. Specifically, DFA-GNN extends the principles of DFA to adapt to graph data and the unique architecture of GNNs, incorporating the information of graph topology into the feedback links to accommodate the non-Euclidean characteristics of graph data. Additionally, for semi-supervised graph learning tasks, we developed a pseudo error generator that spreads residual errors from training data to create a pseudo error for each unlabeled node. These pseudo errors are then utilized to train GNNs using DFA. Extensive experiments on 10 public benchmarks reveal that our learning framework outperforms not only previous non-BP methods but also the standard BP methods, and it exhibits excellent robustness against various types of noise and attacks.
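For readers unfamiliar with direct feedback alignment, the generic DFA update for a single hidden layer looks as follows. DFA-GNN additionally folds graph topology into the feedback links and adds the pseudo error generator, neither of which is reproduced in this generic sketch; shapes and the ReLU choice are assumptions.

```python
import numpy as np

def dfa_layer_update(W, h_in, a_out, error, B, lr=0.01):
    """One direct-feedback-alignment update for a hidden layer (sketch).

    Instead of backpropagating through later layers, the global output
    error is projected through a fixed random matrix B (the feedback link).
    W: [d_in, d_out] weights, h_in: [n, d_in] inputs,
    a_out: [n, d_out] pre-activations, error: [n, d_err] output error,
    B: [d_err, d_out] fixed random feedback matrix.
    """
    relu_grad = (a_out > 0).astype(a_out.dtype)   # derivative of ReLU
    delta = (error @ B) * relu_grad               # projected local error
    W -= lr * h_in.T @ delta / len(h_in)
    return W

rng = np.random.default_rng(0)
W = dfa_layer_update(rng.standard_normal((4, 3)),
                     h_in=rng.standard_normal((8, 4)),
                     a_out=rng.standard_normal((8, 3)),
                     error=rng.standard_normal((8, 2)),
                     B=rng.standard_normal((2, 3)))
```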
Updated: 2024-06-04 07:24:51
标题: DFA-GNN: 通过直接反馈对齐实现图神经网络的前向学习
摘要: 图神经网络以其在各种应用中的强大表现而闻名,反向传播算法在大多数GNN模型的发展中起着核心作用。然而,尽管其有效性,BP存在挑战其生物学可信度并影响基于图的任务的神经网络训练的效率、可伸缩性和并行性的局限性。虽然已成功将几种非BP训练算法(如直接反馈对齐)应用于处理欧几里得数据的全连接和卷积网络组件,但直接将这些非BP框架调整为管理图数据的非欧几里得GNN模型则面临重大挑战。这些挑战主要源于图数据中i.i.d.假设的违反以及难以访问图中所有样本(节点)的预测错误。为克服这些障碍,在本文中,我们提出了DFA-GNN,一种专为GNN定制的前向学习框架,以半监督学习为案例研究。所提出的方法通过使用专门的前向训练机制打破了BP的限制。具体而言,DFA-GNN将DFA的原则延伸到适应图数据和GNN的独特架构,将图拓扑信息整合到反馈链接中,以适应图数据的非欧几里得特性。此外,针对半监督图学习任务,我们开发了一个伪误差生成器,将训练数据中的残差误差传播到每个未标记节点,从而创建伪误差。然后利用这些伪误差使用DFA来训练GNN。在10个公共基准测试上的广泛实验表明,我们的学习框架不仅胜过先前的非BP方法,还胜过标准BP方法,并且对各种噪声和攻击表现出优秀的稳健性。
更新时间: 2024-06-04 07:24:51
领域: cs.LG,cs.AI
A Unifying Framework for Action-Conditional Self-Predictive Reinforcement Learning
Learning a good representation is a crucial challenge for Reinforcement Learning (RL) agents. Self-predictive learning provides means to jointly learn a latent representation and dynamics model by bootstrapping from future latent representations (BYOL). Recent work has developed theoretical insights into these algorithms by studying a continuous-time ODE model for self-predictive representation learning under the simplifying assumption that the algorithm depends on a fixed policy (BYOL-Π); this assumption is at odds with practical instantiations of such algorithms, which explicitly condition their predictions on future actions. In this work, we take a step towards bridging the gap between theory and practice by analyzing an action-conditional self-predictive objective (BYOL-AC) using the ODE framework, characterizing its convergence properties and highlighting important distinctions between the limiting solutions of the BYOL-Π and BYOL-AC dynamics. We show how the two representations are related by a variance equation. This connection leads to a novel variance-like action-conditional objective (BYOL-VAR) and its corresponding ODE. We unify the study of all three objectives through two complementary lenses: a model-based perspective, where each objective is shown to be equivalent to a low-rank approximation of certain dynamics, and a model-free perspective, which establishes relationships between the objectives and their respective value, Q-value, and advantage function. Our empirical investigations, encompassing both linear function approximation and Deep RL environments, demonstrate that BYOL-AC is better overall in a variety of different settings.
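The action-conditional objective can be sketched as a BYOL-style latent prediction loss in which the predictor also receives the action. The network interfaces and the cosine loss form below are assumptions consistent with BYOL-style objectives, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def byol_ac_loss(online_encoder, target_encoder, predictor, obs, action, next_obs):
    """Action-conditional self-predictive (BYOL-AC-style) loss sketch.

    The online network predicts the *next* latent from the current latent
    and the taken action; the target is a stop-gradient encoding of the
    next observation. `action` is assumed to be a float tensor.
    """
    z = online_encoder(obs)                          # [n, d]
    with torch.no_grad():
        z_next_tgt = target_encoder(next_obs)        # [n, d], no gradient
    z_pred = predictor(torch.cat([z, action], dim=-1))
    # Cosine-similarity loss, as in BYOL-style objectives.
    return 2 - 2 * F.cosine_similarity(z_pred, z_next_tgt, dim=-1).mean()
```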
Updated: 2024-06-04 07:22:12
标题: 一个行动条件自预测强化学习的统一框架
摘要: 学习一个良好的表示对于强化学习(RL)代理是一个至关重要的挑战。自预测学习提供了一种方法,通过从未来的潜在表示中引导学习一个潜在表示和动态模型(BYOL)。最近的研究通过研究一个连续时间ODE模型,对这些算法进行了理论洞察,以简化假设为条件,该算法依赖于一个固定策略(BYOL-$\Pi$);这一假设与实际实现这类算法的做法相矛盾,后者明确地将其预测条件设置为未来的行动。在这项工作中,我们通过分析一个行动条件的自预测目标(BYOL-AC)使用ODE框架,对其收敛性质进行了研究,并突出了BYOL-$\Pi$和BYOL-AC动态的极限解之间的重要区别。我们展示了这两种表示是如何通过方差方程相关联的。这种联系导致了一种新颖的类似方差的行动条件目标(BYOL-VAR)及其相应的ODE。我们通过两种互补的视角统一了对所有三个目标的研究;一种基于模型的视角,其中显示每个目标等效于某些动态的低秩逼近,另一种基于无模型的视角,建立了目标及其相应价值、Q值和优势函数之间的关系。我们的实证研究涵盖了线性函数逼近和深度RL环境,证明了BYOL-AC在各种不同设置下总体表现更好。
更新时间: 2024-06-04 07:22:12
领域: cs.LG,cs.AI
DisCo: Towards Harmonious Disentanglement and Collaboration between Tabular and Semantic Space for Recommendation
Recommender systems play important roles in various applications such as e-commerce and social media. Conventional recommendation methods usually model the collaborative signals within the tabular representation space. Despite their strengths in personalization modeling and efficiency, they omit the latent semantic dependencies. Methods that introduce semantics into recommendation have since emerged, injecting knowledge from the semantic representation space in which general language understanding is compressed. However, existing semantic-enhanced recommendation methods focus on aligning the two spaces, during which the representations of the two spaces tend to get close while the unique patterns are discarded and not well explored. In this paper, we propose DisCo to Disentangle the unique patterns from the two representation spaces and Collaborate the two spaces for recommendation enhancement, where both the specificity and the consistency of the two spaces are captured. Concretely, we propose 1) a dual-side attentive network to capture the intra-domain patterns and the inter-domain patterns, 2) a sufficiency constraint to preserve the task-relevant information of each representation space and filter out the noise, and 3) a disentanglement constraint to prevent the model from discarding the unique information. These modules strike a balance between disentanglement and collaboration of the two representation spaces to produce informative pattern vectors, which can serve as extra features and be appended to arbitrary recommendation backbones for enhancement. Experimental results validate the superiority of our method over different models and the compatibility of DisCo with different backbones. Various ablation studies and efficiency analyses are also conducted to justify each model component.
Updated: 2024-06-04 07:17:46
标题: DisCo:推荐中表格和语义空间之间和谐解缠和协作的探索
摘要: 推荐系统在电子商务、社交媒体等各种应用中发挥着重要作用。传统的推荐方法通常在表格表示空间内建模协同信号。尽管个性化建模和效率很重要,但潜在的语义依赖被忽略了。随后出现了将语义引入推荐的方法,从语义表示空间注入知识,其中包括压缩了一般语言理解的知识。然而,现有的语义增强推荐方法侧重于对齐两个空间,在此过程中,两个空间的表示往往会接近,而独特模式被丢弃并未充分探索。在本文中,我们提出了DisCo来解开两个表示空间中的独特模式,并协作两个空间以增强推荐,在其中捕捉了两个空间的特定性和一致性。具体来说,我们提出了1)双侧注意网络来捕捉领域内模式和领域间模式,2)充分约束来保留每个表示空间的任务相关信息并过滤噪音,3)解开约束以避免模型丢弃独特信息。这些模块在解开和协作两个表示空间之间取得了平衡,产生了信息丰富的模式向量,可以作为额外特征附加到任意推荐骨干上以增强。实验结果验证了我们的方法对不同模型的优越性以及DisCo在不同骨干上的兼容性。还进行了各种消融研究和效率分析,以证明每个模型组件的合理性。
更新时间: 2024-06-04 07:17:46
领域: cs.IR,cs.AI
ODA: Observation-Driven Agent for integrating LLMs and Knowledge Graphs
The integration of Large Language Models (LLMs) and knowledge graphs (KGs) has achieved remarkable success in various natural language processing tasks. However, existing methodologies that integrate LLMs and KGs often navigate the task-solving process solely based on the LLM's analysis of the question, overlooking the rich cognitive potential inherent in the vast knowledge encapsulated in KGs. To address this, we introduce Observation-Driven Agent (ODA), a novel AI agent framework tailored for tasks involving KGs. ODA incorporates KG reasoning abilities via global observation, which enhances reasoning capabilities through a cyclical paradigm of observation, action, and reflection. Confronting the exponential explosion of knowledge during observation, we innovatively design a recursive observation mechanism. Subsequently, we integrate the observed knowledge into the action and reflection modules. Through extensive experiments, ODA demonstrates state-of-the-art performance on several datasets, notably achieving accuracy improvements of 12.87% and 8.9%.
Updated: 2024-06-04 07:16:14
标题: ODA: 观察驱动代理用于集成LLMs和知识图(ODA: Observation-Driven Agent for integrating LLMs and Knowledge Graphs)
摘要: 大型语言模型(LLMs)和知识图谱(KGs)的整合在各种自然语言处理任务中取得了显著的成功。然而,现有的整合LLMs和KGs的方法往往仅基于LLM对问题的分析导航任务解决过程,忽视了KGs中所蕴含的丰富认知潜力。为了解决这一问题,我们引入了Observation-Driven Agent(ODA),这是一种专为涉及KGs的任务量身定制的新型AI代理框架。ODA通过全局观察融入KG推理能力,通过观察、行动和反思的循环范式增强推理能力。面对观察过程中知识的指数级增长,我们创新地设计了一个递归观察机制。随后,我们将观察到的知识整合到行动和反思模块中。通过大量实验,ODA在几个数据集上展示了最先进的性能,尤其在准确性方面实现了12.87%和8.9%的改进。
更新时间: 2024-06-04 07:16:14
领域: cs.CL,cs.AI
Multimodal Reasoning with Multimodal Knowledge Graph
Multimodal reasoning with large language models (LLMs) often suffers from hallucinations and the presence of deficient or outdated knowledge within LLMs. Some approaches have sought to mitigate these issues by employing textual knowledge graphs, but their singular modality of knowledge limits comprehensive cross-modal understanding. In this paper, we propose the Multimodal Reasoning with Multimodal Knowledge Graph (MR-MKG) method, which leverages multimodal knowledge graphs (MMKGs) to learn rich and semantic knowledge across modalities, significantly enhancing the multimodal reasoning capabilities of LLMs. In particular, a relation graph attention network is utilized for encoding MMKGs and a cross-modal alignment module is designed for optimizing image-text alignment. A MMKG-grounded dataset is constructed to equip LLMs with initial expertise in multimodal reasoning through pretraining. Remarkably, MR-MKG achieves superior performance while training on only a small fraction of parameters, approximately 2.25% of the LLM's parameter size. Experimental results on multimodal question answering and multimodal analogy reasoning tasks demonstrate that our MR-MKG method outperforms previous state-of-the-art models.
Updated: 2024-06-04 07:13:23
标题: 多模态知识图谱的多模态推理
摘要: 使用大型语言模型(LLM)进行多模态推理通常会出现幻觉,并且LLM中存在不足或过时的知识。一些方法试图通过使用文本知识图来缓解这些问题,但它们的单一知识模态限制了跨模态理解的全面性。在本文中,我们提出了多模态知识图(MMKG)的多模态推理方法(MR-MKG),利用MMKG来跨模态学习丰富和语义的知识,显著增强LLM的多模态推理能力。具体来说,采用关系图注意力网络来编码MMKG,并设计了跨模态对齐模块来优化图像-文本对齐。构建了一个基于MMKG的数据集,为LLM提供了多模态推理的初步专业知识。值得注意的是,在仅使用LLM参数大小的约2.25%进行训练时,MR-MKG实现了更好的性能。在多模态问答和多模态类比推理任务上的实验结果表明,我们的MR-MKG方法优于先前的最先进模型。
更新时间: 2024-06-04 07:13:23
领域: cs.CL,cs.AI
Inference Attacks in Machine Learning as a Service: A Taxonomy, Review, and Promising Directions
The prosperity of machine learning has also brought people's concerns about data privacy. Among them, inference attacks can cause privacy breaches in various MLaaS scenarios and model training/prediction phases. Specifically, inference attacks can perform privacy inference on undisclosed target training sets based on outputs of the target model, including but not limited to statistics, membership, semantics, and data representation - for instance, inferring whether the target data has the characteristics of AIDS patients. In addition, the rapid development of the machine learning community in recent years, especially the surge of model types and application scenarios, has further stimulated research on inference attacks. Thus, studying inference attacks and analyzing them in depth is urgent and significant. However, there is still a gap in the systematic discussion of inference attacks from the perspectives of taxonomy, global overview, attack, and defense. This survey provides an in-depth and comprehensive review of inference attacks and corresponding countermeasures in ML-as-a-service, based on a taxonomy and the latest research. Without compromising researchers' intuition, we first propose the 3MP taxonomy based on the state of community research, aiming to normalize the confusing naming system of inference attacks. We also analyze the pros and cons of each type of inference attack, its workflow, countermeasures, and how it interacts with other attacks. In the end, we point out several promising directions for researchers from a more comprehensive and novel perspective.
Updated: 2024-06-04 07:06:06
标题: 机器学习作为服务中的推断攻击:分类、评论和前景方向
摘要: 机器学习的繁荣也引起了人们对数据隐私的关注。其中,推理攻击可以在各种MLaaS场景和模型训练/预测阶段实施隐私侵犯。具体来说,推理攻击可以基于目标模型的输出,在未公开的目标训练集上执行隐私推理,包括但不限于统计数据、成员身份、语义、数据表示等。例如,推断目标数据是否具有艾滋病特征。此外,近年来机器学习社区的迅速发展,特别是模型类型和应用场景的激增,进一步激发了推理攻击研究的兴趣。因此,研究推理攻击并深入分析是迫切和重要的。然而,从分类、全球视角、攻击和防御角度对推理攻击进行系统讨论仍存在差距。本调查基于分类和最新研究,提供了对ML-as-a-service中推理攻击及相应对策的深入和全面推理。在不影响研究人员的直觉的情况下,我们首先根据社区研究进展提出了3MP分类法,试图规范推理攻击混乱的命名系统。此外,我们分析了每种推理攻击的优缺点、工作流程、对策以及它们如何与其他攻击相互作用。最后,我们从更全面和新颖的角度指出了几个有前途的研究方向。
更新时间: 2024-06-04 07:06:06
领域: cs.LG,cs.AI,cs.CR,cs.CV
Efficient Detection of LLM-generated Texts with a Bayesian Surrogate Model
The detection of machine-generated text, especially from large language models (LLMs), is crucial in preventing serious social problems resulting from their misuse. Some methods train dedicated detectors on specific datasets but fall short in generalizing to unseen test data, while other zero-shot ones often yield suboptimal performance. Although the recent DetectGPT has shown promising detection performance, it suffers from significant inefficiency issues, as detecting a single candidate requires querying the source LLM with hundreds of its perturbations. This paper aims to bridge this gap. Concretely, we propose to incorporate a Bayesian surrogate model, which allows us to select typical samples based on Bayesian uncertainty and interpolate scores from typical samples to other samples, to improve query efficiency. Empirical results demonstrate that our method significantly outperforms existing approaches under a low query budget. Notably, when detecting the text generated by LLaMA family models, our method with just 2 or 3 queries can outperform DetectGPT with 200 queries.
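The query-saving idea can be sketched as Gaussian-process active learning over candidate perturbations: query the expensive oracle only at the most uncertain points, then interpolate the remaining scores from the posterior mean. The GP kernel defaults, the acquisition rule, and the feature representation below are assumptions for illustration.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def surrogate_scores(features, query_fn, budget=3):
    """Estimate per-perturbation scores with few oracle queries (sketch).

    features: [n, d] embeddings of the n candidate perturbations.
    query_fn: expensive oracle returning the true score for one index
        (e.g., the source-LLM log-probability used by DetectGPT).
    """
    gp = GaussianProcessRegressor()
    queried, ys = [], []
    for _ in range(budget):
        if queried:
            gp.fit(features[queried], ys)
            _, std = gp.predict(features, return_std=True)
            std[queried] = -np.inf            # do not re-query
            i = int(np.argmax(std))           # highest Bayesian uncertainty
        else:
            i = 0                             # arbitrary first query
        queried.append(i)
        ys.append(query_fn(i))
    gp.fit(features[queried], ys)
    return gp.predict(features)               # interpolated scores
```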
Updated: 2024-06-04 07:05:48
标题: 高效检测使用贝叶斯替代模型生成的LLM文本
摘要: 机器生成文本的检测,尤其是来自大型语言模型(LLMs),对于防止由其误用造成的严重社会问题至关重要。一些方法在特定数据集上训练专用检测器,但在泛化到未见测试数据时表现不佳,而其他零射击方法往往产生次优性能。尽管最近的DetectGPT表现出有希望的检测性能,但存在显著的效率问题,因为检测一个候选需要查询源LLM并使用数百个扰动。本文旨在弥合这一差距。具体来说,我们提出融入贝叶斯代理模型,这使我们能够基于贝叶斯不确定性选择典型样本,并将得分从典型样本插值到其他样本,以提高查询效率。实证结果表明,我们的方法在低查询预算下明显优于现有方法。值得注意的是,当检测由LLaMA系列模型生成的文本时,我们的方法仅需2或3次查询即可优于使用200次查询的DetectGPT。
更新时间: 2024-06-04 07:05:48
领域: cs.LG,cs.AI,cs.CL
Deep Limit Order Book Forecasting
We exploit cutting-edge deep learning methodologies to explore the predictability of high-frequency Limit Order Book mid-price changes for a heterogeneous set of stocks traded on the NASDAQ exchange. In so doing, we release `LOBFrame', an open-source code base to efficiently process large-scale Limit Order Book data and quantitatively assess state-of-the-art deep learning models' forecasting capabilities. Our results are twofold. We demonstrate that the stocks' microstructural characteristics influence the efficacy of deep learning methods and that their high forecasting power does not necessarily correspond to actionable trading signals. We argue that traditional machine learning metrics fail to adequately assess the quality of forecasts in the Limit Order Book context. As an alternative, we propose an innovative operational framework that evaluates predictions' practicality by focusing on the probability of accurately forecasting complete transactions. This work offers academics and practitioners an avenue to make informed and robust decisions on the application of deep learning techniques, their scope and limitations, effectively exploiting emergent statistical properties of the Limit Order Book.
Updated: 2024-06-04 07:05:33
标题: 深层限价订单簿预测
摘要: 我们利用尖端深度学习方法来探索在纳斯达克交易所交易的一组异质股票的高频限价订单簿中间价变动的可预测性。在此过程中,我们发布了“LOBFrame”,这是一个开源代码库,用于高效处理大规模限价订单簿数据,并定量评估最先进的深度学习模型的预测能力。我们的结果是双重的。我们证明了股票的微观结构特征影响深度学习方法的有效性,并且它们的高预测能力不一定对应可操作的交易信号。我们认为,传统的机器学习指标未能充分评估限价订单簿背景下预测的质量。作为替代方案,我们提出了一种创新的操作框架,通过关注准确预测完整交易的概率来评估预测的实用性。这项工作为学术界和实践者提供了一个途径,使他们能够对深度学习技术的应用、它们的范围和限制做出明智和稳健的决策,有效地利用限价订单簿的新兴统计特性。
更新时间: 2024-06-04 07:05:33
领域: q-fin.TR,cs.LG
Verifying the Generalization of Deep Learning to Out-of-Distribution Domains
Deep neural networks (DNNs) play a crucial role in the field of machine learning, demonstrating state-of-the-art performance across various application domains. However, despite their success, DNN-based models may occasionally exhibit challenges with generalization, i.e., may fail to handle inputs that were not encountered during training. This limitation is a significant challenge when it comes to deploying deep learning for safety-critical tasks, as well as in real-world settings characterized by substantial variability. We introduce a novel approach for harnessing DNN verification technology to identify DNN-driven decision rules that exhibit robust generalization to previously unencountered input domains. Our method assesses generalization within an input domain by measuring the level of agreement between independently trained deep neural networks for inputs in this domain. We also efficiently realize our approach by using off-the-shelf DNN verification engines, and extensively evaluate it on both supervised and unsupervised DNN benchmarks, including a deep reinforcement learning (DRL) system for Internet congestion control -- demonstrating the applicability of our approach for real-world settings. Moreover, our research introduces a fresh objective for formal verification, offering the prospect of mitigating the challenges linked to deploying DNN-driven systems in real-world scenarios.
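The agreement measure can be approximated empirically as follows; note that the paper certifies the property with off-the-shelf DNN verification engines rather than by sampling, so this sketch only conveys the quantity being assessed. Models are assumed to be callables returning class logits.

```python
import numpy as np

def agreement_score(models, inputs):
    """Fraction of inputs on which independently trained models all agree.

    High agreement over an input domain is taken as evidence that the
    learned decision rule generalizes to that domain.
    """
    preds = np.stack([m(inputs).argmax(axis=-1) for m in models])  # [k, n]
    unanimous = (preds == preds[0]).all(axis=0)
    return unanimous.mean()

# Toy check with two "models" that always agree:
f = lambda x: np.eye(3)[x % 3]                   # fake logits
print(agreement_score([f, f], np.arange(10)))    # -> 1.0
```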
Updated: 2024-06-04 07:02:59
标题: 验证深度学习对于超出分布领域的泛化效果
摘要: 深度神经网络(DNNs)在机器学习领域发挥着至关重要的作用,展示出在各种应用领域中的最先进性能。然而,尽管取得了成功,基于DNN的模型偶尔可能出现泛化方面的挑战,即可能无法处理在训练过程中未遇到的输入。在将深度学习应用于安全关键任务以及在实际世界中存在大量变量的情况下,这种限制是一个重要挑战。我们引入了一种新方法,利用DNN验证技术来识别那些对以前未遇到的输入领域具有强大泛化能力的DNN驱动决策规则。我们的方法通过测量在该输入领域中独立训练的深度神经网络对输入之间的一致性水平来评估泛化能力。我们还通过使用现成的DNN验证引擎高效实现了我们的方法,并在监督和无监督的DNN基准测试中进行了广泛评估,包括用于互联网拥塞控制的深度强化学习(DRL)系统,展示了我们方法在实际环境中的适用性。此外,我们的研究引入了一个新的目标,即形式验证,为在实际场景中部署DNN驱动系统所面临的挑战提供缓解的可能性。
更新时间: 2024-06-04 07:02:59
领域: cs.LG,cs.LO
ShadowBound: Efficient Heap Memory Protection Through Advanced Metadata Management and Customized Compiler Optimization
In software development, the prevalence of unsafe languages such as C and C++ introduces potential vulnerabilities, especially within the heap, a pivotal component for dynamic memory allocation. Despite its significance, heap management complexities have made heap corruption pervasive, posing severe threats to system security. While prior solutions aiming for temporal and spatial memory safety exhibit overheads deemed impractical, we present ShadowBound, a unique heap memory protection design. At its core, ShadowBound is an efficient out-of-bounds defense that can work with various use-after-free defenses (e.g. MarkUs, FFMalloc, PUMM) without compatibility constraints. We harness a shadow memory-based metadata management mechanism to store heap chunk boundaries and apply customized compiler optimizations tailored for boundary checking. We implemented ShadowBound atop the LLVM framework and integrated three state-of-the-art use-after-free defenses. Our evaluations show that ShadowBound provides robust heap protection with minimal time and memory overhead, suggesting its effectiveness and efficiency in safeguarding real-world programs against prevalent heap vulnerabilities.
Updated: 2024-06-04 07:02:53
标题: ShadowBound:通过先进的元数据管理和定制的编译器优化实现高效的堆内存保护
摘要: 在软件开发中,诸如C和C++等不安全语言的广泛使用引入了潜在的漏洞,特别是在堆中,这是动态内存分配的关键组件。尽管堆的重要性,堆管理的复杂性使堆破坏普遍存在,对系统安全构成严重威胁。尽管先前的解决方案旨在实现时间和空间内存安全,但被认为是不切实际的开销,我们提出了ShadowBound,这是一种独特的堆内存保护设计。在其核心,ShadowBound是一种高效的越界防御,可以与各种用完后释放后的防御(例如MarkUs、FFMalloc、PUMM)一起工作,而无需兼容性约束。我们利用基于影子内存的元数据管理机制来存储堆块边界,并应用定制的编译器优化,专门用于边界检查。我们在LLVM框架上实现了ShadowBound,并集成了三种最先进的用完后释放后的防御。我们的评估表明,ShadowBound提供了强大的堆内存保护,几乎没有时间和内存开销,表明其在保护现实世界程序免受普遍堆漏洞方面的有效性和效率。
更新时间: 2024-06-04 07:02:53
领域: cs.CR
MetaMixer Is All You Need
Transformer, composed of self-attention and a Feed-Forward Network (FFN), has revolutionized the landscape of network design across various vision tasks. The FFN is a versatile operator seamlessly integrated into nearly all AI models to effectively harness rich representations. Recent works also show that the FFN functions like key-value memories. Thus, akin to the query-key-value mechanism within self-attention, the FFN can be viewed as a memory network, where the input serves as the query and the two projection weights operate as keys and values, respectively. We hypothesize that the importance lies in the query-key-value framework itself rather than in self-attention. To verify this, we propose converting self-attention into a more FFN-like efficient token mixer with only convolutions while retaining the query-key-value framework, namely FFNification. Specifically, FFNification replaces query-key and attention coefficient-value interactions with large-kernel convolutions and adopts the GELU activation function instead of softmax. The derived token mixer, FFNified attention, serves as key-value memories for detecting locally distributed spatial patterns, and operates in the opposite dimension to the ConvNeXt block within each corresponding sub-operation of the query-key-value framework. Building upon the above two modules, we present a family of Fast-Forward Networks. Our FFNet achieves remarkable performance improvements over previous state-of-the-art methods across a wide range of tasks. The strong and general performance of our proposed method validates our hypothesis and leads us to introduce MetaMixer, a general mixer architecture that does not specify sub-operations within the query-key-value framework. We show that using only simple operations like convolution and GELU in the MetaMixer can achieve superior performance.
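A minimal PyTorch sketch of an FFNified token mixer in the spirit described above: depthwise large-kernel convolutions stand in for the query-key and coefficient-value interactions, with GELU replacing softmax. The kernel size, the absence of normalization, and the exact wiring are assumptions rather than the released architecture.

```python
import torch
import torch.nn as nn

class FFNifiedAttention(nn.Module):
    """Token mixer keeping the query-key-value framework but replacing the
    query-key and coefficient-value interactions with large-kernel
    depthwise convolutions and GELU (a sketch, not the released design).
    """
    def __init__(self, dim, kernel_size=7):
        super().__init__()
        pad = kernel_size // 2
        self.qk_mix = nn.Conv2d(dim, dim, kernel_size, padding=pad, groups=dim)
        self.act = nn.GELU()                    # replaces softmax
        self.v_mix = nn.Conv2d(dim, dim, kernel_size, padding=pad, groups=dim)
        self.proj = nn.Conv2d(dim, dim, 1)

    def forward(self, x):                       # x: [B, C, H, W]
        coef = self.act(self.qk_mix(x))         # "attention coefficients"
        return self.proj(self.v_mix(coef * x))  # coefficient-value mixing

x = torch.randn(2, 64, 14, 14)
print(FFNifiedAttention(64)(x).shape)
```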
Updated: 2024-06-04 07:00:14
标题: MetaMixer 是你所需要的一切
摘要: 变压器由自我注意力和前馈网络组成,已经在各种视觉任务的网络设计领域引起了革命性的变化。前馈网络是一种多功能的运算符,可以无缝地集成到几乎所有人工智能模型中,以有效地利用丰富的表示。最近的研究也表明,前馈网络类似于键值内存。因此,类似于自我注意力中的查询-键-值机制,前馈网络可以被视为一个内存网络,其中输入作为查询,两个投影权重分别作为键和值。我们假设重要性在于查询-键-值框架本身,而不在于自我注意力。为了验证这一点,我们提出将自我注意力转换为更类似于前馈网络的高效令牌混合器,只使用卷积而保留查询-键-值框架,即FFNification。具体而言,FFNification用大内核卷积替换查询-键和注意力系数-值的交互,并采用GELU激活函数,而不是softmax。衍生的令牌混合器,FFNified注意力,用于检测局部分布的空间模式,并在查询-键-值框架的每个对应子操作中的相反维度上运行ConvNeXt块。基于上述两个模块,我们提出了一系列快速前向网络。我们的FFNet在各种任务中实现了明显的性能提升,超过了先前的最先进方法。我们提出的方法的强大和通用性能验证了我们的假设,并引导我们引入MetaMixer,一个不指定查询-键-值框架内子操作的通用混合器架构。我们展示,仅使用简单的运算,如卷积和GELU,在MetaMixer中可以实现更好的性能。
更新时间: 2024-06-04 07:00:14
领域: cs.CV,cs.AI,cs.LG
Why Would You Suggest That? Human Trust in Language Model Responses
The emergence of Large Language Models (LLMs) has revealed a growing need for human-AI collaboration, especially in creative decision-making scenarios where trust and reliance are paramount. Through human studies and model evaluations on the open-ended News Headline Generation task from the LaMP benchmark, we analyze how the framing and presence of explanations affect user trust and model performance. Overall, we provide evidence that adding an explanation in the model response to justify its reasoning significantly increases self-reported user trust in the model when the user has the opportunity to compare various responses. Position and faithfulness of these explanations are also important factors. However, these gains disappear when users are shown responses independently, suggesting that humans trust all model responses, including deceptive ones, equitably when they are shown in isolation. Our findings urge future research to delve deeper into the nuanced evaluation of trust in human-machine teaming systems.
Updated: 2024-06-04 06:57:47
标题: 为什么你会建议这样做?人类对语言模型回复的信任
摘要: 大型语言模型(LLMs)的出现揭示了人工智能协作的日益增长的需求,特别是在创造性决策场景中,信任和依赖至关重要。通过对LaMP基准测试中开放式新闻标题生成任务的人类研究和模型评估,我们分析了解释的框架和存在如何影响用户信任和模型性能。总体而言,我们提供证据表明,在用户有机会比较各种响应时,向模型响应中添加解释以证明其推理显著增加了用户对模型的自我报告信任。这些解释的位置和忠实度也是重要因素。然而,当用户独立显示响应时,这些增益消失,这表明当这些响应独立显示时,人类信任所有模型响应,包括欺骗性的响应。我们的发现敦促未来研究深入探讨对人机协作系统信任的微妙评估。
更新时间: 2024-06-04 06:57:47
领域: cs.CL,cs.AI,cs.HC
Noise-aware Speech Enhancement using Diffusion Probabilistic Model
With recent advances in diffusion models, generative speech enhancement (SE) has attracted a surge of research interest due to its great potential for unseen testing noises. However, existing efforts mainly focus on inherent properties of clean speech, under-exploiting the varying noise information in the real world. In this paper, we propose a noise-aware speech enhancement (NASE) approach that extracts noise-specific information to guide the reverse process in a diffusion model. Specifically, we design a noise classification (NC) model to produce an acoustic embedding as a noise conditioner to guide the reverse denoising process. Meanwhile, a multi-task learning scheme is devised to jointly optimize the SE and NC tasks to enhance the noise specificity of the conditioner. NASE is shown to be a plug-and-play module that can be generalized to any diffusion SE model. Experiments on the VB-DEMAND dataset show that NASE effectively improves multiple mainstream diffusion SE models, especially on unseen noises.
Updated: 2024-06-04 06:57:43
标题: 噪声感知语音增强技术:基于扩散概率模型
摘要: 随着扩散模型的最新进展,生成性语音增强(SE)由于其对未知测试噪声具有巨大潜力,已引起了研究兴趣的激增。然而,现有研究主要集中在清晰语音的固有特性上,未充分利用现实世界中变化的噪声信息。在本文中,我们提出了一种噪声感知语音增强(NASE)方法,该方法提取特定于噪声的信息,以指导扩散模型中的反向过程。具体来说,我们设计了一个噪声分类(NC)模型,将声学嵌入作为噪声条件器,以指导反向降噪过程。同时,我们设计了一个多任务学习方案,共同优化SE和NC任务,以增强条件器的噪声特异性。NASE被证明是一个可以推广到任何扩散SE模型的即插即用模块。在VB-DEMAND数据集上的实验表明,NASE有效地改进了多个主流扩散SE模型,特别是在未知噪声上。
更新时间: 2024-06-04 06:57:43
领域: eess.AS,cs.LG,cs.SD
On the Mode-Seeking Properties of Langevin Dynamics
The Langevin Dynamics framework, which aims to generate samples from the score function of a probability distribution, is widely used for analyzing and interpreting score-based generative modeling. While the convergence behavior of Langevin Dynamics under unimodal distributions has been extensively studied in the literature, in practice the data distribution could consist of multiple distinct modes. In this work, we investigate Langevin Dynamics in producing samples from multimodal distributions and theoretically study its mode-seeking properties. We prove that under a variety of sub-Gaussian mixtures, Langevin Dynamics is unlikely to find all mixture components within a sub-exponential number of steps in the data dimension. To reduce the mode-seeking tendencies of Langevin Dynamics, we propose Chained Langevin Dynamics, which divides the data vector into patches of constant size and generates every patch sequentially conditioned on the previous patches. We perform a theoretical analysis of Chained Langevin Dynamics by reducing it to sampling from a constant-dimensional distribution. We present the results of several numerical experiments on synthetic and real image datasets, supporting our theoretical results on the iteration complexities of sample generation from mixture distributions using the chained and vanilla Langevin Dynamics. The code is available at https://github.com/Xiwei-Cheng/Chained_LD.
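A compact sketch of Chained Langevin Dynamics: the sample is generated patch by patch, with each patch refined by Langevin steps conditioned on the already-generated (frozen) prefix. The score_fn interface, the constant step size, and the assumption that dim is divisible by patch are simplifications for illustration.

```python
import numpy as np

def chained_langevin(score_fn, dim, patch=16, steps=200, eps=1e-3, rng=None):
    """Chained Langevin Dynamics (sketch): generate patch by patch.

    score_fn(x_prefix) is assumed to return the score of the conditional
    distribution over the full current vector; only the gradient w.r.t.
    the active patch is used. Assumes dim % patch == 0.
    """
    rng = rng or np.random.default_rng()
    x = np.zeros(0)
    for _ in range(dim // patch):
        chunk = rng.standard_normal(patch)           # fresh patch init
        for _ in range(steps):
            cur = np.concatenate([x, chunk])
            grad = score_fn(cur)[-patch:]            # score for the patch
            chunk = chunk + 0.5 * eps * grad \
                    + np.sqrt(eps) * rng.standard_normal(patch)
        x = np.concatenate([x, chunk])               # freeze and move on
    return x
```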
Updated: 2024-06-04 06:57:12
Domains: cs.LG,stat.ML
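For reference, a minimal NumPy sketch of the two samplers discussed above, vanilla Langevin dynamics and the patch-wise chained variant; the score functions are assumed given (analytic or from a trained network), and all hyperparameters are illustrative:

    import numpy as np

    def langevin(score, x0, step=1e-2, n_steps=1000, rng=None):
        """Vanilla Langevin dynamics: x <- x + step*score(x) + sqrt(2*step)*z."""
        rng = rng or np.random.default_rng(0)
        x = x0.copy()
        for _ in range(n_steps):
            x = x + step * score(x) + np.sqrt(2 * step) * rng.standard_normal(x.shape)
        return x

    def chained_langevin(cond_score, x0, patch=4, step=1e-2, n_steps=1000, rng=None):
        """Chained variant: generate constant-size patches sequentially, each
        sampled by Langevin dynamics conditioned on the patches already drawn."""
        rng = rng or np.random.default_rng(0)
        x = x0.copy()
        for start in range(0, x.size, patch):
            sl = slice(start, start + patch)
            for _ in range(n_steps):
                g = cond_score(x, sl)  # score of this patch given earlier patches
                x[sl] = x[sl] + step * g + np.sqrt(2 * step) * rng.standard_normal(x[sl].shape)
        return x

    # Sanity check on a standard Gaussian, whose score is -x.
    sample = langevin(lambda x: -x, np.zeros(8))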
Adaptive and Optimal Second-order Optimistic Methods for Minimax Optimization
We propose adaptive, line search-free second-order methods with optimal rate of convergence for solving convex-concave min-max problems. By means of an adaptive step size, our algorithms feature a simple update rule that requires solving only one linear system per iteration, eliminating the need for line search or backtracking mechanisms. Specifically, we base our algorithms on the optimistic method and appropriately combine it with second-order information. Moreover, distinct from common adaptive schemes, we define the step size recursively as a function of the gradient norm and the prediction error in the optimistic update. We first analyze a variant where the step size requires knowledge of the Lipschitz constant of the Hessian. Under the additional assumption of Lipschitz continuous gradients, we further design a parameter-free version by tracking the Hessian Lipschitz constant locally and ensuring the iterates remain bounded. We also evaluate the practical performance of our algorithm by comparing it to existing second-order algorithms for minimax optimization.
Updated: 2024-06-04 06:56:41
Domains: math.OC,cs.LG,stat.ML
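For intuition, here is a sketch of the first-order optimistic update the paper builds on, applied to a bilinear toy problem; the paper's actual methods add second-order information, a recursive adaptive step size, and one linear solve per iteration, none of which is shown here:

    import numpy as np

    def ogda(grad_x, grad_y, x, y, eta=0.1, n_iters=500):
        """Optimistic gradient descent-ascent for min_x max_y f(x, y):
        w_{k+1} = w_k - eta * (2*F(w_k) - F(w_{k-1})), with F the gradient
        operator (descent in x, ascent in y)."""
        gx_prev, gy_prev = grad_x(x, y), grad_y(x, y)
        for _ in range(n_iters):
            gx, gy = grad_x(x, y), grad_y(x, y)
            x = x - eta * (2 * gx - gx_prev)
            y = y + eta * (2 * gy - gy_prev)
            gx_prev, gy_prev = gx, gy
        return x, y

    # Bilinear toy problem f(x, y) = x*y, saddle point at (0, 0).
    x, y = ogda(lambda x, y: y, lambda x, y: x, 1.0, 1.0)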
PLA4D: Pixel-Level Alignments for Text-to-4D Gaussian Splatting
As text-conditioned diffusion models (DMs) achieve breakthroughs in image, video, and 3D generation, the research community's focus has shifted to the more challenging task of text-to-4D synthesis, which introduces a temporal dimension to generate dynamic 3D objects. In this context, we identify Score Distillation Sampling (SDS), a widely used technique for text-to-3D synthesis, as a significant hindrance to text-to-4D performance due to its Janus-faced and texture-unrealistic problems coupled with high computational costs. In this paper, we propose Pixel-Level Alignments for Text-to-4D Gaussian Splatting (PLA4D), a novel method that utilizes text-to-video frames as explicit pixel alignment targets to generate static 3D objects and inject motion into them. Specifically, we introduce Focal Alignment to calibrate camera poses for rendering and GS-Mesh Contrastive Learning to distill geometry priors from rendered image contrasts at the pixel level. Additionally, we develop Motion Alignment using a deformation network to drive changes in Gaussians and implement Reference Refinement for smooth 4D object surfaces. These techniques enable 4D Gaussian Splatting to align geometry, texture, and motion with generated videos at the pixel level. Compared to previous methods, PLA4D produces synthesized outputs with better texture details in less time and effectively mitigates the Janus-faced problem. PLA4D is fully implemented using open-source models, offering an accessible, user-friendly, and promising direction for 4D digital content creation. Our project page: https://github.com/MiaoQiaowei/PLA4D.github.io.
Updated: 2024-06-04 06:56:39
Domains: cs.CV,cs.AI
LDB: A Large Language Model Debugger via Verifying Runtime Execution Step-by-step
Large language models (LLMs) are leading significant progress in code generation. Beyond one-pass code generation, recent works further integrate unit tests and program verifiers into LLMs to iteratively refine the generated programs. However, these works consider the generated programs as an indivisible entity, which falls short for LLMs in debugging the programs, especially when the programs contain complex logic flows and data operations. In contrast, when human developers debug programs, they typically set breakpoints and selectively examine runtime execution information. The execution flow and the intermediate variables play a crucial role in the debugging process, yet they are underutilized in the existing literature on code generation. In this study, we introduce Large Language Model Debugger (LDB), a novel debugging framework that enables LLMs to refine their generated programs with the runtime execution information. Specifically, LDB segments the programs into basic blocks and tracks the values of intermediate variables after each block throughout the runtime execution. This allows LLMs to concentrate on simpler code units within the overall execution flow, verify their correctness against the task description block by block, and efficiently pinpoint any potential errors. Experiments demonstrate that LDB consistently enhances the baseline performance by up to 9.8% across the HumanEval, MBPP, and TransCoder benchmarks, achieving new state-of-the-art performance in code debugging for various LLM selections.
Updated: 2024-06-04 06:55:27
Domains: cs.SE,cs.AI,cs.CL
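As a rough illustration of the kind of runtime information LDB exploits, the sketch below records local-variable snapshots per executed line with `sys.settrace`; true basic-block segmentation and the LLM feedback loop are omitted, and the helper names are hypothetical:

    import sys

    def trace_locals(fn, *args):
        """Run `fn` and record local-variable snapshots after each executed
        line, a per-line approximation of LDB's per-block tracking."""
        history = []

        def tracer(frame, event, arg):
            if event == "line" and frame.f_code is fn.__code__:
                history.append((frame.f_lineno, dict(frame.f_locals)))
            return tracer

        sys.settrace(tracer)
        try:
            result = fn(*args)
        finally:
            sys.settrace(None)
        return result, history

    def buggy_mean(xs):
        total = 0
        for x in xs:
            total += x
        return total / (len(xs) - 1)   # off-by-one bug the trace helps expose

    value, trace = trace_locals(buggy_mean, [1, 2, 3])
    for lineno, snapshot in trace:
        print(lineno, snapshot)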
Parameterizing Federated Continual Learning for Reproducible Research
Federated Learning (FL) systems evolve in heterogeneous and ever-evolving environments that challenge their performance. Under real deployments, the learning tasks of clients can also evolve with time, which calls for the integration of methodologies such as Continual Learning. To enable research reproducibility, we propose a set of experimental best practices that precisely capture and emulate complex learning scenarios. Our framework, Freddie, is the first entirely configurable framework for Federated Continual Learning (FCL), and it can be seamlessly deployed on a large number of machines thanks to the use of Kubernetes and containerization. We demonstrate the effectiveness of Freddie on two use cases, (i) large-scale FL on CIFAR100 and (ii) heterogeneous task sequence on FCL, which highlight unaddressed performance challenges in FCL scenarios.
Updated: 2024-06-04 06:54:53
Domains: cs.LG,cs.DC,I.2.11
Understanding Auditory Evoked Brain Signal via Physics-informed Embedding Network with Multi-Task Transformer
In the fields of brain-computer interaction and cognitive neuroscience, effective decoding of auditory signals from task-based functional magnetic resonance imaging (fMRI) is key to understanding how the brain processes complex auditory information. Although existing methods have enhanced decoding capabilities, limitations remain in information utilization and model representation. To overcome these challenges, we propose an innovative multi-task learning model, Physics-informed Embedding Network with Multi-Task Transformer (PEMT-Net), which enhances decoding performance through physics-informed embedding and deep learning techniques. PEMT-Net consists of two principal components: feature augmentation and classification. For feature augmentation, we propose a novel approach by creating neural embedding graphs via node embedding, utilizing random walks to simulate the physical diffusion of neural information. This method captures both local and non-local information overflow and proposes a position encoding based on relative physical coordinates. In the classification segment, we propose adaptive embedding fusion to maximally capture linear and non-linear characteristics. Furthermore, we propose an innovative parameter-sharing mechanism to optimize the retention and learning of extracted features. Experiments on a specific dataset demonstrate PEMT-Net's significant performance in multi-task auditory signal decoding, surpassing existing methods and offering new insights into the brain's mechanisms for processing complex auditory information.
Updated: 2024-06-04 06:53:32
Domains: q-bio.NC,cs.LG,cs.SD,eess.AS
Mamba as Decision Maker: Exploring Multi-scale Sequence Modeling in Offline Reinforcement Learning
Sequential modeling has demonstrated remarkable capabilities in offline reinforcement learning (RL), with Decision Transformer (DT) being one of the most notable representatives, achieving significant success. However, RL trajectories possess unique properties to be distinguished from the conventional sequence (e.g., text or audio): (1) local correlation, where the next states in RL are theoretically determined solely by current states and actions based on the Markov Decision Process (MDP), and (2) global correlation, where each step's features are related to long-term historical information due to the time-continuous nature of trajectories. In this paper, we propose a novel action sequence predictor, named Mamba Decision Maker (MambaDM), where Mamba is expected to be a promising alternative for sequence modeling paradigms, owing to its efficient modeling of multi-scale dependencies. In particular, we introduce a novel mixer module that proficiently extracts and integrates both global and local features of the input sequence, effectively capturing interrelationships in RL datasets. Extensive experiments demonstrate that MambaDM achieves state-of-the-art performance in Atari and OpenAI Gym datasets. Furthermore, we empirically investigate the scaling laws of MambaDM, finding that increasing model size does not bring performance improvement, but scaling the dataset amount by 2x for MambaDM can obtain up to 33.7% score improvement on Atari dataset. This paper delves into the sequence modeling capabilities of MambaDM in the RL domain, paving the way for future advancements in robust and efficient decision-making systems. Our code will be available at https://github.com/AndyCao1125/MambaDM.
Updated: 2024-06-04 06:49:18
Domains: cs.LG
A Risk Estimation Study of Native Code Vulnerabilities in Android Applications
Android is the most used Operating System worldwide for mobile devices, with hundreds of thousands of apps downloaded daily. Although these apps are primarily written in Java and Kotlin, advanced functionalities such as graphics or cryptography are provided through native C/C++ libraries. These libraries can be affected by common vulnerabilities in C/C++ code (e.g., memory errors such as buffer overflow), through which attackers can read/modify data or execute arbitrary code. The detection and assessment of vulnerabilities in Android native code have only been recently explored by previous research work. In this paper, we propose a fast risk-based approach that provides a risk score related to the native part of an Android application. In this way, before an app is released, the developer can check if the app may contain vulnerabilities in the Native Code and, if present, patch them to publish a more secure application. To this end, we first use fast regular expressions to detect library versions and possible vulnerable functions. Then, we apply scores extracted from a vulnerability database to the analyzed application, thus obtaining a risk score representative of the whole app. We demonstrate the validity of our approach by performing a large-scale analysis on more than $100,000$ applications (but only $40\%$ contained native code) and $15$ popular libraries carrying known vulnerabilities. The attained results show that many applications contain well-known vulnerabilities that miscreants can potentially exploit, posing serious concerns about the security of the whole Android applications landscape.
Updated: 2024-06-04 06:44:07
Domains: cs.CR
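A hedged sketch of the fast regex stage described above; the signatures and scores below are made-up stand-ins, not the paper's actual library list or vulnerability database:

    import re

    # Illustrative patterns; the paper matches version strings of 15 popular
    # native libraries. Scores are stand-ins for CVSS-derived values.
    SIGNATURES = {
        "libpng":  (re.compile(rb"libpng version (\d+\.\d+\.\d+)"), {"1.6.36": 7.5}),
        "openssl": (re.compile(rb"OpenSSL (\d+\.\d+\.\d+[a-z]?)"),  {"1.0.2h": 9.8}),
    }

    def scan_native_lib(path):
        """Return (library, version, risk_score) hits found in a .so binary."""
        with open(path, "rb") as f:
            data = f.read()
        hits = []
        for lib, (pattern, scores) in SIGNATURES.items():
            for m in pattern.finditer(data):
                version = m.group(1).decode()
                hits.append((lib, version, scores.get(version, 0.0)))
        return hits

    def app_risk(all_hits):
        """One possible aggregation: the app's risk is its worst library hit."""
        return max((s for _, _, s in all_hits), default=0.0)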
ODE-based Learning to Optimize
Recent years have seen a growing interest in understanding acceleration methods through the lens of ordinary differential equations (ODEs). Despite the theoretical advancements, translating the rapid convergence observed in continuous-time models to discrete-time iterative methods poses significant challenges. In this paper, we present a comprehensive framework integrating the inertial systems with Hessian-driven damping equation (ISHD) and learning-based approaches for developing optimization methods through a deep synergy of theoretical insights. We first establish the convergence condition for ensuring the convergence of the solution trajectory of ISHD. Then, we show that provided the stability condition, another relaxed requirement on the coefficients of ISHD, the sequence generated through the explicit Euler discretization of ISHD converges, which gives a large family of practical optimization methods. In order to select the best optimization method in this family for certain problems, we introduce the stopping time, the time required for an optimization method derived from ISHD to achieve a predefined level of suboptimality. Then, we formulate a novel learning to optimize (L2O) problem aimed at minimizing the stopping time subject to the convergence and stability condition. To navigate this learning problem, we present an algorithm combining stochastic optimization and the penalty method (StoPM). The convergence of StoPM using the conservative gradient is proved. Empirical validation of our framework is conducted through extensive numerical experiments across a diverse set of optimization problems. These experiments showcase the superior performance of the learned optimization methods.
Updated: 2024-06-04 06:39:45
Domains: math.OC,cs.AI
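To ground the discretization step, here is a sketch of explicit Euler applied to a generic inertial system with Hessian-driven damping, $\ddot{x} + \alpha\dot{x} + \beta\nabla^2 f(x)\dot{x} + \nabla f(x) = 0$; the coefficients are fixed here, whereas the paper learns them, and the Hessian-vector product is approximated by a gradient finite difference:

    import numpy as np

    def ishd_euler(grad, x0, alpha=3.0, beta=0.1, h=0.05, n_steps=2000):
        """Explicit Euler for x'' + alpha*x' + beta*Hess(f)(x) x' + grad f(x) = 0,
        written as a first-order system in (x, v)."""
        x, v = x0.copy(), np.zeros_like(x0)
        for _ in range(n_steps):
            g = grad(x)
            delta = 1e-6
            hv = (grad(x + delta * v) - g) / delta   # approx Hess(f)(x) @ v
            a = -(alpha * v + beta * hv + g)         # acceleration from the ODE
            x, v = x + h * v, v + h * a
        return x

    # Quadratic toy problem f(x) = 0.5 * x^T A x with minimizer at the origin.
    A = np.diag([1.0, 10.0])
    x_star = ishd_euler(lambda x: A @ x, np.array([5.0, -3.0]))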
APT: Adaptive Pruning and Tuning Pretrained Language Models for Efficient Training and Inference
Fine-tuning and inference with large Language Models (LM) are generally known to be expensive. Parameter-efficient fine-tuning over pretrained LMs reduces training memory by updating a small number of LM parameters but does not improve inference efficiency. Structured pruning improves LM inference efficiency by removing consistent parameter blocks, yet often increases training memory and time. To improve both training and inference efficiency, we introduce APT, which adaptively prunes and tunes parameters for the LMs. At the early stage of fine-tuning, APT dynamically adds salient tuning parameters for fast and accurate convergence while discarding unimportant parameters for efficiency. Compared to baselines, our experiments show that APT maintains up to 98% task performance when pruning RoBERTa and T5 models with 40% of parameters left, while keeping 86.4% of LLaMA models' performance with 70% of parameters remaining. Furthermore, APT speeds up LMs' fine-tuning by up to 8x and reduces large LMs' memory training footprint by up to 70%.
Updated: 2024-06-04 06:39:23
Domains: cs.CL,cs.LG
Efficiently Train ASR Models that Memorize Less and Perform Better with Per-core Clipping
Gradient clipping plays a vital role in training large-scale automatic speech recognition (ASR) models. It is typically applied to minibatch gradients to prevent gradient explosion, and to individual sample gradients to mitigate unintended memorization. This work systematically investigates the impact of a specific granularity of gradient clipping, namely per-core clipping (PCC), across training a wide range of ASR models. We empirically demonstrate that PCC can effectively mitigate unintended memorization in ASR models. Surprisingly, we find that PCC positively influences ASR performance metrics, leading to improved convergence rates and reduced word error rates. To avoid tuning the additional hyperparameter introduced by PCC, we further propose a novel variant, adaptive per-core clipping (APCC), for streamlined optimization. Our findings highlight the multifaceted benefits of PCC as a strategy for robust, privacy-forward ASR model training.
Updated: 2024-06-04 06:34:33
Domains: cs.CR,cs.CL,cs.SD,eess.AS
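A minimal NumPy sketch of per-core clipping as described, where each core's gradient shard is clipped before aggregation; the adaptive bound shown is a hypothetical illustration, not the paper's APCC rule:

    import numpy as np

    def per_core_clip(core_grads, bound=1.0):
        """Clip each per-core gradient to L2 norm <= bound, then average.
        `core_grads` is a list of gradient vectors, one per accelerator core."""
        clipped = []
        for g in core_grads:
            norm = np.linalg.norm(g)
            clipped.append(g * min(1.0, bound / (norm + 1e-12)))
        return np.mean(clipped, axis=0)

    # Hypothetical adaptive variant: tie the bound to the median per-core norm.
    def adaptive_bound(core_grads):
        return float(np.median([np.linalg.norm(g) for g in core_grads]))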
Position Debiasing Fine-Tuning for Causal Perception in Long-Term Dialogue
The core of a dialogue system is to generate relevant, informative, and human-like responses based on extensive dialogue history. Recently, the dialogue generation domain has seen mainstream adoption of large language models (LLMs), due to their powerful capability in generating utterances. However, such models have a natural deficiency, namely inherent position bias, which may lead them to pay more attention to nearby utterances instead of causally relevant ones, resulting in irrelevant and generic responses in long-term dialogue. To alleviate this problem, in this paper we propose a novel method, named the Causal Perception long-term Dialogue framework (CPD), which employs a perturbation-based causal variable discovery method to extract causally relevant utterances from the dialogue history and enhances model causal perception during fine-tuning. Specifically, a local-position awareness method is proposed in CPD for inter-sentence position correlation elimination, which helps models extract causally relevant utterances based on perturbations. Then, a causal-perception fine-tuning strategy is also proposed to enhance the capability of discovering the causal invariant factors, by differently perturbing causally relevant and non-causally relevant utterances for response generation. Experimental results on two datasets prove that our proposed method can effectively alleviate the position bias for multiple LLMs and achieve significant progress compared with existing baselines.
Updated: 2024-06-04 06:33:13
Domains: cs.CL,cs.AI
Strengthening Network Intrusion Detection in IoT Environments with Self-Supervised Learning and Few Shot Learning
The Internet of Things (IoT) has been introduced as a breakthrough technology that integrates intelligence into everyday objects, enabling high levels of connectivity between them. As the IoT networks grow and expand, they become more susceptible to cybersecurity attacks. A significant challenge in current intrusion detection systems for IoT includes handling imbalanced datasets where labeled data are scarce, particularly for new and rare types of cyber attacks. Existing literature often fails to detect such underrepresented attack classes. This paper introduces a novel intrusion detection approach designed to address these challenges. By integrating Self Supervised Learning (SSL), Few Shot Learning (FSL), and Random Forest (RF), our approach excels in learning from limited and imbalanced data and enhancing detection capabilities. The approach starts with a Deep Infomax model trained to extract key features from the dataset. These features are then fed into a prototypical network to generate discriminate embedding. Subsequently, an RF classifier is employed to detect and classify potential malware, including a range of attacks that are frequently observed in IoT networks. The proposed approach was evaluated through two different datasets, MaleVis and WSN-DS, which demonstrate its superior performance with accuracies of 98.60% and 99.56%, precisions of 98.79% and 99.56%, recalls of 98.60% and 99.56%, and F1-scores of 98.63% and 99.56%, respectively.
Updated: 2024-06-04 06:30:22
Domains: cs.CR,cs.AI
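A hedged end-to-end sketch of the SSL-FSL-RF pipeline (requires scikit-learn): a stub stands in for the trained Deep Infomax encoder, a prototypical-network step turns embeddings into class-distance features, and a random forest classifies; all data and dimensions are synthetic:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    def prototype_features(emb, support_emb, support_y):
        """Few-shot step: distance of each embedding to every class prototype
        (mean of that class's support embeddings)."""
        classes = np.unique(support_y)
        protos = np.stack([support_emb[support_y == c].mean(axis=0) for c in classes])
        return np.linalg.norm(emb[:, None, :] - protos[None, :, :], axis=-1)

    W = np.random.default_rng(0).standard_normal((20, 8))
    encoder = lambda x: x @ W   # stub for the pretrained Deep Infomax encoder

    rng = np.random.default_rng(1)
    X_support, y_support = rng.standard_normal((40, 20)), rng.integers(0, 4, 40)
    X_query = rng.standard_normal((100, 20))

    feats_s = prototype_features(encoder(X_support), encoder(X_support), y_support)
    feats_q = prototype_features(encoder(X_query), encoder(X_support), y_support)
    clf = RandomForestClassifier(n_estimators=100).fit(feats_s, y_support)
    pred = clf.predict(feats_q)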
Bayesian Mesh Optimization for Graph Neural Networks to Enhance Engineering Performance Prediction
In engineering design, surrogate models are widely employed to replace computationally expensive simulations by leveraging design variables and geometric parameters from computer-aided design (CAD) models. However, these models often lose critical information when simplified to lower dimensions and face challenges in parameter definition, especially with the complex 3D shapes commonly found in industrial datasets. To address these limitations, we propose a Bayesian graph neural network (GNN) framework for a 3D deep-learning-based surrogate model that predicts engineering performance by directly learning geometric features from CAD using mesh representation. Our framework determines the optimal size of mesh elements through Bayesian optimization, resulting in a high-accuracy surrogate model. Additionally, it effectively handles the irregular and complex structures of 3D CADs, which differ significantly from the regular and uniform pixel structures of 2D images typically used in deep learning. Experimental results demonstrate that the quality of the mesh significantly impacts the prediction accuracy of the surrogate model, with an optimally sized mesh achieving superior performance. We compare the performance of models based on various 3D representations such as voxel, point cloud, and graph, and evaluate the computational costs of Monte Carlo simulation and Bayesian optimization methods to find the optimal mesh size. We anticipate that our proposed framework has the potential to be applied to mesh-based simulations across various engineering fields, leveraging physics-based information commonly used in computer-aided engineering.
Updated: 2024-06-04 06:27:48
Domains: cs.LG,cs.AI,cs.CV,cs.GR
Towards a Better Theoretical Understanding of Independent Subnetwork Training
Modern advancements in large-scale machine learning would be impossible without the paradigm of data-parallel distributed computing. Since distributed computing with large-scale models imparts excessive pressure on communication channels, significant recent research has been directed toward co-designing communication compression strategies and training algorithms with the goal of reducing communication costs. While pure data parallelism allows better data scaling, it suffers from poor model scaling properties. Indeed, compute nodes are severely limited by memory constraints, preventing further increases in model size. For this reason, the latest achievements in training giant neural network models also rely on some form of model parallelism. In this work, we take a closer theoretical look at Independent Subnetwork Training (IST), which is a recently proposed and highly effective technique for solving the aforementioned problems. We identify fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication, and provide a precise analysis of its optimization performance on a quadratic model.
Updated: 2024-06-04 06:11:20
Domains: cs.LG,cs.DC,math.OC
Personalized Topic Selection Model for Topic-Grounded Dialogue
Recently, the topic-grounded dialogue (TGD) system has become increasingly popular for its powerful capability to actively guide users to accomplish specific tasks through topic-guided conversations. Most existing works utilize side information (e.g., topics or personas) in isolation to enhance the topic selection ability. However, due to disregarding the noise within these auxiliary information sources and their mutual influence, current models tend to predict user-uninteresting and contextually irrelevant topics. To build a user-engaging and coherent dialogue agent, we propose a Personalized topic sElection model for Topic-grounded Dialogue, named PETD, which takes account of the interaction of side information to selectively aggregate such information for more accurately predicting subsequent topics. Specifically, we evaluate the correlation between global topics and personas and selectively incorporate the global topics aligned with user personas. Furthermore, we propose a contrastive learning based persona selector to filter out irrelevant personas under the constraint of lacking pertinent persona annotations. Throughout the selection and generation, diverse relevant side information is considered. Extensive experiments demonstrate that our proposed method can generate engaging and diverse responses, outperforming state-of-the-art baselines across various evaluation metrics.
Updated: 2024-06-04 06:09:49
Domains: cs.CL,cs.AI
Latent Logic Tree Extraction for Event Sequence Explanation from LLMs
Modern high-stakes systems, such as healthcare or robotics, often generate vast streaming event sequences. Our goal is to design an efficient, plug-and-play tool to elicit logic tree-based explanations from Large Language Models (LLMs) to provide customized insights into each observed event sequence. Built on the temporal point process model for events, our method employs the likelihood function as a score to evaluate generated logic trees. We propose an amortized Expectation-Maximization (EM) learning framework and treat the logic tree as latent variables. In the E-step, we evaluate the posterior distribution over the latent logic trees using an LLM prior and the likelihood of the observed event sequences. LLM provides a high-quality prior for the latent logic trees, however, since the posterior is built over a discrete combinatorial space, we cannot get the closed-form solution. We propose to generate logic tree samples from the posterior using a learnable GFlowNet, which is a diversity-seeking generator for structured discrete variables. The M-step employs the generated logic rules to approximate marginalization over the posterior, facilitating the learning of model parameters and refining the tunable LLM prior parameters. In the online setting, our locally built, lightweight model will iteratively extract the most relevant rules from LLMs for each sequence using only a few iterations. Empirical demonstrations showcase the promising performance and adaptability of our framework.
Updated: 2024-06-04 06:09:03
Domains: cs.LG
Dynamic Incremental Optimization for Best Subset Selection
Best subset selection is considered the `gold standard' for many sparse learning problems. A variety of optimization techniques have been proposed to attack this non-smooth non-convex problem. In this paper, we investigate the dual forms of a family of $\ell_0$-regularized problems. An efficient primal-dual algorithm is developed based on the primal and dual problem structures. By leveraging the dual range estimation along with the incremental strategy, our algorithm potentially reduces redundant computation and improves the solutions of best subset selection. Theoretical analysis and experiments on synthetic and real-world datasets validate the efficiency and statistical properties of the proposed solutions.
Updated: 2024-06-04 05:57:16
Domains: cs.LG,stat.ML
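For concreteness, the $\ell_0$-constrained problem and a plain greedy forward-selection baseline are sketched below; the paper's incremental primal-dual algorithm works on the dual of the $\ell_0$-regularized problem and is not shown here:

    import numpy as np

    def greedy_subset(X, y, k):
        """Greedy forward selection for min ||y - X b||^2 s.t. ||b||_0 <= k.
        A baseline for illustration only; the paper attacks the dual of the
        l0-regularized problem with an incremental primal-dual method."""
        support, residual, beta = [], y.copy(), np.zeros(0)
        for _ in range(k):
            scores = np.abs(X.T @ residual)
            scores[support] = -np.inf          # don't re-pick chosen columns
            support.append(int(np.argmax(scores)))
            Xs = X[:, support]
            beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
            residual = y - Xs @ beta
        return support, beta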
Adaptive Online Experimental Design for Causal Discovery
Causal discovery aims to uncover cause-and-effect relationships encoded in causal graphs by leveraging observational, interventional data, or their combination. The majority of existing causal discovery methods are developed assuming infinite interventional data. We focus on data interventional efficiency and formalize causal discovery from the perspective of online learning, inspired by pure exploration in bandit problems. A graph separating system, consisting of interventions that cut every edge of the graph at least once, is sufficient for learning causal graphs when infinite interventional data is available, even in the worst case. We propose a track-and-stop causal discovery algorithm that adaptively selects interventions from the graph separating system via allocation matching and learns the causal graph based on sampling history. Given any desired confidence value, the algorithm determines a termination condition and runs until it is met. We analyze the algorithm to establish a problem-dependent upper bound on the expected number of required interventional samples. Our proposed algorithm outperforms existing methods in simulations across various randomly generated causal graphs. It achieves higher accuracy, measured by the structural hamming distance (SHD) between the learned causal graph and the ground truth, with significantly fewer samples.
Updated: 2024-06-04 05:55:56
Domains: cs.LG,stat.AP
EPIC: Graph Augmentation with Edit Path Interpolation via Learnable Cost
Data augmentation plays a critical role in improving model performance across various domains, but it becomes challenging with graph data due to their complex and irregular structure. To address this issue, we propose EPIC (Edit Path Interpolation via learnable Cost), a novel interpolation-based method for augmenting graph datasets. To interpolate between two graphs lying in an irregular domain, EPIC leverages the concept of graph edit distance, constructing an edit path that represents the transformation process between two graphs via edit operations. Moreover, our method introduces a context-sensitive cost model that accounts for the importance of specific edit operations formulated through a learning framework. This allows for a more nuanced transformation process, where the edit distance is not merely count-based but reflects meaningful graph attributes. With randomly sampled graphs from the edit path, we enrich the training set to enhance the generalization capability of classification models. Experimental evaluations across several benchmark datasets demonstrate that our approach outperforms existing augmentation techniques in many tasks.
Updated: 2024-06-04 05:54:38
Domains: cs.LG,cs.AI
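A simplified sketch of edit-path interpolation, restricted to edge insertions/deletions over a shared node set; EPIC handles general edit operations and, crucially, learns a context-sensitive cost for each, which this stub does not:

    import random
    import networkx as nx

    def edge_edit_path(G1, G2):
        """Edit operations turning G1 into G2, restricted for simplicity to
        edge deletions/insertions between graphs on the same nodes."""
        e1, e2 = set(map(frozenset, G1.edges)), set(map(frozenset, G2.edges))
        ops = [("del", tuple(e)) for e in e1 - e2] + [("ins", tuple(e)) for e in e2 - e1]
        random.shuffle(ops)   # one of many valid orderings of the path
        return ops

    def interpolate(G1, ops, t):
        """Graph after applying the first fraction t of the edit path."""
        G = G1.copy()
        for op, (u, v) in ops[: int(t * len(ops))]:
            G.remove_edge(u, v) if op == "del" else G.add_edge(u, v)
        return G

    G1, G2 = nx.path_graph(6), nx.cycle_graph(6)
    augmented = [interpolate(G1, edge_edit_path(G1, G2), t) for t in (0.25, 0.5, 0.75)]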
Competition-Level Problems are Effective LLM Evaluators
Large language models (LLMs) have demonstrated impressive reasoning capabilities, yet there is ongoing debate about these abilities and, more recently, the potential data contamination problem. This paper aims to evaluate the reasoning capacities of LLMs, specifically in solving recent competition-level programming problems in Codeforces, which are expert-crafted and unique, requiring deep understanding and robust reasoning skills. We first provide a comprehensive evaluation of GPT-4's perceived zero-shot performance on this task, considering various aspects such as problems' release time, difficulties, and types of errors encountered. Surprisingly, the perceived performance of GPT-4 has experienced a cliff-like decline in problems after September 2021, consistently across all the difficulties and types of problems, which shows the potential data contamination, as well as the challenges for any existing LLM to solve unseen complex reasoning problems. We further explore various approaches such as fine-tuning, Chain-of-Thought prompting and problem description simplification; unfortunately, none of them is able to consistently mitigate the challenges. Through our work, we emphasize the importance of this excellent data source for assessing the genuine reasoning capabilities of LLMs, and foster the development of LLMs with stronger reasoning abilities and better generalization in the future.
Updated: 2024-06-04 05:49:50
Domains: cs.CL,cs.AI
Zyda: A 1.3T Dataset for Open Language Modeling
The size of large language models (LLMs) has scaled dramatically in recent years and their computational and data requirements have surged correspondingly. State-of-the-art language models, even at relatively smaller sizes, typically require training on at least a trillion tokens. This rapid advancement has eclipsed the growth of open-source datasets available for large-scale LLM pretraining. In this paper, we introduce Zyda (Zyphra Dataset), a dataset under a permissive license comprising 1.3 trillion tokens, assembled by integrating several major respected open-source datasets into a single, high-quality corpus. We apply rigorous filtering and deduplication processes, both within and across datasets, to maintain and enhance the quality derived from the original datasets. Our evaluations show that Zyda not only competes favorably with other open datasets like Dolma, FineWeb, and RefinedWeb, but also substantially improves the performance of comparable models from the Pythia suite. Our rigorous data processing methods significantly enhance Zyda's effectiveness, outperforming even the best of its constituent datasets when used independently.
Updated: 2024-06-04 05:47:17
Domains: cs.CL,cs.AI
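As a flavor of the pipeline's deduplication stage, here is a minimal exact-dedup sketch; the normalization and hashing choices are illustrative, and Zyda's actual filtering and deduplication rules are more involved:

    import hashlib

    def normalize(text):
        """Illustrative normalization before hashing."""
        return " ".join(text.lower().split())

    def dedup(docs):
        """Exact deduplication across already-merged source datasets."""
        seen, kept = set(), []
        for doc in docs:
            h = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
            if h not in seen:
                seen.add(h)
                kept.append(doc)
        return kept

    corpus = ["The cat sat.", "the  cat sat.", "A different document."]
    print(dedup(corpus))   # the near-identical first two collapse to one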
Neural Common Neighbor with Completion for Link Prediction
In this work, we propose a novel link prediction model and further boost it by studying graph incompleteness. First, we introduce MPNN-then-SF, an innovative architecture leveraging structural feature (SF) to guide MPNN's representation pooling, with its implementation, namely Neural Common Neighbor (NCN). NCN exhibits superior expressiveness and scalability compared with existing models, which can be classified into two categories: SF-then-MPNN, augmenting MPNN's input with SF, and SF-and-MPNN, decoupling SF and MPNN. Second, we investigate the impact of graph incompleteness -- the phenomenon that some links are unobserved in the input graph -- on SF, like the common neighbor. Through dataset visualization, we observe that incompleteness reduces common neighbors and induces distribution shifts, significantly affecting model performance. To address this issue, we propose to use a link prediction model to complete the common neighbor structure. Combining this method with NCN, we propose Neural Common Neighbor with Completion (NCNC). NCN and NCNC outperform recent strong baselines by large margins, and NCNC further surpasses state-of-the-art models in standard link prediction benchmarks. Our code is available at https://github.com/GraphPKU/NeuralCommonNeighbor.
Updated: 2024-06-04 05:46:09
Domains: cs.LG,cs.AI,cs.SI
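A minimal sketch of the MPNN-then-SF idea: pool precomputed node embeddings over a candidate pair's common neighbors to score the link; the GNN encoder is stubbed with random vectors, and the completion step that defines NCNC is omitted:

    import numpy as np
    import networkx as nx

    def ncn_score(G, emb, u, v):
        """Score link (u, v) by pooling node embeddings over common neighbors,
        combined with the endpoint representations. NCNC would first complete
        missing common neighbors with a link predictor."""
        cn = list(nx.common_neighbors(G, u, v))
        pooled = emb[cn].sum(axis=0) if cn else np.zeros(emb.shape[1])
        return float(emb[u] @ emb[v] + emb[u] @ pooled + emb[v] @ pooled)

    G = nx.karate_club_graph()
    rng = np.random.default_rng(0)
    emb = rng.standard_normal((G.number_of_nodes(), 16))  # stub for MPNN output
    print(ncn_score(G, emb, 0, 33))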
Convergence Analysis of Fractional Gradient Descent
Fractional derivatives are a well-studied generalization of integer order derivatives. Naturally, for optimization, it is of interest to understand the convergence properties of gradient descent using fractional derivatives. Convergence analysis of fractional gradient descent is currently limited both in the methods analyzed and the settings analyzed. This paper aims to fill in these gaps by analyzing variations of fractional gradient descent in smooth and convex, smooth and strongly convex, and smooth and non-convex settings. First, novel bounds will be established bridging fractional and integer derivatives. Then, these bounds will be applied to the aforementioned settings to prove linear convergence for smooth and strongly convex functions and $O(1/T)$ convergence for smooth and convex functions. Additionally, we prove $O(1/T)$ convergence for smooth and non-convex functions using an extended notion of smoothness - H\"older smoothness - that is more natural for fractional derivatives. Finally, empirical results will be presented on the potential speed up of fractional gradient descent over standard gradient descent as well as some preliminary theoretical results explaining this speed up.
Updated: 2024-06-04 05:40:40
Domains: math.OC,cs.LG,cs.NA,math.NA,G.1.6
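To make the update concrete, here is a one-dimensional sketch using a common one-term approximation of the Caputo derivative; the base-point choice and the absolute value are practical conventions from the fractional-GD literature, not the paper's exact setting:

    import numpy as np
    from math import gamma

    def frac_grad(fprime, x, c, alpha):
        """One-term approximation of the Caputo derivative of order alpha in (0,1):
        D^alpha f(x) ~= f'(x) * |x - c|**(1 - alpha) / Gamma(2 - alpha).
        The absolute value keeps the update real-valued."""
        return fprime(x) * abs(x - c) ** (1 - alpha) / gamma(2 - alpha)

    def fractional_gd(fprime, x0, alpha=0.8, eta=0.1, n_iters=300):
        """x_{k+1} = x_k - eta * D^alpha f(x_k), base point c = previous iterate."""
        x_prev, x = x0 + 1.0, x0   # offset the initial base point so step 1 is nonzero
        for _ in range(n_iters):
            x_prev, x = x, x - eta * frac_grad(fprime, x, x_prev, alpha)
        return x

    # f(x) = x**2 / 2 has minimizer 0; alpha = 1 recovers plain gradient descent.
    print(fractional_gd(lambda t: t, x0=3.0))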
Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression
Compressing high-capability Large Language Models (LLMs) has emerged as a favored strategy for resource-efficient inferences. While state-of-the-art (SoTA) compression methods boast impressive advancements in preserving benign task performance, the potential risks of compression in terms of safety and trustworthiness have been largely neglected. This study conducts the first, thorough evaluation of three (3) leading LLMs using five (5) SoTA compression techniques across eight (8) trustworthiness dimensions. Our experiments highlight the intricate interplay between compression and trustworthiness, revealing some interesting patterns. We find that quantization is currently a more effective approach than pruning in achieving efficiency and trustworthiness simultaneously. For instance, a 4-bit quantized model retains the trustworthiness of its original counterpart, but model pruning significantly degrades trustworthiness, even at 50% sparsity. Moreover, employing quantization within a moderate bit range could unexpectedly improve certain trustworthiness dimensions such as ethics and fairness. Conversely, extreme quantization to very low bit levels (3 bits) tends to reduce trustworthiness significantly. This increased risk cannot be uncovered by looking at benign performance alone, in turn, mandating comprehensive trustworthiness evaluation in practice. These findings culminate in practical recommendations for simultaneously achieving high utility, efficiency, and trustworthiness in LLMs. Code and models are available at https://decoding-comp-trust.github.io.
Updated: 2024-06-04 05:40:12
Domains: cs.CL,cs.AI
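For context, a minimal round-to-nearest symmetric quantizer, the kind of compression whose trust impact is being measured; per-tensor scaling is used for simplicity, whereas production methods (e.g., GPTQ, AWQ) use per-group scales and error compensation:

    import numpy as np

    def quantize(w, bits):
        """Symmetric round-to-nearest quantization of a weight tensor."""
        qmax = 2 ** (bits - 1) - 1
        scale = np.abs(w).max() / qmax
        q = np.clip(np.round(w / scale), -qmax - 1, qmax)
        return q * scale, scale

    w = np.random.default_rng(0).standard_normal(1024)
    for bits in (8, 4, 3):
        w_hat, _ = quantize(w, bits)
        print(bits, "bits, MSE:", float(np.mean((w - w_hat) ** 2)))

The printed reconstruction error mirrors the paper's qualitative finding: 4-bit stays close to the original tensor, while 3-bit degrades sharply.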
Evidentially Calibrated Source-Free Time-Series Domain Adaptation with Temporal Imputation
Source-free domain adaptation (SFDA) aims to adapt a model pre-trained on a labeled source domain to an unlabeled target domain without access to source data, preserving the source domain's privacy. While SFDA is prevalent in computer vision, it remains largely unexplored in time series analysis. Existing SFDA methods, designed for visual data, struggle to capture the inherent temporal dynamics of time series, hindering adaptation performance. This paper proposes MAsk And imPUte (MAPU), a novel and effective approach for time series SFDA. MAPU addresses the critical challenge of temporal consistency by introducing a novel temporal imputation task. This task involves randomly masking time series signals and leveraging a dedicated temporal imputer to recover the original signal within the learned embedding space, bypassing the complexities of noisy raw data. Notably, MAPU is the first method to explicitly address temporal consistency in the context of time series SFDA. Additionally, it offers seamless integration with existing SFDA methods, providing greater flexibility. We further introduce E-MAPU, which incorporates evidential uncertainty estimation to address the overconfidence issue inherent in softmax predictions. To achieve that, we leverage evidential deep learning to obtain a better-calibrated pre-trained model and adapt the target encoder to map out-of-support target samples to a new feature representation closer to the source domain's support. This fosters better alignment, ultimately enhancing adaptation performance. Extensive experiments on five real-world time series datasets demonstrate that both MAPU and E-MAPU achieve significant performance gains compared to existing methods. These results highlight the effectiveness of our proposed approaches for tackling various time series domain adaptation problems.
Updated: 2024-06-04 05:36:29
Domains: cs.LG,cs.AI
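A hedged sketch of the temporal imputation pretext task: mask random spans of a series and score an imputer only on the hidden steps; MAPU performs the recovery in the learned embedding space, which this signal-space stub does not:

    import numpy as np

    def mask_series(x, mask_ratio=0.15, span=8, rng=None):
        """Randomly zero out contiguous spans of a (time, channels) series and
        return the masked series plus the boolean mask of hidden steps."""
        rng = rng or np.random.default_rng(0)
        T = x.shape[0]
        mask = np.zeros(T, dtype=bool)
        n_spans = max(1, int(mask_ratio * T / span))
        for start in rng.integers(0, max(1, T - span), size=n_spans):
            mask[start:start + span] = True
        x_masked = x.copy()
        x_masked[mask] = 0.0
        return x_masked, mask

    def imputation_loss(imputer, x):
        """Self-supervised objective: reconstruct only the masked steps."""
        x_masked, mask = mask_series(x)
        x_hat = imputer(x_masked)
        return float(np.mean((x_hat[mask] - x[mask]) ** 2))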
What Improves the Generalization of Graph Transformers? A Theoretical Dive into the Self-attention and Positional Encoding
Graph Transformers, which incorporate self-attention and positional encoding, have recently emerged as a powerful architecture for various graph learning tasks. Despite their impressive performance, the complex non-convex interactions across layers and the recursive graph structure have made it challenging to establish a theoretical foundation for learning and generalization. This study introduces the first theoretical investigation of a shallow Graph Transformer for semi-supervised node classification, comprising a self-attention layer with relative positional encoding and a two-layer perceptron. Focusing on a graph data model with discriminative nodes that determine node labels and non-discriminative nodes that are class-irrelevant, we characterize the sample complexity required to achieve a desirable generalization error by training with stochastic gradient descent (SGD). This paper provides the quantitative characterization of the sample complexity and number of iterations for convergence dependent on the fraction of discriminative nodes, the dominant patterns, and the initial model errors. Furthermore, we demonstrate that self-attention and positional encoding enhance generalization by making the attention map sparse and promoting the core neighborhood during training, which explains the superior feature representation of Graph Transformers. Our theoretical results are supported by empirical experiments on synthetic and real-world benchmarks.
Updated: 2024-06-04 05:30:16
Domains: cs.LG
Sequence Compression Speeds Up Credit Assignment in Reinforcement Learning
Temporal credit assignment in reinforcement learning is challenging due to delayed and stochastic outcomes. Monte Carlo targets can bridge long delays between action and consequence but lead to high-variance targets due to stochasticity. Temporal difference (TD) learning uses bootstrapping to overcome variance but introduces a bias that can only be corrected through many iterations. TD($\lambda$) provides a mechanism to navigate this bias-variance tradeoff smoothly. Appropriately selecting $\lambda$ can significantly improve performance. Here, we propose Chunked-TD, which uses predicted probabilities of transitions from a model for computing $\lambda$-return targets. Unlike other model-based solutions to credit assignment, Chunked-TD is less vulnerable to model inaccuracies. Our approach is motivated by the principle of history compression and 'chunks' trajectories for conventional TD learning. Chunking with learned world models compresses near-deterministic regions of the environment-policy interaction to speed up credit assignment while still bootstrapping when necessary. We propose algorithms that can be implemented online and show that they solve some problems much faster than conventional TD($\lambda$).
Updated: 2024-06-04 05:28:56
Domains: cs.LG,cs.AI
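For reference, the standard backward recursion for $\lambda$-returns, written so that $\lambda$ may vary per step; Chunked-TD in effect derives per-step $\lambda$ values from a model's transition probabilities so that near-deterministic chunks are bridged with less bootstrapping (the exact rule is not reproduced here):

    import numpy as np

    def lambda_returns(rewards, values, gamma=0.99, lam=0.95):
        """Backward recursion
            G_t = r_t + gamma * ((1 - lam_t) * V(s_{t+1}) + lam_t * G_{t+1}),
        where values[t] holds the bootstrap value V(s_{t+1}) for step t.
        `lam` may be a scalar or a per-step vector."""
        T = len(rewards)
        lam = np.full(T, lam) if np.isscalar(lam) else np.asarray(lam)
        G = np.empty(T)
        G[-1] = rewards[-1] + gamma * values[-1]
        for t in range(T - 2, -1, -1):
            G[t] = rewards[t] + gamma * ((1 - lam[t]) * values[t] + lam[t] * G[t + 1])
        return G

    r = np.array([0.0, 0.0, 1.0])
    v = np.array([0.5, 0.8, 0.0])   # V(s_{t+1}) per step; terminal value 0
    print(lambda_returns(r, v))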
Can Dense Connectivity Benefit Outlier Detection? An Odyssey with NAS
Recent advances in Out-of-Distribution (OOD) Detection are the driving force behind safe and reliable deployment of Convolutional Neural Networks (CNNs) in real-world applications. However, existing studies focus on OOD detection through confidence scores and deep generative model-based methods, without considering the impact of DNN structures, especially dense connectivity in architecture fabrication. In addition, existing outlier detection approaches exhibit high variance in generalization performance, lacking stability and confidence in evaluating and ranking different outlier detectors. In this work, we propose a novel paradigm, Dense Connectivity Search of Outlier Detector (DCSOD), that automatically explores the dense connectivity of CNN architectures on the near-OOD detection task using Neural Architecture Search (NAS). We introduce a hierarchical search space containing versatile convolution operators and dense connectivity, allowing a flexible exploration of CNN architectures with diverse connectivity patterns. To improve the quality of evaluation on OOD detection during search, we propose evolving distillation based on our multi-view feature learning explanation. Evolving distillation stabilizes training for OOD detection evaluation, thus improving the quality of search. We thoroughly examine DCSOD on CIFAR benchmarks under the OOD detection protocol. Experimental results show that DCSOD achieves remarkable performance over widely used architectures and previous NAS baselines. Notably, DCSOD achieves state-of-the-art (SOTA) performance on the CIFAR benchmark, with an AUROC improvement of $\sim$1.0%.
Updated: 2024-06-04 05:19:32
Domains: cs.LG,cs.CV
Games for Artificial Intelligence Research: A Review and Perspectives
Games have long been ideal test-beds for artificial intelligence research because they exhibit characteristics that widely exist in real-world scenarios. Learning and optimisation, decision making in dynamic and uncertain environments, game theory, planning and scheduling, design and education are common research areas shared between games and real-world problems. Numerous open-source games or game-based environments have been implemented for studying artificial intelligence. In addition to single- or multi-player, collaborative or adversarial games, there has also been growing interest in implementing platforms for creative design in recent years. Those platforms provide ideal benchmarks for exploring and comparing artificial intelligence ideas and techniques. This paper reviews the games and game-based platforms for artificial intelligence research, provides guidance on matching particular types of artificial intelligence with suitable games for testing and on matching particular needs in games with suitable artificial intelligence techniques, discusses the research trends induced by the evolution of those games and platforms, and gives an outlook.
Updated: 2024-06-04 05:18:04
标题: 人工智能研究中的游戏:回顾与展望
摘要: 游戏一直是人工智能研究的理想试验平台,因为其中存在着现实世界场景中广泛存在的特征。学习和优化、在动态和不确定环境中做出决策、博弈论、规划和调度、设计和教育是游戏和真实世界问题之间共享的常见研究领域。许多开源游戏或基于游戏的环境已经被实现用于研究人工智能。除了单人或多人、协作或对抗性游戏之外,近年来对于实现创意设计平台也越来越感兴趣。这些平台为探索和比较人工智能思想和技术提供了理想的基准。本文回顾了用于人工智能研究的游戏和基于游戏的平台,提供了关于将特定类型的人工智能与适合的游戏进行测试匹配以及将游戏中的特定需求与适合的人工智能技术进行匹配的指导,讨论了由这些游戏和平台的演变引发的研究趋势,并展望了未来。
更新时间: 2024-06-04 05:18:04
领域: cs.AI
Differentially Private Federated Learning without Noise Addition: When is it Possible?
Federated Learning (FL) with Secure Aggregation (SA) has gained significant attention as a privacy preserving framework for training machine learning models while preventing the server from learning information about users' data from their individual encrypted model updates. Recent research has extended privacy guarantees of FL with SA by bounding the information leakage through the aggregate model over multiple training rounds thanks to leveraging the "noise" from other users' updates. However, the privacy metric used in that work (mutual information) measures the on-average privacy leakage, without providing any privacy guarantees for worst-case scenarios. To address this, in this work we study the conditions under which FL with SA can provide worst-case differential privacy guarantees. Specifically, we formally identify the necessary condition under which SA can provide DP without additive noise. We then prove that when the randomness inside the aggregated model update is Gaussian with a non-singular covariance matrix, SA can provide differential privacy guarantees with the privacy level $\epsilon$ bounded by the reciprocal of the minimum eigenvalue of the covariance matrix. However, we further demonstrate that in practice, these conditions are almost unlikely to hold, and hence additional noise added to model updates is still required in order for SA in FL to achieve DP. Lastly, we discuss the potential solution of leveraging the inherent randomness inside the aggregated model update to reduce the amount of additive noise required for a DP guarantee.
Updated: 2024-06-04 05:17:56
标题: 在哪种情况下可能实现不添加噪音的差分隐私联邦学习?
摘要: 联邦学习(FL)与安全聚合(SA)作为一种隐私保护框架,用于训练机器学习模型,同时防止服务器从用户的个别加密模型更新中学习到信息,已经引起了重要关注。最近的研究通过利用其他用户更新中的“噪声”,扩展了FL与SA的隐私保证,通过限制多轮训练中聚合模型的信息泄露。然而,该研究使用的隐私度量(互信息)衡量的是平均隐私泄露,而没有提供最坏情况下的隐私保证。为了解决这个问题,在这项研究中,我们研究了FL与SA可以提供最坏情况差分隐私保证的条件。具体而言,我们正式确定了SA可以在不添加噪声的情况下提供差分隐私的必要条件。然后我们证明,当聚合模型更新中的随机性服从具有非奇异协方差矩阵的高斯分布时,SA可以提供差分隐私保证,隐私级别ε受协方差矩阵最小特征值的倒数限制。然而,我们进一步证明,在实践中,这些条件几乎不可能成立,因此仍需要在模型更新中添加额外的噪声,以使FL中的SA实现差分隐私。最后,我们讨论了利用聚合模型更新中固有的随机性来降低实现差分隐私保证所需的额外噪声量的潜在解决方案。
更新时间: 2024-06-04 05:17:56
领域: cs.CR,cs.LG
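The eigenvalue condition in the abstract can be illustrated numerically. The sketch below is a simplification that omits the bound's constant factors: it estimates the covariance of sampled aggregated updates and reports $1/\lambda_{\min}$ as the order of the privacy level when the covariance is non-singular.

```python
import numpy as np

def epsilon_bound_from_covariance(updates):
    """Illustrative check of the condition described in the abstract.

    updates: samples of the aggregated model-update randomness
             (rows = training rounds, cols = parameters).
    Returns 1 / lambda_min of the estimated covariance when it is
    non-singular, i.e. the order of the privacy level; None otherwise.
    """
    cov = np.cov(updates, rowvar=False)
    lam_min = np.linalg.eigvalsh(cov).min()   # symmetric -> real eigenvalues
    if lam_min <= 1e-12:                      # (near-)singular covariance:
        return None                           # the DP guarantee does not apply
    return 1.0 / lam_min

rng = np.random.default_rng(0)
updates = rng.normal(scale=0.1, size=(200, 8))  # synthetic aggregated updates
print(epsilon_bound_from_covariance(updates))
```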
ARL2: Aligning Retrievers for Black-box Large Language Models via Self-guided Adaptive Relevance Labeling
Retrieval-augmented generation enhances large language models (LLMs) by incorporating relevant information from external knowledge sources. This enables LLMs to adapt to specific domains and mitigate hallucinations in knowledge-intensive tasks. However, existing retrievers are often misaligned with LLMs due to their separate training processes and the black-box nature of LLMs. To address this challenge, we propose ARL2, a retriever learning technique that harnesses LLMs as labelers. ARL2 leverages LLMs to annotate and score relevant evidence, enabling learning the retriever from robust LLM supervision. Furthermore, ARL2 uses an adaptive self-training strategy for curating high-quality and diverse relevance data, which can effectively reduce the annotation cost. Extensive experiments demonstrate the effectiveness of ARL2, achieving accuracy improvements of 5.4% on NQ and 4.6% on MMLU compared to the state-of-the-art methods. Additionally, ARL2 exhibits robust transfer learning capabilities and strong zero-shot generalization abilities. Our code will be published at \url{https://github.com/zhanglingxi-cs/ARL2}.
Updated: 2024-06-04 05:17:24
标题: ARL2: 通过自我引导的自适应相关性标记,为黑匣子大型语言模型对齐检索器
摘要: 检索增强生成通过从外部知识源中整合相关信息增强了大型语言模型(LLMs)。这使得LLMs能够适应特定领域并减轻知识密集任务中的幻觉。然而,由于检索器和LLMs的训练过程彼此独立,加之LLMs的黑盒特性,现有的检索器通常与LLMs不对齐。为了解决这一挑战,我们提出了ARL2,一种利用LLMs作为标注器的检索器学习技术。ARL2利用LLMs标注和评分相关证据,从而能够从可靠的LLM监督中学习检索器。此外,ARL2使用自适应自我训练策略来筛选高质量且多样的相关性数据,可以有效降低标注成本。大量实验证明了ARL2的有效性:与最先进的方法相比,在NQ上实现了5.4%的准确率提升,在MMLU上实现了4.6%的准确率提升。此外,ARL2表现出强大的迁移学习能力和零样本泛化能力。我们的代码将在\url{https://github.com/zhanglingxi-cs/ARL2}上发布。
更新时间: 2024-06-04 05:17:24
领域: cs.CL,cs.AI,cs.IR,cs.LG
Feature Attribution with Necessity and Sufficiency via Dual-stage Perturbation Test for Causal Explanation
We investigate the problem of explainability for machine learning models, focusing on Feature Attribution Methods (FAMs) that evaluate feature importance through perturbation tests. Despite their utility, FAMs struggle to distinguish the contributions of different features when their prediction changes are similar after perturbation. To enhance FAMs' discriminative power, we introduce Feature Attribution with Necessity and Sufficiency (FANS), which finds a neighborhood of the input such that perturbing samples within this neighborhood yields a high Probability of Necessity and Sufficiency (PNS) of causing the change in predictions, and uses this PNS as the importance of the feature. Specifically, FANS computes this PNS via a heuristic strategy for estimating the neighborhood and a two-stage (factual and interventional) perturbation test for counterfactual reasoning. To generate counterfactual samples, we use a resampling-based approach on the observed samples to approximate the required conditional distribution. We demonstrate that FANS outperforms existing attribution methods on six benchmarks. Please refer to the source code via \url{https://github.com/DMIRLAB-Group/FANS}.
Updated: 2024-06-04 05:15:01
标题: 通过双阶段扰动测试的必要性和充分性特征归因用于因果解释
摘要: 我们研究了机器学习模型可解释性的问题,重点关注通过扰动测试评估特征重要性的特征归因方法(FAMs)。尽管它们很实用,但在扰动后预测变化相似时,FAMs很难区分不同特征的贡献。为了增强FAMs的区分能力,我们引入了具有必要性和充分性的特征归因(FANS),它找到输入的邻域,使得在该邻域内扰动样本导致预测变化的必要性和充分性(PNS)概率很高,并将此PNS作为特征的重要性。具体来说,FANS通过一种启发式策略计算这个PNS,用于估计邻域,并使用涉及两个阶段(事实和干预)的扰动测试进行反事实推理。为了生成反事实样本,我们使用基于重新采样的方法在观察样本上来近似所需的条件分布。我们证明FANS在六个基准测试中优于现有的归因方法。请通过\url{https://github.com/DMIRLAB-Group/FANS}查看源代码。
更新时间: 2024-06-04 05:15:01
领域: cs.LG,stat.ME
GaitGuard: Towards Private Gait in Mixed Reality
Augmented/Mixed Reality (AR/MR) technologies offer a new era of immersive, collaborative experiences, distinctively setting them apart from conventional mobile systems. However, as we further investigate the privacy and security implications within these environments, the issue of gait privacy emerges as a critical yet underexplored concern. Given its uniqueness as a biometric identifier that can be correlated with several sensitive attributes, the protection of gait information becomes crucial in preventing potential identity tracking and unauthorized profiling within these systems. In this paper, we conduct a user study with 20 participants to assess the risk of individual identification through gait feature analysis extracted from video feeds captured by MR devices. Our results show the capability to uniquely identify individuals with an accuracy of up to 92%, underscoring an urgent need for effective gait privacy protection measures. Through rigorous evaluation, we present a comparative analysis of various mitigation techniques, addressing both aware and unaware adversaries, in terms of their utility and impact on privacy preservation. From these evaluations, we introduce GaitGuard, the first real-time framework designed to protect the privacy of gait features within the camera view of AR/MR devices. Our evaluations of GaitGuard within a MR collaborative scenario demonstrate its effectiveness in implementing mitigation that reduces the risk of identification by up to 68%, while maintaining a minimal latency of merely 118.77 ms, thus marking a critical step forward in safeguarding privacy within AR/MR ecosystems.
Updated: 2024-06-04 05:13:26
标题: GaitGuard:迈向混合现实中的步态隐私
摘要: 增强/混合现实(AR/MR)技术为沉浸式、协作体验开启了一个新时代,与传统的移动系统明显不同。然而,随着我们进一步研究这些环境中的隐私和安全影响,步态隐私浮现为一个至关重要但未被充分探讨的问题。鉴于步态作为可与多个敏感属性相关联的生物识别标识符的独特性,保护步态信息对于防止这些系统中潜在的身份追踪和未经授权的用户画像至关重要。在本文中,我们进行了一项包括20名参与者的用户研究,以评估通过分析从MR设备捕获的视频流中提取的步态特征来识别个体的风险。我们的结果显示,该方法能够以高达92%的准确率唯一识别个体,突显了对有效步态隐私保护措施的紧迫需求。通过严格的评估,我们从效用和隐私保护影响两方面,对各种减缓技术在面对知情与不知情对手时的表现进行了比较分析。基于这些评估,我们介绍了GaitGuard,这是第一个旨在保护AR/MR设备摄像头视野内步态特征隐私的实时框架。我们在MR协作场景中对GaitGuard的评估表明,其减缓措施可将识别风险降低多达68%,同时仅带来118.77毫秒的最小延迟,从而在AR/MR生态系统的隐私保护方面迈出了重要一步。
更新时间: 2024-06-04 05:13:26
领域: cs.HC,cs.CR
RAM-EHR: Retrieval Augmentation Meets Clinical Predictions on Electronic Health Records
We present RAM-EHR, a Retrieval AugMentation pipeline to improve clinical predictions on Electronic Health Records (EHRs). RAM-EHR first collects multiple knowledge sources, converts them into text format, and uses dense retrieval to obtain information related to medical concepts. This strategy addresses the difficulties associated with complex names for the concepts. RAM-EHR then augments the local EHR predictive model co-trained with consistency regularization to capture complementary information from patient visits and summarized knowledge. Experiments on two EHR datasets show the efficacy of RAM-EHR over previous knowledge-enhanced baselines (3.4% gain in AUROC and 7.2% gain in AUPR), emphasizing the effectiveness of the summarized knowledge from RAM-EHR for clinical prediction tasks. The code will be published at \url{https://github.com/ritaranx/RAM-EHR}.
Updated: 2024-06-04 05:11:19
标题: RAM-EHR:检索增强与电子健康记录上的临床预测相结合
摘要: 我们提出了RAM-EHR,一个用于改善电子健康记录(EHRs)上临床预测的检索增强管道。RAM-EHR首先收集多个知识来源,将它们转换为文本格式,并使用密集检索获取与医学概念相关的信息。这种策略解决了与概念复杂名称相关的困难。然后,RAM-EHR增强了本地EHR预测模型,与一致性正则化共同训练,以捕获来自患者就诊和总结知识的互补信息。在两个EHR数据集上的实验显示,RAM-EHR相对于先前的知识增强基线具有更高的效力(AUROC提高了3.4%,AUPR提高了7.2%),强调了RAM-EHR的总结知识对临床预测任务的有效性。代码将发布在\url{https://github.com/ritaranx/RAM-EHR}。
更新时间: 2024-06-04 05:11:19
领域: cs.CL,cs.AI,cs.IR,q-bio.OT
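A minimal sketch of the consistency-regularized co-training idea mentioned in the abstract, assuming PyTorch; the symmetric-KL form and the weight `alpha` are illustrative assumptions, not RAM-EHR's exact objective:

```python
import torch
import torch.nn.functional as F

def co_training_loss(logits_local, logits_aug, labels, alpha=0.5):
    """Each branch (visit-based local model vs. knowledge-augmented model)
    fits the labels, while a symmetric KL term keeps their predictive
    distributions close so complementary information is shared."""
    ce = F.cross_entropy(logits_local, labels) + F.cross_entropy(logits_aug, labels)
    p = F.log_softmax(logits_local, dim=-1)
    q = F.log_softmax(logits_aug, dim=-1)
    consistency = 0.5 * (
        F.kl_div(p, q.exp(), reduction="batchmean")   # KL(q || p)
        + F.kl_div(q, p.exp(), reduction="batchmean") # KL(p || q)
    )
    return ce + alpha * consistency

logits_a = torch.randn(4, 2, requires_grad=True)  # local EHR branch
logits_b = torch.randn(4, 2, requires_grad=True)  # knowledge-augmented branch
labels = torch.tensor([0, 1, 1, 0])
print(co_training_loss(logits_a, logits_b, labels))
```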
The Crystal Ball Hypothesis in diffusion models: Anticipating object positions from initial noise
Diffusion models have achieved remarkable success in text-to-image generation tasks; however, the role of initial noise has been rarely explored. In this study, we identify specific regions within the initial noise image, termed trigger patches, that play a key role for object generation in the resulting images. Notably, these patches are ``universal'' and can be generalized across various positions, seeds, and prompts. To be specific, extracting these patches from one noise and injecting them into another noise leads to object generation in targeted areas. We identify these patches by analyzing the dispersion of object bounding boxes across generated images, leading to the development of a posterior analysis technique. Furthermore, we create a dataset consisting of Gaussian noises labeled with bounding boxes corresponding to the objects appearing in the generated images and train a detector that identifies these patches from the initial noise. To explain the formation of these patches, we reveal that they are outliers in Gaussian noise, and follow distinct distributions through two-sample tests. Finally, we find the misalignment between prompts and the trigger patch patterns can result in unsuccessful image generations. The study proposes a reject-sampling strategy to obtain optimal noise, aiming to improve prompt adherence and positional diversity in image generation.
Updated: 2024-06-04 05:06:00
标题: 扩散模型中的水晶球假设:从初始噪声预测物体位置
摘要: 扩散模型在文本到图像生成任务中取得了显著的成功;然而,初始噪声的作用很少被探讨。在这项研究中,我们确定了初始噪声图像中的特定区域,称为触发补丁,这些区域在生成的图像中的对象生成中起关键作用。值得注意的是,这些补丁是“通用的”,可以在不同位置、种子和提示之间进行泛化。具体来说,从一个噪声中提取这些补丁并将它们注入到另一个噪声中会导致目标区域的对象生成。我们通过分析生成图像中物体边界框的分散情况来识别这些补丁,从而开发了一种后验分析技术。此外,我们创建了一个数据集,其中包含用与生成图像中出现的对象对应的边界框标记的高斯噪声,并训练了一个检测器,从初始噪声中识别这些补丁。为了解释这些补丁的形成,我们揭示它们在高斯噪声中是离群值,并通过两样本检验遵循不同的分布。最后,我们发现提示和触发补丁模式之间的不对齐可能导致图像生成失败。该研究提出了一种拒绝抽样策略来获取最佳噪声,旨在提高提示遵从性和图像生成中的位置多样性。
更新时间: 2024-06-04 05:06:00
领域: cs.CV,cs.AI
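The reject-sampling strategy can be sketched as follows; `patch_score` is a hypothetical stand-in for the paper's trained trigger-patch detector, here approximated by the outlier magnitude the abstract mentions, and the acceptance threshold is an assumption:

```python
import numpy as np

def reject_sample_noise(shape, target_box, patch_score, max_tries=50, rng=None):
    """Draw initial Gaussian noises and keep the one whose trigger-patch mass
    falls most heavily inside the region where the prompt places the object."""
    rng = rng or np.random.default_rng(0)
    y0, y1, x0, x1 = target_box
    best, best_mass = None, -1.0
    for _ in range(max_tries):
        noise = rng.standard_normal(shape)
        score = patch_score(noise)                      # (H, W) strength map
        mass = score[y0:y1, x0:x1].sum() / max(score.sum(), 1e-9)
        if mass > best_mass:
            best, best_mass = noise, mass
        if best_mass > 0.5:                             # enough mass inside
            break
    return best

# Hypothetical detector: local outlier magnitude, motivated by the abstract's
# observation that trigger patches behave like Gaussian outliers.
patch_score = lambda z: np.maximum(np.abs(z).mean(axis=-1) - 1.0, 0.0)

noise = reject_sample_noise((64, 64, 4), target_box=(0, 32, 0, 32),
                            patch_score=patch_score)
print(noise.shape)
```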
Multiway Multislice PHATE: Visualizing Hidden Dynamics of RNNs through Training
Recurrent neural networks (RNNs) are a widely used tool for sequential data analysis; however, they are still often seen as black boxes of computation. Understanding the functional principles of these networks is critical to developing ideal model architectures and optimization strategies. Previous studies typically only emphasize the network representation post-training, overlooking their evolution process throughout training. Here, we present Multiway Multislice PHATE (MM-PHATE), a novel method for visualizing the evolution of RNNs' hidden states. MM-PHATE is a graph-based embedding using structured kernels across the multiple dimensions spanned by RNNs: time, training epoch, and units. We demonstrate on various datasets that MM-PHATE uniquely preserves hidden representation community structure among units and identifies information processing and compression phases during training. The embedding allows users to look under the hood of RNNs across training and provides an intuitive and comprehensive strategy to understanding the network's internal dynamics and draw conclusions, e.g., on why and how one model outperforms another or how a specific architecture might impact an RNN's learning ability.
Updated: 2024-06-04 05:05:27
标题: 多途径多切片PHATE:通过训练可视化RNN的隐藏动态
摘要: 循环神经网络(RNNs)是广泛用于序列数据分析的工具,然而它们仍然经常被视为计算的黑盒子。了解这些网络的功能原理对于开发理想的模型架构和优化策略至关重要。先前的研究通常只强调网络在训练后的表示,忽略了它们在训练过程中的演变。在这里,我们提出了Multiway Multislice PHATE(MM-PHATE),一种用于可视化RNNs隐藏状态演变的新方法。MM-PHATE是一种基于图的嵌入,在RNNs所跨越的多个维度(时间、训练时期和单元)上使用结构化核。我们在各种数据集上展示了MM-PHATE独特地保留了单元之间的隐藏表示社区结构,并识别了训练过程中的信息处理和压缩阶段。这种嵌入允许用户在整个训练过程中深入了解RNNs的内部动态,并提供了一种直观而全面的策略来理解网络的内部动态并得出结论,例如,为什么以及如何一个模型优于另一个模型,或者特定架构如何影响RNN的学习能力。
更新时间: 2024-06-04 05:05:27
领域: cs.LG
Cross-Embodiment Robot Manipulation Skill Transfer using Latent Space Alignment
This paper focuses on transferring control policies between robot manipulators with different morphology. While reinforcement learning (RL) methods have shown successful results in robot manipulation tasks, transferring a trained policy from simulation to a real robot or deploying it on a robot with different states, actions, or kinematics is challenging. To achieve cross-embodiment policy transfer, our key insight is to project the state and action spaces of the source and target robots to a common latent space representation. We first introduce encoders and decoders to associate the states and actions of the source robot with a latent space. The encoders, decoders, and a latent space control policy are trained simultaneously using loss functions measuring task performance, latent dynamics consistency, and encoder-decoder ability to reconstruct the original states and actions. To transfer the learned control policy, we only need to train target encoders and decoders that align a new target domain to the latent space. We use generative adversarial training with cycle consistency and latent dynamics losses without access to the task reward or reward tuning in the target domain. We demonstrate sim-to-sim and sim-to-real manipulation policy transfer with source and target robots of different states, actions, and embodiments. The source code is available at \url{https://github.com/ExistentialRobotics/cross_embodiment_transfer}.
Updated: 2024-06-04 05:00:24
标题: 基于潜在空间对齐的跨形态机器人操作技能迁移
摘要: 这篇论文关注将控制策略从具有不同形态的机器人操作器转移。虽然强化学习(RL)方法在机器人操作任务中取得了成功的结果,但将训练好的策略从模拟环境转移到真实机器人或在具有不同状态、动作或运动学的机器人上部署它是具有挑战性的。为了实现跨形态策略转移,我们的关键观点是将源机器人和目标机器人的状态和动作空间投影到一个共同的潜在空间表示中。我们首先引入编码器和解码器来将源机器人的状态和动作与一个潜在空间相关联。编码器、解码器和潜在空间控制策略同时使用测量任务性能、潜在动态一致性和编码器-解码器重构原始状态和动作能力的损失函数进行训练。为了转移学习的控制策略,我们只需要训练目标编码器和解码器,将新的目标领域对齐到潜在空间中。我们使用生成对抗训练,包括循环一致性和潜在动态损失,而无需访问目标领域中的任务奖励或奖励调整。我们展示了源机器人和目标机器人的不同状态、动作和形态之间的模拟到模拟和模拟到实际操作策略转移。源代码可在以下链接找到:\url{https://github.com/ExistentialRobotics/cross_embodiment_transfer}。
更新时间: 2024-06-04 05:00:24
领域: cs.RO,cs.AI
Noise Correction on Subjective Datasets
Incorporating every annotator's perspective is crucial for unbiased data modeling. Annotator fatigue and changing opinions over time can distort dataset annotations. To combat this, we propose to learn a more accurate representation of diverse opinions by utilizing multitask learning in conjunction with loss-based label correction. We show that using our novel formulation, we can cleanly separate agreeing and disagreeing annotations. Furthermore, this method provides a controllable way to encourage or discourage disagreement. We demonstrate that this modification can improve prediction performance in a single or multi-annotator setting. Lastly, we show that this method remains robust to additional label noise that is applied to subjective data.
Updated: 2024-06-04 04:53:23
标题: 主观数据集的噪声校正
摘要: 将每个标注者的观点纳入其中对于无偏数据建模至关重要。标注者疲劳和随时间变化的观点可能会扭曲数据集的标注。为了解决这一问题,我们提出利用多任务学习结合基于损失的标签校正来学习更准确的多样化观点表示。我们展示了使用我们的新颖公式,我们可以清晰地区分出赞同和不赞同的标注。此外,该方法提供了一种可控的方式来鼓励或阻止不同意见。我们展示了这种修改可以提高在单个或多个标注者设置中的预测性能。最后,我们展示了这种方法对施加在主观数据上的额外标签噪音保持稳健性。
更新时间: 2024-06-04 04:53:23
领域: cs.LG,cs.AI,cs.HC
DrEureka: Language Model Guided Sim-To-Real Transfer
Transferring policies learned in simulation to the real world is a promising strategy for acquiring robot skills at scale. However, sim-to-real approaches typically rely on manual design and tuning of the task reward function as well as the simulation physics parameters, rendering the process slow and human-labor intensive. In this paper, we investigate using Large Language Models (LLMs) to automate and accelerate sim-to-real design. Our LLM-guided sim-to-real approach, DrEureka, requires only the physics simulation for the target task and automatically constructs suitable reward functions and domain randomization distributions to support real-world transfer. We first demonstrate that our approach can discover sim-to-real configurations that are competitive with existing human-designed ones on quadruped locomotion and dexterous manipulation tasks. Then, we showcase that our approach is capable of solving novel robot tasks, such as quadruped balancing and walking atop a yoga ball, without iterative manual design.
Updated: 2024-06-04 04:53:05
标题: DrEureka:语言模型引导的从模拟到真实的转移
摘要: 将在模拟中学习的策略转移到现实世界是一种大规模获取机器人技能的有前途的策略。然而,模拟到现实的方法通常依赖于手动设计和调整任务奖励函数以及模拟物理参数,使得这一过程缓慢且需要大量人力。在本文中,我们研究了使用大型语言模型(LLM)来自动化和加速模拟到现实设计。我们的LLM引导的模拟到现实方法DrEureka,只需要目标任务的物理模拟,并自动构建适当的奖励函数和领域随机化分布以支持现实世界的转移。我们首先证明了我们的方法在四足运动和灵巧操纵任务上可以发现与现有人工设计方案相竞争的模拟到现实配置。然后,我们展示了我们的方法能够解决新颖的机器人任务,如四足平衡和在瑜伽球上行走,而无需迭代的手动设计。
更新时间: 2024-06-04 04:53:05
领域: cs.RO,cs.AI,cs.LG
QuIP#: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks
Post-training quantization (PTQ) reduces the memory footprint of LLMs by quantizing their weights to low-precision. In this work, we introduce QuIP#, a weight-only PTQ method that achieves state-of-the-art results in extreme compression regimes ($\le$ 4 bits per weight) using three novel techniques. First, QuIP# improves QuIP's (Chee et al., 2023) incoherence processing by using the randomized Hadamard transform, which is faster and has better theoretical properties. Second, QuIP# uses vector quantization to take advantage of the ball-shaped sub-Gaussian distribution that incoherent weights possess: specifically, we introduce a set of hardware-efficient codebooks based on the highly symmetric $E_8$ lattice, which achieves the optimal 8-dimension unit ball packing. Third, QuIP# uses fine-tuning to improve fidelity to the original model. Our experiments show that QuIP# outperforms existing PTQ methods, enables new behaviors in PTQ scaling, and supports fast inference. Our code can be found at https://github.com/Cornell-RelaxML/quip-sharp.
Updated: 2024-06-04 04:51:52
标题: QuIP#: 利用Hadamard非相干处理和格点码本实现更好的LLM量化
摘要: 训练后量化(PTQ)通过将LLMs的权重量化为低精度来减少内存占用。在这项工作中,我们介绍了QuIP#,一种仅基于权重的PTQ方法,通过三种新技术在极端压缩范围(每个权重≤4位)实现了最先进的结果。首先,QuIP#通过使用随机哈达玛变换改进了QuIP的(Chee等人,2023年)非相干处理,这种方法更快且具有更好的理论性质。其次,QuIP#利用向量量化来利用不相干权重具有的球形亚高斯分布:具体来说,我们引入一组基于高度对称的$E_8$晶格的硬件高效码本,实现了最佳的8维单位球装填。第三,QuIP#使用微调来提高对原始模型的保真度。我们的实验证明,QuIP#优于现有的PTQ方法,实现了PTQ缩放中的新行为,并支持快速推断。我们的代码可以在https://github.com/Cornell-RelaxML/quip-sharp找到。
更新时间: 2024-06-04 04:51:52
领域: cs.LG,cs.AI,cs.CL
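A small numpy illustration of the randomized Hadamard transform used for incoherence processing. QuIP# applies it to whole weight matrices and adds vector quantization and fine-tuning; this sketch only shows how the transform spreads out a spiky vector while preserving its norm:

```python
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard transform; len(x) must be a power of two."""
    x = x.copy()
    h, n = 1, len(x)
    while h < n:
        for i in range(0, n, 2 * h):          # butterfly pass
            a = x[i:i + h].copy()
            b = x[i + h:i + 2 * h].copy()
            x[i:i + h] = a + b
            x[i + h:i + 2 * h] = a - b
        h *= 2
    return x

def randomized_hadamard(w, rng):
    """Multiply by a random sign diagonal, then the orthonormal Hadamard
    (scale 1/sqrt(n)). The norm is preserved, but the entries are spread
    out, which makes low-bit quantization of the weights better behaved."""
    signs = rng.choice([-1.0, 1.0], size=len(w))
    return fwht(w * signs) / np.sqrt(len(w)), signs

rng = np.random.default_rng(0)
w = np.zeros(8); w[3] = 1.0             # a maximally 'coherent' (spiky) vector
v, signs = randomized_hadamard(w, rng)
print(v, np.linalg.norm(v))             # entries +-1/sqrt(8), norm preserved
```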
C-RAG: Certified Generation Risks for Retrieval-Augmented Language Models
Despite the impressive capabilities of large language models (LLMs) across diverse applications, they still suffer from trustworthiness issues, such as hallucinations and misalignments. Retrieval-augmented language models (RAG) have been proposed to enhance the credibility of generations by grounding external knowledge, but the theoretical understanding of their generation risks remains unexplored. In this paper, we answer: 1) whether RAG can indeed lead to low generation risks, 2) how to provide provable guarantees on the generation risks of RAG and vanilla LLMs, and 3) what sufficient conditions enable RAG models to reduce generation risks. We propose C-RAG, the first framework to certify generation risks for RAG models. Specifically, we provide conformal risk analysis for RAG models and certify an upper confidence bound of generation risks, which we refer to as conformal generation risk. We also provide theoretical guarantees on conformal generation risks for general bounded risk functions under test distribution shifts. We prove that RAG achieves a lower conformal generation risk than that of a single LLM when the quality of the retrieval model and transformer is non-trivial. Our intensive empirical results demonstrate the soundness and tightness of our conformal generation risk guarantees across four widely-used NLP datasets on four state-of-the-art retrieval models.
Updated: 2024-06-04 04:51:08
标题: C-RAG:检索增强语言模型的认证生成风险
摘要: 尽管大型语言模型(LLMs)在各种应用中具有令人印象深刻的能力,但它们仍然存在可信度问题,例如幻觉和错位。检索增强语言模型(RAG)被提出以通过引入外部知识来增强生成的可信度,但对其生成风险的理论理解尚未被探索。本文回答了以下问题:1)RAG是否确实可以降低生成风险,2)如何对RAG和普通LLMs的生成风险提供可证明的保证,3)什么样的充分条件能够使RAG模型降低生成风险。我们提出了C-RAG,第一个用于认证RAG模型生成风险的框架。具体地,我们为RAG模型提供保形风险分析,并认证生成风险的上置信界,我们称之为保形生成风险。我们还为测试分布偏移下的一般有界风险函数提供了保形生成风险的理论保证。我们证明,当检索模型和Transformer的质量非平凡时,RAG实现了比单个LLM更低的保形生成风险。我们在四个最先进的检索模型和四个广泛使用的NLP数据集上的大量实证结果,展示了我们的保形生成风险保证的可靠性和紧致性。
更新时间: 2024-06-04 04:51:08
领域: cs.AI,cs.CL,cs.IR
Measure-Observe-Remeasure: An Interactive Paradigm for Differentially-Private Exploratory Analysis
Differential privacy (DP) has the potential to enable privacy-preserving analysis on sensitive data, but requires analysts to judiciously spend a limited ``privacy loss budget'' $\epsilon$ across queries. Analysts conducting exploratory analyses do not, however, know all queries in advance and seldom have DP expertise. Thus, they are limited in their ability to specify $\epsilon$ allotments across queries prior to an analysis. To support analysts in spending $\epsilon$ efficiently, we propose a new interactive analysis paradigm, Measure-Observe-Remeasure, where analysts ``measure'' the database with a limited amount of $\epsilon$, observe estimates and their errors, and remeasure with more $\epsilon$ as needed. We instantiate the paradigm in an interactive visualization interface which allows analysts to spend increasing amounts of $\epsilon$ under a total budget. To observe how analysts interact with the Measure-Observe-Remeasure paradigm via the interface, we conduct a user study that compares the utility of $\epsilon$ allocations and findings from sensitive data participants make to the allocations and findings expected of a rational agent who faces the same decision task. We find that participants are able to use the workflow relatively successfully, including using budget allocation strategies that maximize over half of the available utility stemming from $\epsilon$ allocation. Their loss in performance relative to a rational agent appears to be driven more by their inability to access information and report it than to allocate $\epsilon$.
Updated: 2024-06-04 04:48:40
标题: 测量-观察-再测量:一种交互式差分隐私探索分析范式
摘要: 差分隐私(DP)有潜力在敏感数据上实现保护隐私的分析,但需要分析人员明智地在查询中花费有限的“隐私损失预算” $\epsilon$。然而,进行探索性分析的分析人员事先并不知道所有的查询,而且很少具备DP专业知识。因此,在分析之前,他们在跨查询中指定 $\epsilon$ 分配的能力受到限制。为了帮助分析人员有效地使用 $\epsilon$,我们提出了一种新的交互式分析范式,即“测量-观察-重新测量”,分析人员可以在数据库上“测量”一定数量的 $\epsilon$,观察估计值及其误差,并根据需要重新测量更多的 $\epsilon$。 我们将这一范式实例化为一个交互式可视化界面,允许分析人员在总预算下逐渐增加 $\epsilon$ 的支出。为了观察分析人员如何通过界面与“测量-观察-重新测量”范式进行交互,我们进行了一项用户研究,比较了敏感数据参与者的 $\epsilon$ 分配和发现与面临相同决策任务的理性代理人所预期的 $\epsilon$ 分配和发现之间的效用。我们发现参与者能够相对成功地使用工作流程,包括使用最大化超过一半来自 $\epsilon$ 分配的可用效用的预算分配策略。相对于理性代理人的绩效损失似乎更多地是由于他们无法获取信息并报告信息,而不是由于分配 $\epsilon$。
更新时间: 2024-06-04 04:48:40
领域: cs.CR,cs.DB,cs.HC
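The loop can be made concrete with the Laplace mechanism. The class below is a hedged sketch rather than the paper's interface: each `measure` call spends more of the $\epsilon$ budget under sequential composition, and the independent noisy estimates are merged by inverse-variance weighting (one reasonable combiner, not necessarily the one studied):

```python
import numpy as np

class RemeasurableMean:
    """Measure-Observe-Remeasure for a single mean query over data in [lo, hi]."""
    def __init__(self, data, lo, hi, rng):
        self.data = data
        self.sensitivity = (hi - lo) / len(data)   # sensitivity of the mean
        self.rng, self.estimates, self.spent = rng, [], 0.0

    def measure(self, eps):
        scale = self.sensitivity / eps                   # Laplace mechanism
        noisy = self.data.mean() + self.rng.laplace(0.0, scale)
        self.estimates.append((noisy, 2 * scale**2))     # (value, noise variance)
        self.spent += eps                                # sequential composition
        vals = np.array([v for v, _ in self.estimates])
        w = 1.0 / np.array([s for _, s in self.estimates])
        return float((w * vals).sum() / w.sum()), self.spent

rng = np.random.default_rng(1)
q = RemeasurableMean(rng.uniform(0, 1, 500), lo=0.0, hi=1.0, rng=rng)
print(q.measure(0.05))   # coarse first look at low cost
print(q.measure(0.20))   # remeasure: tighter combined estimate, more budget spent
```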
ST-DPGAN: A Privacy-preserving Framework for Spatiotemporal Data Generation
Spatiotemporal data is prevalent in a wide range of edge devices, such as those used in personal communication and financial transactions. Recent advancements have sparked a growing interest in integrating spatiotemporal analysis with large-scale language models. However, spatiotemporal data often contains sensitive information, making it unsuitable for open third-party access. To address this challenge, we propose a Graph-GAN-based model for generating privacy-protected spatiotemporal data. Our approach incorporates spatial and temporal attention blocks in the discriminator and a spatiotemporal deconvolution structure in the generator. These enhancements enable efficient training under Gaussian noise to achieve differential privacy. Extensive experiments conducted on three real-world spatiotemporal datasets validate the efficacy of our model. Our method provides a privacy guarantee while maintaining the data utility. The prediction model trained on our generated data maintains a competitive performance compared to the model trained on the original data.
Updated: 2024-06-04 04:43:54
标题: ST-DPGAN:一种用于时空数据生成的隐私保护框架
摘要: 时空数据在各种边缘设备中广泛存在,例如个人通信和金融交易中使用的设备。最近的进展引发了将时空分析与大规模语言模型集成的兴趣增加。然而,时空数据通常包含敏感信息,使其不适合开放的第三方访问。为了解决这一挑战,我们提出了一种基于Graph-GAN的模型,用于生成受隐私保护的时空数据。我们的方法在鉴别器中结合了空间和时间注意力块,并在生成器中采用了时空反卷积结构。这些增强功能使得在高斯噪声下高效训练,以实现差分隐私。在三个真实世界的时空数据集上进行的广泛实验验证了我们模型的有效性。我们的方法提供了隐私保证,同时保持数据的实用性。在我们生成的数据上训练的预测模型与在原始数据上训练的模型相比保持了竞争性能。
更新时间: 2024-06-04 04:43:54
领域: cs.LG,cs.AI,cs.CR
Certifiably Byzantine-Robust Federated Conformal Prediction
Conformal prediction has shown impressive capacity in constructing statistically rigorous prediction sets for machine learning models with exchangeable data samples. The siloed datasets, coupled with the escalating privacy concerns related to local data sharing, have inspired recent innovations extending conformal prediction into federated environments with distributed data samples. However, this framework for distributed uncertainty quantification is susceptible to Byzantine failures. A minor subset of malicious clients can significantly compromise the practicality of coverage guarantees. To address this vulnerability, we introduce a novel framework Rob-FCP, which executes robust federated conformal prediction, effectively countering malicious clients capable of reporting arbitrary statistics with the conformal calibration process. We theoretically provide the conformal coverage bound of Rob-FCP in the Byzantine setting and show that the coverage of Rob-FCP is asymptotically close to the desired coverage level. We also propose a malicious client number estimator to tackle a more challenging setting where the number of malicious clients is unknown to the defender and theoretically shows its effectiveness. We empirically demonstrate the robustness of Rob-FCP against diverse proportions of malicious clients under a variety of Byzantine attacks on five standard benchmark and real-world healthcare datasets.
Updated: 2024-06-04 04:43:30
标题: 经认证的拜占庭鲁棒联邦保形预测
摘要: 保形预测已经展示出在为具有可交换数据样本的机器学习模型构建统计严谨的预测集方面令人印象深刻的能力。孤立的数据集,加上与本地数据共享相关的日益升级的隐私问题,激发了将保形预测扩展到具有分布式数据样本的联邦环境的最新创新。然而,这种分布式不确定性量化框架容易受到拜占庭故障的影响。一小部分恶意客户端就可以显著损害覆盖保证的实用性。为了解决这一漏洞,我们引入了一个新颖的框架Rob-FCP,该框架执行鲁棒的联邦保形预测,有效对抗能够在保形校准过程中报告任意统计量的恶意客户端。我们在拜占庭环境中从理论上给出了Rob-FCP的保形覆盖界,并证明Rob-FCP的覆盖率渐近接近期望的覆盖水平。我们还提出了一个恶意客户端数量估计器,以应对防御者不知道恶意客户端数量这一更具挑战性的设置,并从理论上证明了其有效性。我们在五个标准基准和真实世界医疗保健数据集上,通过实验展示了Rob-FCP在各种拜占庭攻击下对不同比例恶意客户端的鲁棒性。
更新时间: 2024-06-04 04:43:30
领域: cs.LG,cs.AI
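For orientation, here is the benign (non-Byzantine) federated conformal calibration step that Rob-FCP hardens; the defense against malicious score reports, which is the paper's contribution, is deliberately not sketched:

```python
import numpy as np

def federated_conformal_quantile(client_scores, alpha=0.1):
    """Pool the nonconformity scores reported by all clients and take the
    standard ceil((n+1)(1-alpha))/n empirical quantile. A single Byzantine
    client submitting inflated or deflated scores shifts this quantile,
    which is exactly the vulnerability the abstract describes."""
    scores = np.concatenate(client_scores)
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    return np.sort(scores)[min(k, n) - 1]

rng = np.random.default_rng(0)
clients = [np.abs(rng.normal(size=100)) for _ in range(5)]
qhat = federated_conformal_quantile(clients, alpha=0.1)
# Prediction set for a new example: all labels whose score is <= qhat,
# giving ~90% marginal coverage under exchangeability.
print(qhat)
```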
Adaptive Variance Reduction for Stochastic Optimization under Weaker Assumptions
This paper explores adaptive variance reduction methods for stochastic optimization based on the STORM technique. Existing adaptive extensions of STORM rely on strong assumptions like bounded gradients and bounded function values, or suffer an additional $\mathcal{O}(\log T)$ term in the convergence rate. To address these limitations, we introduce a novel adaptive STORM method that achieves an optimal convergence rate of $\mathcal{O}(T^{-1/3})$ for non-convex functions with our newly designed learning rate strategy. Compared with existing approaches, our method requires weaker assumptions and attains the optimal convergence rate without the additional $\mathcal{O}(\log T)$ term. We also extend the proposed technique to stochastic compositional optimization, obtaining the same optimal rate of $\mathcal{O}(T^{-1/3})$. Furthermore, we investigate the non-convex finite-sum problem and develop another innovative adaptive variance reduction method that achieves an optimal convergence rate of $\mathcal{O}(n^{1/4} T^{-1/2} )$, where $n$ represents the number of component functions. Numerical experiments across various tasks validate the effectiveness of our method.
Updated: 2024-06-04 04:39:51
标题: 较弱假设下随机优化的自适应方差缩减
摘要: 这篇论文探讨了基于STORM技术的随机优化的自适应方差缩减方法。现有的自适应STORM扩展依赖于强假设,如有界梯度和有界函数值,或者在收敛速度中受到额外的$O(\log T)$项的影响。为了解决这些限制,我们引入了一种新颖的自适应STORM方法,通过我们新设计的学习率策略,实现了非凸函数的最佳收敛速度$O(T^{-1/3})$。与现有方法相比,我们的方法需要更弱的假设,并且在不增加$O(\log T)$项的情况下达到最佳收敛速度。我们还将所提出的技术扩展到随机组合优化,获得相同的最佳速率$O(T^{-1/3})$。此外,我们研究了非凸有限和问题,并开发了另一种创新的自适应方差缩减方法,实现了最佳收敛速度$O(n^{1/4} T^{-1/2})$,其中$n$表示组件函数的数量。通过各种任务上的数值实验验证了我们方法的有效性。
更新时间: 2024-06-04 04:39:51
领域: math.OC,cs.LG
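The STORM estimator at the core of the abstract can be written in a few lines. This toy uses constant step sizes on a stochastic quadratic, whereas the paper's contribution is precisely an adaptive schedule under weaker assumptions, which is not reproduced here:

```python
import numpy as np

def storm(grad, x0, T=2000, lr=0.01, a=0.1, rng=None):
    """Minimal STORM-style variance-reduced loop.

    grad(x, xi) returns a stochastic gradient. The recursive estimator
        d_t = grad(x_t, xi_t) + (1 - a) * (d_{t-1} - grad(x_{t-1}, xi_t))
    reuses the *same* sample xi_t at both points, which is what cancels
    most of the gradient noise without requiring giant batches."""
    rng = rng or np.random.default_rng(0)
    x_prev = x = x0
    d = grad(x, rng.normal())
    for _ in range(T):
        x_prev, x = x, x - lr * d
        xi = rng.normal()
        d = grad(x, xi) + (1 - a) * (d - grad(x_prev, xi))
    return x

# f(x) = 0.5 * x^2 with additive gradient noise; the minimum is at 0.
print(storm(lambda x, xi: x + 0.1 * xi, x0=5.0))
```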
LACS: Learning-Augmented Algorithms for Carbon-Aware Resource Scaling with Uncertain Demand
Motivated by an imperative to reduce the carbon emissions of cloud data centers, this paper studies the online carbon-aware resource scaling problem with unknown job lengths (OCSU) and applies it to carbon-aware resource scaling for executing computing workloads. The task is to dynamically scale resources (e.g., the number of servers) assigned to a job of unknown length such that it is completed before a deadline, with the objective of reducing the carbon emissions of executing the workload. The total carbon emissions of executing a job originate from the emissions of running the job and excess carbon emitted while switching between different scales (e.g., due to checkpoint and resume). Prior work on carbon-aware resource scaling has assumed accurate job length information, while other approaches have ignored switching losses and require carbon intensity forecasts. These assumptions prohibit the practical deployment of prior work for online carbon-aware execution of scalable computing workload. We propose LACS, a theoretically robust learning-augmented algorithm that solves OCSU. To achieve improved practical average-case performance, LACS integrates machine-learned predictions of job length. To achieve solid theoretical performance, LACS extends the recent theoretical advances on online conversion with switching costs to handle a scenario where the job length is unknown. Our experimental evaluations demonstrate that, on average, the carbon footprint of LACS lies within 1.2% of the online baseline that assumes perfect job length information and within 16% of the offline baseline that, in addition to the job length, also requires accurate carbon intensity forecasts. Furthermore, LACS achieves a 32% reduction in carbon footprint compared to the deadline-aware carbon-agnostic execution of the job.
Updated: 2024-06-04 04:34:24
标题: LACS:具有不确定需求的碳感知资源扩展的学习增强算法
摘要: 受减少云数据中心碳排放的迫切要求的驱动,本文研究了具有未知作业长度的在线碳感知资源调整问题(OCSU),并将其应用于执行计算工作负载的碳感知资源调整。任务是动态调整分配给未知长度作业的资源(例如服务器数量),以便在截止日期之前完成,并旨在减少执行工作负载时的碳排放。执行作业的总碳排放来自于运行作业的排放和在不同规模之间切换时产生的额外碳排放(例如,由于检查点和恢复)。以前关于碳感知资源调整的工作假设有准确的作业长度信息,而其他方法则忽略了切换损失,并需要碳强度预测。这些假设阻碍了以前的工作在在线碳感知执行可扩展计算工作负载方面的实际部署。我们提出了LACS,这是一个在理论上稳健的学习增强算法,用于解决OCSU。为了实现更好的实际平均性能,LACS整合了机器学习对作业长度的预测。为了实现坚实的理论性能,LACS扩展了最近关于带切换成本的在线转换的理论进展,以处理作业长度未知的情况。我们的实验评估表明,平均而言,LACS的碳足迹与假设有完美作业长度信息的在线基线相比仅相差1.2%,与除作业长度外还需要准确碳强度预测的离线基线相比相差16%。此外,与满足截止日期但不考虑碳排放的作业执行相比,LACS实现了32%的碳足迹减少。
更新时间: 2024-06-04 04:34:24
领域: cs.DC,cs.LG
Wukong: Towards a Scaling Law for Large-Scale Recommendation
Scaling laws play an instrumental role in the sustainable improvement in model quality. Unfortunately, recommendation models to date do not exhibit such laws similar to those observed in the domain of large language models, due to the inefficiencies of their upscaling mechanisms. This limitation poses significant challenges in adapting these models to increasingly more complex real-world datasets. In this paper, we propose an effective network architecture based purely on stacked factorization machines, and a synergistic upscaling strategy, collectively dubbed Wukong, to establish a scaling law in the domain of recommendation. Wukong's unique design makes it possible to capture diverse, any-order of interactions simply through taller and wider layers. We conducted extensive evaluations on six public datasets, and our results demonstrate that Wukong consistently outperforms state-of-the-art models quality-wise. Further, we assessed Wukong's scalability on an internal, large-scale dataset. The results show that Wukong retains its superiority in quality over state-of-the-art models, while holding the scaling law across two orders of magnitude in model complexity, extending beyond 100 GFLOP/example, where prior arts fall short.
Updated: 2024-06-04 04:29:24
标题: 悟空:大规模推荐的扩展定律
摘要: 扩展定律在模型质量的可持续改进中起着至关重要的作用。不幸的是,迄今为止的推荐模型并没有展现出类似于大型语言模型领域中观察到的这类定律,这是由于它们的扩展机制的低效性。这种限制在将这些模型适应越来越复杂的实际数据集方面带来了重大挑战。在本文中,我们提出了一种纯粹基于堆叠因子分解机的有效网络架构和一种协同扩展策略,统称为Wukong,以在推荐领域建立扩展定律。Wukong的独特设计使其能够仅通过更高更宽的层来捕捉多样的、任意阶的交互。我们对六个公共数据集进行了广泛的评估,结果表明Wukong在质量方面始终优于最先进的模型。此外,我们在一个内部的大规模数据集上评估了Wukong的可扩展性。结果显示,Wukong在质量上保持了对最先进模型的优越性,同时扩展定律在模型复杂度跨越两个数量级、超过100 GFLOP/样例的范围内依然成立,而先前的方法在此已无能为力。
更新时间: 2024-06-04 04:29:24
领域: cs.LG,cs.AI
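The building block Wukong stacks is the classic second-order factorization machine interaction, computable in linear time via the identity $\frac{1}{2}\sum_k\big[(\sum_i v_{ik}x_i)^2-\sum_i v_{ik}^2x_i^2\big]$. A numpy sketch of just this base interaction, not Wukong's full architecture:

```python
import numpy as np

def fm_interactions(X, V):
    """Pairwise feature-interaction score of a factorization machine.

    X: (batch, n_features) inputs; V: (n_features, k) factor embeddings.
    Equals sum_{i<j} <v_i, v_j> x_i x_j per example, in O(n*k) time."""
    XV = X @ V
    return 0.5 * np.sum(XV**2 - (X**2) @ (V**2), axis=1)

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 10))    # a batch of 4 examples, 10 features
V = rng.normal(size=(10, 3))    # rank-3 factor embeddings
print(fm_interactions(X, V))    # one interaction score per example
```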
Improving Generalization in Aerial and Terrestrial Mobile Robots Control Through Delayed Policy Learning
Deep Reinforcement Learning (DRL) has emerged as a promising approach to enhancing motion control and decision-making through a wide range of robotic applications. While prior research has demonstrated the efficacy of DRL algorithms in facilitating autonomous mapless navigation for aerial and terrestrial mobile robots, these methods often grapple with poor generalization when faced with unknown tasks and environments. This paper explores the impact of the Delayed Policy Updates (DPU) technique on fostering generalization to new situations, and bolstering the overall performance of agents. Our analysis of DPU in aerial and terrestrial mobile robots reveals that this technique significantly curtails the lack of generalization and accelerates the learning process for agents, enhancing their efficiency across diverse tasks and unknown scenarios.
Updated: 2024-06-04 04:16:38
标题: 通过延迟策略学习改善航空和陆地移动机器人控制中的泛化
摘要: 深度强化学习(DRL)已经成为增强运动控制和决策制定的一种有前景的方法,适用于各种机器人应用。尽管先前的研究已经证明了DRL算法在促进无地图自主导航方面对于空中和地面移动机器人的有效性,但是这些方法在面对未知任务和环境时常常面临泛化能力不足的问题。本文探讨了延迟策略更新(DPU)技术对于促进对新情况的泛化以及增强代理的整体性能的影响。我们对空中和地面移动机器人中DPU的分析表明,这种技术显著减少了泛化能力不足,并加快了代理的学习过程,提高了它们在各种任务和未知情况下的效率。
更新时间: 2024-06-04 04:16:38
领域: cs.RO,cs.AI
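Delayed Policy Updates reduce, in the simple form popularized by TD3, to updating the actor once every few critic updates so the policy always improves against a better-settled value estimate. A skeleton with placeholder callbacks (the delay value is arbitrary, not the paper's setting):

```python
def train_with_dpu(update_critic, update_actor, steps=10_000, policy_delay=4):
    """Run a training loop where the critic is updated every step and the
    policy (actor) only every `policy_delay` steps."""
    for step in range(steps):
        update_critic(step)
        if step % policy_delay == 0:
            update_actor(step)

counts = {"critic": 0, "actor": 0}
train_with_dpu(lambda s: counts.__setitem__("critic", counts["critic"] + 1),
               lambda s: counts.__setitem__("actor", counts["actor"] + 1))
print(counts)  # the actor is updated 4x less often than the critic
```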
Fine-Tuned Machine Translation Metrics Struggle in Unseen Domains
We introduce a new, extensive multidimensional quality metrics (MQM) annotated dataset covering 11 language pairs in the biomedical domain. We use this dataset to investigate whether machine translation (MT) metrics which are fine-tuned on human-generated MT quality judgements are robust to domain shifts between training and inference. We find that fine-tuned metrics exhibit a substantial performance drop in the unseen domain scenario relative to metrics that rely on the surface form, as well as pre-trained metrics which are not fine-tuned on MT quality judgments.
Updated: 2024-06-04 04:14:16
标题: 微调后的机器翻译指标在未见领域中表现不佳
摘要: 我们引入了一个新的、广泛的多维质量度量(MQM)标注数据集,涵盖了生物医学领域的11种语言对。我们利用这个数据集来研究机器翻译(MT)度量是否对人类生成的MT质量判断进行微调后,能否稳健地应对训练和推断之间的领域转移。我们发现,在未知领域情况下,经过微调的度量相对于依赖表面形式的度量以及未经MT质量判断微调的预训练度量表现出明显的性能下降。
更新时间: 2024-06-04 04:14:16
领域: cs.CL,cs.AI
Edit Distance Robust Watermarks for Language Models
Motivated by the problem of detecting AI-generated text, we consider the problem of watermarking the output of language models with provable guarantees. We aim for watermarks which satisfy: (a) undetectability, a cryptographic notion introduced by Christ, Gunn & Zamir (2024) which stipulates that it is computationally hard to distinguish watermarked language model outputs from the model's actual output distribution; and (b) robustness to channels which introduce a constant fraction of adversarial insertions, substitutions, and deletions to the watermarked text. Earlier schemes could only handle stochastic substitutions and deletions, and thus we are aiming for a more natural and appealing robustness guarantee that holds with respect to edit distance. Our main result is a watermarking scheme which achieves both undetectability and robustness to edits when the alphabet size for the language model is allowed to grow as a polynomial in the security parameter. To derive such a scheme, we follow an approach introduced by Christ & Gunn (2024), which proceeds via first constructing pseudorandom codes satisfying undetectability and robustness properties analogous to those above; our key idea is to handle adversarial insertions and deletions by interpreting the symbols as indices into the codeword, which we call indexing pseudorandom codes. Additionally, our codes rely on weaker computational assumptions than used in previous work. Then we show that there is a generic transformation from such codes over large alphabets to watermarking schemes for arbitrary language models.
Updated: 2024-06-04 04:03:17
标题: 编辑距离鲁棒的语言模型水印
摘要: 受检测人工智能生成文本问题的启发,我们考虑具有可证明保证的语言模型输出水印问题。我们的目标是实现满足以下条件的水印:(a) 不可检测性,这是由Christ、Gunn和Zamir(2024年)引入的密码学概念,要求在计算上难以区分带水印的语言模型输出和模型的实际输出分布;以及(b) 对向带水印文本引入恒定比例的对抗性插入、替换和删除的信道具有鲁棒性。早期方案只能处理随机替换和删除,因此我们的目标是一种关于编辑距离成立的、更自然和更有吸引力的鲁棒性保证。我们的主要结果是一种水印方案,当语言模型的字母表大小允许随安全参数多项式增长时,可以同时实现不可检测性和对编辑的鲁棒性。为了导出这样一种方案,我们采用了Christ和Gunn(2024年)引入的方法,首先构建满足类似于上述不可检测性和鲁棒性属性的伪随机码;我们的关键想法是通过将符号解释为码字的索引来处理对抗性插入和删除,我们将其称为索引伪随机码。此外,我们的编码依赖于比以前工作更弱的计算假设。然后我们表明,存在一种从大字母表上的这类编码到任意语言模型水印方案的通用转换。
更新时间: 2024-06-04 04:03:17
领域: cs.CR,cs.AI,cs.LG
A Comparative Study of Sampling Methods with Cross-Validation in the FedHome Framework
This paper presents a comparative study of sampling methods within the FedHome framework, designed for personalized in-home health monitoring. FedHome leverages federated learning (FL) and generative convolutional autoencoders (GCAE) to train models on decentralized edge devices while prioritizing data privacy. A notable challenge in this domain is the class imbalance in health data, where critical events such as falls are underrepresented, adversely affecting model performance. To address this, the research evaluates six oversampling techniques using Stratified K-fold cross-validation: SMOTE, Borderline-SMOTE, Random OverSampler, SMOTE-Tomek, SVM-SMOTE, and SMOTE-ENN. These methods are tested on FedHome's public implementation over 200 training rounds with and without stratified K-fold cross-validation. The findings indicate that SMOTE-ENN achieves the most consistent test accuracy, with a standard deviation range of 0.0167-0.0176, demonstrating stable performance compared to other samplers. In contrast, SMOTE and SVM-SMOTE exhibit higher variability in performance, as reflected by their wider standard deviation ranges of 0.0157-0.0180 and 0.0155-0.0180, respectively. Similarly, the Random OverSampler method shows a significant deviation range of 0.0155-0.0176. SMOTE-Tomek, with a deviation range of 0.0160-0.0175, also shows greater stability but not as much as SMOTE-ENN. This finding highlights the potential of SMOTE-ENN to enhance the reliability and accuracy of personalized health monitoring systems within the FedHome framework.
Updated: 2024-06-04 04:03:07
标题: FedHome框架中结合交叉验证的抽样方法比较研究
摘要: 本文介绍了在FedHome框架内进行的抽样方法的比较研究,该框架旨在为个性化的家庭健康监测提供支持。FedHome利用联邦学习(FL)和生成卷积自编码器(GCAE)在分散式边缘设备上训练模型,同时优先考虑数据隐私。在这一领域中一个显著的挑战是健康数据中的类别不平衡,例如摔倒等关键事件的代表性不足,从而对模型性能产生不利影响。为了解决这一问题,研究采用分层K折交叉验证评估了六种过抽样技术:SMOTE、Borderline-SMOTE、Random OverSampler、SMOTE-Tomek、SVM-SMOTE和SMOTE-ENN。这些方法在FedHome的公开实现上进行了200轮训练测试,分别使用和不使用分层K折交叉验证。研究结果表明,SMOTE-ENN实现了最一致的测试准确性,标准差范围为0.0167-0.0176,与其他抽样器相比表现稳定。相反,SMOTE和SVM-SMOTE的性能变化较大,其标准差范围分别为0.0157-0.0180和0.0155-0.0180。同样,Random OverSampler方法显示了0.0155-0.0176的较大标准差范围。SMOTE-Tomek的标准差范围为0.0160-0.0175,也表现出较高的稳定性,但不及SMOTE-ENN。这一发现突显了SMOTE-ENN在提升FedHome框架内个性化健康监测系统可靠性和准确性方面的潜力。
更新时间: 2024-06-04 04:03:07
领域: cs.LG,cs.AI,cs.CY
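The sampler-comparison mechanics of the study can be reproduced centrally with scikit-learn and imbalanced-learn (assumed installed); FedHome's actual federated GCAE pipeline is not modeled here, only the resample-inside-the-fold protocol:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold
from imblearn.combine import SMOTEENN

# Imbalanced toy stand-in for the underrepresented fall events in the study.
X, y = make_classification(n_samples=1000, weights=[0.95, 0.05], random_state=0)

accs = []
for train_idx, test_idx in StratifiedKFold(n_splits=5, shuffle=True,
                                           random_state=0).split(X, y):
    # Resample *inside* the fold so no synthetic points leak into the test set.
    X_res, y_res = SMOTEENN(random_state=0).fit_resample(X[train_idx], y[train_idx])
    clf = RandomForestClassifier(random_state=0).fit(X_res, y_res)
    accs.append(clf.score(X[test_idx], y[test_idx]))

# The study's comparison hinges on the std across folds, not only the mean.
print(np.mean(accs), np.std(accs))
```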
Multi-Layer Attention-Based Explainability via Transformers for Tabular Data
We propose a graph-oriented attention-based explainability method for tabular data. Tasks involving tabular data have been solved mostly using traditional tree-based machine learning models which have the challenges of feature selection and engineering. With that in mind, we consider a transformer architecture for tabular data, which is amenable to explainability, and present a novel way to leverage self-attention mechanism to provide explanations by taking into account the attention matrices of all heads and layers as a whole. The matrices are mapped to a graph structure where groups of features correspond to nodes and attention values to arcs. By finding the maximum probability paths in the graph, we identify groups of features providing larger contributions to explain the model's predictions. To assess the quality of multi-layer attention-based explanations, we compare them with popular attention-, gradient-, and perturbation-based explanability methods.
Updated: 2024-06-04 03:59:23
标题: 基于Transformer的多层注意力可解释性方法在表格数据中的应用
摘要: 我们提出了一种面向表格数据的基于图的注意力可解释性方法。涉及表格数据的任务大多使用传统的基于树的机器学习模型来解决,这些模型在特征选择和特征工程方面存在挑战。考虑到这一点,我们为表格数据考虑了一种易于解释的Transformer架构,并提出了一种新颖的方法,通过将所有头和层的注意力矩阵作为一个整体来考虑,利用自注意力机制提供解释。这些矩阵被映射到一个图结构,其中特征组对应节点,注意力值对应弧。通过在图中找到最大概率路径,我们确定了对解释模型预测贡献更大的特征组。为了评估基于多层注意力的解释的质量,我们将它们与流行的基于注意力、梯度和扰动的可解释性方法进行比较。
更新时间: 2024-06-04 03:59:23
领域: cs.LG,cs.AI
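One way to realize the max-probability-path idea, under the simplifying assumptions of head-averaged attention matrices and a strictly layer-ordered graph (the paper aggregates all heads and layers as a whole), is log-space dynamic programming:

```python
import numpy as np

def max_probability_path(attn):
    """attn: (layers, n, n) head-averaged attention, rows summing to 1.
    Nodes are (layer, feature-group); arcs carry attention values. For each
    node in the final layer, find the path from layer 0 whose product of
    attention weights is maximal, via Viterbi-style DP in log space."""
    L, n, _ = attn.shape
    log_a = np.log(attn + 1e-12)
    best = np.zeros(n)                    # uniform log-prior over layer-0 nodes
    back = []
    for l in range(L):                    # relax one attention layer at a time
        scores = best[:, None] + log_a[l] # (from, to) candidate path scores
        back.append(scores.argmax(axis=0))
        best = scores.max(axis=0)
    return best, back                     # trace `back` to recover the paths

rng = np.random.default_rng(0)
raw = rng.random((3, 5, 5))
attn = raw / raw.sum(-1, keepdims=True)   # row-stochastic, like softmax output
best, back = max_probability_path(attn)
print(best.argmax())                      # feature group with the strongest path
```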
Data-Driven Approaches for Thrust Prediction in Underwater Flapping Fin Propulsion Systems
Flapping-fin underwater vehicle propulsion systems provide an alternative to propeller-driven systems in situations that involve a constrained environment or require high maneuverability. Testing new configurations through experiments or high-fidelity simulations is an expensive process, slowing development of new systems. This is especially true when introducing new fin geometries. In this work, we propose machine learning approaches for thrust prediction given the system's fin geometries and kinematics. We introduce data-efficient fin shape parameterization strategies that enable our network to predict thrust profiles for unseen fin geometries given limited fin shapes in input data. In addition to faster development of systems, generalizable surrogate models offer fast, accurate predictions that could be used on an unmanned underwater vehicle control system.
Updated: 2024-06-04 03:58:58
标题: 数据驱动方法用于水下摆动鳍推进系统的推力预测
摘要: 摆动鳍水下航行器推进系统在涉及受限环境或需要高机动性的场景中,为螺旋桨驱动系统提供了一种替代方案。通过实验或高保真度仿真测试新配置是一个昂贵的过程,会减缓新系统的开发,在引入新的鳍形几何时尤其如此。在这项工作中,我们提出了在给定系统鳍形几何和运动学条件下预测推力的机器学习方法。我们引入了数据高效的鳍形参数化策略,使我们的网络能够在输入数据仅包含有限鳍形的情况下预测未见过的鳍形几何的推力曲线。除了加快系统开发外,具有泛化能力的代理模型还能提供快速、准确的预测,可用于无人水下航行器的控制系统。
更新时间: 2024-06-04 03:58:58
领域: cs.RO,cs.LG
Bileve: Securing Text Provenance in Large Language Models Against Spoofing with Bi-level Signature
Text watermarks for large language models (LLMs) have been commonly used to identify the origins of machine-generated content, which is promising for assessing liability when combating deepfake or harmful content. While existing watermarking techniques typically prioritize robustness against removal attacks, unfortunately, they are vulnerable to spoofing attacks: malicious actors can subtly alter the meanings of LLM-generated responses or even forge harmful content, potentially misattributing blame to the LLM developer. To overcome this, we introduce a bi-level signature scheme, Bileve, which embeds fine-grained signature bits for integrity checks (mitigating spoofing attacks) as well as a coarse-grained signal to trace text sources when the signature is invalid (enhancing detectability) via a novel rank-based sampling strategy. Compared to conventional watermark detectors that only output binary results, Bileve can differentiate 5 scenarios during detection, reliably tracing text provenance and regulating LLMs. The experiments conducted on OPT-1.3B and LLaMA-7B demonstrate the effectiveness of Bileve in defeating spoofing attacks with enhanced detectability.
Updated: 2024-06-04 03:58:14
标题: Bileve:利用双层签名保护大型语言模型中的文本来源免受欺骗
摘要: 大型语言模型(LLMs)的文本水印常被用于识别机器生成内容的来源,这对于在打击深度伪造或有害内容时评估责任很有希望。虽然现有的水印技术通常优先考虑抵抗移除攻击的鲁棒性,但不幸的是,它们容易受到欺骗攻击:恶意行为者可以微妙地改变LLM生成响应的含义,甚至伪造有害内容,可能将责任误归于LLM开发者。为了克服这一问题,我们引入了一种双层签名方案Bileve,该方案嵌入用于完整性检查的细粒度签名位(减轻欺骗攻击),并通过一种新颖的基于排名的采样策略嵌入一种粗粒度信号,在签名无效时用于追踪文本来源(增强可检测性)。与仅输出二进制结果的传统水印检测器相比,Bileve在检测过程中可以区分5种情景,可靠地追踪文本来源并规范LLMs。在OPT-1.3B和LLaMA-7B上进行的实验表明,Bileve在挫败欺骗攻击方面具有增强的可检测性。
更新时间: 2024-06-04 03:58:14
领域: cs.CR,cs.CL
Enhancing Trust in LLMs: Algorithms for Comparing and Interpreting LLMs
This paper surveys evaluation techniques to enhance the trustworthiness and understanding of Large Language Models (LLMs). As reliance on LLMs grows, ensuring their reliability, fairness, and transparency is crucial. We explore algorithmic methods and metrics to assess LLM performance, identify weaknesses, and guide development towards more trustworthy applications. Key evaluation metrics include Perplexity Measurement, NLP metrics (BLEU, ROUGE, METEOR, BERTScore, GLEU, Word Error Rate, Character Error Rate), Zero-Shot and Few-Shot Learning Performance, Transfer Learning Evaluation, Adversarial Testing, and Fairness and Bias Evaluation. We introduce innovative approaches like LLMMaps for stratified evaluation, Benchmarking and Leaderboards for competitive assessment, Stratified Analysis for in-depth understanding, Visualization of Bloom's Taxonomy for cognitive level accuracy distribution, Hallucination Score for quantifying inaccuracies, Knowledge Stratification Strategy for hierarchical analysis, and Machine Learning Models for Hierarchy Generation. Human Evaluation is highlighted for capturing nuances that automated metrics may miss. These techniques form a framework for evaluating LLMs, aiming to enhance transparency, guide development, and establish user trust. Future papers will describe metric visualization and demonstrate each approach on practical examples.
Updated: 2024-06-04 03:54:53
标题: 增强LLMs的信任:用于比较和解释LLMs的算法
摘要: 本文调查了评估技术,以增强大型语言模型(LLMs)的可信度和理解力。随着对LLMs的依赖程度增加,确保它们的可靠性、公平性和透明性至关重要。我们探讨了算法方法和度量标准来评估LLM的性能,识别弱点,并指导开发朝着更可信赖的应用方向发展。关键评估指标包括困惑度测量、自然语言处理度量(BLEU、ROUGE、METEOR、BERTScore、GLEU、单词错误率、字符错误率)、零样本和少样本学习性能、迁移学习评估、对抗测试、公平性和偏见评估。我们介绍了一些创新方法,如用于分层评估的LLMMaps、用于竞争性评估的基准和排行榜、用于深入理解的分层分析、用于认知水平准确度分布的布鲁姆分类法可视化、用于量化不准确性的幻觉分数、用于层次分析的知识分层策略,以及用于层次生成的机器学习模型。文中强调了人类评估,以捕捉自动度量可能遗漏的微妙之处。这些技术构成了一个评估LLMs的框架,旨在增强透明度、指导开发并建立用户信任。未来的论文将描述度量可视化,并在实际示例中演示每种方法。
更新时间: 2024-06-04 03:54:53
领域: cs.CL,cs.AI (MSC 2020: 68T50, 68Q25; ACM: I.2.7; F.2.2)
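Of the listed metrics, perplexity is the simplest to compute: the exponential of the mean token-level negative log-likelihood. A sketch with Hugging Face transformers (the model choice is arbitrary and illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name).eval()

text = "Large language models require careful, multi-faceted evaluation."
ids = tok(text, return_tensors="pt").input_ids
with torch.no_grad():
    # Passing labels=input_ids makes the model return the shifted
    # next-token cross-entropy loss averaged over the sequence.
    loss = model(ids, labels=ids).loss
print(f"perplexity = {torch.exp(loss).item():.2f}")
```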
Speeding up Policy Simulation in Supply Chain RL
Simulating a single trajectory of a dynamical system under some state-dependent policy is a core bottleneck in policy optimization algorithms. The many inherently serial policy evaluations that must be performed in a single simulation constitute the bulk of this bottleneck. To wit, in applying policy optimization to supply chain optimization (SCO) problems, simulating a single month of a supply chain can take several hours. We present an iterative algorithm for policy simulation, which we dub Picard Iteration. This scheme carefully assigns policy evaluation tasks to independent processes. Within an iteration, a single process evaluates the policy only on its assigned tasks while assuming a certain 'cached' evaluation for other tasks; the cache is updated at the end of the iteration. Implemented on GPUs, this scheme admits batched evaluation of the policy on a single trajectory. We prove that the structure afforded by many SCO problems allows convergence in a small number of iterations, independent of the horizon. We demonstrate practical speedups of 400x on large-scale SCO problems even with a single GPU, and also demonstrate practical efficacy in other RL environments.
Updated: 2024-06-04 03:48:08
标题: 加速供应链强化学习中的政策模拟
摘要: 在某些状态相关策略下模拟动态系统的单一轨迹是政策优化算法中的一个核心瓶颈。在单次模拟中必须执行的许多固有的串行政策评估构成了这个瓶颈的主要部分。换句话说,在将政策优化应用于供应链优化(SCO)问题时,模拟供应链一个月的过程可能需要几个小时。 我们提出了一种用于政策模拟的迭代算法,我们称之为Picard迭代。该方案将政策评估任务精心分配给独立的进程。在每次迭代中,一个进程仅对其分配的任务进行政策评估,同时假设对其他任务进行某种“缓存”评估;缓存在迭代结束时更新。在GPU上实现,这种方案允许对单一轨迹进行批量评估。我们证明,许多SCO问题所提供的结构允许在少量迭代中收敛,而与时间跨度无关。我们展示了即使在单个GPU上,也可以在大规模SCO问题上实现400倍的实际加速,并且在其他RL环境中也展示了实际有效性。
更新时间: 2024-06-04 03:48:08
领域: cs.AI,cs.DC,cs.LG
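The iteration scheme can be sketched on toy deterministic dynamics: cache a trajectory, evaluate the policy on all cached states in one batched call, re-roll the cheap dynamics, and repeat to a fixed point. Why few iterations suffice on supply-chain structure is the paper's result and is not captured by this toy:

```python
import numpy as np

def picard_simulate(step, policy_batch, s0, T, iters=50):
    """Picard-style policy simulation: the expensive, inherently serial
    policy evaluations are replaced by one parallel (batched) evaluation
    per iteration over the cached trajectory; with exact caching this
    converges to the serial rollout in at most T iterations."""
    states = np.repeat(s0[None, :], T + 1, axis=0)   # initial cache: all s0
    for _ in range(iters):
        actions = policy_batch(states[:T])           # one batched policy call
        new = states.copy()
        for t in range(T):                           # cheap dynamics rollout
            new[t + 1] = step(new[t], actions[t])
        if np.allclose(new, states):                 # trajectory fixed point
            break
        states = new
    return states

step = lambda s, a: 0.9 * s + a                      # toy linear dynamics
policy_batch = lambda S: -0.5 * S                    # batched linear policy
traj = picard_simulate(step, policy_batch, s0=np.ones(3), T=20)
print(traj[-1])
```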
Process-Driven Autoformalization in Lean 4
Autoformalization, the conversion of natural language mathematics into formal languages, offers significant potential for advancing mathematical reasoning. However, existing efforts are limited to formal languages with substantial online corpora and struggle to keep pace with rapidly evolving languages like Lean 4. To bridge this gap, we propose a new benchmark \textbf{Form}alization for \textbf{L}ean~\textbf{4} (\textbf{FormL4}) designed to evaluate the autoformalization capabilities of large language models (LLMs). This benchmark encompasses a comprehensive assessment of questions, answers, formal statements, and proofs. Additionally, we introduce a \textbf{P}rocess-\textbf{S}upervised \textbf{V}erifier (\textbf{PSV}) model that leverages the precise feedback from Lean 4 compilers to enhance autoformalization. Our experiments demonstrate that the PSV method improves autoformalization, enabling higher accuracy using less filtered training data. Furthermore, when fine-tuned with data containing detailed process information, PSV can leverage the data more effectively, leading to more significant improvements in autoformalization for Lean 4. Our dataset and code are available at \url{https://github.com/rookie-joe/PDA}.
Updated: 2024-06-04 03:48:08
标题: Lean 4中的过程驱动自动形式化
摘要: 自动形式化,即将自然语言数学转化为形式语言,具有推进数学推理的重要潜力。然而,现有的努力局限于具有大量在线语料库的形式语言,并且难以跟上像Lean 4这样快速发展的语言。为了弥合这一差距,我们提出了一个新的基准\textbf{Form}alization for \textbf{L}ean~\textbf{4}(\textbf{FormL4}),旨在评估大型语言模型(LLMs)的自动形式化能力。该基准包括对问题、答案、形式化陈述和证明的全面评估。此外,我们引入了一个\textbf{P}rocess-\textbf{S}upervised \textbf{V}erifier(\textbf{PSV})模型,利用Lean 4编译器的精确反馈来增强自动形式化。我们的实验表明,PSV方法改善了自动形式化,使得使用更少的过滤训练数据可以获得更高的准确性。此外,当用包含详细过程信息的数据进行微调时,PSV可以更有效地利用数据,从而在Lean 4的自动形式化方面取得更显著的改进。我们的数据集和代码可在\url{https://github.com/rookie-joe/PDA}上找到。
更新时间: 2024-06-04 03:48:08
领域: cs.CL,cs.LG,cs.LO
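To make the task concrete, here is a toy example of what autoformalization targets: an informal statement paired with one hand-written Lean 4 formalization (core Lean 4, no Mathlib). It is illustrative only and not drawn from the FormL4 benchmark:

```lean
-- Informal statement: "the sum of two even natural numbers is even."
-- One possible formalization an autoformalizer should produce:
theorem even_add_even (m n : Nat)
    (hm : ∃ k, m = 2 * k) (hn : ∃ k, n = 2 * k) :
    ∃ k, m + n = 2 * k :=
  match hm, hn with
  | ⟨a, ha⟩, ⟨b, hb⟩ =>
    -- Witness: a + b, since 2*a + 2*b = 2*(a + b) by left distributivity.
    ⟨a + b, by rw [ha, hb, Nat.left_distrib]⟩
```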
Orthogonal Causal Calibration
Estimates of causal parameters such as conditional average treatment effects and conditional quantile treatment effects play an important role in real-world decision making. Given this importance, one should ensure these estimators are calibrated. While there is a rich literature on calibrating estimators of non-causal parameters, very few methods have been derived for calibrating estimators of causal parameters, or more generally estimators of quantities involving nuisance parameters. In this work, we provide a general framework for calibrating predictors involving nuisance estimation. We consider a notion of calibration defined with respect to an arbitrary, nuisance-dependent loss $\ell$, under which we say an estimator $\theta$ is calibrated if its predictions cannot be changed on any level set to decrease loss. We prove generic upper bounds on the calibration error of any causal parameter estimate $\theta$ with respect to any loss $\ell$ using a concept called Neyman Orthogonality. Our bounds involve two decoupled terms - one measuring the error in estimating the unknown nuisance parameters, and the other representing the calibration error in a hypothetical world where the learned nuisance estimates were true. We use our bound to analyze the convergence of two sample splitting algorithms for causal calibration. One algorithm, which applies to universally orthogonalizable loss functions, transforms the data into generalized pseudo-outcomes and applies an off-the-shelf calibration procedure. The other algorithm, which applies to conditionally orthogonalizable loss functions, extends the classical uniform mass binning algorithm to include nuisance estimation. Our results are exceedingly general, showing that essentially any existing calibration algorithm can be used in causal settings, with additional loss only arising from errors in nuisance estimation.
Updated: 2024-06-04 03:35:25
标题: 正交因果校准
摘要: 估计因果参数,如条件平均处理效应和条件分位数处理效应,在现实世界的决策中起着重要作用。鉴于这种重要性,人们应确保这些估计量是校准的。虽然有丰富的文献可以校准非因果参数的估计量,但对于校准因果参数的估计量,或者更一般地说,涉及干扰参数的量的估计量,很少有方法被提出。 在这项工作中,我们提供了一个涉及干扰估计的校准预测器的一般框架。我们考虑了一个与任意干扰相关的损失$\ell$定义的校准概念,根据这个概念,我们说一个估计量$\theta$是校准的,如果它的预测不能在任何水平集上被改变以减少损失。我们利用一种称为Neyman正交性的概念,证明了任何因果参数估计$\theta$相对于任何损失$\ell$的校准误差的通用上界。我们的界限涉及两个解耦合的术语 - 一个衡量了估计未知干扰参数的误差,另一个代表了在一个假设的世界中学到的干扰估计是真实的情况下的校准误差。我们利用我们的界限来分析用于因果校准的两种样本分割算法的收敛性。其中一种算法适用于普遍正交化损失函数,将数据转化为广义伪结果,并应用一个现成的校准程序。另一种算法适用于有条件正交化损失函数,将经典的均匀质量分箱算法扩展到包括干扰估计。我们的结果非常普遍,显示基本上任何现有的校准算法都可以在因果环境中使用,额外的损失仅来自于干扰估计的误差。
更新时间: 2024-06-04 03:35:25
领域: stat.ML,cs.LG,math.ST,stat.ME,stat.TH
Brain-Inspired Spiking Neural Networks for Industrial Fault Diagnosis: A Survey, Challenges, and Opportunities
In recent decades, Industrial Fault Diagnosis (IFD) has emerged as a crucial discipline concerned with detecting and gathering vital information about industrial equipment's health condition, thereby facilitating the identification of failure types and severities. The pursuit of precise and effective fault recognition has garnered substantial attention, culminating in a focus on automating equipment monitoring to preclude safety accidents and reduce reliance on human labor. The advent of artificial neural networks (ANNs) has been instrumental in augmenting intelligent IFD algorithms, particularly in the context of big data. Despite these advancements, ANNs, being a simplified biomimetic neural network model, exhibit inherent limitations such as resource and data dependencies and restricted cognitive capabilities. To address these limitations, the third-generation Spiking Neural Network (SNN), founded on principles of Brain-inspired computing, has surfaced as a promising alternative. The SNN, characterized by its biological neuron dynamics and spiking information encoding, demonstrates exceptional potential in representing spatiotemporal features. Consequently, developing SNN-based IFD models has gained momentum, displaying encouraging performance. Nevertheless, this field lacks systematic surveys to illustrate the current situation, challenges, and future directions. Therefore, this paper systematically reviews the theoretical progress of SNN-based models to answer the question of what SNN is. Subsequently, it reviews and analyzes existing SNN-based IFD models to explain why SNN needs to be used and how to use it. More importantly, this paper systematically answers the challenges, solutions, and opportunities of SNN in IFD.
Updated: 2024-06-04 03:31:10
标题: 基于大脑启发的脉冲神经网络在工业故障诊断中的应用:调查、挑战和机遇
摘要: 在最近几十年中,工业故障诊断(IFD)作为一个关键的学科出现了,其关注于检测和收集有关工业设备健康状况的重要信息,从而促进了对故障类型和严重程度的识别。精确和有效的故障识别受到了广泛关注,最终集中在自动化设备监测上,以预防安全事故并减少对人力劳动的依赖。人工神经网络(ANNs)的出现对增强智能IFD算法尤为重要,特别是在大数据的背景下。尽管取得了这些进展,作为一种简化的仿生神经网络模型,ANNs存在固有的限制,如资源和数据依赖性以及受限的认知能力。为了解决这些限制,基于大脑启发计算原理的第三代脉冲神经网络(SNN)已经成为一种有前途的替代方案。SNN以其生物神经元动态和脉冲信息编码为特征,展示了在表示时空特征方面的卓越潜力。因此,开发基于SNN的IFD模型已经获得了动力,并展现出了令人鼓舞的表现。然而,这一领域缺乏系统的调查来说明当前情况、挑战和未来方向。因此,本文系统回顾了基于SNN模型的理论进展,以回答SNN是什么的问题。随后,它审查和分析了现有的基于SNN的IFD模型,以解释为什么需要使用SNN以及如何使用它。更重要的是,本文系统回答了SNN在IFD中的挑战、解决方案和机遇。
更新时间: 2024-06-04 03:31:10
领域: cs.NE,cs.AI,cs.LG
Fine-Grained Modeling of Narrative Context: A Coherence Perspective via Retrospective Questions
This work introduces an original and practical paradigm for narrative comprehension, stemming from the characteristics that individual passages within narratives tend to be more cohesively related than isolated. Complementary to the common end-to-end paradigm, we propose a fine-grained modeling of narrative context, by formulating a graph dubbed NarCo, which explicitly depicts task-agnostic coherence dependencies that are ready to be consumed by various downstream tasks. In particular, edges in NarCo encompass free-form retrospective questions between context snippets, inspired by human cognitive perception that constantly reinstates relevant events from prior context. Importantly, our graph formalism is practically instantiated by LLMs without human annotations, through our designed two-stage prompting scheme. To examine the graph properties and its utility, we conduct three studies in narratives, each from a unique angle: edge relation efficacy, local context enrichment, and broader application in QA. All tasks could benefit from the explicit coherence captured by NarCo.
Updated: 2024-06-04 03:26:19
标题: 细粒度建模叙事背景:通过回顾性问题的一致性视角
摘要: 这项工作介绍了一种原创且实用的叙事理解范式,源自叙事中个别段落往往比孤立更具内聚性的特点。与常见的端到端范式相辅相成,我们提出了对叙事背景进行细粒度建模的方法,通过构建一个名为NarCo的图形,明确展示了任务无关的连贯性依赖关系,可供各种下游任务消费。具体而言,NarCo中的边涵盖了上下文片段之间的自由形式的回顾性问题,受到人类认知感知的启发,后者不断重新呈现先前上下文中的相关事件。重要的是,我们的图形形式化由LLMs实际实现,无需人类注释,通过我们设计的两阶段提示方案。为了检验图形的属性及其实用性,我们从三个独特角度进行了三项关于叙事的研究:边关系效力、局部上下文丰富性以及在问答中的更广泛应用。所有任务都可以从NarCo捕获的明确连贯性中受益。
更新时间: 2024-06-04 03:26:19
领域: cs.CL,cs.LG
Fast networked data selection via distributed smoothed quantile estimation
Collecting the most informative data from a large dataset distributed over a network is a fundamental problem in many fields, including control, signal processing and machine learning. In this paper, we establish a connection between selecting the most informative data and finding the top-$k$ elements of a multiset. The top-$k$ selection in a network can be formulated as a distributed nonsmooth convex optimization problem known as quantile estimation. Unfortunately, the lack of smoothness in the local objective functions leads to extremely slow convergence and poor scalability with respect to the network size. To overcome the deficiency, we propose an accelerated method that employs smoothing techniques. Leveraging the piecewise linearity of the local objective functions in quantile estimation, we characterize the iteration complexity required to achieve top-$k$ selection, a challenging task due to the lack of strong convexity. Several numerical results are provided to validate the effectiveness of the algorithm and the correctness of the theory.
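A minimal, centralized sketch of the core idea (the paper's algorithm runs distributed over a network): the top-$k$ threshold is a quantile, the quantile is the minimizer of a sum of pinball losses, and smoothing the kink restores fast gradient convergence. The smoothing width, step size, and iteration count below are illustrative assumptions.

```python
import numpy as np

def smoothed_pinball_grad(r, q, mu):
    """Gradient of the pinball loss rho_q with its kink at r = 0 replaced by
    a quadratic on [-mu, mu] (a standard smoothing of nonsmooth objectives)."""
    return np.where(r > mu, q,
           np.where(r < -mu, q - 1.0, q - 1.0 + (r + mu) / (2 * mu)))

def estimate_quantile(x, q, mu=0.05, lr=2.0, iters=1000):
    theta = float(np.mean(x))
    for _ in range(iters):
        # Gradient step on the smoothed objective mean(rho_q(x - theta)).
        theta += lr * np.mean(smoothed_pinball_grad(x - theta, q, mu))
    return theta

x = np.random.randn(10000)
k = 1000                       # top-k threshold = the (1 - k/n) quantile
q = 1 - k / len(x)
print(estimate_quantile(x, q), "vs", np.quantile(x, q))
```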
Updated: 2024-06-04 03:26:15
标题: 通过分布式平滑分位数估计快速网络数据选择
摘要: 在许多领域,包括控制、信号处理和机器学习中,从分布在网络上的大型数据集中收集最具信息量的数据是一个基本问题。本文建立了选择最具信息量数据与找到多重集合的前k个元素之间的联系。在网络中的前k选择可以被公式化为一个分布式非光滑凸优化问题,即分位数估计。不幸的是,局部目标函数的缺乏光滑性导致了收敛速度极慢,并且在网络规模方面的可扩展性差。为了克服这一不足,我们提出了一种采用平滑技术的加速方法。利用分位数估计中局部目标函数的分段线性特性,我们表征了实现前k选择所需的迭代复杂性,这是一项具有挑战性的任务,因为缺乏强凸性。提供了几个数值结果来验证算法的有效性和理论的正确性。
更新时间: 2024-06-04 03:26:15
领域: eess.SY,cs.AI,cs.SY
Improving Prototypical Visual Explanations with Reward Reweighing, Reselection, and Retraining
In recent years, work has gone into developing deep interpretable methods for image classification that clearly attributes a model's output to specific features of the data. One such of these methods is the Prototypical Part Network (ProtoPNet), which attempts to classify images based on meaningful parts of the input. While this architecture is able to produce visually interpretable classifications, it often learns to classify based on parts of the image that are not semantically meaningful. To address this problem, we propose the Reward Reweighing, Reselecting, and Retraining (R3) post-processing framework, which performs three additional corrective updates to a pretrained ProtoPNet in an offline and efficient manner. The first two steps involve learning a reward model based on collected human feedback and then aligning the prototypes with human preferences. The final step is retraining, which realigns the base features and the classifier layer of the original model with the updated prototypes. We find that our R3 framework consistently improves both the interpretability and the predictive accuracy of ProtoPNet and its variants.
Updated: 2024-06-04 03:25:20
标题: 通过奖励重新加权、重新选择和重新训练来改善原型视觉解释
摘要: 近年来,人们开始研究开发深度可解释的图像分类方法,可以清楚地将模型的输出归因于数据的特定特征。其中一种方法是原型部分网络(ProtoPNet),它试图基于输入的有意义部分对图像进行分类。虽然这种架构能够产生视觉可解释的分类结果,但通常学习基于图像中不具有语义意义的部分进行分类。为了解决这个问题,我们提出了奖励重新加权、重新选择和重新训练(R3)后处理框架,以离线和高效的方式对预训练的ProtoPNet进行三个额外的校正更新。前两个步骤涉及学习基于收集的人类反馈的奖励模型,然后将原型与人类偏好对齐。最后一步是重新训练,它重新调整原始模型的基础特征和分类器层与更新的原型对齐。我们发现,我们的R3框架始终提高了ProtoPNet及其变体的解释性和预测准确性。
更新时间: 2024-06-04 03:25:20
领域: cs.LG,cs.AI,cs.CV,cs.HC
Redefining DDoS Attack Detection Using A Dual-Space Prototypical Network-Based Approach
Distributed Denial of Service (DDoS) attacks pose an increasingly substantial cybersecurity threat to organizations across the globe. In this paper, we introduce a new deep learning-based technique for detecting DDoS attacks, a paramount cybersecurity challenge with evolving complexity and scale. Specifically, we propose a new dual-space prototypical network that leverages a unique dual-space loss function to enhance detection accuracy for various attack patterns through geometric and angular similarity measures. This approach capitalizes on the strengths of representation learning within the latent space (a lower-dimensional representation of data that captures complex patterns for machine learning analysis), improving the model's adaptability and sensitivity towards varying DDoS attack vectors. Our comprehensive evaluation spans multiple training environments, including offline training, simulated online training, and prototypical network scenarios, to validate the model's robustness under diverse data abundance and scarcity conditions. The Multilayer Perceptron (MLP) with Attention, trained with our dual-space prototypical design over a reduced training set, achieves an average accuracy of 94.85% and an F1-Score of 94.71% across our tests, showcasing its effectiveness in dynamic and constrained real-world scenarios.
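The sketch below illustrates one plausible form of a dual-space similarity in PyTorch, mixing a geometric (Euclidean) term and an angular (cosine) term toward class prototypes; the mixing weight and temperature are assumptions of this sketch, not the paper's exact loss function.

```python
import torch
import torch.nn.functional as F

def dual_space_logits(embeddings, prototypes, alpha=0.5, temperature=10.0):
    """Combine geometric and angular similarity to each class prototype.
    embeddings: (B, D), prototypes: (C, D). Returns (B, C) class logits."""
    # Geometric term: negative squared Euclidean distance in latent space.
    geo = -torch.cdist(embeddings, prototypes).pow(2)
    # Angular term: cosine similarity between embedding and prototype directions.
    ang = F.normalize(embeddings, dim=1) @ F.normalize(prototypes, dim=1).T
    return alpha * geo + (1 - alpha) * temperature * ang

def dual_space_loss(embeddings, prototypes, labels):
    return F.cross_entropy(dual_space_logits(embeddings, prototypes), labels)
```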
Updated: 2024-06-04 03:22:52
标题: 重新定义DDoS攻击检测:使用基于双空间原型网络的方法
摘要: 分布式拒绝服务(DDoS)攻击对全球组织构成越来越重要的网络安全威胁。本文介绍了一种新的基于深度学习的技术,用于检测DDoS攻击,这是一个不断演变复杂和规模不断扩大的网络安全挑战。具体来说,我们提出了一种新的双空间原型网络,利用独特的双空间损失函数通过几何和角度相似度测量来提高对各种攻击模式的检测准确性。这种方法利用了潜在空间中的表示学习的优势(潜在数据的低维表示,用于捕获机器学习分析的复杂模式),改进了模型对不同DDoS攻击向量的适应性和敏感性。我们的全面评估涵盖多个训练环境,包括离线训练、模拟在线训练和原型网络场景,以验证模型在不同数据丰富度和稀缺性条件下的稳健性。经过我们设计的双空间原型训练的多层感知器(MLP)和注意力,在缩减的训练集上实现了94.85%的平均准确率和94.71%的F1-Score,在动态和受限的现实场景中展示了其有效性。
更新时间: 2024-06-04 03:22:52
领域: cs.CR,cs.LG,cs.NI
Position-based Rogue Access Point Detection
Rogue Wi-Fi access point (AP) attacks can lead to data breaches and unauthorized access. Existing rogue AP detection methods and tools often rely on channel state information (CSI) or received signal strength indicator (RSSI), but they either require specific hardware or achieve low detection accuracy. On the other hand, AP positions are typically fixed, and Wi-Fi can support indoor positioning of user devices. Based on this position information, the mobile platform can check whether one (or more) AP in range is rogue. The inclusion of a rogue AP would in principle result in a wrong estimated position. This motivates the idea of using different subsets of APs: the positions computed from subsets that include a rogue AP will differ significantly from those that do not. Our scheme contains two components: subset generation and position validation. First, we generate subsets of RSSIs from APs, which are then utilized for positioning, similar to receiver autonomous integrity monitoring (RAIM). Second, the position estimates, along with their uncertainties, are combined into a Gaussian mixture, and inconsistencies are checked by evaluating the overlap of the Gaussian components. Our comparative analysis, conducted on a real-world dataset with three types of attacks and synthetic RSSIs integrated, demonstrates a substantial improvement in rogue AP detection accuracy.
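A simplified sketch of the RAIM-style subset idea. The paper combines subset estimates into a Gaussian mixture and tests component overlap; here a Mahalanobis gate against a consensus position stands in for that test, and `localize` is a placeholder for any RSSI positioning routine returning an estimate and its covariance.

```python
from itertools import combinations
import numpy as np

def detect_rogue_ap(ap_ids, rssi, localize, subset_size=4, gate=3.0):
    """Score each AP by how often it appears in inconsistent subsets.
    rssi is an array aligned with ap_ids; `localize(ids, rssi_values)` must
    return (position, covariance) for that subset of APs."""
    estimates = []
    for subset in combinations(range(len(ap_ids)), subset_size):
        ids = [ap_ids[i] for i in subset]
        pos, cov = localize(ids, rssi[list(subset)])
        estimates.append((subset, pos, cov))
    # Consensus position: mean over all subset estimates.
    consensus = np.mean([p for _, p, _ in estimates], axis=0)
    votes = np.zeros(len(ap_ids))
    for subset, pos, cov in estimates:
        # Mahalanobis distance of this estimate from the consensus.
        d = np.sqrt((pos - consensus) @ np.linalg.inv(cov) @ (pos - consensus))
        if d > gate:  # inconsistent subset: likely contains the rogue AP
            votes[list(subset)] += 1
    return votes  # APs with the most votes are the prime suspects
```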
Updated: 2024-06-04 03:22:36
标题: 基于位置的恶意接入点检测
摘要: 流氓Wi-Fi接入点(AP)攻击可能导致数据泄露和未经授权访问。现有的流氓AP检测方法和工具通常依赖于信道状态信息(CSI)或接收信号强度指示器(RSSI),但它们需要特定的硬件或实现低检测精度。另一方面,AP位置通常是固定的,而Wi-Fi可以支持用户设备的室内定位。基于这些位置信息,移动平台可以检查范围内是否有一个(或多个)流氓AP。包含流氓AP原则上会导致错误的估计位置。因此,使用不同子集的AP的想法:基于包含流氓AP的子集计算的位置将与不包含流氓AP的位置显着不同。我们的方案包含两个组件:子集生成和位置验证。首先,我们从AP中生成RSSI的子集,然后将其用于定位,类似于接收机自主完整性监测(RAIM)。其次,位置估计值以及不确定性被合并成高斯混合物,通过评估高斯成分的重叠来检查不一致性。我们在一个包含三种类型攻击和合成RSSI的真实数据集上进行的比较分析表明,在流氓AP检测准确性方面取得了显着改善。
更新时间: 2024-06-04 03:22:36
领域: cs.CR
Research on the Application of Computer Vision Based on Deep Learning in Autonomous Driving Technology
This research aims to explore the application of deep learning in autonomous driving computer vision technology and its impact on improving system performance. Using advanced techniques such as convolutional neural networks (CNNs), multi-task joint learning, and deep reinforcement learning, this article analyzes in detail the application of deep learning in key areas including image recognition, real-time target tracking and classification, environment perception and decision support, and path planning and navigation. Research results show that the proposed system achieves an accuracy of over 98% in image recognition, target tracking, and classification, and also demonstrates efficient performance and practicality in environment perception and decision support as well as path planning and navigation. The conclusion points out that deep learning technology can significantly improve the accuracy and real-time response capabilities of autonomous driving systems. Although challenges remain in environment perception and decision support, as the technology advances it is expected to achieve wider application and realize greater potential in the future.
Updated: 2024-06-04 03:15:41
标题: 基于深度学习的计算机视觉在自动驾驶技术中的应用研究
摘要: 这项研究旨在探讨深度学习在自动驾驶计算机视觉技术中的应用以及其对系统性能改进的影响。通过使用卷积神经网络(CNN)、多任务联合学习方法和深度强化学习等先进技术,本文详细分析了深度学习在图像识别、实时目标跟踪和分类、环境感知和决策支持、路径规划和导航等关键领域的应用过程。研究结果显示,所提出的系统在图像识别、目标跟踪和分类方面的准确率超过98%,同时在环境感知和决策支持、路径规划和导航方面表现出高效性和实用性。结论指出,深度学习技术可以显著提高自动驾驶系统的准确性和实时响应能力。尽管在环境感知和决策支持方面仍然存在挑战,但随着技术的进步,预计未来将实现更广泛的应用和更大的潜力。
更新时间: 2024-06-04 03:15:41
领域: cs.CV,cs.AI
Towards Robust Physical-world Backdoor Attacks on Lane Detection
Deep learning-based lane detection (LD) plays a critical role in autonomous driving systems, such as adaptive cruise control. However, it is vulnerable to backdoor attacks. Existing backdoor attack methods on LD exhibit limited effectiveness in dynamic real-world scenarios, primarily because they fail to consider dynamic scene factors, including changes in driving perspectives (e.g., viewpoint transformations) and environmental conditions (e.g., weather or lighting changes). To tackle this issue, this paper introduces BadLANE, a dynamic scene adaptation backdoor attack for LD designed to withstand changes in real-world dynamic scene factors. To address the challenges posed by changing driving perspectives, we propose an amorphous trigger pattern composed of shapeless pixels. This trigger design allows the backdoor to be activated by various forms or shapes of mud spots or pollution on the road or lens, enabling adaptation to changes in vehicle observation viewpoints during driving. To mitigate the effects of environmental changes, we design a meta-learning framework to train meta-generators tailored to different environmental conditions. These generators produce meta-triggers that incorporate diverse environmental information, such as weather or lighting conditions, as the initialization of the trigger patterns for backdoor implantation, thus enabling adaptation to dynamic environments. Extensive experiments on various commonly used LD models in both digital and physical domains validate the effectiveness of our attacks, outperforming other baselines significantly (+25.15% on average in Attack Success Rate). Our codes will be available upon paper publication.
Updated: 2024-06-04 03:13:03
标题: 面向车道检测的鲁棒物理世界后门攻击
摘要: 基于深度学习的车道检测(LD)在自动驾驶系统中扮演着关键角色,例如自适应巡航控制。然而,它容易受到后门攻击的影响。现有的车道检测后门攻击方法在动态现实场景中表现出有限的有效性,主要是因为它们未考虑到动态场景因素,包括驾驶视角的变化(例如视角转换)和环境条件的变化(例如天气或光照变化)。为了解决这个问题,本文介绍了BadLANE,一种专为LD设计的动态场景适应后门攻击,旨在应对真实世界动态场景因素的变化。为了解决不断变化的驾驶视角带来的挑战,我们提出了由无固定形状像素组成的不规则触发模式。这种触发设计允许后门被各种形式或形状的泥迹或污染物激活,从而使其能够适应驾驶过程中车辆观察视角的变化。为了减轻环境变化的影响,我们设计了一个元学习框架,用于训练适应不同环境条件的元生成器。这些生成器生成包含各种环境信息的元触发器,例如天气或光照条件,作为后门植入的触发模式的初始化,从而实现对动态环境的适应。在数字和物理领域对各种常用LD模型进行了广泛实验,验证了我们攻击的有效性,显著优于其他基线(攻击成功率平均提高了25.15%)。我们的代码将在论文发表后提供。
更新时间: 2024-06-04 03:13:03
领域: cs.CV,cs.AI
TENG: Time-Evolving Natural Gradient for Solving PDEs With Deep Neural Nets Toward Machine Precision
Partial differential equations (PDEs) are instrumental for modeling dynamical systems in science and engineering. The advent of neural networks has initiated a significant shift in tackling these complexities though challenges in accuracy persist, especially for initial value problems. In this paper, we introduce the $\textit{Time-Evolving Natural Gradient (TENG)}$, generalizing time-dependent variational principles and optimization-based time integration, leveraging natural gradient optimization to obtain high accuracy in neural-network-based PDE solutions. Our comprehensive development includes algorithms like TENG-Euler and its high-order variants, such as TENG-Heun, tailored for enhanced precision and efficiency. TENG's effectiveness is further validated through its performance, surpassing current leading methods and achieving $\textit{machine precision}$ in step-by-step optimizations across a spectrum of PDEs, including the heat equation, Allen-Cahn equation, and Burgers' equation.
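Schematically (our notation, not the paper's), one TENG-Euler step can be read as forming an explicit Euler target and then projecting it onto the network's parameter manifold with a natural-gradient solve:

```latex
% One TENG-Euler step for u_t = F[u] (schematic):
u^{\mathrm{tgt}} \;=\; u_{\theta_k} + \Delta t\, F[u_{\theta_k}],
\qquad
L(\theta) \;=\; \tfrac{1}{2}\,\big\lVert u_{\theta} - u^{\mathrm{tgt}} \big\rVert_{L^2}^{2}

% Natural-gradient update with Gram (Fisher-like) matrix
% G(\theta)_{ij} = \langle \partial_{\theta_i} u_\theta,\ \partial_{\theta_j} u_\theta \rangle :
\theta_{k+1} \;=\; \theta_k \;-\; \eta\, G(\theta_k)^{-1}\, \nabla_{\theta} L(\theta_k)
```

Higher-order variants such as TENG-Heun would presumably replace the Euler target with a Heun (trapezoidal) target while keeping the same projection step.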
Updated: 2024-06-04 03:11:56
标题: TENG:面向机器精度、用深度神经网络求解偏微分方程的时间演进自然梯度方法
摘要: 偏微分方程(PDEs)在科学和工程中对建模动态系统至关重要。神经网络的出现引发了在解决这些复杂性方面的显著转变,尽管在准确性方面仍然存在挑战,特别是对于初值问题。在本文中,我们引入了$\textit{时变自然梯度(TENG)}$,将时间相关的变分原理和基于优化的时间积分进行了泛化,利用自然梯度优化来获得神经网络PDE解的高准确性。我们的全面开发包括TENG-Euler等算法及其高阶变体,如TENG-Heun,专门针对提高精度和效率。TENG的有效性通过其性能进一步验证,超越了当前主流方法,并在一系列PDE中实现了逐步优化的$\textit{机器精度}$,包括热方程,Allen-Cahn方程和Burgers'方程。
更新时间: 2024-06-04 03:11:56
领域: cs.LG,physics.comp-ph
Navigating the Future of Federated Recommendation Systems with Foundation Models
In recent years, the integration of federated learning (FL) and recommendation systems (RS), known as Federated Recommendation Systems (FRS), has attracted attention for preserving user privacy by keeping private data on client devices. However, FRS faces inherent limitations such as data heterogeneity and scarcity, due to the privacy requirements of FL and the typical data sparsity issues of RSs. Models like ChatGPT are empowered by the concepts of transfer learning and self-supervised learning, so they can be easily applied to downstream tasks after fine-tuning or prompting. These models, so-called Foundation Models (FMs), focus on understanding human intent and performing their designed roles in specific tasks, and are widely recognized for producing high-quality content in the image and language domains. Thus, the achievements of FMs inspire the design of FRS and suggest a promising research direction: integrating foundation models to address the above limitations. In this study, we conduct a comprehensive review of FRSs with FMs. Specifically, we: 1) summarise the common approaches of current FRSs and FMs; 2) review the challenges posed by FRSs and FMs; 3) discuss potential future research directions; and 4) introduce some common benchmarks and evaluation metrics in the FRS field. We hope that this position paper provides the necessary background and guidance to explore this interesting and emerging topic.
Updated: 2024-06-04 03:10:54
标题: 使用基础模型导航联邦推荐系统的未来
摘要: 近年来,联邦学习(FL)与推荐系统(RS)的整合,即联邦推荐系统(FRS),因在客户设备上保留私人数据以保护用户隐私而受到关注。然而,FRS面临固有限制,如数据异构性和稀缺性,这是由于FL的隐私要求和RS的典型数据稀疏问题所致。像ChatGPT这样的模型受到迁移学习和自监督学习概念的赋予,因此它们可以在微调或提示后轻松应用于下游任务。这些模型,被称为基础模型(FM),致力于理解人类意图并根据设计的角色在特定任务中执行,被广泛认为在图像和语言领域生成高质量内容。因此,基础模型的成就激发了FRS的设计,并提出了一个有前途的研究方向:整合基础模型以解决上述限制。在这项研究中,我们对带有基础模型的FRS进行了全面审查。具体而言,我们:1)总结了当前FRS和FM的常用方法;2)审查了FRS和FM带来的挑战;3)讨论了潜在的未来研究方向;4)介绍了FRS领域的一些常见基准和评估指标。我们希望这篇立场论文为探索这一有趣且新兴的主题提供必要的背景和指导。
更新时间: 2024-06-04 03:10:54
领域: cs.IR,cs.AI,cs.LG
A New Analysis of Differential Privacy's Generalization Guarantees
We give a new proof of the "transfer theorem" underlying adaptive data analysis: that any mechanism for answering adaptively chosen statistical queries that is differentially private and sample-accurate is also accurate out-of-sample. Our new proof is elementary and gives structural insights that we expect will be useful elsewhere. We show: 1) that differential privacy ensures that the expectation of any query on the posterior distribution on datasets induced by the transcript of the interaction is close to its true value on the data distribution, and 2) sample accuracy on its own ensures that any query answer produced by the mechanism is close to its posterior expectation with high probability. This second claim follows from a thought experiment in which we imagine that the dataset is resampled from the posterior distribution after the mechanism has committed to its answers. The transfer theorem then follows by summing these two bounds, and in particular, avoids the "monitor argument" used to derive high probability bounds in prior work. An upshot of our new proof technique is that the concrete bounds we obtain are substantially better than the best previously known bounds, even though the improvements are in the constants, rather than the asymptotics (which are known to be tight). As we show, our new bounds outperform the naive "sample-splitting" baseline at dramatically smaller dataset sizes compared to the previous state of the art, bringing techniques from this literature closer to practicality.
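In symbols (our notation, not the paper's), the two claims combine by the triangle inequality: sample accuracy controls the answer-to-posterior gap, and differential privacy controls the posterior-to-population gap.

```latex
% a: the mechanism's answer to query q; P: the data distribution;
% Q: the posterior over datasets induced by the interaction transcript.
\big|\, a - q(P) \,\big|
\;\le\;
\underbrace{\big|\, a - \mathbb{E}_{S \sim Q}[\,q(S)\,] \,\big|}_{\text{sample accuracy (w.h.p.)}}
\;+\;
\underbrace{\big|\, \mathbb{E}_{S \sim Q}[\,q(S)\,] - q(P) \,\big|}_{\text{bounded by differential privacy}}
```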
Updated: 2024-06-04 03:08:58
标题: 《差分隐私的泛化保证的新分析》
摘要: 我们提供了一个关于自适应数据分析的“转移定理”的新证明:对于任何用于回答自适应选择的统计查询的机制,只要具有差分隐私和样本精确性,就可以在样本外准确。我们的新证明是基础性的,并提供了我们预计将在其他地方有用的结构洞见。我们展示:1)差分隐私确保对由交互过程的传输引发的数据集的后验分布上的任何查询的期望接近于其在数据分布上的真实值,2)样本准确性本身确保机制产生的任何查询答案接近其后验期望。这个第二个断言是通过一个思维实验得出的,即在机制承诺其答案后,我们设想从后验分布重新对数据集进行抽样。然后通过总结这两种界限,特别是避免了在先前工作中用于推导高概率界限的“监控参数”而得出转移定理。我们新证明技术的一个好处是,我们获得的具体界限明显优于先前已知的最佳界限,即使这些改进是在常数上,而不是在已知为紧密的渐近性上。正如我们所展示的,我们的新界限在比以前的技术水平更小的数据集大小上比朴素的“样本拆分”基线效果更好,将这个领域的技术更接近实用性。
更新时间: 2024-06-04 03:08:58
领域: cs.LG,cs.CR,stat.ML
CredSec: A Blockchain-based Secure Credential Management System for University Adoption
University education plays a critical role in shaping the intellectual and professional development of individuals and contributes significantly to the advancement of knowledge and society. Generally, the university authority directly controls the production of students' results and stores the credentials on a local dedicated server. Consequently, there is a chance that credentials may be altered, and a very high possibility of encountering various threats and security attacks. To resolve these issues, we propose a blockchain-based secure credential management system (BCMS) for efficiently storing, managing, and recovering credentials without involving the university authority. The proposed BCMS incorporates a modified two-factor encryption (m2FE) technique, a combination of the RSA cryptosystem and DNA encoding, to ensure credential privacy, and an enhanced authentication scheme for teachers and students. Besides, to reduce the size of the cipher credential and its conversion time, we use a character-to-integer (C2I) table instead of the ASCII table. Finally, the experimental results and analysis of the BCMS illustrate its effectiveness over state-of-the-art works.
Updated: 2024-06-04 03:07:54
标题: CredSec:面向大学采用的基于区块链的安全凭证管理系统
摘要: 大学教育在塑造个人的智力和专业发展方面起着至关重要的作用,并对知识和社会的进步做出了重大贡献。一般来说,大学管理当局直接控制学生的成绩,并将凭证存储在他们的本地专用服务器中。因此,存在篡改凭证的可能性,也有很高的可能性遇到各种威胁和不同的安全攻击。为了解决这些问题,我们提出了一种基于区块链的安全凭证管理系统(BCMS),用于高效存储、管理和恢复凭证,而无需涉及大学管理当局。所提出的BCMS结合了修改后的双因子加密(m2FE)技术,将RSA密码系统和DNA编码相结合,以确保凭证的隐私性,并为教师和学生提供增强的身份验证方案。此外,为了减小密文凭证的大小和转换时间,我们使用字符到整数(C2I)表,而不是ASCII表。最后,BCMS的实验结果和分析展示了其在现有技术中的有效性。
更新时间: 2024-06-04 03:07:54
领域: cs.CR
CODE: Contrasting Self-generated Description to Combat Hallucination in Large Multi-modal Models
Large Multi-modal Models (LMMs) have recently demonstrated remarkable abilities in visual context understanding and coherent response generation. However, alongside these advancements, the issue of hallucinations has emerged as a significant challenge, producing erroneous responses that are unrelated to the visual contents. In this paper, we introduce a novel contrastive-based decoding method, COuntering DEscription Contrastive Decoding (CODE), which leverages self-generated descriptions as contrasting references during the decoding phase of LMMs to address hallucination issues. CODE utilizes comprehensive descriptions from the model itself as a visual counterpart to correct and improve response alignment with the actual visual content. By dynamically adjusting the information flow and distribution of next-token predictions in the LMM's vocabulary, CODE enhances the coherence and informativeness of generated responses. Extensive experiments demonstrate that our method significantly reduces hallucinations and improves cross-modal consistency across various benchmarks and cutting-edge LMMs. Our method provides a simple yet effective decoding strategy that can be integrated into existing LMM frameworks without additional training.
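The sketch below shows one decoding step in the usual contrastive-decoding form; the (1 + alpha)/alpha combination and the adaptive-plausibility cutoff are the standard recipe from the contrastive-decoding literature, and CODE's exact adjustment rule may differ.

```python
import torch

def contrastive_decode_step(logits_visual, logits_description,
                            alpha=1.0, beta=0.1):
    """One decoding step contrasting two next-token distributions.
    logits_visual: conditioned on the image; logits_description: conditioned
    on the model's own self-generated description instead of the image."""
    adjusted = (1 + alpha) * logits_visual - alpha * logits_description
    # Adaptive plausibility constraint: keep only tokens reasonably likely
    # under the visually conditioned distribution.
    probs = torch.softmax(logits_visual, dim=-1)
    cutoff = beta * probs.max(dim=-1, keepdim=True).values
    adjusted[probs < cutoff] = float("-inf")
    return torch.argmax(adjusted, dim=-1)
```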
Updated: 2024-06-04 03:04:21
标题: CODE:对比自生成描述以对抗大型多模态模型中的幻觉
摘要: 大型多模态模型(LMMs)最近表现出在视觉上下文理解和连贯响应生成方面的显着能力。然而,随着这些进展,幻觉问题已经出现,成为一个重要挑战,导致产生与视觉内容无关的错误响应。在本文中,我们引入了一种新颖的对比解码方法,称为COuntering DEscription Contrastive Decoding(CODE),在LMMs的解码阶段利用自动生成的描述作为对比参考来解决幻觉问题。CODE利用模型自身的全面描述作为视觉对应物,来校正和改进响应与实际视觉内容的对齐。通过动态调整LMM词汇表中下一个标记预测的信息流和分布,CODE增强了生成响应的连贯性和信息量。大量实验证明,我们的方法显著减少了幻觉,并提高了在各种基准和尖端LMMs中的跨模态一致性。我们的方法提供了一种简单而有效的解码策略,可以集成到现有的LMM框架中,无需额外训练。
更新时间: 2024-06-04 03:04:21
领域: cs.CV,cs.AI
Decoupled Alignment for Robust Plug-and-Play Adaptation
We introduce a low-resource safety enhancement method for aligning large language models (LLMs) without the need for supervised fine-tuning (SFT) or reinforcement learning from human feedback (RLHF). Our main idea is to exploit knowledge distillation to extract the alignment information from existing well-aligned LLMs and integrate it into unaligned LLMs in a plug-and-play fashion. Methodologically, we employ delta debugging to identify the critical components of knowledge necessary for effective distillation. On the harmful question dataset, our method significantly enhances the average defense success rate by approximately 14.41%, reaching as high as 51.39%, across 17 unaligned pre-trained LLMs, without compromising performance.
Updated: 2024-06-04 03:04:09
标题: 解耦对齐用于稳健的即插即用调整
摘要: 我们引入了一种低资源安全增强方法,用于对齐大型语言模型(LLMs),而无需进行监督微调(SFT)或从人类反馈中进行强化学习(RLHF)。我们的主要思想是利用知识蒸馏,从现有对齐良好的LLMs中提取对齐信息,并以即插即用的方式集成到未对齐的LLMs中。在方法论上,我们采用增量调试来识别对有效蒸馏必要的知识的关键组件。在有害问题数据集上,我们的方法显著提高了平均防御成功率,约为14.41%,在17个未对齐的预训练LLMs中达到了51.39%,而不会影响性能。
更新时间: 2024-06-04 03:04:09
领域: cs.CL,cs.AI,cs.CR
Echoes of Socratic Doubt: Embracing Uncertainty in Calibrated Evidential Reinforcement Learning
We present a novel statistical approach to incorporating uncertainty awareness in model-free distributional reinforcement learning involving quantile regression-based deep Q networks. The proposed algorithm, $\textit{Calibrated Evidential Quantile Regression in Deep Q Networks (CEQR-DQN)}$, aims to address key challenges associated with separately estimating aleatoric and epistemic uncertainty in stochastic environments. It combines deep evidential learning with quantile calibration based on principles of conformal inference to provide explicit, sample-free computations of $\textit{global}$ uncertainty as opposed to $\textit{local}$ estimates based on simple variance, overcoming limitations of traditional methods in computational and statistical efficiency and handling of out-of-distribution (OOD) observations. Tested on a suite of miniaturized Atari games (i.e., MinAtar), CEQR-DQN is shown to surpass similar existing frameworks in scores and learning speed. Its ability to rigorously evaluate uncertainty improves exploration strategies and can serve as a blueprint for other algorithms requiring uncertainty awareness.
Updated: 2024-06-04 03:04:00
标题: 苏格拉底怀疑的回响:在校准证据强化学习中接纳不确定性
摘要: 我们提出了一种新颖的统计方法,用于在基于分位数回归的深度Q网络中引入不确定性意识的无模型分布式强化学习。所提出的算法$\textit{Calibrated Evidential Quantile Regression in Deep Q Networks (CEQR-DQN)}$旨在解决在随机环境中分别估计aleatoric和epistemic不确定性所面临的关键挑战。它将深度证据学习与基于符合推断原则的分位数校准相结合,提供明确的、无需样本计算的$\textit{全局}$不确定性,而不是基于简单方差的$\textit{局部}$估计,克服了传统方法在计算和统计效率以及处理超出分布(OOD)观测方面的局限性。在一套小型Atari游戏(即MinAtar)上进行测试后,CEQR-DQN显示出在得分和学习速度方面超过类似的现有框架。其严格评估不确定性的能力改进了探索策略,并可以作为其他需要不确定性意识的算法的蓝图。
更新时间: 2024-06-04 03:04:00
领域: cs.LG,cs.AI
Addressing Diverging Training Costs using Local Restoration for Precise Bird's Eye View Map Construction
Recent advancements in Bird's Eye View (BEV) fusion for map construction have demonstrated remarkable mapping of urban environments. However, their deep and bulky architecture incurs substantial amounts of backpropagation memory and computing latency. Consequently, the problem poses an unavoidable bottleneck in constructing high-resolution (HR) BEV maps, as their large-sized features cause significant increases in costs including GPU memory consumption and computing latency, which we name the diverging training costs issue. Affected by the problem, most existing methods adopt low-resolution (LR) BEV and struggle to estimate the precise locations of urban scene components like road lanes and sidewalks. As the imprecision leads to risky self-driving, the diverging training costs issue has to be resolved. In this paper, we address the issue with our novel Trumpet Neural Network (TNN) mechanism. The framework utilizes LR BEV space and outputs an up-sampled semantic BEV map to create a memory-efficient pipeline. To this end, we introduce Local Restoration of the BEV representation. Specifically, the up-sampled BEV representation has severely aliased, blocky signals and thick semantic labels. Our proposed Local Restoration restores the signals and thins (or narrows down) the width of the labels. Our extensive experiments show that the TNN mechanism provides a plug-and-play memory-efficient pipeline, thereby enabling the effective estimation of real-sized (or precise) semantic labels for BEV map construction.
Updated: 2024-06-04 03:03:39
标题: 解决训练成本分歧问题:利用局部恢复进行精确鸟瞰地图构建
摘要: 最近在鸟瞰视图(BEV)融合技术方面取得了显著进展,用于地图构建在城市环境中表现出了卓越的映射能力。然而,它们深层且笨重的架构导致了大量的反向传播内存和计算延迟。因此,这个问题在构建高分辨率(HR)BEV地图时构成了一个不可避免的瓶颈,因为它们的大尺寸特征导致了成本的显著增加,包括GPU内存消耗和计算延迟,称为训练成本分歧问题。受到这个问题的影响,大多数现有方法采用低分辨率(LR)BEV,并且难以准确估计城市场景组件如道路车道和人行道的精确位置。由于不准确导致了危险的自动驾驶,因此必须解决训练成本分歧问题。在本文中,我们通过我们的新颖的Trumpet神经网络(TNN)机制来解决这个问题。该框架利用LR BEV空间并输出一个上采样的语义BEV地图来创建一个内存高效的管道。为此,我们引入了BEV表示的局部恢复。具体而言,上采样的BEV表示具有严重的伪影、块状信号和厚的语义标签。我们提出的局部恢复恢复了信号并缩小(或减小)标签的宽度。我们的广泛实验表明,TNN机制提供了一个即插即用的内存高效管道,从而使BEV地图构建中真正大小(或精确)的语义标签的有效估计成为可能。
更新时间: 2024-06-04 03:03:39
领域: cs.CV,cs.AI
OODRobustBench: a Benchmark and Large-Scale Analysis of Adversarial Robustness under Distribution Shift
Existing works have made great progress in improving adversarial robustness, but typically test their method only on data from the same distribution as the training data, i.e. in-distribution (ID) testing. As a result, it is unclear how such robustness generalizes under input distribution shifts, i.e. out-of-distribution (OOD) testing. This omission is concerning as such distribution shifts are unavoidable when methods are deployed in the wild. To address this issue we propose a benchmark named OODRobustBench to comprehensively assess OOD adversarial robustness using 23 dataset-wise shifts (i.e. naturalistic shifts in input distribution) and 6 threat-wise shifts (i.e., unforeseen adversarial threat models). OODRobustBench is used to assess 706 robust models using 60.7K adversarial evaluations. This large-scale analysis shows that: 1) adversarial robustness suffers from a severe OOD generalization issue; 2) ID robustness correlates strongly with OOD robustness in a positive linear way. The latter enables the prediction of OOD robustness from ID robustness. We then predict and verify that existing methods are unlikely to achieve high OOD robustness. Novel methods are therefore required to achieve OOD robustness beyond our prediction. To facilitate the development of these methods, we investigate a wide range of techniques and identify several promising directions. Code and models are available at: https://github.com/OODRobustBench/OODRobustBench.
Updated: 2024-06-04 03:01:05
标题: OODRobustBench:分布偏移下对抗鲁棒性的基准与大规模分析
摘要: 现有的工作在提高对抗鲁棒性方面取得了很大进展,但通常只在与训练数据相同分布的数据上测试他们的方法,即内部分布(ID)测试。因此,目前尚不清楚这种鲁棒性在输入分布转移下的泛化能力,即超出分布(OOD)测试。这种遗漏是令人担忧的,因为当方法在实际应用中部署时,这种分布转移是不可避免的。为了解决这个问题,我们提出了一个名为OODRobustBench的基准测试,以全面评估OOD对抗鲁棒性,使用23个数据集的分布转移(即输入分布中的自然转移)和6个威胁转移(即未预见的对抗威胁模型)。OODRobustBench用于评估706个鲁棒模型,进行了60.7K次对抗评估。这一大规模分析显示:1)对抗鲁棒性存在严重的OOD泛化问题;2)ID鲁棒性与OOD鲁棒性呈正线性关联。后者使得可以从ID鲁棒性预测OOD鲁棒性。我们随后预测并验证现有方法不太可能实现高OOD鲁棒性。因此,需要新的方法来实现超出我们预测的OOD鲁棒性。为了促进这些方法的发展,我们研究了一系列技术,并确定了几个有前途的方向。代码和模型可在以下链接找到:https://github.com/OODRobustBench/OODRobustBench。
更新时间: 2024-06-04 03:01:05
领域: cs.LG,cs.CV
Image steganography based on generative implicit neural representation
In the realm of advanced steganography, the scale of the model typically correlates directly with the resolution of the fundamental grid, necessitating the training of a distinct neural network for message extraction. This paper proposes an image steganography scheme based on generative implicit neural representation. This approach transcends the constraints of image resolution by portraying data as continuous functional expressions. Notably, this method permits the utilization of a diverse array of multimedia data as cover images, thereby broadening the spectrum of potential carriers. Additionally, by fixing a neural network as the message extractor, we effectively redirect the training burden to the image itself, resulting in both a reduction in computational overhead and an enhancement in steganographic speed. This approach also circumvents potential transmission challenges associated with the message extractor. Experimental findings reveal that this methodology achieves commendable optimization efficiency, with a completion time of just 3 seconds for 64x64 images while concealing only 1 bpp of information. Furthermore, the accuracy of message extraction reaches an impressive 100%.
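A minimal PyTorch sketch of the "optimize the image, freeze the extractor" idea: a coordinate MLP (the implicit representation) is fit to the cover image while a fixed, randomly initialized network must decode the message bits from its outputs at secret coordinates. Network sizes, the sampling scheme, and the loss weighting are assumptions of this sketch; the paper's generative INR construction is richer.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Implicit neural representation: the image as a continuous map (x, y) -> RGB.
inr = nn.Sequential(nn.Linear(2, 64), nn.Tanh(),
                    nn.Linear(64, 64), nn.Tanh(), nn.Linear(64, 3))

# Fixed, randomly initialized extractor shared with the receiver; it is never
# trained, so the optimization burden falls entirely on the image function.
extractor = nn.Sequential(nn.Linear(3, 32), nn.Tanh(), nn.Linear(32, 1))
for p in extractor.parameters():
    p.requires_grad_(False)

coords = torch.rand(1024, 2) * 2 - 1       # stand-ins for cover-image samples
pixels = torch.rand(1024, 3)
secret_coords = torch.rand(64, 2) * 2 - 1  # shared secret: where to read bits
bits = torch.randint(0, 2, (64,)).float()

opt = torch.optim.Adam(inr.parameters(), lr=1e-3)
for step in range(2000):
    opt.zero_grad()
    recon = nn.functional.mse_loss(inr(coords), pixels)
    logits = extractor(inr(secret_coords)).squeeze(-1)
    message = nn.functional.binary_cross_entropy_with_logits(logits, bits)
    (recon + message).backward()
    opt.step()

# The receiver recovers the bits with the shared extractor and coordinates.
recovered = (extractor(inr(secret_coords)).squeeze(-1) > 0).float()
```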
Updated: 2024-06-04 03:00:47
标题: 基于生成隐式神经表示的图像隐写术
摘要: 在先进隐写术领域,模型的规模通常与基本网格的分辨率直接相关,需要训练一个独立的神经网络用于消息提取。本文提出了一种基于生成隐式神经表示的图像隐写术。这种方法通过将数据表现为连续的功能表达,超越了图像分辨率的限制。值得注意的是,这种方法允许利用各种多媒体数据作为封面图像,从而扩大了潜在载体的范围。此外,通过将神经网络固定为消息提取器,我们有效地将训练负担转移到图像本身,既减少了计算开销,又提高了隐写术速度。这种方法还规避了与消息提取器相关的潜在传输挑战。实验结果显示,这种方法实现了令人称赞的优化效率,对于64x64维度的图像,完成时间仅为3秒,同时隐藏了仅1 bpp的信息。此外,消息提取的准确度达到了100%的令人印象深刻的水平。
更新时间: 2024-06-04 03:00:47
领域: cs.CR,68T07,E.3
GOMAA-Geo: GOal Modality Agnostic Active Geo-localization
We consider the task of active geo-localization (AGL) in which an agent uses a sequence of visual cues observed during aerial navigation to find a target specified through multiple possible modalities. This could emulate a UAV involved in a search-and-rescue operation navigating through an area, observing a stream of aerial images as it goes. The AGL task is associated with two important challenges. Firstly, an agent must deal with a goal specification in one of multiple modalities (e.g., through a natural language description) while the search cues are provided in other modalities (aerial imagery). The second challenge is limited localization time (e.g., limited battery life, urgency) so that the goal must be localized as efficiently as possible, i.e. the agent must effectively leverage its sequentially observed aerial views when searching for the goal. To address these challenges, we propose GOMAA-Geo - a goal modality agnostic active geo-localization agent - for zero-shot generalization between different goal modalities. Our approach combines cross-modality contrastive learning to align representations across modalities with supervised foundation model pretraining and reinforcement learning to obtain highly effective navigation and localization policies. Through extensive evaluations, we show that GOMAA-Geo outperforms alternative learnable approaches and that it generalizes across datasets - e.g., to disaster-hit areas without seeing a single disaster scenario during training - and goal modalities - e.g., to ground-level imagery or textual descriptions, despite only being trained with goals specified as aerial views. Code and models are publicly available at https://github.com/mvrl/GOMAA-Geo/tree/main.
Updated: 2024-06-04 02:59:36
标题: GOMAA-Geo: 目标模态不可知的主动地理定位
摘要: 我们考虑了主动地理定位(AGL)的任务,在这个任务中,一个代理人使用在空中导航过程中观察到的一系列视觉线索来找到通过多种可能的方式指定的目标。这可以模拟一个无人机参与搜索和救援行动,在区域内导航,观察到一系列空中图像。AGL任务涉及两个重要挑战。首先,代理人必须处理多种模式的目标规范(例如,通过自然语言描述),而搜索线索是以其他模式(空中图像)提供的。第二个挑战是有限的定位时间(例如,有限的电池寿命,紧急情况),因此目标必须尽可能高效地定位,即代理人在搜索目标时必须有效地利用其顺序观察到的空中视图。为了解决这些挑战,我们提出了GOMAA-Geo - 一种目标模态不可知的主动地理定位代理 - 用于不同目标模态之间的零样本泛化。我们的方法结合了跨模态对比学习,以对齐跨模态的表示与监督基础模型预训练和强化学习,以获得高效的导航和定位策略。通过广泛的评估,我们展示了GOMAA-Geo优于其他可学习方法,并且它在数据集之间泛化 - 例如,在没有在训练过程中看到任何灾难场景的情况下,泛化到遭受灾难的地区 - 和目标模态 - 例如,到地面级别的图像或文本描述,尽管只训练过以空中视图指定的目标。代码和模型可以在https://github.com/mvrl/GOMAA-Geo/tree/main上公开获取。
更新时间: 2024-06-04 02:59:36
领域: cs.CV,cs.AI
Analysis of Multiscale Reinforcement Q-Learning Algorithms for Mean Field Control Games
Mean Field Control Games (MFCG), introduced in [Angiuli et al., 2022a], represent competitive games between a large number of large collaborative groups of agents in the infinite limit of number and size of groups. In this paper, we prove the convergence of a three-timescale Reinforcement Q-Learning (RL) algorithm to solve MFCG in a model-free approach from the point of view of representative agents. Our analysis uses a Q-table for finite state and action spaces updated at each discrete time-step over an infinite horizon. In [Angiuli et al., 2023], we proved convergence of two-timescale algorithms for MFG and MFC separately highlighting the need to follow multiple population distributions in the MFC case. Here, we integrate this feature for MFCG as well as three rates of update decreasing to zero in the proper ratios. Our technique of proof uses a generalization to three timescales of the two-timescale analysis in [Borkar, 1997]. We give a simple example satisfying the various hypothesis made in the proof of convergence and illustrating the performance of the algorithm.
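The sketch below shows the shape of a three-timescale update: three stepsize schedules decaying to zero at different speeds, so the Q-table, the local-group distribution, and the global population distribution equilibrate on separated timescales. The exponents, and the simplification that the reward is externally given, are illustrative assumptions; the paper prescribes the proper decay ratios, and in the actual MFCG setting the reward itself depends on both distributions.

```python
import numpy as np

nS, nA, gamma = 5, 3, 0.9
Q  = np.zeros((nS, nA))
mu = np.ones(nS) / nS   # local-group state distribution
nu = np.ones(nS) / nS   # global population distribution (slowest variable)

def rates(k):
    # A slower-decaying stepsize means a faster timescale; Q is fastest here.
    return 1 / (1 + k)**0.55, 1 / (1 + k)**0.75, 1 / (1 + k)**0.95

def update(k, s, a, r, s_next):
    rho_q, rho_mu, rho_nu = rates(k)
    # Standard Q-learning step on the fastest timescale.
    Q[s, a] += rho_q * (r + gamma * Q[s_next].max() - Q[s, a])
    # Track the empirical state distributions on the two slower timescales.
    e = np.eye(nS)[s_next]
    mu[:] += rho_mu * (e - mu)
    nu[:] += rho_nu * (e - nu)
```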
Updated: 2024-06-04 02:58:13
标题: 多尺度强化Q学习算法在均场控制博弈中的分析
摘要: 均值场控制游戏(MFCG),在[Angiuli等,2022a]中引入,代表了在大量协作群体之间进行的竞争游戏,在群体数量和规模的无限极限下。在本文中,我们证明了一个三时间尺度的强化Q学习(RL)算法在模型自由方法下从代表性代理的角度解决MFCG的收敛性。我们的分析使用一个Q表,针对有限状态和动作空间,在无限时间段内每个离散时间步更新一次。在[Angiuli等,2023]中,我们证明了MFG和MFC分别的两时间尺度算法的收敛性,强调了在MFC情况下需要跟踪多个人口分布。在这里,我们将这个特征也整合到MFCG中,并且三个更新速率按正确比例减少到零。我们的证明技术使用了[Borkar,1997]中两时间尺度分析的三时间尺度的推广。我们给出了一个简单的例子,满足证明中所做的各种假设,并展示了算法的性能。
更新时间: 2024-06-04 02:58:13
领域: math.OC,cs.LG,cs.MA
HPE-CogVLM: New Head Pose Grounding Task Exploration on Vision Language Model
Head pose estimation (HPE) task requires a sophisticated understanding of 3D spatial relationships and precise numerical output of yaw, pitch, and roll Euler angles. Previous HPE studies are mainly based on Non-large language models (Non-LLMs), which rely on close-up human heads cropped from the full image as inputs and lack robustness in real-world scenarios. In this paper, we present a novel framework to enhance the HPE prediction task by leveraging the visual grounding capability of CogVLM. CogVLM is a vision language model (VLM) with grounding capability of predicting object bounding boxes (BBoxes), which enables HPE training and prediction using full image information input. To integrate the HPE task into the VLM, we first cope with the catastrophic forgetting problem in large language models (LLMs) by investigating the rehearsal ratio in the data rehearsal method. Then, we propose and validate a LoRA layer-based model merging method, which keeps the integrity of parameters, to enhance the HPE performance in the framework. The results show our HPE-CogVLM achieves a 31.5\% reduction in Mean Absolute Error for HPE prediction over the current Non-LLM based state-of-the-art in cross-dataset evaluation. Furthermore, we compare our LoRA layer-based model merging method with LoRA fine-tuning only and other merging methods in CogVLM. The results demonstrate our framework outperforms them in all HPE metrics.
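As a sketch of LoRA-layer merging (whether this matches the paper's exact rule is an assumption): each adapter's low-rank update Delta W = B A is combined by weighted averaging, while the base weights remain untouched, which is one way to read "keeping the integrity of parameters".

```python
import torch

def merge_lora_layers(adapters, weights):
    """Merge several LoRA adapters for the same base layer.
    adapters: list of (A, B) pairs with A: (r, d_in), B: (d_out, r).
    Returns the merged low-rank update; the base weights are never modified."""
    return sum(w * (B @ A) for (A, B), w in zip(adapters, weights))

# Two rank-8 adapters for a 512x512 layer, merged 70/30.
A1, B1 = torch.randn(8, 512), torch.randn(512, 8)
A2, B2 = torch.randn(8, 512), torch.randn(512, 8)
delta = merge_lora_layers([(A1, B1), (A2, B2)], [0.7, 0.3])
# Apply at inference time as: W_effective = W_base + delta
```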
Updated: 2024-06-04 02:51:26
标题: HPE-CogVLM:基于视觉语言模型的新头部姿势定位任务探索
摘要: 头部姿势估计(HPE)任务需要对3D空间关系有复杂的理解,并精确输出偏航、俯仰和横滚欧拉角。先前的HPE研究主要基于非大型语言模型(Non-LLMs),依赖于从完整图像中剪裁的近距离人头作为输入,并在真实场景中缺乏稳健性。在本文中,我们提出了一种新的框架,通过利用CogVLM的视觉基础能力来增强HPE预测任务。CogVLM是一个具有预测物体边界框(BBoxes)基础能力的视觉语言模型(VLM),它使得可以使用完整图像信息作为输入进行HPE训练和预测。为了将HPE任务整合到VLM中,我们首先通过研究数据回放方法中的回放比率来处理大型语言模型(LLMs)中的灾难性遗忘问题。然后,我们提出并验证了一种基于LoRA层的模型合并方法,该方法保持参数的完整性,以增强框架中的HPE性能。结果显示,我们的HPE-CogVLM在跨数据集评估中相比当前基于Non-LLM的最新技术,将HPE预测的平均绝对误差降低了31.5\%。此外,我们将我们的LoRA层模型合并方法与仅LoRA微调和CogVLM中的其他合并方法进行比较。结果表明,我们的框架在所有HPE指标中表现出色。
更新时间: 2024-06-04 02:51:26
领域: cs.CV,cs.AI,cs.CL
Generating Synthetic Net Load Data with Physics-informed Diffusion Model
This paper presents a novel physics-informed diffusion model for generating synthetic net load data, addressing the challenges of data scarcity and privacy concerns. The proposed framework embeds physical models within denoising networks, offering a versatile approach that can be readily generalized to unforeseen scenarios. A conditional denoising neural network is designed to jointly train the parameters of the transition kernel of the diffusion model and the parameters of the physics-informed function. Utilizing the real-world smart meter data from Pecan Street, we validate the proposed method and conduct a thorough numerical study comparing its performance with state-of-the-art generative models, including generative adversarial networks, variational autoencoders, normalizing flows, and a well calibrated baseline diffusion model. A comprehensive set of evaluation metrics is used to assess the accuracy and diversity of the generated synthetic net load data. The numerical study results demonstrate that the proposed physics-informed diffusion model outperforms state-of-the-art models across all quantitative metrics, yielding at least 20% improvement.
Updated: 2024-06-04 02:50:19
标题: 用物理启发扩散模型生成合成净负荷数据
摘要: 本文提出了一种新颖的基于物理信息的扩散模型,用于生成合成净负载数据,解决数据稀缺和隐私问题。所提出的框架将物理模型嵌入去噪网络中,提供了一种灵活的方法,可以轻松推广到未预料到的情况。设计了一个条件去噪神经网络,用于联合训练扩散模型的转换核参数和基于物理信息的函数参数。利用Pecan Street的真实智能电表数据,我们验证了所提出的方法,并进行了与最先进的生成模型(包括生成对抗网络、变分自动编码器、正规化流以及一个良好校准的基线扩散模型)性能的彻底数值研究。使用一套全面的评估指标来评估生成的合成净负载数据的准确性和多样性。数值研究结果表明,所提出的基于物理信息的扩散模型在所有定量指标上优于最先进的模型,至少提高了20%。
更新时间: 2024-06-04 02:50:19
领域: cs.LG,cs.AI
A Unified Framework for Microscopy Defocus Deblur with Multi-Pyramid Transformer and Contrastive Learning
Defocus blur is a persistent problem in microscope imaging that poses harm to pathology interpretation and medical intervention in cell microscopy and microscope surgery. To address this problem, a unified framework including the multi-pyramid transformer (MPT) and extended frequency contrastive regularization (EFCR) is proposed to tackle two outstanding challenges in microscopy deblur: longer attention span and data deficiency. The MPT employs an explicit pyramid structure at each network stage that integrates the cross-scale window attention (CSWA), the intra-scale channel attention (ISCA), and the feature-enhancing feed-forward network (FEFN) to capture long-range cross-scale spatial interaction and global channel context. The EFCR addresses the data deficiency problem by exploring latent deblur signals from different frequency bands. It also enables deblur knowledge transfer to learn cross-domain information from extra data, improving deblur performance for labeled and unlabeled data. Extensive experiments and downstream task validation show the framework achieves state-of-the-art performance across multiple datasets. Project page: https://github.com/PieceZhang/MPT-CataBlur.
Updated: 2024-06-04 02:47:19
标题: 基于多金字塔变换器与对比学习的显微镜散焦去模糊统一框架
摘要: 散焦模糊是显微镜成像中的一个持续问题,对细胞显微镜学和显微手术中的病理学解释和医学干预造成了危害。为了解决这个问题,提出了一个包括多金字塔变换器(MPT)和扩展频率对比正则化(EFCR)的统一框架,以应对显微镜去模糊中的两个突出挑战:更长的注意力跨度和数据不足。MPT在每个网络阶段采用显式金字塔结构,集成了跨尺度窗口注意力(CSWA)、内尺度通道注意力(ISCA)和特征增强前向网络(FEFN),以捕捉跨尺度空间交互和全局通道上下文。EFCR通过探索不同频段的潜在去模糊信号来解决数据不足问题。它还能够进行去模糊知识传递,学习来自额外数据的跨领域信息,提高标记和未标记数据的去模糊性能。大量实验和下游任务验证表明,该框架在多个数据集上实现了最先进的性能。项目页面:https://github.com/PieceZhang/MPT-CataBlur。
更新时间: 2024-06-04 02:47:19
领域: cs.CV,cs.AI
A Global Geometric Analysis of Maximal Coding Rate Reduction
The maximal coding rate reduction (MCR$^2$) objective for learning structured and compact deep representations is drawing increasing attention, especially after its recent usage in the derivation of fully explainable and highly effective deep network architectures. However, it lacks a complete theoretical justification: only the properties of its global optima are known, and its global landscape has not been studied. In this work, we give a complete characterization of the properties of all its local and global optima, as well as other types of critical points. Specifically, we show that each (local or global) maximizer of the MCR$^2$ problem corresponds to a low-dimensional, discriminative, and diverse representation, and furthermore, each critical point of the objective is either a local maximizer or a strict saddle point. Such a favorable landscape makes MCR$^2$ a natural choice of objective for learning diverse and discriminative representations via first-order optimization methods. To validate our theoretical findings, we conduct extensive experiments on both synthetic and real data sets.
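For reference, the MCR$^2$ objective as commonly defined (features $Z \in \mathbb{R}^{d \times n}$, diagonal membership matrices $\Pi = \{\Pi_j\}$, precision $\epsilon$) is the coding rate of all features minus the average coding rate of the classes:

```latex
% Coding rate of all features:
R(Z) \;=\; \tfrac{1}{2}\,\log\det\!\Big(I + \tfrac{d}{n\epsilon^{2}}\, Z Z^{\top}\Big)

% Average coding rate of the classes:
R_c(Z, \Pi) \;=\; \sum_{j} \tfrac{\mathrm{tr}(\Pi_j)}{2n}\,
  \log\det\!\Big(I + \tfrac{d}{\mathrm{tr}(\Pi_j)\,\epsilon^{2}}\, Z \Pi_j Z^{\top}\Big)

% MCR^2 maximizes the difference: expand globally, compress within classes.
\Delta R(Z, \Pi) \;=\; R(Z) \;-\; R_c(Z, \Pi)
```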
Updated: 2024-06-04 02:39:48
标题: 一个全球性的极大编码速率降低的几何分析
摘要: 最大编码率降低(MCR$^2$)目标用于学习结构化和紧凑的深度表示引起了越来越多的关注,特别是在最近用于推导完全可解释和高度有效的深度网络架构之后。然而,它缺乏完整的理论证明:只知道其全局最优解的性质,并且其全局景观尚未被研究。在这项工作中,我们对所有局部和全局最优解的性质进行了完整的表征,以及其他类型的临界点。具体地,我们表明MCR$^2$问题的每个(局部或全局)最大化者对应于低维、有区别性和多样化的表示,而且目标的每个临界点要么是一个局部最大化者,要么是一个严格的鞍点。这样有利的景观使MCR$^2$成为通过一阶优化方法学习多样化和有区别性表示的自然选择。为了验证我们的理论发现,我们在合成和真实数据集上进行了大量实验。
更新时间: 2024-06-04 02:39:48
领域: cs.LG
PDHG-Unrolled Learning-to-Optimize Method for Large-Scale Linear Programming
Solving large-scale linear programming (LP) problems is an important task in various areas such as communication networks, power systems, finance and logistics. Recently, two distinct approaches have emerged to expedite LP solving: (i) First-order methods (FOMs); (ii) Learning to optimize (L2O). In this work, we propose an FOM-unrolled neural network (NN) called PDHG-Net, and propose a two-stage L2O method to solve large-scale LP problems. The new architecture PDHG-Net is designed by unrolling the recently emerged PDHG method into a neural network, combined with channel-expansion techniques borrowed from graph neural networks. We prove that the proposed PDHG-Net can recover PDHG algorithm, thus can approximate optimal solutions of LP instances with a polynomial number of neurons. We propose a two-stage inference approach: first use PDHG-Net to generate an approximate solution, and then apply PDHG algorithm to further improve the solution. Experiments show that our approach can significantly accelerate LP solving, achieving up to a 3$\times$ speedup compared to FOMs for large-scale LP problems.
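For context, below is a minimal NumPy rendering of the vanilla PDHG iteration for standard-form LP that PDHG-Net unrolls into network layers; the step sizes and the tiny test problem are illustrative.

```python
import numpy as np

def pdhg_lp(c, A, b, tau, sigma, iters=10000):
    """Vanilla PDHG for the LP  min c^T x  s.t.  Ax = b, x >= 0,
    via the saddle problem  min_{x>=0} max_y  c^T x + y^T (b - A x).
    Convergence requires tau * sigma * ||A||^2 < 1."""
    m, n = A.shape
    x, y = np.zeros(n), np.zeros(m)
    for _ in range(iters):
        x_new = np.maximum(x - tau * (c - A.T @ y), 0.0)  # primal step + projection
        y = y + sigma * (b - A @ (2 * x_new - x))          # dual step, extrapolated
        x = x_new
    return x, y

# Tiny LP: min x1 + 2*x2  s.t.  x1 + x2 = 1, x >= 0  (optimum x = [1, 0]).
c = np.array([1.0, 2.0]); A = np.array([[1.0, 1.0]]); b = np.array([1.0])
x, _ = pdhg_lp(c, A, b, tau=0.4, sigma=0.4)
```

PDHG-Net replaces the fixed iterates above with unrolled layers whose step sizes (and channel-expanded states) are learned, which is what enables the reported speedup.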
Updated: 2024-06-04 02:39:42
标题: PDHG展开学习优化方法用于大规模线性规划
摘要: 解决大规模线性规划(LP)问题是通信网络、电力系统、金融和物流等各个领域中的重要任务。最近,出现了两种不同的方法来加速LP求解:(i) 一阶方法 (FOMs); (ii) 学习优化 (L2O)。在这项工作中,我们提出了一种称为PDHG-Net的FOM展开神经网络 (NN),并提出了一种两阶段L2O方法来解决大规模LP问题。新架构PDHG-Net是通过将最近出现的PDHG方法展开为神经网络设计的,结合了从图神经网络借鉴来的通道扩展技术。我们证明了所提出的PDHG-Net可以恢复PDHG算法,因此可以用多项式数量的神经元近似LP实例的最优解。我们提出了一种两阶段推断方法:首先使用PDHG-Net生成一个近似解,然后应用PDHG算法进一步改进解。实验证明,我们的方法可以显著加速LP求解,与FOMs相比,对于大规模LP问题可以实现高达3倍的加速。
更新时间: 2024-06-04 02:39:42
领域: cs.LG,math.OC
FedCal: Achieving Local and Global Calibration in Federated Learning via Aggregated Parameterized Scaler
Federated learning (FL) enables collaborative machine learning across distributed data owners, but data heterogeneity poses a challenge for model calibration. While prior work focused on improving accuracy for non-iid data, calibration remains under-explored. This study reveals existing FL aggregation approaches lead to sub-optimal calibration, and theoretical analysis shows despite constraining variance in clients' label distributions, global calibration error is still asymptotically lower bounded. To address this, we propose a novel Federated Calibration (FedCal) approach, emphasizing both local and global calibration. It leverages client-specific scalers for local calibration to effectively correct output misalignment without sacrificing prediction accuracy. These scalers are then aggregated via weight averaging to generate a global scaler, minimizing the global calibration error. Extensive experiments demonstrate FedCal significantly outperforms the best-performing baseline, reducing global calibration error by 47.66% on average.
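A simplified sketch of the two-level idea, with temperature scaling standing in for FedCal's parameterized scalers (an assumption of this sketch): each client fits a scaler locally on its own validation data, and the server aggregates the scaler parameters by data-weighted averaging.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def fit_temperature(logits, labels):
    """Local calibration: fit a per-client temperature by minimizing NLL."""
    def nll(t):
        z = logits / t
        z = z - z.max(axis=1, keepdims=True)
        logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
        return -logp[np.arange(len(labels)), labels].mean()
    return minimize_scalar(nll, bounds=(0.05, 10.0), method="bounded").x

def global_scaler(client_temps, client_sizes):
    """Global calibration: aggregate client scalers by data-weighted averaging."""
    w = np.asarray(client_sizes) / np.sum(client_sizes)
    return float(np.dot(w, client_temps))
```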
Updated: 2024-06-04 02:36:14
标题: FedCal:通过聚合参数化标量在联邦学习中实现本地和全局校准
摘要: 联邦学习(FL)实现了分布式数据所有者之间的协作机器学习,但数据的异质性给模型校准带来了挑战。尽管先前的工作侧重于提高非独立同分布数据的准确性,校准仍然未被充分研究。本研究揭示了现有的FL聚合方法导致次优的校准,理论分析显示,尽管限制了客户端标签分布的方差,全局校准误差仍然是渐近下界。为了解决这个问题,我们提出了一种新颖的联邦校准(FedCal)方法,强调了本地和全局校准。它利用客户端特定的缩放器进行本地校准,有效地纠正输出不对齐而不损失预测准确性。然后,这些缩放器通过权重平均进行聚合,生成一个全局缩放器,最小化全局校准误差。大量实验证明,FedCal明显优于表现最佳的基准线,平均减少全局校准误差47.66%。
更新时间: 2024-06-04 02:36:14
领域: cs.LG,cs.DC
Adaptive Convolutional Forecasting Network Based on Time Series Feature-Driven
Time series data in real-world scenarios contain a substantial amount of nonlinear information, which significantly interferes with the training process of models, leading to decreased prediction performance. Therefore, during the time series forecasting process, extracting the local and global time series patterns and understanding the potential nonlinear features among different time observations are highly significant. To address this challenge, we introduce multi-resolution convolution and deformable convolution operations. By enlarging the receptive field using convolution kernels with different dilation factors to capture temporal correlation information at different resolutions, and adaptively adjusting the sampling positions through additional offset vectors, we enhance the network's ability to capture potential nonlinear features among time observations. Building upon this, we propose ACNet, an adaptive convolutional network designed to effectively model the local and global temporal dependencies and the nonlinear features between observations in multivariate time series. Specifically, by extracting and fusing time series features at different resolutions, we capture both local contextual information and global patterns in the time series. The designed nonlinear feature adaptive extraction module captures the nonlinear features among different time observations in the time series. We evaluated the performance of ACNet across twelve real-world datasets. The results indicate that ACNet consistently achieves state-of-the-art performance in both short-term and long-term forecasting tasks with favorable runtime efficiency.
Updated: 2024-06-04 02:30:49
标题: 基于时间序列特征驱动的自适应卷积预测网络
摘要: 时间序列数据在现实场景中包含大量非线性信息,这显著干扰了模型训练过程,导致预测性能下降。因此,在时间序列预测过程中,提取局部和全局时间序列模式并理解不同时间观察之间的潜在非线性特征非常重要。为了解决这一挑战,我们引入了多分辨率卷积和可变形卷积操作。通过使用不同膨胀因子的卷积核扩大感受野,以在不同分辨率捕获时间相关信息,并通过额外的偏移向量自适应调整采样位置,增强网络捕捉时间观察之间的潜在非线性特征的能力。在此基础上,我们提出了ACNet,一个自适应卷积网络,旨在有效地模拟多元时间序列中观察之间的局部和全局时间依赖性以及非线性特征。具体来说,通过在不同分辨率提取和融合时间序列特征,我们捕获了时间序列中的局部背景信息和全局模式。设计的非线性特征自适应提取模块捕获了时间序列中不同时间观察之间的非线性特征。我们评估了ACNet在十二个现实世界数据集上的性能。结果表明,ACNet在短期和长期预测任务中始终取得了最先进的性能,并具有良好的运行时效率。
更新时间: 2024-06-04 02:30:49
领域: cs.LG,cs.IR
Improving Group Robustness on Spurious Correlation Requires Preciser Group Inference
Standard empirical risk minimization (ERM) models may prioritize learning spurious correlations between spurious features and true labels, leading to poor accuracy on groups where these correlations do not hold. Mitigating this issue often requires expensive spurious attribute (group) labels or relies on trained ERM models to infer group labels when group information is unavailable. However, the significant performance gap in worst-group accuracy between using pseudo group labels and using oracle group labels inspires us to consider further improving group robustness through more precise group inference. Therefore, we propose GIC, a novel method that accurately infers group labels, resulting in improved worst-group performance. GIC trains a spurious attribute classifier based on two key properties of spurious correlations: (1) high correlation between spurious attributes and true labels, and (2) variability in this correlation between datasets with different group distributions. Empirical studies on multiple datasets demonstrate the effectiveness of GIC in inferring group labels, and combining GIC with various downstream invariant learning methods improves worst-group accuracy, showcasing its powerful flexibility. Additionally, through analyzing the misclassifications in GIC, we identify an interesting phenomenon called semantic consistency, which may contribute to better decoupling the association between spurious attributes and labels, thereby mitigating spurious correlation. The code for GIC is available at https://github.com/yujinhanml/GIC.
Updated: 2024-06-04 02:25:52
标题: 提高虚假相关性下的群体鲁棒性需要更精确的群体推断
摘要: 标准的经验风险最小化(ERM)模型可能会优先学习虚假特征与真实标签之间的虚假关联,导致在这些关联不成立的群体上准确性较低。缓解这一问题通常需要昂贵的虚假属性(群体)标签,或依赖于训练好的ERM模型在群体信息不可用时推断群体标签。然而,使用伪群体标签和使用完美群体标签之间最差群体准确性的显著性能差距,启发我们考虑通过更精确的群体推断进一步提高群体鲁棒性。因此,我们提出了GIC,一种准确推断群体标签的新方法,从而提高了最差群体性能。GIC基于虚假关联的两个关键特性训练虚假属性分类器:(1)虚假属性和真实标签之间的高相关性,以及(2)在具有不同群体分布的数据集之间的这种相关性的变化。对多个数据集的经验研究表明,GIC在推断群体标签方面的有效性,并将GIC与各种下游不变学习方法结合使用,提高了最差群体准确性,展示了其强大的灵活性。此外,通过分析GIC中的误分类,我们发现一个有趣的现象称为语义一致性,这可能有助于更好地解耦虚假属性和标签之间的关联,从而减轻虚假关联。GIC的代码可在https://github.com/yujinhanml/GIC 上找到。
更新时间: 2024-06-04 02:25:52
领域: cs.LG
Introspective Planning: Aligning Robots' Uncertainty with Inherent Task Ambiguity
Large language models (LLMs) exhibit advanced reasoning skills, enabling robots to comprehend natural language instructions and strategically plan high-level actions through proper grounding. However, LLM hallucination may result in robots confidently executing plans that are misaligned with user goals or, in extreme cases, unsafe. Additionally, inherent ambiguity in natural language instructions can induce task uncertainty, particularly in situations where multiple valid options exist. To address this issue, LLMs must identify such uncertainty and proactively seek clarification. This paper explores the concept of introspective planning as a systematic method for guiding LLMs in forming uncertainty-aware plans for robotic task execution without the need for fine-tuning. We investigate uncertainty quantification in task-level robot planning and demonstrate that introspection significantly improves both success rates and safety compared to state-of-the-art LLM-based planning approaches. Furthermore, we assess the effectiveness of introspective planning in conjunction with conformal prediction, revealing that this combination yields tighter confidence bounds, thereby maintaining statistical success guarantees with fewer superfluous user clarification queries. Code is available at https://github.com/kevinliang888/IntroPlan.
Updated: 2024-06-04 02:25:30
标题: 内省规划:将机器人的不确定性与固有任务模糊性对齐
摘要: 大型语言模型(LLMs)展示了先进的推理能力,使机器人能够理解自然语言指令并通过适当的基础规划高级行动。然而,LLM幻觉可能导致机器人自信地执行与用户目标不一致的计划,或者在极端情况下不安全。此外,自然语言指令中的固有模糊性可能引发任务不确定性,特别是在存在多个有效选项的情况下。为了解决这个问题,LLMs必须识别这种不确定性并主动寻求澄清。本文探讨了内省规划的概念,作为一种系统方法,指导LLMs形成有关机器人任务执行的不确定性感知计划,而无需进行精细调整。我们研究了任务级别机器人规划中的不确定性量化,并证明内省显著提高了成功率和安全性,与最先进的基于LLM的规划方法相比。此外,我们评估了内省规划与一致性预测结合的有效性,揭示了这种组合产生了更紧密的置信区间,从而通过更少的无谓用户澄清查询来保持统计成功保证。代码可在https://github.com/kevinliang888/IntroPlan 上找到。
更新时间: 2024-06-04 02:25:30
领域: cs.AI,cs.CL,cs.LG
Superfast Selection for Decision Tree Algorithms
We present a novel and systematic method, called Superfast Selection, for selecting the "optimal split" for decision tree and feature selection algorithms over tabular data. The method speeds up split selection on a single feature by lowering the time complexity, from O(MN) (using the standard selection methods) to O(M), where M represents the number of input examples and N the number of unique values. Additionally, the need for pre-encoding, such as one-hot or integer encoding, for feature value heterogeneity is eliminated. To demonstrate the efficiency of Superfast Selection, we empower the CART algorithm by integrating Superfast Selection into it, creating what we call Ultrafast Decision Tree (UDT). This enhancement enables UDT to complete the training process with a time complexity O(KM$^2$) (K is the number of features). Additionally, the Training Only Once Tuning enables UDT to avoid the repetitive training process required to find the optimal hyper-parameter. Experiments show that the UDT can finish a single training on KDD99-10% dataset (494K examples with 41 features) within 1 second and tuning with 214.8 sets of hyper-parameters within 0.25 second on a laptop.
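A sketch of how split selection on one feature can avoid the O(MN) re-scan: a single O(M) sweep accumulates per-value class counts, after which candidate thresholds are evaluated over the N unique values (N << M). The exact bookkeeping of Superfast Selection may differ; this only illustrates the complexity argument.

```python
from collections import defaultdict

def best_split_single_pass(values, labels):
    """O(M)-sweep split selection for one feature with binary labels (0/1).
    No per-threshold re-scan of the data; the final loop over sorted unique
    values is O(N log N), which is negligible when N << M."""
    stats = defaultdict(lambda: [0, 0])       # value -> [count_y0, count_y1]
    for v, y in zip(values, labels):          # the single O(M) pass
        stats[v][y] += 1
    total0 = sum(s[0] for s in stats.values())
    total1 = sum(s[1] for s in stats.values())
    m = total0 + total1

    def gini(n0, n1):
        n = n0 + n1
        return 0.0 if n == 0 else 1.0 - (n0 / n) ** 2 - (n1 / n) ** 2

    best_score, best_value, left = float("inf"), None, [0, 0]
    for v in sorted(stats):                   # N unique candidate thresholds
        left[0] += stats[v][0]; left[1] += stats[v][1]
        right = (total0 - left[0], total1 - left[1])
        score = (sum(left) * gini(*left) + sum(right) * gini(*right)) / m
        if score < best_score:
            best_score, best_value = score, v
    return best_value                          # split rule: feature <= value
```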
Updated: 2024-06-04 02:19:30
标题: 决策树算法的超快速选择
摘要: 我们提出了一种新颖和系统化的方法,称为超快速选择,用于在表格数据上选择决策树和特征选择算法的“最佳分割”。该方法通过降低时间复杂度,从O(MN)(使用标准选择方法)降至O(M),其中M代表输入示例的数量,N代表唯一值的数量,来加快单个特征的分割选择。此外,消除了对预编码(如one-hot或整数编码)进行特征值异质性的需求。为了展示超快速选择的效率,我们将其集成到CART算法中,创建了我们称之为超快速决策树(UDT)。这种增强使得UDT能够在时间复杂度为O(KM$^2$)(K是特征数量)的情况下完成训练过程。此外,仅训练一次调整使得UDT能够避免寻找最佳超参数所需的重复训练过程。实验证明,在笔记本电脑上,UDT可以在1秒内完成对KDD99-10%数据集(494K个示例,41个特征)的单次训练,并在0.25秒内对214.8组超参数进行调整。
更新时间: 2024-06-04 02:19:30
领域: cs.LG
Bifurcated Generative Flow Networks
Generative Flow Networks (GFlowNets), a new family of probabilistic samplers, have recently emerged as a promising framework for learning stochastic policies that generate high-quality and diverse objects proportionally to their rewards. However, existing GFlowNets often suffer from low data efficiency due to the direct parameterization of edge flows or reliance on backward policies that may struggle to scale up to large action spaces. In this paper, we introduce Bifurcated GFlowNets (BN), a novel approach that employs a bifurcated architecture to factorize the flows into separate representations for state flows and edge-based flow allocation. This factorization enables BN to learn more efficiently from data and better handle large-scale problems while maintaining the convergence guarantee. Through extensive experiments on standard evaluation benchmarks, we demonstrate that BN significantly improves learning efficiency and effectiveness compared to strong baselines.
Updated: 2024-06-04 02:12:27
标题: 分叉生成流网络
摘要: Generative Flow Networks(GFlowNets)是一种新型的概率采样器家族,最近作为学习随机策略的有希望的框架出现,该框架可以按照它们的奖励比例生成高质量和多样化的对象。然而,现有的GFlowNets往往由于直接参数化边缘流或依赖于可能难以扩展到大动作空间的反向策略而导致数据效率低下。在本文中,我们介绍了一种新颖的方法,即Bifurcated GFlowNets(BN),它采用了一个分叉的架构来将流分解为状态流和基于边缘的流分配的独立表示。这种因子分解使得BN能够更有效地从数据中学习,并更好地处理大规模问题,同时保持收敛性保证。通过对标准评估基准的广泛实验,我们证明了与强基线相比,BN显著提高了学习效率和效果。
更新时间: 2024-06-04 02:12:27
领域: cs.LG
A Survey of Mix-based Data Augmentation: Taxonomy, Methods, Applications, and Explainability
Data augmentation (DA) is indispensable in modern machine learning and deep neural networks. The basic idea of DA is to construct new training data to improve the model's generalization by adding slightly disturbed versions of existing data or synthesizing new data. This survey comprehensively reviews a crucial subset of DA techniques, namely Mix-based Data Augmentation (MixDA), which generates novel samples by combining multiple examples. In contrast to traditional DA approaches that operate on single samples or entire datasets, MixDA stands out due to its effectiveness, simplicity, flexibility, computational efficiency, theoretical foundation, and broad applicability. We begin by introducing a novel taxonomy that categorizes MixDA into Mixup-based, Cutmix-based, and mixture approaches based on a hierarchical perspective of the data mixing operation. Subsequently, we provide an in-depth review of various MixDA techniques, focusing on their underlying motivations. Owing to its versatility, MixDA has penetrated a wide range of applications, which we also thoroughly investigate in this survey. Moreover, we delve into the underlying mechanisms of MixDA's effectiveness by examining its impact on model generalization and calibration while providing insights into the model's behavior by analyzing the inherent properties of MixDA. Finally, we recapitulate the critical findings and fundamental challenges of current MixDA studies while outlining the potential directions for future works. Different from previous related surveys that focus on DA approaches in specific domains (e.g., CV and NLP) or only review a limited subset of MixDA studies, we are the first to provide a systematic survey of MixDA, covering its taxonomy, methodology, application, and explainability. Furthermore, we provide promising directions for researchers interested in this exciting area.
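For readers new to the area, the two archetypes the taxonomy builds on are easy to state in NumPy (labels assumed one-hot; the Beta parameters are the usual defaults, not prescriptions from this survey):

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2):
    """Mixup: convex combination of two examples and of their one-hot labels."""
    lam = np.random.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

def cutmix(img1, lab1, img2, lab2, alpha=1.0):
    """CutMix: paste a random box from img2 into img1; mix labels by box area."""
    lam = np.random.beta(alpha, alpha)
    h, w = img1.shape[:2]
    rh, rw = int(h * np.sqrt(1 - lam)), int(w * np.sqrt(1 - lam))
    cy, cx = np.random.randint(h), np.random.randint(w)
    y0, y1 = max(cy - rh // 2, 0), min(cy + rh // 2, h)
    x0, x1 = max(cx - rw // 2, 0), min(cx + rw // 2, w)
    out = img1.copy()
    out[y0:y1, x0:x1] = img2[y0:y1, x0:x1]
    lam_adj = 1 - (y1 - y0) * (x1 - x0) / (h * w)  # actual kept-area fraction
    return out, lam_adj * lab1 + (1 - lam_adj) * lab2
```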
Updated: 2024-06-04 02:11:25
标题: 基于混合的数据增强综述:分类、方法、应用与可解释性
摘要: 数据增强(DA)在现代机器学习和深度神经网络中是不可或缺的。DA的基本思想是通过添加略有扰动的现有数据的新训练数据或合成新数据来改进模型的泛化能力。本文全面回顾了DA技术的一个重要子集,即基于混合的数据增强(MixDA),该技术通过组合多个示例生成新样本。与传统的DA方法不同,这些方法仅适用于单个样本或整个数据集,MixDA因其有效性、简单性、灵活性、计算效率、理论基础和广泛适用性而脱颖而出。我们首先介绍了一个将MixDA分为基于Mixup、Cutmix和混合方法的新分类法,基于数据混合操作的层次透视。随后,我们深入审查了各种MixDA技术,重点关注它们的基本动机。由于其多功能性,MixDA已渗透到广泛的应用领域,我们在本调查中也对此进行了全面调查。此外,我们深入研究了MixDA有效性的基本机制,通过分析MixDA的固有属性,探讨其对模型泛化和校准的影响,以揭示模型行为的见解。最后,我们总结了目前MixDA研究的关键发现和基本挑战,同时概述了未来工作的潜在方向。与以往专注于特定领域(例如CV和NLP)的DA方法或仅审查MixDA研究有限子集的相关调查不同,我们是第一个系统审查MixDA的研究,涵盖了其分类法、方法论、应用和可解释性。此外,我们为对这一激动人心领域感兴趣的研究人员提供了有前途的方向。
更新时间: 2024-06-04 02:11:25
领域: cs.LG,cs.CL,cs.CV
TinyLlama: An Open-Source Small Language Model
We present TinyLlama, a compact 1.1B language model pretrained on around 1 trillion tokens for approximately 3 epochs. Building on the architecture and tokenizer of Llama 2, TinyLlama leverages various advances contributed by the open-source community (e.g., FlashAttention and Lit-GPT), achieving better computational efficiency. Despite its relatively small size, TinyLlama demonstrates remarkable performance in a series of downstream tasks. It significantly outperforms existing open-source language models with comparable sizes. Our model checkpoints and code are publicly available on GitHub at https://github.com/jzhang38/TinyLlama.
Updated: 2024-06-04 02:05:30
标题: TinyLlama:一个开源的小型语言模型
摘要: 我们介绍了TinyLlama,一个紧凑的1.1B参数语言模型,在约1万亿个token上预训练了大约3个轮次(epoch)。TinyLlama基于Llama 2的架构和分词器,利用了开源社区贡献的多项进展(例如FlashAttention和Lit-GPT),实现了更好的计算效率。尽管规模相对较小,TinyLlama在一系列下游任务中表现出色,显著优于规模相当的现有开源语言模型。我们的模型检查点和代码可在GitHub上公开获取,网址为https://github.com/jzhang38/TinyLlama。
更新时间: 2024-06-04 02:05:30
领域: cs.CL,cs.AI
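A minimal usage sketch with the Hugging Face transformers library; the checkpoint id below is an assumption inferred from the project name, so check the linked repository for the actual release:

    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # assumed checkpoint id; verify in the repo
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    inputs = tok("The capital of France is", return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=16)
    print(tok.decode(out[0], skip_special_tokens=True))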
Cross-Domain Graph Data Scaling: A Showcase with Diffusion Models
Models for natural language and images benefit from data scaling behavior: the more data fed into the model, the better they perform. This 'better with more' phenomenon enables the effectiveness of large-scale pre-training on vast amounts of data. However, current graph pre-training methods struggle to scale up data due to heterogeneity across graphs. To achieve effective data scaling, we aim to develop a general model that is able to capture diverse data patterns of graphs and can be utilized to adaptively help the downstream tasks. To this end, we propose UniAug, a universal graph structure augmentor built on a diffusion model. We first pre-train a discrete diffusion model on thousands of graphs across domains to learn the graph structural patterns. In the downstream phase, we provide adaptive enhancement by conducting graph structure augmentation with the help of the pre-trained diffusion model via guided generation. By leveraging the pre-trained diffusion model for structure augmentation, we consistently achieve performance improvements across various downstream tasks in a plug-and-play manner. To the best of our knowledge, this study represents the first demonstration of a data-scaling graph structure augmentor on graphs across domains.
Updated: 2024-06-04 02:04:09
标题: 跨领域图数据缩放:扩散模型展示
摘要: 自然语言和图像模型受益于数据的缩放行为:模型获得的数据越多,它们的性能就越好。这种“更多数据更好”的现象使得在大量数据上进行大规模预训练变得有效。然而,当前的图预训练方法由于图之间的异质性而难以扩展数据规模。为了实现有效的数据扩展,我们旨在开发一个能够捕捉图数据多样模式的通用模型,并能够适应性地帮助下游任务。为此,我们提出了UniAug,一个建立在扩散模型上的通用图结构增强器。我们首先在跨领域的数千个图上预训练一个离散扩散模型,以学习图结构模式。在下游阶段,我们通过引导生成,利用预训练的扩散模型进行图结构增强,实现自适应增强。通过利用预训练的扩散模型进行结构增强,我们以即插即用的方式在各种下游任务中持续实现性能改进。据我们所知,这项研究是首次在跨领域的图上展示了一个数据扩展图结构增强器。
更新时间: 2024-06-04 02:04:09
领域: cs.LG
Interdisciplinary Fairness in Imbalanced Research Proposal Topic Inference: A Hierarchical Transformer-based Method with Selective Interpolation
The objective of topic inference in research proposals aims to obtain the most suitable disciplinary division from the discipline system defined by a funding agency. The agency will subsequently find appropriate peer review experts from their database based on this division. Automated topic inference can reduce human errors caused by manual topic filling, bridge the knowledge gap between funding agencies and project applicants, and improve system efficiency. Existing methods focus on modeling this as a hierarchical multi-label classification problem, using generative models to iteratively infer the most appropriate topic information. However, these methods overlook the gap in scale between interdisciplinary research proposals and non-interdisciplinary ones, leading to an unjust phenomenon where the automated inference system categorizes interdisciplinary proposals as non-interdisciplinary, causing unfairness during the expert assignment. How can we address this data imbalance issue under a complex discipline system and hence resolve this unfairness? In this paper, we implement a topic label inference system based on a Transformer encoder-decoder architecture. Furthermore, we utilize interpolation techniques to create a series of pseudo-interdisciplinary proposals from non-interdisciplinary ones during training based on non-parametric indicators such as cross-topic probabilities and topic occurrence probabilities. This approach aims to reduce the bias of the system during model training. Finally, we conduct extensive experiments on a real-world dataset to verify the effectiveness of the proposed method. The experimental results demonstrate that our training strategy can significantly mitigate the unfairness generated in the topic inference task.
Updated: 2024-06-04 02:01:18
标题: 跨学科公平性在不平衡研究提案主题推断中的应用:一种基于层次变换器和选择性插值的方法
摘要: 研究提案主题推断的目标是从资助机构定义的学科体系中得到最合适的学科划分。机构随后将根据这一划分从其数据库中找到合适的同行评审专家。自动化主题推断可以减少人工填写主题导致的错误,弥合资助机构与项目申请人之间的知识差距,并提高系统效率。现有方法侧重于将其建模为层次多标签分类问题,使用生成模型迭代地推断最合适的主题信息。然而,这些方法忽视了跨学科研究提案与非跨学科提案之间的规模差距,导致一种不公平现象:自动推断系统将跨学科提案归类为非跨学科,在专家分配过程中造成不公。我们该如何在复杂的学科体系下解决这一数据不平衡问题,进而消除这种不公平?在本文中,我们基于Transformer编码器-解码器架构实现了一个主题标签推断系统。此外,我们利用插值技术,在训练过程中基于跨主题概率和主题出现概率等非参数指标,从非跨学科提案中构造一系列伪跨学科提案。这种方法旨在减少模型训练过程中系统的偏差。最后,我们在一个真实数据集上进行了大量实验,验证了所提方法的有效性。实验结果表明,我们的训练策略可以显著缓解主题推断任务中产生的不公平。
更新时间: 2024-06-04 02:01:18
领域: cs.CL,cs.AI
Explicitly Encoding Structural Symmetry is Key to Length Generalization in Arithmetic Tasks
Despite the success of Transformers on language understanding, code generation, and logical reasoning, they still fail to generalize over length on basic arithmetic tasks such as addition and multiplication. A major reason behind this failure is the vast difference in structure between numbers and text; For example, the numbers are typically parsed from right to left, and there is a correspondence between digits at the same position across different numbers. In contrast, for text, such symmetries are quite unnatural. In this work, we propose to encode these semantics explicitly into the model via modified number formatting and custom positional encodings. Empirically, our method allows a Transformer trained on numbers with at most 5-digits for addition and multiplication to generalize up to 50-digit numbers, without using additional data for longer sequences. We further demonstrate that traditional absolute positional encodings (APE) fail to generalize to longer sequences, even when trained with augmented data that captures task symmetries. To elucidate the importance of explicitly encoding structure, we prove that explicit incorporation of structure via positional encodings is necessary for out-of-distribution generalization. Finally, we pinpoint other challenges inherent to length generalization beyond capturing symmetries, in particular complexity of the underlying task, and propose changes in the training distribution to address them.
Updated: 2024-06-04 02:00:07
标题: 显式编码结构对称性是算术任务中长度泛化的关键
摘要: 尽管Transformer在语言理解、代码生成和逻辑推理方面取得了成功,但它们在加法和乘法等基本算术任务上仍然无法在长度上泛化。造成这种失败的一个主要原因是数字和文本在结构上的巨大差异;例如,数字通常从右向左解析,且不同数字在相同位置上的数位之间存在对应关系。相比之下,对于文本来说,这样的对称性是相当不自然的。在这项工作中,我们提出通过修改数字格式和自定义位置编码将这些语义显式地编码到模型中。经验上,我们的方法使一个仅在至多5位数的加法和乘法上训练的Transformer能够泛化到至多50位数,而无需使用更长序列的额外数据。我们进一步证明,即使使用捕捉任务对称性的增强数据进行训练,传统的绝对位置编码(APE)也无法泛化到更长的序列。为了阐明显式编码结构的重要性,我们证明通过位置编码显式地纳入结构对于分布外泛化是必要的。最后,我们指出了长度泛化中超出捕捉对称性之外的其他固有挑战,特别是底层任务的复杂性,并提出改变训练分布以应对这些挑战。
更新时间: 2024-06-04 02:00:07
领域: cs.LG,cs.CL,stat.ML
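The core idea, making the right-to-left digit structure explicit, can be illustrated with a small formatting sketch; the token format and place-index tags below are our illustrative assumptions, not the authors' exact scheme:

    def format_number(n: int) -> list[str]:
        """Emit digits least-significant first, tagged with their place index,
        so digit tokens at the same place align across different numbers."""
        return [f"{d}_p{i}" for i, d in enumerate(reversed(str(n)))]

    def format_addition(a: int, b: int) -> list[str]:
        return format_number(a) + ["+"] + format_number(b) + ["="] + format_number(a + b)

    print(format_addition(472, 58))
    # ['2_p0', '7_p1', '4_p2', '+', '8_p0', '5_p1', '=', '0_p0', '3_p1', '5_p2']

With this formatting, the digit at place i of every operand shares a positional tag, which is the kind of structural symmetry the paper argues a plain APE cannot recover on its own.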
Large Language Model-Enabled Multi-Agent Manufacturing Systems
Traditional manufacturing faces challenges adapting to dynamic environments and quickly responding to manufacturing changes. The use of multi-agent systems has improved adaptability and coordination but requires further advancements in rapid human instruction comprehension, operational adaptability, and coordination through natural language integration. Large language models like GPT-3.5 and GPT-4 enhance multi-agent manufacturing systems by enabling agents to communicate in natural language and interpret human instructions for decision-making. This research introduces a novel framework where large language models enhance the capabilities of agents in manufacturing, making them more adaptable, and capable of processing context-specific instructions. A case study demonstrates the practical application of this framework, showing how agents can effectively communicate, understand tasks, and execute manufacturing processes, including precise G-code allocation among agents. The findings highlight the importance of continuous large language model integration into multi-agent manufacturing systems and the development of sophisticated agent communication protocols for a more flexible manufacturing system.
Updated: 2024-06-04 01:57:37
标题: 大型语言模型驱动的多智能体制造系统
摘要: 传统制造业面临着适应动态环境和快速响应制造变化的挑战。多智能体系统的使用改善了适应性和协调性,但需要进一步发展快速人类指令理解、操作适应性和通过自然语言集成进行协调的能力。像GPT-3.5和GPT-4这样的大型语言模型通过使智能体能够使用自然语言进行交流和解释人类指令来增强多智能体制造系统。这项研究引入了一个新的框架,其中大型语言模型增强了制造中智能体的能力,使它们更具适应性,并能够处理特定上下文的指令。一项案例研究展示了该框架的实际应用,展示了智能体如何有效地交流、理解任务并执行制造过程,包括在智能体之间进行精确的G代码分配。研究结果强调将大型语言模型持续整合到多智能体制造系统中的重要性,以及为更灵活的制造系统开发复杂的智能体通信协议。
更新时间: 2024-06-04 01:57:37
领域: cs.MA,cs.AI
Efficient Behavior Tree Planning with Commonsense Pruning and Heuristic
Behavior Tree (BT) planning is crucial for autonomous robot behavior control, yet its application in complex scenarios is hampered by long planning times. Pruning and heuristics are common techniques to accelerate planning, but it is difficult to design general pruning strategies and heuristic functions for BT planning problems. This paper proposes improving BT planning efficiency for everyday service robots leveraging commonsense reasoning provided by Large Language Models (LLMs), leading to model-free pre-planning action space pruning and heuristic generation. This approach takes advantage of the modularity and interpretability of BT nodes, represented by predicate logic, to enable LLMs to predict the task-relevant action predicates and objects, and even the optimal path, without an explicit action model. We propose the Heuristic Optimal Behavior Tree Expansion Algorithm (HOBTEA) with two heuristic variants and provide a formal comparison and discussion of their efficiency and optimality. We introduce a learnable and transferable commonsense library to enhance the LLM's reasoning performance without fine-tuning. The action space expansion based on the commonsense library can further increase the success rate of planning. Experiments show the theoretical bounds of commonsense pruning and heuristic, and demonstrate the actual performance of LLM learning and reasoning with the commonsense library. Results in four datasets showcase the practical effectiveness of our approach in everyday service robot applications.
Updated: 2024-06-04 01:41:24
标题: 高效的行为树规划:基于常识修剪和启发式的方法
摘要: 行为树(BT)规划对自主机器人的行为控制至关重要,但其在复杂场景中的应用受到规划耗时过长的阻碍。剪枝和启发式是加速规划的常用技术,但很难为BT规划问题设计通用的剪枝策略和启发式函数。本文提出利用大型语言模型(LLM)提供的常识推理来提高日常服务机器人的BT规划效率,从而实现无模型的预规划动作空间剪枝和启发式生成。该方法利用以谓词逻辑表示的BT节点的模块化和可解释性,使LLM能够在没有显式动作模型的情况下预测与任务相关的动作谓词和对象,甚至最优路径。我们提出了带有两种启发式变体的启发式最优行为树扩展算法(HOBTEA),并对它们的效率和最优性进行了形式化比较与讨论。我们引入了一个可学习、可迁移的常识库,在无需微调的情况下增强LLM的推理性能。基于常识库的动作空间扩展可以进一步提高规划的成功率。实验展示了常识剪枝和启发式的理论界限,并展示了LLM借助常识库进行学习和推理的实际性能。在四个数据集上的结果体现了我们的方法在日常服务机器人应用中的实际有效性。
更新时间: 2024-06-04 01:41:24
领域: cs.RO,cs.AI
RE$^2$: Region-Aware Relation Extraction from Visually Rich Documents
Current research in form understanding predominantly relies on large pre-trained language models, necessitating extensive data for pre-training. However, the importance of layout structure (i.e., the spatial relationship between the entity blocks in the visually rich document) to relation extraction has been overlooked. In this paper, we propose REgion-Aware Relation Extraction (RE$^2$) that leverages region-level spatial structure among the entity blocks to improve their relation prediction. We design an edge-aware graph attention network to learn the interaction between entities while considering their spatial relationship defined by their region-level representations. We also introduce a constraint objective to regularize the model towards consistency with the inherent constraints of the relation extraction task. Extensive experiments across various datasets, languages and domains demonstrate the superiority of our proposed approach.
Updated: 2024-06-04 01:32:18
标题: RE$^2$:面向视觉丰富文档的区域感知关系抽取
摘要: 目前的表单理解研究主要依赖大型预训练语言模型,需要大量数据进行预训练。然而,布局结构(即视觉丰富文档中实体块之间的空间关系)对关系抽取的重要性被忽视了。在本文中,我们提出了区域感知关系抽取(RE$^2$),利用实体块之间的区域级空间结构来改进它们的关系预测。我们设计了一个边感知图注意力网络,在考虑由区域级表示定义的空间关系的同时,学习实体之间的交互。我们还引入了一个约束目标,使模型与关系抽取任务的固有约束保持一致。在多个数据集、语言和领域上进行的大量实验表明了我们所提方法的优越性。
更新时间: 2024-06-04 01:32:18
领域: cs.CL,cs.AI
Graph Machine Learning in the Era of Large Language Models (LLMs)
Graphs play an important role in representing complex relationships in various domains like social networks, knowledge graphs, and molecular discovery. With the advent of deep learning, Graph Neural Networks (GNNs) have emerged as a cornerstone in Graph Machine Learning (Graph ML), facilitating the representation and processing of graph structures. Recently, LLMs have demonstrated unprecedented capabilities in language tasks and are widely adopted in a variety of applications such as computer vision and recommender systems. This remarkable success has also attracted interest in applying LLMs to the graph domain. Increasing efforts have been made to explore the potential of LLMs in advancing Graph ML's generalization, transferability, and few-shot learning ability. Meanwhile, graphs, especially knowledge graphs, are rich in reliable factual knowledge, which can be utilized to enhance the reasoning capabilities of LLMs and potentially alleviate their limitations such as hallucinations and the lack of explainability. Given the rapid progress of this research direction, a systematic review summarizing the latest advancements for Graph ML in the era of LLMs is necessary to provide an in-depth understanding to researchers and practitioners. Therefore, in this survey, we first review the recent developments in Graph ML. We then explore how LLMs can be utilized to enhance the quality of graph features, alleviate the reliance on labeled data, and address challenges such as graph heterogeneity and out-of-distribution (OOD) generalization. Afterward, we delve into how graphs can enhance LLMs, highlighting their abilities to enhance LLM pre-training and inference. Furthermore, we investigate various applications and discuss the potential future directions in this promising field.
Updated: 2024-06-04 01:31:30
标题: 大语言模型(LLMs)时代的图机器学习
摘要: 图形在表示社交网络、知识图谱和分子发现等各个领域中复杂关系方面起着重要作用。随着深度学习的发展,图神经网络(GNNs)已成为图机器学习(Graph ML)中的基石,促进了图结构的表示和处理。最近,大型语言模型(LLMs)在语言任务中表现出了前所未有的能力,并被广泛应用于计算机视觉和推荐系统等各种应用中。这一显著成功也引起了将LLMs应用于图领域的兴趣。人们正在不断努力探索LLMs在推进图ML的泛化、可迁移性和少样本学习能力方面的潜力。同时,图,特别是知识图谱,充满了可靠的事实知识,可以用于增强LLMs的推理能力,并潜在地缓解其幻觉和缺乏可解释性等局限性。鉴于这一研究方向的迅速进展,有必要对LLMs时代图ML的最新进展进行系统性回顾,以提供研究人员和从业者深入了解。因此,在本调查中,我们首先回顾了图ML的最新发展。然后,我们探讨了如何利用LLMs来提高图特征的质量,减轻对标记数据的依赖,并解决图的异质性和分布外(OOD)泛化等挑战。随后,我们深入探讨了图如何增强LLMs,突出它们在增强LLM预训练和推理方面的能力。此外,我们调查了各种应用,并讨论了这一有前途领域的潜在未来方向。
更新时间: 2024-06-04 01:31:30
领域: cs.LG,cs.AI,cs.CL,cs.SI
HoneyGPT: Breaking the Trilemma in Terminal Honeypots with Large Language Model
Honeypots, as a strategic cyber-deception mechanism designed to emulate authentic interactions and bait unauthorized entities, continue to struggle with balancing flexibility, interaction depth, and deceptive capability despite their evolution over decades. Often they also lack the capability of proactively adapting to an attacker's evolving tactics, which restricts the depth of engagement and subsequent information gathering. Under this context, the emergent capabilities of large language models, in tandem with pioneering prompt-based engineering techniques, offer a transformative shift in the design and deployment of honeypot technologies. In this paper, we introduce HoneyGPT, a pioneering honeypot architecture based on ChatGPT, heralding a new era of intelligent honeypot solutions characterized by their cost-effectiveness, high adaptability, and enhanced interactivity, coupled with a predisposition for proactive attacker engagement. Furthermore, we present a structured prompt engineering framework that augments long-term interaction memory and robust security analytics. This framework, integrating chain-of-thought tactics attuned to honeypot contexts, enhances interactivity and deception, deepens security analytics, and ensures sustained engagement. The evaluation of HoneyGPT includes two parts: a baseline comparison based on a collected dataset and a field evaluation in real scenarios for four weeks. The baseline comparison demonstrates HoneyGPT's remarkable ability to strike a balance among flexibility, interaction depth, and deceptive capability. The field evaluation further validates HoneyGPT's efficacy, showing its marked superiority in enticing attackers into more profound interactive engagements and capturing a wider array of novel attack vectors in comparison to existing honeypot technologies.
Updated: 2024-06-04 01:31:20
标题: HoneyGPT:使用大型语言模型打破终端蜜罐的三难困境
摘要: 蜜罐是一种旨在模拟真实交互并诱捕未授权实体的战略性网络欺骗机制,尽管历经数十年发展,仍难以在灵活性、交互深度和欺骗能力之间取得平衡。它们通常也缺乏主动适应攻击者不断演变战术的能力,这限制了交互深度和后续的信息收集。在这种背景下,大型语言模型的新兴能力与开创性的提示工程技术相结合,为蜜罐技术的设计和部署带来了变革性转变。在本文中,我们介绍了基于ChatGPT的创新蜜罐架构HoneyGPT,开启了智能蜜罐解决方案的新时代,其特点是成本效益高、适应性强、交互性增强,并倾向于主动吸引攻击者参与。此外,我们提出了一个结构化的提示工程框架,增强了长期交互记忆和强大的安全分析能力。该框架整合了适配蜜罐情境的思维链策略,增强了交互性和欺骗性,深化了安全分析,并确保了持续的交互。HoneyGPT的评估包括两部分:基于收集数据集的基准比较,以及在真实场景中为期四周的现场评估。基准比较展示了HoneyGPT在灵活性、交互深度和欺骗能力之间取得平衡的显著能力。现场评估进一步验证了HoneyGPT的有效性,表明其在诱导攻击者进行更深入的交互以及捕获更广泛的新型攻击向量方面明显优于现有蜜罐技术。
更新时间: 2024-06-04 01:31:20
领域: cs.CR,cs.AI,cs.ET,cs.SE
Multiply Robust Estimation for Local Distribution Shifts with Multiple Domains
Distribution shifts are ubiquitous in real-world machine learning applications, posing a challenge to the generalization of models trained on one data distribution to another. We focus on scenarios where data distributions vary across multiple segments of the entire population and only make local assumptions about the differences between training and test (deployment) distributions within each segment. We propose a two-stage multiply robust estimation method to improve model performance on each individual segment for tabular data analysis. The method involves fitting a linear combination of the based models, learned using clusters of training data from multiple segments, followed by a refinement step for each segment. Our method is designed to be implemented with commonly used off-the-shelf machine learning models. We establish theoretical guarantees on the generalization bound of the method on the test risk. With extensive experiments on synthetic and real datasets, we demonstrate that the proposed method substantially improves over existing alternatives in prediction accuracy and robustness on both regression and classification tasks. We also assess its effectiveness on a user city prediction dataset from Meta.
Updated: 2024-06-04 01:30:07
标题: 多领域局部分布转移的多重稳健估计
摘要: 分布偏移在现实世界的机器学习应用中普遍存在,给在一种数据分布上训练的模型向另一种分布的泛化带来挑战。我们关注的情景是:数据分布在整个总体的多个分段之间变化,并且仅对每个分段内训练分布与测试(部署)分布之间的差异做局部假设。我们提出了一种两阶段的多重稳健估计方法,用于改进表格数据分析中每个分段上的模型性能。该方法先拟合若干基础模型的线性组合,这些基础模型由来自多个分段的训练数据聚类学习得到,然后对每个分段进行细化。我们的方法设计为可与常用的现成机器学习模型配合实现。我们为该方法在测试风险上的泛化界建立了理论保证。通过在合成和真实数据集上的大量实验,我们证明了所提方法在回归和分类任务上的预测准确性和稳健性均明显优于现有替代方案。我们还在来自Meta的用户城市预测数据集上评估了其有效性。
更新时间: 2024-06-04 01:30:07
领域: stat.ML,cs.LG,stat.ME
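A rough sketch of the two-stage idea under our own simplifying assumptions (plain least squares for the combination and ridge regression for the refinement; the paper's estimator and its guarantees are more general):

    import numpy as np
    from sklearn.linear_model import Ridge

    def two_stage_segment_fit(base_models, X_seg, y_seg):
        """Stage 1: weight base-model predictions on this segment;
        Stage 2: refine the remaining residual with a small per-segment model."""
        P = np.column_stack([m.predict(X_seg) for m in base_models])  # (n, K) predictions
        w, *_ = np.linalg.lstsq(P, y_seg, rcond=None)                 # linear combination
        resid = y_seg - P @ w
        refiner = Ridge(alpha=1.0).fit(X_seg, resid)                  # per-segment refinement
        def predict(X):
            Pn = np.column_stack([m.predict(X) for m in base_models])
            return Pn @ w + refiner.predict(X)
        return predict

Here `base_models` would be fitted beforehand on clusters of pooled training data from all segments, and `two_stage_segment_fit` is called once per segment.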
CMOSE: Comprehensive Multi-Modality Online Student Engagement Dataset with High-Quality Labels
Online learning is a rapidly growing industry. However, a major doubt about online learning is whether students are as engaged as they are in face-to-face classes. An engagement recognition system can notify the instructors about the students' condition and improve the learning experience. Current challenges in engagement detection involve poor label quality, extreme data imbalance, and intra-class variety - the variety of behaviors at a certain engagement level. To address these problems, we present the CMOSE dataset, which contains a large number of data from different engagement levels and high-quality labels annotated according to psychological advice. We also propose a training mechanism MocoRank to handle the intra-class variety and the ordinal pattern of different degrees of engagement classes. MocoRank outperforms prior engagement detection frameworks, achieving a 1.32% increase in overall accuracy and 5.05% improvement in average accuracy. Further, we demonstrate the effectiveness of multi-modality in engagement detection by combining video features with speech and audio features. The data transferability experiments also state that the proposed CMOSE dataset provides superior label quality and behavior diversity.
Updated: 2024-06-04 01:27:35
标题: CMOSE:高质量标签的综合多模态在线学生参与数据集
摘要: 在线学习是一个快速发展的行业。然而,关于在线学习的一个主要疑问是,学生是否像在面对面课堂中那样投入。参与度识别系统可以将学生的状态告知教师并改善学习体验。目前参与度检测面临的挑战包括标签质量差、数据极度不平衡以及类内多样性,即同一参与度水平下行为的多样性。为了解决这些问题,我们提出了CMOSE数据集,其中包含大量来自不同参与度水平的数据,并依据心理学建议进行了高质量标注。我们还提出了一种训练机制MocoRank,用于处理类内多样性以及不同参与度等级之间的有序模式。MocoRank优于先前的参与度检测框架,整体准确率提高了1.32%,平均准确率提高了5.05%。此外,我们通过将视频特征与语音和音频特征相结合,展示了多模态在参与度检测中的有效性。数据可迁移性实验也表明,所提出的CMOSE数据集具有更优的标签质量和行为多样性。
更新时间: 2024-06-04 01:27:35
领域: cs.CV,cs.AI
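MocoRank's exact formulation is in the paper; as a generic illustration of enforcing an ordinal pattern across engagement levels, a pairwise margin-ranking loss might look like the sketch below (our construction, not the released code):

    import torch
    import torch.nn.functional as F

    def ordinal_margin_loss(scores, levels, margin=0.5):
        """Encourage the model score to increase with engagement level:
        for every pair (i, j) with levels[i] > levels[j], require
        scores[i] >= scores[j] + margin (hinge penalty otherwise)."""
        s_i = scores.unsqueeze(1)                                  # (n, 1)
        s_j = scores.unsqueeze(0)                                  # (1, n)
        higher = (levels.unsqueeze(1) > levels.unsqueeze(0)).float()
        viol = F.relu(margin - (s_i - s_j))                        # hinge on ordered pairs
        return (viol * higher).sum() / higher.sum().clamp(min=1.0)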
Observable Propagation: Uncovering Feature Vectors in Transformers
A key goal of current mechanistic interpretability research in NLP is to find linear features (also called "feature vectors") for transformers: directions in activation space corresponding to concepts that are used by a given model in its computation. Present state-of-the-art methods for finding linear features require large amounts of labelled data -- both laborious to acquire and computationally expensive to utilize. In this work, we introduce a novel method, called "observable propagation" (in short: ObProp), for finding linear features used by transformer language models in computing a given task -- using almost no data. Our paradigm centers on the concept of "observables", linear functionals corresponding to given tasks. We then introduce a mathematical theory for the analysis of feature vectors, including a similarity metric between feature vectors called the coupling coefficient which estimates the degree to which one feature's output correlates with another's. We use ObProp to perform extensive qualitative investigations into several tasks, including gendered occupational bias, political party prediction, and programming language detection. Our results suggest that ObProp surpasses traditional approaches for finding feature vectors in the low-data regime, and that ObProp can be used to better understand the mechanisms responsible for bias in large language models.
Updated: 2024-06-04 01:26:01
标题: 可观测传播:揭示变压器中的特征向量
摘要: 当前自然语言处理领域机制可解释性研究的一个关键目标是为Transformer找到线性特征(也称"特征向量"):即激活空间中与给定模型在计算中所用概念对应的方向。目前寻找线性特征的最先进方法需要大量标注数据,既难以获取又计算昂贵。在本研究中,我们介绍了一种称为"可观测传播"(简称ObProp)的新方法,用于在几乎不使用数据的情况下,找到Transformer语言模型在计算给定任务时所用的线性特征。我们的范式围绕"可观测量"的概念,即与给定任务对应的线性泛函。随后我们提出了一个用于分析特征向量的数学理论,其中包括一种称为耦合系数的特征向量相似性度量,用于估计一个特征的输出与另一个特征输出的相关程度。我们使用ObProp对多项任务进行了广泛的定性研究,包括性别职业偏见、政党预测和编程语言检测。我们的结果表明,在低数据情形下,ObProp在寻找特征向量方面超越了传统方法,并且ObProp可用于更好地理解导致大型语言模型偏见的机制。
更新时间: 2024-06-04 01:26:01
领域: cs.LG,cs.CL
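The coupling coefficient is defined precisely in the paper; one plausible minimal reading (our assumption, not the authors' definition) is a normalized inner product between two feature vectors, which estimates how much activations that excite one feature also excite the other:

    import numpy as np

    def coupling_coefficient(u: np.ndarray, v: np.ndarray) -> float:
        """Illustrative reading: cosine of the angle between feature
        vectors u and v in activation space."""
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))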
AI Agents Under Threat: A Survey of Key Security Challenges and Future Pathways
An Artificial Intelligence (AI) agent is a software entity that autonomously performs tasks or makes decisions based on pre-defined objectives and data inputs. AI agents, capable of perceiving user inputs, reasoning and planning tasks, and executing actions, have seen remarkable advancements in algorithm development and task performance. However, the security challenges they pose remain under-explored and unresolved. This survey delves into the emerging security threats faced by AI agents, categorizing them into four critical knowledge gaps: unpredictability of multi-step user inputs, complexity in internal executions, variability of operational environments, and interactions with untrusted external entities. By systematically reviewing these threats, this paper highlights both the progress made and the existing limitations in safeguarding AI agents. The insights provided aim to inspire further research into addressing the security threats associated with AI agents, thereby fostering the development of more robust and secure AI agent applications.
Updated: 2024-06-04 01:22:31
标题: 人工智能代理受到威胁:关键安全挑战和未来发展途径调查
摘要: 人工智能(AI)代理是一种软件实体,它基于预定义的目标和数据输入自主执行任务或做出决策。能够感知用户输入、进行推理和任务规划并执行动作的AI代理,在算法开发和任务表现方面取得了显著进展。然而,它们带来的安全挑战仍未得到充分探讨和解决。本综述深入探讨了AI代理面临的新兴安全威胁,并将其归为四个关键的知识缺口:多步用户输入的不可预测性、内部执行的复杂性、运行环境的多变性,以及与不受信任的外部实体的交互。通过系统地审视这些威胁,本文既强调了已取得的进展,也指出了在保护AI代理方面现存的局限。所提供的见解旨在激发针对AI代理相关安全威胁的进一步研究,从而促进更健壮、更安全的AI代理应用的发展。
更新时间: 2024-06-04 01:22:31
领域: cs.CR,cs.AI
Privacy-Preserving CNN Training with Transfer Learning: Multiclass Logistic Regression
In this paper, we present a practical solution to implement privacy-preserving CNN training based on mere Homomorphic Encryption (HE) technique. To our best knowledge, this is the first successful attempt to crack this nut; no prior work has achieved this goal. Several techniques combine to accomplish the task: (1) with transfer learning, privacy-preserving CNN training can be reduced to homomorphic neural network training, or even multiclass logistic regression (MLR) training; (2) via a faster gradient variant called $\texttt{Quadratic Gradient}$, an enhanced gradient method for MLR with a state-of-the-art performance in convergence speed is applied in this work to achieve high performance; (3) we employ the thought of transformation in mathematics to transform approximating the Softmax function in the encryption domain into the approximation of the Sigmoid function. A new type of loss function termed $\texttt{Squared Likelihood Error}$ has been developed alongside to align with this change; and (4) we use a simple but flexible matrix-encoding method named $\texttt{Volley Revolver}$ to manage the data flow in the ciphertexts, which is the key factor to complete the whole homomorphic CNN training. The complete, runnable C++ code to implement our work can be found at: \href{https://github.com/petitioner/HE.CNNtraining}{$\texttt{https://github.com/petitioner/HE.CNNtraining}$}. We select $\texttt{REGNET\_X\_400MF}$ as our pre-trained model for transfer learning. We use the first 128 MNIST training images as training data and the whole MNIST testing dataset as the testing data. The client only needs to upload 6 ciphertexts to the cloud and it takes $\sim 21$ mins to perform 2 iterations on a cloud with 64 vCPUs, resulting in a precision of $21.49\%$.
Updated: 2024-06-04 01:18:21
标题: 隐私保护的卷积神经网络训练与迁移学习:多类逻辑回归
摘要: 在这篇论文中,我们提出了一种仅基于同态加密(HE)技术实现隐私保护CNN训练的实用方案。据我们所知,这是首次成功攻克这一难题,此前没有任何工作实现过这一目标。我们结合多种技术来完成这项任务:(1)借助迁移学习,隐私保护的CNN训练可以简化为同态神经网络训练,甚至是多类别逻辑回归(MLR)训练;(2)借助一种称为"Quadratic Gradient"的更快梯度变体,本文采用了一种在收敛速度上具有最先进性能的MLR增强梯度方法,以实现高性能;(3)我们运用数学中的变换思想,将加密域中对Softmax函数的近似转化为对Sigmoid函数的近似,并随之开发了一种称为"Squared Likelihood Error"的新型损失函数以适应这一变化;(4)我们使用一种简单而灵活的矩阵编码方法"Volley Revolver"来管理密文中的数据流,这是完成整个同态CNN训练的关键因素。实现我们工作的完整可运行C++代码见:https://github.com/petitioner/HE.CNNtraining。我们选择REGNET_X_400MF作为迁移学习的预训练模型,使用前128张MNIST训练图像作为训练数据,整个MNIST测试集作为测试数据。客户端只需向云端上传6个密文,在拥有64个vCPU的云端上执行2次迭代约需21分钟,精度为21.49%。
更新时间: 2024-06-04 01:18:21
领域: cs.CR,cs.CV,cs.LG
Contextualized Diffusion Models for Text-Guided Image and Video Generation
Conditional diffusion models have exhibited superior performance in high-fidelity text-guided visual generation and editing. Nevertheless, prevailing text-guided visual diffusion models primarily focus on incorporating text-visual relationships exclusively into the reverse process, often disregarding their relevance in the forward process. This inconsistency between forward and reverse processes may limit the precise conveyance of textual semantics in visual synthesis results. To address this issue, we propose a novel and general contextualized diffusion model (ContextDiff) by incorporating the cross-modal context encompassing interactions and alignments between text condition and visual sample into forward and reverse processes. We propagate this context to all timesteps in the two processes to adapt their trajectories, thereby facilitating cross-modal conditional modeling. We generalize our contextualized diffusion to both DDPMs and DDIMs with theoretical derivations, and demonstrate the effectiveness of our model in evaluations with two challenging tasks: text-to-image generation, and text-to-video editing. In each task, our ContextDiff achieves new state-of-the-art performance, significantly enhancing the semantic alignment between text condition and generated samples, as evidenced by quantitative and qualitative evaluations. Our code is available at https://github.com/YangLing0818/ContextDiff
Updated: 2024-06-04 01:08:56
标题: 上下文化扩散模型用于文本引导的图像和视频生成
摘要: 条件扩散模型在高保真的文本引导视觉生成与编辑中表现出优越性能。然而,现有的文本引导视觉扩散模型主要只在反向过程中纳入文本-视觉关系,往往忽视了其在正向过程中的作用。正向与反向过程之间的这种不一致可能限制文本语义在视觉合成结果中的精确传达。为解决这一问题,我们提出了一种新颖且通用的上下文化扩散模型(ContextDiff),将涵盖文本条件与视觉样本之间交互和对齐的跨模态上下文纳入正向和反向过程。我们将该上下文传播到两个过程的所有时间步以调整其轨迹,从而促进跨模态条件建模。我们通过理论推导将上下文化扩散推广到DDPM和DDIM,并在两个具有挑战性的任务(文本到图像生成、文本到视频编辑)的评估中证明了模型的有效性。在每个任务中,我们的ContextDiff都取得了新的最先进性能,定量和定性评估均表明其显著提升了文本条件与生成样本之间的语义对齐。我们的代码可在https://github.com/YangLing0818/ContextDiff获取。
更新时间: 2024-06-04 01:08:56
领域: cs.CV,cs.AI,cs.LG
GRAM: Generative Retrieval Augmented Matching of Data Schemas in the Context of Data Security
Schema matching constitutes a pivotal phase in the data ingestion process for contemporary database systems. Its objective is to discern pairwise similarities between two sets of attributes, each associated with a distinct data table. This challenge emerges at the initial stages of data analytics, such as when incorporating a third-party table into existing databases to inform business insights. Given its significance in the realm of database systems, schema matching has been under investigation since the 2000s. This study revisits this foundational problem within the context of large language models. Adhering to increasingly stringent data security policies, our focus lies on the zero-shot and few-shot scenarios: the model should analyze only a minimal amount of customer data to execute the matching task, contrasting with the conventional approach of scrutinizing the entire data table. We emphasize that the zero-shot or few-shot assumption is imperative to safeguard the identity and privacy of customer data, even at the potential cost of accuracy. The capability to accurately match attributes under such stringent requirements distinguishes our work from previous literature in this domain.
Updated: 2024-06-04 01:08:00
标题: GRAM:在数据安全的背景下生成检索增强匹配数据模式
摘要: 模式匹配是当代数据库系统数据摄取过程中的关键环节。其目标是识别分别隶属于两张不同数据表的两组属性之间的成对相似性。这一挑战出现在数据分析的初始阶段,例如将第三方表格整合进现有数据库以支撑业务洞察。鉴于其在数据库系统领域的重要性,模式匹配自2000年代以来一直受到研究。本研究在大型语言模型的背景下重新审视这一基础性问题。为遵循日益严格的数据安全政策,我们聚焦于零样本和少样本场景:模型应仅分析极少量的客户数据即可执行匹配任务,这与审查整张数据表的传统做法形成对比。我们强调,零样本或少样本假设对于保护客户数据的身份和隐私至关重要,即便可能以牺牲部分准确性为代价。在如此严格的要求下准确匹配属性的能力,使我们的工作有别于该领域先前的文献。
更新时间: 2024-06-04 01:08:00
领域: cs.DB,cs.AI,cs.CL,cs.IR,cs.LG
CR-UTP: Certified Robustness against Universal Text Perturbations
It is imperative to ensure the stability of every prediction made by a language model; that is, a language model's prediction should remain consistent despite minor input variations, like word substitutions. In this paper, we investigate the problem of certifying a language model's robustness against Universal Text Perturbations (UTPs), which have been widely used in universal adversarial attacks and backdoor attacks. Existing certified robustness based on random smoothing has shown considerable promise in certifying the input-specific text perturbations (ISTPs), operating under the assumption that any random alteration of a sample's clean or adversarial words would negate the impact of sample-wise perturbations. However, with UTPs, masking only the adversarial words can eliminate the attack. A naive method is to simply increase the masking ratio and the likelihood of masking attack tokens, but it leads to a significant reduction in both certified accuracy and the certified radius due to input corruption by extensive masking. To solve this challenge, we introduce a novel approach, the superior prompt search method, designed to identify a superior prompt that maintains higher certified accuracy under extensive masking. Additionally, we theoretically motivate why ensembles are a particularly suitable choice as base prompts for random smoothing. The method is denoted by superior prompt ensembling technique. We also empirically confirm this technique, obtaining state-of-the-art results in multiple settings. These methodologies, for the first time, enable high certified accuracy against both UTPs and ISTPs. The source code of CR-UTP is available at https://github.com/UCFML-Research/CR-UTP.
Updated: 2024-06-04 01:02:22
标题: CR-UTP:通用文本扰动的认证稳健性
摘要: 确保语言模型的每个预测都稳定至关重要;也就是说,语言模型的预测应当在输入发生微小变化(例如词语替换)时保持一致。在本文中,我们研究了针对通用文本扰动(UTPs)认证语言模型稳健性的问题,这类扰动已被广泛用于通用对抗攻击和后门攻击。现有基于随机平滑的认证稳健性在认证输入特定文本扰动(ISTPs)方面展现出相当大的潜力,其假设是对样本的干净或对抗性词语的任何随机改变都会抵消样本级扰动的影响。然而对于UTPs,仅掩蔽对抗性词语即可消除攻击。一种朴素的方法是简单地提高掩蔽比例从而提高掩蔽攻击词元的可能性,但大范围掩蔽会破坏输入,导致认证准确率和认证半径显著下降。为解决这一挑战,我们提出了一种新方法,即优越提示搜索,旨在找到一个在大范围掩蔽下仍保持较高认证准确率的优越提示。此外,我们从理论上说明了为什么集成特别适合作为随机平滑的基础提示,该方法称为优越提示集成技术。我们也在实验上验证了这一技术,在多种设置下取得了最先进的结果。这些方法首次同时实现了针对UTPs和ISTPs的高认证准确率。CR-UTP的源代码可在https://github.com/UCFML-Research/CR-UTP获取。
更新时间: 2024-06-04 01:02:22
领域: cs.CL,cs.CR,cs.LG
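To make the random-smoothing setup concrete, here is a generic masking-and-voting sketch (our simplification; the paper's certification math, superior prompt search, and prompt ensembling are not shown):

    import random
    from collections import Counter

    def smoothed_predict(classify, tokens, mask_ratio=0.3, n_samples=100, mask="[MASK]"):
        """Randomized-smoothing sketch: vote over predictions on randomly
        masked copies of the input. `classify` maps a token list to a label;
        the certified-radius computation is omitted."""
        votes = Counter()
        for _ in range(n_samples):
            masked = [mask if random.random() < mask_ratio else t for t in tokens]
            votes[classify(masked)] += 1
        label, count = votes.most_common(1)[0]
        return label, count / n_samples      # top label and its empirical vote share

The tension the paper addresses is visible here: raising `mask_ratio` makes it likelier that universal trigger tokens get masked, but also corrupts the input that `classify` sees.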
Unveiling the Cycloid Trajectory of EM Iterations in Mixed Linear Regression
We study the trajectory of iterations and the convergence rates of the Expectation-Maximization (EM) algorithm for two-component Mixed Linear Regression (2MLR). The fundamental goal of MLR is to learn the regression models from unlabeled observations. The EM algorithm finds extensive applications in solving the mixture of linear regressions. Recent results have established the super-linear convergence of EM for 2MLR in the noiseless and high SNR settings under some assumptions and its global convergence rate with random initialization has been affirmed. However, the exponent of convergence has not been theoretically estimated and the geometric properties of the trajectory of EM iterations are not well-understood. In this paper, first, using Bessel functions we provide explicit closed-form expressions for the EM updates under all SNR regimes. Then, in the noiseless setting, we completely characterize the behavior of EM iterations by deriving a recurrence relation at the population level and notably show that all the iterations lie on a certain cycloid. Based on this new trajectory-based analysis, we exhibit the theoretical estimate for the exponent of super-linear convergence and further improve the statistical error bound at the finite-sample level. Our analysis provides a new framework for studying the behavior of EM for Mixed Linear Regression.
Updated: 2024-06-04 00:56:22
标题: 揭示混合线性回归中EM迭代的摆线轨迹
摘要: 我们研究了期望最大化(EM)算法在双成分混合线性回归(2MLR)中的迭代轨迹和收敛速率。MLR的基本目标是从无标签观测中学习回归模型,而EM算法在求解混合线性回归方面应用广泛。最近的结果已经确立了在无噪声和高信噪比设定下、在某些假设条件下EM对2MLR的超线性收敛性,其随机初始化下的全局收敛速率也已得到确认。然而,收敛指数尚未得到理论估计,EM迭代轨迹的几何性质也未被很好地理解。在本文中,我们首先利用Bessel函数给出了所有信噪比范围下EM更新的显式闭式表达式。然后,在无噪声设定下,我们通过推导总体层面的递推关系完整刻画了EM迭代的行为,并且值得注意地证明了所有迭代都位于某条摆线上。基于这种新的基于轨迹的分析,我们给出了超线性收敛指数的理论估计,并进一步改进了有限样本层面的统计误差界。我们的分析为研究混合线性回归的EM算法行为提供了一个新框架。
更新时间: 2024-06-04 00:56:22
领域: cs.LG,math.ST,stat.ML,stat.TH
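For the symmetric two-component setting the paper analyzes, the finite-sample EM iteration can be written in a few lines (a standard derivation under our assumptions of equal mixing weights and components $\pm\beta$; this is not the authors' code):

    import numpy as np

    def em_2mlr(X, y, iters=50, sigma=1.0):
        """EM for symmetric two-component mixed linear regression, y ~ ±<beta, x> + noise."""
        _, d = X.shape
        beta = np.random.randn(d)
        for _ in range(iters):
            # E-step: posterior responsibility of the '+' component per sample
            r = 1.0 / (1.0 + np.exp(-2.0 * y * (X @ beta) / sigma**2))
            # M-step: weighted least squares with signed targets, w = 2r - 1 in [-1, 1]
            w = 2.0 * r - 1.0
            beta = np.linalg.lstsq(X, w * y, rcond=None)[0]
        return beta

The paper's contribution is the population-level analysis of exactly this map: its closed form via Bessel functions across SNR regimes and the cycloid traced by the noiseless iterates.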
SSNet: A Lightweight Multi-Party Computation Scheme for Practical Privacy-Preserving Machine Learning Service in the Cloud
As privacy-preserving becomes a pivotal aspect of deep learning (DL) development, multi-party computation (MPC) has gained prominence for its efficiency and strong security. However, the practice of current MPC frameworks is limited, especially when dealing with large neural networks, exemplified by the prolonged execution time of 25.8 seconds for secure inference on ResNet-152. The primary challenge lies in the reliance of current MPC approaches on additive secret sharing, which incurs significant communication overhead with non-linear operations such as comparisons. Furthermore, additive sharing suffers from poor scalability on party size. In contrast, the evolving landscape of MPC necessitates accommodating a larger number of compute parties and ensuring robust performance against malicious activities or computational failures. In light of these challenges, we propose SSNet, which for the first time, employs Shamir's secret sharing (SSS) as the backbone of MPC-based ML framework. We meticulously develop all framework primitives and operations for secure DL models tailored to seamlessly integrate with the SSS scheme. SSNet demonstrates the ability to scale up party numbers straightforwardly and embeds strategies to authenticate the computation correctness without incurring significant performance overhead. Additionally, SSNet introduces masking strategies designed to reduce communication overhead associated with non-linear operations. We conduct comprehensive experimental evaluations on commercial cloud computing infrastructure from Amazon AWS, as well as across diverse prevalent DNN models and datasets. SSNet demonstrates a substantial performance boost, achieving speed-ups ranging from 3x to 14x compared to SOTA MPC frameworks. Moreover, SSNet also represents the first framework that is evaluated on a five-party computation setup, in the context of secure DL inference.
Updated: 2024-06-04 00:55:06
标题: SSNet:一种轻量级多方计算方案,用于云中实际的隐私保护机器学习服务
摘要: 随着隐私保护成为深度学习(DL)开发中的关键环节,多方计算(MPC)因其高效性和强安全性而备受关注。然而,当前MPC框架的实用性有限,尤其是在处理大型神经网络时,例如在ResNet-152上进行安全推断需要长达25.8秒的执行时间。主要挑战在于当前MPC方法依赖加法秘密共享,这在比较等非线性操作中带来显著的通信开销。此外,加法共享在参与方数量上的可扩展性较差。与此相对,MPC的发展趋势要求容纳更多计算参与方,并确保在面对恶意行为或计算故障时仍具有稳健性能。面对这些挑战,我们提出了SSNet,首次将Shamir秘密共享(SSS)作为基于MPC的机器学习框架的骨干。我们精心设计了面向安全DL模型的全部框架原语和操作,使其与SSS方案无缝集成。SSNet能够直接扩展参与方数量,并内嵌了验证计算正确性的策略,而不会带来显著的性能开销。此外,SSNet引入了旨在降低非线性操作通信开销的掩蔽策略。我们在亚马逊AWS的商业云计算基础设施上,针对多种主流DNN模型和数据集进行了全面的实验评估。与SOTA MPC框架相比,SSNet实现了3倍到14倍的显著加速。此外,在安全DL推断的背景下,SSNet也是首个在五方计算设置下得到评估的框架。
更新时间: 2024-06-04 00:55:06
领域: cs.CR,cs.LG
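SSNet's backbone, Shamir's secret sharing, is easy to state in a self-contained toy form; the sketch below shows only the share/reconstruct primitives over a prime field, not the paper's optimized DL protocol:

    import random

    P = 2**61 - 1  # Mersenne prime defining the field for the arithmetic

    def share(secret, n, t):
        """Split `secret` into n shares; any t of them reconstruct it."""
        coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
        def f(x):
            acc = 0
            for c in reversed(coeffs):       # Horner evaluation of the polynomial mod P
                acc = (acc * x + c) % P
            return acc
        return [(x, f(x)) for x in range(1, n + 1)]

    def reconstruct(shares):
        """Lagrange interpolation at x = 0 over the prime field."""
        secret = 0
        for i, (xi, yi) in enumerate(shares):
            num, den = 1, 1
            for j, (xj, _) in enumerate(shares):
                if i != j:
                    num = (num * (-xj)) % P
                    den = (den * (xi - xj)) % P
            secret = (secret + yi * num * pow(den, -1, P)) % P
        return secret

    shares = share(123456789, n=5, t=3)
    assert reconstruct(shares[:3]) == 123456789   # any 3 of the 5 shares suffice

The (n, t) structure is what gives SSNet its straightforward party scaling: fewer than t colluding parties learn nothing about the shared value.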
A Survey of Unikernel Security: Insights and Trends from a Quantitative Analysis
Unikernels, an evolution of LibOSs, are emerging as a virtualization technology to rival those currently used by cloud providers. Unikernels combine the user and kernel space into one "uni"fied memory space and omit functionality that is not necessary for its application to run, thus drastically reducing the required resources. The removed functionality however is far-reaching and includes components that have become common security technologies such as Address Space Layout Randomization (ASLR), Data Execution Prevention (DEP), and Non-executable bits (NX bits). This raises questions about the real-world security of unikernels. This research presents a quantitative methodology using TF-IDF to analyze the focus of security discussions within unikernel research literature. Based on a corpus of 33 unikernel-related papers spanning 2013-2023, our analysis found that Memory Protection Extensions and Data Execution Prevention were the least frequently occurring topics, while SGX was the most frequent topic. The findings quantify priorities and assumptions in unikernel security research, bringing to light potential risks from underexplored attack surfaces. The quantitative approach is broadly applicable for revealing trends and gaps in niche security domains.
Updated: 2024-06-04 00:51:12
标题: Unikernel安全综述:来自定量分析的见解与趋势
摘要: Unikernel作为LibOS的演进,正在成为一种可与云服务商当前所用方案相抗衡的虚拟化技术。Unikernel将用户空间和内核空间合并为一个"统一"的内存空间,并省去应用运行所不需要的功能,从而大幅降低所需资源。然而,被移除的功能影响深远,其中包括已成为常见安全技术的组件,如地址空间布局随机化(ASLR)、数据执行保护(DEP)和不可执行位(NX位)。这引发了对unikernel现实安全性的质疑。本研究提出了一种使用TF-IDF分析unikernel研究文献中安全讨论焦点的定量方法。基于涵盖2013-2023年的33篇unikernel相关论文的语料库,我们的分析发现,内存保护扩展和数据执行保护是出现频率最低的主题,而SGX是最常见的主题。这些发现量化了unikernel安全研究中的优先事项和假设,揭示了来自探索不足的攻击面的潜在风险。这种定量方法可广泛用于揭示小众安全领域中的趋势与缺口。
更新时间: 2024-06-04 00:51:12
领域: cs.CR,cs.DC,cs.OS,C.2.4; K.6.5; D.4.6
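The TF-IDF methodology is straightforward to reproduce; a sketch with scikit-learn, where the tracked term list and the aggregation are our illustrative assumptions rather than the survey's exact pipeline:

    from sklearn.feature_extraction.text import TfidfVectorizer

    papers = ["...full text of paper 1...", "...full text of paper 2..."]  # corpus placeholder
    terms = ["aslr", "dep", "sgx", "nx"]                                   # tracked security topics

    vec = TfidfVectorizer(vocabulary=terms, lowercase=True)
    scores = vec.fit_transform(papers).toarray()   # (n_papers, n_terms) TF-IDF matrix
    topic_weight = scores.sum(axis=0)              # aggregate emphasis per topic across the corpus
    print(dict(zip(terms, topic_weight)))

Ranking `topic_weight` across a corpus of this kind is what surfaces the over- and under-discussed topics the survey reports.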
Adversarial Attacks on Combinatorial Multi-Armed Bandits
We study reward poisoning attacks on Combinatorial Multi-armed Bandits (CMAB). We first provide a sufficient and necessary condition for the attackability of CMAB, a notion to capture the vulnerability and robustness of CMAB. The attackability condition depends on the intrinsic properties of the corresponding CMAB instance such as the reward distributions of super arms and outcome distributions of base arms. Additionally, we devise an attack algorithm for attackable CMAB instances. Contrary to prior understanding of multi-armed bandits, our work reveals a surprising fact that the attackability of a specific CMAB instance also depends on whether the bandit instance is known or unknown to the adversary. This finding indicates that adversarial attacks on CMAB are difficult in practice and a general attack strategy for any CMAB instance does not exist since the environment is mostly unknown to the adversary. We validate our theoretical findings via extensive experiments on real-world CMAB applications including probabilistic maximum covering problem, online minimum spanning tree, cascading bandits for online ranking, and online shortest path.
Updated: 2024-06-04 00:49:53
标题: 对组合多臂赌博机的对抗性攻击
摘要: 我们研究了针对组合多臂老虎机(CMAB)的奖励投毒攻击。我们首先给出了CMAB可攻击性的充分必要条件,这一概念用于刻画CMAB的脆弱性与稳健性。可攻击性条件取决于相应CMAB实例的内在性质,例如超臂的奖励分布和基臂的结果分布。此外,我们为可攻击的CMAB实例设计了一种攻击算法。与以往对多臂老虎机的认识相反,我们的工作揭示了一个令人惊讶的事实:特定CMAB实例的可攻击性还取决于该老虎机实例对攻击者而言是已知的还是未知的。这一发现表明,由于环境对攻击者而言大多是未知的,在实践中对CMAB实施对抗攻击是困难的,并且不存在适用于任意CMAB实例的通用攻击策略。我们在真实世界的CMAB应用上通过大量实验验证了我们的理论发现,这些应用包括概率最大覆盖问题、在线最小生成树、用于在线排序的级联老虎机以及在线最短路径。
更新时间: 2024-06-04 00:49:53
领域: cs.LG,cs.DS,stat.ML
Understanding Stochastic Natural Gradient Variational Inference
Stochastic natural gradient variational inference (NGVI) is a popular posterior inference method with applications in various probabilistic models. Despite its wide usage, little is known about the non-asymptotic convergence rate in the stochastic setting. We aim to lessen this gap and provide a better understanding. For conjugate likelihoods, we prove the first $\mathcal{O}(\frac{1}{T})$ non-asymptotic convergence rate of stochastic NGVI. The complexity is no worse than stochastic gradient descent (a.k.a. black-box variational inference) and the rate likely has better constant dependency that leads to faster convergence in practice. For non-conjugate likelihoods, we show that stochastic NGVI with the canonical parameterization implicitly optimizes a non-convex objective. Thus, a global convergence rate of $\mathcal{O}(\frac{1}{T})$ is unlikely without some significant new understanding of optimizing the ELBO using natural gradients.
Updated: 2024-06-04 00:45:37
标题: 理解随机自然梯度变分推断
摘要: 随机自然梯度变分推断(NGVI)是一种流行的后验推断方法,应用于各种概率模型。尽管其使用广泛,人们对其在随机设定下的非渐近收敛速率知之甚少。我们旨在缩小这一差距并提供更好的理解。对于共轭似然,我们证明了随机NGVI的第一个$\mathcal{O}(\frac{1}{T})$非渐近收敛速率。其复杂度不差于随机梯度下降(又称黑盒变分推断),且该速率可能具有更好的常数依赖,从而在实践中收敛更快。对于非共轭似然,我们表明采用规范参数化的随机NGVI隐式地优化一个非凸目标。因此,若没有对使用自然梯度优化ELBO的重大新理解,$\mathcal{O}(\frac{1}{T})$的全局收敛速率是不太可能的。
更新时间: 2024-06-04 00:45:37
领域: cs.LG,stat.ML
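For context, the stochastic NGVI iterate analyzed for conjugate models is commonly written (following Hoffman et al.'s stochastic variational inference; the notation is ours, not the paper's) as an exponentially weighted update on the natural parameters $\lambda$:

$\lambda_{t+1} = (1-\rho_t)\,\lambda_t + \rho_t\,\hat{\lambda}_t,$

where $\hat{\lambda}_t$ is the natural parameter implied by a minibatch estimate of the expected sufficient statistics and $\rho_t$ is the step size; the $\mathcal{O}(\frac{1}{T})$ rate above concerns iterates of exactly this form.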
Implicit Regularization in Feedback Alignment Learning Mechanisms for Neural Networks
Feedback Alignment (FA) methods are biologically inspired local learning rules for training neural networks with reduced communication between layers. While FA has potential applications in distributed and privacy-aware ML, limitations in multi-class classification and lack of theoretical understanding of the alignment mechanism have constrained its impact. This study introduces a unified framework elucidating the operational principles behind alignment in FA. Our key contributions include: (1) a novel conservation law linking changes in synaptic weights to implicit regularization that maintains alignment with the gradient, with support from experiments, (2) sufficient conditions for convergence based on the concept of alignment dominance, and (3) empirical analysis showing better alignment can enhance FA performance on complex multi-class tasks. Overall, these theoretical and practical advancements improve interpretability of bio-plausible learning rules and provide groundwork for developing enhanced FA algorithms.
Updated: 2024-06-04 00:42:04
标题: 神经网络中反馈对齐学习机制中的隐式正则化
摘要: 反馈对齐(Feedback Alignment,FA)方法是受生物启发的局部学习规则,用于训练神经网络并降低层间通信。虽然FA在分布式和隐私感知机器学习中具有潜在应用,但在多类分类和对齐机制的理论理解方面存在限制,限制了其影响。本研究引入了一个统一的框架,阐明了FA中对齐的操作原理。我们的关键贡献包括:(1)一个新颖的守恒定律,将突触权重的变化与隐式正则化联系起来,保持与梯度的对齐,得到实验证明支持;(2)基于对齐优势概念的收敛充分条件;(3)经验分析显示更好的对齐可以提高FA在复杂多类任务上的性能。总的来说,这些理论和实践进步提高了生物合理学习规则的可解释性,并为开发改进的FA算法奠定了基础。
更新时间: 2024-06-04 00:42:04
领域: cs.LG,stat.ML
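The mechanism under study is compact enough to sketch: the backward pass routes the error through a fixed random matrix instead of the transpose of the forward weights, and training nonetheless drives the forward weights into alignment with it. A minimal two-layer sketch under our own assumptions (tanh hidden layer, squared-error output):

    import numpy as np

    rng = np.random.default_rng(0)
    d_in, d_h, d_out, lr = 4, 16, 2, 0.1
    W1 = rng.normal(0, 0.5, (d_h, d_in))
    W2 = rng.normal(0, 0.5, (d_out, d_h))
    B2 = rng.normal(0, 0.5, (d_out, d_h))   # fixed random feedback matrix, never trained

    def fa_step(x, y):
        global W1, W2
        h = np.tanh(W1 @ x)                  # forward pass
        yhat = W2 @ h
        e = yhat - y                         # output error
        # Feedback Alignment: propagate the error through fixed B2, not W2.T
        dh = (B2.T @ e) * (1 - h**2)
        W2 -= lr * np.outer(e, h)
        W1 -= lr * np.outer(dh, x)

The paper's conservation law concerns how these updates implicitly regularize W2 toward alignment with B2, which is why the random feedback path still yields useful gradients.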
Fruit Classification System with Deep Learning and Neural Architecture Search
The fruit identification process involves analyzing and categorizing different types of fruits based on their visual characteristics. This activity can be achieved using a range of methodologies, encompassing manual examination, conventional computer vision methodologies, and more sophisticated methodologies employing machine learning and deep learning. Our study identified a total of 15 distinct categories of fruit, consisting of the classes Avocado, Banana, Cherry, Apple Braeburn, Apple golden 1, Apricot, Grape, Kiwi, Mango, Orange, Papaya, Peach, Pineapple, Pomegranate and Strawberry. Neural Architecture Search (NAS) is a technological advancement employed within the realm of deep learning and artificial intelligence to automate the conceptualization and refinement of neural network topologies. NAS aims to identify neural network structures that are highly suitable for tasks such as the detection of fruits. Our proposed model, with 99.98% mAP, improved on the detection performance of the preceding research study that used fruit datasets. In addition, after the completion of the study, a comparative analysis was carried out to assess our findings in conjunction with those of another research study connected to the topic. Compared to the findings of earlier studies, the proposed detector exhibited higher performance in terms of both accuracy and precision.
Updated: 2024-06-04 00:41:47
标题: 基于深度学习和神经架构搜索的水果分类系统
摘要: 水果识别过程涉及基于视觉特征对不同类型的水果进行分析和分类。这一任务可以通过一系列方法实现,包括人工检查、传统的计算机视觉方法,以及采用机器学习和深度学习的更复杂方法。我们的研究共涉及15个不同的水果类别,包括Avocado、Banana、Cherry、Apple Braeburn、Apple golden 1、Apricot、Grape、Kiwi、Mango、Orange、Papaya、Peach、Pineapple、Pomegranate和Strawberry。神经架构搜索(NAS)是深度学习和人工智能领域的一项技术进步,用于自动化神经网络拓扑结构的构思和优化。NAS旨在找出高度适合特定任务(例如水果检测)的神经网络结构。我们提出的模型以99.98%的mAP提升了此前使用水果数据集的研究的检测性能。此外,在研究完成后,我们进行了对比分析,将研究结果与另一项相关研究的结果一并评估。与早期研究的结果相比,所提出的检测器在准确率和精确率方面均表现更好。
更新时间: 2024-06-04 00:41:47
领域: cs.CV,cs.AI,I.2; I.4
Disguised Copyright Infringement of Latent Diffusion Models
Copyright infringement may occur when a generative model produces samples substantially similar to some copyrighted data that it had access to during the training phase. The notion of access usually refers to including copyrighted samples directly in the training dataset, which one may inspect to identify an infringement. We argue that such visual auditing largely overlooks a concealed copyright infringement, where one constructs a disguise that looks drastically different from the copyrighted sample yet still induces the effect of training Latent Diffusion Models on it. Such disguises only require indirect access to the copyrighted material and cannot be visually distinguished, thus easily circumventing the current auditing tools. In this paper, we provide a better understanding of such disguised copyright infringement by uncovering the disguises generation algorithm, the revelation of the disguises, and importantly, how to detect them to augment the existing toolbox. Additionally, we introduce a broader notion of acknowledgment for comprehending such indirect access. Our code is available at https://github.com/watml/disguised_copyright_infringement.
Updated: 2024-06-04 00:35:06
标题: 潜在扩散模型的伪装版权侵权
摘要: 当生成模型产生的样本与其在训练阶段接触过的某些受版权保护的数据高度相似时,可能构成版权侵权。这里的"接触"通常指将受版权保护的样本直接包含在训练数据集中,人们可以通过检查训练集来识别侵权。我们认为,这种视觉审计在很大程度上忽视了一种隐蔽的版权侵权:构造一种伪装,它看起来与受版权保护的样本截然不同,却仍能诱导出在该样本上训练潜在扩散模型的效果。这种伪装只需要对受版权材料的间接接触,且无法从视觉上分辨,因此很容易绕过现有的审计工具。在本文中,我们通过揭示伪装的生成算法、伪装的识别方法,以及重要的检测手段来加深对这种伪装式版权侵权的理解,以扩充现有工具箱。此外,我们引入了更宽泛的"知悉"(acknowledgment)概念来理解这种间接接触。我们的代码可在https://github.com/watml/disguised_copyright_infringement获取。
更新时间: 2024-06-04 00:35:06
领域: cs.LG,cs.CR
On the Robustness of LDP Protocols for Numerical Attributes under Data Poisoning Attacks
Recent studies reveal that local differential privacy (LDP) protocols are vulnerable to data poisoning attacks where an attacker can manipulate the final estimate on the server by leveraging the characteristics of LDP and sending carefully crafted data from a small fraction of controlled local clients. This vulnerability raises concerns regarding the robustness and reliability of LDP in hostile environments. In this paper, we conduct a systematic investigation of the robustness of state-of-the-art LDP protocols for numerical attributes, i.e., categorical frequency oracles (CFOs) with binning and consistency, and distribution reconstruction. We evaluate protocol robustness through an attack-driven approach and propose new metrics for cross-protocol attack gain measurement. The results indicate that Square Wave and CFO-based protocols in the Server setting are more robust against the attack compared to the CFO-based protocols in the User setting. Our evaluation also unfolds new relationships between LDP security and its inherent design choices. We found that the hash domain size in local-hashing-based LDP has a profound impact on protocol robustness beyond the well-known effect on utility. Further, we propose a zero-shot attack detection by leveraging the rich reconstructed distribution information. The experiment show that our detection significantly improves the existing methods and effectively identifies data manipulation in challenging scenarios.
Updated: 2024-06-04 00:30:36
标题: 关于数据中毒攻击下数值属性LDP协议的稳健性
摘要: 最近的研究表明,本地差分隐私(LDP)协议容易受到数据投毒攻击:攻击者可以利用LDP的特性,通过一小部分受控的本地客户端发送精心构造的数据来操纵服务器端的最终估计。这种脆弱性引发了对LDP在敌对环境中稳健性和可靠性的担忧。在本文中,我们对面向数值属性的最先进LDP协议进行了系统研究,即带分箱与一致性处理的分类频率预言机(CFO)以及分布重建协议。我们通过攻击驱动的方法评估协议稳健性,并提出了用于跨协议攻击增益度量的新指标。结果表明,服务器设置下的Square Wave协议和基于CFO的协议比用户设置下的基于CFO的协议更能抵御攻击。我们的评估还揭示了LDP安全性与其固有设计选择之间的新关系:我们发现,基于本地哈希的LDP中哈希域的大小对协议稳健性有深远影响,超出了其对效用的已知影响。此外,我们提出了一种利用丰富的重建分布信息的零样本攻击检测方法。实验表明,我们的检测方法显著改进了现有方法,并能在具有挑战性的场景中有效识别数据篡改。
更新时间: 2024-06-04 00:30:36
领域: cs.CR
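As background for the CFO protocols discussed above, a minimal k-ary generalized randomized response (GRR) frequency oracle looks like this (our sketch; the paper's binning, consistency, and attack/defense machinery are not shown):

    import math
    import random
    from collections import Counter

    def grr_perturb(v, k, eps):
        """k-ary generalized randomized response: report v truthfully w.p. p,
        otherwise report a uniformly random other value."""
        p = math.exp(eps) / (math.exp(eps) + k - 1)
        if random.random() < p:
            return v
        return random.choice([u for u in range(k) if u != v])

    def grr_estimate(reports, k, eps):
        """Unbiased frequency estimates from the perturbed reports."""
        n = len(reports)
        p = math.exp(eps) / (math.exp(eps) + k - 1)
        q = (1 - p) / (k - 1)
        counts = Counter(reports)
        return {v: (counts.get(v, 0) / n - q) / (p - q) for v in range(k)}

The data-poisoning attacks studied in the paper exploit exactly this debiasing step: a small fraction of fabricated reports is amplified by the 1/(p - q) correction.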
Demystifying Oversmoothing in Attention-Based Graph Neural Networks
Oversmoothing in Graph Neural Networks (GNNs) refers to the phenomenon where increasing network depth leads to homogeneous node representations. While previous work has established that Graph Convolutional Networks (GCNs) exponentially lose expressive power, it remains controversial whether the graph attention mechanism can mitigate oversmoothing. In this work, we provide a definitive answer to this question through a rigorous mathematical analysis, by viewing attention-based GNNs as nonlinear time-varying dynamical systems and incorporating tools and techniques from the theory of products of inhomogeneous matrices and the joint spectral radius. We establish that, contrary to popular belief, the graph attention mechanism cannot prevent oversmoothing and loses expressive power exponentially. The proposed framework extends the existing results on oversmoothing for symmetric GCNs to a significantly broader class of GNN models, including random walk GCNs, Graph Attention Networks (GATs) and (graph) transformers. In particular, our analysis accounts for asymmetric, state-dependent and time-varying aggregation operators and a wide range of common nonlinear activation functions, such as ReLU, LeakyReLU, GELU and SiLU.
Updated: 2024-06-04 00:30:31
标题: 揭示基于注意力的图神经网络中的过度平滑现象
摘要: 图神经网络(GNN)中的过度平滑是指随着网络深度增加,节点表示趋于同质化的现象。虽然先前的工作已经确立了图卷积网络(GCN)会指数级地丧失表达能力,但图注意力机制能否缓解过度平滑仍存在争议。在这项工作中,我们将基于注意力的GNN视为非线性时变动力系统,并结合非齐次矩阵乘积理论与联合谱半径的工具和技术,通过严格的数学分析对这一问题给出了明确的答案。我们确立了与普遍看法相反的结论:图注意力机制无法防止过度平滑,并且会指数级地丧失表达能力。所提出的框架将关于对称GCN过度平滑的现有结果扩展到了更广泛的一类GNN模型,包括随机游走GCN、图注意力网络(GAT)和(图)Transformer。特别地,我们的分析涵盖了非对称的、状态相关的、时变的聚合算子,以及各种常见的非线性激活函数,如ReLU、LeakyReLU、GELU和SiLU。
更新时间: 2024-06-04 00:30:31
领域: cs.LG,cs.SI,stat.ML
Transformers Implement Functional Gradient Descent to Learn Non-Linear Functions In Context
Many neural network architectures are known to be Turing Complete, and can thus, in principle implement arbitrary algorithms. However, Transformers are unique in that they can implement gradient-based learning algorithms under simple parameter configurations. This paper provides theoretical and empirical evidence that (non-linear) Transformers naturally learn to implement gradient descent in function space, which in turn enable them to learn non-linear functions in context. Our results apply to a broad class of combinations of non-linear architectures and non-linear in-context learning tasks. Additionally, we show that the optimal choice of non-linear activation depends in a natural way on the class of functions that need to be learned.
Updated: 2024-06-04 00:20:05
标题: Transformer实现泛函梯度下降以在上下文中学习非线性函数
摘要: 许多神经网络架构已知是图灵完备的,因此原则上可以实现任意算法。然而,Transformer的独特之处在于,它们可以在简单的参数配置下实现基于梯度的学习算法。本文提供了理论和经验证据,表明(非线性)Transformer会自然地学会在函数空间中实现梯度下降,进而使它们能够在上下文中学习非线性函数。我们的结果适用于非线性架构与非线性上下文学习任务的广泛组合。此外,我们还表明,非线性激活函数的最佳选择自然地取决于需要学习的函数类别。
更新时间: 2024-06-04 00:20:05
领域: cs.LG
Feedback Loops With Language Models Drive In-Context Reward Hacking
Language models influence the external world: they query APIs that read and write to web pages, generate content that shapes human behavior, and run system commands as autonomous agents. These interactions form feedback loops: LLM outputs affect the world, which in turn affect subsequent LLM outputs. In this work, we show that feedback loops can cause in-context reward hacking (ICRH), where the LLM at test-time optimizes a (potentially implicit) objective but creates negative side effects in the process. For example, consider an LLM agent deployed to increase Twitter engagement; the LLM may retrieve its previous tweets into the context window and make them more controversial, increasing engagement but also toxicity. We identify and study two processes that lead to ICRH: output-refinement and policy-refinement. For these processes, evaluations on static datasets are insufficient -- they miss the feedback effects and thus cannot capture the most harmful behavior. In response, we provide three recommendations for evaluation to capture more instances of ICRH. As AI development accelerates, the effects of feedback loops will proliferate, increasing the need to understand their role in shaping LLM behavior.
Updated: 2024-06-04 00:16:52
标题: 使用语言模型的反馈环路推动上下文奖励黑客行为
摘要: 语言模型会影响外部世界:它们调用读写网页的API,生成塑造人类行为的内容,并作为自主代理运行系统命令。这些交互形成了反馈循环:LLM的输出影响世界,而世界反过来又影响后续的LLM输出。在这项工作中,我们表明反馈循环会导致上下文内奖励欺骗(ICRH):LLM在测试时优化一个(可能是隐式的)目标,却在此过程中产生负面副作用。例如,考虑一个为提高Twitter参与度而部署的LLM代理;该LLM可能将其先前的推文检索到上下文窗口中并使其更具争议性,从而提高了参与度,但也增加了毒性。我们识别并研究了导致ICRH的两种过程:输出细化和策略细化。对于这些过程,在静态数据集上的评估是不够的:它们遗漏了反馈效应,因此无法捕捉到最有害的行为。作为回应,我们提出了三条评估建议,以捕捉更多ICRH实例。随着人工智能发展的加速,反馈循环的影响将不断扩散,这使得理解其在塑造LLM行为中的作用变得愈发必要。
更新时间: 2024-06-04 00:16:52
领域: cs.LG,cs.AI,cs.CL
Wasserstein Wormhole: Scalable Optimal Transport Distance with Transformers
Optimal transport (OT) and the related Wasserstein metric (W) are powerful and ubiquitous tools for comparing distributions. However, computing pairwise Wasserstein distances rapidly becomes intractable as cohort size grows. An attractive alternative would be to find an embedding space in which pairwise Euclidean distances map to OT distances, akin to standard multidimensional scaling (MDS). We present Wasserstein Wormhole, a transformer-based autoencoder that embeds empirical distributions into a latent space wherein Euclidean distances approximate OT distances. Extending MDS theory, we show that our objective function implies a bound on the error incurred when embedding non-Euclidean distances. Empirically, distances between Wormhole embeddings closely match Wasserstein distances, enabling linear time computation of OT distances. Along with an encoder that maps distributions to embeddings, Wasserstein Wormhole includes a decoder that maps embeddings back to distributions, allowing for operations in the embedding space to generalize to OT spaces, such as Wasserstein barycenter estimation and OT interpolation. By lending scalability and interpretability to OT approaches, Wasserstein Wormhole unlocks new avenues for data analysis in the fields of computational geometry and single-cell biology.
Updated: 2024-06-04 00:09:59
标题: Wasserstein Wormhole:使用变换器实现可扩展的最优输运距离
摘要: 最优输运(OT)及相关的Wasserstein度量(W)是比较分布的强大而普适的工具。然而,随着队列规模的增长,计算成对Wasserstein距离会迅速变得难以处理。一个有吸引力的替代方案是找到一个嵌入空间,使其中的成对欧氏距离映射到OT距离,类似于标准的多维缩放(MDS)。我们提出了Wasserstein Wormhole,一个基于Transformer的自编码器,它将经验分布嵌入到一个潜在空间中,使欧氏距离近似OT距离。通过扩展MDS理论,我们证明了我们的目标函数蕴含着嵌入非欧距离时所产生误差的一个上界。在实践中,Wormhole嵌入之间的距离与Wasserstein距离高度吻合,从而实现了OT距离的线性时间计算。除了将分布映射到嵌入的编码器外,Wasserstein Wormhole还包含一个将嵌入映射回分布的解码器,使嵌入空间中的操作能够推广到OT空间,例如Wasserstein重心估计和OT插值。通过为OT方法提供可扩展性和可解释性,Wasserstein Wormhole为计算几何和单细胞生物学领域的数据分析开辟了新的途径。
更新时间: 2024-06-04 00:09:59
领域: cs.LG,cs.CG,q-bio.GN
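A sketch of the training objective as we read it, matching latent Euclidean distances to OT distances; the 1-D exact Wasserstein below, the `embed` stand-in, and the omitted decoder term are our simplifications of the actual transformer autoencoder:

    import numpy as np
    import torch

    def w1_1d(a, b):
        """Exact 1-Wasserstein between equal-size 1-D point clouds (sorted matching)."""
        return float(np.mean(np.abs(np.sort(a) - np.sort(b))))

    def wormhole_style_loss(embed, clouds):
        """Penalize mismatch between pairwise Euclidean distances of the
        embeddings and the ground-truth OT distances of the point clouds.
        `embed` maps a cloud to a latent torch vector; architecture not shown."""
        z = torch.stack([embed(c) for c in clouds])
        loss = torch.zeros(())
        for i in range(len(clouds)):
            for j in range(i + 1, len(clouds)):
                d_latent = torch.norm(z[i] - z[j])
                d_ot = w1_1d(clouds[i], clouds[j])   # precomputable target distance
                loss = loss + (d_latent - d_ot) ** 2
        return loss

Once trained, pairwise OT queries reduce to Euclidean distances between cached embeddings, which is the source of the linear-time claim.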
Neural Green's Operators for Parametric Partial Differential Equations
This work introduces neural Green's operators (NGOs), a novel neural operator network architecture that learns the solution operator for a parametric family of linear partial differential equations (PDEs). Our construction of NGOs is derived directly from the Green's formulation of such a solution operator. Similar to deep operator networks (DeepONets) and variationally mimetic operator networks (VarMiONs), NGOs constitutes an expansion of the solution to the PDE in terms of basis functions, that is returned from a sub-network, contracted with coefficients, that are returned from another sub-network. However, in accordance with the Green's formulation, NGOs accept weighted averages of the input functions, rather than sampled values thereof, as is the case in DeepONets and VarMiONs. Application of NGOs to canonical linear parametric PDEs shows that, while they remain competitive with DeepONets, VarMiONs and Fourier neural operators when testing on data that lie within the training distribution, they robustly generalize when testing on finer-scale data generated outside of the training distribution. Furthermore, we show that the explicit representation of the Green's function that is returned by NGOs enables the construction of effective preconditioners for numerical solvers for PDEs.
Updated: 2024-06-04 00:02:52
标题: 用于参数化偏微分方程的神经格林算子
摘要: 这项工作介绍了神经格林算子(NGOs),一种新颖的神经算子网络架构,用于学习线性偏微分方程(PDEs)的参数族的解算子。我们构建NGOs的方法直接源自这种解算子的格林公式。类似于深度算子网络(DeepONets)和变分拟态算子网络(VarMiONs),NGOs将PDE的解展开成基函数的形式,这些基函数来自一个子网络,与来自另一个子网络的系数相乘。然而,与格林公式一致,NGOs接受输入函数的加权平均值,而不是其中的采样值,这与DeepONets和VarMiONs不同。将NGOs应用于经典线性参数PDEs显示,尽管在训练分布内的数据测试时,它们与DeepONets、VarMiONs和傅里叶神经算子竞争力相当,但在测试超出训练分布的精细数据时,它们具有良好的泛化能力。此外,我们展示了由NGOs返回的格林函数的显式表示使得能够为PDE的数值解算器构建有效的预处理器。
更新时间: 2024-06-04 00:02:52
领域: cs.LG,cs.NA,math.NA,68T07,I.2.6; G.1.8
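The Green's formulation the architecture mimics, $u(x) = \int G(x, y)\,f(y)\,dy$, discretizes to a bilinear contraction; a toy sketch in which the arrays stand in (our assumption) for the two sub-networks' outputs and the weighted averages of the input function:

    import numpy as np

    def apply_ngo(basis_vals, G_coeffs, f_avgs):
        """u(x) ≈ sum_ij phi_i(x) * G_ij * <psi_j, f>, where
        basis_vals: (n_x, n_basis) basis functions phi_i at query points (one sub-network),
        G_coeffs:   (n_basis, n_test) learned Green's coefficients (other sub-network),
        f_avgs:     (n_test,) weighted averages <psi_j, f> of the input function."""
        return basis_vals @ (G_coeffs @ f_avgs)

Feeding weighted averages of f, rather than pointwise samples, is the ingredient the abstract singles out as distinguishing NGOs from DeepONets and VarMiONs.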
TruthEval: A Dataset to Evaluate LLM Truthfulness and Reliability
Large Language Model (LLM) evaluation is currently one of the most important areas of research, with existing benchmarks proving to be insufficient and not completely representative of LLMs' various capabilities. We present a curated collection of challenging statements on sensitive topics for LLM benchmarking called TruthEval. These statements were curated by hand and contain known truth values. The categories were chosen to distinguish LLMs' abilities from their stochastic nature. We perform some initial analyses using this dataset and find several instances of LLMs failing in simple tasks showing their inability to understand simple questions.
Updated: 2024-06-04 00:01:35
标题: TruthEval:用于评估LLM真实性和可靠性的数据集
摘要: 大型语言模型(LLM)评估是当前最重要的研究领域之一,现有基准已被证明不足,且不能完全代表LLM的各种能力。我们提出了一个经过精心整理的、针对敏感话题的具有挑战性陈述的LLM基准测试集合,称为TruthEval。这些陈述由人工整理,并带有已知的真值。所选类别旨在将LLM的能力与其随机性区分开来。我们使用该数据集进行了初步分析,发现了多个LLM在简单任务上失败的实例,表明它们无法理解简单的问题。
更新时间: 2024-06-04 00:01:35
领域: cs.CL,cs.AI