Machine Learning Who to Nudge: Causal vs Predictive Targeting in a Field Experiment on Student Financial Aid Renewal
In many settings, interventions may be more effective for some individuals than others, so that targeting interventions may be beneficial. We analyze the value of targeting in the context of a large-scale field experiment with over 53,000 college students, where the goal was to use "nudges" to encourage students to renew their financial-aid applications before a non-binding deadline. We begin with baseline approaches to targeting. First, we target based on a causal forest that estimates heterogeneous treatment effects and then assigns students to treatment according to those estimated to have the highest treatment effects. Next, we evaluate two alternative targeting policies, one targeting students with low predicted probability of renewing financial aid in the absence of the treatment, the other targeting those with high probability. The predicted baseline outcome is not the ideal criterion for targeting, nor is it a priori clear whether to prioritize low, high, or intermediate predicted probability. Nonetheless, targeting on low baseline outcomes is common in practice, for example because the relationship between individual characteristics and treatment effects is often difficult or impossible to estimate with historical data. We propose hybrid approaches that incorporate the strengths of both predictive approaches (accurate estimation) and causal approaches (correct criterion); we show that targeting intermediate baseline outcomes is most effective in our specific application, while targeting based on low baseline outcomes is detrimental. In one year of the experiment, nudging all students improved early filing by an average of 6.4 percentage points over a baseline average of 37% filing, and we estimate that targeting half of the students using our preferred policy attains around 75% of this benefit.
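A minimal sketch of the predictive targeting baselines, assuming synthetic data and scikit-learn (the paper's own estimation pipeline is not reproduced here): fit a renewal model on the control group only, then target the bottom, top, or intermediate band of predicted baseline probabilities.

```python
# Sketch: targeting on predicted baseline outcomes (synthetic data).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n, d = 5000, 10
X = rng.normal(size=(n, d))                 # synthetic student covariates
treated = rng.integers(0, 2, size=n)        # randomized nudge assignment
renewed = (rng.random(n) < 0.37 + 0.064 * treated).astype(int)

# Baseline-outcome model: trained on the control group only.
control = treated == 0
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X[control], renewed[control])
p_hat = model.predict_proba(X)[:, 1]        # predicted renewal without treatment

budget = n // 2                             # nudge half of the students
order = np.argsort(p_hat)
target_low = np.zeros(n, dtype=bool); target_low[order[:budget]] = True
target_high = np.zeros(n, dtype=bool); target_high[order[-budget:]] = True
# "Intermediate" policy: the band closest to the median prediction.
mid = np.abs(p_hat - np.median(p_hat)).argsort()[:budget]
target_mid = np.zeros(n, dtype=bool); target_mid[mid] = True
```

A causal policy would instead rank students by estimated treatment effects (e.g., from a causal forest); the hybrid approaches combine this criterion with the more accurately estimable baseline predictions above.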
Updated: 2024-05-31 23:59:02
Categories: econ.EM,cs.LG,stat.ME,stat.ML
Unpacking the Black Box: Regulating Algorithmic Decisions
What should regulators of complex algorithms regulate? We propose a model of oversight over 'black-box' algorithms used in high-stakes applications such as lending, medical testing, or hiring. In our model, a regulator is limited in how much she can learn about a black-box model deployed by an agent with misaligned preferences. The regulator faces two choices: first, whether to allow for the use of complex algorithms; and second, which key properties of algorithms to regulate. We show that limiting agents to algorithms that are simple enough to be fully transparent is inefficient as long as the misalignment is limited and complex algorithms have sufficiently better performance than simple ones. Allowing for complex algorithms can improve welfare, but the gains depend on how the regulator regulates them. Regulation that focuses on the overall average behavior of algorithms, for example based on standard explainer tools, will generally be inefficient. Targeted regulation that focuses on the source of incentive misalignment, e.g., excess false positives or racial disparities, can provide second-best solutions. We provide empirical support for our theoretical findings using an application in consumer lending, where we document that complex models regulated based on context-specific explanation tools outperform simple, fully transparent models. This gain from complex models represents a Pareto improvement across our empirical applications, preferred both by the lender and by the financial regulator.
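As a sketch of what "targeted regulation" could compute in practice, the snippet below audits a black-box model's false positive rates by group; the metric choice and synthetic data are illustrative assumptions, not the paper's empirical setup.

```python
# Sketch: auditing a specific misalignment channel (group-wise FPR)
# rather than a model's overall average behavior.
import numpy as np

def false_positive_rate(y_true, y_pred):
    neg = y_true == 0
    return (y_pred[neg] == 1).mean()

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 10000)
y_pred = np.where(rng.random(10000) < 0.15, 1 - y_true, y_true)  # black-box output
group = rng.integers(0, 2, 10000)

fpr = [false_positive_rate(y_true[group == g], y_pred[group == g]) for g in (0, 1)]
print(f"FPR by group: {fpr[0]:.3f} vs {fpr[1]:.3f}, disparity {abs(fpr[0] - fpr[1]):.3f}")
```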
Updated: 2024-05-31 23:47:21
Categories: econ.GN,cs.AI,cs.LG,q-fin.EC,stat.ML
Learning Syntax Without Planting Trees: Understanding When and Why Transformers Generalize Hierarchically
Transformers trained on natural language data have been shown to learn its hierarchical structure and generalize to sentences with unseen syntactic structures, without any structural bias being explicitly encoded. In this work, we investigate sources of inductive bias in transformer models and their training that could cause such generalization behavior to emerge. We extensively experiment with transformer models trained on multiple synthetic datasets under different training objectives, and show that while other objectives (e.g., sequence-to-sequence modeling, prefix language modeling) often fail to lead to hierarchical generalization, models trained with the language modeling objective consistently learn to generalize hierarchically. We then conduct pruning experiments to study how transformers trained with the language modeling objective encode hierarchical structure. Upon pruning, we find that subnetworks with different generalization behaviors coexist within the model (subnetworks corresponding to hierarchical structure and to linear order). Finally, we take a Bayesian perspective to further uncover transformers' preference for hierarchical generalization: we establish a correlation between whether transformers generalize hierarchically on a dataset and whether the simplest explanation of that dataset is provided by a hierarchical grammar, as opposed to a regular grammar exhibiting linear generalization.
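The hierarchical-versus-linear contrast can be made concrete with the classic English question-formation example; the toy rules below are our own illustration, not the paper's synthetic datasets.

```python
# Two competing generalization hypotheses for yes/no question formation.
def linear_rule(tokens, aux_positions):
    """Front the linearly first auxiliary."""
    i = min(aux_positions)
    return [tokens[i]] + tokens[:i] + tokens[i + 1:]

def hierarchical_rule(tokens, aux_positions, main_aux):
    """Front the structurally main-clause auxiliary."""
    return [tokens[main_aux]] + tokens[:main_aux] + tokens[main_aux + 1:]

# "my walrus that is sleeping can giggle" -> "can my walrus that is sleeping giggle"
tokens = "my walrus that is sleeping can giggle".split()
aux_positions, main_aux = [3, 5], 5
print(" ".join(linear_rule(tokens, aux_positions)))       # wrong: fronts "is"
print(" ".join(hierarchical_rule(tokens, aux_positions, main_aux)))  # fronts "can"
```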
Updated: 2024-05-31 23:47:15
Categories: cs.CL,cs.LG
ReEval: Automatic Hallucination Evaluation for Retrieval-Augmented Large Language Models via Transferable Adversarial Attacks
Despite remarkable advancements in mitigating hallucinations in large language models (LLMs) through retrieval augmentation, it remains challenging to measure the reliability of LLMs using static question-answering (QA) data. Specifically, given the potential for data contamination (e.g., leading to memorization), good static benchmark performance does not ensure that a model can reliably use the provided evidence when responding, which is essential to avoid hallucination when the required knowledge is new or private. Inspired by adversarial machine learning, we investigate the feasibility of automatically perturbing existing static data for dynamic evaluation. Specifically, this paper presents ReEval, an LLM-based framework that uses prompt chaining to perturb the original evidence, generating new test cases for evaluating the LLMs' reliability in using new evidence for answering. We implement ReEval using ChatGPT and evaluate the resulting variants of two popular open-domain QA datasets on a collection of LLMs under various prompting settings. Our generated data is human-readable and useful for triggering hallucination in LLMs. Models that are accurate on static data are observed to produce unsupported answers from the perturbed evidence, with pronounced accuracy drops across LLMs including GPT-4. We find that our adversarial examples are transferable across all considered LLMs. The examples generated by a small model can be used to evaluate a much larger model, making our approach cost-effective.
Updated: 2024-05-31 23:46:24
Categories: cs.CL,cs.AI,cs.LG
HoSNN: Adversarially-Robust Homeostatic Spiking Neural Networks with Adaptive Firing Thresholds
While spiking neural networks (SNNs) offer a promising neurally-inspired model of computation, they are vulnerable to adversarial attacks. We present the first study that draws inspiration from neural homeostasis to design a threshold-adapting leaky integrate-and-fire (TA-LIF) neuron model and utilize TA-LIF neurons to construct the adversarially robust homeostatic SNNs (HoSNNs) for improved robustness. The TA-LIF model incorporates a self-stabilizing dynamic thresholding mechanism, offering a local feedback control solution to the minimization of each neuron's membrane potential error caused by adversarial disturbance. Theoretical analysis demonstrates favorable dynamic properties of TA-LIF neurons in terms of the bounded-input bounded-output stability and suppressed time growth of membrane potential error, underscoring their superior robustness compared with the standard LIF neurons. When trained with weak FGSM attacks (attack budget = 2/255) and tested with much stronger PGD attacks (attack budget = 8/255), our HoSNNs significantly improve model accuracy on several datasets: from 30.54% to 74.91% on FashionMNIST, from 0.44% to 35.06% on SVHN, from 0.56% to 42.63% on CIFAR10, from 0.04% to 16.66% on CIFAR100, over the conventional LIF-based SNNs.
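A schematic numpy simulation of the adaptive-threshold idea (the paper's exact TA-LIF dynamics may differ): the firing threshold rises after each spike and decays back toward its resting value, damping the effect of input perturbations.

```python
# Schematic LIF neuron with a self-stabilizing adaptive threshold.
import numpy as np

def simulate(inputs, tau_v=0.9, theta0=1.0, theta_gain=0.2, tau_theta=0.95):
    v, theta, spikes = 0.0, theta0, []
    for x in inputs:
        v = tau_v * v + x                  # leaky membrane integration
        s = float(v >= theta)              # fire when potential crosses threshold
        v = v * (1.0 - s)                  # hard reset after a spike
        # homeostasis: threshold rises after spikes, decays back toward theta0
        theta = theta0 + tau_theta * (theta - theta0) + theta_gain * s
        spikes.append(s)
    return np.array(spikes)

rng = np.random.default_rng(0)
clean = rng.random(100)
noisy = clean + 0.3 * rng.normal(size=100)   # adversarial-style perturbation
drift = np.abs(simulate(noisy) - simulate(clean)).mean()
print(f"spike-train disagreement under perturbation: {drift:.3f}")
```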
Updated: 2024-05-31 23:45:57
Categories: cs.NE,cs.CR,cs.CV,cs.LG
How Robust is your Fair Model? Exploring the Robustness of Diverse Fairness Strategies
With the introduction of machine learning in high-stakes decision making, ensuring algorithmic fairness has become an increasingly important problem to solve. In response to this, many mathematical definitions of fairness have been proposed, and a variety of optimisation techniques have been developed, all designed to maximise a defined notion of fairness. However, fair solutions are reliant on the quality of the training data, and can be highly sensitive to noise. Recent studies have shown that robustness (the ability of a model to perform well on unseen data) plays a significant role in the type of strategy that should be used when approaching a new problem and, hence, measuring the robustness of these strategies has become a fundamental problem. In this work, we therefore propose a new criterion to measure the robustness of various fairness optimisation strategies - the robustness ratio. We conduct multiple extensive experiments on five benchmark fairness data sets using three of the most popular fairness strategies with respect to four of the most popular definitions of fairness. Our experiments empirically show that fairness methods that rely on threshold optimisation are very sensitive to noise in all the evaluated data sets, despite mostly outperforming other methods. This is in contrast to the other two methods, which are less fair for low noise scenarios but fairer for high noise ones. To the best of our knowledge, we are the first to quantitatively evaluate the robustness of fairness optimisation strategies. This can potentially serve as a guideline in choosing the most suitable fairness strategy for various data sets.
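The abstract does not spell out the robustness ratio precisely, so the following is only one plausible instantiation under stated assumptions: the ratio of a fairness gap measured under label noise to the same gap on clean data.

```python
# Assumed instantiation: robustness ratio = noisy fairness gap / clean gap.
import numpy as np
from sklearn.linear_model import LogisticRegression

def demographic_parity_gap(y_pred, group):
    return abs(y_pred[group == 0].mean() - y_pred[group == 1].mean())

rng = np.random.default_rng(0)
n = 4000
group = rng.integers(0, 2, n)
X = rng.normal(size=(n, 5)) + group[:, None] * 0.5
y = (X[:, 0] + rng.normal(size=n) > 0).astype(int)

def fit_gap(labels):
    clf = LogisticRegression(max_iter=1000).fit(X, labels)
    return demographic_parity_gap(clf.predict(X), group)

y_noisy = np.where(rng.random(n) < 0.2, 1 - y, y)   # 20% label flips
ratio = fit_gap(y_noisy) / fit_gap(y)
print(f"robustness ratio (noisy/clean fairness gap): {ratio:.2f}")
```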
Updated: 2024-05-31 23:31:00
Categories: cs.LG,cs.CY
AI-Powered Autonomous Weapons Risk Geopolitical Instability and Threaten AI Research
The recent embrace of machine learning (ML) in the development of autonomous weapons systems (AWS) creates serious risks to geopolitical stability and the free exchange of ideas in AI research. This topic has lately received far less attention than risks stemming from superintelligent artificial general intelligence (AGI), but it requires fewer assumptions about the course of technological development and is thus a nearer-future issue. ML is already enabling the substitution of AWS for human soldiers in many battlefield roles, reducing the upfront human cost, and thus political cost, of waging offensive war. In the case of peer adversaries, this increases the likelihood of "low intensity" conflicts which risk escalation to broader warfare. In the case of non-peer adversaries, it reduces the domestic blowback to wars of aggression. This effect can occur regardless of other ethical issues around the use of military AI such as the risk of civilian casualties, and does not require any superhuman AI capabilities. Further, the military value of AWS raises the specter of an AI-powered arms race and the misguided imposition of national security restrictions on AI research. Our goal in this paper is to raise awareness among the public and ML researchers of the near-future risks posed by full or near-full autonomy in military technology, and we provide regulatory suggestions to mitigate these risks. We call upon AI policy experts and the defense AI community in particular to embrace transparency and caution in their development and deployment of AWS to avoid the negative effects on global stability and AI research that we highlight here.
Updated: 2024-05-31 23:28:13
Categories: cs.CY,cs.AI,cs.LG,cs.RO
Reconstructing Graph Diffusion History from a Single Snapshot
Diffusion on graphs is ubiquitous with numerous high-impact applications. In these applications, complete diffusion histories play an essential role in terms of identifying dynamical patterns, reflecting on precaution actions, and forecasting intervention effects. Despite their importance, complete diffusion histories are rarely available and are highly challenging to reconstruct due to ill-posedness, explosive search space, and scarcity of training data. To date, few methods exist for diffusion history reconstruction. They are exclusively based on the maximum likelihood estimation (MLE) formulation and require knowledge of the true diffusion parameters. In this paper, we study an even harder problem, namely reconstructing Diffusion history from A single SnapsHot (DASH), where we seek to reconstruct the history from only the final snapshot without knowing the true diffusion parameters. We start with theoretical analyses that reveal a fundamental limitation of the MLE formulation. We prove: (a) estimation error of diffusion parameters is unavoidable due to the NP-hardness of diffusion parameter estimation, and (b) the MLE formulation is sensitive to the estimation error of diffusion parameters. To overcome the inherent limitation of the MLE formulation, we propose a novel barycenter formulation: finding the barycenter of the posterior distribution of histories, which is provably stable against the estimation error of diffusion parameters. We further develop an effective solver named DIffusion hiTting Times with Optimal proposal (DITTO) by reducing the problem to estimating posterior expected hitting times via the Metropolis--Hastings Markov chain Monte Carlo method (M--H MCMC) and employing an unsupervised graph neural network to learn an optimal proposal to accelerate the convergence of M--H MCMC. We conduct extensive experiments to demonstrate the efficacy of the proposed method.
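The Metropolis--Hastings backbone that DITTO builds on, in its generic form; a toy one-dimensional target stands in for the posterior over diffusion histories, and the sample mean stands in for the posterior expected hitting times.

```python
# Generic M-H sampler estimating a posterior expectation.
import numpy as np

def log_target(x):                  # unnormalized log posterior (toy)
    return -0.5 * (x - 3.0) ** 2

rng = np.random.default_rng(0)
x, samples = 0.0, []
for _ in range(20000):
    prop = x + rng.normal(scale=0.5)          # symmetric random-walk proposal
    if np.log(rng.random()) < log_target(prop) - log_target(x):
        x = prop                               # accept
    samples.append(x)

print(np.mean(samples[5000:]))   # posterior expectation, ~3.0
```

DITTO's contribution is replacing this generic proposal with one learned by an unsupervised GNN, which accelerates mixing over the combinatorial space of histories.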
Updated: 2024-05-31 23:25:07
Categories: cs.LG,cs.SI
Robust Infidelity: When Faithfulness Measures on Masked Language Models Are Misleading
A common approach to quantifying neural text classifier interpretability is to calculate faithfulness metrics based on iteratively masking salient input tokens and measuring changes in the model prediction. We propose that this property is better described as "sensitivity to iterative masking", and highlight pitfalls in using this measure for comparing text classifier interpretability. We show that iterative masking produces large variation in faithfulness scores between otherwise comparable Transformer encoder text classifiers. We then demonstrate that iteratively masked samples produce embeddings outside the distribution seen during training, resulting in unpredictable behaviour. We further explore task-specific considerations that undermine principled comparison of interpretability using iterative masking, such as an underlying similarity to salience-based adversarial attacks. Our findings give insight into how these behaviours affect neural text classifiers, and provide guidance on how sensitivity to iterative masking should be interpreted.
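A minimal sketch of the iterative-masking procedure, with a toy bag-of-words scorer standing in for a Transformer classifier: repeatedly mask the currently most salient token and record the score trajectory.

```python
# Sketch: "sensitivity to iterative masking" with a toy scorer.
import numpy as np

vocab_weights = {"great": 2.0, "good": 1.0, "movie": 0.1, "boring": -1.5}

def score(tokens):                       # stand-in for model confidence
    z = sum(vocab_weights.get(t, 0.0) for t in tokens)
    return 1.0 / (1.0 + np.exp(-z))

def iterative_masking_curve(tokens, mask="[MASK]"):
    tokens, scores = list(tokens), [score(tokens)]
    for _ in range(len(tokens)):
        # salience of each unmasked token = score drop when it is masked
        drops = [(scores[-1] - score(tokens[:i] + [mask] + tokens[i + 1:]), i)
                 for i, t in enumerate(tokens) if t != mask]
        _, i = max(drops)
        tokens[i] = mask
        scores.append(score(tokens))
    return scores

print(iterative_masking_curve("a great movie not boring".split()))
```

With a real encoder, each intermediate masked input lies outside the training distribution, which is exactly the failure mode the paper highlights.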
Updated: 2024-05-31 22:41:54
Categories: cs.CL,cs.LG
Sampling-based Distributed Training with Message Passing Neural Network
In this study, we introduce a domain-decomposition-based distributed training and inference approach for message-passing neural networks (MPNN). Our objective is to address the challenge of scaling edge-based graph neural networks as the number of nodes increases. Through our distributed training approach, coupled with Nyström-approximation sampling techniques, we present a scalable graph neural network, referred to as DS-MPNN (D and S standing for distributed and sampled, respectively), capable of scaling up to $O(10^5)$ nodes. We validate our sampling and distributed training approach on two cases: (a) a Darcy flow dataset and (b) steady RANS simulations of 2-D airfoils, providing comparisons with both single-GPU implementation and node-based graph convolution networks (GCNs). The DS-MPNN model demonstrates comparable accuracy to single-GPU implementation, can accommodate a significantly larger number of nodes compared to the single-GPU variant (S-MPNN), and significantly outperforms the node-based GCN.
Updated: 2024-05-31 22:39:26
Categories: cs.LG,cs.DC,physics.flu-dyn
Fast Inference of Removal-Based Node Influence
Graph neural networks (GNNs) are widely utilized to capture information spreading patterns in graphs. While remarkable performance has been achieved, evaluating node influence has emerged as a new trending topic. We propose a new method of evaluating node influence, which measures the prediction change of a trained GNN model caused by removing a node. A real-world application is, "In the task of predicting Twitter accounts' polarity, had a particular account been removed, how would others' polarity change?". We use the GNN as a surrogate model whose prediction could simulate the change of nodes or edges caused by node removal. Our goal is to obtain an influence score for every node; a straightforward way is to remove each node in turn and apply the trained GNN to the modified graph to generate new predictions. This is reliable but time-consuming, so we need an efficient method. Related lines of work, such as graph adversarial attack and counterfactual explanation, cannot directly satisfy our needs, since their problem settings are different. We propose an efficient, intuitive, and effective method, NOde-Removal-based fAst GNN inference (NORA), which uses gradient information to approximate the node-removal influence. It only costs one forward propagation and one backpropagation to approximate the influence scores for all nodes. Extensive experiments on six datasets and six GNN models verify the effectiveness of NORA. Our code is available at https://github.com/weikai-li/NORA.git.
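A sketch of the gradient shortcut, assuming a frozen one-layer GCN in PyTorch (NORA itself includes further correction terms): treat node presence as a differentiable mask and read an approximate removal influence off a single backward pass.

```python
# Sketch: node-removal influence via one backward pass through a frozen GNN.
import torch

n, d = 6, 4
A = (torch.rand(n, n) < 0.4).float()
A.fill_diagonal_(1.0)
X = torch.randn(n, d)
W = torch.randn(d, 1)

mask = torch.ones(n, requires_grad=True)      # 1 = node present
Am = A * mask[None, :]                        # mask edges coming from node j
deg = Am.sum(1, keepdim=True).clamp(min=1)
out = (Am / deg) @ X @ W                      # frozen one-layer GCN forward
out.sum().backward()

influence = -mask.grad                        # first-order effect of removal
print(influence)
```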
Updated: 2024-05-31 22:36:34
Categories: cs.LG,cs.AI
Is machine learning good or bad for the natural sciences?
Machine learning (ML) methods are having a huge impact across all of the sciences. However, ML has a strong ontology - in which only the data exist - and a strong epistemology - in which a model is considered good if it performs well on held-out training data. These philosophies are in strong conflict with both standard practices and key philosophies in the natural sciences. Here we identify some locations for ML in the natural sciences at which the ontology and epistemology are valuable. For example, when an expressive machine learning model is used in a causal inference to represent the effects of confounders, such as foregrounds, backgrounds, or instrument calibration parameters, the model capacity and loose philosophy of ML can make the results more trustworthy. We also show that there are contexts in which the introduction of ML introduces strong, unwanted statistical biases. For one, when ML models are used to emulate physical (or first-principles) simulations, they amplify confirmation biases. For another, when expressive regressions are used to label datasets, those labels cannot be used in downstream joint or ensemble analyses without taking on uncontrolled biases. The question in the title is being asked of all of the natural sciences; that is, we are calling on the scientific communities to take a step back and consider the role and value of ML in their fields; the (partial) answers we give here come from the particular perspective of physics.
Updated: 2024-05-31 22:28:18
Categories: stat.ML,astro-ph.IM,cs.LG,physics.data-an
Kernel Ridge Riesz Representers: Generalization Error and Mis-specification
Kernel balancing weights provide confidence intervals for average treatment effects, based on the idea of balancing covariates for the treated group and untreated group in feature space, often with ridge regularization. Previous works on the classical kernel ridge balancing weights have certain limitations: (i) not articulating generalization error for the balancing weights, (ii) typically requiring correct specification of features, and (iii) providing inference for only average effects. I interpret kernel balancing weights as kernel ridge Riesz representers (KRRR) and address these limitations via a new characterization of the counterfactual effective dimension. KRRR is an exact generalization of kernel ridge regression and kernel ridge balancing weights. I prove strong properties similar to kernel ridge regression: population $L_2$ rates controlling generalization error, and a standalone closed form solution that can interpolate. The framework relaxes the stringent assumption that the underlying regression model is correctly specified by the features. It extends inference beyond average effects to heterogeneous effects, i.e. causal functions. I use KRRR to infer heterogeneous treatment effects, by age, of 401(k) eligibility on assets.
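For reference, the kernel ridge regression closed form that KRRR exactly generalizes, $\hat{f}(x) = K(x, X)(K + \lambda I)^{-1} Y$, in numpy; the representer's own closed form is analogous but targets a Riesz representer rather than a conditional mean.

```python
# Kernel ridge regression closed form (the special case KRRR generalizes).
import numpy as np

def rbf(A, B, gamma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
Y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

lam = 1e-2
K = rbf(X, X)
alpha = np.linalg.solve(K + lam * np.eye(len(X)), Y)   # ridge-regularized fit

X_new = rng.normal(size=(5, 3))
pred = rbf(X_new, X) @ alpha                            # f_hat(x) = K(x, X) alpha
```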
Updated: 2024-05-31 22:24:23
Categories: stat.ML,cs.LG,econ.EM,math.ST,stat.TH,62G15, 62D20, 46E22,G.3; J.4
Pontryagin Neural Operator for Solving Parametric General-Sum Differential Games
The values of two-player general-sum differential games are viscosity solutions to Hamilton-Jacobi-Isaacs (HJI) equations. Value and policy approximations for such games suffer from the curse of dimensionality (CoD). Alleviating CoD through physics-informed neural networks (PINN) encounters convergence issues when differentiable values with large Lipschitz constants are present due to state constraints. On top of these challenges, it is often necessary to learn generalizable values and policies across a parametric space of games, e.g., for game parameter inference when information is incomplete. To address these challenges, we propose in this paper a Pontryagin-mode neural operator that outperforms the current state-of-the-art hybrid PINN model on safety performance across games with parametric state constraints. Our key contribution is the introduction of a costate loss defined on the discrepancy between forward and backward costate rollouts, which are computationally cheap. We show that the costate dynamics, which can reflect state constraint violation, effectively enables the learning of differentiable values with large Lipschitz constants, without requiring manually supervised data as suggested by the hybrid PINN model. More importantly, we show that the close relationship between costates and policies makes the former critical in learning feedback control policies with generalizable safety performance.
Updated: 2024-05-31 21:53:47
Categories: cs.LG,cs.GT,cs.RO
Graph Machine Learning based Doubly Robust Estimator for Network Causal Effects
We address the challenge of inferring causal effects from social network data, a setting complicated by interference -- where a unit's outcome is affected by neighbors' treatments -- and by network-induced confounding factors. While there is extensive literature on estimating causal effects in social network setups, most of it makes prior assumptions about the form of the network-induced confounding mechanisms. Such strong assumptions are rarely likely to hold, especially in high-dimensional networks. We propose a novel methodology that combines graph machine learning approaches with the double machine learning framework to enable accurate and efficient estimation of direct and peer effects using a single observational social network. We demonstrate the semiparametric efficiency of our proposed estimator under mild regularity conditions, allowing for consistent uncertainty quantification. We show that our method is accurate, robust, and scalable via an extensive simulation study. We use our method to investigate the impact of Self-Help Group participation on financial risk tolerance.
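For intuition, here is the classical doubly robust (AIPW) score in its i.i.d., no-interference form, which the paper's estimator adapts to networks; the data-generating process below is a synthetic assumption.

```python
# Classical AIPW doubly robust ATE estimate (no-interference form).
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression

rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 4))
e = 1 / (1 + np.exp(-X[:, 0]))               # true propensity
T = (rng.random(n) < e).astype(int)
Y = 2.0 * T + X[:, 1] + rng.normal(size=n)   # true ATE = 2

e_hat = LogisticRegression().fit(X, T).predict_proba(X)[:, 1]
mu1 = LinearRegression().fit(X[T == 1], Y[T == 1]).predict(X)
mu0 = LinearRegression().fit(X[T == 0], Y[T == 0]).predict(X)

ate = np.mean(mu1 - mu0
              + T * (Y - mu1) / e_hat
              - (1 - T) * (Y - mu0) / (1 - e_hat))
print(f"doubly robust ATE estimate: {ate:.2f}")   # ~2.0
```

The estimate stays consistent if either the propensity model or the outcome model is correct; the paper replaces both nuisance models with graph machine learning components to handle interference.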
Updated: 2024-05-31 21:38:53
Categories: cs.LG,cs.SI,stat.ME
Light-weight probing of unsupervised representations for Reinforcement Learning
Unsupervised visual representation learning offers the opportunity to leverage large corpora of unlabeled trajectories to form useful visual representations, which can benefit the training of reinforcement learning (RL) algorithms. However, evaluating the fitness of such representations requires training RL algorithms, which is computationally intensive and has high-variance outcomes. Inspired by the vision community, we study whether linear probing can be a proxy evaluation task for the quality of unsupervised RL representations. Specifically, we probe for the observed reward in a given state and the action of an expert in a given state, both of which are generally applicable to many RL domains. Through rigorous experimentation, we show that the probing tasks are strongly rank correlated with the downstream RL performance on the Atari100k Benchmark, while having lower variance and up to 600x lower computational cost. This provides a more efficient method for exploring the space of pretraining algorithms and identifying promising pretraining recipes without the need to run RL evaluations for every setting. Leveraging this framework, we further improve existing self-supervised learning (SSL) recipes for RL, highlighting the importance of the forward model, the size of the visual backbone, and the precise formulation of the unsupervised objective.
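A sketch of the probing protocol under synthetic stand-ins for the frozen encoder features: fit linear heads for reward regression and expert-action classification, and read off held-out probe scores as the cheap proxy.

```python
# Sketch: linear probes on frozen representations as an RL-fitness proxy.
import numpy as np
from sklearn.linear_model import LogisticRegression, Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
Z = rng.normal(size=(10000, 256))               # frozen encoder features
reward = Z[:, 0] + 0.1 * rng.normal(size=len(Z))
action = (Z[:, 1] > 0).astype(int)              # expert action labels

Z_tr, Z_te, r_tr, r_te, a_tr, a_te = train_test_split(Z, reward, action,
                                                      random_state=0)
reward_r2 = Ridge().fit(Z_tr, r_tr).score(Z_te, r_te)
action_acc = LogisticRegression(max_iter=1000).fit(Z_tr, a_tr).score(Z_te, a_te)
print(reward_r2, action_acc)
```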
Updated: 2024-05-31 21:36:57
Categories: cs.LG,cs.AI
FORML: A Riemannian Hessian-free Method for Meta-learning on Stiefel Manifolds
The meta-learning problem is usually formulated as a bi-level optimization in which the task-specific and the meta-parameters are updated in the inner and outer loops of optimization, respectively. However, performing the optimization in the Riemannian space, where the parameters and meta-parameters are located on Riemannian manifolds, is computationally intensive. Unlike Euclidean methods, Riemannian backpropagation needs to compute second-order derivatives that include backward computations through Riemannian operators such as retraction and orthogonal projection. This paper introduces a Hessian-free approach that uses a first-order approximation of derivatives on the Stiefel manifold. Our method significantly reduces the computational load and memory footprint. We show how using a Stiefel fully-connected layer that enforces an orthogonality constraint on the parameters of the last classification layer as the head of the backbone network strengthens the representation reuse of gradient-based meta-learning methods. Our experimental results across various few-shot learning datasets demonstrate the superiority of our proposed method compared to state-of-the-art methods, especially MAML, its Euclidean counterpart.
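One standard way to keep a layer's weights on the Stiefel manifold is a QR retraction after each Euclidean step; the sketch below illustrates that construction, though FORML's exact first-order update may differ in detail.

```python
# Sketch: Euclidean gradient step followed by a QR retraction onto Stiefel.
import numpy as np

def qr_retract(W):
    """Project W (d x k, d >= k) back onto the Stiefel manifold."""
    Q, R = np.linalg.qr(W)
    return Q * np.sign(np.diag(R))      # fix column signs for uniqueness

rng = np.random.default_rng(0)
W = qr_retract(rng.normal(size=(64, 10)))     # orthonormal columns
grad = rng.normal(size=W.shape)               # Euclidean gradient (stand-in)
W = qr_retract(W - 0.1 * grad)                # step, then retract
print(np.allclose(W.T @ W, np.eye(10), atol=1e-8))   # True
```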
Updated: 2024-05-31 21:34:33
Categories: cs.LG
An NLP Crosswalk Between the Common Core State Standards and NAEP Item Specifications
Natural language processing (NLP) is rapidly developing for applications in educational assessment. In this paper, I describe an NLP-based procedure that can be used to support subject matter experts in establishing a crosswalk between item specifications and content standards. This paper extends recent work by proposing and demonstrating the use of multivariate similarity based on embedding vectors for sentences or texts. In particular, a hybrid regression procedure is demonstrated for establishing the match of each content standard to multiple item specifications. The procedure is used to evaluate the match of the Common Core State Standards (CCSS) for mathematics at grade 4 to the corresponding item specifications for the 2026 National Assessment of Educational Progress (NAEP).
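A sketch of the similarity step, with a random-vector `embed` standing in for a real sentence-embedding model: embed standards and item specifications, then rank specifications per standard by cosine similarity (the paper's hybrid regression builds on such similarities).

```python
# Sketch: cosine-similarity crosswalk between standards and specifications.
import numpy as np

def embed(texts, dim=384, seed=0):       # placeholder for a real encoder
    rng = np.random.default_rng(seed)
    return rng.normal(size=(len(texts), dim))

standards = ["Use place value understanding to round multi-digit numbers."]
specs = ["Round whole numbers to a given place.",
         "Compare two fractions with different denominators."]

S = embed(standards)
P = embed(specs, seed=1)
S = S / np.linalg.norm(S, axis=1, keepdims=True)
P = P / np.linalg.norm(P, axis=1, keepdims=True)
sims = S @ P.T                           # standards x specs cosine matrix
print(sims.argsort(axis=1)[:, ::-1])     # specs ranked per standard
```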
Updated: 2024-05-31 21:30:44
Categories: cs.CL,cs.AI
HD Maps are Lane Detection Generalizers: A Novel Generative Framework for Single-Source Domain Generalization
Lane detection is a vital task for vehicles to navigate and localize their position on the road. To ensure reliable driving, lane detection models must have robust generalization performance in various road environments. However, despite the advanced performance in the trained domain, their generalization performance still falls short of expectations due to the domain discrepancy. To bridge this gap, we propose a novel generative framework using HD Maps for Single-Source Domain Generalization (SSDG) in lane detection. We first generate numerous front-view images from lane markings of HD Maps. Next, we strategically select a core subset among the generated images using (i) lane-structure and (ii) road-surroundings criteria to maximize their diversity. In the end, utilizing this core set, we train lane detection models to boost their generalization performance. We validate that our generative framework from HD Maps outperforms the Domain Adaptation model MLDA with a +3.01 percentage-point accuracy improvement, even though we do not access the target-domain images.
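Diversity-maximizing subset selection can be sketched with greedy k-center over image features; the paper's actual selection additionally uses the lane-structure and road-surroundings criteria above.

```python
# Sketch: greedy k-center core-subset selection over image features.
import numpy as np

def k_center_greedy(features, k, seed=0):
    rng = np.random.default_rng(seed)
    chosen = [rng.integers(len(features))]
    dists = np.linalg.norm(features - features[chosen[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(dists.argmax())                 # farthest from chosen set
        chosen.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(features - features[nxt], axis=1))
    return chosen

feats = np.random.default_rng(1).normal(size=(1000, 128))  # generated-image features
core = k_center_greedy(feats, k=100)
```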
Updated: 2024-05-31 21:26:39
Categories: cs.CV,cs.LG,cs.RO
Diffusion Policies creating a Trust Region for Offline Reinforcement Learning
Offline reinforcement learning (RL) leverages pre-collected datasets to train optimal policies. Diffusion Q-Learning (DQL), introducing diffusion models as a powerful and expressive policy class, significantly boosts the performance of offline RL. However, its reliance on iterative denoising sampling to generate actions slows down both training and inference. While several recent attempts have tried to accelerate diffusion-QL, the improvement in training and/or inference speed often results in degraded performance. In this paper, we introduce a dual policy approach, Diffusion Trusted Q-Learning (DTQL), which comprises a diffusion policy for pure behavior cloning and a practical one-step policy. We bridge the two policies with a newly introduced diffusion trust region loss. The diffusion policy maintains expressiveness, while the trust region loss directs the one-step policy to explore freely and seek modes within the region defined by the diffusion policy. DTQL eliminates the need for iterative denoising sampling during both training and inference, making it remarkably computationally efficient. We evaluate its effectiveness and algorithmic characteristics against popular Kullback-Leibler (KL) based distillation methods in 2D bandit scenarios and gym tasks. We then show that DTQL could not only outperform other methods on the majority of the D4RL benchmark tasks but also demonstrate efficiency in training and inference speeds. The PyTorch implementation is available at https://github.com/TianyuCodings/Diffusion_Trusted_Q_Learning.
Updated: 2024-05-31 21:23:55
Categories: cs.LG,cs.AI
Exfiltration of personal information from ChatGPT via prompt injection
We report that ChatGPT 4 and 4o are susceptible to a prompt injection attack that allows an attacker to query users' personal data. It is applicable without the use of any third-party tools, and all users are currently affected. This vulnerability is exacerbated by the recent introduction of ChatGPT's memory feature, which allows an attacker to command ChatGPT to monitor the user for the desired personal data.
Updated: 2024-05-31 21:21:19
Categories: cs.CR,cs.AI,cs.CL,cs.CY,cs.ET
BECoTTA: Input-dependent Online Blending of Experts for Continual Test-time Adaptation
Continual Test Time Adaptation (CTTA) is required to adapt efficiently to continuous unseen domains while retaining previously learned knowledge. However, despite the progress of CTTA, it is still challenging to deploy models with improved forgetting-adaptation trade-offs and efficiency. In addition, current CTTA scenarios assume only disjoint domain shifts, even though real-world domains change seamlessly. To address these challenges, this paper proposes BECoTTA, an input-dependent and efficient modular framework for CTTA. We propose Mixture-of-Domain Low-rank Experts (MoDE), which contains two core components: (i) Domain-Adaptive Routing, which helps to selectively capture domain-adaptive knowledge with multiple domain routers, and (ii) a Domain-Expert Synergy Loss to maximize the dependency between each domain and expert. We validate that our method outperforms existing approaches across multiple CTTA scenarios, including disjoint and gradual domain shifts, while requiring ~98% fewer trainable parameters. We also provide analyses of our method, including the construction of experts, the effect of domain-adaptive experts, and visualizations.
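A generic input-dependent mixture of low-rank experts in PyTorch, sketched under our own assumptions about layer shapes; BECoTTA's MoDE components (domain routers and the synergy loss) are richer than this.

```python
# Sketch: input-dependent blending of low-rank experts with top-k routing.
import torch
import torch.nn as nn

class LowRankMoE(nn.Module):
    def __init__(self, dim=64, rank=4, n_experts=3, k=2):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)
        self.down = nn.ModuleList(nn.Linear(dim, rank, bias=False)
                                  for _ in range(n_experts))
        self.up = nn.ModuleList(nn.Linear(rank, dim, bias=False)
                                for _ in range(n_experts))
        self.k = k

    def forward(self, x):                     # x: (batch, dim)
        gate = self.router(x).softmax(-1)     # input-dependent routing weights
        top = gate.topk(self.k, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            idx, w = top.indices[:, slot], top.values[:, slot]
            for e in range(len(self.down)):
                sel = idx == e
                if sel.any():
                    out[sel] += w[sel, None] * self.up[e](self.down[e](x[sel]))
        return x + out                        # residual low-rank update

y = LowRankMoE()(torch.randn(8, 64))
```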
Updated: 2024-05-31 21:14:42
Categories: cs.LG,cs.CV
Fairness Without Harm: An Influence-Guided Active Sampling Approach
The pursuit of fairness in machine learning (ML), ensuring that the models do not exhibit biases toward protected demographic groups, typically results in a compromise scenario. This compromise can be explained by a Pareto frontier where given certain resources (e.g., data), reducing the fairness violations often comes at the cost of lowering the model accuracy. In this work, we aim to train models that mitigate group fairness disparity without causing harm to model accuracy. Intuitively, acquiring more data is a natural and promising approach to achieve this goal by reaching a better Pareto frontier of the fairness-accuracy tradeoff. The current data acquisition methods, such as fair active learning approaches, typically require annotating sensitive attributes. However, these sensitive attribute annotations should be protected due to privacy and safety concerns. In this paper, we propose a tractable active data sampling algorithm that does not rely on training group annotations, instead only requiring group annotations on a small validation set. Specifically, the algorithm first scores each new example by its influence on fairness and accuracy evaluated on the validation dataset, and then selects a certain number of examples for training. We theoretically analyze how acquiring more data can improve fairness without causing harm, and validate the possibility of our sampling approach in the context of risk disparity. We also provide the upper bound of generalization error and risk disparity as well as the corresponding connections. Extensive experiments on real-world data demonstrate the effectiveness of our proposed algorithm.
Updated: 2024-05-31 21:11:10
Categories: cs.LG,cs.AI
Domain-Independent Dynamic Programming
For combinatorial optimization problems, model-based paradigms such as mixed-integer programming (MIP) and constraint programming (CP) aim to decouple modeling and solving a problem: the 'holy grail' of declarative problem solving. We propose domain-independent dynamic programming (DIDP), a new model-based paradigm based on dynamic programming (DP). While DP is not new, it has typically been implemented as a problem-specific method. We introduce Dynamic Programming Description Language (DyPDL), a formalism to define DP models based on a state transition system, inspired by AI planning. We show that heuristic search algorithms can be used to solve DyPDL models and propose seven DIDP solvers. We experimentally compare our DIDP solvers with commercial MIP and CP solvers (solving MIP and CP models, respectively) on common benchmark instances of eleven combinatorial optimization problem classes. We show that DIDP outperforms MIP in nine problem classes, CP also in nine problem classes, and both MIP and CP in seven.
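The DyPDL idea in miniature: a model is a state space, transitions, and a base case, and a generic solver searches the induced space. The sketch below encodes a 0/1 knapsack and solves it by memoized recursion rather than the paper's heuristic-search DIDP solvers.

```python
# Sketch: a DP model as (state, transitions, base case) + generic search.
from functools import lru_cache

weights, profits, capacity = (3, 4, 5, 6), (2, 3, 4, 5), 10

@lru_cache(maxsize=None)
def value(i, remaining):
    if i == len(weights):                       # base case: no items left
        return 0
    best = value(i + 1, remaining)              # transition: skip item i
    if weights[i] <= remaining:                 # transition: take item i
        best = max(best, profits[i] + value(i + 1, remaining - weights[i]))
    return best

print(value(0, capacity))   # optimal profit: 8 (items with weights 4 and 6)
```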
Updated: 2024-05-31 21:05:34
Categories: cs.AI,F.2.2; I.2.8
Training neural networks with structured noise improves classification and generalization
The beneficial role of noise-injection in learning is a consolidated concept in the field of artificial neural networks, suggesting that even biological systems might take advantage of similar mechanisms to optimize their performance. The training-with-noise algorithm proposed by Gardner and collaborators is an emblematic example of a noise-injection procedure in recurrent networks, which can be used to model biological neural systems. We show how adding structure to noisy training data can substantially improve the algorithm performance, allowing the network to approach perfect retrieval of the memories and wide basins of attraction, even in the scenario of maximal injected noise. We also prove that the so-called Hebbian Unlearning rule coincides with the training-with-noise algorithm when noise is maximal and data are stable fixed points of the network dynamics.
Updated: 2024-05-31 21:01:45
Categories: cond-mat.dis-nn,cs.LG
Meta-Learning Linear Quadratic Regulators: A Policy Gradient MAML Approach for Model-free LQR
We investigate the problem of learning linear quadratic regulators (LQR) in a multi-task, heterogeneous, and model-free setting. We characterize the stability and personalization guarantees of a policy gradient-based (PG) model-agnostic meta-learning (MAML) (Finn et al., 2017) approach for the LQR problem under different task-heterogeneity settings. We show that our MAML-LQR algorithm produces a stabilizing controller close to each task-specific optimal controller up to a task-heterogeneity bias in both model-based and model-free learning scenarios. Moreover, in the model-based setting, we show that such a controller is achieved with a linear convergence rate, which improves upon sub-linear rates from existing work. Our theoretical guarantees demonstrate that the learned controller can efficiently adapt to unseen LQR tasks.
Updated: 2024-05-31 20:54:20
Categories: math.OC,cs.LG
Transformers Learn Higher-Order Optimization Methods for In-Context Learning: A Study with Linear Models
Transformers excel at in-context learning (ICL) -- learning from demonstrations without parameter updates -- but how they do so remains a mystery. Recent work suggests that Transformers may internally run Gradient Descent (GD), a first-order optimization method, to perform ICL. In this paper, we instead demonstrate that Transformers learn to approximate higher-order optimization methods for ICL. For in-context linear regression, Transformers share a similar convergence rate as Iterative Newton's Method; both are exponentially faster than GD. Empirically, predictions from successive Transformer layers closely match different iterations of Newton's Method linearly, with each middle layer roughly computing 3 iterations; thus, Transformers and Newton's method converge at roughly the same rate. In contrast, Gradient Descent converges exponentially more slowly. We also show that Transformers can learn in-context on ill-conditioned data, a setting where Gradient Descent struggles but Iterative Newton succeeds. Finally, to corroborate our empirical findings, we prove that Transformers can implement $k$ iterations of Newton's method with $k + \mathcal{O}(1)$ layers.
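Iterative Newton's Method for least squares amounts to Newton's iteration for the matrix inverse, $M_{k+1} = 2M_k - M_k A M_k$ with $A = X^\top X$; a minimal numpy sketch (the initialization is a standard choice that guarantees convergence):

```python
# Newton's iteration for the matrix inverse, applied to linear regression.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
w_true = rng.normal(size=5)
y = X @ w_true

A = X.T @ X
M = A.T / (np.linalg.norm(A, 1) * np.linalg.norm(A, np.inf))  # convergent init
for _ in range(15):
    M = 2 * M - M @ A @ M            # quadratic convergence to A^{-1}

w_hat = M @ X.T @ y
print(np.linalg.norm(w_hat - w_true))   # ~0 after ~15 iterations
```

Each iteration roughly doubles the number of correct digits, which is the exponential speedup over gradient descent that the paper matches to successive Transformer layers.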
Updated: 2024-05-31 20:37:54
Categories: cs.LG,cs.AI,cs.CL
Decentralized Sporadic Federated Learning: A Unified Algorithmic Framework with Convergence Guarantees
Decentralized federated learning (DFL) captures FL settings where both (i) model updates and (ii) model aggregations are exclusively carried out by the clients without a central server. Existing DFL works have mostly focused on settings where clients conduct a fixed number of local updates between local model exchanges, overlooking heterogeneity and dynamics in communication and computation capabilities. In this work, we propose Decentralized Sporadic Federated Learning (DSpodFL), a DFL methodology built on a generalized notion of sporadicity in both local gradient and aggregation processes. DSpodFL subsumes many existing decentralized optimization methods under a unified algorithmic framework by modeling the per-iteration (i) occurrence of gradient descent at each client and (ii) exchange of models between client pairs as arbitrary indicator random variables, thus capturing heterogeneous and time-varying computation/communication scenarios. We analytically characterize the convergence behavior of DSpodFL for both convex and non-convex models, for both constant and diminishing learning rates, under mild assumptions on the communication graph connectivity, data heterogeneity across clients, and gradient noises, and show how our bounds recover existing results as special cases. Experiments demonstrate that DSpodFL consistently achieves improved training speeds compared with baselines under various system settings.
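A toy simulation of the sporadicity model under assumed probabilities: each client takes a local gradient step only when its computation indicator fires, and each client pair averages models only when its communication indicator fires.

```python
# Sketch: sporadic local SGD + sporadic pairwise gossip (toy quadratic task).
import numpy as np

rng = np.random.default_rng(0)
m, d, T = 8, 10, 200
targets = rng.normal(size=(m, d))             # each client's local optimum
theta = np.zeros((m, d))
p_comp = rng.uniform(0.2, 1.0, size=m)        # heterogeneous compute rates
p_comm = 0.3                                  # pairwise exchange probability

for _ in range(T):
    v = rng.random(m) < p_comp                # indicator: local gradient step
    grads = theta - targets                   # grad of 0.5 * ||theta - target||^2
    theta[v] -= 0.1 * grads[v]
    for i in range(m):                        # indicator: pairwise averaging
        for j in range(i + 1, m):
            if rng.random() < p_comm:
                avg = (theta[i] + theta[j]) / 2
                theta[i] = theta[j] = avg

print(np.linalg.norm(theta - targets.mean(0), axis=1).mean())  # consensus error
```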
Updated: 2024-05-31 20:36:30
Categories: cs.LG,cs.DC
DOCTOR: Dynamic On-Chip Temporal Variation Remediation Toward Self-Corrected Photonic Tensor Accelerators
Photonic computing has emerged as a promising solution for accelerating computation-intensive artificial intelligence (AI) workloads, offering unparalleled speed and energy efficiency, especially in resource-limited, latency-sensitive edge computing environments. However, the deployment of analog photonic tensor accelerators encounters reliability challenges due to hardware noise and environmental variations. While off-chip noise-aware training and on-chip training have been proposed to enhance the variation tolerance of optical neural accelerators with moderate, static noise, we observe a notable performance degradation over time due to temporally drifting variations, which requires a real-time, in-situ calibration mechanism. To tackle these challenging reliability issues, we propose, for the first time, a lightweight dynamic on-chip remediation framework, dubbed DOCTOR, providing adaptive, in-situ accuracy recovery against temporally drifting noise. The DOCTOR framework intelligently monitors the chip status using adaptive probing and performs fast in-situ training-free calibration to restore accuracy when necessary. Recognizing nonuniform spatial variation distributions across devices and tensor cores, we also propose a variation-aware architectural remapping strategy to avoid executing critical tasks on noisy devices. Extensive experiments show that our proposed framework can guarantee sustained performance under drifting variations with 34% higher accuracy and 2-3 orders-of-magnitude lower overhead compared to state-of-the-art on-chip training methods. Our code is open-sourced at https://github.com/ScopeX-ASU/DOCTOR.
Updated: 2024-05-31 20:24:47
Categories: cs.ET,cs.AI,cs.LG
Rethinking the Starting Point: Collaborative Pre-Training for Federated Downstream Tasks
A few recent studies have demonstrated that leveraging centrally pre-trained models can offer advantageous initializations for federated learning (FL). However, existing pre-training methods do not generalize well when faced with an arbitrary set of downstream FL tasks. Specifically, they often (i) achieve limited average accuracy, particularly when there are unseen downstream labels, and (ii) result in significant accuracy variance, failing to provide a balanced performance across clients. To address these challenges, we propose CoPreFL, a collaborative/distributed pre-training approach which provides a robust initialization for downstream FL tasks. The key idea of CoPreFL is a model-agnostic meta-learning (MAML) procedure that tailors the global model to closely mimic heterogeneous and unseen FL scenarios, resulting in a pre-trained model that is rapidly adaptable to arbitrary FL tasks. Our MAML procedure incorporates performance variance into the meta-objective function, balancing performance across clients rather than solely optimizing for accuracy. Through extensive experiments, we demonstrate that CoPreFL obtains significant improvements in both average accuracy and variance across arbitrary downstream FL tasks with unseen/seen labels, compared with various pre-training baselines. We also show how CoPreFL is compatible with different well-known FL algorithms applied by the downstream tasks, enhancing performance in each case.
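The variance-aware meta-objective in miniature, with an assumed weighting term `gamma`: balance mean client loss against cross-client variance rather than optimizing accuracy alone.

```python
# Sketch: variance-regularized meta-objective over per-client losses.
import torch

client_losses = torch.tensor([0.8, 0.3, 1.2, 0.5])   # assumed per-client losses
gamma = 0.5                                           # assumed balance weight
meta_objective = client_losses.mean() + gamma * client_losses.var()
```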
Updated: 2024-05-31 20:16:17
Fields: cs.LG
Online Learning with Bounded Recall
We study the problem of full-information online learning in the "bounded recall" setting popular in the study of repeated games. An online learning algorithm $\mathcal{A}$ is $M$-$\textit{bounded-recall}$ if its output at time $t$ can be written as a function of the $M$ previous rewards (and not e.g. any other internal state of $\mathcal{A}$). We first demonstrate that a natural approach to constructing bounded-recall algorithms from mean-based no-regret learning algorithms (e.g., running Hedge over the last $M$ rounds) fails, and that any such algorithm incurs constant regret per round. We then construct a stationary bounded-recall algorithm that achieves a per-round regret of $\Theta(1/\sqrt{M})$, which we complement with a tight lower bound. Finally, we show that, unlike in the perfect-recall setting, any low-regret bounded-recall algorithm must be aware of the ordering of the past $M$ losses -- any bounded-recall algorithm which plays a symmetric function of the past $M$ losses must incur constant regret per round.
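For concreteness, a minimal sketch of the natural baseline the paper analyzes (and shows to be suboptimal): Hedge run over only the last $M$ reward vectors. Variable names and the learning rate are illustrative.

```python
import numpy as np

# Sketch of the M-bounded-recall Hedge baseline: the next action distribution
# depends only on the last M reward vectors, i.e., a sliding-window Hedge.
def bounded_recall_hedge_weights(recent_rewards, eta):
    """recent_rewards: array of shape (<=M, n_actions) with the most recent
    reward vectors; returns the action distribution for the next round."""
    cum = recent_rewards.sum(axis=0)          # only M rounds of memory
    w = np.exp(eta * (cum - cum.max()))       # numerically stabilized softmax
    return w / w.sum()

M, n = 8, 3
rng = np.random.default_rng(1)
history = rng.uniform(size=(M, n))
print(bounded_recall_hedge_weights(history, eta=np.sqrt(np.log(n) / M)))
```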
Updated: 2024-05-31 19:55:56
Fields: cs.LG,cs.GT,stat.ML
Studying K-FAC Heuristics by Viewing Adam through a Second-Order Lens
Research into optimisation for deep learning is characterised by a tension between the computational efficiency of first-order, gradient-based methods (such as SGD and Adam) and the theoretical efficiency of second-order, curvature-based methods (such as quasi-Newton methods and K-FAC). Noting that second-order methods often only function effectively with the addition of stabilising heuristics (such as Levenberg-Marquardt damping), we ask how much these (as opposed to the second-order curvature model) contribute to second-order algorithms' performance. We thus study AdamQLR: an optimiser combining damping and learning rate selection techniques from K-FAC (Martens & Grosse, 2015) with the update directions proposed by Adam, inspired by considering Adam through a second-order lens. We evaluate AdamQLR on a range of regression and classification tasks at various scales and under various hyperparameter tuning methodologies, concluding that K-FAC's adaptive heuristics are of variable standalone effectiveness, and finding that an untuned AdamQLR setting can achieve performance-versus-runtime comparable to tuned benchmarks.
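As a concrete illustration of K-FAC-style step-size selection applied to a given direction, consider a toy quadratic loss $f(x) = \tfrac{1}{2}x^\top H x - b^\top x$. Treating $H$ as known is an illustration-only assumption (K-FAC would approximate curvature); the damped quadratic-model formula follows the usual Levenberg-Marquardt recipe.

```python
import numpy as np

# Hedged sketch: pick the step size along an Adam-like direction d by
# minimizing the local damped quadratic model,
#   alpha* = -(g . d) / (d^T (H + lam I) d).
H = np.diag([10.0, 1.0])
b = np.array([1.0, 1.0])
x = np.zeros(2)
lam = 1e-3                    # Levenberg-Marquardt damping
g = H @ x - b                 # gradient at x
d = -np.sign(g)               # crude stand-in for an Adam-like direction
alpha = -(g @ d) / (d @ (H + lam * np.eye(2)) @ d)
x_new = x + alpha * d
print(alpha, x_new)
```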
Updated: 2024-05-31 19:31:38
Fields: cs.LG,stat.ML
Federated Generative Learning with Foundation Models
Existing approaches in Federated Learning (FL) mainly focus on sending model parameters or gradients from clients to a server. However, these methods are plagued by significant inefficiency, privacy, and security concerns. Leveraging emerging foundation generative models, we propose a novel federated learning framework, namely Federated Generative Learning. In this framework, each client can create text embeddings that are tailored to their local data and send these embeddings to the server. Then, informative training data can be synthesized remotely on the server using foundation generative models with these embeddings, which can benefit FL tasks. Our proposed framework offers several advantages, including increased communication efficiency, robustness to data heterogeneity, substantial performance improvements, and enhanced privacy protection. We validate these benefits through extensive experiments conducted on 12 datasets. For example, on the ImageNet100 dataset with a highly skewed data distribution, our method in a single communication round outperforms FedAvg by 12%, even when FedAvg is run for 200 communication rounds. We have released the code for all experiments conducted in this study.
Updated: 2024-05-31 18:49:21
Fields: cs.LG,cs.AI
Bridging Model-Based Optimization and Generative Modeling via Conservative Fine-Tuning of Diffusion Models
AI-driven design problems, such as DNA/protein sequence design, are commonly tackled from two angles: generative modeling, which efficiently captures the feasible design space (e.g., natural images or biological sequences), and model-based optimization, which utilizes reward models for extrapolation. To combine the strengths of both approaches, we adopt a hybrid method that fine-tunes cutting-edge diffusion models by optimizing reward models through RL. Although prior work has explored similar avenues, they primarily focus on scenarios where accurate reward models are accessible. In contrast, we concentrate on an offline setting where a reward model is unknown, and we must learn from static offline datasets, a common scenario in scientific domains. In offline scenarios, existing approaches tend to suffer from overoptimization, as they may be misled by the reward model in out-of-distribution regions. To address this, we introduce a conservative fine-tuning approach, BRAID, by optimizing a conservative reward model, which includes additional penalization outside of offline data distributions. Through empirical and theoretical analysis, we demonstrate the capability of our approach to outperform the best designs in offline data, leveraging the extrapolation capabilities of reward models while avoiding the generation of invalid designs through pre-trained diffusion models.
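The conservatism idea can be sketched in a few lines: penalize the learned reward outside the offline data distribution. Using a nearest-neighbor distance as the out-of-distribution penalty is an illustrative choice, not the paper's exact penalization.

```python
import numpy as np

# Hedged sketch of a conservative reward in the spirit of BRAID:
#   r_hat(x) = r(x) - alpha * penalty(x),
# where the penalty grows away from the offline dataset.
def conservative_reward(x, reward_fn, offline_data, alpha=1.0):
    dists = np.linalg.norm(offline_data - x, axis=1)
    ood_penalty = dists.min()          # distance to closest offline sample
    return reward_fn(x) - alpha * ood_penalty

offline = np.random.default_rng(2).normal(size=(100, 4))
r = lambda x: float(x.sum())           # stand-in reward model
print(conservative_reward(np.zeros(4), r, offline))          # in-distribution
print(conservative_reward(10 * np.ones(4), r, offline))      # heavily penalized
```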
Updated: 2024-05-31 18:34:35
Fields: cs.LG,cs.AI,stat.ML
TENNs-PLEIADES: Building Temporal Kernels with Orthogonal Polynomials
We introduce a neural network named PLEIADES (PoLynomial Expansion In Adaptive Distributed Event-based Systems), belonging to the TENNs (Temporal Neural Networks) architecture. We focus on interfacing these networks with event-based data to perform online spatiotemporal classification and detection with low latency. By virtue of using structured temporal kernels and event-based data, we have the freedom to vary the sample rate of the data along with the discretization step-size of the network without additional finetuning. We experimented with three event-based benchmarks and obtained state-of-the-art results on all three by large margins with significantly smaller memory and compute costs. We achieved: 1) 99.59% accuracy with 192K parameters on the DVS128 hand gesture recognition dataset and 100% with a small additional output filter; 2) 99.58% test accuracy with 277K parameters on the AIS 2024 eye tracking challenge; and 3) 0.556 mAP with 576k parameters on the PROPHESEE 1 Megapixel Automotive Detection Dataset.
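The sample-rate flexibility follows from representing each temporal kernel as a continuous function: a fixed linear combination of orthogonal polynomials that can be re-discretized at any step size. A minimal sketch with Legendre polynomials (the coefficients are random stand-ins for learned ones):

```python
import numpy as np
from numpy.polynomial import legendre

# A temporal kernel k(t) = sum_i c_i P_i(t) on normalized support [-1, 1].
# Because k is continuous in t, changing the discretization step only means
# re-evaluating the same polynomials -- no retraining needed.
coeffs = np.array([0.5, -0.3, 0.8, 0.1])   # illustrative expansion coefficients

def kernel(num_taps):
    t = np.linspace(-1.0, 1.0, num_taps)
    return legendre.legval(t, coeffs)

print(kernel(16))   # coarse discretization
print(kernel(64))   # finer discretization of the same kernel
```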
Updated: 2024-05-31 18:29:13
Fields: cs.LG,cs.AI
Resampling methods for Private Statistical Inference
We consider the task of constructing confidence intervals with differential privacy. We propose two private variants of the non-parametric bootstrap, which privately compute the median of the results of multiple "little" bootstraps run on partitions of the data and give asymptotic bounds on the coverage error of the resulting confidence intervals. For a fixed differential privacy parameter $\epsilon$, our methods enjoy the same error rates as that of the non-private bootstrap to within logarithmic factors in the sample size $n$. We empirically validate the performance of our methods for mean estimation, median estimation, and logistic regression with both real and synthetic data. Our methods achieve similar coverage accuracy to existing methods (and non-private baselines) while providing notably shorter ($\gtrsim 10$ times) confidence intervals than previous approaches.
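A loose sketch of the pipeline described in the abstract: partition the data, run one "little" bootstrap per partition, and privately release the median of the resulting estimates. Here the private median uses the exponential mechanism with a rank-based utility (sensitivity 1); the partitioning and mechanism details are illustrative, not the paper's exact construction.

```python
import numpy as np

# Hedged sketch: private median of little-bootstrap estimates.
def private_median(estimates, eps, grid):
    est = np.sort(np.asarray(estimates))
    ranks = np.searchsorted(est, grid)            # #estimates below each candidate
    utility = -np.abs(ranks - len(est) / 2.0)     # closeness to median in rank
    p = np.exp(eps * utility / 2.0)
    p /= p.sum()
    return np.random.default_rng(3).choice(grid, p=p)

rng = np.random.default_rng(4)
data = rng.normal(loc=1.0, size=1000)
parts = np.array_split(data, 10)                  # disjoint partitions
little = [rng.choice(p, size=p.size).mean() for p in parts]  # one bootstrap mean each
print(private_median(little, eps=1.0, grid=np.linspace(0, 2, 201)))
```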
Updated: 2024-05-31 17:59:36
Fields: stat.ML,cs.CR,cs.LG,stat.ME
Generalization Beyond Data Imbalance: A Controlled Study on CLIP for Transferable Insights
Severe data imbalance naturally exists among web-scale vision-language datasets. Despite this, we find CLIP pre-trained thereupon exhibits notable robustness to the data imbalance compared to supervised learning, and demonstrates significant effectiveness in learning generalizable representations. With an aim to investigate the reasons behind this finding, we conduct controlled experiments to study various underlying factors, and reveal that CLIP's pretext task forms a dynamic classification problem wherein only a subset of classes is present in training. This isolates the bias from dominant classes and implicitly balances the learning signal. Furthermore, the robustness and discriminability of CLIP improve with more descriptive language supervision, larger data scale, and broader open-world concepts, which are inaccessible to supervised learning. Our study not only uncovers the mechanisms behind CLIP's generalizability beyond data imbalance but also provides transferable insights for the research community. The findings are validated in both supervised and self-supervised learning, enabling models trained on imbalanced data to achieve CLIP-level performance on diverse recognition tasks. Code will be available at: https://github.com/CVMI-Lab/clip-beyond-tail.
Updated: 2024-05-31 17:57:24
Fields: cs.CV,cs.CL,cs.LG
Code Pretraining Improves Entity Tracking Abilities of Language Models
Recent work has provided indirect evidence that pretraining language models on code improves the ability of models to track state changes of discourse entities expressed in natural language. In this work, we systematically test this claim by comparing pairs of language models on their entity tracking performance. Critically, the pairs consist of base models and models trained on top of these base models with additional code data. We extend this analysis to additionally examine the effect of math training, another highly structured data type, and alignment tuning, an important step for enhancing the usability of models. We find clear evidence that models additionally trained on large amounts of code outperform the base models. On the other hand, we find no consistent benefit of additional math training or alignment tuning across various model families.
Updated: 2024-05-31 17:56:33
Fields: cs.CL,cs.AI
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Foundation models, now powering most of the exciting applications in deep learning, are almost universally based on the Transformer architecture and its core attention module. Many subquadratic-time architectures such as linear attention, gated convolution and recurrent models, and structured state space models (SSMs) have been developed to address Transformers' computational inefficiency on long sequences, but they have not performed as well as attention on important modalities such as language. We identify that a key weakness of such models is their inability to perform content-based reasoning, and make several improvements. First, simply letting the SSM parameters be functions of the input addresses their weakness with discrete modalities, allowing the model to selectively propagate or forget information along the sequence length dimension depending on the current token. Second, even though this change prevents the use of efficient convolutions, we design a hardware-aware parallel algorithm in recurrent mode. We integrate these selective SSMs into a simplified end-to-end neural network architecture without attention or even MLP blocks (Mamba). Mamba enjoys fast inference (5$\times$ higher throughput than Transformers) and linear scaling in sequence length, and its performance improves on real data up to million-length sequences. As a general sequence model backbone, Mamba achieves state-of-the-art performance across several modalities such as language, audio, and genomics. On language modeling, our Mamba-3B model outperforms Transformers of the same size and matches Transformers twice its size, both in pretraining and downstream evaluation.
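The "selective" idea, letting the SSM parameters be functions of the current input, can be sketched for a single scalar channel; the particular gating parameterizations below are illustrative stand-ins, not Mamba's exact design.

```python
import numpy as np

# Minimal sketch of a selective (input-dependent) SSM recurrence:
#   h_t = a(x_t) * h_{t-1} + b(x_t) * x_t,   y_t = c(x_t) * h_t.
# Because a, b, c depend on x_t, the model can propagate or forget
# information along the sequence depending on the current token.
def selective_scan(x, Wa, Wb, Wc):
    h, ys = 0.0, []
    for xt in x:
        a = 1.0 / (1.0 + np.exp(-(Wa * xt)))   # input-dependent forget gate in (0, 1)
        b = Wb * xt                            # input-dependent input gate
        c = Wc * xt                            # input-dependent output map
        h = a * h + b * xt
        ys.append(c * h)
    return np.array(ys)

x = np.array([0.1, 1.0, -0.5, 0.0, 2.0])
print(selective_scan(x, Wa=2.0, Wb=0.5, Wc=1.0))
```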
Updated: 2024-05-31 17:55:27
Fields: cs.LG,cs.AI
Recurrent neural networks: vanishing and exploding gradients are not the end of the story
Recurrent neural networks (RNNs) notoriously struggle to learn long-term memories, primarily due to vanishing and exploding gradients. The recent success of state-space models (SSMs), a subclass of RNNs, to overcome such difficulties challenges our theoretical understanding. In this paper, we delve into the optimization challenges of RNNs and discover that, as the memory of a network increases, changes in its parameters result in increasingly large output variations, making gradient-based learning highly sensitive, even without exploding gradients. Our analysis further reveals the importance of the element-wise recurrence design pattern combined with careful parametrizations in mitigating this effect. This feature is present in SSMs, as well as in other architectures, such as LSTMs. Overall, our insights provide a new explanation for some of the difficulties in gradient-based learning of RNNs and why some architectures perform better than others.
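A tiny numerical illustration of the phenomenon (not the paper's experiment): in a linear recurrence $h_t = \lambda h_{t-1} + x_t$, as $\lambda \to 1$ (longer memory) the output becomes increasingly sensitive to the parameter $\lambda$ itself, even though nothing explodes along the sequence.

```python
import numpy as np

# Forward-mode accumulation of d(h_t)/d(lam):
#   dh_t = lam * dh_{t-1} + h_{t-1}.
def output_and_sensitivity(lam, x):
    h, dh = 0.0, 0.0
    for xt in x:
        dh = lam * dh + h      # uses h_{t-1}, so update dh before h
        h = lam * h + xt
    return h, dh

x = np.ones(200)
for lam in (0.9, 0.99, 0.999):
    h, dh = output_and_sensitivity(lam, x)
    print(f"lam={lam}: output={h:.1f}, d(output)/d(lam)={dh:.1f}")
```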
Updated: 2024-05-31 17:53:00
Fields: cs.LG,cs.AI,math.OC
Neural Network Verification with Branch-and-Bound for General Nonlinearities
Branch-and-bound (BaB) is among the most effective methods for neural network (NN) verification. However, existing works on BaB have mostly focused on NNs with piecewise linear activations, especially ReLU networks. In this paper, we develop a general framework, named GenBaB, to conduct BaB for general nonlinearities in general computational graphs based on linear bound propagation. To decide which neuron to branch, we design a new branching heuristic which leverages linear bounds as shortcuts to efficiently estimate the potential improvement after branching. To decide nontrivial branching points for general nonlinear functions, we propose to optimize branching points offline, which can be efficiently leveraged during verification with a lookup table. We demonstrate the effectiveness of our GenBaB on verifying a wide range of NNs, including networks with activation functions such as Sigmoid, Tanh, Sine and GeLU, as well as networks involving multi-dimensional nonlinear operations such as multiplications in LSTMs and Vision Transformers. Our framework also allows the verification of general nonlinear computation graphs and enables verification applications beyond simple neural networks, particularly for AC Optimal Power Flow (ACOPF). GenBaB is part of the latest $\alpha,\!\beta$-CROWN, the winner of the 4th International Verification of Neural Networks Competition (VNN-COMP 2023).
Updated: 2024-05-31 17:51:07
Fields: cs.LG,cs.AI
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
While Transformers have been the main architecture behind deep learning's success in language modeling, state-space models (SSMs) such as Mamba have recently been shown to match or outperform Transformers at small to medium scale. We show that these families of models are actually quite closely related, and develop a rich framework of theoretical connections between SSMs and variants of attention, connected through various decompositions of a well-studied class of structured semiseparable matrices. Our state space duality (SSD) framework allows us to design a new architecture (Mamba-2) whose core layer is a refinement of Mamba's selective SSM that is 2-8X faster, while continuing to be competitive with Transformers on language modeling.
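The duality can be checked numerically in a few lines: the output of a scalar linear recurrence equals multiplication by a lower-triangular 1-semiseparable matrix, i.e., an attention-like masked matrix. This toy check illustrates the stated connection, not Mamba-2 itself.

```python
import numpy as np

# For h_t = a_t h_{t-1} + b_t x_t, y_t = c_t h_t, the matrix form is
#   M[t, s] = c_t * (a_{s+1} * ... * a_t) * b_s   for s <= t,  y = M x.
rng = np.random.default_rng(5)
T = 6
a = rng.uniform(0.5, 1.0, T)
b, c, x = rng.normal(size=T), rng.normal(size=T), rng.normal(size=T)

# Recurrent (scan) form
h, y_scan = 0.0, np.zeros(T)
for t in range(T):
    h = a[t] * h + b[t] * x[t]
    y_scan[t] = c[t] * h

# Matrix (dual) form
M = np.zeros((T, T))
for t in range(T):
    for s in range(t + 1):
        M[t, s] = c[t] * np.prod(a[s + 1 : t + 1]) * b[s]
y_mat = M @ x

print(np.allclose(y_scan, y_mat))   # True: scan == semiseparable matmul
```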
Updated: 2024-05-31 17:50:01
Fields: cs.LG
Generalization or Memorization: Data Contamination and Trustworthy Evaluation for Large Language Models
Recent statements about the impressive capabilities of large language models (LLMs) are usually supported by evaluating on open-access benchmarks. Considering the vast size and wide-ranging sources of LLMs' training data, it could explicitly or implicitly include test data, leading to LLMs being more susceptible to data contamination. However, due to the opacity of training data, the black-box access of models, and the rapid growth of synthetic training data, detecting and mitigating data contamination for LLMs faces significant challenges. In this paper, we propose CDD, which stands for Contamination Detection via output Distribution for LLMs. CDD requires only sampled texts to detect data contamination, by identifying the peakedness of the LLM's output distribution. To mitigate the impact of data contamination in evaluation, we also present TED: Trustworthy Evaluation via output Distribution, based on the correction of the LLM's output distribution. To facilitate this study, we introduce two benchmarks, i.e., DetCon and ComiEval, for data contamination detection and contamination mitigation evaluation tasks. Extensive experimental results show that CDD achieves average relative improvements of 21.8\%-30.2\% over other contamination detection approaches in terms of Accuracy, F1 Score, and AUC metrics, and can effectively detect implicit contamination. TED substantially mitigates performance inflation of up to 66.9\% attributed to data contamination across various contamination setups. In real-world applications, we reveal that ChatGPT exhibits a high potential to suffer from data contamination on the HumanEval benchmark.
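The intuition behind peakedness-based detection can be sketched directly: a contaminated model tends to concentrate probability mass on the memorized continuation. The particular statistic below (max probability minus normalized entropy) is an assumption for illustration, not the paper's exact detector.

```python
import numpy as np

# Hedged sketch of a peakedness score over a model's next-token distribution.
def peakedness(token_probs):
    p = np.asarray(token_probs, dtype=float)
    p = p / p.sum()
    entropy = -(p * np.log(p + 1e-12)).sum() / np.log(p.size)  # normalized to [0, 1]
    return p.max() - entropy

print(peakedness([0.97, 0.01, 0.01, 0.01]))   # peaked: suggests memorization
print(peakedness([0.30, 0.25, 0.25, 0.20]))   # flat: suggests unseen data
```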
Updated: 2024-05-31 17:49:03
Fields: cs.CL,cs.AI,cs.CR,cs.LG,cs.SE
P4: Towards private, personalized, and Peer-to-Peer learning
Personalized learning is a proposed approach to address the problem of data heterogeneity in collaborative machine learning. In a decentralized setting, the two main challenges of personalization are client clustering and data privacy. In this paper, we address these challenges by developing P4 (Personalized Private Peer-to-Peer), a method that ensures that each client receives a personalized model while maintaining a differential privacy guarantee for each client's local dataset during and after training. Our approach includes the design of a lightweight algorithm to identify similar clients and group them in a private, peer-to-peer (P2P) manner. Once grouped, we develop differentially-private knowledge distillation for clients to co-train with minimal impact on accuracy. We evaluate our proposed method on three benchmark datasets (FEMNIST or Federated EMNIST, CIFAR-10 and CIFAR-100) and two different neural network architectures (Linear and CNN-based networks) across a range of privacy parameters. The results demonstrate the potential of P4, as it outperforms the state of the art in differentially private P2P learning by up to 40 percent in terms of accuracy. We also show the practicality of P4 by implementing it on resource-constrained devices, and validating that it has minimal overhead, e.g., about 7 seconds to run collaborative training between two clients.
Updated: 2024-05-31 17:47:52
Fields: cs.LG
An Organic Weed Control Prototype using Directed Energy and Deep Learning
Organic weed control is vital to improving crop yield with a sustainable approach. In this work, a directed energy weed control robot prototype specifically designed for organic farms is proposed. The robot uses a novel distributed array robot (DAR) unit for weed treatment. Soybean and corn databases are built to train deep learning neural nets to perform weed recognition. The initial deep learning neural nets show high performance in classifying crops. The robot uses a patented directed energy plant eradication recipe that is completely organic and UV-C free, with no chemical damage or physical disturbance to the soil. The deep learning model can classify 8 common weed species in a soybean field in a natural environment with up to 98% accuracy.
Updated: 2024-05-31 17:47:22
Fields: cs.RO,cs.AI,cs.CV
Dynamic Conditional Optimal Transport through Simulation-Free Flows
We study the geometry of conditional optimal transport (COT) and prove a dynamical formulation which generalizes the Benamou-Brenier Theorem. Equipped with these tools, we propose a simulation-free flow-based method for conditional generative modeling. Our method couples an arbitrary source distribution to a specified target distribution through a triangular COT plan, and a conditional generative model is obtained by approximating the geodesic path of measures induced by this COT plan. Our theory and methods are applicable in infinite-dimensional settings, making them well suited for a wide class of Bayesian inverse problems. Empirically, we demonstrate that our method is competitive on several challenging conditional generation tasks, including an infinite-dimensional inverse problem.
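The "simulation-free" training objective can be sketched as conditional flow matching along straight paths; note that the paper's triangular COT coupling is replaced here by an independent source/target coupling purely for illustration, and the toy conditional target is an assumption.

```python
import torch
import torch.nn as nn

# Hedged sketch: train a conditional vector field v(t, x, y) by regression
# onto the velocity of straight-line paths x_t = (1 - t) x0 + t x1.
dim, cond_dim = 2, 1
v = nn.Sequential(nn.Linear(dim + cond_dim + 1, 64), nn.Tanh(), nn.Linear(64, dim))
opt = torch.optim.Adam(v.parameters(), lr=1e-3)

for step in range(200):
    x0 = torch.randn(128, dim)                           # source samples
    y = torch.randn(128, cond_dim)                       # conditioning variable
    x1 = y.repeat(1, dim) + 0.1 * torch.randn(128, dim)  # toy conditional target
    t = torch.rand(128, 1)
    xt = (1 - t) * x0 + t * x1                           # point on the path
    target = x1 - x0                                     # path velocity
    loss = ((v(torch.cat([xt, y, t], dim=1)) - target) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

print(loss.item())
```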
Updated: 2024-05-31 17:43:54
Fields: cs.LG
Spectrum-Aware Parameter Efficient Fine-Tuning for Diffusion Models
Adapting large-scale pre-trained generative models in a parameter-efficient manner is gaining traction. Traditional methods like low rank adaptation achieve parameter efficiency by imposing constraints but may not be optimal for tasks requiring high representation capacity. We propose a novel spectrum-aware adaptation framework for generative models. Our method adjusts both singular values and their basis vectors of pretrained weights. Using the Kronecker product and efficient Stiefel optimizers, we achieve parameter-efficient adaptation of orthogonal matrices. We introduce Spectral Orthogonal Decomposition Adaptation (SODA), which balances computational efficiency and representation capacity. Extensive evaluations on text-to-image diffusion models demonstrate SODA's effectiveness, offering a spectrum-aware alternative to existing fine-tuning methods.
Updated: 2024-05-31 17:43:35
Fields: cs.CV,cs.LG
Grammar-Aligned Decoding
Large Language Models (LLMs) struggle with reliably generating highly structured outputs, such as program code, mathematical formulas, or well-formed markup. Constrained decoding approaches mitigate this problem by greedily restricting what tokens an LLM can output at each step to guarantee that the output matches a given constraint. Specifically, in grammar-constrained decoding (GCD), the LLM's output must follow a given grammar. In this paper we demonstrate that GCD techniques (and in general constrained decoding techniques) can distort the LLM's distribution, leading to outputs that are grammatical but appear with likelihoods that are not proportional to the ones given by the LLM, and so ultimately are low-quality. We call the problem of aligning sampling with a grammar constraint, grammar-aligned decoding (GAD), and propose adaptive sampling with approximate expected futures (ASAp), a decoding algorithm that guarantees the output to be grammatical while provably producing outputs that match the conditional probability of the LLM's distribution conditioned on the given grammar constraint. Our algorithm uses prior sample outputs to soundly overapproximate the future grammaticality of different output prefixes. Our evaluation on code generation and structured NLP tasks shows how ASAp often produces outputs with higher likelihood (according to the LLM's distribution) than existing GCD techniques, while still enforcing the desired grammatical constraints.
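The distortion the paper identifies can be seen in a two-token worked example: per-step masking and renormalization (GCD) can wildly overweight a grammatical string relative to the LLM's distribution conditioned on grammaticality. The toy probabilities below are illustrative.

```python
# Vocabulary {a, b}; the grammar accepts only "ab" and "bb".
p_first = {"a": 0.9, "b": 0.1}
p_second = {"a": {"a": 0.99, "b": 0.01}, "b": {"a": 0.1, "b": 0.9}}

# Target (GAD): the LLM's distribution conditioned on grammaticality.
p_ab = p_first["a"] * p_second["a"]["b"]   # 0.009
p_bb = p_first["b"] * p_second["b"]["b"]   # 0.090
print("target P(ab | grammatical) =", p_ab / (p_ab + p_bb))   # ~0.09

# GCD: step 1 is unconstrained (both prefixes can still complete); after "a"
# only "b" is grammatical, so its probability is renormalized to 1.
print("GCD    P(ab)               =", p_first["a"] * 1.0)      # 0.9
```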
Updated: 2024-05-31 17:39:15
Fields: cs.AI,cs.CL,cs.LG
Exploratory Preference Optimization: Harnessing Implicit Q*-Approximation for Sample-Efficient RLHF
Reinforcement learning from human feedback (RLHF) has emerged as a central tool for language model alignment. We consider online exploration in RLHF, which exploits interactive access to human or AI feedback by deliberately encouraging the model to produce diverse, maximally informative responses. By allowing RLHF to confidently stray from the pre-trained model, online exploration offers the possibility of novel, potentially super-human capabilities, but its full potential as a paradigm for language model training has yet to be realized, owing to computational and statistical bottlenecks in directly adapting existing reinforcement learning techniques. We propose a new algorithm for online exploration in RLHF, Exploratory Preference Optimization (XPO), which is simple and practical -- a one-line change to (online) Direct Preference Optimization (DPO; Rafailov et al., 2023) -- yet enjoys the strongest known provable guarantees and promising empirical performance. XPO augments the DPO objective with a novel and principled exploration bonus, empowering the algorithm to explore outside the support of the initial model and human feedback data. In theory, we show that XPO is provably sample-efficient and converges to a near-optimal language model policy under natural exploration conditions, irrespective of whether the initial model has good coverage. Our analysis, which builds on the observation that DPO implicitly performs a form of $Q^{\star}$-approximation (or, Bellman error minimization), combines previously disparate techniques from language modeling and theoretical reinforcement learning in a serendipitous fashion through the perspective of KL-regularized Markov decision processes. Empirically, we find that XPO is more sample-efficient than non-exploratory DPO variants in a preliminary evaluation.
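A hedged sketch of the advertised "one-line change": the usual DPO logistic loss plus an exploration term on freshly sampled responses. The sign and exact form of the bonus, and the value of `alpha`, are assumptions for illustration; see the paper for the principled version.

```python
import torch
import torch.nn.functional as F

# Sketch: online DPO objective augmented with an optimism-style bonus that
# penalizes log-probability mass the model already places on fresh samples,
# nudging it to explore outside its current support.
def xpo_style_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, logp_fresh,
                   beta=0.1, alpha=1e-3):
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    dpo = -F.logsigmoid(margin).mean()        # standard DPO term
    return dpo + alpha * logp_fresh.mean()    # <- the one-line change

# Toy call with fake per-sequence log-probabilities:
lp = lambda: -10 * torch.rand(4)
print(xpo_style_loss(lp(), lp(), lp(), lp(), lp()))
```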
Updated: 2024-05-31 17:39:06
Fields: cs.LG,cs.AI,cs.CL,stat.ML
An Attention-Based Multi-Context Convolutional Encoder-Decoder Neural Network for Work Zone Traffic Impact Prediction
Work zones are among the major causes of non-recurrent traffic congestion and road incidents. Despite the significance of their impact, studies on predicting the traffic impact of work zones remain scarce. In this paper, we propose a data integration pipeline that enhances the utilization of work zone and traffic data from diversified platforms, and introduce a novel deep learning model to predict the traffic speed and incident likelihood during planned work zone events. The proposed model transforms traffic patterns into 2D space-time images for both model input and output and employs an attention-based multi-context convolutional encoder-decoder architecture to capture the spatial-temporal dependencies between work zone events and traffic variations. Trained and validated on four years of archived work zone traffic data from Maryland, USA, the model demonstrates superior performance over baseline models in predicting traffic speed, incident likelihood, and inferred traffic attributes such as queue length and congestion timings (i.e., start time and duration). Specifically, the proposed model outperforms the baseline models by reducing the prediction error of traffic speed by 5% to 34%, queue length by 11% to 29%, and congestion timing by 6% to 17%, and by increasing the accuracy of incident predictions by 5% to 7%. Consequently, this model offers substantial promise for enhancing the planning and traffic management of work zones.
Updated: 2024-05-31 17:38:49
Fields: cs.LG
Target Networks and Over-parameterization Stabilize Off-policy Bootstrapping with Function Approximation
We prove that the combination of a target network and over-parameterized linear function approximation establishes a weaker convergence condition for bootstrapped value estimation in certain cases, even with off-policy data. Our condition is naturally satisfied for expected updates over the entire state-action space or learning with a batch of complete trajectories from episodic Markov decision processes. Notably, using only a target network or an over-parameterized model does not provide such a convergence guarantee. Additionally, we extend our results to learning with truncated trajectories, showing that convergence is achievable for all tasks with minor modifications, akin to value truncation for the final states in trajectories. Our primary result focuses on temporal difference estimation for prediction, providing high-probability value estimation error bounds and empirical analysis on Baird's counterexample and a Four-room task. Furthermore, we explore the control setting, demonstrating that similar convergence conditions apply to Q-learning.
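For concreteness, a minimal sketch of the setting: TD(0) value estimation with linear function approximation and a target network (a frozen copy of the weights, refreshed periodically). Over-parameterization is mimicked by a redundant feature map; all constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)
n_states, n_feats, gamma = 5, 12, 0.9
phi = rng.normal(size=(n_states, n_feats))   # over-parameterized features
w = np.zeros(n_feats)                        # online weights
w_target = w.copy()                          # target-network weights

for step in range(2000):
    s = rng.integers(n_states)               # off-policy-style state sampling
    s_next = (s + 1) % n_states              # deterministic cycle dynamics
    r = 1.0 if s_next == 0 else 0.0
    td_target = r + gamma * phi[s_next] @ w_target   # bootstrap from frozen copy
    w += 0.05 * (td_target - phi[s] @ w) * phi[s]
    if step % 100 == 0:
        w_target = w.copy()                  # periodic target refresh

print(phi @ w)                               # estimated value per state
```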
Updated: 2024-05-31 17:36:16
Fields: cs.LG,cs.AI
Comparing information content of representation spaces for disentanglement with VAE ensembles
Disentanglement is the endeavour to use machine learning to divide information about a dataset into meaningful fragments. In practice these fragments are representation (sub)spaces, often the set of channels in the latent space of a variational autoencoder (VAE). Assessments of disentanglement predominantly employ metrics that are coarse-grained at the model level, but this approach can obscure much about the process of information fragmentation. Here we propose to study the learned channels in aggregate, as the fragments of information learned by an ensemble of repeat training runs. Additionally, we depart from prior work where measures of similarity between individual subspaces neglected the nature of data embeddings as probability distributions. Instead, we view representation subspaces as communication channels that perform a soft clustering of the data; consequently, we generalize two classic information-theoretic measures of similarity between clustering assignments to compare representation spaces. We develop a lightweight method of estimation based on fingerprinting representation subspaces by their ability to distinguish dataset samples, allowing us to identify, analyze, and leverage meaningful structure in ensembles of VAEs trained on synthetic and natural datasets. Using this fully unsupervised pipeline we identify "hotspots" in the space of information fragments: groups of nearly identical representation subspaces that appear repeatedly in an ensemble of VAEs, particularly as regularization is increased. Finally, we leverage the proposed methodology to achieve ensemble learning with VAEs, boosting the information content of a set of weak learners -- a capability not possible with previous methods of assessing channel similarity.
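The "channels as soft clusterings" view can be sketched concretely: build the joint distribution $p(i,j) = \mathbb{E}_x[q_1(i\mid x)\,q_2(j\mid x)]$ from per-sample soft assignments and compute a normalized mutual information. This follows the abstract's idea of generalizing clustering-similarity measures; the paper's exact estimator may differ.

```python
import numpy as np

# Hedged sketch: NMI between two soft clusterings of the same samples.
def soft_nmi(Q1, Q2):
    """Q1: (n, k1), Q2: (n, k2); each row is a soft cluster assignment."""
    joint = Q1.T @ Q2 / Q1.shape[0]
    p1, p2 = joint.sum(axis=1), joint.sum(axis=0)
    nz = joint > 0
    mi = (joint[nz] * np.log(joint[nz] / np.outer(p1, p2)[nz])).sum()
    h1 = -(p1[p1 > 0] * np.log(p1[p1 > 0])).sum()
    h2 = -(p2[p2 > 0] * np.log(p2[p2 > 0])).sum()
    return mi / np.sqrt(h1 * h2)

rng = np.random.default_rng(7)
A = rng.dirichlet(np.ones(4), size=200)
B = rng.dirichlet(np.ones(4), size=200)
print(soft_nmi(A, A), soft_nmi(A, B))   # matched channels score higher
```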
Updated: 2024-05-31 17:33:07
Fields: cs.LG
API Pack: A Massive Multi-Programming Language Dataset for API Call Generation
We introduce API Pack, a massive multi-programming language dataset containing more than 1 million instruction-API call pairs to improve the API call generation capabilities of large language models. By fine-tuning CodeLlama-13B on 20,000 Python instances from API Pack, we achieved around 10% and 5% higher accuracy compared to GPT-3.5 and GPT-4, respectively, in generating unseen API calls. Fine-tuning on API Pack enables cross-programming language generalization by leveraging a large amount of data in one language and small amounts of data from other languages. Scaling the training data to 1 million instances further improves the model's generalization to new APIs not encountered during training. We open-source the API Pack dataset, trained models, and associated source code at https://github.com/zguo0525/API-Pack to facilitate further research.
Updated: 2024-05-31 17:31:38
Fields: cs.CL,cs.AI,cs.LG
Direct Alignment of Language Models via Quality-Aware Self-Refinement
Reinforcement Learning from Human Feedback (RLHF) has been commonly used to align the behaviors of Large Language Models (LLMs) with human preferences. Recently, a popular alternative is Direct Policy Optimization (DPO), which replaces an LLM-based reward model with the policy itself, thus obviating the need for extra memory and training time to learn the reward model. However, DPO does not consider the relative qualities of the positive and negative responses, and can lead to sub-optimal training outcomes. To alleviate this problem, we investigate the use of intrinsic knowledge within the on-the-fly fine-tuning LLM to obtain relative qualities and help to refine the loss function. Specifically, we leverage the knowledge of the LLM to design a refinement function to estimate the quality of both the positive and negative responses. We show that the constructed refinement function can help self-refine the loss function under mild assumptions. The refinement function is integrated into DPO and its variant Identity Policy Optimization (IPO). Experiments across various evaluators indicate that they can improve the performance of the fine-tuned models over DPO and IPO.
Updated: 2024-05-31 17:31:18
Fields: cs.CL,cs.AI
A-PETE: Adaptive Prototype Explanations of Tree Ensembles
The need for interpreting machine learning models is addressed through prototype explanations within the context of tree ensembles. An algorithm named Adaptive Prototype Explanations of Tree Ensembles (A-PETE) is proposed to automatise the selection of prototypes for these classifiers. Its unique characteristics are the use of a specialised distance measure and a modified k-medoid approach. Experiments demonstrated its competitive predictive accuracy with respect to earlier explanation algorithms. It also provides a sufficient number of prototypes for the purpose of interpreting the random forest classifier.
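For illustration, a standard k-medoids prototype selection loop; plain Euclidean distance stands in for the paper's specialised, ensemble-based distance measure.

```python
import numpy as np

# Sketch of k-medoid prototype selection: the returned indices point to
# training samples that serve as prototypes.
def k_medoids(X, k, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    medoids = rng.choice(len(X), size=k, replace=False)
    for _ in range(iters):
        labels = D[:, medoids].argmin(axis=1)        # assign to nearest medoid
        new = medoids.copy()
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if members.size:                          # keep old medoid if cluster empty
                new[j] = members[D[np.ix_(members, members)].sum(axis=1).argmin()]
        if np.array_equal(new, medoids):
            break
        medoids = new
    return medoids

X = np.random.default_rng(8).normal(size=(60, 5))
print(k_medoids(X, k=3))   # indices of the selected prototypes
```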
Updated: 2024-05-31 17:29:39
Fields: cs.LG
Standards for Belief Representations in LLMs
As large language models (LLMs) continue to demonstrate remarkable abilities across various domains, computer scientists are developing methods to understand their cognitive processes, particularly concerning how (and if) LLMs internally represent their beliefs about the world. However, this field currently lacks a unified theoretical foundation to underpin the study of belief in LLMs. This article begins filling this gap by proposing adequacy conditions for a representation in an LLM to count as belief-like. We argue that, while the project of belief measurement in LLMs shares striking features with belief measurement as carried out in decision theory and formal epistemology, it also differs in ways that should change how we measure belief. Thus, drawing from insights in philosophy and contemporary practices of machine learning, we establish four criteria that balance theoretical considerations with practical constraints. Our proposed criteria include accuracy, coherence, uniformity, and use, which together help lay the groundwork for a comprehensive understanding of belief representation in LLMs. We draw on empirical work showing the limitations of using various criteria in isolation to identify belief representations.
Updated: 2024-05-31 17:21:52
Fields: cs.AI
An Accelerated Gradient Method for Convex Smooth Simple Bilevel Optimization
In this paper, we focus on simple bilevel optimization problems, where we minimize a convex smooth objective function over the optimal solution set of another convex smooth constrained optimization problem. We present a novel bilevel optimization method that locally approximates the solution set of the lower-level problem using a cutting plane approach and employs an accelerated gradient-based update to reduce the upper-level objective function over the approximated solution set. We measure the performance of our method in terms of suboptimality and infeasibility errors and provide non-asymptotic convergence guarantees for both error criteria. Specifically, when the feasible set is compact, we show that our method requires at most $\mathcal{O}(\max\{1/\sqrt{\epsilon_{f}}, 1/\epsilon_g\})$ iterations to find a solution that is $\epsilon_f$-suboptimal and $\epsilon_g$-infeasible. Moreover, under the additional assumption that the lower-level objective satisfies the $r$-th H\"olderian error bound, we show that our method achieves an iteration complexity of $\mathcal{O}(\max\{\epsilon_{f}^{-\frac{2r-1}{2r}},\epsilon_{g}^{-\frac{2r-1}{2r}}\})$, which matches the optimal complexity of single-level convex constrained optimization when $r=1$.
Updated: 2024-05-31 17:20:29
Fields: math.OC,cs.LG,stat.ML
Collective Variable Free Transition Path Sampling with Generative Flow Network
Understanding transition paths between meta-stable states in molecular systems is fundamental for material design and drug discovery. However, sampling these paths via molecular dynamics simulations is computationally prohibitive due to the high-energy barriers between the meta-stable states. Recent machine learning approaches are often restricted to simple systems or rely on collective variables (CVs) extracted from expensive domain knowledge. In this work, we propose to leverage generative flow networks (GFlowNets) to sample transition paths without relying on CVs. We reformulate the problem as amortized energy-based sampling over molecular trajectories and train a bias potential by minimizing the squared log-ratio between the target distribution and the generator, derived from the flow matching objective of GFlowNets. Our evaluation on three proteins (Alanine Dipeptide, Polyproline, and Chignolin) demonstrates that our approach, called TPS-GFN, generates more realistic and diverse transition paths than the previous CV-free machine learning approach.
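The training signal described in the abstract, a squared log-ratio between the target path distribution and the generator, can be sketched per trajectory; `log_Z` is a learned scalar in GFlowNet-style objectives, and all quantities below are illustrative placeholders.

```python
import numpy as np

# Hedged sketch: squared log-ratio loss over a batch of sampled trajectories,
#   loss = mean_tau (log Z + log p_gen(tau) - log p_target(tau))^2.
def squared_log_ratio_loss(log_p_generator, log_p_target, log_Z):
    return np.mean((log_Z + log_p_generator - log_p_target) ** 2)

logp_gen = np.array([-12.3, -10.8, -11.5])   # generator log-likelihoods of 3 paths
logp_tgt = np.array([-9.1, -10.2, -9.8])     # unnormalized target log-densities
print(squared_log_ratio_loss(logp_gen, logp_tgt, log_Z=-1.5))
```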
Updated: 2024-05-31 17:18:35
Fields: cs.LG
LACIE: Listener-Aware Finetuning for Confidence Calibration in Large Language Models
When answering questions, LLMs can convey not only an answer, but a level of confidence about the answer being correct. This includes explicit confidence markers (e.g. giving a numeric score) as well as implicit markers, like an authoritative tone or elaborating with additional knowledge. For LLMs to be trustworthy knowledge sources, the confidence they convey should match their actual expertise; however, most current models tend towards overconfidence. To calibrate both implicit and explicit confidence markers, we introduce a pragmatic, listener-aware finetuning method (LACIE) that models the listener, considering not only whether an answer is right, but whether it will be accepted by a listener. We cast calibration as preference optimization, creating data via a two-agent game, where a speaker model's outputs are judged by a simulated listener. We then finetune three LLMs (Mistral-7B, Llama3-8B, Llama3-70B) with LACIE, and show that the resulting models are better calibrated w.r.t. a simulated listener. Crucially, these trends transfer to human listeners, helping them correctly predict model correctness: we conduct a human evaluation where annotators accept or reject an LLM's answers, finding that training with LACIE results in 47% fewer incorrect answers being accepted while maintaining the same level of acceptance for correct answers. Furthermore, LACIE generalizes to another dataset, resulting in a large increase in truthfulness on TruthfulQA when trained on TriviaQA. Our analysis indicates that LACIE leads to a better confidence separation between correct and incorrect examples. Qualitatively, we find that a LACIE-trained model hedges more and implicitly signals certainty when it is correct by using an authoritative tone or including details. Finally, LACIE finetuning leads to an emergent increase in model abstention (e.g. saying "I don't know") for answers that are likely wrong.
Updated: 2024-05-31 17:16:38
Fields: cs.CL,cs.AI
Compact Optimality Verification for Optimization Proxies
Recent years have witnessed increasing interest in optimization proxies, i.e., machine learning models that approximate the input-output mapping of parametric optimization problems and return near-optimal feasible solutions. Following recent work by (Nellikkath & Chatzivasileiadis, 2021), this paper reconsiders the optimality verification problem for optimization proxies, i.e., the determination of the worst-case optimality gap over the instance distribution. The paper proposes a compact formulation for optimality verification and a gradient-based primal heuristic that brings substantial computational benefits to the original formulation. The compact formulation is also more general and applies to non-convex optimization problems. The benefits of the compact formulation are demonstrated on large-scale DC Optimal Power Flow and knapsack problems.
Updated: 2024-05-31 17:11:39
Fields: math.OC,cs.AI
Beyond Conventional Parametric Modeling: Data-Driven Framework for Estimation and Prediction of Time Activity Curves in Dynamic PET Imaging
Dynamic Positron Emission Tomography (dPET) imaging and Time-Activity Curve (TAC) analyses are essential for understanding and quantifying the biodistribution of radiopharmaceuticals over time and space. Traditional compartmental modeling, while foundational, commonly struggles to fully capture the complexities of biological systems, including non-linear dynamics and variability. This study introduces an innovative data-driven neural network-based framework, inspired by Reaction Diffusion systems, designed to address these limitations. Our approach, which adaptively fits TACs from dPET, enables the direct calibration of diffusion coefficients and reaction terms from observed data, offering significant improvements in predictive accuracy and robustness over traditional methods, especially in complex biological scenarios. By more accurately modeling the spatio-temporal dynamics of radiopharmaceuticals, our method advances modeling of pharmacokinetic and pharmacodynamic processes, enabling new possibilities in quantitative nuclear medicine.
Updated: 2024-05-31 17:09:07
Fields: cs.LG,eess.IV,math.DS
Improved Techniques for Optimization-Based Jailbreaking on Large Language Models
Large language models (LLMs) are being rapidly developed, and a key component of their widespread deployment is their safety-related alignment. Many red-teaming efforts aim to jailbreak LLMs, where among these efforts, the Greedy Coordinate Gradient (GCG) attack's success has led to a growing interest in the study of optimization-based jailbreaking techniques. Although GCG is a significant milestone, its attacking efficiency remains unsatisfactory. In this paper, we present several improved (empirical) techniques for optimization-based jailbreaks like GCG. We first observe that the single target template of "Sure" largely limits the attacking performance of GCG; given this, we propose to apply diverse target templates containing harmful self-suggestion and/or guidance to mislead LLMs. Besides, from the optimization aspects, we propose an automatic multi-coordinate updating strategy in GCG (i.e., adaptively deciding how many tokens to replace in each step) to accelerate convergence, as well as tricks like easy-to-hard initialisation. Then, we combine these improved technologies to develop an efficient jailbreak method, dubbed $\mathcal{I}$-GCG. In our experiments, we evaluate on a series of benchmarks (such as NeurIPS 2023 Red Teaming Track). The results demonstrate that our improved techniques can help GCG outperform state-of-the-art jailbreaking attacks and achieve nearly 100% attack success rate. The code is released at https://github.com/jiaxiaojunQAQ/I-GCG.
Updated: 2024-05-31 17:07:15
Fields: cs.LG,cs.CL,cs.CR
Stochastic Online Fisher Markets: Static Pricing Limits and Adaptive Enhancements
Fisher markets are one of the most fundamental models for resource allocation. However, the problem of computing equilibrium prices in Fisher markets typically relies on complete knowledge of users' budgets and utility functions and requires transactions to happen in a static market where all users are present simultaneously. Motivated by these practical considerations, we study an online variant of Fisher markets, wherein users with privately known utility and budget parameters, drawn i.i.d. from a distribution, arrive sequentially. In this setting, we first study the limitations of static pricing algorithms, which set uniform prices for all users, along two performance metrics: (i) regret, i.e., the optimality gap in the objective of the Eisenberg-Gale program between an online algorithm and an oracle with complete information, and (ii) capacity violations, i.e., the over-consumption of goods relative to their capacities. Given the limitations of static pricing, we design adaptive posted-pricing algorithms, one with knowledge of the distribution of users' budget and utility parameters and another that adjusts prices solely based on past observations of user consumption, i.e., revealed preference feedback, with improved performance guarantees. Finally, we present numerical experiments to compare our revealed preference algorithm's performance to several benchmarks.
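The revealed-preference flavor of the adaptive policy can be sketched with a dual-ascent-style price update driven only by observed consumption; the linear-utility demand model and step size below are illustrative stand-ins.

```python
import numpy as np

# Hedged sketch: posted prices adjusted from revealed consumption relative to
# a per-round capacity target (no access to budgets or utilities).
rng = np.random.default_rng(9)
n_goods, T = 3, 500
capacity_per_round = np.ones(n_goods)
prices = np.ones(n_goods)

for t in range(T):
    budget = rng.uniform(0.5, 1.5)
    valuation = rng.uniform(0.1, 1.0, n_goods)
    # With linear utilities, the arriving user spends the whole budget on the
    # good with the best value-per-price ratio.
    best = np.argmax(valuation / prices)
    consumption = np.zeros(n_goods)
    consumption[best] = budget / prices[best]
    # Dual-style update from revealed consumption only:
    eta = 1.0 / np.sqrt(t + 1)
    prices = np.maximum(0.1, prices + eta * (consumption - capacity_per_round))

print(prices)
```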
Updated: 2024-05-31 17:07:04
标题: 随机在线费舍尔市场:静态定价限制和自适应增强
摘要: Fisher市场是资源分配的最基本模型之一。然而,在Fisher市场中计算均衡价格的问题通常依赖于对用户预算和效用函数的完全了解,并要求交易发生在所有用户同时出现的静态市场中。受到这些实际考虑的启发,我们研究了Fisher市场的在线变体,其中具有私人已知效用和预算参数的用户依次到达,这些参数是从分布中独立同分布地抽取的。在这种设置下,我们首先研究了静态定价算法的局限性,这些算法为所有用户设置统一价格,并沿着两个性能指标进行评估:(i)遗憾,即在线算法与具有完全信息的预言者之间的Eisenberg-Gale方案目标的最优差距,以及(ii)容量违规,即相对于其容量的过度消费。鉴于静态定价的局限性,我们设计了适应性发布定价算法,其中一个具有对用户预算和效用参数分布的了解,另一个仅基于用户消费的过去观察来调整价格,即透露偏好反馈,具有改进的性能保证。最后,我们进行了数值实验,将我们的透露偏好算法的性能与几个基准进行比较。
更新时间: 2024-05-31 17:07:04
领域: cs.GT,cs.LG,econ.TH,math.OC
Hierarchical World Models as Visual Whole-Body Humanoid Controllers
Whole-body control for humanoids is challenging due to the high-dimensional nature of the problem, coupled with the inherent instability of a bipedal morphology. Learning from visual observations further exacerbates this difficulty. In this work, we explore highly data-driven approaches to visual whole-body humanoid control based on reinforcement learning, without any simplifying assumptions, reward design, or skill primitives. Specifically, we propose a hierarchical world model in which a high-level agent generates commands based on visual observations for a low-level agent to execute, both of which are trained with rewards. Our approach produces highly performant control policies in 8 tasks with a simulated 56-DoF humanoid, while synthesizing motions that are broadly preferred by humans. Code and videos: https://nicklashansen.com/rlpuppeteer
Updated: 2024-05-31 17:03:00
标题: 分层世界模型作为视觉全身人形控制器
摘要: 人形机器人的全身控制具有挑战性,这是由于问题的高维性质,加上双足形态的固有不稳定性。从视觉观察中学习进一步加剧了这种困难。在这项工作中,我们探索了基于强化学习的高度数据驱动的视觉全身人形控制方法,没有任何简化假设、奖励设计或技能原语。具体来说,我们提出了一个分层世界模型,其中高层代理根据视觉观察生成指令,供低层代理执行,两者都接受奖励训练。我们的方法在一个模拟的56自由度人形机器人上的8个任务中产生了高性能控制策略,同时合成出广受人类喜爱的动作。代码和视频:https://nicklashansen.com/rlpuppeteer
更新时间: 2024-05-31 17:03:00
领域: cs.LG,cs.CV,cs.RO
Mastering Long-Tail Complexity on Graphs: Characterization, Learning, and Generalization
In the context of long-tail classification on graphs, the vast majority of existing work primarily revolves around the development of model debiasing strategies, intending to mitigate class imbalances and enhance the overall performance. Despite the notable success, there is very limited literature that provides a theoretical tool for characterizing the behaviors of long-tail classes in graphs and gaining insight into generalization performance in real-world scenarios. To bridge this gap, we propose a generalization bound for long-tail classification on graphs by formulating the problem in the fashion of multi-task learning, i.e., each task corresponds to the prediction of one particular class. Our theoretical results show that the generalization performance of long-tail classification is dominated by the overall loss range and the task complexity. Building upon the theoretical findings, we propose a novel generic framework HierTail for long-tail classification on graphs. In particular, we start with a hierarchical task grouping module that allows us to assign related tasks into hypertasks and thus control the complexity of the task space; then, we further design a balanced contrastive learning module to adaptively balance the gradients of both head and tail classes to control the loss range across all tasks in a unified fashion. Extensive experiments demonstrate the effectiveness of HierTail in characterizing long-tail classes on real graphs, achieving up to a 12.9% accuracy improvement over the leading baseline method.
Updated: 2024-05-31 17:02:37
标题: 掌握图上的长尾复杂性:特征化、学习和泛化
摘要: 在图中长尾分类的背景下,现有大部分工作主要围绕模型去偏置策略的发展,旨在减轻类别不平衡并提升整体性能。尽管取得了显著成功,但提供理论工具来描述图中长尾类别的行为并获得在现实场景中的泛化性能洞见的文献非常有限。为了弥补这一差距,我们提出了一个适用于图中长尾分类的泛化界限,通过将问题制定为多任务学习的方式,即每个任务对应于预测一个特定类别。我们的理论结果表明,长尾分类的泛化性能受整体损失范围和任务复杂度的支配。基于理论发现,我们提出了一个新颖的通用框架HierTail,用于图中长尾分类。特别是,我们从一个分层任务分组模块开始,允许我们将相关任务分配到超任务中,从而控制任务空间的复杂度;然后,我们进一步设计了一个平衡的对比学习模块,以自适应平衡头部和尾部类别的梯度,以统一方式来控制所有任务中的损失范围。大量实验证明了HierTail在表征真实图中的长尾类别方面的有效性,其准确率比领先基线方法提高了高达12.9%。
更新时间: 2024-05-31 17:02:37
领域: cs.LG,cs.SI
TrojanRAG: Retrieval-Augmented Generation Can Be Backdoor Driver in Large Language Models
Large language models (LLMs) have raised concerns about potential security threats despite their strong performance in Natural Language Processing (NLP). Early backdoor attacks verified that LLMs can be substantially compromised at all stages, but these attacks have been criticized for their cost and limited robustness. Attacking LLMs directly is inherently risky under security review and prohibitively expensive. Besides, the continuous iteration of LLMs will degrade the robustness of backdoors. In this paper, we propose TrojanRAG, which employs a joint backdoor attack in Retrieval-Augmented Generation, thereby manipulating LLMs in universal attack scenarios. Specifically, the adversary constructs elaborate target contexts and trigger sets. Multiple pairs of backdoor shortcuts are orthogonally optimized by contrastive learning, thus constraining the triggering conditions to a parameter subspace to improve matching. To improve the recall of the RAG for the target contexts, we introduce a knowledge graph to construct structured data and achieve hard matching at a fine-grained level. Moreover, we normalize the backdoor scenarios in LLMs to analyze the real harm caused by backdoors from both the attackers' and the users' perspectives, and we further verify whether the context is a favorable tool for jailbreaking models. Extensive experimental results on truthfulness, language understanding, and harmfulness show that TrojanRAG exhibits versatile threats while maintaining retrieval capabilities on normal queries.
Updated: 2024-05-31 16:59:17
标题: 特洛伊木马RAG:检索增强生成可以成为大型语言模型中的后门驱动程序
摘要: 大型语言模型(LLMs)在自然语言处理(NLP)方面表现显著,但引起了潜在安全威胁的担忧。后门攻击最初验证了LLM在所有阶段都造成了重大伤害,但其成本和稳健性受到了批评。攻击LLMs在安全审查中本质上是有风险的,而且成本高昂。此外,LLMs的持续迭代将降低后门的稳健性。在本文中,我们提出了TrojanRAG,它在检索增强生成中采用联合后门攻击,从而在通用攻击场景中操纵LLMs。具体而言,对手构建了精心设计的目标上下文和触发集。通过对比学习,多对后门快捷方式被正交优化,从而将触发条件限制在参数子空间以改善匹配。为了提高RAG对目标上下文的召回率,我们引入了知识图来构建结构化数据,以在细粒度水平上实现硬匹配。此外,我们对LLMs中的后门场景进行了规范化处理,以分析从攻击者和用户的角度造成的实际伤害,并进一步验证上下文是否是破解模型的有利工具。在真实性、语言理解和有害性方面的广泛实验结果表明,TrojanRAG展示了多样化的威胁,同时保持对常规查询的检索能力。
更新时间: 2024-05-31 16:59:17
领域: cs.CR,cs.CL
Modeling User Preferences via Brain-Computer Interfacing
Present Brain-Computer Interfacing (BCI) technology allows inference and detection of cognitive and affective states, but fairly little has been done to study scenarios in which such information can facilitate new applications that rely on modeling human cognition. One state that can be quantified from various physiological signals is attention. Estimates of human attention can be used to reveal preferences and novel dimensions of user experience. Previous approaches have tackled these incredibly challenging tasks using a variety of behavioral signals, from dwell time to click-through data, and computational models of visual correspondence to these behavioral signals. However, behavioral signals are only rough estimations of the real underlying attention and affective preferences of the users. Indeed, users may attend to some content simply because it is salient or outrageous, not because it is genuinely interesting. With this paper, we put forward a research agenda and example work using BCI to infer users' preferences, their attentional correlates towards visual content, and their associations with affective experience. Subsequently, we link these to relevant applications, such as information retrieval, personalized steering of generative models, and crowdsourcing population estimates of affective experiences.
Updated: 2024-05-31 16:57:30
标题: 通过脑机接口建模用户偏好
摘要: 目前的脑-计算机界面(BCI)技术允许推断和检测认知和情感状态,但在研究这些信息如何促进依赖于人类认知建模的新应用方面所做的工作相对较少。可以从各种生理信号中量化的一种状态是注意力。人类注意力的估计可以用来揭示用户体验的偏好和新领域。之前的方法使用各种行为信号,从停留时间到点击数据,以及与这些行为信号的视觉对应的计算模型,来应对这些极具挑战性的任务。然而,行为信号只是对用户真实注意力和情感偏好的粗略估计。事实上,用户可能只是因为内容引人注目而关注某些内容,而不是因为内容真的有趣,或者仅仅因为内容引人注目。在本文中,我们提出了一个研究议程和使用BCI推断用户偏好、他们对视觉内容的注意力相关以及与情感体验的关联的示例工作。随后,我们将这些与相关应用联系起来,例如信息检索、个性化引导生成模型以及众包对情感体验的人口估计。
更新时间: 2024-05-31 16:57:30
领域: cs.HC,cs.AI
G-Transformer for Conditional Average Potential Outcome Estimation over Time
Estimating potential outcomes for treatments over time based on observational data is important for personalized decision-making in medicine. Yet, existing neural methods for this task suffer from either (a) bias or (b) large variance. In order to address both limitations, we introduce the G-transformer (GT). Our GT is a novel, neural end-to-end model designed for unbiased, low-variance estimation of conditional average potential outcomes (CAPOs) over time. Specifically, our GT is the first neural model to perform regression-based iterative G-computation for CAPOs in the time-varying setting. We evaluate the effectiveness of our GT across various experiments. In sum, this work represents a significant step towards personalized decision-making from electronic health records.
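As background for what regression-based iterative G-computation computes, here is a minimal sketch on synthetic two-step data, using linear least squares in place of the paper's transformer; the data-generating process and coefficients are invented for illustration. The outcome is regressed backwards in time, with the treatment of interest substituted at each step.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000
# Simulate a two-step treatment process with time-varying confounding
x1 = rng.normal(size=n)
a1 = (x1 + rng.normal(size=n) > 0).astype(float)
x2 = 0.5 * x1 + a1 + rng.normal(size=n)
a2 = (x2 + rng.normal(size=n) > 0).astype(float)
y = x2 + 2.0 * a2 + 0.5 * a1 + rng.normal(size=n)

def fit_predict(X, y, Xq):
    """Least-squares regression with intercept; predict at query points."""
    Xd = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return np.column_stack([np.ones(len(Xq)), Xq]) @ beta

a1_star, a2_star = 1.0, 1.0          # treatment sequence of interest
# Step 1: regress Y on the full history, then evaluate at A2 = a2*
X_full = np.column_stack([x1, a1, x2, a2])
Xq = X_full.copy(); Xq[:, 3] = a2_star
q2 = fit_predict(X_full, y, Xq)
# Step 2: regress the pseudo-outcome on the earlier history, evaluate at A1 = a1*
X_hist = np.column_stack([x1, a1])
Xq = X_hist.copy(); Xq[:, 1] = a1_star
q1 = fit_predict(X_hist, q2, Xq)
# CAPO at x1 near 0 under (a1*, a2*); ground truth here is 3.5
print("estimated CAPO at x1=0:", q1[np.abs(x1) < 0.1].mean())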
Updated: 2024-05-31 16:52:51
标题: G-Transformer用于随时间条件平均潜在结果估计
摘要: 根据观察数据估计治疗随时间变化的潜在结果对于个性化医学决策非常重要。然而,现有的神经方法在这项任务上存在偏差或大方差的问题。为了解决这两个限制,我们引入了G-transformer(GT)。我们的GT是一种新颖的神经端到端模型,旨在对时间上的条件平均潜在结果(CAPOs)进行无偏差、低方差的估计。具体来说,我们的GT是第一个在时变环境中执行基于回归的迭代G-计算以获得CAPOs的神经模型。我们通过各种实验评估了我们的GT的有效性。总的来说,这项工作代表了从电子健康记录中实现个性化决策的重要一步。
更新时间: 2024-05-31 16:52:51
领域: cs.LG,stat.ME
Explaining Explanations in Probabilistic Logic Programming
The emergence of tools based on artificial intelligence has also led to the need to produce explanations that are understandable by a human being. In most approaches, the system is considered a black box, making it difficult to generate appropriate explanations. In this work, though, we consider a setting where models are transparent: probabilistic logic programming (PLP), a paradigm that combines logic programming for knowledge representation and probability to model uncertainty. However, given a query, the usual notion of explanation is associated with a set of choices, one for each random variable of the model. Unfortunately, such a set does not explain why the query is true and, in fact, it may contain choices that are actually irrelevant for the considered query. To improve this situation, we present in this paper an approach to explaining explanations which is based on defining a new query-driven inference mechanism for PLP where proofs are labeled with "choice expressions", a compact and easy-to-manipulate representation for sets of choices. The combination of proof trees and choice expressions allows us to produce comprehensible query justifications with a causal structure.
Updated: 2024-05-31 16:45:22
标题: 概率逻辑编程中对解释的解释
摘要: 基于人工智能的工具的出现也导致了需要产生能够被人理解的解释的需求。在大多数方法中,系统被视为黑匣子,这使得生成适当的解释变得困难。然而,在本研究中,我们考虑了一个透明的模型设置:概率逻辑编程(PLP),这是一种将逻辑编程用于知识表示和概率建模不确定性的范式。然而,对于给定的查询,通常的解释概念与模型的每个随机变量的一组选择相关联。不幸的是,这样的集合并不能解释为什么查询是真实的,实际上,它可能包含实际上与所考虑的查询无关的选择。为了改善这种情况,我们在本文中提出了一种基于为PLP定义一个新的查询驱动推理机制的解释方法,其中证明被标记为“选择表达式”,这是一种紧凑且易于操作的集合选择表示。证明树和选择表达式的结合使我们能够产生具有因果结构的可理解的查询解释。
更新时间: 2024-05-31 16:45:22
领域: cs.AI,cs.PL
Explaining Predictions by Characteristic Rules
Characteristic rules have been advocated for their ability to improve interpretability over discriminative rules within the area of rule learning. However, the former type of rule has not yet been used by techniques for explaining predictions. A novel explanation technique, called CEGA (Characteristic Explanatory General Association rules), is proposed, which employs association rule mining to aggregate multiple explanations generated by any standard local explanation technique into a set of characteristic rules. An empirical investigation is presented, in which CEGA is compared to two state-of-the-art methods, Anchors and GLocalX, for producing local and aggregated explanations in the form of discriminative rules. The results suggest that the proposed approach provides a better trade-off between fidelity and complexity compared to the two state-of-the-art approaches; CEGA and Anchors significantly outperform GLocalX with respect to fidelity, while CEGA and GLocalX significantly outperform Anchors with respect to the number of generated rules. The effect of changing the format of the explanations of CEGA to discriminative rules and using LIME and SHAP as local explanation techniques instead of Anchors are also investigated. The results show that the characteristic explanatory rules still compete favorably with rules in the standard discriminative format. The results also indicate that using CEGA in combination with either SHAP or Anchors consistently leads to a higher fidelity compared to using LIME as the local explanation technique.
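To illustrate the aggregation step, here is a minimal sketch that mines frequently co-occurring conditions from hypothetical local explanations and turns them into characteristic (class implies conditions) rules. Real CEGA uses full association rule mining with support and confidence thresholds; the feature names and threshold below are invented.

```python
from collections import Counter
from itertools import combinations

# Hypothetical local explanations: for each instance, the predicted class
# and the feature conditions returned by a local explainer (e.g., Anchors)
explanations = [
    ("spam", {"has_link", "many_caps", "short"}),
    ("spam", {"has_link", "many_caps"}),
    ("spam", {"has_link", "unknown_sender"}),
    ("ham",  {"known_sender", "short"}),
    ("ham",  {"known_sender"}),
]
min_support = 0.6   # fraction of a class's explanations an itemset must appear in

rules = {}
for cls in {c for c, _ in explanations}:
    sets = [conds for c, conds in explanations if c == cls]
    counts = Counter()
    for conds in sets:
        for k in (1, 2):
            counts.update(combinations(sorted(conds), k))
    # characteristic rule: class -> frequently co-occurring conditions
    rules[cls] = [items for items, n in counts.items()
                  if n / len(sets) >= min_support]

for cls, items in rules.items():
    print(cls, "->", items)
```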
Updated: 2024-05-31 16:44:40
标题: 用特征规则解释预测结果
摘要: 特征规则因其在规则学习领域中提高可解释性的能力而受到推崇。然而,在解释预测的技术中,前一种规则类型尚未被使用。提出了一种新颖的解释技术,称为CEGA(Characteristic Explanatory General Association rules),它利用关联规则挖掘将由任何标准局部解释技术生成的多个解释聚合为一组特征规则。进行了一项实证调查,将CEGA与两种最先进的方法Anchors和GLocalX进行比较,用于生成形式为区分规则的局部和聚合解释。结果表明,与两种最先进的方法相比,所提出的方法在忠实度和复杂度之间提供了更好的权衡;CEGA和Anchors在忠实度方面明显优于GLocalX,而CEGA和GLocalX在生成规则数量方面明显优于Anchors。还研究了将CEGA的解释格式更改为区分规则,并使用LIME和SHAP作为局部解释技术而非Anchors的效果。结果显示,特征解释规则仍然与标准区分规则竞争有利。结果还表明,将CEGA与SHAP或Anchors结合使用始终比使用LIME作为局部解释技术具有更高的忠实度。
更新时间: 2024-05-31 16:44:40
领域: cs.LG,cs.AI
URDFormer: A Pipeline for Constructing Articulated Simulation Environments from Real-World Images
Constructing simulation scenes that are both visually and physically realistic is a problem of practical interest in domains ranging from robotics to computer vision. This problem has become even more relevant as researchers wielding large data-hungry learning methods seek new sources of training data for physical decision-making systems. However, building simulation models is often still done by hand. A graphic designer and a simulation engineer work with predefined assets to construct rich scenes with realistic dynamic and kinematic properties. While this may scale to small numbers of scenes, to achieve the generalization properties that are required for data-driven robotic control, we require a pipeline that is able to synthesize large numbers of realistic scenes, complete with 'natural' kinematic and dynamic structures. To attack this problem, we develop models for inferring structure and generating simulation scenes from natural images, allowing for scalable scene generation from web-scale datasets. To train these image-to-simulation models, we show how controllable text-to-image generative models can be used in generating paired training data that allows for modeling of the inverse problem, mapping from realistic images back to complete scene models. We show how this paradigm allows us to build large datasets of scenes in simulation with semantic and physical realism. We present an integrated end-to-end pipeline that generates simulation scenes complete with articulated kinematic and dynamic structures from real-world images and use these for training robotic control policies. We then robustly deploy in the real world for tasks like articulated object manipulation. In doing so, our work provides both a pipeline for large-scale generation of simulation environments and an integrated system for training robust robotic control policies in the resulting environments.
Updated: 2024-05-31 16:44:06
标题: URDFormer:从现实世界图像构建关节仿真环境的流水线
摘要: 构建既视觉又物理真实的模拟场景是一个实际领域的问题,涉及从机器人到计算机视觉等领域。随着使用大量数据的学习方法的研究人员寻找新的训练数据来源,这个问题变得更加相关。然而,构建模拟模型通常仍然是手工完成的。图形设计师和模拟工程师使用预定义的资产来构建具有真实动态和运动特性的丰富场景。虽然这可以扩展到少量场景,但为了实现数据驱动机器人控制所需的泛化特性,我们需要一个能够合成大量真实场景的流水线,包括“自然”动力学和动态结构。为了解决这个问题,我们开发了推断结构和从自然图像生成模拟场景的模型,允许从网络规模数据集进行可扩展的场景生成。为了训练这些图像到模拟的模型,我们展示了可控的文本到图像生成模型如何用于生成成对的训练数据,从而允许对逆问题建模,即从真实图像到完整场景模型的映射。我们展示了这种范式如何让我们在模拟中构建具有语义和物理真实性的大型数据集。我们提出了一个集成的端到端流水线,从现实世界图像生成具有关节动力学和动态结构的模拟场景,并将其用于训练机器人控制策略。然后我们在现实世界中稳健地部署用于关节对象操作等任务。通过这样做,我们的工作既提供了一个用于大规模生成模拟环境的流水线,又提供了一个集成系统,用于在生成的环境中训练稳健的机器人控制策略。
更新时间: 2024-05-31 16:44:06
领域: cs.RO,cs.AI
Active Inference and Reinforcement Learning: A unified inference on continuous state and action spaces under partial observability
Reinforcement learning (RL) has garnered significant attention for developing decision-making agents that aim to maximize rewards, specified by an external supervisor, within fully observable environments. However, many real-world problems involve partial observations, formulated as partially observable Markov decision processes (POMDPs). Previous studies have tackled RL in POMDPs by either incorporating the memory of past actions and observations or by inferring the true state of the environment from observed data. However, aggregating observed data over time becomes impractical in continuous spaces. Moreover, inference-based RL approaches often require many samples to perform well, as they focus solely on reward maximization and neglect uncertainty in the inferred state. Active inference (AIF) is a framework formulated in POMDPs and directs agents to select actions by minimizing a function called expected free energy (EFE). This supplies reward-maximizing (exploitative) behaviour, as in RL, with information-seeking (exploratory) behaviour. Despite this exploratory behaviour of AIF, its usage is limited to discrete spaces due to the computational challenges associated with EFE. In this paper, we propose a unified principle that establishes a theoretical connection between AIF and RL, enabling seamless integration of these two approaches and overcoming their aforementioned limitations in continuous space POMDP settings. We substantiate our findings with theoretical analysis, providing novel perspectives for utilizing AIF in the design of artificial agents. Experimental results demonstrate the superior learning capabilities of our method in solving continuous space partially observable tasks. Notably, our approach harnesses information-seeking exploration, enabling it to effectively solve reward-free problems and rendering explicit task reward design by an external supervisor optional.
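For readers unfamiliar with the EFE quantity, here is a minimal sketch computing it for a toy discrete POMDP, using the standard risk-plus-ambiguity decomposition. The matrices are invented, and the paper's contribution concerns extending such computations to continuous state and action spaces.

```python
import numpy as np

def entropy(p):
    p = np.clip(p, 1e-12, 1.0)
    return -(p * np.log(p)).sum()

def kl(p, q):
    p, q = np.clip(p, 1e-12, 1.0), np.clip(q, 1e-12, 1.0)
    return (p * np.log(p / q)).sum()

A = np.array([[0.9, 0.2],      # p(obs | state): columns index states
              [0.1, 0.8]])
B = {0: np.array([[0.8, 0.3], [0.2, 0.7]]),   # p(s' | s) for each action
     1: np.array([[0.3, 0.6], [0.7, 0.4]])}
C = np.array([0.95, 0.05])     # preferred observation distribution
q = np.array([0.5, 0.5])       # current belief over states

efe = {}
for a, Ba in B.items():
    qs = Ba @ q                      # predicted next-state belief
    qo = A @ qs                      # predicted observation distribution
    risk = kl(qo, C)                 # pragmatic term: match preferences
    ambiguity = sum(qs[s] * entropy(A[:, s]) for s in range(len(qs)))
    efe[a] = risk + ambiguity        # expected free energy of this action

probs = np.exp(-np.array(list(efe.values())))
print("action probabilities:", probs / probs.sum())
```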
Updated: 2024-05-31 16:40:03
标题: 主动推理和强化学习:在部分可观察性下对连续状态和行动空间的统一推理
摘要: 强化学习(RL)受到了广泛关注,因为它可以开发决策代理,旨在在完全可观测的环境中最大化由外部监督指定的奖励。然而,许多现实世界的问题涉及部分观测,被制定为部分可观测马尔可夫决策过程(POMDPs)。先前的研究通过合并过去的行动和观测的记忆,或者从观察数据中推断环境的真实状态来处理POMDPs中的RL。然而,随着时间的推移聚合观察数据在连续空间中变得不切实际。此外,基于推理的RL方法通常需要许多样本才能表现良好,因为它们仅关注奖励最大化,并忽视推断状态中的不确定性。主动推理(AIF)是在POMDPs中制定的一个框架,通过最小化称为预期自由能量(EFE)的函数来指导代理选择行动。这为奖励最大化(剥削性)行为提供了信息寻求(探索性)行为,就像在RL中一样。尽管AIF的这种探索行为,但由于与EFE相关的计算挑战,其使用仅限于离散空间。在本文中,我们提出了一个统一原则,建立了AIF和RL之间的理论联系,使这两种方法能够无缝集成,并克服了它们在连续空间POMDP设置中的前述限制。我们通过理论分析证实了我们的发现,为在人工代理设计中利用AIF提供了新的视角。实验结果表明,我们的方法在解决连续空间部分可观测任务方面具有卓越的学习能力。值得注意的是,我们的方法利用信息寻求探索,使其能够有效地解决无奖励问题,并使外部监督的显式任务奖励设计成为可选项。
更新时间: 2024-05-31 16:40:03
领域: cs.LG,cs.AI
Information limits and Thouless-Anderson-Palmer equations for spiked matrix models with structured noise
We consider a prototypical problem of Bayesian inference for a structured spiked model: a low-rank signal is corrupted by additive noise. While both information-theoretic and algorithmic limits are well understood when the noise is i.i.d. Gaussian, the more realistic case of structured noise still proves to be challenging. To capture the structure while maintaining mathematical tractability, a line of work has focused on rotationally invariant noise. However, existing studies either provide sub-optimal algorithms or they are limited to a special class of noise ensembles. In this paper, we establish the first characterization of the information-theoretic limits for a noise matrix drawn from a general trace ensemble. These limits are then achieved by an efficient algorithm inspired by the theory of adaptive Thouless-Anderson-Palmer (TAP) equations. Our approach leverages tools from statistical physics (replica method) and random matrix theory (generalized spherical integrals), and it unveils the equivalence between the rotationally invariant model and a surrogate Gaussian model.
Updated: 2024-05-31 16:38:35
标题: 信息限制和结构化噪声的尖峰矩阵模型的Thouless-Anderson-Palmer方程
摘要: 我们考虑了一个结构化尖峰模型的贝叶斯推断的典型问题:一个低秩信号被加性噪声污染。当噪声是独立同分布的高斯时,信息论和算法限制都已经很好地理解,但更现实的结构化噪声情况仍然具有挑战性。为了捕捉结构同时保持数学可处理性,一系列工作集中在旋转不变噪声上。然而,现有研究要么提供次优算法,要么限于特殊类别的噪声集合。在本文中,我们建立了从一般迹集合中绘制的噪声矩阵的信息论限制的第一个表征。然后,通过灵感来自自适应Thouless-Anderson-Palmer (TAP)方程理论的高效算法实现了这些限制。我们的方法利用了统计物理学(复制方法)和随机矩阵理论(广义球面积分)的工具,并揭示了旋转不变模型与替代高斯模型之间的等价性。
更新时间: 2024-05-31 16:38:35
领域: cs.IT,cond-mat.dis-nn,cs.LG,math.IT,math.ST,stat.TH,62F15, 82B44
Calibrated Self-Rewarding Vision Language Models
Large Vision-Language Models (LVLMs) have made substantial progress by integrating pre-trained large language models (LLMs) and vision models through instruction tuning. Despite these advancements, LVLMs often exhibit the hallucination phenomenon, where generated text responses appear linguistically plausible but contradict the input image, indicating a misalignment between image and text pairs. This misalignment arises because the model tends to prioritize textual information over visual input, even when both the language model and visual representations are of high quality. Existing methods leverage additional models or human annotations to curate preference data and enhance modality alignment through preference optimization. These approaches may not effectively reflect the target LVLM's preferences, making the curated preferences easily distinguishable. Our work addresses these challenges by proposing the Calibrated Self-Rewarding (CSR) approach, which enables the model to self-improve by iteratively generating candidate responses, evaluating the reward for each response, and curating preference data for fine-tuning. In the reward modeling, we employ a step-wise strategy and incorporate visual constraints into the self-rewarding process to place greater emphasis on visual input. Empirical results demonstrate that CSR enhances performance and reduces hallucinations across ten benchmarks and tasks, achieving substantial improvements over existing methods by 7.62%. Our empirical results are further supported by rigorous theoretical analysis, under mild assumptions, verifying the effectiveness of introducing visual constraints into the self-rewarding paradigm. Additionally, CSR shows compatibility with different vision-language models and the ability to incrementally improve performance through iterative fine-tuning. Our data and code are available at https://github.com/YiyangZhou/CSR.
Updated: 2024-05-31 16:37:53
标题: 经校准的自我奖励视觉语言模型
摘要: 大型视觉语言模型(LVLMs)通过整合预训练的大型语言模型(LLMs)和视觉模型,通过指导调整取得了实质性进展。尽管取得了这些进展,LVLMs经常表现出幻觉现象,即生成的文本响应在语言上似乎合理,但与输入图像相矛盾,表明图像和文本对之间存在错位。这种错位是因为模型倾向于优先考虑文本信息而不是视觉输入,即使语言模型和视觉表示都具有高质量。现有方法利用额外的模型或人类注释来筛选偏好数据,并通过偏好优化增强模态对齐。这些方法可能无法有效反映目标LVLM的偏好,使得策划的偏好容易被区分出来。我们的工作通过提出校准的自我奖励(CSR)方法来应对这些挑战,该方法使模型能够通过迭代生成候选响应、评估每个响应的奖励,并策划偏好数据进行微调来自我改进。在奖励建模中,我们采用逐步策略,并将视觉约束纳入自我奖励过程中,以更加强调视觉输入。实证结果表明,CSR提高了性能,在十个基准和任务中减少了幻觉,实现了比现有方法高出7.62%的实质性改进。我们的实证结果得到了严格的理论分析的支持,在温和假设下,验证了将视觉约束引入自我奖励范式的有效性。此外,CSR显示出与不同的视觉语言模型兼容,并通过迭代微调逐步改善性能的能力。我们的数据和代码可在https://github.com/YiyangZhou/CSR 上找到。
更新时间: 2024-05-31 16:37:53
领域: cs.LG,cs.CL,cs.CV
Hard Cases Detection in Motion Prediction by Vision-Language Foundation Models
Addressing hard cases in autonomous driving, such as anomalous road users, extreme weather conditions, and complex traffic interactions, presents significant challenges. To ensure safety, it is crucial to detect and manage these scenarios effectively for autonomous driving systems. However, the rarity and high-risk nature of these cases demand extensive, diverse datasets for training robust models. Vision-Language Foundation Models (VLMs) have shown remarkable zero-shot capabilities as being trained on extensive datasets. This work explores the potential of VLMs in detecting hard cases in autonomous driving. We demonstrate the capability of VLMs such as GPT-4v in detecting hard cases in traffic participant motion prediction on both agent and scenario levels. We introduce a feasible pipeline where VLMs, fed with sequential image frames with designed prompts, effectively identify challenging agents or scenarios, which are verified by existing prediction models. Moreover, by taking advantage of this detection of hard cases by VLMs, we further improve the training efficiency of the existing motion prediction pipeline by performing data selection for the training samples suggested by GPT. We show the effectiveness and feasibility of our pipeline incorporating VLMs with state-of-the-art methods on NuScenes datasets. The code is accessible at https://github.com/KTH-RPL/Detect_VLM.
Updated: 2024-05-31 16:35:41
标题: 基于视觉-语言基础模型的运动预测中的困难案例检测
摘要: 处理自动驾驶中的困难情况,如异常的道路使用者、极端天气条件和复杂交通互动,面临着重大挑战。为了确保安全,对于自动驾驶系统,有效检测和管理这些情况至关重要。然而,这些情况的罕见性和高风险性要求训练强大模型的广泛、多样化数据集。视觉语言基础模型(VLMs)在被训练的广泛数据集上展现出了出色的零样本能力。这项工作探讨了VLMs在检测自动驾驶中困难情况的潜力。我们展示了GPT-4v等VLMs在交通参与者运动预测中检测困难情况的能力,涵盖了代理和场景两个层面。我们引入了一个可行的流程,其中VLMs被连续图像帧和设计的提示输入,有效识别具有挑战性的代理或场景,这些代理或场景由现有的预测模型验证。此外,通过利用VLMs对困难情况的检测,我们通过为GPT建议的训练样本执行数据选择,进一步提高了现有运动预测流程的训练效率。我们展示了我们的流程在NuScenes数据集上与最先进方法相结合的有效性和可行性。代码可在https://github.com/KTH-RPL/Detect_VLM上访问。
更新时间: 2024-05-31 16:35:41
领域: cs.CV,cs.LG
Locking Machine Learning Models into Hardware
Modern Machine Learning models are expensive IP, and business competitiveness often depends on keeping this IP confidential. This in turn restricts how these models are deployed: for example, it is unclear how to deploy a model on-device without inevitably leaking the underlying model. At the same time, confidential computing technologies such as Multi-Party Computation or Homomorphic Encryption remain impractical for wide adoption. In this paper we take a different approach and investigate the feasibility of ML-specific mechanisms that deter unauthorized model use by restricting the model to only be usable on specific hardware, making adoption on unauthorized hardware inconvenient. That way, even if IP is compromised, it cannot be trivially used without specialised hardware or major model adjustment. In a sense, we seek to enable cheap locking of machine learning models into specific hardware. We demonstrate that locking mechanisms are feasible either by targeting the efficiency of model representations, such as making models incompatible with quantisation, or by tying the model's operation to specific characteristics of hardware, such as the number of cycles for arithmetic operations. We demonstrate that locking comes with negligible work and latency overheads, while significantly restricting usability of the resultant model on unauthorized hardware.
Updated: 2024-05-31 16:35:29
标题: 将机器学习模型锁定到硬件中
摘要: 现代机器学习模型是昂贵的知识产权,商业竞争力往往取决于保持这种知识产权的机密性。这反过来限制了这些模型的部署方式--例如,如何在设备上部署模型而不可避免地泄漏底层模型尚不清楚。与此同时,诸如多方计算或同态加密等保密计算技术仍然不实用于广泛采用。在本文中,我们采取了一种不同的方法,研究了通过将模型限制为仅可在特定硬件上使用的ML特定机制的可行性,以阻止未经授权的模型使用,从而使在未经授权的硬件上采用变得不方便。这样,即使知识产权被泄露,也不能轻易使用,除非有专门的硬件或重大的模型调整。在某种意义上,我们试图实现将机器学习模型以廉价的方式锁定在特定硬件上。我们通过以提高模型表示的效率为目标,比如使模型与量化不兼容,或者将模型的操作与特定硬件特性相关联,比如算术操作的周期数,展示了锁定机制是可行的。我们证明,锁定带来的工作和延迟开销微不足道,同时显著限制了在未经授权的硬件上使用结果模型的可用性。
更新时间: 2024-05-31 16:35:29
领域: cs.CR,cs.AI,cs.LG
Communication-Efficient Distributed Deep Learning via Federated Dynamic Averaging
Driven by the ever-growing volume and decentralized nature of data, coupled with the escalating size of modern models, distributed deep learning (DDL) has been entrenched as the preferred paradigm for training. However, frequent synchronization of DL models, encompassing millions to many billions of parameters, creates a communication bottleneck, severely hindering scalability. Worse yet, DDL algorithms typically waste valuable bandwidth, and make themselves less practical in bandwidth-constrained federated settings, by relying on overly simplistic, periodic, and rigid synchronization schedules. To address these shortcomings, we propose Federated Dynamic Averaging (FDA), a communication-efficient DDL strategy that dynamically triggers synchronization based on the value of the model variance. Through extensive experiments across a wide range of learning tasks we demonstrate that FDA reduces communication cost by orders of magnitude, compared to both traditional and cutting-edge communication-efficient algorithms. Remarkably, FDA achieves this without sacrificing convergence speed - in stark contrast to the trade-offs encountered in the field. Additionally, we show that FDA maintains robust performance across diverse data heterogeneity settings.
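A minimal sketch of the dynamic-triggering idea on a toy quadratic problem: clients train locally and synchronize only when an across-client model-variance estimate crosses a threshold. The threshold and the way variance is measured here are illustrative; the paper's FDA estimates variance communication-efficiently rather than by exchanging full models.

```python
import numpy as np

rng = np.random.default_rng(2)
K, d, steps, lr, theta = 8, 10, 300, 0.05, 0.5   # theta: variance trigger
opt = rng.normal(size=(K, d))     # per-client optima (heterogeneous data)
w = np.zeros((K, d))              # per-client models, initially synchronized
syncs = 0

for t in range(steps):
    grads = w - opt + 0.1 * rng.normal(size=(K, d))   # noisy local gradients
    w -= lr * grads                                   # local SGD step
    # cheap trigger: across-client model variance (in a real deployment this
    # would be approximated with one scalar all-reduce per client)
    var = ((w - w.mean(axis=0)) ** 2).sum(axis=1).mean()
    if var > theta:                                   # dynamic synchronization
        w = np.tile(w.mean(axis=0), (K, 1))           # federated averaging
        syncs += 1

print(f"synchronized {syncs} times in {steps} local steps")
```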
Updated: 2024-05-31 16:34:11
标题: 通过联邦动态平均实现高效的分布式深度学习通信
摘要: 由于数据量不断增长和分散性质,再加上现代模型规模不断扩大,分布式深度学习(DDL)已经成为训练的首选范式。然而,DL模型的频繁同步,涵盖了数百万到数十亿个参数,造成了通信瓶颈,严重阻碍了可扩展性。更糟糕的是,DDL算法通常浪费了宝贵的带宽,并且通过过于简单、周期性和僵化的同步时间表,在带宽受限的联合设置中变得不太实用。为了解决这些缺点,我们提出了联合动态平均(FDA),一种基于模型方差值动态触发同步的高效通信DDL策略。通过在广泛的学习任务范围内进行大量实验,我们证明FDA将通信成本降低数个数量级,相比传统和尖端的通信高效算法。值得注意的是,FDA在不牺牲收敛速度的情况下实现了这一点——与领域中遇到的权衡形成鲜明对比。此外,我们展示FDA在不同数据异质性设置下保持了稳健的性能。
更新时间: 2024-05-31 16:34:11
领域: cs.LG,cs.DC
Early Stopping Criteria for Training Generative Adversarial Networks in Biomedical Imaging
Generative Adversarial Networks (GANs) incur high computational costs to train their complex architectures. Throughout the training process, GANs' output is analyzed qualitatively based on the loss and the synthetic images' diversity and quality. Based on this qualitative analysis, training is manually halted once the desired synthetic images are generated. By utilizing an early stopping criterion, the computational cost and dependence on manual oversight can be reduced, yet such criteria are affected by training problems such as mode collapse, non-convergence, and instability. This is particularly prevalent in biomedical imagery, where training problems degrade the diversity and quality of synthetic images, and the high computational cost associated with training makes complex architectures increasingly inaccessible. This work proposes novel early stopping criteria to quantitatively detect training problems, halt training, and reduce the computational costs associated with synthesizing biomedical images. Firstly, the range of generator and discriminator loss values is investigated to assess whether mode collapse, non-convergence, and instability occur sequentially, concurrently, or interchangeably throughout the training of GANs. Secondly, these occurrences, in conjunction with the Mean Structural Similarity Index (MS-SSIM) and Fr\'echet Inception Distance (FID) scores of synthetic images, form the basis of the proposed early stopping criteria. This work helps identify the occurrence of training problems in GANs at low computational cost and reduces the training time needed to generate diversified and high-quality synthetic images.
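A minimal sketch of a quantitative stopping rule in this spirit, combining the recent range of adversarial losses with an FID plateau test on synthetic training curves; the window sizes and tolerances are illustrative placeholders, not the paper's calibrated criteria.

```python
import numpy as np

def should_stop(g_loss, d_loss, fid, window=20, range_tol=0.05, fid_tol=1.0):
    """Stop once both adversarial losses are stable over a recent window
    (no instability / non-convergence flag) and FID has stopped improving.
    All thresholds here are illustrative."""
    if len(fid) < 2 * window:
        return False
    g, d = np.array(g_loss[-window:]), np.array(d_loss[-window:])
    stable = (g.max() - g.min() < range_tol) and (d.max() - d.min() < range_tol)
    improving = np.mean(fid[-2 * window:-window]) - np.mean(fid[-window:]) >= fid_tol
    return stable and not improving

# synthetic training curves standing in for a real GAN run
t = np.arange(200)
g_loss = list(0.7 + 0.5 * np.exp(-t / 30))
d_loss = list(0.7 - 0.3 * np.exp(-t / 30))
fid = list(300 * np.exp(-t / 25) + 20)
stop = next((int(i) for i in t if should_stop(g_loss[:i], d_loss[:i], fid[:i])), None)
print("stop at epoch:", stop)
```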
Updated: 2024-05-31 16:33:20
标题: 生物医学成像中生成对抗网络训练的早停止准则
摘要: 生成对抗网络(GANs)在训练其复杂结构时具有较高的计算成本。在整个训练过程中,基于损失和合成图像的多样性和质量,对GANs的输出进行定性分析。根据这种定性分析,一旦生成所需的合成图像,就会手动停止训练。通过利用早停止标准,可以减少计算成本和对手动监督的依赖性,但受到训练问题(如模态崩塌、不收敛和不稳定性)的影响。这在生物医学图像中特别普遍,训练问题会降低合成图像的多样性和质量,并且与训练相关的高计算成本使得复杂结构日益难以访问。本文提出了一种新颖的早停止标准,用于定量检测训练问题,停止训练,并减少合成生物医学图像的计算成本。首先,调查生成器和鉴别器损失值的范围,以评估模态崩塌、不收敛和不稳定性是否在GANs的训练过程中顺序发生、同时发生或交替发生。其次,利用这些发生情况结合合成图像的均值结构相似性指数(MS-SSIM)和Frechet Inception Distance(FID)分数,形成了所提出的早停止标准的基础。本研究有助于使用低资源计算成本识别GANs中的训练问题,并减少训练时间以生成多样化和高质量的合成图像。
更新时间: 2024-05-31 16:33:20
领域: cs.CV,cs.LG,eess.IV
Uncertainty Quantification for Bird's Eye View Semantic Segmentation: Methods and Benchmarks
The fusion of raw features from multiple sensors on an autonomous vehicle to create a Bird's Eye View (BEV) representation is crucial for planning and control systems. There is growing interest in using deep learning models for BEV semantic segmentation. Anticipating segmentation errors and improving the explainability of DNNs is essential for autonomous driving, yet it is under-studied. This paper introduces a benchmark for predictive uncertainty quantification in BEV segmentation. The benchmark assesses various approaches across three popular datasets using two representative backbones and focuses on the effectiveness of predicted uncertainty in identifying misclassified and out-of-distribution (OOD) pixels, as well as calibration. Empirical findings highlight the challenges in uncertainty quantification. Our results find that evidential deep learning based approaches show the most promise by efficiently quantifying aleatoric and epistemic uncertainty. We propose the Uncertainty-Focal-Cross-Entropy (UFCE) loss, designed for highly imbalanced data, which consistently improves the segmentation quality and calibration. Additionally, we introduce a vacuity-scaled regularization term that enhances the model's focus on high uncertainty pixels, improving epistemic uncertainty quantification.
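As a pointer to what the evidential approaches in the benchmark compute, here is a minimal sketch of per-pixel vacuity (epistemic uncertainty) from a Dirichlet evidential head; the logits are invented, and the paper's UFCE loss and vacuity-scaled regularizer build on quantities like these.

```python
import numpy as np

def dirichlet_uncertainty(logits):
    """Evidential head: evidence -> Dirichlet concentration alpha = evidence + 1.
    Vacuity K/S is high when total evidence is low (epistemic uncertainty)."""
    evidence = np.log1p(np.exp(logits))          # softplus keeps evidence >= 0
    alpha = evidence + 1.0
    S = alpha.sum(axis=-1, keepdims=True)
    probs = alpha / S                            # expected class probabilities
    vacuity = logits.shape[-1] / S[..., 0]       # K / S
    return probs, vacuity

# Two toy pixels: one with strong evidence, one with almost none (OOD-like)
logits = np.array([[4.0, -1.0, -1.0],
                   [0.01, 0.02, 0.01]])
probs, vacuity = dirichlet_uncertainty(logits)
print("probs:\n", probs.round(3))
print("vacuity:", vacuity.round(3))   # the second pixel should be flagged
```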
Updated: 2024-05-31 16:32:46
标题: 鸟瞰视角语义分割的不确定性量化:方法和基准
摘要: 将来自多个传感器的原始特征融合在一起,为自动驾驶车辆创建鸟瞰图(BEV)表示对规划和控制系统至关重要。越来越多的人对使用深度学习模型进行BEV语义分割感兴趣。预测分割错误并改善DNN的可解释性对于自动驾驶至关重要,但研究不足。本文介绍了一种用于BEV分割中预测不确定性量化的基准。该基准评估了在三个流行数据集上使用两种代表性骨干的各种方法,并侧重于预测不确定性在识别错误分类和超出分布(OOD)像素以及校准方面的有效性。实证研究结果突出了不确定性量化中的挑战。我们的结果发现,基于证据的深度学习方法通过有效量化随机性和认知不确定性显示出最有希望的前景。我们提出了Uncertainty-Focal-Cross-Entropy(UFCE)损失,针对高度不平衡的数据设计,始终提高了分割质量和校准。此外,我们引入了一个量度缩放的正则化项,增强了模型对高不确定性像素的关注,改善了认知不确定性的量化。
更新时间: 2024-05-31 16:32:46
领域: cs.LG,cs.CV
The Structure and Dynamics of Knowledge Graphs, with Superficiality
Large knowledge graphs combine human knowledge garnered from projects ranging from academia and institutions to enterprises and crowdsourcing. Within such graphs, each relationship between two nodes represents a basic fact involving these two entities. The diversity of the semantics of relationships constitutes the richness of knowledge graphs, leading to the emergence of singular topologies, sometimes chaotic in appearance. However, this complex characteristic can be modeled in a simple way by introducing the concept of superficiality, which controls the overlap between relationships whose facts are generated independently. With this model, superficiality also regulates the balance of the global distribution of knowledge by determining the proportion of misdescribed entities. This is the first model for the structure and dynamics of knowledge graphs. It leads to a better understanding of formal knowledge acquisition and organization.
Updated: 2024-05-31 16:32:44
标题: 知识图谱的结构和动态,以及表面性
摘要: 大型知识图谱结合了从学术和机构到企业和众包项目中获得的人类知识。在这样的图谱中,两个节点之间的每个关系代表涉及这两个实体的基本事实。关系的语义多样性构成了知识图谱的丰富性,导致独特拓扑的出现,有时外观混乱。然而,这种复杂特征可以通过引入表面性的概念以简单的方式建模,该概念控制了由独立生成的事实产生的关系之间的重叠。通过这种模型,表面性还通过确定被错误描述的实体的比例来调节全局知识分布的平衡。这是知识图谱结构和动态的第一个模型。它有助于更好地理解正式知识获取和组织。
更新时间: 2024-05-31 16:32:44
领域: cs.AI
Bayesian Design Principles for Offline-to-Online Reinforcement Learning
Offline reinforcement learning (RL) is crucial for real-world applications where exploration can be costly or unsafe. However, offline learned policies are often suboptimal, and further online fine-tuning is required. In this paper, we tackle the fundamental dilemma of offline-to-online fine-tuning: if the agent remains pessimistic, it may fail to learn a better policy, while if it becomes optimistic directly, performance may suffer from a sudden drop. We show that Bayesian design principles are crucial in solving such a dilemma. Instead of adopting optimistic or pessimistic policies, the agent should act in a way that matches its belief in optimal policies. Such a probability-matching agent can avoid a sudden performance drop while still being guaranteed to find the optimal policy. Based on our theoretical findings, we introduce a novel algorithm that outperforms existing methods on various benchmarks, demonstrating the efficacy of our approach. Overall, the proposed approach provides a new perspective on offline-to-online RL that has the potential to enable more effective learning from offline data.
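The probability-matching principle is easiest to see in a bandit toy: sample a plausible model from the posterior and act greedily with respect to the sample (Thompson sampling), rather than acting purely pessimistically or optimistically. The sketch below seeds a Gaussian posterior over arm means with skewed "offline" counts; the paper's algorithm extends this idea to full offline-to-online RL.

```python
import numpy as np

rng = np.random.default_rng(3)
true_means = np.array([0.3, 0.5, 0.7])
# "Offline" phase: logged data heavily skewed toward a suboptimal arm
counts = np.array([200.0, 5.0, 5.0])
sums = np.array([0.3 * 200, 0.5 * 5, 0.7 * 5])

for t in range(2000):                 # online fine-tuning
    # probability matching: sample from the posterior over arm means and
    # act greedily w.r.t. the sample
    sampled = rng.normal(sums / counts, 1.0 / np.sqrt(counts))
    a = int(np.argmax(sampled))
    r = rng.binomial(1, true_means[a])
    counts[a] += 1
    sums[a] += r

print("pull counts:", counts.astype(int))   # mass should shift to arm 2
```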
Updated: 2024-05-31 16:31:07
标题: 贝叶斯设计原则用于离线到在线强化学习
摘要: 离线强化学习(RL)对于探索成本高昂或不安全的实际应用至关重要。然而,离线学习的策略通常是次优的,需要进一步进行在线微调。在本文中,我们解决了离线到在线微调的基本困境:如果代理保持悲观态度,可能无法学习到更好的策略,而如果直接变得乐观,性能可能会突然下降。我们展示了贝叶斯设计原则在解决这种困境中的重要性。代理不应采用乐观或悲观的策略,而应该根据其对最佳策略的信念来行动。 这样的概率匹配代理可以避免性能突然下降,同时仍然保证找到最佳策略。基于我们的理论发现,我们介绍了一种新算法,在各种基准测试中优于现有方法,展示了我们方法的有效性。总体而言,所提出的方法为离线到在线强化学习提供了一种新视角,有潜力从离线数据中实现更有效的学习。
更新时间: 2024-05-31 16:31:07
领域: cs.LG
Open Ad Hoc Teamwork with Cooperative Game Theory
Ad hoc teamwork poses a challenging problem, requiring the design of an agent to collaborate with teammates without prior coordination or joint training. Open ad hoc teamwork (OAHT) further complicates this challenge by considering environments with a changing number of teammates, referred to as open teams. One promising solution in practice to this problem is leveraging the generalizability of graph neural networks to handle an unrestricted number of agents, named graph-based policy learning (GPL). However, its joint Q-value representation over a coordination graph lacks convincing explanations. In this paper, we establish a new theory to understand the joint Q-value representation for OAHT, from the perspective of cooperative game theory, and validate its learning paradigm. Building on our theory, we propose a novel algorithm named CIAO, compatible with GPL framework, with additional provable implementation tricks that can facilitate learning. The demos of experimental results are available on https://sites.google.com/view/ciao2024, and the code of experiments is published on https://github.com/hsvgbkhgbv/CIAO.
Updated: 2024-05-31 16:28:10
标题: 用合作博弈论实现开放的临时团队合作
摘要: Ad hoc团队合作提出了一个具有挑战性的问题,需要设计一个代理与队友协作,而无需事先协调或联合培训。开放式ad hoc团队合作(OAHT)通过考虑具有不断变化的队友数量的环境,即开放式团队,进一步复杂化了这一挑战。实践中解决这一问题的一个有前途的解决方案是利用图神经网络的泛化能力来处理无限数量的代理,称为基于图的策略学习(GPL)。然而,它在协调图上的联合Q值表示缺乏令人信服的解释。在本文中,我们建立了一个新理论来理解从合作博弈论的角度理解OAHT的联合Q值表示,并验证了其学习范式。在我们的理论基础上,我们提出了一种名为CIAO的新算法,与GPL框架兼容,并具有额外可证明的实现技巧,可以促进学习。实验结果的演示可在https://sites.google.com/view/ciao2024上找到,实验代码已发布在https://github.com/hsvgbkhgbv/CIAO。
更新时间: 2024-05-31 16:28:10
领域: cs.MA,cs.LG
Use Your INSTINCT: INSTruction optimization for LLMs usIng Neural bandits Coupled with Transformers
Large language models (LLMs) have shown remarkable instruction-following capabilities and achieved impressive performances in various applications. However, the performances of LLMs depend heavily on the instructions given to them, which are typically manually tuned with substantial human efforts. Recent work has used the query-efficient Bayesian optimization (BO) algorithm to automatically optimize the instructions given to black-box LLMs. However, BO usually falls short when optimizing highly sophisticated (e.g., high-dimensional) objective functions, such as the functions mapping an instruction to the performance of an LLM. This is mainly due to the limited expressive power of the Gaussian process (GP) which is used by BO as a surrogate to model the objective function. Meanwhile, it has been repeatedly shown that neural networks (NNs), especially pre-trained transformers, possess strong expressive power and can model highly complex functions. So, we adopt a neural bandit algorithm which replaces the GP in BO by an NN surrogate to optimize instructions for black-box LLMs. More importantly, the neural bandit algorithm allows us to naturally couple the NN surrogate with the hidden representation learned by a pre-trained transformer (i.e., an open-source LLM), which significantly boosts its performance. These motivate us to propose our INSTruction optimization usIng Neural bandits Coupled with Transformers (INSTINCT) algorithm. We perform instruction optimization for ChatGPT and use extensive experiments to show that INSTINCT consistently outperforms baselines in different tasks, e.g., various instruction induction tasks and the task of improving zero-shot chain-of-thought instructions. Our code is available at https://github.com/xqlin98/INSTINCT.
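A minimal sketch of the neural-bandit loop, with stand-ins clearly flagged: E plays the role of pre-trained-transformer embeddings of candidate instructions, llm_score is a stubbed black-box evaluation, and ensemble disagreement replaces the paper's principled neural-bandit uncertainty estimate.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(4)
n, d = 200, 32
E = rng.normal(size=(n, d))          # stand-in for pre-trained LLM embeddings
w = rng.normal(size=d)

def llm_score(i):                    # stubbed black-box LLM performance
    return np.tanh(E[i] @ w) + 0.05 * rng.normal()

observed = [0, 1, 2]                 # warm-start queries
scores = [llm_score(i) for i in observed]

for t in range(20):
    # NN surrogate ensemble on (embedding -> score); disagreement = uncertainty
    preds = []
    for seed in range(3):
        m = MLPRegressor(hidden_layer_sizes=(64,), max_iter=300, random_state=seed)
        m.fit(E[observed], scores)
        preds.append(m.predict(E))
    preds = np.array(preds)
    ucb = preds.mean(0) + 1.0 * preds.std(0)   # optimism under uncertainty
    i = int(np.argmax(ucb))
    observed.append(i)
    scores.append(llm_score(i))

best = observed[int(np.argmax(scores))]
print("best instruction index:", best, "score:", max(scores))
```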
Updated: 2024-05-31 16:27:53
标题: 使用您的本能:利用神经臂带与变压器耦合的LLM指令优化
摘要: 大型语言模型(LLMs)展示了出色的指令遵循能力,并在各种应用中取得了令人印象深刻的性能。然而,LLMs的性能严重依赖于给予它们的指令,这些指令通常需要通过大量人力调整。最近的工作使用了查询效率高的贝叶斯优化(BO)算法来自动优化给予黑盒LLMs的指令。然而,当优化高度复杂(例如高维度)的目标函数时,如将指令映射到LLMs性能的函数时,BO通常表现不佳。这主要是由于BO使用的高斯过程(GP)的有限表达能力。与此同时,已经反复证明神经网络(NNs),特别是预训练的transformers,具有强大的表达能力并且能够建模高度复杂的函数。因此,我们采用了一个神经bandit算法,将BO中的GP替换为NN代理,以优化黑盒LLMs的指令。更重要的是,神经bandit算法允许我们自然地将NN代理与预训练transformer学习的隐藏表示耦合起来(即一个开源LLM),从而显著提高了性能。这些动机促使我们提出了我们的INSTruction optimization usIng Neural bandits Coupled with Transformers(INSTINCT)算法。我们对ChatGPT进行指令优化,并进行了大量实验,结果显示INSTINCT在不同任务中始终优于基线,例如各种指令归纳任务以及改进零样本思维链指令的任务。我们的代码可在https://github.com/xqlin98/INSTINCT找到。
更新时间: 2024-05-31 16:27:53
领域: cs.LG,cs.AI,cs.CL
Generative Adversarial Networks in Ultrasound Imaging: Extending Field of View Beyond Conventional Limits
Transthoracic Echocardiography (TTE) is a fundamental, non-invasive diagnostic tool in cardiovascular medicine, enabling detailed visualization of cardiac structures crucial for diagnosing various heart conditions. Despite its widespread use, TTE ultrasound imaging faces inherent limitations, notably the trade-off between field of view (FoV) and resolution. This paper introduces a novel application of conditional Generative Adversarial Networks (cGANs), specifically designed to extend the FoV in TTE ultrasound imaging while maintaining high resolution. Our proposed cGAN architecture, termed echoGAN, demonstrates the capability to generate realistic anatomical structures through outpainting, effectively broadening the viewable area in medical imaging. This advancement has the potential to enhance both automatic and manual ultrasound navigation, offering a more comprehensive view that could significantly reduce the learning curve associated with ultrasound imaging and aid in more accurate diagnoses. The results confirm that echoGAN reliably reproduces detailed cardiac features, thereby promising a significant step forward in the field of non-invasive cardiac navigation and diagnostics.
Updated: 2024-05-31 16:26:30
标题: 对抗生成网络在超声成像中的应用:将视野扩展至传统极限之外
摘要: 经胸超声心动图(TTE)是心血管医学中的基本非侵入性诊断工具,能够详细显示心脏结构,对于诊断各种心脏疾病至关重要。尽管TTE超声成像被广泛应用,但面临固有限制,尤其是视野(FoV)和分辨率之间的权衡。本文介绍了一种新的条件生成对抗网络(cGANs)应用,专门设计用于扩展TTE超声成像中的FoV,同时保持高分辨率。我们提出的cGAN架构,称为echoGAN,展示了通过外部绘制生成逼真解剖结构的能力,有效扩大了医学成像中可视的区域。这一进步有望提升自动和手动超声导航,提供更全面的视图,可以显著减少与超声成像相关的学习曲线,并有助于更准确的诊断。结果证实,echoGAN可可靠地再现详细的心脏特征,从而在非侵入性心脏导航和诊断领域迈出了重要的一步。
更新时间: 2024-05-31 16:26:30
领域: cs.AI,cs.CV
Neural Gaussian Scale-Space Fields
Gaussian scale spaces are a cornerstone of signal representation and processing, with applications in filtering, multiscale analysis, anti-aliasing, and many more. However, obtaining such a scale space is costly and cumbersome, in particular for continuous representations such as neural fields. We present an efficient and lightweight method to learn the fully continuous, anisotropic Gaussian scale space of an arbitrary signal. Based on Fourier feature modulation and Lipschitz bounding, our approach is trained self-supervised, i.e., training does not require any manual filtering. Our neural Gaussian scale-space fields faithfully capture multiscale representations across a broad range of modalities, and support a diverse set of applications. These include images, geometry, light-stage data, texture anti-aliasing, and multiscale optimization.
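The identity that makes Fourier feature modulation work is that Gaussian filtering with standard deviation sigma multiplies a Fourier component of frequency f by exp(-2 pi^2 sigma^2 f^2). Here is a minimal sketch verifying this on a two-sinusoid signal against brute-force convolution; the learned neural field, self-supervision, and Lipschitz bounding are what the paper adds on top.

```python
import numpy as np

x = np.linspace(0, 1, 2048, endpoint=False)
freqs = np.array([3.0, 40.0])                 # cycles per unit
amps = np.array([1.0, 0.5])
signal = sum(a * np.sin(2 * np.pi * f * x) for a, f in zip(amps, freqs))

def analytic_scale_space(x, sigma):
    # Gaussian filtering multiplies each Fourier component by
    # exp(-2 pi^2 sigma^2 f^2); a neural field can realize this by modulating
    # its Fourier features with the same factor, for any continuous sigma
    gains = np.exp(-2 * (np.pi * sigma * freqs) ** 2)
    return sum(g * a * np.sin(2 * np.pi * f * x)
               for g, a, f in zip(gains, amps, freqs))

sigma = 0.01
smoothed = analytic_scale_space(x, sigma)
# brute-force check: discrete convolution with a sampled Gaussian kernel,
# tiling the periodic signal to avoid boundary effects
t = np.arange(-512, 513) / 2048
kernel = np.exp(-t**2 / (2 * sigma**2))
kernel /= kernel.sum()
brute = np.convolve(np.tile(signal, 3), kernel, mode="same")[2048:4096]
print("max abs error:", np.abs(smoothed - brute).max())
```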
Updated: 2024-05-31 16:26:08
标题: 神经高斯尺度空间场
摘要: 高斯尺度空间是信号表示和处理的基石,应用于滤波、多尺度分析、抗混叠等领域。然而,获取这样的尺度空间是昂贵且繁琐的,特别是对于连续表示如神经场。我们提出了一种高效且轻量级的方法,用于学习任意信号的完全连续、各向异性的高斯尺度空间。基于傅里叶特征调制和利普希茨边界,我们的方法是自监督训练的,即训练不需要任何手动滤波。我们的神经高斯尺度空间场忠实地捕捉了各种模态的多尺度表示,并支持多种应用,包括图像、几何、光场数据、纹理抗混叠和多尺度优化。
更新时间: 2024-05-31 16:26:08
领域: cs.CV,cs.GR,cs.LG
Enhancing Noise Robustness of Retrieval-Augmented Language Models with Adaptive Adversarial Training
Large Language Models (LLMs) exhibit substantial capabilities yet encounter challenges, including hallucination, outdated knowledge, and untraceable reasoning processes. Retrieval-augmented generation (RAG) has emerged as a promising solution, integrating knowledge from external databases to mitigate these challenges. However, inappropriate retrieved passages can potentially hinder the LLMs' capacity to generate comprehensive and high-quality responses. Prior RAG studies of robustness to retrieval noise often confine themselves to a limited set of noise types, deviating from real-world retrieval environments and limiting practical applicability. In this study, we first investigate retrieval noises and categorize them into three distinct types, reflecting real-world environments. We analyze the impact of these various retrieval noises on the robustness of LLMs. Subsequently, we propose a novel RAG approach known as Retrieval-augmented Adaptive Adversarial Training (RAAT). RAAT leverages adaptive adversarial training to dynamically adjust the model's training process in response to retrieval noises. Concurrently, it employs multi-task learning to ensure the model's capacity to internally recognize noisy contexts. Extensive experiments demonstrate that the LLaMA-2 7B model trained using RAAT exhibits significant improvements in F1 and EM scores under diverse noise conditions. For reproducibility, we release our code and data at: https://github.com/calubkk/RAAT.
Updated: 2024-05-31 16:24:53
标题: 通过自适应对抗训练提升检索增强语言模型的噪声鲁棒性
摘要: 大型语言模型(LLMs)展示了相当大的能力,但也面临挑战,包括幻觉、过时知识和无法追踪的推理过程。检索增强生成(RAG)已经成为一个有前途的解决方案,将外部数据库中的知识整合起来,以缓解这些挑战。然而,不当的检索到的段落可能会潜在地阻碍LLMs生成全面和高质量的响应的能力。先前关于检索噪音鲁棒性的RAG研究通常局限于一组有限的噪音类型,偏离了现实世界的检索环境,限制了实际应用性。在这项研究中,我们首先调查检索噪音,并将其分类为三种不同类型,反映现实世界的环境。我们分析了这些不同检索噪音对LLMs鲁棒性的影响。随后,我们提出了一种名为检索增强自适应对抗训练(RAAT)的新型RAG方法。RAAT利用自适应对抗训练来动态调整模型的训练过程,以应对检索噪音。同时,它采用多任务学习来确保模型内部识别嘈杂上下文的能力。大量实验表明,使用RAAT训练的LLaMA-2 7B模型在各种噪音条件下在F1和EM得分方面都取得了显著进展。为了可重现性,我们在https://github.com/calubkk/RAAT上发布了我们的代码和数据。
更新时间: 2024-05-31 16:24:53
领域: cs.AI
ACE: A Model Poisoning Attack on Contribution Evaluation Methods in Federated Learning
In Federated Learning (FL), a set of clients collaboratively train a machine learning model (called global model) without sharing their local training data. The local training data of clients is typically non-i.i.d. and heterogeneous, resulting in varying contributions from individual clients to the final performance of the global model. In response, many contribution evaluation methods were proposed, where the server could evaluate the contribution made by each client and incentivize the high-contributing clients to sustain their long-term participation in FL. Existing studies mainly focus on developing new metrics or algorithms to better measure the contribution of each client. However, the security of contribution evaluation methods of FL operating in adversarial environments is largely unexplored. In this paper, we propose the first model poisoning attack on contribution evaluation methods in FL, termed ACE. Specifically, we show that any malicious client utilizing ACE could manipulate the parameters of its local model such that it is evaluated to have a high contribution by the server, even when its local training data is indeed of low quality. We perform both theoretical analysis and empirical evaluations of ACE. Theoretically, we show our design of ACE can effectively boost the malicious client's perceived contribution when the server employs the widely-used cosine distance metric to measure contribution. Empirically, our results show ACE effectively and efficiently deceive five state-of-the-art contribution evaluation methods. In addition, ACE preserves the accuracy of the final global models on testing inputs. We also explore six countermeasures to defend ACE. Our results show they are inadequate to thwart ACE, highlighting the urgent need for new defenses to safeguard the contribution evaluation methods in FL.
Updated: 2024-05-31 16:21:55
标题: ACE:联邦学习中贡献评估方法的模型毒化攻击
摘要: 在联合学习(FL)中,一组客户端合作训练一个机器学习模型(称为全局模型),而无需共享他们的本地训练数据。客户端的本地训练数据通常是非独立同分布的和异构的,导致个体客户端对全局模型最终性能的贡献不同。为此,许多贡献评估方法被提出,服务器可以评估每个客户端的贡献,并激励高贡献的客户端持续参与FL。现有研究主要集中在开发新的指标或算法来更好地衡量每个客户端的贡献。然而,在对抗环境中运行的FL的贡献评估方法的安全性还未得到深入探讨。本文提出了FL中贡献评估方法的第一个模型毒化攻击,称为ACE。具体地,我们展示任何恶意客户端利用ACE都可以操纵其本地模型的参数,使其被服务器评估为高贡献,即使其本地训练数据确实质量低劣。我们对ACE进行了理论分析和实证评估。理论上,我们展示了ACE的设计可以在服务器采用广泛使用的余弦距离度量来衡量贡献时有效地提升恶意客户端的感知贡献。实证上,我们的结果表明ACE有效且高效地欺骗了五种最先进的贡献评估方法。此外,ACE保留了最终全局模型在测试输入上的准确性。我们还探讨了六种抵抗ACE的对策。我们的结果表明它们不足以阻止ACE,突显了迫切需要新的防御措施来保护FL中的贡献评估方法。
更新时间: 2024-05-31 16:21:55
领域: cs.CR,cs.AI,cs.LG
SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales
Large language models (LLMs) often generate inaccurate or fabricated information and generally fail to indicate their confidence, which limits their broader applications. Previous work elicits confidence from LLMs by direct or self-consistency prompting, or constructing specific datasets for supervised finetuning. The prompting-based approaches have inferior performance, and the training-based approaches are limited to binary or inaccurate group-level confidence estimates. In this work, we present the advanced SaySelf, a training framework that teaches LLMs to express more accurate fine-grained confidence estimates. In addition, beyond the confidence scores, SaySelf initiates the process of directing LLMs to produce self-reflective rationales that clearly identify gaps in their parametric knowledge and explain their uncertainty. This is achieved by using an LLM to automatically summarize the uncertainties in specific knowledge via natural language. The summarization is based on the analysis of the inconsistency in multiple sampled reasoning chains, and the resulting data is utilized for supervised fine-tuning. Moreover, we utilize reinforcement learning with a meticulously crafted reward function to calibrate the confidence estimates, motivating LLMs to deliver accurate, high-confidence predictions and to penalize overconfidence in erroneous outputs. Experimental results in both in-distribution and out-of-distribution datasets demonstrate the effectiveness of SaySelf in reducing the confidence calibration error and maintaining the task performance. We show that the generated self-reflective rationales are reasonable and can further contribute to the calibration. The code is made public at \url{https://github.com/xu1868/SaySelf}.
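A minimal sketch of the consistency signal SaySelf starts from: the agreement rate across sampled reasoning chains gives a fine-grained confidence estimate, and the disagreeing answers mark where a rationale should explain the uncertainty. The sampled answers below are hypothetical; the paper additionally has an LLM summarize the inconsistency in natural language for supervised fine-tuning.

```python
from collections import Counter

# Hypothetical answers sampled from multiple reasoning chains for one question
samples = ["Paris", "Paris", "Lyon", "Paris", "Paris", "Marseille", "Paris"]

def consistency_confidence(samples):
    """Agreement rate of the modal answer serves as a confidence estimate;
    the disagreeing answers indicate where parametric knowledge is shaky."""
    counts = Counter(samples)
    answer, n = counts.most_common(1)[0]
    conflicts = [a for a in counts if a != answer]
    return answer, n / len(samples), conflicts

answer, conf, conflicts = consistency_confidence(samples)
print(f"answer={answer}, confidence={conf:.2f}, conflicting answers={conflicts}")
```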
Updated: 2024-05-31 16:21:16
标题: SaySelf:教授LLMs使用自我反思的理据表达自信
摘要: 大型语言模型(LLMs)通常会生成不准确或虚构的信息,并且通常无法表明它们的置信度,这限制了它们的广泛应用。先前的研究通过直接或自一致提示,或构建特定数据集进行监督微调,从LLMs中引出置信度。基于提示的方法表现较差,而基于训练的方法仅限于二进制或不准确的组级置信度估计。在这项工作中,我们提出了先进的SaySelf,一个培训框架,教导LLMs表达更准确的细粒度置信度估计。除了置信度分数外,SaySelf还启动了指导LLMs产生自我反思理由的过程,清晰地识别其参数知识中的差距并解释其不确定性。这是通过使用LLMs自动总结特定知识中的不确定性来实现的,通过自然语言进行。总结是基于对多个采样推理链中的不一致性的分析,产生的数据被用于监督微调。此外,我们利用精心设计的奖励函数进行强化学习,来校准置信度估计,激励LLMs提供准确、高置信度的预测,并惩罚错误输出中的自信过度。在分布内和分布外数据集上的实验结果表明,SaySelf在减少置信度校准误差和保持任务性能方面的有效性。我们展示生成的自我反思理由是合理的,并可以进一步为校准做出贡献。代码已在\url{https://github.com/xu1868/SaySelf}上公开。
更新时间: 2024-05-31 16:21:16
领域: cs.CL,cs.AI,cs.LG
LCQ: Low-Rank Codebook based Quantization for Large Language Models
Large language models (LLMs) have recently demonstrated promising performance in many tasks. However, the high storage and computational cost of LLMs has become a challenge for deploying LLMs. Weight quantization has been widely used for model compression, which can reduce both storage and computational cost. Most existing weight quantization methods for LLMs use a rank-one codebook for quantization, which results in substantial accuracy loss when the compression ratio is high. In this paper, we propose a novel weight quantization method, called low-rank codebook based quantization (LCQ), for LLMs. LCQ adopts a low-rank codebook, the rank of which can be larger than one, for quantization. Experiments show that LCQ can achieve better accuracy than existing methods with negligible extra storage cost.
Updated: 2024-05-31 16:21:05
标题: LCQ:基于低秩码本的大型语言模型量化
摘要: 大型语言模型(LLMs)最近在许多任务中展现出了令人期待的性能。然而,LLMs的高存储和计算成本已成为部署LLMs的挑战。权重量化已被广泛用于模型压缩,可以同时减少存储和计算成本。大多数现有的LLMs权重量化方法使用基于秩-1的码书进行量化,当压缩比较高时会导致显著的准确性损失。在本文中,我们提出了一种新颖的权重量化方法,称为基于低秩码书的量化(LCQ),用于LLMs。LCQ采用一个低秩码书,其秩可以大于1,用于量化。实验证明,LCQ可以在几乎没有额外存储成本的情况下实现比现有方法更好的准确性。
更新时间: 2024-05-31 16:21:05
领域: cs.LG,cs.CL
Amortizing intractable inference in diffusion models for vision, language, and control
Diffusion models have emerged as effective distribution estimators in vision, language, and reinforcement learning, but their use as priors in downstream tasks poses an intractable posterior inference problem. This paper studies amortized sampling of the posterior over data, $\mathbf{x}\sim p^{\rm post}(\mathbf{x})\propto p(\mathbf{x})r(\mathbf{x})$, in a model that consists of a diffusion generative model prior $p(\mathbf{x})$ and a black-box constraint or likelihood function $r(\mathbf{x})$. We state and prove the asymptotic correctness of a data-free learning objective, relative trajectory balance, for training a diffusion model that samples from this posterior, a problem that existing methods solve only approximately or in restricted cases. Relative trajectory balance arises from the generative flow network perspective on diffusion models, which allows the use of deep reinforcement learning techniques to improve mode coverage. Experiments illustrate the broad potential of unbiased inference of arbitrary posteriors under diffusion priors: in vision (classifier guidance), language (infilling under a discrete diffusion LLM), and multimodal data (text-to-image generation). Beyond generative modeling, we apply relative trajectory balance to the problem of continuous control with a score-based behavior prior, achieving state-of-the-art results on benchmarks in offline reinforcement learning.
Updated: 2024-05-31 16:18:46
标题: 扩散模型中难以处理的推理的摊销在视觉、语言和控制中
摘要: 扩散模型已经成为视觉、语言和强化学习中有效的分布估计器,但是它们作为先验在下游任务中的使用会引发一个棘手的后验推断问题。本文研究了在一个模型中对数据后验进行摊销取样,$\mathbf{x}\sim p^{\rm post}(\mathbf{x})\propto p(\mathbf{x})r(\mathbf{x})$,该模型由扩散生成模型先验$p(\mathbf{x})$和一个黑盒约束或似然函数$r(\mathbf{x})$组成。我们陈述并证明了一种无数据学习目标的渐近正确性,相对轨迹平衡,用于训练从该后验中采样的扩散模型,这是现有方法仅在近似或受限情况下解决的问题。相对轨迹平衡源自对扩散模型的生成流网络视角,这允许使用深度强化学习技术来改善模式覆盖。实验展示了在扩散先验下对任意后验进行无偏推断的广泛潜力:在视觉领域(分类器引导)、语言领域(在离散扩散LLM下进行填充)和多模态数据领域(文本到图像生成)。除了生成建模,我们将相对轨迹平衡应用于具有基于得分的行为先验的连续控制问题,实现了在线强化学习基准测试中的最新结果。
更新时间: 2024-05-31 16:18:46
领域: cs.LG,cs.CV
PUAL: A Classifier on Trifurcate Positive-Unlabeled Data
Positive-unlabeled (PU) learning aims to train a classifier using data containing only labeled-positive instances and unlabeled instances. However, existing PU learning methods generally struggle to achieve satisfactory performance on trifurcate data, where the positive instances distribute on both sides of the negative instances. To address this issue, we first propose a PU classifier with asymmetric loss (PUAL), introducing a structure of asymmetric loss on positive instances into the objective function of the global and local learning classifier. We then develop a kernel-based algorithm that enables PUAL to obtain a non-linear decision boundary. Through experiments on both simulated and real-world datasets, we show that PUAL achieves satisfactory classification on trifurcate data.
Updated: 2024-05-31 16:18:06
标题: PUAL:一种三分叉正-未标记数据的分类器
摘要: 正例-无标签(PU)学习旨在使用仅包含标记为正例和无标签实例的数据来训练分类器。然而,现有的PU学习方法通常很难在三分数据上实现令人满意的性能,其中正例分布在负例的两侧。为了解决这个问题,首先我们提出了一个带有不对称损失(PUAL)的PU分类器,通过在全局和局部学习分类器的目标函数中引入正例上的不对称损失结构。然后我们开发了一个基于核的算法,使PUAL能够获得非线性决策边界。我们通过对模拟和真实世界数据集的实验表明,PUAL能够在三分数据上实现令人满意的分类效果。
更新时间: 2024-05-31 16:18:06
领域: stat.ML,cs.LG
A new multivariate primitive from CCZ equivalence
Multivariate Cryptography is one of the main candidates for Post-quantum Cryptography. Multivariate schemes are usually constructed by applying two secret affine invertible transformations $\mathcal S,\mathcal T$ to a set of multivariate polynomials $\mathcal{F}$ (often quadratic). The secret polynomials $\mathcal{F}$ possess a trapdoor that allows the legitimate user to find a solution of the corresponding system, while the public polynomials $\mathcal G=\mathcal S\circ\mathcal F\circ\mathcal T$ look like random polynomials. The polynomials $\mathcal G$ and $\mathcal F$ are said to be affine equivalent. In this article, we present a more general way of constructing a multivariate scheme by considering the CCZ equivalence, which has been introduced and studied in the context of vectorial Boolean functions.
Updated: 2024-05-31 16:15:02
标题: CCZ等价性中的新的多变量原始元素
摘要: 多变量密码学是后量子密码学的主要候选方案之一。多变量方案通常是通过将两个秘密可逆仿射变换$\mathcal S,\mathcal T$ 应用于一组多项式$\mathcal{F}$(通常是二次的)来构建的。秘密多项式$\mathcal{F}$ 具有一个陷门,允许合法用户找到相应系统的解,而公共多项式$\mathcal G=\mathcal S\circ\mathcal F\circ\mathcal T$ 看起来像随机多项式。多项式$\mathcal G$ 和$\mathcal F$ 被称为仿射等价。在本文中,我们提出了一种更一般的构建多变量方案的方法,通过考虑CCZ等价性,这在向量布尔函数的背景下已被引入和研究。
更新时间: 2024-05-31 16:15:02
领域: cs.CR
Pre- to Post-Contrast Breast MRI Synthesis for Enhanced Tumour Segmentation
Despite its benefits for tumour detection and treatment, the administration of contrast agents in dynamic contrast-enhanced MRI (DCE-MRI) is associated with a range of issues, including their invasiveness, bioaccumulation, and a risk of nephrogenic systemic fibrosis. This study explores the feasibility of producing synthetic contrast enhancements by translating pre-contrast T1-weighted fat-saturated breast MRI to the corresponding first DCE-MRI sequence, leveraging the capabilities of a generative adversarial network (GAN). Additionally, we introduce a Scaled Aggregate Measure (SAMe) designed for quantitatively evaluating the quality of synthetic data in a principled manner and serving as a basis for selecting the optimal generative model. We assess the generated DCE-MRI data using quantitative image quality metrics and apply them to the downstream task of 3D breast tumour segmentation. Our results highlight the potential of post-contrast DCE-MRI synthesis in enhancing the robustness of breast tumour segmentation models via data augmentation. Our code is available at https://github.com/RichardObi/pre_post_synthesis.
Updated: 2024-05-31 16:15:01
Domains: eess.IV,cs.CV,cs.LG
Primal Dual Continual Learning: Balancing Stability and Plasticity through Adaptive Memory Allocation
Continual learning is inherently a constrained learning problem. The goal is to learn a predictor under a no-forgetting requirement. Although several prior studies formulate it as such, they do not solve the constrained problem explicitly. In this work, we show that it is both possible and beneficial to undertake the constrained optimization problem directly. To do this, we leverage recent results in constrained learning through Lagrangian duality. We focus on memory-based methods, where a small subset of samples from previous tasks can be stored in a replay buffer. In this setting, we analyze two versions of the continual learning problem: a coarse approach with constraints at the task level and a fine approach with constraints at the sample level. We show that dual variables indicate the sensitivity of the optimal value of the continual learning problem with respect to constraint perturbations. We then leverage this result to partition the buffer in the coarse approach, allocating more resources to harder tasks, and to populate the buffer in the fine approach, including only impactful samples. We derive a deviation bound on dual variables as sensitivity indicators, and empirically corroborate this result in diverse continual learning benchmarks. We also discuss the limitations of these methods with respect to the amount of memory available and the expressiveness of the parametrization.
Updated: 2024-05-31 16:11:27
Domains: cs.LG,cs.AI,eess.SP
Multi-Agent Reinforcement Learning for Offloading Cellular Communications with Cooperating UAVs
Effective solutions for intelligent data collection in terrestrial cellular networks are crucial, especially in the context of Internet of Things applications. The limited spectrum and coverage area of terrestrial base stations (BSs) pose challenges in meeting the escalating data rate demands of network users. Unmanned aerial vehicles (UAVs), known for their high agility, mobility, and flexibility, present an alternative means to offload data traffic from terrestrial BSs, serving as additional access points. This paper introduces a novel approach to efficiently maximize the utilization of multiple UAVs for data traffic offloading from terrestrial BSs. Specifically, the focus is on maximizing user association with UAVs by jointly optimizing UAV trajectories and user association indicators under quality-of-service constraints. Since the formulated UAV control problem is nonconvex and combinatorial, this study leverages the multi-agent reinforcement learning (MARL) framework. In this framework, each UAV acts as an independent agent aiming to maintain inter-UAV cooperative behavior. The proposed approach utilizes a finite-state Markov decision process to account for the UAVs' velocity constraints and the relationship between their trajectories and the state space. A low-complexity distributed state-action-reward-state-action (SARSA) algorithm is presented to determine the UAVs' optimal sequential decision-making policies over training episodes. Extensive simulation results validate the proposed analysis and offer valuable insights into the optimal UAV trajectories. The derived trajectories demonstrate superior average UAV association performance compared to benchmark techniques such as Q-learning and particle swarm optimization.
Updated: 2024-05-31 16:10:28
Domains: eess.SY,cs.LG,cs.SY
eXponential FAmily Dynamical Systems (XFADS): Large-scale nonlinear Gaussian state-space modeling
State-space graphical models and the variational autoencoder framework provide a principled apparatus for learning dynamical systems from data. State-of-the-art probabilistic approaches are often able to scale to large problems at the cost of flexibility of the variational posterior or expressivity of the dynamics model. However, those concessions can be detrimental if the ultimate goal is to learn a generative model capable of explaining the spatiotemporal structure of the data and making accurate forecasts. We introduce a low-rank structured variational autoencoding framework for nonlinear Gaussian state-space graphical models capable of capturing dense covariance structures that are important for learning dynamical systems with predictive capabilities. Our inference algorithm exploits the covariance structures that arise naturally from sample-based approximate Gaussian message passing and low-rank amortized posterior updates -- effectively performing approximate variational smoothing with time complexity scaling linearly in the state dimensionality. In comparisons with other deep state-space model architectures, our approach consistently demonstrates the ability to learn a more predictive generative model. Furthermore, when applied to neural physiological recordings, our approach is able to learn a dynamical system capable of forecasting population spiking and behavioral correlates from a small portion of single trials.
Updated: 2024-05-31 16:05:37
Domains: stat.ML,cs.LG
Navigating Tabular Data Synthesis Research: Understanding User Needs and Tool Capabilities
In an era of rapidly advancing data-driven applications, there is a growing demand for data in both research and practice. Synthetic data have emerged as an alternative when no real data are available (e.g., due to privacy regulations). Synthesizing tabular data presents unique and complex challenges, especially handling (i) missing values, (ii) dataset imbalance, (iii) diverse column types, and (iv) complex data distributions, as well as preserving (i) column correlations, (ii) temporal dependencies, and (iii) integrity constraints (e.g., functional dependencies) present in the original dataset. While substantial progress has been made recently in the context of generative models, there is no one-size-fits-all solution for tabular data today, and choosing the right tool for a given task is therefore no trivial matter. In this paper, we survey the state of the art in Tabular Data Synthesis (TDS), examine the needs of users by defining a set of functional and non-functional requirements, and compile the challenges associated with meeting those needs. In addition, we evaluate the reported performance of 36 popular research TDS tools against these requirements and develop a decision guide to help users find suitable TDS tools for their applications. The resulting decision guide also identifies significant research gaps.
Updated: 2024-05-31 16:00:43
Domains: cs.AI,cs.DB
Not Just Novelty: A Longitudinal Study on Utility and Customization of an AI Workflow
Generative AI brings novel and impressive abilities to help people in everyday tasks. There are many AI workflows that solve real and complex problems by chaining AI outputs together with human interaction. Although there is an undeniable lure of AI, it is uncertain how useful generative AI workflows are after the novelty wears off. Additionally, workflows built with generative AI have the potential to be easily customized to fit users' individual needs, but do users take advantage of this? We conducted a three-week longitudinal study with 12 users to understand the familiarization and customization of generative AI tools for science communication. Our study revealed that there exists a familiarization phase, during which users were exploring the novel capabilities of the workflow and discovering which aspects they found useful. After this phase, users understood the workflow and were able to anticipate the outputs. Surprisingly, after familiarization the perceived utility of the system was rated higher than before, indicating that the perceived utility of AI is not just a novelty effect. The increase in benefits mainly comes from end-users' ability to customize prompts, and thus potentially appropriate the system to their own needs. This points to a future where generative AI systems can allow us to design for appropriation.
Updated: 2024-05-31 16:00:05
Domains: cs.HC,cs.AI,cs.CL,cs.CY
Online Cascade Learning for Efficient Inference over Streams
Large Language Models (LLMs) have a natural role in answering complex queries about data streams, but the high computational cost of LLM inference makes them infeasible in many such tasks. We propose online cascade learning, the first approach to address this challenge. The objective here is to learn a "cascade" of models, starting with lower-capacity models (such as logistic regression) and ending with a powerful LLM, along with a deferral policy that determines the model to be used on a given input. We formulate the task of learning cascades online as an imitation-learning problem, where smaller models are updated over time imitating the collected LLM demonstrations, and give a no-regret algorithm for the problem. Experimental results across four benchmarks show that our method parallels LLMs in accuracy while cutting down inference costs by as much as 90% with strong robustness against input distribution shifts, underscoring its efficacy and adaptability in stream processing.
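The following is a minimal sketch of the cascade-with-deferral idea in Python. The confidence-threshold rule is an illustrative stand-in for the learned deferral policy the paper trains via imitation learning; the `Stage` and `llm` interfaces are hypothetical.

```python
from typing import Callable, List, Tuple

# Each stage pairs a cheap model's predict function with a deferral threshold.
# Stages are ordered from cheapest (e.g., logistic regression) to the LLM.
Stage = Tuple[Callable[[str], Tuple[str, float]], float]

def cascade_predict(x: str, stages: List[Stage], llm: Callable[[str], str]) -> str:
    for predict, threshold in stages:
        label, confidence = predict(x)
        if confidence >= threshold:  # commit: the cheap model is confident enough
            return label
    return llm(x)                    # every stage deferred: fall back to the LLM
```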
Updated: 2024-05-31 15:59:34
Domains: cs.LG,cs.CL
A Robot Walks into a Bar: Can Language Models Serve as Creativity Support Tools for Comedy? An Evaluation of LLMs' Humour Alignment with Comedians
We interviewed twenty professional comedians who perform live shows in front of audiences and who use artificial intelligence in their artistic process as part of 3-hour workshops on ``AI x Comedy'' conducted at the Edinburgh Festival Fringe in August 2023 and online. The workshop consisted of a comedy writing session with large language models (LLMs), a human-computer interaction questionnaire to assess the Creativity Support Index of AI as a writing tool, and a focus group interrogating the comedians' motivations for and processes of using AI, as well as their ethical concerns about bias, censorship and copyright. Participants noted that existing moderation strategies used in safety filtering and instruction-tuned LLMs reinforced hegemonic viewpoints by erasing minority groups and their perspectives, and qualified this as a form of censorship. At the same time, most participants felt the LLMs did not succeed as a creativity support tool, by producing bland and biased comedy tropes, akin to ``cruise ship comedy material from the 1950s, but a bit less racist''. Our work extends scholarship on the subtle difference between, on the one hand, harmful speech, and on the other hand, ``offensive'' language as a practice of resistance, satire and ``punching up''. We also interrogate the global value alignment behind such language models, and discuss the importance of community-based value alignment and data ownership to build AI tools that better suit artists' needs.
Updated: 2024-05-31 15:55:51
Domains: cs.AI,cs.CL
Aligning Multiclass Neural Network Classifier Criterion with Task Performance via $F_β$-Score
Multiclass neural network classifiers are typically trained using cross-entropy loss. Following training, the performance of this same neural network is evaluated using an application-specific metric based on the multiclass confusion matrix, such as the Macro $F_\beta$-Score. It is questionable whether the use of cross-entropy will yield a classifier that aligns with the intended application-specific performance criteria, particularly in scenarios where there is a need to emphasize one aspect of classifier performance. For example, if greater precision is preferred over recall, the $\beta$ value in the $F_\beta$ evaluation metric can be adjusted accordingly, but the cross-entropy objective remains unaware of this preference during training. We propose a method that addresses this training-evaluation gap for multiclass neural network classifiers such that users can train these models informed by the desired final $F_\beta$-Score. Following prior work in binary classification, we utilize the concepts of the soft-set confusion matrices and a piecewise-linear approximation of the Heaviside step function. Our method extends the $2 \times 2$ binary soft-set confusion matrix to a multiclass $d \times d$ confusion matrix and proposes dynamic adaptation of the threshold value $\tau$, which parameterizes the piecewise-linear Heaviside approximation during run-time. We present a theoretical analysis that shows that our method can be used to optimize for a soft-set based approximation of Macro-$F_\beta$ that is a consistent estimator of Macro-$F_\beta$, and our extensive experiments show the practical effectiveness of our approach.
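A minimal PyTorch sketch of the core construction, assuming a fixed threshold tau and ramp width delta: the piecewise-linear Heaviside surrogate turns the per-class confusion counts into differentiable quantities, yielding a soft Macro-$F_\beta$ that can be maximized directly (the paper additionally adapts tau dynamically at run time, which is omitted here).

```python
import torch
import torch.nn.functional as F

def soft_heaviside(p, tau=0.5, delta=0.1):
    # Piecewise-linear approximation of the Heaviside step H(p - tau):
    # 0 below tau - delta, 1 above tau + delta, and a linear ramp in between.
    return torch.clamp((p - tau + delta) / (2 * delta), 0.0, 1.0)

def soft_macro_fbeta(probs, labels, beta=1.0, tau=0.5, delta=0.1):
    # probs: (N, d) class probabilities; labels: (N,) integer targets.
    d = probs.shape[1]
    onehot = F.one_hot(labels, d).float()
    s = soft_heaviside(probs, tau, delta)  # soft "predicted positive" per class
    tp = (s * onehot).sum(dim=0)           # soft true positives, per class
    fp = (s * (1 - onehot)).sum(dim=0)     # soft false positives
    fn = ((1 - s) * onehot).sum(dim=0)     # soft false negatives
    b2 = beta ** 2
    fbeta = (1 + b2) * tp / ((1 + b2) * tp + b2 * fn + fp + 1e-8)
    return fbeta.mean()  # differentiable Macro-F_beta; train by minimizing 1 - value
```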
Updated: 2024-05-31 15:54:01
Domains: cs.LG,stat.ML
Monte Carlo Tree Search Satellite Scheduling Under Cloud Cover Uncertainty
Efficient utilization of satellite resources in dynamic environments remains a challenging problem in satellite scheduling. This paper addresses the multi-satellite collection scheduling problem (m-SatCSP), aiming to optimize task scheduling over a constellation of satellites under uncertain conditions such as cloud cover. Leveraging Monte Carlo Tree Search (MCTS), a stochastic search algorithm, we explore two versions of MCTS to schedule satellites effectively. Hyperparameter tuning is conducted to optimize the algorithm's performance. Experimental results demonstrate the effectiveness of the MCTS approach, outperforming existing methods in both solution quality and efficiency. Comparative analysis against other scheduling algorithms showcases competitive performance, positioning MCTS as a promising solution for satellite task scheduling in dynamic environments.
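For reference, a compact sketch of the UCT selection rule at the heart of MCTS; the Node statistics and exploration constant c are generic textbook choices, not the paper's specific variants or its cloud-cover model.

```python
import math
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    visits: int = 0
    value: float = 0.0  # cumulative reward accumulated from rollouts
    children: List["Node"] = field(default_factory=list)

def uct_select(node: Node, c: float = 1.4) -> Node:
    # UCB1: exploit the empirical mean reward, plus an exploration bonus that
    # favors rarely-visited children; unvisited children are always tried first.
    def score(ch: Node) -> float:
        if ch.visits == 0:
            return float("inf")
        return ch.value / ch.visits + c * math.sqrt(math.log(node.visits) / ch.visits)
    return max(node.children, key=score)
```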
Updated: 2024-05-31 15:50:46
Domains: cs.AI,cs.SY,eess.SY
HQ-DiT: Efficient Diffusion Transformer with FP4 Hybrid Quantization
Diffusion Transformers (DiTs) have recently gained substantial attention in both industrial and academic fields for their superior visual generation capabilities, outperforming traditional diffusion models that use U-Net. However, the enhanced performance of DiTs also comes with high parameter counts and implementation costs, seriously restricting their use on resource-limited devices such as mobile phones. To address these challenges, we introduce Hybrid Floating-point Quantization for DiT (HQ-DiT), an efficient post-training quantization method that utilizes 4-bit floating-point (FP) precision for both weights and activations in DiT inference. Compared to fixed-point quantization (e.g., INT8), FP quantization, complemented by our proposed clipping range selection mechanism, naturally aligns with the data distribution within DiT, resulting in a minimal quantization error. Furthermore, HQ-DiT also implements a universal identity mathematical transform to mitigate the serious quantization error caused by outliers. The experimental results demonstrate that DiT can achieve extremely low-precision quantization (i.e., 4 bits) with negligible impact on performance. Our approach marks the first instance where both weights and activations in DiTs are quantized to just 4 bits, with only a 0.12 increase in sFID on ImageNet.
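As an illustration of 4-bit floating-point quantization, below is a toy fake-quantizer onto the FP4 (E2M1) grid of representable magnitudes (1 sign, 2 exponent, 1 mantissa bit). The max-based scale and the clip_ratio knob only gesture at the paper's clipping-range selection mechanism, which is more involved.

```python
import torch

# Representable magnitudes of a toy FP4 E2M1 format.
FP4_GRID = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def fp4_fake_quantize(x: torch.Tensor, clip_ratio: float = 1.0) -> torch.Tensor:
    # Scale the tensor so its (optionally clipped) max magnitude maps onto the
    # largest FP4 value, then round every entry to the nearest grid point.
    scale = x.abs().max() * clip_ratio / FP4_GRID[-1] + 1e-12
    mag = (x.abs() / scale).clamp(max=FP4_GRID[-1])
    idx = torch.argmin((mag.unsqueeze(-1) - FP4_GRID).abs(), dim=-1)
    return torch.sign(x) * FP4_GRID[idx] * scale
```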
Updated: 2024-05-31 15:48:05
Domains: cs.CV,cs.AI
Perimeter Control with Heterogeneous Metering Rates for Cordon Signals: A Physics-Regularized Multi-Agent Reinforcement Learning Approach
Perimeter Control (PC) strategies have been proposed to address urban road network control in oversaturated situations by regulating the transfer flow of the Protected Network (PN) based on the Macroscopic Fundamental Diagram (MFD). The uniform metering rate for cordon signals in most existing studies overlooks the variance of local traffic states at the intersection level, which may cause severe local traffic congestion and degradation of the network stability. PC strategies with heterogeneous metering rates for cordon signals allow precise control for the perimeter but the complexity of the problem increases exponentially with the scale of the PN. This paper leverages a Multi-Agent Reinforcement Learning (MARL)-based traffic signal control framework to decompose this PC problem, which considers heterogeneous metering rates for cordon signals, into multi-agent cooperation tasks. Each agent controls an individual signal located in the cordon, decreasing the dimension of action space for the controller compared to centralized methods. A physics regularization approach for the MARL framework is proposed to ensure the distributed cordon signal controllers are aware of the global network state by encoding MFD-based knowledge into the action-value functions of the local agents. The proposed PC strategy is operated as a two-stage system, with a feedback PC strategy detecting the overall traffic state within the PN and then distributing local instructions to cordon signals controllers in the MARL framework via the physics regularization. Through numerical tests with different demand patterns in a microscopic traffic environment, the proposed PC strategy shows promising robustness and transferability. It outperforms state-of-the-art feedback PC strategies in increasing network throughput, decreasing distributed delay for gate links, and reducing carbon emissions.
Updated: 2024-05-31 15:44:52
Domains: cs.AI,cs.SY,eess.SY
OR-Bench: An Over-Refusal Benchmark for Large Language Models
Large Language Models (LLMs) require careful safety alignment to prevent malicious outputs. While significant research focuses on mitigating harmful content generation, the enhanced safety often comes with the side effect of over-refusal, where LLMs may reject innocuous prompts and become less helpful. Although the issue of over-refusal has been empirically observed, a systematic measurement is challenging due to the difficulty of crafting prompts that appear harmful but are benign. This study proposes a novel method for automatically generating large-scale sets of ``seemingly toxic prompts'' (benign prompts likely rejected by LLMs). Leveraging this technique, we introduce OR-Bench, the first large-scale over-refusal benchmark. OR-Bench comprises 80,000 seemingly toxic prompts across 10 common rejection categories, a subset of around 1,000 hard prompts that are challenging even for state-of-the-art LLMs, and an additional 600 toxic prompts to prevent indiscriminate responses. We then conduct a comprehensive study to measure the over-refusal of 25 popular LLMs across 8 model families. Our datasets are available at https://huggingface.co/datasets/bench-llm/OR-Bench and the corresponding demo can be found at https://huggingface.co/spaces/bench-llm/or-bench. We hope this benchmark can help the community develop better safety-aligned models.
Updated: 2024-05-31 15:44:33
Domains: cs.CL,cs.AI
Zipper: A Multi-Tower Decoder Architecture for Fusing Modalities
Integrating multiple generative foundation models, especially those trained on different modalities, into something greater than the sum of its parts poses significant challenges. Two key hurdles are the availability of aligned data (concepts that contain similar meaning but is expressed differently in different modalities), and effectively leveraging unimodal representations in cross-domain generative tasks, without compromising their original unimodal capabilities. We propose Zipper, a multi-tower decoder architecture that addresses these concerns by using cross-attention to flexibly compose multimodal generative models from independently pre-trained unimodal decoders. In our experiments fusing speech and text modalities, we show the proposed architecture performs very competitively in scenarios with limited aligned text-speech data. We also showcase the flexibility of our model to selectively maintain unimodal (e.g., text-to-text generation) generation performance by freezing the corresponding modal tower (e.g. text). In cross-modal tasks such as automatic speech recognition (ASR) where the output modality is text, we show that freezing the text backbone results in negligible performance degradation. In cross-modal tasks such as text-to-speech generation (TTS) where the output modality is speech, we show that using a pre-trained speech backbone results in superior performance to the baseline.
Updated: 2024-05-31 15:42:53
Domains: cs.LG,cs.AI,cs.CL,eess.AS
Effective Interplay between Sparsity and Quantization: From Theory to Practice
The increasing size of deep neural networks necessitates effective model compression to improve computational efficiency and reduce their memory footprint. Sparsity and quantization are two prominent compression methods that have individually demonstrated significant reduction in computational and memory footprints while preserving model accuracy. While effective, the interplay between these two methods remains an open question. In this paper, we investigate the interaction between these two methods and assess whether their combination impacts final model accuracy. We mathematically prove that applying sparsity before quantization is the optimal sequence for these operations, minimizing error in computation. Our empirical studies across a wide range of models, including OPT and Llama model families (125M-8B) and ViT corroborate these theoretical findings. In addition, through rigorous analysis, we demonstrate that sparsity and quantization are not orthogonal; their interaction can significantly harm model accuracy, with quantization error playing a dominant role in this degradation. Our findings extend to the efficient deployment of large models in resource-limited compute platforms and reduce serving cost, offering insights into best practices for applying these compression methods to maximize efficacy without compromising accuracy.
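A toy experiment in the spirit of the paper's ordering result, comparing sparsify-then-quantize against quantize-then-sparsify by reconstruction error; magnitude pruning and symmetric int8 fake-quantization are deliberately simple stand-ins for the compression pipelines analyzed in the paper.

```python
import torch

def magnitude_prune(w: torch.Tensor, sparsity: float = 0.5) -> torch.Tensor:
    # Zero out the smallest-magnitude fraction of the weights.
    k = int(w.numel() * sparsity)
    thresh = w.abs().flatten().kthvalue(k).values
    return torch.where(w.abs() > thresh, w, torch.zeros_like(w))

def int8_fake_quantize(w: torch.Tensor) -> torch.Tensor:
    # Symmetric per-tensor int8 quantize-dequantize.
    scale = w.abs().max() / 127 + 1e-12
    return torch.round(w / scale).clamp(-127, 127) * scale

w = torch.randn(256, 256)
sq = int8_fake_quantize(magnitude_prune(w))  # sparsify, then quantize (the order the paper proves optimal)
qs = magnitude_prune(int8_fake_quantize(w))  # quantize, then sparsify
print("sparsify->quantize MSE:", (w - sq).pow(2).mean().item())
print("quantize->sparsify MSE:", (w - qs).pow(2).mean().item())
```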
Updated: 2024-05-31 15:34:13
Domains: cs.LG,cs.AI
Concentration Bounds for Optimized Certainty Equivalent Risk Estimation
We consider the problem of estimating the Optimized Certainty Equivalent (OCE) risk from independent and identically distributed (i.i.d.) samples. For the classic sample average approximation (SAA) of OCE, we derive mean-squared error as well as concentration bounds (assuming sub-Gaussianity). Further, we analyze an efficient stochastic approximation-based OCE estimator, and derive finite sample bounds for the same. To show the applicability of our bounds, we consider a risk-aware bandit problem, with OCE as the risk. For this problem, we derive bound on the probability of mis-identification. Finally, we conduct numerical experiments to validate the theoretical findings.
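For concreteness, here is the SAA estimator for the best-known OCE instance, Conditional Value-at-Risk, via the Rockafellar-Uryasev formula; the paper treats general OCE risks and their concentration bounds, which this sketch does not.

```python
import numpy as np

def saa_cvar(samples: np.ndarray, alpha: float = 0.95) -> float:
    # CVaR_alpha(X) = inf_l { l + E[(X - l)_+] / (1 - alpha) }; on the
    # empirical distribution the infimum is attained at the alpha-quantile.
    lam = np.quantile(samples, alpha)
    return lam + np.maximum(samples - lam, 0.0).mean() / (1.0 - alpha)
```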
Updated: 2024-05-31 15:32:43
Domains: cs.LG,stat.ML
Learning to Model the World with Language
To interact with humans and act in the world, agents need to understand the range of language that people use and relate it to the visual world. While current agents can learn to execute simple language instructions, we aim to build agents that leverage diverse language -- language like "this button turns on the TV" or "I put the bowls away" -- that conveys general knowledge, describes the state of the world, provides interactive feedback, and more. Our key idea is that agents should interpret such diverse language as a signal that helps them predict the future: what they will observe, how the world will behave, and which situations will be rewarded. This perspective unifies language understanding with future prediction as a powerful self-supervised learning objective. We instantiate this in Dynalang, an agent that learns a multimodal world model to predict future text and image representations, and learns to act from imagined model rollouts. While current methods that learn language-conditioned policies degrade in performance with more diverse types of language, we show that Dynalang learns to leverage environment descriptions, game rules, and instructions to excel on tasks ranging from game-playing to navigating photorealistic home scans. Finally, we show that our method enables additional capabilities due to learning a generative model: Dynalang can be pretrained on text-only data, enabling learning from offline datasets, and generate language grounded in an environment.
Updated: 2024-05-31 15:32:02
Domains: cs.CL,cs.AI,cs.LG
GUIDE: Guidance-based Incremental Learning with Diffusion Models
We introduce GUIDE, a novel continual learning approach that directs diffusion models to rehearse samples at risk of being forgotten. Existing generative strategies combat catastrophic forgetting by randomly sampling rehearsal examples from a generative model. Such an approach contradicts buffer-based approaches where sampling strategy plays an important role. We propose to bridge this gap by incorporating classifier guidance into the diffusion process to produce rehearsal examples specifically targeting information forgotten by a continuously trained model. This approach enables the generation of samples from preceding task distributions, which are more likely to be misclassified in the context of recently encountered classes. Our experimental results show that GUIDE significantly reduces catastrophic forgetting, outperforming conventional random sampling approaches and surpassing recent state-of-the-art methods in continual learning with generative replay.
Updated: 2024-05-31 15:31:16
Domains: cs.LG
Learning to Estimate System Specifications in Linear Temporal Logic using Transformers and Mamba
Temporal logic is a framework for representing and reasoning about propositions that evolve over time. It is commonly used for specifying requirements in various domains, including hardware and software systems, as well as robotics. Specification mining or formula generation involves extracting temporal logic formulae from system traces and has numerous applications, such as detecting bugs and improving interpretability. Although there has been a surge of deep learning-based methods for temporal logic satisfiability checking in recent years, the specification mining literature has been lagging behind in adopting deep learning methods despite their many advantages, such as scalability. In this paper, we introduce autoregressive models that can generate linear temporal logic formulae from traces, towards addressing the specification mining problem. We propose multiple architectures for this task: transformer encoder-decoder, decoder-only transformer, and Mamba, which is an emerging alternative to transformer models. Additionally, we devise a metric for quantifying the distinctiveness of the generated formulae and a straightforward algorithm to enforce the syntax constraints. Our experiments show that the proposed architectures yield promising results, generating correct and distinct formulae at a fraction of the compute cost needed for the combinatorial baseline.
Updated: 2024-05-31 15:21:53
Domains: cs.CL,cs.LG,cs.LO
Fast yet Safe: Early-Exiting with Risk Control
Scaling machine learning models significantly improves their performance. However, such gains come at the cost of inference being slow and resource-intensive. Early-exit neural networks (EENNs) offer a promising solution: they accelerate inference by allowing intermediate layers to exit and produce a prediction early. Yet a fundamental issue with EENNs is how to determine when to exit without severely degrading performance. In other words, when is it 'safe' for an EENN to go 'fast'? To address this issue, we investigate how to adapt frameworks of risk control to EENNs. Risk control offers a distribution-free, post-hoc solution that tunes the EENN's exiting mechanism so that exits only occur when the output is of sufficient quality. We empirically validate our insights on a range of vision and language tasks, demonstrating that risk control can produce substantial computational savings, all the while preserving user-specified performance goals.
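A minimal sketch of post-hoc threshold calibration in the spirit of risk control: on a held-out set, choose the most permissive exit-confidence threshold whose empirical risk stays below the user's target. Proper risk-control procedures replace the plain empirical mean with distribution-free upper bounds; that correction is omitted here.

```python
import numpy as np

def calibrate_exit_threshold(conf: np.ndarray, loss: np.ndarray, target: float) -> float:
    """conf[i]: exit confidence of the EENN on calibration sample i;
    loss[i]: loss incurred if the network exits early on sample i."""
    for t in np.linspace(0.0, 1.0, 101):  # from most to least permissive
        exited = conf >= t
        if not exited.any():              # nothing exits early: trivially safe
            return t
        if loss[exited].mean() <= target: # empirical risk meets the target
            return t
    return 1.0                            # fall back to never exiting early
```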
Updated: 2024-05-31 15:21:44
Domains: cs.LG,cs.AI,cs.CV,stat.ML
RASE: Efficient Privacy-preserving Data Aggregation against Disclosure Attacks for IoTs
The growing public awareness of personal privacy raises the following quandary: what is the new paradigm for collecting and protecting the data produced by an ever-increasing number of sensor devices? Most previous studies on the co-design of data aggregation and privacy preservation assume that a trusted fusion center adheres to privacy regimes. Very recent work has taken steps towards relaxing this assumption by allowing data contributors to locally perturb their own data. Although these solutions withhold some data content to mitigate privacy risks, they have been shown to offer insufficient protection against disclosure attacks. Aiming at providing a more rigorous data safeguard for the Internet of Things (IoT), this paper initiates the study of privacy-preserving data aggregation. We propose a novel paradigm (called RASE), which can be generalized into a 3-step sequential procedure: noise addition, followed by random permutation, and then parameter estimation. Specifically, we design a differentially private randomizer, which carefully guides data contributors to obfuscate the truth. Then, a shuffler is employed to receive the noisy data from all data contributors. After that, it breaks the correct linkage between senders and receivers by applying a random permutation. The estimation phase involves using inaccurate data to calculate an approximate aggregate value. Extensive simulations are provided to explore the privacy-utility landscape of RASE.
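A minimal sketch of the three-step paradigm on scalar reports: local Laplace noise (the differentially private randomizer), a random permutation (the shuffler), and mean estimation. RASE's actual randomizer and estimator are considerably more elaborate than this shuffle-model toy.

```python
import numpy as np

rng = np.random.default_rng(0)

def rase_round(values: np.ndarray, epsilon: float = 1.0, sensitivity: float = 1.0) -> float:
    # Step 1: each contributor adds Laplace noise locally (differential privacy).
    noisy = values + rng.laplace(0.0, sensitivity / epsilon, size=len(values))
    # Step 2: the shuffler randomly permutes reports, breaking sender linkage.
    shuffled = rng.permutation(noisy)
    # Step 3: the aggregator estimates the mean from the noisy, unlinked reports.
    return shuffled.mean()
```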
Updated: 2024-05-31 15:21:38
Domains: cs.CR
Predicting ptychography probe positions using single-shot phase retrieval neural network
Ptychography is a powerful imaging technique that is used in a variety of fields, including materials science, biology, and nanotechnology. However, the accuracy of the reconstructed ptychography image is highly dependent on the accuracy of the recorded probe positions, which often contain errors. These errors are typically corrected jointly with phase retrieval through numerical optimization approaches. When the error accumulates along the scan path or when the error magnitude is large, these approaches may not converge to a satisfactory result. We propose a fundamentally new approach to ptychography probe position prediction for data with large position errors, where a neural network is used to perform single-shot phase retrieval on individual diffraction patterns, yielding the object image at each scan point. The pairwise offsets among these images are then found using a robust image registration method, and the results are combined to yield the complete scan path by constructing and solving a linear equation. We show that our method can achieve good position prediction accuracy for data with large and accumulating errors on the order of $10^2$ pixels, a magnitude that often makes optimization-based algorithms fail to converge. For ptychography instruments without sophisticated position control equipment such as interferometers, our method is of significant practical potential.
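A sketch of the final combination step described above: recovering absolute scan positions (one axis at a time) from pairwise image offsets via linear least squares, with one position anchored to remove the global translation ambiguity; the registration step that produces the offsets is assumed.

```python
import numpy as np

def positions_from_offsets(pairs, offsets, n, anchor=0):
    """pairs: list of (i, j) index pairs; offsets[k] ~ p[j] - p[i] (one axis).
    Solves the overdetermined system in least squares with p[anchor] = 0."""
    rows, rhs = [], []
    for (i, j), d in zip(pairs, offsets):
        row = np.zeros(n)
        row[i], row[j] = -1.0, 1.0
        rows.append(row)
        rhs.append(d)
    gauge = np.zeros(n)
    gauge[anchor] = 1.0  # fix the global translation
    rows.append(gauge)
    rhs.append(0.0)
    A, b = np.vstack(rows), np.asarray(rhs)
    return np.linalg.lstsq(A, b, rcond=None)[0]
```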
Updated: 2024-05-31 15:21:06
Domains: physics.app-ph,cs.AI,cs.CV,physics.data-an,94A08,I.4.0
NoticIA: A Clickbait Article Summarization Dataset in Spanish
We present NoticIA, a dataset consisting of 850 Spanish news articles featuring prominent clickbait headlines, each paired with high-quality, single-sentence generative summarizations written by humans. This task demands advanced text understanding and summarization abilities, challenging the models' capacity to infer and connect diverse pieces of information to meet the user's informational needs generated by the clickbait headline. We evaluate the Spanish text comprehension capabilities of a wide range of state-of-the-art large language models. Additionally, we use the dataset to train ClickbaitFighter, a task-specific model that achieves near-human performance in this task.
Updated: 2024-05-31 15:19:18
Domains: cs.CL,cs.AI
Enhancing Vision Models for Text-Heavy Content Understanding and Interaction
Interacting with and understanding text-heavy visual content containing multiple images is a major challenge for traditional vision models. This paper focuses on enhancing vision models' ability to comprehend and learn from images that carry large amounts of textual information, such as pages from textbooks and research papers that mix graphs and tables with different types of axes and scales. The approach involves dataset preprocessing, fine-tuning on instruction-oriented data, and evaluation. We also built a visual chat application that integrates CLIP for image encoding with a model from the Massive Text Embedding Benchmark, designed to handle both textual and visual inputs. An accuracy of 96.71% was obtained. The aim of the project is to advance vision models' capabilities in understanding complex, interconnected visual-textual data, contributing to multimodal AI.
Updated: 2024-05-31 15:17:47
Domains: cs.CV,cs.AI,cs.CL
VENI, VINDy, VICI: a variational reduced-order modeling framework with uncertainty quantification
The simulation of many complex phenomena in engineering and science requires solving expensive, high-dimensional systems of partial differential equations (PDEs). To circumvent this, reduced-order models (ROMs) have been developed to speed up computations. However, when governing equations are unknown or partially known, typically ROMs lack interpretability and reliability of the predicted solutions. In this work we present a data-driven, non-intrusive framework for building ROMs where the latent variables and dynamics are identified in an interpretable manner and uncertainty is quantified. Starting from a limited amount of high-dimensional, noisy data the proposed framework constructs an efficient ROM by leveraging variational autoencoders for dimensionality reduction along with a newly introduced, variational version of sparse identification of nonlinear dynamics (SINDy), which we refer to as Variational Identification of Nonlinear Dynamics (VINDy). In detail, the method consists of Variational Encoding of Noisy Inputs (VENI) to identify the distribution of reduced coordinates. Simultaneously, we learn the distribution of the coefficients of a pre-determined set of candidate functions by VINDy. Once trained offline, the identified model can be queried for new parameter instances and new initial conditions to compute the corresponding full-time solutions. The probabilistic setup enables uncertainty quantification as the online testing consists of Variational Inference naturally providing Certainty Intervals (VICI). In this work we showcase the effectiveness of the newly proposed VINDy method in identifying interpretable and accurate dynamical system for the R\"ossler system with different noise intensities and sources. Then the performance of the overall method - named VENI, VINDy, VICI - is tested on PDE benchmarks including structural mechanics and fluid dynamics.
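For orientation, a compact implementation of classic deterministic SINDy via sequential thresholded least squares, the method that VINDy makes variational; the autoencoder, the probabilistic treatment of the coefficients, and the uncertainty quantification are not shown.

```python
import numpy as np

def sindy_stlsq(X, Xdot, threshold=0.1, iters=10):
    # Candidate library Theta(X): constant, linear, and quadratic terms
    # (a deliberately small choice; real libraries are problem-dependent).
    n, d = X.shape
    cols = [np.ones((n, 1)), X]
    cols += [(X[:, i] * X[:, j])[:, None] for i in range(d) for j in range(i, d)]
    Theta = np.hstack(cols)
    Xi = np.linalg.lstsq(Theta, Xdot, rcond=None)[0]
    for _ in range(iters):  # sequential thresholded least squares
        Xi[np.abs(Xi) < threshold] = 0.0
        for k in range(d):
            active = np.abs(Xi[:, k]) >= threshold
            if active.any():  # refit only the surviving library terms
                Xi[active, k] = np.linalg.lstsq(Theta[:, active], Xdot[:, k], rcond=None)[0]
    return Xi  # sparse coefficients: Xdot ~ Theta(X) @ Xi
```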
Updated: 2024-05-31 15:16:48
Domains: cs.LG,cs.CE,math.DS
Position: Stop Making Unscientific AGI Performance Claims
Developments in the field of Artificial Intelligence (AI), and particularly large language models (LLMs), have created a 'perfect storm' for observing 'sparks' of Artificial General Intelligence (AGI) that are spurious. Like simpler models, LLMs distill meaningful representations in their latent embeddings that have been shown to correlate with external variables. Nonetheless, the correlation of such representations has often been linked to human-like intelligence in the latter but not the former. We probe models of varying complexity including random projections, matrix decompositions, deep autoencoders and transformers: all of them successfully distill information that can be used to predict latent or external variables and yet none of them have previously been linked to AGI. We argue and empirically demonstrate that the finding of meaningful patterns in latent spaces of models cannot be seen as evidence in favor of AGI. Additionally, we review literature from the social sciences that shows that humans are prone to seek such patterns and anthropomorphize. We conclude that both the methodological setup and common public image of AI are ideal for the misinterpretation that correlations between model representations and some variables of interest are 'caused' by the model's understanding of underlying 'ground truth' relationships. We, therefore, call for the academic community to exercise extra caution, and to be keenly aware of principles of academic integrity, in interpreting and communicating about AI research outcomes.
Updated: 2024-05-31 15:16:21
Domains: cs.AI,cs.CL
Preemptive Answer "Attacks" on Chain-of-Thought Reasoning
Large language models (LLMs) showcase impressive reasoning capabilities when coupled with Chain-of-Thought (CoT) prompting. However, the robustness of this approach warrants further investigation. In this paper, we introduce a novel scenario termed preemptive answers, where the LLM obtains an answer before engaging in reasoning. This situation can arise inadvertently or induced by malicious users by prompt injection attacks. Experiments reveal that preemptive answers significantly impair the model's reasoning capability across various CoT methods and a broad spectrum of datasets. To bolster the robustness of reasoning, we propose two measures aimed at mitigating this issue to some extent.
Updated: 2024-05-31 15:15:04
Domains: cs.CL,cs.AI,cs.CR
Bayesian Program Learning by Decompiling Amortized Knowledge
DreamCoder is an inductive program synthesis system that, whilst solving problems, learns to simplify search in an iterative wake-sleep procedure. The cost of search is amortized by training a neural search policy, reducing search breadth and effectively "compiling" useful information to compose program solutions across tasks. Additionally, a library of program components is learnt to compress and express discovered solutions in fewer components, reducing search depth. We present a novel approach for library learning that directly leverages the neural search policy, effectively "decompiling" its amortized knowledge to extract relevant program components. This provides stronger amortized inference: the amortized knowledge learnt to reduce search breadth is now also used to reduce search depth. We integrate our approach with DreamCoder and demonstrate faster domain proficiency with improved generalization on a range of domains, particularly when fewer example solutions are available.
Updated: 2024-05-31 15:14:58
Domains: cs.AI,cs.LG,cs.SE
Unity by Diversity: Improved Representation Learning in Multimodal VAEs
Variational Autoencoders for multimodal data hold promise for many tasks in data analysis, such as representation learning, conditional generation, and imputation. Current architectures either share the encoder output, decoder input, or both across modalities to learn a shared representation. Such architectures impose hard constraints on the model. In this work, we show that a better latent representation can be obtained by replacing these hard constraints with a soft constraint. We propose a new mixture-of-experts prior, softly guiding each modality's latent representation towards a shared aggregate posterior. This approach results in a superior latent representation and allows each encoding to preserve information better from its uncompressed original features. In extensive experiments on multiple benchmark datasets and two challenging real-world datasets, we show improved learned latent representations and imputation of missing data modalities compared to existing methods.
Updated: 2024-05-31 15:14:43
Domains: cs.LG,cs.AI
The Earth is Flat because...: Investigating LLMs' Belief towards Misinformation via Persuasive Conversation
Large language models (LLMs) encapsulate vast amounts of knowledge but still remain vulnerable to external misinformation. Existing research mainly studied this susceptibility behavior in a single-turn setting. However, belief can change during a multi-turn conversation, especially a persuasive one. Therefore, in this study, we delve into LLMs' susceptibility to persuasive conversations, particularly on factual questions that they can answer correctly. We first curate the Farm (i.e., Fact to Misinform) dataset, which contains factual questions paired with systematically generated persuasive misinformation. Then, we develop a testing framework to track LLMs' belief changes in a persuasive dialogue. Through extensive experiments, we find that LLMs' correct beliefs on factual knowledge can be easily manipulated by various persuasive strategies.
Updated: 2024-05-31 15:13:33
Domains: cs.CL,cs.AI,cs.CR,cs.CY
From CNNs to Shift-Invariant Twin Models Based on Complex Wavelets
We propose a novel method to increase shift invariance and prediction accuracy in convolutional neural networks. Specifically, we replace the first-layer combination "real-valued convolutions + max pooling" (RMax) by "complex-valued convolutions + modulus" (CMod), which is stable to translations, or shifts. To justify our approach, we claim that CMod and RMax produce comparable outputs when the convolution kernel is band-pass and oriented (Gabor-like filter). In this context, CMod can therefore be considered as a stable alternative to RMax. To enforce this property, we constrain the convolution kernels to adopt such a Gabor-like structure. The corresponding architecture is called mathematical twin, because it employs a well-defined mathematical operator to mimic the behavior of the original, freely-trained model. Our approach achieves superior accuracy on ImageNet and CIFAR-10 classification tasks, compared to prior methods based on low-pass filtering. Arguably, our approach's emphasis on retaining high-frequency details contributes to a better balance between shift invariance and information preservation, resulting in improved performance. Furthermore, it has a lower computational cost and memory footprint than concurrent work, making it a promising solution for practical implementation.
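A minimal PyTorch sketch of the CMod building block: a pair of real convolutions acts as the real and imaginary parts of a complex filter, and the pointwise modulus replaces max pooling. In the paper the kernel pairs are constrained to be Gabor-like; here they are left as free parameters for brevity.

```python
import torch
import torch.nn as nn

class CMod(nn.Module):
    # Complex-valued convolution followed by the pointwise modulus, a
    # shift-stable alternative to "real conv + max pooling" when the
    # kernels are band-pass and oriented (Gabor-like).
    def __init__(self, in_ch: int, out_ch: int, k: int = 7, stride: int = 4):
        super().__init__()
        self.real = nn.Conv2d(in_ch, out_ch, k, stride=stride, padding=k // 2, bias=False)
        self.imag = nn.Conv2d(in_ch, out_ch, k, stride=stride, padding=k // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # |real * x + i (imag * x)|, computed channel-wise.
        return torch.sqrt(self.real(x) ** 2 + self.imag(x) ** 2 + 1e-12)
```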
Updated: 2024-05-31 15:08:21
Domains: cs.CV,cs.AI,eess.IV,stat.ML
Fuzzychain: An Equitable Consensus Mechanism for Blockchain Networks
Blockchain technology has become a trusted method for establishing secure and transparent transactions through a distributed, encrypted network. The operation of blockchain is governed by consensus algorithms, among which Proof of Stake (PoS) is popular yet has its drawbacks, notably the potential for centralising power in nodes with larger stakes or higher rewards. Fuzzychain, our proposed solution, introduces the use of fuzzy sets to define stake semantics, promoting decentralised and distributed processing control. This system selects validators based on their degree of membership to the stake fuzzy sets rather than just the size of their stakes. As a pioneer proposal in applying fuzzy sets to blockchain, Fuzzychain aims to rectify PoS's limitations. Our results indicate that Fuzzychain not only matches PoS in functionality but also ensures a fairer distribution of stakes among validators, leading to more inclusive validator selection and a better-distributed network.
Updated: 2024-05-31 15:07:14
Domains: cs.CR,cs.ET,cs.LO
Enhancing Deep Traffic Forecasting Models with Dynamic Regression
Deep learning models for traffic forecasting often assume the residual is independent and isotropic across time and space. This assumption simplifies loss functions such as mean absolute error, but real-world residual processes often exhibit significant autocorrelation and structured spatiotemporal correlation. This paper introduces a dynamic regression (DR) framework to enhance existing spatiotemporal traffic forecasting models by incorporating structured learning for the residual process. We assume the residual of the base model (i.e., a well-developed traffic forecasting model) follows a matrix-variate seasonal autoregressive (AR) model, which is seamlessly integrated into the training process through the redesign of the loss function. Importantly, the parameters of the DR framework are jointly optimized alongside the base model. We evaluate the effectiveness of the proposed framework on state-of-the-art (SOTA) deep traffic forecasting models using both speed and flow datasets, demonstrating improved performance and providing interpretable AR coefficients and spatiotemporal covariance matrices.
Updated: 2024-05-31 15:05:40
标题: 用动态回归增强深度交通预测模型
摘要: 交通预测的深度学习模型通常假设残差在时间和空间上是独立和各向同性的。这种假设简化了损失函数,如平均绝对误差,但现实世界中的残差过程通常表现出显著的自相关性和结构化的时空相关性。本文引入了一个动态回归(DR)框架,通过结构化学习残差过程来增强现有的时空交通预测模型。我们假设基础模型(即一个成熟的交通预测模型)的残差遵循一个矩阵变量季节性自回归(AR)模型,并通过重新设计损失函数将其无缝集成到训练过程中。重要的是,DR框架的参数与基础模型一起进行联合优化。我们使用速度和流量数据集对提出的框架在最先进的深度交通预测模型上进行了评估,展示了改进的性能,并提供可解释的AR系数和时空协方差矩阵。
更新时间: 2024-05-31 15:05:40
领域: cs.LG,stat.ML
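A minimal sketch of the loss redesign described above, assuming a scalar seasonal AR(1) on the residuals with a learnable coefficient (the paper uses a matrix-variate seasonal AR model jointly over space and time): the loss penalizes the whitened residual instead of the raw one, and the AR parameter is optimized jointly with the base model.

```python
import torch

def dr_loss(y_pred, y_true, rho, season=12):
    """Residual eta_t = r_t - rho * r_{t-season} should be white noise,
    so we penalise eta instead of the raw residual r."""
    r = y_true - y_pred                       # (batch, T, n_sensors)
    eta = r[:, season:] - rho * r[:, :-season]
    return eta.abs().mean()

# rho would be optimised jointly with the base forecasting model:
rho = torch.nn.Parameter(torch.tensor(0.5))
y_true = torch.randn(8, 36, 10)
y_pred = y_true + 0.1 * torch.randn_like(y_true)  # stand-in for a base model
loss = dr_loss(y_pred, y_true, rho)
loss.backward()
print(float(loss), float(rho.grad))
```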
MALT: Multi-scale Action Learning Transformer for Online Action Detection
Online action detection (OAD) aims to identify ongoing actions from streaming video in real-time, without access to future frames. Since these actions manifest at varying scales of granularity, ranging from coarse to fine, projecting an entire set of action frames to a single latent encoding may result in a lack of local information, necessitating the acquisition of action features across multiple scales. In this paper, we propose a multi-scale action learning transformer (MALT), which includes a novel recurrent decoder (used for feature fusion) with fewer parameters that can be trained more efficiently. A hierarchical encoder with multiple encoding branches is further proposed to capture multi-scale action features. The output from the preceding branch is then incrementally input to the subsequent branch as part of a cross-attention calculation. In this way, output features transition from coarse to fine as the branches deepen. We also introduce an explicit frame scoring mechanism employing sparse attention, which filters irrelevant frames more efficiently, without requiring an additional network. The proposed method achieved state-of-the-art performance on two benchmark datasets (THUMOS'14 and TVSeries), outperforming all existing models used for comparison by 0.2% mAP on THUMOS'14 and by 0.1% mcAP on TVSeries.
Updated: 2024-05-31 15:03:35
标题: MALT: 在线动作检测的多尺度动作学习变压器
摘要: 在线动作检测(OAD)旨在实时从流视频中识别正在进行的动作,而无需访问未来的帧。由于这些动作以不同粒度的尺度呈现,从粗糙到精细不等,将整个动作帧集投影到单个潜在编码可能导致局部信息不足,需要跨多个尺度获取动作特征。在本文中,我们提出了一种多尺度动作学习Transformer(MALT),其中包括一种新颖的递归解码器(用于特征融合),具有更少的参数并且可以更有效地训练。进一步提出了一个具有多个编码分支的分层编码器,用于捕获多尺度动作特征。然后,先前分支的输出逐渐输入到后续分支,作为跨注意力计算的一部分。通过这种方式,随着分支的加深,输出特征从粗糙到精细过渡。我们还引入了一种明确的帧评分机制,采用稀疏注意力,可以更有效地过滤不相关的帧,而无需额外的网络。所提出的方法在两个基准数据集(THUMOS'14和TVSeries)上实现了最先进的性能,在THUMOS'14上以0.2%的mAP优势、在TVSeries上以0.1%的mcAP优势超越了所有用于比较的现有模型。
更新时间: 2024-05-31 15:03:35
领域: cs.CV,cs.AI
CARTE: Pretraining and Transfer for Tabular Learning
Pretrained deep-learning models are the go-to solution for images or text. However, for tabular data the standard is still to train tree-based models. Indeed, transfer learning on tables hits the challenge of data integration: finding correspondences in the entries (entity matching), where different words may denote the same entity, and correspondences across columns (schema matching), which may come in different orders, under different names... We propose a neural architecture that does not need such correspondences. As a result, we can pretrain it on background data that has not been matched. The architecture -- CARTE for Context Aware Representation of Table Entries -- uses a graph representation of tabular (or relational) data to process tables with different columns, string embeddings of entries and column names to model an open vocabulary, and a graph-attentional network to contextualize entries with column names and neighboring entries. An extensive benchmark shows that CARTE facilitates learning, outperforming a solid set of baselines including the best tree-based models. CARTE also enables joint learning across tables with unmatched columns, enhancing a small table with bigger ones. CARTE opens the door to large pretrained models for tabular data.
Updated: 2024-05-31 15:03:11
标题: CARTE:表格学习的预训练和迁移
摘要: 预训练的深度学习模型是处理图像或文本的首选解决方案。然而,对于表格数据,目前标准仍然是训练基于树的模型。事实上,对表格的迁移学习面临数据集成的挑战:需要找到条目之间的对应关系(实体匹配,不同词汇可能表示同一实体)以及列之间的对应关系(模式匹配,列可能以不同的顺序、名称等形式出现)...我们提出了一种不需要这类对应关系的神经架构。因此,我们可以在未匹配的背景数据上对其进行预训练。这种架构--CARTE,即表格条目的上下文感知表示--使用图表示来处理具有不同列的表格(或关系)数据,使用条目和列名称的字符串嵌入来建模开放词汇表,并使用图注意力网络将条目与列名和相邻条目进行上下文化。广泛的基准测试表明,CARTE促进了学习,优于包括最佳树模型在内的一组强大基线。CARTE还能够实现跨列不匹配的表格的联合学习,用更大的表格来增强小表格。CARTE为表格数据开启了大型预训练模型的大门。
更新时间: 2024-05-31 15:03:11
领域: cs.LG
Intelligent and Miniaturized Neural Interfaces: An Emerging Era in Neurotechnology
Integrating smart algorithms on neural devices presents significant opportunities for various brain disorders. In this paper, we review the latest advancements in the development of three categories of intelligent neural prostheses featuring embedded signal processing on the implantable or wearable device. These include: 1) Neural interfaces for closed-loop symptom tracking and responsive stimulation; 2) Neural interfaces for emerging network-related conditions, such as psychiatric disorders; and 3) Intelligent BMI SoCs for movement recovery following paralysis.
Updated: 2024-05-31 15:00:36
标题: 智能和微型神经接口:神经技术中的新兴时代
摘要: 在神经设备上整合智能算法为各种脑部疾病提供了重要的机会。本文回顾了三类智能神经假体的最新进展,这些假体具有内置信号处理功能,可以植入或佩戴在身体上。这包括:1)用于闭环症状跟踪和响应性刺激的神经接口;2)用于新兴网络相关疾病(如精神疾病)的神经接口;和3)用于瘫痪后运动康复的智能BMI SoCs。
更新时间: 2024-05-31 15:00:36
领域: eess.SP,cs.AR,cs.HC,cs.LG,q-bio.NC
EvoluNet: Advancing Dynamic Non-IID Transfer Learning on Graphs
Non-IID transfer learning on graphs is crucial in many high-stakes domains. The majority of existing works assume stationary distribution for both source and target domains. However, real-world graphs are intrinsically dynamic, presenting challenges in terms of domain evolution and dynamic discrepancy between source and target domains. To bridge the gap, we shift the problem to the dynamic setting and pose the question: given the label-rich source graphs and the label-scarce target graphs both observed in previous T timestamps, how can we effectively characterize the evolving domain discrepancy and optimize the generalization performance of the target domain at the incoming T+1 timestamp? To answer it, we propose a generalization bound for dynamic non-IID transfer learning on graphs, which implies the generalization performance is dominated by domain evolution and domain discrepancy between source and target graphs. Inspired by the theoretical results, we introduce a novel generic framework named EvoluNet. It leverages a transformer-based temporal encoding module to model temporal information of the evolving domains and then uses a dynamic domain unification module to efficiently learn domain-invariant representations across the source and target domains. Finally, EvoluNet outperforms the state-of-the-art models by up to 12.1%, demonstrating its effectiveness in transferring knowledge from dynamic source graphs to dynamic target graphs.
Updated: 2024-05-31 14:58:20
标题: EvoluNet:推动图上动态非独立同分布转移学习
摘要: 在许多高风险领域中,图上的非IID迁移学习至关重要。现有大多数工作假定源域和目标域均具有平稳分布。然而,现实世界的图是固有动态的,存在领域演化和源域与目标域之间动态差异的挑战。为了弥合差距,我们将问题转移到动态设置,并提出问题:在前T个时间戳中观察到的标签丰富的源图和标签稀缺的目标图,如何有效地表征不断演化的领域差异,并优化目标域在下一个T+1时间戳的泛化性能?为了回答这个问题,我们提出了一个动态非IID图上迁移学习的泛化界限,这表明泛化性能受领域演化和源图与目标图之间的领域差异支配。受到理论结果的启发,我们引入了一个名为EvoluNet的新颖通用框架。它利用基于transformer的时间编码模块来建模不断演化领域的时间信息,然后使用动态领域统一模块有效地学习源和目标领域之间的领域不变表示。最后,EvoluNet在效果上超过了现有模型高达12.1%,证明了其在从动态源图到动态目标图的知识转移中的有效性。
更新时间: 2024-05-31 14:58:20
领域: cs.LG
Sheaf HyperNetworks for Personalized Federated Learning
Graph hypernetworks (GHNs), constructed by combining graph neural networks (GNNs) with hypernetworks (HNs), leverage relational data across various domains such as neural architecture search, molecular property prediction and federated learning. Despite GNNs and HNs being individually successful, we show that GHNs present problems compromising their performance, such as over-smoothing and heterophily. Moreover, we cannot apply GHNs directly to personalized federated learning (PFL) scenarios, where an a priori client relation graph may be absent, private, or inaccessible. To mitigate these limitations in the context of PFL, we propose a novel class of HNs, sheaf hypernetworks (SHNs), which combine cellular sheaf theory with HNs to improve parameter sharing for PFL. We thoroughly evaluate SHNs across diverse PFL tasks, including multi-class classification, traffic and weather forecasting. Additionally, we provide a methodology for constructing client relation graphs in scenarios where such graphs are unavailable. We show that SHNs consistently outperform existing PFL solutions in complex non-IID scenarios. While the baselines' performance fluctuates depending on the task, SHNs show improvements of up to 2.7% in accuracy and reductions of up to 5.3% in mean squared error over the best-performing baseline.
Updated: 2024-05-31 14:55:38
标题: 用于个性化联邦学习的Sheaf超网络
摘要: 图超网络(GHNs)通过将图神经网络(GNNs)与超网络(HNs)结合构建,利用跨不同领域的关系数据,例如神经结构搜索、分子性质预测和联邦学习。尽管GNNs和HNs分别取得了成功,我们发现GHNs存在一些损害其性能的问题,如过度平滑和异配性。此外,我们无法直接将GHNs应用于个性化联邦学习(PFL)场景,其中先验的客户关系图可能不存在、私有或不可访问。为了在PFL环境中缓解这些限制,我们提出了一种新型HNs类别,即束超网络(SHNs),将细胞束理论与HNs结合起来,以改善PFL的参数共享。我们在各种PFL任务中对SHNs进行了彻底评估,包括多类分类、交通和天气预测。此外,我们提供了一种在缺乏此类图的情况下构建客户关系图的方法。我们展示了在复杂的非IID场景中,SHNs始终优于现有的PFL解决方案。虽然基线的性能随任务而波动,但相比表现最佳的基线,SHNs在准确率上最高提升2.7%,在均方误差上最高降低5.3%。
更新时间: 2024-05-31 14:55:38
领域: cs.LG
Paying to Do Better: Games with Payments between Learning Agents
In repeated games, such as auctions, players typically use learning algorithms to choose their actions. The use of such autonomous learning agents has become widespread on online platforms. In this paper, we explore the impact of players incorporating monetary transfers into their agents' algorithms, aiming to incentivize behavior in their favor. Our focus is on understanding when players have incentives to make use of monetary transfers, how these payments affect learning dynamics, and what the implications are for welfare and its distribution among the players. We propose a simple game-theoretic model to capture such scenarios. Our results on general games show that in a broad class of games, players benefit from letting their learning agents make payments to other learners during the game dynamics, and that in many cases, this kind of behavior improves welfare for all players. Our results on first- and second-price auctions show that in equilibria of the ``payment policy game,'' the agents' dynamics can reach strong collusive outcomes with low revenue for the auctioneer. These results highlight a challenge for mechanism design in systems where automated learning agents can benefit from interacting with their peers outside the boundaries of the mechanism.
Updated: 2024-05-31 14:55:11
标题: 付费做得更好:学习智能体之间的支付游戏
摘要: 在重复博弈中,例如拍卖,玩家通常使用学习算法来选择他们的行动。这种自主学习代理的使用在在线平台上变得普遍。在本文中,我们探讨了玩家将货币转移纳入其代理算法中的影响,旨在激励有利于他们的行为。我们的重点是理解玩家何时有动机利用货币转移,这些支付如何影响学习动态,以及对福利及其在玩家之间的分配有何影响。我们提出了一个简单的博弈论模型来捕捉这种情景。我们关于一般博弈的结果表明,在广泛的一类博弈中,玩家可以从让其学习代理在博弈动态过程中向其他学习者付款中获益,并且在许多情况下,这种行为改善了所有玩家的福利。我们关于一价和二价拍卖的结果表明,在“支付策略博弈”的均衡中,代理的动态可以达到强烈的合谋结果,使拍卖方的收入很低。这些结果突显了在自动学习代理可以从机制边界之外与同伴交互中获益的系统中,机制设计所面临的挑战。
更新时间: 2024-05-31 14:55:11
领域: cs.GT,cs.AI,cs.MA,econ.TH,91A05, 91A06, 91A10, 91A20, 91A40, 91A80, 91B26,F.0; I.2; I.2.6; J.4
Flow matching achieves minimax optimal convergence
Flow matching (FM) has gained significant attention as a simulation-free generative model. Unlike diffusion models, which are based on stochastic differential equations, FM employs a simpler approach by solving an ordinary differential equation with an initial condition from a normal distribution, thus streamlining the sample generation process. This paper discusses the convergence properties of FM in terms of the $p$-Wasserstein distance, a measure of distributional discrepancy. We establish that FM can achieve the minimax optimal convergence rate for $1 \leq p \leq 2$, presenting the first theoretical evidence that FM can reach convergence rates comparable to those of diffusion models. Our analysis extends existing frameworks by examining a broader class of mean and variance functions for the vector fields and identifies specific conditions necessary to attain these optimal rates.
Updated: 2024-05-31 14:54:51
标题: Flow matching实现极小极大最优收敛
摘要: 流匹配(FM)作为一种无需模拟的生成模型,已经引起了广泛关注。与基于随机微分方程的扩散模型不同,FM采用了一种更简单的方法:求解一个以正态分布为初始条件的常微分方程,从而简化了样本生成过程。本文讨论了FM在$p$-Wasserstein距离(一种分布差异的度量)下的收敛性质。我们证明了对于$1 \leq p \leq 2$,FM可以达到极小极大最优收敛速率,首次给出了FM可以达到与扩散模型相媲美的收敛速率的理论证据。我们的分析通过研究向量场更广泛的均值和方差函数类别来扩展现有框架,并确定了实现这些最优速率所必需的特定条件。
更新时间: 2024-05-31 14:54:51
领域: cs.LG
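A minimal flow-matching training and sampling sketch, assuming the common linear interpolation path x_t = (1 - t) x_0 + t x_1 with target velocity x_1 - x_0 (the paper analyses a broader class of mean and variance functions): the network regresses the velocity field, and samples are drawn by integrating the learned ODE from a normal initial condition.

```python
import torch

torch.manual_seed(0)
net = torch.nn.Sequential(torch.nn.Linear(3, 64), torch.nn.Tanh(),
                          torch.nn.Linear(64, 2))     # input: (x_t, t)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

def sample_data(n):
    """Toy 2-D target distribution: a Gaussian blob centred at (2, 0)."""
    return torch.randn(n, 2) * 0.3 + torch.tensor([2.0, 0.0])

for step in range(2000):
    x1 = sample_data(256)
    x0 = torch.randn_like(x1)                 # initial condition ~ N(0, I)
    t = torch.rand(256, 1)
    xt = (1 - t) * x0 + t * x1                # point on the interpolation path
    v_target = x1 - x0                        # conditional target velocity
    v_pred = net(torch.cat([xt, t], dim=1))
    loss = ((v_pred - v_target) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# Sampling: Euler-integrate the learned ODE dx/dt = v(x, t) from t=0 to t=1.
x = torch.randn(1000, 2)
with torch.no_grad():
    for k in range(100):
        t = torch.full((1000, 1), k / 100)
        x = x + net(torch.cat([x, t], dim=1)) / 100
print(x.mean(dim=0))   # should drift toward the target mean (~[2, 0])
```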
SelfGNN: Self-Supervised Graph Neural Networks for Sequential Recommendation
Sequential recommendation effectively addresses information overload by modeling users' temporal and sequential interaction patterns. To overcome the limitations of supervision signals, recent approaches have adopted self-supervised learning techniques in recommender systems. However, there are still two critical challenges that remain unsolved. Firstly, existing sequential models primarily focus on long-term modeling of individual interaction sequences, overlooking the valuable short-term collaborative relationships among the behaviors of different users. Secondly, real-world data often contain noise, particularly in users' short-term behaviors, which can arise from temporary intents or misclicks. Such noise negatively impacts the accuracy of both graph and sequence models, further complicating the modeling process. To address these challenges, we propose a novel framework called Self-Supervised Graph Neural Network (SelfGNN) for sequential recommendation. The SelfGNN framework encodes short-term graphs based on time intervals and utilizes Graph Neural Networks (GNNs) to learn short-term collaborative relationships. It captures long-term user and item representations at multiple granularity levels through interval fusion and dynamic behavior modeling. Importantly, our personalized self-augmented learning structure enhances model robustness by mitigating noise in short-term graphs based on long-term user interests and personal stability. Extensive experiments conducted on four real-world datasets demonstrate that SelfGNN outperforms various state-of-the-art baselines. Our model implementation codes are available at https://github.com/HKUDS/SelfGNN.
Updated: 2024-05-31 14:53:12
标题: SelfGNN:用于序列推荐的自监督图神经网络
摘要: 顺序推荐通过建模用户的时间和顺序交互模式,有效地解决了信息过载问题。为了克服监督信号的局限性,最近的方法在推荐系统中采用了自监督学习技术。然而,仍然存在两个尚未解决的关键挑战。首先,现有的顺序模型主要集中在对个体交互序列的长期建模,忽视了不同用户行为之间宝贵的短期协作关系。其次,现实世界中的数据通常包含噪声,特别是在用户的短期行为中,这可能来自临时意图或误点击。这种噪声会对图形和序列模型的准确性产生负面影响,进一步使建模过程变得更加复杂。为了解决这些挑战,我们提出了一种名为自监督图神经网络(SelfGNN)的新框架,用于顺序推荐。SelfGNN框架基于时间间隔对短期图进行编码,并利用图神经网络(GNNs)学习短期协作关系。通过间隔融合和动态行为建模,它在多个粒度级别捕获长期用户和物品表示。重要的是,我们的个性化自增强学习结构通过减轻基于长期用户兴趣和个人稳定性的短期图中的噪声,增强了模型的鲁棒性。在四个真实世界数据集上进行的大量实验表明,SelfGNN优于各种最先进的基线模型。我们的模型实现代码可在https://github.com/HKUDS/SelfGNN 上找到。
更新时间: 2024-05-31 14:53:12
领域: cs.IR,cs.AI
Waveform Design for Over-the-Air Computing
In response to the increasing number of devices anticipated in next-generation networks, a shift toward over-the-air (OTA) computing has been proposed. Leveraging the superposition of multiple access channels, OTA computing enables efficient resource management by supporting simultaneous uncoded transmission in the time and the frequency domain. Thus, to advance the integration of OTA computing, our study presents a theoretical analysis addressing practical issues encountered in current digital communication transceivers, such as time sampling error and intersymbol interference (ISI). To this end, we examine the theoretical mean squared error (MSE) for OTA transmission under time sampling error and ISI, while also exploring methods for minimizing the MSE in the OTA transmission. Utilizing alternating optimization, we also derive optimal power policies for both the devices and the base station. Additionally, we propose a novel deep neural network (DNN)-based approach to design waveforms enhancing OTA transmission performance under time sampling error and ISI. To ensure fair comparison with existing waveforms like the raised cosine (RC) and the better-than-raised-cosine (BTRC), we incorporate a custom loss function integrating energy and bandwidth constraints, along with practical design considerations such as waveform symmetry. Simulation results validate our theoretical analysis and demonstrate performance gains of the designed pulse over RC and BTRC waveforms. To facilitate testing of our results without necessitating the DNN structure recreation, we provide curve fitting parameters for select DNN-based waveforms as well.
Updated: 2024-05-31 14:52:58
标题: 空中计算的波形设计
摘要: 为应对下一代网络中预计不断增加的设备数量,一种向空中(OTA)计算的转变被提出。利用多址信道的叠加特性,OTA计算通过支持在时间和频率域中同时进行未编码传输来实现有效的资源管理。因此,为了推动OTA计算的整合,我们的研究提出了一项理论分析,解决了当前数字通信收发器中遇到的实际问题,如时间采样误差和符号间干扰(ISI)。为此,我们研究了在时间采样误差和ISI下OTA传输的理论均方误差(MSE),同时探讨了减小OTA传输中MSE的方法。利用交替优化,我们还导出了设备和基站的最优功率策略。此外,我们提出了一种基于深度神经网络(DNN)的方法,设计波形以增强在时间采样误差和ISI下的OTA传输性能。为了确保与现有波形(如升余弦(RC)和优于升余弦(BTRC))的公平比较,我们结合能量和带宽约束以及波形对称性等实际设计考虑,整合了自定义损失函数。仿真结果验证了我们的理论分析,并展示了所设计的脉冲波形相对于RC和BTRC波形的性能增益。为了便于在无需重建DNN结构的情况下测试我们的结果,我们还为选定的基于DNN的波形提供了曲线拟合参数。
更新时间: 2024-05-31 14:52:58
领域: cs.IT,cs.DC,cs.LG,eess.SP,math.IT,math.ST,stat.TH
Investigating Calibration and Corruption Robustness of Post-hoc Pruned Perception CNNs: An Image Classification Benchmark Study
Convolutional Neural Networks (CNNs) have achieved state-of-the-art performance in many computer vision tasks. However, high computational and storage demands hinder their deployment into resource-constrained environments, such as embedded devices. Model pruning helps to meet these restrictions by reducing the model size, while maintaining superior performance. Meanwhile, safety-critical applications pose more than just resource and performance constraints. In particular, predictions must not be overly confident, i.e., provide properly calibrated uncertainty estimations (proper uncertainty calibration), and CNNs must be robust against corruptions like naturally occurring input perturbations (natural corruption robustness). This work investigates the important trade-off between uncertainty calibration, natural corruption robustness, and performance for current state-of-research post-hoc CNN pruning techniques in the context of image classification tasks. Our study reveals that post-hoc pruning substantially improves the model's uncertainty calibration, performance, and natural corruption robustness, sparking hope for safe and robust embedded CNNs. Furthermore, uncertainty calibration and natural corruption robustness are not mutually exclusive targets under pruning, as evidenced by the improved safety aspects obtained by post-hoc unstructured pruning with increasing compression.
Updated: 2024-05-31 14:52:49
标题: 研究事后修剪感知卷积神经网络的校准性和鲁棒性: 图像分类基准研究
摘要: 卷积神经网络(CNNs)在许多计算机视觉任务中取得了最先进的性能。然而,高计算和存储需求阻碍了它们在资源受限环境(如嵌入式设备)中的部署。模型修剪通过减小模型大小,同时保持优越性能,有助于满足这些限制。与此同时,安全关键应用不仅仅限于资源和性能约束。特别是,预测不能过于自信,即提供正确校准的不确定性估计(正确的不确定性校准),CNNs必须对自然发生的输入扰动(自然的破坏稳健性)具有鲁棒性。本研究探讨了在图像分类任务的背景下,当前研究后处理CNN修剪技术在不确定性校准、自然破坏稳健性和性能之间的重要权衡。我们的研究表明,后处理修剪显著改善了模型的不确定性校准、性能和自然破坏稳健性,为安全和稳健的嵌入式CNNs带来了希望。此外,在修剪下,不确定性校准和自然破坏稳健性并不是互斥的目标,正如后处理非结构化修剪通过增加压缩获得的改善安全性方面所证明的那样。
更新时间: 2024-05-31 14:52:49
领域: cs.CV,cs.AI
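As a concrete instance of the uncertainty-calibration metric discussed above, here is a minimal sketch of the expected calibration error (a standard 10-bin formulation is assumed; the paper's exact evaluation protocol may differ).

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between mean confidence and accuracy per bin."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap
    return ece

rng = np.random.default_rng(0)
conf = rng.uniform(0.5, 1.0, size=10_000)
correct = rng.uniform(size=10_000) < conf * 0.9   # systematically overconfident
print(f"ECE = {expected_calibration_error(conf, correct):.3f}")
```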
TIC-TAC: A Framework for Improved Covariance Estimation in Deep Heteroscedastic Regression
Deep heteroscedastic regression involves jointly optimizing the mean and covariance of the predicted distribution using the negative log-likelihood. However, recent works show that this may result in sub-optimal convergence due to the challenges associated with covariance estimation. While the literature addresses this by proposing alternate formulations to mitigate the impact of the predicted covariance, we focus on improving the predicted covariance itself. We study two questions: (1) Does the predicted covariance truly capture the randomness of the predicted mean? (2) In the absence of supervision, how can we quantify the accuracy of covariance estimation? We address (1) with a Taylor Induced Covariance (TIC), which captures the randomness of the predicted mean by incorporating its gradient and curvature through the second order Taylor polynomial. Furthermore, we tackle (2) by introducing a Task Agnostic Correlations (TAC) metric, which combines the notion of correlations and absolute error to evaluate the covariance. We evaluate TIC-TAC across multiple experiments spanning synthetic and real-world datasets. Our results show that not only does TIC accurately learn the covariance, it additionally facilitates an improved convergence of the negative log-likelihood. Our code is available at https://github.com/vita-epfl/TIC-TAC
Updated: 2024-05-31 14:51:58
标题: TIC-TAC:深度异方差回归中改进协方差估计的框架
摘要: 深度异方差回归使用负对数似然来联合优化预测分布的均值和协方差。然而,最近的研究表明,由于协方差估计所带来的挑战,这可能导致次优收敛。尽管已有文献通过提出替代公式来减轻预测协方差的影响,我们则专注于改进预测协方差本身。我们研究了两个问题:(1)预测协方差是否真正捕捉到预测均值的随机性?(2)在没有监督的情况下,我们如何量化协方差估计的准确性?我们通过Taylor诱导协方差(TIC)来解决(1),它通过二阶Taylor多项式纳入预测均值的梯度和曲率,从而捕捉其随机性。此外,我们通过引入任务无关相关性(TAC)度量来解决(2),该度量结合了相关性和绝对误差的概念来评估协方差。我们在涵盖合成与真实世界数据集的多个实验中评估了TIC-TAC。我们的结果表明,TIC不仅准确地学习了协方差,还有助于负对数似然更好地收敛。我们的代码可以在https://github.com/vita-epfl/TIC-TAC上找到。
更新时间: 2024-05-31 14:51:58
领域: cs.LG,cs.CV,eess.IV
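For context, a minimal sketch of the objective this line of work optimizes: the heteroscedastic Gaussian negative log-likelihood with a jointly learned mean and (here, diagonal) variance. TIC itself builds the covariance from the mean's gradient and curvature via a second-order Taylor polynomial; this sketch only shows the NLL it plugs into, so the simple variance head is an assumption.

```python
import torch

def hetero_nll(mean, log_var, y):
    """Gaussian negative log-likelihood with a predicted, log-parameterised variance."""
    return 0.5 * (log_var + (y - mean) ** 2 / log_var.exp()).mean()

net = torch.nn.Linear(4, 2)        # outputs [mean, log_var]
opt = torch.optim.Adam(net.parameters(), lr=1e-2)
x = torch.randn(512, 4)
# Synthetic target whose noise level depends on the input (heteroscedastic):
y = x.sum(dim=1, keepdim=True) + torch.randn(512, 1) * (1 + x[:, :1].abs())

for _ in range(200):
    out = net(x)
    loss = hetero_nll(out[:, :1], out[:, 1:], y)
    opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```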
Multivariate Probabilistic Time Series Forecasting with Correlated Errors
Accurately modeling the correlation structure of errors is essential for reliable uncertainty quantification in probabilistic time series forecasting. Recent deep learning models for multivariate time series have developed efficient parameterizations for time-varying contemporaneous covariance, but they often assume temporal independence of errors for simplicity. However, real-world data frequently exhibit significant error autocorrelation and cross-lag correlation due to factors such as missing covariates. In this paper, we present a plug-and-play method that learns the covariance structure of errors over multiple steps for autoregressive models with Gaussian-distributed errors. To achieve scalable inference and computational efficiency, we model the contemporaneous covariance using a low-rank-plus-diagonal parameterization and characterize cross-covariance through a group of independent latent temporal processes. The learned covariance matrix can be used to calibrate predictions based on observed residuals. We evaluate our method on probabilistic models built on RNN and Transformer architectures, and the results confirm the effectiveness of our approach in enhancing predictive accuracy and uncertainty quantification without significantly increasing the parameter size.
Updated: 2024-05-31 14:49:11
标题: 具有相关误差的多元概率时间序列预测
摘要: 准确建模误差相关结构对于可靠的概率时间序列预测中的不确定性量化至关重要。最近针对多元时间序列的深度学习模型已经开发出了对于时变的同时协方差的高效参数化,但它们通常假设误差在时间上是独立的,以简化问题。然而,真实世界的数据经常表现出显著的误差自相关性和交叉滞后相关性,这是由于缺失的协变量等因素导致的。在本文中,我们提出了一种即插即用的方法,用于学习具有高斯分布误差的自回归模型在多个步骤上的协方差结构。为了实现可扩展的推断和计算效率,我们使用低秩加对角线参数化来建模同时协方差,并通过一组独立的潜在时间过程来表征交叉协方差。学习的协方差矩阵可以用来校准基于观察残差的预测。我们在建立在RNN和Transformer架构上的概率模型上评估了我们的方法,结果证实了我们的方法在提高预测准确性和不确定性量化方面的有效性,而不显著增加参数大小。
更新时间: 2024-05-31 14:49:11
领域: stat.ML,cs.LG
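A minimal sketch of the low-rank-plus-diagonal covariance parameterization described above, assuming a rank-2 factor over 5 series and i.i.d. error vectors (the paper additionally models cross-covariance through latent temporal processes).

```python
import torch
from torch.distributions import LowRankMultivariateNormal

n_series, rank = 5, 2
loc = torch.zeros(n_series)
cov_factor = torch.randn(n_series, rank, requires_grad=True)   # low-rank part
raw_diag = torch.zeros(n_series, requires_grad=True)           # diagonal part (log scale)
opt = torch.optim.Adam([cov_factor, raw_diag], lr=0.05)

# Synthetic correlated error vectors to fit:
errors = torch.randn(1024, n_series) @ torch.randn(n_series, n_series) * 0.3
for _ in range(300):
    dist = LowRankMultivariateNormal(loc, cov_factor, raw_diag.exp())
    loss = -dist.log_prob(errors).mean()      # maximum likelihood fit
    opt.zero_grad(); loss.backward(); opt.step()
print(dist.covariance_matrix.detach())        # learned contemporaneous covariance
```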
Accuracy Booster: Enabling 4-bit Fixed-point Arithmetic for DNN Training
The unprecedented demand for computing resources to train DNN models has led to a search for minimal numerical encoding. Recent state-of-the-art (SOTA) proposals advocate for multi-level scaled narrow bitwidth numerical formats. In this paper, we show that single-level scaling is sufficient to maintain training accuracy while maximizing arithmetic density. We identify a previously proposed single-level scaled format for 8-bit training, Hybrid Block Floating Point (HBFP), as the optimal candidate to minimize. We perform a full-scale exploration of the HBFP design space using mathematical tools to study the interplay among various parameters and identify opportunities for even smaller encodings across layers and epochs. Based on our findings, we propose Accuracy Booster, a mixed-mantissa HBFP technique that uses 4-bit mantissas for over 99% of all arithmetic operations in training and 6-bit mantissas only in the last epoch and first/last layers. We show Accuracy Booster enables increasing arithmetic density over all other SOTA formats by at least 2.3x while achieving state-of-the-art accuracies in 4-bit training.
Updated: 2024-05-31 14:47:25
标题: 准确度增强器:实现DNN训练的4位定点算术
摘要: 对深度神经网络(DNN)模型训练计算资源的需求空前增加,促使人们寻找最小化数值编码的方法。最新的最高水平(SOTA)提议倡导采用多级缩放窄位数值格式。本文展示了单级缩放足以保持训练精度,并最大化算术密度。我们确定了之前提出的8位训练的单级缩放格式Hybrid Block Floating Point(HBFP)作为最佳选择。我们使用数学工具对HBFP设计空间进行全面探索,研究各种参数之间的相互作用,并确定跨层和时期更小编码的机会。根据我们的发现,我们提出了Accuracy Booster,一种混合尾数HBFP技术,其中使用4位尾数进行超过99%的所有训练中的算术操作,而仅在最后时期和第一/最后层中使用6位尾数。我们展示了Accuracy Booster能够在4位训练中实现最先进的准确性,同时将算术密度提高至少2.3倍,超过所有其他SOTA格式。
更新时间: 2024-05-31 14:47:25
领域: cs.LG
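A minimal sketch of single-level block floating point quantization as described above (assumptions: per-block shared exponent, signed 4- or 6-bit mantissas, block size 64; HBFP's full recipe, e.g. which operations stay in floating point, is not reproduced here).

```python
import numpy as np

def bfp_quantize(x, mantissa_bits=4, block=64):
    """Quantise a 1-D tensor block-wise: one shared exponent per block,
    narrow signed fixed-point mantissas within the block."""
    out = np.empty_like(x, dtype=np.float32)
    qmax = 2 ** (mantissa_bits - 1) - 1            # e.g. 7 for 4-bit signed
    for i in range(0, len(x), block):
        blk = x[i:i + block]
        exp = np.ceil(np.log2(np.abs(blk).max() + 1e-30))  # shared exponent
        scale = 2.0 ** exp / qmax
        out[i:i + block] = np.clip(np.round(blk / scale), -qmax, qmax) * scale
    return out

rng = np.random.default_rng(0)
w = rng.standard_normal(4096).astype(np.float32)
w4 = bfp_quantize(w, mantissa_bits=4)
w6 = bfp_quantize(w, mantissa_bits=6)
print("4-bit rel. error:", np.linalg.norm(w - w4) / np.linalg.norm(w))
print("6-bit rel. error:", np.linalg.norm(w - w6) / np.linalg.norm(w))
```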
Automatic Channel Pruning for Multi-Head Attention
Despite the strong performance of Transformers, their quadratic computation complexity presents challenges in applying them to vision tasks. Automatic pruning is an effective method for reducing computation complexity without resorting to heuristics. However, directly applying it to multi-head attention is not straightforward due to channel misalignment. In this paper, we propose an automatic channel pruning method that takes the multi-head attention mechanism into account. First, we incorporate channel similarity-based weights into the pruning indicator to preserve more informative channels in each head. Then, we adjust the pruning indicator to enforce removal of channels in equal proportions across all heads, preventing channel misalignment. We also add a reweight module to compensate for information loss resulting from channel removal, and an effective initialization step for the pruning indicator based on the difference of attention between the original structure and each channel. Our proposed method can be applied not only to original attention, but also to linear attention, which is more efficient owing to its linear complexity in the number of tokens. On ImageNet-1K, applying our pruning method to the FLattenTransformer, which includes both attention mechanisms, yields higher accuracy at several MAC budgets than previous state-of-the-art efficient models and pruning methods. Code will be available soon.
Updated: 2024-05-31 14:47:20
标题: 多头注意力模型的自动通道剪枝
摘要: 尽管Transformers表现强劲,但其二次计算复杂度给应用于视觉任务带来了挑战。自动剪枝是无需启发式方法即可降低计算复杂度的有效方法之一。然而,由于通道不对齐问题,直接将其应用于多头注意力并不简单。在本文中,我们提出了一种考虑多头注意力机制的自动通道剪枝方法。首先,我们将基于通道相似性的权重整合到剪枝指标中,以保留每个头部中更多的信息通道。然后,我们调整剪枝指标,以确保在所有头部中等比例地移除通道,防止通道不对齐。我们还添加了一个重新加权模块,用于补偿由于通道移除而导致的信息丢失,以及一个基于原始结构与各通道之间注意力差异的剪枝指标有效初始化步骤。我们提出的方法不仅可以用于原始注意力,还可以用于线性注意力,后者的复杂度与token数量呈线性关系,因而更高效。在ImageNet-1K上,将我们的剪枝方法应用于同时包含这两种注意力机制的FLattenTransformer,在多个MACs预算下,准确率高于先前最先进的高效模型和剪枝方法。代码将很快提供。
更新时间: 2024-05-31 14:47:20
领域: cs.CV,cs.AI,cs.CC
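A minimal sketch of the equal-proportion constraint described above (assumption: one importance score per channel per head; the paper's indicator additionally folds in channel-similarity weights and a reweight module): pruning the same number of channels in every head keeps the per-head Q/K/V dimensions aligned.

```python
import numpy as np

def prune_per_head(scores, keep_ratio=0.5):
    """Keep the top channels *within each head*, so every head loses the
    same number of channels and head dimensions stay aligned."""
    n_heads, head_dim = scores.shape
    keep = int(head_dim * keep_ratio)
    kept = np.argsort(-scores, axis=1)[:, :keep]   # top-k indices per head
    return np.sort(kept, axis=1)

rng = np.random.default_rng(0)
scores = rng.random((8, 64))        # 8 heads, 64 channels per head
kept = prune_per_head(scores, keep_ratio=0.5)
print(kept.shape)                   # (8, 32): equal proportion in every head
```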
ABodyBuilder3: Improved and scalable antibody structure predictions
Accurate prediction of antibody structure is a central task in the design and development of monoclonal antibodies, notably to understand both their developability and their binding properties. In this article, we introduce ABodyBuilder3, an improved and scalable antibody structure prediction model based on ImmuneBuilder. We achieve a new state-of-the-art accuracy in the modelling of CDR loops by leveraging language model embeddings, and show how predicted structures can be further improved through careful relaxation strategies. Finally, we incorporate a predicted Local Distance Difference Test into the model output to allow for a more accurate estimation of uncertainties.
Updated: 2024-05-31 14:45:11
标题: ABodyBuilder3:改进和可扩展的抗体结构预测
摘要: 准确预测抗体结构是设计和开发单克隆抗体的核心任务,特别是为了理解它们的可发展性和结合特性。在本文中,我们介绍了ABodyBuilder3,这是一个基于ImmuneBuilder的改进和可扩展的抗体结构预测模型。通过利用语言模型嵌入,我们在CDR环的建模中实现了新的最先进准确性,并展示了如何通过谨慎的松弛策略进一步改进预测的结构。最后,我们将预测的局部距离差异测试纳入模型输出,以便更准确地估计不确定性。
更新时间: 2024-05-31 14:45:11
领域: q-bio.BM,cs.AI
BackdoorIndicator: Leveraging OOD Data for Proactive Backdoor Detection in Federated Learning
In a federated learning (FL) system, decentralized data owners (clients) could upload their locally trained models to a central server, to jointly train a global model. Malicious clients may plant backdoors into the global model through uploading poisoned local models, causing misclassification to a target class when encountering attacker-defined triggers. Existing backdoor defenses show inconsistent performance under different system and adversarial settings, especially when the malicious updates are made statistically close to the benign ones. In this paper, we first reveal the fact that planting subsequent backdoors with the same target label could significantly help to maintain the accuracy of previously planted backdoors, and then propose a novel proactive backdoor detection mechanism for FL named BackdoorIndicator, which has the server inject indicator tasks into the global model leveraging out-of-distribution (OOD) data. Since any backdoor samples are OOD samples with respect to benign samples, the server, which is completely agnostic of the potential backdoor types and target labels, can accurately detect the presence of backdoors in uploaded models by evaluating the indicator tasks. We perform systematic and extensive empirical studies to demonstrate the consistently superior performance and practicality of BackdoorIndicator over baseline defenses, across a wide range of system and adversarial settings.
Updated: 2024-05-31 14:44:57
标题: BackdoorIndicator:利用OOD数据在联邦学习中实现主动后门检测
摘要: 在联邦学习(FL)系统中,分散的数据所有者(客户端)可以将他们本地训练的模型上传到一个中央服务器,以共同训练一个全局模型。恶意客户可能通过上传有毒的本地模型,在全局模型中植入后门,导致在遇到攻击者定义的触发器时将某些样本误分类为目标类别。现有的后门防御在不同系统和对抗设置下表现不一致,特别是当恶意更新与良性更新在统计上非常接近时。在本文中,我们首先揭示了后续植入相同目标标签的后门可以显著帮助维持先前植入后门的准确性的事实,然后提出了一种针对FL的新型主动后门检测机制,名为BackdoorIndicator,该机制使服务器注入指示任务到全局模型中,利用分布外(OOD)数据,并利用后门样本相对于良性样本而言都是OOD样本的事实,服务器完全不知道潜在的后门类型和目标标签,可以通过评估指示任务准确检测上传模型中的后门是否存在。我们进行了系统性和广泛的实证研究,证明了BackdoorIndicator在各种系统和对抗设置下相对基线防御具有持续优越的性能和实用性。
更新时间: 2024-05-31 14:44:57
领域: cs.CR
Enhancing Efficiency of Safe Reinforcement Learning via Sample Manipulation
Safe reinforcement learning (RL) is crucial for deploying RL agents in real-world applications, as it aims to maximize long-term rewards while satisfying safety constraints. However, safe RL often suffers from sample inefficiency, requiring extensive interactions with the environment to learn a safe policy. We propose Efficient Safe Policy Optimization (ESPO), a novel approach that enhances the efficiency of safe RL through sample manipulation. ESPO employs an optimization framework with three modes: maximizing rewards, minimizing costs, and balancing the trade-off between the two. By dynamically adjusting the sampling process based on the observed conflict between reward and safety gradients, ESPO theoretically guarantees convergence, optimization stability, and improved sample complexity bounds. Experiments on the Safety-MuJoCo and Omnisafe benchmarks demonstrate that ESPO significantly outperforms existing primal-based and primal-dual-based baselines in terms of reward maximization and constraint satisfaction. Moreover, ESPO achieves substantial gains in sample efficiency, requiring 25--29% fewer samples than baselines, and reduces training time by 21--38%.
Updated: 2024-05-31 14:44:05
标题: 通过样本操纵提高安全强化学习的效率
摘要: 安全强化学习(RL)对于在现实世界应用中部署RL代理至关重要,因为它旨在最大化长期奖励同时满足安全约束。然而,安全RL经常受到样本效率低下的困扰,需要与环境进行大量交互才能学习安全策略。我们提出了一种名为Efficient Safe Policy Optimization(ESPO)的新方法,通过样本操作提高安全RL的效率。ESPO采用一个优化框架,具有三种模式:最大化奖励、最小化成本以及在两者之间平衡权衡。通过根据观察到的奖励和安全梯度之间的冲突动态调整采样过程,ESPO在理论上保证了收敛性、优化稳定性和改进的样本复杂度边界。在Safety-MuJoCo和Omnisafe基准上的实验表明,ESPO在奖励最大化和约束满足方面明显优于现有的基于原始和基于原始-对偶的基线。此外,ESPO在样本效率上取得了显著的收益,比基线需要25-29%更少的样本,并将训练时间缩短了21-38%。
更新时间: 2024-05-31 14:44:05
领域: cs.LG
clembench-2024: A Challenging, Dynamic, Complementary, Multilingual Benchmark and Underlying Flexible Framework for LLMs as Multi-Action Agents
It has been established in recent work that Large Language Models (LLMs) can be prompted to "self-play" conversational games that probe certain capabilities (general instruction following, strategic goal orientation, language understanding abilities), where the resulting interactive game play can be automatically scored. In this paper, we take one of the proposed frameworks for setting up such game-play environments, and further test its usefulness as an evaluation instrument, along a number of dimensions: We show that it can easily keep up with new developments while avoiding data contamination, we show that the tests implemented within it are not yet saturated (human performance is substantially higher than that of even the best models), and we show that it lends itself to investigating additional questions, such as the impact of the prompting language on performance. We believe that the approach forms a good basis for making decisions on model choice for building applied interactive systems, and perhaps ultimately setting up a closed-loop development environment of system and simulated evaluator.
Updated: 2024-05-31 14:43:31
标题: clembench-2024:一个具有挑战性、动态、互补、多语言的基准测试和底层灵活框架,用于LLMs作为多动作代理。
摘要: 最近的工作已经证实,大型语言模型(LLM)可以被提示进行“自我对弈”式的对话游戏,以探测特定能力(一般指令遵循、战略目标导向、语言理解能力),由此产生的交互式游戏过程可以被自动评分。在本文中,我们采用了其中一个已提出的框架来搭建这类游戏环境,并沿若干维度进一步检验其作为评估工具的实用性:我们展示它可以轻松跟上新的发展同时避免数据污染,我们展示其中实施的测试尚未饱和(人类表现明显高于最好的模型),我们还展示它有助于探究其他问题,比如提示语言对性能的影响。我们相信这种方法为构建应用交互系统时的模型选择决策提供了良好的基础,也许最终可以建立一个由系统和模拟评估者构成的闭环开发环境。
更新时间: 2024-05-31 14:43:31
领域: cs.CL,cs.AI
CycleFormer : TSP Solver Based on Language Modeling
We propose a new transformer model for the Traveling Salesman Problem (TSP) called CycleFormer. We identified distinctive characteristics that need to be considered when applying a conventional transformer model to TSP and aimed to fully incorporate these elements into the TSP-specific transformer. Unlike the token sets in typical language models, which are limited and static, the token (node) set in TSP is unlimited and dynamic. To exploit this fact to the fullest, we equated the encoder output with the decoder linear layer and directly connected the context vector of the encoder to the decoder encoding. Additionally, we added a positional encoding to the encoder tokens that reflects the two-dimensional nature of TSP, and devised a circular positional encoding for the decoder tokens that considers the cyclic properties of a tour. By incorporating these ideas, CycleFormer outperforms state-of-the-art (SOTA) transformer models for TSP from TSP-50 to TSP-500. Notably, on TSP-500, the optimality gap was reduced by approximately 2.8 times, from 3.09% to 1.10%, compared to the existing SOTA. The code will be made available at https://github.com/Giventicket/CycleFormer.
Updated: 2024-05-31 14:42:52
标题: CycleFormer:基于语言建模的TSP求解器
摘要: 我们提出了一个新的transformer模型CycleFormer用于解决旅行商问题(TSP)。我们确定了将传统transformer模型应用于TSP时需要考虑的独特特征,并旨在将这些元素完全融入到TSP专用的transformer中。与典型语言模型中有限且静态的令牌集合不同,TSP中的令牌(节点)集合是无限且动态的。为了充分利用这一事实,我们将编码器输出与解码器线性层等同,并直接将编码器的上下文向量连接到解码器编码中。此外,我们为编码器令牌添加了反映TSP二维特性的位置编码,并为解码器令牌设计了一个考虑回路循环性质的环形位置编码。通过融合这些思想,在从TSP-50到TSP-500的任务上,CycleFormer优于最先进(SOTA)的TSP transformer模型。值得注意的是,在TSP-500上,与现有的SOTA相比,最优性差距减少了约2.8倍,从3.09%降至1.10%。代码将在https://github.com/Giventicket/CycleFormer上提供。
更新时间: 2024-05-31 14:42:52
领域: cs.LG
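A minimal sketch of a circular positional encoding for the decoder (tour) tokens, assuming sinusoids of the angle 2*pi*pos/n so that positions wrap around the cycle (the paper's exact construction may differ): the first and last tour positions end up as close as adjacent ones.

```python
import numpy as np

def circular_positional_encoding(n_pos, d_model):
    """Sin/cos encoding of the tour angle; position n wraps back to position 0."""
    angles = 2 * np.pi * np.arange(n_pos)[:, None] / n_pos   # (n_pos, 1)
    freqs = np.arange(1, d_model // 2 + 1)[None, :]          # integer harmonics
    return np.concatenate([np.sin(freqs * angles),
                           np.cos(freqs * angles)], axis=1)  # (n_pos, d_model)

pe = circular_positional_encoding(n_pos=50, d_model=128)
d01 = np.linalg.norm(pe[1] - pe[0])       # neighbours along the tour ...
d0_last = np.linalg.norm(pe[-1] - pe[0])  # ... and neighbours across the wrap
print(f"dist(0,1)={d01:.3f}  dist(0,49)={d0_last:.3f}  (equal on a cycle)")
```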
The Common Intuition to Transfer Learning Can Win or Lose: Case Studies for Linear Regression
We study a fundamental transfer learning process from source to target linear regression tasks, including overparameterized settings where there are more learned parameters than data samples. The target task learning is addressed by using its training data together with the parameters previously computed for the source task. We define a transfer learning approach to the target task as a linear regression optimization with a regularization on the distance between the to-be-learned target parameters and the already-learned source parameters. We analytically characterize the generalization performance of our transfer learning approach and demonstrate its ability to resolve the peak in generalization errors in double descent phenomena of the minimum L2-norm solution to linear regression. Moreover, we show that for sufficiently related tasks, the optimally tuned transfer learning approach can outperform the optimally tuned ridge regression method, even when the true parameter vector conforms to an isotropic Gaussian prior distribution. Namely, we demonstrate that transfer learning can beat the minimum mean square error (MMSE) solution of the independent target task. Our results emphasize the ability of transfer learning to extend the solution space to the target task and, by that, to have an improved MMSE solution. We formulate the linear MMSE solution to our transfer learning setting and point out its key differences from the common design philosophy to transfer learning.
Updated: 2024-05-31 14:35:18
标题: 迁移学习的共同直觉可能成功或失败:线性回归案例研究
摘要: 我们研究了从源线性回归任务到目标线性回归任务的基本迁移学习过程,包括学习参数多于数据样本的过参数化设置。目标任务的学习通过使用其训练数据以及先前为源任务计算的参数来完成。我们将针对目标任务的迁移学习方法定义为一种线性回归优化,并对待学习的目标参数与已学习的源参数之间的距离施加正则化。我们从分析的角度刻画了我们的迁移学习方法的泛化性能,并展示了其能够消除线性回归最小L2范数解的双下降现象中的泛化误差峰值。此外,我们表明,对于足够相关的任务,经过最佳调优的迁移学习方法可以胜过经过最佳调优的岭回归方法,即使真实参数向量符合各向同性高斯先验分布。换句话说,我们展示了迁移学习可以击败独立目标任务的最小均方误差(MMSE)解。我们的结果强调了迁移学习能够扩展目标任务的解空间,并由此获得改进的MMSE解。我们给出了我们迁移学习设定下线性MMSE解的形式,并指出它与常见迁移学习设计理念之间的关键差异。
更新时间: 2024-05-31 14:35:18
领域: cs.LG
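The transfer-learning estimator described above has a simple closed form: ridge regression shrunk toward the source parameters instead of toward zero. A minimal sketch (the regularization strength lam and the toy task are assumptions):

```python
import numpy as np

def transfer_ridge(X, y, w_source, lam):
    """argmin_w ||X w - y||^2 + lam * ||w - w_source||^2  (closed form)."""
    d = X.shape[1]
    A = X.T @ X + lam * np.eye(d)
    return np.linalg.solve(A, X.T @ y + lam * w_source)

rng = np.random.default_rng(0)
d, n = 50, 30                                      # overparameterized: d > n
w_true = rng.standard_normal(d)
w_source = w_true + 0.1 * rng.standard_normal(d)   # a related source task
X = rng.standard_normal((n, d))
y = X @ w_true + 0.1 * rng.standard_normal(n)

w_transfer = transfer_ridge(X, y, w_source, lam=1.0)
w_ridge = transfer_ridge(X, y, np.zeros(d), lam=1.0)  # plain ridge baseline
print("transfer error:", np.linalg.norm(w_transfer - w_true))
print("ridge error:   ", np.linalg.norm(w_ridge - w_true))
```

When the source and target parameters are close, shrinking toward w_source typically yields a much smaller estimation error than shrinking toward zero, matching the "win" case of the paper's title.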
SLIM: a Scalable Light-weight Root Cause Analysis for Imbalanced Data in Microservice
A newly deployed service -- one kind of change service -- can lead to new types of minority faults. Existing state-of-the-art methods for fault localization rarely consider the imbalanced fault classification in change service. This paper proposes a novel method that utilizes decision rule sets to deal with highly imbalanced data by optimizing the F1 score subject to cardinality constraints. The proposed method greedily generates the rule with maximal marginal gain and uses an efficient minorize-maximization (MM) approach to select rules iteratively, maximizing a non-monotone submodular lower bound. Compared with existing fault localization algorithms, our algorithm can adapt to the imbalanced fault scenario of change service, and provide interpretable fault causes which are easy to understand and verify. Our method can also be deployed in the online training setting, with only about 15% training overhead compared to the current SOTA methods. Empirical studies showcase that our algorithm outperforms existing fault localization algorithms in both accuracy and model interpretability.
Updated: 2024-05-31 14:32:31
标题: SLIM:微服务中不平衡数据的可扩展轻量级根本原因分析
摘要: 新部署的服务(一类变更服务)可能导致新类型的少数类故障。现有的最先进故障定位方法很少考虑变更服务中不平衡的故障分类。本文提出了一种利用决策规则集处理高度不平衡数据的新方法,通过在基数约束下优化F1分数来实现。所提出的方法贪婪地生成具有最大边际增益的规则,并使用高效的minorize-maximization(MM)方法迭代地选择规则,最大化一个非单调次模下界。与现有的故障定位算法相比,我们的算法可以适应变更服务中的不平衡故障场景,并提供易于理解和验证的可解释故障原因。我们的方法还可以在在线训练设置中部署,与当前最先进方法相比,仅有约15%的训练开销。实证研究表明,我们的算法在准确性和模型可解释性方面优于现有的故障定位算法。
更新时间: 2024-05-31 14:32:31
领域: cs.SE,cs.AI,cs.LG
Don't Buy it! Reassessing the Ad Understanding Abilities of Contrastive Multimodal Models
Image-based advertisements are complex multimodal stimuli that often contain unusual visual elements and figurative language. Previous research on automatic ad understanding has reported impressive zero-shot accuracy of contrastive vision-and-language models (VLMs) on an ad-explanation retrieval task. Here, we examine the original task setup and show that contrastive VLMs can solve it by exploiting grounding heuristics. To control for this confound, we introduce TRADE, a new evaluation test set with adversarial grounded explanations. While these explanations look implausible to humans, we show that they "fool" four different contrastive VLMs. Our findings highlight the need for an improved operationalisation of automatic ad understanding that truly evaluates VLMs' multimodal reasoning abilities. We make our code and TRADE available at https://github.com/dmg-illc/trade .
Updated: 2024-05-31 14:31:46
标题: 不要购买!重新评估对比多模态模型的广告理解能力
摘要: 基于图像的广告是复杂的多模态刺激,通常包含不寻常的视觉元素和比喻语言。先前关于自动广告理解的研究报道了对比视觉-语言模型(VLMs)在广告解释检索任务上令人印象深刻的零样本准确性。在这里,我们检查了原始任务设置,并展示对比VLMs可以通过利用接地(grounding)启发式来解决它。为了控制这种混淆,我们引入了TRADE,一个具有对抗性接地解释的新评估测试集。虽然这些解释对人类来说看起来不合理,但我们显示它们“愚弄”了四种不同的对比VLMs。我们的发现突显了需要改进自动广告理解的操作化,以真正评估VLMs的多模态推理能力。我们在https://github.com/dmg-illc/trade 提供我们的代码和TRADE。
更新时间: 2024-05-31 14:31:46
领域: cs.CL,cs.AI
Multi-hop Question Answering
The task of Question Answering (QA) has attracted significant research interest for long. Its relevance to language understanding and knowledge retrieval tasks, along with the simple setting makes the task of QA crucial for strong AI systems. Recent success on simple QA tasks has shifted the focus to more complex settings. Among these, Multi-Hop QA (MHQA) is one of the most researched tasks over the recent years. In broad terms, MHQA is the task of answering natural language questions that involve extracting and combining multiple pieces of information and doing multiple steps of reasoning. An example of a multi-hop question would be "The Argentine PGA Championship record holder has won how many tournaments worldwide?". Answering the question would need two pieces of information: "Who is the record holder for Argentine PGA Championship tournaments?" and "How many tournaments did [Answer of Sub Q1] win?". The ability to answer multi-hop questions and perform multi step reasoning can significantly improve the utility of NLP systems. Consequently, the field has seen a surge with high quality datasets, models and evaluation strategies. The notion of 'multiple hops' is somewhat abstract which results in a large variety of tasks that require multi-hop reasoning. This leads to different datasets and models that differ significantly from each other and makes the field challenging to generalize and survey. We aim to provide a general and formal definition of the MHQA task, and organize and summarize existing MHQA frameworks. We also outline some best practices for building MHQA datasets. This book provides a systematic and thorough introduction as well as the structuring of the existing attempts to this highly interesting, yet quite challenging task.
Updated: 2024-05-31 14:28:40
标题: 多跳问题回答
摘要: 问题回答(QA)任务长期以来一直吸引着广泛的研究兴趣。与语言理解和知识检索任务的相关性以及简单的设置使得QA任务对于强大的人工智能系统至关重要。最近在简单QA任务上取得的成功使得研究重心转向更复杂的设置。在这些任务中,多跳问题回答(MHQA)是近年来研究最多的任务之一。广义上讲,MHQA是回答涉及提取和组合多个信息片段以及进行多步推理的自然语言问题的任务。一个多跳问题的例子是:“阿根廷PGA锦标赛纪录保持者在全球赢得了多少个锦标赛?”。回答这个问题需要两个信息片段:“谁是阿根廷PGA锦标赛的纪录保持者?”和“[子问题1的答案]赢得了多少个锦标赛?”。能够回答多跳问题并进行多步推理可以显著提高自然语言处理系统的效用。因此,该领域迎来了高质量的数据集、模型和评估策略的激增。‘多跳’的概念有些抽象,导致了需要多跳推理的各种任务的出现。这导致了不同的数据集和模型之间存在显著差异,使得该领域具有挑战性并且难以概括和调查。我们的目标是提供MHQA任务的一般和正式定义,以及组织和总结现有的MHQA框架。我们还概述了构建MHQA数据集的一些最佳实践。本书提供了对这个极具吸引力但相当具有挑战性任务的现有尝试的系统和全面的介绍。
更新时间: 2024-05-31 14:28:40
领域: cs.CL,cs.AI,cs.IR
Gameplay Filters: Safe Robot Walking through Adversarial Imagination
Ensuring the safe operation of legged robots in uncertain, novel environments is crucial to their widespread adoption. Despite recent advances in safety filters that can keep arbitrary task-driven policies from incurring safety failures, existing solutions for legged robot locomotion still rely on simplified dynamics and may fail when the robot is perturbed away from predefined stable gaits. This paper presents a general approach that leverages offline game-theoretic reinforcement learning to synthesize a highly robust safety filter for high-order nonlinear dynamics. This gameplay filter then maintains runtime safety by continually simulating adversarial futures and precluding task-driven actions that would cause it to lose future games (and thereby violate safety). Validated on a 36-dimensional quadruped robot locomotion task, the gameplay safety filter exhibits inherent robustness to the sim-to-real gap without manual tuning or heuristic designs. Physical experiments demonstrate the effectiveness of the gameplay safety filter under perturbations, such as tugging and unmodeled irregular terrains, while simulation studies shed light on how to trade off computation and conservativeness without compromising safety.
Updated: 2024-05-31 14:26:47
标题: 游戏过滤器:通过对抗性想象实现安全的机器人行走
摘要: 确保腿式机器人在不确定的、新颖的环境中安全运行对于它们广泛采用至关重要。尽管最近在安全过滤器方面取得了进展,可以防止任意任务驱动策略导致安全故障,但现有的腿式机器人运动解决方案仍然依赖简化的动力学,在机器人被从预定义的稳定步态中扰动时可能会失败。本文提出了一种通用方法,利用离线博弈理论强化学习来合成高度鲁棒的高阶非线性动力学安全过滤器。这种游戏过滤器通过不断模拟对抗性未来并阻止会导致未来游戏失败(从而违反安全性)的任务驱动行为,从而保持运行时安全性。在一个36维四足机器人运动任务上验证,游戏过滤器表现出天然的对模拟到真实环境差距的鲁棒性,而无需手动调整或启发式设计。物理实验展示了游戏安全过滤器在受到牵拉和未建模的不规则地形等扰动时的有效性,而模拟研究则揭示了如何在不影响安全性的情况下权衡计算和保守性。
更新时间: 2024-05-31 14:26:47
领域: cs.RO,cs.LG
IncomeSCM: From tabular data set to time-series simulator and causal estimation benchmark
Evaluating observational estimators of causal effects demands information that is rarely available: unconfounded interventions and outcomes from the population of interest, created either by randomization or adjustment. As a result, it is customary to fall back on simulators when creating benchmark tasks. Simulators offer great control but are often too simplistic to make challenging tasks, either because they are hand-designed and lack the nuances of real-world data, or because they are fit to observational data without structural constraints. In this work, we propose a general, repeatable strategy for turning observational data into sequential structural causal models and challenging estimation tasks by following two simple principles: 1) fitting real-world data where possible, and 2) creating complexity by composing simple, hand-designed mechanisms. We implement these ideas in a highly configurable software package and apply it to the well-known Adult income data set to construct the \tt IncomeSCM simulator. From this, we devise multiple estimation tasks and sample data sets to compare established estimators of causal effects. The tasks present a suitable challenge, with effect estimates varying greatly in quality between methods, despite similar performance in the modeling of factual outcomes, highlighting the need for dedicated causal estimators and model selection criteria.
Updated: 2024-05-31 14:25:58
标题: IncomeSCM:从表格数据集到时间序列模拟器和因果估计基准
摘要: 评估因果效应的观测估计器需要通常难以获得的信息:来自目标人群的无混杂干预及其结果,它们需通过随机化或调整来产生。因此,通常在创建基准任务时会退而求其次使用模拟器。模拟器具有很好的可控性,但通常过于简单,无法构成具有挑战性的任务:要么是因为它们是手工设计的,缺乏真实数据的细微差别,要么是因为它们在没有结构约束的情况下拟合观测数据。在这项工作中,我们提出了一个通用、可重复的策略,遵循两个简单原则将观测数据转换为序贯结构因果模型并构造具有挑战性的估计任务:1)在可能的情况下拟合真实世界数据,2)通过组合简单的手工设计机制来创造复杂性。我们在一个高度可配置的软件包中实现了这些想法,并将其应用于著名的Adult收入数据集,构建了 IncomeSCM 模拟器。基于它,我们设计了多个估计任务并采样数据集,以比较已有的因果效应估计器。这些任务构成了适当的挑战:尽管各方法在对事实结果的建模上表现相近,但效应估计的质量差异很大,突显了对专门的因果估计器和模型选择标准的需求。
更新时间: 2024-05-31 14:25:58
领域: cs.LG,stat.ME
einspace: Searching for Neural Architectures from Fundamental Operations
Neural architecture search (NAS) finds high performing networks for a given task. Yet the results of NAS are fairly prosaic; they did not e.g. create a shift from convolutional structures to transformers. This is not least because the search spaces in NAS often aren't diverse enough to include such transformations a priori. Instead, for NAS to provide greater potential for fundamental design shifts, we need a novel expressive search space design which is built from more fundamental operations. To this end, we introduce einspace, a search space based on a parameterised probabilistic context-free grammar. Our space is versatile, supporting architectures of various sizes and complexities, while also containing diverse network operations which allow it to model convolutions, attention components and more. It contains many existing competitive architectures, and provides flexibility for discovering new ones. Using this search space, we perform experiments to find novel architectures as well as improvements on existing ones on the diverse Unseen NAS datasets. We show that competitive architectures can be obtained by searching from scratch, and we consistently find large improvements when initialising the search with strong baselines. We believe that this work is an important advancement towards a transformative NAS paradigm where search space expressivity and strategic search initialisation play key roles.
Updated: 2024-05-31 14:25:45
标题: einspace:从基本操作中搜索神经结构
摘要: 神经架构搜索(NAS)可以为特定任务找到高性能网络。然而,NAS的结果相当平凡;例如,并未从卷积结构转变为transformers。这主要是因为NAS中的搜索空间通常不够多样化,无法事先包含这样的转换。相反,为了使NAS能够提供更大的潜力进行基本设计转变,我们需要一种基于更基本操作构建的新颖表达式搜索空间设计。为此,我们引入了einspace,这是一个基于参数化概率上下文无关语法的搜索空间。我们的空间多功能,支持各种大小和复杂度的架构,同时包含多样化的网络操作,使其能够建模卷积、注意力组件等。它包含许多现有竞争性架构,并为发现新架构提供灵活性。利用这一搜索空间,我们进行实验,以在多样化的未见NAS数据集上找到新颖架构和对现有架构的改进。我们展示了可以通过从头开始搜索获得具有竞争力的架构,并且当使用强基线初始化搜索时,我们始终能够获得巨大的改进。我们相信这项工作对于实现一个具有转变性的NAS范式是一个重要进步,其中搜索空间表达能力和战略搜索初始化发挥关键作用。
更新时间: 2024-05-31 14:25:45
领域: cs.LG,cs.AI,cs.CV,stat.ML
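A minimal sketch of sampling architecture strings from a probabilistic context-free grammar (the toy grammar, probabilities, and depth cap are assumptions; einspace's grammar is parameterised and far richer).

```python
import random

GRAMMAR = {
    "NET": [(0.5, ["NET", "->", "NET"]),                 # sequential composition
            (0.2, ["branch(", "NET", ",", "NET", ")"]),  # parallel branches
            (0.3, ["OP"])],                              # a primitive operation
    "OP":  [(0.4, ["conv3x3"]), (0.3, ["attention"]), (0.3, ["mlp"])],
}

def sample(symbol, rng, depth=0, max_depth=6):
    if symbol not in GRAMMAR:
        return symbol                    # terminal token
    rules = GRAMMAR[symbol]
    if depth >= max_depth:               # force termination when deep
        rules = rules[-1:]
    probs, rhss = zip(*rules)
    rhs = rng.choices(rhss, weights=probs)[0]
    return " ".join(sample(s, rng, depth + 1, max_depth) for s in rhs)

rng = random.Random(3)
for _ in range(3):
    print(sample("NET", rng))   # e.g. "conv3x3 -> branch( mlp , attention )"
```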
Solving partial differential equations with sampled neural networks
Approximation of solutions to partial differential equations (PDE) is an important problem in computational science and engineering. Using neural networks as an ansatz for the solution has proven a challenge in terms of training time and approximation accuracy. In this contribution, we discuss how sampling the hidden weights and biases of the ansatz network from data-agnostic and data-dependent probability distributions allows us to progress on both challenges. In most examples, the random sampling schemes outperform iterative, gradient-based optimization of physics-informed neural networks regarding training time and accuracy by several orders of magnitude. For time-dependent PDE, we construct neural basis functions only in the spatial domain and then solve the associated ordinary differential equation with classical methods from scientific computing over a long time horizon. This alleviates one of the greatest challenges for neural PDE solvers because it does not require us to parameterize the solution in time. For second-order elliptic PDE in Barron spaces, we prove the existence of sampled networks with $L^2$ convergence to the solution. We demonstrate our approach on several time-dependent and static PDEs. We also illustrate how sampled networks can effectively solve inverse problems in this setting. Benefits compared to common numerical schemes include spectral convergence and mesh-free construction of basis functions.
Updated: 2024-05-31 14:24:39
标题: 用采样神经网络解决偏微分方程
摘要: 求偏微分方程(PDE)的近似解是计算科学与工程中的一个重要问题。使用神经网络作为解的拟设(ansatz)在训练时间和逼近精度方面已被证明具有挑战性。在本文中,我们讨论如何从数据无关和数据相关的概率分布中对拟设网络的隐藏权重和偏置进行采样,从而在这两个挑战上取得进展。在大多数示例中,随机采样方案在训练时间和准确性方面均比基于迭代梯度优化的物理信息神经网络好几个数量级。对于时变PDE,我们仅在空间域中构建神经基函数,然后使用科学计算中的经典方法在较长时间范围内求解相关的常微分方程。这缓解了神经PDE求解器面临的最大挑战之一,因为它不需要我们对解在时间上进行参数化。对于Barron空间中的二阶椭圆PDE,我们证明了存在以$L^2$收敛到解的采样网络。我们在若干时变和静态PDE上演示了我们的方法,并说明了采样网络如何在此设置中有效地解决逆问题。与常见数值方案相比,其优势包括谱收敛和无网格的基函数构造。
更新时间: 2024-05-31 14:24:39
领域: math.NA,cs.LG,cs.NA
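A minimal sketch of the sampled-network idea described above: solve -u''(x) = f(x) on [0, 1] with zero boundary conditions by sampling the hidden weights and biases once and fitting only the linear output layer by least squares (data-agnostic normal/uniform sampling is assumed here; the paper also studies data-dependent distributions).

```python
import numpy as np

rng = np.random.default_rng(0)
m = 200                                       # hidden neurons, sampled once
w = rng.normal(0, 8, size=m)                  # hidden weights (kept fixed)
b = rng.uniform(-8, 8, size=m)                # hidden biases (kept fixed)

phi  = lambda z: np.tanh(z)
phi2 = lambda z: -2 * np.tanh(z) * (1 - np.tanh(z) ** 2)   # tanh''

f = lambda x: np.pi ** 2 * np.sin(np.pi * x)  # exact solution: u(x) = sin(pi x)

x = np.linspace(0, 1, 400)[:, None]
A_pde = -(w ** 2) * phi2(x * w + b)            # rows: -u'' at collocation points
A_bc  = phi(np.array([[0.0], [1.0]]) * w + b)  # rows: boundary values u(0), u(1)
A = np.vstack([A_pde, 100 * A_bc])             # weight the boundary conditions
rhs = np.concatenate([f(x).ravel(), [0.0, 0.0]])

c, *_ = np.linalg.lstsq(A, rhs, rcond=None)    # only the output layer is fit
u = phi(x * w + b) @ c
print("max error:", np.abs(u - np.sin(np.pi * x.ravel())).max())
```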
Outliers and Calibration Sets have Diminishing Effect on Quantization of Modern LLMs
Post-Training Quantization (PTQ) enhances the efficiency of Large Language Models (LLMs) by enabling faster operation and compatibility with more accessible hardware through reduced memory usage, at the cost of small performance drops. We explore the role of calibration sets in PTQ, specifically their effect on hidden activations in various notable open-source LLMs. Calibration sets are crucial for evaluating activation magnitudes and identifying outliers, which can distort the quantization range and negatively impact performance. Our analysis reveals a marked contrast in quantization effectiveness across models. The older OPT model, which much of the quantization literature is based on, shows significant performance deterioration and high susceptibility to outliers with varying calibration sets. In contrast, newer models like Llama-2 7B, Llama-3 8B, Command-R 35B, and Mistral 7B demonstrate strong robustness, with Mistral 7B showing near-immunity to outliers and stable activations. These findings suggest a shift in PTQ strategies might be needed. As advancements in pre-training methods reduce the relevance of outliers, there is an emerging need to reassess the fundamentals of current quantization literature. The emphasis should pivot towards optimizing inference speed, rather than primarily focusing on outlier preservation, to align with the evolving characteristics of state-of-the-art LLMs.
Updated: 2024-05-31 14:24:33
标题: 异常值和校准集对现代LLMs的量化影响逐渐减小
摘要: 后训练量化(PTQ)通过减少内存使用,提高大型语言模型(LLMs)的效率,使其能够更快地运行并与更易获得的硬件兼容,但会稍微降低性能。我们探讨了校准集在PTQ中的作用,特别是它们对各种知名开源LLMs中隐藏激活的影响。校准集对评估激活幅度和识别异常值至关重要,这些异常值可能扭曲量化范围并对性能产生负面影响。我们的分析揭示了不同模型之间量化效果的明显差异。许多量化文献所依据的较老的OPT模型显示出明显的性能恶化,并在不同校准集下对异常值高度敏感。相比之下,像Llama-2 7B、Llama-3 8B、Command-R 35B和Mistral 7B这样的新模型表现出强大的稳健性,其中Mistral 7B几乎对异常值免疫且激活稳定。这些发现表明,可能需要调整PTQ策略。随着预训练方法的进步降低了异常值的相关性,有必要重新评估当前量化文献的基本原理。重点应转向优化推理速度,而不是主要关注异常值保留,以适应最先进LLMs不断演变的特征。
更新时间: 2024-05-31 14:24:33
领域: cs.LG,cs.AI,cs.CL
Attention-aware Semantic Communications for Collaborative Inference
We propose a communication-efficient collaborative inference framework in the domain of edge inference, focusing on the efficient use of vision transformer (ViT) models. The partitioning strategy of conventional collaborative inference fails to reduce communication cost because of the inherent architecture of ViTs maintaining consistent layer dimensions across the entire transformer encoder. Therefore, instead of employing the partitioning strategy, our framework utilizes a lightweight ViT model on the edge device, with the server deploying a complicated ViT model. To enhance communication efficiency and achieve the classification accuracy of the server model, we propose two strategies: 1) attention-aware patch selection and 2) entropy-aware image transmission. Attention-aware patch selection leverages the attention scores generated by the edge device's transformer encoder to identify and select the image patches critical for classification. This strategy enables the edge device to transmit only the essential patches to the server, significantly improving communication efficiency. Entropy-aware image transmission uses min-entropy as a metric to accurately determine whether to depend on the lightweight model on the edge device or to request the inference from the server model. In our framework, the lightweight ViT model on the edge device acts as a semantic encoder, efficiently identifying and selecting the crucial image information required for the classification task. Our experiments demonstrate that the proposed collaborative inference framework can reduce communication overhead by 68% with only a minimal loss in accuracy compared to the server model on the ImageNet dataset.
Updated: 2024-05-31 14:23:09
标题: 面向协作推理的注意力感知语义通信
摘要: 我们提出了一种通信高效的边缘推理领域的协作推理框架,重点是有效利用视觉transformer(ViT)模型。传统协作推理的分区策略未能降低通信成本,因为ViTs固有的架构在整个transformer编码器中保持一致的层维度。因此,我们的框架不使用分区策略,而是在边缘设备上使用轻量级ViT模型,服务器部署复杂的ViT模型。为了增强通信效率并实现服务器模型的分类准确性,我们提出了两种策略:1)注意力感知的补丁选择和2)熵感知的图像传输。注意力感知的补丁选择利用边缘设备transformer编码器生成的注意力分数来识别和选择对分类至关重要的图像补丁。这种策略使得边缘设备仅传输必要的补丁到服务器,显著提高通信效率。熵感知的图像传输使用最小熵作为度量标准,准确确定是依赖于边缘设备上的轻量级模型还是请求服务器模型的推理。在我们的框架中,边缘设备上的轻量级ViT模型充当语义编码器,高效地识别和选择分类任务所需的关键图像信息。我们的实验表明,与ImageNet数据集上服务器模型相比,所提出的协作推理框架可以将通信开销减少68%,而仅有微小的准确度损失。
更新时间: 2024-05-31 14:23:09
领域: eess.SP,cs.AI,cs.CV,cs.LG
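A minimal sketch of attention-aware patch selection (assumptions: the CLS-token attention of the edge encoder's last layer, averaged over heads, serves as the importance score, and a fixed top-k budget is used; the entropy-aware transmission decision is omitted).

```python
import torch

def select_patches(patch_tokens, attn, k):
    """patch_tokens: (B, N, D); attn: (B, heads, N+1, N+1) from the edge
    encoder's last layer, with index 0 being the CLS token."""
    cls_to_patch = attn[:, :, 0, 1:].mean(dim=1)          # (B, N) importance
    idx = cls_to_patch.topk(k, dim=1).indices             # top-k patch indices
    gathered = torch.gather(
        patch_tokens, 1,
        idx.unsqueeze(-1).expand(-1, -1, patch_tokens.size(-1)))
    return gathered, idx     # transmit only these patches to the server

B, N, D, heads = 2, 196, 384, 6
tokens = torch.randn(B, N, D)
attn = torch.softmax(torch.randn(B, heads, N + 1, N + 1), dim=-1)
sent, idx = select_patches(tokens, attn, k=49)   # send 25% of the patches
print(sent.shape, idx.shape)                     # (2, 49, 384) (2, 49)
```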
Self-Augmented Preference Optimization: Off-Policy Paradigms for Language Model Alignment
Traditional language model alignment methods, such as Direct Preference Optimization (DPO), are limited by their dependence on static, pre-collected paired preference data, which hampers their adaptability and practical applicability. To overcome this limitation, we introduce Self-Augmented Preference Optimization (SAPO), an effective and scalable training paradigm that does not require existing paired data. Building on the self-play concept, which autonomously generates negative responses, we further incorporate an off-policy learning pipeline to enhance data exploration and exploitation. Specifically, we employ an Exponential Moving Average (EMA) model in conjunction with a replay buffer to enable dynamic updates of response segments, effectively integrating real-time feedback with insights from historical data. Our comprehensive evaluations of the LLaMA3-8B and Mistral-7B models across benchmarks, including the Open LLM Leaderboard, IFEval, AlpacaEval 2.0, and MT-Bench, demonstrate that SAPO matches or surpasses established offline contrastive baselines, such as DPO and Odds Ratio Preference Optimization, and outperforms offline self-play methods like SPIN. Our code is available at https://github.com/yinyueqin/SAPO
Updated: 2024-05-31 14:21:04
Categories: cs.CL,cs.LG
Rethinking Open-World Semi-Supervised Learning: Distribution Mismatch and Inductive Inference
Open-world semi-supervised learning (OWSSL) extends conventional semi-supervised learning to open-world scenarios by taking into account novel categories in unlabeled datasets. Despite recent advancements in OWSSL, success often relies on the assumptions that 1) labeled and unlabeled datasets share the same balanced class prior distribution, which does not generally hold in real-world applications, and 2) unlabeled training datasets are utilized for evaluation, where such transductive inference might not adequately address challenges in the wild. In this paper, we aim to generalize OWSSL by addressing these assumptions. Our work suggests that practical OWSSL may require different training settings, evaluation methods, and learning strategies compared to those prevalent in the existing literature.
Updated: 2024-05-31 14:21:00
Categories: cs.CV,cs.LG
Interpretable Knowledge Tracing via Response Influence-based Counterfactual Reasoning
Knowledge tracing (KT) plays a crucial role in computer-aided education and intelligent tutoring systems, aiming to assess students' knowledge proficiency by predicting their future performance on new questions based on their past response records. While existing deep learning knowledge tracing (DLKT) methods have significantly improved prediction accuracy and achieved state-of-the-art results, they often suffer from a lack of interpretability. To address this limitation, current approaches have explored incorporating psychological influences to achieve more explainable predictions, but they tend to overlook the potential influences of historical responses. In fact, understanding how models make predictions based on response influences can enhance the transparency and trustworthiness of the knowledge tracing process, presenting an opportunity for a new paradigm of interpretable KT. However, measuring unobservable response influences is challenging. In this paper, we resort to counterfactual reasoning that intervenes in each response to answer \textit{what if a student had answered a question incorrectly that he/she actually answered correctly, and vice versa}. Based on this, we propose RCKT, a novel response influence-based counterfactual knowledge tracing framework. RCKT generates response influences by comparing prediction outcomes from factual sequences and constructed counterfactual sequences after interventions. Additionally, we introduce maximization and inference techniques to leverage accumulated influences from different past responses, further improving the model's performance and credibility. Extensive experimental results demonstrate that our RCKT method outperforms state-of-the-art knowledge tracing methods on four datasets against six baselines, and provides credible interpretations of response influences.
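The core intervention is simple to state in code. A hedged sketch, assuming a generic kt_model.predict(history, next_question) interface (hypothetical, not RCKT's actual API) that returns the probability of answering the next question correctly:

    def response_influence(kt_model, responses, next_q):
        # responses: list of (question_id, correct_flag) pairs in chronological order.
        # The influence of response t is the change in the prediction for next_q when
        # that response is counterfactually flipped (correct <-> incorrect).
        p_factual = kt_model.predict(responses, next_q)
        influences = []
        for t, (q, c) in enumerate(responses):
            counterfactual = list(responses)
            counterfactual[t] = (q, 1 - c)          # what if the student had answered otherwise?
            p_cf = kt_model.predict(counterfactual, next_q)
            influences.append(p_factual - p_cf)     # positive: this response supported mastery
        return influences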
Updated: 2024-05-31 14:19:03
Categories: cs.CY,cs.AI,cs.LG
Analysis of clinical, dosimetric and radiomic features for predicting local failure after stereotactic radiotherapy of brain metastases in malignant melanoma
Background: The aim of this study was to investigate the role of clinical, dosimetric and pretherapeutic magnetic resonance imaging (MRI) features for lesion-specific outcome prediction of stereotactic radiotherapy (SRT) in patients with brain metastases from malignant melanoma (MBM). Methods: In this multicenter, retrospective analysis, we reviewed 517 MBM from 130 patients treated with SRT (single fraction or hypofractionated). For each gross tumor volume (GTV), 1576 radiomic features (RF) were calculated (788 each for the GTV and for a 3 mm margin around the GTV). Clinical parameters, radiation dose and RF from pretherapeutic contrast-enhanced T1-weighted MRI from different institutions were evaluated with a feature processing and elimination pipeline in a nested cross-validation scheme. Results: Seventy-two of 517 lesions (13.9%) showed a local failure (LF) after SRT. The processing pipeline identified clinical, dosimetric and radiomic features providing information for LF prediction. The most prominent were the gray-level co-occurrence matrix correlation of the margin (hazard ratio (HR): 0.37, confidence interval (CI): 0.23-0.58) and systemic therapy before SRT (HR: 0.55, CI: 0.42-0.70). The majority of RF associated with LF were calculated in the margin around the GTV. Conclusions: Pretherapeutic MRI-based RF associated with lesion-specific outcome after SRT could be identified, despite multicentric data and minor differences in imaging protocols. Image data analysis of the surrounding metastatic environment may provide therapy-relevant information with the potential to further individualize radiotherapy strategies.
Updated: 2024-05-31 14:18:37
Categories: physics.med-ph,cs.LG
Deciphering RNA Secondary Structure Prediction: A Probabilistic K-Rook Matching Perspective
The secondary structure of ribonucleic acid (RNA) is more stable and accessible in the cell than its tertiary structure, making it essential for functional prediction. Although deep learning has shown promising results in this field, current methods suffer from poor generalization and high complexity. In this work, we reformulate RNA secondary structure prediction as a K-Rook problem, thereby simplifying the prediction process into probabilistic matching within a finite solution space. Building on this perspective, we introduce RFold, a simple yet effective method that learns to predict the most matching K-Rook solution from the given sequence. RFold employs a bi-dimensional optimization strategy that decomposes the probabilistic matching problem into row-wise and column-wise components to reduce the matching complexity, simplifying the solving process while guaranteeing the validity of the output. Extensive experiments demonstrate that RFold achieves competitive performance with inference about eight times faster than state-of-the-art approaches. The code and Colab demo are available in \href{http://github.com/A4Bio/RFold}{http://github.com/A4Bio/RFold}.
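A hedged sketch of the row-wise/column-wise decomposition as we read the abstract (not necessarily RFold's exact output head): softmaxes along each axis relax the constraint that every base pairs with at most one partner, and their product is large only where both directions agree:

    import torch

    def bidimensional_pairing(scores):
        # scores: (L, L) raw pairing scores for an RNA sequence of length L
        row = torch.softmax(scores, dim=-1)   # each row distributes mass over column partners
        col = torch.softmax(scores, dim=-2)   # each column distributes mass over row partners
        soft = row * col                      # high only where both directions agree
        soft = 0.5 * (soft + soft.T)          # base pairing is symmetric
        return soft                           # threshold/argmax at inference for a discrete structure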
Updated: 2024-05-31 14:18:31
Categories: q-bio.BM,cs.AI,cs.LG
Distributed agency in second language learning and teaching through generative AI
Generative AI offers significant opportunities for language learning. Tools like ChatGPT can provide informal second language practice through chats in written or voice forms, with the learner specifying through prompts conversational parameters such as proficiency level, language register, and discussion topics. AI can be instructed to give corrective feedback, create practice exercises, or develop an extended study plan. Instructors can use AI to build learning and assessment materials in a variety of media. AI is likely to make immersive technologies more powerful and versatile, moving away from scripted interactions. For both learners and teachers, it is important to understand the limitations of AI systems that arise from their purely statistical model of human language, which limits their ability to deal with nuanced social and cultural aspects of language use. Additionally, there are ethical concerns over how AI systems are created as well as practical constraints on their use, especially for less privileged populations. The power and versatility of AI tools are likely to turn them into valuable and constant companions in many people's lives (akin to smartphones), creating a close connection that goes beyond simple tool use. Ecological theories such as sociomaterialism are helpful in examining the shared agency that develops through close user-AI interactions, as are the perspectives on human-object relations from Indigenous cultures.
Updated: 2024-05-31 14:17:17
Categories: cs.CY,cs.AI
Online Convex Optimisation: The Optimal Switching Regret for all Segmentations Simultaneously
We consider the classic problem of online convex optimisation. Whereas the notion of static regret is relevant for stationary problems, the notion of switching regret is more appropriate for non-stationary problems. A switching regret is defined relative to any segmentation of the trial sequence, and is equal to the sum of the static regrets of each segment. In this paper we show that, perhaps surprisingly, we can achieve the asymptotically optimal switching regret on every possible segmentation simultaneously. Our algorithm for doing so is very efficient: having a space and per-trial time complexity that is logarithmic in the time-horizon. Our algorithm also obtains novel bounds on its dynamic regret: being adaptive to variations in the rate of change of the comparator sequence.
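For concreteness, one standard formalization of switching regret (our notation): given a segmentation $\mathcal{S}$ of the trials $1,\dots,T$ into contiguous segments $I_1,\dots,I_m$, the switching regret is

$$R(\mathcal{S}) \;=\; \sum_{k=1}^{m}\left(\sum_{t\in I_k}\ell_t(x_t)\;-\;\min_{u\in\mathcal{X}}\sum_{t\in I_k}\ell_t(u)\right),$$

so each segment is charged against its own best fixed comparator; the claim above is that the asymptotically optimal value of $R(\mathcal{S})$ is attained for every segmentation $\mathcal{S}$ simultaneously.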
Updated: 2024-05-31 14:16:52
Categories: cs.LG,stat.ML
Pursuing Overall Welfare in Federated Learning through Sequential Decision Making
In traditional federated learning, a single global model cannot perform equally well for all clients. Therefore, the need to achieve client-level fairness in federated systems has been emphasized, which can be realized by modifying the static aggregation scheme for updating the global model to an adaptive one that responds to the local signals of the participating clients. Our work reveals that existing fairness-aware aggregation strategies can be unified into an online convex optimization framework, in other words, a central server's sequential decision-making process. To enhance the decision-making capability, we propose simple and intuitive improvements to suboptimal designs within existing methods, presenting AAggFF. Considering practical requirements, we further specialize our method for the cross-device and the cross-silo settings, respectively. Theoretical analyses guarantee sublinear regret upper bounds for both settings: $\mathcal{O}(\sqrt{T \log{K}})$ for the cross-device setting, and $\mathcal{O}(K \log{T})$ for the cross-silo setting, with $K$ clients and $T$ federation rounds. Extensive experiments demonstrate that the federated system equipped with AAggFF achieves a better degree of client-level fairness than existing methods in both practical settings. Code is available at https://github.com/vaseline555/AAggFF.
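A generic sketch of fairness-aware aggregation as server-side online decision making: a multiplicative-weights-style rule that upweights clients reporting higher local loss. This illustrates the framework the abstract describes, not AAggFF's exact decision rule:

    import numpy as np

    def fair_aggregate(client_updates, client_losses, weights, eta=0.1):
        # client_updates: list of flattened parameter updates; client_losses: local signals
        losses = np.asarray(client_losses)
        weights = weights * np.exp(eta * (losses - losses.mean()))  # favor underserved clients
        weights = weights / weights.sum()
        global_update = sum(w * u for w, u in zip(weights, client_updates))
        return global_update, weights                               # carry weights to next round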
Updated: 2024-05-31 14:15:44
Categories: cs.LG,cs.DC,stat.ML
On the Completeness and Complexity of the Lifted Dynamic Junction Tree Algorithm
For static lifted inference algorithms, completeness, i.e., domain liftability, has been studied extensively. However, so far no domain liftability results exist for temporal lifted inference algorithms. In this paper, we close this gap. More precisely, we contribute the first completeness and complexity analysis for a temporal lifted algorithm, the so-called lifted dynamic junction tree algorithm (LDJT), which is currently the only exact lifted temporal inference algorithm. To handle temporal aspects efficiently, LDJT uses conditional independences to proceed in time, leading to restrictions w.r.t. elimination orders. We show that these restrictions influence the domain liftability results, and that one particular case arising while proceeding in time has to be excluded from FO12. Additionally, for the complexity of LDJT, we prove that the lifted width is smaller than the corresponding treewidth in even more cases than for static inference.
Updated: 2024-05-31 14:15:38
Categories: cs.AI
Optimally Improving Cooperative Learning in a Social Setting
We consider a cooperative learning scenario where a collection of networked agents with individually owned classifiers dynamically update their predictions, for the same classification task, through communication or observations of each other's predictions. Clearly, if highly influential vertices use erroneous classifiers, there will be a negative effect on the accuracy of all the agents in the network. We ask the following question: how can we optimally fix the predictions of a few classifiers so as to maximize the overall accuracy of the entire network? To this end, we consider an aggregate and an egalitarian objective function. We show a polynomial-time algorithm for optimizing the aggregate objective function, and show that optimizing the egalitarian objective function is NP-hard. Furthermore, we develop approximation algorithms for the egalitarian improvement. The performance of all of our algorithms is guaranteed by mathematical analysis and backed by experiments on synthetic and real data.
Updated: 2024-05-31 14:07:33
Categories: cs.DS,cs.LG,cs.MA
BioT5+: Towards Generalized Biological Understanding with IUPAC Integration and Multi-task Tuning
Recent research trends in computational biology have increasingly focused on integrating text and bio-entity modeling, especially in the context of molecules and proteins. However, previous efforts like BioT5 faced challenges in generalizing across diverse tasks and lacked a nuanced understanding of molecular structures, particularly in their textual representations (e.g., IUPAC). This paper introduces BioT5+, an extension of the BioT5 framework, tailored to enhance biological research and drug discovery. BioT5+ incorporates several novel features: integration of IUPAC names for molecular understanding, inclusion of extensive bio-text and molecule data from sources like bioRxiv and PubChem, multi-task instruction tuning for generality across tasks, and a numerical tokenization technique for improved processing of numerical data. These enhancements allow BioT5+ to bridge the gap between molecular representations and their textual descriptions, providing a more holistic understanding of biological entities, and largely improving the grounded reasoning of bio-text and bio-sequences. The model is pre-trained and fine-tuned with a large number of experiments, including \emph{3 types of problems (classification, regression, generation), 15 kinds of tasks, and 21 total benchmark datasets}, demonstrating the remarkable performance and state-of-the-art results in most cases. BioT5+ stands out for its ability to capture intricate relationships in biological data, thereby contributing significantly to bioinformatics and computational biology. Our code is available at \url{https://github.com/QizhiPei/BioT5}.
Updated: 2024-05-31 14:07:00
Categories: q-bio.QM,cs.AI,cs.CE,cs.LG,q-bio.BM
There and Back Again: The AI Alignment Paradox
The field of AI alignment aims to steer AI systems toward human goals, preferences, and ethical principles. Its contributions have been instrumental in improving the output quality, safety, and trustworthiness of today's AI models. This perspective article draws attention to a fundamental challenge inherent in all AI alignment endeavors, which we term the "AI alignment paradox": The better we align AI models with our values, the easier we make it for adversaries to misalign the models. We illustrate the paradox by sketching three concrete example incarnations for the case of language models, each corresponding to a distinct way in which adversaries can exploit the paradox. With AI's increasing real-world impact, it is imperative that a broad community of researchers be aware of the AI alignment paradox and work to find ways to break out of it, in order to ensure the beneficial use of AI for the good of humanity.
Updated: 2024-05-31 14:06:24
Categories: cs.AI,cs.CY
Multilinear Mixture of Experts: Scalable Expert Specialization through Factorization
The Mixture of Experts (MoE) paradigm provides a powerful way to decompose dense layers into smaller, modular computations often more amenable to human interpretation, debugging, and editability. However, a major challenge lies in the computational cost of scaling the number of experts high enough to achieve fine-grained specialization. In this paper, we propose the Multilinear Mixture of Experts ($\mu$MoE) layer to address this, focusing on vision models. $\mu$MoE layers enable scalable expert specialization by performing an implicit computation on prohibitively large weight tensors entirely in factorized form. Consequently, $\mu$MoEs (1) avoid the restrictively high inference-time costs of 'soft' MoEs, yet (2) do not inherit the training issues of the popular 'sparse' MoEs' discrete (non-differentiable) expert routing. We present both qualitative and quantitative evidence that scaling $\mu$MoE layers when fine-tuning foundation models for vision tasks leads to more specialized experts at the class-level, further enabling manual bias correction in CelebA attribute classification. Finally, we show qualitative results demonstrating the expert specialism achieved when pre-training large GPT2 and MLP-Mixer models with parameter-matched $\mu$MoE blocks at every layer, maintaining comparable accuracy. Our code is available at: https://github.com/james-oldfield/muMoE.
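To make "implicit computation in factorized form" concrete, consider one natural instantiation (our illustration; the paper's factorizations may differ): a CP-factorized expert weight tensor $W[e,o,i]=\sum_r A[e,r]\,B[o,r]\,C[i,r]$, for which the gated mixture output never materializes the $E \times O \times I$ tensor:

    import torch

    def mu_moe_forward(x, g, A, B, C):
        # x: (I,) input; g: (E,) expert gates; A: (E, R); B: (O, R); C: (I, R)
        # y = sum_e g[e] * W[e] @ x  collapses to  B @ ((A^T g) * (C^T x))
        r = (A.t() @ g) * (C.t() @ x)   # (R,) gate and input projections combined rank-wise
        return B @ r                    # (O,) output; cost is linear in E, O, I and R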
Updated: 2024-05-31 14:04:05
Categories: cs.CV,cs.LG
Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers
Autoregressive Transformers adopted in Large Language Models (LLMs) are hard to scale to long sequences. Despite several works trying to reduce their computational cost, most LLMs still adopt attention layers between all pairs of tokens in the sequence, thus incurring a quadratic cost. In this study, we present a novel approach that dynamically prunes contextual information while preserving the model's expressiveness, resulting in reduced memory and computational requirements during inference. Our method employs a learnable mechanism that determines which uninformative tokens can be dropped from the context at any point across the generation process. By doing so, our approach not only addresses performance concerns but also enhances interpretability, providing valuable insight into the model's decision-making process. Our technique can be applied to existing pre-trained models through a straightforward fine-tuning process, and the pruning strength can be specified by a sparsity parameter. Notably, our empirical findings demonstrate that we can effectively prune up to 80\% of the context without significant performance degradation on downstream tasks, offering a valuable tool for mitigating inference costs. Our reference implementation achieves up to $2\times$ increase in inference throughput and even greater memory savings.
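A simplified stand-in for such a learnable pruning mechanism (our sketch, not the paper's implementation): a small scorer decides which cached tokens survive, with the sparsity parameter reflected in the threshold or in a regularizer on the keep probabilities during fine-tuning:

    import torch
    import torch.nn as nn

    class ContextPruner(nn.Module):
        def __init__(self, d_model, threshold=0.5):
            super().__init__()
            self.scorer = nn.Linear(d_model, 1)   # learnable token-importance scorer
            self.threshold = threshold

        def forward(self, hidden):
            # hidden: (T, d_model) cached hidden states of the context
            keep_prob = torch.sigmoid(self.scorer(hidden)).squeeze(-1)
            keep = keep_prob > self.threshold     # hard drop at inference time
            return keep                           # boolean mask applied to subsequent attention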
Updated: 2024-05-31 14:02:24
Categories: cs.CL,cs.LG
Online Prompt Pricing based on Combinatorial Multi-Armed Bandit and Hierarchical Stackelberg Game
Generative models have shown promising performance in various tasks, making trading around machine learning models possible. In this paper, we address a novel prompt trading scenario, the prompt bundle trading (PBT) system, and propose an online pricing mechanism. Based on the combinatorial multi-armed bandit (CMAB) and a three-stage hierarchical Stackelberg (HS) game, our pricing mechanism considers the profits of the consumer, platform, and seller simultaneously, achieving profit satisfaction for all three participants. We break the pricing issue down into two steps, namely unknown category selection and incentive strategy optimization. The former selects a set of categories with the highest qualities, and the latter derives the optimal strategy for each participant based on the chosen categories. Unlike the existing fixed pricing mode, the PBT pricing mechanism we propose is more flexible and diverse, which better matches the transaction needs of real-world scenarios. We test our method on a simulated text-to-image dataset. The experimental results demonstrate the effectiveness of our algorithm, which provides a feasible price-setting standard for prompt marketplaces.
Updated: 2024-05-31 14:01:32
Categories: cs.AI,cs.LG
Shape Constraints in Symbolic Regression using Penalized Least Squares
We study the addition of shape constraints and their consideration during the parameter estimation step of symbolic regression (SR). Shape constraints serve as a means to introduce prior knowledge about the shape of the otherwise unknown model function into SR. Unlike previous works that have explored shape constraints in SR, we propose minimizing shape constraint violations during parameter estimation using gradient-based numerical optimization. We test three algorithm variants to evaluate their performance in identifying three symbolic expressions from a synthetically generated data set. This paper examines two benchmark scenarios: one with varying noise levels and another with reduced amounts of training data. The results indicate that incorporating shape constraints into the expression search is particularly beneficial when data is scarce. Compared to using shape constraints only in the selection process, our approach of minimizing violations during parameter estimation shows a statistically significant benefit in some of our test cases, without being significantly worse in any instance.
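A minimal sketch of penalized least squares with a shape constraint: violations of a (here, hypothetical) monotone-increasing constraint, sampled on a grid, are added to the data-fit term and minimized with a gradient-based optimizer. The expression f, the penalty weight, and the grid are illustrative assumptions:

    import numpy as np
    from scipy.optimize import minimize

    def fit_with_shape_penalty(f, theta0, x, y, x_grid, lam=10.0):
        # f(theta, x) evaluates a candidate symbolic expression with free parameters theta
        def loss(theta):
            mse = np.mean((f(theta, x) - y) ** 2)
            fx = f(theta, x_grid)
            violation = np.maximum(0.0, -np.diff(fx))  # decreasing steps violate monotonicity
            return mse + lam * np.sum(violation ** 2)
        return minimize(loss, theta0, method="BFGS").x # gradients approximated numerically

    # Example: fit f(x) = a*x + b*sin(x) under a monotone-increasing constraint
    f = lambda th, x: th[0] * x + th[1] * np.sin(x)
    x = np.linspace(0.0, 6.0, 50)
    y = 1.5 * x + 0.2 * np.sin(x) + np.random.default_rng(0).normal(0.0, 0.1, 50)
    theta = fit_with_shape_penalty(f, np.array([1.0, 1.0]), x, y, np.linspace(0.0, 6.0, 200))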
Updated: 2024-05-31 14:01:12
Categories: cs.LG,cs.SC
Rough Transformers: Lightweight Continuous-Time Sequence Modelling with Path Signatures
Time-series data in real-world settings typically exhibit long-range dependencies and are observed at non-uniform intervals. In these settings, traditional sequence-based recurrent models struggle. To overcome this, researchers often replace recurrent architectures with Neural ODE-based models to account for irregularly sampled data and use Transformer-based architectures to account for long-range dependencies. Despite the success of these two approaches, both incur very high computational costs for input sequences of even moderate length. To address this challenge, we introduce the Rough Transformer, a variation of the Transformer model that operates on continuous-time representations of input sequences and incurs significantly lower computational costs. In particular, we propose \textit{multi-view signature attention}, which uses path signatures to augment vanilla attention and to capture both local and global (multi-scale) dependencies in the input data, while remaining robust to changes in the sequence length and sampling frequency and yielding improved spatial processing. We find that, on a variety of time-series-related tasks, Rough Transformers consistently outperform their vanilla attention counterparts while obtaining the representational benefits of Neural ODE-based models, all at a fraction of the computational time and memory resources.
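Path signatures are easy to sketch at low truncation depth. A hedged NumPy illustration of the level-1 and level-2 signature terms of a discretized path (an illustration of the building block only, not the paper's multi-view signature attention; production code would use a dedicated signature library):

    import numpy as np

    def signature_level2(path):
        # path: (T, d) discretized path
        # S1_i = sum_t dx_i(t);  S2_ij = sum_t (sum_{s<t} dx_i(s)) * dx_j(t)
        dx = np.diff(path, axis=0)               # (T-1, d) increments
        S1 = dx.sum(axis=0)                      # (d,) level-1 term
        running = np.cumsum(dx, axis=0) - dx     # increments accumulated strictly before t
        S2 = running.T @ dx                      # (d, d) level-2 iterated integrals
        return S1, S2

Because the signature is invariant to reparameterization of time, such features are insensitive to sampling frequency, which is what makes them attractive for irregularly sampled series.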
Updated: 2024-05-31 14:00:44
Categories: stat.ML,cs.LG
Performative Reinforcement Learning in Gradually Shifting Environments
When Reinforcement Learning (RL) agents are deployed in practice, they might impact their environment and change its dynamics. We propose a new framework to model this phenomenon, where the current environment depends on the deployed policy as well as its previous dynamics. This is a generalization of Performative RL (PRL) [Mandal et al., 2023]. Unlike PRL, our framework allows to model scenarios where the environment gradually adjusts to a deployed policy. We adapt two algorithms from the performative prediction literature to our setting and propose a novel algorithm called Mixed Delayed Repeated Retraining (MDRR). We provide conditions under which these algorithms converge and compare them using three metrics: number of retrainings, approximation guarantee, and number of samples per deployment. MDRR is the first algorithm in this setting which combines samples from multiple deployments in its training. This makes MDRR particularly suitable for scenarios where the environment's response strongly depends on its previous dynamics, which are common in practice. We experimentally compare the algorithms using a simulation-based testbed and our results show that MDRR converges significantly faster than previous approaches.
Updated: 2024-05-31 13:59:44
Categories: cs.LG
Ovis: Structural Embedding Alignment for Multimodal Large Language Model
Current Multimodal Large Language Models (MLLMs) typically integrate a pre-trained LLM with another pre-trained vision transformer through a connector, such as an MLP, endowing the LLM with visual capabilities. However, the misalignment between the two embedding strategies in MLLMs -- the structural textual embeddings based on an embedding look-up table and the continuous embeddings generated directly by the vision encoder -- poses challenges to a more seamless fusion of visual and textual information. We propose Ovis, a novel MLLM architecture designed to structurally align visual and textual embeddings. Ovis integrates an additional learnable visual embedding table into the visual encoder's process. To capture rich visual semantics, each image patch indexes the visual embedding table multiple times, resulting in a final visual embedding that is a probabilistic combination of the indexed embeddings. This structural approach mirrors the method used for generating textual embeddings. Empirical evaluations on various multimodal benchmarks demonstrate that Ovis outperforms open-source MLLMs of similar parameter scales and even surpasses the proprietary model Qwen-VL-Plus overall. These results highlight the potential of Ovis' structured visual representation for advancing MLLM architectural design and promoting more effective multimodal learning. Both the source code and the training dataset of Ovis will be made publicly available.
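A hedged sketch of this probabilistic look-up as we read it (module and parameter names are ours, not Ovis' code): each patch produces a distribution over a visual vocabulary, and its embedding is the probability-weighted mixture of rows of a learnable table, structurally mirroring a textual embedding look-up:

    import torch
    import torch.nn as nn

    class VisualEmbeddingHead(nn.Module):
        def __init__(self, d_patch, vocab_size, d_embed):
            super().__init__()
            self.to_logits = nn.Linear(d_patch, vocab_size)  # patch -> visual-word logits
            self.table = nn.Embedding(vocab_size, d_embed)   # learnable visual embedding table

        def forward(self, patch_feats):
            # patch_feats: (N, d_patch) for N image patches
            probs = torch.softmax(self.to_logits(patch_feats), dim=-1)  # (N, V)
            return probs @ self.table.weight                 # (N, d_embed) probabilistic mixture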
Updated: 2024-05-31 13:59:18
Categories: cs.CV,cs.AI,cs.CL,cs.LG
Robust Collaborative Perception without External Localization and Clock Devices
A consistent spatial-temporal coordination across multiple agents is fundamental for collaborative perception, which seeks to improve perception abilities through information exchange among agents. To achieve this spatial-temporal alignment, traditional methods depend on external devices to provide localization and clock signals. However, hardware-generated signals can be vulnerable to noise and potentially malicious attack, jeopardizing the precision of spatial-temporal alignment. Rather than relying on external hardware, this work proposes a novel approach: aligning by recognizing the inherent geometric patterns within the perceptual data of the various agents. In this spirit, we propose a robust collaborative perception system that operates independently of external localization and clock devices. The key module of our system, \emph{FreeAlign}, constructs a salient object graph for each agent based on its detected boxes and uses a graph neural network to identify common subgraphs between agents, leading to accurate relative pose and time. We validate \emph{FreeAlign} on both real-world and simulated datasets. The results show that the \emph{FreeAlign}-empowered robust collaborative perception system performs comparably to systems relying on precise localization and clock devices.
Updated: 2024-05-31 13:58:20
Categories: cs.AI,cs.RO
InsightSee: Advancing Multi-agent Vision-Language Models for Enhanced Visual Understanding
Accurate visual understanding is imperative for advancing autonomous systems and intelligent robots. Despite the powerful capabilities of vision-language models (VLMs) in processing complex visual scenes, precisely recognizing obscured or ambiguously presented visual elements remains challenging. To tackle such issues, this paper proposes InsightSee, a multi-agent framework to enhance VLMs' interpretative capabilities in handling complex visual understanding scenarios. The framework comprises a description agent, two reasoning agents, and a decision agent, which are integrated to refine the process of visual information interpretation. The design of these agents and the mechanisms by which they can be enhanced in visual information processing are presented. Experimental results demonstrate that the InsightSee framework not only boosts performance on specific visual tasks but also retains the original models' strength. The proposed framework outperforms state-of-the-art algorithms in 6 out of 9 benchmark tests, with a substantial advancement in multimodal understanding.
Updated: 2024-05-31 13:56:55
Categories: cs.CV,cs.AI
Model Interpretation and Explainability: Towards Creating Transparency in Prediction Models
Explainable AI (XAI) has a counterpart in analytical modeling which we refer to as model explainability. We tackle the issue of model explainability in the context of prediction models. We analyze a dataset of loans from a credit card company and proceed in three stages: we execute and compare four different prediction methods, apply the best-known explainability techniques in the current literature to the model training sets to identify feature importance (FI) (static case), and finally cross-check whether the FI set holds up under what-if prediction scenarios for continuous and categorical variables (dynamic case). We found inconsistency in FI identification between the static and dynamic cases. We summarize the state of the art in model explainability and suggest further research to advance the field.
Updated: 2024-05-31 13:54:25
Categories: cs.LG
Equivariant Deep Weight Space Alignment
Permutation symmetries of deep networks make basic operations like model merging and similarity estimation challenging. In many cases, aligning the weights of the networks, i.e., finding optimal permutations between their weights, is necessary. Unfortunately, weight alignment is an NP-hard problem. Prior research has mainly focused on solving relaxed versions of the alignment problem, leading to either time-consuming methods or sub-optimal solutions. To accelerate the alignment process and improve its quality, we propose a novel framework aimed at learning to solve the weight alignment problem, which we name Deep-Align. To that end, we first prove that weight alignment adheres to two fundamental symmetries and then, propose a deep architecture that respects these symmetries. Notably, our framework does not require any labeled data. We provide a theoretical analysis of our approach and evaluate Deep-Align on several types of network architectures and learning setups. Our experimental results indicate that a feed-forward pass with Deep-Align produces better or equivalent alignments compared to those produced by current optimization algorithms. Additionally, our alignments can be used as an effective initialization for other methods, leading to improved solutions with a significant speedup in convergence.
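For intuition, the classical relaxation that learned approaches compete with (and can warm-start) is unit matching via linear assignment. A hedged one-layer sketch of that baseline (an illustration, not Deep-Align itself):

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def align_hidden_layer(W_a, W_b):
        # W_a, W_b: (hidden, in) first-layer weights of two networks trained on the same task.
        # Find the permutation of B's hidden units maximizing total similarity to A's units.
        sim = W_a @ W_b.T                         # (hidden, hidden) unit-unit similarity
        rows, cols = linear_sum_assignment(-sim)  # Hungarian algorithm, maximizing similarity
        return cols                               # permute rows of W_b (and columns downstream)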
Updated: 2024-05-31 13:53:26
Categories: cs.LG
GS-Phong: Meta-Learned 3D Gaussians for Relightable Novel View Synthesis
Decoupling the illumination in 3D scenes is crucial for novel view synthesis and relighting. In this paper, we propose a novel method for representing a scene illuminated by a point light using a set of relightable 3D Gaussian points. Inspired by the Blinn-Phong model, our approach decomposes the scene into ambient, diffuse, and specular components, enabling the synthesis of realistic lighting effects. To facilitate the decomposition of geometric information independent of lighting conditions, we introduce a novel bilevel optimization-based meta-learning framework. The fundamental idea is to view the rendering tasks under various lighting positions as a multi-task learning problem, which our meta-learning approach effectively addresses by generalizing the learned Gaussian geometries not only across different viewpoints but also across diverse light positions. Experimental results demonstrate the effectiveness of our approach in terms of training efficiency and rendering quality compared to existing methods for free-viewpoint relighting.
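For reference, the standard Blinn-Phong decomposition the method builds on expresses the radiance at a point with normal $\mathbf{n}$, light direction $\mathbf{l}$, and view direction $\mathbf{v}$ as

$$L \;=\; k_a I_a \;+\; k_d\, I \max(0,\,\mathbf{n}\cdot\mathbf{l}) \;+\; k_s\, I \max(0,\,\mathbf{n}\cdot\mathbf{h})^{\alpha}, \qquad \mathbf{h}=\frac{\mathbf{l}+\mathbf{v}}{\lVert\mathbf{l}+\mathbf{v}\rVert},$$

where $k_a$, $k_d$, $k_s$ weight the ambient, diffuse, and specular terms and $\alpha$ is the shininess exponent (the per-Gaussian parameterization of these quantities is the paper's contribution).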
Updated: 2024-05-31 13:48:54
Categories: cs.CV,cs.LG
Intersectional Unfairness Discovery
AI systems have been shown to produce unfair results for certain subgroups of the population, highlighting the need to understand bias on certain sensitive attributes. Current research often falls short, primarily focusing on subgroups characterized by a single sensitive attribute, while neglecting the intersectional nature of fairness across multiple sensitive attributes. This paper focuses on one fundamental aspect of this problem: discovering diverse high-bias subgroups under intersectional sensitive attributes. Specifically, we propose a Bias-Guided Generative Network (BGGN). By treating each bias value as a reward, BGGN efficiently generates high-bias intersectional sensitive attributes. Experiments on real-world text and image datasets demonstrate that BGGN discovers diverse high-bias subgroups efficiently. To further evaluate the generated, unseen but possible, unfair intersectional sensitive attributes, we formulate them as prompts and use modern generative AI to produce new texts and images. The frequent generation of biased data provides new insights for discovering potential unfairness in popular modern generative AI systems. Warning: This paper contains generative examples that are offensive in nature.
Updated: 2024-05-31 13:45:52
Categories: cs.LG,cs.CY
Reinforcement Learning for Sociohydrology
In this study, we discuss how reinforcement learning (RL) provides an effective and efficient framework for solving sociohydrology problems. The efficacy of RL for these types of problems is evident because of its ability to update policies in an iterative manner - something that is also foundational to sociohydrology, where we are interested in representing the co-evolution of human-water interactions. We present a simple case study to demonstrate the implementation of RL in a problem of runoff reduction through management decisions related to changes in land-use land-cover (LULC). We then discuss the benefits of RL for these types of problems and share our perspectives on the future research directions in this area.
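For a flavor of the iterative policy updates highlighted here, a hedged toy: tabular Q-learning over discretized runoff states with LULC interventions as actions. The dynamics, rewards, and discretization are entirely hypothetical stand-ins for a calibrated sociohydrologic model:

    import numpy as np

    n_states, n_actions = 5, 3                  # runoff levels x LULC interventions
    Q = np.zeros((n_states, n_actions))
    alpha, gamma, eps = 0.1, 0.95, 0.1
    rng = np.random.default_rng(0)

    def step(s, a):
        # Hypothetical response: action 2 (e.g., re-vegetation) tends to lower runoff
        s_next = int(max(0, min(n_states - 1, s + rng.integers(-1, 2) - (a == 2))))
        return s_next, -s_next                  # reward: lower runoff state is better

    s = n_states - 1
    for t in range(5000):
        a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmax())
        s_next, r = step(s, a)
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next                              # Q now encodes a runoff-reduction policy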
Updated: 2024-05-31 13:28:37
Categories: cs.LG,cs.CY
NLP Verification: Towards a General Methodology for Certifying Robustness
Deep neural networks have exhibited substantial success in the field of Natural Language Processing, and ensuring their safety and reliability is crucial: there are safety-critical contexts where such models must be robust to variability or attack and give guarantees over their output. Unlike Computer Vision, NLP lacks a unified verification methodology and, despite recent advancements in the literature, existing approaches are often light on the pragmatic issues of NLP verification. In this paper, we attempt to distil and evaluate general components of an NLP verification pipeline that emerges from the progress in the field to date. Our contributions are two-fold. Firstly, we give a general (i.e. algorithm-independent) characterisation of verifiable subspaces that result from embedding sentences into continuous spaces. We identify, and give an effective method to deal with, the technical challenge of semantic generalisability of verified subspaces, and propose it as a standard metric in NLP verification pipelines (alongside the standard metrics of model accuracy and model verifiability). Secondly, we propose a general methodology to analyse the effect of the embedding gap -- a problem that refers to the discrepancy between verification of geometric subspaces and the semantic meaning of the sentences which the geometric subspaces are supposed to represent. In extreme cases, poor choices in embedding of sentences may invalidate verification results. We propose a number of practical NLP methods that can help to quantify the effects of the embedding gap; in particular, we propose the metric of falsifiability of semantic subspaces as another fundamental metric to be reported as part of the NLP verification pipeline. We believe that together these general principles pave the way towards a more consolidated and effective development of this new domain.
Updated: 2024-05-31 13:11:15
Categories: cs.CL,cs.AI,cs.LG,cs.LO,cs.PL
Having Second Thoughts? Let's hear it
Deep learning models loosely mimic bottom-up signal pathways from low-order sensory areas to high-order cognitive areas. After training, DL models can outperform humans on some domain-specific tasks, but their decision-making process has been known to be easily disrupted. Since the human brain consists of multiple functional areas highly connected to one another and relies on intricate interplays between bottom-up and top-down (from high-order to low-order areas) processing, we hypothesize that incorporating top-down signal processing may make DL models more robust. To address this hypothesis, we propose a certification process mimicking selective attention and test if it could make DL models more robust. Our empirical evaluations suggest that this newly proposed certification can improve DL models' accuracy and help us build safety measures to alleviate their vulnerabilities with both artificial and natural adversarial examples.
Updated: 2024-05-31 12:53:36
Categories: cs.CV,cs.AI
LLMs achieve adult human performance on higher-order theory of mind tasks
This paper examines the extent to which large language models (LLMs) have developed higher-order theory of mind (ToM); the human ability to reason about multiple mental and emotional states in a recursive manner (e.g. I think that you believe that she knows). This paper builds on prior work by introducing a handwritten test suite -- Multi-Order Theory of Mind Q&A -- and using it to compare the performance of five LLMs to a newly gathered adult human benchmark. We find that GPT-4 and Flan-PaLM reach adult-level and near adult-level performance on ToM tasks overall, and that GPT-4 exceeds adult performance on 6th order inferences. Our results suggest that there is an interplay between model size and finetuning for the realisation of ToM abilities, and that the best-performing LLMs have developed a generalised capacity for ToM. Given the role that higher-order ToM plays in a wide range of cooperative and competitive human behaviours, these findings have significant implications for user-facing LLM applications.
Updated: 2024-05-31 12:45:50
Categories: cs.AI,cs.CL,cs.HC,I.2.7; H.1.2
Improving Generalization and Convergence by Enhancing Implicit Regularization
In this work, we propose an Implicit Regularization Enhancement (IRE) framework to accelerate the discovery of flat solutions in deep learning, thereby improving generalization and convergence. Specifically, IRE decouples the dynamics of flat and sharp directions, which boosts the sharpness reduction along flat directions while maintaining the training stability in sharp directions. We show that IRE can be practically incorporated with {\em generic base optimizers} without introducing significant computational overhead. Experiments show that IRE consistently improves the generalization performance for image classification tasks across a variety of benchmark datasets (CIFAR-10/100, ImageNet) and models (ResNets and ViTs). Surprisingly, IRE also achieves a $2\times$ {\em speed-up} compared to AdamW in the pre-training of Llama models (of sizes ranging from 60M to 229M) on datasets including Wikitext-103, Minipile, and Openwebtext. Moreover, we provide theoretical guarantees, showing that IRE can substantially accelerate the convergence towards flat minima in Sharpness-aware Minimization (SAM).
Updated: 2024-05-31 12:32:34
Categories: cs.LG,math.OC,stat.ML
Comparison of Access Control Approaches for Graph-Structured Data
Access control is the enforcement of the authorization policy, which defines subjects, resources, and access rights. Graph-structured data requires advanced, flexible, and fine-grained access control due to its complex structure as sequences of alternating vertices and edges. Several research works focus on protecting property graph-structured data, enforcing fine-grained access control, and proving the feasibility and applicability of their concept. However, they differ conceptually and technically. We select works from our systematic literature review on authorization and access control for different database models, together with more recent works. Based on defined criteria, we exclude research works with different objectives, such as no protection of graph-structured data, graph models other than the property graph, coarse-grained access control approaches, or no application in a graph datastore (i.e., no proof-of-concept implementation). The latest versions of the remaining works are discussed in detail in terms of their access control approach as well as authorization policy definition and enforcement. Finally, we analyze the strengths and limitations of the selected works and provide a comparison with respect to different aspects, including the base access control model, open/closed policy, negative permission support, and datastore-independent enforcement.
Updated: 2024-05-31 12:31:05
Categories: cs.CR
A Tale of Tails: Model Collapse as a Change of Scaling Laws
As AI model size grows, neural scaling laws have become a crucial tool to predict the improvements of large models when increasing capacity and the size of original (human or natural) training data. Yet, the widespread use of popular models means that the ecosystem of online data and text will co-evolve to progressively contain increased amounts of synthesized data. In this paper we ask: How will the scaling laws change in the inevitable regime where synthetic data makes its way into the training corpus? Will future models still improve, or be doomed to degenerate up to total (model) collapse? We develop a theoretical framework of model collapse through the lens of scaling laws. We discover a wide range of decay phenomena, analyzing loss of scaling, shifted scaling with number of generations, the "un-learning" of skills, and grokking when mixing human and synthesized data. Our theory is validated by large-scale experiments with a transformer on an arithmetic task and text generation using the large language model Llama2.
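For orientation, the clean-data baseline that such analyses perturb is the classic power-law form (the generic textbook form, not the paper's modified laws):

$$L(N) \;\approx\; \frac{A}{N^{\alpha}} \;+\; E,$$

where $N$ is the model or data size, $\alpha$ the scaling exponent, and $E$ an irreducible loss floor; model collapse then shows up as shifts in $\alpha$ and $E$, or a loss of power-law scaling altogether, as synthesized data accumulates across generations.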
Updated: 2024-05-31 12:27:52
Categories: cs.LG,cs.AI,cs.CL
Share Your Secrets for Privacy! Confidential Forecasting with Vertical Federated Learning
Vertical federated learning (VFL) is a promising area for time series forecasting in industrial applications, such as predictive maintenance and machine control. Critical challenges to address in manufacturing include data privacy and over-fitting on small and noisy datasets during both training and inference. Additionally, to increase industry adaptability, such forecasting models must scale well with the number of parties while ensuring strong convergence and low-tuning complexity. We address those challenges and propose 'Secret-shared Time Series Forecasting with VFL' (STV), a novel framework that exhibits the following key features: i) a privacy-preserving algorithm for forecasting with SARIMAX and autoregressive trees on vertically partitioned data; ii) serverless forecasting using secret sharing and multi-party computation; iii) novel N-party algorithms for matrix multiplication and inverse operations for direct parameter optimization, giving strong convergence with minimal hyperparameter tuning complexity. We conduct evaluations on six representative datasets from public and industry-specific contexts. Our results demonstrate that STV's forecasting accuracy is comparable to those of centralized approaches. They also show that our direct optimization can outperform centralized methods, which include state-of-the-art diffusion models and long-short-term memory, by 23.81% on forecasting accuracy. We also conduct a scalability analysis by examining the communication costs of direct and iterative optimization to navigate the choice between the two. Code and appendix are available: https://github.com/adis98/STV
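The confidentiality layer rests on standard additive secret sharing over a prime field, on top of which the multi-party computation operates. A minimal NumPy sketch of that primitive (the building block only, not STV's full protocol):

    import numpy as np

    PRIME = 2**61 - 1

    def additive_shares(x, n_parties, rng=None):
        # Split integer-encoded values x into n shares: each party alone learns nothing,
        # while the sum of all shares modulo PRIME reconstructs x exactly.
        rng = rng or np.random.default_rng()
        shares = [rng.integers(0, PRIME, size=x.shape) for _ in range(n_parties - 1)]
        shares.append((x - sum(shares)) % PRIME)
        return shares

    x = np.array([42, 7, 1234])                 # e.g., fixed-point encoded series values
    shares = additive_shares(x, n_parties=3)
    assert (sum(shares) % PRIME == x).all()     # reconstruction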
Updated: 2024-05-31 12:27:38
领域: cs.LG,cs.CR,cs.DC
High-dimensional robust regression under heavy-tailed data: Asymptotics and Universality
We investigate the high-dimensional properties of robust regression estimators in the presence of heavy-tailed contamination of both the covariates and response functions. In particular, we provide a sharp asymptotic characterisation of M-estimators trained on a family of elliptical covariate and noise data distributions including cases where second and higher moments do not exist. We show that, despite being consistent, the Huber loss with optimally tuned location parameter $\delta$ is suboptimal in the high-dimensional regime in the presence of heavy-tailed noise, highlighting the necessity of further regularisation to achieve optimal performance. This result also uncovers the existence of a transition in $\delta$ as a function of the sample complexity and contamination. Moreover, we derive the decay rates for the excess risk of ridge regression. We show that, while it is both optimal and universal for covariate distributions with finite second moment, its decay rate can be considerably faster when the covariates' second moment does not exist. Finally, we show that our formulas readily generalise to a richer family of models and data distributions, such as generalised linear estimation with arbitrary convex regularisation trained on mixture models.
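The gap the paper studies can be glimpsed even in a low-dimensional toy comparison. The hedged sketch below contrasts ridge and Huber regression under Student-t noise with infinite variance; it does not reproduce the paper's high-dimensional asymptotics or the optimal tuning of $\delta$.

```python
import numpy as np
from sklearn.linear_model import HuberRegressor, Ridge

rng = np.random.default_rng(0)
n, d = 500, 20
X = rng.standard_normal((n, d))
w = rng.standard_normal(d)
# Heavy-tailed noise: Student-t with 2 degrees of freedom has no variance.
y = X @ w + rng.standard_t(df=2, size=n)

huber = HuberRegressor(epsilon=1.35, alpha=1e-3).fit(X, y)
ridge = Ridge(alpha=1e-3).fit(X, y)
for name, est in [("huber", huber), ("ridge", ridge)]:
    err = np.linalg.norm(est.coef_ - w) / np.linalg.norm(w)
    print(f"{name}: relative estimation error {err:.3f}")
```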
Updated: 2024-05-31 12:25:31
标题: 高维度重尾数据下的鲁棒回归:渐近性和普适性
摘要: 我们研究了在协变量和响应函数同时存在重尾污染的情况下,鲁棒回归估计量的高维特性。特别地,我们提供了对在椭圆形协变量和噪声数据分布族上训练的M-估计量的尖锐渐进特征化,包括不存在二阶及更高阶矩的情况。我们表明,尽管Huber损失在位置参数$\delta$经过最优调整时是一致的,但在高维情况下,存在重尾噪声时,它是次优的,突出了进一步正则化以实现最佳性能的必要性。这个结果还揭示了作为样本复杂性和污染的函数的$\delta$存在转变。此外,我们推导了岭回归的超额风险衰减速率。我们表明,虽然在协变量分布具有有限二阶矩时它是最优且普适的,但当协变量的二阶矩不存在时,其衰减速率可能会更快。最后,我们展示了我们的公式可以轻松推广到更丰富的模型和数据分布族,例如在混合模型上训练的任意凸正则化的广义线性估计。
更新时间: 2024-05-31 12:25:31
领域: math.ST,cond-mat.dis-nn,cs.LG,stat.ML,stat.TH
Information Theoretic Text-to-Image Alignment
Diffusion models for Text-to-Image (T2I) conditional generation have seen tremendous success recently. Despite their success, accurately capturing user intentions with these models still requires a laborious trial-and-error process. This challenge is commonly identified as a model alignment problem, an issue that has attracted considerable attention from the research community. Instead of relying on fine-grained linguistic analyses of prompts, human annotation, or auxiliary vision-language models to steer image generation, in this work we present a novel method that relies on an information-theoretic alignment measure. In a nutshell, our method uses self-supervised fine-tuning and relies on point-wise mutual information between prompts and images to define a synthetic training set to induce model alignment. Our comparative analysis shows that our method is on par with or superior to the state-of-the-art, yet requires nothing but a pre-trained denoising network to estimate MI and a lightweight fine-tuning strategy.
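One common way to estimate such a pointwise mutual information with a diffusion model is the gap between unconditional and conditional denoising errors. The hypothetical sketch below assumes a noise-prediction network `eps_model(x_t, t, cond)` and a precomputed `alphas_cumprod` schedule; it is an illustration of the general estimator, not the paper's actual code.

```python
import torch

def pmi_estimate(eps_model, x0, cond, null_cond, alphas_cumprod, n_samples=32):
    """Monte-Carlo estimate of a quantity proportional to the pointwise
    mutual information PMI(x0; cond) = log p(x0|cond) - log p(x0),
    via the denoising-error gap of a diffusion model."""
    total = 0.0
    for _ in range(n_samples):
        t = torch.randint(0, len(alphas_cumprod), (x0.shape[0],))
        a = alphas_cumprod[t].view(-1, 1, 1, 1)
        noise = torch.randn_like(x0)
        xt = a.sqrt() * x0 + (1 - a).sqrt() * noise
        # A higher unconditional-minus-conditional error gap means the
        # prompt genuinely helps denoise this image, i.e. high PMI.
        err_uncond = (eps_model(xt, t, null_cond) - noise).pow(2).sum(dim=(1, 2, 3))
        err_cond = (eps_model(xt, t, cond) - noise).pow(2).sum(dim=(1, 2, 3))
        total = total + (err_uncond - err_cond)
    return total / n_samples

# usage sketch: scores = pmi_estimate(model, images, text_emb, null_emb, schedule)
```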
Updated: 2024-05-31 12:20:02
标题: 信息论文本到图像的对齐
摘要: 最近,针对文本到图像(T2I)有条件生成的扩散模型取得了巨大成功。尽管取得成功,准确捕捉用户意图仍然需要费时费力的试错过程。这一挑战通常被认定为模型对齐问题,这一问题已经引起了研究界的广泛关注。在本研究中,我们提出了一种新颖的方法,不依赖于细粒度的语言分析、人工标注或辅助视觉-语言模型来引导图像生成,而是依靠信息论对齐度量。简而言之,我们的方法使用自监督微调,并依赖于提示和图像之间的点对点互信息来定义一个合成训练集以诱导模型对齐。我们的比较分析表明,我们的方法与最先进技术相媲美甚至更优秀,但只需要一个预先训练的去噪网络来估计互信息和一个轻量级微调策略。
更新时间: 2024-05-31 12:20:02
领域: cs.LG,cs.CV
FedSheafHN: Personalized Federated Learning on Graph-structured Data
Personalized subgraph Federated Learning (FL) is a task that customizes Graph Neural Networks (GNNs) to individual client needs, accommodating diverse data distributions. However, applying hypernetworks in FL, while aiming to facilitate model personalization, often encounters challenges due to inadequate representation of client-specific characteristics. To overcome these limitations, we propose a model called FedSheafHN, using enhanced collaboration graph embedding and efficient personalized model parameter generation. Specifically, our model embeds each client's local subgraph into a server-constructed collaboration graph. We utilize sheaf diffusion in the collaboration graph to learn client representations. Our model improves the integration and interpretation of complex client characteristics. Furthermore, our model ensures the generation of personalized models through advanced hypernetworks optimized for parallel operations across clients. Empirical evaluations demonstrate that FedSheafHN outperforms existing methods in most scenarios in terms of client model performance on various graph-structured datasets. It also converges quickly and generalizes effectively to new clients.
Updated: 2024-05-31 11:44:39
标题: FedSheafHN:基于图结构数据的个性化联邦学习
摘要: 个性化子图联邦学习(FL)是一项定制图神经网络(GNNs)以适应个体客户需求、适应多样化数据分布的任务。然而,在FL中应用超网络,旨在促进模型个性化,通常会遇到由于客户特定特征表示不足而带来的挑战。为了克服这些限制,我们提出了一种名为FedSheafHN的模型,利用增强的协作图嵌入和高效的个性化模型参数生成。具体来说,我们的模型将每个客户端的本地子图嵌入到服务器构建的协作图中。我们利用协作图中的群集扩散来学习客户端表示。我们的模型改善了复杂客户特征的集成和解释。此外,我们的模型通过针对跨客户的并行操作进行优化的先进超网络确保了个性化模型的生成。实证评估表明,FedSheafHN在大多数场景中的客户模型性能方面优于现有方法,在各种基于图结构的数据集上表现出快速的模型收敛和有效的新客户泛化。
更新时间: 2024-05-31 11:44:39
领域: cs.LG
AutoSAT: Automatically Optimize SAT Solvers via Large Language Models
Heuristics are crucial in SAT solvers, but no single set of heuristic rules suits all SAT problems. It is therefore helpful to refine specific heuristics for specific problems. In this context, we present AutoSAT, a novel framework for automatically optimizing heuristics in SAT solvers. AutoSAT is based on Large Language Models (LLMs), which are able to autonomously generate code, conduct evaluations, and then utilize the feedback to further optimize heuristics, thereby reducing human intervention and enhancing solver capabilities. AutoSAT operates on a plug-and-play basis, eliminating the need for extensive preliminary setup and model training, and fosters a multi-agent-based collaborative process with fault tolerance to ensure robust heuristic optimization. We implement AutoSAT on a lightweight Conflict-Driven Clause Learning (CDCL) solver, EasySAT (whose codebase is about one-fiftieth the size of the state-of-the-art hybrid solver Kissat), and extensive experiments on seven datasets demonstrate its superior performance. Of the seven test datasets, AutoSAT outperforms Kissat on two and performs comparably on three. Some heuristics generated by AutoSAT are even counter-intuitive, yet very effective.
Updated: 2024-05-31 11:38:00
标题: AutoSAT:通过大型语言模型自动优化SAT求解器
摘要: 启发式在SAT求解器中至关重要,但是没有一种启发式规则适用于所有SAT问题。因此,对于特定问题,优化特定的启发式是有帮助的。在这种背景下,我们提出了AutoSAT,一个自动优化SAT求解器中启发式的新框架。AutoSAT基于大型语言模型(LLMs),能够自主生成代码,进行评估,然后利用反馈进一步优化启发式,从而减少人为干预,提升求解器能力。AutoSAT采用即插即用的方式运行,消除了对广泛企业和模型训练的需求,并促进了基于多智能体的协作过程,具有容错性以确保启发式的稳健优化。我们将AutoSAT实现在一个轻量级的冲突驱动子句学习(CDCL)求解器EasySAT上(EasySAT的体积约为现有混合求解器Kissat的五十分之一),并在七个数据集上进行了广泛实验,证明了其优异性能。在七个测试数据集中,AutoSAT在两个数据集上表现优于Kissat,并在三个数据集上展现出整体相似的性能。AutoSAT生成的一些启发式甚至是反直觉的,但非常有效。
更新时间: 2024-05-31 11:38:00
领域: cs.AI
Anatomical Region Recognition and Real-time Bone Tracking Methods by Dynamically Decoding A-Mode Ultrasound Signals
Accurate bone tracking is crucial for kinematic analysis in orthopedic surgery and prosthetic robotics. Traditional methods (e.g., skin markers) are subject to soft tissue artifacts, and the bone pins used in surgery introduce the risk of additional trauma and infection. For electromyography (EMG), its inability to directly measure joint angles requires complex algorithms for kinematic estimation. To address these issues, A-mode ultrasound-based tracking has been proposed as a non-invasive and safe alternative. However, this approach suffers from limited accuracy in peak detection when processing received ultrasound signals. To build a precise and real-time bone tracking approach, this paper introduces a deep learning-based method for anatomical region recognition and bone tracking using A-mode ultrasound signals, specifically focused on the knee joint. The algorithm is capable of simultaneously performing bone tracking and identifying the anatomical region where the A-mode ultrasound transducer is placed. It uses full connections between all encoding and decoding layers of the cascaded U-Nets to focus only on the signal region that is most likely to contain the bone peak, thus pinpointing the exact location of the peak and classifying the anatomical region of the signal. The experiments showed 97% accuracy in the classification of the anatomical regions and a precision of around 0.5$\pm$1 mm under dynamic tracking conditions for various anatomical areas surrounding the knee joint. In general, this approach shows great potential beyond traditional methods, both in the accuracy achieved and in its additional ability to recognize the anatomical region where the ultrasound transducer is attached.
Updated: 2024-05-31 11:31:12
标题: 解剖区域识别和实时骨骼跟踪方法:通过动态解码A模式超声信号
摘要: 准确的骨骼跟踪对于在骨科手术和假肢机器人学中的运动学分析至关重要。传统方法(例如皮肤标记)容易受软组织伪影影响,手术中使用的骨钉会引入额外的创伤和感染风险。对于肌电图(EMG),由于其无法直接测量关节角度,需要复杂的算法进行运动学估计。为了解决这些问题,提出了基于A模式超声波跟踪的方法作为一种无创和安全的替代方法。然而,这种方法在处理接收到的超声信号时存在峰值检测的准确性有限。为了构建精确和实时的骨骼跟踪方法,本文介绍了一种基于深度学习的方法,使用A模式超声信号进行解剖区域识别和骨骼跟踪,特别关注膝关节。该算法能够同时进行骨骼跟踪并识别A模式超声波探头放置的解剖区域。它包含级联U-Nets所有编码和解码层的完全连接,仅关注最有可能具有骨峰的信号区域,从而准确定位峰值的确切位置并分类信号的解剖区域。实验证明,在不同围绕膝关节的解剖区域动态跟踪条件下,解剖区域分类准确率达到97%,精度约为0.5±1mm。总的来说,这种方法在实现的准确性和识别超声波附着的解剖区域的功能上显示出巨大潜力,超越了传统方法。
更新时间: 2024-05-31 11:31:12
领域: eess.SP,cs.LG,cs.RO
Permutation Decision Trees
The Decision Tree is a well-understood machine learning model based on minimizing impurity at the internal nodes. The most common impurity measures are Shannon entropy and Gini impurity. These impurity measures are insensitive to the order of the training data, and hence the final tree obtained is invariant to any permutation of the data. This is a limitation in terms of modeling when there are temporal order dependencies between data instances. In this research, we propose, for the first time, the adoption of Effort-To-Compress (ETC), a complexity measure, as an alternative impurity measure. Unlike Shannon entropy and Gini impurity, structural impurity based on ETC is able to capture order dependencies in the data, thus obtaining potentially different decision trees for different permutations of the same data instances, a concept we term Permutation Decision Trees (PDT). We then introduce the notion of Permutation Bagging, achieved using permutation decision trees without the need for random feature selection and sub-sampling. We conduct a performance comparison between Permutation Decision Trees and classical decision trees across various real-world datasets, including Appendicitis, Breast Cancer Wisconsin, Diabetes Pima Indian, Ionosphere, Iris, Sonar, and Wine. Our findings reveal that PDT demonstrates comparable performance to classical decision trees across most datasets. Remarkably, in certain instances, PDT even slightly surpasses the performance of classical decision trees. In comparing Permutation Bagging with Random Forest, we attain performance comparable to Random Forest models consisting of 50 to 1000 trees using merely 21 trees. This highlights the efficiency and effectiveness of Permutation Bagging in achieving comparable performance outcomes with significantly fewer trees.
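For readers unfamiliar with ETC, here is a minimal sketch as we understand it from the literature: non-sequential recursive pair substitution repeatedly replaces the most frequent adjacent symbol pair with a new symbol, and the complexity is the number of iterations until the sequence becomes constant. Tie-breaking details may differ from the authors' implementation; the order sensitivity of this count is exactly what makes the resulting trees permutation-dependent.

```python
def effort_to_compress(seq):
    """Effort-To-Compress: iterations of non-sequential recursive pair
    substitution needed to reduce an integer sequence to a constant one."""
    seq = list(seq)
    steps = 0
    while len(set(seq)) > 1:
        # Find the most frequent adjacent pair.
        pairs = list(zip(seq, seq[1:]))
        best = max(set(pairs), key=pairs.count)
        # Replace every non-overlapping occurrence with a fresh symbol.
        new_symbol = max(seq) + 1
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == best:
                out.append(new_symbol)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
        steps += 1
    return steps

print(effort_to_compress([0, 1, 0, 1, 0, 1]))        # regular: low complexity
print(effort_to_compress([0, 1, 1, 0, 0, 0, 1, 0]))  # irregular: higher
```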
Updated: 2024-05-31 11:30:31
标题: 排列决策树
摘要: 决策树是一种广泛理解的基于最小化内部节点不纯度的机器学习模型。最常见的不纯度度量是Shannon熵和Gini不纯度。这些不纯度度量对训练数据的顺序不敏感,因此最终获得的树对数据的任何排列都是不变的。当数据实例之间存在时间顺序依赖关系时,这在建模方面是一个限制。在这项研究中,我们首次提出采用Effort-To-Compress(ETC)-一种复杂度度量,作为替代的不纯度度量。与Shannon熵和Gini不纯度不同,基于ETC的结构不纯度能够捕捉数据中的顺序依赖关系,因此对相同数据实例的不同排列可能获得潜在不同的决策树,我们将其称为Permutation Decision Trees(PDT)的概念。然后,我们引入了通过使用不需要随机特征选择和子抽样的排列决策树实现的Permutation Bagging的概念。我们在包括阑尾炎、威斯康星州乳腺癌、糖尿病皮马印第安、电离层、鸢尾花、声纳和葡萄酒等各种真实世界数据集之间进行了性能比较。我们的研究结果显示,PDT在大多数数据集上表现出与经典决策树相当的性能。值得注意的是,在某些情况下,PDT甚至略微超过经典决策树的性能。在将Permutation Bagging与随机森林进行比较时,我们仅使用21棵树,就能达到与由50至1000棵树组成的随机森林模型相当的性能。这突出了Permutation Bagging在以明显较少的树实现相当性能结果方面的效率和有效性。
更新时间: 2024-05-31 11:30:31
领域: cs.LG
Simplifying Transformer Blocks
A simple design recipe for deep Transformers is to compose identical building blocks. But standard transformer blocks are far from simple, interweaving attention and MLP sub-blocks with skip connections and normalisation layers in precise arrangements. This complexity leads to brittle architectures, where seemingly minor changes can significantly reduce training speed or render models untrainable. In this work, we ask: to what extent can the standard transformer block be simplified? Combining signal propagation theory and empirical observations, we motivate modifications that allow many block components to be removed with no loss of training speed, including skip connections, projection or value parameters, sequential sub-blocks, and normalisation layers. In experiments on both autoregressive decoder-only and BERT encoder-only models, our simplified transformers match the per-update training speed and performance of standard transformers, while enjoying 15% faster training throughput and using 15% fewer parameters.
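A loose, hypothetical sketch of what such a block can look like is given below: parallel attention and MLP sub-blocks, no skip connections or normalisation, and no value or projection matrices. The identity-mixing term merely stands in for the paper's signal-propagation corrections (such as shaped attention), which are not reproduced faithfully here.

```python
import torch
import torch.nn as nn

class SimplifiedBlock(nn.Module):
    """Sketch of a simplified transformer block: no skips, no norms,
    no value/projection matrices, parallel attention + MLP."""

    def __init__(self, dim, n_heads, alpha=0.5):
        super().__init__()
        self.qk = nn.Linear(dim, 2 * dim, bias=False)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))
        self.n_heads = n_heads
        self.alpha = alpha  # weight of the identity component of attention

    def forward(self, x):
        b, t, d = x.shape
        h = self.n_heads
        q, k = self.qk(x).chunk(2, dim=-1)
        q = q.view(b, t, h, d // h).transpose(1, 2)
        k = k.view(b, t, h, d // h).transpose(1, 2)
        attn = torch.softmax(q @ k.transpose(-2, -1) / (d // h) ** 0.5, dim=-1)
        # Pull the attention map toward the identity so signals propagate
        # without explicit skip connections (shaped-attention-like mixing).
        mix = self.alpha * torch.eye(t, device=x.device) + (1 - self.alpha) * attn
        v = x.view(b, t, h, d // h).transpose(1, 2)  # tokens themselves: no value matrix
        attn_out = (mix @ v).transpose(1, 2).reshape(b, t, d)
        return attn_out + self.mlp(x)  # parallel sub-blocks

x = torch.randn(2, 16, 64)
print(SimplifiedBlock(dim=64, n_heads=4)(x).shape)  # torch.Size([2, 16, 64])
```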
Updated: 2024-05-31 11:14:16
标题: 简化变压器块
摘要: 一个简单的深度Transformer设计配方是组合相同的构建模块。但是标准的Transformer模块远非简单,将注意力和MLP子模块与跳跃连接和规范化层交织在一起,以精确的排列。这种复杂性导致脆弱的架构,看似微小的变化可能显著降低训练速度,或使模型无法训练。 在这项工作中,我们探讨了标准Transformer模块可以简化到什么程度。结合信号传播理论和经验观察,我们提出了修改建议,允许许多模块组件被移除而不会影响训练速度,包括跳跃连接、投影或值参数、顺序子模块和规范化层。在自回归解码器和BERT编码器模型的实验中,我们简化的Transformer模拟了标准Transformer的更新训练速度和性能,同时享有15%更快的训练吞吐量,并使用15%更少的参数。
更新时间: 2024-05-31 11:14:16
领域: cs.LG
Awesome Multi-modal Object Tracking
Multi-modal object tracking (MMOT) is an emerging field that combines data from various modalities, e.g., vision (RGB), depth, thermal infrared, event, language and audio, to estimate the state of an arbitrary object in a video sequence. It is of great significance for many applications such as autonomous driving and intelligent surveillance. In recent years, MMOT has received more and more attention. However, existing MMOT algorithms mainly focus on two modalities (e.g., RGB+depth, RGB+thermal infrared, and RGB+language). To leverage more modalities, some recent efforts have been made to learn a unified visual object tracking model for any modality. Additionally, some large-scale multi-modal tracking benchmarks have been established by simultaneously providing more than two modalities, such as vision-language-audio (e.g., WebUAV-3M) and vision-depth-language (e.g., UniMod1K). To track the latest progress in MMOT, we conduct a comprehensive investigation in this report. Specifically, we first divide existing MMOT tasks into five main categories, i.e., RGBL tracking, RGBE tracking, RGBD tracking, RGBT tracking, and miscellaneous (RGB+X), where X can be any modality, such as language, depth, and event. Then, we analyze and summarize each MMOT task, focusing on widely used datasets and mainstream tracking algorithms based on their technical paradigms (e.g., self-supervised learning, prompt learning, knowledge distillation, generative models, and state space models). Finally, we maintain a continuously updated paper list for MMOT at https://github.com/983632847/Awesome-Multimodal-Object-Tracking.
Updated: 2024-05-31 11:09:59
标题: 出色的多模式物体跟踪
摘要: 多模态目标跟踪(MMOT)是一个新兴领域,它结合了来自各种模态的数据,例如视觉(RGB)、深度、热红外、事件、语言和音频,以估计视频序列中任意对象的状态。对于诸如自动驾驶和智能监控等许多应用来说,这具有重要意义。近年来,MMOT受到越来越多的关注。然而,现有的MMOT算法主要集中在两种模态上(例如RGB+深度、RGB+热红外和RGB+语言)。为了利用更多的模态,一些最近的努力已经做出了努力,学习一个统一的视觉物体跟踪模型适用于任何模态。此外,一些大规模的多模态跟踪基准已经建立,同时提供了两种以上的模态,例如视觉-语言-音频(例如WebUAV-3M)和视觉-深度-语言(例如UniMod1K)。为了追踪MMOT领域的最新进展,我们在本报告中进行了全面的调查。具体来说,我们首先将现有的MMOT任务分为五个主要类别,即RGBL跟踪、RGBE跟踪、RGBD跟踪、RGBT跟踪和其他(RGB+X),其中X可以是任何模态,如语言、深度和事件。然后,我们分析和总结每个MMOT任务,重点关注广泛使用的数据集和基于其技术范式(例如自监督学习、提示学习、知识蒸馏、生成模型和状态空间模型)的主流跟踪算法。最后,我们在https://github.com/983632847/Awesome-Multimodal-Object-Tracking 上维护一个持续更新的MMOT论文列表。
更新时间: 2024-05-31 11:09:59
领域: cs.CV,cs.AI
4DHands: Reconstructing Interactive Hands in 4D with Transformers
In this paper, we introduce 4DHands, a robust approach to recovering interactive hand meshes and their relative movement from monocular inputs. Our approach addresses two major limitations of previous methods: the lack of a unified solution for handling various hand image inputs and the neglect of the positional relationship between the two hands within images. To overcome these challenges, we develop a transformer-based architecture with novel tokenization and feature fusion strategies. Specifically, we propose a Relation-aware Two-Hand Tokenization (RAT) method to embed positional relation information into the hand tokens. In this way, our network can handle both single-hand and two-hand inputs and explicitly leverage relative hand positions, facilitating the reconstruction of intricate hand interactions in real-world scenarios. As such tokenization indicates the relative relationship of the two hands, it also supports more effective feature fusion. To this end, we further develop a Spatio-temporal Interaction Reasoning (SIR) module to fuse hand tokens in 4D with attention and decode them into 3D hand meshes and relative temporal movements. The efficacy of our approach is validated on several benchmark datasets. The results on in-the-wild videos and real-world scenarios demonstrate the superior performance of our approach for interactive hand reconstruction. More video results can be found on the project page: https://4dhands.github.io.
Updated: 2024-05-31 10:52:56
标题: 4DHands:使用变换器在四维空间中重建交互式手部
摘要: 在本文中,我们介绍了4DHands,这是一种从单眼输入中恢复交互式手部网格及其相对运动的稳健方法。我们的方法解决了以前方法的两个主要局限性:缺乏处理各种手部图像输入的统一解决方案,以及忽略图像中两只手的位置关系。为了克服这些挑战,我们开发了一种基于转换器的架构,具有新颖的标记化和特征融合策略。具体来说,我们提出了一种关系感知的双手标记化(RAT)方法,将位置关系信息嵌入手部标记中。通过这种方式,我们的网络可以处理单手和双手输入,并明确利用相对手部位置,促进在现实场景中复杂手部交互的重建。由于这种标记化表示了两只手的相对关系,它还支持更有效的特征融合。为此,我们进一步开发了一个时空交互推理(SIR)模块,以带有注意力的方式融合4D中的手部标记,并将它们解码为3D手部网格和相对时间运动。我们的方法的有效性已在多个基准数据集上得到验证。野外视频和现实场景的结果表明,我们的方法在交互式手部重建方面表现优越。更多视频结果可以在项目页面上找到:https://4dhands.github.io。
更新时间: 2024-05-31 10:52:56
领域: cs.CV,cs.AI,cs.GR
OpenTensor: Reproducing Faster Matrix Multiplication Discovering Algorithms
OpenTensor is a reproduction of AlphaTensor, which discovered, via Deep Reinforcement Learning (DRL), a new algorithm that outperforms the state-of-the-art methods for matrix multiplication. While AlphaTensor provides a promising framework for solving scientific problems, it is hard to reproduce due to its many undocumented tricks and the lack of source code. In this paper, we clean up the algorithm pipeline, clarify the technical details, and make some improvements to the training process. Computational results show that OpenTensor can successfully find efficient matrix multiplication algorithms.
Updated: 2024-05-31 10:30:14
标题: OpenTensor:复现更快的矩阵乘法发现算法
摘要: OpenTensor是AlphaTensor的一个复制品,它发现了一种新的算法,通过深度强化学习(DRL)在矩阵乘法方面优于现有的方法。虽然AlphaTensor为解决科学问题提供了一个有希望的框架,但由于技巧繁多且缺乏源代码,很难复制。在本文中,我们简化了算法流程,澄清了技术细节,并对训练过程进行了一些改进。计算结果表明,OpenTensor可以成功找到高效的矩阵乘法算法。
更新时间: 2024-05-31 10:30:14
领域: cs.AI,cs.LG
Trajectory Forecasting through Low-Rank Adaptation of Discrete Latent Codes
Trajectory forecasting is crucial for video surveillance analytics, as it enables the anticipation of future movements for a set of agents, e.g. basketball players engaged in intricate interactions with long-term intentions. Deep generative models offer a natural learning approach for trajectory forecasting, yet they encounter difficulties in achieving an optimal balance between sampling fidelity and diversity. We address this challenge by leveraging Vector Quantized Variational Autoencoders (VQ-VAEs), which utilize a discrete latent space to tackle the issue of posterior collapse. Specifically, we introduce an instance-based codebook that allows tailored latent representations for each example. In a nutshell, the rows of the codebook are dynamically adjusted to reflect contextual information (i.e., past motion patterns extracted from the observed trajectories). In this way, the discretization process gains flexibility, leading to improved reconstructions. Notably, instance-level dynamics are injected into the codebook through low-rank updates, which restrict the customization of the codebook to a lower-dimensional space. The resulting discrete space serves as the basis of the subsequent step, which concerns the training of a diffusion-based predictive model. We show that such a two-fold framework, augmented with instance-level discretization, leads to accurate and diverse forecasts, yielding state-of-the-art performance on three established benchmarks.
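A hedged sketch of the central idea follows, with illustrative names and dimensions rather than the paper's: a shared codebook is adapted per example by a low-rank, context-dependent offset, and latents are quantized against the adapted codebook.

```python
import torch
import torch.nn as nn

class ContextAdaptedCodebook(nn.Module):
    """Instance-level codebook sketch: a shared codebook C (K x d) gets a
    low-rank update driven by a context vector (e.g. an encoding of the
    observed past trajectory)."""

    def __init__(self, k, d, rank, ctx_dim):
        super().__init__()
        self.codebook = nn.Parameter(torch.randn(k, d))
        self.to_u = nn.Linear(ctx_dim, k * rank)
        self.to_v = nn.Linear(ctx_dim, rank * d)
        self.k, self.d, self.rank = k, d, rank

    def forward(self, ctx):
        # Low-rank, context-dependent offset: C' = C + U V, rank << min(K, d).
        u = self.to_u(ctx).view(-1, self.k, self.rank)
        v = self.to_v(ctx).view(-1, self.rank, self.d)
        return self.codebook.unsqueeze(0) + u @ v  # (batch, K, d)

def quantize(z, codebook):
    """Nearest-neighbour assignment of latents z (batch, n, d) against a
    per-example codebook (batch, K, d)."""
    dists = torch.cdist(z, codebook)  # (batch, n, K)
    idx = dists.argmin(dim=-1)        # discrete codes
    zq = torch.gather(codebook, 1,
                      idx.unsqueeze(-1).expand(-1, -1, codebook.shape[-1]))
    return zq, idx

cb = ContextAdaptedCodebook(k=32, d=16, rank=2, ctx_dim=64)
z, ctx = torch.randn(4, 10, 16), torch.randn(4, 64)
zq, codes = quantize(z, cb(ctx))
print(zq.shape, codes.shape)  # torch.Size([4, 10, 16]) torch.Size([4, 10])
```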
Updated: 2024-05-31 10:13:17
标题: 通过离散潜在代码的低秩适应进行轨迹预测
摘要: 轨迹预测对于视频监控分析至关重要,因为它可以预测一组代理人的未来移动,例如与长期意图交互的篮球运动员。深度生成模型为轨迹预测提供了一种自然的学习方法,但它们在实现采样保真度和多样性之间的最佳平衡时遇到困难。我们通过利用向量量化变分自动编码器(VQ-VAEs)来解决这一挑战,利用离散潜在空间来解决后验坍缩问题。具体来说,我们引入了一个基于实例的码书,允许为每个示例定制潜在表示。简而言之,码书的行动态调整以反映上下文信息(即从观察轨迹中提取的过去运动模式)。通过这种方式,离散化过程获得了灵活性,导致改进的重建。值得注意的是,通过低秩更新将实例级动态注入到码书中,限制了码书的定制到较低维度空间。由此产生的离散空间成为后续步骤的基础,这一步骤涉及基于扩散的预测模型的训练。我们表明,这样一个双重框架,辅以实例级离散化,可以产生准确且多样化的预测,在三个已建立的基准测试中取得了最先进的性能。
更新时间: 2024-05-31 10:13:17
领域: cs.CV,cs.AI,cs.LG,cs.RO
Federated Random Forest for Partially Overlapping Clinical Data
In the healthcare sector, awareness of data privacy and the corresponding data protection regulations, together with heterogeneous and non-harmonized data, poses huge challenges to large-scale data analysis. Moreover, clinical data often involve partially overlapping features, as some observations may be missing for various reasons, such as differences in procedures, diagnostic tests, or other recorded patient history information across hospitals or institutes. To address the challenges posed by partially overlapping features and incomplete data in clinical datasets, a comprehensive approach is required. Particularly in the domain of medical data, promising outcomes are achieved by federated random forests whenever features align. However, for most standard algorithms, like random forest, it is essential that all datasets share identical feature sets. Therefore, in this work the concept of the federated random forest is adapted to a setting with partially overlapping features. Moreover, our research assesses the effectiveness of the newly developed federated random forest models for partially overlapping clinical data. For aggregating the federated, globally optimized model, only features available locally at each site can be used. We tackled two issues in federation: (i) the number of involved parties, and (ii) the varying overlap of features. This evaluation was conducted across three clinical datasets. Even in cases where only a subset of features overlaps, the federated random forest model consistently demonstrates superior performance compared to its local counterpart. This holds true across various scenarios, including datasets with imbalanced classes. Consequently, federated random forests for partially overlapping data offer a promising solution to transcend barriers in collaborative research and corporate cooperation.
Updated: 2024-05-31 10:07:24
标题: 部分重叠临床数据的联合随机森林
摘要: 在医疗保健领域,围绕数据隐私和相应的数据保护法规以及异质和不协调的数据的意识,给大规模数据分析带来巨大挑战。此外,临床数据通常涉及部分重叠的特征,因为一些观察结果可能由于各种原因而缺失,例如程序、诊断测试或其他记录的患者历史信息在医院或机构之间的差异。为了解决临床数据集中部分重叠特征和不完整数据带来的挑战,需要采取综合的方法。特别是在医疗数据领域,当特征对齐时,联邦随机森林取得了令人满意的结果。然而,对于大多数标准算法,如随机森林,关键在于所有数据集具有相同的参数。因此,在这项工作中,联邦随机森林的概念被调整为具有部分重叠特征的设置。此外,我们的研究评估了新开发的联邦随机森林模型在部分重叠临床数据中的有效性。为了聚合联邦全局优化模型,每个站点只能使用本地可用的特征。我们解决了联邦中的两个问题:(i)涉及各方的数量,(ii)特征的变化重叠。该评估跨越了三个临床数据集。即使在仅一部分特征重叠的情况下,联邦随机森林模型相对于其本地对应部分表现出更优异的性能。这适用于各种情景,包括类别不平衡的数据集。因此,联邦随机森林用于部分重叠数据为克服合作研究和企业合作中的障碍提供了一种有前途的解决方案。
更新时间: 2024-05-31 10:07:24
领域: cs.LG
Towards Climate Variable Prediction with Conditioned Spatio-Temporal Normalizing Flows
This study investigates how conditional normalizing flows can be applied to remote sensing data products in climate science for spatio-temporal prediction. The method is chosen for its desirable properties, such as exact likelihood computation, predictive uncertainty estimation, and efficient inference and sampling, which facilitate faster exploration of climate scenarios. Experimental findings reveal that the conditioned spatio-temporal flow surpasses both deterministic and stochastic baselines in prolonged rollout scenarios, exhibiting stable extrapolation beyond the training time horizon for extended rollout durations. These findings contribute valuable insights to the field of spatio-temporal modeling, with potential applications spanning diverse scientific disciplines.
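For concreteness, the sketch below shows a single conditional affine coupling layer, the standard building block such a flow stacks. The conditioning vector stands in for past frames or covariates; the architecture details are illustrative only.

```python
import torch
import torch.nn as nn

class ConditionalAffineCoupling(nn.Module):
    """One conditional affine coupling layer: half the dimensions are
    transformed with a scale/shift predicted from the other half plus the
    conditioning vector, keeping the transform exactly invertible."""

    def __init__(self, dim, cond_dim, hidden=128):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.half + cond_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (dim - self.half)))

    def forward(self, x, cond):
        x1, x2 = x[:, :self.half], x[:, self.half:]
        s, t = self.net(torch.cat([x1, cond], dim=-1)).chunk(2, dim=-1)
        s = torch.tanh(s)            # keep scales well-behaved
        y2 = x2 * s.exp() + t
        log_det = s.sum(dim=-1)      # exact log-likelihood contribution
        return torch.cat([x1, y2], dim=-1), log_det

    def inverse(self, y, cond):
        y1, y2 = y[:, :self.half], y[:, self.half:]
        s, t = self.net(torch.cat([y1, cond], dim=-1)).chunk(2, dim=-1)
        s = torch.tanh(s)
        return torch.cat([y1, (y2 - t) * (-s).exp()], dim=-1)

layer = ConditionalAffineCoupling(dim=8, cond_dim=4)
x, c = torch.randn(5, 8), torch.randn(5, 4)
z, log_det = layer(x, c)
assert torch.allclose(layer.inverse(z, c), x, atol=1e-5)  # invertibility
```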
Updated: 2024-05-31 09:58:08
标题: 朝向具有条件空间-时间归一化流的气候变量预测
摘要: 本研究探讨了如何将条件正规化流应用于气候科学中的遥感数据产品,以进行时空预测。选择这种方法是因为它具有精确的似然计算、预测不确定性估计和有效的推理和采样等理想属性,有助于更快地探索气候场景。实验结果表明,条件时空流在长时间滚动情景中优于确定性和随机基线。它展示了在延长的滚动持续时间内稳定的外推能力超越训练时间范围。这些发现为时空建模领域提供了宝贵的见解,潜在应用涵盖了各种科学学科。
更新时间: 2024-05-31 09:58:08
领域: cs.LG,cs.AI
Maximum Temperature Prediction Using Remote Sensing Data Via Convolutional Neural Network
Urban heat islands, defined as specific zones exhibiting substantially higher temperatures than their immediate environs, pose significant threats to environmental sustainability and public health. This study introduces a novel machine-learning model that amalgamates data from the Sentinel-3 satellite, meteorological predictions, and additional remote sensing inputs. The primary aim is to generate detailed spatiotemporal maps that forecast the peak temperatures within a 24-hour period in Turin. Experimental results validate the model's proficiency in predicting temperature patterns, achieving a Mean Absolute Error (MAE) of 2.09 degrees Celsius for the year 2023 at a resolution of 20 meters per pixel, thereby enriching our knowledge of urban climatic behavior. This investigation enhances the understanding of urban microclimates, emphasizing the importance of cross-disciplinary data integration, and laying the groundwork for informed policy-making aimed at alleviating the negative impacts of extreme urban temperatures.
Updated: 2024-05-31 09:39:41
标题: 利用卷积神经网络和遥感数据进行最高温度预测
摘要: 城市热岛被定义为具有明显高于其周围环境的温度的特定区域,对环境可持续性和公共健康构成重大威胁。本研究引入了一种新颖的机器学习模型,将来自Sentinel-3卫星、气象预测和其他遥感输入的数据融合在一起。其主要目标是生成详细的时空地图,预测都灵市24小时内的最高温度。实验结果验证了该模型在预测温度模式方面的熟练程度,2023年的平均绝对误差(MAE)为2.09摄氏度,分辨率为每像素20米,从而丰富了我们对城市气候行为的了解。这项研究提升了对城市微气候的理解,强调跨学科数据整合的重要性,为制定旨在缓解极端城市温度负面影响的明智政策奠定了基础。
更新时间: 2024-05-31 09:39:41
领域: cs.AI,cs.LG,I.2.10; G.3
Visual Attention Analysis in Online Learning
In this paper, we present an approach in the Multimodal Learning Analytics field. Within this approach, we have developed a tool to visualize and analyze eye movement data collected during learning sessions in online courses. The tool is named VAAD (an acronym for Visual Attention Analysis Dashboard). These eye movement data have been gathered using an eye-tracker and subsequently processed and visualized for interpretation. The purpose of the tool is to conduct a descriptive analysis of the data by facilitating its visualization, enabling the identification of differences and learning patterns among various learner populations. Additionally, it integrates a predictive module capable of anticipating learner activities during a learning session. Consequently, VAAD holds the potential to offer valuable insights into online learning behaviors from both descriptive and predictive perspectives.
Updated: 2024-05-31 09:35:36
标题: 在线学习中的视觉注意力分析
摘要: 在这篇论文中,我们提出了一种多模态学习分析领域的方法。在这个方法中,我们开发了一种工具,用于可视化和分析在线课程学习过程中收集到的眼动数据。该工具被命名为VAAD(Visual Attention Analysis Dashboard的首字母缩写)。这些眼动数据是使用眼动追踪器收集的,随后经过处理和可视化以进行解释。该工具的目的是通过促进其可视化来进行数据的描述性分析,从而能够识别不同学习人群之间的差异和学习模式。此外,它还集成了一个能够预测学习过程中学习者活动的预测模块。因此,VAAD有潜力从描述性和预测性的角度为在线学习行为提供有价值的见解。
更新时间: 2024-05-31 09:35:36
领域: cs.CV,cs.HC,cs.LG
GANcrop: A Contrastive Defense Against Backdoor Attacks in Federated Learning
With heightened awareness of data privacy protection, Federated Learning (FL) has attracted widespread attention as a privacy-preserving distributed machine learning method. However, the distributed nature of federated learning also provides opportunities for backdoor attacks, where attackers can guide the model to produce incorrect predictions without affecting the global model training process. This paper introduces a novel defense mechanism against backdoor attacks in federated learning, named GANcrop. This approach leverages contrastive learning to deeply explore the disparities between malicious and benign models for attack identification, followed by the utilization of Generative Adversarial Networks (GAN) to recover backdoor triggers and implement targeted mitigation strategies. Experimental findings demonstrate that GANcrop effectively safeguards against backdoor attacks, particularly in non-IID scenarios, while maintaining satisfactory model accuracy, showcasing its remarkable defensive efficacy and practical utility.
Updated: 2024-05-31 09:33:16
标题: GANcrop: 联邦学习中对抗后门攻击的对比防御
摘要: 随着对数据隐私保护意识的提高,联邦学习(FL)作为一种保护隐私的分布式机器学习方法引起了广泛关注。然而,联邦学习的分布式特性也为后门攻击提供了机会,攻击者可以引导模型产生不正确的预测,而不影响全局模型训练过程。 本文介绍了一种新颖的防御机制,用于防范联邦学习中的后门攻击,名为GANcrop。这种方法利用对比学习深入探索恶意和良性模型之间的差异,用生成对抗网络(GAN)恢复后门触发器并实施有针对性的缓解策略。实验结果表明,GANcrop有效地防范了后门攻击,特别是在非独立同分布的情况下,同时保持了令人满意的模型准确性,展示了其卓越的防御效力和实用性。
更新时间: 2024-05-31 09:33:16
领域: cs.CR,cs.AI,cs.DC
GI-NAS: Boosting Gradient Inversion Attacks through Adaptive Neural Architecture Search
Gradient Inversion Attacks invert the transmitted gradients in Federated Learning (FL) systems to reconstruct the sensitive data of local clients and have raised considerable privacy concerns. A majority of gradient inversion methods rely heavily on explicit prior knowledge (e.g., a well pre-trained generative model), which is often unavailable in realistic scenarios. To alleviate this issue, researchers have proposed to leverage the implicit prior knowledge of an over-parameterized network. However, they only utilize a fixed neural architecture for all the attack settings. This would hinder the adaptive use of implicit architectural priors and consequently limit the generalizability. In this paper, we further exploit such implicit prior knowledge by proposing Gradient Inversion via Neural Architecture Search (GI-NAS), which adaptively searches the network and captures the implicit priors behind neural architectures. Extensive experiments verify that our proposed GI-NAS can achieve superior attack performance compared to state-of-the-art gradient inversion methods, even under more practical settings with high-resolution images, large-sized batches, and advanced defense strategies.
Updated: 2024-05-31 09:29:43
标题: GI-NAS:通过自适应神经架构搜索增强梯度反转攻击
摘要: Gradient Inversion Attacks通过反转联邦学习系统中传输的梯度来重建本地客户端的敏感数据,并引起了相当大的隐私关注。大多数梯度反转方法严重依赖于明确的先验知识(例如,良好预训练的生成模型),这在现实场景中通常是不可用的。为了减轻这个问题,研究人员提出利用过参数化网络的隐式先验知识。然而,他们只针对所有攻击设置使用固定的神经架构。这将阻碍隐式架构先验的自适应使用,从而限制泛化能力。在本文中,我们进一步利用这种隐式先验知识,提出了通过神经架构搜索的梯度反转(GI-NAS),它自适应地搜索网络并捕捉神经架构背后的隐式先验。大量实验验证了我们提出的GI-NAS相比最先进的梯度反转方法可以在更实际的设置中实现更优越的攻击性能,即使在高分辨率图像、大批量和先进的防御策略下也是如此。
更新时间: 2024-05-31 09:29:43
领域: cs.AI,cs.CV
Learning on Large Graphs using Intersecting Communities
Message Passing Neural Networks (MPNNs) are a staple of graph machine learning. MPNNs iteratively update each node's representation in an input graph by aggregating messages from the node's neighbors, which necessitates a memory complexity of the order of the number of graph edges. This complexity might quickly become prohibitive for large graphs that are not very sparse. In this paper, we propose a novel approach to alleviate this problem by approximating the input graph as an intersecting community graph (ICG) -- a combination of intersecting cliques. The key insight is that the number of communities required to approximate a graph does not depend on the graph size. We develop a new constructive version of the Weak Graph Regularity Lemma to efficiently construct an approximating ICG for any input graph. We then devise an efficient graph learning algorithm operating directly on the ICG in linear memory and time with respect to the number of nodes (rather than edges). This offers a new and fundamentally different pipeline for learning on very large non-sparse graphs, whose applicability is demonstrated empirically on node classification tasks and spatio-temporal data processing.
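The memory argument can be seen in a few lines: with a soft community-membership matrix U of shape n x k, aggregation against the implied dense graph is computed without ever materialising an n x n matrix or an edge list. A toy sketch, with illustrative shapes:

```python
import torch

# Intersecting-clique adjacency A ~ U diag(r) U^T is never formed explicitly.
# Aggregation A @ X is computed as U @ (diag(r) @ (U^T @ X)) in O(nk) memory
# instead of O(n^2) (or O(|E|) for edge-based message passing).

n, k, d = 100_000, 16, 32           # many nodes, few communities
U = torch.rand(n, k)                # soft community affiliations (illustrative)
r = torch.rand(k)                   # per-community edge weights
X = torch.randn(n, d)               # node features

agg = U @ (r.unsqueeze(1) * (U.T @ X))  # (n, d); the only large tensors are n x k
print(agg.shape)                        # torch.Size([100000, 32])
```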
Updated: 2024-05-31 09:26:26
标题: 在大图中使用交叉社区进行学习
摘要: 消息传递神经网络(MPNNs)是图机器学习的重要工具。MPNNs通过从节点的邻居聚合消息来迭代更新输入图中每个节点的表示,这需要一个与图边数量级相当的内存复杂度。对于大型图来说,这种复杂性可能很快变得不可接受,前提是它们不是非常稀疏的。在本文中,我们提出了一种新颖的方法来缓解这个问题,即将输入图近似为一个相交社区图(ICG)-- 一个相交团体的组合。关键的洞察是用于近似图所需的社区数量并不取决于图的大小。我们开发了一个新的弱图规则引理的构造版本,以有效地构建一个近似的ICG用于任何输入图。然后我们设计了一种有效的图学习算法,直接在ICG上操作,其内存和时间复杂度与节点数量(而不是边)成线性关系。这为在非稀疏大型图上学习提供了一种新的基本不同的流程,其适用性在节点分类任务和时空数据处理上经验性地得到了证明。
更新时间: 2024-05-31 09:26:26
领域: cs.LG,cs.SI,stat.ML
ContextGS: Compact 3D Gaussian Splatting with Anchor Level Context Model
Recently, 3D Gaussian Splatting (3DGS) has become a promising framework for novel view synthesis, offering fast rendering speeds and high fidelity. However, the large number of Gaussians and their associated attributes require effective compression techniques. Existing methods primarily compress neural Gaussians individually and independently, i.e., coding all the neural Gaussians at the same time, with little design for their interactions and spatial dependence. Inspired by the effectiveness of the context model in image compression, we propose the first autoregressive model at the anchor level for 3DGS compression in this work. We divide anchors into different levels and the anchors that are not coded yet can be predicted based on the already coded ones in all the coarser levels, leading to more accurate modeling and higher coding efficiency. To further improve the efficiency of entropy coding, e.g., to code the coarsest level with no already coded anchors, we propose to introduce a low-dimensional quantized feature as the hyperprior for each anchor, which can be effectively compressed. Our work pioneers the context model in the anchor level for 3DGS representation, yielding an impressive size reduction of over 100 times compared to vanilla 3DGS and 15 times compared to the most recent state-of-the-art work Scaffold-GS, while achieving comparable or even higher rendering quality.
Updated: 2024-05-31 09:23:39
标题: ContextGS:具有锚点级上下文模型的紧凑3D高斯喷溅
摘要: 最近,三维高斯喷洒(3DGS)已成为新视角合成的一个有前途的框架,提供快速渲染速度和高保真度。然而,大量高斯函数及其相关属性需要有效的压缩技术。现有方法主要是独立地压缩神经高斯函数,即同时对所有神经高斯函数进行编码,对它们的交互和空间依赖性设计较少。受图像压缩中上下文模型的有效性启发,我们在这项工作中提出了3DGS压缩的第一个锚定自回归模型。我们将锚点分为不同级别,尚未编码的锚点可以基于所有更粗级别上已编码的锚点进行预测,从而实现更准确的建模和更高的编码效率。为了进一步提高熵编码的效率,例如对没有已编码锚点的最粗级别进行编码,我们建议为每个锚点引入低维度量化特征作为超先验,这可以得到有效的压缩。我们的工作在3DGS表示的锚定级别引入了上下文模型,相较于原始3DGS,大小减小了100多倍,比最新的最先进工作Scaffold-GS减小了15倍,同时实现了可比较甚至更高的渲染质量。
更新时间: 2024-05-31 09:23:39
领域: cs.CV,cs.AI
Climate Variable Downscaling with Conditional Normalizing Flows
Predictions of global climate models typically operate on coarse spatial scales due to the large computational costs of climate simulations. This has led to a considerable interest in methods for statistical downscaling, a similar process to super-resolution in the computer vision context, to provide more local and regional climate information. In this work, we apply conditional normalizing flows to the task of climate variable downscaling. We showcase its successful performance on an ERA5 water content dataset for different upsampling factors. Additionally, we show that the method allows us to assess the predictive uncertainty in terms of standard deviation from the fitted conditional distribution mean.
Updated: 2024-05-31 09:20:33
标题: 气候变量通过条件正态流降尺度
摘要: 全球气候模型的预测通常在粗略的空间尺度上运行,这是由于气候模拟的巨大计算成本。这导致人们对统计降尺度方法产生了极大兴趣,这类方法类似于计算机视觉背景下的超分辨率,可以提供更多的本地和区域气候信息。在这项工作中,我们将条件正态流应用于气候变量降尺度的任务。我们展示了它在不同上采样因子下对ERA5水含量数据集的成功表现。此外,我们展示了该方法使我们能够根据拟合的条件分布均值来评估预测不确定性的标准差。
更新时间: 2024-05-31 09:20:33
领域: cs.AI,cs.CV,physics.ao-ph
Female mosquito detection by means of AI techniques inside release containers in the context of a Sterile Insect Technique program
The Sterile Insect Technique (SIT) is a biological pest control technique based on the release into the environment of sterile males of the insect species whose population is to be controlled. The entire SIT process involves mass-rearing within a biofactory, sorting of the specimens by sex, sterilization, and subsequent release of the sterile males into the environment. The release of female specimens is avoided because, unlike males, females bite, with the consequent risk of disease transmission. In the case of Aedes mosquito biofactories for SIT, the key point of the whole process is sex separation. This process is nowadays performed by a combination of mechanical devices and AI-based vision systems. However, there is still a possibility of false negatives, so a last stage of verification is necessary before release into the environment. It is known that the sound produced by the wingbeat of adult male mosquitoes is different from that produced by females, so this feature can be used to detect the presence of females in containers prior to environmental release. This paper presents a study on the detection of females in Aedes mosquito release vessels for SIT programs. The containers consist of a tubular PVC design, 8.8 cm in diameter and 12.5 cm in height. The containers were placed in an experimental setup that allowed recording the sound of mosquito flight inside them. Each container was filled with 250 specimens, considering the cases of (i) only male mosquitoes, (ii) only female mosquitoes, and (iii) 75% males and 25% females. Case (i) was used for training and testing, whereas cases (ii) and (iii) were used only for testing. Two algorithms were implemented for the detection of female mosquitoes: an unsupervised outlier detection algorithm (iForest) and a one-class SVM trained on male-only recordings.
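Both detectors are standard scikit-learn components. The hedged sketch below mimics the setup on synthetic stand-in features (a real system would use spectral features of the wingbeat sound): train on male-only data, then flag outliers as possible females.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Stand-in features for flight-sound recordings; male and female Aedes
# wingbeats differ in fundamental frequency, which is what makes this
# separable. Here "females" are simulated as a shifted distribution.
male = rng.normal(loc=0.0, scale=1.0, size=(500, 8))     # training: males only
mixed = np.vstack([rng.normal(0.0, 1.0, size=(75, 8)),   # males
                   rng.normal(2.5, 1.0, size=(25, 8))])  # "females"

# (i) unsupervised outlier detector fit on male-only data
iso = IsolationForest(random_state=0).fit(male)
# (ii) one-class SVM trained on male-only recordings
ocsvm = OneClassSVM(nu=0.05, gamma="scale").fit(male)

for name, model in [("iForest", iso), ("OC-SVM", ocsvm)]:
    pred = model.predict(mixed)  # +1 = inlier (male), -1 = outlier
    print(name, "flagged", int((pred == -1).sum()), "of 100 as possible females")
```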
Updated: 2024-05-31 09:20:24
标题: 在无性技术计划背景下,通过人工智能技术在释放容器内检测雌性蚊子
摘要: 无性昆虫技术(SIT)是一种基于释放被控制昆虫种群的无性雄性的生物害虫控制技术。整个SIT过程包括在生物工厂内的大规模繁殖、按性别分类标本、灭菌以及随后释放无性雄性到环境中。避免释放雌性标本的原因是因为与雄性不同,雌性会叮咬,从而导致疾病传播的风险。在用于SIT的伊蚊生物工厂中,整个过程的关键点是性别分离。该过程现在通过机械设备和基于人工智能视觉系统的组合来执行。然而,仍存在虚阴性的可能性,因此在释放到环境中之前需要进行最后的验证阶段。已知成年雄蚊振翅产生的声音与雌性产生的声音不同,因此可以利用这一特征来检测容器中女性的存在。本文提出了一项用于SIT计划中检测伊蚊释放容器中雌性的研究。所使用的容器是直径为8.8厘米、高度为12.5厘米的PVC管状设计。这些容器被放置在一个实验装置中,允许记录其中蚊子飞行的声音。每个容器中装有250个标本,考虑了(i)仅有雄性蚊子、(ii)仅有雌性蚊子和(iii)75%雄性和25%雌性的情况。情况(i)用于训练和测试,而情况(ii)和(iii)仅用于测试。实施了两种算法来检测雌性蚊子:一种无监督异常检测算法(iForest)和一种使用仅有雄性录音训练的一类支持向量机。
更新时间: 2024-05-31 09:20:24
领域: cs.SD,cs.LG,eess.AS
Popularity-Aware Alignment and Contrast for Mitigating Popularity Bias
Collaborative Filtering (CF) typically suffers from the significant challenge of popularity bias due to the uneven distribution of items in real-world datasets. This bias leads to a significant accuracy gap between popular and unpopular items. It not only hinders accurate user preference understanding but also exacerbates the Matthew effect in recommendation systems. To alleviate popularity bias, existing efforts focus on emphasizing unpopular items or decoupling item representations from their popularity. Despite their effectiveness, existing works still face two persistent challenges: (1) how to extract common supervision signals from popular items to improve the unpopular item representations, and (2) how to alleviate the representation separation caused by popularity bias. In this work, we conduct an empirical analysis of popularity bias and propose Popularity-Aware Alignment and Contrast (PAAC) to address these two challenges. Specifically, we use the common supervisory signals modeled in popular item representations and propose a novel popularity-aware supervised alignment module to learn unpopular item representations. Additionally, we suggest re-weighting the contrastive learning loss to mitigate the representation separation from a popularity-centric perspective. Finally, we validate the effectiveness and rationale of PAAC in mitigating popularity bias through extensive experiments on three real-world datasets. Our code is available at https://github.com/miaomiao-cai2/KDD2024-PAAC.
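To illustrate the re-weighting idea, here is a hedged sketch of a popularity-weighted InfoNCE loss. The inverse-popularity weighting used here is an illustrative choice, not necessarily the paper's exact scheme.

```python
import torch
import torch.nn.functional as F

def popularity_weighted_infonce(z1, z2, popularity, tau=0.2, gamma=1.0):
    """Contrastive InfoNCE loss whose per-item terms are down-weighted for
    popular items, so the contrast does not push unpopular-item
    representations into a separate region of the embedding space."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.T / tau                      # (n, n) similarity matrix
    labels = torch.arange(z1.shape[0])            # positives on the diagonal
    per_item = F.cross_entropy(logits, labels, reduction="none")
    weights = (1.0 / popularity.float().clamp(min=1)) ** gamma
    weights = weights / weights.sum() * len(weights)  # preserve the loss scale
    return (weights * per_item).mean()

z1, z2 = torch.randn(64, 32), torch.randn(64, 32)  # two views of 64 items
pop = torch.randint(1, 1000, (64,))                # item interaction counts
print(popularity_weighted_infonce(z1, z2, pop))
```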
Updated: 2024-05-31 09:14:48
标题: 考虑流行度的对齐和对比以减轻流行度偏见
摘要: 协同过滤(CF)通常受到流行度偏见的显著挑战,这是由于真实世界数据集中物品分布不均匀所导致的。这种偏见导致流行和不受欢迎物品之间存在显著的准确度差距。它不仅阻碍了准确的用户偏好理解,还加剧了推荐系统中的马太效应。为了缓解流行度偏见,现有努力集中在强调不受欢迎的物品或分离物品表示与其流行度之间的相关性。尽管有效,现有工作仍然面临两个持久的挑战:(1)如何从流行物品中提取共同的监督信号以改进不受欢迎的物品表示,以及(2)如何减轻由流行度偏见引起的表示分离。在这项工作中,我们对流行度偏见进行了实证分析,并提出了一种名为Popularity-Aware Alignment and Contrast(PAAC)的方法来解决这两个挑战。具体地,我们使用在流行物品表示中建模的共同监督信号,并提出了一种新颖的流行度感知监督对齐模块来学习不受欢迎的物品表示。此外,我们建议重新加权对比学习损失,以从流行度中心的角度减轻表示分离。最后,我们通过对三个真实世界数据集进行大量实验证实了PAAC在减轻流行度偏见方面的有效性和理论基础。我们的代码可在https://github.com/miaomiao-cai2/KDD2024-PAAC上找到。
更新时间: 2024-05-31 09:14:48
领域: cs.IR,cs.AI
Cyclic image generation using chaotic dynamics
Successive image generation using cyclic transformations is demonstrated by extending the CycleGAN model to transform images among three different categories. Repeated application of the trained generators produces sequences of images that transition among the different categories. The generated image sequences occupy a more limited region of the image space compared with the original training dataset. Quantitative evaluation using precision and recall metrics indicates that the generated images have high quality but reduced diversity relative to the training dataset. Such successive generation processes are characterized as chaotic dynamics in terms of dynamical system theory. Positive Lyapunov exponents estimated from the generated trajectories confirm the presence of chaotic dynamics, with the Lyapunov dimension of the attractor found to be comparable to the intrinsic dimension of the training data manifold. The results suggest that chaotic dynamics in the image space defined by the deep generative model contribute to the diversity of the generated images, constituting a novel approach for multi-class image generation. This model can be interpreted as an extension of classical associative memory to perform hetero-association among image categories.
Updated: 2024-05-31 09:14:36
标题: 使用混沌动力学生成循环图像
摘要: 通过将CycleGAN模型扩展到三个不同类别的图像之间进行图像生成,演示了连续图像生成使用循环变换的过程。经过训练的生成器的重复应用产生了在不同类别之间过渡的图像序列。与原始训练数据集相比,生成的图像序列占据了图像空间的更有限区域。使用精确度和召回率度量进行定量评估表明,生成的图像具有高质量但相对于训练数据集减少了多样性。这种连续生成过程在动力系统理论中被描述为混沌动力学。从生成的轨迹中估计的正的李雅普诺夫指数确认了混沌动力学的存在,吸引子的Lyapunov维度发现与训练数据流形的内在维度相当。结果表明,由深度生成模型定义的图像空间中的混沌动力学有助于生成图像的多样性,构成了多类图像生成的新方法。这个模型可以被解释为将经典联想记忆扩展到执行图像类别之间的异质关联。
更新时间: 2024-05-31 09:14:36
领域: cs.CV,cs.LG,nlin.CD
Fast Evaluation of S-boxes with Garbled Circuits
Garbling schemes are vital primitives for privacy-preserving protocols and secure two-party computation. This paper presents a projective garbling scheme that assigns $2^n$ values to wires in a circuit comprising XOR and unary projection gates. A generalization of FreeXOR allows the XOR of wires with $2^n$ values to be very efficient. We then analyze the performance of our scheme by evaluating substitution-permutation ciphers. Using our proposal, we demonstrate high-speed evaluation of the ciphers at a moderately increased cost in garbling and bandwidth. Theoretical analysis suggests that for evaluating the nine examined ciphers, one can expect a 4- to 70-fold improvement in evaluation performance with, at most, a 4-fold increase in garbling cost and, at most, an 8-fold increase in communication cost compared to the Half-Gates (Zahur, Rosulek and Evans; Eurocrypt'15) and ThreeHalves (Rosulek and Roy; Crypto'21) garbling schemes. In an offline/online setting, such as secure function evaluation as a service, the circuit garbling and communication to the evaluator can proceed in the offline phase. Thus, our scheme offers a fast online phase. Furthermore, we present efficient Boolean circuits for the S-boxes of the TWINE and Midori64 ciphers. To our knowledge, our formulas give the smallest number of AND gates for the S-boxes of these two ciphers.
Updated: 2024-05-31 09:07:44
标题: S-boxes带有混淆电路的快速评估
摘要: Garbling schemes are crucial components for protocols that aim to protect privacy and enable secure two-party computation. This paper introduces a projective garbling scheme that assigns $2^n$ values to wires in a circuit consisting of XOR and unary projection gates. By extending FreeXOR, the XOR operation on wires with $2^n$ values becomes highly efficient. The performance of this scheme is evaluated through substitution-permutation ciphers. With our proposal, the ciphers can be evaluated at high speed with a moderate increase in garbling and bandwidth costs. Theoretical analysis indicates that for the nine examined ciphers, performance improvement in evaluation can range from 4 to 70 times, with a maximum 4-fold increase in garbling cost and 8-fold increase in communication cost compared to the Half-Gates and ThreeHalves garbling schemes. In scenarios like secure function evaluation as a service, where circuit garbling and communication with the evaluator can be done offline, our scheme allows for a fast online phase. Additionally, we present efficient Boolean circuits for the S-boxes of TWINE and Midori64 ciphers, with our formulas requiring the fewest AND gates for these ciphers to our knowledge.
更新时间: 2024-05-31 09:07:44
领域: cs.CR
FinGen: A Dataset for Argument Generation in Finance
Thinking about the future is one of the important activities people engage in daily. Futurists also devote a great deal of effort to figuring out possible scenarios for the future. We argue that the exploration of this direction is still at an early stage in NLP research. To this end, we propose three argument generation tasks in the financial application scenario. Our experimental results show that these tasks remain big challenges for representative generation models. Based on our empirical results, we further point out several unresolved issues and challenges in this research direction.
Updated: 2024-05-31 09:00:43
标题: FinGen:金融领域论点生成的数据集
摘要: 思考未来是人们日常生活中重要的活动之一。未来学家们也付出了很多努力来探索未来可能的场景。我们认为,在自然语言处理研究中,对这个方向的探索仍处于早期阶段。为此,我们提出了三个在金融应用场景中的论证生成任务。我们的实验结果显示,这些任务对于代表性生成模型仍然是巨大的挑战。根据我们的实证结果,我们进一步指出了这一研究方向中的几个尚未解决的问题和挑战。
更新时间: 2024-05-31 09:00:43
领域: cs.CL,cs.AI
ADESSE: Advice Explanations in Complex Repeated Decision-Making Environments
In the evolving landscape of human-centered AI, fostering a synergistic relationship between humans and AI agents in decision-making processes stands as a paramount challenge. This work considers a problem setup where an intelligent agent comprising a neural network-based prediction component and a deep reinforcement learning component provides advice to a human decision-maker in complex repeated decision-making environments. Whether the human decision-maker would follow the agent's advice depends on their beliefs and trust in the agent and on their understanding of the advice itself. To this end, we developed an approach named ADESSE to generate explanations about the adviser agent to improve human trust and decision-making. Computational experiments on a range of environments with varying model sizes demonstrate the applicability and scalability of ADESSE. Furthermore, an interactive game-based user study shows that participants were significantly more satisfied, achieved a higher reward in the game, and took less time to select an action when presented with explanations generated by ADESSE. These findings illuminate the critical role of tailored, human-centered explanations in AI-assisted decision-making.
Updated: 2024-05-31 08:59:20
标题: ADESSE:在复杂重复决策环境中的建议解释
摘要: 在人工智能以人为中心的不断发展的背景下,促进人类与人工智能代理在决策过程中形成协同关系是一个重要挑战。本文考虑了一个问题设置,智能代理包括基于神经网络的预测组件和深度强化学习组件,为复杂的重复决策环境中的人类决策者提供建议。人类决策者是否会听从代理的建议取决于他们对代理的信念和信任以及对建议本身的理解。为此,我们开发了一种名为ADESSE的方法,用于生成关于顾问代理的解释,以提高人类的信任和决策能力。在一系列具有不同模型大小的环境上进行的计算实验展示了ADESSE的适用性和可扩展性。此外,一个基于互动游戏的用户研究显示,当被提供ADESSE生成的解释时,参与者在游戏中获得了更高的奖励,选取行动所需的时间更少,并且显著更加满意。这些发现阐明了定制的以人为中心的解释在人工智能辅助决策中的关键作用。
更新时间: 2024-05-31 08:59:20
领域: cs.AI
Improved Out-of-Scope Intent Classification with Dual Encoding and Threshold-based Re-Classification
Detecting out-of-scope user utterances is essential for task-oriented dialogues and intent classification. Current methodologies face difficulties with the unpredictable distribution of outliers and often rely on assumptions about data distributions. We present the Dual Encoder for Threshold-Based Re-Classification (DETER) to address these challenges. This end-to-end framework efficiently detects out-of-scope intents without requiring assumptions on data distributions or additional post-processing steps. The core of DETER utilizes dual text encoders, the Universal Sentence Encoder (USE) and the Transformer-based Denoising AutoEncoder (TSDAE), to generate user utterance embeddings, which are classified through a branched neural architecture. Further, DETER generates synthetic outliers using self-supervision and incorporates out-of-scope phrases from open-domain datasets. This approach ensures a comprehensive training set for out-of-scope detection. Additionally, a threshold-based re-classification mechanism refines the model's initial predictions. Evaluations on the CLINC-150, Stackoverflow, and Banking77 datasets demonstrate DETER's efficacy. Our model outperforms previous benchmarks, improving the F1 score by up to 13% and 5% for known and unknown intents on CLINC-150 and Stackoverflow, and by 16% for known and 24% for unknown intents on Banking77. The source code has been released at https://github.com/Hossam-Mohammed-tech/Intent_Classification_OOS.
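The final re-classification step admits a very simple illustration: accept the predicted intent only above a confidence threshold, otherwise relabel as out-of-scope. A minimal sketch, with an illustrative threshold:

```python
import numpy as np

def classify_with_threshold(probs, threshold=0.7):
    """Keep the top intent only if the model is confident enough; otherwise
    re-label the utterance as out-of-scope (OOS). The threshold here is
    illustrative; in practice it would be tuned on a validation set."""
    top = probs.argmax(axis=-1)
    confident = probs.max(axis=-1) >= threshold
    return np.where(confident, top, -1)  # -1 denotes out-of-scope

# Softmax outputs over 3 known intents for 4 utterances.
probs = np.array([[0.90, 0.05, 0.05],
                  [0.40, 0.35, 0.25],   # low confidence -> OOS
                  [0.10, 0.80, 0.10],
                  [0.34, 0.33, 0.33]])  # near-uniform -> OOS
print(classify_with_threshold(probs))   # [ 0 -1  1 -1]
```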
Updated: 2024-05-31 08:54:24
标题: 使用双重编码和基于阈值的重新分类改进超范围意图分类
摘要: 检测范围外用户话语对于面向任务的对话和意图分类至关重要。当前的方法面临着无法预测的异常值分布的困难,通常依赖于关于数据分布的假设。我们提出了基于阈值的双编码器再分类(DETER)来解决这些挑战。这一端到端的框架能够有效地检测出范围外的意图,而无需对数据分布或额外的后处理步骤做出假设。DETER的核心利用了双文本编码器,通用句子编码器(USE)和基于Transformer的去噪自动编码器(TSDAE),生成用户话语嵌入,通过分支神经结构进行分类。此外,DETER利用自监督生成合成的异常值,并结合开放领域数据集中的范围外短语。这种方法确保了范围外检测的全面训练集。此外,基于阈值的再分类机制调整模型的初始预测。在CLINC-150、Stackoverflow和Banking77数据集上的评估显示了DETER的有效性。我们的模型超越了先前的基准,CLINC-150和Stackoverflow上的已知和未知意图的F1分数增加了高达13%和5%,在Banking77上已知意图增加了16%,未知意图增加了24%。源代码已发布在https://github.com/Hossam-Mohammed-tech/Intent_Classification_OOS。
更新时间: 2024-05-31 08:54:24
领域: cs.CL,cs.AI,cs.LG
Unveiling the Lexical Sensitivity of LLMs: Combinatorial Optimization for Prompt Enhancement
Large language models (LLMs) demonstrate exceptional instruction-following ability to complete various downstream tasks. Although this impressive ability makes LLMs flexible task solvers, their performance in solving tasks also heavily relies on instructions. In this paper, we reveal that LLMs are over-sensitive to lexical variations in task instructions, even when the variations are imperceptible to humans. When models are provided with neighborhood instructions, which are closely situated in the latent representation space and differ by only one semantically similar word, their performance on downstream tasks can be vastly different. Following this property, we propose a black-box Combinatorial Optimization framework for Prompt Lexical Enhancement (COPLE). COPLE performs iterative lexical optimization according to the feedback from a batch of proxy tasks, using a search strategy related to word influence. Experiments show that even widely-used human-crafted prompts for current benchmarks suffer from the lexical sensitivity of models, and that COPLE recovers the declined model ability in both instruction following and solving downstream tasks.
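A greatly simplified sketch of the search loop follows: greedily swap one word at a time for a semantically similar candidate and keep the swap if proxy-task performance improves. The scoring function here is a toy stand-in for evaluating the LLM on proxy tasks, and the real method additionally orders the search by word influence.

```python
def cople_style_search(prompt, substitutes, score_fn, n_rounds=2):
    """Greedy combinatorial lexical search over a prompt. `substitutes`
    maps a word to semantically similar candidates; `score_fn` stands in
    for proxy-task performance of the LLM under a given prompt."""
    words = prompt.split()
    best_score = score_fn(" ".join(words))
    for _ in range(n_rounds):
        for i in range(len(words)):
            for cand in substitutes.get(words[i], []):
                trial = words[:i] + [cand] + words[i + 1:]
                s = score_fn(" ".join(trial))
                if s > best_score:  # keep the swap only if it helps
                    words, best_score = trial, s
    return " ".join(words), best_score

# Toy example: the "proxy task" simply prefers certain word choices.
subs = {"Answer": ["Solve", "Resolve"], "question": ["problem", "query"]}
score = lambda p: p.count("Solve") + p.count("query")
print(cople_style_search("Answer the question briefly", subs, score))
# ('Solve the query briefly', 2)
```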
Updated: 2024-05-31 08:53:59
标题: 揭示LLMs的词汇敏感性:为提示增强进行组合优化
摘要: 大型语言模型(LLMs)展示了出色的指示遵循能力,能够完成各种下游任务。尽管这种令人印象深刻的能力使LLMs成为灵活的任务解决者,但它们解决任务的表现也严重依赖于指令。在本文中,我们揭示了LLMs对任务指令中的词汇变化过于敏感,即使这些变化对人类来说是不可察觉的。通过为模型提供邻域指令,这些指令在潜在表示空间中紧密相邻,仅有一个语义相似的词不同,下游任务的表现可以截然不同。基于这一特性,我们提出了一个黑盒组合优化框架,用于指令词汇增强(COPLE)。COPLE根据一批代理任务的反馈执行迭代的词汇优化,使用与词语影响相关的搜索策略。实验证明,即使是当前基准中广泛使用的人工设计的提示也受到模型对词汇的敏感性影响,而COPLE可以恢复模型在指示遵循和解决下游任务中的能力下降。
更新时间: 2024-05-31 08:53:59
领域: cs.CL,cs.AI
Self-degraded contrastive domain adaptation for industrial fault diagnosis with bi-imbalanced data
Modern industrial fault diagnosis tasks often face the combined challenge of distribution discrepancy and bi-imbalance. Existing domain adaptation approaches pay little attention to the prevailing bi-imbalance, leading to poor domain adaptation performance or even negative transfer. In this work, we propose a self-degraded contrastive domain adaptation (Sd-CDA) diagnosis framework to handle domain discrepancy under bi-imbalanced data. It first pre-trains the feature extractor via imbalance-aware contrastive learning based on model pruning to learn the feature representation efficiently in a self-supervised manner. Then it forces the samples away from the domain boundary via supervised contrastive domain adversarial learning (SupCon-DA) and ensures that the features generated by the feature extractor are sufficiently discriminative. Furthermore, we propose pruned contrastive domain adversarial learning (PSupCon-DA), which automatically pays re-weighted attention to the minority classes to enhance performance on bi-imbalanced data. We demonstrate the superiority of the proposed method via two experiments.
Updated: 2024-05-31 08:51:57
标题: 自降级对比领域自适应在工业故障诊断中的应用:基于双不平衡数据
摘要: 现代工业故障诊断任务常常面临分布差异和双向不平衡的挑战。现有的领域自适应方法很少关注普遍存在的双向不平衡问题,导致领域自适应性能不佳甚至出现负转移。在本研究中,我们提出了一种自降级对比领域自适应(Sd-CDA)诊断框架,以处理双向不平衡数据下的领域差异。首先,它通过基于模型修剪的失衡感知对比学习预训练特征提取器,以自监督方式高效学习特征表示。然后,基于监督对比领域对抗学习(SupCon-DA),使样本远离领域边界,并确保特征提取器生成的特征具有足够的区分性。此外,我们提出了修剪对比领域对抗学习(PSupCon-DA),以自动重新加权关注少数群体,以增强对双向不平衡数据的性能。通过两项实验展示了所提方法的优越性。
更新时间: 2024-05-31 08:51:57
领域: cs.AI
Hardware-Efficient EMG Decoding for Next-Generation Hand Prostheses
Advancements in neural engineering have enabled the development of Robotic Prosthetic Hands (RPHs) aimed at restoring hand functionality. Current commercial RPHs offer limited control through basic on/off commands. Recent progresses in machine learning enable finger movement decoding with higher degrees of freedom, yet the high computational complexity of such models limits their application in portable devices. Future RPH designs must balance portability, low power consumption, and high decoding accuracy to be practical for individuals with disabilities. To this end, we introduce a novel attractor-based neural network to realize on-chip movement decoding for next-generation portable RPHs. The proposed architecture comprises an encoder, an attention layer, an attractor network, and a refinement regressor. We tested our model on four healthy subjects and achieved a decoding accuracy of 80.3%. Our proposed model is over 120 and 50 times more compact compared to state-of-the-art LSTM and CNN models, respectively, with comparable (or superior) decoding accuracy. Therefore, it exhibits minimal hardware complexity and can be effectively integrated as a System-on-Chip.
Updated: 2024-05-31 08:50:25
标题: "硬件高效的EMG解码技术用于下一代手部假肢"
摘要: 神经工程的进步使得机器人假肢手的发展成为可能,旨在恢复手部功能。目前市面上的机器人假肢手通过基本的开关指令提供有限控制。最近机器学习的进展使得手指运动解码具有更高的自由度,然而这类模型的高计算复杂性限制了它们在便携设备中的应用。未来机器人假肢手的设计必须在便携性、低功耗和高解码准确度之间取得平衡,以便对残疾人士实用。为此,我们引入了一种新颖的基于吸引子的神经网络,以实现下一代便携机器人假肢手的芯片内运动解码。所提出的体系结构包括编码器、注意力层、吸引子网络和改进回归器。我们在四名健康受试者上测试了我们的模型,实现了80.3%的解码准确度。与最先进的LSTM和CNN模型相比,我们提出的模型分别更紧凑120倍和50倍,具有可比较(甚至更优)的解码准确度。因此,它具有最小的硬件复杂性,并可以有效地集成为片上系统。
更新时间: 2024-05-31 08:50:25
领域: eess.SP,cs.LG
CoDeGAN: Contrastive Disentanglement for Generative Adversarial Network
Disentanglement, a critical concern in interpretable machine learning, has also garnered significant attention from the computer vision community. Many existing GAN-based class disentanglement (unsupervised) approaches, such as InfoGAN and its variants, primarily aim to maximize the mutual information (MI) between the generated image and its latent codes. However, this focus may lead to a tendency for the network to generate highly similar images when presented with the same latent class factor, potentially resulting in mode collapse or mode dropping. To alleviate this problem, we propose CoDeGAN (Contrastive Disentanglement for Generative Adversarial Networks), where we relax similarity constraints for disentanglement from the image domain to the feature domain. This modification not only enhances the stability of GAN training but also improves their disentangling capabilities. Moreover, we integrate self-supervised pre-training into CoDeGAN to learn semantic representations, significantly facilitating unsupervised disentanglement. Extensive experimental results demonstrate the superiority of our method over state-of-the-art approaches across multiple benchmarks. The code is available at https://github.com/learninginvision/CoDeGAN.
Updated: 2024-05-31 08:50:18
标题: CoDeGAN: 生成对抗网络的对比解缠
摘要: 解缠,这是可解释机器学习中一个关键关注点,也引起了计算机视觉社区的重视。许多现有基于GAN的类解缠(无监督)方法,如InfoGAN及其变种,主要旨在最大化生成图像与其潜在代码之间的互信息(MI)。然而,这种关注可能导致网络在提供相同的潜在类因子时生成非常相似的图像,可能导致模式崩溃或模式丢失。为了缓解这个问题,我们提出了CoDeGAN(用于生成对抗网络的对比解缠),在其中我们将解缠的相似性约束从图像领域放松到特征领域。这种修改不仅增强了GAN训练的稳定性,还提高了它们的解缠能力。此外,我们将自监督预训练集成到CoDeGAN中,以学习语义表示,显着促进无监督解缠。大量实验结果证明我们的方法在多个基准测试中优于最先进的方法。代码可在https://github.com/learninginvision/CoDeGAN上获得。
更新时间: 2024-05-31 08:50:18
领域: cs.CV,cs.AI,cs.LG
All-in-one simulation-based inference
Amortized Bayesian inference trains neural networks to solve stochastic inference problems using model simulations, thereby making it possible to rapidly perform Bayesian inference for any newly observed data. However, current simulation-based amortized inference methods are simulation-hungry and inflexible: They require the specification of a fixed parametric prior, simulator, and inference tasks ahead of time. Here, we present a new amortized inference method -- the Simformer -- which overcomes these limitations. By training a probabilistic diffusion model with transformer architectures, the Simformer outperforms current state-of-the-art amortized inference approaches on benchmark tasks and is substantially more flexible: It can be applied to models with function-valued parameters, it can handle inference scenarios with missing or unstructured data, and it can sample arbitrary conditionals of the joint distribution of parameters and data, including both posterior and likelihood. We showcase the performance and flexibility of the Simformer on simulators from ecology, epidemiology, and neuroscience, and demonstrate that it opens up new possibilities and application domains for amortized Bayesian inference on simulation-based models.
Updated: 2024-05-31 08:50:05
标题: 基于仿真的一体化推断
摘要: 摊销贝叶斯推断通过模型模拟训练神经网络来解决随机推断问题,从而使得可以快速对任何新观察到的数据进行贝叶斯推断。然而,当前基于模拟的摊销推断方法对模拟需求较高且缺乏灵活性:它们需要提前指定固定的参数先验、模拟器和推断任务。在这里,我们提出了一种新的摊销推断方法——Simformer,克服了这些限制。通过使用Transformer架构训练概率扩散模型,Simformer在基准任务上优于当前最先进的摊销推断方法,并且具有更大的灵活性:它可以应用于具有函数值参数的模型,可以处理具有缺失或非结构化数据的推断场景,并且可以对参数和数据的联合分布的任意条件进行抽样,包括后验和似然。我们展示了Simformer在生态学、流行病学和神经科学模拟器上的性能和灵活性,并证明它为基于模拟模型的摊销贝叶斯推断开辟了新的可能性和应用领域。
更新时间: 2024-05-31 08:50:05
领域: cs.LG,cs.AI,stat.ML
A Lightweight Method for Defending Against UAF Vulnerabilities
The widespread presence of Use-After-Free (UAF) vulnerabilities poses a serious threat to software security, with dangling pointers being considered the primary cause of these vulnerabilities. However, existing methods for defending against UAF vulnerabilities by eliminating dangling pointers need to interrupt the program's execution when encountering pointer assignment operations to look up the objects pointed to by the pointers and store the memory addresses of the pointers in a specific data structure. This makes these methods not lightweight. To overcome this drawback, we propose a novel approach called LightDE. This method does not require storing the memory addresses of pointers or locating the objects pointed to by pointers during program execution. LightDE uses our proposed structure-sensitive pointer analysis method to determine the objects pointed to by pointers and stores the pointing relationships in the program's data segment during program compilation. Since LightDE only needs to check whether the pointers identified by the pointer analysis point to the released objects when the objects are released, LightDE is very lightweight. Our experimental results show that LightDE can effectively defend against UAF vulnerabilities, and the additional performance overhead it introduces is very low.
Updated: 2024-05-31 08:47:24
标题: 一种轻量级防御UAF漏洞的方法
摘要: 广泛存在的使用后释放(UAF)漏洞对软件安全构成严重威胁,悬空指针被认为是这些漏洞的主要原因。然而,现有的消除悬空指针以防御UAF漏洞的方法需要在遇到指针赋值操作时中断程序执行,以查找指针指向的对象并将指针的内存地址存储在特定数据结构中。这使得这些方法不够轻量级。为了克服这一缺点,我们提出了一种称为LightDE的新方法。该方法在程序执行期间不需要存储指针的内存地址或定位指针指向的对象。LightDE使用我们提出的结构敏感指针分析方法来确定指针指向的对象,并在程序编译期间将指向关系存储在程序的数据段中。由于LightDE只需要在释放对象时检查指针分析识别的指针是否指向已释放的对象,因此LightDE非常轻量级。我们的实验结果表明,LightDE可以有效防御UAF漏洞,并且它引入的额外性能开销非常低。
更新时间: 2024-05-31 08:47:24
领域: cs.CR
TensorKrowch: Smooth integration of tensor networks in machine learning
Tensor networks are factorizations of high-dimensional tensors into networks of smaller tensors. They have applications in physics and mathematics, and recently have been proposed as promising machine learning architectures. To ease the integration of tensor networks in machine learning pipelines, we introduce TensorKrowch, an open source Python library built on top of PyTorch. Providing a user-friendly interface, TensorKrowch allows users to construct any tensor network, train it, and integrate it as a layer in more intricate deep learning models. In this paper, we describe the main functionality and basic usage of TensorKrowch, and provide technical details on its building blocks and the optimizations performed to achieve efficient operation.
Updated: 2024-05-31 08:39:03
标题: TensorKrowch:张量网络在机器学习中的平滑集成
摘要: 张量网络是将高维张量分解为较小张量网络的方法。它们在物理和数学中有应用,并且最近被提议作为有前途的机器学习架构。为了方便张量网络在机器学习流程中的整合,我们引入了TensorKrowch,这是一个基于PyTorch的开源Python库。提供用户友好的接口,TensorKrowch允许用户构建任何张量网络,训练它,并将其整合为更复杂深度学习模型的层。在本文中,我们描述了TensorKrowch的主要功能和基本用法,并提供了有关其构建模块和为实现高效运行而执行的优化的技术细节。
更新时间: 2024-05-31 08:39:03
领域: cs.LG,cond-mat.stat-mech,cond-mat.str-el,quant-ph
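To make concrete what "a tensor network as a layer in a deep model" means, here is a plain-PyTorch sketch of a small matrix-product-state (MPS) layer. This deliberately does not use TensorKrowch's actual API, which the abstract does not spell out; it only illustrates the kind of trainable module such a library lets you build, and all names and dimensions are mine.

    import torch

    class TinyMPSLayer(torch.nn.Module):
        """Contract an input feature sequence with a matrix-product state."""
        def __init__(self, n_sites, phys_dim=2, bond_dim=4, out_dim=10):
            super().__init__()
            self.cores = torch.nn.ParameterList([
                torch.nn.Parameter(torch.randn(bond_dim, phys_dim, bond_dim) * 0.1)
                for _ in range(n_sites)
            ])
            self.head = torch.nn.Linear(bond_dim, out_dim)

        def forward(self, x):               # x: (batch, n_sites, phys_dim)
            batch = x.shape[0]
            v = torch.ones(batch, self.cores[0].shape[0])  # left boundary
            for i, core in enumerate(self.cores):
                # contract the physical leg with the i-th input feature,
                # then absorb the result into the running boundary vector
                m = torch.einsum('bp,lpr->blr', x[:, i, :], core)
                v = torch.einsum('bl,blr->br', v, m)
            return self.head(v)

    layer = TinyMPSLayer(n_sites=6)
    out = layer(torch.randn(3, 6, 2))       # drop-in layer in a larger model

The appeal of the factorized form is that parameter count grows linearly in the number of sites (times bond_dim**2) rather than exponentially in the joint feature space the contraction implicitly represents.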
In-Context Decision Transformer: Reinforcement Learning via Hierarchical Chain-of-Thought
In-context learning is a promising approach for offline reinforcement learning (RL) to handle online tasks, which can be achieved by providing task prompts. Recent works demonstrated that in-context RL could emerge with self-improvement in a trial-and-error manner when treating RL tasks as an across-episodic sequential prediction problem. Despite the self-improvement not requiring gradient updates, current works still suffer from high computational costs when the across-episodic sequence increases with task horizons. To this end, we propose an In-context Decision Transformer (IDT) to achieve self-improvement in a high-level trial-and-error manner. Specifically, IDT is inspired by the efficient hierarchical structure of human decision-making and thus reconstructs the sequence to consist of high-level decisions instead of low-level actions that interact with environments. As one high-level decision can guide multi-step low-level actions, IDT naturally avoids excessively long sequences and solves online tasks more efficiently. Experimental results show that IDT achieves state-of-the-art performance on long-horizon tasks, surpassing current in-context RL methods. In particular, the online evaluation time of our IDT is 36x faster than baselines in the D4RL benchmark and 27x faster in the Grid World benchmark.
Updated: 2024-05-31 08:38:25
标题: 上下文决策变换器:通过分层思维链强化学习
摘要: 上下文学习是离线强化学习(RL)处理在线任务的一种有前途的方法,可以通过提供任务提示来实现。最近的研究表明,在上下文RL中,当将RL任务视为跨周期顺序预测问题时,可以通过试错方式自我改进。尽管自我改进不需要梯度更新,但当前的工作在跨周期序列随着任务范围增加时仍然面临高计算成本的问题。为此,我们提出了一种上下文决策变换器(IDT),以高层次的试错方式实现自我改进。具体来说,IDT受到人类决策高效分层结构的启发,因此重新构建序列,由高级决策而不是与环境交互的低级动作组成。由于一个高级决策可以引导多步低级动作,IDT自然地避免了过长的序列,并更有效地解决在线任务。实验结果表明,IDT在长时程任务上超越了现有上下文RL方法,取得了最先进的成果。特别是,我们的IDT的在线评估时间在D4RL基准测试中比基线快36倍,在Grid World基准测试中快27倍。
更新时间: 2024-05-31 08:38:25
领域: cs.LG,cs.AI
Unleashing the Potential of Diffusion Models for Incomplete Data Imputation
This paper introduces DiffPuter, an iterative method for missing data imputation that leverages the Expectation-Maximization (EM) algorithm and Diffusion Models. By treating missing data as hidden variables that can be updated during model training, we frame the missing data imputation task as an EM problem. During the M-step, DiffPuter employs a diffusion model to learn the joint distribution of both the observed and currently estimated missing data. In the E-step, DiffPuter re-estimates the missing data based on the conditional probability given the observed data, utilizing the diffusion model learned in the M-step. Starting with an initial imputation, DiffPuter alternates between the M-step and E-step until convergence. Through this iterative process, DiffPuter progressively refines the complete data distribution, yielding increasingly accurate estimations of the missing data. Our theoretical analysis demonstrates that the unconditional training and conditional sampling processes of the diffusion model align precisely with the objectives of the M-step and E-step, respectively. Empirical evaluations across 10 diverse datasets and comparisons with 16 different imputation methods highlight DiffPuter's superior performance. Notably, DiffPuter achieves an average improvement of 8.10% in MAE and 5.64% in RMSE compared to the most competitive existing method.
Updated: 2024-05-31 08:35:56
标题: 释放扩散模型在不完整数据填补中的潜力
摘要: 本文介绍了DiffPuter,这是一种利用期望最大化(EM)算法和扩散模型进行缺失数据插补的迭代方法。通过将缺失数据视为可以在模型训练过程中更新的隐藏变量,我们将缺失数据插补任务构建为一个EM问题。在M步骤中,DiffPuter利用扩散模型学习观测数据和当前估计的缺失数据的联合分布。在E步骤中,DiffPuter基于给定观测数据的条件概率重新估计缺失数据,利用在M步骤中学习的扩散模型。从初始插补开始,DiffPuter在M步骤和E步骤之间交替进行,直到收敛。通过这个迭代过程,DiffPuter逐渐完善完整数据分布,产生对缺失数据越来越准确的估计。我们的理论分析表明,扩散模型的无条件训练和条件抽样过程分别与M步骤和E步骤的目标完全吻合。通过对10个不同数据集的实证评估和与16种不同插补方法的比较,突显了DiffPuter的优越性能。值得注意的是,与最具竞争力的现有方法相比,DiffPuter在MAE和RMSE上平均提高了8.10%和5.64%。
更新时间: 2024-05-31 08:35:56
领域: cs.LG
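The EM alternation DiffPuter is built around is easy to state in code. The sketch below keeps the loop structure from the abstract (M-step: fit a joint density to the current imputation; E-step: resample missing entries conditionally on the observed ones) but plugs in a single Gaussian as a stand-in density so the example stays self-contained; DiffPuter itself uses a diffusion model for both steps, and all function names here are mine.

    import numpy as np

    def diffputer_em(X, mask, fit_density, conditional_sample, n_rounds=5):
        """EM-style imputation: X has NaNs at missing entries, mask is True
        where observed. The density model is pluggable."""
        X_hat = np.where(mask, X, np.nanmean(X, axis=0))   # initial imputation
        for _ in range(n_rounds):
            model = fit_density(X_hat)                     # M-step
            X_hat = np.where(mask, X,
                             conditional_sample(model, X_hat, mask))  # E-step
        return X_hat

    # Stand-in density: a single Gaussian (illustrative only).
    def fit_gaussian(X):
        return X.mean(0), np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])

    def gaussian_conditional_sample(model, X, mask):
        mu, cov = model
        out = X.copy()
        for i in range(X.shape[0]):
            o, m = mask[i], ~mask[i]
            if not m.any():
                continue
            if not o.any():
                out[i, m] = mu[m]
                continue
            # conditional mean of missing dims given observed dims
            s = np.linalg.solve(cov[np.ix_(o, o)], X[i, o] - mu[o])
            out[i, m] = mu[m] + cov[np.ix_(m, o)] @ s
        return out

    rng = np.random.default_rng(1)
    X = rng.multivariate_normal([0, 0, 0],
                                [[1, .8, .2], [.8, 1, .3], [.2, .3, 1]], 300)
    mask = rng.random(X.shape) > 0.2
    X_imp = diffputer_em(np.where(mask, X, np.nan), mask,
                         fit_gaussian, gaussian_conditional_sample)

Swapping the Gaussian for a diffusion model changes the two plug-in functions (unconditional training for the M-step, conditional sampling for the E-step) while the alternation itself stays exactly as above, which is the correspondence the paper's theoretical analysis formalizes.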
The Road to Trust: Building Enclaves within Confidential VMs
Integrity is critical for maintaining system security, as it ensures that only genuine software is loaded onto a machine. Although confidential virtual machines (CVMs) function within isolated environments separate from the host, it is important to recognize that users still encounter challenges in maintaining control over the integrity of the code running within the trusted execution environments (TEEs). The presence of a sophisticated operating system (OS) raises the possibility of dynamically creating and executing any code, making user applications within TEEs vulnerable to interference or tampering if the guest OS is compromised. To address this issue, this paper introduces NestedSGX, a framework which leverages virtual machine privilege level (VMPL), a recent hardware feature available on AMD SEV-SNP, to enable the creation of hardware enclaves within the guest VM. Similar to Intel SGX, NestedSGX considers the guest OS untrusted for loading potentially malicious code. It ensures that only trusted and measured code executed within the enclave can be remotely attested. To seamlessly protect existing applications, NestedSGX aims for compatibility with Intel SGX by simulating SGX leaf functions. We have also ported the SGX SDK and the Occlum library OS to NestedSGX, enabling the use of existing SGX toolchains and applications in the system. Performance evaluations show that context switches in NestedSGX take about 32,000 to 34,000 cycles, approximately 1.33x to 1.54x that of Intel SGX. NestedSGX incurs minimal overhead in most real-world applications, with an average overhead below 2% for computation and memory intensive workloads and below 15.68% for I/O intensive workloads.
Updated: 2024-05-31 08:33:05
标题: 通往信任之路:在机密虚拟机中构建飞地
摘要: 完整性对于维护系统安全至关重要,因为它确保只有真实的软件被加载到机器上。虽然机密虚拟机(CVMs)在与主机分离的隔离环境中运行,但需要认识到,用户在控制可信执行环境(TEEs)内运行代码的完整性方面仍然面临挑战。复杂操作系统(OS)的存在带来了动态创建和执行任意代码的可能性,一旦客户OS受到损害,在TEE内运行的用户应用程序就容易受到干扰或篡改。为了解决这个问题,本文介绍了NestedSGX,这是一个利用AMD SEV-SNP上可用的最新硬件特性——虚拟机特权级别(VMPL)的框架,可以在客户VM内创建硬件飞地。与Intel SGX类似,NestedSGX将客户OS视为不可信,因为它可能加载潜在恶意代码。它确保只有在飞地内执行的、受信任且经过度量的代码才能通过远程证明。为了无缝保护现有应用程序,NestedSGX通过模拟SGX叶子函数来实现与Intel SGX的兼容。我们还将SGX SDK和Occlum库OS移植到了NestedSGX,使系统可以使用现有的SGX工具链和应用程序。性能评估显示,NestedSGX中的上下文切换大约需要32,000至34,000个周期,约为Intel SGX的1.33至1.54倍。在大多数现实世界应用程序中,NestedSGX带来的额外开销很小:对于计算和内存密集型工作负载,平均开销低于2%;对于I/O密集型工作负载,低于15.68%。
更新时间: 2024-05-31 08:33:05
领域: cs.CR,cs.AR
Conditioning GAN Without Training Dataset
Deep learning algorithms have a large number of trainable parameters, often with sizes of hundreds of thousands or more. Training such algorithms requires a large amount of training data, and generating a sufficiently large dataset for these algorithms is costly [Noguchi et al., 2019]. GANs are generative neural networks that use two deep learning networks competing with each other: a generator and a discriminator. The generator tries to generate realistic images which resemble the actual training dataset by approximating the training data distribution, and the discriminator is trained to classify images as real or fake (generated) [Goodfellow, 2016]. Training these GAN algorithms also requires a large amount of training data [Noguchi et al., 2019]. In this study, the aim is to address the question, "Given an unconditioned pretrained generator network and a pretrained classifier, is it feasible to develop a conditioned generator without relying on any training dataset?" The paper begins with a general introduction to the problem. The subsequent sections are structured as follows: Section 2 provides background information on the problem. Section 3 reviews relevant literature on the topic. Section 4 outlines the methodology employed in this study. Section 5 presents the experimental results. Section 6 discusses the findings and proposes potential future research directions. Finally, Section 7 offers concluding remarks. The implementation can be accessed at https://github.com/kidist-amde/BigGAN-PyTorch.
Updated: 2024-05-31 08:31:26
标题: 在没有训练数据集的情况下对生成对抗网络进行条件化
摘要: 深度学习算法通常具有大量可训练参数,常常包含数十万个甚至更多的参数。训练这种算法需要大量的训练数据,并为这些算法生成足够大的数据集是昂贵的。GANs 是生成式神经网络,使用两个相互竞争的深度学习网络,即生成器和鉴别器网络。生成器试图生成类似实际训练数据集的逼真图像,通过逼近训练数据分布,而鉴别器则被训练为将图像分类为真实或伪造(生成的)。训练这些 GAN 算法也需要大量的训练数据集。在这项研究中,旨在回答一个问题:“在给定一个无条件的预训练生成器网络和一个预训练分类器的情况下,是否可以开发一个有条件的生成器,而不依赖于任何训练数据集?”本文以对问题的一般介绍开篇。接下来的部分结构如下:第 2 节提供了问题的背景信息。第 3 节回顾了有关主题的相关文献。第 4 节概述了本研究采用的方法。第 5 节展示了实验结果。第 6 节讨论了研究结果并提出了潜在的未来研究方向。最后,第 7 节提供了结论性的言论。该实现可以在此处访问:https://github.com/kidist-amde/BigGAN-PyTorch。
更新时间: 2024-05-31 08:31:26
领域: cs.CV,cs.AI,cs.LG,cs.MM
Network Analytics for Anti-Money Laundering -- A Systematic Literature Review and Experimental Evaluation
Money laundering presents a pervasive challenge, burdening society by financing illegal activities. To more effectively combat and detect money laundering, the use of network information is increasingly being explored, exploiting the fact that money laundering necessarily involves interconnected parties. This has led to a surge in literature on network analytics (NA) for anti-money laundering (AML). The literature, however, is fragmented and a comprehensive overview of existing work is missing. This results in limited understanding of the methods that may be applied and their comparative detection power. Therefore, this paper presents an extensive and systematic review of the literature. We identify and analyse 97 papers in the Web of Science and Scopus databases, resulting in a taxonomy of approaches following the fraud analytics framework of Bockel-Rickermann et al. Moreover, this paper presents a comprehensive experimental framework to evaluate and compare the performance of prominent NA methods in a uniform setup. The framework is applied on the publicly available Elliptic data set and implements manual feature engineering, random walk-based methods, and deep learning GNNs. We conclude from the results that network analytics increases the predictive power of the AML model, with graph neural networks giving the best results. An open source implementation of the experimental framework is provided to facilitate researchers and practitioners to extend upon these results and experiment on proprietary data. As such, we aim to promote a standardised approach towards the analysis and evaluation of network analytics for AML.
Updated: 2024-05-31 08:29:26
标题: 反洗钱网络分析:系统文献综述与实验评估
摘要: 洗钱是一个普遍存在的挑战,通过为非法活动提供资金,给社会带来负担。为了更有效地打击和检测洗钱行为,人们越来越多地探索利用网络信息,因为洗钱必然涉及到相互关联的各方。这导致了反洗钱领域网络分析(NA)文献的激增。然而,相关文献分散,缺乏对现有工作的综合概述。这导致了对可应用方法及其比较检测能力的理解有限。因此,本文对相关文献进行了广泛而系统的回顾。我们在Web of Science和Scopus数据库中识别和分析了97篇论文,得出了一个遵循Bockel-Rickermann等人欺诈分析框架的方法分类体系。此外,本文提出了一个全面的实验框架,以评估和比较主流NA方法在统一设置下的表现。该框架应用于公开可用的Elliptic数据集,并实施了手动特征工程、基于随机游走的方法和深度学习GNNs。我们从结果中得出结论,网络分析提高了AML模型的预测能力,图神经网络取得了最佳结果。我们提供了实验框架的开源实现,以便研究人员和实践者扩展这些结果并在专有数据上进行实验。因此,我们旨在推广对AML的网络分析的分析和评估的标准化方法。
更新时间: 2024-05-31 08:29:26
领域: cs.SI,cs.LG
Enhancing Counterfactual Image Generation Using Mahalanobis Distance with Distribution Preferences in Feature Space
In the realm of Artificial Intelligence (AI), the importance of Explainable Artificial Intelligence (XAI) is increasingly recognized, particularly as AI models become more integral to our lives. One notable single-instance XAI approach is counterfactual explanation, which aids users in comprehending a model's decisions and offers guidance on altering these decisions. Specifically in the context of image classification models, effective image counterfactual explanations can significantly enhance user understanding. This paper introduces a novel method for computing feature importance within the feature space of a black-box model. By employing information fusion techniques, our method maximizes the use of data to address feature counterfactual explanations in the feature space. Subsequently, we utilize an image generation model to transform these feature counterfactual explanations into image counterfactual explanations. Our experiments demonstrate that the counterfactual explanations generated by our method closely resemble the original images in both pixel and feature spaces. Additionally, our method outperforms established baselines, achieving impressive experimental results.
Updated: 2024-05-31 08:26:53
标题: 利用特征空间中的马氏距离和分布偏好增强反事实图像生成
摘要: 在人工智能领域中,越来越多地意识到可解释人工智能(XAI)的重要性,特别是随着人工智能模型对我们的生活变得更加重要。一个显著的单一实例XAI方法是反事实解释,它帮助用户理解模型的决策,并提供指导如何改变这些决策。特别是在图像分类模型的背景下,有效的图像反事实解释可以显著提升用户的理解。本文介绍了一种计算黑匣子模型特征空间内特征重要性的新方法。通过采用信息融合技术,我们的方法最大化利用数据来解决特征空间内的特征反事实解释。随后,我们利用图像生成模型将这些特征反事实解释转化为图像反事实解释。我们的实验表明,我们方法生成的反事实解释在像素和特征空间中与原始图像非常相似。此外,我们的方法胜过已建立的基线,实现了令人印象深刻的实验结果。
更新时间: 2024-05-31 08:26:53
领域: cs.LG,cs.CV
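The distance component of this method is standard and worth seeing concretely: a Mahalanobis metric in the model's feature space, with the covariance estimated from the data so that the feature distribution's "preferences" are respected. A small NumPy sketch follows; it shows only the distance and candidate-selection step, not the paper's full pipeline (which adds information fusion and an image generator), and all names are illustrative.

    import numpy as np

    def mahalanobis(z, z_ref, cov):
        """Distance of a candidate feature z from a reference z_ref,
        weighted by the feature distribution's covariance."""
        diff = z - z_ref
        return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

    # Toy usage: among candidate counterfactual features, pick the one
    # closest to the original under the Mahalanobis metric (in the paper,
    # candidates would also have to change the classifier's decision).
    rng = np.random.default_rng(0)
    Z = rng.normal(size=(500, 8))                   # training-set features
    cov = np.cov(Z, rowvar=False) + 1e-6 * np.eye(8)
    z0 = Z[0]
    candidates = z0 + 0.5 * rng.normal(size=(20, 8))
    best = min(candidates, key=lambda z: mahalanobis(z, z0, cov))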
Relaxed Contrastive Learning for Federated Learning
We propose a novel contrastive learning framework to effectively address the challenges of data heterogeneity in federated learning. We first analyze the inconsistency of gradient updates across clients during local training and establish its dependence on the distribution of feature representations, leading to the derivation of the supervised contrastive learning (SCL) objective to mitigate local deviations. In addition, we show that a naïve adoption of SCL in federated learning leads to representation collapse, resulting in slow convergence and limited performance gains. To address this issue, we introduce a relaxed contrastive learning loss that imposes a divergence penalty on excessively similar sample pairs within each class. This strategy prevents collapsed representations and enhances feature transferability, facilitating collaborative training and leading to significant performance improvements. Our framework outperforms all existing federated learning approaches by huge margins on the standard benchmarks through extensive experimental results.
Updated: 2024-05-31 08:23:42
标题: 松弛对比学习在联邦学习中的应用
摘要: 我们提出了一个新颖的对比学习框架,以有效应对联邦学习中数据异质性的挑战。我们首先分析了本地训练过程中客户端之间梯度更新的不一致性,并确定其与特征表示分布的依赖关系,从而推导出监督对比学习(SCL)目标,以减轻本地偏差。此外,我们发现在联邦学习中简单采用SCL会导致表示坍缩,导致收敛速度缓慢和性能增益有限。为了解决这个问题,我们引入了一种放松的对比学习损失函数,对每个类别中过度相似的样本对施加发散惩罚。这种策略可以防止表示坍缩,增强特征可迁移性,促进协作训练,从而实现显著的性能改进。我们的框架通过大量实验证据,在标准基准测试中凭借巨大的优势胜过所有现有的联邦学习方法。
更新时间: 2024-05-31 08:23:42
领域: cs.LG
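A hedged sketch of the relaxed loss described above: standard supervised contrastive learning plus a penalty that fires only when two same-class samples become too similar, which is the anti-collapse mechanism the abstract describes. The ceiling value, penalty weight, and exact penalty form below are illustrative assumptions, not the paper's formulation.

    import torch
    import torch.nn.functional as F

    def relaxed_scl_loss(feats, labels, temp=0.1, sim_ceiling=0.9, penalty=1.0):
        f = F.normalize(feats, dim=1)
        sim = f @ f.t()
        eye = torch.eye(len(f), dtype=torch.bool, device=f.device)
        pos = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~eye
        logits = (sim / temp).masked_fill(eye, float('-inf'))
        log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
        # standard SCL term over same-class positives
        scl = -log_prob.masked_fill(~pos, 0.0).sum(1) / pos.sum(1).clamp(min=1)
        # divergence penalty: push apart same-class pairs that are already
        # more similar than the ceiling (prevents representation collapse)
        too_close = F.relu(sim - sim_ceiling) * pos
        return scl.mean() + penalty * too_close.sum() / pos.sum().clamp(min=1)

    print(relaxed_scl_loss(torch.randn(16, 64), torch.randint(0, 4, (16,))))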
No Free Lunch Theorem for Privacy-Preserving LLM Inference
Individuals and businesses have benefited significantly from Large Language Models (LLMs), including PaLM, Gemini and ChatGPT, in various ways. For example, LLMs enhance productivity, reduce costs, and enable us to focus on more valuable tasks. Furthermore, LLMs possess the capacity to sift through extensive datasets, uncover underlying patterns, and furnish critical insights that propel the frontiers of technology and science. However, LLMs also pose privacy concerns. Users' interactions with LLMs may expose their sensitive personal or company information. A lack of robust privacy safeguards and legal frameworks could permit the unwarranted intrusion or improper handling of individual data, thereby risking infringements of privacy and the theft of personal identities. To ensure privacy, it is essential to minimize the dependency between shared prompts and private information. Various randomization approaches have been proposed to protect prompts' privacy, but they may incur utility loss compared to unprotected LLM prompting. Therefore, it is essential to evaluate the balance between the risk of privacy leakage and loss of utility when designing effective protection mechanisms. The current study develops a framework for inference with privacy-protected Large Language Models (LLMs) and lays down a solid theoretical basis for examining the interplay between privacy preservation and utility. The core insight is encapsulated within a theorem called the NFL (No-Free-Lunch) Theorem.
Updated: 2024-05-31 08:22:53
标题: 隐私保护的LLM推断的无免费午餐定理
摘要: 个人和企业在各种方面显著受益于包括PaLM、Gemini和ChatGPT在内的大型语言模型(LLMs)。例如,LLMs提高了生产力,降低了成本,并使我们能够专注于更有价值的任务。此外,LLMs具有筛选大量数据集、发现潜在模式并提供关键见解的能力,推动了技术和科学的前沿。然而,LLMs也带来了隐私问题。用户与LLMs的互动可能暴露其敏感个人或公司信息。缺乏强大的隐私保护和法律框架可能允许未经授权的侵入或不当处理个人数据,从而导致侵犯隐私和个人身份盗窃的风险。为确保隐私,至关重要的是最大程度地减少共享提示和私人信息之间的依赖关系。已经提出了各种随机化方法来保护提示的隐私,但与未受保护的LLMs提示相比,它们可能带来效用损失。因此,在进行有效的保护机制时,评估隐私泄露风险和效用损失之间的平衡至关重要。当前研究开发了一个框架,用于推断受隐私保护的大型语言模型(LLMs),并为检验隐私保护和效用之间的相互作用奠定了坚实的理论基础。核心洞见被包含在一个称为NFL(No-Free-Lunch的缩写)定理中。
更新时间: 2024-05-31 08:22:53
领域: cs.CR,cs.AI
No-Regret Learning for Fair Multi-Agent Social Welfare Optimization
We consider the problem of online multi-agent Nash social welfare (NSW) maximization. While previous works of Hossain et al. [2021], Jones et al. [2023] study similar problems in stochastic multi-agent multi-armed bandits and show that $\sqrt{T}$-regret is possible after $T$ rounds, their fairness measure is the product of all agents' rewards, instead of their NSW (that is, their geometric mean). Given the fundamental role of NSW in the fairness literature, it is more than natural to ask whether no-regret fair learning with NSW as the objective is possible. In this work, we provide a complete answer to this question in various settings. Specifically, in stochastic $N$-agent $K$-armed bandits, we develop an algorithm with $\widetilde{\mathcal{O}}\left(K^{\frac{2}{N}}T^{\frac{N-1}{N}}\right)$ regret and prove that the dependence on $T$ is tight, making it a sharp contrast to the $\sqrt{T}$-regret bounds of Hossain et al. [2021], Jones et al. [2023]. We then consider a more challenging version of the problem with adversarial rewards. Somewhat surprisingly, despite NSW being a concave function, we prove that no algorithm can achieve sublinear regret. To circumvent such negative results, we further consider a setting with full-information feedback and design two algorithms with $\sqrt{T}$-regret: the first one has no dependence on $N$ at all and is applicable to not just NSW but a broad class of welfare functions, while the second one has better dependence on $K$ and is preferable when $N$ is small. Finally, we also show that logarithmic regret is possible whenever there exists one agent who is indifferent about different arms.
Updated: 2024-05-31 08:21:11
标题: 无悔学习对公平多智能体社会福利优化的影响
摘要: 我们考虑在线多智能体Nash社会福利(NSW)最大化的问题。尽管Hossain等人[2021],Jones等人[2023]的先前作品研究了随机多智能体多臂老虎机中类似的问题,并且展示了在T轮后可能出现$\sqrt{T}$-后悔,他们的公平度量是所有智能体奖励的乘积,而不是它们的NSW(即他们的几何平均)。鉴于NSW在公平文献中的基础作用,自然而然地会问是否可以实现以NSW为目标的无后悔公平学习。在这项工作中,我们在各种设置中对这个问题提供了一个完整的答案。具体地,在随机N智能体K臂老虎机中,我们开发了一个算法,具有$\widetilde{\mathcal{O}}\left(K^{\frac{2}{N}}T^{\frac{N-1}{N}}\right)$的后悔,并证明了对T的依赖性很紧,与Hossain等人[2021],Jones等人[2023]的$\sqrt{T}$-后悔界形成鲜明对比。然后,我们考虑了一个更具挑战性的问题版本,其中奖励是对抗性的。有点令人惊讶的是,尽管NSW是一个凹函数,我们证明没有算法可以实现亚线性后悔。为了避免这种消极结果,我们进一步考虑具有全信息反馈的设置,并设计了两个具有$\sqrt{T}$-后悔的算法:第一个根本不依赖于N,并且适用于不仅仅是NSW而且是广泛的福利函数,而第二个对K的依赖性更好,当N较小时更可取。最后,我们还表明,只要存在一个对不同臂持中立态度的智能体,就可以实现对数后悔。
更新时间: 2024-05-31 08:21:11
领域: cs.LG,cs.GT,cs.MA,stat.ML
Provably Efficient Interactive-Grounded Learning with Personalized Reward
Interactive-Grounded Learning (IGL) [Xie et al., 2021] is a powerful framework in which a learner aims at maximizing unobservable rewards through interacting with an environment and observing reward-dependent feedback on the taken actions. To deal with personalized rewards that are ubiquitous in applications such as recommendation systems, Maghakian et al. [2022] study a version of IGL with context-dependent feedback, but their algorithm does not come with theoretical guarantees. In this work, we consider the same problem and provide the first provably efficient algorithms with sublinear regret under realizability. Our analysis reveals that the step-function estimator of prior work can deviate uncontrollably due to finite-sample effects. Our solution is a novel Lipschitz reward estimator which underestimates the true reward and enjoys favorable generalization performances. Building on this estimator, we propose two algorithms, one based on explore-then-exploit and the other based on inverse-gap weighting. We apply IGL to learning from image feedback and learning from text feedback, which are reward-free settings that arise in practice. Experimental results showcase the importance of using our Lipschitz reward estimator and the overall effectiveness of our algorithms.
Updated: 2024-05-31 08:21:09
标题: 具有个性化奖励的可证明高效交互式基础学习
摘要: 交互式基础学习(IGL)[Xie等,2021]是一个强大的框架,其中学习者通过与环境交互并观察所采取行动的奖励相关反馈来最大化不可观测的奖励。为了处理在诸如推荐系统等应用中普遍存在的个性化奖励,Maghakian等人[2022]研究了一种具有上下文相关反馈的IGL版本,但是他们的算法没有理论保证。在这项工作中,我们考虑了相同的问题,并提供了在可实现性条件下具有次线性遗憾的第一个经过证明的高效算法。我们的分析揭示了先前工作的阶跃函数估计器可能由于有限样本效应而无法受控地偏离。我们的解决方案是一种新颖的Lipschitz奖励估计器,它低估了真实奖励并具有良好的泛化性能。基于这个估计器,我们提出了两种算法,一种基于探索-利用,另一种基于逆间隙加权(inverse-gap weighting)。我们将IGL应用于从图像反馈和文本反馈中学习,这些是在实践中出现的无奖励设置。实验结果展示了使用我们的Lipschitz奖励估计器的重要性以及我们算法的整体有效性。
更新时间: 2024-05-31 08:21:09
领域: cs.LG,stat.ML
Adv-KD: Adversarial Knowledge Distillation for Faster Diffusion Sampling
Diffusion Probabilistic Models (DPMs) have emerged as a powerful class of deep generative models, achieving remarkable performance in image synthesis tasks. However, these models face challenges in terms of widespread adoption due to their reliance on sequential denoising steps during sample generation. This dependence leads to substantial computational requirements, making them unsuitable for resource-constrained or real-time processing systems. To address these challenges, we propose a novel method that integrates denoising phases directly into the model's architecture, thereby reducing the need for resource-intensive computations. Our approach combines diffusion models with generative adversarial networks (GANs) through knowledge distillation, enabling more efficient training and evaluation. By utilizing a pre-trained diffusion model as a teacher model, we train a student model through adversarial learning, employing layerwise transformations for denoising and submodules for predicting the teacher model's output at various points in time. This integration significantly reduces the number of parameters and denoising steps required, leading to improved sampling speed at test time. We validate our method with extensive experiments, demonstrating comparable performance with reduced computational requirements compared to existing approaches. By enabling the deployment of diffusion models on resource-constrained devices, our research mitigates their computational burden and paves the way for wider accessibility and practical use across the research community and end-users. Our code is publicly available at https://github.com/kidist-amde/Adv-KD
Updated: 2024-05-31 08:19:44
标题: Adv-KD: 对抗性知识蒸馏用于加速扩散抽样
摘要: 扩散概率模型(DPMs)已经成为一类强大的深度生成模型,在图像合成任务中取得了显著的性能。然而,这些模型面临着普遍采用方面的挑战,因为它们在样本生成过程中依赖于顺序去噪步骤。这种依赖性导致了大量的计算需求,使它们不适用于资源受限或实时处理系统。为了解决这些挑战,我们提出了一种新颖的方法,将去噪阶段直接集成到模型的架构中,从而减少对资源密集型计算的需求。我们的方法通过知识蒸馏将扩散模型与生成对抗网络(GANs)结合起来,实现更高效的训练和评估。通过利用一个预先训练的扩散模型作为教师模型,我们通过对抗学习训练一个学生模型,利用逐层转换进行去噪并使用子模块在不同时间点上预测教师模型的输出。这种集成显著减少了所需的参数数量和去噪步骤,提高了测试时的采样速度。我们通过广泛的实验证实了我们的方法,在减少计算需求的同时表现出与现有方法相当的性能。通过在资源受限设备上部署扩散模型,我们的研究减轻了它们的计算负担,并为广泛的研究社区和最终用户提供了更广泛的可访问性和实际使用途径。我们的代码公开可用于https://github.com/kidist-amde/Adv-KD。
更新时间: 2024-05-31 08:19:44
领域: cs.CV,cs.AI,cs.LG,cs.MM
End-to-End Training Induces Information Bottleneck through Layer-Role Differentiation: A Comparative Analysis with Layer-wise Training
End-to-end (E2E) training, optimizing the entire model through error backpropagation, fundamentally supports the advancements of deep learning. Despite its high performance, E2E training faces problems of memory consumption, limited parallel computing, and discrepancy from the functionalities of the actual brain. Various alternative methods have been proposed to overcome these difficulties; however, none can yet match the performance of E2E training, thereby falling short in practicality. Furthermore, there is no deep understanding regarding differences in the trained model properties beyond the performance gap. In this paper, we reconsider why E2E training demonstrates a superior performance through a comparison with layer-wise training, a non-E2E method that locally sets errors. On the basis of the observation that E2E training has an advantage in propagating input information, we analyze the information plane dynamics of intermediate representations based on the Hilbert-Schmidt independence criterion (HSIC). The results of our normalized HSIC value analysis reveal the E2E training ability to exhibit different information dynamics across layers, in addition to efficient information propagation. Furthermore, we show that this layer-role differentiation leads to the final representation following the information bottleneck principle. It suggests the need to consider the cooperative interactions between layers, not just the final layer, when analyzing the information bottleneck of deep learning.
Updated: 2024-05-31 08:15:06
标题: 端到端训练通过层-角色区分诱导信息瓶颈:与逐层训练的比较分析
摘要: 端到端(E2E)训练通过误差反向传播优化整个模型,从根本上支持深度学习的进步。尽管其性能表现出色,但E2E训练面临内存消耗、并行计算和与实际大脑功能不符等问题。已经提出了各种替代方法来克服这些困难,然而目前还没有一种方法能够与E2E训练的性能相匹配,因此在实用性方面存在不足。此外,关于训练模型属性差异的深层理解尚不明确。本文通过与逐层训练进行比较重新考虑了为什么E2E训练表现出卓越性能,逐层训练是一种局部设置错误的非E2E方法。基于E2E训练在传播输入信息方面具有优势的观察,我们基于Hilbert-Schmidt独立性准则(HSIC)分析了中间表示的信息平面动态。我们的归一化HSIC值分析结果显示,E2E训练具有展示不同信息动态的能力,同时具有高效的信息传播。此外,我们表明这种层角色的差异导致最终表示遵循信息瓶颈原则。它表明在分析深度学习的信息瓶颈时需要考虑层之间的合作交互,而不仅仅是最终层。
更新时间: 2024-05-31 08:15:06
领域: cs.LG
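The analysis tool used here, a normalized HSIC value between two layers' representations of the same inputs, can be computed in a few lines. The sketch below uses Gaussian kernels and a CKA-style normalization; the paper's exact kernel choice and normalization may differ, so treat this as one common instantiation rather than the authors' code.

    import numpy as np

    def normalized_hsic(X, Y, sigma=1.0):
        """Normalized HSIC between two representation matrices whose rows
        are the same inputs passed through two different layers."""
        def gram(A):
            sq = ((A[:, None, :] - A[None, :, :]) ** 2).sum(-1)
            return np.exp(-sq / (2 * sigma ** 2))
        def center(K):
            n = K.shape[0]
            H = np.eye(n) - np.ones((n, n)) / n
            return H @ K @ H
        Kx, Ky = center(gram(X)), center(gram(Y))
        hsic = (Kx * Ky).sum()                      # tr(Kx H Ky H)
        return hsic / np.sqrt((Kx * Kx).sum() * (Ky * Ky).sum())

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 32))                  # layer-1 activations
    Y = np.tanh(X @ rng.normal(size=(32, 16)))      # layer-2 activations
    print(normalized_hsic(X, Y))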
Position Coupling: Leveraging Task Structure for Improved Length Generalization of Transformers
Even for simple arithmetic tasks like integer addition, it is challenging for Transformers to generalize to longer sequences than those encountered during training. To tackle this problem, we propose position coupling, a simple yet effective method that directly embeds the structure of the tasks into the positional encoding of a (decoder-only) Transformer. Taking a departure from the vanilla absolute position mechanism assigning unique position IDs to each of the tokens, we assign the same position IDs to two or more "relevant" tokens; for integer addition tasks, we regard digits of the same significance as in the same position. On the empirical side, we show that with the proposed position coupling, a small (1-layer) Transformer trained on 1 to 30-digit additions can generalize up to 200-digit additions (6.67x of the trained length). On the theoretical side, we prove that a 1-layer Transformer with coupled positions can solve the addition task involving exponentially many digits, whereas any 1-layer Transformer without positional information cannot entirely solve it. We also demonstrate that position coupling can be applied to other algorithmic tasks such as addition with multiple summands, Nx2 multiplication, copy/reverse, and a two-dimensional task.
Updated: 2024-05-31 08:13:35
标题: 位置耦合:利用任务结构改进变压器的长度泛化
摘要: 即使对于像整数加法这样简单的算术任务,Transformer也难以泛化到比训练时遇到的更长的序列。为了解决这个问题,我们提出了位置耦合,这是一种简单而有效的方法,直接将任务的结构嵌入到(仅解码器)Transformer的位置编码中。与将唯一位置ID分配给每个令牌的传统绝对位置机制不同,我们将相同位置ID分配给两个或更多“相关”的令牌;对于整数加法任务,我们将处于相同数位的数字视为处于相同的位置。在实证方面,我们展示了借助所提出的位置耦合,一个小型(1层)Transformer在1到30位加法训练后能够泛化到200位加法(训练长度的6.67倍)。在理论方面,我们证明了具有耦合位置的1层Transformer可以解决涉及指数数量位数的加法任务,而没有位置信息的任何1层Transformer不能完全解决这个问题。我们还证明了位置耦合可以应用于其他算法任务,如多个加数的加法、Nx2乘法、复制/反转以及二维任务。
更新时间: 2024-05-31 08:13:35
领域: cs.LG,cs.AI,cs.CL
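The coupling itself is just a position-ID assignment rule. For integer addition, digits of equal significance in the two operands receive the same ID, as in the sketch below; the exact layout (how operators and padding are numbered) is my guess at one reasonable scheme, not necessarily the paper's.

    def coupled_position_ids(a: str, b: str):
        """Position IDs for the sequence 'a + b =' where digits of equal
        significance in the two operands share a position ID."""
        tokens, pos = [], []
        width = max(len(a), len(b))
        for i, ch in enumerate(a):
            tokens.append(ch)
            pos.append(width - len(a) + i + 1)   # 1 = most significant slot
        tokens.append('+'); pos.append(0)        # operator gets its own ID
        for i, ch in enumerate(b):
            tokens.append(ch)
            pos.append(width - len(b) + i + 1)   # couples with a's digits
        tokens.append('='); pos.append(0)
        return tokens, pos

    # For '53' and '178', the '5' and '7' (both tens digits) share ID 2.
    print(coupled_position_ids('53', '178'))

Because the IDs encode digit significance rather than absolute offsets, the same rule extends unchanged to operands far longer than any seen in training, which is the intuition behind the reported length generalization.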
Two Optimizers Are Better Than One: LLM Catalyst for Enhancing Gradient-Based Optimization
Learning a skill generally relies on both practical experience by doer and insightful high-level guidance by instructor. Will this strategy also work well for solving complex non-convex optimization problems? Here, a common gradient-based optimizer acts like a disciplined doer, making locally optimal update at each step. Recent methods utilize large language models (LLMs) to optimize solutions for concrete problems by inferring from natural language instructions, akin to a high-level instructor. In this paper, we show that these two optimizers are complementary to each other, suggesting a collaborative optimization approach. The gradient-based optimizer and LLM-based optimizer are combined in an interleaved manner. We instruct LLMs using task descriptions and timely optimization trajectories recorded during gradient-based optimization. Inferred results from LLMs are used as restarting points for the next stage of gradient optimization. By leveraging both the locally rigorous gradient-based optimizer and the high-level deductive LLM-based optimizer, our combined optimization method consistently yields improvements over competitive baseline prompt tuning methods. Our results demonstrate the synergistic effect of conventional gradient-based optimization and the inference ability of LLMs. The code is released at https://github.com/guozix/LLM-catalyst.
Updated: 2024-05-31 08:13:34
标题: 两个优化器胜过一个:LLM催化剂用于增强基于梯度的优化
摘要: 学习一项技能通常依赖于实践者的实际经验和教练的深刻高水平指导。这种策略是否也适用于解决复杂的非凸优化问题?在这里,一个常见的基于梯度的优化器就像一个有纪律的实践者,在每一步都做出局部最优的更新。最近的方法利用大型语言模型(LLMs)通过从自然语言指令推断来优化具体问题的解决方案,类似于高水平的教练。在本文中,我们展示了这两种优化器彼此互补,建议采用协作优化方法。基于梯度的优化器和基于LLM的优化器以交错的方式结合在一起。我们使用任务描述和在基于梯度的优化过程中记录的及时优化轨迹来指导LLMs。LLMs推断出的结果被用作下一阶段基于梯度的优化的重新启动点。通过利用局部严格的基于梯度的优化器和基于LLM的高层次演绎优化器,我们的组合优化方法始终比竞争基线提示调整方法产生改进。我们的结果表明了传统基于梯度的优化和LLMs的推理能力之间的协同效应。代码已发布在https://github.com/guozix/LLM-catalyst。
更新时间: 2024-05-31 08:13:34
领域: cs.CV,cs.CL,cs.LG
Exploratory Machine Learning with Unknown Unknowns
In conventional supervised learning, a training dataset is given with ground-truth labels from a known label set, and the learned model will classify unseen instances to known labels. This paper studies a new problem setting in which there are unknown classes in the training data misperceived as other labels, and thus their existence appears unknown from the given supervision. We attribute the unknown unknowns to the fact that the training dataset is badly advised by the incompletely perceived label space due to the insufficient feature information. To this end, we propose the exploratory machine learning, which examines and investigates training data by actively augmenting the feature space to discover potentially hidden classes. Our method consists of three ingredients including rejection model, feature exploration, and model cascade. We provide theoretical analysis to justify its superiority, and validate the effectiveness on both synthetic and real datasets.
Updated: 2024-05-31 08:11:57
标题: 探索性机器学习中的未知未知因素
摘要: 在传统的监督学习中,训练数据集带有来自已知标签集的真实标签,学习到的模型会将未见过的实例分类到已知标签中。本文研究了一个新的问题设置,即在训练数据中存在被误认为是其他标签的未知类别,因此从给定的监督来看,它们的存在是未知的。我们将这种未知的未知归因于训练数据集受到不完全感知的标签空间的误导,而这是由特征信息不足导致的。为此,我们提出了探索性机器学习,通过主动扩充特征空间来检查和调查训练数据,以发现潜在的隐藏类别。我们的方法包括拒绝模型、特征探索和模型级联这三个要素。我们提供理论分析来证明其优越性,并在合成和真实数据集上验证了其有效性。
更新时间: 2024-05-31 08:11:57
领域: cs.LG,cs.AI,stat.ML
Improving Paratope and Epitope Prediction by Multi-Modal Contrastive Learning and Interaction Informativeness Estimation
Accurately predicting antibody-antigen binding residues, i.e., paratopes and epitopes, is crucial in antibody design. However, existing methods solely focus on uni-modal data (either sequence or structure), disregarding the complementary information present in multi-modal data, and most methods predict paratopes and epitopes separately, overlooking their specific spatial interactions. In this paper, we propose a novel Multi-modal contrastive learning and Interaction informativeness estimation-based method for Paratope and Epitope prediction, named MIPE, by using both sequence and structure data of antibodies and antigens. MIPE implements a multi-modal contrastive learning strategy, which maximizes representations of binding and non-binding residues within each modality and meanwhile aligns uni-modal representations towards effective modal representations. To exploit the spatial interaction information, MIPE also incorporates an interaction informativeness estimation that computes the estimated interaction matrices between antibodies and antigens, thereby approximating them to the actual ones. Extensive experiments demonstrate the superiority of our method compared to baselines. Additionally, the ablation studies and visualizations demonstrate the superiority of MIPE owing to the better representations acquired through multi-modal contrastive learning and the interaction patterns comprehended by the interaction informativeness estimation.
Updated: 2024-05-31 08:09:36
标题: 通过多模态对比学习和交互信息量估计改进互补位和表位预测
摘要: 精确预测抗体-抗原结合残基,即互补位(paratope)和表位(epitope),在抗体设计中至关重要。然而,现有方法仅专注于单模态数据(序列或结构),忽略了多模态数据中存在的互补信息,并且大多数方法分别预测互补位和表位,忽视了它们特定的空间相互作用。在本文中,我们提出了一种基于多模态对比学习和交互信息量估计的互补位与表位预测方法,命名为MIPE,同时使用抗体和抗原的序列和结构数据。MIPE实现了多模态对比学习策略,最大化每种模态中结合和非结合残基的表示,并同时将单模态表示对齐到有效的模态表示。为了利用空间相互作用信息,MIPE还融入了一个交互信息量估计,计算抗体和抗原之间的估计交互矩阵,从而将它们近似为实际值。大量实验证明了我们的方法相对于基线的优越性。此外,消融研究和可视化展示了MIPE的优越性,这归功于通过多模态对比学习获得的更好表示和通过交互信息量估计理解的交互模式。
更新时间: 2024-05-31 08:09:36
领域: q-bio.BM,cs.LG,q-bio.QM
Weak Robust Compatibility Between Learning Algorithms and Counterfactual Explanation Generation Algorithms
Counterfactual explanation generation is a powerful method for Explainable Artificial Intelligence. It can help users understand why machine learning models make specific decisions, and how to change those decisions. Evaluating the robustness of counterfactual explanation algorithms is therefore crucial. Previous literature has widely studied the robustness based on the perturbation of input instances. However, the robustness defined from the perspective of perturbed instances is sometimes biased, because this definition ignores the impact of learning algorithms on robustness. In this paper, we propose a more reasonable definition, Weak Robust Compatibility, based on the perspective of explanation strength. In practice, we propose WRC-Test to help us generate more robust counterfactuals. Meanwhile, we designed experiments to verify the effectiveness of WRC-Test. Theoretically, we introduce the concepts of PAC learning theory and define the concept of PAC WRC-Approximability. Based on reasonable assumptions, we establish oracle inequalities about weak robustness, which gives a sufficient condition for PAC WRC-Approximability.
Updated: 2024-05-31 08:03:52
标题: 学习算法与反事实解释生成算法之间的弱稳健兼容性
摘要: 反事实解释生成是可解释人工智能的强大方法。它可以帮助用户了解机器学习模型为何做出特定决策以及如何改变这些决策。因此,评估反事实解释算法的鲁棒性至关重要。先前的文献广泛研究了基于输入实例扰动的鲁棒性。然而,从扰动实例的角度定义的鲁棒性有时会存在偏见,因为这种定义忽略了学习算法对鲁棒性的影响。在本文中,我们提出了一个更为合理的定义,即弱鲁棒兼容性,基于解释强度的视角。在实践中,我们提出了WRC-Test来帮助我们生成更为稳健的反事实。同时,我们设计了实验来验证WRC-Test的有效性。从理论上讲,我们引入了PAC学习理论的概念,并定义了PAC WRC-可近似性的概念。基于合理的假设,我们建立了关于弱鲁棒性的Oracle不等式,为PAC WRC-可近似性提供了充分条件。
更新时间: 2024-05-31 08:03:52
领域: cs.LG
Surge Phenomenon in Optimal Learning Rate and Batch Size Scaling
In current deep learning tasks, Adam style optimizers such as Adam, Adagrad, RMSProp, Adafactor, and Lion have been widely used as alternatives to SGD style optimizers. These optimizers typically update model parameters using the sign of gradients, resulting in more stable convergence curves. The learning rate and the batch size are the most critical hyperparameters for optimizers, which require careful tuning to enable effective convergence. Previous research has shown that, for SGD style optimizers, the optimal learning rate increases linearly with batch size or follows similar rules. However, this conclusion is not applicable to Adam style optimizers. In this paper, we elucidate the connection between optimal learning rates and batch sizes for Adam style optimizers through both theoretical analysis and extensive experiments. First, we establish the scaling law between batch size and optimal learning rate in the sign-of-gradient case, in which we prove that the optimal learning rate first rises and then falls as the batch size increases. Moreover, the peak value of the surge gradually moves toward larger batch sizes as training progresses. Second, we conducted experiments on various CV and NLP tasks and verified the correctness of the scaling law.
Updated: 2024-05-31 08:01:56
标题: 学习率和批量大小缩放中的激增现象
摘要: 在当前的深度学习任务中,Adam风格的优化器,如Adam、Adagrad、RMSProp、Adafactor和Lion,已被广泛用作SGD风格优化器的替代品。这些优化器通常使用梯度的符号来更新模型参数,从而产生更稳定的收敛曲线。学习率和批大小是优化器最关键的超参数,需要仔细调整以实现有效的收敛。先前的研究表明,对于SGD风格的优化器,最佳学习率会随批大小线性增加或遵循类似的规律。然而,这个结论并不适用于Adam风格的优化器。在本文中,我们通过理论分析和广泛的实验阐明了Adam风格优化器的最佳学习率和批大小之间的联系。首先,在梯度符号情况下,我们建立了批大小和最佳学习率之间的缩放规律,证明了随着批大小的增加,最佳学习率会先上升然后下降。此外,激增的峰值将随着训练的进行逐渐向较大的批大小移动。其次,我们在各种CV和NLP任务上进行了实验,验证了缩放规律的正确性。
更新时间: 2024-05-31 08:01:56
领域: cs.LG
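The rise-then-fall claim is straightforward to probe empirically. Below is a toy harness: sign-based updates on a noisy quadratic, sweeping learning rates at several batch sizes and reporting the best learning rate per batch size. This is only a probe under stated assumptions (gradient noise shrinking as 1/sqrt(batch)), not the paper's theoretical model, and all names and constants are illustrative.

    import numpy as np

    def loss_after_training(lr, batch, steps=200, dim=20, noise=1.0, seed=0):
        """Minimize 0.5*||w||^2 using sign updates on noisy gradient
        samples g = w + noise/sqrt(batch) (Adam-style sign behavior)."""
        rng = np.random.default_rng(seed)
        w = np.ones(dim)
        for _ in range(steps):
            g = w + noise / np.sqrt(batch) * rng.normal(size=dim)
            w -= lr * np.sign(g)
        return 0.5 * (w ** 2).sum()

    lrs = np.logspace(-4, -1, 25)
    for batch in [1, 8, 64, 512, 4096]:
        best = min(lrs, key=lambda lr: loss_after_training(lr, batch))
        print(f'batch={batch:5d}  best lr={best:.4f}')

Whether the surge appears in such a toy depends on the noise scale and horizon; the point of the harness is the sweep structure one would use to observe it, mirroring the paper's empirical verification.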
When MOE Meets LLMs: Parameter Efficient Fine-tuning for Multi-task Medical Applications
The recent surge in Large Language Models (LLMs) has garnered significant attention across numerous fields. Fine-tuning is often required to fit general LLMs for a specific domain, like the web-based healthcare system. However, two problems arise during fine-tuning LLMs for medical applications. One is the task variety problem, which involves distinct tasks in real-world medical scenarios. This variety often leads to sub-optimal fine-tuning due to data imbalance and seesaw problems. Besides, the large number of parameters in LLMs leads to huge time and computation consumption during fine-tuning. To address these two problems, we propose a novel parameter efficient fine-tuning framework for multi-task medical applications, dubbed as MOELoRA. The designed framework aims to absorb both the benefits of mixture-of-expert (MOE) for multi-task learning and low-rank adaptation (LoRA) for parameter efficient fine-tuning. For unifying MOE and LoRA, we devise multiple experts as the trainable parameters, where each expert consists of a pair of low-rank matrices to retain the small size of trainable parameters. Then, a task-motivated gate function for all MOELoRA layers is proposed, which can control the contributions of each expert and produce distinct parameters for various tasks. We conduct experiments on a multi-task medical dataset, indicating MOELoRA outperforms the existing parameter efficient fine-tuning methods. The code is available online.
Updated: 2024-05-31 07:56:08
标题: 当MOE遇见LLMs:多任务医疗应用的参数高效微调
摘要: 最近大规模语言模型(LLMs)的激增引起了各个领域的重视。通常需要微调以使通用LLMs适应特定领域,比如基于网络的医疗系统。然而,在为医疗应用微调LLMs时会出现两个问题。一个是任务多样性问题,涉及现实世界医疗场景中的不同任务。这种多样性常因数据不平衡和跷跷板问题而导致次优的微调效果。此外,LLMs中的大量参数导致微调时时间和计算消耗巨大。为解决这两个问题,我们提出了一种新颖的参数高效微调框架,命名为MOELoRA,用于多任务医疗应用。设计的框架旨在吸收混合专家(MOE)多任务学习和低秩适应(LoRA)参数高效微调的优点。为了统一MOE和LoRA,我们设计多个专家作为可训练参数,其中每个专家包含一对低秩矩阵,以保持可训练参数的小尺寸。然后,提出了一个针对所有MOELoRA层的任务驱动门控函数,可以控制每个专家的贡献并为不同任务产生不同的参数。我们在一个多任务医疗数据集上进行实验,表明MOELoRA优于现有的参数高效微调方法。代码已在线上提供。
更新时间: 2024-05-31 07:56:08
领域: cs.CL,cs.AI
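The architecture is concise enough to sketch: a frozen base linear layer, several LoRA experts (each a pair of low-rank matrices), and a gate driven by a task embedding. All shapes, names, and the gating form below are illustrative assumptions; a faithful version would follow the authors' released code.

    import torch

    class MOELoRALinear(torch.nn.Module):
        """Frozen base layer plus a task-gated mixture of LoRA experts."""
        def __init__(self, d_in, d_out, n_experts=4, rank=8, n_tasks=3):
            super().__init__()
            self.base = torch.nn.Linear(d_in, d_out)
            for p in self.base.parameters():
                p.requires_grad_(False)                 # backbone stays frozen
            self.A = torch.nn.Parameter(torch.randn(n_experts, rank, d_in) * 0.01)
            self.B = torch.nn.Parameter(torch.zeros(n_experts, d_out, rank))
            self.task_emb = torch.nn.Embedding(n_tasks, 16)
            self.gate = torch.nn.Linear(16, n_experts)  # task-motivated gate

        def forward(self, x, task_id):                  # x: (batch, d_in)
            w = torch.softmax(self.gate(self.task_emb(task_id)), dim=-1)
            down = torch.einsum('bd,erd->ber', x, self.A)   # per-expert code
            up = torch.einsum('ber,eor->beo', down, self.B) # back to d_out
            return self.base(x) + torch.einsum('be,beo->bo', w, up)

    layer = MOELoRALinear(64, 64)
    y = layer(torch.randn(2, 64), torch.tensor([0, 2]))  # two tasks, one pass

Because only A, B, the gate, and the task embeddings are trainable, the per-task parameter cost stays small while the gate lets different tasks draw on different expert mixtures.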
Automatic Counting and Classification of Mosquito Eggs in Field Traps
The analysis of the field traps where the mosquitoes lay their eggs is vital to check that the sterile insect technique (SIT) is working properly. This is because the number of hatched eggs may indicate that the sterile males are not competing with the wild ones. Nowadays, the study of the traps is done manually under a microscope and is very time-consuming and prone to human error. This paper presents an automatic trap survey. For this purpose, a device has been designed that automatically scans the slat, obtaining different overlapping photos. Subsequently, the images are analyzed by a Mask-RCNN neural network that segments the eggs and classifies them into two classes: full or hatched.
Updated: 2024-05-31 07:48:48
标题: 野外陷阱中蚊卵的自动计数和分类
摘要: 对蚊子产卵的野外陷阱进行分析,对于检查昆虫不育技术(SIT)是否正常运作至关重要。这是因为孵化卵的数量可能表明绝育雄性未能与野生雄性竞争。目前,对陷阱的研究是通过显微镜手动完成的,非常耗时且容易出现人为错误。本文提出了一种自动陷阱调查方法。为此,设计了一种自动扫描板条的设备,获取不同重叠的照片。随后,这些图像由一个Mask-RCNN神经网络进行分析,将卵分割并分类为两类:完整或已孵化。
更新时间: 2024-05-31 07:48:48
领域: cs.AI
Robust Entropy Search for Safe Efficient Bayesian Optimization
The practical use of Bayesian Optimization (BO) in engineering applications imposes special requirements: high sampling efficiency on the one hand and finding a robust solution on the other hand. We address the case of adversarial robustness, where all parameters are controllable during the optimization process, but a subset of them is uncontrollable or even adversely perturbed at the time of application. To this end, we develop an efficient information-based acquisition function that we call Robust Entropy Search (RES). We empirically demonstrate its benefits in experiments on synthetic and real-life data. The results show that RES reliably finds robust optima, outperforming state-of-the-art algorithms.
Updated: 2024-05-31 07:45:53
标题: 稳健熵搜索用于安全高效的贝叶斯优化
摘要: 在工程应用中,贝叶斯优化(BO)的实际应用提出了特殊要求:一方面需要高效的采样效率,另一方面需要找到稳健的解决方案。我们讨论了对抗鲁棒性的情况,即在优化过程中所有参数都是可控的,但其中一部分参数在应用时是不可控的,甚至被故意扰乱。为此,我们开发了一种高效的基于信息的获取函数,称为Robust Entropy Search(RES)。我们在合成数据和真实数据上的实验中证明了它的优势。结果显示,RES能可靠地找到稳健的最优解,胜过目前最先进的算法。
更新时间: 2024-05-31 07:45:53
领域: cs.LG,stat.ML
Systematic Solutions to Login and Authentication Security Problems: A Dual-Password Login-Authentication Mechanism
Credential theft and remote attacks are the most serious threats to user authentication mechanisms. The crux of these problems is that we cannot control such behaviors. However, if a password does not contain user secrets, stealing it is useless. If unauthorized inputs are invalidated, remote attacks can be disabled. Thus, credential secrets and account input fields can be controlled. Rather than encrypting passwords, we design a dual-password login-authentication mechanism, where a user-selected secret-free login password is converted into an untypable authentication password. Subsequently, the authenticatable functionality of the login password and the typable functionality of the authentication password can be disabled or invalidated to prevent credential theft and remote attacks. Thus, the usability-security tradeoff and password reuse issues are resolved; local authentication password storage is no longer necessary. More importantly, the password converter acts as an open hashing algorithm, meaning that its intermediate elements can be used to define a truly unique identity for the login process to implement a novel dual-identity authentication scheme. In particular, the system-managed elements are concealed, inaccessible, and independent of any personal information and therefore can be used to define a perfect unforgeable process identifier to identify unauthorized inputs.
Updated: 2024-05-31 07:45:35
标题: 系统性解决登录和身份验证安全问题:双密码登录验证机制
摘要: 凭证盗窃和远程攻击是对用户认证机制最严重的威胁。这些问题的关键在于我们无法控制这些行为。然而,如果一个密码不包含用户的秘密,那么窃取它是无用的。如果未经授权的输入被作废,远程攻击就可以被禁用。因此,凭证秘密和账户输入字段可以被控制。与其加密密码,我们设计了一个双密码登录认证机制,用户选择的无秘密登录密码被转换成无法输入的认证密码。随后,登录密码的可认证功能和认证密码的可输入功能可以被禁用或作废,以防止凭证盗窃和远程攻击。因此,可用性和安全性的平衡以及密码重用问题得到解决;本地认证密码存储不再必要。更重要的是,密码转换器充当一个开放的哈希算法,这意味着其中间元素可以用来为登录过程定义一个真正独特的身份,以实现一种新颖的双身份认证方案。特别地,系统管理的元素被隐藏、无法访问,并且独立于任何个人信息,因此可以用来定义一个完美不可伪造的过程标识符,用于识别未经授权的输入。
更新时间: 2024-05-31 07:45:35
领域: cs.CR,cs.ET,cs.SY,eess.SY
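The conversion step, turning a typable login password into an "untypable" authentication credential, can be pictured with a generic key-derivation function standing in for the paper's converter. The paper's converter is an open hashing algorithm whose intermediate elements also serve as a process identifier; none of that extra machinery is reproduced here, and the function name, salt handling, and iteration count are my assumptions.

    import hashlib, os

    def derive_auth_password(login_password: str, user_salt: bytes) -> bytes:
        """One-way conversion of a user-chosen login password into a
        high-entropy credential made of raw bytes outside any keyboard
        alphabet, so the result cannot be typed into a login field."""
        return hashlib.pbkdf2_hmac('sha256', login_password.encode(),
                                   user_salt, 200_000)

    salt = os.urandom(16)                   # stored per account, not secret
    auth_pw = derive_auth_password('correct horse battery staple', salt)
    # Only auth_pw is ever presented for authentication; the typable login
    # password itself never leaves the converter.
    print(auth_pw.hex())

The "untypable" property is what lets the scheme invalidate the typable functionality of the authentication password: since it contains bytes no keyboard input can produce, a remote attacker cannot submit it through the normal input field.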
Enhancing Jailbreak Attack Against Large Language Models through Silent Tokens
Along with the remarkable successes of large language models (LLMs), recent research has also started to explore their security threats, including jailbreaking attacks. Attackers carefully craft jailbreaking prompts such that a target LLM will respond to the harmful question. Existing jailbreaking attacks require either human experts or leveraging complicated algorithms to craft jailbreaking prompts. In this paper, we introduce BOOST, a simple attack that leverages only the eos tokens. We demonstrate that rather than constructing complicated jailbreaking prompts, the attacker can simply append a few eos tokens to the end of a harmful question. It will bypass the safety alignment of LLMs and lead to successful jailbreaking attacks. We further apply BOOST to four representative jailbreak methods and show that the attack success rates of these methods can be significantly enhanced by simply adding eos tokens to the prompt. To understand this simple but novel phenomenon, we conduct empirical analyses. Our analysis reveals that adding eos tokens makes the target LLM believe the input is much less harmful, and eos tokens have low attention values and do not affect the LLM's understanding of the harmful questions, leading the model to actually respond to the questions. Our findings uncover how fragile an LLM is against jailbreak attacks, motivating the development of strong safety alignment approaches.
Updated: 2024-05-31 07:41:03
标题: 通过无声令牌增强对大型语言模型的越狱攻击
摘要: 随着语言模型取得的显著成功,最近的研究也开始探讨LLMs的安全威胁,包括越狱攻击。攻击者精心制作越狱提示,以使目标LLM对有害问题作出回应。现有的越狱攻击要求要么借助人类专家,要么利用复杂的算法来构建越狱提示。在本文中,我们介绍了BOOST,一种仅利用eos令牌的简单攻击。我们证明,攻击者可以简单地将几个eos令牌附加到有害问题的末尾,而不是构造复杂的越狱提示。这将绕过LLMs的安全对齐,并导致成功的越狱攻击。我们进一步将BOOST应用于四种代表性的越狱方法,并展示了通过简单添加eos令牌到提示可以显著增强这些方法的攻击成功率。为了理解这一简单但新颖的现象,我们进行了实证分析。我们的分析揭示了添加eos令牌使目标LLM相信输入的危害要小得多,而eos令牌具有较低的注意力值,不影响LLM对有害问题的理解,导致模型实际上回应了这些问题。我们的发现揭示了LLM对越狱攻击的脆弱性,促使强大的安全对齐方法的发展。
更新时间: 2024-05-31 07:41:03
领域: cs.AI
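The mechanics of the attack are literally a string append. The sketch below uses the real Hugging Face tokenizer API to fetch a model's eos token; the model choice, the number of appended tokens, and the benign example prompt are arbitrary illustrations, and running it requires the tokenizer files locally or network access.

    from transformers import AutoTokenizer

    tok = AutoTokenizer.from_pretrained('gpt2')

    def boost_prompt(prompt: str, k: int = 5) -> str:
        # BOOST's entire transformation: append k eos tokens to the prompt
        return prompt + tok.eos_token * k

    print(repr(boost_prompt('How do I pick a good password?')))

That such a trivial transformation measurably shifts a model's safety behavior is the paper's point: safety alignment that can be weakened by padding tokens is brittle by construction.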
NoiseBoost: Alleviating Hallucination with Noise Perturbation for Multimodal Large Language Models
Multimodal large language models (MLLMs) contribute a powerful mechanism to understanding visual information building on large language models. However, MLLMs are notorious for suffering from hallucinations, especially when generating lengthy, detailed descriptions for images. Our analysis reveals that hallucinations stem from the inherent summarization mechanism of large language models, leading to excessive dependence on linguistic tokens while neglecting vision information. In this paper, we propose NoiseBoost, a broadly applicable and simple method for alleviating hallucinations for MLLMs through the integration of noise feature perturbations. Noise perturbation acts as a regularizer, facilitating a balanced distribution of attention weights among visual and linguistic tokens. Despite its simplicity, NoiseBoost consistently enhances the performance of MLLMs across common training strategies, including supervised fine-tuning and reinforcement learning. Further, NoiseBoost is the first to enable semi-supervised learning for MLLMs, unleashing the power of unlabeled data. Comprehensive experiments demonstrate that NoiseBoost improves dense caption accuracy by 8.1% with human evaluation and achieves comparable results with 50% of the data by mining unlabeled data. Code and models are available at https://kaiwu5.github.io/noiseboost.
Updated: 2024-05-31 07:40:04
标题: NoiseBoost: 使用噪声扰动缓解多模态大型语言模型的幻觉
摘要: 多模态大型语言模型(MLLMs)为理解视觉信息提供了强大的机制,建立在大型语言模型的基础上。然而,MLLMs 因产生幻觉而臭名昭著,尤其是在为图像生成冗长、详细描述时。我们的分析表明,幻觉源于大型语言模型固有的总结机制,导致过度依赖语言标记而忽视视觉信息。在本文中,我们提出了 NoiseBoost,这是一种广泛适用且简单的方法,通过集成噪声特征扰动来缓解 MLLMs 的幻觉。噪声扰动充当正则化器,促进视觉和语言标记之间的注意权重的平衡分布。尽管它简单,但 NoiseBoost 在常见的训练策略下,包括监督微调和强化学习,始终提高了 MLLMs 的性能。此外,NoiseBoost 首创地为 MLLMs 启用了半监督学习,释放了未标记数据的力量。全面的实验表明,NoiseBoost 在人类评估中将稠密标题的准确性提高了 8.1%,并通过挖掘未标记数据在 50% 的数据情况下取得了可比较的结果。代码和模型可在 https://kaiwu5.github.io/noiseboost 上获得。
更新时间: 2024-05-31 07:40:04
领域: cs.CV,cs.AI
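The perturbation itself is essentially a one-liner at training time: add noise to the visual tokens before they reach the language model so it cannot lean solely on linguistic tokens. The scale alpha, the std-based scaling, and the training flag below are assumptions, not the paper's reported settings.

    import torch

    def noise_boost(visual_feats: torch.Tensor, alpha: float = 0.1,
                    training: bool = True) -> torch.Tensor:
        """Perturb visual tokens with Gaussian noise scaled to their spread;
        applied only during training, a sketch of the NoiseBoost idea."""
        if not training:
            return visual_feats
        noise = torch.randn_like(visual_feats)
        return visual_feats + alpha * visual_feats.std() * noise

    tokens = torch.randn(1, 256, 1024)       # e.g., vision-encoder outputs
    perturbed = noise_boost(tokens)

Acting as a regularizer, the noise makes the visual channel less predictable, which (per the abstract) rebalances attention between visual and linguistic tokens rather than changing the model architecture.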
SecureBoost+ : A High Performance Gradient Boosting Tree Framework for Large Scale Vertical Federated Learning
Gradient boosting decision tree (GBDT) is a widely used ensemble algorithm in the industry. Its vertical federated learning version, SecureBoost, is one of the most popular algorithms used in cross-silo privacy-preserving modeling. As the area of privacy computation thrives in recent years, demands for large-scale and high-performance federated learning have grown dramatically in real-world applications. In this paper, to fulfill these requirements, we propose SecureBoost+ that is both novel and improved from the prior work SecureBoost. SecureBoost+ integrates several ciphertext calculation optimizations and engineering optimizations. The experimental results demonstrate that Secureboost+ has significant performance improvements on large and high dimensional data sets compared to SecureBoost. It makes effective and efficient large-scale vertical federated learning possible.
Updated: 2024-05-31 07:39:50
标题: SecureBoost+:一种用于大规模垂直联邦学习的高性能梯度提升树框架
摘要: 梯度提升决策树(GBDT)是工业界广泛使用的集成算法。其垂直联邦学习版本SecureBoost是跨界隐私保护建模中最受欢迎的算法之一。随着隐私计算领域近年来的蓬勃发展,对大规模和高性能联邦学习的需求在实际应用中急剧增长。在本文中,为了满足这些要求,我们提出了SecureBoost+,这是一种新颖且改进自之前工作SecureBoost的算法。SecureBoost+集成了几种密文计算优化和工程优化。实验结果表明,与SecureBoost相比,SecureBoost+在大规模和高维数据集上有显著的性能改进。它使有效且高效的大规模垂直联邦学习成为可能。
更新时间: 2024-05-31 07:39:50
领域: cs.LG,cs.AI
Sign is Not a Remedy: Multiset-to-Multiset Message Passing for Learning on Heterophilic Graphs
Graph Neural Networks (GNNs) have gained significant attention as a powerful modeling and inference method, especially for homophilic graph-structured data. To empower GNNs in heterophilic graphs, where adjacent nodes exhibit dissimilar labels or features, Signed Message Passing (SMP) has been widely adopted. However, there is a lack of theoretical and empirical analysis regarding the limitations of SMP. In this work, we unveil some potential pitfalls of SMP and their remedies. We first identify two limitations of SMP: undesirable representation update for multi-hop neighbors and vulnerability against oversmoothing issues. To overcome these challenges, we propose a novel message passing function called Multiset to Multiset GNN (M2M-GNN). Our theoretical analyses and extensive experiments demonstrate that M2M-GNN effectively alleviates the aforementioned limitations of SMP, yielding superior performance in comparison with existing SMP-based methods.
Updated: 2024-05-31 07:39:22
标题: 符号并非良方:用于异质图学习的多集合到多集合消息传递
摘要: 图神经网络(GNNs)作为一种强大的建模和推理方法,尤其适用于同质图结构数据,已经引起了广泛关注。为了在异质图中增强GNNs的性能,即相邻节点具有不同标签或特征的情况下,已经广泛采用了带符号的消息传递(SMP)。然而,关于SMP的局限性缺乏理论和实证分析。在这项工作中,我们揭示了SMP的一些潜在缺陷及其解决方法。我们首先确定了SMP的两个局限性:对多跳邻居的不良表示更新和对过度平滑问题的脆弱性。为了克服这些挑战,我们提出了一种名为Multiset to Multiset GNN(M2M-GNN)的新型消息传递函数。我们的理论分析和大量实验证明,M2M-GNN有效地缓解了SMP的前述局限性,表现出更卓越的性能。
更新时间: 2024-05-31 07:39:22
领域: cs.LG
Reward-based Input Construction for Cross-document Relation Extraction
Relation extraction (RE) is a fundamental task in natural language processing, aiming to identify relations between target entities in text. While many RE methods are designed for a single sentence or document, cross-document RE has emerged to address relations across multiple long documents. Given the nature of long documents in cross-document RE, extracting document embeddings is challenging due to the length constraints of pre-trained language models. Therefore, we propose REward-based Input Construction (REIC), the first learning-based sentence selector for cross-document RE. REIC extracts sentences based on relational evidence, enabling the RE module to effectively infer relations. Since supervision of evidence sentences is generally unavailable, we train REIC using reinforcement learning with RE prediction scores as rewards. Experimental results demonstrate the superiority of our method over heuristic methods for different RE structures and backbones in cross-document RE. Our code is publicly available at https://github.com/aailabkaist/REIC.
Updated: 2024-05-31 07:30:34
标题: 基于奖励的输入构造用于跨文档关系抽取
摘要: 关系抽取(RE)是自然语言处理中的一个基本任务,旨在识别文本中目标实体之间的关系。虽然许多关系抽取方法设计用于单个句子或文档,但跨文档关系抽取已经出现,以解决跨多个长文档的关系。鉴于跨文档关系抽取中长文档的特性,提取文档嵌入是具有挑战性的,因为预先训练的语言模型存在长度约束。因此,我们提出了基于REward的输入构造(REIC),这是用于跨文档关系抽取的第一个基于学习的句子选择器。REIC基于关联证据提取句子,使得RE模块能够有效地推断关系。由于证据句子的监督通常是不可用的,我们使用强化学习以RE预测分数作为奖励来训练REIC。实验结果表明,我们的方法在跨文档关系抽取中的不同RE结构和骨干上优于启发式方法。我们的代码可以公开获得,链接为https://github.com/aailabkaist/REIC。
更新时间: 2024-05-31 07:30:34
领域: cs.CL,cs.LG
Shotluck Holmes: A Family of Efficient Small-Scale Large Language Vision Models For Video Captioning and Summarization
Video is an increasingly prominent and information-dense medium, yet it poses substantial challenges for language models. A typical video consists of a sequence of shorter segments, or shots, that collectively form a coherent narrative. Each shot is analogous to a word in a sentence where multiple data streams of information (such as visual and auditory data) must be processed simultaneously. Comprehension of the entire video requires not only understanding the visual-audio information of each shot but also requires that the model links the ideas between each shot to generate a larger, all-encompassing story. Despite significant progress in the field, current works often overlook videos' more granular shot-by-shot semantic information. In this project, we propose a family of efficient large language vision models (LLVMs) to boost video summarization and captioning called Shotluck Holmes. By leveraging better pretraining and data collection strategies, we extend the abilities of existing small LLVMs from being able to understand a picture to being able to understand a sequence of frames. Specifically, we show that Shotluck Holmes achieves better performance than state-of-the-art results on the Shot2Story video captioning and summary task with significantly smaller and more computationally efficient models.
Updated: 2024-05-31 07:30:24
标题: Shotluck Holmes:一系列用于视频字幕和摘要的高效小规模大型语言视觉模型
摘要: 视频是一种越来越突出且信息密集的媒介,但对于语言模型而言却带来了重大挑战。一个典型的视频由一系列较短的片段或镜头组成,这些片段共同构成一个连贯的叙事。每个镜头类似于句子中的一个单词,其中必须同时处理多个数据流的信息(如视觉和听觉数据)。理解整个视频不仅需要理解每个镜头的视听信息,还需要模型将各个镜头之间的思想联系起来,生成一个更大、全面的故事。尽管该领域取得了显著进展,但目前的作品往往忽视了视频更细粒度的逐镜头语义信息。在这个项目中,我们提出了一系列高效的大型语言视觉模型(LLVMs),以促进名为Shotluck Holmes的视频摘要和字幕生成。通过利用更好的预训练和数据收集策略,我们将现有小型LLVMs的能力从理解一幅图片扩展到理解一系列帧。具体而言,我们展示了Shotluck Holmes在Shot2Story视频字幕和摘要任务上比最先进的结果表现更好,而且模型更小、计算效率更高。
更新时间: 2024-05-31 07:30:24
领域: cs.CV,cs.CL,cs.LG
An Efficient and Multi-private Key Secure Aggregation for Federated Learning
With the emergence of privacy leaks in federated learning, secure aggregation protocols that mainly adopt either homomorphic encryption or threshold secret sharing have been widely developed for federated learning to protect the privacy of the local training data of each client. However, these existing protocols suffer from many shortcomings, such as the dependence on a trusted third party, the vulnerability to clients being corrupted, low efficiency, the trade-off between security and fault tolerance, etc. To solve these disadvantages, we propose an efficient and multi-private key secure aggregation scheme for federated learning. Specifically, we skillfully modify the variant ElGamal encryption technique to achieve homomorphic addition operation, which has two important advantages: 1) The server and each client can freely select public and private keys without introducing a trust third party and 2) Compared to the variant ElGamal encryption, the plaintext space is relatively large, which is more suitable for the deep model. Besides, for the high dimensional deep model parameter, we introduce a super-increasing sequence to compress multi-dimensional data into 1-D, which can greatly reduce encryption and decryption times as well as communication for ciphertext transmission. Detailed security analyses show that our proposed scheme achieves the semantic security of both individual local gradients and the aggregated result while achieving optimal robustness in tolerating both client collusion and dropped clients. Extensive simulations demonstrate that the accuracy of our scheme is almost the same as the non-private approach, while the efficiency of our scheme is much better than the state-of-the-art homomorphic encryption-based secure aggregation schemes. More importantly, the efficiency advantages of our scheme will become increasingly prominent as the number of model parameters increases.
Updated: 2024-05-31 07:29:20
Domains: cs.CR,cs.AI,cs.CV,cs.LG
ESM All-Atom: Multi-scale Protein Language Model for Unified Molecular Modeling
Protein language models have demonstrated significant potential in the field of protein engineering. However, current protein language models primarily operate at the residue scale, which limits their ability to provide information at the atom level. This limitation prevents us from fully exploiting the capabilities of protein language models for applications involving both proteins and small molecules. In this paper, we propose ESM-AA (ESM All-Atom), a novel approach that enables atom-scale and residue-scale unified molecular modeling. ESM-AA achieves this by pre-training on multi-scale code-switch protein sequences and utilizing a multi-scale position encoding to capture relationships among residues and atoms. Experimental results indicate that ESM-AA surpasses previous methods in protein-molecule tasks, demonstrating the full utilization of protein language models. Further investigations reveal that through unified molecular modeling, ESM-AA not only gains molecular knowledge but also retains its understanding of proteins. The source codes of ESM-AA are publicly released at https://github.com/zhengkangjie/ESM-AA.
Updated: 2024-05-31 07:28:40
Domains: q-bio.BM,cs.CE,cs.LG
Conflict-Averse Gradient Aggregation for Constrained Multi-Objective Reinforcement Learning
In many real-world applications, a reinforcement learning (RL) agent should consider multiple objectives and adhere to safety guidelines. To address these considerations, we propose a constrained multi-objective RL algorithm named Constrained Multi-Objective Gradient Aggregator (CoMOGA). In the field of multi-objective optimization, managing conflicts between the gradients of the multiple objectives is crucial to prevent policies from converging to local optima. It is also essential to efficiently handle safety constraints for stable training and constraint satisfaction. We address these challenges straightforwardly by treating the maximization of multiple objectives as a constrained optimization problem (COP), where the constraints are defined to improve the original objectives. Existing safety constraints are then integrated into the COP, and the policy is updated using a linear approximation, which ensures the avoidance of gradient conflicts. Despite its simplicity, CoMOGA guarantees optimal convergence in tabular settings. Through various experiments, we have confirmed that preventing gradient conflicts is critical, and the proposed method achieves constraint satisfaction across all tasks.
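To make the core idea concrete, here is a minimal sketch of conflict-averse aggregation as a constrained problem: find the smallest update direction that improves every objective by at least a margin. The `aggregate` helper and the margin `eps` are our illustrative stand-ins; the paper's handling of safety constraints and its linear approximation are more involved:

```python
import numpy as np
from scipy.optimize import minimize

def aggregate(grads, eps=1e-3):
    """grads: list of 1-D objective gradients; returns one direction
    d minimizing ||d||^2 subject to g_i . d >= eps for every i."""
    G = np.stack(grads)                                   # (k, d)
    d0 = G.mean(axis=0)                                   # warm start
    cons = [{"type": "ineq", "fun": lambda d, g=g: g @ d - eps} for g in G]
    res = minimize(lambda d: 0.5 * d @ d, d0, constraints=cons)
    return res.x

g1, g2 = np.array([1.0, 0.0]), np.array([-0.5, 1.0])     # conflicting gradients
d = aggregate([g1, g2])
assert g1 @ d > 0 and g2 @ d > 0                          # both objectives improve
```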
Updated: 2024-05-31 07:19:03
Domains: cs.LG
Learning Gaze-aware Compositional GAN
Gaze-annotated facial data is crucial for training deep neural networks (DNNs) for gaze estimation. However, obtaining these data is labor-intensive and requires specialized equipment due to the challenge of accurately annotating the gaze direction of a subject. In this work, we present a generative framework to create annotated gaze data by leveraging the benefits of labeled and unlabeled data sources. We propose a Gaze-aware Compositional GAN that learns to generate annotated facial images from a limited labeled dataset. Then we transfer this model to an unlabeled data domain to take advantage of the diversity it provides. Experiments demonstrate our approach's effectiveness in generating within-domain image augmentations in the ETH-XGaze dataset and cross-domain augmentations in the CelebAMask-HQ dataset domain for gaze estimation DNN training. We also show additional applications of our work, which include facial image editing and gaze redirection.
Updated: 2024-05-31 07:07:54
Domains: cs.CV,cs.AI
Principal-Agent Multitasking: the Uniformity of Optimal Contracts and its Efficient Learning via Instrumental Regression
This work studies the multitasking principal-agent problem. I first show a "uniformity" result: when the tasks are perfect substitutes and the agent's cost function is homogeneous to a certain degree, the optimal contract depends only on the marginal utility of each task and the degree of homogeneity. I then study a setting where the marginal utility of each task is unknown, so that the optimal contract must be learned or estimated from observational data. I identify this problem as a regression problem with measurement errors and observe that it can be cast as an instrumental regression problem. This work observes that both the contract and the repeated observations (when available) can act as valid instrumental variables, and proposes using the generalized method of moments estimator to compute an approximately optimal contract from offline data. I also study an online setting and show how the optimal contract can be efficiently learned in an online fashion using the two estimators. Here the principal faces an exploration-exploitation tradeoff: she must experiment with new contracts and observe their outcomes while ensuring her experimentation does not deviate too far from the optimal contract. This work shows that when repeated observations are available and agents are sufficiently "diverse", the principal can achieve a very low $\widetilde{O}(d)$ cumulative utility loss, even with a "pure exploitation" algorithm.
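A minimal numerical sketch of the instrumental idea, assuming a second, independently noised observation of the same latent regressor is available to serve as the instrument (the data-generating values are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta = 5000, 2.0
x_true = rng.normal(size=n)                  # latent marginal utility
x1 = x_true + rng.normal(scale=0.5, size=n)  # noisy observation (regressor)
x2 = x_true + rng.normal(scale=0.5, size=n)  # repeated observation (instrument)
y = beta * x_true + rng.normal(scale=0.3, size=n)

# Naive OLS is attenuated by the measurement error in x1.
ols = (x1 @ y) / (x1 @ x1)

# IV slope cov(x2, y) / cov(x2, x1) is consistent because the noise in
# x2 is independent of the noise in x1 and of the outcome error.
iv = (x2 @ y) / (x2 @ x1)
print(f"true={beta}, OLS={ols:.3f} (biased down), IV={iv:.3f}")
```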
Updated: 2024-05-31 07:01:49
Domains: cs.LG,stat.ML
Query Provenance Analysis for Robust and Efficient Query-based Black-box Attack Defense
Query-based black-box attacks have emerged as a significant threat to machine learning systems, where adversaries can manipulate the input queries to generate adversarial examples that cause the model to misclassify. To counter these attacks, researchers have proposed Stateful Defense Models (SDMs), which detect adversarial query sequences and reject queries that are "similar" to historical queries. Existing state-of-the-art (SOTA) SDMs (e.g., BlackLight and PIHA) have shown great effectiveness in defending against these attacks. However, recent studies have shown that they are vulnerable to Oracle-guided Adaptive Rejection Sampling (OARS) attacks, a stronger adaptive attack strategy that can be easily integrated with existing attack algorithms to evade SDMs by generating queries whose perturbation direction and step size are fine-tuned using the decision information leaked from the SDMs. In this paper, we propose a novel approach, Query Provenance Analysis (QPA), for more robust and efficient SDMs. QPA encapsulates the historical relationships among queries as a sequence feature to capture the fundamental difference between benign and adversarial query sequences. To utilize query provenance, we propose an efficient query provenance analysis algorithm with dynamic management. We evaluate QPA against two baselines, BlackLight and PIHA, on four widely used datasets with six query-based black-box attack algorithms. The results show that QPA outperforms the baselines in defense effectiveness and efficiency on both non-adaptive and adaptive attacks. Specifically, QPA reduces the Attack Success Rate (ASR) of OARS to 4.08%, compared with 77.63% and 87.72% for BlackLight and PIHA, respectively. Moreover, QPA achieves 7.67x and 2.25x higher throughput than BlackLight and PIHA.
Updated: 2024-05-31 06:56:54
Domains: cs.CR
Heterophilous Distribution Propagation for Graph Neural Networks
Graph Neural Networks (GNNs) have achieved remarkable success in various graph mining tasks by aggregating information from neighborhoods for representation learning. The success relies on the homophily assumption that nearby nodes exhibit similar behaviors, which may be violated in many real-world graphs. Recently, heterophilous graph neural networks (HeterGNNs) have attracted increasing attention by modifying the neural message passing schema for heterophilous neighborhoods. However, they suffer from insufficient neighborhood partitioning and heterophily modeling, both of which are critical yet challenging problems. To tackle these challenges, in this paper we propose heterophilous distribution propagation (HDP) for graph neural networks. Instead of aggregating information from all neighborhoods, HDP adaptively separates the neighbors into homophilous and heterophilous parts based on pseudo assignments during training. The heterophilous neighborhood distribution is learned with an orthogonality-oriented constraint via a trusted prototype contrastive learning paradigm. Both the homophilous and heterophilous patterns are propagated with a novel semantic-aware message passing mechanism. We conduct extensive experiments on 9 benchmark datasets with different levels of homophily. Experimental results show that our method outperforms representative baselines on heterophilous datasets.
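A minimal sketch of the neighbor-partition step, assuming pseudo assignments are the argmax of the current predictions; the orthogonality constraint, prototype contrastive learning, and semantic-aware propagation are omitted:

```python
import torch

def partition_neighbors(edge_index, logits):
    """Split edges into homophilous/heterophilous sets by pseudo-label
    agreement. edge_index: [2, E] tensor of (src, dst) pairs."""
    pseudo = logits.argmax(dim=-1)             # [N] pseudo assignments
    src, dst = edge_index
    same = pseudo[src] == pseudo[dst]          # [E] agreement mask
    return edge_index[:, same], edge_index[:, ~same]

# toy example: 4 nodes, 2-class predictions, 3 edges
logits = torch.tensor([[2.0, 0.0], [1.9, 0.0], [0.0, 3.0], [0.0, 1.0]])
edges = torch.tensor([[0, 0, 2], [1, 2, 3]])   # edges 0-1, 0-2, 2-3
homo, hetero = partition_neighbors(edges, logits)  # 0-1 and 2-3 homophilous
```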
Updated: 2024-05-31 06:40:56
Domains: cs.LG,cs.SI
ToxVidLLM: A Multimodal LLM-based Framework for Toxicity Detection in Code-Mixed Videos
In an era of rapidly evolving internet technology, the surge in multimodal content, including videos, has expanded the horizons of online communication. However, the detection of toxic content in this diverse landscape, particularly in low-resource code-mixed languages, remains a critical challenge. While substantial research has addressed toxic content detection in textual data, the realm of video content, especially in non-English languages, has been relatively underexplored. This paper addresses this research gap by introducing a benchmark dataset, the first of its kind, consisting of 931 videos with 4021 code-mixed Hindi-English utterances collected from YouTube. Each utterance within this dataset has been meticulously annotated with toxicity, severity, and sentiment labels. We developed ToxVidLLM, an advanced multimodal multitask framework for toxicity detection in video content that leverages Large Language Models (LLMs), designed for the primary objective along with the additional tasks of sentiment and severity analysis. ToxVidLLM incorporates three key modules: the Encoder module, the Cross-Modal Synchronization module, and the Multitask module, crafting a generic multimodal LLM customized for intricate video classification tasks. Our experiments reveal that incorporating multiple modalities from the videos substantially enhances toxic content detection, achieving an Accuracy and Weighted F1 score of 94.29% and 94.35%, respectively.
Updated: 2024-05-31 05:40:56
Domains: cs.AI,cs.CL,cs.CV
Robust Planning with LLM-Modulo Framework: Case Study in Travel Planning
As the applicability of Large Language Models (LLMs) extends beyond traditional text processing tasks, there is burgeoning interest in their potential to excel at planning and reasoning assignments, realms traditionally reserved for System 2 cognitive competencies. Despite their perceived versatility, the research community is still unraveling effective strategies for harnessing these models in such complex domains. The recent discourse introduced by the paper on LLM-Modulo marks a significant stride, proposing a conceptual framework that enhances the integration of LLMs into diverse planning and reasoning activities. This workshop paper delves into the practical application of this framework within the domain of travel planning, presenting a specific instance of its implementation. We use the Travel Planning benchmark by the OSU NLP group, which evaluates the performance of LLMs in producing valid itineraries from user queries presented in natural language. While popular methods for enhancing the reasoning abilities of LLMs, such as Chain of Thought, ReAct, and Reflexion, achieve a meager 0%, 0.6%, and 0% with GPT3.5-Turbo, respectively, our operationalization of the LLM-Modulo framework for the TravelPlanning domain provides a remarkable improvement, enhancing baseline performance by 4.6x for GPT4-Turbo and even more for older models like GPT3.5-Turbo, from 0% to 5%. Furthermore, we highlight the other useful roles of LLMs in the planning pipeline suggested by LLM-Modulo that can be reliably operationalized, such as the extraction of useful critics and a reformulator for critics.
Updated: 2024-05-31 05:23:35
Domains: cs.AI
Leveraging Large Language Models for Entity Matching
Entity matching (EM) is a critical task in data integration, aiming to identify records across different datasets that refer to the same real-world entities. Traditional methods often rely on manually engineered features and rule-based systems, which struggle with diverse and unstructured data. The emergence of Large Language Models (LLMs) such as GPT-4 offers transformative potential for EM, leveraging their advanced semantic understanding and contextual capabilities. This vision paper explores the application of LLMs to EM, discussing their advantages, challenges, and future research directions. Additionally, we review related work on applying weak supervision and unsupervised approaches to EM, highlighting how LLMs can enhance these methods.
Updated: 2024-05-31 05:22:07
Domains: cs.CL,cs.AI
Prune at the Clients, Not the Server: Accelerated Sparse Training in Federated Learning
In the recent paradigm of Federated Learning (FL), multiple clients train a shared model while keeping their local data private. Resource constraints of clients and communication costs pose major problems for training large models in FL. On the one hand, addressing the resource limitations of the clients, sparse training has proven to be a powerful tool in the centralized setting. On the other hand, communication costs in FL can be addressed by local training, where each client takes multiple gradient steps on its local data. Recent work has shown that local training can provably achieve the optimal accelerated communication complexity [Mishchenko et al., 2022]. Hence, one would like an accelerated sparse training algorithm. In this work we show that naive integration of sparse training and acceleration at the server fails, and how to fix it by letting the clients perform these tasks appropriately. We introduce Sparse-ProxSkip, our method developed for the nonconvex setting, inspired by RandProx [Condat and Richt\'arik, 2022], which provably combines sparse training and acceleration in the convex setting. We demonstrate the good performance of Sparse-ProxSkip in extensive experiments.
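A minimal sketch of what "pruning at the clients" can look like, assuming top-k magnitude masks re-applied during local steps; the ProxSkip-style control-variate correction of the actual method is omitted, and `grad_fn` is a hypothetical gradient oracle:

```python
import torch

def topk_mask(w, density):
    """Binary mask keeping the `density` fraction of largest-magnitude entries."""
    k = max(1, int(density * w.numel()))
    thresh = w.abs().flatten().kthvalue(w.numel() - k + 1).values
    return (w.abs() >= thresh).float()

def local_sparse_steps(w, grad_fn, lr=0.1, steps=5, density=0.2):
    """Client-side local SGD that keeps the iterate sparse throughout."""
    mask = topk_mask(w, density)
    for _ in range(steps):
        w = w - lr * grad_fn(w) * mask   # gradient step on active coordinates
        mask = topk_mask(w, density)     # refresh the mask locally
        w = w * mask                     # the update sent upstream is sparse
    return w

w0 = torch.randn(50)
w_out = local_sparse_steps(w0, grad_fn=lambda w: 2 * w)  # toy quadratic loss
```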
Updated: 2024-05-31 05:21:12
Domains: cs.LG,math.OC
Superfast Selection for Decision Tree Algorithms
We present a novel and systematic method, called Superfast Selection, for selecting the "optimal split" for decision tree and feature selection algorithms over tabular data. The method speeds up split selection on a single feature by lowering the time complexity, from O(MN) (using the standard selection methods) to O(M), where M represents the number of input examples and N the number of unique values. Additionally, the need for pre-encoding, such as one-hot or integer encoding, for feature value heterogeneity is eliminated. To demonstrate the efficiency of Superfast Selection, we empower the CART algorithm by integrating Superfast Selection into it, creating what we call Ultrafast Decision Tree (UDT). This enhancement enables UDT to complete the training process with a time complexity O(KMlogM) (K is the number of features). Additionally, the Training Only Once Tuning enables UDT to avoid the repetitive training process required to find the optimal hyper-parameter. Experiments show that the UDT can finish a single training on KDD99-10% dataset (494K examples with 41 features) within 1 second and tuning with 214.8 sets of hyper-parameters within 0.25 second on a laptop.
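A minimal sketch of O(M)-style split scoring on a single feature: one pass accumulates per-value counts and sums, after which each candidate split is scored from those statistics, with no sorting and no one-hot pre-encoding. This illustrates the complexity argument only; the paper's exact selection rule may differ:

```python
from collections import defaultdict

def best_equality_split(x, y):
    """Best {x == v} vs {x != v} split minimizing weighted SSE."""
    n, s = len(y), sum(y)
    cnt, tot = defaultdict(int), defaultdict(float)
    for xi, yi in zip(x, y):              # single O(M) pass over examples
        cnt[xi] += 1
        tot[xi] += yi
    best_v, best_score = None, float("inf")
    for v in cnt:                          # O(N) candidate splits, N <= M
        nl, sl = cnt[v], tot[v]
        nr, sr = n - nl, s - sl
        if nl == 0 or nr == 0:
            continue
        # weighted SSE decomposes via group means; the constant sum of
        # squares drops out, so minimizing -(sl^2/nl + sr^2/nr) suffices
        score = -(sl * sl / nl + sr * sr / nr)
        if score < best_score:
            best_v, best_score = v, score
    return best_v

print(best_equality_split(["a", "a", "b", "c"], [1.0, 1.1, 5.0, 5.2]))  # "a"
```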
Updated: 2024-05-31 05:18:05
Domains: cs.LG
"Forgetting" in Machine Learning and Beyond: A Survey
This survey investigates the multifaceted nature of forgetting in machine learning, drawing insights from neuroscientific research that posits forgetting as an adaptive function rather than a defect, enhancing the learning process and preventing overfitting. This survey focuses on the benefits of forgetting and its applications across various machine learning sub-fields that can help improve model performance and enhance data privacy. Moreover, the paper discusses current challenges, future directions, and ethical considerations regarding the integration of forgetting mechanisms into machine learning models.
Updated: 2024-05-31 05:10:30
Domains: cs.LG
LSPI: Heterogeneous Graph Neural Network Classification Aggregation Algorithm Based on Size Neighbor Path Identification
Existing heterogeneous graph neural network algorithms (HGNNs) mostly rely on meta-paths to capture the rich semantic information contained in heterogeneous graphs (also known as heterogeneous information networks, HINs), but most of these HGNNs focus on different ways of feature aggregation and ignore the properties of the meta-paths themselves. This paper studies meta-paths in three commonly used datasets and finds that there are huge differences in the number of neighbors connected by different meta-paths. At the same time, the noise contained in large neighbor paths adversely affects model performance. Therefore, this paper proposes a Heterogeneous Graph Neural Network Classification and Aggregation Algorithm Based on Large and Small Neighbor Path Identification (LSPI). LSPI first divides the meta-paths into large and small neighbor paths through a path discriminator and, to reduce the noise interference in large neighbor paths, selects neighbor nodes with higher similarity from both topology and feature perspectives, passing small neighbor paths and filtered large neighbor paths through different graph convolution components. Aggregation is performed to obtain feature information under different subgraphs, and LSPI then uses subgraph-level attention to fuse the feature information under different subgraphs and generate the final node embedding. Finally, this paper verifies the superiority of the method through extensive experiments and also gives experimentally grounded suggestions on the number of nodes to retain in large neighbor paths. The complete reproducible code and data have been published at: https://github.com/liuhua811/LSPIA.
Updated: 2024-05-31 05:03:48
Domains: cs.LG
ParSEL: Parameterized Shape Editing with Language
The ability to edit 3D assets from natural language presents a compelling paradigm to aid in the democratization of 3D content creation. However, while natural language is often effective at communicating general intent, it is poorly suited for specifying precise manipulation. To address this gap, we introduce ParSEL, a system that enables controllable editing of high-quality 3D assets from natural language. Given a segmented 3D mesh and an editing request, ParSEL produces a parameterized editing program. Adjusting the program parameters allows users to explore shape variations with a precise control over the magnitudes of edits. To infer editing programs which align with an input edit request, we leverage the abilities of large-language models (LLMs). However, while we find that LLMs excel at identifying initial edit operations, they often fail to infer complete editing programs, and produce outputs that violate shape semantics. To overcome this issue, we introduce Analytical Edit Propagation (AEP), an algorithm which extends a seed edit with additional operations until a complete editing program has been formed. Unlike prior methods, AEP searches for analytical editing operations compatible with a range of possible user edits through the integration of computer algebra systems for geometric analysis. Experimentally we demonstrate ParSEL's effectiveness in enabling controllable editing of 3D objects through natural language requests over alternative system designs.
Updated: 2024-05-31 04:09:41
Domains: cs.CV,cs.AI,cs.GR,cs.HC,cs.SC
Federated Compositional Deep AUC Maximization
Federated learning has attracted increasing attention due to the promise of balancing privacy and large-scale learning; numerous approaches have been proposed. However, most existing approaches focus on problems with balanced data, and prediction performance is far from satisfactory for many real-world applications where the number of samples in different classes is highly imbalanced. To address this challenging problem, we developed a novel federated learning method for imbalanced data by directly optimizing the area under curve (AUC) score. In particular, we formulate the AUC maximization problem as a federated compositional minimax optimization problem, develop a local stochastic compositional gradient descent ascent with momentum algorithm, and provide bounds on the computational and communication complexities of our algorithm. To the best of our knowledge, this is the first work to achieve such favorable theoretical results. Finally, extensive experimental results confirm the efficacy of our method.
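For context, one standard way to cast AUC maximization as a minimax problem is the square-loss reformulation from the single-machine literature (Ying et al., 2016); we reproduce it here as an illustration of the kind of objective being federated, not as this paper's exact compositional statement. With $p$ the positive-class fraction and $h_{\mathbf{w}}$ the model score:

```latex
\min_{\mathbf{w},a,b}\; \max_{\alpha}\; \mathbb{E}\Big[
    (1-p)\,\big(h_{\mathbf{w}}(x)-a\big)^2\,\mathbf{1}[y=1]
  + p\,\big(h_{\mathbf{w}}(x)-b\big)^2\,\mathbf{1}[y=-1]
  + 2(1+\alpha)\,\big(p\, h_{\mathbf{w}}(x)\,\mathbf{1}[y=-1]
  - (1-p)\, h_{\mathbf{w}}(x)\,\mathbf{1}[y=1]\big)
  - p(1-p)\,\alpha^2 \Big]
```

Here $a$ and $b$ track the means of the scores on the two classes, and the inner maximization over $\alpha$ recovers the pairwise AUC surrogate, which is what makes a gradient descent ascent scheme natural.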
Updated: 2024-05-31 04:02:22
Domains: cs.LG
UniBias: Unveiling and Mitigating LLM Bias through Internal Attention and FFN Manipulation
Large language models (LLMs) have demonstrated impressive capabilities in various tasks using the in-context learning (ICL) paradigm. However, their effectiveness is often compromised by inherent bias, leading to prompt brittleness, i.e., sensitivity to design settings such as example selection, order, and prompt formatting. Previous studies have addressed LLM bias through external adjustment of model outputs, but the internal mechanisms that lead to such bias remain unexplored. Our work delves into these mechanisms, particularly investigating how feedforward neural networks (FFNs) and attention heads produce the bias of LLMs. By interpreting the contribution of individual FFN vectors and attention heads, we identify the biased LLM components that skew LLMs' predictions toward specific labels. To mitigate these biases, we introduce UniBias, an inference-only method that effectively identifies and eliminates biased FFN vectors and attention heads. Extensive experiments across 12 NLP datasets demonstrate that UniBias significantly enhances ICL performance and alleviates the prompt brittleness of LLMs.
Updated: 2024-05-31 03:59:15
Domains: cs.CL,cs.AI
Bi-Directional Transformers vs. word2vec: Discovering Vulnerabilities in Lifted Compiled Code
Detecting vulnerabilities within compiled binaries is challenging due to lost high-level code structures and other factors such as architectural dependencies, compilers, and optimization options. To address these obstacles, this research explores vulnerability detection by using natural language processing (NLP) embedding techniques with word2vec, BERT, and RoBERTa to learn semantics from intermediate representation (LLVM) code. Long short-term memory (LSTM) neural networks were trained on embeddings from encoders created using approximately 118k LLVM functions from the Juliet dataset. This study is pioneering in its comparison of word2vec models with multiple bidirectional transformer (BERT, RoBERTa) embeddings built using LLVM code to train neural networks to detect vulnerabilities in compiled binaries. word2vec Continuous Bag of Words (CBOW) models achieved 92.3% validation accuracy in detecting vulnerabilities, outperforming word2vec Skip-Gram, BERT, and RoBERTa. This suggests that complex contextual NLP embeddings may not provide advantages over simpler word2vec models for this task when a limited number (e.g. 118K) of data samples are used to train the bidirectional transformer-based models. The comparative results provide novel insights into selecting optimal embeddings for learning compiler-independent semantic code representations to advance machine learning detection of vulnerabilities in compiled binaries.
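A minimal sketch of the embedding stage, assuming gensim's CBOW implementation (`sg=0`) over tokenized LLVM IR; the token sequences, sizes, and tokenization below are illustrative stand-ins, not the paper's exact pipeline:

```python
from gensim.models import Word2Vec

# Toy "LLVM functions" as token sequences; in practice these come from
# lifting binaries to LLVM IR and tokenizing each function.
llvm_functions = [
    ["%1", "=", "alloca", "i32", "store", "i32", "%0", "i32*", "%1"],
    ["%2", "=", "load", "i32", "i32*", "%1", "ret", "i32", "%2"],
]

model = Word2Vec(
    sentences=llvm_functions,
    vector_size=100,   # embedding dimension
    window=5,
    min_count=1,
    sg=0,              # sg=0 selects CBOW, the best performer in this study
)

# Each function becomes a sequence of vectors, ready for an LSTM classifier.
embedded = [[model.wv[tok] for tok in fn] for fn in llvm_functions]
```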
Updated: 2024-05-31 03:57:19
Domains: cs.CR,cs.CL,cs.LG,cs.SE,D.4.6; I.2.6; I.5.1
Vision-Language Meets the Skeleton: Progressively Distillation with Cross-Modal Knowledge for 3D Action Representation Learning
Supervised and self-supervised learning are two main training paradigms for skeleton-based human action recognition. However, the former one-hot classification requires labor-intensive annotation of predefined action categories, while the latter involves skeleton transformations (e.g., cropping) in the pretext tasks that may impair the skeleton structure. To address these challenges, we introduce a novel skeleton-based training framework (C$^2$VL) based on Cross-modal Contrastive learning that uses progressive distillation to learn task-agnostic human skeleton action representations from Vision-Language knowledge prompts. Specifically, we establish the vision-language action concept space through vision-language knowledge prompts generated by pre-trained large multimodal models (LMMs), which enrich the fine-grained details that the skeleton action space lacks. Moreover, we propose intra-modal self-similarity and inter-modal cross-consistency softened targets in the cross-modal contrastive process to progressively control and guide the degree to which vision-language knowledge prompts and the corresponding skeletons are pulled closer. These soft instance discrimination and self-knowledge distillation strategies contribute to learning better skeleton-based action representations from the noisy skeleton-vision-language pairs. During the inference phase, our method requires only the skeleton data as input for action recognition and no longer needs the vision-language prompts. Extensive experiments show that our method achieves state-of-the-art results on the NTU RGB+D 60, NTU RGB+D 120, and PKU-MMD datasets. The code will be made available in the future.
Updated: 2024-05-31 03:40:15
Domains: cs.CV,cs.AI,cs.LG,cs.MM
Searching for internal symbols underlying deep learning
Deep learning (DL) enables deep neural networks (DNNs) to automatically learn complex tasks or rules from given examples without instructions or guiding principles. As we do not engineer DNNs' functions, it is extremely difficult to diagnose their decisions, and multiple lines of study have proposed explanations of the principles underlying DNN/DL operation. Notably, one line of studies suggests that DNNs may learn concepts, the high-level features recognizable to humans. Thus, we hypothesized that DNNs develop abstract codes, not necessarily recognizable to humans, which can be used to augment DNNs' decision-making. To address this hypothesis, we combined foundation segmentation models and unsupervised learning to extract internal codes and to identify potential uses of abstract codes for making DL's decision-making more reliable and safer.
Updated: 2024-05-31 03:39:26
Domains: cs.LG,cs.AI,cs.CV
Quantum linear algebra is all you need for Transformer architectures
Generative machine learning methods such as large language models are revolutionizing the creation of text and images. While these models are powerful, they also consume a large amount of computational resources. The transformer is a key component of large language models that aims to generate a suitable completion of a given partial sequence. In this work, we investigate transformer architectures under the lens of fault-tolerant quantum computing. The input model is one in which trained weight matrices are given as block encodings, from which we construct the query, key, and value matrices for the transformer. We show how to prepare a block encoding of the self-attention matrix, with a new subroutine for the row-wise application of the softmax function. In addition, we combine quantum subroutines to construct important building blocks of the transformer: the residual connection and layer normalization, and the feed-forward neural network. Our subroutines prepare an amplitude encoding of the transformer output, which can be measured to obtain a prediction. Based on common open-source large language models, we provide insights into the behavior of important parameters determining the run time of the quantum algorithm. We discuss the potential and challenges of obtaining a quantum advantage.
Updated: 2024-05-31 03:34:57
Domains: quant-ph,cs.AI,cs.CL
MSSC-BiMamba: Multimodal Sleep Stage Classification and Early Diagnosis of Sleep Disorders with Bidirectional Mamba
Monitoring sleep states is essential for evaluating sleep quality and diagnosing sleep disorders. Traditional manual staging is time-consuming and prone to subjective bias, often resulting in inconsistent outcomes. Here, we developed an automated model for sleep staging and disorder classification to enhance diagnostic accuracy and efficiency. Considering the characteristics of polysomnography (PSG) multi-lead sleep monitoring, we designed a multimodal sleep state classification model, MSSC-BiMamba, that combines an Efficient Channel Attention (ECA) mechanism with a Bidirectional State Space Model (BSSM). The ECA module allows for weighting data from different sensor channels, thereby amplifying the influence of diverse sensor inputs. Additionally, the implementation of bidirectional Mamba (BiMamba) enables the model to effectively capture the multidimensional features and long-range dependencies of PSG data. The developed model demonstrated impressive performance on sleep stage classification tasks on both the ISRUC-S3 and ISRUC-S1 datasets, respectively containing data with healthy and unhealthy sleep patterns. Also, the model exhibited a high accuracy for sleep health prediction when evaluated on a combined dataset consisting of ISRUC and Sleep-EDF. Our model, which can effectively handle diverse sleep conditions, is the first to apply BiMamba to sleep staging with multimodal PSG data, showing substantial gains in computational and memory efficiency over traditional Transformer-style models. This method enhances sleep health management by making monitoring more accessible and extending advanced healthcare through innovative technology.
Updated: 2024-05-31 03:31:23
Domains: cs.AI
Advancing Financial Risk Prediction Through Optimized LSTM Model Performance and Comparative Analysis
This paper focuses on the application and optimization of LSTM models in financial risk prediction. The study begins with an overview of LSTM's architecture and algorithmic foundations, then details the model training process and hyperparameter tuning strategy, adjusting network parameters through experiments to improve performance. Comparative experiments show that the optimized LSTM model offers significant advantages in AUC over random forest, BP neural network, and XGBoost, verifying its efficiency and practicality for financial risk prediction, especially its ability to handle complex time-series data, and laying a solid foundation for applying the model in production environments.
Updated: 2024-05-31 03:31:17
Domains: cs.LG,cs.AI
Masked Language Modeling Becomes Conditional Density Estimation for Tabular Data Synthesis
In this paper, our goal is to generate synthetic data for heterogeneous (mixed-type) tabular datasets with high machine learning utility (MLu). Given that the MLu performance relies on accurately approximating the conditional distributions, we focus on devising a synthetic data generation method based on conditional distribution estimation. We propose a novel synthetic data generation method, MaCoDE, by redefining the multi-class classification task of Masked Language Modeling (MLM) as histogram-based non-parametric conditional density estimation. Our proposed method enables estimating conditional densities across arbitrary combinations of target and conditional variables. Furthermore, we demonstrate that our proposed method bridges the theoretical gap between distributional learning and MLM. To validate the effectiveness of our proposed model, we conduct synthetic data generation experiments on 10 real-world datasets. Given the analogy between predicting masked input tokens in MLM and missing data imputation, we also evaluate the performance of multiple imputations on incomplete datasets with various missing data mechanisms. Moreover, our proposed model offers the advantage of enabling adjustments to data privacy levels without requiring re-training.
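A minimal sketch of how a histogram-style conditional density can be read out of a masked model over binned columns, assuming continuous columns are discretized into bins; `bin_logits` is a hypothetical stand-in for the trained model's output at a masked position:

```python
import numpy as np

def sample_column(bin_logits, bin_edges, rng):
    """Turn masked-position logits over bin tokens into a histogram
    conditional density, then draw a synthetic continuous value."""
    probs = np.exp(bin_logits - bin_logits.max())
    probs /= probs.sum()                       # conditional histogram density
    b = rng.choice(len(probs), p=probs)        # pick a bin
    return rng.uniform(bin_edges[b], bin_edges[b + 1])  # point within the bin

rng = np.random.default_rng(0)
edges = np.linspace(0.0, 1.0, 11)              # 10 equal-width bins
logits = rng.normal(size=10)                   # hypothetical model output
x = sample_column(logits, edges, rng)
```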
Updated: 2024-05-31 03:26:42
Domains: cs.LG,cs.CL
Enabling Weak LLMs to Judge Response Reliability via Meta Ranking
Despite the strong performance of large language models (LLMs) across a wide range of tasks, they still have reliability issues. Previous studies indicate that strong LLMs like GPT-4-turbo excel at evaluating the reliability of responses from LLMs, but face efficiency and local-deployment issues. Thus, to enable weak LLMs to effectively assess the reliability of LLM responses, we propose a novel cross-query-comparison-based method called $\textit{Meta Ranking}$ (MR). Unlike previous few-shot methods that rely solely on LLMs' in-context learning capabilities, MR assesses reliability by ranking the target query-response pair pairwise against multiple reference query-response pairs. We found that MR is highly effective in error detection for LLM responses, where weak LLMs, such as Phi-2, could surpass strong baselines like GPT-3.5-turbo, requiring only five reference samples and significantly improving efficiency. We further demonstrate that MR can enhance strong LLMs' performance in two practical applications: model cascading and instruction tuning. In model cascading, we combine open- and closed-source LLMs to achieve performance comparable to GPT-4-turbo at lower cost. In instruction tuning, we use MR for iterative training data filtering, significantly reducing data processing time and enabling LLaMA-7B and Phi-2 to surpass Alpaca-13B with fewer training tokens. These results underscore the high potential of MR in both efficiency and effectiveness.
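A minimal sketch of one plausible reading of the pairwise voting rule, assuming the references are known-reliable pairs and `rank_pair` is a hypothetical LLM judge; the paper's actual aggregation may differ:

```python
def meta_rank(target, references, rank_pair):
    """Return True if the target (query, response) pair is judged reliable.
    references: known-reliable (query, response) pairs; majority vote over
    pairwise comparisons decides the outcome."""
    votes = sum(bool(rank_pair(target, ref)) for ref in references)
    return votes > len(references) / 2

# toy stand-in judge: longer responses "win" (purely illustrative)
judge = lambda t, r: len(t[1]) >= len(r[1])
refs = [("q1", "a short answer"), ("q2", "ok"), ("q3", "fine")]
print(meta_rank(("q", "a reasonably detailed answer"), refs, judge))
```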
Updated: 2024-05-31 03:25:42
Domains: cs.CL,cs.AI,cs.LG
Decentralized Multi-Level Compositional Optimization Algorithms with Level-Independent Convergence Rate
Stochastic multi-level compositional optimization problems cover many new machine learning paradigms, e.g., multi-step model-agnostic meta-learning, which require efficient optimization algorithms for large-scale data. This paper studies the decentralized stochastic multi-level optimization algorithm, which is challenging because the multi-level structure and decentralized communication scheme may make the number of levels significantly affect the order of the convergence rate. To this end, we develop two novel decentralized optimization algorithms to optimize the multi-level compositional optimization problem. Our theoretical results show that both algorithms can achieve the level-independent convergence rate for nonconvex problems under much milder conditions compared with existing single-machine algorithms. To the best of our knowledge, this is the first work that achieves the level-independent convergence rate under the decentralized setting. Moreover, extensive experiments confirm the efficacy of our proposed algorithms.
Updated: 2024-05-31 03:23:18
Domains: cs.LG
Multi-label Class Incremental Emotion Decoding with Augmented Emotional Semantics Learning
Emotion decoding plays an important role in affective human-computer interaction. However, previous studies ignored the dynamic real-world scenario in which humans experience a blend of multiple emotions that are incrementally integrated into the model, leading to the multi-label class incremental learning (MLCIL) problem. Existing methods have difficulty solving the MLCIL issue due to the notorious catastrophic forgetting caused by the partial-label problem and inadequate label-semantics mining. In this paper, we propose an augmented emotional semantics learning framework for multi-label class incremental emotion decoding. Specifically, we design an augmented emotional relation graph module with label disambiguation to handle the past-missing partial-label problem. Then, we leverage domain knowledge from the affective dimension space to alleviate the future-missing partial-label problem via knowledge distillation. Besides, an emotional semantics learning module is constructed with a graph autoencoder to obtain emotion embeddings that guide semantic-specific feature decoupling for better multi-label learning. Extensive experiments on three datasets show the superiority of our method in improving emotion decoding performance and mitigating forgetting in the MLCIL problem.
Updated: 2024-05-31 03:16:54
Domains: cs.AI
Generalized Semi-Supervised Learning via Self-Supervised Feature Adaptation
Traditional semi-supervised learning (SSL) assumes that the feature distributions of labeled and unlabeled data are consistent which rarely holds in realistic scenarios. In this paper, we propose a novel SSL setting, where unlabeled samples are drawn from a mixed distribution that deviates from the feature distribution of labeled samples. Under this setting, previous SSL methods tend to predict wrong pseudo-labels with the model fitted on labeled data, resulting in noise accumulation. To tackle this issue, we propose Self-Supervised Feature Adaptation (SSFA), a generic framework for improving SSL performance when labeled and unlabeled data come from different distributions. SSFA decouples the prediction of pseudo-labels from the current model to improve the quality of pseudo-labels. Particularly, SSFA incorporates a self-supervised task into the SSL framework and uses it to adapt the feature extractor of the model to the unlabeled data. In this way, the extracted features better fit the distribution of unlabeled data, thereby generating high-quality pseudo-labels. Extensive experiments show that our proposed SSFA is applicable to various pseudo-label-based SSL learners and significantly improves performance in labeled, unlabeled, and even unseen distributions.
Updated: 2024-05-31 03:13:45
Domains: cs.CV,cs.LG
Deep Learning without Weight Symmetry
Backpropagation (BP), a foundational algorithm for training artificial neural networks, predominates in contemporary deep learning. Although highly successful, it is often considered biologically implausible. A significant limitation arises from the need for precise symmetry between connections in the backward and forward pathways to backpropagate gradient signals accurately, which is not observed in biological brains. Researchers have proposed several algorithms to alleviate this symmetry constraint, such as feedback alignment and direct feedback alignment. However, their divergence from backpropagation dynamics presents challenges, particularly in deeper networks and convolutional layers. Here we introduce the Product Feedback Alignment (PFA) algorithm. Our findings demonstrate that PFA closely approximates BP and achieves comparable performance in deep convolutional networks while avoiding explicit weight symmetry. Our results offer a novel solution to the longstanding weight symmetry problem, leading to more biologically plausible learning in deep convolutional networks compared to earlier methods.
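For orientation, a minimal numpy sketch of plain feedback alignment, the family PFA belongs to: the backward pass uses a fixed random matrix `B2` instead of `W2.T`, so no weight symmetry is needed. PFA itself replaces the single random matrix with a product of matrices to better approximate BP; that refinement is not shown here:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 8, 16, 4
W1 = rng.normal(size=(n_hidden, n_in))
W2 = rng.normal(size=(n_out, n_hidden))
B2 = rng.normal(size=(n_hidden, n_out))   # fixed random feedback matrix

def fa_step(x, y, lr=1e-2):
    global W1, W2
    h = np.tanh(W1 @ x)                   # forward pass
    y_hat = W2 @ h
    e = y_hat - y                         # output error
    # backprop would propagate W2.T @ e; FA uses the random B2 instead
    dh = (B2 @ e) * (1 - h ** 2)
    W2 -= lr * np.outer(e, h)
    W1 -= lr * np.outer(dh, x)

fa_step(rng.normal(size=n_in), rng.normal(size=n_out))
```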
Updated: 2024-05-31 03:11:19
Domains: cs.LG,cs.AI,q-bio.NC
Learning to Learn for Few-shot Continual Active Learning
Continual learning strives to ensure stability in solving previously seen tasks while demonstrating plasticity in a novel domain. Recent advances in continual learning are mostly confined to a supervised learning setting, especially in NLP domain. In this work, we consider a few-shot continual active learning setting where labeled data are inadequate, and unlabeled data are abundant but with a limited annotation budget. We exploit meta-learning and propose a method, called Meta-Continual Active Learning. This method sequentially queries the most informative examples from a pool of unlabeled data for annotation to enhance task-specific performance and tackle continual learning problems through meta-objective. Specifically, we employ meta-learning and experience replay to address inter-task confusion and catastrophic forgetting. We further incorporate textual augmentations to avoid memory over-fitting caused by experience replay and sample queries, thereby ensuring generalization. We conduct extensive experiments on benchmark text classification datasets from diverse domains to validate the feasibility and effectiveness of meta-continual active learning. We also analyze the impact of different active learning strategies on various meta continual learning models. The experimental results demonstrate that introducing randomness into sample selection is the best default strategy for maintaining generalization in meta-continual learning framework.
Updated: 2024-05-31 03:07:23
Domains: cs.LG,cs.CL
LInK: Learning Joint Representations of Design and Performance Spaces through Contrastive Learning for Mechanism Synthesis
In this paper, we introduce LInK, a novel framework that integrates contrastive learning of performance and design space with optimization techniques for solving complex inverse problems in engineering design with discrete and continuous variables. We focus on the path synthesis problem for planar linkage mechanisms. By leveraging a multi-modal and transformation-invariant contrastive learning framework, LInK learns a joint representation that captures complex physics and design representations of mechanisms, enabling rapid retrieval from a vast dataset of over 10 million mechanisms. This approach improves precision through the warm start of a hierarchical unconstrained nonlinear optimization algorithm, combining the robustness of traditional optimization with the speed and adaptability of modern deep learning methods. Our results demonstrate that LInK outperforms existing methods on an established benchmark, achieving 28 times less error than a state-of-the-art approach while taking 20 times less time. Moreover, we introduce a significantly more challenging benchmark, named LINK-ABC, which involves synthesizing linkages that trace the trajectories of English capital alphabets - an inverse design benchmark task that existing methods struggle with due to large non-linearities and a tiny feasible space. Our results demonstrate that LInK not only advances the field of mechanism design but also broadens the applicability of contrastive learning and optimization to other areas of engineering.
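A minimal sketch of the kind of joint-embedding objective such a framework can use: a CLIP-style symmetric InfoNCE loss that pulls each mechanism's design embedding toward its own path (performance) embedding, enabling nearest-neighbor retrieval in the shared space. The encoders, sizes, and temperature are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(design_z, path_z, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired (design, path) embeddings."""
    d = F.normalize(design_z, dim=-1)
    p = F.normalize(path_z, dim=-1)
    logits = d @ p.T / temperature            # (B, B) similarity matrix
    targets = torch.arange(len(d))            # matching pairs on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.T, targets))

loss = contrastive_loss(torch.randn(32, 128), torch.randn(32, 128))
```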
Updated: 2024-05-31 03:04:57
Domains: cs.LG,cs.AI
Weak-Form Inference for Hybrid Dynamical Systems in Ecology
Species subject to predation and environmental threats commonly exhibit variable periods of population boom and bust over long timescales. Understanding and predicting such behavior, especially given the inherent heterogeneity and stochasticity of exogenous driving factors over short timescales, is an ongoing challenge. A modeling paradigm gaining popularity in the ecological sciences for such multi-scale effects is to couple short-term continuous dynamics to long-term discrete updates. We develop a data-driven method utilizing weak-form equation learning to extract such hybrid governing equations for population dynamics and to estimate the requisite parameters using sparse intermittent measurements of the discrete and continuous variables. The method produces a set of short-term continuous dynamical system equations parametrized by long-term variables, and long-term discrete equations parametrized by short-term variables, allowing direct assessment of interdependencies between the two time scales. We demonstrate the utility of the method on a variety of ecological scenarios and provide extensive tests using models previously derived for epizootics experienced by the North American spongy moth (Lymantria dispar dispar).
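The weak-form idea can be summarized in one identity. For continuous dynamics $\dot{u} = f(u;\theta)$ and a smooth test function $\phi$ with $\phi(t_0)=\phi(t_1)=0$, integration by parts gives (a generic statement of the approach, in our notation):

```latex
-\int_{t_0}^{t_1} \dot{\phi}(t)\, u(t)\, dt
  \;=\; \int_{t_0}^{t_1} \phi(t)\, \dot{u}(t)\, dt
  \;=\; \int_{t_0}^{t_1} \phi(t)\, f\big(u(t);\theta\big)\, dt
```

Regressing $\theta$ against integrals of the measured states, rather than against their derivatives, is what makes such estimates robust to noisy, sparse, and intermittent measurements.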
Updated: 2024-05-31 03:03:27
Subjects: q-bio.PE,cs.LG,math.DS
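To make the weak-form idea above concrete, here is the standard construction (a generic illustration under the usual assumptions, not necessarily the paper's exact formulation): for continuous dynamics $\dot{u} = f(u; \theta)$ and a smooth, compactly supported test function $\phi$, integration by parts moves the derivative off the noisy measurements,

\[
\int \phi(t)\,\dot{u}(t)\,dt \;=\; -\int \dot{\phi}(t)\,u(t)\,dt \;=\; \int \phi(t)\,f\big(u(t);\theta\big)\,dt,
\]

so each test function contributes one equation in $\theta$ that involves only $u$ itself, which is what makes the estimation robust to sparse, intermittent, and noisy data.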
Class-Based Time Series Data Augmentation to Mitigate Extreme Class Imbalance for Solar Flare Prediction
Time series data plays a crucial role across various domains, making it valuable for decision-making and predictive modeling. Machine learning (ML) and deep learning (DL) have shown promise in this regard, yet their performance hinges on data quality and quantity, often constrained by data scarcity and class imbalance, particularly for rare events like solar flares. Data augmentation techniques offer a potential solution to address these challenges, yet their effectiveness on multivariate time series datasets remains underexplored. In this study, we propose a novel data augmentation method for time series data named Mean Gaussian Noise (MGN). We investigate the performance of MGN compared to eight existing basic data augmentation methods on a multivariate time series dataset for solar flare prediction, SWAN-SF, using a ML algorithm for time series data, TimeSeriesSVC. The results demonstrate the efficacy of MGN and highlight its potential for improving classification performance in scenarios with extremely imbalanced data. Our time complexity analysis shows that MGN also has a competitive computational cost compared to the investigated alternative methods.
Updated: 2024-05-31 03:03:19
Subjects: cs.LG,astro-ph.IM,astro-ph.SR,cs.AI
Selective Knowledge Sharing for Personalized Federated Learning Under Capacity Heterogeneity
Federated Learning (FL) stands to gain significant advantages from collaboratively training capacity-heterogeneous models, enabling the utilization of private data and computing power from low-capacity devices. However, the focus on personalizing capacity-heterogeneous models based on client-specific data has been limited, resulting in suboptimal local model utility, particularly for low-capacity clients. The heterogeneity in both data and device capacity poses two key challenges for model personalization: 1) accurately retaining necessary knowledge embedded within reduced submodels for each client, and 2) effectively sharing knowledge through aggregating size-varying parameters. To this end, we introduce Pa3dFL, a novel framework designed to enhance local model performance by decoupling and selectively sharing knowledge among capacity-heterogeneous models. First, we decompose each layer of the model into general and personal parameters. Then, we maintain uniform sizes for the general parameters across clients and aggregate them through direct averaging. Subsequently, we employ a hyper-network to generate size-varying personal parameters for clients using learnable embeddings. Finally, we facilitate the implicit aggregation of personal parameters by aggregating client embeddings through a self-attention module. We conducted extensive experiments on three datasets to evaluate the effectiveness of Pa3dFL. Our findings indicate that Pa3dFL consistently outperforms baseline methods across various heterogeneity settings. Moreover, Pa3dFL demonstrates competitive communication and computation efficiency compared to baseline approaches, highlighting its practicality and adaptability in adverse system conditions.
Updated: 2024-05-31 02:59:25
Subjects: cs.LG,cs.AI,cs.DC
Understanding and Improving Model Averaging in Federated Learning on Heterogeneous Data
Model averaging is a widely adopted technique in federated learning (FL) that aggregates multiple client models to obtain a global model. Remarkably, model averaging in FL yields a superior global model, even when client models are trained with non-convex objective functions and on heterogeneous local datasets. However, the rationale behind its success remains poorly understood. To shed light on this issue, we first visualize the loss landscape of FL over client and global models to illustrate their geometric properties. The visualization shows that the client models encompass the global model within a common basin, and interestingly, the global model may deviate from the basin's center while still outperforming the client models. To gain further insights into model averaging in FL, we decompose the expected loss of the global model into five factors related to the client models. Specifically, our analysis reveals that the global model loss after early training mainly arises from \textit{i)} the client model's loss on non-overlapping data between client datasets and the global dataset and \textit{ii)} the maximum distance between the global and client models. Based on the findings from our loss landscape visualization and loss decomposition, we propose utilizing iterative moving averaging (IMA) on the global model at the late training phase to reduce its deviation from the expected minimum, while constraining client exploration to limit the maximum distance between the global and client models. Our experiments demonstrate that incorporating IMA into existing FL methods significantly improves their accuracy and training speed on various heterogeneous data setups of benchmark datasets. Code is available at \url{https://github.com/TailinZhou/FedIMA}.
Updated: 2024-05-31 02:57:28
Subjects: cs.LG,cs.AI
SparseDM: Toward Sparse Efficient Diffusion Models
Diffusion models have been extensively used in data generation tasks and are recognized as one of the best generative models. However, their time-consuming deployment, long inference time, and large memory requirements limit their application on mobile devices. In this paper, we propose a method based on an improved Straight-Through Estimator to improve the deployment efficiency of diffusion models. Specifically, we add sparse masks to the Convolution and Linear layers in a pre-trained diffusion model, apply progressive sparsity for model training in the fine-tuning stage, and switch the inference masks on and off, which supports a flexible choice of sparsity during inference according to the FID and MACs requirements. Experiments on four datasets conducted on a state-of-the-art Transformer-based diffusion model demonstrate that our method reduces MACs by $50\%$ while increasing FID by only 1.5 on average. Under other MACs conditions, the FID is also 1$\sim$137 lower than that of other methods.
Updated: 2024-05-31 02:56:14
Subjects: cs.LG,cs.AI
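For readers unfamiliar with the straight-through estimator (STE) mentioned above, here is a generic sketch of an STE-trained sparsity mask on a Linear layer; the paper's improved STE and its progressive sparsity schedule are not reproduced.

```python
import torch

class STEMaskedLinear(torch.nn.Module):
    """Linear layer whose weights are pruned by a learnable 0/1 mask; the
    hard mask is used in the forward pass while gradients flow to the
    underlying scores as if masking were the identity (STE)."""
    def __init__(self, in_features, out_features, sparsity=0.5):
        super().__init__()
        self.linear = torch.nn.Linear(in_features, out_features)
        self.scores = torch.nn.Parameter(torch.randn(out_features, in_features))
        self.sparsity = sparsity

    def forward(self, x):
        k = int(self.scores.numel() * self.sparsity)        # weights to prune
        threshold = self.scores.flatten().kthvalue(k).values
        hard = (self.scores > threshold).float()
        # Value equals `hard`; gradient w.r.t. `scores` is the identity.
        mask = hard + self.scores - self.scores.detach()
        return torch.nn.functional.linear(
            x, self.linear.weight * mask, self.linear.bias)
```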
GAMedX: Generative AI-based Medical Entity Data Extractor Using Large Language Models
In the rapidly evolving field of healthcare and beyond, the integration of generative AI in Electronic Health Records (EHRs) represents a pivotal advancement, addressing a critical gap in current information extraction techniques. This paper introduces GAMedX, a Named Entity Recognition (NER) approach utilizing Large Language Models (LLMs) to efficiently extract entities from medical narratives and unstructured text generated throughout various phases of the patient hospital visit. By addressing the significant challenge of processing unstructured medical text, GAMedX leverages the capabilities of generative AI and LLMs for improved data extraction. Employing a unified approach, the methodology integrates open-source LLMs for NER, utilizing chained prompts and Pydantic schemas for structured output to navigate the complexities of specialized medical jargon. The findings reveal a significant ROUGE F1 score on one of the evaluation datasets, with an accuracy of 98\%. This innovation enhances entity extraction, offering a scalable, cost-effective solution for automated form filling from unstructured data. As a result, GAMedX streamlines the processing of unstructured narratives, and sets a new standard in NER applications, contributing significantly to theoretical and practical advancements beyond the medical technology sphere.
Updated: 2024-05-31 02:53:22
Subjects: cs.CL,cs.AI
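The chained-prompt-plus-Pydantic pattern mentioned above looks roughly like the following sketch; the schema fields and prompt wording are illustrative assumptions, not the paper's actual schema (Pydantic v2 API).

```python
from pydantic import BaseModel, Field

class MedicalEntities(BaseModel):
    """Structure the LLM must emit; validation rejects malformed output."""
    medications: list[str] = Field(default_factory=list)
    diagnoses: list[str] = Field(default_factory=list)
    procedures: list[str] = Field(default_factory=list)

PROMPT_TEMPLATE = (
    "Extract all medications, diagnoses, and procedures from the clinical "
    "note below. Respond only with JSON matching this schema:\n"
    + str(MedicalEntities.model_json_schema())
    + "\n\nNote:\n{note}"
)

def parse_llm_output(raw_json: str) -> MedicalEntities:
    # Raises a ValidationError if the model strayed from the schema,
    # which is the natural hook for a retry in a chained-prompt setup.
    return MedicalEntities.model_validate_json(raw_json)
```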
Disrupting Diffusion: Token-Level Attention Erasure Attack against Diffusion-based Customization
With the development of diffusion-based customization methods like DreamBooth, individuals can now train models that generate their personalized images. Despite the convenience, malicious users have misused these techniques to create fake images, thereby triggering a privacy security crisis. In light of this, proactive adversarial attacks are proposed to protect users against customization. The adversarial examples are trained to distort the customization model's outputs and thus block the misuse. In this paper, we propose DisDiff (Disrupting Diffusion), a novel adversarial attack method to disrupt the diffusion model outputs. We first delve into the intrinsic image-text relationships, well-known as cross-attention, and empirically find that the subject-identifier token plays an important role in guiding image generation. Thus, we propose the Cross-Attention Erasure module to explicitly "erase" the indicated attention maps and disrupt the text guidance. Besides, we analyze the influence of the sampling process of the diffusion model on Projected Gradient Descent (PGD) attacks and introduce a novel Merit Sampling Scheduler to adaptively modulate the perturbation updating amplitude in a step-aware manner. Our DisDiff outperforms the state-of-the-art methods by 12.75% in FDFR score and 7.25% in ISM score on average across two facial benchmarks and two commonly used prompts.
Updated: 2024-05-31 02:45:31
Subjects: cs.CV,cs.AI,I.2.10
Graph Convolutions Enrich the Self-Attention in Transformers!
Transformers, renowned for their self-attention mechanism, have achieved state-of-the-art performance across various tasks in natural language processing, computer vision, time-series modeling, etc. However, one of the challenges with deep Transformer models is the oversmoothing problem, where representations across layers converge to indistinguishable values, leading to significant performance degradation. We interpret the original self-attention as a simple graph filter and redesign it from a graph signal processing (GSP) perspective. We propose a graph-filter-based self-attention (GFSA) to learn a general yet effective one, whose complexity, however, is slightly larger than that of the original self-attention mechanism. We demonstrate that GFSA improves the performance of Transformers in various fields, including computer vision, natural language processing, graph regression, speech recognition, and code classification.
Updated: 2024-05-31 02:42:19
Subjects: cs.LG,cs.AI
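The graph-filter view above amounts to replacing the single attention matrix $A$ with a low-order matrix polynomial in $A$. A minimal sketch (fixed coefficients shown for clarity; the paper's exact filter design and learned coefficients may differ):

```python
import torch

def graph_filter_attention(attn, value, w0=1.0, w1=1.0, w2=0.1):
    """Apply a polynomial graph filter w0*I + w1*A + w2*A^2 to the values,
    with the attention matrix A acting as the graph shift operator.
    attn:  (batch, heads, seq, seq) row-stochastic attention matrix
    value: (batch, heads, seq, dim) value vectors"""
    identity_term = w0 * value
    first_order = w1 * (attn @ value)
    second_order = w2 * (attn @ (attn @ value))  # higher order counteracts oversmoothing
    return identity_term + first_order + second_order
```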
Experimental Design for Active Transductive Inference in Large Language Models
One emergent ability of large language models (LLMs) is that query-specific examples can be included in the prompt at inference time. In this work, we use active learning for adaptive prompt design and call it Active In-context Prompt Design (AIPD). We design the LLM prompt by adaptively choosing few-shot examples from a training set to optimize performance on a test set. The training examples are initially unlabeled and we obtain the label of the most informative ones, which maximally reduces uncertainty in the LLM prediction. We propose two algorithms, GO and SAL, which differ in how the few-shot examples are chosen. We analyze these algorithms in linear models: first GO, and then SAL by way of its equivalence with GO. We experiment with many different tasks in small, medium-sized, and large language models, and show that GO and SAL outperform other methods for choosing few-shot examples in the LLM prompt at inference time.
Updated: 2024-05-31 02:37:10
Subjects: cs.LG,cs.CL
The Point of View of a Sentiment: Towards Clinician Bias Detection in Psychiatric Notes
In psychiatry, negative patient descriptions and stigmatizing language can contribute to healthcare disparities in two ways: (1) read by patients they can harm their trust and engagement with the medical center; (2) read by future providers they may negatively influence the future perspective of a patient. By leveraging large language models, this work aims to identify the sentiment expressed in psychiatric clinical notes based on the reader's point of view. Extracting sentences from the Mount Sinai Health System's large and diverse clinical notes, we used prompts and in-context learning to adapt three large language models (GPT-3.5, Llama 2, Mistral) to classify the sentiment conveyed by the sentences according to the provider or non-provider point of view. Results showed that GPT-3.5 aligns best to provider point of view, whereas Mistral aligns best to non-provider point of view.
Updated: 2024-05-31 02:28:41
Subjects: cs.CL,cs.AI,cs.LG
HOPE: A Reinforcement Learning-based Hybrid Policy Path Planner for Diverse Parking Scenarios
Path planning plays a pivotal role in automated parking, yet current methods struggle to efficiently handle the intricate and diverse parking scenarios. One potential solution is the reinforcement learning-based method, leveraging its exploration in unrecorded situations. However, a key challenge in training reinforcement learning methods is the inherent randomness in converging to a feasible policy. This paper introduces a novel solution, the Hybrid POlicy Path plannEr (HOPE), which integrates a reinforcement learning agent with Reeds-Shepp curves, enabling effective planning across diverse scenarios. The paper presents a method to calculate and implement an action mask mechanism in path planning, significantly boosting the efficiency and effectiveness of reinforcement learning training. A transformer is employed as the network structure to fuse environmental information and generate planned paths. To facilitate the training and evaluation of the proposed planner, we propose a criterion for categorizing the difficulty level of parking scenarios based on space and obstacle distribution. Experimental results demonstrate that our approach outperforms typical rule-based algorithms and traditional reinforcement learning methods, showcasing higher planning success rates and generalization across various scenarios. The code for our solution will be openly available at https://github.com/jiamiya/HOPE after the paper's acceptance.
Updated: 2024-05-31 02:17:51
Subjects: cs.RO,cs.LG
Robustifying Safety-Aligned Large Language Models through Clean Data Curation
Large language models (LLMs) are vulnerable when trained on datasets containing harmful content, which leads to potential jailbreaking attacks in two scenarios: the integration of harmful texts within crowdsourced data used for pre-training and direct tampering with LLMs through fine-tuning. In both scenarios, adversaries can compromise the safety alignment of LLMs, exacerbating malfunctions. Motivated by the need to mitigate these adversarial influences, our research aims to enhance safety alignment by either neutralizing the impact of malicious texts in pre-training datasets or increasing the difficulty of jailbreaking during downstream fine-tuning. In this paper, we propose a data curation framework designed to counter adversarial impacts in both scenarios. Our method operates under the assumption that we have no prior knowledge of attack details, focusing solely on curating clean texts. We introduce an iterative process aimed at revising texts to reduce their perplexity as perceived by LLMs, while simultaneously preserving their text quality. By pre-training or fine-tuning LLMs with curated clean texts, we observe a notable improvement in LLM robustness regarding safety alignment against harmful queries. For instance, when pre-training LLMs using a crowdsourced dataset containing 5\% harmful instances, adding an equivalent amount of curated texts significantly mitigates the likelihood of providing harmful responses in LLMs and reduces the attack success rate by 71\%. Our study represents a significant step towards mitigating the risks associated with training-based jailbreaking and fortifying the secure utilization of LLMs.
Updated: 2024-05-31 02:09:51
Subjects: cs.CR,cs.AI
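The iterative revision loop above hinges on measuring LLM-perceived perplexity. A minimal sketch with Hugging Face Transformers, where gpt2 stands in for whatever scoring model the paper uses and `revise` is an assumed paraphrasing callable:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def perplexity(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean token cross-entropy
    return torch.exp(loss).item()

def curate(text: str, revise, rounds: int = 3) -> str:
    """Keep a candidate rewrite only if it lowers perplexity; checking
    that text quality is preserved is left to the caller."""
    best, best_ppl = text, perplexity(text)
    for _ in range(rounds):
        for candidate in revise(best):
            ppl = perplexity(candidate)
            if ppl < best_ppl:
                best, best_ppl = candidate, ppl
    return best
```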
Federated Graph Analytics with Differential Privacy
Collaborative graph analysis across multiple institutions is becoming increasingly popular. Realistic examples include social network analysis across various social platforms, financial transaction analysis across multiple banks, and analyzing the transmission of infectious diseases across multiple hospitals. We define federated graph analytics, a new problem for collaborative graph analytics under differential privacy. Although differentially private graph analysis has been widely studied, it fails to achieve a good tradeoff between utility and privacy in federated scenarios, due to the limited view of local clients and overlapping information across multiple subgraphs. Motivated by this, we first propose a federated graph analytic framework, named FEAT, which enables arbitrary downstream common graph statistics while preserving individual privacy. Furthermore, we introduce an optimized framework based on our proposed degree-based partition algorithm, called FEAT+, which improves the overall utility by leveraging the true local subgraphs. Finally, extensive experiments demonstrate that our FEAT and FEAT+ significantly outperform the baseline approach by approximately one and four orders of magnitude, respectively.
Updated: 2024-05-31 02:09:43
Subjects: cs.CR
Mixed-Integer Optimal Control via Reinforcement Learning: A Case Study on Hybrid Electric Vehicle Energy Management
Many optimal control problems require the simultaneous output of discrete and continuous control variables. These problems are usually formulated as mixed-integer optimal control (MIOC) problems, which are challenging to solve due to the complexity of the solution space. Numerical methods such as branch-and-bound are computationally expensive and undesirable for real-time control. This paper proposes a novel hybrid-action reinforcement learning (HARL) algorithm, twin delayed deep deterministic actor-Q (TD3AQ), for MIOC problems. TD3AQ combines the advantages of both actor-critic and Q-learning methods, and can handle the discrete and continuous action spaces simultaneously. The proposed algorithm is evaluated on a plug-in hybrid electric vehicle (PHEV) energy management problem, where real-time control of the discrete variables, clutch engagement/disengagement and gear shift, and continuous variable, engine torque, is essential to maximize fuel economy while satisfying driving constraints. Simulation outcomes demonstrate that TD3AQ achieves control results close to optimality when compared with dynamic programming (DP), with just 4.69% difference. Furthermore, it surpasses the performance of baseline reinforcement learning algorithms.
Updated: 2024-05-31 02:07:42
Subjects: eess.SY,cs.AI,cs.SY
Open Ko-LLM Leaderboard: Evaluating Large Language Models in Korean with Ko-H5 Benchmark
This paper introduces the Open Ko-LLM Leaderboard and the Ko-H5 Benchmark as vital tools for evaluating Large Language Models (LLMs) in Korean. Incorporating private test sets while mirroring the English Open LLM Leaderboard, we establish a robust evaluation framework that has been well integrated in the Korean LLM community. We perform data leakage analysis that shows the benefit of private test sets along with a correlation study within the Ko-H5 benchmark and temporal analyses of the Ko-H5 score. Moreover, we present empirical support for the need to expand beyond set benchmarks. We hope the Open Ko-LLM Leaderboard sets precedent for expanding LLM evaluation to foster more linguistic diversity.
Updated: 2024-05-31 02:05:45
Subjects: cs.CL,cs.AI
Optimal Design for Human Feedback
Learning of preference models from human feedback has been central to recent advances in artificial intelligence. Motivated by the cost of obtaining high-quality human annotations, we study the problem of data collection for learning preference models. The key idea in our work is to generalize the optimal design, a method for computing information gathering policies, to ranked lists. To show the generality of our ideas, we study both absolute and relative feedback on the lists. We design efficient algorithms for both settings and analyze them. We prove that our preference model estimators improve with more data and so does the ranking error under the estimators. Finally, we experiment with several synthetic and real-world datasets to show the statistical efficiency of our algorithms.
Updated: 2024-05-31 02:04:44
Subjects: cs.LG
On the universality of $S_n$-equivariant $k$-body gates
The importance of symmetries has recently been recognized in quantum machine learning from the simple motto: if a task exhibits a symmetry (given by a group $\mathfrak{G}$), the learning model should respect said symmetry. This can be instantiated via $\mathfrak{G}$-equivariant Quantum Neural Networks (QNNs), i.e., parametrized quantum circuits whose gates are generated by operators commuting with a given representation of $\mathfrak{G}$. In practice, however, there might be additional restrictions to the types of gates one can use, such as being able to act on at most $k$ qubits. In this work we study how the interplay between symmetry and $k$-bodyness in the QNN generators affects its expressiveness for the special case of $\mathfrak{G}=S_n$, the symmetric group. Our results show that if the QNN is generated by one- and two-body $S_n$-equivariant gates, the QNN is semi-universal but not universal. That is, the QNN can generate any arbitrary special unitary matrix in the invariant subspaces, but has no control over the relative phases between them. Then, we show that in order to reach universality one needs to include $n$-body generators (if $n$ is even) or $(n-1)$-body generators (if $n$ is odd). As such, our results bring us a step closer to better understanding the capabilities and limitations of equivariant QNNs.
Updated: 2024-05-31 02:04:30
Subjects: quant-ph,cs.LG,stat.ML
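For concreteness, standard examples of the generators discussed above (textbook instances consistent with the abstract, not drawn from the paper): on $n$ qubits, one- and two-body $S_n$-equivariant generators are the symmetrized sums

\[
H_1 \;=\; \sum_{i=1}^{n} X_i, \qquad H_2 \;=\; \sum_{i<j} Z_i Z_j,
\]

both of which commute with every permutation of the qubits. The result above says circuits built from such terms are semi-universal, and reaching full universality requires adding $n$-body (or $(n-1)$-body) symmetrized generators.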
Enhancing Generative Molecular Design via Uncertainty-guided Fine-tuning of Variational Autoencoders
In recent years, deep generative models have been successfully adopted for various molecular design tasks, particularly in the life and material sciences. A critical challenge for pre-trained generative molecular design (GMD) models is to fine-tune them to be better suited for downstream design tasks aimed at optimizing specific molecular properties. However, redesigning and training an existing effective generative model from scratch for each new design task is impractical. Furthermore, the black-box nature of typical downstream tasks, such as property prediction, makes it nontrivial to optimize the generative model in a task-specific manner. In this work, we propose a novel approach for a model uncertainty-guided fine-tuning of a pre-trained variational autoencoder (VAE)-based GMD model through performance feedback in an active learning setting. The main idea is to quantify model uncertainty in the generative model, which is made efficient by working within a low-dimensional active subspace of the high-dimensional VAE parameters explaining most of the variability in the model's output. The inclusion of model uncertainty expands the space of viable molecules through decoder diversity. We then explore the resulting model uncertainty class via black-box optimization made tractable by low-dimensionality of the active subspace. This enables us to identify and leverage a diverse set of high-performing models to generate enhanced molecules. Empirical results across six target molecular properties, using multiple VAE-based generative models, demonstrate that our uncertainty-guided fine-tuning approach consistently outperforms the original pre-trained models.
Updated: 2024-05-31 02:00:25
Subjects: cs.LG,q-bio.BM,q-bio.QM,stat.ML
Iterative Feature Boosting for Explainable Speech Emotion Recognition
In speech emotion recognition (SER), using predefined features without considering their practical importance may lead to high dimensional datasets, including redundant and irrelevant information. Consequently, high-dimensional learning often results in decreasing model accuracy while increasing computational complexity. Our work underlines the importance of carefully considering and analyzing features in order to build efficient SER systems. We present a new supervised SER method based on an efficient feature engineering approach. We pay particular attention to the explainability of results to evaluate feature relevance and refine feature sets. This is performed iteratively through feature evaluation loop, using Shapley values to boost feature selection and improve overall framework performance. Our approach allows thus to balance the benefits between model performance and transparency. The proposed method outperforms human-level performance (HLP) and state-of-the-art machine learning methods in emotion recognition on the TESS dataset.
Updated: 2024-05-31 01:59:20
Subjects: cs.SD,cs.AI,cs.CL,cs.LG,eess.AS,I.2.7; I.2.6; I.2.1; I.2.8
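The feature evaluation loop described above can be sketched as follows; the classifier, the drop fraction, and the stopping rule are assumptions, and note that shap's return shape varies by version (this assumes one array per class).

```python
import numpy as np
import shap
from sklearn.ensemble import RandomForestClassifier

def feature_boosting_loop(X, y, feature_names, drop_frac=0.1, rounds=5):
    """Fit, score features with Shapley values, prune the least relevant,
    and refit, balancing model performance against explainability."""
    keep = list(range(X.shape[1]))
    for _ in range(rounds):
        model = RandomForestClassifier(n_estimators=200).fit(X[:, keep], y)
        shap_values = shap.TreeExplainer(model).shap_values(X[:, keep])
        # Mean |SHAP| per feature, averaged over classes and samples.
        importance = np.abs(np.array(shap_values)).mean(axis=(0, 1))
        n_drop = max(1, int(drop_frac * len(keep)))
        order = np.argsort(importance)       # ascending relevance
        keep = [keep[i] for i in order[n_drop:]]
    return [feature_names[i] for i in keep]
```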
Data Cleaning and Machine Learning: A Systematic Literature Review
Context: Machine Learning (ML) is integrated into a growing number of systems for various applications. Because the performance of an ML model is highly dependent on the quality of the data it has been trained on, there is a growing interest in approaches to detect and repair data errors (i.e., data cleaning). Researchers are also exploring how ML can be used for data cleaning; hence creating a dual relationship between ML and data cleaning. To the best of our knowledge, there is no study that comprehensively reviews this relationship. Objective: This paper's objectives are twofold. First, it aims to summarize the latest approaches for data cleaning for ML and ML for data cleaning. Second, it provides future work recommendations. Method: We conduct a systematic literature review of the papers published between 2016 and 2022 inclusive. We identify different types of data cleaning activities with and for ML: feature cleaning, label cleaning, entity matching, outlier detection, imputation, and holistic data cleaning. Results: We summarize the content of 101 papers covering various data cleaning activities and provide 24 future work recommendations. Our review highlights many promising data cleaning techniques that can be further extended. Conclusion: We believe that our review of the literature will help the community develop better approaches to clean data.
Updated: 2024-05-31 01:39:49
Subjects: cs.LG,cs.DB
Any2Point: Empowering Any-modality Large Models for Efficient 3D Understanding
Large foundation models have recently emerged as a prominent focus of interest, attaining superior performance in widespread scenarios. Due to the scarcity of 3D data, many efforts have been made to adapt pre-trained transformers from vision to 3D domains. However, such 2D-to-3D approaches are still limited, due to the potential loss of spatial geometries and high computation cost. More importantly, their frameworks are mainly designed for 2D models, lacking a general any-to-3D paradigm. In this paper, we introduce Any2Point, a parameter-efficient method to empower any-modality large models (vision, language, audio) for 3D understanding. Given a frozen transformer from any source modality, we propose a 3D-to-any (1D or 2D) virtual projection strategy that correlates the input 3D points to the original 1D or 2D positions within the source modality. This mechanism enables us to assign each 3D token with a positional encoding paired with the pre-trained model, which avoids 3D geometry loss caused by the true projection and better motivates the transformer for 3D learning with 1D/2D positional priors. Then, within each transformer block, we insert an any-to-3D guided adapter module for parameter-efficient fine-tuning. The adapter incorporates prior spatial knowledge from the source modality to guide the local feature aggregation of 3D tokens, compelling the semantic adaption of any-modality transformers. We conduct extensive experiments to showcase the effectiveness and efficiency of our method. Code and models are released at https://github.com/Ivan-Tang-3D/Any2Point.
Updated: 2024-05-31 01:36:53
Subjects: cs.CV,cs.AI,cs.CL,cs.LG,cs.SD,eess.AS
KerasCV and KerasNLP: Vision and Language Power-Ups
We present the Keras domain packages KerasCV and KerasNLP, extensions of the Keras API for Computer Vision and Natural Language Processing workflows, capable of running on either JAX, TensorFlow, or PyTorch. These domain packages are designed to enable fast experimentation, with a focus on ease-of-use and performance. We adopt a modular, layered design: at the library's lowest level of abstraction, we provide building blocks for creating models and data preprocessing pipelines, and at the library's highest level of abstraction, we provide pretrained "task" models for popular architectures such as Stable Diffusion, YOLOv8, GPT2, BERT, Mistral, CLIP, Gemma, T5, etc. Task models have built-in preprocessing, pretrained weights, and can be fine-tuned on raw inputs. To enable efficient training, we support XLA compilation for all models, and run all preprocessing via a compiled graph of TensorFlow operations using the tf.data API. The libraries are fully open-source (Apache 2.0 license) and available on GitHub.
Updated: 2024-05-31 01:33:45
Subjects: cs.AI,cs.CV,cs.LG,cs.SE,I.2.5; I.2.7; I.2.10
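The highest abstraction level described above is the pretrained task model with built-in preprocessing; the snippet below follows the KerasNLP quickstart pattern (the preset name is one of the documented BERT presets, so check the current docs before relying on it).

```python
import keras_nlp

# A task model bundles tokenizer, preprocessing, backbone, and head,
# so it can be fine-tuned directly on raw strings.
classifier = keras_nlp.models.BertClassifier.from_preset(
    "bert_base_en_uncased",
    num_classes=2,
)
classifier.fit(
    x=["this movie was great", "a terrible waste of time"],
    y=[1, 0],
    batch_size=2,
)
```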
Spoken Question Answering and Speech Continuation Using Spectrogram-Powered LLM
We present Spectron, a novel approach to adapting pre-trained large language models (LLMs) to perform spoken question answering (QA) and speech continuation. By endowing the LLM with a pre-trained speech encoder, our model becomes able to take speech inputs and generate speech outputs. The entire system is trained end-to-end and operates directly on spectrograms, simplifying our architecture. Key to our approach is a training objective that jointly supervises speech recognition, text continuation, and speech synthesis using only paired speech-text pairs, enabling a `cross-modal' chain-of-thought within a single decoding pass. Our method surpasses existing spoken language models in speaker preservation and semantic coherence. Furthermore, the proposed model improves upon direct initialization in retaining the knowledge of the original LLM as demonstrated through spoken QA datasets. We release our audio samples (https://michelleramanovich.github.io/spectron/spectron) and spoken QA dataset (https://github.com/google-research-datasets/LLAMA1-Test-Set).
Updated: 2024-05-31 01:29:27
Subjects: cs.CL,cs.LG,cs.SD,eess.AS
Generative AI for Deep Reinforcement Learning: Framework, Analysis, and Use Cases
As a form of artificial intelligence (AI) technology based on interactive learning, deep reinforcement learning (DRL) has been widely applied across various fields and has achieved remarkable accomplishments. However, DRL faces certain limitations, including low sample efficiency and poor generalization. Therefore, in this paper, we present how to leverage generative AI (GAI) to address these issues and enhance the performance of DRL algorithms. We first introduce several classic GAI and DRL algorithms and demonstrate the applications of GAI-enhanced DRL algorithms. Then, we discuss how to use GAI to improve DRL algorithms from the data and policy perspectives. Subsequently, we introduce a framework that demonstrates an actual and novel integration of GAI with DRL, i.e., GAI-enhanced DRL. Additionally, we provide a case study of the framework on UAV-assisted integrated near-field/far-field communication to validate the performance of the proposed framework. Moreover, we present several future directions. Finally, the related code is available at: https://xiewenwen22.github.io/GAI-enhanced-DRL.
Updated: 2024-05-31 01:25:40
Subjects: cs.LG,cs.NI
From Symbolic Tasks to Code Generation: Diversification Yields Better Task Performers
Instruction tuning -- tuning large language models on instruction-output pairs -- is a promising technique for making models better adapted to the real world. Yet, the key factors driving the model's capability to understand and follow instructions not seen during training remain under-explored. Our investigation begins with a series of synthetic experiments within the theoretical framework of a Turing-complete algorithm called Markov algorithm, which allows fine-grained control over the instruction-tuning data. Generalization and robustness with respect to the training distribution emerge once a diverse enough set of tasks is provided, even though very few examples are provided for each task. We extend these initial results to a real-world application scenario of code generation and find that a more diverse instruction set, extending beyond code-related tasks, improves the performance of code generation. Our observations suggest that a more diverse semantic space for instruction-tuning sets greatly improves the model's ability to follow instructions and perform tasks.
Updated: 2024-05-31 01:23:41
Subjects: cs.CL,cs.AI,cs.LG,cs.LO,cs.PL
SNeurodCNN: Structure-focused Neurodegeneration Convolutional Neural Network for Modelling and Classification of Alzheimer's Disease
Alzheimer's disease (AD), the predominant form of dementia, is a growing global challenge, emphasizing the urgent need for accurate and early diagnosis. Current clinical diagnoses rely on radiologist expert interpretation, which is prone to human error. Deep learning has thus far shown promise for early AD diagnosis. However, existing methods often overlook focal structural atrophy critical for enhanced understanding of the cerebral cortex neurodegeneration. This paper proposes a deep learning framework that includes a novel structure-focused neurodegeneration CNN architecture named SNeurodCNN and an image brightness enhancement preprocessor using gamma correction. The SNeurodCNN architecture takes as input the focal structural atrophy features resulting from segmentation of brain structures captured through magnetic resonance imaging (MRI). As a result, the architecture considers only necessary CNN components, which comprise two downsampling convolutional blocks and two fully connected layers, for achieving the desired classification task, and utilises regularisation techniques to regularise learnable parameters. Leveraging mid-sagittal and para-sagittal brain image viewpoints from the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, our framework demonstrated exceptional performance. The para-sagittal viewpoint achieved 97.8% accuracy, 97.0% specificity, and 98.5% sensitivity, while the mid-sagittal viewpoint offered deeper insights with 98.1% accuracy, 97.2% specificity, and 99.0% sensitivity. Model analysis revealed the ability of SNeurodCNN to capture the structural dynamics of mild cognitive impairment (MCI) and AD in the frontal lobe, occipital lobe, cerebellum, temporal, and parietal lobe, suggesting its potential as a brain structural change digi-biomarker for early AD diagnosis. This work can be reproduced using code we made available on GitHub.
Updated: 2024-05-31 01:10:42
Subjects: eess.IV,cs.CV,cs.LG
Can Machine Learning Assist in Diagnosis of Primary Immune Thrombocytopenia? A feasibility study
Primary Immune thrombocytopenia (ITP) is a rare autoimmune disease characterised by immune-mediated destruction of peripheral blood platelets in patients leading to low platelet counts and bleeding. The diagnosis and effective management of ITP is challenging because there is no established test to confirm the disease and no biomarker with which one can predict the response to treatment and outcome. In this work we conduct a feasibility study to check if machine learning can be applied effectively for diagnosis of ITP using routine blood tests and demographic data in a non-acute outpatient setting. Various ML models, including Logistic Regression, Support Vector Machine, k-Nearest Neighbor, Decision Tree and Random Forest, were applied to data from the UK Adult ITP Registry and a general hematology clinic. Two different approaches were investigated: a demographic-unaware and a demographic-aware one. We conduct extensive experiments to evaluate the predictive performance of these models and approaches, as well as their bias. The results revealed that Decision Tree and Random Forest models were both superior and fair, achieving nearly perfect predictive and fairness scores, with platelet count identified as the most significant variable. Models not provided with demographic information performed better in terms of predictive accuracy but showed lower fairness score, illustrating a trade-off between predictive performance and fairness.
Updated: 2024-05-31 01:04:46
Subjects: cs.LG,cs.AI
All Your Tokens are Belong to Us: Demystifying Address Verification Vulnerabilities in Solidity Smart Contracts
In Ethereum, verifying the validity of passed addresses is a common practice and a crucial step to ensure the secure execution of smart contracts. Vulnerabilities in the process of address verification can lead to great security issues, and anecdotal evidence has been reported by our community. However, this type of vulnerability has not been well studied. To fill the void, in this paper, we aim to characterize and detect this kind of emerging vulnerability. We design and implement AVVERIFIER, a lightweight taint analyzer based on static EVM opcode simulation. Its three-phase detector can progressively rule out false positives and false negatives based on the intrinsic characteristics. Upon a well-established and unbiased benchmark, AVVERIFIER can improve efficiency by 2 to 5 times over the SOTA while maintaining a 94.3% precision and 100% recall. After a large-scale evaluation of over 5 million Ethereum smart contracts, we have identified 812 vulnerable smart contracts that were undisclosed by our community before this work, and 348 open source smart contracts were further verified, whose largest total value locked is over $11.2 billion. We further deploy AVVERIFIER as a real-time detector on Ethereum and Binance Smart Chain, and the results suggest that AVVERIFIER can raise timely warnings once contracts are deployed.
Updated: 2024-05-31 01:02:07
Subjects: cs.CR,cs.SE
Certifying Global Robustness for Deep Neural Networks
A globally robust deep neural network resists perturbations on all meaningful inputs. Current robustness certification methods emphasize local robustness, struggling to scale and generalize. This paper presents a systematic and efficient method to evaluate and verify global robustness for deep neural networks, leveraging the PAC verification framework for solid guarantees on verification results. We utilize probabilistic programs to characterize meaningful input regions, setting a realistic standard for global robustness. Additionally, we introduce the cumulative robustness curve as a criterion in evaluating global robustness. We design a statistical method that combines multi-level splitting and regression analysis for the estimation, significantly reducing the execution time. Experimental results demonstrate the efficiency and effectiveness of our verification method and its capability to find rare and diversified counterexamples for adversarial training.
Updated: 2024-05-31 00:46:04
Subjects: cs.LG,cs.AI
A Survey on Diffusion Models for Time Series and Spatio-Temporal Data
The study of time series data is crucial for understanding trends and anomalies over time, enabling predictive insights across various sectors. Spatio-temporal data, on the other hand, is vital for analyzing phenomena in both space and time, providing a dynamic perspective on complex system interactions. Recently, diffusion models have seen widespread application in time series and spatio-temporal data mining. Not only do they enhance the generative and inferential capabilities for sequential and temporal data, but they also extend to other downstream tasks. In this survey, we comprehensively and thoroughly review the use of diffusion models in time series and spatio-temporal data, categorizing them by model category, task type, data modality, and practical application domain. In detail, we categorize diffusion models into unconditioned and conditioned types and discuss time series data and spatio-temporal data separately. Unconditioned models, which operate unsupervised, are subdivided into probability-based and score-based models, serving predictive and generative tasks such as forecasting, anomaly detection, classification, and imputation. Conditioned models, on the other hand, utilize extra information to enhance performance and are similarly divided for both predictive and generative tasks. Our survey extensively covers their application in various fields, including healthcare, recommendation, climate, energy, audio, and transportation, providing a foundational understanding of how these models analyze and generate data. Through this structured overview, we aim to provide researchers and practitioners with a comprehensive understanding of diffusion models for time series and spatio-temporal data analysis, aiming to direct future innovations and applications by addressing traditional challenges and exploring innovative solutions within the diffusion model framework.
Updated: 2024-05-31 00:44:31
Subjects: cs.LG,cs.AI
Diffusion Actor-Critic: Formulating Constrained Policy Iteration as Diffusion Noise Regression for Offline Reinforcement Learning
In offline reinforcement learning (RL), it is necessary to manage out-of-distribution actions to prevent overestimation of value functions. Policy-regularized methods address this problem by constraining the target policy to stay close to the behavior policy. Although several approaches suggest representing the behavior policy as an expressive diffusion model to boost performance, it remains unclear how to regularize the target policy given a diffusion-modeled behavior sampler. In this paper, we propose Diffusion Actor-Critic (DAC) that formulates the Kullback-Leibler (KL) constraint policy iteration as a diffusion noise regression problem, enabling direct representation of target policies as diffusion models. Our approach follows the actor-critic learning paradigm that we alternatively train a diffusion-modeled target policy and a critic network. The actor training loss includes a soft Q-guidance term from the Q-gradient. The soft Q-guidance grounds on the theoretical solution of the KL constraint policy iteration, which prevents the learned policy from taking out-of-distribution actions. For critic training, we train a Q-ensemble to stabilize the estimation of Q-gradient. Additionally, DAC employs lower confidence bound (LCB) to address the overestimation and underestimation of value targets due to function approximation error. Our approach is evaluated on the D4RL benchmarks and outperforms the state-of-the-art in almost all environments. Code is available at \href{https://github.com/Fang-Lin93/DAC}{\texttt{github.com/Fang-Lin93/DAC}}.
Updated: 2024-05-31 00:41:04
Domains: cs.LG
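The abstract names two mechanisms: an actor trained by diffusion noise regression with a soft Q-guidance term, and a critic that uses a lower confidence bound over a Q-ensemble. The sketch below renders both schematically under stated assumptions; it is not the authors' implementation, and the stand-in MLP, the noise schedule, and the guidance weight `eta` are placeholders.

```python
import torch
import torch.nn as nn

T = 100
alpha_bar = torch.cumprod(1.0 - torch.linspace(1e-4, 0.02, T), dim=0)

class MLP(nn.Module):
    """Stand-in for both the noise network (actor) and the critics."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(in_dim, 64), nn.SiLU(),
                               nn.Linear(64, out_dim))
    def forward(self, *xs):
        return self.f(torch.cat([x.float() for x in xs], dim=-1))

def actor_loss(eps_net, critics, s, a, eta=1.0):
    """Noise regression toward behavior data, with the regression target
    shifted along the Q-gradient (soft Q-guidance) so the denoiser favors
    high-value, in-distribution actions."""
    t = torch.randint(0, T, (s.shape[0],))
    eps = torch.randn_like(a)
    ab = alpha_bar[t].unsqueeze(-1)
    a_t = (ab.sqrt() * a + (1 - ab).sqrt() * eps).requires_grad_(True)
    q = torch.stack([c(s, a_t) for c in critics]).mean(0).sum()
    q_grad = torch.autograd.grad(q, a_t)[0]
    target = eps - eta * (1 - ab).sqrt() * q_grad
    t_feat = t.float().unsqueeze(-1) / T
    return ((eps_net(s, a_t, t_feat) - target.detach()) ** 2).mean()

def lcb(critics, s, a, k=1.0):
    """Pessimistic value target: ensemble mean minus k standard deviations."""
    qs = torch.stack([c(s, a) for c in critics])
    return qs.mean(0) - k * qs.std(0)

s, a = torch.randn(8, 4), torch.randn(8, 2)
eps_net = MLP(4 + 2 + 1, 2)
critics = [MLP(4 + 2, 1) for _ in range(5)]
print(actor_loss(eps_net, critics, s, a).item(), lcb(critics, s, a).shape)
```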
EM-Assist: Safe Automated ExtractMethod Refactoring with LLMs
Excessively long methods that carry multiple responsibilities are hard to understand, debug, reuse, and maintain. The remedy is the widely recognized Extract Method refactoring. While modern IDEs support applying this refactoring, recommending which code fragments to extract has been the focus of many research tools. These tools, however, often struggle to replicate real-world developer practices, producing recommendations that do not align with what a human developer would do. To address this issue, we introduce EM-Assist, an IntelliJ IDEA plugin that uses LLMs to generate refactoring suggestions and then validates, enhances, and ranks them. Finally, EM-Assist applies the user-selected recommendation through the IntelliJ IDE. In an extensive evaluation on 1,752 real-world refactorings that actually took place in open-source projects, EM-Assist achieved a recall of 53.4% among its top-5 recommendations, compared to 39.4% for the previous best-in-class tool, which relies solely on static analysis. Moreover, in a usability survey of 18 industrial developers, 94.4% gave a positive rating.
Updated: 2024-05-31 00:32:04
Domains: cs.SE,cs.HC,cs.LG,cs.PL
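The generate-validate-rank loop the abstract describes might look like the Python sketch below; `llm_suggest`, `is_extractable`, and the length-based ranking heuristic are hypothetical placeholders rather than EM-Assist's actual API (the plugin itself runs inside IntelliJ IDEA).

```python
from dataclasses import dataclass

@dataclass
class Suggestion:
    start_line: int
    end_line: int
    name: str

def recommend_extractions(method_source, llm_suggest, is_extractable, top_k=5):
    """Ask an LLM for candidate fragments, discard ones that could not be
    mechanically extracted, then rank the survivors (here: longer fragments
    first, a placeholder heuristic)."""
    candidates = llm_suggest(method_source)
    valid = [s for s in candidates if is_extractable(method_source, s)]
    valid.sort(key=lambda s: s.end_line - s.start_line, reverse=True)
    return valid[:top_k]

# Toy stand-ins so the sketch runs end to end:
demo = "a = 1\nb = 2\nreturn a + b"
fake_llm = lambda src: [Suggestion(0, 1, "compute_inputs"), Suggestion(0, 5, "too_long")]
in_bounds = lambda src, s: 0 <= s.start_line <= s.end_line < len(src.splitlines())
print(recommend_extractions(demo, fake_llm, in_bounds))
```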
A Novel Review of Stability Techniques for Improved Privacy-Preserving Machine Learning
Machine learning models have recently grown substantially in size and popularity, and this growth has raised concerns about dataset privacy. To counteract data leakage, various privacy frameworks guarantee that the outputs of a machine learning model do not compromise its training data. This privatization comes at a cost, however: the random noise it adds to the training process reduces model performance. By making models more resistant to small changes in their inputs, and thus more stable, the required amount of noise can be reduced while privacy is still protected. This paper reviews techniques for enhancing stability, thereby minimizing the negative effects of privatization in machine learning.
Updated: 2024-05-31 00:30:29
Domains: cs.LG,cs.CR
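The trade-off the abstract describes is easiest to see in a DP-SGD-style update: per-example gradients are clipped to bound sensitivity, and Gaussian noise scaled to that bound is added. A more stable model, whose per-example gradients vary less, tolerates a smaller clip norm and therefore less noise at the same privacy level. A minimal numpy sketch with illustrative hyperparameters:

```python
import numpy as np

def private_gradient(per_example_grads, clip_norm=1.0, noise_multiplier=1.1,
                     rng=np.random.default_rng(0)):
    """Clip each example's gradient to `clip_norm` (bounding sensitivity),
    average, then add Gaussian noise proportional to that bound."""
    clipped = [g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
               for g in per_example_grads]
    mean_grad = np.mean(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm / len(clipped),
                       size=mean_grad.shape)
    return mean_grad + noise

grads = [np.random.default_rng(i).normal(size=10) for i in range(32)]
print(private_gradient(grads)[:3])
```

A stability technique that shrinks the spread of `per_example_grads` lets `clip_norm`, and with it the injected noise, be reduced without loosening the privacy guarantee.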
Uncertainty Quantification for Deep Learning
We provide a complete and statistically consistent uncertainty quantification for deep learning, covering the uncertainty arising from (1) new input data, (2) the training and testing data, (3) the weight vectors of the neural network, and (4) the network itself, since it is not a perfect predictor. Using Bayes' theorem and conditional probability densities, we demonstrate how each uncertainty source can be systematically quantified. We also introduce, for the first time, a fast and practical way to incorporate and combine all sources of error. For illustration, the new method is applied to quantify errors in cloud autoconversion rates predicted by an artificial neural network that was trained on aircraft cloud-probe measurements from the Azores and on the stochastic collection equation formulated as a two-moment bin model. In this specific example, the output uncertainty arising from uncertainty in the training and testing data is dominant, followed by uncertainty in the input data, in the trained neural network, and in the weights. We discuss the usefulness of the methodology for machine learning practice and show how, by including uncertainty in the training data, the new methodology becomes less sensitive to inputs that fall outside the training data set.
Updated: 2024-05-31 00:20:19
Domains: cs.LG,stat.ML,62D99,G.3
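As a toy rendering of the combination step, not the paper's exact formulation, one can propagate input noise by Monte Carlo, capture training-data and weight uncertainty with an ensemble, and add a held-out residual variance for imperfect-predictor error; summing the variances assumes the sources are independent, which is an assumption of this sketch.

```python
import numpy as np

def total_predictive_variance(models, x, input_sigma, residual_var,
                              n_mc=100, rng=np.random.default_rng(0)):
    """models: ensemble trained on resampled data/weights; x: one input."""
    preds = np.array([[m(x + rng.normal(0.0, input_sigma, size=x.shape))
                       for _ in range(n_mc)] for m in models])
    var_input = preds.var(axis=1).mean()    # spread from perturbed inputs
    var_weights = preds.mean(axis=1).var()  # spread across ensemble members
    return var_input + var_weights + residual_var

# Toy linear ensemble standing in for retrained networks:
models = [lambda x, w=w: float(w @ x)
          for w in np.random.default_rng(1).normal(size=(5, 3))]
print(total_predictive_variance(models, np.ones(3), input_sigma=0.1,
                                residual_var=0.05))
```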
MedFLIP: Medical Vision-and-Language Self-supervised Fast Pre-Training with Masked Autoencoder
Within the domain of medical analysis, extensive research has explored the potential of mutual learning between masked autoencoders (MAEs) and multimodal data, yet the impact of MAEs on inter-modality learning remains a key challenge. We introduce MedFLIP, a fast language-image pre-training method for medical analysis. We explore MAEs for zero-shot learning across domains, which enhances the model's ability to learn from limited data, a common scenario in medical diagnostics. We verify that masking an image does not harm inter-modal learning. Furthermore, we propose an SVD loss to enhance representation learning of medical-image characteristics, aiming to improve classification accuracy by leveraging the structural intricacies of such data. Our theory posits that masking encourages semantic preservation, robust feature extraction, regularization, domain adaptation, and invariance learning. Lastly, we validate that using language improves zero-shot performance in medical image analysis. MedFLIP's scaling of the masking process marks an advancement in the field, offering a path to rapid and precise medical image analysis without the traditional computational bottlenecks. Through experiments and validation, MedFLIP demonstrates efficient performance improvements, aiding future research and application in medical diagnostics.
Updated: 2024-05-31 00:12:59
Domains: eess.IV,cs.CL,cs.CV,cs.LG
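The masking step at the core of MAE-style pre-training, which MedFLIP builds on, can be sketched as follows; the 16-pixel patches and 75% mask ratio are common MAE defaults assumed here, not values taken from the paper.

```python
import numpy as np

def mask_patches(image, patch=16, mask_ratio=0.75, rng=np.random.default_rng(0)):
    """Zero out a random subset of non-overlapping patches; the encoder sees
    the masked image and the decoder reconstructs the hidden patches."""
    h, w = image.shape[:2]
    ph, pw = h // patch, w // patch
    hidden = rng.choice(ph * pw, size=int(ph * pw * mask_ratio), replace=False)
    masked = image.copy()
    for i in hidden:
        r, c = divmod(i, pw)
        masked[r * patch:(r + 1) * patch, c * patch:(c + 1) * patch] = 0.0
    return masked, hidden

img = np.random.default_rng(1).random((224, 224))
masked, hidden = mask_patches(img)
print(masked.shape, len(hidden))  # (224, 224) 147
```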
Towards a General GNN Framework for Combinatorial Optimization
Graph neural networks (GNNs) have achieved great success on a variety of tasks such as node classification, graph classification, and link prediction. However, the use of GNNs (and of machine learning more generally) to solve combinatorial optimization (CO) problems is much less explored. Here, we introduce a novel GNN architecture that leverages a complex filter bank and localized attention mechanisms designed to solve CO problems on graphs. We show how our method differs from prior GNN-based CO solvers and how it can be effectively applied to the maximum clique, minimum dominating set, and maximum cut problems in a self-supervised learning setting. In addition to demonstrating competitive overall performance across all tasks, we establish state-of-the-art results for the maximum cut problem.
Updated: 2024-05-31 00:02:07
Domains: cs.LG,cs.AI,cs.DM,68T07 (Primary) 68T20, 90C35, 05C62 (Secondary),F.2.2; I.2.6
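For the self-supervised setting the abstract mentions, GNN-based max-cut solvers commonly train against a differentiable relaxation of the cut size; the sketch below shows that standard objective (the expected cut under independent rounding of node probabilities) with a plain logit vector standing in for the network, since the paper's filter-bank-and-attention architecture is not reproduced here.

```python
import torch

def maxcut_loss(p, edges):
    """p: (n,) probabilities of each node being on side 1; edges: (m, 2).
    Returns the negated expected cut size under independent rounding of p."""
    pi, pj = p[edges[:, 0]], p[edges[:, 1]]
    return -(pi * (1 - pj) + pj * (1 - pi)).sum()

# Toy usage on a 4-cycle (optimal cut = 4); logits would come from the GNN.
torch.manual_seed(0)
edges = torch.tensor([[0, 1], [1, 2], [2, 3], [3, 0]])
logits = torch.randn(4, requires_grad=True)
opt = torch.optim.Adam([logits], lr=0.1)
for _ in range(200):
    opt.zero_grad()
    loss = maxcut_loss(torch.sigmoid(logits), edges)
    loss.backward()
    opt.step()
# With a generic random init this recovers the optimal alternating
# bipartition, e.g. [0, 1, 0, 1] or its complement.
print(torch.sigmoid(logits).round().tolist())
```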