Arxiv Day: Article

A Survey of Backdoor Attacks and Defenses on Large Language Models: Implications for Security Measures

The large language models (LLMs), which bridge the gap between human language understanding and complex problem-solving, achieve state-of-the-art performance on several NLP tasks, particularly in few-shot and zero-shot settings. Despite the demonstrable efficacy of LMMs, due to constraints on computational resources, users have to engage with open-source language models or outsource the entire training process to third-party platforms. However, research has demonstrated that language models are susceptible to potential security vulnerabilities, particularly in backdoor attacks. Backdoor attacks are designed to introduce targeted vulnerabilities into language models by poisoning training samples or model weights, allowing attackers to manipulate model responses through malicious triggers. While existing surveys on backdoor attacks provide a comprehensive overview, they lack an in-depth examination of backdoor attacks specifically targeting LLMs. To bridge this gap and grasp the latest trends in the field, this paper presents a novel perspective on backdoor attacks for LLMs by focusing on fine-tuning methods. Specifically, we systematically classify backdoor attacks into three categories: full-parameter fine-tuning, parameter-efficient fine-tuning, and attacks without fine-tuning. Based on insights from a substantial review, we also discuss crucial issues for future research on backdoor attacks, such as further exploring attack algorithms that do not require fine-tuning, or developing more covert attack algorithms.

Updated: 2024-06-10 23:54:21

标题: 大语言模型的后门攻击和防御调查：对安全措施的影响

摘要: 大型语言模型（LLMs）弥合了人类语言理解与复杂问题解决之间的差距，在几个NLP任务中取得了最先进的性能，特别是在少样本和零样本设置下。尽管LMMs的有效性已经得到证明，但由于计算资源的限制，用户必须与开源语言模型互动或将整个训练过程外包给第三方平台。然而，研究表明语言模型容易受到潜在的安全漏洞的影响，尤其是在后门攻击方面。后门攻击旨在通过植入恶意触发器来向语言模型中引入有针对性的漏洞，从而允许攻击者通过毒化训练样本或模型权重来操纵模型响应。虽然现有的关于后门攻击的调查提供了全面的概述，但它们缺乏对专门针对LLMs的后门攻击进行深入研究。为了弥补这一差距并了解该领域的最新趋势，本文通过关注微调方法，提出了一种针对LLMs的后门攻击的新视角。具体来说，我们将后门攻击系统地分类为三类：全参数微调、参数高效微调以及无需微调的攻击。根据大量审查的见解，我们还讨论了未来关于后门攻击的研究中关键问题，例如进一步探索不需要微调的攻击算法，或开发更隐蔽的攻击算法。

更新时间: 2024-06-10 23:54:21

领域: cs.CR,cs.AI,cs.CL

下载: http://arxiv.org/abs/2406.06852v1

Learning from Students: Applying t-Distributions to Explore Accurate and Efficient Formats for LLMs

The increasing size of large language models (LLMs) traditionally requires low-precision integer formats to meet strict latency and power demands. Yet recently, alternative formats such as Normal Float (NF4) have increased model accuracy at the cost of increased chip area. In this work, we first conduct a large-scale analysis of LLM weights and activations across 30 networks and conclude that most distributions follow a Student's t-distribution. We then derive a new theoretically optimal format, Student Float (SF4), that improves over NF4 across modern LLMs, for example increasing the average accuracy on LLaMA2-7B by 0.76% across tasks. Using this format as a high-accuracy reference, we then propose augmenting E2M1 with two variants of supernormal support for higher model accuracy. Finally, we explore the quality and efficiency frontier across 11 datatypes by evaluating their model accuracy and hardware complexity. We discover a Pareto curve composed of INT4, E2M1, and E2M1 with supernormal support, which offers a continuous tradeoff between model accuracy and chip area. For example, E2M1 with supernormal support increases the accuracy of Phi-2 by up to 2.19% with 1.22% area overhead, enabling more LLM-based applications to be run at four bits. The supporting code is hosted at https://github.com/cornell-zhang/llm-datatypes.

Updated: 2024-06-10 23:41:18

标题: 学习于学生：应用t-分布探索适用于LLMs的准确和高效格式

摘要: 这项工作研究了大型语言模型（LLMs）的不断增大尺寸传统上需要低精度整数格式来满足严格的延迟和功耗要求。然而，最近，诸如正态浮点（NF4）之类的替代格式提高了模型的准确性，但增加了芯片面积。在这项工作中，我们首先对30个网络中的LLM权重和激活进行了大规模分析，并得出大多数分布遵循学生t分布的结论。然后，我们推导出一种新的理论上最佳格式，学生浮点（SF4），在现代LLMs中优于NF4，例如在LLaMA2-7B上跨任务平均准确度提高了0.76%。使用这种格式作为高准确度参考，我们提出了用两种超常支持的E2M1来增加模型准确性。最后，我们通过评估它们的模型准确性和硬件复杂性，探索了11种数据类型的质量和效率前沿。我们发现了由INT4、E2M1和带有超常支持的E2M1组成的帕累托曲线，提供了模型准确性和芯片面积之间的连续权衡。例如，带有超常支持的E2M1可以提高Phi-2的准确度高达2.19%，并带来1.22%的面积开销，使更多基于LLM的应用程序可以在四位数上运行。支持的代码托管在https://github.com/cornell-zhang/llm-datatypes。

更新时间: 2024-06-10 23:41:18

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2405.03103v2

Flexible Parametric Inference for Space-Time Hawkes Processes

Many modern spatio-temporal data sets, in sociology, epidemiology or seismology, for example, exhibit self-exciting characteristics, triggering and clustering behaviors both at the same time, that a suitable Hawkes space-time process can accurately capture. This paper aims to develop a fast and flexible parametric inference technique to recover the parameters of the kernel functions involved in the intensity function of a space-time Hawkes process based on such data. Our statistical approach combines three key ingredients: 1) kernels with finite support are considered, 2) the space-time domain is appropriately discretized, and 3) (approximate) precomputations are used. The inference technique we propose then consists of a $\ell_2$ gradient-based solver that is fast and statistically accurate. In addition to describing the algorithmic aspects, numerical experiments have been carried out on synthetic and real spatio-temporal data, providing solid empirical evidence of the relevance of the proposed methodology.

Updated: 2024-06-10 23:40:16

标题: 弹性参数推断的空间-时间霍克斯过程

摘要: 许多现代的时空数据集，例如社会学、流行病学或地震学等领域，表现出自激特性，同时触发和聚类行为，适当的Hawkes时空过程能够准确捕捉这些特征。本文旨在开发一种快速灵活的参数推断技术，以恢复基于此类数据的时空Hawkes过程的强度函数中涉及的核函数的参数。我们的统计方法结合了三个关键因素：1）考虑有限支持的核函数，2）适当将时空域离散化，3）使用（近似）预计算。我们提出的推断技术包括一个快速且统计准确的基于$\ell_2$梯度的求解器。除了描述算法方面，还对合成和真实的时空数据进行了数值实验，提供了对所提出方法的实证证据。

更新时间: 2024-06-10 23:40:16

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2406.06849v1

Text as Images: Can Multimodal Large Language Models Follow Printed Instructions in Pixels?

Recent multimodal large language models (MLLMs) have shown promising instruction following capabilities on vision-language tasks. In this work, we introduce VISUAL MODALITY INSTRUCTION (VIM), and investigate how well multimodal models can understand textual instructions provided in pixels, despite not being explicitly trained on such data during pretraining or fine-tuning. We adapt VIM to eight benchmarks, including OKVQA, MM-Vet, MathVista, MMMU, and probe diverse MLLMs in both the text-modality instruction (TEM) setting and VIM setting. Notably, we observe a significant performance disparity between the original TEM and VIM settings for open-source MLLMs, indicating that open-source MLLMs face greater challenges when text instruction is presented solely in image form. To address this issue, we train v-MLLM, a generalizable model that is capable to conduct robust instruction following in both text-modality and visual-modality instructions.

Updated: 2024-06-10 23:39:24

标题: 文本作为图像：多模态大型语言模型能否按照像素打印的指令进行跟随？

摘要: 最近的多模态大型语言模型（MLLMs）在视觉-语言任务上展现出了有希望的指导能力。在这项工作中，我们引入了视觉模态指导（VIM），并研究了多模态模型在像素级提供的文本指导上的理解能力，尽管在预训练或微调过程中并没有明确训练过这样的数据。我们将VIM适应到包括OKVQA、MM-Vet、MathVista、MMMU等八个基准测试中，并在文本模态指导（TEM）设置和VIM设置中探索各种MLLMs。值得注意的是，我们观察到对于开源MLLMs，原始TEM和VIM设置之间存在显著的性能差距，表明当文本指导仅以图像形式呈现时，开源MLLMs面临更大的挑战。为了解决这个问题，我们训练了v-MLLM，这是一个通用模型，能够在文本模态和视觉模态指导中进行强大的指导跟随。

更新时间: 2024-06-10 23:39:24

领域: cs.CV,cs.AI,cs.CL

下载: http://arxiv.org/abs/2311.17647v2

Taxes Are All You Need: Integration of Taxonomical Hierarchy Relationships into the Contrastive Loss

In this work, we propose a novel supervised contrastive loss that enables the integration of taxonomic hierarchy information during the representation learning process. A supervised contrastive loss operates by enforcing that images with the same class label (positive samples) project closer to each other than images with differing class labels (negative samples). The advantage of this approach is that it directly penalizes the structure of the representation space itself. This enables greater flexibility with respect to encoding semantic concepts. However, the standard supervised contrastive loss only enforces semantic structure based on the downstream task (i.e. the class label). In reality, the class label is only one level of a \emph{hierarchy of different semantic relationships known as a taxonomy}. For example, the class label is oftentimes the species of an animal, but between different classes there are higher order relationships such as all animals with wings being ``birds". We show that by explicitly accounting for these relationships with a weighting penalty in the contrastive loss we can out-perform the supervised contrastive loss. Additionally, we demonstrate the adaptability of the notion of a taxonomy by integrating our loss into medical and noise-based settings that show performance improvements by as much as 7%.

Updated: 2024-06-10 23:36:58

标题: "税收就是你需要的一切：将分类层次关系整合到对比损失中"

摘要: 在这项工作中，我们提出了一种新颖的监督对比损失，可以在表示学习过程中整合分类层次信息。监督对比损失通过强制要求具有相同类标签（正样本）的图像投影互相靠近，而具有不同类标签（负样本）的图像之间投影相距较远。这种方法的优势在于直接惩罚表示空间本身的结构，从而在编码语义概念方面具有更大的灵活性。然而，标准的监督对比损失只基于下游任务（即类标签）强制实现语义结构。实际上，类标签只是称为“分类法”的不同语义关系层次中的一个级别。例如，类标签通常是动物的物种，但在不同类之间存在更高级别的关系，例如所有有翅膀的动物被称为“鸟类”。我们表明，通过在对比损失中明确考虑这些关系，通过加权惩罚，我们可以优于监督对比损失。此外，我们展示了分类法概念的适应性，将我们的损失集成到医学和基于噪声的环境中，表现出高达7%的性能提升。

更新时间: 2024-06-10 23:36:58

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2406.06848v1

Hypernetworks for Personalizing ASR to Atypical Speech

Parameter-efficient fine-tuning (PEFT) for personalizing automatic speech recognition (ASR) has recently shown promise for adapting general population models to atypical speech. However, these approaches assume a priori knowledge of the atypical speech disorder being adapted for -- the diagnosis of which requires expert knowledge that is not always available. Even given this knowledge, data scarcity and high inter/intra-speaker variability further limit the effectiveness of traditional fine-tuning. To circumvent these challenges, we first identify the minimal set of model parameters required for ASR adaptation. Our analysis of each individual parameter's effect on adaptation performance allows us to reduce Word Error Rate (WER) by half while adapting 0.03% of all weights. Alleviating the need for cohort-specific models, we next propose the novel use of a meta-learned hypernetwork to generate highly individualized, utterance-level adaptations on-the-fly for a diverse set of atypical speech characteristics. Evaluating adaptation at the global, cohort and individual-level, we show that hypernetworks generalize better to out-of-distribution speakers, while maintaining an overall relative WER reduction of 75.2% using 0.1% of the full parameter budget.

Updated: 2024-06-10 23:33:10

标题: 超网络用于个性化非典型语音自动语音识别

摘要: 参数高效微调（PEFT）用于个性化自动语音识别（ASR）最近显示出将一般人群模型适应非典型语音的潜力。然而，这些方法假设对正在适应的非典型语音障碍有先验知识 - 其诊断需要专业知识，而这种专业知识并不总是可用。即使具有这种知识，数据稀缺和高的讲者内/间变异性进一步限制了传统微调的有效性。为了避开这些挑战，我们首先确定了用于ASR适应所需的最小模型参数集。我们对每个单独参数对适应性能的影响的分析使我们能够将词误率（WER）减少一半，同时适应所有权重的0.03％。为了减轻对队友特定模型的需求，我们接下来提出了使用元学习的超网络的新颖用途，以实时生成针对各种非典型语音特征的高度个性化的话语级适应。通过在全局、队友和个体级别评估适应性，我们展示了超网络更好地概括了分布之外的讲者，同时保持了使用完整参数预算的0.1％的总体相对WER减少了75.2％。

更新时间: 2024-06-10 23:33:10

领域: cs.LG,cs.CL

下载: http://arxiv.org/abs/2406.04240v3

Rate-Preserving Reductions for Blackwell Approachability

Abernethy et al. (2011) showed that Blackwell approachability and no-regret learning are equivalent, in the sense that any algorithm that solves a specific Blackwell approachability instance can be converted to a sublinear regret algorithm for a specific no-regret learning instance, and vice versa. In this paper, we study a more fine-grained form of such reductions, and ask when this translation between problems preserves not only a sublinear rate of convergence, but also preserves the optimal rate of convergence. That is, in which cases does it suffice to find the optimal regret bound for a no-regret learning instance in order to find the optimal rate of convergence for a corresponding approachability instance? We show that the reduction of Abernethy et al. (2011) does not preserve rates: their reduction may reduce a $d$-dimensional approachability instance $I_1$ with optimal convergence rate $R_1$ to a no-regret learning instance $I_2$ with optimal regret-per-round of $R_2$, with $R_{2}/R_{1}$ arbitrarily large (in particular, it is possible that $R_1 = 0$ and $R_{2} > 0$). On the other hand, we show that it is possible to tightly reduce any approachability instance to an instance of a generalized form of regret minimization we call improper $\phi$-regret minimization (a variant of the $\phi$-regret minimization of Gordon et al. (2008) where the transformation functions may map actions outside of the action set). Finally, we characterize when linear transformations suffice to reduce improper $\phi$-regret minimization problems to standard classes of regret minimization problems in a rate preserving manner. We prove that some improper $\phi$-regret minimization instances cannot be reduced to either subclass of instance in this way, suggesting that approachability can capture some problems that cannot be phrased in the language of online learning.

Updated: 2024-06-10 23:23:52

标题: Rate-Preserving Reductions for Blackwell Approachability 的翻译是：保持速率的降低对于Blackwell可接近性

摘要: Abernethy等人（2011年）表明，Blackwell可达性和无悔学习是等价的，即解决特定Blackwell可达性实例的任何算法都可以转换为特定无悔学习实例的次线性后悔算法，反之亦然。在本文中，我们研究了这种降低的更精细形式，并询问何时在问题之间的这种转换不仅保留次线性的收敛速度，还保留最佳的收敛速度。也就是说，在哪些情况下，找到无悔学习实例的最佳后悔界限就足以找到对应可达性实例的最佳收敛速度？我们表明，Abernethy等人（2011年）的降低并不保留速率：他们的降低可能将具有最佳收敛速率$R_1$的$d$维可达性实例$I_1$降低为具有最佳每轮后悔$R_2$的无悔学习实例$I_2$，其中$R_{2}/R_{1}$可以任意大（特别地，可能$R_1=0$且$R_{2}>0$）。另一方面，我们表明，可以紧密将任何可达性实例降低为我们称之为不当$\phi$后悔最小化（Gordon等人（2008年）的$\phi$后悔最小化的变体，其中转换函数可能将动作映射到动作集之外）实例。最后，我们表征了何时线性变换足以以保持速率的方式将不当$\phi$后悔最小化问题降低到标准后悔最小化问题的常见类别。我们证明一些不当$\phi$后悔最小化实例无法以这种方式降低到任何子类实例，这表明可达性可以涵盖一些无法用在线学习语言表述的问题。

更新时间: 2024-06-10 23:23:52

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2406.07585v1

Compass: A Comprehensive Tool for Accurate and Efficient Molecular Docking in Inference and Fine-Tuning

While there has been discussion about noise levels in molecular docking datasets such as PDBBind, a thorough analysis of their physical/chemical and bioactivity noise characteristics is still lacking. PoseCheck addresses this issue by examining molecular strain energy, molecular-protein clashes, and interactions, but it is primarily created for $de$ $novo$ drug design. Another important metric in molecular docking, Binding Affinity Energy, is better assessed by the new empirical score function, AA-Score, which has demonstrated improved performance over existing methods. To tackle these challenges, we propose the COMPASS method, which integrates the PoseCheck and AA-Score modules. This approach evaluates dataset noise levels and the physical/chemical and bioactivity feasibility of docked molecules. Our analysis of the PDBBind dataset using COMPASS reveals significant noise in the ground truth data. Additionally, we incorporate COMPASS with the state-of-the-art molecular docking method, DiffDock, in inference mode to achieve efficient and accurate assessments of docked ligands. Finally, we propose a new paradigm to enhance model performance for molecular docking through fine-tuning and discuss the potential benefits of this approach. The source code is available publicly at https://github.com/BIMSBbioinfo/Compass.

Updated: 2024-06-10 23:23:36

标题: 指南针：一种用于推断和微调中准确高效的分子对接的全面工具

摘要: 尽管关于分子对接数据集（如PDBBind）中的噪声水平已经进行了讨论，但对其物理/化学和生物活性噪声特征的彻底分析仍然缺乏。PoseCheck通过检查分子应变能量、分子-蛋白冲突和相互作用来解决这个问题，但它主要是为了$de$ $novo$药物设计而创建的。分子对接中另一个重要的度量标准是结合亲和能量，新的经验评分函数AA-Score更好地评估了这一指标，并且已经证明相比现有方法有更好的性能。为了解决这些挑战，我们提出了COMPASS方法，该方法集成了PoseCheck和AA-Score模块。这种方法评估了数据集中的噪声水平以及对接分子的物理/化学和生物活性可行性。我们使用COMPASS对PDBBind数据集进行的分析揭示了地面真实数据中的显著噪声。此外，我们将COMPASS与最先进的分子对接方法DiffDock结合在推断模式下，以实现对对接配体的高效和准确评估。最后，我们提出了一种通过微调来增强分子对接模型性能的新范式，并讨论了这种方法的潜在好处。源代码可在https://github.com/BIMSBbioinfo/Compass上公开获取。

更新时间: 2024-06-10 23:23:36

领域: cs.LG,q-bio.QM

下载: http://arxiv.org/abs/2406.06841v1

Silent Signals, Loud Impact: LLMs for Word-Sense Disambiguation of Coded Dog Whistles

A dog whistle is a form of coded communication that carries a secondary meaning to specific audiences and is often weaponized for racial and socioeconomic discrimination. Dog whistling historically originated from United States politics, but in recent years has taken root in social media as a means of evading hate speech detection systems and maintaining plausible deniability. In this paper, we present an approach for word-sense disambiguation of dog whistles from standard speech using Large Language Models (LLMs), and leverage this technique to create a dataset of 16,550 high-confidence coded examples of dog whistles used in formal and informal communication. Silent Signals is the largest dataset of disambiguated dog whistle usage, created for applications in hate speech detection, neology, and political science. The dataset can be found at https://huggingface.co/datasets/SALT-NLP/silent_signals.

Updated: 2024-06-10 23:09:19

标题: 沉默的信号，巨大的影响：用于编码狗哨的词义消歧的LLMs

摘要: 狗哨是一种编码通信形式，向特定受众传达次级含义，通常被用作种族和社会经济歧视的武器。狗哨在历史上源自美国政治，但近年来已在社交媒体上生根，用作规避仇恨言论检测系统并保持可否认性的手段。本文提出了一种利用大型语言模型（LLMs）进行狗哨词义消歧的方法，并利用这一技术创建了一个包含16,550个高置信度编码示例的数据集，这些狗哨在正式和非正式交流中被使用。《沉默的信号》是迄今最大的狗哨用法数据集，旨在应用于仇恨言论检测、新词汇学和政治科学。数据集可在https://huggingface.co/datasets/SALT-NLP/silent_signals 找到。

更新时间: 2024-06-10 23:09:19

领域: cs.CL,cs.LG,J.4; K.4.1; K.4.2

下载: http://arxiv.org/abs/2406.06840v1

Stable Minima Cannot Overfit in Univariate ReLU Networks: Generalization by Large Step Sizes

We study the generalization of two-layer ReLU neural networks in a univariate nonparametric regression problem with noisy labels. This is a problem where kernels (\emph{e.g.} NTK) are provably sub-optimal and benign overfitting does not happen, thus disqualifying existing theory for interpolating (0-loss, global optimal) solutions. We present a new theory of generalization for local minima that gradient descent with a constant learning rate can \emph{stably} converge to. We show that gradient descent with a fixed learning rate $\eta$ can only find local minima that represent smooth functions with a certain weighted \emph{first order total variation} bounded by $1/\eta - 1/2 + \widetilde{O}(\sigma + \sqrt{\mathrm{MSE}})$ where $\sigma$ is the label noise level, $\mathrm{MSE}$ is short for mean squared error against the ground truth, and $\widetilde{O}(\cdot)$ hides a logarithmic factor. Under mild assumptions, we also prove a nearly-optimal MSE bound of $\widetilde{O}(n^{-4/5})$ within the strict interior of the support of the $n$ data points. Our theoretical results are validated by extensive simulation that demonstrates large learning rate training induces sparse linear spline fits. To the best of our knowledge, we are the first to obtain generalization bound via minima stability in the non-interpolation case and the first to show ReLU NNs without regularization can achieve near-optimal rates in nonparametric regression.

Updated: 2024-06-10 22:57:27

标题: 稳定的最小值不会在单变量ReLU网络中过拟合：通过大步长实现泛化

摘要: 我们研究了具有噪声标签的一元非参数回归问题中两层ReLU神经网络的泛化。这是一个核（如NTK）被证明是次优且良性过拟合不会发生的问题，因此淘汰了现有的插值（0损失，全局最优）解的理论。我们提出了一个新的理论，用于描述梯度下降可以稳定收敛到的局部最小值。我们表明，使用固定学习率$\eta$的梯度下降只能找到代表具有一定加权一阶总变化受限于$1/\eta - 1/2 + \widetilde{O}(\sigma + \sqrt{\mathrm{MSE}})$的平滑函数的局部最小值，其中$\sigma$是标签噪声水平，$\mathrm{MSE}$是与真实值的均方误差，$\widetilde{O}(\cdot)$中隐藏了一个对数因子。在温和的假设下，我们还证明了在$n$个数据点的支持的严格内部内的几乎最优MSE界限为$\widetilde{O}(n^{-4/5})$。我们的理论结果通过广泛的模拟得到验证，表明大学习率训练导致了稀疏线性样条拟合。据我们所知，我们是第一个通过极小值稳定性在非插值情况下获得泛化界限的研究，并且也是第一个展示ReLU神经网络在无正则化的情况下可以在非参数回归中实现接近最优的速率。

更新时间: 2024-06-10 22:57:27

领域: cs.LG,cs.AI,stat.ML

下载: http://arxiv.org/abs/2406.06838v1

Personalized Binomial DAGs Learning with Network Structured Covariates

The causal dependence in data is often characterized by Directed Acyclic Graphical (DAG) models, widely used in many areas. Causal discovery aims to recover the DAG structure using observational data. This paper focuses on causal discovery with multi-variate count data. We are motivated by real-world web visit data, recording individual user visits to multiple websites. Building a causal diagram can help understand user behavior in transitioning between websites, inspiring operational strategy. A challenge in modeling is user heterogeneity, as users with different backgrounds exhibit varied behaviors. Additionally, social network connections can result in similar behaviors among friends. We introduce personalized Binomial DAG models to address heterogeneity and network dependency between observations, which are common in real-world applications. To learn the proposed DAG model, we develop an algorithm that embeds the network structure into a dimension-reduced covariate, learns each node's neighborhood to reduce the DAG search space, and explores the variance-mean relation to determine the ordering. Simulations show our algorithm outperforms state-of-the-art competitors in heterogeneous data. We demonstrate its practical usefulness on a real-world web visit dataset.

Updated: 2024-06-10 22:33:24

标题: 个性化二项式DAGs学习与网络结构协变量

摘要: 数据中的因果依赖通常由有向无环图（DAG）模型来描述，在许多领域广泛使用。因果发现旨在利用观测数据恢复DAG结构。本文专注于使用多变量计数数据进行因果发现。我们受到现实世界网站访问数据的启发，记录了用户访问多个网站的情况。建立因果图可以帮助理解用户在不同网站之间的转换行为，启发运营策略。建模中的一个挑战是用户异质性，因为具有不同背景的用户表现出不同的行为。此外，社交网络连接可能导致朋友之间出现类似的行为。我们引入了个性化的二项式DAG模型来解决现实世界应用中常见的异质性和观测之间的网络依赖关系。为了学习提出的DAG模型，我们开发了一种算法，将网络结构嵌入到降维的协变量中，学习每个节点的邻域以减少DAG搜索空间，并探索方差均值关系以确定排序。模拟结果显示我们的算法在异质数据中表现优于最先进的竞争对手。我们在一个真实世界的网站访问数据集上展示了其实际有用性。

更新时间: 2024-06-10 22:33:24

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2406.06829v1

Graph Contrastive Learning under Heterophily via Graph Filters

Graph contrastive learning (CL) methods learn node representations in a self-supervised manner by maximizing the similarity between the augmented node representations obtained via a GNN-based encoder. However, CL methods perform poorly on graphs with heterophily, where connected nodes tend to belong to different classes. In this work, we address this problem by proposing an effective graph CL method, namely HLCL, for learning graph representations under heterophily. HLCL first identifies a homophilic and a heterophilic subgraph based on the cosine similarity of node features. It then uses a low-pass and a high-pass graph filter to aggregate representations of nodes connected in the homophilic subgraph and differentiate representations of nodes in the heterophilic subgraph. The final node representations are learned by contrasting both the augmented high-pass filtered views and the augmented low-pass filtered node views. Our extensive experiments show that HLCL outperforms state-of-the-art graph CL methods on benchmark datasets with heterophily, as well as large-scale real-world graphs, by up to 7%, and outperforms graph supervised learning methods on datasets with heterophily by up to 10%.

Updated: 2024-06-10 22:31:20

标题: 图对比学习在异质性图上通过图滤波器实现

摘要: 图对比学习（CL）方法通过最大化通过基于GNN的编码器获得的增强节点表示之间的相似度，以自监督的方式学习节点表示。然而，在具有异质性的图中，CL方法表现不佳，其中连接的节点往往属于不同的类。在这项工作中，我们通过提出一种有效的图CL方法HLCL来解决这个问题，用于在异质性下学习图表示。HLCL首先根据节点特征的余弦相似度识别同质性和异质性子图。然后，它使用低通和高通图滤波器来聚合同质性子图中连接节点的表示，并区分异质性子图中节点的表示。最终的节点表示是通过对比增强的高通滤波视图和增强的低通滤波节点视图来学习的。我们的广泛实验表明，HLCL在具有异质性的基准数据集以及大规模真实世界图上的表现优于最先进的图CL方法，最高可提高至7%，并且在具有异质性的数据集上优于图监督学习方法，最高可提高至10%。

更新时间: 2024-06-10 22:31:20

领域: cs.LG

下载: http://arxiv.org/abs/2303.06344v2

AI Consciousness is Inevitable: A Theoretical Computer Science Perspective

We look at consciousness through the lens of Theoretical Computer Science, a branch of mathematics that studies computation under resource limitations. From this perspective, we develop a formal machine model for consciousness. The model is inspired by Alan Turing's simple yet powerful model of computation and Bernard Baars' theater model of consciousness. Though extremely simple, the model aligns at a high level with many of the major scientific theories of human and animal consciousness, supporting our claim that machine consciousness is inevitable.

Updated: 2024-06-10 22:29:49

标题: 人工智能意识是不可避免的：一个理论计算机科学的视角

摘要: 我们通过理论计算机科学的视角来研究意识，这是数学的一个分支，研究在资源限制下的计算。从这个角度来看，我们为意识开发了一个形式的机器模型。这个模型受到艾伦·图灵简单而强大的计算模型和伯纳德·巴尔斯意识剧院模型的启发。尽管非常简单，这个模型在高层次上与许多主要的人类和动物意识的科学理论保持一致，支持我们的观点，即机器意识是不可避免的。

更新时间: 2024-06-10 22:29:49

领域: cs.AI,68T01,F.1; I.2

下载: http://arxiv.org/abs/2403.17101v4

(Accelerated) Noise-adaptive Stochastic Heavy-Ball Momentum

Stochastic heavy ball momentum (SHB) is commonly used to train machine learning models, and often provides empirical improvements over stochastic gradient descent. By primarily focusing on strongly-convex quadratics, we aim to better understand the theoretical advantage of SHB and subsequently improve the method. For strongly-convex quadratics, Kidambi et al. (2018) show that SHB (with a mini-batch of size $1$) cannot attain accelerated convergence, and hence has no theoretical benefit over SGD. They conjecture that the practical gain of SHB is a by-product of using larger mini-batches. We first substantiate this claim by showing that SHB can attain an accelerated rate when the mini-batch size is larger than a threshold $b^*$ that depends on the condition number $\kappa$. Specifically, we prove that with the same step-size and momentum parameters as in the deterministic setting, SHB with a sufficiently large mini-batch size results in an $O\left(\exp(-\frac{T}{\sqrt{\kappa}}) + \sigma \right)$ convergence, where $T$ is the number of iterations and $\sigma^2$ is the variance in the stochastic gradients. We prove a lower-bound which demonstrates that a $\kappa$ dependence in $b^*$ is necessary. To ensure convergence to the minimizer, we design a noise-adaptive multi-stage algorithm that results in an $O\left(\exp\left(-\frac{T}{\sqrt{\kappa}}\right) + \frac{\sigma}{T}\right)$ rate. We also consider the general smooth, strongly-convex setting and propose the first noise-adaptive SHB variant that converges to the minimizer at an $O(\exp(-\frac{T}{\kappa}) + \frac{\sigma^2}{T})$ rate. We empirically demonstrate the effectiveness of the proposed algorithms.

Updated: 2024-06-10 22:16:01

标题: 加速噪声自适应随机重球动量

摘要: 随机重球动量（SHB）通常用于训练机器学习模型，并且通常比随机梯度下降提供经验上的改进。通过主要关注强凸二次函数，我们旨在更好地理解SHB的理论优势，并随后改进该方法。对于强凸二次函数，Kidambi等人（2018）表明，SHB（具有大小为$1$的迷你批处理）无法实现加速收敛，因此与SGD相比没有理论上的好处。他们推测SHB的实际增益是使用更大的迷你批次的副产品。我们首先通过展示，当迷你批次大小大于取决于条件数$\kappa$的阈值$b^*$时，SHB可以实现加速率来证实这一主张。具体来说，我们证明，具有与确定性设置中相同步长和动量参数的SHB，当迷你批次大小足够大时，会导致$O\left(\exp(-\frac{T}{\sqrt{\kappa}}) + \sigma \right)$的收敛，其中$T$是迭代次数，$\sigma^2$是随机梯度的方差。我们证明了一个下界，证明了$b^*$中的$\kappa$依赖性是必要的。为了确保收敛到最小化器，我们设计了一个噪声自适应多阶段算法，其结果是$O\left(\exp\left(-\frac{T}{\sqrt{\kappa}}\right) + \frac{\sigma}{T}\right)$的速率。我们还考虑了一般的平滑、强凸设置，并提出了第一个收敛到最小化器的噪声自适应SHB变体，其速率为$O(\exp(-\frac{T}{\kappa}) + \frac{\sigma^2}{T})$。我们通过实验证明了所提算法的有效性。

更新时间: 2024-06-10 22:16:01

领域: math.OC,cs.LG,stat.ML

下载: http://arxiv.org/abs/2401.06738v2

A local squared Wasserstein-2 method for efficient reconstruction of models with uncertainty

In this paper, we propose a local squared Wasserstein-2 (W_2) method to solve the inverse problem of reconstructing models with uncertain latent variables or parameters. A key advantage of our approach is that it does not require prior information on the distribution of the latent variables or parameters in the underlying models. Instead, our method can efficiently reconstruct the distributions of the output associated with different inputs based on empirical distributions of observation data. We demonstrate the effectiveness of our proposed method across several uncertainty quantification (UQ) tasks, including linear regression with coefficient uncertainty, training neural networks with weight uncertainty, and reconstructing ordinary differential equations (ODEs) with a latent random variable.

Updated: 2024-06-10 22:15:55

标题: 一种用于带有不确定性模型的高效重建的本地平方Wasserstein-2方法

摘要: 在本文中，我们提出了一种局部平方Wasserstein-2（W_2）方法来解决具有不确定潜变量或参数的重建模型的逆问题。我们方法的一个关键优势是不需要关于潜变量或参数分布的先验信息。相反，我们的方法可以根据观测数据的经验分布有效地重建与不同输入相关的输出分布。我们证明了我们提出的方法在几个不确定性量化（UQ）任务中的有效性，包括具有系数不确定性的线性回归，使用权重不确定性训练神经网络，以及重建具有潜在随机变量的常微分方程（ODEs）。

更新时间: 2024-06-10 22:15:55

领域: stat.ML,cs.LG,math.PR,60E05, 62D05

下载: http://arxiv.org/abs/2406.06825v1

Locally Interdependent Multi-Agent MDP: Theoretical Framework for Decentralized Agents with Dynamic Dependencies

Many multi-agent systems in practice are decentralized and have dynamically varying dependencies. There has been a lack of attempts in the literature to analyze these systems theoretically. In this paper, we propose and theoretically analyze a decentralized model with dynamically varying dependencies called the Locally Interdependent Multi-Agent MDP. This model can represent problems in many disparate domains such as cooperative navigation, obstacle avoidance, and formation control. Despite the intractability that general partially observable multi-agent systems suffer from, we propose three closed-form policies that are theoretically near-optimal in this setting and can be scalable to compute and store. Consequentially, we reveal a fundamental property of Locally Interdependent Multi-Agent MDP's that the partially observable decentralized solution is exponentially close to the fully observable solution with respect to the visibility radius. We then discuss extensions of our closed-form policies to further improve tractability. We conclude by providing simulations to investigate some long horizon behaviors of our closed-form policies.

Updated: 2024-06-10 22:11:00

标题: 局部相互依赖的多智能体MDP：动态依赖下分散智能体的理论框架

摘要: 实践中许多多智能体系统是分散的，并且具有动态变化的依赖关系。文献中缺乏对这些系统进行理论分析的尝试。在本文中，我们提出并理论分析了一个名为局部相互依赖多智能体MDP的分散模型，该模型可以表示许多不同领域的问题，如合作导航、障碍物回避和形成控制。尽管一般局部可观察多智能体系统受到的难以处理，我们提出了三种在这种情况下在理论上接近最优的闭合形式策略，并且可以扩展到计算和存储。因此，我们揭示了局部相互依赖多智能体MDP的一个基本属性，即部分可观察的分散解决方案在可见半径方面与完全可观察的解决方案呈指数接近。然后，我们讨论了将我们的闭合形式策略扩展以进一步改善可处理性。最后，我们通过提供模拟来研究我们的闭合形式策略的一些长期行为。

更新时间: 2024-06-10 22:11:00

领域: cs.LG,cs.AI,cs.MA,math.OC

下载: http://arxiv.org/abs/2406.06823v1

An LLM-Assisted Easy-to-Trigger Backdoor Attack on Code Completion Models: Injecting Disguised Vulnerabilities against Strong Detection

Large Language Models (LLMs) have transformed code completion tasks, providing context-based suggestions to boost developer productivity in software engineering. As users often fine-tune these models for specific applications, poisoning and backdoor attacks can covertly alter the model outputs. To address this critical security challenge, we introduce CodeBreaker, a pioneering LLM-assisted backdoor attack framework on code completion models. Unlike recent attacks that embed malicious payloads in detectable or irrelevant sections of the code (e.g., comments), CodeBreaker leverages LLMs (e.g., GPT-4) for sophisticated payload transformation (without affecting functionalities), ensuring that both the poisoned data for fine-tuning and generated code can evade strong vulnerability detection. CodeBreaker stands out with its comprehensive coverage of vulnerabilities, making it the first to provide such an extensive set for evaluation. Our extensive experimental evaluations and user studies underline the strong attack performance of CodeBreaker across various settings, validating its superiority over existing approaches. By integrating malicious payloads directly into the source code with minimal transformation, CodeBreaker challenges current security measures, underscoring the critical need for more robust defenses for code completion.

Updated: 2024-06-10 22:10:05

标题: 一种LLM辅助的易触发后门攻击代码补全模型：注入伪装漏洞以对抗强大检测

摘要: 大型语言模型（LLMs）已经改变了代码完成任务，提供基于上下文的建议，以提高软件工程中开发人员的生产力。由于用户经常为特定应用程序微调这些模型，因此毒化和后门攻击可能会秘密地改变模型输出。为了解决这一关键的安全挑战，我们引入了CodeBreaker，这是一个在代码完成模型上使用LLM的后门攻击框架的先驱。与最近将恶意载荷嵌入在可检测或无关紧要的代码部分（例如注释）的攻击不同，CodeBreaker利用LLMs（例如GPT-4）进行复杂的载荷转换（而不影响功能），确保用于微调的毒化数据和生成的代码都可以避开强大的漏洞检测。CodeBreaker以其对漏洞的全面覆盖而脱颖而出，使其成为第一个提供如此广泛集合以供评估的工具。我们广泛的实验评估和用户研究强调了CodeBreaker在各种设置下的强大攻击性能，验证了其优于现有方法的优越性。通过将恶意载荷直接集成到源代码中并进行最小转换，CodeBreaker挑战了当前的安全措施，强调了对代码完成更强大的防御措施的迫切需求。

更新时间: 2024-06-10 22:10:05

领域: cs.CR,cs.AI,cs.SE

下载: http://arxiv.org/abs/2406.06822v1

Adapters Strike Back

Adapters provide an efficient and lightweight mechanism for adapting trained transformer models to a variety of different tasks. However, they have often been found to be outperformed by other adaptation mechanisms, including low-rank adaptation. In this paper, we provide an in-depth study of adapters, their internal structure, as well as various implementation choices. We uncover pitfalls for using adapters and suggest a concrete, improved adapter architecture, called Adapter+, that not only outperforms previous adapter implementations but surpasses a number of other, more complex adaptation mechanisms in several challenging settings. Despite this, our suggested adapter is highly robust and, unlike previous work, requires little to no manual intervention when addressing a novel scenario. Adapter+ reaches state-of-the-art average accuracy on the VTAB benchmark, even without a per-task hyperparameter optimization.

Updated: 2024-06-10 22:07:57

标题: 适配器反击

摘要: 适配器为将训练好的transformer模型适应于各种不同任务提供了高效且轻量级的机制。然而，它们通常被发现在适应机制中被其他方法，包括低秩适应，超越。在本文中，我们对适配器进行了深入研究，包括它们的内部结构以及各种实现选择。我们揭示了使用适配器的一些缺陷，并提出了一个具体的、改进的适配器架构，称为Adapter+，不仅超越了先前的适配器实现，而且在几个具有挑战性的场景中超越了许多其他更复杂的适应机制。尽管如此，我们提出的适配器非常稳健，与以往的工作不同，在处理新情况时几乎不需要手动干预。Adapter+在VTAB基准测试中达到了最先进的平均准确度，甚至无需为每个任务进行超参数优化。

更新时间: 2024-06-10 22:07:57

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2406.06820v1

Unraveling overoptimism and publication bias in ML-driven science

Machine Learning (ML) is increasingly used across many disciplines with impressive reported results across many domain areas. However, recent studies suggest that the published performance of ML models are often overoptimistic. Validity concerns are underscored by findings of an inverse relationship between sample size and reported accuracy in published ML models, contrasting with the theory of learning curves where accuracy should improve or remain stable with increasing sample size. This paper investigates factors contributing to overoptimistic accuracy reports in ML-driven science, focusing on data leakage and publication bias. We introduce a novel stochastic model for observed accuracy, integrating parametric learning curves and the aforementioned biases. We then construct an estimator that corrects for these biases in observed data. Theoretical and empirical results show that our framework can estimate the underlying learning curve, providing realistic performance assessments from published results. Applying the model to meta-analyses in ML-driven science, including neuroimaging-based and speech-based classifications of neurological conditions, we find prevalent overoptimism and estimate the inherent limits of ML-based prediction in each domain.

Updated: 2024-06-10 22:04:49

标题: 揭示机器学习驱动科学中的过度乐观和出版偏见

摘要: 机器学习（ML）越来越多地应用于许多学科，在许多领域取得了令人印象深刻的成果。然而，最近的研究表明，ML模型的发布性能往往过于乐观。有效性问题凸显了发现在已发表的ML模型中样本量与报告的准确性之间存在逆相关关系，与学习曲线理论相矛盾，根据学习曲线理论，准确性应随着样本量的增加而提高或保持稳定。本文调查了导致ML驱动科学中准确性报告过于乐观的因素，重点关注数据泄漏和出版偏见。我们引入了一个新颖的随机模型用于观察准确性，将参数学习曲线和上述偏见整合在一起。然后我们构建了一个校正这些偏见的观测数据的估计器。理论和实证结果表明，我们的框架可以估计潜在的学习曲线，从已发表的结果中提供现实的性能评估。将该模型应用于ML驱动科学的元分析中，包括基于神经影像和基于语音的神经疾病分类，我们发现过度乐观并估计了每个领域中基于ML的预测的固有限制。

更新时间: 2024-06-10 22:04:49

领域: cs.LG,cs.AI,cs.CY

下载: http://arxiv.org/abs/2405.14422v2

Open Ad Hoc Teamwork with Cooperative Game Theory

Ad hoc teamwork poses a challenging problem, requiring the design of an agent to collaborate with teammates without prior coordination or joint training. Open ad hoc teamwork (OAHT) further complicates this challenge by considering environments with a changing number of teammates, referred to as open teams. One promising solution in practice to this problem is leveraging the generalizability of graph neural networks to handle an unrestricted number of agents with various agent-types, named graph-based policy learning (GPL). However, its joint Q-value representation over a coordination graph lacks convincing explanations. In this paper, we establish a new theory to understand the representation of the joint Q-value for OAHT and its learning paradigm, through the lens of cooperative game theory. Building on our theory, we propose a novel algorithm named CIAO, based on GPL's framework, with additional provable implementation tricks that can facilitate learning. The demos of experimental results are available on https://sites.google.com/view/ciao2024, and the code of experiments is published on https://github.com/hsvgbkhgbv/CIAO.

Updated: 2024-06-10 22:01:50

标题: 用合作博弈理论进行开放式临时团队合作

摘要: Ad hoc团队合作提出了一个具有挑战性的问题，需要设计一个代理与队友合作，而无需事先协调或共同培训。开放式ad hoc团队合作（OAHT）进一步复杂化了这一挑战，因为它考虑了在队友数量变化的环境中进行合作，被称为开放团队。实践中解决这一问题的一个有前途的解决方案是利用图神经网络的泛化能力来处理具有各种代理类型的无限数量的代理，称为基于图的策略学习（GPL）。然而，其在协调图上的联合Q值表示缺乏令人信服的解释。在本文中，我们建立了一个新理论，通过合作博弈理论的视角来理解OAHT的联合Q值表示及其学习范式。基于我们的理论，我们提出了一种名为CIAO的新算法，基于GPL的框架，具有额外的可证明的实现技巧，可以促进学习。实验结果的演示可以在https://sites.google.com/view/ciao2024 上找到，并且实验代码已发布在https://github.com/hsvgbkhgbv/CIAO。

更新时间: 2024-06-10 22:01:50

领域: cs.MA,cs.LG

下载: http://arxiv.org/abs/2402.15259v4

Conformal Prediction for Class-wise Coverage via Augmented Label Rank Calibration

Conformal prediction (CP) is an emerging uncertainty quantification framework that allows us to construct a prediction set to cover the true label with a pre-specified marginal or conditional probability. Although the valid coverage guarantee has been extensively studied for classification problems, CP often produces large prediction sets which may not be practically useful. This issue is exacerbated for the setting of class-conditional coverage on imbalanced classification tasks. This paper proposes the Rank Calibrated Class-conditional CP (RC3P) algorithm to reduce the prediction set sizes to achieve class-conditional coverage, where the valid coverage holds for each class. In contrast to the standard class-conditional CP (CCP) method that uniformly thresholds the class-wise conformity score for each class, the augmented label rank calibration step allows RC3P to selectively iterate this class-wise thresholding subroutine only for a subset of classes whose class-wise top-k error is small. We prove that agnostic to the classifier and data distribution, RC3P achieves class-wise coverage. We also show that RC3P reduces the size of prediction sets compared to the CCP method. Comprehensive experiments on multiple real-world datasets demonstrate that RC3P achieves class-wise coverage and 26.25% reduction in prediction set sizes on average.

Updated: 2024-06-10 22:01:34

标题: 透过增强标签排序校准的方式，为类别覆盖进行一致性预测

摘要: 共形预测（CP）是一种新兴的不确定性量化框架，允许我们构建一个预测集，以覆盖真实标签，并具有预先指定的边际或条件概率。尽管对于分类问题已经广泛研究了有效的覆盖保证，但CP通常会产生大型预测集，这可能在实践中并不实用。对于不平衡分类任务的类条件覆盖设置，这个问题变得更加严重。本文提出了Rank Calibrated Class-conditional CP（RC3P）算法，以减小预测集大小以实现类条件覆盖，其中有效覆盖对每个类都成立。与标准的类条件CP（CCP）方法相比，后者对每个类均匀阈值化类别-wise符合度得分，增加的标签排序校准步骤允许RC3P仅针对类别-wise顶k误差较小的子集选择性地迭代这种类别-wise阈值化子程序。我们证明，RC3P不考虑分类器和数据分布，实现了类别-wise覆盖。我们还展示，与CCP方法相比，RC3P减小了预测集的大小。对多个真实世界数据集的全面实验表明，RC3P实现了类别-wise覆盖，并且平均减少了26.25%的预测集大小。

更新时间: 2024-06-10 22:01:34

领域: cs.LG

下载: http://arxiv.org/abs/2406.06818v1

Optimizing Large Language Models for OpenAPI Code Completion

Recent advancements in Large Language Models (LLMs) and their utilization in code generation tasks have significantly reshaped the field of software development. Despite the remarkable efficacy of code completion solutions in mainstream programming languages, their performance lags when applied to less ubiquitous formats such as OpenAPI definitions. This study evaluates the OpenAPI completion performance of GitHub Copilot, a prevalent commercial code completion tool, and proposes a set of task-specific optimizations leveraging Meta's open-source model Code Llama. A semantics-aware OpenAPI completion benchmark proposed in this research is used to perform a series of experiments through which the impact of various prompt-engineering and fine-tuning techniques on the Code Llama model's performance is analyzed. The fine-tuned Code Llama model reaches a peak correctness improvement of 55.2% over GitHub Copilot despite utilizing 25 times fewer parameters than the commercial solution's underlying Codex model. Additionally, this research proposes an enhancement to a widely used code infilling training technique, addressing the issue of underperformance when the model is prompted with context sizes smaller than those used during training. The dataset, the benchmark, and the model fine-tuning code are made publicly available.

Updated: 2024-06-10 21:58:24

标题: 为OpenAPI代码补全优化大型语言模型

摘要: 最近大型语言模型（LLMs）的进展及其在代码生成任务中的利用显著改变了软件开发领域。尽管主流编程语言中的代码补全解决方案效果显著，但在应用于不太普遍的格式如OpenAPI定义时，其性能仍有所滞后。本研究评估了GitHub Copilot这一常见商业代码补全工具在OpenAPI补全性能上的表现，并提出了一套任务特定的优化方案，利用Meta的开源模型Code Llama。本研究提出了一个基于语义意识的OpenAPI补全基准，用于进行一系列实验，分析各种提示工程和微调技术对Code Llama模型性能的影响。经过微调的Code Llama模型在GitHub Copilot的基础Codex模型的参数使用量的25倍以下的情况下，达到了55.2%的最高正确性改善。此外，本研究提出了一种对广泛使用的代码填充训练技术的增强，解决了当模型受到比训练中使用的上下文尺寸更小的提示时性能不佳的问题。数据集、基准和模型微调代码均已公开。

更新时间: 2024-06-10 21:58:24

领域: cs.SE,cs.CL,cs.LG,68T07, 68T50, 68T05,I.2.2; I.2.6; I.2.7; D.1.2; D.2.1; D.2.3; D.2.6

下载: http://arxiv.org/abs/2405.15729v2

Evolutionary Algorithms Simulating Molecular Evolution: A New Field Proposal

The genetic blueprint for the essential functions of life is encoded in DNA, which is translated into proteins -- the engines driving most of our metabolic processes. Recent advancements in genome sequencing have unveiled a vast diversity of protein families, but compared to the massive search space of all possible amino acid sequences, the set of known functional families is minimal. One could say nature has a limited protein "vocabulary." The major question for computational biologists, therefore, is whether this vocabulary can be expanded to include useful proteins that went extinct long ago, or maybe never evolved in the first place. We outline a computational approach to solving this problem. By merging evolutionary algorithms, machine learning (ML), and bioinformatics, we can facilitate the development of completely novel proteins which have never existed before. We envision this work forming a new sub-field of computational evolution we dub evolutionary algorithms simulating molecular evolution (EASME).

Updated: 2024-06-10 21:49:16

标题: 进化算法模拟分子进化：一个新的领域建议

摘要: 生命必需功能的遗传蓝图编码在DNA中，然后被翻译成蛋白质 - 驱动大部分代谢过程的引擎。最近基因组测序的进步揭示了庞大的蛋白质家族多样性，但与所有可能氨基酸序列的庞大搜索空间相比，已知功能家族的数量很少。可以说自然有着有限的蛋白质“词汇表”。因此，计算生物学家的主要问题是这个词汇表是否能够扩展，以包括很久以前灭绝或者从未进化的有用蛋白质。我们概述了一种解决这个问题的计算方法。通过将进化算法、机器学习（ML）和生物信息学相结合，我们可以促进开发以前从未存在过的全新蛋白质。我们设想这项工作将形成一个新的计算进化子领域，我们称之为模拟分子进化的进化算法（EASME）。

更新时间: 2024-06-10 21:49:16

领域: cs.NE,cs.AI,I.2.1

下载: http://arxiv.org/abs/2403.08797v2

On Learning what to Learn: heterogeneous observations of dynamics and establishing (possibly causal) relations among them

Before we attempt to learn a function between two (sets of) observables of a physical process, we must first decide what the inputs and what the outputs of the desired function are going to be. Here we demonstrate two distinct, data-driven ways of initially deciding ``the right quantities'' to relate through such a function, and then proceed to learn it. This is accomplished by processing multiple simultaneous heterogeneous data streams (ensembles of time series) from observations of a physical system: multiple observation processes of the system. We thus determine (a) what subsets of observables are common between the observation processes (and therefore observable from each other, relatable through a function); and (b) what information is unrelated to these common observables, and therefore particular to each observation process, and not contributing to the desired function. Any data-driven function approximation technique can subsequently be used to learn the input-output relation, from k-nearest neighbors and Geometric Harmonics to Gaussian Processes and Neural Networks. Two particular ``twists'' of the approach are discussed. The first has to do with the identifiability of particular quantities of interest from the measurements. We now construct mappings from a single set of observations of one process to entire level sets of measurements of the process, consistent with this single set. The second attempts to relate our framework to a form of causality: if one of the observation processes measures ``now'', while the second observation process measures ``in the future'', the function to be learned among what is common across observation processes constitutes a dynamical model for the system evolution.

Updated: 2024-06-10 21:37:36

标题: 关于学习什么和怎样学习：动态的异质观察及建立它们之间的（可能因果）关系

摘要: 在我们尝试学习物理过程中两个（集合的）可观测量之间的函数之前，我们必须首先确定期望函数的输入和输出是什么。在这里，我们展示了两种不同的数据驱动方式来最初决定通过这样一个函数相关的“正确数量”，然后继续学习它。这是通过处理来自物理系统观测的多个同时异构数据流（时间序列集合）实现的：系统的多个观测过程。因此，我们确定了（a）哪些可观测量子集在观测过程之间是共同的（因此可以从彼此观测到，通过函数相关）；和（b）哪些信息与这些共同可观测量无关，因此是特定于每个观测过程的，并且不会对期望函数做出贡献。随后可以使用任何数据驱动的函数近似技术来学习输入输出关系，从k最近邻和几何谐波到高斯过程和神经网络。讨论了两种特殊的“扭转”方法。第一个与特定兴趣数量的可识别性有关从测量中。我们现在构建从一个过程的单个观察集到与该单个集一致的过程的整个水平集的映射。第二个试图将我们的框架与因果形式相关联：如果一个观测过程测量“现在”，而第二个观测过程测量“将来”，则在观测过程之间共同的学习函数组成系统演变的动态模型。

更新时间: 2024-06-10 21:37:36

领域: cs.LG,cs.NA,math.DS,math.NA

下载: http://arxiv.org/abs/2406.06812v1

Graph Positional and Structural Encoder

Positional and structural encodings (PSE) enable better identifiability of nodes within a graph, rendering them essential tools for empowering modern GNNs, and in particular graph Transformers. However, designing PSEs that work optimally for all graph prediction tasks is a challenging and unsolved problem. Here, we present the Graph Positional and Structural Encoder (GPSE), the first-ever graph encoder designed to capture rich PSE representations for augmenting any GNN. GPSE learns an efficient common latent representation for multiple PSEs, and is highly transferable: The encoder trained on a particular graph dataset can be used effectively on datasets drawn from markedly different distributions and modalities. We show that across a wide range of benchmarks, GPSE-enhanced models can significantly outperform those that employ explicitly computed PSEs, and at least match their performance in others. Our results pave the way for the development of foundational pre-trained graph encoders for extracting positional and structural information, and highlight their potential as a more powerful and efficient alternative to explicitly computed PSEs and existing self-supervised pre-training approaches. Our framework and pre-trained models are publicly available at https://github.com/G-Taxonomy-Workgroup/GPSE. For convenience, GPSE has also been integrated into the PyG library to facilitate downstream applications.

Updated: 2024-06-10 21:36:14

标题: 图位置和结构编码器

摘要: 位置和结构编码（PSE）使得在图中更好地识别节点成为可能，从而成为赋予现代GNNs和特别是图变换器的关键工具。然而，设计适用于所有图预测任务的PSE是一个具有挑战性且尚未解决的问题。在这里，我们提出了图位置和结构编码器（GPSE），这是第一个旨在捕获丰富PSE表示以增强任何GNN的图编码器。GPSE学习了多个PSE的高效共同潜在表示，并且具有高度的可转移性：在特定图数据集上训练的编码器可以有效地用于从明显不同分布和模态的数据集中绘制。我们展示，在广泛的基准测试中，GPSE增强模型可以显著优于那些使用显式计算的PSE，并且至少在其他方面匹配它们的性能。我们的结果为开发用于提取位置和结构信息的基础预训练图编码器铺平了道路，并突出了它们作为显式计算的PSE和现有自监督预训练方法的更强大和高效的替代方案的潜力。我们的框架和预训练模型可以在https://github.com/G-Taxonomy-Workgroup/GPSE 公开获取。为了方便起见，GPSE还已集成到PyG库中，以促进下游应用。

更新时间: 2024-06-10 21:36:14

领域: cs.LG

下载: http://arxiv.org/abs/2307.07107v2

Learning Continually by Spectral Regularization

Loss of plasticity is a phenomenon where neural networks become more difficult to train during the course of learning. Continual learning algorithms seek to mitigate this effect by sustaining good predictive performance while maintaining network trainability. We develop new techniques for improving continual learning by first reconsidering how initialization can ensure trainability during early phases of learning. From this perspective, we derive new regularization strategies for continual learning that ensure beneficial initialization properties are better maintained throughout training. In particular, we investigate two new regularization techniques for continual learning: (i) Wasserstein regularization toward the initial weight distribution, which is less restrictive than regularizing toward initial weights; and (ii) regularizing weight matrix singular values, which directly ensures gradient diversity is maintained throughout training. We present an experimental analysis that shows these alternative regularizers can improve continual learning performance across a range of supervised learning tasks and model architectures. The alternative regularizers prove to be less sensitive to hyperparameters while demonstrating better training in individual tasks, sustaining trainability as new tasks arrive, and achieving better generalization performance.

Updated: 2024-06-10 21:34:43

标题: 通过光谱正则化持续学习

摘要: 失去可塑性是一种现象，即神经网络在学习过程中变得更难训练。持续学习算法旨在通过保持良好的预测性能同时保持网络的可训练性来减轻这种影响。我们通过首先重新考虑初始化如何确保在学习的早期阶段保持可训练性来开发新的技术来改进持续学习。从这个角度出发，我们提出了新的持续学习正则化策略，以确保有益的初始化属性在整个训练过程中得到更好地维护。具体来说，我们研究了两种新的持续学习正则化技术：(i)朝向初始权重分布的Wasserstein正则化，相对于正则化初始权重来说更不受限制；(ii)正则化权重矩阵的奇异值，直接确保梯度多样性在整个训练过程中得到维持。我们展示了一项实验分析，表明这些替代正则化器可以改善各种监督学习任务和模型架构的持续学习性能。这些替代正则化器对超参数不太敏感，同时在个别任务中表现出更好的训练效果，在新任务到来时保持可训练性，并实现更好的泛化性能。

更新时间: 2024-06-10 21:34:43

领域: cs.LG

下载: http://arxiv.org/abs/2406.06811v1

Analysis, Identification and Prediction of Parkinson Disease Sub-Types and Progression through Machine Learning

This paper represents a groundbreaking advancement in Parkinson disease (PD) research by employing a novel machine learning framework to categorize PD into distinct subtypes and predict its progression. Utilizing a comprehensive dataset encompassing both clinical and neurological parameters, the research applies advanced supervised and unsupervised learning techniques. This innovative approach enables the identification of subtle, yet critical, patterns in PD manifestation, which traditional methodologies often miss. Significantly, this research offers a path toward personalized treatment strategies, marking a major stride in the precision medicine domain and showcasing the transformative potential of integrating machine learning into medical research.

Updated: 2024-06-10 21:29:02

标题: 通过机器学习分析、识别和预测帕金森病亚型和进展

摘要: 这篇论文通过采用一种新颖的机器学习框架，将帕金森病（PD）分类为不同的亚型，并预测其进展，代表了帕金森病研究的一项突破性进展。利用包含临床和神经学参数的全面数据集，该研究应用了先进的监督和无监督学习技术。这种创新方法使得能够识别PD表现中微妙但至关重要的模式，这些传统方法往往会忽略。值得注意的是，这项研究为个性化治疗策略提供了一条道路，标志着精准医学领域的重大进展，并展示了将机器学习整合到医学研究中的变革潜力。

更新时间: 2024-06-10 21:29:02

领域: cs.LG

下载: http://arxiv.org/abs/2306.04748v2

Fast White-Box Adversarial Streaming Without a Random Oracle

Recently, the question of adversarially robust streaming, where the stream is allowed to depend on the randomness of the streaming algorithm, has gained a lot of attention. In this work, we consider a strong white-box adversarial model (Ajtai et al. PODS 2022), in which the adversary has access to all past random coins and the parameters used by the streaming algorithm. We focus on the sparse recovery problem and extend our result to other tasks such as distinct element estimation and low-rank approximation of matrices and tensors. The main drawback of previous work is that it requires a random oracle, which is especially problematic in the streaming model since the amount of randomness is counted in the space complexity of a streaming algorithm. Also, the previous work suffers from large update time. We construct a near-optimal solution for the sparse recovery problem in white-box adversarial streams, based on the subexponentially secure Learning with Errors assumption. Importantly, our solution does not require a random oracle and has a polylogarithmic per item processing time. We also give results in a related white-box adversarially robust distributed model. Our constructions are based on homomorphic encryption schemes satisfying very mild structural properties that are currently satisfied by most known schemes.

Updated: 2024-06-10 21:23:19

标题: 不使用随机预言的快速白盒对抗流处理

摘要: 最近，对于对抗性强的流处理问题引起了很多关注，其中流可以依赖于流处理算法的随机性。在这项工作中，我们考虑了一个强白盒对抗模型（Ajtai等人，PODS 2022），其中对手可以访问所有过去的随机硬币和流处理算法使用的参数。我们关注稀疏恢复问题，并将我们的结果扩展到其他任务，如不同元素估计和矩阵和张量的低秩逼近。先前工作的主要缺点是它需要一个随机预言者，在流处理模型中尤其有问题，因为随机性的数量计入了流处理算法的空间复杂性。此外，先前的工作更新时间较长。我们基于次指数安全的误差学习假设，在白盒对抗流中构建了稀疏恢复问题的近乎最优解。重要的是，我们的解决方案不需要随机预言者，并且每个项目的处理时间是对数多项式的。我们还在相关的白盒对抗鲁棒分布模型中给出了结果。我们的构造基于满足目前大多数已知方案的非常温和结构属性的同态加密方案。

更新时间: 2024-06-10 21:23:19

领域: cs.DS,cs.LG

下载: http://arxiv.org/abs/2406.06808v1

SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature

We present SciRIFF (Scientific Resource for Instruction-Following and Finetuning), a dataset of 137K instruction-following demonstrations for 54 tasks covering five essential scientific literature understanding capabilities: information extraction, summarization, question answering, claim verification, and classification. SciRIFF demonstrations are notable for their long input contexts, detailed task specifications, and complex structured outputs. While instruction-following resources are available in specific domains such as clinical medicine and chemistry, SciRIFF is the first dataset focused on extracting and synthesizing information from research literature across a wide range of scientific fields. To demonstrate the utility of SciRIFF, we develop a sample-efficient strategy to adapt a general instruction-following model for science by performing additional finetuning on a mix of general-domain and SciRIFF demonstrations. In evaluations on nine held-out scientific tasks, our model -- called SciTulu -- improves over a strong LLM baseline by 28.1% and 6.5% at the 7B and 70B scales respectively, while maintaining general instruction-following performance within 2% of the baseline. We are optimistic that SciRIFF will facilitate the development and evaluation of LLMs to help researchers navigate the ever-growing body of scientific literature. We release our dataset, model checkpoints, and data processing and evaluation code to enable further research.

Updated: 2024-06-10 21:22:08

标题: SciRIFF：一个用于增强科学文献指令跟随的资源

摘要: 我们提出了SciRIFF（科学资源用于指示遵循和微调），这是一个包含137K个指示遵循演示的数据集，涵盖了54项任务，涵盖了五个基本的科学文献理解能力：信息提取、摘要、问题回答、声明验证和分类。SciRIFF演示以其长输入上下文、详细任务规范和复杂结构化输出而引人注目。虽然在特定领域如临床医学和化学中可以获得指示遵循资源，但SciRIFF是第一个专注于从广泛的科学领域研究文献中提取和合成信息的数据集。为了展示SciRIFF的实用性，我们开发了一种样本有效的策略，通过在一系列通用领域和SciRIFF演示中进行额外的微调，来调整通用指示遵循模型以适应科学。在对九个保留的科学任务进行评估时，我们的模型--称为SciTulu，在7B和70B尺度上分别提高了28.1%和6.5%的性能，同时保持了通用指示遵循性能与基线之间的2%差距。我们对SciRIFF将促进LLM的发展和评估以帮助研究人员浏览不断增长的科学文献表示乐观。我们发布了我们的数据集、模型检查点以及数据处理和评估代码，以促进进一步的研究。

更新时间: 2024-06-10 21:22:08

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.07835v1

Satisficing Exploration in Bandit Optimization

Motivated by the concept of satisficing in decision-making, we consider the problem of satisficing exploration in bandit optimization. In this setting, the learner aims at selecting satisficing arms (arms with mean reward exceeding a certain threshold value) as frequently as possible. The performance is measured by satisficing regret, which is the cumulative deficit of the chosen arm's mean reward compared to the threshold. We propose SELECT, a general algorithmic template for Satisficing Exploration via LowEr Confidence bound Testing, that attains constant satisficing regret for a wide variety of bandit optimization problems in the realizable case (i.e., a satisficing arm exists). Specifically, given a class of bandit optimization problems and a corresponding learning oracle with sub-linear (standard) regret upper bound, SELECT iteratively makes use of the oracle to identify a potential satisficing arm with low regret. Then, it collects data samples from this arm, and continuously compares the LCB of the identified arm's mean reward against the threshold value to determine if it is a satisficing arm. As a complement, SELECT also enjoys the same (standard) regret guarantee as the oracle in the non-realizable case. Finally, we conduct numerical experiments to validate the performance of SELECT for several popular bandit optimization settings.

Updated: 2024-06-10 21:15:28

标题: 在赌博优化中的满意探索

摘要: 受满足性概念在决策中的启发，我们考虑在赌博优化中的满足性探索问题。在这种情况下，学习者旨在尽可能频繁地选择满足性手臂（平均奖励超过一定阈值的手臂）。绩效以满足性遗憾来衡量，即所选手臂的平均奖励与阈值之间的累积差值。我们提出了SELECT，一个用于通过低置信界测试实现满足性探索的通用算法模板，它在可实现情况下（即存在满足性手臂的情况下）为各种赌博优化问题实现了恒定的满足性遗憾。具体来说，给定一类赌博优化问题和相应的具有次线性（标准）遗憾上界的学习神经网络，SELECT迭代地利用神经网络来识别潜在的具有低遗憾的满足性手臂。然后，它从这个手臂收集数据样本，并不断地比较所识别手臂的平均奖励的LCB与阈值之间的差值，以确定它是否是一个满足性手臂。作为补充，SELECT在不可实现情况下也享有与神经网络相同的（标准）遗憾保证。最后，我们进行数值实验来验证SELECT在几种流行的赌博优化设置中的性能。

更新时间: 2024-06-10 21:15:28

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2406.06802v1

Model Predictive Control and Reinforcement Learning: A Unified Framework Based on Dynamic Programming

In this paper we describe a new conceptual framework that connects approximate Dynamic Programming (DP), Model Predictive Control (MPC), and Reinforcement Learning (RL). This framework centers around two algorithms, which are designed largely independently of each other and operate in synergy through the powerful mechanism of Newton's method. We call them the off-line training and the on-line play algorithms. The names are borrowed from some of the major successes of RL involving games; primary examples are the recent (2017) AlphaZero program (which plays chess, [SHS17], [SSS17]), and the similarly structured and earlier (1990s) TD-Gammon program (which plays backgammon, [Tes94], [Tes95], [TeG96]). In these game contexts, the off-line training algorithm is the method used to teach the program how to evaluate positions and to generate good moves at any given position, while the on-line play algorithm is the method used to play in real time against human or computer opponents. Significantly, the synergy between off-line training and on-line play also underlies MPC (as well as other major classes of sequential decision problems), and indeed the MPC design architecture is very similar to the one of AlphaZero and TD-Gammon. This conceptual insight provides a vehicle for bridging the cultural gap between RL and MPC, and sheds new light on some fundamental issues in MPC. These include the enhancement of stability properties through rollout, the treatment of uncertainty through the use of certainty equivalence, the resilience of MPC in adaptive control settings that involve changing system parameters, and the insights provided by the superlinear performance bounds implied by Newton's method.

Updated: 2024-06-10 21:14:38

标题: 模型预测控制和强化学习：基于动态规划的统一框架

摘要: 在本文中，我们描述了一个将近似动态规划（DP）、模型预测控制（MPC）和强化学习（RL）连接起来的新概念框架。这个框架围绕两个算法展开，这两个算法在很大程度上是独立设计的，并通过牛顿方法这一强大机制协同操作。我们称它们为离线训练算法和在线对战算法。这些名称来自RL在游戏中取得的一些重大成功；主要例子是最近（2017年）的AlphaZero程序（玩国际象棋），以及结构相似且更早（1990年代）的TD-Gammon程序（玩戴具棋）。在这些游戏环境中，离线训练算法是用来教授程序如何评估位置并在任何给定位置生成良好移动的方法，而在线对战算法是用来实时与人类或计算机对手对战的方法。值得注意的是，离线训练和在线对战之间的协同作用也支撑了MPC（以及其他主要类别的序贯决策问题），实际上MPC的设计架构与AlphaZero和TD-Gammon非常相似。这种概念上的洞察为弥合RL和MPC之间的文化差距提供了一个途径，并为MPC中的一些基本问题带来了新的启示。这些问题包括通过展开提高稳定性特性、通过使用确定性等效处理不确定性、在涉及变化系统参数的自适应控制设置中MPC的韧性，以及通过牛顿方法暗示的超线性性能界的启示。

更新时间: 2024-06-10 21:14:38

领域: eess.SY,cs.AI,cs.SY,math.OC

下载: http://arxiv.org/abs/2406.00592v2

Towards Algorithmic Fairness by means of Instance-level Data Re-weighting based on Shapley Values

Algorithmic fairness is of utmost societal importance, yet state-of-the-art large-scale machine learning models require training with massive datasets that are frequently biased. In this context, pre-processing methods that focus on modeling and correcting bias in the data emerge as valuable approaches. In this paper, we propose FairShap, a novel instance-level data re-weighting method for fair algorithmic decision-making through data valuation by means of Shapley Values. FairShap is model-agnostic and easily interpretable. It measures the contribution of each training data point to a predefined fairness metric. We empirically validate FairShap on several state-of-the-art datasets of different nature, with a variety of training scenarios and machine learning models and show how it yields fairer models with similar levels of accuracy than the baselines. We illustrate FairShap's interpretability by means of histograms and latent space visualizations. Moreover, we perform a utility-fairness study and analyze FairShap's computational cost depending on the size of the dataset and the number of features. We believe that FairShap represents a novel contribution in interpretable and model-agnostic approaches to algorithmic fairness that yields competitive accuracy even when only biased training datasets are available.

Updated: 2024-06-10 21:10:55

标题: 通过基于Shapley值的实例级数据重新加权实现算法公平性

摘要: 算法公平性是极其重要的社会问题，然而最先进的大规模机器学习模型需要训练大量数据集，这些数据集经常存在偏见。在这种情况下，专注于建模和纠正数据中偏见的预处理方法成为有价值的方法。在本文中，我们提出了FairShap，一种新颖的基于实例级数据重新加权的方法，通过Shapley值对数据进行估值，实现公平的算法决策。FairShap是与模型无关且易于解释的。它衡量每个训练数据点对预定义公平度量的贡献。我们在几个不同特性的最先进数据集上对FairShap进行了经验验证，使用各种训练方案和机器学习模型，展示了如何产生比基准模型更公平且准确度类似的模型。我们通过直方图和潜在空间可视化展示了FairShap的可解释性。此外，我们进行了一项效用和公平性研究，并分析了FairShap的计算成本，具体取决于数据集的大小和特征数。我们相信FairShap代表了一种新颖的解释性和与模型无关的方法，用于算法公平性，即使只有存在偏见的训练数据集时，也能实现竞争性的准确性。

更新时间: 2024-06-10 21:10:55

领域: cs.LG,cs.AI,cs.CY,68T99,I.2.6; I.2

下载: http://arxiv.org/abs/2303.01928v4

Batched Nonparametric Contextual Bandits

We study nonparametric contextual bandits under batch constraints, where the expected reward for each action is modeled as a smooth function of covariates, and the policy updates are made at the end of each batch of observations. We establish a minimax regret lower bound for this setting and propose a novel batch learning algorithm that achieves the optimal regret (up to logarithmic factors). In essence, our procedure dynamically splits the covariate space into smaller bins, carefully aligning their widths with the batch size. Our theoretical results suggest that for nonparametric contextual bandits, a nearly constant number of policy updates can attain optimal regret in the fully online setting.

Updated: 2024-06-10 21:10:00

标题: 批量非参数上下文臂 bandits

摘要: 我们研究了在批处理约束下的非参数上下文强化学习，其中每个动作的预期奖励被建模为协变量的平滑函数，策略更新在每个观察批次结束时进行。我们为这种情况建立了一个极小化遗憾的下限，并提出了一种新颖的批处理学习算法，实现了最优遗憾（直到对数因子）。实质上，我们的过程动态地将协变量空间分割成较小的箱子，仔细地将它们的宽度与批处理大小对齐。我们的理论结果表明，对于非参数上下文强化学习，几乎恒定数量的策略更新可以在完全在线设置中达到最优遗憾。

更新时间: 2024-06-10 21:10:00

领域: math.ST,cs.LG,stat.ML,stat.TH

下载: http://arxiv.org/abs/2402.17732v2

FlexLoc: Conditional Neural Networks for Zero-Shot Sensor Perspective Invariance in Object Localization with Distributed Multimodal Sensors

Localization is a critical technology for various applications ranging from navigation and surveillance to assisted living. Localization systems typically fuse information from sensors viewing the scene from different perspectives to estimate the target location while also employing multiple modalities for enhanced robustness and accuracy. Recently, such systems have employed end-to-end deep neural models trained on large datasets due to their superior performance and ability to handle data from diverse sensor modalities. However, such neural models are often trained on data collected from a particular set of sensor poses (i.e., locations and orientations). During real-world deployments, slight deviations from these sensor poses can result in extreme inaccuracies. To address this challenge, we introduce FlexLoc, which employs conditional neural networks to inject node perspective information to adapt the localization pipeline. Specifically, a small subset of model weights are derived from node poses at run time, enabling accurate generalization to unseen perspectives with minimal additional overhead. Our evaluations on a multimodal, multiview indoor tracking dataset showcase that FlexLoc improves the localization accuracy by almost 50% in the zero-shot case (no calibration data available) compared to the baselines. The source code of FlexLoc is available at https://github.com/nesl/FlexLoc.

Updated: 2024-06-10 21:02:53

标题: FlexLoc：用于零样本传感器透视不变性的条件神经网络，在具有分布式多模态传感器的物体定位中

摘要: 本文研究了定位技术在各种应用中的关键性，从导航和监测到辅助生活。定位系统通常通过融合来自不同视角传感器观察场景的信息来估计目标位置，同时利用多种模态以提高鲁棒性和准确性。最近，这种系统采用了在大型数据集上训练的端到端深度神经模型，因为它们具有卓越的性能和处理来自不同传感器模态的数据的能力。然而，这种神经模型通常是在从特定一组传感器姿势（即位置和方向）采集的数据上进行训练的。在实际部署中，与这些传感器姿势的微小偏差可能导致极端的不准确性。为了解决这一挑战，我们引入了FlexLoc，该系统采用条件神经网络将节点视角信息注入到定位流程中以进行适应。具体来说，在运行时，从节点姿势派生出一小部分模型权重，使其能够准确地泛化到未见过的视角，同时额外的开销最小化。我们对一个多模态、多视角的室内跟踪数据集进行的评估表明，与基线相比，FlexLoc在零校验数据可用的情况下将定位准确性提高了近50％。FlexLoc的源代码可在https://github.com/nesl/FlexLoc上找到。

更新时间: 2024-06-10 21:02:53

领域: cs.CV,cs.AI,cs.LG,cs.RO,eess.SP

下载: http://arxiv.org/abs/2406.06796v1

Better Safe than Sorry: Pre-training CLIP against Targeted Data Poisoning and Backdoor Attacks

Contrastive Language-Image Pre-training (CLIP) on large image-caption datasets has achieved remarkable success in zero-shot classification and enabled transferability to new domains. However, CLIP is extremely more vulnerable to targeted data poisoning and backdoor attacks, compared to supervised learning. Perhaps surprisingly, poisoning 0.0001% of CLIP pre-training data is enough to make targeted data poisoning attacks successful. This is four orders of magnitude smaller than what is required to poison supervised models. Despite this vulnerability, existing methods are very limited in defending CLIP models during pre-training. In this work, we propose a strong defense, SAFECLIP, to safely pre-train CLIP against targeted data poisoning and backdoor attacks. SAFECLIP warms up the model by applying unimodal contrastive learning (CL) on image and text modalities separately. Then, it divides the data into safe and risky sets, by applying a Gaussian Mixture Model to the cosine similarity of image-caption pair representations. SAFECLIP pre-trains the model by applying the CLIP loss to the safe set and applying unimodal CL to image and text modalities of the risky set separately. By gradually increasing the size of the safe set during pre-training, SAFECLIP effectively breaks targeted data poisoning and backdoor attacks without harming the CLIP performance. Our extensive experiments on CC3M, Visual Genome, and MSCOCO demonstrate that SAFECLIP significantly reduces the success rate of targeted data poisoning attacks from 93.75% to 0% and that of various backdoor attacks from up to 100% to 0%, without harming CLIP's performance.

Updated: 2024-06-10 21:01:11

标题: 宁可安全也不要后悔：对抗有针对性的数据毒害和后门攻击的预训练CLIP

摘要: 在大型图像-标题数据集上进行的对比语言-图像预训练（CLIP）在零样本分类方面取得了显著成功，并实现了对新领域的可转移性。然而，与监督学习相比，CLIP极其容易受到有针对性的数据毒害和后门攻击。令人惊讶的是，对CLIP预训练数据进行0.0001%的毒害就足以使有针对性的数据毒害攻击成功。这比毒害监督模型所需的数量级小了四个数量级。尽管存在这种脆弱性，现有方法在预训练期间保护CLIP模型的能力非常有限。在本研究中，我们提出了一个强大的防御方法SAFECLIP，以安全地预训练CLIP，防止有针对性的数据毒害和后门攻击。SAFECLIP通过分别在图像和文本模态上应用单模态对比学习（CL）来热身模型。然后，通过将图像-标题对表示的余弦相似度应用高斯混合模型，将数据分为安全集和风险集。SAFECLIP通过将CLIP损失应用于安全集并分别将单模态CL应用于风险集的图像和文本模态来预训练模型。在预训练过程中逐渐增加安全集的大小，SAFECLIP有效地打破了有针对性的数据毒害和后门攻击，而不会损害CLIP的性能。我们在CC3M、Visual Genome和MSCOCO上进行了大量实验，结果表明SAFECLIP将有针对性的数据毒害攻击的成功率从93.75%降低到0%，将各种后门攻击的成功率从高达100%降低到0%，同时不损害CLIP的性能。

更新时间: 2024-06-10 21:01:11

领域: cs.LG,cs.AI,cs.CR,cs.CV

下载: http://arxiv.org/abs/2310.05862v2

PlanDQ: Hierarchical Plan Orchestration via D-Conductor and Q-Performer

Despite the recent advancements in offline RL, no unified algorithm could achieve superior performance across a broad range of tasks. Offline \textit{value function learning}, in particular, struggles with sparse-reward, long-horizon tasks due to the difficulty of solving credit assignment and extrapolation errors that accumulates as the horizon of the task grows.~On the other hand, models that can perform well in long-horizon tasks are designed specifically for goal-conditioned tasks, which commonly perform worse than value function learning methods on short-horizon, dense-reward scenarios. To bridge this gap, we propose a hierarchical planner designed for offline RL called PlanDQ. PlanDQ incorporates a diffusion-based planner at the high level, named D-Conductor, which guides the low-level policy through sub-goals. At the low level, we used a Q-learning based approach called the Q-Performer to accomplish these sub-goals. Our experimental results suggest that PlanDQ can achieve superior or competitive performance on D4RL continuous control benchmark tasks as well as AntMaze, Kitchen, and Calvin as long-horizon tasks.

Updated: 2024-06-10 20:59:53

标题: PlanDQ：通过D-Conductor和Q-Performer实现的分层计划编排

摘要: 尽管离线RL技术最近取得了一些进展，但目前尚无统一的算法能够在广泛的任务范围内实现卓越性能。特别是离线值函数学习在稀疏奖励、长时间跨度任务中遇到困难，这是因为随着任务时间跨度增加，解决信用分配和外推误差变得困难。另一方面，能够在长时间跨度任务中表现良好的模型通常专门设计用于目标条件任务，但在短时间跨度、密集奖励场景中通常表现不如值函数学习方法。为了弥合这一差距，我们提出了一种用于离线RL的分层规划器PlanDQ。PlanDQ在高层引入了一种基于扩散的规划器，称为D-Conductor，用于指导低层策略实现子目标。在低层，我们使用了基于Q-learning的方法，称为Q-Performer，来实现这些子目标。我们的实验结果表明，PlanDQ在D4RL连续控制基准任务以及AntMaze、Kitchen和Calvin等长时间跨度任务上可以实现更优异或具有竞争力的性能。

更新时间: 2024-06-10 20:59:53

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.06793v1

Reinforced Compressive Neural Architecture Search for Versatile Adversarial Robustness

Prior neural architecture search (NAS) for adversarial robustness works have discovered that a lightweight and adversarially robust neural network architecture could exist in a non-robust large teacher network, generally disclosed by heuristic rules through statistical analysis and neural architecture search, generally disclosed by heuristic rules from neural architecture search. However, heuristic methods cannot uniformly handle different adversarial attacks and "teacher" network capacity. To solve this challenge, we propose a Reinforced Compressive Neural Architecture Search (RC-NAS) for Versatile Adversarial Robustness. Specifically, we define task settings that compose datasets, adversarial attacks, and teacher network information. Given diverse tasks, we conduct a novel dual-level training paradigm that consists of a meta-training and a fine-tuning phase to effectively expose the RL agent to diverse attack scenarios (in meta-training), and making it adapt quickly to locate a sub-network (in fine-tuning) for any previously unseen scenarios. Experiments show that our framework could achieve adaptive compression towards different initial teacher networks, datasets, and adversarial attacks, resulting in more lightweight and adversarially robust architectures.

Updated: 2024-06-10 20:59:52

标题: 增强型压缩神经架构搜索用于多功能对抗鲁棒性

摘要: 过去的神经架构搜索(NAS)针对对抗性鲁棒性的研究已经发现，在一个非鲁棒的大型教师网络中可能存在一个轻量级且对抗性强的神经网络架构，通常通过统计分析和神经架构搜索揭示的启发式规则进行披露。然而，启发式方法不能统一处理不同的对抗攻击和"教师"网络容量。为了解决这个挑战，我们提出了一种适用于多功能对抗性鲁棒性的强化压缩神经架构搜索(RC-NAS)。具体而言，我们定义了包括数据集、对抗攻击和教师网络信息在内的任务设置。在给定多样化的任务情境下，我们进行了一种新颖的双层训练范式，包括元训练和微调阶段，有效地让RL代理暴露于多样化的攻击情景(在元训练中)，并使其迅速适应于为任何之前未见过的情景定位子网络(在微调中)。实验证明，我们的框架能够实现对不同初始教师网络、数据集和对抗攻击的自适应压缩，从而产生更轻量级和对抗性强的架构。

更新时间: 2024-06-10 20:59:52

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.06792v1

LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model

We train a suite of multimodal foundation models (MMFM) using the popular LLaVA framework with the recently released Gemma family of large language models (LLMs). Of particular interest is the 2B parameter Gemma model, which provides opportunities to construct capable small-scale MMFMs. In line with findings from other papers in this space, we test the effect of ablating three design features: pretraining the connector, utilizing a more powerful image backbone, and increasing the size of the language backbone. The resulting models, which we call LLaVA-Gemma, exhibit moderate performance on an array of evaluations, but fail to improve past the current comparably sized SOTA models. Closer analysis of performance shows mixed effects; skipping pretraining tends to reduce performance, larger vision models sometimes improve performance, and increasing language model size has inconsistent effects. We publicly release training recipes, code and weights for our models for the LLaVA-Gemma models.

Updated: 2024-06-10 20:59:48

标题: LLaVA-Gemma：利用紧凑语言模型加速多模态基础模型

摘要: 我们使用最近发布的大型语言模型（LLMs）Gemma家族在流行的LLaVA框架中训练了一套多模态基础模型（MMFM）。特别值得关注的是具有2B参数的Gemma模型，它提供了构建功能强大的小型MMFMs的机会。与该领域其他论文的研究结果一致，我们测试了去除三个设计特征的影响：预训练连接器、利用更强大的图像骨干和增加语言骨干的大小。结果模型，我们称之为LLaVA-Gemma，在各种评估中表现出中等性能，但未能超越当前规模相当的SOTA模型。对性能的更详细分析显示出混合效果；跳过预训练往往会降低性能，较大的视觉模型有时会提高性能，而增加语言模型的大小会产生不一致的效果。我们公开发布了LLaVA-Gemma模型的训练配方、代码和权重。

更新时间: 2024-06-10 20:59:48

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2404.01331v2

TinyTrain: Resource-Aware Task-Adaptive Sparse Training of DNNs at the Data-Scarce Edge

On-device training is essential for user personalisation and privacy. With the pervasiveness of IoT devices and microcontroller units (MCUs), this task becomes more challenging due to the constrained memory and compute resources, and the limited availability of labelled user data. Nonetheless, prior works neglect the data scarcity issue, require excessively long training time (e.g. a few hours), or induce substantial accuracy loss (>10%). In this paper, we propose TinyTrain, an on-device training approach that drastically reduces training time by selectively updating parts of the model and explicitly coping with data scarcity. TinyTrain introduces a task-adaptive sparse-update method that dynamically selects the layer/channel to update based on a multi-objective criterion that jointly captures user data, the memory, and the compute capabilities of the target device, leading to high accuracy on unseen tasks with reduced computation and memory footprint. TinyTrain outperforms vanilla fine-tuning of the entire network by 3.6-5.0% in accuracy, while reducing the backward-pass memory and computation cost by up to 1,098x and 7.68x, respectively. Targeting broadly used real-world edge devices, TinyTrain achieves 9.5x faster and 3.5x more energy-efficient training over status-quo approaches, and 2.23x smaller memory footprint than SOTA methods, while remaining within the 1 MB memory envelope of MCU-grade platforms.

Updated: 2024-06-10 20:57:14

标题: 微小列车：资源感知任务自适应稀疏训练在数据稀缺边缘的深度神经网络

摘要: 设备上的训练对于用户个性化和隐私非常重要。随着物联网设备和微控制器单元（MCU）的普及，由于受限的内存和计算资源以及标记用户数据的有限可用性，这项任务变得更具挑战性。然而，先前的研究忽视了数据稀缺问题，需要过长的训练时间（例如几个小时），或导致重大的准确性损失（> 10%）。在本文中，我们提出了TinyTrain，一种在设备上训练的方法，通过选择性更新模型的部分并明确应对数据稀缺问题，大幅减少了训练时间。TinyTrain引入了一种任务自适应的稀疏更新方法，根据一个多目标标准动态选择要更新的层/通道，该标准同时捕捉用户数据、内存和目标设备的计算能力，从而在未知任务上实现高准确性，并减少计算和内存占用。TinyTrain在准确性方面比整个网络的普通微调提高了3.6-5.0%，同时将向后传递的内存和计算成本分别降低了高达1,098倍和7.68倍。针对广泛使用的现实世界边缘设备，TinyTrain相比现状方法实现了9.5倍更快和3.5倍更节能的训练，以及比SOTA方法小2.23倍的内存占用，同时保持在MCU级平台的1 MB内存限制范围内。

更新时间: 2024-06-10 20:57:14

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2307.09988v2

SoundCTM: Uniting Score-based and Consistency Models for Text-to-Sound Generation

Sound content is an indispensable element for multimedia works such as video games, music, and films. Recent high-quality diffusion-based sound generation models can serve as valuable tools for the creators. However, despite producing high-quality sounds, these models often suffer from slow inference speeds. This drawback burdens creators, who typically refine their sounds through trial and error to align them with their artistic intentions. To address this issue, we introduce Sound Consistency Trajectory Models (SoundCTM). Our model enables flexible transitioning between high-quality 1-step sound generation and superior sound quality through multi-step generation. This allows creators to initially control sounds with 1-step samples before refining them through multi-step generation. While CTM fundamentally achieves flexible 1-step and multi-step generation, its impressive performance heavily depends on an additional pretrained feature extractor and an adversarial loss, which are expensive to train and not always available in other domains. Thus, we reframe CTM's training framework and introduce a novel feature distance by utilizing the teacher's network for a distillation loss. Additionally, while distilling classifier-free guided trajectories, we train conditional and unconditional student models simultaneously and interpolate between these models during inference. We also propose training-free controllable frameworks for SoundCTM, leveraging its flexible sampling capability. SoundCTM achieves both promising 1-step and multi-step real-time sound generation without using any extra off-the-shelf networks. Furthermore, we demonstrate SoundCTM's capability of controllable sound generation in a training-free manner. Our codes, pretrained models, and audio samples are available at https://github.com/sony/soundctm.

Updated: 2024-06-10 20:49:58

标题: SoundCTM：将基于分数和一致性模型统一起来，用于文本到声音的生成

摘要: 声音内容是多媒体作品（如视频游戏、音乐和电影）中不可或缺的元素。最近高质量的扩散基声音生成模型可以作为创作者宝贵的工具。然而，尽管产生高质量的声音，这些模型通常受到推理速度慢的困扰。这一缺点使创作者负担沉重，他们通常通过试错来完善声音，以使其符合其艺术意图。为了解决这个问题，我们引入了声音一致性轨迹模型（SoundCTM）。我们的模型能够实现高质量的1步声音生成和通过多步生成获得更优异的声音质量之间的灵活过渡。这使得创作者可以在通过多步生成进行完善之前，最初通过1步样本来控制声音。虽然CTM基本上实现了灵活的1步和多步生成，但其出色的性能在很大程度上取决于额外的预训练特征提取器和对抗损失，这些训练成本高，并且在其他领域并不总是可用。因此，我们重新构建了CTM的训练框架，并通过利用教师网络进行蒸馏损失引入了一种新的特征距离。此外，我们训练无分类器引导轨迹的蒸馏过程中，同时训练条件和无条件的学生模型，并在推理过程中在这些模型之间进行插值。我们还提出了无需训练的可控制框架，利用其灵活的采样能力。SoundCTM实现了无需使用任何额外现成网络的有前景的1步和多步实时声音生成。此外，我们展示了SoundCTM在无需训练的情况下实现可控制声音生成的能力。我们的代码、预训练模型和音频样本可在https://github.com/sony/soundctm 上找到。

更新时间: 2024-06-10 20:49:58

领域: cs.SD,cs.LG,eess.AS

下载: http://arxiv.org/abs/2405.18503v2

BTS: Bridging Text and Sound Modalities for Metadata-Aided Respiratory Sound Classification

Respiratory sound classification (RSC) is challenging due to varied acoustic signatures, primarily influenced by patient demographics and recording environments. To address this issue, we introduce a text-audio multimodal model that utilizes metadata of respiratory sounds, which provides useful complementary information for RSC. Specifically, we fine-tune a pretrained text-audio multimodal model using free-text descriptions derived from the sound samples' metadata which includes the gender and age of patients, type of recording devices, and recording location on the patient's body. Our method achieves state-of-the-art performance on the ICBHI dataset, surpassing the previous best result by a notable margin of 1.17%. This result validates the effectiveness of leveraging metadata and respiratory sound samples in enhancing RSC performance. Additionally, we investigate the model performance in the case where metadata is partially unavailable, which may occur in real-world clinical setting.

Updated: 2024-06-10 20:49:54

标题: BTS：利用元数据桥接文本和声音模态以辅助呼吸音分类

摘要: 呼吸音分类(RSC)具有挑战性，主要是由于患者人口统计信息和录音环境的不同影响了各种声学特征。为了解决这个问题，我们引入了一种文本-音频多模态模型，利用呼吸音的元数据，为RSC提供有用的补充信息。具体来说，我们利用从声音样本的元数据中提取的自由文本描述微调预训练的文本-音频多模态模型，其中包括患者的性别和年龄、录音设备类型以及患者身体上的录音位置。我们的方法在ICBHI数据集上实现了最先进的性能，超过了之前最佳结果1.17%的显著边界。这一结果验证了利用元数据和呼吸音样本来增强RSC性能的有效性。此外，我们还研究了在部分元数据不可用的情况下模型的性能，这可能在现实世界的临床环境中发生。

更新时间: 2024-06-10 20:49:54

领域: cs.SD,cs.AI,eess.AS

下载: http://arxiv.org/abs/2406.06786v1

MAD Max Beyond Single-Node: Enabling Large Machine Learning Model Acceleration on Distributed Systems

Training and deploying large-scale machine learning models is time-consuming, requires significant distributed computing infrastructures, and incurs high operational costs. Our analysis, grounded in real-world large model training on datacenter-scale infrastructures, reveals that 14~32% of all GPU hours are spent on communication with no overlapping computation. To minimize this outstanding communication latency and other inherent at-scale inefficiencies, we introduce an agile performance modeling framework, MAD-Max. This framework is designed to optimize parallelization strategies and facilitate hardware-software co-design opportunities. Through the application of MAD-Max to a suite of real-world large-scale ML models on state-of-the-art GPU clusters, we showcase potential throughput enhancements of up to 2.24x for pre-training and up to 5.2x for inference scenarios, respectively.

Updated: 2024-06-10 20:31:07

标题: MAD Max超越单节点：在分布式系统上实现大型机器学习模型加速

摘要: 训练和部署大规模机器学习模型是耗时的，需要大量的分布式计算基础设施，并且会产生高昂的运营成本。我们的分析基于在数据中心规模基础设施上进行的实际大型模型训练，发现了14~32%的GPU小时用于通信，没有重叠的计算。为了最小化这种突出的通信延迟和其他固有的大规模效率低下问题，我们引入了一个灵活的性能建模框架，MAD-Max。该框架旨在优化并行化策略，并促进硬件软件协同设计机会。通过将MAD-Max应用于一套最新GPU集群上的真实大规模机器学习模型，我们展示了预训练和推理场景的潜在吞吐量提升，分别达到了最高2.24倍和5.2倍。

更新时间: 2024-06-10 20:31:07

领域: cs.DC,cs.AR,cs.LG

下载: http://arxiv.org/abs/2310.02784v3

Social Environment Design

Artificial Intelligence (AI) holds promise as a technology that can be used to improve government and economic policy-making. This paper proposes a new research agenda towards this end by introducing Social Environment Design, a general framework for the use of AI for automated policy-making that connects with the Reinforcement Learning, EconCS, and Computational Social Choice communities. The framework seeks to capture general economic environments, includes voting on policy objectives, and gives a direction for the systematic analysis of government and economic policy through AI simulation. We highlight key open problems for future research in AI-based policy-making. By solving these challenges, we hope to achieve various social welfare objectives, thereby promoting more ethical and responsible decision making.

Updated: 2024-06-10 20:25:45

标题: 社会环境设计

摘要: 人工智能（AI）有望作为一种技术，可用于改善政府和经济政策制定。本文提出了一个新的研究议程，引入了社会环境设计的概念，这是一个利用AI进行自动政策制定的通用框架，与强化学习、计算经济学和计算社会选择社区相连接。该框架旨在捕捉一般的经济环境，包括对政策目标的投票，并为通过AI模拟进行政府和经济政策的系统分析提供方向。我们强调了AI政策制定未来研究中的关键开放问题。通过解决这些挑战，我们希望实现各种社会福利目标，从而促进更具道德和负责任的决策制定。

更新时间: 2024-06-10 20:25:45

领域: cs.AI,econ.GN,q-fin.EC,stat.ML

下载: http://arxiv.org/abs/2402.14090v2

MolX: Enhancing Large Language Models for Molecular Learning with A Multi-Modal Extension

Recently, Large Language Models (LLMs) with their strong task-handling capabilities have shown remarkable advancements across a spectrum of fields, moving beyond natural language understanding. However, their proficiency within the chemistry domain remains restricted, especially in solving professional molecule-related tasks. This challenge is attributed to their inherent limitations in comprehending molecules using only common textual representations, i.e., SMILES strings. In this study, we seek to enhance the ability of LLMs to comprehend molecules by designing and equipping them with a multi-modal external module, namely MolX. In particular, instead of directly using a SMILES string to represent a molecule, we utilize specific encoders to extract fine-grained features from both SMILES string and 2D molecular graph representations for feeding into an LLM. Moreover, a human-defined molecular fingerprint is incorporated to leverage its embedded domain knowledge. Then, to establish an alignment between MolX and the LLM's textual input space, the whole model in which the LLM is frozen, is pre-trained with a versatile strategy including a diverse set of tasks. Extensive experimental evaluations demonstrate that our proposed method only introduces a small number of trainable parameters while outperforming baselines on various downstream molecule-related tasks ranging from molecule-to-text translation to retrosynthesis, with and without fine-tuning the LLM.

Updated: 2024-06-10 20:25:18

标题: MolX: 利用多模态扩展增强大型语言模型以进行分子学习

摘要: 最近，大型语言模型（LLMs）以其强大的任务处理能力在各个领域取得了显著进展，超越了自然语言理解。然而，在化学领域，它们的熟练程度仍然受限，特别是在解决与专业分子相关的任务方面。这一挑战归因于它们仅使用常见文本表示，即SMILES字符串，来理解分子的固有限制。在这项研究中，我们试图通过设计并为它们配备一个多模态外部模块MolX来增强LLMs理解分子的能力。具体而言，我们使用特定编码器从SMILES字符串和2D分子图表示中提取细粒度特征，以供LLM使用，而不是直接使用SMILES字符串来表示分子。此外，还引入了人工定义的分子指纹，以利用其嵌入的领域知识。然后，为了建立MolX与LLM的文本输入空间之间的对齐，整个模型（其中LLM被冻结）使用多样化的任务进行了预训练。广泛的实验评估表明，我们提出的方法仅引入了少量可训练参数，却在各种下游与分子相关的任务中表现优异，从分子到文本的翻译到合成反应，无论是否对LLM进行微调。

更新时间: 2024-06-10 20:25:18

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2406.06777v1

SeeFar: Satellite Agnostic Multi-Resolution Dataset for Geospatial Foundation Models

SeeFar is an evolving collection of multi-resolution satellite images from public and commercial satellites. We specifically curated this dataset for training geospatial foundation models, unconstrained by satellite type. In recent years, advances in technology have made satellite imagery more accessible than ever. More earth-observing satellites have been launched in the last five years than in the previous fifty. Modern commercial satellites now offer up to 100 times the spatial resolution of public access satellites. However, the high cost and limited historical availability of commercial satellite imagery is a barrier to the training of foundational models, impacting what images can be used during inference. The SeeFar dataset represents a step towards training models that are satellite-agnostic by combining multi-resolution commercial and public access pre-processed images. This will enable users to utilize historical data alongside higher-resolution, more expensive satellite imagery, offering greater flexibility during inference. To achieve this, we describe a process for standardizing data from diverse satellite sources, normalizing different data formats, and aligning spectral bands to enhance interoperability. The SeeFar dataset includes images at a resolution of 384x384 pixels, spanning four spectral bands (Blue, Green, Red, and Near-Infrared) and expanding spatial resolutions (starting with 30, 10, 1.5, and 1.0 meters), all in cloud-optimized GeoTIFF format. It also provides consistent and comprehensive metadata to enhance data transparency and reliability. By aggregating data from multiple sources, SeeFar makes processed and consistent satellite data accessible to a wider range of users - from researchers to policymakers - fostering competition and innovation in satellite imagery analysis. The dataset is available at \url{coastalcarbon.ai/seefar}.

Updated: 2024-06-10 20:24:14

标题: SeeFar：用于地理空间基础模型的卫星不可知多分辨率数据集

摘要: SeeFar是一个不断发展的多分辨率卫星图像数据集，来自公共和商业卫星。我们专门为训练地理空间基础模型精心策划了这个数据集，不受卫星类型限制。近年来，技术的进步使卫星图像比以往任何时候都更易获取。在过去的五年里，发射的地球观测卫星比之前五十年还要多。现代商业卫星现在提供的空间分辨率是公共卫星的100倍。然而，商业卫星图像的高成本和有限的历史可用性是训练基础模型的障碍，影响了推理过程中可以使用的图像。SeeFar数据集代表了朝着培训不受卫星限制的模型迈出的一步，通过结合多分辨率商业和公共访问预处理的图像。这将使用户能够在推理过程中利用历史数据以及更昂贵的高分辨率卫星图像，提供更大的灵活性。为了实现这一目标，我们描述了一个标准化来自不同卫星来源的数据、规范化不同数据格式和对齐光谱波段以增强互操作性的过程。SeeFar数据集包括384x384像素分辨率的图像，涵盖四个光谱波段（蓝色、绿色、红色和近红外）和不断扩大的空间分辨率（从30、10、1.5和1.0米开始），全部采用优化云存储的GeoTIFF格式。它还提供一致和全面的元数据，以增强数据的透明度和可靠性。通过聚合来自多个来源的数据，SeeFar使处理过的一致卫星数据对更广泛的用户（从研究人员到决策者）可访问，促进卫星图像分析中的竞争和创新。该数据集可在\url{coastalcarbon.ai/seefar}上获取。

更新时间: 2024-06-10 20:24:14

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2406.06776v1

Evaluating Zero-Shot Long-Context LLM Compression

This study evaluates the effectiveness of zero-shot compression techniques on large language models (LLMs) under long-context. We identify the tendency for computational errors to increase under long-context when employing certain compression methods. We propose a hypothesis to explain the varied behavior of different LLM compression techniques and explore remedies to mitigate the performance decline observed in some techniques under long-context. This is a course report for COS 598D Machine Learning and Systems by Prof. Kai Li at Princeton University. Due to limited computational resources, our experiments were conducted only on LLaMA-2-7B-32K.

Updated: 2024-06-10 20:19:55

标题: 评估零-shot 长文本 LLM 压缩

摘要: 本研究评估了零样本压缩技术在大型语言模型（LLMs）长上下文下的有效性。我们发现，在使用某些压缩方法时，计算错误在长上下文下有增加的趋势。我们提出了一个假设来解释不同LLM压缩技术的不同行为，并探讨缓解在长上下文下观察到的一些技术性能下降的方法。这是普林斯顿大学Kai Li教授主持的COS 598D机器学习与系统的课程报告。由于计算资源有限，我们的实验仅在LLaMA-2-7B-32K上进行。

更新时间: 2024-06-10 20:19:55

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.06773v1

DISCOVERYWORLD: A Virtual Environment for Developing and Evaluating Automated Scientific Discovery Agents

Automated scientific discovery promises to accelerate progress across scientific domains. However, developing and evaluating an AI agent's capacity for end-to-end scientific reasoning is challenging as running real-world experiments is often prohibitively expensive or infeasible. In this work we introduce DISCOVERYWORLD, the first virtual environment for developing and benchmarking an agent's ability to perform complete cycles of novel scientific discovery. DISCOVERYWORLD contains a variety of different challenges, covering topics as diverse as radioisotope dating, rocket science, and proteomics, to encourage development of general discovery skills rather than task-specific solutions. DISCOVERYWORLD itself is an inexpensive, simulated, text-based environment (with optional 2D visual overlay). It includes 120 different challenge tasks, spanning eight topics each with three levels of difficulty and several parametric variations. Each task requires an agent to form hypotheses, design and run experiments, analyze results, and act on conclusions. DISCOVERYWORLD further provides three automatic metrics for evaluating performance, based on (a) task completion, (b) task-relevant actions taken, and (c) the discovered explanatory knowledge. We find that strong baseline agents, that perform well in prior published environments, struggle on most DISCOVERYWORLD tasks, suggesting that DISCOVERYWORLD captures some of the novel challenges of discovery, and thus that DISCOVERYWORLD may help accelerate near-term development and assessment of scientific discovery competency in agents. Code available at: www.github.com/allenai/discoveryworld

Updated: 2024-06-10 20:08:44

标题: 《DISCOVERYWORLD：用于开发和评估自动科学发现代理的虚拟环境》

摘要: 自动科学发现承诺加速各个科学领域的进展。然而，开发和评估人工智能代理进行端到端科学推理的能力具有挑战性，因为运行真实世界实验通常是昂贵或不可行的。在这项工作中，我们介绍了DISCOVERYWORLD，这是第一个用于开发和基准测试代理执行完整的新科学发现循环能力的虚拟环境。DISCOVERYWORLD包含各种不同的挑战，涵盖了广泛的主题，如放射性同位素定年、火箭科学和蛋白质组学，以鼓励发展通用的发现技能而非任务特定的解决方案。DISCOVERYWORLD本身是一个廉价、模拟、基于文本的环境（可选择2D视觉叠加）。它包括120个不同的挑战任务，涵盖了八个主题，每个主题有三个不同的难度级别和几个参数化变化。每个任务都要求代理提出假设、设计和运行实验、分析结果并根据结论采取行动。DISCOVERYWORLD进一步提供了三个自动评估性能的指标，基于（a）任务完成情况，（b）采取的与任务相关的行动，以及（c）发现的解释性知识。我们发现，在先前发布的环境中表现良好的强基线代理在大多数DISCOVERYWORLD任务中都很困难，这表明DISCOVERYWORLD捕捉到了一些新型发现的挑战，并且DISCOVERYWORLD可能有助于加速代理在科学发现能力的近期发展和评估。代码可在以下网址找到：www.github.com/allenai/discoveryworld

更新时间: 2024-06-10 20:08:44

领域: cs.AI,cs.CL

下载: http://arxiv.org/abs/2406.06769v1

Data-Driven Switchback Experiments: Theoretical Tradeoffs and Empirical Bayes Designs

We study the design and analysis of switchback experiments conducted on a single aggregate unit. The design problem is to partition the continuous time space into intervals and switch treatments between intervals, in order to minimize the estimation error of the treatment effect. We show that the estimation error depends on four factors: carryover effects, periodicity, serially correlated outcomes, and impacts from simultaneous experiments. We derive a rigorous bias-variance decomposition and show the tradeoffs of the estimation error from these factors. The decomposition provides three new insights in choosing a design: First, balancing the periodicity between treated and control intervals reduces the variance; second, switching less frequently reduces the bias from carryover effects while increasing the variance from correlated outcomes, and vice versa; third, randomizing interval start and end points reduces both bias and variance from simultaneous experiments. Combining these insights, we propose a new empirical Bayes design approach. This approach uses prior data and experiments for designing future experiments. We illustrate this approach using real data from a ride-sharing platform, yielding a design that reduces MSE by 33% compared to the status quo design used on the platform.

Updated: 2024-06-10 20:06:53

标题: 数据驱动的回头实验：理论权衡和经验贝叶斯设计

摘要: 我们研究了在单一聚合单元上进行的交叉试验的设计和分析。设计问题是将连续时间空间划分为间隔，并在间隔之间切换处理，以最小化处理效应的估计误差。我们表明，估计误差取决于四个因素：残留效应、周期性、序贯相关的结果以及同时进行实验的影响。我们推导了严格的偏差-方差分解，并展示了这些因素对估计误差的权衡。分解提供了三个选择设计的新见解：首先，平衡处理和对照间隔之间的周期性可减少方差；其次，较少频繁地切换可减少残留效应的偏差，同时增加来自相关结果的方差，反之亦然；第三，随机化间隔的起始和结束点可减少同时进行实验的偏差和方差。结合这些见解，我们提出了一种新的经验贝叶斯设计方法。该方法利用先前的数据和实验来设计未来的实验。我们利用来自一个拼车平台的真实数据来说明这种方法，得到了一种设计，与该平台上使用的现状设计相比，将均方误差降低了33%。

更新时间: 2024-06-10 20:06:53

领域: stat.ME,cs.LG,econ.EM,q-bio.QM

下载: http://arxiv.org/abs/2406.06768v1

Unified Mechanism-Specific Amplification by Subsampling and Group Privacy Amplification

Amplification by subsampling is one of the main primitives in machine learning with differential privacy (DP): Training a model on random batches instead of complete datasets results in stronger privacy. This is traditionally formalized via mechanism-agnostic subsampling guarantees that express the privacy parameters of a subsampled mechanism as a function of the original mechanism's privacy parameters. We propose the first general framework for deriving mechanism-specific guarantees, which leverage additional information beyond these parameters to more tightly characterize the subsampled mechanism's privacy. Such guarantees are of particular importance for privacy accounting, i.e., tracking privacy over multiple iterations. Overall, our framework based on conditional optimal transport lets us derive existing and novel guarantees for approximate DP, accounting with R\'enyi DP, and accounting with dominating pairs in a unified, principled manner. As an application, we analyze how subsampling affects the privacy of groups of multiple users. Our tight mechanism-specific bounds outperform tight mechanism-agnostic bounds and classic group privacy results.

Updated: 2024-06-10 19:44:02

标题: 统一机制特定的子抽样和群隐私放大

摘要: 子采样放大是具有差分隐私（DP）的机器学习中的主要基元之一：在随机批次而不是完整数据集上训练模型会产生更强的隐私保护。传统上，这是通过机制不可知的子采样保证来形式化的，这些保证表达了子采样机制的隐私参数作为原始机制隐私参数的函数。我们提出了第一个用于推导机制特定保证的通用框架，利用除这些参数之外的额外信息，更紧密地表征了子采样机制的隐私。这种保证对于隐私核算尤为重要，即跟踪多次迭代中的隐私。总的来说，我们基于条件最优传输的框架让我们以统一、有原则的方式推导现有和新颖的近似DP保证，考虑Rényi DP核算和支配对核算。作为应用，我们分析了子采样如何影响多个用户组的隐私。我们的紧密机制特定边界优于紧密的机制不可知边界和经典的群体隐私结果。

更新时间: 2024-06-10 19:44:02

领域: cs.CR,cs.LG,stat.ML

下载: http://arxiv.org/abs/2403.04867v2

Scalable Private Search with Wally

This paper presents Wally, a private search system that supports efficient semantic and keyword search queries against large databases. When sufficient clients are making the queries, Wally performance is significantly better than previous systems. In previous private search systems, for each client query, the server must perform at least one expensive cryptographic operation per database entry. As a result, performance degraded proportionally with the number of entries in the database. In Wally we get rid of this limitation. Specifically, for each query the server performs cryptographic operations only against a few database entries. We achieve these results by requiring each client to add a few fake queries, and sends each query via an anonymous network to the server at independently chosen random instants. Additionally, each client also uses somewhat homomorphic encryption (SHE) to hide whether a query is real or fake, Wally provides $(\epsilon, \delta)$-differential privacy guarantee, which is an accepted standard for strong privacy. The number of fake queries each client makes depends inversely on the number of clients making queries. Therefore, the fake queries' overhead vanishes as the number of clients increases, enabling scalability to millions of queries and large databases. Concretely, Wally can serve $8$M requests at a rate of 3,000 queries per second. That is around 60x higher than the state-of-the-art scheme.

Updated: 2024-06-10 19:41:25

标题: 使用Wally实现可扩展的私人搜索

摘要: 这篇论文介绍了Wally，一个支持对大型数据库进行高效语义和关键字搜索查询的私人搜索系统。当有足够多的客户端进行查询时，Wally的性能显著优于先前的系统。在先前的私人搜索系统中，对于每个客户端查询，服务器必须对每个数据库条目执行至少一个昂贵的密码操作。因此，性能随着数据库条目数量的增加而降低。在Wally中，我们摆脱了这种限制。具体来说，对于每个查询，服务器只对少数数据库条目执行密码操作。我们通过要求每个客户端添加几个虚假查询，并通过匿名网络以独立选择的随机时刻将每个查询发送到服务器来实现这些结果。此外，每个客户端还使用了略微同态加密（SHE）来隐藏查询是真实还是虚假，Wally提供了$(\epsilon,\delta)$-差分隐私保证，这是强隐私的接受标准。每个客户端制造的虚假查询数量与进行查询的客户端数量成反比。因此，随着客户端数量的增加，虚假查询的开销消失，实现了对数百万查询和大型数据库的可伸缩性。具体而言，Wally可以以每秒3,000个查询的速率处理800万请求。这是现有方案的大约60倍。

更新时间: 2024-06-10 19:41:25

领域: cs.CR,cs.DB

下载: http://arxiv.org/abs/2406.06761v1

Proving membership in LLM pretraining data via data watermarks

Detecting whether copyright holders' works were used in LLM pretraining is poised to be an important problem. This work proposes using data watermarks to enable principled detection with only black-box model access, provided that the rightholder contributed multiple training documents and watermarked them before public release. By applying a randomly sampled data watermark, detection can be framed as hypothesis testing, which provides guarantees on the false detection rate. We study two watermarks: one that inserts random sequences, and another that randomly substitutes characters with Unicode lookalikes. We first show how three aspects of watermark design -- watermark length, number of duplications, and interference -- affect the power of the hypothesis test. Next, we study how a watermark's detection strength changes under model and dataset scaling: while increasing the dataset size decreases the strength of the watermark, watermarks remain strong if the model size also increases. Finally, we view SHA hashes as natural watermarks and show that we can robustly detect hashes from BLOOM-176B's training data, as long as they occurred at least 90 times. Together, our results point towards a promising future for data watermarks in real world use.

Updated: 2024-06-10 19:39:34

标题: 通过数据水印证明LLM预训练数据的成员身份

摘要: 检测LLM预训练中是否使用了版权持有者的作品即将成为一个重要问题。本文提出使用数据水印来实现基于黑盒模型访问的原则性检测，前提是版权持有者在公开发布之前贡献了多个训练文档并对其进行了水印处理。通过应用随机抽样的数据水印，检测可以被构建为假设检验，从而在错误检测率上提供保证。我们研究了两种水印：一种插入随机序列，另一种随机替换字符为Unicode相似字符。我们首先展示了水印设计的三个方面 -- 水印长度、重复次数和干扰 -- 如何影响假设检验的能力。接下来，我们研究了在模型和数据集扩展下水印检测强度的变化：增加数据集大小会降低水印的强度，但如果模型大小也增加，水印仍然保持强度。最后，我们将SHA哈希视为自然水印，并展示我们可以稳健地检测出BLOOM-176B的训练数据中的哈希，只要它们至少出现了90次。综合而言，我们的结果指向数据水印在现实世界使用中具有光明前景。

更新时间: 2024-06-10 19:39:34

领域: cs.CR,cs.CL,cs.LG

下载: http://arxiv.org/abs/2402.10892v2

Decentralized Reliability Estimation for Mixnets

Continuous-time decryption mixnets can anonymously route data packets with end to end latency that can be as low as a second, making them usable for a variety of applications. Such mixnets however lack verifiable reliability properties that ensure the correct processing and delivery of packets, while existing verifiability mechanisms are incompatible with scalable low latency continuous-time mixnets due to imposing overheads measuring in minutes to hours. This work addresses this gap by proposing a scheme that can estimate reliability scores for links and nodes forming a continuous-time mixnet where some form of credentials authorize clients to send traffic. The scores can be computed publicly by all participants from a set of measurement packets that are eventually revealed and act as a random sample of the traffic, without affecting mixnet transmission latency for client packets. Our scheme relies on VRF-based routing, a novel primitive that ensures that legitimate client packets follow the routing policy of the mixnet, as well as randomly generating unforgeable measurement packets. We experimentally validate our construction both in unreliable and adversarial settings, demonstrating its feasibility.

Updated: 2024-06-10 19:38:04

标题: 混合网络的去中心化可靠性估计

摘要: 连续时间解密混合网络可以匿名地路由数据包，端到端延迟可以低至一秒，使其适用于各种应用。然而，这种混合网络缺乏可验证的可靠性属性，无法确保数据包的正确处理和传递，而现有的可验证机制与可扩展的低延迟连续时间混合网络不兼容，因为会带来以分钟计量或以小时计量的开销。本文通过提出一种方案来解决这一问题，该方案可以估计形成连续时间混合网络的链路和节点的可靠性评分，其中某种形式的凭据授权客户端发送流量。这些评分可以由所有参与者根据一组最终揭示的测量数据包公开计算，并作为流量的随机样本，而不会影响客户端数据包的混合网络传输延迟。我们的方案依赖于基于VRF的路由，这是一种确保合法客户端数据包遵循混合网络路由策略，并随机生成不可伪造测量数据包的新型基元。我们在不可靠和对抗性环境中对我们的构建进行了实验验证，证明了其可行性。

更新时间: 2024-06-10 19:38:04

领域: cs.CR

下载: http://arxiv.org/abs/2406.06760v1

Optimal Federated Learning for Nonparametric Regression with Heterogeneous Distributed Differential Privacy Constraints

This paper studies federated learning for nonparametric regression in the context of distributed samples across different servers, each adhering to distinct differential privacy constraints. The setting we consider is heterogeneous, encompassing both varying sample sizes and differential privacy constraints across servers. Within this framework, both global and pointwise estimation are considered, and optimal rates of convergence over the Besov spaces are established. Distributed privacy-preserving estimators are proposed and their risk properties are investigated. Matching minimax lower bounds, up to a logarithmic factor, are established for both global and pointwise estimation. Together, these findings shed light on the tradeoff between statistical accuracy and privacy preservation. In particular, we characterize the compromise not only in terms of the privacy budget but also concerning the loss incurred by distributing data within the privacy framework as a whole. This insight captures the folklore wisdom that it is easier to retain privacy in larger samples, and explores the differences between pointwise and global estimation under distributed privacy constraints.

Updated: 2024-06-10 19:34:07

标题: 异构分布式差分隐私约束下的非参数回归最优联邦学习

摘要: 这篇论文研究了在分布在不同服务器上的样本之间进行联邦学习的非参数回归问题，每个服务器都遵循不同的差分隐私约束。我们考虑的设置是异构的，涵盖了不同的样本大小和服务器之间的差分隐私约束。在这个框架内，同时考虑了全局和逐点估计，并建立了在Besov空间上的最优收敛速度。提出了分布式隐私保护估计器，并研究了它们的风险特性。对于全局和逐点估计，我们建立了匹配的极小下界，最多相差一个对数因子。总的来说，这些发现揭示了统计准确性和隐私保护之间的权衡。特别地，我们不仅在隐私预算方面表征了妥协，还考虑了在整个隐私框架内分发数据所造成的损失。这一洞察力捕捉了这样一种智慧，即在更大的样本中更容易保持隐私，并探讨了在分布式隐私约束下逐点估计和全局估计之间的差异。

更新时间: 2024-06-10 19:34:07

领域: math.ST,cs.LG,stat.ML,stat.TH,62G08, 62C20, 68P27, 62F30,

下载: http://arxiv.org/abs/2406.06755v1

Complexity-Aware Deep Symbolic Regression with Robust Risk-Seeking Policy Gradients

This paper proposes a novel deep symbolic regression approach to enhance the robustness and interpretability of data-driven mathematical expression discovery. Despite the success of the state-of-the-art method, DSR, it is built on recurrent neural networks, purely guided by data fitness, and potentially meet tail barriers, which can zero out the policy gradient and cause inefficient model updates. To overcome these limitations, we use transformers in conjunction with breadth-first-search to improve the learning performance. We use Bayesian information criterion (BIC) as the reward function to explicitly account for the expression complexity and optimize the trade-off between interpretability and data fitness. We propose a modified risk-seeking policy that not only ensures the unbiasness of the gradient, but also removes the tail barriers, thus ensuring effective updates from top performers. Through a series of benchmarks and systematic experiments, we demonstrate the advantages of our approach.

Updated: 2024-06-10 19:29:10

标题: 具有鲁棒风险寻求策略梯度的复杂感知深度符号回归

摘要: 本文提出了一种新颖的深度符号回归方法，旨在增强数据驱动数学表达式发现的鲁棒性和可解释性。尽管最先进方法DSR取得了成功，但它是基于循环神经网络构建的，纯粹受数据适应性指导，并且可能遇到尾部障碍，这可能会使政策梯度归零并导致模型更新低效。为了克服这些限制，我们将transformers与广度优先搜索结合使用以提高学习性能。我们使用贝叶斯信息准则（BIC）作为奖励函数，明确考虑表达式复杂性并优化解释性和数据适应性之间的权衡。我们提出了一种修改后的风险寻求策略，不仅确保梯度的无偏性，还消除尾部障碍，从而确保来自表现最佳者的有效更新。通过一系列基准测试和系统实验，我们展示了我们方法的优势。

更新时间: 2024-06-10 19:29:10

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.06751v1

Toward Constraint Compliant Goal Formulation and Planning

One part of complying with norms, rules, and preferences is incorporating constraints (such as knowledge of ethics) into one's goal formulation and planning processing. We explore in a simple domain how the encoding of knowledge in different ethical frameworks influences an agent's goal formulation and planning processing and demonstrate ability of an agent to satisfy and satisfice when its collection of relevant constraints includes a mix of "hard" and "soft" constraints of various types. How the agent attempts to comply with ethical constraints depends on the ethical framing and we investigate tradeoffs between deontological framing and utilitarian framing for complying with an ethical norm. Representative scenarios highlight how performing the same task with different framings of the same norm leads to different behaviors. Our explorations suggest an important role for metacognitive judgments in resolving ethical conflicts during goal formulation and planning.

Updated: 2024-06-10 19:26:05

标题: 朝向遵循约束的目标制定和规划

摘要: 遵守规范、规则和偏好的一部分是将约束（如道德知识）纳入个人的目标制定和规划处理中。我们在一个简单的领域中探讨了不同道德框架中知识的编码如何影响代理人的目标制定和规划处理，并展示了代理人在相关约束集合中包含各种类型的“硬”和“软”约束时实现满足和满足的能力。代理人如何尝试遵守道德约束取决于道德框架，我们研究了为遵守道德规范而进行义务论框架和功利主义框架之间的权衡。代表性场景突显了以不同框架执行相同任务导致不同行为的情况。我们的探索表明，在目标制定和规划过程中解决道德冲突时，元认知判断发挥着重要作用。

更新时间: 2024-06-10 19:26:05

领域: cs.AI,I.2.11; I.2.8

下载: http://arxiv.org/abs/2405.12862v2

Federated Nonparametric Hypothesis Testing with Differential Privacy Constraints: Optimal Rates and Adaptive Tests

Federated learning has attracted significant recent attention due to its applicability across a wide range of settings where data is collected and analyzed across disparate locations. In this paper, we study federated nonparametric goodness-of-fit testing in the white-noise-with-drift model under distributed differential privacy (DP) constraints. We first establish matching lower and upper bounds, up to a logarithmic factor, on the minimax separation rate. This optimal rate serves as a benchmark for the difficulty of the testing problem, factoring in model characteristics such as the number of observations, noise level, and regularity of the signal class, along with the strictness of the $(\epsilon,\delta)$-DP requirement. The results demonstrate interesting and novel phase transition phenomena. Furthermore, the results reveal an interesting phenomenon that distributed one-shot protocols with access to shared randomness outperform those without access to shared randomness. We also construct a data-driven testing procedure that possesses the ability to adapt to an unknown regularity parameter over a large collection of function classes with minimal additional cost, all while maintaining adherence to the same set of DP constraints.

Updated: 2024-06-10 19:25:19

标题: 在具有差分隐私约束的联合非参数假设检验中：最佳速率和自适应测试

摘要: 联邦学习近年来引起了广泛关注，因为它适用于在不同地点收集和分析数据的各种情境。本文研究了在分布式差分隐私（DP）约束下的白噪声漂移模型中的联邦非参数拟合优度检验。我们首先建立了匹配的最小最大分离率的下限和上限，最多相差一个对数因子。这个最优率作为测试问题的难度基准，考虑了模型特征，如观测数量、噪声水平和信号类的正则性，以及$(\epsilon,\delta)$-DP要求的严格性。结果展示了有趣且新颖的相变现象。此外，结果揭示了一个有趣的现象，即具有共享随机性的分布式一次性协议优于没有共享随机性的协议。我们还构建了一个数据驱动的测试程序，能够在保持遵守相同一组DP约束的同时，适应大量函数类的未知正则性参数，而且成本增加很少。

更新时间: 2024-06-10 19:25:19

领域: math.ST,cs.LG,stat.ML,stat.TH,62G10, 62C20, 68P27, 62F30

下载: http://arxiv.org/abs/2406.06749v1

SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding

The landscape of publicly available vision foundation models (VFMs), such as CLIP and Segment Anything Model (SAM), is expanding rapidly. VFMs are endowed with distinct capabilities stemming from their pre-training objectives. For instance, CLIP excels in semantic understanding, while SAM specializes in spatial understanding for segmentation. In this work, we introduce a simple recipe to efficiently merge VFMs into a unified model that absorbs their expertise. Our method integrates techniques of multi-task learning, continual learning, and distillation. Further, it demands significantly less computational cost compared to traditional multi-task training from scratch, and it only needs a small fraction of the pre-training datasets that were initially used to train individual models. By applying our method to SAM and CLIP, we obtain SAM-CLIP: a unified model that combines the capabilities of SAM and CLIP into a single vision transformer. Compared with deploying SAM and CLIP independently, our merged model, SAM-CLIP, reduces storage and compute costs for inference, making it well-suited for edge device applications. We show that SAM-CLIP not only retains the foundational strengths of SAM and CLIP, but also introduces synergistic functionalities, notably in zero-shot semantic segmentation, where SAM-CLIP establishes new state-of-the-art results on 5 benchmarks. It outperforms previous models that are specifically designed for this task by a large margin, including +6.8% and +5.9% mean IoU improvement on Pascal-VOC and COCO-Stuff datasets, respectively.

Updated: 2024-06-10 19:19:16

标题: SAM-CLIP：将视觉基础模型合并为语义和空间理解

摘要: 公开可用的视觉基础模型（VFMs）的领域，例如CLIP和Segment Anything Model（SAM），正在迅速扩展。VFMs具有不同的能力，源自它们的预训练目标。例如，CLIP在语义理解方面表现出色，而SAM专注于分割的空间理解。在这项工作中，我们介绍了一种简单的方法，可以高效地将VFMs合并成一个统一的模型，吸收它们的专业知识。我们的方法集成了多任务学习、持续学习和蒸馏技术。此外，与传统的从头开始进行多任务训练相比，我们的方法需要显著更少的计算成本，而且只需要一小部分最初用于训练单独模型的预训练数据集。通过将我们的方法应用于SAM和CLIP，我们获得了SAM-CLIP：一个统一的模型，将SAM和CLIP的能力合并成一个视觉变压器。与独立部署SAM和CLIP相比，我们合并的模型SAM-CLIP减少了推断的存储和计算成本，使其非常适合边缘设备应用。我们展示了SAM-CLIP不仅保留了SAM和CLIP的基本优势，而且还引入了协同功能，特别是在零样本语义分割方面，在5个基准测试中，SAM-CLIP取得了新的最先进结果。它大大优于以前专门为此任务设计的模型，分别在Pascal-VOC和COCO-Stuff数据集上分别提高了+6.8%和+5.9%的平均IoU。

更新时间: 2024-06-10 19:19:16

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2310.15308v4

Multi-Objective Neural Architecture Search for In-Memory Computing

In this work, we employ neural architecture search (NAS) to enhance the efficiency of deploying diverse machine learning (ML) tasks on in-memory computing (IMC) architectures. Initially, we design three fundamental components inspired by the convolutional layers found in VGG and ResNet models. Subsequently, we utilize Bayesian optimization to construct a convolutional neural network (CNN) model with adaptable depths, employing these components. Through the Bayesian search algorithm, we explore a vast search space comprising over 640 million network configurations to identify the optimal solution, considering various multi-objective cost functions like accuracy/latency and accuracy/energy. Our evaluation of this NAS approach for IMC architecture deployment spans three distinct image classification datasets, demonstrating the effectiveness of our method in achieving a balanced solution characterized by high accuracy and reduced latency and energy consumption.

Updated: 2024-06-10 19:17:09

标题: 多目标神经架构搜索用于内存计算

摘要: 在这项工作中，我们利用神经架构搜索（NAS）来增强在内存计算（IMC）架构上部署多样化机器学习（ML）任务的效率。最初，我们设计了受VGG和ResNet模型中卷积层启发的三个基本组件。随后，我们利用贝叶斯优化构建了一个具有可调深度的卷积神经网络（CNN）模型，使用这些组件。通过贝叶斯搜索算法，我们探索了一个庞大的搜索空间，包括超过6.4亿个网络配置，以确定最佳解决方案，考虑各种多目标成本函数，如准确率/延迟和准确率/能耗。我们对这种NAS方法在IMC架构部署中的评估涵盖了三个不同的图像分类数据集，展示了我们方法在实现高准确率和减少延迟和能耗的平衡解决方案方面的有效性。

更新时间: 2024-06-10 19:17:09

领域: cs.LG,cs.ET

下载: http://arxiv.org/abs/2406.06746v1

GraphDreamer: Compositional 3D Scene Synthesis from Scene Graphs

As pretrained text-to-image diffusion models become increasingly powerful, recent efforts have been made to distill knowledge from these text-to-image pretrained models for optimizing a text-guided 3D model. Most of the existing methods generate a holistic 3D model from a plain text input. This can be problematic when the text describes a complex scene with multiple objects, because the vectorized text embeddings are inherently unable to capture a complex description with multiple entities and relationships. Holistic 3D modeling of the entire scene further prevents accurate grounding of text entities and concepts. To address this limitation, we propose GraphDreamer, a novel framework to generate compositional 3D scenes from scene graphs, where objects are represented as nodes and their interactions as edges. By exploiting node and edge information in scene graphs, our method makes better use of the pretrained text-to-image diffusion model and is able to fully disentangle different objects without image-level supervision. To facilitate modeling of object-wise relationships, we use signed distance fields as representation and impose a constraint to avoid inter-penetration of objects. To avoid manual scene graph creation, we design a text prompt for ChatGPT to generate scene graphs based on text inputs. We conduct both qualitative and quantitative experiments to validate the effectiveness of GraphDreamer in generating high-fidelity compositional 3D scenes with disentangled object entities.

Updated: 2024-06-10 19:08:03

标题: 梦境图：从场景图中合成的组合式3D场景

摘要: 随着预训练的文本到图像扩散模型变得越来越强大，最近开始努力从这些文本到图像预训练模型中提炼知识，以优化文本引导的3D模型。大多数现有方法都是从简单的文本输入生成整体性的3D模型。当文本描述一个具有多个对象的复杂场景时，这可能会出现问题，因为向量化的文本嵌入无法本质上捕捉具有多个实体和关系的复杂描述。整体性的3D建模进一步阻止了对文本实体和概念的准确基础。为了解决这一限制，我们提出了GraphDreamer，一个新颖的框架，用于从场景图中生成组合式3D场景，其中对象表示为节点，它们的交互作用表示为边。通过利用场景图中的节点和边信息，我们的方法更好地利用了预训练的文本到图像扩散模型，并能够完全解离不同的对象而无需图像级别的监督。为了便于对物体间关系进行建模，我们使用有符号距离场作为表示，并施加约束以避免物体的相互穿透。为了避免手动创建场景图，我们设计了一个文本提示，让ChatGPT基于文本输入生成场景图。我们进行了定性和定量实验，验证了GraphDreamer在生成具有解离对象实体的高保真组合式3D场景方面的有效性。

更新时间: 2024-06-10 19:08:03

领域: cs.CV,cs.GR,cs.LG

下载: http://arxiv.org/abs/2312.00093v2

Accelerating Scientific Discovery with Generative Knowledge Extraction, Graph-Based Representation, and Multimodal Intelligent Graph Reasoning

Leveraging generative Artificial Intelligence (AI), we have transformed a dataset comprising 1,000 scientific papers into an ontological knowledge graph. Through an in-depth structural analysis, we have calculated node degrees, identified communities and connectivities, and evaluated clustering coefficients and betweenness centrality of pivotal nodes, uncovering fascinating knowledge architectures. The graph has an inherently scale-free nature, is highly connected, and can be used for graph reasoning by taking advantage of transitive and isomorphic properties that reveal unprecedented interdisciplinary relationships that can be used to answer queries, identify gaps in knowledge, propose never-before-seen material designs, and predict material behaviors. We compute deep node embeddings for combinatorial node similarity ranking for use in a path sampling strategy links dissimilar concepts that have previously not been related. One comparison revealed structural parallels between biological materials and Beethoven's 9th Symphony, highlighting shared patterns of complexity through isomorphic mapping. In another example, the algorithm proposed a hierarchical mycelium-based composite based on integrating path sampling with principles extracted from Kandinsky's 'Composition VII' painting. The resulting material integrates an innovative set of concepts that include a balance of chaos/order, adjustable porosity, mechanical strength, and complex patterned chemical functionalization. We uncover other isomorphisms across science, technology and art, revealing a nuanced ontology of immanence that reveal a context-dependent heterarchical interplay of constituents. Graph-based generative AI achieves a far higher degree of novelty, explorative capacity, and technical detail, than conventional approaches and establishes a widely useful framework for innovation by revealing hidden connections.

Updated: 2024-06-10 19:06:26

标题: 用生成式知识提取、基于图的表示和多模态智能图推理加速科学发现

摘要: 借助生成式人工智能（AI），我们将一个包含1,000篇科学论文的数据集转化为一个本体知识图谱。通过深入的结构分析，我们计算了节点度，确定了社区和连接性，并评估了关键节点的聚类系数和介数中心性，揭示了迷人的知识架构。该图谱具有固有的无标度特性，连接性强，可以利用传递性和同构性质进行图推理，揭示前所未见的跨学科关系，用于回答查询，识别知识空白，提出前所未见的材料设计，并预测材料行为。我们计算深层节点嵌入以用于组合节点相似性排序，用于路径抽样策略，链接之前未相关的不同概念。一个比较揭示了生物材料和贝多芬第九交响曲之间的结构相似之处，通过同构映射突显了复杂性的共享模式。在另一个例子中，算法提出了基于分层菌丝体的复合材料，该材料整合了路径抽样和从康定斯基的“第七号构图”绘画中提取的原则。结果材料融合了一组创新概念，包括混沌/秩序的平衡，可调的多孔性，机械强度和复杂的图案化化学功能化。我们揭示了科学、技术和艺术领域之间的其他同构性，揭示了一个涵盖性的内在本体论，揭示了一个依赖于背景的异等交互作用的复杂结构。基于图的生成式AI实现了比传统方法更高的新颖性、探索能力和技术细节，并通过揭示隐藏的连接建立了一个广泛有用的创新框架。

更新时间: 2024-06-10 19:06:26

领域: cs.LG,cond-mat.mes-hall,cond-mat.mtrl-sci,cond-mat.soft,cs.AI,cs.CL

下载: http://arxiv.org/abs/2403.11996v3

Eliciting Problem Specifications via Large Language Models

Cognitive systems generally require a human to translate a problem definition into some specification that the cognitive system can use to attempt to solve the problem or perform the task. In this paper, we illustrate that large language models (LLMs) can be utilized to map a problem class, defined in natural language, into a semi-formal specification that can then be utilized by an existing reasoning and learning system to solve instances from the problem class. We present the design of LLM-enabled cognitive task analyst agent(s). Implemented with LLM agents, this system produces a definition of problem spaces for tasks specified in natural language. LLM prompts are derived from the definition of problem spaces in the AI literature and general problem-solving strategies (Polya's How to Solve It). A cognitive system can then use the problem-space specification, applying domain-general problem solving strategies ("weak methods" such as search), to solve multiple instances of problems from the problem class. This result, while preliminary, suggests the potential for speeding cognitive systems research via disintermediation of problem formulation while also retaining core capabilities of cognitive systems, such as robust inference and online learning.

Updated: 2024-06-10 19:05:57

标题: 通过大型语言模型引发问题规范

摘要: 认知系统通常需要人类将问题定义转化为某种规范，以便认知系统可以尝试解决问题或执行任务。在本文中，我们阐述了大型语言模型（LLMs）可以被用来将自然语言中定义的问题类映射为半正式规范，然后可以被现有的推理和学习系统利用来解决来自问题类的实例。我们展示了LLM能力的认知任务分析代理的设计。通过LLM代理实现，该系统可以为用自然语言指定的任务产生问题空间的定义。LLM提示来源于AI文献和一般问题解决策略（Polya的《如何解决问题》）中的问题空间定义。认知系统可以利用问题空间规范，应用领域通用的问题解决策略（例如搜索等“弱方法”）来解决来自问题类的多个问题实例。尽管这一结果还处于初步阶段，但它表明通过去中介化问题制定的潜力，同时保留认知系统的核心能力，如强大的推理和在线学习，可以加速认知系统研究的发展。

更新时间: 2024-06-10 19:05:57

领域: cs.AI,cs.CL,I.2.11; I.2.7

下载: http://arxiv.org/abs/2405.12147v2

A Multi-module Robust Method for Transient Stability Assessment against False Label Injection Cyberattacks

The success of deep learning in transient stability assessment (TSA) heavily relies on high-quality training data. However, the label information in TSA datasets is vulnerable to contamination through false label injection (FLI) cyberattacks, resulting in degraded performance of deep TSA models. To address this challenge, a Multi-Module Robust TSA method (MMR) is proposed to rectify the supervised training process misguided by FLI in an unsupervised manner. In MMR, a supervised classification module and an unsupervised clustering module are alternatively trained to improve the clustering friendliness of representation leaning, thereby achieving accurate clustering assignments. Leveraging the clustering assignments, we construct a training label corrector to rectify the injected false labels and progressively enhance robustness and resilience against FLI. However, there is still a gap on accuracy and convergence speed between MMR and FLI-free deep TSA models. To narrow this gap, we further propose a human-in-the-loop training strategy, named MMR-HIL. In MMR-HIL, potential false samples can be detected by modeling the training loss with a Gaussian distribution. From these samples, the most likely false samples and most ambiguous samples are re-labeled by a TSA experts guided bi-directional annotator and then subjected to penalized optimization, aimed at improving accuracy and convergence speed. Extensive experiments indicate that MMR and MMR-HIL both exhibit powerful robustness against FLI in TSA performance. Moreover, the contaminated labels can also be effectively corrected, demonstrating superior resilience of the proposed methods.

Updated: 2024-06-10 19:05:21

标题: 一个多模块的稳定性评估鲁棒方法，用于抵御虚假标签注入网络攻击

摘要: 深度学习在瞬态稳定性评估（TSA）中的成功严重依赖于高质量的训练数据。然而，在TSA数据集中的标签信息容易受到虚假标签注入（FLI）网络攻击的污染，导致深度TSA模型性能下降。为了解决这一挑战，提出了一种多模块鲁棒TSA方法（MMR），以无监督方式纠正受FLI误导的监督训练过程。在MMR中，交替训练一个监督分类模块和一个无监督聚类模块，以改善表征学习的聚类友好性，从而实现准确的聚类分配。利用聚类分配，构建一个训练标签校正器，纠正注入的虚假标签，并逐步增强对FLI的鲁棒性和弹性。然而，MMR和无FLI的深度TSA模型之间仍存在精确性和收敛速度的差距。为了缩小这一差距，进一步提出了一种人机协同训练策略，称为MMR-HIL。在MMR-HIL中，通过对训练损失建模为高斯分布来检测潜在的虚假样本。从这些样本中，通过TSA专家引导的双向注释者重新标记最有可能的虚假样本和最模糊的样本，然后进行受惩罚的优化，旨在提高精确性和收敛速度。大量实验表明，MMR和MMR-HIL在TSA性能中都表现出强大的对抗FLI的鲁棒性。此外，污染的标签也可以被有效地校正，展示了所提方法的优越弹性。

更新时间: 2024-06-10 19:05:21

领域: cs.LG,cs.CR,cs.SY,eess.SY

下载: http://arxiv.org/abs/2406.06744v1

An Elliptic Kernel Unsupervised Autoencoder-Graph Convolutional Network Ensemble Model for Hyperspectral Unmixing

Spectral Unmixing is an important technique in remote sensing used to analyze hyperspectral images to identify endmembers and estimate abundance maps. Over the past few decades, performance of techniques for endmember extraction and fractional abundance map estimation have significantly improved. This article presents an ensemble model workflow called Autoencoder Graph Ensemble Model (AEGEM) designed to extract endmembers and fractional abundance maps. An elliptical kernel is applied to measure spectral distances, generating the adjacency matrix within the elliptical neighborhood. This information is used to construct an elliptical graph, with centroids as senders and remaining pixels within the geometry as receivers. The next step involves stacking abundance maps, senders, and receivers as inputs to a Graph Convolutional Network, which processes this input to refine abundance maps. Finally, an ensemble decision-making process determines the best abundance maps based on root mean square error metric. The proposed AEGEM is assessed with benchmark datasets such as Samson, Jasper, and Urban, outperforming results obtained by baseline algorithms. For the Samson dataset, AEGEM excels in three abundance maps: water, tree and soil yielding values of 0.081, 0.158, and 0.182, respectively. For the Jasper dataset, results are improved for the tree and water endmembers with values of 0.035 and 0.060 in that order, as well as for the mean average of the spectral angle distance metric 0.109. For the Urban dataset, AEGEM outperforms previous results for the abundance maps of roof and asphalt, achieving values of 0.135 and 0.240, respectively. Additionally, for the endmembers of grass and roof, AEGEM achieves values of 0.063 and 0.094.

Updated: 2024-06-10 19:04:39

标题: 一个椭圆核非监督自编码器-图卷积网络集成模型用于高光谱解混。

摘要: 光谱解混是遥感中的一项重要技术，用于分析高光谱图像以识别端元并估算丰度图。在过去几十年中，端元提取和分数丰度图估计技术的性能显著提高。本文介绍了一种名为自动编码器图集成模型（AEGEM）的集成模型工作流程，旨在提取端元和分数丰度图。椭圆核被应用来衡量光谱距离，生成椭圆邻域内的邻接矩阵。这些信息被用来构建一个椭圆图，其中以质心作为发送者，几何内其余像素作为接收者。下一步涉及堆叠丰度图、发送者和接收者作为输入到图卷积网络，该网络处理此输入以精细化丰度图。最后，一个集成决策过程根据均方根误差指标确定最佳丰度图。提出的AEGEM使用基准数据集（如Samson、Jasper和Urban）进行评估，并优于基线算法获得的结果。对于Samson数据集，AEGEM在水、树木和土壤三个丰度图中表现出色，分别为0.081、0.158和0.182。对于Jasper数据集，树木和水端元的结果分别为0.035和0.060，以及光谱角距离平均值0.109。对于Urban数据集，AEGEM在屋顶和沥青的丰度图上优于先前结果，分别达到0.135和0.240。此外，对于草和屋顶的端元，AEGEM分别获得0.063和0.094的值。

更新时间: 2024-06-10 19:04:39

领域: cs.CV,cs.AI,eess.IV

下载: http://arxiv.org/abs/2406.06742v1

Scaling the Vocabulary of Non-autoregressive Models for Efficient Generative Retrieval

Generative Retrieval introduces a new approach to Information Retrieval by reframing it as a constrained generation task, leveraging recent advancements in Autoregressive (AR) language models. However, AR-based Generative Retrieval methods suffer from high inference latency and cost compared to traditional dense retrieval techniques, limiting their practical applicability. This paper investigates fully Non-autoregressive (NAR) language models as a more efficient alternative for generative retrieval. While standard NAR models alleviate latency and cost concerns, they exhibit a significant drop in retrieval performance (compared to AR models) due to their inability to capture dependencies between target tokens. To address this, we question the conventional choice of limiting the target token space to solely words or sub-words. We propose PIXAR, a novel approach that expands the target vocabulary of NAR models to include multi-word entities and common phrases (up to 5 million tokens), thereby reducing token dependencies. PIXAR employs inference optimization strategies to maintain low inference latency despite the significantly larger vocabulary. Our results demonstrate that PIXAR achieves a relative improvement of 31.0% in MRR@10 on MS MARCO and 23.2% in Hits@5 on Natural Questions compared to standard NAR models with similar latency and cost. Furthermore, online A/B experiments on a large commercial search engine show that PIXAR increases ad clicks by 5.08% and revenue by 4.02%.

Updated: 2024-06-10 19:01:15

标题: 为了高效的生成式检索，扩展非自回归模型的词汇量

摘要: 生成检索引入了一种新的信息检索方法，将其重新构建为受限的生成任务，利用了最近在自回归（AR）语言模型方面的进展。然而，基于AR的生成检索方法与传统的密集检索技术相比存在较高的推理延迟和成本，限制了它们的实际适用性。本文研究了完全非自回归（NAR）语言模型作为生成检索更高效的替代方法。虽然标准NAR模型缓解了延迟和成本方面的顾虑，但由于其无法捕捉目标标记之间的依赖关系，它们在检索性能方面表现出明显下降（与AR模型相比）。为解决这一问题，我们质疑了将目标标记空间限制仅为单词或子词的传统选择。我们提出了PIXAR，一种新颖的方法，将NAR模型的目标词汇扩展到包括多词实体和常见短语（最多500万个标记），从而减少标记依赖性。PIXAR采用推理优化策略，尽管词汇量显著增加，但仍保持低推理延迟。我们的结果表明，与具有类似延迟和成本的标准NAR模型相比，PIXAR在MS MARCO的MRR@10上实现了31.0％的相对改进，在自然问题的Hits@5上实现了23.2％的改进。此外，在一个大型商业搜索引擎上进行的在线A/B实验表明，PIXAR将广告点击增加了5.08％，收入增加了4.02％。

更新时间: 2024-06-10 19:01:15

领域: cs.CL,cs.IR,cs.LG

下载: http://arxiv.org/abs/2406.06739v1

Self-supervised network distillation: an effective approach to exploration in sparse reward environments

Reinforcement learning can solve decision-making problems and train an agent to behave in an environment according to a predesigned reward function. However, such an approach becomes very problematic if the reward is too sparse and so the agent does not come across the reward during the environmental exploration. The solution to such a problem may be to equip the agent with an intrinsic motivation that will provide informed exploration during which the agent is likely to also encounter external reward. Novelty detection is one of the promising branches of intrinsic motivation research. We present Self-supervised Network Distillation (SND), a class of intrinsic motivation algorithms based on the distillation error as a novelty indicator, where the predictor model and the target model are both trained. We adapted three existing self-supervised methods for this purpose and experimentally tested them on a set of ten environments that are considered difficult to explore. The results show that our approach achieves faster growth and higher external reward for the same training time compared to the baseline models, which implies improved exploration in a very sparse reward environment. In addition, the analytical methods we applied provide valuable explanatory insights into our proposed models.

Updated: 2024-06-10 19:00:08

标题: 自我监督网络蒸馏：稀疏奖励环境中探索的有效方法

摘要: 强化学习可以解决决策问题，并训练一个代理根据预先设计的奖励函数在环境中行为。然而，如果奖励太稀疏，代理在环境探索过程中不会遇到奖励，这种方法就会变得非常棘手。解决这个问题的方法可能是为代理配备一种内在动机，这种内在动机将在探索过程中提供有信息量的探索，代理在此过程中很可能也会遇到外部奖励。新颖性检测是内在动机研究中有前途的一个分支。我们提出了基于蒸馏误差作为新颖性指示器的一类内在动机算法——自监督网络蒸馏（SND），其中预测模型和目标模型都经过训练。我们为此目的改编了三种现有的自监督方法，并在一组被认为难以探索的十个环境上进行了实验测试。结果显示，与基线模型相比，我们的方法在相同训练时间内实现了更快的增长和更高的外部奖励，这意味着在非常稀疏奖励环境中实现了更好的探索。此外，我们应用的分析方法为我们提出的模型提供了宝贵的解释洞见。

更新时间: 2024-06-10 19:00:08

领域: cs.AI

下载: http://arxiv.org/abs/2302.11563v4

Raccoon: Prompt Extraction Benchmark of LLM-Integrated Applications

With the proliferation of LLM-integrated applications such as GPT-s, millions are deployed, offering valuable services through proprietary instruction prompts. These systems, however, are prone to prompt extraction attacks through meticulously designed queries. To help mitigate this problem, we introduce the Raccoon benchmark which comprehensively evaluates a model's susceptibility to prompt extraction attacks. Our novel evaluation method assesses models under both defenseless and defended scenarios, employing a dual approach to evaluate the effectiveness of existing defenses and the resilience of the models. The benchmark encompasses 14 categories of prompt extraction attacks, with additional compounded attacks that closely mimic the strategies of potential attackers, alongside a diverse collection of defense templates. This array is, to our knowledge, the most extensive compilation of prompt theft attacks and defense mechanisms to date. Our findings highlight universal susceptibility to prompt theft in the absence of defenses, with OpenAI models demonstrating notable resilience when protected. This paper aims to establish a more systematic benchmark for assessing LLM robustness against prompt extraction attacks, offering insights into their causes and potential countermeasures. Resources of Raccoon are publicly available at https://github.com/M0gician/RaccoonBench.

Updated: 2024-06-10 18:57:22

标题: 浣熊：基于LLM集成应用的快速提取基准

摘要: 随着集成了LLM的应用程序（如GPT-s）的大量增加，数百万应用程序被部署，通过专有的指令提示提供有价值的服务。然而，这些系统容易受到通过精心设计的查询进行的提示提取攻击。为了帮助缓解这一问题，我们引入了Raccoon基准，全面评估模型对提示提取攻击的易感性。我们的新颖评估方法在无防御和有防御的情况下评估模型，采用双重方法评估现有防御的有效性和模型的韧性。该基准包括14个类别的提示提取攻击，以及密切模仿潜在攻击者策略的复合攻击，以及多样化的防御模板。据我们所知，这个数组是迄今为止最全面的提示窃取攻击和防御机制的编译。我们的研究结果突显了在没有防御措施的情况下普遍易受提示窃取攻击的影响，而OpenAI模型在受到保护时表现出显著的韧性。本文旨在建立一个更系统的基准，评估LLM对提示提取攻击的抗性，为了解其原因和潜在对策提供见解。Raccoon的资源可在https://github.com/M0gician/RaccoonBench上公开获取。

更新时间: 2024-06-10 18:57:22

领域: cs.CR,cs.CL

下载: http://arxiv.org/abs/2406.06737v1

Long-Term Fairness Inquiries and Pursuits in Machine Learning: A Survey of Notions, Methods, and Challenges

The widespread integration of Machine Learning systems in daily life, particularly in high-stakes domains, has raised concerns about the fairness implications. While prior works have investigated static fairness measures, recent studies reveal that automated decision-making has long-term implications and that off-the-shelf fairness approaches may not serve the purpose of achieving long-term fairness. Additionally, the existence of feedback loops and the interaction between models and the environment introduces additional complexities that may deviate from the initial fairness goals. In this survey, we review existing literature on long-term fairness from different perspectives and present a taxonomy for long-term fairness studies. We highlight key challenges and consider future research directions, analyzing both current issues and potential further explorations.

Updated: 2024-06-10 18:57:06

标题: 机器学习中的长期公平调查与追求：概念、方法和挑战的调查

摘要: 机器学习系统在日常生活中的广泛应用，特别是在高风险领域，引发了人们对公平性影响的担忧。虽然以往的研究已经调查了静态公平性指标，但最近的研究表明，自动决策具有长期影响，通用的公平性方法可能无法实现长期公平性的目标。此外，反馈循环的存在以及模型与环境之间的相互作用引入了额外的复杂性，可能偏离最初的公平性目标。在这项调查中，我们从不同角度审查了现有关于长期公平性的文献，并提出了长期公平性研究的分类法。我们突出了关键挑战，并考虑了未来的研究方向，分析了当前的问题和可能的进一步探索。

更新时间: 2024-06-10 18:57:06

领域: cs.LG,cs.AI,cs.CY

下载: http://arxiv.org/abs/2406.06736v1

Detecting algorithmic bias in medical-AI models using trees

With the growing prevalence of machine learning and artificial intelligence-based medical decision support systems, it is equally important to ensure that these systems provide patient outcomes in a fair and equitable fashion. This paper presents an innovative framework for detecting areas of algorithmic bias in medical-AI decision support systems. Our approach efficiently identifies potential biases in medical-AI models, specifically in the context of sepsis prediction, by employing the Classification and Regression Trees (CART) algorithm with conformity scores. We verify our methodology by conducting a series of synthetic data experiments, showcasing its ability to estimate areas of bias in controlled settings precisely. The effectiveness of the concept is further validated by experiments using electronic medical records from Grady Memorial Hospital in Atlanta, Georgia. These tests demonstrate the practical implementation of our strategy in a clinical environment, where it can function as a vital instrument for guaranteeing fairness and equity in AI-based medical decisions.

Updated: 2024-06-10 18:55:41

标题: 使用树结构检测医疗人工智能模型中的算法偏见

摘要: 随着机器学习和人工智能医疗决策支持系统的普及，确保这些系统以公平和公正的方式提供患者结果同样重要。本文提出了一种创新的框架，用于检测医疗人工智能决策支持系统中的算法偏见。我们的方法通过使用分类和回归树（CART）算法和一致性分数，有效地识别医疗人工智能模型中潜在的偏见，特别是在败血症预测的背景下。我们通过进行一系列合成数据实验来验证我们的方法论，展示其在受控环境中准确估计偏见区域的能力。该概念的有效性进一步通过使用亚特兰大乔治亚州格雷迪纪念医院的电子病历进行实验验证。这些测试展示了我们的策略在临床环境中的实际实施，它可以作为确保AI医疗决策公平和公正的重要工具。

更新时间: 2024-06-10 18:55:41

领域: stat.ML,cs.CY,cs.LG,stat.AP

下载: http://arxiv.org/abs/2312.02959v6

TRINS: Towards Multimodal Language Models that Can Read

Large multimodal language models have shown remarkable proficiency in understanding and editing images. However, a majority of these visually-tuned models struggle to comprehend the textual content embedded in images, primarily due to the limitation of training data. In this work, we introduce TRINS: a Text-Rich image INStruction dataset, with the objective of enhancing the reading ability of the multimodal large language model. TRINS is built upon LAION using hybrid data annotation strategies that include machine-assisted and human-assisted annotation processes. It contains 39,153 text-rich images, captions, and 102,437 questions. Specifically, we show that the number of words per annotation in TRINS is significantly longer than that of related datasets, providing new challenges. Furthermore, we introduce a simple and effective architecture, called a Language-vision Reading Assistant (LaRA), which is good at understanding textual content within images. LaRA outperforms existing state-of-the-art multimodal large language models on the TRINS dataset, as well as other classical benchmarks. Lastly, we conducted a comprehensive evaluation with TRINS on various text-rich image understanding and generation tasks, demonstrating its effectiveness.

Updated: 2024-06-10 18:52:37

标题: TRINS: 朝着能够阅读的多模态语言模型

摘要: 大型多模态语言模型已经显示出在理解和编辑图像方面的出色能力。然而，大多数这些经过视觉调整的模型很难理解嵌入图像中的文本内容，主要是由于训练数据的限制。在这项工作中，我们介绍了TRINS：一个文本丰富的图像指令数据集，旨在增强多模态大型语言模型的阅读能力。TRINS建立在LAION基础上，使用包括机器辅助和人工辅助标注过程在内的混合数据标注策略。它包含39,153幅文本丰富的图像、标题和102,437个问题。具体来说，我们展示了TRINS中每个标注的字数显著长于相关数据集，提供了新的挑战。此外，我们引入了一种简单而有效的架构，称为语言-视觉阅读助手（LaRA），擅长理解图像中的文本内容。LaRA在TRINS数据集以及其他经典基准测试上胜过了现有的最先进的多模态大型语言模型。最后，我们对TRINS进行了全面评估，展示了它在各种文本丰富的图像理解和生成任务中的有效性。

更新时间: 2024-06-10 18:52:37

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2406.06730v1

Synthetic Query Generation using Large Language Models for Virtual Assistants

Virtual Assistants (VAs) are important Information Retrieval platforms that help users accomplish various tasks through spoken commands. The speech recognition system (speech-to-text) uses query priors, trained solely on text, to distinguish between phonetically confusing alternatives. Hence, the generation of synthetic queries that are similar to existing VA usage can greatly improve upon the VA's abilities -- especially for use-cases that do not (yet) occur in paired audio/text data. In this paper, we provide a preliminary exploration of the use of Large Language Models (LLMs) to generate synthetic queries that are complementary to template-based methods. We investigate whether the methods (a) generate queries that are similar to randomly sampled, representative, and anonymized user queries from a popular VA, and (b) whether the generated queries are specific. We find that LLMs generate more verbose queries, compared to template-based methods, and reference aspects specific to the entity. The generated queries are similar to VA user queries, and are specific enough to retrieve the relevant entity. We conclude that queries generated by LLMs and templates are complementary.

Updated: 2024-06-10 18:50:57

标题: 使用大型语言模型生成合成查询以用于虚拟助手

摘要: 虚拟助手（VAs）是重要的信息检索平台，通过口头命令帮助用户完成各种任务。语音识别系统（语音转文本）使用仅在文本上训练的查询先验来区分音标混淆的替代方案。因此，生成类似于现有VA使用的合成查询可以极大地提高VA的能力--特别是对于尚未出现在配对音频/文本数据中的用例。在本文中，我们对使用大型语言模型（LLMs）生成与基于模板方法互补的合成查询进行了初步探讨。我们调查这些方法是否（a）生成与从流行VA中随机抽样、代表性且匿名化的用户查询相似的查询，以及（b）生成的查询是否具体。我们发现，与基于模板的方法相比，LLMs生成更冗长的查询，并引用与实体相关的方面。生成的查询类似于VA用户查询，且具体到足以检索相关实体。我们得出结论，LLMs和模板生成的查询是互补的。

更新时间: 2024-06-10 18:50:57

领域: cs.IR,cs.AI,cs.CL

下载: http://arxiv.org/abs/2406.06729v1

AI-Driven Predictive Analytics Approach for Early Prognosis of Chronic Kidney Disease Using Ensemble Learning and Explainable AI

Chronic Kidney Disease (CKD) is one of the widespread Chronic diseases with no known ultimo cure and high morbidity. Research demonstrates that progressive Chronic Kidney Disease (CKD) is a heterogeneous disorder that significantly impacts kidney structure and functions, eventually leading to kidney failure. With the progression of time, chronic kidney disease has moved from a life-threatening disease affecting few people to a common disorder of varying severity. The goal of this research is to visualize dominating features, feature scores, and values exhibited for early prognosis and detection of CKD using ensemble learning and explainable AI. For that, an AI-driven predictive analytics approach is proposed to aid clinical practitioners in prescribing lifestyle modifications for individual patients to reduce the rate of progression of this disease. Our dataset is collected on body vitals from individuals with CKD and healthy subjects to develop our proposed AI-driven solution accurately. In this regard, blood and urine test results are provided, and ensemble tree-based machine-learning models are applied to predict unseen cases of CKD. Our research findings are validated after lengthy consultations with nephrologists. Our experiments and interpretation results are compared with existing explainable AI applications in various healthcare domains, including CKD. The comparison shows that our developed AI models, particularly the Random Forest model, have identified more features as significant contributors than XgBoost. Interpretability (I), which measures the ratio of important to masked features, indicates that our XgBoost model achieved a higher score, specifically a Fidelity of 98\%, in this metric and naturally in the FII index compared to competing models.

Updated: 2024-06-10 18:46:14

标题: 基于AI的预测分析方法，利用集成学习和可解释AI对慢性肾病早期预后的预测

摘要: 慢性肾病（CKD）是一种普遍慢性疾病，没有已知的终极治疗方法，并且有很高的患病率。研究表明，进行性慢性肾病（CKD）是一种影响肾脏结构和功能的异质性疾病，最终导致肾功能衰竭。随着时间的推移，慢性肾病已经从一种威胁生命的疾病发展为一种严重程度不同的常见疾病。本研究的目标是利用集成学习和可解释人工智能来可视化慢性肾病的主要特征、特征得分和数值，以便进行早期预测和检测CKD。为此，提出了一种以人工智能驱动的预测分析方法，以帮助临床从业者为个体患者开具生活方式改变处方，以减少该疾病的进展速度。我们的数据集收集了患有CKD和健康受试者的身体重要指标，以准确开发我们提出的人工智能驱动解决方案。在这方面，提供了血液和尿液检测结果，并应用集成树型机器学习模型来预测未见病例的CKD。我们的研究结果经过与肾病学专家长时间的磋商验证。我们的实验和解释结果与各种医疗领域的现有可解释人工智能应用进行了比较，包括CKD。比较显示，我们开发的AI模型，特别是随机森林模型，确定了更多特征作为重要贡献者，而不是XgBoost。可解释性（I）衡量了重要特征与掩盖特征的比例，表明我们的XgBoost模型在此指标上获得了更高的得分，特别是在FII指数上，自然比竞争模型表现更好，具有98\%的忠实度。

更新时间: 2024-06-10 18:46:14

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.06728v1

AnyV2V: A Tuning-Free Framework For Any Video-to-Video Editing Tasks

In the dynamic field of digital content creation using generative models, state-of-the-art video editing models still do not offer the level of quality and control that users desire. Previous works on video editing either extended from image-based generative models in a zero-shot manner or necessitated extensive fine-tuning, which can hinder the production of fluid video edits. Furthermore, these methods frequently rely on textual input as the editing guidance, leading to ambiguities and limiting the types of edits they can perform. Recognizing these challenges, we introduce AnyV2V, a novel tuning-free paradigm designed to simplify video editing into two primary steps: (1) employing an off-the-shelf image editing model to modify the first frame, (2) utilizing an existing image-to-video generation model to generate the edited video through temporal feature injection. AnyV2V can leverage any existing image editing tools to support an extensive array of video editing tasks, including prompt-based editing, reference-based style transfer, subject-driven editing, and identity manipulation, which were unattainable by previous methods. AnyV2V can also support any video length. Our evaluation indicates that AnyV2V significantly outperforms other baseline methods in automatic and human evaluations by significant margin, maintaining visual consistency with the source video while achieving high-quality edits across all the editing tasks.

Updated: 2024-06-10 18:38:00

标题: AnyV2V：一种无需调整的框架，适用于任何视频到视频编辑任务

摘要: 在数字内容创作领域中使用生成模型进行视频编辑的动态领域中，尽管最先进的视频编辑模型仍未提供用户期望的质量和控制水平。先前关于视频编辑的研究要么是在零镜头方式下延伸自基于图像的生成模型，要么需要进行大量微调，这可能会妨碍流畅视频编辑的制作。此外，这些方法经常依赖文本输入作为编辑指导，导致歧义并限制它们可以执行的编辑类型。鉴于这些挑战，我们引入了AnyV2V，这是一种新颖的无调谐范例，旨在将视频编辑简化为两个主要步骤：（1）利用现成的图像编辑模型修改第一帧，（2）通过时间特征注入利用现有的图像到视频生成模型生成编辑后的视频。AnyV2V可以利用任何现有的图像编辑工具来支持广泛的视频编辑任务，包括基于提示的编辑、基于参考的风格转移、受主导的编辑和身份操纵，这是以前的方法无法达到的。AnyV2V还可以支持任何视频长度。我们的评估表明，AnyV2V在自动评估和人类评估中明显优于其他基线方法，并在所有编辑任务中保持与源视频的视觉一致性，同时实现高质量的编辑。

更新时间: 2024-06-10 18:38:00

领域: cs.CV,cs.AI,cs.MM

下载: http://arxiv.org/abs/2403.14468v3

Optimization-based Causal Estimation from Heterogenous Environments

This paper presents a new optimization approach to causal estimation. Given data that contains covariates and an outcome, which covariates are causes of the outcome, and what is the strength of the causality? In classical machine learning (ML), the goal of optimization is to maximize predictive accuracy. However, some covariates might exhibit a non-causal association with the outcome. Such spurious associations provide predictive power for classical ML, but they prevent us from causally interpreting the result. This paper proposes CoCo, an optimization algorithm that bridges the gap between pure prediction and causal inference. CoCo leverages the recently-proposed idea of environments, datasets of covariates/response where the causal relationships remain invariant but where the distribution of the covariates changes from environment to environment. Given datasets from multiple environments-and ones that exhibit sufficient heterogeneity-CoCo maximizes an objective for which the only solution is the causal solution. We describe the theoretical foundations of this approach and demonstrate its effectiveness on simulated and real datasets. Compared to classical ML and existing methods, CoCo provides more accurate estimates of the causal model and more accurate predictions under interventions.

Updated: 2024-06-10 18:32:38

标题: 基于优化的异质环境因果估计

摘要: 这篇论文提出了一种新的因果估计优化方法。给定包含协变量和结果的数据，哪些协变量是结果的原因，以及原因的强度是多少？在经典机器学习（ML）中，优化的目标是最大化预测准确性。然而，一些协变量可能表现出与结果的非因果关联。这种虚假关联为经典ML提供了预测能力，但阻止我们对结果进行因果解释。本文提出了CoCo，一种优化算法，可以弥合纯预测和因果推断之间的差距。CoCo利用最近提出的环境概念，即协变量/响应数据集，在这些数据集中，因果关系保持不变，但协变量的分布从一个环境到另一个环境会发生变化。给定来自多个环境的数据集，并且这些数据集展现出足够的异质性，CoCo最大化一个目标，其唯一解是因果解。我们描述了这种方法的理论基础，并展示了它在模拟和真实数据集上的有效性。与经典ML和现有方法相比，CoCo提供了对因果模型更准确的估计，并在干预下提供了更准确的预测。

更新时间: 2024-06-10 18:32:38

领域: stat.ME,cs.LG,stat.ML

下载: http://arxiv.org/abs/2109.11990v3

Coprocessor Actor Critic: A Model-Based Reinforcement Learning Approach For Adaptive Brain Stimulation

Adaptive brain stimulation can treat neurological conditions such as Parkinson's disease and post-stroke motor deficits by influencing abnormal neural activity. Because of patient heterogeneity, each patient requires a unique stimulation policy to achieve optimal neural responses. Model-free reinforcement learning (MFRL) holds promise in learning effective policies for a variety of similar control tasks, but is limited in domains like brain stimulation by a need for numerous costly environment interactions. In this work we introduce Coprocessor Actor Critic, a novel, model-based reinforcement learning (MBRL) approach for learning neural coprocessor policies for brain stimulation. Our key insight is that coprocessor policy learning is a combination of learning how to act optimally in the world and learning how to induce optimal actions in the world through stimulation of an injured brain. We show that our approach overcomes the limitations of traditional MFRL methods in terms of sample efficiency and task success and outperforms baseline MBRL approaches in a neurologically realistic model of an injured brain.

Updated: 2024-06-10 18:23:03

标题: 协处理器演员评论家：一种基于模型的自适应大脑刺激强化学习方法

摘要: 自适应脑部刺激可以治疗帕金森病和中风后的运动障碍等神经系统疾病，通过影响异常的神经活动。由于患者的异质性，每个患者都需要一个独特的刺激策略来实现最佳的神经反应。无模型强化学习（MFRL）在学习各种类似控制任务的有效策略方面表现出潜力，但在脑刺激等领域受到需要大量昂贵的环境交互的限制。在这项工作中，我们介绍了Coprocessor Actor Critic，这是一种新颖的基于模型的强化学习（MBRL）方法，用于学习神经协处理器策略以进行脑部刺激。我们的关键洞察是，协处理器策略学习是学习如何在世界中采取最优行动以及学习如何通过刺激受伤的大脑诱导最优行动的组合。我们展示了我们的方法在样本效率和任务成功方面克服了传统MFRL方法的局限，并在受损大脑的神经学模型中胜过基线MBRL方法。

更新时间: 2024-06-10 18:23:03

领域: cs.LG,cs.AI,cs.HC

下载: http://arxiv.org/abs/2406.06714v1

Video-based Exercise Classification and Activated Muscle Group Prediction with Hybrid X3D-SlowFast Network

This paper introduces a simple yet effective strategy for exercise classification and muscle group activation prediction (MGAP). These tasks have significant implications for personal fitness, facilitating more affordable, accessible, safer, and simpler exercise routines. This is particularly relevant for novices and individuals with disabilities. Previous research in the field is mostly dominated by the reliance on mounted sensors and a limited scope of exercises, reducing practicality for everyday use. Furthermore, existing MGAP methodologies suffer from a similar dependency on sensors and a restricted range of muscle groups, often excluding strength training exercises, which are pivotal for a comprehensive fitness regimen. Addressing these limitations, our research employs a video-based deep learning framework that encompasses a broad spectrum of exercises and muscle groups, including those vital for strength training. Utilizing the "Workout/Exercises Video" dataset, our approach integrates the X3D and SlowFast video activity recognition models in an effective way to enhance exercise classification and MGAP performance. Our findings demonstrate that this hybrid method obtained via weighted ensemble outperforms existing baseline models in accuracy. Pretrained models play a crucial role in enhancing overall performance, with optimal channel reduction values for the SlowFast model identified near 10. Through an ablation study that explores fine-tuning, we further elucidate the interrelation between the two tasks. Our composite model, a weighted-average ensemble of X3D and SlowFast, sets a new benchmark in both exercise classification and MGAP across all evaluated categories, offering a robust solution to the limitations of previous approaches.

Updated: 2024-06-10 18:05:02

标题: 基于视频的运动分类和混合X3D-SlowFast网络的激活肌肉组预测

摘要: 该论文介绍了一种简单而有效的策略，用于运动分类和肌肉群激活预测（MGAP）。这些任务对于个人健身具有重要意义，可以促进更加经济、便捷、安全和简单的运动计划。这对于新手和残疾人士尤为重要。先前在该领域的研究主要依赖于固定传感器和有限范围的运动，降低了日常使用的实用性。此外，现有的MGAP方法学也存在类似的传感器依赖性和肌肉群范围受限的问题，通常排除了对于全面健身计划至关重要的力量训练运动。针对这些局限性，我们的研究采用了基于视频的深度学习框架，涵盖了广泛的运动和肌肉群，包括对于力量训练至关重要的肌肉群。利用“锻炼/运动视频”数据集，我们的方法有效地整合了X3D和SlowFast视频活动识别模型，以增强运动分类和MGAP性能。我们的研究结果表明，通过加权集成获得的混合方法在准确性方面优于现有的基线模型。预训练模型在提高整体性能方面发挥了至关重要的作用，SlowFast模型的最佳通道缩减值确定在约10附近。通过探讨微调的消融研究，我们进一步阐明了这两个任务之间的相互关系。我们的复合模型，X3D和SlowFast的加权平均集成，为运动分类和MGAP在所有评估类别上设立了新的基准，为以前方法的局限性提供了强大的解决方案。

更新时间: 2024-06-10 18:05:02

领域: cs.CV,cs.LG,I.2.10; I.4.8

下载: http://arxiv.org/abs/2406.06703v1

Forget Sharpness: Perturbed Forgetting of Model Biases Within SAM Dynamics

Despite attaining high empirical generalization, the sharpness of models trained with sharpness-aware minimization (SAM) do not always correlate with generalization error. Instead of viewing SAM as minimizing sharpness to improve generalization, our paper considers a new perspective based on SAM's training dynamics. We propose that perturbations in SAM perform perturbed forgetting, where they discard undesirable model biases to exhibit learning signals that generalize better. We relate our notion of forgetting to the information bottleneck principle, use it to explain observations like the better generalization of smaller perturbation batches, and show that perturbed forgetting can exhibit a stronger correlation with generalization than flatness. While standard SAM targets model biases exposed by the steepest ascent directions, we propose a new perturbation that targets biases exposed through the model's outputs. Our output bias forgetting perturbations outperform standard SAM, GSAM, and ASAM on ImageNet, robustness benchmarks, and transfer to CIFAR-{10,100}, while sometimes converging to sharper regions. Our results suggest that the benefits of SAM can be explained by alternative mechanistic principles that do not require flatness of the loss surface.

Updated: 2024-06-10 18:02:48

标题: 忘记锐度：SAM动态中模型偏差的扰动遗忘

摘要: 尽管采用尖锐感知最小化（SAM）训练的模型达到了高度的经验概括，但模型的尖锐度并不总是与泛化错误相关。我们的论文不将SAM视为通过最小化尖锐度来改进泛化，而是考虑基于SAM训练动态的新视角。我们提出在SAM中的扰动执行扰动性遗忘，其中它们丢弃不良模型偏差，以展示更好的泛化学习信号。我们将遗忘的概念与信息瓶颈原则联系起来，用它来解释观察到的类似更小扰动批次泛化更好的现象，并显示扰动性遗忘可能比平坦性与泛化更强相关。尽管标准SAM目标是模型偏差暴露出的最陡上升方向，我们提出了一种新的扰动，该扰动目标是通过模型输出暴露出的偏差。我们的输出偏差遗忘扰动在ImageNet、鲁棒性基准和转移到CIFAR-10、100上的表现优于标准SAM、GSAM和ASAM，有时会收敛到更尖锐的区域。我们的结果表明，SAM的好处可以通过不需要损失表面平坦性的替代机械原则来解释。

更新时间: 2024-06-10 18:02:48

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.06700v1

ICSVR: Investigating Compositional and Syntactic Understanding in Video Retrieval Models

Video retrieval (VR) involves retrieving the ground truth video from the video database given a text caption or vice-versa. The two important components of compositionality: objects & attributes and actions are joined using correct syntax to form a proper text query. These components (objects & attributes, actions and syntax) each play an important role to help distinguish among videos and retrieve the correct ground truth video. However, it is unclear what is the effect of these components on the video retrieval performance. We therefore, conduct a systematic study to evaluate the compositional and syntactic understanding of video retrieval models on standard benchmarks such as MSRVTT, MSVD and DIDEMO. The study is performed on two categories of video retrieval models: (i) which are pre-trained on video-text pairs and fine-tuned on downstream video retrieval datasets (Eg. Frozen-in-Time, Violet, MCQ etc.) (ii) which adapt pre-trained image-text representations like CLIP for video retrieval (Eg. CLIP4Clip, XCLIP, CLIP2Video etc.). Our experiments reveal that actions and syntax play a minor role compared to objects & attributes in video understanding. Moreover, video retrieval models that use pre-trained image-text representations (CLIP) have better syntactic and compositional understanding as compared to models pre-trained on video-text data. The code is available at https://github.com/IntelLabs/multimodal_cognitive_ai/tree/main/ICSVR

Updated: 2024-06-10 18:02:43

标题: ICSVR：研究视频检索模型中的组成和句法理解

摘要: 视频检索（VR）涉及在给定文本标题或反之的情况下，从视频数据库中检索真实视频。组成性的两个重要组成部分：对象和属性以及动作，使用正确的语法连接在一起形成适当的文本查询。这些组件（对象和属性，动作和语法）每个都发挥重要作用，有助于区分视频并检索正确的真实视频。然而，目前尚不清楚这些组件对视频检索性能的影响。因此，我们进行了一项系统研究，评估了视频检索模型在MSRVTT、MSVD和DIDEMO等标准基准数据集上对组成和语法的理解。研究针对两类视频检索模型进行：（i）预先在视频文本对上进行训练，然后在下游视频检索数据集上进行微调（例如Frozen-in-Time、Violet、MCQ等）；（ii）使用预训练的图像文本表示（如CLIP）进行视频检索（例如CLIP4Clip、XCLIP、CLIP2Video等）。我们的实验表明，在视频理解中，动作和语法相对于对象和属性起到次要作用。此外，使用预训练的图像文本表示（CLIP）的视频检索模型相比于在视频文本数据上进行预训练的模型具有更好的语法和组成理解。代码可在https://github.com/IntelLabs/multimodal_cognitive_ai/tree/main/ICSVR上找到。

更新时间: 2024-06-10 18:02:43

领域: cs.CV,cs.AI,cs.CL

下载: http://arxiv.org/abs/2306.16533v3

In-Context Learning and Fine-Tuning GPT for Argument Mining

Large Language Models (LLMs) have become ubiquitous in NLP and deep learning. In-Context Learning (ICL) has been suggested as a bridging paradigm between the training-free and fine-tuning LLMs settings. In ICL, an LLM is conditioned to solve tasks by means of a few solved demonstration examples included as prompt. Argument Mining (AM) aims to extract the complex argumentative structure of a text, and Argument Type Classification (ATC) is an essential sub-task of AM. We introduce an ICL strategy for ATC combining kNN-based examples selection and majority vote ensembling. In the training-free ICL setting, we show that GPT-4 is able to leverage relevant information from only a few demonstration examples and achieve very competitive classification accuracy on ATC. We further set up a fine-tuning strategy incorporating well-crafted structural features given directly in textual form. In this setting, GPT-3.5 achieves state-of-the-art performance on ATC. Overall, these results emphasize the emergent ability of LLMs to grasp global discursive flow in raw text in both off-the-shelf and fine-tuned setups.

Updated: 2024-06-10 18:01:55

标题: 在上下文学习和微调GPT用于论证挖掘

摘要: 大型语言模型(LLMs)已经在自然语言处理和深度学习中变得无处不在。在上下文学习(ICL)被提出作为训练无需和微调LLMs设置之间的桥梁范式。在ICL中，LLM被设定条件解决任务，通过几个已解决的示例作为提示。论证挖掘(AM)旨在提取文本的复杂论证结构，而论证类型分类(ATC)是AM的一个重要子任务。我们介绍了一个用于ATC的ICL策略，结合了基于kNN的示例选择和多数投票集成。在无需训练的ICL设置中，我们展示了GPT-4能够从仅有的几个示例中利用相关信息，并在ATC上实现非常有竞争力的分类准确性。我们进一步建立了一个微调策略，将直接以文本形式给出的精心设计的结构特征纳入其中。在这种设置下，GPT-3.5在ATC上实现了最先进的性能。总的来说，这些结果强调了LLMs在原始文本中把握全局话语流的新兴能力，无论是在现成的还是微调的设置中。

更新时间: 2024-06-10 18:01:55

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2406.06699v1

Controlling Counterfactual Harm in Decision Support Systems Based on Prediction Sets

Decision support systems based on prediction sets help humans solve multiclass classification tasks by narrowing down the set of potential label values to a subset of them, namely a prediction set, and asking them to always predict label values from the prediction sets. While this type of systems have been proven to be effective at improving the average accuracy of the predictions made by humans, by restricting human agency, they may cause harm$\unicode{x2014}$a human who has succeeded at predicting the ground-truth label of an instance on their own may have failed had they used these systems. In this paper, our goal is to control how frequently a decision support system based on prediction sets may cause harm, by design. To this end, we start by characterizing the above notion of harm using the theoretical framework of structural causal models. Then, we show that, under a natural, albeit unverifiable, monotonicity assumption, we can estimate how frequently a system may cause harm using only predictions made by humans on their own. Further, we also show that, under a weaker monotonicity assumption, which can be verified experimentally, we can bound how frequently a system may cause harm again using only predictions made by humans on their own. Building upon these assumptions, we introduce a computational framework to design decision support systems based on prediction sets that are guaranteed to cause harm less frequently than a user-specified value using conformal risk control. We validate our framework using real human predictions from two different human subject studies and show that, in decision support systems based on prediction sets, there is a trade-off between accuracy and counterfactual harm.

Updated: 2024-06-10 18:00:00

标题: 基于预测集的决策支持系统中控制反事实伤害

摘要: 基于预测集的决策支持系统帮助人类通过将潜在标签值集合缩小到其中的一个子集，即预测集，来解决多类别分类任务，并要求他们始终从预测集中预测标签值。虽然这种类型的系统已被证明有效地提高了人类所做预测的平均准确率，但通过限制人类的行为，它们可能会导致伤害——一个能够独自成功预测实例的真实标签的人可能会在使用这些系统时失败。在本文中，我们的目标是通过设计控制基于预测集的决策支持系统可能造成伤害的频率。为此，我们首先使用结构因果模型的理论框架来描述上述伤害概念。然后，我们展示，在一个自然但不能验证的单调性假设下，我们可以仅通过人类独立做出的预测来估计系统可能造成伤害的频率。此外，我们还展示，在一个较弱但可通过实验证明的单调性假设下，我们可以再次通过仅使用人类独立做出的预测来限制系统可能造成伤害的频率。基于这些假设，我们引入了一个计算框架，设计基于预测集的决策支持系统，保证比用户指定的值更少地造成伤害，使用符合风险控制。我们使用来自两项不同人类主体研究的真实人类预测来验证我们的框架，并展示，在基于预测集的决策支持系统中，准确性和反事实伤害之间存在权衡。

更新时间: 2024-06-10 18:00:00

领域: cs.LG,cs.CY,cs.HC,stat.ME

下载: http://arxiv.org/abs/2406.06671v1

IllumiNeRF: 3D Relighting without Inverse Rendering

Existing methods for relightable view synthesis -- using a set of images of an object under unknown lighting to recover a 3D representation that can be rendered from novel viewpoints under a target illumination -- are based on inverse rendering, and attempt to disentangle the object geometry, materials, and lighting that explain the input images. Furthermore, this typically involves optimization through differentiable Monte Carlo rendering, which is brittle and computationally-expensive. In this work, we propose a simpler approach: we first relight each input image using an image diffusion model conditioned on lighting and then reconstruct a Neural Radiance Field (NeRF) with these relit images, from which we render novel views under the target lighting. We demonstrate that this strategy is surprisingly competitive and achieves state-of-the-art results on multiple relighting benchmarks. Please see our project page at https://illuminerf.github.io/.

Updated: 2024-06-10 17:59:59

标题: IllumiNeRF：无需逆向渲染的3D重新照明

摘要: 现有的可重光视角合成方法——利用一组对象在未知光照下的图像恢复一个可以在目标照明下从新视角渲染的3D表示——基于反渲染，并试图分离解释输入图像的对象几何、材料和光照。此外，这通常涉及通过可微分蒙特卡洛渲染进行优化，这是脆弱且计算昂贵的。在这项工作中，我们提出了一种更简单的方法：我们首先使用一个以光照为条件的图像扩散模型对每个输入图像进行重新照明，然后利用这些重新照明的图像重建一个神经辐射场（NeRF），从中我们可以在目标光照下渲染新的视角。我们展示了这种策略出人意料地具有竞争力，并在多个重照基准测试中取得了最先进的结果。请查看我们的项目页面：https://illuminerf.github.io/。

更新时间: 2024-06-10 17:59:59

领域: cs.CV,cs.AI,cs.GR

下载: http://arxiv.org/abs/2406.06527v1

CounterCurate: Enhancing Physical and Semantic Visio-Linguistic Compositional Reasoning via Counterfactual Examples

We propose CounterCurate, a framework to comprehensively improve the visio-linguistic compositional reasoning capability for both contrastive and generative multimodal models. In particular, we identify two critical under-explored problems: the neglect of the physically grounded reasoning (counting and position understanding) and the potential of using highly capable text and image generation models for semantic counterfactual fine-tuning. Our work pioneers an approach that addresses these gaps. We first spotlight the near-chance performance of multimodal models like CLIP and LLaVA in physically grounded compositional reasoning. We then apply simple data augmentation using grounded image generation model GLIGEN to generate fine-tuning data, resulting in significant performance improvements: +33% and +37% for CLIP and LLaVA, respectively, on our newly curated Flickr30k-Positions benchmark. Moreover, we exploit the capabilities of high-performing text generation and image generation models, specifically GPT-4V and DALLE-3, to curate challenging semantic counterfactuals, thereby further enhancing compositional reasoning capabilities on benchmarks such as SugarCrepe, where CounterCurate outperforms GPT-4V. To facilitate future research, we release our code, dataset, benchmark, and checkpoints at https://countercurate.github.io.

Updated: 2024-06-10 17:59:55

标题: 对比策划：通过反事实例增强视觉-语义组合推理

摘要: 我们提出了CounterCurate，这是一个框架，可以全面提高对比和生成多模态模型的视觉-语言组合推理能力。特别是，我们确定了两个关键且未充分探讨的问题：忽视了基于物理的推理（计数和位置理解），以及利用高度能力的文本和图像生成模型进行语义反事实微调的潜力。我们的工作开创了一种解决这些差距的方法。我们首先着重指出多模态模型（如CLIP和LLaVA）在基于物理的组合推理中表现接近于偶然。然后，我们应用简单的数据增强，使用基于物理的图像生成模型GLIGEN生成微调数据，导致显著的性能提升：在我们新策划的Flickr30k-Positions基准测试中，CLIP和LLaVA分别提高了+33%和+37%。此外，我们利用高性能文本生成和图像生成模型（具体为GPT-4V和DALLE-3），策划具有挑战性的语义反事实，从而进一步增强在SugarCrepe等基准测试中的组合推理能力，CounterCurate表现优于GPT-4V。为促进未来研究，我们在https://countercurate.github.io发布了我们的代码、数据集、基准测试和检查点。

更新时间: 2024-06-10 17:59:55

领域: cs.CV,cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2402.13254v3

Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction

We present Visual AutoRegressive modeling (VAR), a new generation paradigm that redefines the autoregressive learning on images as coarse-to-fine "next-scale prediction" or "next-resolution prediction", diverging from the standard raster-scan "next-token prediction". This simple, intuitive methodology allows autoregressive (AR) transformers to learn visual distributions fast and generalize well: VAR, for the first time, makes GPT-like AR models surpass diffusion transformers in image generation. On ImageNet 256x256 benchmark, VAR significantly improve AR baseline by improving Frechet inception distance (FID) from 18.65 to 1.73, inception score (IS) from 80.4 to 350.2, with around 20x faster inference speed. It is also empirically verified that VAR outperforms the Diffusion Transformer (DiT) in multiple dimensions including image quality, inference speed, data efficiency, and scalability. Scaling up VAR models exhibits clear power-law scaling laws similar to those observed in LLMs, with linear correlation coefficients near -0.998 as solid evidence. VAR further showcases zero-shot generalization ability in downstream tasks including image in-painting, out-painting, and editing. These results suggest VAR has initially emulated the two important properties of LLMs: Scaling Laws and zero-shot task generalization. We have released all models and codes to promote the exploration of AR/VAR models for visual generation and unified learning.

Updated: 2024-06-10 17:59:07

标题: 视觉自回归建模：通过下一尺度预测实现可扩展图像生成

摘要: 我们提出了视觉自回归建模（VAR），这是一种重新定义图像上的自回归学习的新一代范式，将其视为粗到细的“下一尺度预测”或“下一分辨率预测”，与标准的光栅扫描“下一个令牌预测”有所不同。这种简单直观的方法使自回归（AR）变压器能够快速学习视觉分布并具有良好的泛化能力：VAR首次使类似GPT的AR模型在图像生成方面超越了扩散变压器。在ImageNet 256x256基准测试中，VAR通过将Frechet inception距离（FID）从18.65提高到1.73，将inception得分（IS）从80.4提高到350.2，推理速度提高约20倍，显著改进了AR基准。经验验证还表明，VAR在图像质量、推理速度、数据效率和可伸缩性等多个维度上优于扩散变压器（DiT）。扩大VAR模型的规模显示出与LLMs中观察到的类似的幂律缩放规律，具有接近-0.998的线性相关系数作为坚实的证据。VAR进一步展示了零射击泛化能力，包括图像修补、外修补和编辑等下游任务。这些结果表明，VAR初步模拟了LLMs的两个重要属性：缩放定律和零射击任务泛化。我们已发布所有模型和代码，以促进对AR/VAR模型在视觉生成和统一学习中的探索。

更新时间: 2024-06-10 17:59:07

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2404.02905v2

Decentralized Personalized Federated Learning

This work tackles the challenges of data heterogeneity and communication limitations in decentralized federated learning. We focus on creating a collaboration graph that guides each client in selecting suitable collaborators for training personalized models that leverage their local data effectively. Our approach addresses these issues through a novel, communication-efficient strategy that enhances resource efficiency. Unlike traditional methods, our formulation identifies collaborators at a granular level by considering combinatorial relations of clients, enhancing personalization while minimizing communication overhead. We achieve this through a bi-level optimization framework that employs a constrained greedy algorithm, resulting in a resource-efficient collaboration graph for personalized learning. Extensive evaluation against various baselines across diverse datasets demonstrates the superiority of our method, named DPFL. DPFL consistently outperforms other approaches, showcasing its effectiveness in handling real-world data heterogeneity, minimizing communication overhead, enhancing resource efficiency, and building personalized models in decentralized federated learning scenarios.

Updated: 2024-06-10 17:58:48

标题: 分散式个性化联邦学习

摘要: 这项工作解决了分散式联邦学习中的数据异质性和通信限制挑战。我们专注于创建一个协作图，指导每个客户端选择适合训练个性化模型的合作伙伴，有效利用他们的本地数据。我们的方法通过一种新颖的、通信高效的策略来解决这些问题，增强资源效率。与传统方法不同，我们的方案通过考虑客户之间的组合关系在细粒度级别上识别合作伙伴，增强个性化的同时最小化通信开销。我们通过采用受限贪婪算法的双层优化框架实现了这一点，从而为个性化学习创建了一种资源高效的协作图。对各种基准线进行广泛评估，跨不同数据集展示了我们的方法DPFL的优越性。DPFL始终优于其他方法，展示了其在处理真实世界数据异质性、最小化通信开销、增强资源效率以及在分散式联邦学习场景中构建个性化模型方面的有效性。

更新时间: 2024-06-10 17:58:48

领域: cs.LG,cs.AI,cs.CV,cs.MA,math.OC

下载: http://arxiv.org/abs/2406.06520v1

Data Augmentation for Multivariate Time Series Classification: An Experimental Study

Our study investigates the impact of data augmentation on the performance of multivariate time series models, focusing on datasets from the UCR archive. Despite the limited size of these datasets, we achieved classification accuracy improvements in 10 out of 13 datasets using the Rocket and InceptionTime models. This highlights the essential role of sufficient data in training effective models, paralleling the advancements seen in computer vision. Our work delves into adapting and applying existing methods in innovative ways to the domain of multivariate time series classification. Our comprehensive exploration of these techniques sets a new standard for addressing data scarcity in time series analysis, emphasizing that diverse augmentation strategies are crucial for unlocking the potential of both traditional and deep learning models. Moreover, by meticulously analyzing and applying a variety of augmentation techniques, we demonstrate that strategic data enrichment can enhance model accuracy. This not only establishes a benchmark for future research in time series analysis but also underscores the importance of adopting varied augmentation approaches to improve model performance in the face of limited data availability.

Updated: 2024-06-10 17:58:02

标题: 多元时间序列分类的数据增强：实验研究

摘要: 我们的研究调查了数据增强对多变量时间序列模型性能的影响，重点关注来自UCR存档的数据集。尽管这些数据集的规模有限，但我们在使用Rocket和InceptionTime模型的13个数据集中有10个数据集的分类准确性得到了改善。这突显了训练有效模型所需的充分数据的重要作用，与计算机视觉领域的进展相一致。我们的工作深入研究了将现有方法以创新方式应用于多变量时间序列分类领域。我们对这些技术的全面探索为解决时间序列分析中的数据稀缺性设立了新的标准，强调多样化的增强策略对于释放传统和深度学习模型潜力至关重要。此外，通过精心分析和应用各种增强技术，我们证明了战略性数据丰富可以提高模型准确性。这不仅为未来的时间序列分析研究建立了一个基准，而且强调了在面对有限数据可用性时采用多样化增强方法以提高模型性能的重要性。

更新时间: 2024-06-10 17:58:02

领域: cs.LG

下载: http://arxiv.org/abs/2406.06518v1

Distribution-Free Predictive Inference under Unknown Temporal Drift

Distribution-free prediction sets play a pivotal role in uncertainty quantification for complex statistical models. Their validity hinges on reliable calibration data, which may not be readily available as real-world environments often undergo unknown changes over time. In this paper, we propose a strategy for choosing an adaptive window and use the data therein to construct prediction sets. The window is selected by optimizing an estimated bias-variance tradeoff. We provide sharp coverage guarantees for our method, showing its adaptivity to the underlying temporal drift. We also illustrate its efficacy through numerical experiments on synthetic and real data.

Updated: 2024-06-10 17:55:43

标题: 无分布的预测推断在未知时间漂移下

摘要: 分布自由的预测集在复杂统计模型的不确定性量化中发挥着关键作用。它们的有效性取决于可靠的校准数据，但由于现实世界环境经常随时间发生未知变化，这些数据可能并不容易获得。在本文中，我们提出了一种选择自适应窗口并利用其中数据构建预测集的策略。通过优化估计的偏差-方差权衡来选择窗口。我们为我们的方法提供了尖锐的覆盖保证，展示了其对基础时间漂移的适应性。我们还通过对合成和真实数据的数值实验展示了其有效性。

更新时间: 2024-06-10 17:55:43

领域: stat.ME,cs.LG,stat.ML

下载: http://arxiv.org/abs/2406.06516v1

Random Features Approximation for Control-Affine Systems

Modern data-driven control applications call for flexible nonlinear models that are amenable to principled controller synthesis and realtime feedback. Many nonlinear dynamical systems of interest are control affine. We propose two novel classes of nonlinear feature representations which capture control affine structure while allowing for arbitrary complexity in the state dependence. Our methods make use of random features (RF) approximations, inheriting the expressiveness of kernel methods at a lower computational cost. We formalize the representational capabilities of our methods by showing their relationship to the Affine Dot Product (ADP) kernel proposed by Casta\~neda et al. (2021) and a novel Affine Dense (AD) kernel that we introduce. We further illustrate the utility by presenting a case study of data-driven optimization-based control using control certificate functions (CCF). Simulation experiments on a double pendulum empirically demonstrate the advantages of our methods.

Updated: 2024-06-10 17:54:57

标题: 随机特征逼近控制仿射系统

摘要: 现代数据驱动控制应用需要灵活的非线性模型，这些模型适合于基于原则的控制器合成和实时反馈。许多感兴趣的非线性动力系统具有控制仿射性质。我们提出了两种新颖的非线性特征表示类别，这些表示捕捉了控制仿射结构，同时允许状态依赖性的任意复杂性。我们的方法利用随机特征（RF）近似，以更低的计算成本继承了核方法的表达能力。我们通过展示它们与Casta\~neda等人（2021）提出的Affine Dot Product（ADP）核以及我们引入的新颖Affine Dense（AD）核之间的关系，形式化了我们方法的表示能力。我们进一步通过提出一个基于数据驱动优化的控制案例研究，使用控制证明函数（CCF）来展示其实用性。在一个双摆实验中，模拟实验实证了我们方法的优势。

更新时间: 2024-06-10 17:54:57

领域: cs.LG,cs.SY,eess.SY,math.OC,stat.ML

下载: http://arxiv.org/abs/2406.06514v1

Merlin: A Vision Language Foundation Model for 3D Computed Tomography

Over 85 million computed tomography (CT) scans are performed annually in the US, of which approximately one quarter focus on the abdomen. Given the current radiologist shortage, there is a large impetus to use artificial intelligence to alleviate the burden of interpreting these complex imaging studies. Prior state-of-the-art approaches for automated medical image interpretation leverage vision language models (VLMs). However, current medical VLMs are generally limited to 2D images and short reports, and do not leverage electronic health record (EHR) data for supervision. We introduce Merlin - a 3D VLM that we train using paired CT scans (6+ million images from 15,331 CTs), EHR diagnosis codes (1.8+ million codes), and radiology reports (6+ million tokens). We evaluate Merlin on 6 task types and 752 individual tasks. The non-adapted (off-the-shelf) tasks include zero-shot findings classification (31 findings), phenotype classification (692 phenotypes), and zero-shot cross-modal retrieval (image to findings and image to impressions), while model adapted tasks include 5-year disease prediction (6 diseases), radiology report generation, and 3D semantic segmentation (20 organs). We perform internal validation on a test set of 5,137 CTs, and external validation on 7,000 clinical CTs and on two public CT datasets (VerSe, TotalSegmentator). Beyond these clinically-relevant evaluations, we assess the efficacy of various network architectures and training strategies to depict that Merlin has favorable performance to existing task-specific baselines. We derive data scaling laws to empirically assess training data needs for requisite downstream task performance. Furthermore, unlike conventional VLMs that require hundreds of GPUs for training, we perform all training on a single GPU.

Updated: 2024-06-10 17:53:01

标题: 梅林：用于3D计算机断层扫描的视觉语言基础模型

摘要: 每年在美国进行超过8500万次计算机断层扫描(CT)，其中约四分之一集中在腹部。鉴于目前放射科医生短缺的情况，使用人工智能减轻解释这些复杂影像研究的负担具有巨大的动力。以往自动化医学图像解释的最新方法利用视觉语言模型(VLMs)。然而，目前的医学VLMs通常仅限于2D图像和简短报告，并未利用电子健康记录(EHR)数据进行监督。我们介绍了Merlin - 一个3D VLM，我们使用配对的CT扫描（来自15,331个CT的600万多张图像）、EHR诊断代码（180万多个代码）和放射学报告（600万多个标记）进行训练。我们在6个任务类型和752个单独任务上评估了Merlin。非自适应(现成)任务包括零样本发现分类（31个发现）、表型分类（692个表型）和零样本跨模态检索（图像到发现和图像到印象），而模型自适应任务包括5年疾病预测（6种疾病）、放射学报告生成和3D语义分割（20个器官）。我们在一个包含5,137个CT的测试集上进行内部验证，以及在7,000个临床CT和两个公共CT数据集（VerSe，TotalSegmentator）上进行外部验证。除了这些与临床相关的评估之外，我们评估了各种网络架构和训练策略的有效性，以表明Merlin相对于现有的特定任务基准具有有利的性能。我们推导数据缩放定律，以经验性地评估下游任务性能所需的训练数据需求。此外，与传统的VLMs需要数百个GPU进行训练不同，我们所有的训练都在一台单独的GPU上进行。

更新时间: 2024-06-10 17:53:01

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2406.06512v1

AlpaCare:Instruction-tuned Large Language Models for Medical Application

Instruction-finetuning (IFT) has become crucial in aligning Large Language Models (LLMs) with diverse human needs and has shown great potential in medical applications. However, previous studies mainly fine-tune LLMs on biomedical datasets with limited diversity, which often rely on benchmarks or narrow task scopes, and hence significantly limit the effectiveness on their medical instruction-following ability and generalizability. To bridge this gap, we propose creating a diverse, machine-generated medical IFT dataset, MedInstruct-52k, using GPT-4 and ChatGPT with a high-quality expert-curated seed set. We then fine-tune LLaMA-series models on the dataset to develop AlpaCare. Despite using a smaller domain-specific dataset than previous medical LLMs, AlpaCare not only demonstrates superior performance on medical applications, with up to 38.1% absolute gain over best baselines in medical free-form instruction evaluations, but also achieves 6.7% absolute gains averaged over multiple general domain benchmarks. Human evaluation further shows that AlpaCare consistently outperforms best baselines in terms of both correctness and helpfulness. We offer public access to our data, model, and codebase in https://github.com/XZhang97666/AlpaCare.

Updated: 2024-06-10 17:52:31

标题: AlpaCare：针对医疗应用进行指导调整的大型语言模型

摘要: 指导微调（IFT）已经变得至关重要，以满足大型语言模型（LLMs）与不同人类需求之间的协调，并在医学应用中显示出巨大潜力。然而，先前的研究主要是在生物医学数据集上对LLMs进行微调，这些数据集的多样性有限，通常依赖于基准或狭窄的任务范围，从而显著限制了它们在医学指导遵循能力和泛化能力上的有效性。为了弥补这一差距，我们提出利用GPT-4和ChatGPT创建一个多样化的、机器生成的医学IFT数据集MedInstruct-52k，使用高质量的专家策划的种子集。然后，我们在该数据集上微调LLaMA系列模型，开发出AlpaCare。尽管使用的领域特定数据集比以前的医学LLMs要小，但AlpaCare不仅在医学应用中表现出优越性能，在医学自由形式指导评估中，相对最佳基线获得了高达38.1%的绝对增益，同时在多个一般领域基准上平均获得了6.7%的绝对增益。人类评估进一步显示，AlpaCare在正确性和帮助性方面始终优于最佳基线。我们在https://github.com/XZhang97666/AlpaCare上提供我们的数据、模型和代码库的公开访问。

更新时间: 2024-06-10 17:52:31

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2310.14558v4

Model Editing at Scale leads to Gradual and Catastrophic Forgetting

Editing knowledge in large language models is an attractive capability to have which allows us to correct incorrectly learnt facts during pre-training, as well as update the model with an ever-growing list of new facts. While existing model editing techniques have shown promise, they are usually evaluated using metrics for reliability, specificity and generalization over one or few edits. We argue that for model editing to have practical utility, we must be able to make multiple edits to the same model. With this in mind, we evaluate the current model editing methods at scale, focusing on two state of the art methods: ROME and MEMIT. We find that as the model is edited sequentially with multiple facts, it continually forgets previously edited facts and the ability to perform downstream tasks. This forgetting happens in two phases -- an initial gradual but progressive forgetting phase followed by abrupt or catastrophic forgetting phase. Both gradual and catastrophic forgetting limit the usefulness of model editing methods at scale -- the former making model editing less effective as multiple edits are made to the model while the latter caps the scalability of such model editing methods. Our analysis also highlights other key limitations of ROME and MEMIT at scale. With our work, we push for the development and evaluation of model editing methods keeping scalability in mind.

Updated: 2024-06-10 17:50:14

标题: 规模化模型编辑导致渐进性和灾难性遗忘

摘要: 在大型语言模型中进行知识编辑是一种具有吸引力的能力，它允许我们在预训练期间纠正错误学习的事实，以及通过不断增长的新事实列表更新模型。虽然现有的模型编辑技术表现出了潜力，但它们通常使用可靠性、特异性和对一个或少数编辑的泛化度量进行评估。我们认为，为了使模型编辑具有实际效用，我们必须能够对同一模型进行多次编辑。考虑到这一点，我们以规模为重点评估当前的模型编辑方法，重点关注两种最先进的方法：ROME和MEMIT。我们发现，随着模型依次编辑多个事实，它不断忘记先前编辑过的事实以及执行下游任务的能力。这种遗忘分为两个阶段--初始逐渐但渐进性遗忘阶段，随后是突然或灾难性遗忘阶段。逐渐和灾难性遗忘都限制了规模化模型编辑方法的实用性--前者使模型编辑在对模型进行多次编辑时效果较差，而后者限制了这种模型编辑方法的可扩展性。我们的分析还突出了ROME和MEMIT在规模上的其他关键局限性。通过我们的工作，我们推动开发和评估模型编辑方法，着眼于可扩展性。

更新时间: 2024-06-10 17:50:14

领域: cs.CL,cs.AI,cs.IR

下载: http://arxiv.org/abs/2401.07453v4

Robust Distribution Learning with Local and Global Adversarial Corruptions

We consider learning in an adversarial environment, where an $\varepsilon$-fraction of samples from a distribution $P$ are arbitrarily modified (*global* corruptions) and the remaining perturbations have average magnitude bounded by $\rho$ (*local* corruptions). Given access to $n$ such corrupted samples, we seek a computationally efficient estimator $\hat{P}_n$ that minimizes the Wasserstein distance $\mathsf{W}_1(\hat{P}_n,P)$. In fact, we attack the fine-grained task of minimizing $\mathsf{W}_1(\Pi_\# \hat{P}_n, \Pi_\# P)$ for all orthogonal projections $\Pi \in \mathbb{R}^{d \times d}$, with performance scaling with $\mathrm{rank}(\Pi) = k$. This allows us to account simultaneously for mean estimation ($k=1$), distribution estimation ($k=d$), as well as the settings interpolating between these two extremes. We characterize the optimal population-limit risk for this task and then develop an efficient finite-sample algorithm with error bounded by $\sqrt{\varepsilon k} + \rho + d^{O(1)}\tilde{O}(n^{-1/k})$ when $P$ has bounded moments of order $2+\delta$, for constant $\delta > 0$. For data distributions with bounded covariance, our finite-sample bounds match the minimax population-level optimum for large sample sizes. Our efficient procedure relies on a novel trace norm approximation of an ideal yet intractable 2-Wasserstein projection estimator. We apply this algorithm to robust stochastic optimization, and, in the process, uncover a new method for overcoming the curse of dimensionality in Wasserstein distributionally robust optimization.

Updated: 2024-06-10 17:48:36

标题: 具有局部和全局对抗性污染的鲁棒分布学习

摘要: 我们考虑在对抗环境中学习，其中来自分布$P$的$\varepsilon$比例的样本被任意修改（*全局*污染），其余扰动的平均幅度受到$\rho$的限制（*局部*污染）。鉴于$n$个这样的受损样本，我们寻求一个计算高效的估计量$\hat{P}_n$，以最小化Wasserstein距离$\mathsf{W}_1(\hat{P}_n,P)$。事实上，我们攻击了最小化$\mathsf{W}_1(\Pi_\# \hat{P}_n, \Pi_\# P)$的细粒度任务，其中$\Pi \in \mathbb{R}^{d \times d}$为所有正交投影，性能与$\mathrm{rank}(\Pi) = k$成比例。这使我们能够同时考虑均值估计（$k=1$）、分布估计（$k=d$）以及在这两个极端之间插值的设置。我们表征了这一任务的最优总体限制风险，然后开发了一个高效的有限样本算法，当$P$具有有界阶矩$2+\delta$时，误差受到$\sqrt{\varepsilon k} + \rho + d^{O(1)}\tilde{O}(n^{-1/k})$的限制，其中$\delta > 0$为常数。对于具有有界协方差的数据分布，我们的有限样本界与大样本量下的极小总体最优相匹配。我们的高效程序依赖于一种新颖的迹范数逼近理想但难以处理的2-Wasserstein投影估计量。我们将这一算法应用于鲁棒随机优化，并在此过程中发现了一种克服Wasserstein分布鲁棒优化中维度诅咒的新方法。

更新时间: 2024-06-10 17:48:36

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2406.06509v1

Monkey See, Monkey Do: Harnessing Self-attention in Motion Diffusion for Zero-shot Motion Transfer

Given the remarkable results of motion synthesis with diffusion models, a natural question arises: how can we effectively leverage these models for motion editing? Existing diffusion-based motion editing methods overlook the profound potential of the prior embedded within the weights of pre-trained models, which enables manipulating the latent feature space; hence, they primarily center on handling the motion space. In this work, we explore the attention mechanism of pre-trained motion diffusion models. We uncover the roles and interactions of attention elements in capturing and representing intricate human motion patterns, and carefully integrate these elements to transfer a leader motion to a follower one while maintaining the nuanced characteristics of the follower, resulting in zero-shot motion transfer. Editing features associated with selected motions allows us to confront a challenge observed in prior motion diffusion approaches, which use general directives (e.g., text, music) for editing, ultimately failing to convey subtle nuances effectively. Our work is inspired by how a monkey closely imitates what it sees while maintaining its unique motion patterns; hence we call it Monkey See, Monkey Do, and dub it MoMo. Employing our technique enables accomplishing tasks such as synthesizing out-of-distribution motions, style transfer, and spatial editing. Furthermore, diffusion inversion is seldom employed for motions; as a result, editing efforts focus on generated motions, limiting the editability of real ones. MoMo harnesses motion inversion, extending its application to both real and generated motions. Experimental results show the advantage of our approach over the current art. In particular, unlike methods tailored for specific applications through training, our approach is applied at inference time, requiring no training. Our webpage is at https://monkeyseedocg.github.io.

Updated: 2024-06-10 17:47:14

标题: 猴子看见什么就做什么：利用自注意力在运动扩散中进行零样本运动转移

摘要: 鉴于扩散模型在动作合成方面取得了显著的成果，一个自然的问题就是：我们如何有效地利用这些模型进行动作编辑？现有基于扩散的动作编辑方法忽视了预先训练模型权重中嵌入的先验的潜在潜力，这使得可以操纵潜在特征空间；因此，它们主要集中在处理动作空间。在这项工作中，我们探索了预先训练的运动扩散模型的注意机制。我们揭示了注意元素在捕捉和表示复杂人体运动模式中的角色和交互作用，并仔细整合这些元素，将领导者动作转移到追随者动作，同时保持追随者的微妙特征，从而实现零样本动作转移。编辑与选定动作相关的特征使我们能够应对先前动作扩散方法中观察到的挑战，这些方法使用一般指导（例如文本、音乐）进行编辑，最终未能有效传达微妙的细微差别。我们的工作受到猴子如何密切模仿所见的动作同时保持其独特动作模式的启发；因此我们称之为Monkey See, Monkey Do，简称为MoMo。采用我们的技术可以实现合成超出分布的运动、风格转移和空间编辑等任务。此外，扩散反演很少用于运动；因此，编辑工作集中在生成的动作上，限制了真实动作的可编辑性。MoMo利用运动反演，将其应用扩展到真实和生成的运动。实验结果显示我们的方法优于当前技术。特别是，与通过训练针对特定应用程序定制的方法不同，我们的方法在推断时应用，无需训练。我们的网页地址是https://monkeyseedocg.github.io。

更新时间: 2024-06-10 17:47:14

领域: cs.CV,cs.AI,cs.GR

下载: http://arxiv.org/abs/2406.06508v1

Verification-Guided Shielding for Deep Reinforcement Learning

In recent years, Deep Reinforcement Learning (DRL) has emerged as an effective approach to solving real-world tasks. However, despite their successes, DRL-based policies suffer from poor reliability, which limits their deployment in safety-critical domains. As a result, various methods have been put forth to address this issue by providing formal safety guarantees. Two main approaches include shielding and verification. While shielding ensures the safe behavior of the policy by employing an external online component (i.e., a ``shield'') that overruns potentially dangerous actions, this approach has a significant computational cost as the shield must be invoked at runtime to validate every decision. On the other hand, verification is an offline process that can identify policies that are unsafe, prior to their deployment, yet, without providing alternative actions when such a policy is deemed unsafe. In this work, we present verification-guided shielding -- a novel approach that bridges the DRL reliability gap by integrating these two methods. Our approach combines both formal and probabilistic verification tools to partition the input domain into safe and unsafe regions. In addition, we employ clustering and symbolic representation procedures that compress the unsafe regions into a compact representation. This, in turn, allows to temporarily activate the shield solely in (potentially) unsafe regions, in an efficient manner. Our novel approach allows to significantly reduce runtime overhead while still preserving formal safety guarantees. We extensively evaluate our approach on two benchmarks from the robotic navigation domain, as well as provide an in-depth analysis of its scalability and completeness.

Updated: 2024-06-10 17:44:59

标题: 深度强化学习的验证引导屏蔽

摘要: 近年来，深度强化学习（DRL）已经成为解决现实世界任务的有效方法。然而，尽管取得了成功，基于DRL的策略存在着可靠性不佳的问题，这限制了它们在安全关键领域的部署。因此，已经提出了各种方法来解决这个问题，提供正式的安全性保证。两种主要方法包括屏蔽和验证。屏蔽通过利用外部在线组件（即“屏蔽器”）来确保策略的安全行为，覆盖潜在危险的动作，但这种方法在运行时必须调用屏蔽器来验证每个决策，因此具有显著的计算成本。另一方面，验证是一种离线过程，可以在部署之前识别不安全的策略，然而，在认为该策略不安全时并未提供替代行动。在这项工作中，我们提出了验证引导的屏蔽——一种通过整合这两种方法来弥合DRL可靠性差距的新方法。我们的方法结合了形式化和概率验证工具，将输入域划分为安全和不安全区域。此外，我们采用了聚类和符号表示过程，将不安全区域压缩成紧凑的表示。这反过来允许仅在（潜在）不安全区域中以有效的方式临时激活屏蔽器。我们的新方法可以显著减少运行时开销，同时仍保留正式的安全性保证。我们在机器人导航领域的两个基准测试中对我们的方法进行了广泛评估，并提供了其可扩展性和完整性的深入分析。

更新时间: 2024-06-10 17:44:59

领域: cs.LG

下载: http://arxiv.org/abs/2406.06507v1

Online Newton Method for Bandit Convex Optimisation

We introduce a computationally efficient algorithm for zeroth-order bandit convex optimisation and prove that in the adversarial setting its regret is at most $d^{3.5} \sqrt{n} \mathrm{polylog}(n, d)$ with high probability where $d$ is the dimension and $n$ is the time horizon. In the stochastic setting the bound improves to $M d^{2} \sqrt{n} \mathrm{polylog}(n, d)$ where $M \in [d^{-1/2}, d^{-1 / 4}]$ is a constant that depends on the geometry of the constraint set and the desired computational properties.

Updated: 2024-06-10 17:44:11

标题: 在线牛顿方法用于劫匪凸优化

摘要: 我们引入了一个计算效率高的零阶赌博凸优化算法，并证明在对抗环境中，其遗憾最多为$ d ^ {3.5} \sqrt {n} \mathrm {polylog}（n，d）$，概率很高，其中$ d $是维度，$ n $是时间跨度。在随机设置中，上限改进为$ M d ^ {2} \sqrt {n} \mathrm {polylog}（n，d）$，其中$ M \in [d ^ {- 1/2}，d ^ {- 1/4}]$是一个常数，取决于约束集的几何形状和所需的计算属性。

更新时间: 2024-06-10 17:44:11

领域: math.OC,cs.LG,stat.ML

下载: http://arxiv.org/abs/2406.06506v1

Equivariant Neural Tangent Kernels

Equivariant neural networks have in recent years become an important technique for guiding architecture selection for neural networks with many applications in domains ranging from medical image analysis to quantum chemistry. In particular, as the most general linear equivariant layers with respect to the regular representation, group convolutions have been highly impactful in numerous applications. Although equivariant architectures have been studied extensively, much less is known about the training dynamics of equivariant neural networks. Concurrently, neural tangent kernels (NTKs) have emerged as a powerful tool to analytically understand the training dynamics of wide neural networks. In this work, we combine these two fields for the first time by giving explicit expressions for NTKs of group convolutional neural networks. In numerical experiments, we demonstrate superior performance for equivariant NTKs over non-equivariant NTKs on a classification task for medical images.

Updated: 2024-06-10 17:43:13

标题: 等变神经切向核

摘要: Equivariant神经网络近年来已成为指导神经网络架构选择的重要技术，在医学图像分析到量子化学等领域有许多应用。特别是作为相对于常规表示最一般的线性等变层，群卷积在许多应用中产生了巨大影响。虽然等变结构已被广泛研究，但对于等变神经网络的训练动态了解甚少。同时，神经切向核（NTKs）已成为分析理解宽神经网络训练动态的强大工具。在这项工作中，我们首次将这两个领域结合起来，通过明确表达群卷积神经网络的NTKs。在数值实验中，我们展示了在医学图像分类任务上，等变NTKs相对于非等变NTKs表现出更优异的性能。

更新时间: 2024-06-10 17:43:13

领域: cs.LG

下载: http://arxiv.org/abs/2406.06504v1

Improving Alignment and Robustness with Circuit Breakers

AI systems can take harmful actions and are highly vulnerable to adversarial attacks. We present an approach, inspired by recent advances in representation engineering, that interrupts the models as they respond with harmful outputs with "circuit breakers." Existing techniques aimed at improving alignment, such as refusal training, are often bypassed. Techniques such as adversarial training try to plug these holes by countering specific attacks. As an alternative to refusal training and adversarial training, circuit-breaking directly controls the representations that are responsible for harmful outputs in the first place. Our technique can be applied to both text-only and multimodal language models to prevent the generation of harmful outputs without sacrificing utility -- even in the presence of powerful unseen attacks. Notably, while adversarial robustness in standalone image recognition remains an open challenge, circuit breakers allow the larger multimodal system to reliably withstand image "hijacks" that aim to produce harmful content. Finally, we extend our approach to AI agents, demonstrating considerable reductions in the rate of harmful actions when they are under attack. Our approach represents a significant step forward in the development of reliable safeguards to harmful behavior and adversarial attacks.

Updated: 2024-06-10 17:40:19

标题: 使用断路器提高对齐性和鲁棒性

摘要: 人工智能系统可能采取有害行动，并且极易受到对抗性攻击的影响。我们提出了一种受到最新代表性工程进展启发的方法，即在模型产生有害输出时用“断路器”中断其响应。现有的旨在改善对齐性的技术，如拒绝训练，通常会被绕过。诸如对抗训练之类的技术试图通过对抗具体攻击来弥补这些漏洞。作为拒绝训练和对抗训练的替代方案，断路器直接控制了导致有害输出的表示。我们的技术可应用于仅文本和多模态语言模型，以防止生成有害输出而不牺牲效用，甚至在面对强大的未知攻击时也能实现。值得注意的是，尽管独立图像识别中的对抗稳健性仍然是一个挑战，但断路器可以使更大的多模态系统可靠地抵制旨在产生有害内容的图像“劫持”。最后，我们将我们的方法扩展到人工智能代理，当它们受到攻击时，我们展示了有害行动率的显著降低。我们的方法代表了在开发可靠防止有害行为和对抗性攻击的保障措施方面迈出的重要一步。

更新时间: 2024-06-10 17:40:19

领域: cs.LG,cs.AI,cs.CL,cs.CV,cs.CY

下载: http://arxiv.org/abs/2406.04313v2

BloomVQA: Assessing Hierarchical Multi-modal Comprehension

We propose a novel VQA dataset, BloomVQA, to facilitate comprehensive evaluation of large vision-language models on comprehension tasks. Unlike current benchmarks that often focus on fact-based memorization and simple reasoning tasks without theoretical grounding, we collect multiple-choice samples based on picture stories that reflect different levels of comprehension, as laid out in Bloom's Taxonomy, a classic framework for learning assessment widely adopted in education research. Our data maps to a novel hierarchical graph representation which enables automatic data augmentation and novel measures characterizing model consistency. We perform graded evaluation and reliability analysis on recent multi-modal models. In comparison to low-level tasks, we observe decreased performance on tasks requiring advanced comprehension and cognitive skills with up to 38.0\% drop in VQA accuracy. In comparison to earlier models, GPT-4V demonstrates improved accuracy over all comprehension levels and shows a tendency of bypassing visual inputs especially for higher-level tasks. Current models also show consistency patterns misaligned with human comprehension in various scenarios, demonstrating the need for improvement based on theoretically-grounded criteria.

Updated: 2024-06-10 17:39:04

标题: BloomVQA: 评估层次多模态理解

摘要: 我们提出了一个新颖的VQA数据集BloomVQA，旨在促进对大型视觉语言模型在理解任务上的全面评估。与当前通常侧重于基于事实记忆和简单推理任务而缺乏理论基础的基准相比，我们收集了基于图片故事的多项选择样本，反映了不同层次的理解，如布鲁姆的认知领域分类法所规定的那样，这是一种广泛应用于教育研究中的经典评估框架。我们的数据映射到一种新颖的分层图表示，这种表示使得自动数据增强和表征模型一致性的新方法成为可能。我们对最近的多模态模型进行了分级评估和可靠性分析。与低级任务相比，我们观察到在需要高级理解和认知技能的任务上表现出降低，VQA准确率下降了高达38.0％。与早期模型相比，GPT-4V在所有理解水平上都显示出了提高的准确性，并表现出一种绕过视觉输入的倾向，尤其是对于更高级别的任务。当前模型在各种场景中表现出与人类理解不一致的一致性模式，这表明需要基于理论基础标准的改进。

更新时间: 2024-06-10 17:39:04

领域: cs.CV,cs.CL,cs.LG

下载: http://arxiv.org/abs/2312.12716v3

Adaptive Opponent Policy Detection in Multi-Agent MDPs: Real-Time Strategy Switch Identification Using Running Error Estimation

In Multi-agent Reinforcement Learning (MARL), accurately perceiving opponents' strategies is essential for both cooperative and adversarial contexts, particularly within dynamic environments. While Proximal Policy Optimization (PPO) and related algorithms such as Actor-Critic with Experience Replay (ACER), Trust Region Policy Optimization (TRPO), and Deep Deterministic Policy Gradient (DDPG) perform well in single-agent, stationary environments, they suffer from high variance in MARL due to non-stationary and hidden policies of opponents, leading to diminished reward performance. Additionally, existing methods in MARL face significant challenges, including the need for inter-agent communication, reliance on explicit reward information, high computational demands, and sampling inefficiencies. These issues render them less effective in continuous environments where opponents may abruptly change their policies without prior notice. Against this background, we present OPS-DeMo (Online Policy Switch-Detection Model), an online algorithm that employs dynamic error decay to detect changes in opponents' policies. OPS-DeMo continuously updates its beliefs using an Assumed Opponent Policy (AOP) Bank and selects corresponding responses from a pre-trained Response Policy Bank. Each response policy is trained against consistently strategizing opponents, reducing training uncertainty and enabling the effective use of algorithms like PPO in multi-agent environments. Comparative assessments show that our approach outperforms PPO-trained models in dynamic scenarios like the Predator-Prey setting, providing greater robustness to sudden policy shifts and enabling more informed decision-making through precise opponent policy insights.

Updated: 2024-06-10 17:34:44

标题: 多智能体MDP中的自适应对手策略检测：利用运行误差估计进行实时策略切换识别

摘要: 在多智能体强化学习（MARL）中，准确地感知对手的策略对于合作和对抗环境都至关重要，尤其是在动态环境中。虽然Proximal Policy Optimization（PPO）和相关算法如具有经验重播的演员-评论家（ACER），信任区域策略优化（TRPO）和深度确定性策略梯度（DDPG）在单智能体，固定环境中表现良好，但由于对手的非固定和隐藏策略，它们在MARL中存在高方差，导致奖励性能下降。此外，MARL中现有方法面临重大挑战，包括对智能体间通信的需求，对明确奖励信息的依赖，高计算要求和采样效率低下。这些问题使它们在对手可能在没有事先通知的情况下突然改变策略的连续环境中效果较差。在这种背景下，我们提出了OPS-DeMo（在线策略切换检测模型），这是一种在线算法，采用动态误差衰减来检测对手策略的变化。OPS-DeMo通过使用假设对手策略（AOP）库不断更新其信念，并从预先训练的响应策略库中选择相应的响应。每个响应策略针对一直在策略的对手进行训练，减少训练不确定性，并使得像PPO这样的算法在多智能体环境中得以有效使用。比较评估显示，我们的方法在动态场景（如掠食者-猎物设置）中优于PPO训练模型，提供更强大的应对突然策略转变，并通过准确的对手策略洞察力实现更明智的决策制定。

更新时间: 2024-06-10 17:34:44

领域: cs.AI,cs.LG

下载: http://arxiv.org/abs/2406.06500v1

Direct Preference Optimization for Suppressing Hallucinated Prior Exams in Radiology Report Generation

Recent advances in generative vision-language models (VLMs) have exciting potential implications for AI in radiology, yet VLMs are also known to produce hallucinations, nonsensical text, and other unwanted behaviors that can waste clinicians' time and cause patient harm. Drawing on recent work on direct preference optimization (DPO), we propose a simple method for modifying the behavior of pretrained VLMs performing radiology report generation by suppressing unwanted types of generations. We apply our method to the prevention of hallucinations of prior exams, addressing a long-established problem behavior in models performing chest X-ray report generation. Across our experiments, we find that DPO fine-tuning achieves a 3.2-4.8x reduction in lines hallucinating prior exams while maintaining model performance on clinical accuracy metrics. Our work is, to the best of our knowledge, the first work to apply DPO to medical VLMs, providing a data- and compute- efficient way to suppress problem behaviors while maintaining overall clinical accuracy.

Updated: 2024-06-10 17:31:36

标题: 直接偏好优化用于抑制放大的先验检查在放射学报告生成中

摘要: 最近在生成式视觉-语言模型（VLMs）方面取得的进展对放射学人工智能有着激动人心的潜在影响，然而VLMs也被认为会产生幻觉、无意义的文本和其他不受欢迎的行为，这些行为会浪费临床医生的时间并对患者造成伤害。借鉴最近关于直接偏好优化（DPO）的研究成果，我们提出了一种简单的方法，通过抑制预训练VLMs在放射学报告生成中产生的不受欢迎的类型来修改其行为。我们将这种方法应用于预防先前检查的幻觉，解决了在执行胸部X射线报告生成时存在已久的问题行为。在我们的实验中，我们发现DPO微调可以使幻觉先前检查的行数减少3.2-4.8倍，同时在临床准确性指标上保持模型性能。据我们所知，我们的工作是第一个将DPO应用于医学VLMs的工作，提供了一种数据和计算高效的方法来抑制问题行为，同时保持整体临床准确性。

更新时间: 2024-06-10 17:31:36

领域: cs.LG,cs.CL,cs.CV

下载: http://arxiv.org/abs/2406.06496v1

Boosting Robustness in Preference-Based Reinforcement Learning with Dynamic Sparsity

For autonomous agents to successfully integrate into human-centered environments, agents should be able to learn from and adapt to humans in their native settings. Preference-based reinforcement learning (PbRL) is a promising approach that learns reward functions from human preferences. This enables RL agents to adapt their behavior based on human desires. However, humans live in a world full of diverse information, most of which is not relevant to completing a particular task. It becomes essential that agents learn to focus on the subset of task-relevant environment features. Unfortunately, prior work has largely ignored this aspect; primarily focusing on improving PbRL algorithms in standard RL environments that are carefully constructed to contain only task-relevant features. This can result in algorithms that may not effectively transfer to a more noisy real-world setting. To that end, this work proposes R2N (Robust-to-Noise), the first PbRL algorithm that leverages principles of dynamic sparse training to learn robust reward models that can focus on task-relevant features. We study the effectiveness of R2N in the Extremely Noisy Environment setting, an RL problem setting where up to 95% of the state features are irrelevant distractions. In experiments with a simulated teacher, we demonstrate that R2N can adapt the sparse connectivity of its neural networks to focus on task-relevant features, enabling R2N to significantly outperform several state-of-the-art PbRL algorithms in multiple locomotion and control environments.

Updated: 2024-06-10 17:31:07

标题: 使用动态稀疏性增强偏好驱动的强化学习的稳健性

摘要: 为了让自主代理成功地融入以人为中心的环境中，代理应该能够从人类的本地环境中学习并适应。基于偏好的强化学习（PbRL）是一种有前途的方法，可以从人类偏好中学习奖励函数。这使得RL代理能够根据人类的愿望调整其行为。然而，人类生活在一个充满各种信息的世界中，其中大部分与完成特定任务无关。对于代理学习集中在任务相关的环境特征的子集变得至关重要。不幸的是，先前的研究在很大程度上忽视了这一方面；主要集中于改进PbRL算法在精心构建的仅包含任务相关特征的标准RL环境中。这可能导致算法无法有效地转移到更嘈杂的现实世界环境中。因此，本研究提出了R2N（Robust-to-Noise），这是第一个利用动态稀疏训练原则学习稳健奖励模型的PbRL算法，能够专注于任务相关特征。我们研究了R2N在极端嘈杂环境设置中的有效性，这是一个RL问题环境，其中高达95％的状态特征是无关的干扰。在与模拟教师的实验中，我们证明了R2N能够调整其神经网络的稀疏连接性，专注于任务相关特征，使其在多个运动和控制环境中明显优于几种最先进的PbRL算法。

更新时间: 2024-06-10 17:31:07

领域: cs.LG

下载: http://arxiv.org/abs/2406.06495v1

AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents

Autonomous agents that execute human tasks by controlling computers can enhance human productivity and application accessibility. However, progress in this field will be driven by realistic and reproducible benchmarks. We present AndroidWorld, a fully functional Android environment that provides reward signals for 116 programmatic tasks across 20 real-world Android apps. Unlike existing interactive environments, which provide a static test set, AndroidWorld dynamically constructs tasks that are parameterized and expressed in natural language in unlimited ways, thus enabling testing on a much larger and more realistic suite of tasks. Reward signals are derived from the computer's system state, making them durable across task variations and extensible across different apps. To demonstrate AndroidWorld's benefits and mode of operation, we introduce a new computer control agent, M3A. M3A can complete 30.6% of the AndroidWorld's tasks, leaving ample room for future work. Furthermore, we adapt a popular desktop web agent to work on Android, which we find to be less effective on mobile, suggesting future research is needed to achieve universal, cross-domain agents. Finally, we conduct a robustness analysis by testing M3A against a range of task variations on a representative subset of tasks, demonstrating that variations in task parameters can significantly alter a task's complexity and, consequently, an agent's performance, highlighting the importance of testing agents under diverse conditions. AndroidWorld and the experiments in this paper are available at https://github.com/google-research/android_world.

Updated: 2024-06-10 17:30:49

标题: 安卓世界：自主代理的动态基准测试环境

摘要: 自主代理通过控制计算机执行人类任务可以提高人类生产力和应用可访问性。然而，在这一领域的进展将受到现实和可复制基准的推动。我们提出了AndroidWorld，这是一个完全功能的Android环境，为20个真实世界的Android应用程序中的116个程序任务提供奖励信号。与现有的交互式环境不同，AndroidWorld动态构建了参数化的、以自然语言表达的任务，使得可以以更大规模和更现实的任务集进行测试。奖励信号源自计算机的系统状态，使得这些信号可以在任务变化中持久存在，并且可以在不同应用程序中进行扩展。为了展示AndroidWorld的优势和运行方式，我们引入了一个新的计算机控制代理M3A。M3A可以完成AndroidWorld任务的30.6%，为未来工作留下了充分的空间。此外，我们将一款流行的桌面Web代理适应到Android上，发现其在移动端效果不佳，提示未来需要进行跨领域研究以实现通用的代理。最后，我们通过在代表性任务的一组任务变化上测试M3A来进行稳健性分析，结果表明任务参数的变化可以显著改变任务的复杂性，从而影响代理的性能，突显了在不同条件下测试代理的重要性。AndroidWorld和本文中的实验可在https://github.com/google-research/android_world 上获取。

更新时间: 2024-06-10 17:30:49

领域: cs.AI,cs.LG

下载: http://arxiv.org/abs/2405.14573v2

Scaling Continuous Latent Variable Models as Probabilistic Integral Circuits

Probabilistic integral circuits (PICs) have been recently introduced as probabilistic models enjoying the key ingredient behind expressive generative models: continuous latent variables (LVs). PICs are symbolic computational graphs defining continuous LV models as hierarchies of functions that are summed and multiplied together, or integrated over some LVs. They are tractable if LVs can be analytically integrated out, otherwise they can be approximated by tractable probabilistic circuits (PC) encoding a hierarchical numerical quadrature process, called QPCs. So far, only tree-shaped PICs have been explored, and training them via numerical quadrature requires memory-intensive processing at scale. In this paper, we address these issues, and present: (i) a pipeline for building DAG-shaped PICs out of arbitrary variable decompositions, (ii) a procedure for training PICs using tensorized circuit architectures, and (iii) neural functional sharing techniques to allow scalable training. In extensive experiments, we showcase the effectiveness of functional sharing and the superiority of QPCs over traditional PCs.

Updated: 2024-06-10 17:30:17

标题: 将连续潜变量模型扩展为概率积分电路

摘要: 概率积分电路（PICs）最近被引入作为享有表达式生成模型背后关键要素的概率模型：连续潜变量（LVs）。PICs是符号计算图，将连续LV模型定义为函数的层次结构，这些函数相加和相乘，或在某些LV上进行积分。如果LV可以被解析积分掉，PICs是可解的；否则可以通过可解的概率电路（PC）来近似，这种PC编码了一种层次数值积分过程，称为QPCs。目前只有树形PICs得到了探索，通过数值积分对它们进行训练需要大规模的内存密集型处理。在本文中，我们解决了这些问题，并提出：（i）通过任意变量分解建立DAG形状的PICs的流程，（ii）使用张量化电路架构训练PICs的程序，以及（iii）神经功能共享技术，以实现可伸缩的训练。在广泛的实验中，我们展示了功能共享的有效性以及QPCs相对于传统PCs的优越性。

更新时间: 2024-06-10 17:30:17

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.06494v1

When is Multicalibration Post-Processing Necessary?

Calibration is a well-studied property of predictors which guarantees meaningful uncertainty estimates. Multicalibration is a related notion -- originating in algorithmic fairness -- which requires predictors to be simultaneously calibrated over a potentially complex and overlapping collection of protected subpopulations (such as groups defined by ethnicity, race, or income). We conduct the first comprehensive study evaluating the usefulness of multicalibration post-processing across a broad set of tabular, image, and language datasets for models spanning from simple decision trees to 90 million parameter fine-tuned LLMs. Our findings can be summarized as follows: (1) models which are calibrated out of the box tend to be relatively multicalibrated without any additional post-processing; (2) multicalibration post-processing can help inherently uncalibrated models; and (3) traditional calibration measures may sometimes provide multicalibration implicitly. More generally, we also distill many independent observations which may be useful for practical and effective applications of multicalibration post-processing in real-world contexts.

Updated: 2024-06-10 17:26:39

标题: 何时需要多校准后处理？

摘要: 校准是预测器的一个经过深入研究的属性，可以保证有意义的不确定性估计。多校准是一个相关概念，起源于算法公平性，要求预测器在潜在复杂且重叠的受保护子人群（例如以种族、种族或收入为定义的群体）上同时进行校准。我们进行了第一项全面研究，评估了多校准后处理在一系列表格、图像和语言数据集上的实用性，这些数据集涵盖了从简单决策树到9000万参数经过精细调整的LLM模型。我们的研究结果可以总结如下：（1）开箱即用的已校准模型往往在没有任何额外后处理的情况下相对多校准；（2）多校准后处理可以帮助本质上不校准的模型；（3）传统的校准措施有时可能会隐含地提供多校准。更一般地说，我们还总结了许多独立观察结果，这些结果可能对实际和有效应用多校准后处理于现实环境中有用。

更新时间: 2024-06-10 17:26:39

领域: cs.LG

下载: http://arxiv.org/abs/2406.06487v1

Continuum Attention for Neural Operators

Transformers, and the attention mechanism in particular, have become ubiquitous in machine learning. Their success in modeling nonlocal, long-range correlations has led to their widespread adoption in natural language processing, computer vision, and time-series problems. Neural operators, which map spaces of functions into spaces of functions, are necessarily both nonlinear and nonlocal if they are universal; it is thus natural to ask whether the attention mechanism can be used in the design of neural operators. Motivated by this, we study transformers in the function space setting. We formulate attention as a map between infinite dimensional function spaces and prove that the attention mechanism as implemented in practice is a Monte Carlo or finite difference approximation of this operator. The function space formulation allows for the design of transformer neural operators, a class of architectures designed to learn mappings between function spaces, for which we prove a universal approximation result. The prohibitive cost of applying the attention operator to functions defined on multi-dimensional domains leads to the need for more efficient attention-based architectures. For this reason we also introduce a function space generalization of the patching strategy from computer vision, and introduce a class of associated neural operators. Numerical results, on an array of operator learning problems, demonstrate the promise of our approaches to function space formulations of attention and their use in neural operators.

Updated: 2024-06-10 17:25:46

标题: 神经算子的连续关注

摘要: 变压器，特别是注意机制，已经在机器学习中变得无处不在。它们在建模非局部、长距离相关性方面取得的成功导致它们被广泛应用于自然语言处理、计算机视觉和时间序列问题中。神经算子将函数空间映射到函数空间，如果它们是普遍的，则必然是非线性和非局部的；因此，自然而然地可以问注意机制是否可以用于神经算子的设计。受此启发，我们研究了函数空间中的变压器。我们将注意力定义为无限维函数空间之间的映射，并证明实践中实现的注意机制是这个算子的蒙特卡罗或有限差分逼近。函数空间表述允许设计变压器神经算子，这是一类旨在学习函数空间之间映射的体系结构，我们证明了一种通用逼近结果。将注意算子应用于多维域上定义的函数成本过高，因此需要更高效的基于注意力的体系结构。出于这个原因，我们还引入了计算机视觉中的修补策略的函数空间泛化，并介绍了一类相关的神经算子。在一系列算子学习问题上的数值结果表明了我们关于注意力函数空间表述方法的潜力以及它们在神经算子中的应用。

更新时间: 2024-06-10 17:25:46

领域: cs.LG,cs.NA,math.NA

下载: http://arxiv.org/abs/2406.06486v1

Can Language Models Serve as Text-Based World Simulators?

Virtual environments play a key role in benchmarking advances in complex planning and decision-making tasks but are expensive and complicated to build by hand. Can current language models themselves serve as world simulators, correctly predicting how actions change different world states, thus bypassing the need for extensive manual coding? Our goal is to answer this question in the context of text-based simulators. Our approach is to build and use a new benchmark, called ByteSized32-State-Prediction, containing a dataset of text game state transitions and accompanying game tasks. We use this to directly quantify, for the first time, how well LLMs can serve as text-based world simulators. We test GPT-4 on this dataset and find that, despite its impressive performance, it is still an unreliable world simulator without further innovations. This work thus contributes both new insights into current LLM's capabilities and weaknesses, as well as a novel benchmark to track future progress as new models appear.

Updated: 2024-06-10 17:24:44

标题: 语言模型可以作为基于文本的世界模拟器吗？

摘要: 虚拟环境在复杂规划和决策任务的进展中发挥着关键作用，但手工构建这些环境既昂贵又复杂。当前的语言模型本身能否作为世界模拟器，正确预测行为如何改变不同的世界状态，从而绕过大量手工编码的需要？我们的目标是在基于文本的模拟器的背景下回答这个问题。我们的方法是构建并使用一个名为ByteSized32-State-Prediction的新基准，其中包含一个文本游戏状态转换数据集和相关游戏任务。我们首次直接量化了LLMs能够作为基于文本的世界模拟器的能力。我们在这个数据集上测试了GPT-4，并发现尽管其表现令人印象深刻，但它仍然是一个不可靠的世界模拟器，需要进一步的创新。这项工作为我们提供了关于当前LLM能力和弱点的新见解，同时提供了一个新颖的基准，用于跟踪随着新模型的出现而取得的进展。

更新时间: 2024-06-10 17:24:44

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.06485v1

Parallelizing Linear Transformers with the Delta Rule over Sequence Length

Transformers with linear attention (i.e., linear transformers) and state-space models have recently been suggested as a viable linear-time alternative to transformers with softmax attention. However, these models still underperform transformers especially on tasks that require in-context retrieval. While more expressive variants of linear transformers which replace the additive outer-product update in linear transformers with the delta rule have been found to be more effective at associative recall, existing algorithms for training such models do not parallelize over sequence length and are thus inefficient to train on modern hardware. This work describes a hardware-efficient algorithm for training linear transformers with the delta rule, which exploits a memory-efficient representation for computing products of Householder matrices. This algorithm allows us to scale up DeltaNet to standard language modeling settings. We train a 1.3B model for 100B tokens and find that it outperforms recent linear-time baselines such as Mamba and GLA in terms of perplexity and zero-shot performance on downstream tasks (including on tasks that focus on recall). We also experiment with two hybrid models which combine DeltaNet layers with (1) sliding-window attention layers every other layer or (2) two global attention layers, and find that these hybrid models outperform strong transformer baselines.

Updated: 2024-06-10 17:24:42

标题: 使用Delta规则对线性变换器进行序列长度并行化

摘要: 使用线性注意力（即线性变压器）和状态空间模型最近被提出作为变压器具有softmax注意力的线性时间替代方案。然而，这些模型在一些需要上下文检索的任务中仍然表现不佳。虽然曾发现线性变压器的更具表现力的变体，将线性变压器中的加法外积更新替换为增量规则，在联想回忆方面更有效，但是用于训练此类模型的现有算法并不在序列长度上并行化，因此在现代硬件上训练效率低下。本研究描述了一种适用于使用增量规则训练线性变压器的硬件高效算法，利用了一种计算Householder矩阵乘积的内存高效表示。该算法使我们能够将DeltaNet扩展到标准语言建模设置。我们为100B令牌训练了一个13亿模型，发现它在困惑度和零射击性能方面优于最近的线性时间基线，如Mamba和GLA（包括专注于回忆的任务）。我们还尝试了两种混合模型，将DeltaNet层与（1）每隔一层的滑动窗口注意力层或（2）两个全局注意力层相结合，发现这些混合模型优于强大的变压器基线。

更新时间: 2024-06-10 17:24:42

领域: cs.LG,cs.CL

下载: http://arxiv.org/abs/2406.06484v1

EchoMamba4Rec: Harmonizing Bidirectional State Space Models with Spectral Filtering for Advanced Sequential Recommendation

Predicting user preferences and sequential dependencies based on historical behavior is the core goal of sequential recommendation. Although attention-based models have shown effectiveness in this field, they often struggle with inference inefficiency due to the quadratic computational complexity inherent in attention mechanisms, especially with long-range behavior sequences. Drawing inspiration from the recent advancements of state space models (SSMs) in control theory, which provide a robust framework for modeling and controlling dynamic systems, we introduce EchoMamba4Rec. Control theory emphasizes the use of SSMs for managing long-range dependencies and maintaining inferential efficiency through structured state matrices. EchoMamba4Rec leverages these control relationships in sequential recommendation and integrates bi-directional processing with frequency-domain filtering to capture complex patterns and dependencies in user interaction data more effectively. Our model benefits from the ability of state space models (SSMs) to learn and perform parallel computations, significantly enhancing computational efficiency and scalability. It features a bi-directional Mamba module that incorporates both forward and reverse Mamba components, leveraging information from both past and future interactions. Additionally, a filter layer operates in the frequency domain using learnable Fast Fourier Transform (FFT) and learnable filters, followed by an inverse FFT to refine item embeddings and reduce noise. We also integrate Gate Linear Units (GLU) to dynamically control information flow, enhancing the model's expressiveness and training stability. Experimental results demonstrate that EchoMamba significantly outperforms existing models, providing more accurate and personalized recommendations.

Updated: 2024-06-10 17:22:33

标题: EchoMamba4Rec: 使用频谱滤波调和双向状态空间模型，实现高级顺序推荐

摘要: 根据历史行为预测用户偏好和顺序依赖是顺序推荐的核心目标。尽管基于注意力机制的模型在这一领域表现出有效性，但由于注意力机制固有的二次计算复杂性，特别是对于长序列行为，它们往往面临推断效率低下的困难。受控制理论中状态空间模型（SSMs）在建模和控制动态系统方面的最新进展启发，我们引入了EchoMamba4Rec。控制理论强调使用SSMs来管理长距离依赖关系，并通过结构化状态矩阵维持推断效率。EchoMamba4Rec利用这些控制关系在顺序推荐中，并结合双向处理和频域滤波以更有效地捕捉用户交互数据中的复杂模式和依赖关系。我们的模型受益于状态空间模型（SSMs）学习和执行并行计算的能力，显著提升计算效率和可扩展性。它具有一个双向 Mamba 模块，包括正向和反向 Mamba 组件，利用过去和未来互动的信息。此外，一个滤波器层在频域中使用可学习的快速傅立叶变换（FFT）和可学习的滤波器进行操作，随后通过反向 FFT 来精炼项目嵌入并减少噪声。我们还集成了门控线性单元（GLU）来动态控制信息流，增强模型的表现力和训练稳定性。实验结果表明，EchoMamba 显著优于现有模型，提供更准确和个性化的推荐。

更新时间: 2024-06-10 17:22:33

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.02638v2

A Taxonomy and Comparative Analysis of IPv4 ID Selection Correctness, Security, and Performance

The battle for a more secure Internet is waged on many fronts, including the most basic of networking protocols. Our focus is the IPv4 Identifier (IPID), an IPv4 header field as old as the Internet with an equally long history as an exploited side channel for scanning network properties, inferring off-path connections, and poisoning DNS caches. This article taxonomizes the 25-year history of IPID-based exploits and the corresponding changes to IPID selection methods. By mathematically analyzing these methods' correctness and security and empirically evaluating their performance, we reveal recommendations for best practice as well as shortcomings of current operating system implementations, emphasizing the value of systematic evaluations in network security.

Updated: 2024-06-10 17:22:17

标题: 一个IPv4 ID选择正确性、安全性和性能的分类和比较分析请问还有其他可以帮助你的地方吗?我可以继续为您提供帮助。

摘要: 在许多方面进行更安全互联网的斗争，包括最基本的网络协议。我们的重点是IPv4标识符（IPID），这是一个与互联网同样古老的IPv4头字段，也是作为扫描网络属性、推断离线连接和毒化DNS缓存的利用侧通道同样悠久的历史。本文对基于IPID的利用的25年历史以及IPID选择方法的相应变化进行分类。通过对这些方法的正确性和安全性进行数学分析，并通过经验评估它们的性能，我们揭示了最佳实践的建议以及当前操作系统实现的不足之处，强调网络安全中系统评估的价值。

更新时间: 2024-06-10 17:22:17

领域: cs.NI,cs.CR

下载: http://arxiv.org/abs/2406.06483v1

Quantum Equilibrium Propagation for efficient training of quantum systems based on Onsager reciprocity

The widespread adoption of machine learning and artificial intelligence in all branches of science and technology has created a need for energy-efficient, alternative hardware platforms. While such neuromorphic approaches have been proposed and realised for a wide range of platforms, physically extracting the gradients required for training remains challenging as generic approaches only exist in certain cases. Equilibrium propagation (EP) is such a procedure that has been introduced and applied to classical energy-based models which relax to an equilibrium. Here, we show a direct connection between EP and Onsager reciprocity and exploit this to derive a quantum version of EP. This can be used to optimize loss functions that depend on the expectation values of observables of an arbitrary quantum system. Specifically, we illustrate this new concept with supervised and unsupervised learning examples in which the input or the solvable task is of quantum mechanical nature, e.g., the recognition of quantum many-body ground states, quantum phase exploration, sensing and phase boundary exploration. We propose that in the future quantum EP may be used to solve tasks such as quantum phase discovery with a quantum simulator even for Hamiltonians which are numerically hard to simulate or even partially unknown. Our scheme is relevant for a variety of quantum simulation platforms such as ion chains, superconducting qubit arrays, neutral atom Rydberg tweezer arrays and strongly interacting atoms in optical lattices.

Updated: 2024-06-10 17:22:09

标题: 基于Onsager互易性的量子平衡传播用于高效训练量子系统

摘要: 机器学习和人工智能在科学技术的各个领域得到了广泛应用，这导致了对能效高且替代性硬件平台的需求。虽然已经提出并实现了这种神经形态学方法用于各种平台，但提取用于训练的梯度仍然具有挑战性，因为通用方法只在某些情况下存在。平衡传播（EP）是这样一种程序，它已被引入并应用于经典能量基模型，这些模型会松弛到一个平衡态。在这里，我们展示了EP与Onsager互易之间的直接联系，并利用这一点推导出EP的量子版本。这可以用来优化依赖于任意量子系统的可观测量期望值的损失函数。具体地，我们通过监督和无监督学习示例说明了这一新概念，其中输入或可解任务是量子力学性质的，例如识别量子多体基态、量子相探索、传感和相界探索。我们提出，未来量子EP可能被用于解决一些任务，比如使用量子模拟器进行量子相发现，即使对于数值难以模拟或部分未知的哈密顿量。我们的方案与各种量子模拟平台相关，例如离子链、超导量子位阵列、中性原子Rydberg镊子阵列和在光学晶格中强相互作用的原子。

更新时间: 2024-06-10 17:22:09

领域: quant-ph,cond-mat.dis-nn,cs.ET,cs.LG

下载: http://arxiv.org/abs/2406.06482v1

Physics-informed deep learning and compressive collocation for high-dimensional diffusion-reaction equations: practical existence theory and numerics

On the forefront of scientific computing, Deep Learning (DL), i.e., machine learning with Deep Neural Networks (DNNs), has emerged a powerful new tool for solving Partial Differential Equations (PDEs). It has been observed that DNNs are particularly well suited to weakening the effect of the curse of dimensionality, a term coined by Richard E. Bellman in the late `50s to describe challenges such as the exponential dependence of the sample complexity, i.e., the number of samples required to solve an approximation problem, on the dimension of the ambient space. However, although DNNs have been used to solve PDEs since the `90s, the literature underpinning their mathematical efficiency in terms of numerical analysis (i.e., stability, accuracy, and sample complexity), is only recently beginning to emerge. In this paper, we leverage recent advancements in function approximation using sparsity-based techniques and random sampling to develop and analyze an efficient high-dimensional PDE solver based on DL. We show, both theoretically and numerically, that it can compete with a novel stable and accurate compressive spectral collocation method. In particular, we demonstrate a new practical existence theorem, which establishes the existence of a class of trainable DNNs with suitable bounds on the network architecture and a sufficient condition on the sample complexity, with logarithmic or, at worst, linear scaling in dimension, such that the resulting networks stably and accurately approximate a diffusion-reaction PDE with high probability.

Updated: 2024-06-10 17:22:08

标题: 物理学启发的深度学习和高维扩散反应方程的压缩求解：实际存在理论和数值解析

摘要: 在科学计算的前沿，深度学习（DL），即使用深度神经网络（DNNs）进行机器学习，已经成为解决偏微分方程（PDEs）的强大新工具。观察到DNNs特别适合减弱维度诅咒的影响，这个术语由Richard E. Bellman在50年代后期创造，用来描述挑战，如样本复杂性指数依赖性，即解决近似问题所需样本数量与环境空间维度的关系。然而，尽管自90年代以来DNNs已被用于解决PDEs，但支撑其在数值分析方面的数学效率（即稳定性、准确性和样本复杂性）的文献直到最近才开始出现。在这篇论文中，我们利用基于稀疏技术和随机抽样的最新进展，开发和分析了一种基于DL的高维PDE求解器。我们理论上和数值上展示，它可以与一种新颖的稳定和准确的压缩谱插值方法竞争。特别是，我们展示了一个新的实用存在定理，该定理建立了一类可训练的DNNs存在的条件，网络架构具有合适的界限，样本复杂性具有对数或最坏情况下线性的维度缩放，从而使得结果网络稳定地和准确地近似扩散-反应PDE的概率很高。

更新时间: 2024-06-10 17:22:08

领域: cs.LG,cs.IT,cs.NA,math.IT,math.NA

下载: http://arxiv.org/abs/2406.01539v2

Graph-Based Bidirectional Transformer Decision Threshold Adjustment Algorithm for Class-Imbalanced Molecular Data

Data sets with imbalanced class sizes, often where one class size is much smaller than that of others, occur extremely often in various applications, including those with biological foundations, such as drug discovery and disease diagnosis. Thus, it is extremely important to be able to identify data elements of classes of various sizes, as a failure to detect can result in heavy costs. However, many data classification algorithms do not perform well on imbalanced data sets as they often fail to detect elements belonging to underrepresented classes. In this paper, we propose the BTDT-MBO algorithm, incorporating Merriman-Bence-Osher (MBO) techniques and a bidirectional transformer, as well as distance correlation and decision threshold adjustments, for data classification problems on highly imbalanced molecular data sets, where the sizes of the classes vary greatly. The proposed method not only integrates adjustments in the classification threshold for the MBO algorithm in order to help deal with the class imbalance, but also uses a bidirectional transformer model based on an attention mechanism for self-supervised learning. Additionally, the method implements distance correlation as a weight function for the similarity graph-based framework on which the adjusted MBO algorithm operates. The proposed model is validated using six molecular data sets, and we also provide a thorough comparison to other competing algorithms. The computational experiments show that the proposed method performs better than competing techniques even when the class imbalance ratio is very high.

Updated: 2024-06-10 17:20:13

标题: 基于图的双向Transformer决策阈值调整算法用于类别不平衡的分子数据

摘要: 在许多应用中，包括具有生物基础的应用，如药物发现和疾病诊断，数据集中存在类别不平衡的情况，通常其中一个类别的大小远远小于其他类别的大小。因此，能够识别不同大小类别的数据元素非常重要，因为未能检测到可能会导致巨大的成本。然而，许多数据分类算法在不平衡数据集上表现不佳，因为它们通常无法检测到属于少数类的元素。在本文中，我们提出了BTDT-MBO算法，该算法结合了Merriman-Bence-Osher（MBO）技术和双向变压器，以及距离相关和决策阈值调整，用于处理高度不平衡的分子数据集上的数据分类问题，其中类别的大小差异很大。所提出的方法不仅整合了MBO算法的分类阈值调整，以帮助处理类别不平衡，还利用基于注意机制的双向变压器模型进行自监督学习。此外，该方法在调整后的MBO算法操作的基于相似性图的框架上实现了距离相关作为权重函数。所提出的模型使用六个分子数据集进行验证，并与其他竞争算法进行了彻底比较。计算实验表明，即使类别不平衡比例非常高，所提出的方法仍然优于竞争技术。

更新时间: 2024-06-10 17:20:13

领域: cs.LG,q-bio.QM

下载: http://arxiv.org/abs/2406.06479v1

Training Dynamics of Multi-Head Softmax Attention for In-Context Learning: Emergence, Convergence, and Optimality

We study the dynamics of gradient flow for training a multi-head softmax attention model for in-context learning of multi-task linear regression. We establish the global convergence of gradient flow under suitable choices of initialization. In addition, we prove that an interesting "task allocation" phenomenon emerges during the gradient flow dynamics, where each attention head focuses on solving a single task of the multi-task model. Specifically, we prove that the gradient flow dynamics can be split into three phases -- a warm-up phase where the loss decreases rather slowly and the attention heads gradually build up their inclination towards individual tasks, an emergence phase where each head selects a single task and the loss rapidly decreases, and a convergence phase where the attention parameters converge to a limit. Furthermore, we prove the optimality of gradient flow in the sense that the limiting model learned by gradient flow is on par with the best possible multi-head softmax attention model up to a constant factor. Our analysis also delineates a strict separation in terms of the prediction accuracy of ICL between single-head and multi-head attention models. The key technique for our convergence analysis is to map the gradient flow dynamics in the parameter space to a set of ordinary differential equations in the spectral domain, where the relative magnitudes of the semi-singular values of the attention weights determines task allocation. To our best knowledge, our work provides the first convergence result for the multi-head softmax attention model.

Updated: 2024-06-10 17:18:07

标题: 多头Softmax注意力的培训动态对于上下文学习的影响：出现、收敛和最优性

摘要: 我们研究了用于上下文学习多任务线性回归的多头softmax注意力模型的梯度流动力学。在适当选择初始化的情况下，我们建立了梯度流的全局收敛性。此外，我们证明了在梯度流动力学过程中出现了一个有趣的“任务分配”现象，其中每个注意力头都专注于解决多任务模型的单个任务。具体地，我们证明了梯度流动力学可以分为三个阶段--一个热身阶段，在这个阶段损失下降得相当缓慢，注意力头逐渐倾向于各自的任务，一个出现阶段，在这个阶段每个头选择一个单一任务，损失迅速下降，以及一个收敛阶段，在这个阶段注意力参数收敛到一个极限。此外，我们证明了梯度流在一定意义上的最优性，即通过梯度流学习得到的极限模型与最佳多头softmax注意力模型相比具有相同的水平。我们的分析还在ICL的预测准确性方面明确了单头和多头注意力模型之间的严格分离。我们收敛分析的关键技术是将参数空间中的梯度流动力学映射到频谱域中的一组常微分方程，其中注意力权重的半奇异值的相对大小决定了任务分配。据我们所知，我们的工作为多头softmax注意力模型提供了第一个收敛结果。

更新时间: 2024-06-10 17:18:07

领域: cs.LG,cs.AI,math.OC,math.ST,stat.ML,stat.TH

下载: http://arxiv.org/abs/2402.19442v2

Survey for Landing Generative AI in Social and E-commerce Recsys -- the Industry Perspectives

Recently, generative AI (GAI), with their emerging capabilities, have presented unique opportunities for augmenting and revolutionizing industrial recommender systems (Recsys). Despite growing research efforts at the intersection of these fields, the integration of GAI into industrial Recsys remains in its infancy, largely due to the intricate nature of modern industrial Recsys infrastructure, operations, and product sophistication. Drawing upon our experiences in successfully integrating GAI into several major social and e-commerce platforms, this survey aims to comprehensively examine the underlying system and AI foundations, solution frameworks, connections to key research advancements, as well as summarize the practical insights and challenges encountered in the endeavor to integrate GAI into industrial Recsys. As pioneering work in this domain, we hope outline the representative developments of relevant fields, shed lights on practical GAI adoptions in the industry, and motivate future research.

Updated: 2024-06-10 17:16:59

标题: 调查社交和电子商务推荐系统中生成式人工智能的落地情况--行业视角

摘要: 最近，生成式人工智能(GAI)凭借其新兴的能力，为增强和革新工业推荐系统(Recsys)提供了独特的机遇。尽管在这些领域交叉处有越来越多的研究工作，但由于现代工业Recsys基础设施、运营和产品复杂性的复杂性，将GAI整合到工业Recsys中仍处于起步阶段。借鉴我们成功将GAI整合到几个主要社交和电子商务平台的经验，本调查旨在全面审视潜在的系统和人工智能基础、解决方案框架、与关键研究进展的联系，以及总结在将GAI整合到工业Recsys的努力中遇到的实际见解和挑战。作为该领域的开拓性工作，我们希望概述相关领域的代表性发展，为行业中实际采用GAI提供启示，并激励未来的研究。

更新时间: 2024-06-10 17:16:59

领域: cs.IR,cs.AI

下载: http://arxiv.org/abs/2406.06475v1

Towards a Personal Health Large Language Model

In health, most large language model (LLM) research has focused on clinical tasks. However, mobile and wearable devices, which are rarely integrated into such tasks, provide rich, longitudinal data for personal health monitoring. Here we present Personal Health Large Language Model (PH-LLM), fine-tuned from Gemini for understanding and reasoning over numerical time-series personal health data. We created and curated three datasets that test 1) production of personalized insights and recommendations from sleep patterns, physical activity, and physiological responses, 2) expert domain knowledge, and 3) prediction of self-reported sleep outcomes. For the first task we designed 857 case studies in collaboration with domain experts to assess real-world scenarios in sleep and fitness. Through comprehensive evaluation of domain-specific rubrics, we observed that Gemini Ultra 1.0 and PH-LLM are not statistically different from expert performance in fitness and, while experts remain superior for sleep, fine-tuning PH-LLM provided significant improvements in using relevant domain knowledge and personalizing information for sleep insights. We evaluated PH-LLM domain knowledge using multiple choice sleep medicine and fitness examinations. PH-LLM achieved 79% on sleep and 88% on fitness, exceeding average scores from a sample of human experts. Finally, we trained PH-LLM to predict self-reported sleep quality outcomes from textual and multimodal encoding representations of wearable data, and demonstrate that multimodal encoding is required to match performance of specialized discriminative models. Although further development and evaluation are necessary in the safety-critical personal health domain, these results demonstrate both the broad knowledge and capabilities of Gemini models and the benefit of contextualizing physiological data for personal health applications as done with PH-LLM.

Updated: 2024-06-10 17:16:49

标题: 走向个人健康大语言模型

摘要: 在健康领域，大多数大型语言模型（LLM）研究都集中在临床任务上。然而，移动和可穿戴设备很少与这些任务整合，为个人健康监测提供了丰富的、纵向的数据。在这里，我们介绍了个人健康大型语言模型（PH-LLM），从Gemini进行了微调，用于理解和推理数值时间序列个人健康数据。我们创建和整理了三个数据集，用于测试：1）从睡眠模式、体力活动和生理反应中产生个性化见解和建议，2）专家领域知识，以及3）预测自我报告的睡眠结果。对于第一个任务，我们与领域专家合作设计了857个案例研究，以评估睡眠和健身领域的真实场景。通过对领域特定评分标准的全面评估，我们观察到Gemini Ultra 1.0和PH-LLM在健身方面与专家表现没有统计学差异，而专家在睡眠方面仍然优于其，微调PH-LLM在使用相关领域知识和个性化信息为睡眠见解提供了显著改进。我们使用多项选择睡眠医学和健身考试评估了PH-LLM的领域知识。PH-LLM在睡眠方面取得了79%的成绩，在健身方面取得了88%的成绩，超过了一组人类专家的平均分数。最后，我们训练了PH-LLM来预测从可穿戴数据的文本和多模态编码表示中自我报告的睡眠质量结果，并展示了多模态编码是必需的，以匹配专门的辨别模型的性能。尽管在安全关键的个人健康领域中需要进一步的发展和评估，但这些结果既展示了Gemini模型的广泛知识和能力，也展示了将生理数据情境化为个人健康应用的益处，就像PH-LLM所做的那样。

更新时间: 2024-06-10 17:16:49

领域: cs.AI,cs.CL

下载: http://arxiv.org/abs/2406.06474v1

DiffAudit: Auditing Privacy Practices of Online Services for Children and Adolescents

Children's and adolescents' online data privacy are regulated by laws such as the Children's Online Privacy Protection Act (COPPA) and the California Consumer Privacy Act (CCPA). Online services that are directed towards general audiences (i.e., including children, adolescents, and adults) must comply with these laws. In this paper, first, we present DiffAudit, a platform-agnostic privacy auditing methodology for general audience services. DiffAudit performs differential analysis of network traffic data flows to compare data processing practices (i) between child, adolescent, and adult users and (ii) before and after consent is given and user age is disclosed. We also present a data type classification method that utilizes GPT-4 and our data type ontology based on COPPA and CCPA, allowing us to identify considerably more data types than prior work. Second, we apply DiffAudit to a set of popular general audience mobile and web services and observe a rich set of behaviors extracted from over 440K outgoing requests, containing 3,968 unique data types we extracted and classified. We reveal problematic data processing practices prior to consent and age disclosure, lack of differentiation between age-specific data flows, inconsistent privacy policy disclosures, and sharing of linkable data with third parties, including advertising and tracking services.

Updated: 2024-06-10 17:14:53

标题: DiffAudit：审计面向儿童和青少年的在线服务的隐私实践

摘要: 儿童和青少年的在线数据隐私受到《儿童在线隐私保护法》（COPPA）和《加利福尼亚消费者隐私法》（CCPA）等法律的监管。针对广大受众（包括儿童、青少年和成人）的在线服务必须遵守这些法律。在本文中，首先我们介绍了DiffAudit，这是一个面向广大受众服务的平台无关隐私审计方法。DiffAudit通过对网络流量数据的差异分析，比较数据处理实践（i）在儿童、青少年和成人用户之间以及（ii）在同意给予和用户年龄披露之前和之后的情况。我们还提出了一种利用GPT-4和基于COPPA和CCPA的数据类型本体论的数据类型分类方法，使我们能够识别比以前的工作更多的数据类型。其次，我们将DiffAudit应用于一组热门的广泛受众移动和网络服务，并观察从超过440K的出站请求中提取和分类的3,968种独特数据类型中提取的丰富行为。我们揭示了在同意和年龄披露之前存在的问题数据处理实践，缺乏对特定年龄数据流的区分，不一致的隐私政策披露，以及与第三方（包括广告和跟踪服务）分享可链接数据。

更新时间: 2024-06-10 17:14:53

领域: cs.CR

下载: http://arxiv.org/abs/2406.06473v1

GTBench: Uncovering the Strategic Reasoning Limitations of LLMs via Game-Theoretic Evaluations

As Large Language Models (LLMs) are integrated into critical real-world applications, their strategic and logical reasoning abilities are increasingly crucial. This paper evaluates LLMs' reasoning abilities in competitive environments through game-theoretic tasks, e.g., board and card games that require pure logic and strategic reasoning to compete with opponents. We first propose GTBench, a language-driven environment composing 10 widely recognized tasks, across a comprehensive game taxonomy: complete versus incomplete information, dynamic versus static, and probabilistic versus deterministic scenarios. Then, we (1) Characterize the game-theoretic reasoning of LLMs; and (2) Perform LLM-vs.-LLM competitions as reasoning evaluation. We observe that (1) LLMs have distinct behaviors regarding various gaming scenarios; for example, LLMs fail in complete and deterministic games yet they are competitive in probabilistic gaming scenarios; (2) Most open-source LLMs, e.g., CodeLlama-34b-Instruct and Llama-2-70b-chat, are less competitive than commercial LLMs, e.g., GPT-4, in complex games, yet the recently released Llama-3-70b-Instruct makes up for this shortcoming. In addition, code-pretraining greatly benefits strategic reasoning, while advanced reasoning methods such as Chain-of-Thought (CoT) and Tree-of-Thought (ToT) do not always help. We further characterize the game-theoretic properties of LLMs, such as equilibrium and Pareto Efficiency in repeated games. Detailed error profiles are provided for a better understanding of LLMs' behavior. We hope our research provides standardized protocols and serves as a foundation to spur further explorations in the strategic reasoning of LLMs.

Updated: 2024-06-10 17:14:09

标题: GTBench：通过博弈论评估揭示LLM的战略推理局限性

摘要: 随着大型语言模型（LLMs）被整合到关键的现实世界应用程序中，它们的战略和逻辑推理能力变得越来越关键。本文通过博弈论任务（例如需要纯逻辑和战略推理来与对手竞争的棋盘和卡牌游戏）评估LLMs在竞争环境中的推理能力。我们首先提出了GTBench，这是一个由10个广泛认可的任务组成的以语言驱动的环境，涵盖了一个全面的游戏分类法：完整与不完整信息，动态与静态，以及概率与确定性场景。然后，我们（1）表征了LLMs的博弈论推理能力；（2）进行了LLM对LLM的竞争以评估推理能力。我们观察到（1）LLMs在不同的游戏场景中表现出不同的行为；例如，在完整和确定性游戏中LLMs失败，但在概率游戏场景中具有竞争力；（2）大多数开源LLMs（例如CodeLlama-34b-Instruct和Llama-2-70b-chat）在复杂游戏中不如商业LLMs（例如GPT-4）竞争力强，然而最近发布的Llama-3-70b-Instruct弥补了这一不足。此外，代码预训练极大地有利于战略推理，而像CoT（Chain-of-Thought）和ToT（Tree-of-Thought）这样的高级推理方法并不总是有帮助的。我们进一步表征了LLMs的博弈论特性，例如在重复游戏中的均衡和帕累托效率。为了更好地理解LLMs的行为，我们提供了详细的错误分析。我们希望我们的研究提供了标准化的协议，并作为激发LLMs战略推理进一步探索的基础。

更新时间: 2024-06-10 17:14:09

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2402.12348v2

The fast committor machine: Interpretable prediction with kernels

In the study of stochastic systems, the committor function describes the probability that a system starting from an initial configuration $x$ will reach a set $B$ before a set $A$. This paper introduces an efficient and interpretable algorithm for approximating the committor, called the "fast committor machine" (FCM). The FCM uses simulated trajectory data to build a kernel-based model of the committor. The kernel function is constructed to emphasize low-dimensional subspaces which optimally describe the $A$ to $B$ transitions. The coefficients in the kernel model are determined using randomized linear algebra, leading to a runtime that scales linearly in the number of data points. In numerical experiments involving a triple-well potential and alanine dipeptide, the FCM yields higher accuracy and trains more quickly than a neural network with the same number of parameters. The FCM is also more interpretable than the neural net.

Updated: 2024-06-10 17:13:52

标题: 快速提交机器：基于核的可解释预测

摘要: 在随机系统的研究中，committor函数描述了从初始配置$x$开始的系统在达到集合$B$之前到达集合$A$的概率。本文介绍了一种用于近似committor的高效且可解释的算法，称为“快速committor机器”（FCM）。FCM利用模拟轨迹数据构建committor的基于核的模型。核函数被构造为强调最佳描述$A$到$B$转换的低维子空间。核模型中的系数使用随机线性代数确定，导致运行时按数据点数线性缩放。在涉及三井势和丙氨酸二肽的数值实验中，FCM比具有相同参数数量的神经网络具有更高的准确性并且训练速度更快。FCM也比神经网络更具可解释性。

更新时间: 2024-06-10 17:13:52

领域: math.NA,cs.LG,cs.NA,stat.ML,82C31, 82C32, 65C30, 65C40

下载: http://arxiv.org/abs/2405.10410v2

GKAN: Graph Kolmogorov-Arnold Networks

We introduce Graph Kolmogorov-Arnold Networks (GKAN), an innovative neural network architecture that extends the principles of the recently proposed Kolmogorov-Arnold Networks (KAN) to graph-structured data. By adopting the unique characteristics of KANs, notably the use of learnable univariate functions instead of fixed linear weights, we develop a powerful model for graph-based learning tasks. Unlike traditional Graph Convolutional Networks (GCNs) that rely on a fixed convolutional architecture, GKANs implement learnable spline-based functions between layers, transforming the way information is processed across the graph structure. We present two different ways to incorporate KAN layers into GKAN: architecture 1 -- where the learnable functions are applied to input features after aggregation and architecture 2 -- where the learnable functions are applied to input features before aggregation. We evaluate GKAN empirically using a semi-supervised graph learning task on a real-world dataset (Cora). We find that architecture generally performs better. We find that GKANs achieve higher accuracy in semi-supervised learning tasks on graphs compared to the traditional GCN model. For example, when considering 100 features, GCN provides an accuracy of 53.5 while a GKAN with a comparable number of parameters gives an accuracy of 61.76; with 200 features, GCN provides an accuracy of 61.24 while a GKAN with a comparable number of parameters gives an accuracy of 67.66. We also present results on the impact of various parameters such as the number of hidden nodes, grid-size, and the polynomial-degree of the spline on the performance of GKAN.

Updated: 2024-06-10 17:09:38

标题: GKAN：图科尔莫戈洛夫-阿诺德网络

摘要: 我们介绍了图科尔莫高洛夫-阿诺德网络（GKAN），这是一种创新的神经网络架构，将最近提出的科尔莫高洛夫-阿诺德网络（KAN）的原则扩展到图结构化数据中。通过采用KAN的独特特征，特别是使用可学习的单变量函数而不是固定线性权重，我们为基于图的学习任务开发了一个强大的模型。与依赖固定卷积架构的传统图卷积网络（GCN）不同，GKAN在层之间实现了可学习的基于样条的函数，改变了信息在图结构中的处理方式。我们提出了两种不同的方法来将KAN层合并到GKAN中：架构1 - 在聚合后将可学习函数应用于输入特征；架构2 - 在聚合前将可学习函数应用于输入特征。我们在一个真实世界数据集（Cora）上通过半监督图学习任务对GKAN进行了实证评估。我们发现架构通常表现更好。我们发现，与传统GCN模型相比，GKAN在图的半监督学习任务中实现了更高的准确性。例如，考虑100个特征时，GCN提供53.5的准确性，而具有相同参数数量的GKAN提供61.76的准确性；考虑200个特征时，GCN提供61.24的准确性，而具有相同参数数量的GKAN提供67.66的准确性。我们还展示了各种参数（如隐藏节点数、网格大小和样条的多项式次数）对GKAN性能的影响结果。

更新时间: 2024-06-10 17:09:38

领域: cs.LG,cs.AI,stat.ML

下载: http://arxiv.org/abs/2406.06470v1

Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning

Language agents perform complex tasks by using tools to execute each step precisely. However, most existing agents are based on proprietary models or designed to target specific tasks, such as mathematics or multi-hop question answering. We introduce Husky, a holistic, open-source language agent that learns to reason over a unified action space to address a diverse set of complex tasks involving numerical, tabular, and knowledge-based reasoning. Husky iterates between two stages: 1) generating the next action to take towards solving a given task and 2) executing the action using expert models and updating the current solution state. We identify a thorough ontology of actions for addressing complex tasks and curate high-quality data to train expert models for executing these actions. Our experiments show that Husky outperforms prior language agents across 14 evaluation datasets. Moreover, we introduce HuskyQA, a new evaluation set which stress tests language agents for mixed-tool reasoning, with a focus on retrieving missing knowledge and performing numerical reasoning. Despite using 7B models, Husky matches or even exceeds frontier LMs such as GPT-4 on these tasks, showcasing the efficacy of our holistic approach in addressing complex reasoning problems. Our code and models are available at https://github.com/agent-husky/Husky-v1.

Updated: 2024-06-10 17:07:25

标题: 哈士奇：一个统一的、开源的语言智能体，用于多步推理

摘要: 语言代理通过使用工具来精确执行每个步骤来执行复杂的任务。然而，大多数现有的代理都基于专有模型或设计用于针对特定任务，比如数学或多跳问题回答。我们介绍了Husky，这是一个整体的、开源的语言代理，它学会了在统一的行动空间上进行推理，以解决涉及数字、表格和基于知识的推理的各种复杂任务。Husky在两个阶段之间迭代：1）生成下一个行动以解决给定任务，2）使用专家模型执行行动并更新当前解决方案状态。我们确定了用于解决复杂任务的行动的详尽本体论，并整理了高质量的数据来训练执行这些行动的专家模型。我们的实验表明，Husky在14个评估数据集上表现优于先前的语言代理。此外，我们介绍了HuskyQA，这是一个新的评估集，用于对语言代理进行混合工具推理的压力测试，重点是检索缺失知识和进行数值推理。尽管使用了7B模型，Husky在这些任务上与前沿的语言模型如GPT-4相匹敌甚至超越，展示了我们整体方法在解决复杂推理问题方面的有效性。我们的代码和模型可在https://github.com/agent-husky/Husky-v1上找到。

更新时间: 2024-06-10 17:07:25

领域: cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2406.06469v1

Active Neural 3D Reconstruction with Colorized Surface Voxel-based View Selection

Active view selection in 3D scene reconstruction has been widely studied since training on informative views is critical for reconstruction. Recently, Neural Radiance Fields (NeRF) variants have shown promising results in active 3D reconstruction using uncertainty-guided view selection. They utilize uncertainties estimated with neural networks that encode scene geometry and appearance. However, the choice of uncertainty integration methods, either voxel-based or neural rendering, has conventionally depended on the types of scene uncertainty being estimated, whether geometric or appearance-related. In this paper, we introduce Colorized Surface Voxel (CSV)-based view selection, a new next-best view (NBV) selection method exploiting surface voxel-based measurement of uncertainty in scene appearance. CSV encapsulates the uncertainty of estimated scene appearance (e.g., color uncertainty) and estimated geometric information (e.g., surface). Using the geometry information, we interpret the uncertainty of scene appearance 3D-wise during the aggregation of the per-voxel uncertainty. Consequently, the uncertainty from occluded and complex regions is recognized under challenging scenarios with limited input data. Our method outperforms previous works on popular datasets, DTU and Blender, and our new dataset with imbalanced viewpoints, showing that the CSV-based view selection significantly improves performance by up to 30%.

Updated: 2024-06-10 17:05:28

标题: 主动神经网络三维重建与着色表面体素视图选择

摘要: 三维场景重建中的主动视图选择已经得到广泛研究，因为在重建过程中训练具有信息量的视图至关重要。最近，神经辐射场（NeRF）的变种已经展示出在使用基于不确定性引导的视图选择进行主动三维重建方面有着令人期待的结果。它们利用了用神经网络估计的不确定性来编码场景的几何和外观。然而，不确定性整合方法的选择，无论是基于体素还是神经渲染，传统上取决于正在估计的场景不确定性的类型，无论是几何还是外观相关的。在本文中，我们介绍了基于着色表面体素（CSV）的视图选择，这是一种新的下一个最佳视图（NBV）选择方法，利用基于表面体素的场景外观不确定性测量。CSV 封装了估计的场景外观（例如，颜色不确定性）和估计的几何信息（例如，表面）的不确定性。利用几何信息，在每个体素的不确定性聚合过程中，我们对场景外观的不确定性进行了三维解释。因此，在具有有限输入数据的挑战性场景下，能够识别遮挡和复杂区域的不确定性。我们的方法在流行数据集 DTU 和 Blender 上优于先前的作品，还在我们的新数据集中表现出色，该数据集具有不平衡的视点，表明基于 CSV 的视图选择通过提高性能高达 30%。

更新时间: 2024-06-10 17:05:28

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2405.02568v2

How Far Can Transformers Reason? The Locality Barrier and Inductive Scratchpad

Can Transformers predict new syllogisms by composing established ones? More generally, what type of targets can be learned by such models from scratch? Recent works show that Transformers can be Turing-complete in terms of expressivity, but this does not address the learnability objective. This paper puts forward the notion of 'distribution locality' to capture when weak learning is efficiently achievable by regular Transformers, where the locality measures the least number of tokens required in addition to the tokens histogram to correlate nontrivially with the target. As shown experimentally and theoretically under additional assumptions, distributions with high locality cannot be learned efficiently. In particular, syllogisms cannot be composed on long chains. Furthermore, we show that (i) an agnostic scratchpad cannot help to break the locality barrier, (ii) an educated scratchpad can help if it breaks the locality at each step, (iii) a notion of 'inductive scratchpad' can both break the locality and improve the out-of-distribution generalization, e.g., generalizing to almost double input size for some arithmetic tasks.

Updated: 2024-06-10 17:05:12

标题: 变压器能够推理到什么程度？局部性障碍和归纳式记事板

摘要: 变压器能否通过组合已建立的三段论来预测新的三段论？更普遍地说，此类模型能够从零开始学习什么类型的目标？最近的研究表明，变压器在表达能力上可以达到图灵完备，但这并不涉及学习目标。本文提出了“分布局部性”这一概念，以捕捉普通变压器有效实现弱学习时的情况，其中局部性衡量了与目标相关的非平凡性所需的最少数量的标记，除了标记直方图之外。通过实验证明，高局部性的分布无法有效学习。特别地，三段论无法在长链上组合。此外，我们还表明：（i）一种不可知的记事本无法帮助突破局部性障碍，（ii）一个受过教育的记事本可以在每一步突破局部性时有所帮助，（iii）一个“归纳性记事本”的概念既能突破局部性，又能改善超出分布的泛化能力，例如在某些算术任务中将输入大小几乎提高一倍。

更新时间: 2024-06-10 17:05:12

领域: cs.LG,cs.AI,stat.ML

下载: http://arxiv.org/abs/2406.06467v1

Explainable Artificial Intelligence Techniques for Accurate Fault Detection and Diagnosis: A Review

As the manufacturing industry advances with sensor integration and automation, the opaque nature of deep learning models in machine learning poses a significant challenge for fault detection and diagnosis. And despite the related predictive insights Artificial Intelligence (AI) can deliver, advanced machine learning engines often remain a black box. This paper reviews the eXplainable AI (XAI) tools and techniques in this context. We explore various XAI methodologies, focusing on their role in making AI decision-making transparent, particularly in critical scenarios where humans are involved. We also discuss current limitations and potential future research that aims to balance explainability with model performance while improving trustworthiness in the context of AI applications for critical industrial use cases.

Updated: 2024-06-10 17:04:10

标题: 可解释的人工智能技术用于准确的故障检测和诊断：一项综述

摘要: 随着制造业在传感器集成和自动化方面的不断发展，机器学习中深度学习模型的不透明性对于故障检测和诊断构成了重大挑战。尽管人工智能（AI）能够提供相关的预测洞察力，但先进的机器学习引擎往往仍然是一个黑盒。本文回顾了在这一背景下的可解释人工智能（XAI）工具和技术。我们探讨了各种XAI方法论，重点关注它们在使AI决策透明化方面的作用，特别是在涉及人类的关键场景中。我们还讨论了当前的局限性以及未来可能的研究，旨在在改进AI应用于关键工业用例的情况下平衡可解释性与模型性能的同时提高可信度。

更新时间: 2024-06-10 17:04:10

领域: cs.AI,cs.LG

下载: http://arxiv.org/abs/2404.11597v2

AID: Adapting Image2Video Diffusion Models for Instruction-guided Video Prediction

Text-guided video prediction (TVP) involves predicting the motion of future frames from the initial frame according to an instruction, which has wide applications in virtual reality, robotics, and content creation. Previous TVP methods make significant breakthroughs by adapting Stable Diffusion for this task. However, they struggle with frame consistency and temporal stability primarily due to the limited scale of video datasets. We observe that pretrained Image2Video diffusion models possess good priors for video dynamics but they lack textual control. Hence, transferring Image2Video models to leverage their video dynamic priors while injecting instruction control to generate controllable videos is both a meaningful and challenging task. To achieve this, we introduce the Multi-Modal Large Language Model (MLLM) to predict future video states based on initial frames and text instructions. More specifically, we design a dual query transformer (DQFormer) architecture, which integrates the instructions and frames into the conditional embeddings for future frame prediction. Additionally, we develop Long-Short Term Temporal Adapters and Spatial Adapters that can quickly transfer general video diffusion models to specific scenarios with minimal training costs. Experimental results show that our method significantly outperforms state-of-the-art techniques on four datasets: Something Something V2, Epic Kitchen-100, Bridge Data, and UCF-101. Notably, AID achieves 91.2% and 55.5% FVD improvements on Bridge and SSv2 respectively, demonstrating its effectiveness in various domains. More examples can be found at our website https://chenhsing.github.io/AID.

Updated: 2024-06-10 17:02:08

标题: AID：为指导视频预测调整图像到视频扩散模型

摘要: 文本引导的视频预测（TVP）涉及根据指令从初始帧预测未来帧的运动，这在虚拟现实、机器人技术和内容创作等领域具有广泛的应用。先前的TVP方法通过将稳定扩散应用于这一任务取得了重大突破。然而，由于视频数据集的规模有限，它们在帧一致性和时间稳定性方面仍存在困难。我们观察到，预训练的Image2Video扩散模型具有良好的视频动态先验，但它们缺乏文本控制。因此，将Image2Video模型转移到注入指令控制以生成可控视频是一项有意义且具有挑战性的任务。为了实现这一目标，我们引入了多模态大语言模型（MLLM）来基于初始帧和文本指令预测未来视频状态。更具体地，我们设计了一个双查询变压器（DQFormer）架构，将指令和帧集成到未来帧预测的条件嵌入中。此外，我们开发了长短期时间适配器和空间适配器，可以快速将一般视频扩散模型转移到具体场景，而训练成本很低。实验结果表明，我们的方法在四个数据集上显著优于最先进的技术：Something Something V2，Epic Kitchen-100，Bridge Data和UCF-101。值得注意的是，AID在Bridge和SSv2上分别取得了91.2％和55.5％的FVD改进，证明了其在各个领域的有效性。更多示例可在我们的网站https://chenhsing.github.io/AID找到。

更新时间: 2024-06-10 17:02:08

领域: cs.CV,cs.AI,cs.CL,cs.LG,cs.MM

下载: http://arxiv.org/abs/2406.06465v1

Transforming Wearable Data into Health Insights using Large Language Model Agents

Despite the proliferation of wearable health trackers and the importance of sleep and exercise to health, deriving actionable personalized insights from wearable data remains a challenge because doing so requires non-trivial open-ended analysis of these data. The recent rise of large language model (LLM) agents, which can use tools to reason about and interact with the world, presents a promising opportunity to enable such personalized analysis at scale. Yet, the application of LLM agents in analyzing personal health is still largely untapped. In this paper, we introduce the Personal Health Insights Agent (PHIA), an agent system that leverages state-of-the-art code generation and information retrieval tools to analyze and interpret behavioral health data from wearables. We curate two benchmark question-answering datasets of over 4000 health insights questions. Based on 650 hours of human and expert evaluation we find that PHIA can accurately address over 84% of factual numerical questions and more than 83% of crowd-sourced open-ended questions. This work has implications for advancing behavioral health across the population, potentially enabling individuals to interpret their own wearable data, and paving the way for a new era of accessible, personalized wellness regimens that are informed by data-driven insights.

Updated: 2024-06-10 17:00:54

标题: 利用大型语言模型代理将可穿戴设备数据转化为健康见解

摘要: 尽管可穿戴健康追踪器的普及和睡眠和锻炼对健康的重要性，但从可穿戴数据中获取可操作的个性化见解仍然是一个挑战，因为这需要对这些数据进行复杂的开放式分析。最近大型语言模型(LLM)代理的兴起，这些代理可以使用工具来推理和与世界互动，为在规模上实现个性化分析提供了有希望的机会。然而，在分析个人健康方面应用LLM代理仍然大部分有待开发。在本文中，我们介绍了个人健康见解代理(PHIA)，这是一个利用最先进的代码生成和信息检索工具来分析和解释可穿戴设备中的行为健康数据的代理系统。我们策划了两个超过4000个健康见解问题的基准问答数据集。根据650小时的人类和专家评估，我们发现PHIA能够准确回答超过84%的事实性数字问题和超过83%的众包开放式问题。这项工作对推进整个人群的行为健康具有重要意义，有可能使个人能够解释自己的可穿戴数据，并为基于数据驱动见解的新时代的可访问、个性化的健康计划铺平道路。

更新时间: 2024-06-10 17:00:54

领域: cs.AI,cs.CL

下载: http://arxiv.org/abs/2406.06464v1

VCR: Visual Caption Restoration

We introduce Visual Caption Restoration (VCR), a novel vision-language task that challenges models to accurately restore partially obscured texts using pixel-level hints within images. This task stems from the observation that text embedded in images is intrinsically different from common visual elements and natural language due to the need to align the modalities of vision, text, and text embedded in images. While numerous works have integrated text embedded in images into visual question-answering tasks, approaches to these tasks generally rely on optical character recognition or masked language modeling, thus reducing the task to mainly text-based processing. However, text-based processing becomes ineffective in VCR as accurate text restoration depends on the combined information from provided images, context, and subtle cues from the tiny exposed areas of masked texts. We develop a pipeline to generate synthetic images for the VCR task using image-caption pairs, with adjustable caption visibility to control the task difficulty. With this pipeline, we construct a dataset for VCR called VCR-Wiki using images with captions from Wikipedia, comprising 2.11M English and 346K Chinese entities in both easy and hard split variants. Our results reveal that current vision language models significantly lag behind human performance in the VCR task, and merely fine-tuning the models on our dataset does not lead to notable improvements. We release VCR-Wiki and the data construction code to facilitate future research.

Updated: 2024-06-10 16:58:48

标题: VCR：视觉字幕修复

摘要: 我们介绍了Visual Caption Restoration（VCR），这是一项新颖的视觉-语言任务，挑战模型使用图像中的像素级提示准确恢复部分遮挡的文本。这项任务源于观察到，嵌入图像中的文本与常见的视觉元素和自然语言本质上有所不同，因为需要对视觉、文本和嵌入图像中的文本的模态进行对齐。虽然许多研究将嵌入图像中的文本整合到视觉问答任务中，但对这些任务的方法通常依赖于光学字符识别或遮罪语言建模，从而将任务主要转化为基于文本的处理。然而，在VCR中，基于文本的处理变得无效，因为准确的文本恢复取决于提供的图像、上下文和遮挡文本的微小暴露区域的微妙线索的综合信息。我们开发了一个管道，使用图像-标题对生成VCR任务的合成图像，可以调整标题的可见性来控制任务的难度。通过这个管道，我们构建了一个名为VCR-Wiki的VCR数据集，使用来自维基百科的带标题的图像，包括211万英文实体和34.6万中文实体，分为简单和困难两个变体。我们的结果显示，当前的视觉语言模型在VCR任务中明显落后于人类表现，仅仅在我们的数据集上微调模型并不能带来显著的改进。我们发布了VCR-Wiki和数据构建代码，以促进未来的研究。

更新时间: 2024-06-10 16:58:48

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2406.06462v1

The Synchronic Web

The Synchronic Web is a distributed network for securing data provenance on the World Wide Web. By enabling clients around the world to freely commit digital information into a single shared view of history, it provides a foundational basis of truth on which to build decentralized and scalable trust across the Internet. Its core cryptographical capability allows mutually distrusting parties to create and verify statements of the following form: "I commit to this information--and only this information--at this moment in time." The backbone of the Synchronic Web infrastructure is a simple, small, and semantic-free blockchain that is accessible to any Internet-enabled entity. The infrastructure is maintained by a permissioned network of well-known servers, called notaries, and accessed by a permissionless group of clients, called ledgers. Through an evolving stack of flexible and composable semantic specifications, the parties cooperate to generate synchronic commitments over arbitrary data. When integrated with existing infrastructures, adapted to diverse domains, and scaled across the breadth of cyberspace, the Synchronic Web provides a ubiquitous mechanism to lock the world's data into unique points in discrete time and digital space.

Updated: 2024-06-10 16:58:01

标题: 《同步网络》

摘要: Synchronic Web 是一个分布式网络，用于确保数据溯源在全球范围内的网络上。通过使世界各地的客户能够自由地将数字信息提交到历史的单一共享视图中，它为建立互联网上的分散化和可扩展的信任提供了一个基本的真实基础。其核心加密能力允许互不信任的各方创建和验证以下形式的声明：“我承诺在此时刻提交此信息 - 仅此信息。” Synchronic Web 基础设施的骨干是一个简单、小型且无语义的区块链，可被任何联网实体访问。该基础设施由一组众所周知的服务器（称为公证人）维护，并由一组无需许可的客户（称为分类账）访问。通过不断发展的灵活和可组合的语义规范堆栈，各方合作生成对任意数据进行同步承诺。当与现有基础设施集成、适应各种领域并跨越网络空间的广度扩展时，Synchronic Web 提供了一个普遍的机制，将世界的数据锁定到离散时间和数字空间中的独特点。

更新时间: 2024-06-10 16:58:01

领域: cs.CR

下载: http://arxiv.org/abs/2301.10733v2

Towards Real-World Efficiency: Domain Randomization in Reinforcement Learning for Pre-Capture of Free-Floating Moving Targets by Autonomous Robots

In this research, we introduce a deep reinforcement learning-based control approach to address the intricate challenge of the robotic pre-grasping phase under microgravity conditions. Leveraging reinforcement learning eliminates the necessity for manual feature design, therefore simplifying the problem and empowering the robot to learn pre-grasping policies through trial and error. Our methodology incorporates an off-policy reinforcement learning framework, employing the soft actor-critic technique to enable the gripper to proficiently approach a free-floating moving object, ensuring optimal pre-grasp success. For effective learning of the pre-grasping approach task, we developed a reward function that offers the agent clear and insightful feedback. Our case study examines a pre-grasping task where a Robotiq 3F gripper is required to navigate towards a free-floating moving target, pursue it, and subsequently position itself at the desired pre-grasp location. We assessed our approach through a series of experiments in both simulated and real-world environments. The source code, along with recordings of real-world robot grasping, is available at Fanuc_Robotiq_Grasp.

Updated: 2024-06-10 16:54:51

标题: 走向现实世界的效率：领域随机化在强化学习中的应用，用于自主机器人预捕获自由浮动移动目标

摘要: 在这项研究中，我们引入了一种基于深度强化学习的控制方法，以解决微重力条件下机器人抓取前阶段的复杂挑战。利用强化学习消除了手动特征设计的必要性，从而简化了问题，并使机器人能够通过试错学习抓取前阶段的策略。我们的方法结合了一种离策略强化学习框架，采用了软演员-评论家技术，使夹爪能够有效地接近自由漂浮的移动物体，确保最佳的抓取成功。为了有效学习抓取前阶段任务，我们开发了一个奖励函数，为代理提供清晰而有见地的反馈。我们的案例研究考察了一个抓取前阶段任务，其中需要使用Robotiq 3F夹爪导航到一个自由漂浮的移动目标，追踪它，并随后定位到所需的抓取前位置。我们通过一系列在模拟和现实环境中的实验评估了我们的方法。源代码以及真实世界机器人抓取的录像可在Fanuc_Robotiq_Grasp上找到。

更新时间: 2024-06-10 16:54:51

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2406.06460v1

How Useful is Intermittent, Asynchronous Expert Feedback for Bayesian Optimization?

Bayesian optimization (BO) is an integral part of automated scientific discovery -- the so-called self-driving lab -- where human inputs are ideally minimal or at least non-blocking. However, scientists often have strong intuition, and thus human feedback is still useful. Nevertheless, prior works in enhancing BO with expert feedback, such as by incorporating it in an offline or online but blocking (arrives at each BO iteration) manner, are incompatible with the spirit of self-driving labs. In this work, we study whether a small amount of randomly arriving expert feedback that is being incorporated in a non-blocking manner can improve a BO campaign. To this end, we run an additional, independent computing thread on top of the BO loop to handle the feedback-gathering process. The gathered feedback is used to learn a Bayesian preference model that can readily be incorporated into the BO thread, to steer its exploration-exploitation process. Experiments on toy and chemistry datasets suggest that even just a few intermittent, asynchronous expert feedback can be useful for improving or constraining BO. This can especially be useful for its implication in improving self-driving labs, e.g. making them more data-efficient and less costly.

Updated: 2024-06-10 16:53:58

标题: 间歇性、异步专家反馈对贝叶斯优化有多大帮助？

摘要: 贝叶斯优化（BO）是自动科学发现的重要组成部分 - 所谓的自动化实验室 - 在这里，人类输入理想情况下应尽可能少，或者至少不会阻碍。然而，科学家通常有很强的直觉，因此人类反馈仍然是有用的。然而，先前的研究在将专家反馈与BO结合的工作中，例如将其以离线或在线但阻塞（在每个BO迭代中到达）的方式整合进来，与自动化实验室的精神不兼容。在这项工作中，我们研究了一小部分随机到达的专家反馈是否以非阻塞的方式整合进来能够改善BO活动。为此，我们在BO循环之上运行了一个额外的独立计算线程来处理反馈收集过程。收集到的反馈被用来学习一个贝叶斯偏好模型，该模型可以轻松地整合到BO线程中，以引导其探索-利用过程。对玩具和化学数据集的实验表明，即使是少量间歇性的、异步的专家反馈也可以有助于改善或限制BO。这对于改进自动化实验室尤其有用，例如使其更具数据效率和更少成本。

更新时间: 2024-06-10 16:53:58

领域: cs.LG

下载: http://arxiv.org/abs/2406.06459v1

A Large Language Model Pipeline for Breast Cancer Oncology

Large language models (LLMs) have demonstrated potential in the innovation of many disciplines. However, how they can best be developed for oncology remains underdeveloped. State-of-the-art OpenAI models were fine-tuned on a clinical dataset and clinical guidelines text corpus for two important cancer treatment factors, adjuvant radiation therapy and chemotherapy, using a novel Langchain prompt engineering pipeline. A high accuracy (0.85+) was achieved in the classification of adjuvant radiation therapy and chemotherapy for breast cancer patients. Furthermore, a confidence interval was formed from observational data on the quality of treatment from human oncologists to estimate the proportion of scenarios in which the model must outperform the original oncologist in its treatment prediction to be a better solution overall as 8.2% to 13.3%. Due to indeterminacy in the outcomes of cancer treatment decisions, future investigation, potentially a clinical trial, would be required to determine if this threshold was met by the models. Nevertheless, with 85% of U.S. cancer patients receiving treatment at local community facilities, these kinds of models could play an important part in expanding access to quality care with outcomes that lie, at minimum, close to a human oncologist.

Updated: 2024-06-10 16:44:48

标题: 一个用于乳腺癌肿瘤学的大型语言模型管道

摘要: 大型语言模型(LLMs)已经展示了在许多领域创新的潜力。然而，如何最好地为肿瘤学发展这些模型仍然不成熟。最先进的OpenAI模型在临床数据集和临床指南文本语料库上进行了微调，针对两个重要的癌症治疗因素，即辅助放射治疗和化疗，使用了一种新颖的Langchain提示工程流程。对于乳腺癌患者的辅助放射治疗和化疗的分类中取得了高准确度(0.85+)。此外，从人类肿瘤学家对治疗质量的观察数据中形成了一个置信区间，以估计模型必须在其治疗预测中超越原始肿瘤学家的比例，从而成为更好的整体解决方案，为8.2%至13.3%。由于癌症治疗决策结果的不确定性，未来的调查，可能是一项临床试验，将需要确定这些模型是否达到了这一阈值。然而，由于85%的美国癌症患者在当地社区设施接受治疗，这种模型可能在扩大获得优质护理的同时发挥重要作用，其结果至少接近于人类肿瘤学家。

更新时间: 2024-06-10 16:44:48

领域: cs.AI,cs.CL

下载: http://arxiv.org/abs/2406.06455v1

Estimating Heterogeneous Treatment Effects by Combining Weak Instruments and Observational Data

Accurately predicting conditional average treatment effects (CATEs) is crucial in personalized medicine and digital platform analytics. Since often the treatments of interest cannot be directly randomized, observational data is leveraged to learn CATEs, but this approach can incur significant bias from unobserved confounding. One strategy to overcome these limitations is to seek latent quasi-experiments in instrumental variables (IVs) for the treatment, for example, a randomized intent to treat or a randomized product recommendation. This approach, on the other hand, can suffer from low compliance, i.e., IV weakness. Some subgroups may even exhibit zero compliance meaning we cannot instrument for their CATEs at all. In this paper we develop a novel approach to combine IV and observational data to enable reliable CATE estimation in the presence of unobserved confounding in the observational data and low compliance in the IV data, including no compliance for some subgroups. We propose a two-stage framework that first learns biased CATEs from the observational data, and then applies a compliance-weighted correction using IV data, effectively leveraging IV strength variability across covariates. We characterize the convergence rates of our method and validate its effectiveness through a simulation study. Additionally, we demonstrate its utility with real data by analyzing the heterogeneous effects of 401(k) plan participation on wealth.

Updated: 2024-06-10 16:40:55

标题: 通过结合弱工具和观察数据估计异质治疗效应

摘要: 精确预测条件平均治疗效应（CATEs）在个性化医学和数字平台分析中至关重要。由于通常感兴趣的治疗无法直接随机化，因此利用观察数据学习CATEs，但这种方法可能会因未观察到的混杂而产生显著偏差。克服这些限制的一种策略是在治疗的工具变量（IVs）中寻找潜在的准实验，例如，随机意图治疗或随机产品推荐。另一方面，这种方法可能受到低遵从性的影响，即IV弱点。一些亚组甚至可能表现为零遵从性，这意味着我们根本无法为他们的CATEs提供工具变量。在本文中，我们开发了一种新方法，结合IV和观察数据，以在观察数据中存在未观察到的混杂和IV数据中低遵从性（包括一些亚组没有遵从性）的情况下实现可靠的CATE估计。我们提出了一个两阶段框架，首先从观察数据中学习偏差的CATEs，然后使用IV数据进行遵从性加权校正，有效利用IV在协变量上的强度变化。我们表征了我们方法的收敛速度，并通过模拟研究验证了其有效性。此外，我们通过分析401(k)计划参与对财富的异质效应，展示了其实用性。

更新时间: 2024-06-10 16:40:55

领域: stat.ME,cs.LG,stat.ML

下载: http://arxiv.org/abs/2406.06452v1

Insights from Social Shaping Theory: The Appropriation of Large Language Models in an Undergraduate Programming Course

The capability of large language models (LLMs) to generate, debug, and explain code has sparked the interest of researchers and educators in undergraduate programming, with many anticipating their transformative potential in programming education. However, decisions about why and how to use LLMs in programming education may involve more than just the assessment of an LLM's technical capabilities. Using the social shaping of technology theory as a guiding framework, our study explores how students' social perceptions influence their own LLM usage. We then examine the correlation of self-reported LLM usage with students' self-efficacy and midterm performances in an undergraduate programming course. Triangulating data from an anonymous end-of-course student survey (n = 158), a mid-course self-efficacy survey (n=158), student interviews (n = 10), self-reported LLM usage on homework, and midterm performances, we discovered that students' use of LLMs was associated with their expectations for their future careers and their perceptions of peer usage. Additionally, early self-reported LLM usage in our context correlated with lower self-efficacy and lower midterm scores, while students' perceived over-reliance on LLMs, rather than their usage itself, correlated with decreased self-efficacy later in the course.

Updated: 2024-06-10 16:40:14

标题: 社会塑造理论的启示：大型语言模型在本科编程课程中的应用

摘要: 大型语言模型（LLMs）生成、调试和解释代码的能力引起了研究人员和教育工作者对本科编程的兴趣，许多人期待它们在编程教育中的变革潜力。然而，关于为什么以及如何在编程教育中使用LLMs的决定可能涉及更多不仅仅是评估LLM的技术能力。使用技术社会塑造理论作为指导框架，我们的研究探讨了学生的社会认知如何影响他们自己对LLM的使用。然后，我们研究了学生自我报告的LLM使用与他们在本科编程课程中的自我效能和期中表现之间的相关性。通过匿名的课程结束学生调查（n=158）、中期自我效能调查（n=158）、学生访谈（n=10）、作业中的自我报告的LLM使用以及期中表现的数据三重验证，我们发现学生对LLMs的使用与他们对未来职业的期望和对同龄人使用的看法相关。此外，在我们的背景下早期自我报告的LLM使用与较低的自我效能和较低的期中成绩相关，而学生对LLMs过度依赖的感知，而不是使用本身，与课程后期降低的自我效能相关。

更新时间: 2024-06-10 16:40:14

领域: cs.HC,cs.AI,cs.CY

下载: http://arxiv.org/abs/2406.06451v1

Cometh: A continuous-time discrete-state graph diffusion model

Discrete-state denoising diffusion models led to state-of-the-art performance in graph generation, especially in the molecular domain. Recently, they have been transposed to continuous time, allowing more flexibility in the reverse process and a better trade-off between sampling efficiency and quality. Here, to leverage the benefits of both approaches, we propose Cometh, a continuous-time discrete-state graph diffusion model, integrating graph data into a continuous-time diffusion model framework. Empirically, we show that integrating continuous time leads to significant improvements across various metrics over state-of-the-art discrete-state diffusion models on a large set of molecular and non-molecular benchmark datasets.

Updated: 2024-06-10 16:39:39

标题: "Cometh：一种连续时间离散状态图扩散模型"

摘要: 离散状态去噪扩散模型在图生成中表现出色，特别是在分子领域。最近，它们已经转变为连续时间，允许在反向过程中更灵活，并在采样效率和质量之间达到更好的平衡。在这里，为了充分利用这两种方法的优势，我们提出了Cometh，一个连续时间离散状态图扩散模型，将图数据整合到连续时间扩散模型框架中。从经验上讲，我们展示了整合连续时间在各种指标上显著提高了大量分子和非分子基准数据集上的最先进的离散状态扩散模型。

更新时间: 2024-06-10 16:39:39

领域: cs.LG

下载: http://arxiv.org/abs/2406.06449v1

Deep Generative Modeling Reshapes Compression and Transmission: From Efficiency to Resiliency

Information theory and machine learning are inextricably linked and have even been referred to as "two sides of the same coin". One particularly elegant connection is the essential equivalence between probabilistic generative modeling and data compression or transmission. In this article, we reveal the dual-functionality of deep generative models that reshapes both data compression for efficiency and transmission error concealment for resiliency. We present how the contextual predictive capabilities of powerful generative models can be well positioned to be strong compressors and estimators. In this sense, we advocate for viewing the deep generative modeling problem through the lens of end-to-end communications, and evaluate the compression and error restoration capabilities of foundation generative models. We show that the kernel of many large generative models is powerful predictor that can capture complex relationships among semantic latent variables, and the communication viewpoints provide novel insights into semantic feature tokenization, contextual learning, and usage of deep generative models. In summary, our article highlights the essential connections of generative AI to source and channel coding techniques, and motivates researchers to make further explorations in this emerging topic.

Updated: 2024-06-10 16:36:02

标题: 深度生成建模改变了压缩和传输：从效率到弹性

摘要: 信息论和机器学习是密不可分的，并甚至被称为“同一枚硬币的两面”。其中一个特别优雅的联系是概率生成建模与数据压缩或传输之间的基本等价性。在本文中，我们揭示了深度生成模型的双重功能，重新塑造了数据压缩以提高效率和传输错误掩盖以增强弹性。我们展示了强大生成模型的上下文预测能力如何能够成为强大的压缩器和估计器。在这个意义上，我们主张通过端到端通信的视角来看待深度生成建模问题，并评估基础生成模型的压缩和错误恢复能力。我们展示了许多大型生成模型的核心是强大的预测器，能够捕捉语义潜在变量之间的复杂关系，而通信观点提供了有关语义特征标记、上下文学习和深度生成模型使用的新见解。总之，我们的文章强调了生成AI与源编码和信道编码技术的基本联系，并激励研究人员在这一新兴主题上进行进一步探索。

更新时间: 2024-06-10 16:36:02

领域: cs.IT,cs.LG,cs.MM,math.IT

下载: http://arxiv.org/abs/2406.06446v1

LLM Dataset Inference: Did you train on my dataset?

The proliferation of large language models (LLMs) in the real world has come with a rise in copyright cases against companies for training their models on unlicensed data from the internet. Recent works have presented methods to identify if individual text sequences were members of the model's training data, known as membership inference attacks (MIAs). We demonstrate that the apparent success of these MIAs is confounded by selecting non-members (text sequences not used for training) belonging to a different distribution from the members (e.g., temporally shifted recent Wikipedia articles compared with ones used to train the model). This distribution shift makes membership inference appear successful. However, most MIA methods perform no better than random guessing when discriminating between members and non-members from the same distribution (e.g., in this case, the same period of time). Even when MIAs work, we find that different MIAs succeed at inferring membership of samples from different distributions. Instead, we propose a new dataset inference method to accurately identify the datasets used to train large language models. This paradigm sits realistically in the modern-day copyright landscape, where authors claim that an LLM is trained over multiple documents (such as a book) written by them, rather than one particular paragraph. While dataset inference shares many of the challenges of membership inference, we solve it by selectively combining the MIAs that provide positive signal for a given distribution, and aggregating them to perform a statistical test on a given dataset. Our approach successfully distinguishes the train and test sets of different subsets of the Pile with statistically significant p-values < 0.1, without any false positives.

Updated: 2024-06-10 16:34:43

标题: LLM数据集推断：你是否在我的数据集上训练？

摘要: 大型语言模型（LLMs）在现实世界中的大量使用伴随着对公司侵犯版权的案件数量增加，这些公司训练模型时使用了未经许可的互联网数据。最近的研究提出了一种方法，用于识别个别文本序列是否是模型训练数据的成员，称为成员推理攻击（MIAs）。我们证明这些MIAs的表现成功与选择非成员（未用于训练的文本序列）有关，这些文本序列属于不同分布与成员（例如，与用于训练模型的文章相比的时间上移的最新维基百科文章）。这种分布偏移使得成员推理看起来成功。然而，大多数MIA方法在区分来自相同分布的成员和非成员（例如，在本例中是相同时间段）时并不比随机猜测更好。即使MIAs起作用，我们发现不同的MIAs成功地推断出来自不同分布的样本的成员资格。相反，我们提出了一种新的数据集推理方法，以准确识别用于训练大型语言模型的数据集。这一范式在现代版权领域中具有现实意义，作者声称LLM是基于他们撰写的多个文档（如一本书）进行训练，而不是特定的段落。虽然数据集推理面临许多成员推理的挑战，但我们通过有选择地结合为给定分布提供积极信号的MIAs，并对给定数据集执行统计测试来解决这些问题。我们的方法成功地区分了Pile不同子集的训练集和测试集，具有统计学意义的p值< 0.1，而没有任何假阳性。

更新时间: 2024-06-10 16:34:43

领域: cs.LG,cs.CL,cs.CR

下载: http://arxiv.org/abs/2406.06443v1

Interpretability of Language Models via Task Spaces

The usual way to interpret language models (LMs) is to test their performance on different benchmarks and subsequently infer their internal processes. In this paper, we present an alternative approach, concentrating on the quality of LM processing, with a focus on their language abilities. To this end, we construct 'linguistic task spaces' -- representations of an LM's language conceptualisation -- that shed light on the connections LMs draw between language phenomena. Task spaces are based on the interactions of the learning signals from different linguistic phenomena, which we assess via a method we call 'similarity probing'. To disentangle the learning signals of linguistic phenomena, we further introduce a method called 'fine-tuning via gradient differentials' (FTGD). We apply our methods to language models of three different scales and find that larger models generalise better to overarching general concepts for linguistic tasks, making better use of their shared structure. Further, the distributedness of linguistic processing increases with pre-training through increased parameter sharing between related linguistic tasks. The overall generalisation patterns are mostly stable throughout training and not marked by incisive stages, potentially explaining the lack of successful curriculum strategies for LMs.

Updated: 2024-06-10 16:34:30

标题: 通过任务空间解释语言模型的可解释性

摘要: 解释语言模型（LMs）的常规方法是测试它们在不同基准上的表现，然后推断它们的内部过程。在本文中，我们提出了一种替代方法，重点关注LM处理的质量，着重关注它们的语言能力。为此，我们构建了“语言任务空间”——LM语言概念的表示形式——这些空间揭示了LM之间绘制的语言现象之间的联系。任务空间是基于不同语言现象的学习信号的交互而形成的，我们通过一种称为“相似性探测”的方法对这些信号进行评估。为了解开语言现象的学习信号，我们进一步引入了一种称为“通过梯度差分微调”的方法（FTGD）。我们将这些方法应用于三种不同规模的语言模型，并发现较大的模型更好地泛化到语言任务的总体概念，更好地利用它们共享的结构。此外，通过预训练，语言处理的分布性增加，通过增加相关语言任务之间的参数共享。总体泛化模式在整个训练过程中大多是稳定的，并没有明显的阶段，这可能解释了LMs缺乏成功的课程策略。

更新时间: 2024-06-10 16:34:30

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.06441v1

Multimodal Contextualized Semantic Parsing from Speech

We introduce Semantic Parsing in Contextual Environments (SPICE), a task designed to enhance artificial agents' contextual awareness by integrating multimodal inputs with prior contexts. SPICE goes beyond traditional semantic parsing by offering a structured, interpretable framework for dynamically updating an agent's knowledge with new information, mirroring the complexity of human communication. We develop the VG-SPICE dataset, crafted to challenge agents with visual scene graph construction from spoken conversational exchanges, highlighting speech and visual data integration. We also present the Audio-Vision Dialogue Scene Parser (AViD-SP) developed for use on VG-SPICE. These innovations aim to improve multimodal information processing and integration. Both the VG-SPICE dataset and the AViD-SP model are publicly available.

Updated: 2024-06-10 16:31:34

标题: 来自语音的多模态上下文语义解析

摘要: 我们介绍了语境环境中的语义解析（SPICE），这是一个旨在通过将多模态输入与先前上下文集成来增强人工代理的语境意识的任务。SPICE超越了传统的语义解析，提供了一个结构化、可解释的框架，用于动态更新代理的知识，以反映人类交流的复杂性。我们开发了VG-SPICE数据集，旨在挑战代理人通过口头对话交流从视觉场景图构建的能力，突出了语音和视觉数据的集成。我们还介绍了专为在VG-SPICE上使用而开发的音频-视觉对话场景解析器（AViD-SP）。这些创新旨在改善多模态信息处理和集成。VG-SPICE数据集和AViD-SP模型都是公开可用的。

更新时间: 2024-06-10 16:31:34

领域: cs.CL,cs.CV,cs.HC,cs.LG,cs.SD,eess.AS

下载: http://arxiv.org/abs/2406.06438v1

An Empirical Study on Fault Detection and Root Cause Analysis of Indium Tin Oxide Electrodes by Processing S-parameter Patterns

In the field of optoelectronics, indium tin oxide (ITO) electrodes play a crucial role in various applications, such as displays, sensors, and solar cells. Effective fault diagnosis and root cause analysis of the ITO electrodes are essential to ensure the performance and reliability of the devices. However, traditional visual inspection is challenging with transparent ITO electrodes, and existing fault diagnosis methods have limitations in determining the root causes of the defects, often requiring destructive evaluations and secondary material characterization techniques. In this study, a fault diagnosis method with root cause analysis is proposed using scattering parameter (S-parameter) patterns, offering early detection, high diagnostic accuracy, and noise robustness. A comprehensive S-parameter pattern database is obtained according to various defect states of the ITO electrodes. Deep learning (DL) approaches, including multilayer perceptron (MLP), convolutional neural network (CNN), and transformer, are then used to simultaneously analyze the cause and severity of defects. Notably, it is demonstrated that the diagnostic performance under additive noise levels can be significantly enhanced by combining different channels of the S-parameters as input to the learning algorithms, as confirmed through the t-distributed stochastic neighbor embedding (t-SNE) dimension reduction visualization of the S-parameter patterns.

Updated: 2024-06-10 16:29:37

标题: 对氧化铟锡电极的故障检测和根本原因分析的经验研究：通过处理S参数模式

摘要: 在光电子领域，氧化铟锡（ITO）电极在各种应用中起着关键作用，如显示器、传感器和太阳能电池。有效的故障诊断和根本原因分析对于确保设备的性能和可靠性至关重要。然而，传统的视觉检查在透明的ITO电极上是具有挑战性的，现有的故障诊断方法在确定缺陷的根本原因方面存在局限性，通常需要破坏性评估和二次材料表征技术。本研究提出了一种利用散射参数（S参数）模式进行故障诊断和根本原因分析的方法，提供早期检测、高诊断准确性和抗噪声性能。根据ITO电极的各种缺陷状态获得了全面的S参数模式数据库。随后使用深度学习（DL）方法，包括多层感知器（MLP）、卷积神经网络（CNN）和变压器，同时分析缺陷的原因和严重程度。值得注意的是，通过将S参数的不同通道组合作为学习算法的输入，可以显著提高在加性噪声水平下的诊断性能，通过S参数模式的t分布随机邻域嵌入（t-SNE）降维可视化进行确认。

更新时间: 2024-06-10 16:29:37

领域: eess.SP,cs.AI,cs.LG

下载: http://arxiv.org/abs/2308.11639v2

Self-explainable Graph Neural Network for Alzheimer's Disease And Related Dementias Risk Prediction

Background: Alzheimer's disease and related dementias (ADRD) ranks as the sixth leading cause of death in the US, underlining the importance of accurate ADRD risk prediction. While recent advancement in ADRD risk prediction have primarily relied on imaging analysis, yet not all patients undergo medical imaging before an ADRD diagnosis. Merging machine learning with claims data can reveal additional risk factors and uncover interconnections among diverse medical codes. Objective: Our goal is to utilize Graph Neural Networks (GNNs) with claims data for ADRD risk prediction. Addressing the lack of human-interpretable reasons behind these predictions, we introduce an innovative method to evaluate relationship importance and its influence on ADRD risk prediction, ensuring comprehensive interpretation. Methods: We employed Variationally Regularized Encoder-decoder Graph Neural Network (VGNN) for estimating ADRD likelihood. We created three scenarios to assess the model's efficiency, using Random Forest and Light Gradient Boost Machine as baselines. We further used our relation importance method to clarify the key relationships for ADRD risk prediction. Results: VGNN surpassed other baseline models by 10% in the area under the receiver operating characteristic. The integration of the GNN model and relation importance interpretation could potentially play an essential role in providing valuable insight into factors that may contribute to or delay ADRD progression. Conclusions: Employing a GNN approach with claims data enhances ADRD risk prediction and provides insights into the impact of interconnected medical code relationships. This methodology not only enables ADRD risk modeling but also shows potential for other image analysis predictions using claims data.

Updated: 2024-06-10 16:29:11

标题: 可以翻译为：自解释的图神经网络用于阿尔茨海默病和相关痴呆症风险预测

摘要: 背景：阿尔茨海默病和相关痴呆疾病（ADRD）在美国排名第六位死因，突显了准确预测ADRD风险的重要性。尽管最近ADRD风险预测方面取得了进展，主要依赖于影像分析，但并非所有患者在ADRD诊断前都接受医学影像检查。将机器学习与索赔数据相结合可以揭示额外的风险因素，并揭示各种医疗代码之间的相互关系。目标：我们的目标是利用图神经网络（GNN）与索赔数据进行ADRD风险预测。针对这些预测背后缺乏可解释性的原因，我们引入了一种创新方法来评估关系重要性及其对ADRD风险预测的影响，确保全面解释。方法：我们采用变分正则化编码器-解码器图神经网络（VGNN）来估计ADRD可能性。我们创建了三种情景来评估模型的效率，使用随机森林和轻梯度提升机器作为基线。我们进一步使用我们的关系重要性方法来澄清ADRD风险预测的关键关系。结果： VGNN在接收器操作特征下面积方面超过其他基线模型10%。GNN模型与关系重要性解释的整合可能在提供有价值的洞察因素方面发挥重要作用，这些因素可能有助于或延缓ADRD的进展。结论：采用GNN方法与索赔数据增强ADRD风险预测，并提供关于相互关联的医疗代码关系影响的见解。这种方法不仅可以实现ADRD风险建模，还显示了利用索赔数据进行其他影像分析预测的潜力。

更新时间: 2024-06-10 16:29:11

领域: cs.LG

下载: http://arxiv.org/abs/2309.06584v4

Adaptive Interface-PINNs (AdaI-PINNs): An Efficient Physics-informed Neural Networks Framework for Interface Problems

We present an efficient physics-informed neural networks (PINNs) framework, termed Adaptive Interface-PINNs (AdaI-PINNs), to improve the modeling of interface problems with discontinuous coefficients and/or interfacial jumps. This framework is an enhanced version of its predecessor, Interface PINNs or I-PINNs (Sarma et al.; https://dx.doi.org/10.2139/ssrn.4766623), which involves domain decomposition and assignment of different predefined activation functions to the neural networks in each subdomain across a sharp interface, while keeping all other parameters of the neural networks identical. In AdaI-PINNs, the activation functions vary solely in their slopes, which are trained along with the other parameters of the neural networks. This makes the AdaI-PINNs framework fully automated without requiring preset activation functions. Comparative studies on one-dimensional, two-dimensional, and three-dimensional benchmark elliptic interface problems reveal that AdaI-PINNs outperform I-PINNs, reducing computational costs by 2-6 times while producing similar or better accuracy.

Updated: 2024-06-10 16:28:15

标题: 自适应界面-PINNs（AdaI-PINNs）：一种有效的面向界面问题的物理信息神经网络框架

摘要: 我们提出了一种高效的物理信息神经网络（PINNs）框架，称为自适应界面-PINNs（AdaI-PINNs），以改进具有不连续系数和/或界面跳跃的界面问题建模。该框架是其前身界面PINNs或I-PINNs的增强版本（Sarma等人；https://dx.doi.org/10.2139/ssrn.4766623），其中涉及域分解并将不同预定义激活函数分配给跨越尖锐界面的每个子域中的神经网络，同时保持神经网络的所有其他参数相同。在AdaI-PINNs中，激活函数仅在其斜率上变化，这些斜率与神经网络的其他参数一起进行训练。这使得AdaI-PINNs框架完全自动化，无需预设激活函数。对一维、二维和三维基准椭圆界面问题的比较研究表明，AdaI-PINNs优于I-PINNs，将计算成本降低2-6倍，同时产生类似或更好的准确性。

更新时间: 2024-06-10 16:28:15

领域: cs.LG

下载: http://arxiv.org/abs/2406.04626v2

Ranking Large Language Models without Ground Truth

Evaluation and ranking of large language models (LLMs) has become an important problem with the proliferation of these models and their impact. Evaluation methods either require human responses which are expensive to acquire or use pairs of LLMs to evaluate each other which can be unreliable. In this paper, we provide a novel perspective where, given a dataset of prompts (viz. questions, instructions, etc.) and a set of LLMs, we rank them without access to any ground truth or reference responses. Inspired by real life where both an expert and a knowledgeable person can identify a novice our main idea is to consider triplets of models, where each one of them evaluates the other two, correctly identifying the worst model in the triplet with high probability. We also analyze our idea and provide sufficient conditions for it to succeed. Applying this idea repeatedly, we propose two methods to rank LLMs. In experiments on different generative tasks (summarization, multiple-choice, and dialog), our methods reliably recover close to true rankings without reference data. This points to a viable low-resource mechanism for practical use.

Updated: 2024-06-10 16:25:30

标题: 在没有基准事实的情况下对大型语言模型进行排名

摘要: 随着大型语言模型（LLMs）的普及和影响，评估和排名这些模型变得越来越重要。评估方法要么需要昂贵的人工响应，要么使用一对LLMs来相互评估，这可能不可靠。在本文中，我们提供了一个新颖的视角，即在给定一组提示（例如问题、指令等）和一组LLMs的数据集的情况下，我们在没有任何基准或参考响应的情况下对它们进行排名。受现实生活启发，专家和知识渊博的人都可以识别出一个新手，我们的主要想法是考虑模型三元组，其中每个模型评估其他两个模型，并且以高概率正确地识别出三元组中最差的模型。我们还分析了我们的想法，并提供了成功的充分条件。通过反复应用这一想法，我们提出了两种排名LLMs的方法。在不同的生成任务（摘要、多项选择和对话）的实验中，我们的方法可靠地恢复接近真实排名的结果，而无需参考数据。这表明这是一个可行的低资源机制，可用于实际应用。

更新时间: 2024-06-10 16:25:30

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2402.14860v4

Language Models are Alignable Decision-Makers: Dataset and Application to the Medical Triage Domain

In difficult decision-making scenarios, it is common to have conflicting opinions among expert human decision-makers as there may not be a single right answer. Such decisions may be guided by different attributes that can be used to characterize an individual's decision. We introduce a novel dataset for medical triage decision-making, labeled with a set of decision-maker attributes (DMAs). This dataset consists of 62 scenarios, covering six different DMAs, including ethical principles such as fairness and moral desert. We present a novel software framework for human-aligned decision-making by utilizing these DMAs, paving the way for trustworthy AI with better guardrails. Specifically, we demonstrate how large language models (LLMs) can serve as ethical decision-makers, and how their decisions can be aligned to different DMAs using zero-shot prompting. Our experiments focus on different open-source models with varying sizes and training techniques, such as Falcon, Mistral, and Llama 2. Finally, we also introduce a new form of weighted self-consistency that improves the overall quantified performance. Our results provide new research directions in the use of LLMs as alignable decision-makers. The dataset and open-source software are publicly available at: https://github.com/ITM-Kitware/llm-alignable-dm.

Updated: 2024-06-10 16:25:23

标题: 语言模型是可对齐的决策者：数据集和在医疗分类领域的应用

摘要: 在困难的决策场景中，专家人类决策者之间存在不同意见是常见的，因为可能没有一个正确的答案。这些决策可能由不同属性指导，这些属性可以用来描述一个人的决策。我们引入了一个新颖的用于医疗分诊决策的数据集，标记有一组决策者属性（DMAs）。该数据集包含62个场景，涵盖了六种不同的DMAs，包括公平和道德原则等伦理原则。我们提出了一个新颖的软件框架，通过利用这些DMAs来进行与人类对齐的决策，为具有更好防护措施的可信AI铺平道路。具体来说，我们展示了大型语言模型（LLMs）如何作为道德决策者，并且如何通过零-shot提示将它们的决策与不同的DMAs对齐。我们的实验重点放在不同大小和培训技术的开源模型上，例如Falcon、Mistral和Llama 2。最后，我们还引入了一种新形式的加权自一致性，改善了整体量化性能。我们的结果为LLMs作为可对齐决策者的使用提供了新的研究方向。该数据集和开源软件可在以下链接公开获取：https://github.com/ITM-Kitware/llm-alignable-dm.

更新时间: 2024-06-10 16:25:23

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.06435v1

DISCO: An End-to-End Bandit Framework for Personalised Discount Allocation

Personalised discount codes provide a powerful mechanism for managing customer relationships and operational spend in e-commerce. Bandits are well suited for this product area, given the partial information nature of the problem, as well as the need for adaptation to the changing business environment. Here, we introduce DISCO, an end-to-end contextual bandit framework for personalised discount code allocation at ASOS.com. DISCO adapts the traditional Thompson Sampling algorithm by integrating it within an integer program, thereby allowing for operational cost control. Because bandit learning is often worse with high dimensional actions, we focused on building low dimensional action and context representations that were nonetheless capable of good accuracy. Additionally, we sought to build a model that preserved the relationship between price and sales, in which customers increasing their purchasing in response to lower prices ("negative price elasticity"). These aims were achieved by using radial basis functions to represent the continuous (i.e. infinite armed) action space, in combination with context embeddings extracted from a neural network. These feature representations were used within a Thompson Sampling framework to facilitate exploration, and further integrated with an integer program to allocate discount codes across ASOS's customer base. These modelling decisions result in a reward model that (a) enables pooled learning across similar actions, (b) is highly accurate, including in extrapolation, and (c) preserves the expected negative price elasticity. Through offline analysis, we show that DISCO is able to effectively enact exploration and improves its performance over time, despite the global constraint. Finally, we subjected DISCO to a rigorous online A/B test, and find that it achieves a significant improvement of >1% in average basket value, relative to the legacy systems.

Updated: 2024-06-10 16:24:35

标题: DISCO: 一种用于个性化折扣分配的端到端强盗框架

摘要: 个性化折扣代码为电子商务中管理客户关系和运营支出提供了强大的机制。由于问题的部分信息性质以及对不断变化的业务环境的适应性需求，赌博算法非常适合这种产品领域。在这里，我们介绍了DISCO，这是一个用于在ASOS.com上个性化折扣代码分配的端到端上下文赌博框架。DISCO通过将传统的汤普森抽样算法集成到整数程序中，从而允许进行运营成本控制。由于高维动作时赌博学习往往更糟糕，我们专注于构建低维动作和上下文表示，尽管这些表示仍然能够具有良好的准确性。此外，我们试图构建一个能够保持价格和销售之间关系的模型，即顾客对更低价格做出购买反应（“负价格弹性”）。通过使用径向基函数来表示连续（即无限动作）动作空间，结合从神经网络中提取的上下文嵌入，这些目标得以实现。这些特征表示被用于汤普森抽样框架中以促进探索，并进一步与整数程序集成以跨ASOS的客户群体分配折扣代码。这些建模决策导致了一个奖励模型，它（a）实现了相似动作之间的池化学习，（b）在外推方面非常准确，并且（c）保持了预期的负价格弹性。通过离线分析，我们展示了DISCO能够有效地进行探索，并随着时间的推移提高其性能，尽管存在全局约束。最后，我们对DISCO进行了严格的在线A/B测试，并发现相对于传统系统，它的平均购物篮价值实现了超过1%的显著提升。

更新时间: 2024-06-10 16:24:35

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.06433v1

A Note on Vectorial Boolean Functions as Embeddings

Let $F$ be a vectorial Boolean function from $\mathbb{F}^n$ to $\mathbb{F}^m$, where $m \geq n$. We define $F$ as an embedding if $F$ is injective. In this paper, we examine the component functions of $F$, focusing on constant and balanced components. Our findings reveal that at most $2^m - 2^{m-n}$ components of $F$ can be balanced, and this maximum is achieved precisely when $F$ is an embedding, with the remaining $2^{m-n}$ components being constants. Additionally, for quadratic embeddings, we demonstrate that there are always at least $2^n - 1$ balanced components when $n$ is even, and $2^{m-1} + 2^{n-1} - 1$ balanced components when $n$ is odd.

Updated: 2024-06-10 16:23:04

标题: 关于矢量布尔函数作为嵌入的注解

摘要: 让$F$是一个从$\mathbb{F}^n$到$\mathbb{F}^m$的向量布尔函数，其中$m \geq n$。我们定义$F$为一个嵌入函数，如果$F$是单射的话。在本文中，我们研究$F$的分量函数，重点关注常数和平衡分量。我们的研究结果表明，$F$的最多$2^m - 2^{m-n}$个分量可以是平衡的，而且当$F$是一个嵌入函数时，这个最大值恰好被实现，剩下的$2^{m-n}$个分量是常数。此外，对于二次嵌入函数，我们证明当$n$是偶数时，至少有$2^n - 1$个平衡分量，当$n$是奇数时，有$2^{m-1} + 2^{n-1} - 1$个平衡分量。

更新时间: 2024-06-10 16:23:04

领域: cs.CR,06E30, 94A60, 14G50

下载: http://arxiv.org/abs/2406.06429v1

Multivariate Stochastic Dominance via Optimal Transport and Applications to Models Benchmarking

Stochastic dominance is an important concept in probability theory, econometrics and social choice theory for robustly modeling agents' preferences between random outcomes. While many works have been dedicated to the univariate case, little has been done in the multivariate scenario, wherein an agent has to decide between different multivariate outcomes. By exploiting a characterization of multivariate first stochastic dominance in terms of couplings, we introduce a statistic that assesses multivariate almost stochastic dominance under the framework of Optimal Transport with a smooth cost. Further, we introduce an entropic regularization of this statistic, and establish a central limit theorem (CLT) and consistency of the bootstrap procedure for the empirical statistic. Armed with this CLT, we propose a hypothesis testing framework as well as an efficient implementation using the Sinkhorn algorithm. We showcase our method in comparing and benchmarking Large Language Models that are evaluated on multiple metrics. Our multivariate stochastic dominance test allows us to capture the dependencies between the metrics in order to make an informed and statistically significant decision on the relative performance of the models.

Updated: 2024-06-10 16:14:50

标题: 通过最优输运的多元随机优势及其在模型基准测试中的应用

摘要: 随机优势是概率论、计量经济学和社会选择理论中的重要概念，用于稳健地建模代理人对随机结果的偏好。虽然许多研究致力于单变量情况，但在多变量情况下几乎没有研究，代理人需要在不同的多变量结果之间做出决策。通过利用多元第一随机优势的耦合特性，我们引入了一种在光滑成本的最优输运框架下评估多元几乎随机优势的统计量。此外，我们还引入了这一统计量的熵正则化，并建立了经验统计量的中心极限定理（CLT）和自举程序的一致性。借助这一中心极限定理，我们提出了一个假设检验框架，并利用Sinkhorn算法实现了高效的实现。我们在比较和基准测试多个度量的大语言模型时展示了我们的方法。我们的多元随机优势检验允许我们捕捉度量之间的依赖关系，以便在模型的相对性能上做出明智且具有统计学意义的决策。

更新时间: 2024-06-10 16:14:50

领域: stat.ML,cs.LG,math.ST,stat.TH

下载: http://arxiv.org/abs/2406.06425v1

An Improved Empirical Fisher Approximation for Natural Gradient Descent

Approximate Natural Gradient Descent (NGD) methods are an important family of optimisers for deep learning models, which use approximate Fisher information matrices to pre-condition gradients during training. The empirical Fisher (EF) method approximates the Fisher information matrix empirically by reusing the per-sample gradients collected during back-propagation. Despite its ease of implementation, the EF approximation has its theoretical and practical limitations. This paper first investigates the inversely-scaled projection issue of EF, which is shown to be a major cause of the poor empirical approximation quality. An improved empirical Fisher (iEF) method, motivated as a generalised NGD method from a loss reduction perspective, is proposed to address this issue, meanwhile retaining the practical convenience of EF. The exact iEF and EF methods are experimentally evaluated using practical deep learning setups, including widely-used setups for parameter-efficient fine-tuning of pre-trained models (T5-base with LoRA and Prompt-Tuning on GLUE tasks, and ViT with LoRA for CIFAR100). Optimisation experiments show that applying exact iEF as an optimiser provides strong convergence and generalisation. It achieves the best test performance and the lowest training loss for majority of the tasks, even when compared with well-tuned AdamW/Adafactor baselines. Additionally, under a novel empirical evaluation framework, the proposed iEF method shows consistently better approximation quality to the exact Natural Gradient updates than both EF and the more expensive sampled Fisher (SF). Further investigation also shows that the superior approximation quality of iEF is robust to damping across tasks and training stages. Improving existing approximate NGD optimisers with iEF is expected to lead to better convergence ability and stronger robustness to choice of damping.

Updated: 2024-06-10 16:12:32

标题: 一种改进的经验费舍尔近似方法用于自然梯度下降

摘要: 近似自然梯度下降（NGD）方法是深度学习模型的重要优化器家族，它们在训练过程中使用近似Fisher信息矩阵来对梯度进行预处理。经验Fisher（EF）方法通过重复使用在反向传播期间收集到的每个样本梯度来经验性地近似Fisher信息矩阵。尽管实现简单，EF近似方法存在理论和实际限制。本文首先研究了EF的逆比例投影问题，该问题被证明是导致较差经验近似质量的主要原因。提出了一种改进的经验Fisher（iEF）方法，从损失减少的角度出发，作为一种广义NGD方法，旨在解决这一问题，同时保留了EF的实际便利性。通过实际深度学习设置对精确iEF和EF方法进行了实验评估，包括用于参数高效微调预训练模型的广泛使用的设置（T5-base与LoRA和Prompt-Tuning在GLUE任务上，以及ViT与LoRA在CIFAR100上）。优化实验表明，将精确iEF应用作为优化器提供了强大的收敛性和泛化能力。对于大多数任务，它实现了最佳的测试性能和最低的训练损失，即使与精心调整的AdamW/Adafactor基线相比也是如此。此外，在一种新颖的经验评估框架下，提出的iEF方法显示出比EF和更昂贵的采样Fisher（SF）更好地逼近精确的自然梯度更新。进一步研究还表明，iEF的优越逼近质量对于任务和训练阶段中的阻尼是稳健的。通过使用iEF改进现有的近似NGD优化器，预计可以提高收敛能力，并增强对阻尼选择的强大稳健性。

更新时间: 2024-06-10 16:12:32

领域: cs.LG

下载: http://arxiv.org/abs/2406.06420v1

Foundation Inference Models for Markov Jump Processes

Markov jump processes are continuous-time stochastic processes which describe dynamical systems evolving in discrete state spaces. These processes find wide application in the natural sciences and machine learning, but their inference is known to be far from trivial. In this work we introduce a methodology for zero-shot inference of Markov jump processes (MJPs), on bounded state spaces, from noisy and sparse observations, which consists of two components. First, a broad probability distribution over families of MJPs, as well as over possible observation times and noise mechanisms, with which we simulate a synthetic dataset of hidden MJPs and their noisy observation process. Second, a neural network model that processes subsets of the simulated observations, and that is trained to output the initial condition and rate matrix of the target MJP in a supervised way. We empirically demonstrate that one and the same (pretrained) model can infer, in a zero-shot fashion, hidden MJPs evolving in state spaces of different dimensionalities. Specifically, we infer MJPs which describe (i) discrete flashing ratchet systems, which are a type of Brownian motors, and the conformational dynamics in (ii) molecular simulations, (iii) experimental ion channel data and (iv) simple protein folding models. What is more, we show that our model performs on par with state-of-the-art models which are finetuned to the target datasets.

Updated: 2024-06-10 16:12:00

标题: 马尔可夫跳跃过程的基础推理模型

摘要: 马尔可夫跳跃过程是描述演化在离散状态空间中的动态系统的连续时间随机过程。这些过程在自然科学和机器学习中有广泛的应用，但它们的推断被认为远非易事。在这项工作中，我们介绍了一种从嘈杂和稀疏观测中零-shot推断马尔可夫跳跃过程（MJPs）的方法，这些过程在有界状态空间中，包括两个组成部分。首先，一个广泛的概率分布涵盖了MJPs家族，以及可能的观测时间和噪声机制，我们用它来模拟隐藏的MJPs及其嘈杂的观测过程的合成数据集。其次，一个神经网络模型处理模拟观测的子集，并被训练以监督方式输出目标MJP的初始条件和速率矩阵。我们实验证明，一个（预先训练的）模型可以以零-shot方式推断不同维度状态空间中演化的隐藏MJPs。具体来说，我们推断了描述（i）离散闪烁棘轮系统（一种布朗运动器）的MJPs，以及（ii）分子模拟中的构象动力学，（iii）实验离子通道数据和（iv）简单蛋白质折叠模型。此外，我们展示了我们的模型与针对目标数据集微调的最先进模型具有相当的性能。

更新时间: 2024-06-10 16:12:00

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2406.06419v1

Explainable Graph Neural Networks Under Fire

Predictions made by graph neural networks (GNNs) usually lack interpretability due to their complex computational behavior and the abstract nature of graphs. In an attempt to tackle this, many GNN explanation methods have emerged. Their goal is to explain a model's predictions and thereby obtain trust when GNN models are deployed in decision critical applications. Most GNN explanation methods work in a post-hoc manner and provide explanations in the form of a small subset of important edges and/or nodes. In this paper we demonstrate that these explanations can unfortunately not be trusted, as common GNN explanation methods turn out to be highly susceptible to adversarial perturbations. That is, even small perturbations of the original graph structure that preserve the model's predictions may yield drastically different explanations. This calls into question the trustworthiness and practical utility of post-hoc explanation methods for GNNs. To be able to attack GNN explanation models, we devise a novel attack method dubbed \textit{GXAttack}, the first \textit{optimization-based} adversarial attack method for post-hoc GNN explanations under such settings. Due to the devastating effectiveness of our attack, we call for an adversarial evaluation of future GNN explainers to demonstrate their robustness.

Updated: 2024-06-10 16:09:16

标题: 可解释的图神经网络面临挑战

摘要: 由于图神经网络（GNNs）的预测通常缺乏可解释性，这是由于它们复杂的计算行为和图的抽象性质。为了解决这个问题，许多GNN解释方法已经出现。它们的目标是解释模型的预测，从而在GNN模型部署在决策关键应用程序时获得信任。大多数GNN解释方法以事后方式工作，并以重要边缘和/或节点的小子集的形式提供解释。在本文中，我们证明这些解释不可信，因为常见的GNN解释方法很容易受到对抗性扰动的影响。也就是说，即使保持模型预测的原始图结构发生轻微扰动，也可能得到完全不同的解释。这对后期解释方法对GNN的可信度和实用性提出了质疑。为了攻击GNN解释模型，我们设计了一种名为GXAttack的新型攻击方法，这是在这种设置下的第一种基于优化的对抗性攻击方法。由于我们攻击的毁灭性有效性，我们呼吁对未来的GNN解释器进行对抗性评估，以证明它们的鲁棒性。

更新时间: 2024-06-10 16:09:16

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.06417v1

Reproducibility study of FairAC

This work aims to reproduce the findings of the paper "Fair Attribute Completion on Graph with Missing Attributes" written by Guo, Chu, and Li arXiv:2302.12977 by investigating the claims made in the paper. This paper suggests that the results of the original paper are reproducible and thus, the claims hold. However, the claim that FairAC is a generic framework for many downstream tasks is very broad and could therefore only be partially tested. Moreover, we show that FairAC is generalizable to various datasets and sensitive attributes and show evidence that the improvement in group fairness of the FairAC framework does not come at the expense of individual fairness. Lastly, the codebase of FairAC has been refactored and is now easily applicable for various datasets and models.

Updated: 2024-06-10 16:09:03

标题: FairAC的可复制性研究

摘要: 这项工作旨在重现由Guo、Chu和Li撰写的论文"在具有缺失属性的图上进行公平属性补全"中的研究结果（arXiv:2302.12977），通过调查论文中提出的主张。本文指出，原始论文的结果是可重现的，因此这些主张是成立的。然而，论文中声称FairAC是许多下游任务的通用框架的说法非常广泛，因此只能在一定程度上进行测试。此外，我们展示FairAC可以泛化到各种数据集和敏感属性，并展示证据表明FairAC框架在组公平性改进方面并不以牺牲个体公平性为代价。最后，FairAC的代码库已经重构，现在可以轻松应用于各种数据集和模型。

更新时间: 2024-06-10 16:09:03

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.03314v2

Differentially Private Best-Arm Identification

Best Arm Identification (BAI) problems are progressively used for data-sensitive applications, such as designing adaptive clinical trials, tuning hyper-parameters, and conducting user studies. Motivated by the data privacy concerns invoked by these applications, we study the problem of BAI with fixed confidence in both the local and central models, i.e. $\epsilon$-local and $\epsilon$-global Differential Privacy (DP). First, to quantify the cost of privacy, we derive lower bounds on the sample complexity of any $\delta$-correct BAI algorithm satisfying $\epsilon$-global DP or $\epsilon$-local DP. Our lower bounds suggest the existence of two privacy regimes. In the high-privacy regime, the hardness depends on a coupled effect of privacy and novel information-theoretic quantities involving the Total Variation. In the low-privacy regime, the lower bounds reduce to the non-private lower bounds. We propose $\epsilon$-local DP and $\epsilon$-global DP variants of a Top Two algorithm, namely CTB-TT and AdaP-TT*, respectively. For $\epsilon$-local DP, CTB-TT is asymptotically optimal by plugging in a private estimator of the means based on Randomised Response. For $\epsilon$-global DP, our private estimator of the mean runs in arm-dependent adaptive episodes and adds Laplace noise to ensure a good privacy-utility trade-off. By adapting the transportation costs, the expected sample complexity of AdaP-TT* reaches the asymptotic lower bound up to multiplicative constants.

Updated: 2024-06-10 16:02:48

标题: 差分隐私最佳臂识别

摘要: 最佳臂识别（BAI）问题逐渐在数据敏感的应用中得到应用，例如设计自适应临床试验、调整超参数和进行用户研究。受到这些应用引发的数据隐私问题的影响，我们研究了在本地和中央模型中具有固定置信度的BAI问题，即$\epsilon$-本地和$\epsilon$-全局差分隐私（DP）。首先，为了量化隐私成本，我们推导了任何满足$\epsilon$-全局DP或$\epsilon$-本地DP的$\delta$-正确的BAI算法的样本复杂性的下界。我们的下界表明存在两种隐私制度。在高隐私制度中，困难取决于隐私和涉及总变差的新的信息理论量之间的耦合效应。在低隐私制度中，下界降低到非私有下界。我们提出了Top Two算法的$\epsilon$-本地DP和$\epsilon$-全局DP变体，分别为CTB-TT和AdaP-TT*。对于$\epsilon$-本地DP，通过使用基于随机响应的私有估计均值，CTB-TT是渐近最优的。对于$\epsilon$-全局DP，我们的均值私有估计器在依赖于臂的自适应周期中运行，并添加拉普拉斯噪声以确保良好的隐私效用权衡。通过调整运输成本，AdaP-TT*的预期样本复杂性达到了渐近下界，直至乘法常数。

更新时间: 2024-06-10 16:02:48

领域: stat.ML,cs.CR,cs.LG,math.ST,stat.TH

下载: http://arxiv.org/abs/2406.06408v1

The Structure and Dynamics of Knowledge Graphs, with Superficiality

Large knowledge graphs combine human knowledge garnered from projects ranging from academia and institutions to enterprises and crowdsourcing. Within such graphs, each relationship between two nodes represents a basic fact involving these two entities. The diversity of the semantics of relationships constitutes the richness of knowledge graphs, leading to the emergence of singular topologies, sometimes chaotic in appearance. However, this complex characteristic can be modeled in a simple way by introducing the concept of superficiality, which controls the overlap between relationships whose facts are generated independently. With this model, superficiality also regulates the balance of the global distribution of knowledge by determining the proportion of misdescribed entities. This is the first model for the structure and dynamics of knowledge graphs. It leads to a better understanding of formal knowledge acquisition and organization.

Updated: 2024-06-10 16:02:44

标题: 知识图谱的结构和动态，以及表面特征

摘要: 大型知识图谱结合了从学术界和机构到企业和众包项目中获得的人类知识。在这种图谱中，两个节点之间的每个关系代表涉及这两个实体的基本事实。关系的语义多样性构成了知识图谱的丰富性，导致独特的拓扑结构的出现，有时外观混乱。然而，这种复杂特性可以通过引入表层概念以简单方式建模，表层概念控制了独立生成事实关系之间的重叠。通过这个模型，表层性还通过确定错误描述实体的比例来调节知识的全局分布的平衡。这是关于知识图谱结构和动态的第一个模型。它有助于更好地理解形式化知识获取和组织。

更新时间: 2024-06-10 16:02:44

领域: cs.AI

下载: http://arxiv.org/abs/2305.08116v4

A Taxonomy of Challenges to Curating Fair Datasets

Despite extensive efforts to create fairer machine learning (ML) datasets, there remains a limited understanding of the practical aspects of dataset curation. Drawing from interviews with 30 ML dataset curators, we present a comprehensive taxonomy of the challenges and trade-offs encountered throughout the dataset curation lifecycle. Our findings underscore overarching issues within the broader fairness landscape that impact data curation. We conclude with recommendations aimed at fostering systemic changes to better facilitate fair dataset curation practices.

Updated: 2024-06-10 15:59:08

标题: 一个关于筹集公平数据集挑战的分类学

摘要: 尽管已经付出了大量努力来创建更公平的机器学习（ML）数据集，但对数据集策划的实际方面仍了解有限。通过对30位ML数据集策划者的访谈，我们提出了一套全面的挑战和权衡的分类，这些挑战贯穿整个数据集策划生命周期。我们的研究结果强调了在更广泛的公平性领域中影响数据策划的主要问题。最后，我们提出了旨在促进系统变革以更好地促进公平数据集策划实践的建议。

更新时间: 2024-06-10 15:59:08

领域: cs.LG,cs.CY

下载: http://arxiv.org/abs/2406.06407v1

Meta Learning Text-to-Speech Synthesis in over 7000 Languages

In this work, we take on the challenging task of building a single text-to-speech synthesis system that is capable of generating speech in over 7000 languages, many of which lack sufficient data for traditional TTS development. By leveraging a novel integration of massively multilingual pretraining and meta learning to approximate language representations, our approach enables zero-shot speech synthesis in languages without any available data. We validate our system's performance through objective measures and human evaluation across a diverse linguistic landscape. By releasing our code and models publicly, we aim to empower communities with limited linguistic resources and foster further innovation in the field of speech technology.

Updated: 2024-06-10 15:56:52

标题: 元学习在超过7000种语言中的文本到语音合成

摘要: 在这项工作中，我们承担了一个具有挑战性的任务，即构建一个能够在超过7000种语言中生成语音的单一文本到语音合成系统，其中许多语言缺乏传统TTS开发所需的足够数据。通过利用大规模多语言预训练和元学习的新颖整合来逼近语言表示，我们的方法实现了在没有任何可用数据的语言中进行零-shot语音合成。我们通过客观指标和人类评估在各种语言环境中验证了系统的性能。通过公开发布我们的代码和模型，我们的目标是赋予具有有限语言资源的社区力量，并促进语音技术领域的进一步创新。

更新时间: 2024-06-10 15:56:52

领域: cs.CL,cs.LG,cs.SD,eess.AS

下载: http://arxiv.org/abs/2406.06403v1

An Empirical Design Justice Approach to Identifying Ethical Considerations in the Intersection of Large Language Models and Social Robotics

The integration of Large Language Models (LLMs) in social robotics presents a unique set of ethical challenges and social impacts. This research is set out to identify ethical considerations that arise in the design and development of these two technologies in combination. Using LLMs for social robotics may provide benefits, such as enabling natural language open-domain dialogues. However, the intersection of these two technologies also gives rise to ethical concerns related to misinformation, non-verbal cues, emotional disruption, and biases. The robot's physical social embodiment adds complexity, as ethical hazards associated with LLM-based Social AI, such as hallucinations and misinformation, can be exacerbated due to the effects of physical embodiment on social perception and communication. To address these challenges, this study employs an empirical design justice-based methodology, focusing on identifying socio-technical ethical considerations through a qualitative co-design and interaction study. The purpose of the study is to identify ethical considerations relevant to the process of co-design of, and interaction with a humanoid social robot as the interface of a LLM, and to evaluate how a design justice methodology can be used in the context of designing LLMs-based social robotics. The findings reveal a mapping of ethical considerations arising in four conceptual dimensions: interaction, co-design, terms of service and relationship and evaluates how a design justice approach can be used empirically in the intersection of LLMs and social robotics.

Updated: 2024-06-10 15:53:50

标题: 一个实证设计公正方法来识别大型语言模型与社交机器人交叉领域的伦理考虑

摘要: 大型语言模型（LLMs）在社交机器人中的整合提出了一系列独特的道德挑战和社会影响。本研究旨在确定在设计和开发这两种技术结合时出现的道德考虑。在社交机器人中使用LLMs可能带来好处，例如实现自然语言开放领域对话。然而，这两种技术的交集也引发了与误导、非语言线索、情绪干扰和偏见相关的道德关切。机器人的物理社交体现增加了复杂性，因为与基于LLM的社交AI相关的道德危害，如幻觉和误导，可能因物理体现对社交感知和沟通的影响而加剧。为了解决这些挑战，本研究采用了一个基于经验的设计正义方法论，重点是通过定性共同设计和互动研究确定社会技术道德考虑。该研究的目的是确定与将人形社交机器人作为LLM界面共同设计和互动过程相关的道德考虑，并评估设计正义方法如何在基于LLM的社交机器人设计中使用。研究结果显示了在四个概念维度中出现的道德考虑的映射：互动、共同设计、服务条款和关系，并评估设计正义方法如何在LLMs和社交机器人的交集中经验性地使用。

更新时间: 2024-06-10 15:53:50

领域: cs.RO,cs.AI,I.2; K.4; H.5

下载: http://arxiv.org/abs/2406.06400v1

Should We Fine-Tune or RAG? Evaluating Different Techniques to Adapt LLMs for Dialogue

We study the limitations of Large Language Models (LLMs) for the task of response generation in human-machine dialogue. Several techniques have been proposed in the literature for different dialogue types (e.g., Open-Domain). However, the evaluations of these techniques have been limited in terms of base LLMs, dialogue types and evaluation metrics. In this work, we extensively analyze different LLM adaptation techniques when applied to different dialogue types. We have selected two base LLMs, Llama-2 and Mistral, and four dialogue types Open-Domain, Knowledge-Grounded, Task-Oriented, and Question Answering. We evaluate the performance of in-context learning and fine-tuning techniques across datasets selected for each dialogue type. We assess the impact of incorporating external knowledge to ground the generation in both scenarios of Retrieval-Augmented Generation (RAG) and gold knowledge. We adopt consistent evaluation and explainability criteria for automatic metrics and human evaluation protocols. Our analysis shows that there is no universal best-technique for adapting large language models as the efficacy of each technique depends on both the base LLM and the specific type of dialogue. Last but not least, the assessment of the best adaptation technique should include human evaluation to avoid false expectations and outcomes derived from automatic metrics.

Updated: 2024-06-10 15:52:49

标题: 我们应该微调还是使用RAG？评估不同技术用于为对话调整LLMs

摘要: 我们研究了大型语言模型（LLMs）在人机对话中生成响应任务中的限制。文献中提出了几种针对不同对话类型（例如，开放域）的技术。然而，这些技术的评估在基本LLMs、对话类型和评估指标方面受到限制。在这项工作中，我们广泛分析了不同的LLM适应技术在应用于不同对话类型时的效果。我们选择了两个基本LLMs，Llama-2和Mistral，以及四种对话类型开放域、基于知识、任务导向和问答。我们评估了针对每种对话类型选择的数据集的上下文学习和微调技术的性能。我们评估了在检索增强生成（RAG）和黄金知识两种场景中结合外部知识以使生成接地的影响。我们采用一致的评估和可解释性标准来评估自动指标和人类评估协议。我们的分析表明，没有一种通用的最佳技术适用于调整大型语言模型，因为每种技术的效果取决于基本LLM和特定类型的对话。最后，评估最佳适应技术应包括人类评估，以避免从自动指标得出的虚假期望和结果。

更新时间: 2024-06-10 15:52:49

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.06399v1

Lessons from Generalization Error Analysis of Federated Learning: You May Communicate Less Often!

We investigate the generalization error of statistical learning models in a Federated Learning (FL) setting. Specifically, we study the evolution of the generalization error with the number of communication rounds $R$ between $K$ clients and a parameter server (PS), i.e., the effect on the generalization error of how often the clients' local models are aggregated at PS. In our setup, the more the clients communicate with PS the less data they use for local training in each round, such that the amount of training data per client is identical for distinct values of $R$. We establish PAC-Bayes and rate-distortion theoretic bounds on the generalization error that account explicitly for the effect of the number of rounds $R$, in addition to the number of participating devices $K$ and individual datasets size $n$. The bounds, which apply to a large class of loss functions and learning algorithms, appear to be the first of their kind for the FL setting. Furthermore, we apply our bounds to FL-type Support Vector Machines (FSVM); and derive (more) explicit bounds in this case. In particular, we show that the generalization bound of FSVM increases with $R$, suggesting that more frequent communication with PS diminishes the generalization power. This implies that the population risk decreases less fast with $R$ than does the empirical risk. Moreover, our bound suggests that the generalization error of FSVM decreases faster than that of centralized learning by a factor of $\mathcal{O}(\sqrt{\log(K)/K})$. Finally, we provide experimental results obtained using neural networks (ResNet-56) which show evidence that not only may our observations for FSVM hold more generally but also that the population risk may even start to increase beyond some value of $R$.

Updated: 2024-06-10 15:52:35

标题: 《联邦学习的泛化误差分析中的教训：你可能会更少地进行通信！》

摘要: 我们研究了在联邦学习（FL）设置中统计学习模型的泛化误差。具体来说，我们研究了通信轮次$R$和$K$个客户端与参数服务器（PS）之间的通信次数对泛化误差的影响，即客户端的本地模型在PS处聚合的频率对泛化误差的影响。在我们的设置中，客户端与PS通信越频繁，它们在每一轮本地训练中使用的数据就越少，从而每个客户端的训练数据量对于不同的$R$值是相同的。我们建立了关于泛化误差的PAC-Bayes和率失真理论上的界限，明确考虑了轮次$R$的影响，除了参与设备数量$K$和个体数据集大小$n$。这些界限适用于大量损失函数和学习算法，似乎是FL设置中首次出现的。此外，我们将我们的界限应用于FL类型的支持向量机（FSVM）；并在这种情况下导出（更）明确的界限。特别地，我们表明FSVM的泛化界限随着$R$的增加而增加，暗示更频繁地与PS通信会降低泛化能力。这意味着总体风险随$R$的减少速度不如经验风险那样快。此外，我们的界限表明FSVM的泛化误差比集中式学习快$\mathcal{O}(\sqrt{\log(K)/K})$的因素。最后，我们提供了使用神经网络（ResNet-56）获得的实验结果，表明我们对FSVM的观察不仅可能更普遍，而且总体风险甚至可能在某个$R$值之后开始增加。

更新时间: 2024-06-10 15:52:35

领域: stat.ML,cs.IT,cs.LG,math.IT

下载: http://arxiv.org/abs/2306.05862v2

Contrastive learning of T cell receptor representations

Computational prediction of the interaction of T cell receptors (TCRs) and their ligands is a grand challenge in immunology. Despite advances in high-throughput assays, specificity-labelled TCR data remains sparse. In other domains, the pre-training of language models on unlabelled data has been successfully used to address data bottlenecks. However, it is unclear how to best pre-train protein language models for TCR specificity prediction. Here we introduce a TCR language model called SCEPTR (Simple Contrastive Embedding of the Primary sequence of T cell Receptors), capable of data-efficient transfer learning. Through our model, we introduce a novel pre-training strategy combining autocontrastive learning and masked-language modelling, which enables SCEPTR to achieve its state-of-the-art performance. In contrast, existing protein language models and a variant of SCEPTR pre-trained without autocontrastive learning are outperformed by sequence alignment-based methods. We anticipate that contrastive learning will be a useful paradigm to decode the rules of TCR specificity.

Updated: 2024-06-10 15:50:45

标题: T细胞受体表示的对比学习

摘要: 免疫学中T细胞受体（TCRs）与其配体相互作用的计算预测是一个重大挑战。尽管高通量实验取得了进展，但特异性标记的TCR数据仍然稀缺。在其他领域，对未标记数据进行语言模型的预训练已成功用于解决数据瓶颈问题。然而，目前还不清楚如何最好地为TCR特异性预测预训练蛋白质语言模型。在这里，我们介绍了一种名为SCEPTR（T细胞受体的简单对比嵌入）的TCR语言模型，能够进行数据高效的迁移学习。通过我们的模型，我们引入了一种结合自对比学习和掩码语言建模的新型预训练策略，使SCEPTR能够实现其最先进的性能。相比之下，现有的蛋白质语言模型和一个没有进行自对比学习的SCEPTR变种被序列对齐方法超越。我们预计对比学习将是解码TCR特异性规则的有用范式。

更新时间: 2024-06-10 15:50:45

领域: q-bio.BM,cs.AI,cs.LG,J.3; I.2.7

下载: http://arxiv.org/abs/2406.06397v1

Towards Lifelong Learning of Large Language Models: A Survey

As the applications of large language models (LLMs) expand across diverse fields, the ability of these models to adapt to ongoing changes in data, tasks, and user preferences becomes crucial. Traditional training methods, relying on static datasets, are increasingly inadequate for coping with the dynamic nature of real-world information. Lifelong learning, also known as continual or incremental learning, addresses this challenge by enabling LLMs to learn continuously and adaptively over their operational lifetime, integrating new knowledge while retaining previously learned information and preventing catastrophic forgetting. This survey delves into the sophisticated landscape of lifelong learning, categorizing strategies into two primary groups: Internal Knowledge and External Knowledge. Internal Knowledge includes continual pretraining and continual finetuning, each enhancing the adaptability of LLMs in various scenarios. External Knowledge encompasses retrieval-based and tool-based lifelong learning, leveraging external data sources and computational tools to extend the model's capabilities without modifying core parameters. The key contributions of our survey are: (1) Introducing a novel taxonomy categorizing the extensive literature of lifelong learning into 12 scenarios; (2) Identifying common techniques across all lifelong learning scenarios and classifying existing literature into various technique groups within each scenario; (3) Highlighting emerging techniques such as model expansion and data selection, which were less explored in the pre-LLM era. Through a detailed examination of these groups and their respective categories, this survey aims to enhance the adaptability, reliability, and overall performance of LLMs in real-world applications.

Updated: 2024-06-10 15:46:25

标题: 走向大型语言模型的终身学习：一项调查

摘要: 随着大型语言模型（LLMs）在各个领域的应用不断扩展，这些模型适应数据、任务和用户偏好不断变化的能力变得至关重要。传统的训练方法依赖于静态数据集，越来越无法满足应对现实世界信息动态性的需求。终身学习，也被称为持续学习或增量学习，通过使LLMs能够在其运行寿命内持续和自适应地学习，整合新知识同时保留先前学到的信息并防止灾难性遗忘来应对这一挑战。本调查深入探讨了终身学习的复杂领域，将策略分为两个主要组别：内部知识和外部知识。内部知识包括持续预训练和持续微调，每种方法都增强了LLMs在各种情况下的适应能力。外部知识包括基于检索和基于工具的终身学习，利用外部数据源和计算工具扩展模型的能力而不修改核心参数。我们调查的主要贡献是：（1）将广泛的终身学习文献分类为12种情景的新颖分类法；（2）识别所有终身学习情景中的常见技术，并将现有文献分类到各种情景中的不同技术组中；（3）强调新兴技术，如模型扩展和数据选择，在LLM时代之前较少被探索。通过对这些组和各自分类的详细审查，本调查旨在提升LLMs在现实应用中的适应性、可靠性和整体性能。

更新时间: 2024-06-10 15:46:25

领域: cs.LG,cs.CL

下载: http://arxiv.org/abs/2406.06391v1

Low-Rank Quantization-Aware Training for LLMs

Large language models (LLMs) are omnipresent, however their practical deployment is challenging due to their ever increasing computational and memory demands. Quantization is one of the most effective ways to make them more compute and memory efficient. Quantization-aware training (QAT) methods, generally produce the best quantized performance, however it comes at the cost of potentially long training time and excessive memory usage, making it impractical when applying for LLMs. Inspired by parameter-efficient fine-tuning (PEFT) and low-rank adaptation (LoRA) literature, we propose LR-QAT -- a lightweight and memory-efficient QAT algorithm for LLMs. LR-QAT employs several components to save memory without sacrificing predictive performance: (a) low-rank auxiliary weights that are aware of the quantization grid; (b) a downcasting operator using fixed-point or double-packed integers and (c) checkpointing. Unlike most related work, our method (i) is inference-efficient, leading to no additional overhead compared to traditional PTQ; (ii) can be seen as a general extended pretraining framework, meaning that the resulting model can still be utilized for any downstream task afterwards; (iii) can be applied across a wide range of quantization settings, such as different choices quantization granularity, activation quantization, and seamlessly combined with many PTQ techniques. We apply LR-QAT to the LLaMA-2/3 and Mistral model families and validate its effectiveness on several downstream tasks. Our method outperforms common post-training quantization (PTQ) approaches and reaches the same model performance as full-model QAT at the fraction of its memory usage. Specifically, we can train a 7B LLM on a single consumer grade GPU with 24GB of memory.

Updated: 2024-06-10 15:44:22

标题: 低秩量化感知训练用于LLMs

摘要: 大型语言模型（LLMs）无处不在，但由于其计算和内存需求不断增加，它们的实际部署具有挑战性。量化是使它们更具计算和内存效率的最有效方法之一。量化感知训练（QAT）方法通常能够产生最佳的量化性能，但这可能导致训练时间过长和内存使用过多，使其在应用于LLMs时变得不切实际。受参数高效微调（PEFT）和低秩适应（LoRA）文献的启发，我们提出了LR-QAT——一种轻量级和内存高效的LLMs的QAT算法。LR-QAT采用多个组件来节省内存而不损害预测性能：（a）低秩辅助权重，这些权重意识到量化网格；（b）使用定点或双打包整数的下转换运算符；和（c）检查点。与大多数相关工作不同，我们的方法（i）具有推断效率，与传统PTQ相比没有额外开销；（ii）可以看作是一个通用的扩展预训练框架，意味着生成的模型仍然可以用于之后的任何下游任务；（iii）可以应用于各种量化设置，如不同选择的量化粒度、激活量化，并且可以与许多PTQ技术无缝结合。我们将LR-QAT应用于LLaMA-2/3和Mistral模型系列，并在几个下游任务上验证其有效性。我们的方法优于常见的后训练量化（PTQ）方法，并在仅使用一小部分内存时达到与完整模型QAT相同的模型性能。具体来说，我们可以在配备24GB内存的单个消费级GPU上训练一个7B的LLM。

更新时间: 2024-06-10 15:44:22

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2406.06385v1

Learning a Reward Function for User-Preferred Appliance Scheduling

Accelerated development of demand response service provision by the residential sector is crucial for reducing carbon-emissions in the power sector. Along with the infrastructure advancement, encouraging the end users to participate is crucial. End users highly value their privacy and control, and want to be included in the service design and decision-making process when creating the daily appliance operation schedules. Furthermore, unless they are financially or environmentally motivated, they are generally not prepared to sacrifice their comfort to help balance the power system. In this paper, we present an inverse-reinforcement-learning-based model that helps create the end users' daily appliance schedules without asking them to explicitly state their needs and wishes. By using their past consumption data, the end consumers will implicitly participate in the creation of those decisions and will thus be motivated to continue participating in the provision of demand response services.

Updated: 2024-06-10 15:43:30

标题: 学习用户偏好电器调度的奖励函数

摘要: 住宅部门加速发展需求响应服务是减少电力部门碳排放的关键。除了基础设施的进步外，鼓励最终用户参与至关重要。最终用户非常重视他们的隐私和控制权，并希望在制定日常家电运行时间表时包括在服务设计和决策过程中。此外，除非他们在财务或环境上受到激励，他们通常不愿意牺牲舒适度来帮助平衡电力系统。在本文中，我们提出了一种基于逆强化学习的模型，可以帮助创建最终用户的日常家电时间表，而无需要求他们明确陈述自己的需求和愿望。通过使用他们过去的消费数据，最终消费者将隐式参与这些决策的制定，并因此受到激励继续参与需求响应服务的提供。

更新时间: 2024-06-10 15:43:30

领域: cs.AI

下载: http://arxiv.org/abs/2310.07389v2

Active Learning with Simple Questions

We consider an active learning setting where a learner is presented with a pool S of n unlabeled examples belonging to a domain X and asks queries to find the underlying labeling that agrees with a target concept h^* \in H. In contrast to traditional active learning that queries a single example for its label, we study more general region queries that allow the learner to pick a subset of the domain T \subset X and a target label y and ask a labeler whether h^*(x) = y for every example in the set T \cap S. Such more powerful queries allow us to bypass the limitations of traditional active learning and use significantly fewer rounds of interactions to learn but can potentially lead to a significantly more complex query language. Our main contribution is quantifying the trade-off between the number of queries and the complexity of the query language used by the learner. We measure the complexity of the region queries via the VC dimension of the family of regions. We show that given any hypothesis class H with VC dimension d, one can design a region query family Q with VC dimension O(d) such that for every set of n examples S \subset X and every h^* \in H, a learner can submit O(d log n) queries from Q to a labeler and perfectly label S. We show a matching lower bound by designing a hypothesis class H with VC dimension d and a dataset S \subset X of size n such that any learning algorithm using any query class with VC dimension less than O(d) must make poly(n) queries to label S perfectly. Finally, we focus on well-studied hypothesis classes including unions of intervals, high-dimensional boxes, and d-dimensional halfspaces, and obtain stronger results. In particular, we design learning algorithms that (i) are computationally efficient and (ii) work even when the queries are not answered based on the learner's pool of examples S but on some unknown superset L of S

Updated: 2024-06-10 15:42:50

标题: 用简单问题进行主动学习

摘要: 我们考虑一个主动学习的情境，学习者被呈现一个包含 n 个未标记示例的池子 S，这些示例属于一个域 X，并询问问题以找到与目标概念 h^* \in H 一致的底层标记。与传统的主动学习不同，传统主动学习只询问单个示例的标签，我们研究更一般的区域查询，允许学习者选择域 X 的子集 T 和目标标签 y，并询问标记者对于集合 T \cap S 中的每个示例是否 h^*(x) = y。这种更强大的查询使我们能够绕过传统主动学习的限制，并使用更少的交互轮次学习，但可能会导致一个显著更复杂的查询语言。我们的主要贡献是量化查询数量与学习者使用的查询语言复杂性之间的权衡。我们通过区域族的 VC 维度来衡量区域查询的复杂性。我们展示了对于任何具有 VC 维度 d 的假设类 H，可以设计一个 VC 维度为 O(d) 的区域查询族 Q，以便对于每个集合 n 个示例 S \subset X 和每个 h^* \in H，学习者可以向标记者提交 O(d log n) 个来自 Q 的查询并完美地标记 S。我们通过设计一个具有 VC 维度 d 的假设类 H 和一个大小为 n 的数据集 S \subset X，展示了一个匹配的下界，这样任何使用 VC 维度小于 O(d) 的查询类的学习算法必须做出 poly(n) 个查询才能完美地标记 S。最后，我们专注于研究的假设类，包括区间的并集、高维箱子和 d 维半空间，并获得更强的结果。特别地，我们设计了学习算法，(i) 在计算上是高效的，(ii) 即使在查询不是基于学习者的示例池 S 而是基于一些未知的 S 的超集 L 上回答的情况下也能工作。

更新时间: 2024-06-10 15:42:50

领域: cs.LG,cs.DS

下载: http://arxiv.org/abs/2405.07937v2

Diffusion-RPO: Aligning Diffusion Models through Relative Preference Optimization

Aligning large language models with human preferences has emerged as a critical focus in language modeling research. Yet, integrating preference learning into Text-to-Image (T2I) generative models is still relatively uncharted territory. The Diffusion-DPO technique made initial strides by employing pairwise preference learning in diffusion models tailored for specific text prompts. We introduce Diffusion-RPO, a new method designed to align diffusion-based T2I models with human preferences more effectively. This approach leverages both prompt-image pairs with identical prompts and those with semantically related content across various modalities. Furthermore, we have developed a new evaluation metric, style alignment, aimed at overcoming the challenges of high costs, low reproducibility, and limited interpretability prevalent in current evaluations of human preference alignment. Our findings demonstrate that Diffusion-RPO outperforms established methods such as Supervised Fine-Tuning and Diffusion-DPO in tuning Stable Diffusion versions 1.5 and XL-1.0, achieving superior results in both automated evaluations of human preferences and style alignment. Our code is available at https://github.com/yigu1008/Diffusion-RPO

Updated: 2024-06-10 15:42:03

标题: Diffusion-RPO：通过相对偏好优化对齐扩散模型

摘要: 将大型语言模型与人类偏好对齐已经成为语言建模研究的一个关键焦点。然而，将偏好学习整合到文本到图像（T2I）生成模型中仍然是一个相对未知的领域。Diffusion-DPO技术通过在为特定文本提示量身定制的扩散模型中采用成对偏好学习，取得了初步进展。我们介绍了一种新方法Diffusion-RPO，旨在更有效地将基于扩散的T2I模型与人类偏好对齐。这种方法利用具有相同提示和那些在各种模态下具有语义相关内容的提示-图像对。此外，我们开发了一个新的评估指标，样式对齐，旨在克服当前人类偏好对齐评估中普遍存在的高成本、低可重复性和有限可解释性的挑战。我们的研究结果表明，Diffusion-RPO在调整稳定扩散版本1.5和XL-1.0中优于已建立的方法，包括监督微调和Diffusion-DPO，在人类偏好和样式对齐的自动评估中取得了优越的结果。我们的代码可在https://github.com/yigu1008/Diffusion-RPO 上找到。

更新时间: 2024-06-10 15:42:03

领域: cs.CV,cs.CL,cs.LG

下载: http://arxiv.org/abs/2406.06382v1

Are you still on track!? Catching LLM Task Drift with Activations

Large Language Models (LLMs) are routinely used in retrieval-augmented applications to orchestrate tasks and process inputs from users and other sources. These inputs, even in a single LLM interaction, can come from a variety of sources, of varying trustworthiness and provenance. This opens the door to prompt injection attacks, where the LLM receives and acts upon instructions from supposedly data-only sources, thus deviating from the user's original instructions. We define this as task drift, and we propose to catch it by scanning and analyzing the LLM's activations. We compare the LLM's activations before and after processing the external input in order to detect whether this input caused instruction drift. We develop two probing methods and find that simply using a linear classifier can detect drift with near perfect ROC AUC on an out-of-distribution test set. We show that this approach generalizes surprisingly well to unseen task domains, such as prompt injections, jailbreaks, and malicious instructions, without being trained on any of these attacks. Our setup does not require any modification of the LLM (e.g., fine-tuning) or any text generation, thus maximizing deployability and cost efficiency and avoiding reliance on unreliable model output. To foster future research on activation-based task inspection, decoding, and interpretability, we will release our large-scale TaskTracker toolkit, comprising a dataset of over 500K instances, representations from 4 SoTA language models, and inspection tools.

Updated: 2024-06-10 15:39:56

标题: 你仍然在正确的轨道上吗！？通过激活捕捉LLM任务漂移

摘要: 大型语言模型（LLMs）通常用于检索增强应用程序，以协调任务并处理来自用户和其他来源的输入。即使在单个LLM交互中，这些输入也可能来自各种来源，信任度和来源也不同。这为即时注入攻击打开了大门，在这种攻击中，LLM接收并执行来自据称仅包含数据的来源的指令，从而偏离用户的原始指令。我们将此定义为任务漂移，并提议通过扫描和分析LLM的激活来捕捉它。我们比较LLM在处理外部输入之前和之后的激活，以检测此输入是否导致指令漂移。我们开发了两种探测方法，并发现仅使用线性分类器可以在一个分布之外的测试集上以几乎完美的ROC AUC检测漂移。我们展示了这种方法令人惊讶地泛化到了未见过的任务领域，如即时注入、越狱和恶意指令，而无需对任何这些攻击进行训练。我们的设置不需要对LLM进行任何修改（例如，微调）或任何文本生成，从而最大程度地提高了部署能力和成本效率，并避免依赖不可靠的模型输出。为了促进基于激活的任务检查、解码和可解释性的未来研究，我们将发布我们的大规模TaskTracker工具包，其中包括超过50万个实例的数据集、来自4个SoTA语言模型的表示和检查工具。

更新时间: 2024-06-10 15:39:56

领域: cs.CR,cs.CL,cs.CY

下载: http://arxiv.org/abs/2406.00799v2

ASTRA: Aligning Speech and Text Representations for Asr without Sampling

This paper introduces ASTRA, a novel method for improving Automatic Speech Recognition (ASR) through text injection.Unlike prevailing techniques, ASTRA eliminates the need for sampling to match sequence lengths between speech and text modalities. Instead, it leverages the inherent alignments learned within CTC/RNNT models. This approach offers the following two advantages, namely, avoiding potential misalignment between speech and text features that could arise from upsampling and eliminating the need for models to accurately predict duration of sub-word tokens. This novel formulation of modality (length) matching as a weighted RNNT objective matches the performance of the state-of-the-art duration-based methods on the FLEURS benchmark, while opening up other avenues of research in speech processing.

Updated: 2024-06-10 15:39:04

标题: ASTRA：无需采样即可对齐语音和文本表示进行ASR

摘要: 这篇论文介绍了ASTRA，一种通过文本注入来改进自动语音识别（ASR）的新方法。与现有技术不同，ASTRA消除了在语音和文本模态之间匹配序列长度所需的采样。相反，它利用了在CTC/RNNT模型中学习到的固有对齐。这种方法提供了以下两个优势，即避免可能由上采样引起的语音和文本特征之间的潜在错位，以及消除模型需要准确预测子词标记持续时间的需求。这种模态（长度）匹配的新颖形式作为加权RNNT目标与FLEURS基准上基于持续时间的最新方法的性能相匹配，同时为语音处理领域开辟了其他研究途径。

更新时间: 2024-06-10 15:39:04

领域: eess.AS,cs.LG,cs.SD

下载: http://arxiv.org/abs/2406.06664v1

MOSA: Music Motion with Semantic Annotation Dataset for Cross-Modal Music Processing

In cross-modal music processing, translation between visual, auditory, and semantic content opens up new possibilities as well as challenges. The construction of such a transformative scheme depends upon a benchmark corpus with a comprehensive data infrastructure. In particular, the assembly of a large-scale cross-modal dataset presents major challenges. In this paper, we present the MOSA (Music mOtion with Semantic Annotation) dataset, which contains high quality 3-D motion capture data, aligned audio recordings, and note-by-note semantic annotations of pitch, beat, phrase, dynamic, articulation, and harmony for 742 professional music performances by 23 professional musicians, comprising more than 30 hours and 570 K notes of data. To our knowledge, this is the largest cross-modal music dataset with note-level annotations to date. To demonstrate the usage of the MOSA dataset, we present several innovative cross-modal music information retrieval (MIR) and musical content generation tasks, including the detection of beats, downbeats, phrase, and expressive contents from audio, video and motion data, and the generation of musicians' body motion from given music audio. The dataset and codes are available alongside this publication (https://github.com/yufenhuang/MOSA-Music-mOtion-and-Semantic-Annotation-dataset).

Updated: 2024-06-10 15:37:46

标题: MOSA：跨模态音乐处理的音乐动作与语义标注数据集

摘要: 在跨模态音乐处理中，视觉、听觉和语义内容之间的转换为新的可能性和挑战打开了大门。这种转换方案的构建取决于具有全面数据基础设施的基准语料库。特别是，大规模跨模态数据集的组装提出了重大挑战。在本文中，我们介绍了MOSA（带有语义标注的音乐运动）数据集，包含了742位专业音乐表演者的高质量3-D运动捕捉数据，对齐的音频录音以及有关音高、节拍、乐句、动态、表现和和声的逐音符语义标注，总计超过30小时和570 K个音符的数据。据我们所知，这是迄今为止具有音符级别标注的最大跨模态音乐数据集。为了演示MOSA数据集的用途，我们提出了几个创新的跨模态音乐信息检索（MIR）和音乐内容生成任务，包括从音频、视频和运动数据中检测节拍、下拍、乐句和表现内容，以及从给定音乐音频生成音乐家的身体运动。数据集和代码可在本文旁边找到。

更新时间: 2024-06-10 15:37:46

领域: cs.SD,cs.AI,eess.AS

下载: http://arxiv.org/abs/2406.06375v1

Improving Deep Learning-based Automatic Cranial Defect Reconstruction by Heavy Data Augmentation: From Image Registration to Latent Diffusion Models

Modeling and manufacturing of personalized cranial implants are important research areas that may decrease the waiting time for patients suffering from cranial damage. The modeling of personalized implants may be partially automated by the use of deep learning-based methods. However, this task suffers from difficulties with generalizability into data from previously unseen distributions that make it difficult to use the research outcomes in real clinical settings. Due to difficulties with acquiring ground-truth annotations, different techniques to improve the heterogeneity of datasets used for training the deep networks have to be considered and introduced. In this work, we present a large-scale study of several augmentation techniques, varying from classical geometric transformations, image registration, variational autoencoders, and generative adversarial networks, to the most recent advances in latent diffusion models. We show that the use of heavy data augmentation significantly increases both the quantitative and qualitative outcomes, resulting in an average Dice Score above 0.94 for the SkullBreak and above 0.96 for the SkullFix datasets. Moreover, we show that the synthetically augmented network successfully reconstructs real clinical defects. The work is a considerable contribution to the field of artificial intelligence in the automatic modeling of personalized cranial implants.

Updated: 2024-06-10 15:34:23

标题: 通过大量数据增强改进基于深度学习的自动颅骨缺损重建：从图像配准到潜在扩散模型

摘要: 建模和制造个性化颅骨植入物是重要的研究领域，可以减少患有颅部损伤的患者等待的时间。个性化植入物的建模可能部分自动化，通过使用基于深度学习的方法。然而，这项任务面临着将研究成果应用于真实临床环境中的困难，因为它难以泛化到以前未见过的数据分布。由于获取地面真实标注的困难，必须考虑并引入不同的技术来改善用于训练深度网络的数据集的异质性。在这项工作中，我们展示了对几种增强技术的大规模研究，从经典的几何变换、图像配准、变分自动编码器和生成对抗网络，到最新的潜在扩散模型。我们表明，使用大量数据增强显著提高了定量和定性结果，导致SkullBreak数据集的平均Dice分数超过0.94，SkullFix数据集的平均Dice分数超过0.96。此外，我们展示合成增强网络成功重建了真实临床缺陷。这项工作对于人工智能领域在个性化颅骨植入物自动建模方面是一项重大贡献。

更新时间: 2024-06-10 15:34:23

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2406.06372v1

Multimodal LLMs Struggle with Basic Visual Network Analysis: a VNA Benchmark

We evaluate the zero-shot ability of GPT-4 and LLaVa to perform simple Visual Network Analysis (VNA) tasks on small-scale graphs. We evaluate the Vision Language Models (VLMs) on 5 tasks related to three foundational network science concepts: identifying nodes of maximal degree on a rendered graph, identifying whether signed triads are balanced or unbalanced, and counting components. The tasks are structured to be easy for a human who understands the underlying graph theoretic concepts, and can all be solved by counting the appropriate elements in graphs. We find that while GPT-4 consistently outperforms LLaVa, both models struggle with every visual network analysis task we propose. We publicly release the first benchmark for the evaluation of VLMs on foundational VNA tasks.

Updated: 2024-06-10 15:28:16

标题: Multimodal LLMs在基本视觉网络分析方面面临挑战：VNA基准测试

摘要: 我们评估了GPT-4和LLaVa在小规模图上执行简单视觉网络分析（VNA）任务的零样本能力。我们在5个与三个基础网络科学概念相关的任务上评估了视觉语言模型（VLMs）：在渲染图上识别最大度节点，识别有符号三元组是否平衡或不平衡，以及计算组件数量。这些任务结构设计为对理解基础图论概念的人类来说很容易，并且都可以通过计算图中适当元素来解决。我们发现，尽管GPT-4始终优于LLaVa，但两个模型在我们提出的每个视觉网络分析任务中都遇到了困难。我们公开发布了评估VLMs在基础VNA任务上的第一个基准。

更新时间: 2024-06-10 15:28:16

领域: cs.CV,cs.AI,cs.CL

下载: http://arxiv.org/abs/2405.06634v2

Automating Food Drop: The Power of Two Choices for Dynamic and Fair Food Allocation

Food waste and food insecurity are two closely related pressing global issues. Food rescue organizations worldwide run programs aimed at addressing the two problems. In this paper, we partner with a non-profit organization in the state of Indiana that leads \emph{Food Drop}, a program that is designed to redirect rejected truckloads of food away from landfills and into food banks. The truckload to food bank matching decisions are currently made by an employee of our partner organization. In addition to this being a very time-consuming task, as perhaps expected from human-based matching decisions, the allocations are often skewed: a small percentage of the possible recipients receives the majority of donations. Our goal in this partnership is to completely automate Food Drop. In doing so, we need a matching algorithm for making real-time decisions that strikes a balance between ensuring fairness for the food banks that receive the food and optimizing efficiency for the truck drivers. In this paper, we describe the theoretical guarantees and experiments that dictated our choice of algorithm in the platform we built and deployed for our partner organization. Our work also makes contributions to the literature on load balancing and balls-into-bins games, that might be of independent interest. Specifically, we study the allocation of $m$ weighted balls into $n$ weighted bins, where each ball has two non-uniformly sampled random bin choices, and prove upper bounds, that hold with high probability, on the maximum load of any bin.

Updated: 2024-06-10 15:22:41

标题: 自动化食品投放：动态和公平食品分配的双选项力量

摘要: 食物浪费和食品不安全是两个密切相关的紧迫全球问题。全球范围内的食物救助组织开展旨在解决这两个问题的项目。在本文中，我们与印第安纳州的一家非营利组织合作，该组织领导着“食物投放”计划，旨在将被拒绝的货车载货从垃圾填埋场转移到食品银行。目前，货车到食品银行的匹配决策由我们的合作伙伴组织的一名员工负责。除了这是一项非常耗时的任务外，正如人为匹配决策所期望的那样，分配往往是倾斜的：少数潜在受益者中的大部分接收到了捐赠的大部分。我们在这个合作中的目标是完全自动化“食物投放”。为此，我们需要一个匹配算法，用于实时决策，既确保食品银行获得食品的公平性，又优化卡车司机的效率。在本文中，我们描述了我们选择算法的理论保证和实验，这些选择指导了我们为合作伙伴组织构建和部署的平台。我们的工作还对负载平衡和球到箱子游戏的文献做出了贡献，这可能是独立感兴趣的。具体来说，我们研究将$m$个加权球分配到$n$个加权箱子中，每个球有两个非均匀抽样的随机箱子选择，并证明了任何箱子的最大负载的上限，这个上限在很大概率下成立。

更新时间: 2024-06-10 15:22:41

领域: cs.GT,cs.AI

下载: http://arxiv.org/abs/2406.06363v1

HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding

While large vision-language models (LVLMs) have demonstrated impressive capabilities in interpreting multi-modal contexts, they invariably suffer from object hallucinations (OH). We introduce HALC, a novel decoding algorithm designed to mitigate OH in LVLMs. HALC leverages distinct fine-grained optimal visual information in vision-language tasks and operates on both local and global contexts simultaneously. Specifically, HALC integrates a robust auto-focal grounding mechanism (locally) to correct hallucinated tokens on the fly, and a specialized beam search algorithm (globally) to significantly reduce OH while preserving text generation quality. Additionally, HALC can be integrated into any LVLMs as a plug-and-play module without extra training. Extensive experimental studies demonstrate the effectiveness of HALC in reducing OH, outperforming state-of-the-arts across four benchmarks.

Updated: 2024-06-10 15:21:41

标题: HALC: 通过自适应焦点-对比度解码减少对象幻觉

摘要: 大型视觉-语言模型（LVLMs）已经展示出在解释多模态情境方面的令人印象深刻的能力，但它们无可避免地遭受物体幻觉（OH）的困扰。我们介绍了HALC，一种旨在减轻LVLMs中OH的新型解码算法。HALC利用视觉-语言任务中的不同细粒度最佳视觉信息，并同时在局部和全局环境中运行。具体而言，HALC整合了一个强大的自动对焦接地机制（局部）以在运行过程中纠正幻觉标记，以及一个专门的束搜索算法（全局）以显著减少OH同时保持文本生成质量。此外，HALC可以作为即插即用模块集成到任何LVLMs中，无需额外训练。广泛的实验研究表明，HALC在减少OH方面的有效性，优于四个基准的最新技术。

更新时间: 2024-06-10 15:21:41

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2403.00425v2

MASSW: A New Dataset and Benchmark Tasks for AI-Assisted Scientific Workflows

Scientific innovation relies on detailed workflows, which include critical steps such as analyzing literature, generating ideas, validating these ideas, interpreting results, and inspiring follow-up research. However, scientific publications that document these workflows are extensive and unstructured. This makes it difficult for both human researchers and AI systems to effectively navigate and explore the space of scientific innovation. To address this issue, we introduce MASSW, a comprehensive text dataset on Multi-Aspect Summarization of Scientific Workflows. MASSW includes more than 152,000 peer-reviewed publications from 17 leading computer science conferences spanning the past 50 years. Using Large Language Models (LLMs), we automatically extract five core aspects from these publications -- context, key idea, method, outcome, and projected impact -- which correspond to five key steps in the research workflow. These structured summaries facilitate a variety of downstream tasks and analyses. The quality of the LLM-extracted summaries is validated by comparing them with human annotations. We demonstrate the utility of MASSW through multiple novel machine-learning tasks that can be benchmarked using this new dataset, which make various types of predictions and recommendations along the scientific workflow. MASSW holds significant potential for researchers to create and benchmark new AI methods for optimizing scientific workflows and fostering scientific innovation in the field. Our dataset is openly available at \url{https://github.com/xingjian-zhang/massw}.

Updated: 2024-06-10 15:19:09

标题: MASSW：用于AI辅助科学工作流的新数据集和基准任务

摘要: 科学创新依赖于详细的工作流程，其中包括分析文献、生成想法、验证这些想法、解释结果和激发后续研究等关键步骤。然而，记录这些工作流程的科学出版物数量庞大且结构混乱。这使得人类研究人员和人工智能系统都难以有效地导航和探索科学创新领域。为了解决这个问题，我们介绍了MASSW，一个关于科学工作流的多方位摘要的全面文本数据集。MASSW包括来自过去50年17个领先计算机科学会议的超过152,000篇经过同行评审的出版物。使用大型语言模型(LLMs)，我们从这些出版物中自动提取五个核心方面--背景、关键想法、方法、结果和预期影响--这对应于研究工作流程中的五个关键步骤。这些结构化摘要有助于各种下游任务和分析。通过与人类注释进行比较，验证了LLM提取的摘要的质量。我们通过多个新颖的机器学习任务展示了MASSW的实用性，这些任务可以使用这个新数据集进行基准测试，可以进行各种类型的预测和建议沿着科学工作流程进行。MASSW对研究人员来说具有重要潜力，可以创建和基准测试新的人工智能方法，优化科学工作流程并促进该领域的科学创新。我们的数据集可以在\url{https://github.com/xingjian-zhang/massw}上公开获取。

更新时间: 2024-06-10 15:19:09

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.06357v1

Re.Dis.Cover Place with Generative AI: Exploring the Experience and Design of City Wandering with Image-to-Image AI

The HCI field has demonstrated a growing interest in leveraging emerging technologies to enrich urban experiences. However, insufficient studies investigate the experience and design space of AI image technology (AIGT) applications for playful urban interaction, despite its widespread adoption. To explore this gap, we conducted an exploratory study involving four participants who wandered and photographed within Eindhoven Centre and interacted with an image-to-image AI. Preliminary findings present their observations, the effect of their familiarity with places, and how AIGT becomes an explorer's tool or co-speculator. We then highlight AIGT's capability of supporting playfulness, reimaginations, and rediscoveries of places through defamiliarizing and familiarizing cityscapes. Additionally, we propose the metaphor AIGT as a 'tourist' to discuss its opportunities for engaging explorations and risks of stereotyping places. Collectively, our research provides initial empirical insights and design considerations, inspiring future HCI endeavors for creating urban play with generative AI.

Updated: 2024-06-10 15:18:14

标题: 使用生成式人工智能重新发现地点：探索城市漫步的体验和设计与图像到图像人工智能

摘要: 人机交互领域表现出了越来越多的兴趣，利用新兴技术丰富城市体验。然而，尽管人工智能图像技术（AIGT）被广泛应用，但还缺乏对其在城市互动中的体验和设计空间进行研究。为了探索这一空白，我们进行了一项探索性研究，涉及四名在埃因霍温中心徘徊并拍照的参与者，并与图像到图像的人工智能进行互动。初步研究结果呈现了他们的观察，他们对地点的熟悉程度的影响，以及AIGT如何成为探险者的工具或共同推测者。然后，我们强调AIGT支持游戏性、重新想象和重新发现地点的能力，通过使城市景观熟悉和陌生化。此外，我们提出将AIGT比作“游客”的隐喻，讨论其参与探索的机会和对地点刻板印象的风险。总的来说，我们的研究提供了初步的实证见解和设计考虑，鼓舞未来的人机交互努力，创建具有生成式人工智能的城市游戏。

更新时间: 2024-06-10 15:18:14

领域: cs.HC,cs.AI

下载: http://arxiv.org/abs/2406.06356v1

On the Minimal Degree Bias in Generalization on the Unseen for non-Boolean Functions

We investigate the out-of-domain generalization of random feature (RF) models and Transformers. We first prove that in the `generalization on the unseen (GOTU)' setting, where training data is fully seen in some part of the domain but testing is made on another part, and for RF models in the small feature regime, the convergence takes place to interpolators of minimal degree as in the Boolean case (Abbe et al., 2023). We then consider the sparse target regime and explain how this regime relates to the small feature regime, but with a different regularization term that can alter the picture in the non-Boolean case. We show two different outcomes for the sparse regime with q-ary data tokens: (1) if the data is embedded with roots of unities, then a min-degree interpolator is learned like in the Boolean case for RF models, (2) if the data is not embedded as such, e.g., simply as integers, then RF models and Transformers may not learn minimal degree interpolators. This shows that the Boolean setting and its roots of unities generalization are special cases where the minimal degree interpolator offers a rare characterization of how learning takes place. For more general integer and real-valued settings, a more nuanced picture remains to be fully characterized.

Updated: 2024-06-10 15:14:33

标题: 关于非布尔函数在未被看到的泛化中的最小度偏差

摘要: 我们研究了随机特征（RF）模型和Transformer模型的域外泛化。我们首先证明，在“在未见过的情况下泛化（GOTU）”设置中，在该设置中，训练数据在某个领域的一部分完全可见，但测试是在另一个部分进行的，并且对于小特征规模中的RF模型，收敛会发生在到达最小程度的插值器，就像在布尔情况下一样（Abbe等人，2023）。然后，我们考虑稀疏目标规模，并解释这种规模与小特征规模之间的关系，但使用不同的正则化项，可以改变非布尔情况下的情况。我们展示了针对q元数据令牌的稀疏规模的两种不同结果：（1）如果数据嵌入了单位根，那么就像RF模型的布尔情况一样，会学习最小程度的插值器，（2）如果数据没有被嵌入为这样，例如，仅仅是整数，那么RF模型和Transformer模型可能不会学习最小程度的插值器。这表明，布尔设置及其单位根广义是特殊情况，最小程度的插值器提供了一个学习发生的罕见描述。对于更一般的整数和实值设置，一个更微妙的情况仍然需要被完全描述。

更新时间: 2024-06-10 15:14:33

领域: cs.LG

下载: http://arxiv.org/abs/2406.06354v1

Cascading Unknown Detection with Known Classification for Open Set Recognition

Deep learners tend to perform well when trained under the closed set assumption but struggle when deployed under open set conditions. This motivates the field of Open Set Recognition in which we seek to give deep learners the ability to recognize whether a data sample belongs to the known classes trained on or comes from the surrounding infinite world. Existing open set recognition methods typically rely upon a single function for the dual task of distinguishing between knowns and unknowns as well as making known class distinction. This dual process leaves performance on the table as the function is not specialized for either task. In this work, we introduce Cascading Unknown Detection with Known Classification (Cas-DC), where we instead learn specialized functions in a cascading fashion for both known/unknown detection and fine class classification amongst the world of knowns. Our experiments and analysis demonstrate that Cas-DC handily outperforms modern methods in open set recognition when compared using AUROC scores and correct classification rate at various true positive rates.

Updated: 2024-06-10 15:13:07

标题: 使用已知分类进行级联未知检测，用于开放集识别

摘要: 深度学习者在封闭集假设下训练时往往表现良好，但在开放集条件下部署时却面临困难。这促使开放集识别领域的发展，我们希望给予深度学习者识别数据样本是否属于已训练的已知类别，或者来自周围无限世界的能力。现有的开放集识别方法通常依赖于单一函数来完成区分已知和未知样本，同时进行已知类别的区分。这种双重过程导致性能下降，因为函数对任一任务都不是专门化的。在这项工作中，我们引入了级联未知检测与已知分类（Cas-DC），我们通过级联方式学习专门化函数，用于已知/未知检测以及在已知类别的世界中进行精细分类。我们的实验和分析表明，与现代方法相比，Cas-DC在使用AUROC分数和在各种真正正例率下的正确分类率时，在开放集识别方面表现出色。

更新时间: 2024-06-10 15:13:07

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2406.06351v1

Error Analysis and Numerical Algorithm for PDE Approximation with Hidden-Layer Concatenated Physics Informed Neural Networks

We present the hidden-layer concatenated physics informed neural network (HLConcPINN) method, which combines hidden-layer concatenated feed-forward neural networks, a modified block time marching strategy, and a physics informed approach for approximating partial differential equations (PDEs). We analyze the convergence properties and establish the error bounds of this method for two types of PDEs: parabolic (exemplified by the heat and Burgers' equations) and hyperbolic (exemplified by the wave and nonlinear Klein-Gordon equations). We show that its approximation error of the solution can be effectively controlled by the training loss for dynamic simulations with long time horizons. The HLConcPINN method in principle allows an arbitrary number of hidden layers not smaller than two and any of the commonly-used smooth activation functions for the hidden layers beyond the first two, with theoretical guarantees. This generalizes several recent neural-network techniques, which have theoretical guarantees but are confined to two hidden layers in the network architecture and the $\tanh$ activation function. Our theoretical analyses subsequently inform the formulation of appropriate training loss functions for these PDEs, leading to physics informed neural network (PINN) type computational algorithms that differ from the standard PINN formulation. Ample numerical experiments are presented based on the proposed algorithm to validate the effectiveness of this method and confirm aspects of the theoretical analyses.

Updated: 2024-06-10 15:12:53

标题: PDE近似的误差分析和数值算法：具有隐藏层连接物理信息神经网络的情况

摘要: 我们提出了隐藏层串联物理信息神经网络（HLConcPINN）方法，该方法结合了隐藏层串联前馈神经网络、修改的块时间推进策略以及逼近偏微分方程（PDEs）的物理信息方法。我们分析了该方法在两种类型的PDEs（以热方程和Burgers'方程为例的拟合和以波动方程和非线性Klein-Gordon方程为例的双曲线方程）中的收敛性质并建立了误差界限。我们展示了通过训练损失可以有效控制其解的逼近误差，特别适用于具有较长时间跨度的动态模拟。HLConcPINN方法原则上允许任意数量不小于两个的隐藏层以及隐藏层第三层及以后使用任何常用的平滑激活函数，并具有理论保证。这扩展了几种最近的具有理论保证但限制在网络结构中两个隐藏层和$\tanh$激活函数的神经网络技术。我们的理论分析随后为这些PDEs的适当训练损失函数的制定提供信息，从而导致与标准PINN公式不同的物理信息神经网络（PINN）类型的计算算法。基于所提出的算法进行了大量数值实验，验证了该方法的有效性并确认了理论分析的某些方面。

更新时间: 2024-06-10 15:12:53

领域: math.NA,cs.LG,cs.NA,physics.comp-ph

下载: http://arxiv.org/abs/2406.06350v1

Parameter-Efficient Fine-Tuning for Medical Image Analysis: The Missed Opportunity

Foundation models have significantly advanced medical image analysis through the pre-train fine-tune paradigm. Among various fine-tuning algorithms, Parameter-Efficient Fine-Tuning (PEFT) is increasingly utilized for knowledge transfer across diverse tasks, including vision-language and text-to-image generation. However, its application in medical image analysis is relatively unexplored due to the lack of a structured benchmark for evaluating PEFT methods. This study fills this gap by evaluating 17 distinct PEFT algorithms across convolutional and transformer-based networks on image classification and text-to-image generation tasks using six medical datasets of varying size, modality, and complexity. Through a battery of over 700 controlled experiments, our findings demonstrate PEFT's effectiveness, particularly in low data regimes common in medical imaging, with performance gains of up to 22% in discriminative and generative tasks. These recommendations can assist the community in incorporating PEFT into their workflows and facilitate fair comparisons of future PEFT methods, ensuring alignment with advancements in other areas of machine learning and AI.

Updated: 2024-06-10 15:11:40

标题: 参数高效微调用于医学图像分析：被忽视的机会

摘要: 基于基础模型的预训练微调模式显著推动了医学图像分析的发展。在各种微调算法中，参数高效微调（PEFT）越来越多地用于在不同任务之间进行知识转移，包括视觉-语言和文本到图像生成。然而，由于缺乏用于评估PEFT方法的结构化基准，其在医学图像分析中的应用相对较少探索。本研究通过评估17种不同的PEFT算法，跨卷积和基于transformer的网络，在图像分类和文本到图像生成任务上使用六个不同大小、模态和复杂性的医学数据集，填补了这一空白。通过700多次受控实验，我们的研究结果表明PEFT的有效性，特别是在医学成像中常见的低数据情况下，其在区分和生成任务中的性能提升高达22%。这些建议可以帮助社区将PEFT纳入其工作流程，并促进未来PEFT方法的公平比较，确保与机器学习和人工智能其他领域的进展保持一致。

更新时间: 2024-06-10 15:11:40

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2305.08252v4

Unveiling Energy Efficiency in Deep Learning: Measurement, Prediction, and Scoring across Edge Devices

Today, deep learning optimization is primarily driven by research focused on achieving high inference accuracy and reducing latency. However, the energy efficiency aspect is often overlooked, possibly due to a lack of sustainability mindset in the field and the absence of a holistic energy dataset. In this paper, we conduct a threefold study, including energy measurement, prediction, and efficiency scoring, with an objective to foster transparency in power and energy consumption within deep learning across various edge devices. Firstly, we present a detailed, first-of-its-kind measurement study that uncovers the energy consumption characteristics of on-device deep learning. This study results in the creation of three extensive energy datasets for edge devices, covering a wide range of kernels, state-of-the-art DNN models, and popular AI applications. Secondly, we design and implement the first kernel-level energy predictors for edge devices based on our kernel-level energy dataset. Evaluation results demonstrate the ability of our predictors to provide consistent and accurate energy estimations on unseen DNN models. Lastly, we introduce two scoring metrics, PCS and IECS, developed to convert complex power and energy consumption data of an edge device into an easily understandable manner for edge device end-users. We hope our work can help shift the mindset of both end-users and the research community towards sustainability in edge computing, a principle that drives our research. Find data, code, and more up-to-date information at https://amai-gsu.github.io/DeepEn2023.

Updated: 2024-06-10 15:09:24

标题: 揭示深度学习中的能效问题：跨边缘设备的测量、预测和评分

摘要: 今天，深度学习优化主要集中在实现高推理准确性和减少延迟方面的研究驱动。然而，能源效率方面经常被忽视，可能是由于该领域缺乏可持续性思维和整体能源数据集。在本文中，我们进行了一项三方面的研究，包括能源测量、预测和效率评分，旨在促进透明度，深度学习在各种边缘设备上的功耗和能耗。首先，我们提出了一项详细的、首创性的测量研究，揭示了设备上深度学习的能耗特性。这项研究结果产生了三个广泛的边缘设备能源数据集，涵盖了各种内核、最先进的DNN模型和流行的AI应用。其次，我们基于内核级别的能源数据集设计并实施了边缘设备的第一个内核级别能源预测器。评估结果表明，我们的预测器能够在未经训练的DNN模型上提供一致和准确的能量估计。最后，我们引入了两个评分指标，PCS和IECS，旨在将边缘设备的复杂功耗和能耗数据转换为易于理解的方式，供边缘设备终端用户使用。我们希望我们的工作可以帮助改变终端用户和研究社区的思维，朝着边缘计算的可持续性发展，这是我们研究的驱动原则。在https://amai-gsu.github.io/DeepEn2023 上找到数据、代码和更多更新信息。

更新时间: 2024-06-10 15:09:24

领域: cs.NI,cs.AI,cs.LG,cs.PF,I.2.11

下载: http://arxiv.org/abs/2310.18329v2

Causal Discovery over High-Dimensional Structured Hypothesis Spaces with Causal Graph Partitioning

The aim in many sciences is to understand the mechanisms that underlie the observed distribution of variables, starting from a set of initial hypotheses. Causal discovery allows us to infer mechanisms as sets of cause and effect relationships in a generalized way -- without necessarily tailoring to a specific domain. Causal discovery algorithms search over a structured hypothesis space, defined by the set of directed acyclic graphs, to find the graph that best explains the data. For high-dimensional problems, however, this search becomes intractable and scalable algorithms for causal discovery are needed to bridge the gap. In this paper, we define a novel causal graph partition that allows for divide-and-conquer causal discovery with theoretical guarantees. We leverage the idea of a superstructure -- a set of learned or existing candidate hypotheses -- to partition the search space. We prove under certain assumptions that learning with a causal graph partition always yields the Markov Equivalence Class of the true causal graph. We show our algorithm achieves comparable accuracy and a faster time to solution for biologically-tuned synthetic networks and networks up to ${10^4}$ variables. This makes our method applicable to gene regulatory network inference and other domains with high-dimensional structured hypothesis spaces.

Updated: 2024-06-10 15:08:14

标题: 在具有因果图分区的高维结构假设空间中的因果发现

摘要: 在许多科学领域中，目标是理解支配变量分布的机制，从一组初始假设开始。因果发现使我们能够以一种广义的方式推断机制，即因果关系的集合，而不一定要针对特定领域进行调整。因果发现算法在由有向无环图组成的结构化假设空间中搜索，以找到最能解释数据的图形。然而，对于高维问题，这种搜索变得难以处理，需要可扩展的因果发现算法来弥合差距。在本文中，我们定义了一种新颖的因果图分区，允许通过分治法进行因果发现，并具有理论保证。我们利用了一个超结构的概念 -- 一组学习或现有的候选假设 -- 来划分搜索空间。在某些假设下，我们证明了学习与因果图分区总是产生真实因果图的马尔可夫等价类。我们展示了我们的算法在生物调谐的合成网络和包含多达${10^4}$个变量的网络上实现了可比较的准确性和更快的解决时间。这使我们的方法适用于基因调控网络推断和其他具有高维结构化假设空间的领域。

更新时间: 2024-06-10 15:08:14

领域: cs.LG,cs.DC,stat.ME

下载: http://arxiv.org/abs/2406.06348v1

Sparsity regularization via tree-structured environments for disentangled representations

Many causal systems such as biological processes in cells can only be observed indirectly via measurements, such as gene expression. Causal representation learning -- the task of correctly mapping low-level observations to latent causal variables -- could advance scientific understanding by enabling inference of latent variables such as pathway activation. In this paper, we develop methods for inferring latent variables from multiple related datasets (environments) and tasks. As a running example, we consider the task of predicting a phenotype from gene expression, where we often collect data from multiple cell types or organisms that are related in known ways. The key insight is that the mapping from latent variables driven by gene expression to the phenotype of interest changes sparsely across closely related environments. To model sparse changes, we introduce Tree-Based Regularization (TBR), an objective that minimizes both prediction error and regularizes closely related environments to learn similar predictors. We prove that under assumptions about the degree of sparse changes, TBR identifies the true latent variables up to some simple transformations. We evaluate the theory empirically with both simulations and ground-truth gene expression data. We find that TBR recovers the latent causal variables better than related methods across these settings, even under settings that violate some assumptions of the theory.

Updated: 2024-06-10 15:06:41

标题: 通过树形结构环境实现稀疏正则化以获得分离的表示

摘要: 许多因果系统，如细胞中的生物过程，只能通过间接测量，如基因表达，来观察。因果表示学习——正确将低级观察映射到潜在因果变量的任务——可以通过使得对潜在变量（如通路激活）的推断成为可能，从而推动科学理解的进展。在本文中，我们开发了一种从多个相关数据集（环境）和任务中推断潜在变量的方法。作为一个运行示例，我们考虑从基因表达预测表型的任务，我们经常收集来自多个已知相关的细胞类型或生物体的数据。关键的洞察是，由基因表达驱动的潜在变量到感兴趣表型的映射在密切相关的环境中变化得很稀疏。为了建模稀疏变化，我们引入了基于树的正则化（TBR），这是一个旨在同时最小化预测误差并使得密切相关的环境学习相似预测器的目标。我们证明，在关于稀疏变化程度的假设下，TBR可以识别出真实的潜在变量，直到一些简单的转换。我们通过模拟和地面真实基因表达数据来经验评估这一理论。我们发现，在这些设置下，即使违反了一些理论的假设，TBR也比相关方法更好地恢复了潜在的因果变量。

更新时间: 2024-06-10 15:06:41

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2405.20482v2

Neural Wave Functions for Superfluids

Understanding superfluidity remains a major goal of condensed matter physics. Here we tackle this challenge utilizing the recently developed Fermionic neural network (FermiNet) wave function Ansatz [D. Pfau et al., Phys. Rev. Res. 2, 033429 (2020).] for variational Monte Carlo calculations. We study the unitary Fermi gas, a system with strong, short-range, two-body interactions known to possess a superfluid ground state but difficult to describe quantitatively. We demonstrate key limitations of the FermiNet Ansatz in studying the unitary Fermi gas and propose a simple modification based on the idea of an antisymmetric geminal power singlet (AGPs) wave function. The new AGPs FermiNet outperforms the original FermiNet significantly in paired systems, giving results which are more accurate than fixed-node diffusion Monte Carlo and are consistent with experiment. We prove mathematically that the new Ansatz, which only differs from the original Ansatz by the method of antisymmetrization, is a strict generalization of the original FermiNet architecture, despite the use of fewer parameters. Our approach shares several advantages with the original FermiNet: the use of a neural network removes the need for an underlying basis set; and the flexibility of the network yields extremely accurate results within a variational quantum Monte Carlo framework that provides access to unbiased estimates of arbitrary ground-state expectation values. We discuss how the method can be extended to study other superfluids.

Updated: 2024-06-10 15:05:39

标题: 超流体的神经波函数

摘要: 理解超流性仍然是凝聚态物理学的主要目标。在这里，我们利用最近开发的费米子神经网络（FermiNet）波函数假设[D. Pfau等人，Phys. Rev. Res. 2，033429（2020年）]进行变分蒙特卡洛计算来应对这一挑战。我们研究了幺正费米气体，这是一个具有强烈短程两体相互作用的系统，已知具有超流态基态，但难以定量描述。我们展示了费米子神经网络假设在研究幺正费米气体时的关键局限，并提出了一个基于反对称基态幂单体（AGPs）波函数思想的简单修改。新的AGPs FermiNet在成对系统中表现出色，其结果比固定节点扩散蒙特卡洛更为准确，并与实验结果一致。我们在数学上证明了这一新的假设，与原始假设仅在反对称方法上有所不同，是原始FermiNet体系的严格泛化，尽管使用的参数更少。我们的方法与原始FermiNet共享几个优势：神经网络的使用消除了基础基组的需求；网络的灵活性在变分量子蒙特卡洛框架中产生极其精确的结果，提供对任意基态期望值的无偏估计。我们讨论了如何将这种方法扩展到研究其他超流体。

更新时间: 2024-06-10 15:05:39

领域: cond-mat.quant-gas,cond-mat.supr-con,cs.LG,physics.comp-ph

下载: http://arxiv.org/abs/2305.06989v4

Faster Convergence of Local SGD for Over-Parameterized Models

Modern machine learning architectures are often highly expressive. They are usually over-parameterized and can interpolate the data by driving the empirical loss close to zero. We analyze the convergence of Local SGD (or FedAvg) for such over-parameterized models in the heterogeneous data setting and improve upon the existing literature by establishing the following convergence rates. For general convex loss functions, we establish an error bound of $\O(1/T)$ under a mild data similarity assumption and an error bound of $\O(K/T)$ otherwise, where $K$ is the number of local steps and $T$ is the total number of iterations. For non-convex loss functions we prove an error bound of $\O(K/T)$. These bounds improve upon the best previous bound of $\O(1/\sqrt{nT})$ in both cases, where $n$ is the number of nodes, when no assumption on the model being over-parameterized is made. We complete our results by providing problem instances in which our established convergence rates are tight to a constant factor with a reasonably small stepsize. Finally, we validate our theoretical results by performing large-scale numerical experiments that reveal the convergence behavior of Local SGD for practical over-parameterized deep learning models, in which the $\O(1/T)$ convergence rate of Local SGD is clearly shown.

Updated: 2024-06-10 15:04:22

标题: 过参数化模型的本地SGD更快的收敛速度

摘要: 现代机器学习架构通常具有高度表现力。它们通常是过度参数化的，并且可以通过将经验损失驱向零来插值数据。我们分析了在异质数据设置中针对这种过度参数化模型的本地SGD（或FedAvg）的收敛性，并通过建立以下收敛率来改进现有文献。对于一般的凸损失函数，我们在一个轻微的数据相似性假设下建立了一个$O(1/T)$的误差界，并在其他情况下建立了一个$O(K/T)$的误差界，其中$K$是本地步数，$T$是总迭代次数。对于非凸损失函数，我们证明了一个$O(K/T)$的误差界。这些界限在两种情况下都优于先前最佳界限$O(1/\sqrt{nT})$，其中$n$是节点数，在没有关于模型过度参数化的假设的情况下。我们通过提供问题实例来完成我们的结果，这些问题实例中我们建立的收敛率与一个相对较小的步长密切相关。最后，我们通过进行大规模数值实验验证了我们的理论结果，揭示了对于实际过度参数化的深度学习模型，本地SGD的收敛行为，其中本地SGD的$O(1/T)$收敛率得到了清晰展示。

更新时间: 2024-06-10 15:04:22

领域: cs.LG,math.OC

下载: http://arxiv.org/abs/2201.12719v3

Bayesian Active Learning in the Presence of Nuisance Parameters

In many settings, such as scientific inference, optimization, and transfer learning, the learner has a well-defined objective, which can be treated as estimation of a target parameter, and no intrinsic interest in characterizing the entire data-generating process. Usually, the learner must also contend with additional sources of uncertainty or variables -- with nuisance parameters. Bayesian active learning, or sequential optimal experimental design, can straightforwardly accommodate the presence of nuisance parameters, and so is a natural active learning framework for such problems. However, the introduction of nuisance parameters can lead to bias in the Bayesian learner's estimate of the target parameters, a phenomenon we refer to as negative interference. We characterize the threat of negative interference and how it fundamentally changes the nature of the Bayesian active learner's task. We show that the extent of negative interference can be extremely large, and that accurate estimation of the nuisance parameters is critical to reducing it. The Bayesian active learner is confronted with a dilemma: whether to spend a finite acquisition budget in pursuit of estimation of the target or of the nuisance parameters. Our setting encompasses Bayesian transfer learning as a special case, and our results shed light on the phenomenon of negative transfer between learning environments.

Updated: 2024-06-10 15:02:31

标题: 贝叶斯主动学习在干扰参数存在的情况下

摘要: 在许多情况下，比如科学推断、优化和迁移学习等领域，学习者有一个明确定义的目标，可以视为对目标参数的估计，并且对描述整个数据生成过程没有固有兴趣。通常，学习者还必须应对额外的不确定性或变量 -- 即干扰参数。贝叶斯主动学习，或顺序最优实验设计，可以直接适应干扰参数的存在，因此是这类问题的自然主动学习框架。然而，引入干扰参数可能导致贝叶斯学习者对目标参数的估计出现偏差，我们将这种现象称为负干扰。我们对负干扰的威胁及其如何从根本上改变贝叶斯主动学习者任务的性质进行了刻画。我们展示负干扰的程度可能非常大，并且准确估计干扰参数对减少负干扰至关重要。贝叶斯主动学习者面临一个困境：是花费有限的获取预算来追求目标参数的估计，还是干扰参数的估计。我们的设置包括贝叶斯迁移学习作为一个特殊情况，并且我们的结果阐明了学习环境之间负迁移现象的原理。

更新时间: 2024-06-10 15:02:31

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2310.14968v3

Predicting Heart Activity from Speech using Data-driven and Knowledge-based features

Accurately predicting heart activity and other biological signals is crucial for diagnosis and monitoring. Given that speech is an outcome of multiple physiological systems, a significant body of work studied the acoustic correlates of heart activity. Recently, self-supervised models have excelled in speech-related tasks compared to traditional acoustic methods. However, the robustness of data-driven representations in predicting heart activity remained unexplored. In this study, we demonstrate that self-supervised speech models outperform acoustic features in predicting heart activity parameters. We also emphasize the impact of individual variability on model generalizability. These findings underscore the value of data-driven representations in such tasks and the need for more speech-based physiological data to mitigate speaker-related challenges.

Updated: 2024-06-10 15:01:46

标题: 使用数据驱动和基于知识的特征来预测言语对心脏活动的影响

摘要: 准确预测心脏活动和其他生物信号对于诊断和监测至关重要。鉴于语音是多个生理系统的结果，大量研究探讨了心脏活动的声学相关性。最近，自监督模型在语音相关任务中表现优异，相比传统的声学方法。然而，数据驱动表示在预测心脏活动中的稳健性仍未被探索。在这项研究中，我们证明了自监督语音模型在预测心脏活动参数方面优于声学特征。我们还强调了个体差异对模型泛化能力的影响。这些发现强调了在这类任务中数据驱动表示的价值，以及更多基于语音的生理数据来减轻与说话者相关的挑战的必要性。

更新时间: 2024-06-10 15:01:46

领域: cs.SD,cs.AI,eess.AS,eess.SP

下载: http://arxiv.org/abs/2406.06341v1

Optimisation of federated learning settings under statistical heterogeneity variations

Federated Learning (FL) enables local devices to collaboratively learn a shared predictive model by only periodically sharing model parameters with a central aggregator. However, FL can be disadvantaged by statistical heterogeneity produced by the diversity in each local devices data distribution, which creates different levels of Independent and Identically Distributed (IID) data. Furthermore, this can be more complex when optimising different combinations of FL parameters and choosing optimal aggregation. In this paper, we present an empirical analysis of different FL training parameters and aggregators over various levels of statistical heterogeneity on three datasets. We propose a systematic data partition strategy to simulate different levels of statistical heterogeneity and a metric to measure the level of IID. Additionally, we empirically identify the best FL model and key parameters for datasets of different characteristics. On the basis of these, we present recommended guidelines for FL parameters and aggregators to optimise model performance under different levels of IID and with different datasets

Updated: 2024-06-10 15:01:03

标题: 联邦学习设置在统计异质性变化下的优化

摘要: 联邦学习（FL）使本地设备通过仅定期与中央聚合器共享模型参数来共同学习共享的预测模型。然而，FL 可能会受制于由每个本地设备数据分布的多样性产生的统计异质性，这会产生不同水平的独立同分布（IID）数据。此外，在优化不同组合的 FL 参数和选择最佳聚合时，这可能会更加复杂。在本文中，我们对三个数据集上不同的 FL 训练参数和聚合器进行了实证分析。我们提出了一种系统的数据分区策略，以模拟不同水平的统计异质性，并提出了一种衡量 IID 级别的度量标准。此外，我们在实证基础上确定了适用于不同特征数据集的最佳 FL 模型和关键参数。基于这些结果，我们提出了针对不同水平的 IID 和不同数据集下优化模型性能的 FL 参数和聚合器的推荐指南。

更新时间: 2024-06-10 15:01:03

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.06340v1

FedHCDR: Federated Cross-Domain Recommendation with Hypergraph Signal Decoupling

In recent years, Cross-Domain Recommendation (CDR) has drawn significant attention, which utilizes user data from multiple domains to enhance the recommendation performance. However, current CDR methods require sharing user data across domains, thereby violating the General Data Protection Regulation (GDPR). Consequently, numerous approaches have been proposed for Federated Cross-Domain Recommendation (FedCDR). Nevertheless, the data heterogeneity across different domains inevitably influences the overall performance of federated learning. In this study, we propose FedHCDR, a novel Federated Cross-Domain Recommendation framework with Hypergraph signal decoupling. Specifically, to address the data heterogeneity across domains, we introduce an approach called hypergraph signal decoupling (HSD) to decouple the user features into domain-exclusive and domain-shared features. The approach employs high-pass and low-pass hypergraph filters to decouple domain-exclusive and domain-shared user representations, which are trained by the local-global bi-directional transfer algorithm. In addition, a hypergraph contrastive learning (HCL) module is devised to enhance the learning of domain-shared user relationship information by perturbing the user hypergraph. Extensive experiments conducted on three real-world scenarios demonstrate that FedHCDR outperforms existing baselines significantly.

Updated: 2024-06-10 14:57:11

标题: FedHCDR：使用超图信号解耦的联邦跨领域推荐

摘要: 近年来，跨领域推荐（CDR）引起了广泛关注，利用来自多个领域的用户数据来提高推荐性能。然而，当前的CDR方法需要跨领域共享用户数据，从而违反了《通用数据保护条例》（GDPR）。因此，已经提出了许多用于联邦跨领域推荐（FedCDR）的方法。然而，不同领域之间的数据异质性不可避免地影响了联邦学习的整体性能。在本研究中，我们提出了FedHCDR，一种具有超图信号解耦的新型联邦跨领域推荐框架。具体来说，为了解决不同领域之间的数据异质性，我们引入了一种称为超图信号解耦（HSD）的方法，将用户特征分解为领域专属特征和领域共享特征。该方法利用高通和低通超图滤波器来解耦领域专属和领域共享用户表示，这些表示通过本地-全局双向传输算法进行训练。此外，设计了一个超图对比学习（HCL）模块，通过扰动用户超图来增强学习领域共享用户关系信息。在三个真实场景上进行的大量实验表明，FedHCDR明显优于现有基准。

更新时间: 2024-06-10 14:57:11

领域: cs.LG,cs.IR,cs.SI

下载: http://arxiv.org/abs/2403.02630v4

Acquiring Diverse Skills using Curriculum Reinforcement Learning with Mixture of Experts

Reinforcement learning (RL) is a powerful approach for acquiring a good-performing policy. However, learning diverse skills is challenging in RL due to the commonly used Gaussian policy parameterization. We propose \textbf{Di}verse \textbf{Skil}l \textbf{L}earning (Di-SkilL\footnote{Videos and code are available on the project webpage: \url{https://alrhub.github.io/di-skill-website/}}), an RL method for learning diverse skills using Mixture of Experts, where each expert formalizes a skill as a contextual motion primitive. Di-SkilL optimizes each expert and its associate context distribution to a maximum entropy objective that incentivizes learning diverse skills in similar contexts. The per-expert context distribution enables automatic curricula learning, allowing each expert to focus on its best-performing sub-region of the context space. To overcome hard discontinuities and multi-modalities without any prior knowledge of the environment's unknown context probability space, we leverage energy-based models to represent the per-expert context distributions and demonstrate how we can efficiently train them using the standard policy gradient objective. We show on challenging robot simulation tasks that Di-SkilL can learn diverse and performant skills.

Updated: 2024-06-10 14:56:21

标题: 使用混合专家的课程强化学习获取多样化技能

摘要: 强化学习（RL）是获取良好执行策略的强大方法。然而，在RL中学习多样化技能是具有挑战性的，这是由于常用的高斯策略参数化。我们提出了\textbf{Di}verse \textbf{Skil}l \textbf{L}earning（Di-SkilL），这是一种用于学习多样化技能的RL方法，使用专家组合，其中每个专家将技能形式化为上下文运动原语。Di-SkilL优化每个专家及其相关上下文分布，以最大熵目标激励学习在相似上下文中学习多样化技能。每个专家的上下文分布使得自动课程学习成为可能，使得每个专家可以专注于其上下文空间中表现最佳的子区域。为了克服在不了解环境未知上下文概率空间的情况下的硬性不连续性和多模态性，我们利用基于能量的模型来表示每个专家的上下文分布，并展示了如何可以使用标准策略梯度目标有效地训练它们。我们在具有挑战性的机器人模拟任务上展示了Di-SkilL可以学习多样化且表现良好的技能。

更新时间: 2024-06-10 14:56:21

领域: cs.LG,cs.RO

下载: http://arxiv.org/abs/2403.06966v2

Seeking Interpretability and Explainability in Binary Activated Neural Networks

We study the use of binary activated neural networks as interpretable and explainable predictors in the context of regression tasks on tabular data; more specifically, we provide guarantees on their expressiveness, present an approach based on the efficient computation of SHAP values for quantifying the relative importance of the features, hidden neurons and even weights. As the model's simplicity is instrumental in achieving interpretability, we propose a greedy algorithm for building compact binary activated networks. This approach doesn't need to fix an architecture for the network in advance: it is built one layer at a time, one neuron at a time, leading to predictors that aren't needlessly complex for a given task.

Updated: 2024-06-10 14:54:23

标题: 在二进制激活神经网络中寻求可解释性和可解释性

摘要: 我们研究了将二进制激活神经网络作为可解释和可解释的预测器在表格数据的回归任务中的应用；更具体地，我们提供了关于其表达能力的保证，并提出了一种基于有效计算SHAP值的方法，用于量化特征、隐藏神经元甚至权重的相对重要性。由于模型的简单性对于实现可解释性至关重要，我们提出了一种用于构建紧凑二进制激活网络的贪婪算法。这种方法不需要提前确定网络的架构：它逐层构建，逐个神经元构建，从而得到针对特定任务不会不必要复杂的预测器。

更新时间: 2024-06-10 14:54:23

领域: cs.LG

下载: http://arxiv.org/abs/2209.03450v3

Benchmarking Counterfactual Image Generation

Generative AI has revolutionised visual content editing, empowering users to effortlessly modify images and videos. However, not all edits are equal. To perform realistic edits in domains such as natural image or medical imaging, modifications must respect causal relationships inherent to the data generation process. Such image editing falls into the counterfactual image generation regime. Evaluating counterfactual image generation is substantially complex: not only it lacks observable ground truths, but also requires adherence to causal constraints. Although several counterfactual image generation methods and evaluation metrics exist, a comprehensive comparison within a unified setting is lacking. We present a comparison framework to thoroughly benchmark counterfactual image generation methods. We integrate all models that have been used for the task at hand and expand them to novel datasets and causal graphs, demonstrating the superiority of Hierarchical VAEs across most datasets and metrics. Our framework is implemented in a user-friendly Python package that can be extended to incorporate additional SCMs, causal methods, generative models, and datasets for the community to build on.

Updated: 2024-06-10 14:47:46

标题: 基准对照图像生成Benchmarking Counterfactual Image Generation

摘要: 生成式人工智能已经彻底改变了视觉内容编辑，使用户能够轻松修改图像和视频。然而，并非所有编辑都是相等的。要在自然图像或医学成像等领域执行现实编辑，修改必须尊重数据生成过程中固有的因果关系。这种图像编辑属于逆事实图像生成范畴。评估逆事实图像生成具有相当复杂性：不仅缺乏可观察的基本事实，还需要遵守因果约束。尽管存在多种逆事实图像生成方法和评估指标，但缺乏在统一设置中的全面比较。我们提出了一个比较框架，全面评估逆事实图像生成方法。我们整合了所有用于当前任务的模型，并将它们扩展到新的数据集和因果图中，展示了在大多数数据集和度量标准上分层VAE的优越性。我们的框架采用用户友好的Python软件包实现，可以扩展以包含额外的SCMs、因果方法、生成模型和数据集，供社区进一步发展使用。

更新时间: 2024-06-10 14:47:46

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2403.20287v2

MedExQA: Medical Question Answering Benchmark with Multiple Explanations

This paper introduces MedExQA, a novel benchmark in medical question-answering, to evaluate large language models' (LLMs) understanding of medical knowledge through explanations. By constructing datasets across five distinct medical specialties that are underrepresented in current datasets and further incorporating multiple explanations for each question-answer pair, we address a major gap in current medical QA benchmarks which is the absence of comprehensive assessments of LLMs' ability to generate nuanced medical explanations. Our work highlights the importance of explainability in medical LLMs, proposes an effective methodology for evaluating models beyond classification accuracy, and sheds light on one specific domain, speech language pathology, where current LLMs including GPT4 lack good understanding. Our results show generation evaluation with multiple explanations aligns better with human assessment, highlighting an opportunity for a more robust automated comprehension assessment for LLMs. To diversify open-source medical LLMs (currently mostly based on Llama2), this work also proposes a new medical model, MedPhi-2, based on Phi-2 (2.7B). The model outperformed medical LLMs based on Llama2-70B in generating explanations, showing its effectiveness in the resource-constrained medical domain. We will share our benchmark datasets and the trained model.

Updated: 2024-06-10 14:47:04

标题: MedExQA：具有多个解释的医学问题回答基准

摘要: 本文介绍了MedExQA，这是一个在医学问答领域的新型基准，旨在通过解释评估大型语言模型（LLMs）对医学知识的理解。通过构建跨越五个不同的医学专业领域的数据集，这些领域在当前数据集中很少涉及，并进一步为每个问题-答案对提供多个解释，我们填补了当前医学问答基准中的一个重要空白，即缺乏对LLMs生成细致医学解释能力的全面评估。我们的工作突出了医学LLMs中解释性的重要性，提出了一种评估模型超越分类准确性的有效方法，并揭示了一个特定领域，即言语语言病理学，在这个领域，包括GPT4在内的当前LLMs缺乏良好的理解。我们的结果显示，使用多个解释进行生成评估更符合人类评估，突显了为LLMs提供更强大的自动理解评估的机会。为了使开源医学LLMs更加多样化（目前主要基于Llama2），本文还提出了一个基于Phi-2（27B）的新医学模型MedPhi-2。该模型在生成解释方面优于基于Llama2-70B的医学LLMs，显示了在资源受限制的医学领域中的有效性。我们将分享我们的基准数据集和训练模型。

更新时间: 2024-06-10 14:47:04

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.06331v1

Cooperation, Competition, and Maliciousness: LLM-Stakeholders Interactive Negotiation

There is an growing interest in using Large Language Models (LLMs) in multi-agent systems to tackle interactive real-world tasks that require effective collaboration and assessing complex situations. Yet, we still have a limited understanding of LLMs' communication and decision-making abilities in multi-agent setups. The fundamental task of negotiation spans many key features of communication, such as cooperation, competition, and manipulation potentials. Thus, we propose using scorable negotiation to evaluate LLMs. We create a testbed of complex multi-agent, multi-issue, and semantically rich negotiation games. To reach an agreement, agents must have strong arithmetic, inference, exploration, and planning capabilities while integrating them in a dynamic and multi-turn setup. We propose multiple metrics to rigorously quantify agents' performance and alignment with the assigned role. We provide procedures to create new games and increase games' difficulty to have an evolving benchmark. Importantly, we evaluate critical safety aspects such as the interaction dynamics between agents influenced by greedy and adversarial players. Our benchmark is highly challenging; GPT-3.5 and small models mostly fail, and GPT-4 and SoTA large models (e.g., Llama-3 70b) still underperform.

Updated: 2024-06-10 14:43:34

标题: 合作，竞争和恶意：LLM-利益相关者互动谈判

摘要: 越来越多的人对在多智能体系统中使用大型语言模型（LLMs）来解决需要有效协作和评估复杂情况的互动现实任务表现出兴趣。然而，我们对LLMs在多智能体设置中的沟通和决策能力仍了解有限。谈判的基本任务涵盖了许多沟通的关键特征，如合作、竞争和操纵潜力。因此，我们提出使用可评分的谈判来评估LLMs。我们创建了一个复杂的多智能体、多议题和语义丰富的谈判游戏测试平台。为了达成协议，智能体必须具备强大的算术、推理、探索和规划能力，并将它们整合在一个动态和多轮设置中。我们提出了多个指标来严格量化智能体的表现和与分配角色的一致性。我们提供了创建新游戏和增加游戏难度的程序，以便获得一个不断发展的基准。重要的是，我们评估了诸如受贪婪和对抗性玩家影响的智能体之间的相互作用动态等关键安全方面。我们的基准非常具有挑战性；大多数GPT-3.5和小型模型都表现不佳，而GPT-4和SoTA大型模型（例如Llama-3 70b）仍未达到预期表现水平。

更新时间: 2024-06-10 14:43:34

领域: cs.CL,cs.CY,cs.LG

下载: http://arxiv.org/abs/2309.17234v2

The Emergence of Reproducibility and Generalizability in Diffusion Models

In this work, we investigate an intriguing and prevalent phenomenon of diffusion models which we term as "consistent model reproducibility": given the same starting noise input and a deterministic sampler, different diffusion models often yield remarkably similar outputs. We confirm this phenomenon through comprehensive experiments, implying that different diffusion models consistently reach the same data distribution and scoring function regardless of diffusion model frameworks, model architectures, or training procedures. More strikingly, our further investigation implies that diffusion models are learning distinct distributions affected by the training data size. This is supported by the fact that the model reproducibility manifests in two distinct training regimes: (i) "memorization regime", where the diffusion model overfits to the training data distribution, and (ii) "generalization regime", where the model learns the underlying data distribution. Our study also finds that this valuable property generalizes to many variants of diffusion models, including those for conditional use, solving inverse problems, and model fine-tuning. Finally, our work raises numerous intriguing theoretical questions for future investigation and highlights practical implications regarding training efficiency, model privacy, and the controlled generation of diffusion models.

Updated: 2024-06-10 14:37:45

标题: 扩散模型中可重复性和泛化性的出现

摘要: 在这项工作中，我们调查了扩散模型中一个引人注目且普遍存在的现象，我们称之为“一致模型可复现性”：在给定相同的起始噪声输入和确定性采样器的情况下，不同的扩散模型通常会产生非常相似的输出。我们通过全面的实验证实了这一现象，意味着不同的扩散模型无论是扩散模型框架、模型架构还是训练程序，都会一致地达到相同的数据分布和评分函数。更引人注目的是，我们进一步的研究表明，扩散模型正在学习受训练数据大小影响的不同分布。这得到了支持，因为模型可复现性表现在两个不同的训练阶段：（i）“记忆阶段”，在这个阶段，扩散模型过度拟合于训练数据分布；（ii）“泛化阶段”，在这个阶段，模型学习了基础数据分布。我们的研究还发现，这一有价值的特性适用于许多扩散模型的变体，包括条件使用、解决逆问题和模型微调。最后，我们的工作提出了许多引人注目的理论问题供未来研究，并突出了关于训练效率、模型隐私和扩散模型的受控生成方面的实际影响。

更新时间: 2024-06-10 14:37:45

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2310.05264v4

Should my Blockchain Learn to Drive? A Study of Hyperledger Fabric

Similar to other transaction processing frameworks, blockchain systems need to be dynamically reconfigured to adapt to varying workloads and changes in network conditions. However, achieving optimal reconfiguration is particularly challenging due to the complexity of the blockchain stack, which has diverse configurable parameters. This paper explores the concept of self-driving blockchains, which have the potential to predict workload changes and reconfigure themselves for optimal performance without human intervention. We compare and contrast our discussions with existing research on databases and highlight aspects unique to blockchains. We identify specific parameters and components in Hyperledger Fabric, a popular permissioned blockchain system, that are suitable for autonomous adaptation and offer potential solutions for the challenges involved. Further, we implement three demonstrative locally autonomous systems, each targeting a different layer of the blockchain stack, and conduct experiments to understand the feasibility of our findings. Our experiments indicate up to 11% improvement in success throughput and a 30% decrease in latency, making this a significant step towards implementing a fully autonomous blockchain system in the future.

Updated: 2024-06-10 14:33:59

标题: 我的区块链应该学会驾驶吗？一项关于Hyperledger Fabric的研究

摘要: 与其他事务处理框架类似，区块链系统需要动态重新配置以适应不同的工作负载和网络条件的变化。然而，由于区块链堆栈的复杂性，其中具有多样化可配置参数，实现最佳重新配置尤为具有挑战性。本文探讨了自动驾驶区块链的概念，该概念有潜力预测工作负载变化，并在无需人工干预的情况下重新配置以实现最佳性能。我们将我们的讨论与现有数据库研究进行比较和对比，并突出了区块链独有的方面。我们确定了适合自主适应的特定参数和组件，这些参数和组件位于流行的许可区块链系统Hyperledger Fabric中，并提供了解决所涉挑战的潜在解决方案。此外，我们实现了三个示范性的本地自主系统，每个系统针对区块链堆栈的不同层，并进行实验以了解我们研究结果的可行性。我们的实验表明，成功吞吐量提高了最多11％，延迟降低了30％，这是未来实现完全自主区块链系统的一个重要步骤。

更新时间: 2024-06-10 14:33:59

领域: cs.DC,cs.CR

下载: http://arxiv.org/abs/2406.06318v1

Tx-LLM: A Large Language Model for Therapeutics

Developing therapeutics is a lengthy and expensive process that requires the satisfaction of many different criteria, and AI models capable of expediting the process would be invaluable. However, the majority of current AI approaches address only a narrowly defined set of tasks, often circumscribed within a particular domain. To bridge this gap, we introduce Tx-LLM, a generalist large language model (LLM) fine-tuned from PaLM-2 which encodes knowledge about diverse therapeutic modalities. Tx-LLM is trained using a collection of 709 datasets that target 66 tasks spanning various stages of the drug discovery pipeline. Using a single set of weights, Tx-LLM simultaneously processes a wide variety of chemical or biological entities(small molecules, proteins, nucleic acids, cell lines, diseases) interleaved with free-text, allowing it to predict a broad range of associated properties, achieving competitive with state-of-the-art (SOTA) performance on 43 out of 66 tasks and exceeding SOTA on 22. Among these, Tx-LLM is particularly powerful and exceeds best-in-class performance on average for tasks combining molecular SMILES representations with text such as cell line names or disease names, likely due to context learned during pretraining. We observe evidence of positive transfer between tasks with diverse drug types (e.g.,tasks involving small molecules and tasks involving proteins), and we study the impact of model size, domain finetuning, and prompting strategies on performance. We believe Tx-LLM represents an important step towards LLMs encoding biochemical knowledge and could have a future role as an end-to-end tool across the drug discovery development pipeline.

Updated: 2024-06-10 14:33:02

标题: Tx-LLM：一种用于治疗的大型语言模型

摘要: 开发治疗方法是一个漫长且昂贵的过程，需要满足许多不同的标准，而能够加速这一过程的人工智能模型将是非常宝贵的。然而，目前大多数人工智能方法只针对一个狭窄定义的一组任务，通常局限在特定领域内。为了弥合这一差距，我们引入了Tx-LLM，这是一个通用的大型语言模型(LLM)，从PaLM-2微调而来，编码了有关各种治疗模式的知识。Tx-LLM使用一组包含709个数据集的训练集进行训练，涵盖了药物发现流程的各个阶段的66个任务。使用一组权重，Tx-LLM同时处理各种化学或生物实体（小分子、蛋白质、核酸、细胞系、疾病）与自由文本交错，使其能够预测广泛的相关属性，在43个任务中取得了与最先进技术（SOTA）性能竞争力相当的成果，并在22个任务上超过了SOTA。在这些任务中，Tx-LLM特别强大，并且在结合分子SMILES表示与文本（如细胞系名称或疾病名称）的任务方面表现优异，这可能是由于在预训练期间学到的上下文。我们观察到在涉及不同药物类型的任务之间存在积极的迁移证据（例如涉及小分子的任务和涉及蛋白质的任务），并研究了模型大小、领域微调和提示策略对性能的影响。我们认为Tx-LLM是向LLM编码生化知识迈出的重要一步，未来可能在整个药物发现开发流程中扮演端到端工具的角色。

更新时间: 2024-06-10 14:33:02

领域: cs.CL,cs.AI,cs.CE,cs.LG

下载: http://arxiv.org/abs/2406.06316v1

ProAct: Progressive Training for Hybrid Clipped Activation Function to Enhance Resilience of DNNs

Deep Neural Networks (DNNs) are extensively employed in safety-critical applications where ensuring hardware reliability is a primary concern. To enhance the reliability of DNNs against hardware faults, activation restriction techniques significantly mitigate the fault effects at the DNN structure level, irrespective of accelerator architectures. State-of-the-art methods offer either neuron-wise or layer-wise clipping activation functions. They attempt to determine optimal clipping thresholds using heuristic and learning-based approaches. Layer-wise clipped activation functions cannot preserve DNNs resilience at high bit error rates. On the other hand, neuron-wise clipping activation functions introduce considerable memory overhead due to the addition of parameters, which increases their vulnerability to faults. Moreover, the heuristic-based optimization approach demands numerous fault injections during the search process, resulting in time-consuming threshold identification. On the other hand, learning-based techniques that train thresholds for entire layers concurrently often yield sub-optimal results. In this work, first, we demonstrate that it is not essential to incorporate neuron-wise activation functions throughout all layers in DNNs. Then, we propose a hybrid clipped activation function that integrates neuron-wise and layer-wise methods that apply neuron-wise clipping only in the last layer of DNNs. Additionally, to attain optimal thresholds in the clipping activation function, we introduce ProAct, a progressive training methodology. This approach iteratively trains the thresholds on a layer-by-layer basis, aiming to obtain optimal threshold values in each layer separately.

Updated: 2024-06-10 14:31:38

标题: ProAct：用于增强DNN韧性的渐进式混合剪切激活函数训练

摘要: 深度神经网络（DNNs）被广泛应用于安全关键应用中，确保硬件可靠性是首要关注的问题。为了提高DNNs对硬件故障的可靠性，激活限制技术显著减轻了在DNN结构级别上的故障影响，无论加速器架构如何。最先进的方法提供了神经元级或层级裁剪激活函数。它们尝试使用启发式和基于学习的方法确定最佳裁剪阈值。层级裁剪激活函数无法在高误比特率下保持DNNs的弹性。另一方面，神经元级裁剪激活函数由于参数的增加而引入了相当大的内存开销，从而增加了其对故障的脆弱性。此外，基于启发式的优化方法在搜索过程中要求进行大量的故障注入，导致耗时的阈值识别。另一方面，基于学习的技术会同时为整个层训练阈值，通常会产生次优结果。在这项工作中，首先我们证明了在DNNs的所有层中都不必必须使用神经元级激活函数。然后，我们提出了一种混合裁剪激活函数，将神经元级和层级方法相结合，仅在DNNs的最后一层中应用神经元级裁剪。此外，为了在裁剪激活函数中获得最佳阈值，我们引入了ProAct，一种渐进训练方法。该方法逐层迭代地训练阈值，旨在分别获得每层的最佳阈值。

更新时间: 2024-06-10 14:31:38

领域: cs.LG

下载: http://arxiv.org/abs/2406.06313v1

U-TELL: Unsupervised Task Expert Lifelong Learning

Continual learning (CL) models are designed to learn new tasks arriving sequentially without re-training the network. However, real-world ML applications have very limited label information and these models suffer from catastrophic forgetting. To address these issues, we propose an unsupervised CL model with task experts called Unsupervised Task Expert Lifelong Learning (U-TELL) to continually learn the data arriving in a sequence addressing catastrophic forgetting. During training of U-TELL, we introduce a new expert on arrival of a new task. Our proposed architecture has task experts, a structured data generator and a task assigner. Each task expert is composed of 3 blocks; i) a variational autoencoder to capture the task distribution and perform data abstraction, ii) a k-means clustering module, and iii) a structure extractor to preserve latent task data signature. During testing, task assigner selects a suitable expert to perform clustering. U-TELL does not store or replay task samples, instead, we use generated structured samples to train the task assigner. We compared U-TELL with five SOTA unsupervised CL methods. U-TELL outperformed all baselines on seven benchmarks and one industry dataset for various CL scenarios with a training time over 6 times faster than the best performing baseline.

Updated: 2024-06-10 14:30:19

标题: U-TELL：无监督任务专家终身学习

摘要: 持续学习（CL）模型旨在学习连续到达的新任务，而无需重新训练网络。然而，现实世界中的机器学习应用程序具有非常有限的标签信息，并且这些模型容易出现灾难性遗忘。为了解决这些问题，我们提出了一种无监督CL模型，称为无监督任务专家终身学习（U-TELL），以持续学习到按顺序到达的数据，解决灾难性遗忘问题。在U-TELL的训练过程中，我们在新任务到达时引入一个新的专家。我们提出的架构包括任务专家、结构化数据生成器和任务分配器。每个任务专家由3个模块组成：i）变分自动编码器，用于捕获任务分布并执行数据抽象，ii）k均值聚类模块，和iii）结构提取器，用于保留潜在任务数据特征。在测试期间，任务分配器选择一个合适的专家来执行聚类。U-TELL不存储或重播任务样本，而是使用生成的结构化样本来训练任务分配器。我们将U-TELL与五种最先进的无监督CL方法进行了比较。在七个基准测试和一个行业数据集上，U-TELL在各种CL场景中表现优于所有基线，并且训练时间比最佳表现基线快6倍以上。

更新时间: 2024-06-10 14:30:19

领域: cs.LG

下载: http://arxiv.org/abs/2405.14623v2

Scaling ResNets in the Large-depth Regime

Deep ResNets are recognized for achieving state-of-the-art results in complex machine learning tasks. However, the remarkable performance of these architectures relies on a training procedure that needs to be carefully crafted to avoid vanishing or exploding gradients, particularly as the depth $L$ increases. No consensus has been reached on how to mitigate this issue, although a widely discussed strategy consists in scaling the output of each layer by a factor $\alpha_L$. We show in a probabilistic setting that with standard i.i.d.~initializations, the only non-trivial dynamics is for $\alpha_L = \frac{1}{\sqrt{L}}$; other choices lead either to explosion or to identity mapping. This scaling factor corresponds in the continuous-time limit to a neural stochastic differential equation, contrarily to a widespread interpretation that deep ResNets are discretizations of neural ordinary differential equations. By contrast, in the latter regime, stability is obtained with specific correlated initializations and $\alpha_L = \frac{1}{L}$. Our analysis suggests a strong interplay between scaling and regularity of the weights as a function of the layer index. Finally, in a series of experiments, we exhibit a continuous range of regimes driven by these two parameters, which jointly impact performance before and after training.

Updated: 2024-06-10 14:28:26

标题: 在大深度范围内扩展ResNets

摘要: 深度ResNets以在复杂机器学习任务中取得最先进的结果而闻名。然而，这些架构的卓越性能依赖于一个需要精心设计的训练过程，以避免梯度消失或爆炸，特别是当深度$L$增加时。对于如何缓解这个问题尚未达成共识，尽管一个被广泛讨论的策略是通过将每一层的输出按因子$\alpha_L$进行缩放。我们在一个概率设置中展示，通过标准的独立同分布初始化，唯一的非平凡动态是对于$\alpha_L = \frac{1}{\sqrt{L}}$；其他选择会导致爆炸或恒等映射。这个缩放因子在连续时间极限下对应于神经随机微分方程，与一个普遍的解释相反，即深度ResNets是神经常微分方程的离散化。相比之下，在后者的情况下，通过特定相关初始化和$\alpha_L = \frac{1}{L}$可以获得稳定性。我们的分析表明，作为层索引的函数，权重的缩放和正则性之间存在强烈的相互作用。最后，在一系列实验中，我们展示了由这两个参数驱动的一系列连续的制度，这两个参数共同影响训练前后的性能。

更新时间: 2024-06-10 14:28:26

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2206.06929v2

An interpretable imbalanced semi-supervised deep learning framework for improving differential diagnosis of skin diseases

Dermatological diseases are among the most common disorders worldwide. This paper presents the first study of the interpretability and imbalanced semi-supervised learning of the multiclass intelligent skin diagnosis framework (ISDL) using 58,457 skin images with 10,857 unlabeled samples. Pseudo-labelled samples from minority classes have a higher probability at each iteration of class-rebalancing self-training, thereby promoting the utilization of unlabeled samples to solve the class imbalance problem. Our ISDL achieved a promising performance with an accuracy of 0.979, sensitivity of 0.975, specificity of 0.973, macro-F1 score of 0.974 and area under the receiver operating characteristic curve (AUC) of 0.999 for multi-label skin disease classification. The Shapley Additive explanation (SHAP) method is combined with our ISDL to explain how the deep learning model makes predictions. This finding is consistent with the clinical diagnosis. We also proposed a sampling distribution optimisation strategy to select pseudo-labelled samples in a more effective manner using ISDLplus. Furthermore, it has the potential to relieve the pressure placed on professional doctors, as well as help with practical issues associated with a shortage of such doctors in rural areas.

Updated: 2024-06-10 14:28:18

标题: 一个可解释的不平衡半监督深度学习框架，用于改善皮肤疾病的不同诊断

摘要: 皮肤病是全球最常见的疾病之一。本文介绍了对多类智能皮肤诊断框架（ISDL）的可解释性和不平衡半监督学习进行的首个研究，使用了58,457张皮肤图像和10,857个未标记样本。在每次迭代的类重新平衡自我训练中，来自少数类别的伪标记样本具有更高的概率，从而促进了未标记样本的利用，以解决类别不平衡问题。我们的ISDL在多标签皮肤疾病分类中表现出色，准确率为0.979，敏感性为0.975，特异性为0.973，宏F1分数为0.974，接收器操作特征曲线下面积（AUC）为0.999。Shapley Additive解释（SHAP）方法与我们的ISDL结合使用，解释深度学习模型如何进行预测。这一发现与临床诊断结果一致。我们还提出了一个采样分布优化策略，使用ISDLplus更有效地选择伪标记样本。此外，该方法有潜力减轻专业医生的压力，并帮助解决农村地区医生短缺的实际问题。

更新时间: 2024-06-10 14:28:18

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2211.10858v3

Is Value Functions Estimation with Classification Plug-and-play for Offline Reinforcement Learning?

In deep Reinforcement Learning (RL), value functions are typically approximated using deep neural networks and trained via mean squared error regression objectives to fit the true value functions. Recent research has proposed an alternative approach, utilizing the cross-entropy classification objective, which has demonstrated improved performance and scalability of RL algorithms. However, existing study have not extensively benchmarked the effects of this replacement across various domains, as the primary objective was to demonstrate the efficacy of the concept across a broad spectrum of tasks, without delving into in-depth analysis. Our work seeks to empirically investigate the impact of such a replacement in an offline RL setup and analyze the effects of different aspects on performance. Through large-scale experiments conducted across a diverse range of tasks using different algorithms, we aim to gain deeper insights into the implications of this approach. Our results reveal that incorporating this change can lead to superior performance over state-of-the-art solutions for some algorithms in certain tasks, while maintaining comparable performance levels in other tasks, however for other algorithms this modification might lead to the dramatic performance drop. This findings are crucial for further application of classification approach in research and practical tasks.

Updated: 2024-06-10 14:25:11

标题: 使用分类插件进行值函数估计是否适用于离线强化学习？

摘要: 在深度强化学习（RL）中，通常使用深度神经网络来逼近值函数，并通过均方误差回归目标进行训练以拟合真实的值函数。最近的研究提出了一种替代方法，利用交叉熵分类目标，这种方法已经证明了RL算法的性能和可扩展性得到了改进。然而，现有研究并没有广泛地对这种替代方案在各个领域的影响进行基准测试，因为主要目标是展示这一概念在广泛任务范围内的有效性，而不是进行深入分析。我们的工作旨在在离线RL设置中通过经验研究这种替代方案的影响，并分析不同方面对性能的影响。通过在不同算法下进行各种任务的大规模实验，我们旨在深入了解这种方法的影响。我们的结果表明，引入这种改变可以在某些任务中比某些算法的最新解决方案表现出更好的性能，同时在其他任务中保持相当的性能水平，然而对于其他算法，这种修改可能会导致性能急剧下降。这些发现对于进一步将分类方法应用于研究和实际任务至关重要。

更新时间: 2024-06-10 14:25:11

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.06309v1

Building Continuous Quantum-Classical Bayesian Neural Networks for a Classical Clinical Dataset

In this work, we are introducing a Quantum-Classical Bayesian Neural Network (QCBNN) that is capable to perform uncertainty-aware classification of classical medical dataset. This model is a symbiosis of a classical Convolutional NN that performs ultra-sound image processing and a quantum circuit that generates its stochastic weights, within a Bayesian learning framework. To test the utility of this idea for the possible future deployment in the medical sector we track multiple behavioral metrics that capture both predictive performance as well as model's uncertainty. It is our ambition to create a hybrid model that is capable to classify samples in a more uncertainty aware fashion, which will advance the trustworthiness of these models and thus bring us step closer to utilizing them in the industry. We test multiple setups for quantum circuit for this task, and our best architectures display bigger uncertainty gap between correctly and incorrectly identified samples than its classical benchmark at an expense of a slight drop in predictive performance. The innovation of this paper is two-fold: (1) combining of different approaches that allow the stochastic weights from the quantum circuit to be continues thus allowing the model to classify application-driven dataset; (2) studying architectural features of quantum circuit that make-or-break these models, which pave the way into further investigation of more informed architectural designs.

Updated: 2024-06-10 14:23:25

标题: 构建连续的量子经典贝叶斯神经网络用于经典临床数据集

摘要: 在这项工作中，我们介绍了一种量子-经典贝叶斯神经网络（QCBNN），能够对经典医疗数据集进行具有不确定性意识的分类。该模型是经典卷积神经网络和生成其随机权重的量子电路在贝叶斯学习框架内的共生体。为了测试这个想法在医疗领域可能未来部署的实用性，我们追踪多个行为指标，既捕捉了预测性能，也捕捉了模型的不确定性。我们的野心是创建一个能够以更加不确定性意识方式分类样本的混合模型，这将提高这些模型的可信度，从而使我们更接近将它们应用于工业中。我们为这项任务测试了多个量子电路设置，我们最好的架构显示出正确和错误识别样本之间的不确定性差距比其经典基准稍微降低了预测性能的代价更大。本文的创新有两个方面：（1）结合不同方法，使量子电路的随机权重能够连续，从而使模型能够对应用驱动的数据集进行分类；（2）研究量子电路的架构特征，这些特征对这些模型至关重要，为进一步研究更具见解的架构设计铺平了道路。

更新时间: 2024-06-10 14:23:25

领域: quant-ph,cs.LG

下载: http://arxiv.org/abs/2406.06307v1

Self-Correcting Self-Consuming Loops for Generative Model Training

As synthetic data becomes higher quality and proliferates on the internet, machine learning models are increasingly trained on a mix of human- and machine-generated data. Despite the successful stories of using synthetic data for representation learning, using synthetic data for generative model training creates "self-consuming loops" which may lead to training instability or even collapse, unless certain conditions are met. Our paper aims to stabilize self-consuming generative model training. Our theoretical results demonstrate that by introducing an idealized correction function, which maps a data point to be more likely under the true data distribution, self-consuming loops can be made exponentially more stable. We then propose self-correction functions, which rely on expert knowledge (e.g. the laws of physics programmed in a simulator), and aim to approximate the idealized corrector automatically and at scale. We empirically validate the effectiveness of self-correcting self-consuming loops on the challenging human motion synthesis task, and observe that it successfully avoids model collapse, even when the ratio of synthetic data to real data is as high as 100%.

Updated: 2024-06-10 14:22:45

标题: 自我矫正自我消耗循环用于生成模型训练

摘要: 随着合成数据质量的提高和在互联网上的普及，机器学习模型越来越多地在人类和机器生成的数据混合上进行训练。尽管使用合成数据进行表示学习取得了成功，但使用合成数据进行生成模型训练会产生“自我消耗循环”，可能导致训练不稳定甚至崩溃，除非满足某些条件。本文旨在稳定自我消耗生成模型的训练。我们的理论结果表明，通过引入一个理想的校正函数，将数据点映射为更有可能符合真实数据分布，自我消耗循环可以变得指数级更稳定。然后，我们提出自校正函数，依赖于专业知识（例如在模拟器中编程的物理定律），旨在自动且规模化地逼近理想的校正器。我们通过在具有挑战性的人体运动合成任务上对自校正自我消耗循环的有效性进行实证验证，并观察到即使合成数据与真实数据比例高达100%，它仍成功避免了模型崩溃。

更新时间: 2024-06-10 14:22:45

领域: cs.LG,cs.AI,cs.CV,stat.ML

下载: http://arxiv.org/abs/2402.07087v3

Unveiling the Safety of GPT-4o: An Empirical Study using Jailbreak Attacks

The recent release of GPT-4o has garnered widespread attention due to its powerful general capabilities. While its impressive performance is widely acknowledged, its safety aspects have not been sufficiently explored. Given the potential societal impact of risky content generated by advanced generative AI such as GPT-4o, it is crucial to rigorously evaluate its safety. In response to this question, this paper for the first time conducts a rigorous evaluation of GPT-4o against jailbreak attacks. Specifically, this paper adopts a series of multi-modal and uni-modal jailbreak attacks on 4 commonly used benchmarks encompassing three modalities (\ie, text, speech, and image), which involves the optimization of over 4,000 initial text queries and the analysis and statistical evaluation of nearly 8,000+ response on GPT-4o. Our extensive experiments reveal several novel observations: (1) In contrast to the previous version (such as GPT-4V), GPT-4o has enhanced safety in the context of text modality jailbreak; (2) The newly introduced audio modality opens up new attack vectors for jailbreak attacks on GPT-4o; (3) Existing black-box multimodal jailbreak attack methods are largely ineffective against GPT-4o and GPT-4V. These findings provide critical insights into the safety implications of GPT-4o and underscore the need for robust alignment guardrails in large models. Our code is available at \url{https://github.com/NY1024/Jailbreak_GPT4o}.

Updated: 2024-06-10 14:18:56

标题: 揭示GPT-4o的安全性：使用越狱攻击的实证研究

摘要: 最近发布的GPT-4o引起了广泛关注，因为它具有强大的通用能力。虽然其令人印象深刻的性能被广泛认可，但其安全性方面尚未得到充分探讨。鉴于像GPT-4o这样的先进生成式AI可能生成具有风险的内容对社会造成潜在影响，严格评估其安全性是至关重要的。为了回应这个问题，本文首次对GPT-4o进行了针对越狱攻击的严格评估。具体来说，本文采用了一系列多模态和单模态越狱攻击，涵盖了三种模态（即文本、语音和图像）的4个常用基准，涉及优化超过4,000个初始文本查询，以及对GPT-4o近8,000多个响应进行的分析和统计评估。我们广泛的实验揭示了几个新的发现：（1）与之前的版本（如GPT-4V）相比，GPT-4o在文本模态越狱的情况下具有增强的安全性；（2）新引入的音频模态为对GPT-4o的越狱攻击开辟了新的攻击向量；（3）现有的黑盒多模态越狱攻击方法在GPT-4o和GPT-4V上大多无效。这些发现为GPT-4o的安全影响提供了关键见解，并强调了大型模型中强大的对齐保护栏的必要性。我们的代码可在\url{https://github.com/NY1024/Jailbreak_GPT4o}上找到。

更新时间: 2024-06-10 14:18:56

领域: cs.CR,cs.CV

下载: http://arxiv.org/abs/2406.06302v1

Geometric sparsification in recurrent neural networks

A common technique for ameliorating the computational costs of running large neural models is sparsification, or the removal of neural connections during training. Sparse models are capable of maintaining the high accuracy of state of the art models, while functioning at the cost of more parsimonious models. The structures which underlie sparse architectures are, however, poorly understood and not consistent between differently trained models and sparsification schemes. In this paper, we propose a new technique for sparsification of recurrent neural nets (RNNs), called moduli regularization, in combination with magnitude pruning. Moduli regularization leverages the dynamical system induced by the recurrent structure to induce a geometric relationship between neurons in the hidden state of the RNN. By making our regularizing term explicitly geometric, we provide the first, to our knowledge, a priori description of the desired sparse architecture of our neural net. We verify the effectiveness of our scheme for navigation and natural language processing RNNs. Navigation is a structurally geometric task, for which there are known moduli spaces, and we show that regularization can be used to reach 90% sparsity while maintaining model performance only when coefficients are chosen in accordance with a suitable moduli space. Natural language processing, however, has no known moduli space in which computations are performed. Nevertheless, we show that moduli regularization induces more stable recurrent neural nets with a variety of moduli regularizers, and achieves high fidelity models at 98% sparsity.

Updated: 2024-06-10 14:12:33

标题: 循环神经网络中的几何稀疏化

摘要: 一种减少运行大型神经模型计算成本的常用技术是稀疏化，即在训练过程中删除神经连接。稀疏模型能够保持尖端模型的高准确性，同时以更简洁模型的成本运行。然而，稀疏架构的结构目前仍不为人们所理解，并且在不同训练模型和稀疏化方案之间不一致。本文提出了一种新的递归神经网络（RNNs）稀疏化技术，称为模量正则化，结合幅度修剪。模量正则化利用递归结构诱导的动力系统，在RNN的隐藏状态中诱导神经元之间的几何关系。通过明确地使我们的正则化项具有几何性质，我们首次提供了对我们神经网络所需的稀疏架构的先验描述。我们验证了我们的方案在导航和自然语言处理RNNs中的有效性。导航是一个结构上几何化的任务，已知存在模量空间，我们表明正则化可以在系数选择符合适当的模量空间时实现90%的稀疏度并保持模型性能。然而，自然语言处理中没有已知的进行计算的模量空间。尽管如此，我们展示了模量正则化通过多种模量正则器诱导出更稳定的递归神经网络，并在98%的稀疏度下实现高保真度模型。

更新时间: 2024-06-10 14:12:33

领域: cs.LG

下载: http://arxiv.org/abs/2406.06290v1

VS-PINN: A Fast and efficient training of physics-informed neural networks using variable-scaling methods for solving PDEs with stiff behavior

Physics-informed neural networks (PINNs) have recently emerged as a promising way to compute the solutions of partial differential equations (PDEs) using deep neural networks. However, despite their significant success in various fields, it remains unclear in many aspects how to effectively train PINNs if the solutions of PDEs exhibit stiff behaviors or high frequencies. In this paper, we propose a new method for training PINNs using variable-scaling techniques. This method is simple and it can be applied to a wide range of problems including PDEs with rapidly-varying solutions. Throughout various numerical experiments, we will demonstrate the effectiveness of the proposed method for these problems and confirm that it can significantly improve the training efficiency and performance of PINNs. Furthermore, based on the analysis of the neural tangent kernel (NTK), we will provide theoretical evidence for this phenomenon and show that our methods can indeed improve the performance of PINNs.

Updated: 2024-06-10 14:11:15

标题: VS-PINN：使用可变缩放方法快速高效地训练物理启发神经网络，用于解决具有僵硬行为的偏微分方程

摘要: 物理学知识的神经网络（PINNs）最近已成为使用深度神经网络计算偏微分方程（PDEs）解的一种有前途的方法。然而，尽管它们在各个领域取得了显著成功，但在许多方面仍不清楚如何有效地训练PINNs，特别是当PDEs的解表现出僵硬行为或高频率时。在本文中，我们提出了一种使用可变缩放技术训练PINNs的新方法。这种方法简单且可以应用于包括解快速变化的PDEs在内的广泛问题。通过各种数值实验，我们将展示所提出的方法对这些问题的有效性，并确认它可以显著提高PINNs的训练效率和性能。此外，基于神经切向核（NTK）的分析，我们将为这一现象提供理论证据，并展示我们的方法确实可以改善PINNs的性能。

更新时间: 2024-06-10 14:11:15

领域: math.NA,cs.LG,cs.NA

下载: http://arxiv.org/abs/2406.06287v1

Outlier detection by ensembling uncertainty with negative objectness

Outlier detection is an essential capability in safety-critical applications of supervised visual recognition. Most of the existing methods deliver best results by encouraging standard closed-set models to produce low-confidence predictions in negative training data. However, that approach conflates prediction uncertainty with recognition of the negative class. We therefore reconsider direct prediction of K+1 logits that correspond to K groundtruth classes and one outlier class. This setup allows us to formulate a novel anomaly score as an ensemble of in-distribution uncertainty and the posterior of the outlier class which we term negative objectness. Now outliers can be independently detected due to i) high prediction uncertainty or ii) similarity with negative data. We embed our method into a dense prediction architecture with mask-level recognition over K+2 classes. The training procedure encourages the novel K+2-th class to learn negative objectness at pasted negative instances. Our models outperform the current state-of-the art on standard benchmarks for image-wide and pixel-level outlier detection with and without training on real negative data.

Updated: 2024-06-10 14:10:38

标题: Outlier detection by ensembling uncertainty with negative objectness 利用负对象性与不确定性集成的异常值检测

摘要: 异常检测是监督式视觉识别中安全关键应用的一个关键能力。大多数现有方法通过鼓励标准的闭集模型在负训练数据中产生低置信度预测来实现最佳结果。然而，这种方法将预测不确定性与负类别的识别混淆在一起。因此，我们重新考虑直接预测与K个地面真实类别和一个异常类别相对应的K+1个logits。这种设置使我们能够制定一个新颖的异常分数，作为内分布不确定性和异常类别的后验的集成，我们称之为负对象性。现在，由于i）高预测不确定性或ii）与负数据的相似性，异常值可以被独立检测出来。我们将我们的方法嵌入到一个密集预测架构中，该架构通过K+2个类别上的掩膜级识别。训练过程鼓励新的K+2-th类在负实例中学习负对象性。我们的模型在图像整体和像素级异常检测的标准基准上表现优于当前的最新技术，无论是否在真实负数据上进行训练。

更新时间: 2024-06-10 14:10:38

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2402.15374v2

Arrange, Inpaint, and Refine: Steerable Long-term Music Audio Generation and Editing via Content-based Controls

Controllable music generation plays a vital role in human-AI music co-creation. While Large Language Models (LLMs) have shown promise in generating high-quality music, their focus on autoregressive generation limits their utility in music editing tasks. To address this gap, we propose a novel approach leveraging a parameter-efficient heterogeneous adapter combined with a masking training scheme. This approach enables autoregressive language models to seamlessly address music inpainting tasks. Additionally, our method integrates frame-level content-based controls, facilitating track-conditioned music refinement and score-conditioned music arrangement. We apply this method to fine-tune MusicGen, a leading autoregressive music generation model. Our experiments demonstrate promising results across multiple music editing tasks, offering more flexible controls for future AI-driven music editing tools. The source codes and a demo page showcasing our work are available at https://kikyo-16.github.io/AIR.

Updated: 2024-06-10 14:08:17

标题: 排列、修补和精修：通过基于内容的控制实现可导向的长期音乐音频生成和编辑

摘要: 可控音乐生成在人工智能音乐共创中扮演着至关重要的角色。虽然大型语言模型（LLMs）在生成高质量音乐方面表现出了潜力，但它们对自回归生成的关注限制了它们在音乐编辑任务中的实用性。为了填补这一空白，我们提出了一种新颖的方法，利用参数高效的异质适配器结合掩码训练方案。这种方法使自回归语言模型能够无缝地处理音乐涂鸦任务。此外，我们的方法集成了基于帧级内容的控制，促进了基于轨道的音乐细化和基于乐谱的音乐编排。我们将这种方法应用于Fine-tune MusicGen，这是一个领先的自回归音乐生成模型。我们的实验展示了在多个音乐编辑任务中取得的令人期待的结果，为未来基于人工智能的音乐编辑工具提供了更灵活的控制。我们的源代码和展示我们工作的演示页面可在https://kikyo-16.github.io/AIR上找到。

更新时间: 2024-06-10 14:08:17

领域: cs.SD,cs.AI,eess.AS

下载: http://arxiv.org/abs/2402.09508v2

MedMamba: Vision Mamba for Medical Image Classification

Since the era of deep learning, convolutional neural networks (CNNs) and vision transformers (ViTs) have been extensively studied and widely used in medical image classification tasks. Unfortunately, CNN's limitations in modeling long-range dependencies result in poor classification performances. In contrast, ViTs are hampered by the quadratic computational complexity of their self-attention mechanism, making them difficult to deploy in real-world settings with limited computational resources. Recent studies have shown that state space models (SSMs) represented by Mamba can effectively model long-range dependencies while maintaining linear computational complexity. Inspired by it, we proposed MedMamba, the first vision Mamba for generalized medical image classification. Concretely, we introduced a novel hybrid basic block named SS-Conv-SSM, which integrates the convolutional layers for extracting local features with the abilities of SSM to capture long-range dependencies, aiming to model medical images from different image modalities efficiently. By employing the grouped convolution strategy and channel-shuffle operation, MedMamba successfully provides fewer model parameters and a lower computational burden for efficient applications. To demonstrate the potential of MedMamba, we conducted extensive experiments using 16 datasets containing ten imaging modalities and 411,007 images. Experimental results show that the proposed MedMamba demonstrates competitive performance in classifying various medical images compared with the state-of-the-art methods. Our work is aims to establish a new baseline for medical image classification and provide valuable insights for developing more powerful SSM-based artificial intelligence algorithms and application systems in the medical field. The source codes and all pre-trained weights of MedMamba are available at https://github.com/YubiaoYue/MedMamba.

Updated: 2024-06-10 14:06:05

标题: MedMamba：用于医学图像分类的Vision Mamba

摘要: 自深度学习时代以来，卷积神经网络（CNNs）和视觉变压器（ViTs）在医学图像分类任务中得到了广泛研究和应用。不幸的是，CNN在建模长距离依赖关系方面存在局限，导致分类性能较差。相反，ViTs受到其自注意机制二次计算复杂度的限制，使其难以在具有有限计算资源的现实世界环境中部署。最近的研究表明，由Mamba表示的状态空间模型（SSMs）可以有效地建模长距离依赖关系，同时保持线性计算复杂度。受此启发，我们提出了MedMamba，第一个用于广义医学图像分类的视觉Mamba。具体来说，我们引入了一种名为SS-Conv-SSM的新型混合基本块，该基本块将用于提取局部特征的卷积层与SSM的捕捉长距离依赖关系的能力集成在一起，旨在高效地建模来自不同图像模态的医学图像。通过采用分组卷积策略和通道混洗操作，MedMamba成功地提供了更少的模型参数和更低的计算负担，以便进行高效的应用。为了展示MedMamba的潜力，我们进行了广泛的实验，使用包含十种成像模态和411,007张图像的16个数据集。实验结果表明，与最先进的方法相比，所提出的MedMamba在分类各种医学图像方面表现出竞争性能。我们的工作旨在为医学图像分类建立新的基准，并为开发更强大的基于SSM的人工智能算法和应用系统提供有价值的见解。MedMamba的源代码和所有预训练权重可在https://github.com/YubiaoYue/MedMamba 中找到。

更新时间: 2024-06-10 14:06:05

领域: eess.IV,cs.CV,cs.LG

下载: http://arxiv.org/abs/2403.03849v4

PowerInfer-2: Fast Large Language Model Inference on a Smartphone

This paper introduces PowerInfer-2, a framework designed for high-speed inference of Large Language Models (LLMs) on smartphones, particularly effective for models whose sizes exceed the device's memory capacity. The key insight of PowerInfer-2 is to utilize the heterogeneous computation, memory, and I/O resources in smartphones by decomposing traditional matrix computations into fine-grained neuron cluster computations. Specifically, PowerInfer-2 features a polymorphic neuron engine that adapts computational strategies for various stages of LLM inference. Additionally, it introduces segmented neuron caching and fine-grained neuron-cluster-level pipelining, which effectively minimize and conceal the overhead caused by I/O operations. The implementation and evaluation of PowerInfer-2 demonstrate its capability to support a wide array of LLM models on two smartphones, achieving up to a 29.2x speed increase compared with state-of-the-art frameworks. Notably, PowerInfer-2 is the first system to serve the TurboSparse-Mixtral-47B model with a generation rate of 11.68 tokens per second on a smartphone. For models that fit entirely within the memory, PowerInfer-2 can achieve approximately a 40% reduction in memory usage while maintaining inference speeds comparable to llama.cpp and MLC-LLM. For more details, including a demonstration video, please visit the project site at www.powerinfer.ai/v2.

Updated: 2024-06-10 14:01:21

标题: PowerInfer-2：智能手机上快速大型语言模型推断

摘要: 这篇论文介绍了PowerInfer-2，这是一个专为智能手机上的大型语言模型（LLMs）进行高速推断而设计的框架，特别适用于模型尺寸超过设备内存容量的情况。PowerInfer-2的关键见解是通过将传统矩阵计算分解为细粒度神经元簇计算，利用智能手机中的异构计算、内存和I/O资源。具体来说，PowerInfer-2具有一个多态神经元引擎，可以为LLM推断的各个阶段适应计算策略。此外，它引入了分段神经元缓存和细粒度神经元簇级流水线，有效地减少和隐藏了由I/O操作引起的开销。PowerInfer-2的实施和评估展示了它支持多种LLM模型的能力，在两款智能手机上实现了与最先进框架相比高达29.2倍的速度提升。值得注意的是，PowerInfer-2是第一个在智能手机上为TurboSparse-Mixtral-47B模型提供11.68个令牌每秒的生成速率的系统。对于完全适合内存的模型，PowerInfer-2可以实现大约40%的内存使用减少，同时保持与llama.cpp和MLC-LLM相当的推断速度。有关更多详细信息，包括演示视频，请访问项目网站 www.powerinfer.ai/v2。

更新时间: 2024-06-10 14:01:21

领域: cs.LG

下载: http://arxiv.org/abs/2406.06282v1

Dataset Condensation for Time Series Classification via Dual Domain Matching

Time series data has been demonstrated to be crucial in various research fields. The management of large quantities of time series data presents challenges in terms of deep learning tasks, particularly for training a deep neural network. Recently, a technique named \textit{Dataset Condensation} has emerged as a solution to this problem. This technique generates a smaller synthetic dataset that has comparable performance to the full real dataset in downstream tasks such as classification. However, previous methods are primarily designed for image and graph datasets, and directly adapting them to the time series dataset leads to suboptimal performance due to their inability to effectively leverage the rich information inherent in time series data, particularly in the frequency domain. In this paper, we propose a novel framework named Dataset \textit{\textbf{Cond}}ensation for \textit{\textbf{T}}ime \textit{\textbf{S}}eries \textit{\textbf{C}}lassification via Dual Domain Matching (\textbf{CondTSC}) which focuses on the time series classification dataset condensation task. Different from previous methods, our proposed framework aims to generate a condensed dataset that matches the surrogate objectives in both the time and frequency domains. Specifically, CondTSC incorporates multi-view data augmentation, dual domain training, and dual surrogate objectives to enhance the dataset condensation process in the time and frequency domains. Through extensive experiments, we demonstrate the effectiveness of our proposed framework, which outperforms other baselines and learns a condensed synthetic dataset that exhibits desirable characteristics such as conforming to the distribution of the original data.

Updated: 2024-06-10 13:55:22

标题: 通过双域匹配对时间序列分类进行数据集压缩

摘要: 时间序列数据已被证明在各种研究领域中至关重要。大量时间序列数据的管理在深度学习任务方面提出了挑战，特别是训练深度神经网络。最近，一种名为“数据集凝缩”的技术已经出现作为解决这个问题的方法。这种技术生成一个较小的合成数据集，在下游任务如分类中表现出与完整真实数据集相当的性能。然而，先前的方法主要设计用于图像和图形数据集，直接将它们适应到时间序列数据集会导致次优性能，因为它们无法有效利用时间序列数据中固有的丰富信息，特别是在频域中。在本文中，我们提出了一个名为“Condensation for Time Series Classification via Dual Domain Matching (CondTSC)”的新框架，专注于时间序列分类数据集凝缩任务。与先前的方法不同，我们提出的框架旨在生成一个与时间和频率域中的代理目标匹配的凝缩数据集。具体而言，CondTSC结合了多视图数据增强、双域训练和双代理目标，以增强在时间和频率域中的数据集凝缩过程。通过大量实验，我们展示了我们提出的框架的有效性，它优于其他基线，并学习到一个具有理想特性的凝缩合成数据集，如符合原始数据的分布。

更新时间: 2024-06-10 13:55:22

领域: cs.LG

下载: http://arxiv.org/abs/2403.07245v3

FreeVA: Offline MLLM as Training-Free Video Assistant

This paper undertakes an empirical study to revisit the latest advancements in Multimodal Large Language Models (MLLMs): Video Assistant. This study, namely FreeVA, aims to extend existing image-based MLLM to the video domain in a training-free manner. The study provides an essential, yet must-know baseline, and reveals several surprising findings: 1) FreeVA, leveraging only offline image-based MLLM without additional training, excels in zero-shot video question-answering (e.g., MSVD-QA, ActivityNet-QA, and MSRVTT-QA), even surpassing state-of-the-art methods that involve video instruction tuning. 2) While mainstream video-based MLLMs typically initialize with an image-based MLLM (e.g., LLaVA) and then fine-tune using video instruction tuning, the study indicates that utilizing the widely adopted VideoInstruct-100K for video instruction tuning doesn't actually lead to better performance compared to not training at all. 3) The commonly used evaluation metrics in existing works are significantly influenced by changes in the GPT API version over time. If ignored, this could affect the fairness and uniformity of comparisons between different methods and impact the analysis and judgment of researchers in the field. The advancement of MLLMs is currently thriving, drawing numerous researchers into the field. We aim for this work to serve as a plug-and-play, simple yet effective baseline, encouraging the direct evaluation of existing MLLMs in video domain while also standardizing the field of video conversational models to a certain extent. Also, we encourage researchers to reconsider: Have current video MLLM methods truly acquired knowledge beyond image MLLM? Code is available at https://github.com/whwu95/FreeVA

Updated: 2024-06-10 13:55:21

标题: FreeVA：离线MLLM作为无需训练的视频助手

摘要: 本文进行了一项实证研究，重新审视了多模态大语言模型（MLLMs）的最新进展：视频助手。这项研究，即FreeVA，旨在以无需训练的方式将现有的基于图像的MLLM扩展到视频领域。该研究提供了一个必不可少但必须了解的基线，并揭示了一些令人惊讶的发现：1）FreeVA仅利用离线图像MLLM在没有额外训练的情况下在零样本视频问答（例如MSVD-QA、ActivityNet-QA和MSRVTT-QA）方面表现出色，甚至超过了涉及视频指导微调的最先进方法。2）尽管主流的基于视频的MLLM通常使用基于图像的MLLM（例如LLaVA）进行初始化，然后通过视频指导微调进行微调，但该研究表明，利用广泛采用的VideoInstruct-100K进行视频指导微调实际上并没有比根本不进行训练更好的性能。3）现有作品中常用的评估指标受GPT API版本随时间变化的影响显著。如果忽视这一点，可能会影响不同方法之间的公平性和一致性比较，并影响该领域研究人员的分析和判断。MLLM的进展目前正在蓬勃发展，吸引了众多研究人员进入这一领域。我们希望这项工作能够作为一个即插即用、简单而有效的基线，鼓励直接评估视频领域现有的MLLM，并在一定程度上规范视频会话模型领域。此外，我们鼓励研究人员重新考虑：当前的视频MLLM方法是否真正超越了图像MLLM？代码可在https://github.com/whwu95/FreeVA上找到。

更新时间: 2024-06-10 13:55:21

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2405.07798v2

Multifidelity digital twin for real-time monitoring of structural dynamics in aquaculture net cages

As the global population grows and climate change intensifies, sustainable food production is critical. Marine aquaculture offers a viable solution, providing a sustainable protein source. However, the industry's expansion requires novel technologies for remote management and autonomous operations. Digital twin technology can advance the aquaculture industry, but its adoption has been limited. Fish net cages, which are flexible floating structures, are critical yet vulnerable components of aquaculture farms. Exposed to harsh and dynamic marine environments, the cages experience significant loads and risk damage, leading to fish escapes, environmental impacts, and financial losses. We propose a multifidelity surrogate modeling framework for integration into a digital twin for real-time monitoring of aquaculture net cage structural dynamics under stochastic marine conditions. Central to this framework is the nonlinear autoregressive Gaussian process method, which learns complex, nonlinear cross-correlations between models of varying fidelity. It combines low-fidelity simulation data with a small set of high-fidelity field sensor measurements, which offer the real dynamics but are costly and spatially sparse. Validated at the SINTEF ACE fish farm in Norway, our digital twin receives online metocean data and accurately predicts net cage displacements and mooring line loads, aligning closely with field measurements. The proposed framework is beneficial where application-specific data are scarce, offering rapid predictions and real-time system representation. The developed digital twin prevents potential damages by assessing structural integrity and facilitates remote operations with unmanned underwater vehicles. Our work also compares GP and GCNs for predicting net cage deformation, highlighting the latter's effectiveness in complex structural applications.

Updated: 2024-06-10 13:52:32

标题: 多信度数字孪生体用于水产养殖网箱结构动力学的实时监测

摘要: 随着全球人口增长和气候变化加剧，可持续食品生产至关重要。海洋水产养殖提供了一个可行的解决方案，提供了可持续的蛋白质来源。然而，该行业的扩张需要远程管理和自主运营的新技术。数字孪生技术可以推动水产养殖业的发展，但其应用受到限制。鱼网笼是水产养殖场关键但易受损的组成部分，是柔性浮动结构。暴露在恶劣和动态的海洋环境中，笼子承受重大载荷并面临损坏风险，导致鱼类逃逸、环境影响和财务损失。我们提出了一个多保真度代理建模框架，用于集成到数字孪生系统中，实时监测随机海洋条件下水产养殖网笼的结构动态。该框架的核心是非线性自回归高斯过程方法，可以学习不同保真度模型之间的复杂、非线性交叉相关性。它将低保真度模拟数据与少量高保真度现场传感器测量数据相结合，后者提供真实的动态但成本高昂且空间稀疏。在挪威SINTEF ACE鱼场进行验证后，我们的数字孪生系统接收在线气象海洋数据，并准确预测网笼位移和锚线载荷，与现场测量结果密切匹配。所提出的框架在应用特定数据稀缺的情况下具有益处，提供快速预测和实时系统表示。开发的数字孪生系统通过评估结构完整性来防止潜在损害，并借助无人水下车辆实现远程运营。我们的工作还比较了高斯过程和图卷积网络在预测网笼变形方面的效果，突出了后者在复杂结构应用中的有效性。

更新时间: 2024-06-10 13:52:32

领域: cs.LG

下载: http://arxiv.org/abs/2406.04519v2

PRewrite: Prompt Rewriting with Reinforcement Learning

Prompt engineering is critical for the development of LLM-based applications. However, it is usually done manually in a "trial and error" fashion that can be time consuming, ineffective, and sub-optimal. Even for the prompts which seemingly work well, there is always a lingering question: can the prompts be made better with further modifications? To address these problems, we investigate automated prompt engineering in this paper. Specifically, we propose PRewrite, an automated method to rewrite an under-optimized prompt to a more effective prompt. We instantiate the prompt rewriter using a LLM. The rewriter LLM is trained using reinforcement learning to optimize the performance on a given downstream task. We conduct experiments on diverse benchmark datasets, which demonstrates the effectiveness of PRewrite.

Updated: 2024-06-10 13:46:22

标题: PRewrite：使用强化学习的提示重写

摘要: 提示工程对于基于LLM的应用程序的开发至关重要。然而，通常以“试错”的方式手动完成，这可能耗时、低效且不够优化。即使对于那些表面上运作良好的提示，仍然存在一个困扰的问题：是否可以通过进一步修改使提示更好？为了解决这些问题，我们在本文中研究了自动提示工程。具体地，我们提出了PRewrite，一种自动方法，将一个未经优化的提示重写为更有效的提示。我们使用LLM实例化提示重写器。重写器LLM使用强化学习进行训练，以优化在给定下游任务上的性能。我们在多样化的基准数据集上进行实验，展示了PRewrite的有效性。

更新时间: 2024-06-10 13:46:22

领域: cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2401.08189v4

American Sign Language Handshapes Reflect Pressures for Communicative Efficiency

Communicative efficiency is a key topic in linguistics and cognitive psychology, with many studies demonstrating how the pressure to communicate with minimal effort guides the form of natural language. However, this phenomenon is rarely explored in signed languages. This paper shows how handshapes in American Sign Language (ASL) reflect these efficiency pressures and provides new evidence of communicative efficiency in the visual-gestural modality. We focus on hand configurations in native ASL signs and signs borrowed from English to compare efficiency pressures from both ASL and English usage. First, we develop new methodologies to quantify the articulatory effort needed to produce handshapes and the perceptual effort required to recognize them. Then, we analyze correlations between communicative effort and usage statistics in ASL or English. Our findings reveal that frequent ASL handshapes are easier to produce and that pressures for communicative efficiency mostly come from ASL usage, rather than from English lexical borrowing.

Updated: 2024-06-10 13:45:36

标题: 美国手语手势形状反映了沟通效率的压力

摘要: 交际效率是语言学和认知心理学中的一个关键主题，许多研究表明，为了以最小的努力进行交流而产生的压力如何影响自然语言的形式。然而，这种现象在手语中很少被探讨。本文展示了美国手语（ASL）中手形如何反映这些效率压力，并提供了视觉手势模式中交际效率的新证据。我们重点研究母语ASL手势和从英语借用的手势的手形配置，以比较ASL和英语使用中的效率压力。首先，我们开发了新的方法来量化产生手形所需的发音努力和识别手形所需的感知努力。然后，我们分析了ASL或英语中交际努力和使用统计数据之间的相关性。我们的研究发现，频繁出现的ASL手形更容易产生，而交际效率的压力主要来自ASL的使用，而不是英语的词汇借用。

更新时间: 2024-06-10 13:45:36

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.04024v2

Modular Growth of Hierarchical Networks: Efficient, General, and Robust Curriculum Learning

Structural modularity is a pervasive feature of biological neural networks, which have been linked to several functional and computational advantages. Yet, the use of modular architectures in artificial neural networks has been relatively limited despite early successes. Here, we explore the performance and functional dynamics of a modular network trained on a memory task via an iterative growth curriculum. We find that for a given classical, non-modular recurrent neural network (RNN), an equivalent modular network will perform better across multiple metrics, including training time, generalizability, and robustness to some perturbations. We further examine how different aspects of a modular network's connectivity contribute to its computational capability. We then demonstrate that the inductive bias introduced by the modular topology is strong enough for the network to perform well even when the connectivity within modules is fixed and only the connections between modules are trained. Our findings suggest that gradual modular growth of RNNs could provide advantages for learning increasingly complex tasks on evolutionary timescales, and help build more scalable and compressible artificial networks.

Updated: 2024-06-10 13:44:07

标题: 分层网络的模块化增长：高效、通用和稳健的课程学习

摘要: 结构模块化是生物神经网络的一种普遍特征，已与多种功能和计算优势相关联。然而，虽然早期取得了成功，但人工神经网络中对模块化架构的使用相对有限。在这里，我们通过迭代增长课程探讨了一个模块化网络在通过记忆任务训练时的性能和功能动态。我们发现，对于给定的经典非模块化递归神经网络（RNN），一个等效的模块化网络在多个指标上表现更好，包括训练时间、泛化能力和对某些扰动的稳健性。我们进一步研究了模块化网络连通性的不同方面如何影响其计算能力。然后，我们证明了模块化拓扑引入的归纳偏差足够强，使得网络在模块内连通性固定且只训练模块间连接时也能表现良好。我们的发现表明，逐渐模块化增长RNNs可能在演化时间尺度上为学习越来越复杂的任务提供优势，并有助于构建更具可扩展性和可压缩性的人工网络。

更新时间: 2024-06-10 13:44:07

领域: cs.NE,cs.AI

下载: http://arxiv.org/abs/2406.06262v1

Byzantine-Robust Federated Learning: Impact of Client Subsampling and Local Updates

The possibility of adversarial (a.k.a., {\em Byzantine}) clients makes federated learning (FL) prone to arbitrary manipulation. The natural approach to robustify FL against adversarial clients is to replace the simple averaging operation at the server in the standard $\mathsf{FedAvg}$ algorithm by a \emph{robust averaging rule}. While a significant amount of work has been devoted to studying the convergence of federated {\em robust averaging} (which we denote by $\mathsf{FedRo}$), prior work has largely ignored the impact of {\em client subsampling} and {\em local steps}, two fundamental FL characteristics. While client subsampling increases the effective fraction of Byzantine clients, local steps increase the drift between the local updates computed by honest (i.e., non-Byzantine) clients. Consequently, a careless deployment of $\mathsf{FedRo}$ could yield poor performance. We validate this observation by presenting an in-depth analysis of $\mathsf{FedRo}$ tightly analyzing the impact of client subsampling and local steps. Specifically, we present a sufficient condition on client subsampling for nearly-optimal convergence of $\mathsf{FedRo}$ (for smooth non-convex loss). Also, we show that the rate of improvement in learning accuracy {\em diminishes} with respect to the number of clients subsampled, as soon as the sample size exceeds a threshold value. Interestingly, we also observe that under a careful choice of step-sizes, the learning error due to Byzantine clients decreases with the number of local steps. We validate our theory by experiments on the FEMNIST and CIFAR-$10$ image classification tasks.

Updated: 2024-06-10 13:43:21

标题: 拜占庭-稳健的联邦学习：客户子采样和本地更新的影响

摘要: 可能存在对抗性（又名拜占庭）客户，这使得联邦学习（FL）容易受到任意操纵。使FL对抗性客户具有鲁棒性的自然方法是通过在标准$\mathsf{FedAvg}$算法中服务器上的简单平均操作替换为\emph{鲁棒平均规则}。虽然已经有大量工作致力于研究联邦{\em 鲁棒平均}（我们将其表示为$\mathsf{FedRo}$）的收敛性，但先前的工作主要忽略了{\em 客户子采样}和{\em 本地步骤}这两个基本的FL特征的影响。虽然客户子采样增加了拜占庭客户的有效比例，本地步骤增加了通过诚实（即非拜占庭）客户计算的本地更新之间的漂移。因此，对$\mathsf{FedRo}$的粗心部署可能导致性能不佳。我们通过对$\mathsf{FedRo}$进行深入分析来验证这一观察，紧密分析客户子采样和本地步骤的影响。具体而言，我们提出了对客户子采样的充分条件，以实现$\mathsf{FedRo}$的近乎最优收敛性（对于光滑非凸损失）。此外，我们还展示了学习精度的改善速度与子采样客户数量呈{\em 减小}关系，一旦样本量超过阈值。有趣的是，我们还观察到，在谨慎选择步长的情况下，由于拜占庭客户造成的学习错误会随着本地步骤的增加而减少。我们通过在FEMNIST和CIFAR-$10$图像分类任务上进行实验来验证我们的理论。

更新时间: 2024-06-10 13:43:21

领域: cs.LG

下载: http://arxiv.org/abs/2402.12780v2

What All the PHUZZ Is About: A Coverage-guided Fuzzer for Finding Vulnerabilities in PHP Web Applications

Coverage-guided fuzz testing has received significant attention from the research community, with a strong focus on binary applications, greatly disregarding other targets, such as web applications. The importance of the World Wide Web in everyone's life cannot be overstated, and to this day, many web applications are developed in PHP. In this work, we address the challenges of applying coverage-guided fuzzing to PHP web applications and introduce PHUZZ, a modular fuzzing framework for PHP web applications. PHUZZ uses novel approaches to detect more client-side and server-side vulnerability classes than state-of-the-art related work, including SQL injections, remote command injections, insecure deserialization, path traversal, external entity injection, cross-site scripting, and open redirection. We evaluate PHUZZ on a diverse set of artificial and real-world web applications with known and unknown vulnerabilities, and compare it against a variety of state-of-the-art fuzzers. In order to show PHUZZ' effectiveness, we fuzz over 1,000 API endpoints of the 115 most popular WordPress plugins, resulting in over 20 security issues and 2 new CVE-IDs. Finally, we make the framework publicly available to motivate and encourage further research on web application fuzz testing.

Updated: 2024-06-10 13:43:07

标题: 关于PHUZZ的一切：一种用于发现PHP Web应用程序中漏洞的覆盖引导模糊测试工具

摘要: 基于覆盖率的模糊测试在研究界受到了广泛关注，主要集中在二进制应用程序，大大忽视了其他目标，如Web应用程序。每个人生活中对世界各地网络的重要性不言而喁，迄今为止，许多Web应用程序是用PHP开发的。在这项工作中，我们解决了将基于覆盖率的模糊测试应用于PHP Web应用程序所面临的挑战，并介绍了PHUZZ，一个用于PHP Web应用程序的模块化模糊测试框架。PHUZZ使用新颖的方法来检测比现有相关工作更多的客户端和服务器端漏洞类别，包括SQL注入、远程命令注入、不安全的反序列化、路径遍历、外部实体注入、跨站脚本和开放重定向。我们在一组人工和真实世界的具有已知和未知漏洞的Web应用程序上评估了PHUZZ，并将其与各种最先进的模糊测试工具进行了比较。为了展示PHUZZ的有效性，我们对最受欢迎的115个WordPress插件的1000多个API端点进行了模糊测试，结果发现了超过20个安全问题和2个新的CVE-ID。最后，我们公开提供了该框架，以激励和鼓励进一步开展Web应用程序模糊测试方面的研究。

更新时间: 2024-06-10 13:43:07

领域: cs.CR

下载: http://arxiv.org/abs/2406.06261v1

Random Time-hopping Secure Ranging Strategy Against Distance-Reduction Attacks in UWB

In order to mitigate the distance reduction attack in Ultra-Wide Band (UWB) ranging, this paper proposes a secure ranging scheme based on a random time-hopping mechanism without redundant signaling overhead. Additionally, a secure ranging strategy is designed for backward compatibility with existing standards such as IEEE 802.15.4a/z, combined with an attack detection scheme. The effectiveness and feasibility of the proposed strategy are demonstrated through both simulation and experimental results in the case of the Ghost Peak attack, as demonstrated by Patrick Leu et al. The random time-hopping mechanism is verified to be capable of reducing the success rate of distance reduction attacks to less than 0.01%, thereby significantly enhancing the security of UWB ranging.

Updated: 2024-06-10 13:33:06

标题: 随机时间跳跃安全测距策略抵御UWB中的距离缩短攻击

摘要: 为了减轻超宽带（UWB）测距中的距离缩减攻击，本文提出了一种基于随机时间跳频机制的安全测距方案，无需冗余信令开销。此外，设计了一种安全测距策略，与现有标准（如IEEE 802.15.4a/z）向后兼容，结合攻击检测方案。通过模拟和实验结果，在Ghost Peak攻击的情况下，Patrick Leu等人证明了所提出策略的有效性和可行性。随机时间跳频机制被验证能够将距离缩减攻击的成功率降至低于0.01%，从而显著增强了UWB测距的安全性。

更新时间: 2024-06-10 13:33:06

领域: eess.SP,cs.CR,H.1.1

下载: http://arxiv.org/abs/2406.06252v1

Resonance RoPE: Improving Context Length Generalization of Large Language Models

This paper addresses the challenge of train-short-test-long (TSTL) scenarios in Large Language Models (LLMs) equipped with Rotary Position Embedding (RoPE), where models pre-trained on shorter sequences face difficulty with out-of-distribution (OOD) token positions in longer sequences. We introduce Resonance RoPE, a novel approach designed to narrow the generalization gap in TSTL scenarios by refining the interpolation of RoPE features for OOD positions, significantly improving the model performance without additional online computational costs. Furthermore, we present PosGen, a new synthetic benchmark specifically designed for fine-grained behavior analysis in TSTL scenarios, aiming to isolate the constantly increasing difficulty of token generation on long contexts from the challenges of recognizing new token positions. Our experiments on synthetic tasks show that after applying Resonance RoPE, Transformers recognize OOD position better and more robustly. Our extensive LLM experiments also show superior performance after applying Resonance RoPE to the current state-of-the-art RoPE scaling method, YaRN, on both upstream language modeling tasks and a variety of downstream long-text applications.

Updated: 2024-06-10 13:30:34

标题: 共鸣 RoPE：改善大型语言模型的上下文长度泛化

摘要: 这篇论文讨论了在配备旋转位置嵌入（RoPE）的大型语言模型（LLMs）中，面临短序列训练-长序列测试（TSTL）场景的挑战，其中在较长序列中的分布外（OOD）标记位置上，预先训练的模型面临困难。我们引入了共振RoPE，这是一种新颖的方法，旨在通过调整RoPE特征的插值，为OOD位置缩小TSTL场景中的泛化差距，显著提高模型性能，而不增加额外的在线计算成本。此外，我们提出了PosGen，一个专门设计用于TSTL场景中细粒度行为分析的新合成基准，旨在将长上下文中令牌生成的不断增加的困难与识别新令牌位置的挑战隔离开来。我们在合成任务上的实验表明，应用共振RoPE后，变压器更好地且更稳健地识别OOD位置。我们的大量LLM实验还表明，在应用共振RoPE到当前最先进的RoPE缩放方法YaRN后，上游语言建模任务和各种下游长文本应用的性能均有所提升。

更新时间: 2024-06-10 13:30:34

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2403.00071v2

Compute Better Spent: Replacing Dense Layers with Structured Matrices

Dense linear layers are the dominant computational bottleneck in foundation models. Identifying more efficient alternatives to dense matrices has enormous potential for building more compute-efficient models, as exemplified by the success of convolutional networks in the image domain. In this work, we systematically explore structured matrices as replacements for dense matrices. We show that different structures often require drastically different initialization scales and learning rates, which are crucial to performance, especially as models scale. Using insights from the Maximal Update Parameterization, we determine the optimal scaling for initialization and learning rates of these unconventional layers. Finally, we measure the scaling laws of different structures to compare how quickly their performance improves with compute. We propose a novel matrix family containing Monarch matrices, the Block Tensor-Train (BTT), which we show performs better than dense matrices for the same compute on multiple tasks. On CIFAR-10/100 with augmentation, BTT achieves exponentially lower training loss than dense when training MLPs and ViTs. BTT matches dense ViT-S/32 performance on ImageNet-1k with 3.8 times less compute and is more efficient than dense for training small GPT-2 language models.

Updated: 2024-06-10 13:25:43

标题: 更好的计算利用：用结构化矩阵替代密集层

摘要: 密集的线性层是基础模型中的主要计算瓶颈。寻找比密集矩阵更高效的替代方案对于构建更高效的模型具有巨大潜力，就像卷积网络在图像领域的成功所示。在这项工作中，我们系统地探索了结构化矩阵作为密集矩阵的替代品。我们展示了不同的结构通常需要截然不同的初始化尺度和学习速率，这对性能至关重要，特别是随着模型规模的扩大。利用Maximal Update Parameterization的见解，我们确定了这些非传统层初始化和学习速率的最佳缩放。最后，我们测量了不同结构的缩放规律，以比较它们的性能随计算速度提高的速度。我们提出了一个包含Monarch矩阵的新型矩阵家族，即Block Tensor-Train（BTT），我们展示了在多项任务中，与密集矩阵相比，BTT在相同计算量下表现更好。在带有数据增强的CIFAR-10/100上，BTT在训练MLPs和ViTs时比密集矩阵实现了指数级别的更低训练损失。在ImageNet-1k上，BTT与密集ViT-S/32的性能相匹配，计算量减少3.8倍，并且比密集训练小型GPT-2语言模型更有效。

更新时间: 2024-06-10 13:25:43

领域: cs.LG

下载: http://arxiv.org/abs/2406.06248v1

Data-Efficient Learning with Neural Programs

Many computational tasks can be naturally expressed as a composition of a DNN followed by a program written in a traditional programming language or an API call to an LLM. We call such composites "neural programs" and focus on the problem of learning the DNN parameters when the training data consist of end-to-end input-output labels for the composite. When the program is written in a differentiable logic programming language, techniques from neurosymbolic learning are applicable, but in general, the learning for neural programs requires estimating the gradients of black-box components. We present an algorithm for learning neural programs, called ISED, that only relies on input-output samples of black-box components. For evaluation, we introduce new benchmarks that involve calls to modern LLMs such as GPT-4 and also consider benchmarks from the neurosymolic learning literature. Our evaluation shows that for the latter benchmarks, ISED has comparable performance to state-of-the-art neurosymbolic frameworks. For the former, we use adaptations of prior work on gradient approximations of black-box components as a baseline, and show that ISED achieves comparable accuracy but in a more data- and sample-efficient manner.

Updated: 2024-06-10 13:23:00

标题: 用神经程序进行数据高效学习

摘要: 许多计算任务可以自然地表达为一个由DNN组成的程序，后跟一个传统编程语言编写的程序或LLM的API调用。我们将这样的组合称为“神经程序”，并专注于当训练数据由组合的端到端输入输出标签组成时，学习DNN参数的问题。当程序是用可微逻辑编程语言编写时，神经符号学习技术是适用的，但总的来说，神经程序的学习需要估计黑盒组件的梯度。我们提出了一种用于学习神经程序的算法，称为ISED，它仅依赖于黑盒组件的输入输出样本。为了评估，我们引入了涉及对现代LLM（如GPT-4）的调用的新基准，并考虑了神经符号学习文献中的基准。我们的评估显示，对于后一组基准，ISED与最先进的神经符号框架具有可比性能。对于前一组基准，我们使用以前关于黑盒组件梯度近似的基线的改编，并显示ISED以更高效地使用数据和样本的方式实现了可比的准确性。

更新时间: 2024-06-10 13:23:00

领域: cs.LG

下载: http://arxiv.org/abs/2406.06246v1

Deep Reinforcement Learning from Hierarchical Preference Design

Reward design is a fundamental, yet challenging aspect of reinforcement learning (RL). Researchers typically utilize feedback signals from the environment to handcraft a reward function, but this process is not always effective due to the varying scale and intricate dependencies of the feedback signals. This paper shows by exploiting certain structures, one can ease the reward design process. Specifically, we propose a hierarchical reward modeling framework -- HERON for scenarios: (I) The feedback signals naturally present hierarchy; (II) The reward is sparse, but with less important surrogate feedback to help policy learning. Both scenarios allow us to design a hierarchical decision tree induced by the importance ranking of the feedback signals to compare RL trajectories. With such preference data, we can then train a reward model for policy learning. We apply HERON to several RL applications, and we find that our framework can not only train high performing agents on a variety of difficult tasks, but also provide additional benefits such as improved sample efficiency and robustness. Our code is available at \url{https://github.com/abukharin3/HERON}.

Updated: 2024-06-10 13:22:42

标题: 层次偏好设计中的深度强化学习

摘要: 奖励设计是强化学习（RL）中的一个基本且具有挑战性的方面。研究人员通常利用来自环境的反馈信号来手工制定奖励函数，但由于反馈信号的变化规模和复杂依赖关系，这一过程并不总是有效的。本文通过利用某些结构，展示了可以简化奖励设计过程。具体而言，我们提出了一个层次化奖励建模框架-- HERON，适用于以下情景：（I）反馈信号自然存在层次结构；（II）奖励是稀疏的，但具有不太重要的替代反馈以帮助策略学习。这两种情景都允许我们设计一个由反馈信号重要性排名引导的层次决策树，以比较RL轨迹。通过这样的偏好数据，我们可以为策略学习训练一个奖励模型。我们将HERON应用于几个RL应用程序，并发现我们的框架不仅可以训练出在各种困难任务上表现优异的代理，还可以提供额外的好处，如提高样本效率和稳健性。我们的代码可在\url{https://github.com/abukharin3/HERON}上找到。

更新时间: 2024-06-10 13:22:42

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2309.02632v3

Digital assistant in a point of sales

This article investigates the deployment of a Voice User Interface (VUI)-powered digital assistant in a retail setting and assesses its impact on customer engagement and service efficiency. The study explores how digital assistants can enhance user interactions through advanced conversational capabilities with multilingual support. By integrating a digital assistant into a high-traffic retail environment, we evaluate its effectiveness in improving the quality of customer service and operational efficiency. Data collected during the experiment demonstrate varied impacts on customer interaction, revealing insights into the future optimizations of digital assistant technologies in customer-facing roles. This study contributes to the understanding of digital transformation strategies within the customer relations domain emphasizing the need for service flexibility and user-centric design in modern retail stores.

Updated: 2024-06-10 13:20:33

标题: 数字助理在销售点

摘要: 本文研究了在零售环境中部署语音用户界面（VUI）驱动的数字助手，并评估了它对客户参与和服务效率的影响。研究探讨了数字助手如何通过具有多语言支持的先进对话能力来增强用户交互。通过将数字助手整合到高流量的零售环境中，我们评估了它在改善客户服务质量和运营效率方面的有效性。实验期间收集的数据显示了数字助手对客户互动的不同影响，揭示了数字助手技术在面向客户角色中未来优化的见解。本研究有助于理解客户关系领域的数字转型战略，强调了现代零售店在服务灵活性和用户中心设计方面的需求。

更新时间: 2024-06-10 13:20:33

领域: cs.HC,cs.AI,cs.CL

下载: http://arxiv.org/abs/2406.04851v2

SecureNet: A Comparative Study of DeBERTa and Large Language Models for Phishing Detection

Phishing, whether through email, SMS, or malicious websites, poses a major threat to organizations by using social engineering to trick users into revealing sensitive information. It not only compromises company's data security but also incurs significant financial losses. In this paper, we investigate whether the remarkable performance of Large Language Models (LLMs) can be leveraged for particular task like text classification, particularly detecting malicious content and compare its results with state-of-the-art Deberta V3 (DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing) model. We systematically assess the potential and limitations of both approaches using comprehensive public datasets comprising diverse data sources such as email, HTML, URL, SMS, and synthetic data generation. Additionally, we demonstrate how LLMs can generate convincing phishing emails, making it harder to spot scams and evaluate the performance of both models in this context. Our study delves further into the challenges encountered by DeBERTa V3 during its training phases, fine-tuning methodology and transfer learning processes. Similarly, we examine the challenges associated with LLMs and assess their respective performance. Among our experimental approaches, the transformer-based DeBERTa method emerged as the most effective, achieving a test dataset (HuggingFace phishing dataset) recall (sensitivity) of 95.17% closely followed by GPT-4 providing a recall of 91.04%. We performed additional experiments with other datasets on the trained DeBERTa V3 model and LLMs like GPT 4 and Gemini 1.5. Based on our findings, we provide valuable insights into the effectiveness and robustness of these advanced language models, offering a detailed comparative analysis that can inform future research efforts in strengthening cybersecurity measures for detecting and mitigating phishing threats.

Updated: 2024-06-10 13:13:39

标题: SecureNet：DeBERTa和大型语言模型在钓鱼检测中的比较研究

摘要: 网络钓鱼，无论是通过电子邮件、短信还是恶意网站，都通过社会工程学来欺骗用户，以获取敏感信息，对组织构成重大威胁。它不仅损害了公司的数据安全，还造成了巨大的财务损失。本文研究了大型语言模型（LLMs）在文本分类等特定任务中的卓越表现是否可以用于检测恶意内容，并将其结果与最先进的DeBERTa V3（使用ELECTRA风格的预训练与梯度解耦嵌入共享的DeBERTa）模型进行比较。我们系统地评估了这两种方法的潜力和局限性，使用包括电子邮件、HTML、URL、短信和合成数据在内的全面公共数据集。此外，我们演示了LLMs如何生成令人信服的网络钓鱼邮件，使识别诈骗更加困难，并评估了两种模型在这种情况下的性能。我们的研究进一步探讨了DeBERTa V3在训练阶段、微调方法和迁移学习过程中遇到的挑战。同样，我们考察了LLMs面临的挑战，并评估了它们的性能。在我们的实验方法中，基于变压器的DeBERTa方法表现最为有效，达到了95.17%的测试数据集（HuggingFace网络钓鱼数据集）召回率（敏感度），紧随其后的是GPT-4，提供了91.04%的召回率。我们对经过训练的DeBERTa V3模型以及GPT 4和Gemini 1.5等LLMs进行了额外实验。根据我们的研究结果，我们提供了对这些先进语言模型的有效性和稳健性的宝贵见解，提供了详细的比较分析，可为未来强化网络安全措施以检测和减轻网络钓鱼威胁的研究工作提供参考。

更新时间: 2024-06-10 13:13:39

领域: cs.CR,cs.CL,cs.LG

下载: http://arxiv.org/abs/2406.06663v1

FusionINN: Decomposable Image Fusion for Brain Tumor Monitoring

Image fusion typically employs non-invertible neural networks to merge multiple source images into a single fused image. However, for clinical experts, solely relying on fused images may be insufficient for making diagnostic decisions, as the fusion mechanism blends features from source images, thereby making it difficult to interpret the underlying tumor pathology. We introduce FusionINN, a novel decomposable image fusion framework, capable of efficiently generating fused images and also decomposing them back to the source images. FusionINN is designed to be bijective by including a latent image alongside the fused image, while ensuring minimal transfer of information from the source images to the latent representation. To the best of our knowledge, we are the first to investigate the decomposability of fused images, which is particularly crucial for life-sensitive applications such as medical image fusion compared to other tasks like multi-focus or multi-exposure image fusion. Our extensive experimentation validates FusionINN over existing discriminative and generative fusion methods, both subjectively and objectively. Moreover, compared to a recent denoising diffusion-based fusion model, our approach offers faster and qualitatively better fusion results.

Updated: 2024-06-10 13:09:53

标题: FusionINN：可分解的图像融合用于脑肿瘤监测

摘要: 图像融合通常使用不可逆的神经网络将多个源图像合并成单个融合图像。然而，对于临床专家来说，仅依靠融合图像可能不足以做出诊断决策，因为融合机制会混合来自源图像的特征，从而使解释潜在肿瘤病理变得困难。我们引入了FusionINN，这是一个新颖的可分解图像融合框架，能够有效地生成融合图像，并将其分解回源图像。FusionINN被设计为可逆的，通过在融合图像旁边包含一个潜在图像，同时确保从源图像到潜在表示的信息传输最小化。据我们所知，我们是第一个研究融合图像可分解性的团队，这对于像医学图像融合这样的生命敏感应用非常关键，与其他任务如多焦点或多曝光图像融合相比。我们进行了大量实验，主观和客观地验证了FusionINN相对于现有的辨别和生成融合方法的有效性。此外，与最近基于去噪扩散的融合模型相比，我们的方法提供了更快速和质量更好的融合结果。

更新时间: 2024-06-10 13:09:53

领域: eess.IV,cs.AI,cs.CV,cs.LG

下载: http://arxiv.org/abs/2403.15769v3

Efficient Neural Compression with Inference-time Decoding

This paper explores the combination of neural network quantization and entropy coding for memory footprint minimization. Edge deployment of quantized models is hampered by the harsh Pareto frontier of the accuracy-to-bitwidth tradeoff, causing dramatic accuracy loss below a certain bitwidth. This accuracy loss can be alleviated thanks to mixed precision quantization, allowing for more flexible bitwidth allocation. However, standard mixed precision benefits remain limited due to the 1-bit frontier, that forces each parameter to be encoded on at least 1 bit of data. This paper introduces an approach that combines mixed precision, zero-point quantization and entropy coding to push the compression boundary of Resnets beyond the 1-bit frontier with an accuracy drop below 1% on the ImageNet benchmark. From an implementation standpoint, a compact decoder architecture features reduced latency, thus allowing for inference-compatible decoding.

Updated: 2024-06-10 13:07:13

标题: 高效神经压缩与推理时间解码

摘要: 本文探讨了神经网络量化和熵编码相结合以减小内存占用的方法。量化模型的边缘部署受到了精度与比特宽度权衡的严格帕累托前沿的阻碍，导致在一定比特宽度以下出现了严重的精度损失。这种精度损失可以通过混合精度量化来缓解，允许更灵活的比特宽度分配。然而，由于1比特前沿的限制，标准混合精度的优势仍然有限，这迫使每个参数至少要在1比特的数据上进行编码。本文引入了一种方法，结合了混合精度、零点量化和熵编码，将ResNet的压缩边界推向了超越1比特前沿的方向，在ImageNet基准测试中精度下降不到1%。从实现的角度来看，紧凑的解码器架构具有降低的延迟，因此可以进行推断兼容的解码。

更新时间: 2024-06-10 13:07:13

领域: cs.LG

下载: http://arxiv.org/abs/2406.06237v1

Statistical Inference for Privatized Data with Unknown Sample Size

We develop both theory and algorithms to analyze privatized data in the unbounded differential privacy(DP), where even the sample size is considered a sensitive quantity that requires privacy protection. We show that the distance between the sampling distributions under unbounded DP and bounded DP goes to zero as the sample size $n$ goes to infinity, provided that the noise used to privatize $n$ is at an appropriate rate; we also establish that ABC-type posterior distributions converge under similar assumptions. We further give asymptotic results in the regime where the privacy budget for $n$ goes to zero, establishing similarity of sampling distributions as well as showing that the MLE in the unbounded setting converges to the bounded-DP MLE. In order to facilitate valid, finite-sample Bayesian inference on privatized data in the unbounded DP setting, we propose a reversible jump MCMC algorithm which extends the data augmentation MCMC of Ju et al. (2022). We also propose a Monte Carlo EM algorithm to compute the MLE from privatized data in both bounded and unbounded DP. We apply our methodology to analyze a linear regression model as well as a 2019 American Time Use Survey Microdata File which we model using a Dirichlet distribution.

Updated: 2024-06-10 13:03:20

标题: 未知样本大小的私有数据的统计推断

摘要: 我们发展了理论和算法来分析无界差分隐私(DP)中的私有化数据，即使样本大小也被视为需要隐私保护的敏感数量。我们表明，在样本大小$n$趋于无穷大时，无界DP和有界DP之间的采样分布之间的距离趋于零，前提是用于私有化$n$的噪声以适当的速率使用；我们还建立了ABC类型的后验分布在类似假设下收敛。我们进一步在隐私预算为$n$趋于零的情况下给出了渐近结果，建立了采样分布的相似性，并显示无界设置中的MLE收敛到有界DP的MLE。为了在无界DP设置中对私有化数据进行有效的有限样本贝叶斯推断，我们提出了一个可逆跳跃MCMC算法，扩展了Ju等人(2022)的数据增广MCMC。我们还提出了一个蒙特卡洛EM算法，用于计算有界DP和无界DP中的私有化数据的MLE。我们应用我们的方法来分析线性回归模型以及使用Dirichlet分布建模的2019年美国时间使用调查微数据文件。

更新时间: 2024-06-10 13:03:20

领域: math.ST,cs.CR,stat.CO,stat.TH

下载: http://arxiv.org/abs/2406.06231v1

Agent-Specific Effects: A Causal Effect Propagation Analysis in Multi-Agent MDPs

Establishing causal relationships between actions and outcomes is fundamental for accountable multi-agent decision-making. However, interpreting and quantifying agents' contributions to such relationships pose significant challenges. These challenges are particularly prominent in the context of multi-agent sequential decision-making, where the causal effect of an agent's action on the outcome depends on how other agents respond to that action. In this paper, our objective is to present a systematic approach for attributing the causal effects of agents' actions to the influence they exert on other agents. Focusing on multi-agent Markov decision processes, we introduce agent-specific effects (ASE), a novel causal quantity that measures the effect of an agent's action on the outcome that propagates through other agents. We then turn to the counterfactual counterpart of ASE (cf-ASE), provide a sufficient set of conditions for identifying cf-ASE, and propose a practical sampling-based algorithm for estimating it. Finally, we experimentally evaluate the utility of cf-ASE through a simulation-based testbed, which includes a sepsis management environment.

Updated: 2024-06-10 13:01:30

标题: 特定代理效应：多代理MDPs 中的因果效应传播分析

摘要: 建立行动与结果之间的因果关系对于可追溯的多智能体决策至关重要。然而，解释和量化智能体对这种关系的贡献会面临重大挑战。在多智能体顺序决策的背景下，这些挑战尤为突出，因为一个智能体的行动对结果的因果影响取决于其他智能体如何对该行动做出响应。本文的目标是提出一种系统性方法，将智能体的行动的因果效应归因于它们对其他智能体施加的影响。我们专注于多智能体马尔可夫决策过程，引入了智能体特定效应（ASE），这是一种衡量智能体行动对通过其他智能体传播的结果的影响的新颖因果数量。然后，我们转向ASE的反事实对应（cf-ASE），提供了一个确定cf-ASE的充分条件，并提出了一个基于实际抽样的算法来估计它。最后，我们通过一个基于模拟的测试平台，包括一个感染管理环境，实验评估了cf-ASE的效用。

更新时间: 2024-06-10 13:01:30

领域: cs.AI

下载: http://arxiv.org/abs/2310.11334v3

PAC-Bayesian Soft Actor-Critic Learning

Actor-critic algorithms address the dual goals of reinforcement learning (RL), policy evaluation and improvement via two separate function approximators. The practicality of this approach comes at the expense of training instability, caused mainly by the destructive effect of the approximation errors of the critic on the actor. We tackle this bottleneck by employing an existing Probably Approximately Correct (PAC) Bayesian bound for the first time as the critic training objective of the Soft Actor-Critic (SAC) algorithm. We further demonstrate that online learning performance improves significantly when a stochastic actor explores multiple futures by critic-guided random search. We observe our resulting algorithm to compare favorably against the state-of-the-art SAC implementation on multiple classical control and locomotion tasks in terms of both sample efficiency and regret.

Updated: 2024-06-10 12:53:36

标题: PAC-Bayesian 软Actor-Critic 学习

摘要: 演员-评论家算法解决了强化学习（RL）的双重目标，通过两个单独的函数逼近器进行策略评估和改进。这种方法的实用性是以训练的不稳定性为代价的，主要是由于评论家的逼近误差对演员的破坏性影响。我们通过首次将现有的大概近似正确（PAC）贝叶斯界限作为Soft Actor-Critic（SAC）算法的评论家训练目标来解决这一瓶颈。我们进一步证明，在演员通过评论家引导的随机搜索探索多个未来时，在线学习性能显著提高。我们观察到，我们得到的算法在多个经典控制和运动任务中的样本效率和后悔方面与最先进的SAC实现相比表现良好。

更新时间: 2024-06-10 12:53:36

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2301.12776v3

PAC-Bayes Analysis for Recalibration in Classification

Nonparametric estimation with binning is widely employed in the calibration error evaluation and the recalibration of machine learning models. Recently, theoretical analyses of the bias induced by this estimation approach have been actively pursued; however, the understanding of the generalization of the calibration error to unknown data remains limited. In addition, although many recalibration algorithms have been proposed, their generalization performance lacks theoretical guarantees. To address this problem, we conduct a generalization analysis of the calibration error under the probably approximately correct (PAC) Bayes framework. This approach enables us to derive a first optimizable upper bound for the generalization error in the calibration context. We then propose a generalization-aware recalibration algorithm based on our generalization theory. Numerical experiments show that our algorithm improves the Gaussian-process-based recalibration performance on various benchmark datasets and models.

Updated: 2024-06-10 12:53:13

标题: PAC-Bayes分类中的重新校准分析

摘要: 具有分箱的非参数估计在校准错误评估和机器学习模型的重新校准中被广泛采用。最近，对该估计方法引入的偏差进行了理论分析；然而，对于校准错误在未知数据上的泛化理解仍然有限。此外，虽然已提出许多重新校准算法，但它们的泛化性能缺乏理论保证。为了解决这个问题，我们在可能近似正确（PAC）贝叶斯框架下对校准错误进行了泛化分析。这种方法使我们能够推导出校准上的泛化误差的第一个可优化上界。然后，基于我们的泛化理论，我们提出了一个具有泛化意识的重新校准算法。数值实验表明，我们的算法改善了基于高斯过程的重新校准在各种基准数据集和模型上的性能。

更新时间: 2024-06-10 12:53:13

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2406.06227v1

Siren -- Advancing Cybersecurity through Deception and Adaptive Analysis

Siren represents a pioneering research effort aimed at fortifying cybersecurity through strategic integration of deception, machine learning, and proactive threat analysis. Drawing inspiration from mythical sirens, this project employs sophisticated methods to lure potential threats into controlled environments. The system features a dynamic machine learning model for real-time analysis and classification, ensuring continuous adaptability to emerging cyber threats. The architectural framework includes a link monitoring proxy, a purpose-built machine learning model for dynamic link analysis, and a honeypot enriched with simulated user interactions to intensify threat engagement. Data protection within the honeypot is fortified with probabilistic encryption. Additionally, the incorporation of simulated user activity extends the system's capacity to capture and learn from potential attackers even after user disengagement. Siren introduces a paradigm shift in cybersecurity, transforming traditional defense mechanisms into proactive systems that actively engage and learn from potential adversaries. The research strives to enhance user protection while yielding valuable insights for ongoing refinement in response to the evolving landscape of cybersecurity threats.

Updated: 2024-06-10 12:47:49

标题: 塞壬——通过欺骗和适应性分析推动网络安全

摘要: Siren代表了一项开拓性的研究工作，旨在通过欺骗、机器学习和积极的威胁分析的战略集成来加强网络安全。这个项目从神话中的塞壬中汲取灵感，采用复杂的方法将潜在威胁引诱进入受控环境。该系统采用了一个动态的机器学习模型进行实时分析和分类，确保对新兴网络威胁的持续适应性。架构框架包括一个链接监控代理，一个专门为动态链接分析设计的机器学习模型，以及一个富有模拟用户交互的蜜罐，以加强威胁的参与。蜜罐内的数据保护通过概率加密得到加强。此外，模拟用户活动的整合扩展了系统捕获和学习潜在攻击者的能力，即使在用户脱离后也能继续。Siren在网络安全领域引入了一种范式转变，将传统的防御机制转变为积极主动地与潜在对手互动并学习的系统。该研究致力于增强用户保护，同时为应对不断发展的网络安全威胁景观提供宝贵的见解。

更新时间: 2024-06-10 12:47:49

领域: cs.CR,cs.LG,C.2.0; I.2.7

下载: http://arxiv.org/abs/2406.06225v1

Channel Reciprocity Based Attack Detection for Securing UWB Ranging by Autoencoder

A variety of ranging threats represented by Ghost Peak attack have raised concerns regarding the security performance of Ultra-Wide Band (UWB) systems with the finalization of the IEEE 802.15.4z standard. Based on channel reciprocity, this paper proposes a low complexity attack detection scheme that compares Channel Impulse Response (CIR) features of both ranging sides utilizing an autoencoder with the capability of data compression and feature extraction. Taking Ghost Peak attack as an example, this paper demonstrates the effectiveness, feasibility and generalizability of the proposed attack detection scheme through simulation and experimental validation. The proposed scheme achieves an attack detection success rate of over 99% and can be implemented in current systems at low cost.

Updated: 2024-06-10 12:47:03

标题: 基于通道互惠的自编码器攻击检测用于保护UWB测距

摘要: 一种名为Ghost Peak的各种威胁对超宽带（UWB）系统的安全性能提出了担忧，IEEE 802.15.4z标准的最终确定。基于信道互易性，本文提出了一种低复杂度的攻击检测方案，通过使用具有数据压缩和特征提取功能的自编码器比较两个定位方面的信道脉冲响应（CIR）特征。以Ghost Peak攻击为例，本文通过模拟和实验验证展示了所提出的攻击检测方案的有效性、可行性和通用性。所提出的方案实现了超过99%的攻击检测成功率，并且可以以低成本实现在当前系统中。

更新时间: 2024-06-10 12:47:03

领域: cs.CR,cs.SI,eess.SP,H.1.1

下载: http://arxiv.org/abs/2405.18255v2

Comparing Hyper-optimized Machine Learning Models for Predicting Efficiency Degradation in Organic Solar Cells

This work presents a set of optimal machine learning (ML) models to represent the temporal degradation suffered by the power conversion efficiency (PCE) of polymeric organic solar cells (OSCs) with a multilayer structure ITO/PEDOT:PSS/P3HT:PCBM/Al. To that aim, we generated a database with 996 entries, which includes up to 7 variables regarding both the manufacturing process and environmental conditions for more than 180 days. Then, we relied on a software framework that brings together a conglomeration of automated ML protocols that execute sequentially against our database by simply command-line interface. This easily permits hyper-optimizing and randomizing seeds of the ML models through exhaustive benchmarking so that optimal models are obtained. The accuracy achieved reaches values of the coefficient determination (R2) widely exceeding 0.90, whereas the root mean squared error (RMSE), sum of squared error (SSE), and mean absolute error (MAE)>1% of the target value, the PCE. Additionally, we contribute with validated models able to screen the behavior of OSCs never seen in the database. In that case, R2~0.96-0.97 and RMSE~1%, thus confirming the reliability of the proposal to predict. For comparative purposes, classical Bayesian regression fitting based on non-linear mean squares (LMS) are also presented, which only perform sufficiently for univariate cases of single OSCs. Hence they fail to outperform the breadth of the capabilities shown by the ML models. Finally, thanks to the standardized results offered by the ML framework, we study the dependencies between the variables of the dataset and their implications for the optimal performance and stability of the OSCs. Reproducibility is ensured by a standardized report altogether with the dataset, which are publicly available at Github.

Updated: 2024-06-10 12:46:22

标题: 比较超优化的机器学习模型，用于预测有机太阳能电池效率下降

摘要: 这项工作提出了一组最佳的机器学习（ML）模型，以表示具有多层结构ITO/PEDOT：PSS/P3HT：PCBM/Al的聚合物有机太阳能电池（OSC）的功率转换效率（PCE）所遭受的时间降解。为此，我们生成了一个包含996个条目的数据库，其中包括有关制造过程和环境条件的多达7个变量，持续超过180天。然后，我们依赖于一个软件框架，该框架集成了一系列自动化的ML协议，通过简单的命令行界面对我们的数据库进行顺序执行。这样可以轻松通过详尽的基准测试对ML模型的超优化和随机化种子，从而获得最佳模型。实现的准确度达到了广泛超过0.90的系数确定值（R2），而均方根误差（RMSE）、平方和误差（SSE）和平均绝对误差（MAE）>1%的目标值PCE。此外，我们贡献了经过验证的模型，能够筛选数据库中从未见过的OSC的行为。在这种情况下，R2约为0.96-0.97，RMSE约为1%，从而确认了预测的可靠性。为了比较，还提出了基于非线性均方（LMS）的经典贝叶斯回归拟合，仅对单个OSC的单变量情况足够。因此，它们未能超越ML模型所展示的广泛能力。最后，由于ML框架提供的标准化结果，我们研究了数据集变量之间的依赖关系及其对OSC的最佳性能和稳定性的影响。通过标准化报告和数据集，确保了可重现性，这些都可以在Github上公开获取。

更新时间: 2024-06-10 12:46:22

领域: cs.LG

下载: http://arxiv.org/abs/2404.00173v2

Structure-Aware E(3)-Invariant Molecular Conformer Aggregation Networks

A molecule's 2D representation consists of its atoms, their attributes, and the molecule's covalent bonds. A 3D (geometric) representation of a molecule is called a conformer and consists of its atom types and Cartesian coordinates. Every conformer has a potential energy, and the lower this energy, the more likely it occurs in nature. Most existing machine learning methods for molecular property prediction consider either 2D molecular graphs or 3D conformer structure representations in isolation. Inspired by recent work on using ensembles of conformers in conjunction with 2D graph representations, we propose $\mathrm{E}$(3)-invariant molecular conformer aggregation networks. The method integrates a molecule's 2D representation with that of multiple of its conformers. Contrary to prior work, we propose a novel 2D-3D aggregation mechanism based on a differentiable solver for the \emph{Fused Gromov-Wasserstein Barycenter} problem and the use of an efficient conformer generation method based on distance geometry. We show that the proposed aggregation mechanism is $\mathrm{E}$(3) invariant and propose an efficient GPU implementation. Moreover, we demonstrate that the aggregation mechanism helps to significantly outperform state-of-the-art molecule property prediction methods on established datasets.

Updated: 2024-06-10 12:43:11

标题: 结构感知E(3)-不变分子构象聚合网络

摘要: 一个分子的二维表示由其原子、它们的属性以及分子的共价键组成。一个分子的三维（几何）表示被称为构象，由其原子类型和笛卡尔坐标组成。每个构象都有一个势能，这个能量越低，它在自然界中出现的可能性就越大。大多数现有的用于分子性质预测的机器学习方法要么只考虑二维分子图，要么只考虑三维构象结构。受最近关于使用构象集合与二维图表示的工作的启发，我们提出了E(3)-不变的分子构象聚合网络。该方法将一个分子的二维表示与其多个构象的表示整合在一起。与以往的工作相反，我们提出了一种基于可微求解器的新颖二维-三维聚合机制，用于融合Gromov-Wasserstein重心问题，并使用基于距离几何的高效构象生成方法。我们展示了所提出的聚合机制是E(3)不变的，并提出了一种高效的GPU实现。此外，我们证明了这种聚合机制有助于在已建立的数据集上显著优于最先进的分子性质预测方法。

更新时间: 2024-06-10 12:43:11

领域: cs.LG

下载: http://arxiv.org/abs/2402.01975v2

Rényi Pufferfish Privacy: General Additive Noise Mechanisms and Privacy Amplification by Iteration

Pufferfish privacy is a flexible generalization of differential privacy that allows to model arbitrary secrets and adversary's prior knowledge about the data. Unfortunately, designing general and tractable Pufferfish mechanisms that do not compromise utility is challenging. Furthermore, this framework does not provide the composition guarantees needed for a direct use in iterative machine learning algorithms. To mitigate these issues, we introduce a R\'enyi divergence-based variant of Pufferfish and show that it allows us to extend the applicability of the Pufferfish framework. We first generalize the Wasserstein mechanism to cover a wide range of noise distributions and introduce several ways to improve its utility. We also derive stronger guarantees against out-of-distribution adversaries. Finally, as an alternative to composition, we prove privacy amplification results for contractive noisy iterations and showcase the first use of Pufferfish in private convex optimization. A common ingredient underlying our results is the use and extension of shift reduction lemmas.

Updated: 2024-06-10 12:41:43

标题: Rényi河豚隐私：一般加性噪声机制和通过迭代增强隐私

摘要: Pufferfish隐私是差分隐私的一种灵活的泛化形式，允许模拟任意秘密信息和对数据的敌手先验知识。不幸的是，设计不会损害效用的通用且可行的Pufferfish机制是具有挑战性的。此外，该框架并不提供直接在迭代机器学习算法中使用所需的组合保证。为了缓解这些问题，我们引入了一种基于Rényi散度的Pufferfish变种，并展示它可以扩展Pufferfish框架的适用性。我们首先将Wasserstein机制泛化，以涵盖各种噪声分布，并介绍几种改进其效用的方法。我们还推导出针对超出分布的对手更强的保证。最后，作为组合的替代方案，我们证明了对于收缩嘈杂的迭代的隐私增强结果，并展示了Pufferfish在私有凸优化中的首次使用。我们的结果的一个共同要素是使用和扩展移位减少引理。

更新时间: 2024-06-10 12:41:43

领域: cs.CR,cs.LG,stat.ML

下载: http://arxiv.org/abs/2312.13985v2

Proximity Matters: Analyzing the Role of Geographical Proximity in Shaping AI Research Collaborations

The role of geographical proximity in facilitating inter-regional or inter-organizational collaborations has been studied thoroughly in recent years. However, the effect of geographical proximity on forming scientific collaborations at the individual level still needs to be addressed. Using publication data in the field of artificial intelligence from 2001 to 2019, in this work, the effect of geographical proximity on the likelihood of forming future scientific collaborations among researchers is studied. In addition, the interaction between geographical and network proximities is examined to see whether network proximity can substitute geographical proximity in encouraging long-distance scientific collaborations. Employing conventional and machine learning techniques, our results suggest that geographical distance impedes scientific collaboration at the individual level despite the tremendous improvements in transportation and communication technologies during recent decades. Moreover, our findings show that the effect of network proximity on the likelihood of scientific collaboration increases with geographical distance, implying that network proximity can act as a substitute for geographical proximity.

Updated: 2024-06-10 12:37:47

标题: 接近性重要性：分析地理接近性在塑造人工智能研究合作中的作用

摘要: 近年来，地理接近性在促进区域间或组织间合作方面的作用得到了深入研究。然而，在个人层面上，地理接近性对形成科学合作的影响仍需进一步探讨。本研究利用2001年至2019年间人工智能领域的出版数据，研究了地理接近性对研究人员未来形成科学合作的可能性的影响。此外，还研究了地理和网络接近性之间的相互作用，以查看网络接近性是否可以替代地理接近性鼓励远距离的科学合作。通过运用传统和机器学习技术，我们的结果表明，尽管近几十年来交通和通信技术取得了巨大进步，但地理距离阻碍了个人层面的科学合作。此外，我们的研究发现，网络接近性对科学合作可能性的影响随着地理距离的增加而增加，这意味着网络接近性可以替代地理接近性。

更新时间: 2024-06-10 12:37:47

领域: cs.SI,cs.AI,cs.DL,cs.LG,physics.soc-ph

下载: http://arxiv.org/abs/2406.06662v1

Label-Looping: Highly Efficient Decoding for Transducers

This paper introduces a highly efficient greedy decoding algorithm for Transducer inference. We propose a novel data structure using CUDA tensors to represent partial hypotheses in a batch that supports parallelized hypothesis manipulations. During decoding, our algorithm maximizes GPU parallelism by adopting a nested-loop design, where the inner loop consumes all blank predictions, while non-blank predictions are handled in the outer loop. Our algorithm is general-purpose and can work with both conventional Transducers and Token-and-Duration Transducers. Experiments show that the label-looping algorithm can bring a speedup up to 2.0X compared to conventional batched decoding algorithms when using batch size 32, and can be combined with other compiler or GPU call-related techniques to bring more speedup. We will open-source our implementation to benefit the research community.

Updated: 2024-06-10 12:34:38

标题: 标签循环：用于转录器的高效解码

摘要: 这篇论文介绍了一种用于Transducer推理的高效贪婪解码算法。我们提出了一种使用CUDA张量表示批处理中的部分假设的新颖数据结构，支持并行化假设操作。在解码过程中，我们的算法通过采用嵌套循环设计最大化GPU并行性，其中内循环消耗所有空白预测，而非空白预测在外部循环中处理。我们的算法是通用的，可以与传统的Transducers和Token-and-Duration Transducers一起使用。实验表明，与传统的批处理解码算法相比，标签循环算法在批处理大小为32时可以带来高达2.0倍的加速，并且可以与其他编译器或GPU调用相关技术结合以实现更多加速。我们将开源我们的实现以造福研究社区。

更新时间: 2024-06-10 12:34:38

领域: eess.AS,cs.AI,cs.CL,cs.LG,cs.SD

下载: http://arxiv.org/abs/2406.06220v1

Data Augmentation in Earth Observation: A Diffusion Model Approach

The scarcity of high-quality Earth Observation (EO) imagery poses a significant challenge, despite its critical role in enabling precise analysis and informed decision-making across various sectors. This scarcity is primarily due to atmospheric conditions, seasonal variations, and limited geographical coverage, which complicates the application of Artificial Intelligence (AI) in EO. Data augmentation, a widely used technique in AI that involves generating additional data mainly through parameterized image transformations, has been employed to increase the volume and diversity of data. However, this method often falls short in generating sufficient diversity across key semantic axes, adversely affecting the accuracy of EO applications. To address this issue, we propose a novel four-stage approach aimed at improving the diversity of augmented data by integrating diffusion models. Our approach employs meta-prompts for instruction generation, harnesses general-purpose vision-language models for generating rich captions, fine-tunes an Earth Observation diffusion model, and iteratively augments data. We conducted extensive experiments using four different data augmentation techniques, and our approach consistently demonstrated improvements, outperforming the established augmentation methods, revealing its effectiveness in generating semantically rich and diverse EO images.

Updated: 2024-06-10 12:33:47

标题: 地球观测中的数据增强：扩散模型方法

摘要: 地球观测（EO）图像的高质量稀缺性构成了一个重要挑战，尽管这些图像在各个领域的精确分析和明智决策中发挥着关键作用。这种稀缺性主要是由于大气条件、季节变化和有限的地理覆盖范围，这使得人工智能（AI）在EO中的应用变得复杂。数据增强是AI中广泛使用的一种技术，通过参数化图像转换生成额外数据，已被用来增加数据的数量和多样性。然而，这种方法通常在跨关键语义轴生成足够多样性方面存在不足，从而对EO应用的准确性产生不利影响。为了解决这一问题，我们提出了一种新颖的四阶段方法，旨在通过整合扩散模型来改善增强数据的多样性。我们的方法利用元提示生成指令，利用通用视觉语言模型生成丰富的标题，对地球观测扩散模型进行微调，并迭代地增加数据。我们进行了大量实验，使用四种不同的数据增强技术，我们的方法始终表现出改进，胜过已建立的增强方法，显示了其在生成语义丰富和多样的EO图像方面的有效性。

更新时间: 2024-06-10 12:33:47

领域: cs.CV,cs.AI,cs.SE,I.4.9; I.4.9; I.2.m

下载: http://arxiv.org/abs/2406.06218v1

Challenges in Drone Firmware Analyses of Drone Firmware and Its Solutions

With the advancement of Internet of Things (IoT) technology, its applications span various sectors such as public, industrial, private and military. In particular, the drone sector has gained significant attention for both commercial and military purposes. As a result, there has been a surge in research focused on vulnerability analysis of drones. However, most security research to mitigate threats to IoT devices has focused primarily on networks, firmware and mobile applications. Of these, the use of fuzzing to analyze the security of firmware requires emulation of the firmware. However, when it comes to drone firmware, the industry lacks emulation and automated fuzzing tools. This is largely due to challenges such as limited input interfaces, firmware encryption and signatures. While it may be tempting to assume that existing emulators and automated analyzers for IoT devices can be applied to drones, practical applications have proven otherwise. In this paper, we discuss the challenges of dynamically analyzing drone firmware and propose potential solutions. In addition, we demonstrate the effectiveness of our methodology by applying it to DJI drones, which have the largest market share.

Updated: 2024-06-10 12:30:40

标题: 无人机固件分析中的挑战及其解决方案

摘要: 随着物联网（IoT）技术的进步，其应用跨越公共、工业、私人和军事等各个领域。特别是，无人机领域因商业和军事目的而受到了重视。因此，关于无人机漏洞分析的研究呈现出激增的趋势。然而，大多数针对物联网设备威胁的安全研究主要集中在网络、固件和移动应用程序。其中，使用模糊测试来分析固件的安全性需要模拟固件。然而，当涉及无人机固件时，行业缺乏模拟和自动化模糊测试工具。这在很大程度上是由于挑战，如有限的输入接口、固件加密和签名。虽然可能会诱人地认为现有的物联网设备模拟器和自动分析器可以应用于无人机，但实际应用证明了相反。在本文中，我们讨论了动态分析无人机固件的挑战，并提出了潜在的解决方案。此外，我们通过将其应用于市场份额最大的大疆无人机来展示我们方法的有效性。

更新时间: 2024-06-10 12:30:40

领域: cs.RO,cs.CR

下载: http://arxiv.org/abs/2312.16818v4

A Statistical Theory of Regularization-Based Continual Learning

We provide a statistical analysis of regularization-based continual learning on a sequence of linear regression tasks, with emphasis on how different regularization terms affect the model performance. We first derive the convergence rate for the oracle estimator obtained as if all data were available simultaneously. Next, we consider a family of generalized $\ell_2$-regularization algorithms indexed by matrix-valued hyperparameters, which includes the minimum norm estimator and continual ridge regression as special cases. As more tasks are introduced, we derive an iterative update formula for the estimation error of generalized $\ell_2$-regularized estimators, from which we determine the hyperparameters resulting in the optimal algorithm. Interestingly, the choice of hyperparameters can effectively balance the trade-off between forward and backward knowledge transfer and adjust for data heterogeneity. Moreover, the estimation error of the optimal algorithm is derived explicitly, which is of the same order as that of the oracle estimator. In contrast, our lower bounds for the minimum norm estimator and continual ridge regression show their suboptimality. A byproduct of our theoretical analysis is the equivalence between early stopping and generalized $\ell_2$-regularization in continual learning, which may be of independent interest. Finally, we conduct experiments to complement our theory.

Updated: 2024-06-10 12:25:13

标题: 基于正则化的连续学习的统计理论

摘要: 我们对基于正则化的持续学习在一系列线性回归任务上进行了统计分析，重点是不同正则化项如何影响模型性能。首先，我们推导了作为如果所有数据同时可用的神谕估计器的收敛速度。接下来，我们考虑了一组由矩阵值超参数索引的广义$\ell_2$正则化算法，其中包括最小范数估计器和持续岭回归作为特例。随着引入更多任务，我们推导了广义$\ell_2$正则化估计器的估计误差的迭代更新公式，从中确定导致最佳算法的超参数。有趣的是，超参数的选择可以有效平衡前向和后向知识传递之间的权衡，以及调整数据异质性。此外，我们明确推导了最佳算法的估计误差，其与神谕估计器的阶数相同。相比之下，我们的最小范数估计器和持续岭回归的下界显示了它们的次优性。我们理论分析的一个副产品是早停和持续学习中的广义$\ell_2$正则化之间的等价性，这可能是独立感兴趣的。最后，我们进行实验证明我们理论的补充。

更新时间: 2024-06-10 12:25:13

领域: cs.LG,cs.AI,stat.AP,stat.ML

下载: http://arxiv.org/abs/2406.06213v1

DiffDA: a Diffusion Model for Weather-scale Data Assimilation

The generation of initial conditions via accurate data assimilation is crucial for weather forecasting and climate modeling. We propose DiffDA as a denoising diffusion model capable of assimilating atmospheric variables using predicted states and sparse observations. Acknowledging the similarity between a weather forecast model and a denoising diffusion model dedicated to weather applications, we adapt the pretrained GraphCast neural network as the backbone of the diffusion model. Through experiments based on simulated observations from the ERA5 reanalysis dataset, our method can produce assimilated global atmospheric data consistent with observations at 0.25 deg (~30km) resolution globally. This marks the highest resolution achieved by ML data assimilation models. The experiments also show that the initial conditions assimilated from sparse observations (less than 0.96% of gridded data) and 48-hour forecast can be used for forecast models with a loss of lead time of at most 24 hours compared to initial conditions from state-of-the-art data assimilation in ERA5. This enables the application of the method to real-world applications, such as creating reanalysis datasets with autoregressive data assimilation.

Updated: 2024-06-10 12:22:59

标题: DiffDA：一种用于天气尺度数据同化的扩散模型

摘要: 通过准确的数据同化生成初始条件对于天气预报和气候建模至关重要。我们提出DiffDA作为一种去噪扩散模型，能够利用预测状态和稀疏观测同化大气变量。认识到天气预报模型和专门用于天气应用的去噪扩散模型之间的相似性，我们将预训练的GraphCast神经网络适应为扩散模型的骨干。通过基于ERA5再分析数据集的模拟观测实验，我们的方法可以以0.25度（约30公里）全球分辨率生成与观测一致的同化全球大气数据。这标志着ML数据同化模型所达到的最高分辨率。实验还表明，从稀疏观测同化的初始条件（少于0.96%的格网数据）和48小时预报可以用于预报模型，与ERA5中最先进的数据同化的初始条件相比，最多损失24小时的领先时间。这使得该方法可以应用于实际应用，例如创建具有自回归数据同化的再分析数据集。

更新时间: 2024-06-10 12:22:59

领域: cs.CE,cs.AI

下载: http://arxiv.org/abs/2401.05932v3

Quantum Architecture Search: A Survey

Quantum computing has made significant progress in recent years, attracting immense interest not only in research laboratories but also in various industries. However, the application of quantum computing to solve real-world problems is still hampered by a number of challenges, including hardware limitations and a relatively under-explored landscape of quantum algorithms, especially when compared to the extensive development of classical computing. The design of quantum circuits, in particular parameterized quantum circuits (PQCs), which contain learnable parameters optimized by classical methods, is a non-trivial and time-consuming task requiring expert knowledge. As a result, research on the automated generation of PQCs, known as quantum architecture search (QAS), has gained considerable interest. QAS focuses on the use of machine learning and optimization-driven techniques to generate PQCs tailored to specific problems and characteristics of quantum hardware. In this paper, we provide an overview of QAS methods by examining relevant research studies in the field. We discuss main challenges in designing and performing an automated search for an optimal PQC, and survey ways to address them to ease future research.

Updated: 2024-06-10 12:17:46

标题: 量子架构搜索：一项调查

摘要: 量子计算在近年取得了显著进展，不仅在研究实验室中引起了巨大兴趣，也在各种行业中引起了关注。然而，将量子计算应用于解决实际问题仍然受到许多挑战的限制，包括硬件限制和相对未开发的量子算法领域，特别是与经典计算的广泛发展相比。量子电路的设计，特别是包含由经典方法优化的可学习参数的参数化量子电路（PQC），是一项非常复杂和耗时的任务，需要专业知识。因此，自动生成PQC的研究，即量子架构搜索（QAS），已经引起了相当大的兴趣。QAS专注于利用机器学习和优化驱动技术生成适合特定问题和量子硬件特征的PQC。在本文中，我们通过研究相关领域的研究来提供QAS方法的概述。我们讨论了设计和执行自动生成最佳PQC的主要挑战，并调查了解决这些挑战的方式，以便为未来研究提供帮助。

更新时间: 2024-06-10 12:17:46

领域: quant-ph,cs.AI,I.2.2; J.6

下载: http://arxiv.org/abs/2406.06210v1

Lurking in the shadows: Unveiling Stealthy Backdoor Attacks against Personalized Federated Learning

Federated Learning (FL) is a collaborative machine learning technique where multiple clients work together with a central server to train a global model without sharing their private data. However, the distribution shift across non-IID datasets of clients poses a challenge to this one-model-fits-all method hindering the ability of the global model to effectively adapt to each client's unique local data. To echo this challenge, personalized FL (PFL) is designed to allow each client to create personalized local models tailored to their private data. While extensive research has scrutinized backdoor risks in FL, it has remained underexplored in PFL applications. In this study, we delve deep into the vulnerabilities of PFL to backdoor attacks. Our analysis showcases a tale of two cities. On the one hand, the personalization process in PFL can dilute the backdoor poisoning effects injected into the personalized local models. Furthermore, PFL systems can also deploy both server-end and client-end defense mechanisms to strengthen the barrier against backdoor attacks. On the other hand, our study shows that PFL fortified with these defense methods may offer a false sense of security. We propose \textit{PFedBA}, a stealthy and effective backdoor attack strategy applicable to PFL systems. \textit{PFedBA} ingeniously aligns the backdoor learning task with the main learning task of PFL by optimizing the trigger generation process. Our comprehensive experiments demonstrate the effectiveness of \textit{PFedBA} in seamlessly embedding triggers into personalized local models. \textit{PFedBA} yields outstanding attack performance across 10 state-of-the-art PFL algorithms, defeating the existing 6 defense mechanisms. Our study sheds light on the subtle yet potent backdoor threats to PFL systems, urging the community to bolster defenses against emerging backdoor challenges.

Updated: 2024-06-10 12:14:05

标题: 潜伏在阴影中：揭示针对个性化联邦学习的隐秘后门攻击

摘要: 联邦学习（FL）是一种协作机器学习技术，多个客户端与中央服务器共同训练全局模型，而无需共享其私人数据。然而，客户端非独立同分布数据集之间的分布偏移对这种一模式适用于所有方法构成了挑战，阻碍了全局模型有效适应每个客户端独特本地数据的能力。为了应对这一挑战，个性化FL（PFL）旨在允许每个客户端创建根据其私人数据量身定制的个性化本地模型。虽然大量研究已经审查了FL中的后门风险，但在PFL应用中却尚未得到充分探讨。在本研究中，我们深入探讨了PFL对后门攻击的脆弱性。我们的分析展示了两种情况。一方面，PFL中的个性化过程可以稀释注入个性化本地模型的后门毒害效应。此外，PFL系统还可以部署服务器端和客户端防御机制来加强对后门攻击的屏障。另一方面，我们的研究显示，装备这些防御方法的PFL可能会提供一种虚假的安全感。我们提出了一种名为PFedBA的隐蔽有效的后门攻击策略，适用于PFL系统。PFedBA通过优化触发器生成过程，巧妙地将后门学习任务与PFL的主要学习任务对齐。我们的综合实验展示了PFedBA在将触发器无缝嵌入个性化本地模型中的有效性。PFedBA在10种最先进的PFL算法中取得了出色的攻击表现，击败了现有的6种防御机制。我们的研究揭示了PFL系统面临微妙但强大的后门威胁，敦促社区加强对新兴后门挑战的防御。

更新时间: 2024-06-10 12:14:05

领域: cs.LG,cs.CR

下载: http://arxiv.org/abs/2406.06207v1

CountCLIP -- [Re] Teaching CLIP to Count to Ten

Large vision-language models (VLMs) are shown to learn rich joint image-text representations enabling high performances in relevant downstream tasks. However, they fail to showcase their quantitative understanding of objects, and they lack good counting-aware representation. This paper conducts a reproducibility study of 'Teaching CLIP to Count to Ten' (Paiss et al., 2023), which presents a method to finetune a CLIP model (Radford et al., 2021) to improve zero-shot counting accuracy in an image while maintaining the performance for zero-shot classification by introducing a counting-contrastive loss term. We improve the model's performance on a smaller subset of their training data with lower computational resources. We verify these claims by reproducing their study with our own code. The implementation can be found at https://github.com/SforAiDl/CountCLIP.

Updated: 2024-06-10 12:09:37

标题: CountCLIP -- [Re] 教CLIP数到十

摘要: 大型视觉语言模型（VLMs）被证明能够学习丰富的联合图像-文本表示，从而在相关的下游任务中取得高性能。然而，它们未能展示出对物体的定量理解，也缺乏良好的计数感知表示。本文对《教CLIP数到十》（Paiss等，2023）进行了一项可重现性研究，该研究提出了一种方法，通过引入计数对比损失项来微调CLIP模型（Radford等，2021），以提高图像的零样本计数准确性，并同时保持零样本分类的性能。我们在较小子集的训练数据上，利用较低的计算资源提高了模型的性能。我们通过使用我们自己的代码重现他们的研究来验证这些声明。实现代码可以在https://github.com/SforAiDl/CountCLIP找到。

更新时间: 2024-06-10 12:09:37

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2406.03586v2

BrainChat: Decoding Semantic Information from fMRI using Vision-language Pretrained Models

Semantic information is vital for human interaction, and decoding it from brain activity enables non-invasive clinical augmentative and alternative communication. While there has been significant progress in reconstructing visual images, few studies have focused on the language aspect. To address this gap, leveraging the powerful capabilities of the decoder-based vision-language pretrained model CoCa, this paper proposes BrainChat, a simple yet effective generative framework aimed at rapidly accomplishing semantic information decoding tasks from brain activity, including fMRI question answering and fMRI captioning. BrainChat employs the self-supervised approach of Masked Brain Modeling to encode sparse fMRI data, obtaining a more compact embedding representation in the latent space. Subsequently, BrainChat bridges the gap between modalities by applying contrastive loss, resulting in aligned representations of fMRI, image, and text embeddings. Furthermore, the fMRI embeddings are mapped to the generative Brain Decoder via cross-attention layers, where they guide the generation of textual content about fMRI in a regressive manner by minimizing caption loss. Empirically, BrainChat exceeds the performance of existing state-of-the-art methods in the fMRI captioning task and, for the first time, implements fMRI question answering. Additionally, BrainChat is highly flexible and can achieve high performance without image data, making it better suited for real-world scenarios with limited data.

Updated: 2024-06-10 12:06:15

标题: BrainChat：使用视觉-语言预训练模型从fMRI解码语义信息

摘要: 语义信息对人类交互至关重要，从大脑活动中解码它使非侵入性临床辅助和替代沟通成为可能。虽然在重建视觉图像方面取得了重大进展，但很少有研究关注语言方面。为填补这一空白，利用基于解码器的视觉-语言预训练模型CoCa的强大功能，本文提出了BrainChat，这是一个简单而有效的生成框架，旨在快速完成来自大脑活动的语义信息解码任务，包括fMRI问题回答和fMRI字幕。BrainChat采用自监督方法Masked Brain Modeling来编码稀疏的fMRI数据，在潜在空间中获得更紧凑的嵌入表示。随后，BrainChat通过应用对比损失来弥合模态之间的差距，导致fMRI、图像和文本嵌入的对齐表示。此外，fMRI嵌入被映射到生成式Brain Decoder通过交叉注意力层，通过最小化字幕损失，以回归方式指导有关fMRI的文本内容的生成。实证上，BrainChat在fMRI字幕任务中超越了现有的最先进方法，并且首次实现了fMRI问题回答。此外，BrainChat非常灵活，可以在没有图像数据的情况下实现高性能，使其更适合具有有限数据的现实场景。

更新时间: 2024-06-10 12:06:15

领域: cs.CV,cs.AI,cs.CL

下载: http://arxiv.org/abs/2406.07584v1

Federated learning in food research

Research in the food domain is at times limited due to data sharing obstacles, such as data ownership, privacy requirements, and regulations. While important, these obstacles can restrict data-driven methods such as machine learning. Federated learning, the approach of training models on locally kept data and only sharing the learned parameters, is a potential technique to alleviate data sharing obstacles. This systematic review investigates the use of federated learning within the food domain, structures included papers in a federated learning framework, highlights knowledge gaps, and discusses potential applications. A total of 41 papers were included in the review. The current applications include solutions to water and milk quality assessment, cybersecurity of water processing, pesticide residue risk analysis, weed detection, and fraud detection, focusing on centralized horizontal federated learning. One of the gaps found was the lack of vertical or transfer federated learning and decentralized architectures.

Updated: 2024-06-10 11:58:11

标题: 食品研究中的联邦学习

摘要: 在食品领域的研究有时受到数据共享障碍的限制，例如数据所有权、隐私要求和法规。虽然重要，但这些障碍可能限制机器学习等数据驱动方法。联邦学习是一种在本地数据上训练模型并仅共享学习参数的方法，是减轻数据共享障碍的潜在技术。本系统评估调查了在食品领域内使用联邦学习的情况，将包括的论文结构化为联邦学习框架，突出了知识空白，并讨论了潜在应用。总共有41篇论文被纳入了这个评估。目前的应用包括解决水和牛奶质量评估、水处理的网络安全、农药残留风险分析、杂草检测和诈骗检测等问题，重点是集中式水平联邦学习。发现的一个空白是缺乏垂直或转移联邦学习和分散式架构。

更新时间: 2024-06-10 11:58:11

领域: cs.LG

下载: http://arxiv.org/abs/2406.06202v1

2DP-2MRC: 2-Dimensional Pointer-based Machine Reading Comprehension Method for Multimodal Moment Retrieval

Moment retrieval aims to locate the most relevant moment in an untrimmed video based on a given natural language query. Existing solutions can be roughly categorized into moment-based and clip-based methods. The former often involves heavy computations, while the latter, due to overlooking coarse-grained information, typically underperforms compared to moment-based models. Hence, this paper proposes a novel 2-Dimensional Pointer-based Machine Reading Comprehension for Moment Retrieval Choice (2DP-2MRC) model to address the issue of imprecise localization in clip-based methods while maintaining lower computational complexity than moment-based methods. Specifically, we introduce an AV-Encoder to capture coarse-grained information at moment and video levels. Additionally, a 2D pointer encoder module is introduced to further enhance boundary detection for target moment. Extensive experiments on the HiREST dataset demonstrate that 2DP-2MRC significantly outperforms existing baseline models.

Updated: 2024-06-10 11:53:29

标题: 2DP-2MRC: 用于多模态时刻检索的基于二维指针的机器阅读理解方法

摘要: Moment retrieval旨在根据给定的自然语言查询在未修剪的视频中定位最相关的时刻。现有解决方案大致可分为基于时刻和基于片段的方法。前者通常涉及大量计算，而后者由于忽视粗粒度信息，通常表现不如基于时刻的模型。因此，本文提出了一种新颖的基于二维指针的机器阅读理解用于时刻检索选择（2DP-2MRC）模型，以解决基于片段方法中不精确定位的问题，同时保持比基于时刻方法更低的计算复杂度。具体来说，我们引入了一个AV-Encoder来捕捉时刻和视频级别的粗粒度信息。此外，还引入了一个二维指针编码器模块，进一步增强目标时刻的边界检测。对HiREST数据集进行的大量实验表明，2DP-2MRC明显优于现有的基准模型。

更新时间: 2024-06-10 11:53:29

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2406.06201v1

Implications for Governance in Public Perceptions of Societal-scale AI Risks

Amid growing concerns over AI's societal risks--ranging from civilizational collapse to misinformation and systemic bias--this study explores the perceptions of AI experts and the general US registered voters on the likelihood and impact of 18 specific AI risks, alongside their policy preferences for managing these risks. While both groups favor international oversight over national or corporate governance, our survey reveals a discrepancy: voters perceive AI risks as both more likely and more impactful than experts, and also advocate for slower AI development. Specifically, our findings indicate that policy interventions may best assuage collective concerns if they attempt to more carefully balance mitigation efforts across all classes of societal-scale risks, effectively nullifying the near-vs-long-term debate over AI risks. More broadly, our results will serve not only to enable more substantive policy discussions for preventing and mitigating AI risks, but also to underscore the challenge of consensus building for effective policy implementation.

Updated: 2024-06-10 11:52:25

标题: 公众对社会规模人工智能风险的看法对治理的影响

摘要: 在人工智能的社会风险日益引起关注的背景下，从文明崩溃到错误信息和系统偏见等范围广泛的风险，本研究探讨了人工智能专家和美国注册选民对18种具体人工智能风险的可能性和影响的认知，以及他们对管理这些风险的政策偏好。虽然两组都支持国际监督而不是国家或企业治理，但我们的调查显示了一种差异：选民认为人工智能风险比专家更可能发生，对影响也更大，并主张减缓人工智能发展。具体而言，我们的研究结果表明，政策干预可能最好地缓解集体担忧，如果这些政策尝试更加谨慎地平衡在所有社会规模风险类别上的减轻努力，有效地消除关于人工智能风险的近期与长期辩论。更广泛地说，我们的结果不仅将促使更具实质性的政策讨论以预防和减轻人工智能风险，而且将强调共识建立对有效政策实施的挑战。

更新时间: 2024-06-10 11:52:25

领域: cs.CY,cs.AI

下载: http://arxiv.org/abs/2406.06199v1

Space-Time Continuous PDE Forecasting using Equivariant Neural Fields

Recently, Conditional Neural Fields (NeFs) have emerged as a powerful modelling paradigm for PDEs, by learning solutions as flows in the latent space of the Conditional NeF. Although benefiting from favourable properties of NeFs such as grid-agnosticity and space-time-continuous dynamics modelling, this approach limits the ability to impose known constraints of the PDE on the solutions -- e.g. symmetries or boundary conditions -- in favour of modelling flexibility. Instead, we propose a space-time continuous NeF-based solving framework that - by preserving geometric information in the latent space - respects known symmetries of the PDE. We show that modelling solutions as flows of pointclouds over the group of interest $G$ improves generalization and data-efficiency. We validated that our framework readily generalizes to unseen spatial and temporal locations, as well as geometric transformations of the initial conditions - where other NeF-based PDE forecasting methods fail - and improve over baselines in a number of challenging geometries.

Updated: 2024-06-10 11:49:11

标题: 使用等变神经场的时空连续PDE预测

摘要: 最近，条件神经场（NeFs）已经成为PDE的强大建模范式，通过在条件NeF的潜在空间中学习解作为流动。尽管受益于NeFs的诸如与网格无关性和空间-时间连续动态建模等有利特性，这种方法限制了在解上施加已知PDE约束的能力，例如对称性或边界条件，以换取建模灵活性。相反，我们提出了一个基于空间-时间连续NeF的求解框架，通过在潜在空间中保留几何信息，尊重PDE的已知对称性。我们展示将解建模为在感兴趣的群$G$上的点云流会改善泛化和数据效率。我们验证了我们的框架能够轻松泛化到未见的空间和时间位置，以及初始条件的几何变换，而其他基于NeF的PDE预测方法失败，并在许多具有挑战性的几何形状上提高了基线水平。

更新时间: 2024-06-10 11:49:11

领域: cs.LG,cs.AI,cs.NE

下载: http://arxiv.org/abs/2406.06660v1

Demonstration-Regularized RL

Incorporating expert demonstrations has empirically helped to improve the sample efficiency of reinforcement learning (RL). This paper quantifies theoretically to what extent this extra information reduces RL's sample complexity. In particular, we study the demonstration-regularized reinforcement learning that leverages the expert demonstrations by KL-regularization for a policy learned by behavior cloning. Our findings reveal that using $N^{\mathrm{E}}$ expert demonstrations enables the identification of an optimal policy at a sample complexity of order $\widetilde{O}(\mathrm{Poly}(S,A,H)/(\varepsilon^2 N^{\mathrm{E}}))$ in finite and $\widetilde{O}(\mathrm{Poly}(d,H)/(\varepsilon^2 N^{\mathrm{E}}))$ in linear Markov decision processes, where $\varepsilon$ is the target precision, $H$ the horizon, $A$ the number of action, $S$ the number of states in the finite case and $d$ the dimension of the feature space in the linear case. As a by-product, we provide tight convergence guarantees for the behaviour cloning procedure under general assumptions on the policy classes. Additionally, we establish that demonstration-regularized methods are provably efficient for reinforcement learning from human feedback (RLHF). In this respect, we provide theoretical evidence showing the benefits of KL-regularization for RLHF in tabular and linear MDPs. Interestingly, we avoid pessimism injection by employing computationally feasible regularization to handle reward estimation uncertainty, thus setting our approach apart from the prior works.

Updated: 2024-06-10 11:46:34

标题: 演示规范化的强化学习

摘要: 将专家演示纳入到强化学习（RL）中在实践中已经被证明可以提高样本效率。本文从理论上量化了这种额外信息在降低RL样本复杂度方面的程度。特别地，我们研究了利用KL正则化的演示正则化强化学习，通过行为克隆学习的策略利用专家演示。我们的研究发现，使用$N^{\mathrm{E}}$个专家演示可以在有限马尔可夫决策过程中以$\widetilde{O}(\mathrm{Poly}(S,A,H)/(\varepsilon^2 N^{\mathrm{E}}))$的样本复杂度顺序识别出最优策略，在线性马尔可夫决策过程中以$\widetilde{O}(\mathrm{Poly}(d,H)/(\varepsilon^2 N^{\mathrm{E}}))$的顺序。其中，$\varepsilon$为目标精度，$H$为时间跨度，$A$为动作数，在有限情况下$S$为状态数，在线性情况下$d$为特征空间的维数。作为副产品，我们在对策略类别做出一般假设的情况下提供了行为克隆过程的收敛保证。此外，我们证明了演示正则化方法在从人类反馈中进行强化学习（RLHF）方面是有效的。在这方面，我们提供了理论证据，展示了KL正则化在表格化和线性MDP中对RLHF的益处。有趣的是，我们通过使用计算可行的正则化方法来处理奖励估计不确定性，从而使我们的方法与先前的工作有所不同，避免了悲观注入。

更新时间: 2024-06-10 11:46:34

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2310.17303v2

AI Cat Narrator: Designing an AI Tool for Exploring the Shared World and Social Connection with a Cat

As technology continues to advance, the interaction between humans and cats is becoming more diverse. Our research introduces a new tool called the AI Cat Narrator, which offers a unique perspective on the shared lives of humans and cats. We combined the method of ethnography with fictional storytelling, using a defamiliarization strategy to merge real-world data seen through the eyes of cats with excerpts from cat literature. This combination serves as the foundation for a database to instruct the AI Cat Narrator in crafting alternative narrative. Our findings indicate that using defamiliarized data for training purposes significantly contributes to the development of characters that are both more empathetic and individualized. The contributions of our study are twofold: 1) proposing an innovative approach to prompting a reevaluation of living alongside cats; 2) establishing a collaborative, exploratory tool developed by humans, cats, and AI together.

Updated: 2024-06-10 11:44:15

标题: AI猫叙述者：设计一款用于探索与猫共享世界和社会联系的人工智能工具

摘要: 随着技术的不断进步，人类与猫之间的互动变得更加多样化。我们的研究引入了一种名为AI猫叙述者的新工具，该工具提供了人类和猫共同生活的独特视角。我们将民族志方法与虚构叙事相结合，使用一种疏远化策略，将通过猫的眼睛看到的真实世界数据与猫文学摘录相融合。这种组合构成了一个数据库的基础，以指导AI猫叙述者在创作替代叙事方面。我们的研究发现表明，使用疏远化的数据进行训练目的显著促进了塑造更具移情和个性化的角色。我们研究的贡献是双重的：1）提出了促使重新评估与猫共同生活的创新方法；2）建立了一个由人类、猫和AI共同开发的协作性、探索性工具。

更新时间: 2024-06-10 11:44:15

领域: cs.HC,cs.AI

下载: http://arxiv.org/abs/2406.06192v1

CPsyCoun: A Report-based Multi-turn Dialogue Reconstruction and Evaluation Framework for Chinese Psychological Counseling

Using large language models (LLMs) to assist psychological counseling is a significant but challenging task at present. Attempts have been made on improving empathetic conversations or acting as effective assistants in the treatment with LLMs. However, the existing datasets lack consulting knowledge, resulting in LLMs lacking professional consulting competence. Moreover, how to automatically evaluate multi-turn dialogues within the counseling process remains an understudied area. To bridge the gap, we propose CPsyCoun, a report-based multi-turn dialogue reconstruction and evaluation framework for Chinese psychological counseling. To fully exploit psychological counseling reports, a two-phase approach is devised to construct high-quality dialogues while a comprehensive evaluation benchmark is developed for the effective automatic evaluation of multi-turn psychological consultations. Competitive experimental results demonstrate the effectiveness of our proposed framework in psychological counseling. We open-source the datasets and model for future research at https://github.com/CAS-SIAT-XinHai/CPsyCoun

Updated: 2024-06-10 11:43:48

标题: 《CPsyCoun：基于报告的中文心理咨询多轮对话重建和评估框架》

摘要: 目前利用大型语言模型（LLMs）辅助心理咨询是一个重要但具有挑战性的任务。目前已经尝试改进具有同理心的对话或作为LLMs在治疗中有效助手的做法。然而，现有数据集缺乏咨询知识，导致LLMs缺乏专业咨询能力。此外，在咨询过程中如何自动评估多轮对话仍是一个研究不足的领域。为了弥补这一差距，我们提出了CPsyCoun，一个基于报告的中文心理咨询多轮对话重建和评估框架。为了充分利用心理咨询报告，我们设计了一个两阶段方法来构建高质量的对话，同时开发了一个全面的评估基准，用于有效地自动评估多轮心理咨询。竞争性实验结果证明了我们提出的框架在心理咨询中的有效性。我们向未来研究开放了数据集和模型，网址为https://github.com/CAS-SIAT-XinHai/CPsyCoun。

更新时间: 2024-06-10 11:43:48

领域: cs.CL,cs.AI,cs.CY

下载: http://arxiv.org/abs/2405.16433v3

Solving Inverse Problems with Model Mismatch using Untrained Neural Networks within Model-based Architectures

Model-based deep learning methods such as loop unrolling (LU) and deep equilibrium model}(DEQ) extensions offer outstanding performance in solving inverse problems (IP). These methods unroll the optimization iterations into a sequence of neural networks that in effect learn a regularization function from data. While these architectures are currently state-of-the-art in numerous applications, their success heavily relies on the accuracy of the forward model. This assumption can be limiting in many physical applications due to model simplifications or uncertainties in the apparatus. To address forward model mismatch, we introduce an untrained forward model residual block within the model-based architecture to match the data consistency in the measurement domain for each instance. We propose two variants in well-known model-based architectures (LU and DEQ) and prove convergence under mild conditions. Our approach offers a unified solution that is less parameter-sensitive, requires no additional data, and enables simultaneous fitting of the forward model and reconstruction in a single pass, benefiting both linear and nonlinear inverse problems. The experiments show significant quality improvement in removing artifacts and preserving details across three distinct applications, encompassing both linear and nonlinear inverse problems. Moreover, we highlight reconstruction effectiveness in intermediate steps and showcase robustness to random initialization of the residual block and a higher number of iterations during evaluation. Code is available at \texttt{https://github.com/InvProbs/A-adaptive-model-based-methods}.

Updated: 2024-06-10 11:43:17

标题: 使用模型不匹配的未经训练的神经网络在基于模型的架构中解决反问题

摘要: 基于模型的深度学习方法，如循环展开（LU）和深度平衡模型（DEQ）扩展，在解决逆问题（IP）方面表现出色。这些方法将优化迭代展开为一系列神经网络，实际上从数据中学习了一个正则化函数。尽管这些架构目前在许多应用中处于领先地位，但它们的成功在很大程度上取决于前向模型的准确性。这种假设在许多物理应用中可能受到限制，原因是模型简化或仪器不确定性。为了解决前向模型不匹配的问题，我们在基于模型的架构中引入了一个未经训练的前向模型残差块，以匹配测量领域中每个实例的数据一致性。我们提出了两种在众所周知的基于模型的架构（LU和DEQ）中的变体，并在温和条件下证明了收敛性。我们的方法提供了一个统一的解决方案，对参数不敏感，不需要额外的数据，并能够在一个步骤中同时拟合前向模型和重构，从而有利于线性和非线性逆问题。实验证明，在三个不同应用中去除伪影并保留细节的质量显着提高，涵盖了线性和非线性逆问题。此外，我们突出了中间步骤中的重构有效性，并展示了残差块的随机初始化和评估过程中更多迭代次数的稳健性。代码可在\texttt{https://github.com/InvProbs/A-adaptive-model-based-methods}上找到。

更新时间: 2024-06-10 11:43:17

领域: cs.LG,eess.SP

下载: http://arxiv.org/abs/2403.04847v2

Quantized Approximately Orthogonal Recurrent Neural Networks

In recent years, Orthogonal Recurrent Neural Networks (ORNNs) have gained popularity due to their ability to manage tasks involving long-term dependencies, such as the copy-task, and their linear complexity. However, existing ORNNs utilize full precision weights and activations, which prevents their deployment on compact devices.In this paper, we explore the quantization of the weight matrices in ORNNs, leading to Quantized approximately Orthogonal RNNs (QORNNs). The construction of such networks remained an open problem, acknowledged for its inherent instability. We propose and investigate two strategies to learn QORNN by combining quantization-aware training (QAT) and orthogonal projections. We also study post-training quantization of the activations for pure integer computation of the recurrent loop. The most efficient models achieve results similar to state-of-the-art full-precision ORNN, LSTM and FastRNN on a variety of standard benchmarks, even with 4-bits quantization.

Updated: 2024-06-10 11:40:40

标题: 量化近似正交循环神经网络

摘要: 近年来，正交循环神经网络（ORNNs）因其能够处理涉及长期依赖关系的任务（如复制任务）以及其线性复杂度而受到欢迎。然而，现有的ORNNs利用完整精度的权重和激活，这阻碍了它们在紧凑设备上的部署。在本文中，我们探讨了ORNNs中权重矩阵的量化，从而导致了近似正交的量化循环神经网络（QORNNs）。这样网络的构建一直是一个未解决的问题，因为它被认为具有固有的不稳定性。我们提出并研究了两种策略来学习QORNN，即结合量化感知训练（QAT）和正交投影。我们还研究了对激活进行后训练量化，以进行循环环路的纯整数计算。最有效的模型在各种标准基准上实现了与最先进的完整精度ORNN、LSTM和FastRNN类似的结果，即使使用4位量化也是如此。

更新时间: 2024-06-10 11:40:40

领域: cs.NE,cs.AI,cs.LG,eess.SP,math.ST,stat.TH

下载: http://arxiv.org/abs/2402.04012v2

AMED: Automatic Mixed-Precision Quantization for Edge Devices

Quantized neural networks are well known for reducing the latency, power consumption, and model size without significant harm to the performance. This makes them highly appropriate for systems with limited resources and low power capacity. Mixed-precision quantization offers better utilization of customized hardware that supports arithmetic operations at different bitwidths. Quantization methods either aim to minimize the compression loss given a desired reduction or optimize a dependent variable for a specified property of the model (such as FLOPs or model size); both make the performance inefficient when deployed on specific hardware, but more importantly, quantization methods assume that the loss manifold holds a global minimum for a quantized model that copes with the global minimum of the full precision counterpart. Challenging this assumption, we argue that the optimal minimum changes as the precision changes, and thus, it is better to look at quantization as a random process, placing the foundation for a different approach to quantize neural networks, which, during the training procedure, quantizes the model to a different precision, looks at the bit allocation as a Markov Decision Process, and then, finds an optimal bitwidth allocation for measuring specified behaviors on a specific device via direct signals from the particular hardware architecture. By doing so, we avoid the basic assumption that the loss behaves the same way for a quantized model. Automatic Mixed-Precision Quantization for Edge Devices (dubbed AMED) demonstrates its superiority over current state-of-the-art schemes in terms of the trade-off between neural network accuracy and hardware efficiency, backed by a comprehensive evaluation.

Updated: 2024-06-10 11:35:42

标题: AMED：边缘设备的自动混合精度量化

摘要: 量子化神经网络以减少延迟、功耗和模型大小而闻名，且对性能影响不大。这使得它们非常适合资源有限、功耗低的系统。混合精度量化可以更好地利用支持不同位宽算术运算的定制硬件。量化方法要么旨在在给定期望减少的压缩损失，要么在指定模型属性（如FLOPs或模型大小）的情况下优化一个相关变量；这两种方法在特定硬件上部署时都会使性能低效，但更重要的是，量化方法假设损失流形保持与完整精度对应物的全局最小值相匹配。挑战这个假设，我们认为最优最小值随着精度变化而变化，因此，最好将量化视为一个随机过程，为量化神经网络的不同方法奠定基础，其在训练过程中将模型量化为不同精度，并将位分配视为马尔可夫决策过程，然后通过来自特定硬件架构的直接信号找到一种对于特定设备上测量指定行为的最佳位宽分配。通过这样做，我们避免了损失对于量化模型的行为是相同的基本假设。边缘设备的自动混合精度量化（称为AMED）在神经网络精度与硬件效率之间的权衡上表现出优越性，得到了全面评估的支持，背书了其优于当前最先进方案的地位。

更新时间: 2024-06-10 11:35:42

领域: cs.LG

下载: http://arxiv.org/abs/2205.15437v2

Moderate Adaptive Linear Units (MoLU)

We propose a new high-performance activation function, Moderate Adaptive Linear Units (MoLU), for the deep neural network. The MoLU is a simple, beautiful and powerful activation function that can be a good main activation function among hundreds of activation functions. Because the MoLU is made up of the elementary functions, not only it is a infinite diffeomorphism (i.e. smooth and infinitely differentiable over whole domains), but also it decreases training time.

Updated: 2024-06-10 11:32:24

标题: Moderate Adaptive Linear Units (MoLU)的翻译是：中等自适应线性单元 (MoLU)

摘要: 我们提出了一种新的高性能激活函数，适度自适应线性单元（MoLU），用于深度神经网络。MoLU是一种简单、美观和强大的激活函数，可以成为数百种激活函数中的主要激活函数。由于MoLU由基本函数组成，不仅是一个无限可微分的同胚映射（即在整个定义域上是平滑且无限可微），而且它还可以减少训练时间。

更新时间: 2024-06-10 11:32:24

领域: cs.LG,cs.AI,cs.GT,cs.NE

下载: http://arxiv.org/abs/2302.13696v4

A Survey on Machine Unlearning: Techniques and New Emerged Privacy Risks

The explosive growth of machine learning has made it a critical infrastructure in the era of artificial intelligence. The extensive use of data poses a significant threat to individual privacy. Various countries have implemented corresponding laws, such as GDPR, to protect individuals' data privacy and the right to be forgotten. This has made machine unlearning a research hotspot in the field of privacy protection in recent years, with the aim of efficiently removing the contribution and impact of individual data from trained models. The research in academia on machine unlearning has continuously enriched its theoretical foundation, and many methods have been proposed, targeting different data removal requests in various application scenarios. However, recently researchers have found potential privacy leakages of various of machine unlearning approaches, making the privacy preservation on machine unlearning area a critical topic. This paper provides an overview and analysis of the existing research on machine unlearning, aiming to present the current vulnerabilities of machine unlearning approaches. We analyze privacy risks in various aspects, including definitions, implementation methods, and real-world applications. Compared to existing reviews, we analyze the new challenges posed by the latest malicious attack techniques on machine unlearning from the perspective of privacy threats. We hope that this survey can provide an initial but comprehensive discussion on this new emerging area.

Updated: 2024-06-10 11:31:04

标题: 一项关于机器遗忘的调查：技术和新出现的隐私风险

摘要: 机器学习的爆炸性增长使其在人工智能时代成为重要基础设施。对数据的广泛使用对个人隐私构成重大威胁。各国已实施了相应的法律，如《GDPR》，以保护个人数据隐私和被遗忘权。这使得机器去学习成为近年来隐私保护领域的研究热点，旨在有效地从经过训练的模型中移除个人数据的贡献和影响。学术界对机器去学习的研究不断丰富其理论基础，并提出了许多方法，针对不同的数据删除请求在各种应用场景中。然而，最近研究人员发现了各种机器去学习方法的潜在隐私泄漏，使得机器去学习领域的隐私保护成为一个关键话题。本文概述和分析了关于机器去学习的现有研究，旨在呈现机器去学习方法的当前漏洞。我们分析了各个方面的隐私风险，包括定义、实施方法和实际应用。与现有评论相比，我们从隐私威胁的角度分析了最新恶意攻击技术对机器去学习提出的新挑战。我们希望这项调查能对这一新兴领域进行初步但全面的讨论。

更新时间: 2024-06-10 11:31:04

领域: cs.CR

下载: http://arxiv.org/abs/2406.06186v1

When predict can also explain: few-shot prediction to select better neural latents

Latent variable models serve as powerful tools to infer underlying dynamics from observed neural activity. However, due to the absence of ground truth data, prediction benchmarks are often employed as proxies. In this study, we reveal the limitations of the widely-used 'co-smoothing' prediction framework and propose an improved few-shot prediction approach that encourages more accurate latent dynamics. Utilizing a student-teacher setup with Hidden Markov Models, we demonstrate that the high co-smoothing model space can encompass models with arbitrary extraneous dynamics within their latent representations. To address this, we introduce a secondary metric -- a few-shot version of co-smoothing. This involves performing regression from the latent variables to held-out channels in the data using fewer trials. Our results indicate that among models with near-optimal co-smoothing, those with extraneous dynamics underperform in the few-shot co-smoothing compared to 'minimal' models devoid of such dynamics. We also provide analytical insights into the origin of this phenomenon. We further validate our findings on real neural data using two state-of-the-art methods: LFADS and STNDT. In the absence of ground truth, we suggest a proxy measure to quantify extraneous dynamics. By cross-decoding the latent variables of all model pairs with high co-smoothing, we identify models with minimal extraneous dynamics. We find a correlation between few-shot co-smoothing performance and this new measure. In summary, we present a novel prediction metric designed to yield latent variables that more accurately reflect the ground truth, offering a significant improvement for latent dynamics inference.

Updated: 2024-06-10 11:30:28

标题: 当预测也能解释：少样本预测选择更好的神经潜变量

摘要: 潜变量模型是推断观察到的神经活动背后动态的强大工具。然而，由于缺乏真实数据，预测基准常常被用作代理。在本研究中，我们揭示了广泛使用的“共平滑”预测框架的局限性，并提出了一种改进的少样本预测方法，鼓励更准确的潜在动态。利用隐藏马尔可夫模型的学生-教师设置，我们证明高共平滑模型空间可以包含具有任意无关动态的模型在其潜在表示中。为了解决这个问题，我们引入了一个次要指标 -- 共平滑的少样本版本。这涉及使用更少的试验从潜变量到数据中的保留通道进行回归。我们的结果表明，在具有接近最优共平滑的模型中，具有无关动态的模型在少样本共平滑中表现不佳，而“最小”模型则不具备这种动态。我们还就这一现象的起源提供了分析见解。我们进一步验证了我们在真实神经数据上的发现，使用两种最先进的方法：LFADS和STNDT。在没有真实数据的情况下，我们建议使用代理指标来量化无关动态。通过交叉解码所有具有高共平滑的模型对的潜变量，我们确定了具有最小无关动态的模型。我们发现少样本共平滑性能与这个新指标之间存在相关性。总之，我们提出了一种新的预测度量标准，旨在产生更准确反映真实情况的潜变量，为潜在动态推断提供了显著改进。

更新时间: 2024-06-10 11:30:28

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2405.14425v2

EARS: An Anechoic Fullband Speech Dataset Benchmarked for Speech Enhancement and Dereverberation

We release the EARS (Expressive Anechoic Recordings of Speech) dataset, a high-quality speech dataset comprising 107 speakers from diverse backgrounds, totaling in 100 hours of clean, anechoic speech data. The dataset covers a large range of different speaking styles, including emotional speech, different reading styles, non-verbal sounds, and conversational freeform speech. We benchmark various methods for speech enhancement and dereverberation on the dataset and evaluate their performance through a set of instrumental metrics. In addition, we conduct a listening test with 20 participants for the speech enhancement task, where a generative method is preferred. We introduce a blind test set that allows for automatic online evaluation of uploaded data. Dataset download links and automatic evaluation server can be found online.

Updated: 2024-06-10 11:28:29

标题: EARS：一个用于语音增强和去混响基准测试的无混响全频语音数据集

摘要: 我们发布了EARS（Expressive Anechoic Recordings of Speech）数据集，这是一个高质量的语音数据集，包括来自不同背景的107名发言者，总共100小时的清晰、无混响的语音数据。该数据集涵盖了各种不同的说话风格，包括情绪性语音、不同的阅读风格、非语言声音和自由对话式语音。我们在数据集上对语音增强和去混响的各种方法进行基准测试，并通过一组仪器指标评估它们的性能。此外，我们为语音增强任务进行了一个听觉测试，有20名参与者参与其中，其中一种生成方法被优先选择。我们引入了一个盲测试集，允许对上传的数据进行自动在线评估。数据集下载链接和自动评估服务器可在网上找到。

更新时间: 2024-06-10 11:28:29

领域: eess.AS,cs.LG,cs.SD

下载: http://arxiv.org/abs/2406.06185v1

Deep Multi-Objective Reinforcement Learning for Utility-Based Infrastructural Maintenance Optimization

In this paper, we introduce Multi-Objective Deep Centralized Multi-Agent Actor-Critic (MO- DCMAC), a multi-objective reinforcement learning (MORL) method for infrastructural maintenance optimization, an area traditionally dominated by single-objective reinforcement learning (RL) approaches. Previous single-objective RL methods combine multiple objectives, such as probability of collapse and cost, into a singular reward signal through reward-shaping. In contrast, MO-DCMAC can optimize a policy for multiple objectives directly, even when the utility function is non-linear. We evaluated MO-DCMAC using two utility functions, which use probability of collapse and cost as input. The first utility function is the Threshold utility, in which MO-DCMAC should minimize cost so that the probability of collapse is never above the threshold. The second is based on the Failure Mode, Effects, and Criticality Analysis (FMECA) methodology used by asset managers to asses maintenance plans. We evaluated MO-DCMAC, with both utility functions, in multiple maintenance environments, including ones based on a case study of the historical quay walls of Amsterdam. The performance of MO-DCMAC was compared against multiple rule-based policies based on heuristics currently used for constructing maintenance plans. Our results demonstrate that MO-DCMAC outperforms traditional rule-based policies across various environments and utility functions.

Updated: 2024-06-10 11:28:25

标题: 基于效用的基础设施维护优化的深度多目标强化学习

摘要: 在本文中，我们介绍了多目标深度集中式多智能体演员-评论家（MO-DCMAC），这是一种基础设施维护优化的多目标强化学习（MORL）方法，传统上由单目标强化学习（RL）方法主导。先前的单目标RL方法通过奖励塑造将多个目标（如倒塌概率和成本）组合成一个奖励信号。相比之下，MO-DCMAC可以直接为多个目标优化策略，即使效用函数是非线性的。我们使用两个效用函数评估了MO-DCMAC，这两个效用函数使用倒塌概率和成本作为输入。第一个效用函数是阈值效用，其中MO-DCMAC应该最小化成本，以便倒塌概率永远不超过阈值。第二个基于故障模式、影响和重要性分析（FMECA）方法，资产管理者用于评估维护计划。我们在多个维护环境中评估了MO-DCMAC，包括基于阿姆斯特丹历史码头墙案例研究的环境。MO-DCMAC的性能与基于启发式当前用于构建维护计划的多个基于规则的政策进行了比较。我们的结果表明，MO-DCMAC在各种环境和效用函数中优于传统的基于规则的政策。

更新时间: 2024-06-10 11:28:25

领域: cs.AI,cs.LG

下载: http://arxiv.org/abs/2406.06184v1

Guided Diffusion for Fast Inverse Design of Density-based Mechanical Metamaterials

Mechanical metamaterial is a synthetic material that can possess extraordinary physical characteristics, such as abnormal elasticity, stiffness, and stability, by carefully designing its internal structure. To make metamaterials contain delicate local structures with unique mechanical properties, it is a potential method to represent them through high-resolution voxels. However, it brings a substantial computational burden. To this end, this paper proposes a fast inverse design method, whose core is an advanced deep generative AI algorithm, to generate voxel-based mechanical metamaterials. Specifically, we use the self-conditioned diffusion model, capable of generating a microstructure with a resolution of $128^3$ to approach the specified homogenized tensor matrix in just 3 seconds. Accordingly, this rapid reverse design tool facilitates the exploration of extreme metamaterials, the sequence interpolation in metamaterials, and the generation of diverse microstructures for multi-scale design. This flexible and adaptive generative tool is of great value in structural engineering or other mechanical systems and can stimulate more subsequent research.

Updated: 2024-06-10 11:26:14

标题: 引导扩散用于基于密度的机械超材料快速逆向设计

摘要: 机械超材料是一种合成材料，通过精心设计其内部结构，可以具有非凡的物理特性，如异常的弹性、刚度和稳定性。为了使超材料包含具有独特机械性能的精细局部结构，通过高分辨率的体素来表示它们是一种潜在的方法。然而，这带来了巨大的计算负担。因此，本文提出了一种快速逆向设计方法，其核心是一种先进的深度生成人工智能算法，用于生成基于体素的机械超材料。具体来说，我们使用了自条件扩散模型，能够在仅3秒内生成分辨率为$128^3$的微结构，以接近指定的均质张量矩阵。因此，这种快速逆向设计工具促进了对极端超材料的探索，超材料中的序列插值，以及多尺度设计的各种微结构的生成。这种灵活和适应性的生成工具在结构工程或其他机械系统中具有巨大的价值，并可以激发更多的后续研究。

更新时间: 2024-06-10 11:26:14

领域: cs.CE,cs.LG

下载: http://arxiv.org/abs/2401.13570v2

Pipeline Parallelism with Controllable Memory

Pipeline parallelism has been widely explored, but most existing schedules lack a systematic methodology. In this paper, we propose a framework to decompose pipeline schedules as repeating a building block and we show that the lifespan of the building block decides the peak activation memory of the pipeline schedule. Guided by the observations, we find that almost all existing pipeline schedules, to the best of our knowledge, are memory inefficient. To address this, we introduce a family of memory efficient building blocks with controllable activation memory, which can reduce the peak activation memory to 1/2 of 1F1B without sacrificing efficiency, and even to 1/3 with comparable throughput. We can also achieve almost zero pipeline bubbles while maintaining the same activation memory as 1F1B. Our evaluations demonstrate that in pure pipeline parallelism settings, our methods outperform 1F1B by from 7% to 55% in terms of throughput. When employing a grid search over hybrid parallelism hyperparameters in practical scenarios, our proposed methods demonstrate a 16% throughput improvement over the 1F1B baseline for large language models.

Updated: 2024-06-10 11:24:06

标题: 具有可控内存的管道并行ism

摘要: 管道并行性已被广泛探索，但大多数现有的调度缺乏系统方法。在本文中，我们提出了一个框架来将管道调度分解为重复的构建块，并展示构建块的寿命决定了管道调度的峰值激活内存。在观察的指导下，我们发现据我们所知，几乎所有现有的管道调度都存在内存效率低的问题。为了解决这个问题，我们引入了一系列具有可控激活内存的内存高效构建块，可以将峰值激活内存减少到1F1B的一半，而不会牺牲效率，甚至可以减少到1/3，同时具有可比较的吞吐量。我们还可以在保持与1F1B相同激活内存的同时实现几乎零的管道泡沫。我们的评估表明，在纯管道并行设置中，我们的方法在吞吐量方面比1F1B高出7%至55%。在实际情景中通过对混合并行超参数进行网格搜索时，我们提出的方法在大型语言模型中相比1F1B基线实现了16%的吞吐量改善。

更新时间: 2024-06-10 11:24:06

领域: cs.LG,cs.CL,cs.DC

下载: http://arxiv.org/abs/2405.15362v3

Link Prediction in Bipartite Networks

Bipartite networks serve as highly suitable models to represent systems involving interactions between two distinct types of entities, such as online dating platforms, job search services, or ecommerce websites. These models can be leveraged to tackle a number of tasks, including link prediction among the most useful ones, especially to design recommendation systems. However, if this task has garnered much interest when conducted on unipartite (i.e. standard) networks, it is far from being the case for bipartite ones. In this study, we address this gap by performing an experimental comparison of 19 link prediction methods able to handle bipartite graphs. Some come directly from the literature, and some are adapted by us from techniques originally designed for unipartite networks. We also propose to repurpose recommendation systems based on graph convolutional networks (GCN) as a novel link prediction solution for bipartite networks. To conduct our experiments, we constitute a benchmark of 3 real-world bipartite network datasets with various topologies. Our results indicate that GCN-based personalized recommendation systems, which have received significant attention in recent years, can produce successful results for link prediction in bipartite networks. Furthermore, purely heuristic metrics that do not rely on any learning process, like the Structural Perturbation Method (SPM), can also achieve success.

Updated: 2024-06-10 11:23:30

标题: 双分网络中的链接预测

摘要: 双分网络是表示涉及两种不同类型实体之间相互作用的系统的非常适合的模型，如在线约会平台、求职服务或电子商务网站。这些模型可以用来解决许多任务，其中包括链接预测是最有用的之一，尤其是用于设计推荐系统。然而，尽管在单分（即标准）网络上进行这一任务已经引起了很多兴趣，但在双分网络上却远未达到这种情况。在本研究中，我们通过对能够处理双分图的19种链接预测方法进行实验比较来填补这一差距。其中一些直接来自文献，另一些是我们从最初设计用于单分网络的技术中进行了调整。我们还提出重新利用基于图卷积网络（GCN）的推荐系统作为双分网络的新型链接预测解决方案。为了进行实验，我们构建了一个包含3个真实世界双分网络数据集的基准，具有不同的拓扑结构。我们的结果表明，基于GCN的个性化推荐系统，在近年来受到广泛关注的情况下，可以为双分网络中的链接预测产生成功的结果。此外，纯启发式度量标准，如结构扰动方法（SPM），也可以取得成功，而不依赖于任何学习过程。

更新时间: 2024-06-10 11:23:30

领域: cs.SI,cs.AI,cs.IR

下载: http://arxiv.org/abs/2406.06658v1

Making Them Ask and Answer: Jailbreaking Large Language Models in Few Queries via Disguise and Reconstruction

In recent years, large language models (LLMs) have demonstrated notable success across various tasks, but the trustworthiness of LLMs is still an open problem. One specific threat is the potential to generate toxic or harmful responses. Attackers can craft adversarial prompts that induce harmful responses from LLMs. In this work, we pioneer a theoretical foundation in LLMs security by identifying bias vulnerabilities within the safety fine-tuning and design a black-box jailbreak method named DRA (Disguise and Reconstruction Attack), which conceals harmful instructions through disguise and prompts the model to reconstruct the original harmful instruction within its completion. We evaluate DRA across various open-source and closed-source models, showcasing state-of-the-art jailbreak success rates and attack efficiency. Notably, DRA boasts a 91.1% attack success rate on OpenAI GPT-4 chatbot.

Updated: 2024-06-10 11:20:43

标题: 让他们提问和回答：通过掩饰和重建在少数查询中越狱大型语言模型

摘要: 近年来，大型语言模型（LLMs）在各种任务中取得了显著的成功，但LLMs的可信度仍然是一个悬而未决的问题。一个特定的威胁是可能生成有毒或有害回应。攻击者可以制作对抗性提示，诱导LLMs产生有害回应。在这项工作中，我们通过识别安全微调中的偏见漏洞，开创了LLMs安全领域的理论基础，并设计了一种名为DRA（伪装和重构攻击）的黑盒越狱方法，通过伪装隐藏有害指令，并提示模型在完成时重构原始有害指令。我们评估了DRA在各种开源和闭源模型上的效果，展示了最先进的越狱成功率和攻击效率。值得注意的是，DRA在OpenAI GPT-4聊天机器人上拥有91.1%的攻击成功率。

更新时间: 2024-06-10 11:20:43

领域: cs.CR,cs.AI

下载: http://arxiv.org/abs/2402.18104v2

Harnessing AI for efficient analysis of complex policy documents: a case study of Executive Order 14110

Policy documents, such as legislation, regulations, and executive orders, are crucial in shaping society. However, their length and complexity make interpretation and application challenging and time-consuming. Artificial intelligence (AI), particularly large language models (LLMs), has the potential to automate the process of analyzing these documents, improving accuracy and efficiency. This study aims to evaluate the potential of AI in streamlining policy analysis and to identify the strengths and limitations of current AI approaches. The research focuses on question answering and tasks involving content extraction from policy documents. A case study was conducted using Executive Order 14110 on "Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence" as a test case. Four commercial AI systems were used to analyze the document and answer a set of representative policy questions. The performance of the AI systems was compared to manual analysis conducted by human experts. The study found that two AI systems, Gemini 1.5 Pro and Claude 3 Opus, demonstrated significant potential for supporting policy analysis, providing accurate and reliable information extraction from complex documents. They performed comparably to human analysts but with significantly higher efficiency. However, achieving reproducibility remains a challenge, necessitating further research and development.

Updated: 2024-06-10 11:19:28

标题: 利用人工智能进行复杂政策文件的高效分析：以第14110号行政命令为例研究

摘要: 政策文件，如法律、法规和行政命令，在塑造社会方面至关重要。然而，它们的长度和复杂性使解释和应用变得具有挑战性和耗时。人工智能（AI），特别是大型语言模型（LLM），有潜力自动化分析这些文件的过程，提高准确性和效率。本研究旨在评估AI在简化政策分析方面的潜力，并识别当前AI方法的优势和局限性。该研究侧重于问答和涉及从政策文件中提取内容的任务。使用行政命令14110号“人工智能的安全、可靠和可信开发与使用”作为测试案例进行了案例研究。使用了四个商业AI系统来分析文件并回答一组代表性的政策问题。AI系统的性能与人类专家进行的手动分析进行了比较。研究发现，Gemini 1.5 Pro和Claude 3 Opus这两个AI系统展示了在支持政策分析方面的显著潜力，能够从复杂文件中提取准确可靠的信息。它们的表现与人类分析师相当，但效率显著更高。然而，实现可重复性仍然是一个挑战，需要进一步的研究和发展。

更新时间: 2024-06-10 11:19:28

领域: cs.CL,cs.AI,cs.IR

下载: http://arxiv.org/abs/2406.06657v1

XLand-MiniGrid: Scalable Meta-Reinforcement Learning Environments in JAX

Inspired by the diversity and depth of XLand and the simplicity and minimalism of MiniGrid, we present XLand-MiniGrid, a suite of tools and grid-world environments for meta-reinforcement learning research. Written in JAX, XLand-MiniGrid is designed to be highly scalable and can potentially run on GPU or TPU accelerators, democratizing large-scale experimentation with limited resources. Along with the environments, XLand-MiniGrid provides pre-sampled benchmarks with millions of unique tasks of varying difficulty and easy-to-use baselines that allow users to quickly start training adaptive agents. In addition, we have conducted a preliminary analysis of scaling and generalization, showing that our baselines are capable of reaching millions of steps per second during training and validating that the proposed benchmarks are challenging.

Updated: 2024-06-10 11:13:06

标题: XLand-MiniGrid：使用JAX构建可扩展的元强化学习环境

摘要: 受到XLand的多样性和深度以及MiniGrid的简单和极简主义的启发，我们提出了XLand-MiniGrid，这是一个用于元强化学习研究的一套工具和网格世界环境。XLand-MiniGrid使用JAX编写，旨在具有高度可扩展性，并且潜在地可以在GPU或TPU加速器上运行，使得在资源有限的情况下可以进行大规模实验。除了环境外，XLand-MiniGrid还提供了预先采样的基准测试，包含数百万个不同难度的任务，以及易于使用的基准线，让用户可以快速开始训练适应性智能体。此外，我们进行了一个关于扩展和泛化的初步分析，表明我们的基准线在训练过程中能够达到每秒的百万步，并验证了所提出的基准测试是具有挑战性的。

更新时间: 2024-06-10 11:13:06

领域: cs.LG

下载: http://arxiv.org/abs/2312.12044v3

End-to-End Reinforcement Learning of Curative Curtailment with Partial Measurement Availability

In the course of the energy transition, the expansion of generation and consumption will change, and many of these technologies, such as PV systems, electric cars and heat pumps, will influence the power flow, especially in the distribution grids. Scalable methods that can make decisions for each grid connection are needed to enable congestion-free grid operation in the distribution grids. This paper presents a novel end-to-end approach to resolving congestion in distribution grids with deep reinforcement learning. Our architecture learns to curtail power and set appropriate reactive power to determine a non-congested and, thus, feasible grid state. State-of-the-art methods such as the optimal power flow (OPF) demand high computational costs and detailed measurements of every bus in a grid. In contrast, the presented method enables decisions under sparse information with just some buses observable in the grid. Distribution grids are generally not yet fully digitized and observable, so this method can be used for decision-making on the majority of low-voltage grids. On a real low-voltage grid the approach resolves 100\% of violations in the voltage band and 98.8\% of asset overloads. The results show that decisions can also be made on real grids that guarantee sufficient quality for congestion-free grid operation.

Updated: 2024-06-10 11:04:04

标题: 具有部分测量可用性的疗效缩减的端到端强化学习

摘要: 在能源转型过程中，发电和消费的扩张将发生变化，许多这些技术，如光伏系统、电动汽车和热泵，将影响电力流，特别是在配电网中。需要可扩展的方法来为每个电网连接做出决策，以实现无拥塞的电网运行。本文提出了一种利用深度强化学习解决配电网拥塞问题的全新方法。我们的架构学习如何限制功率并设置适当的无功功率，以确定一个非拥塞且可行的电网状态。最先进的方法，如最优功率流（OPF），需要高昂的计算成本和对电网中每个母线的详细测量。相比之下，所提出的方法允许在稀疏信息下做出决策，只需观察到电网中的一些母线。配电网通常尚未完全数字化和可观测，因此这种方法可以用于对大多数低压电网进行决策。在一个真实的低压电网中，该方法解决了100\%的电压带内违规和98.8\%的资产超载。结果显示，可以在真实电网上做出决策，以保证足够质量的无拥塞电网运行。

更新时间: 2024-06-10 11:04:04

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.03262v2

Generalized Nested Latent Variable Models for Lossy Coding applied to Wind Turbine Scenarios

Rate-distortion optimization through neural networks has accomplished competitive results in compression efficiency and image quality. This learning-based approach seeks to minimize the compromise between compression rate and reconstructed image quality by automatically extracting and retaining crucial information, while discarding less critical details. A successful technique consists in introducing a deep hyperprior that operates within a 2-level nested latent variable model, enhancing compression by capturing complex data dependencies. This paper extends this concept by designing a generalized L-level nested generative model with a Markov chain structure. We demonstrate as L increases that a trainable prior is detrimental and explore a common dimensionality along the distinct latent variables to boost compression performance. As this structured framework can represent autoregressive coders, we outperform the hyperprior model and achieve state-of-the-art performance while reducing substantially the computational cost. Our experimental evaluation is performed on wind turbine scenarios to study its application on visual inspections

Updated: 2024-06-10 11:00:26

标题: 广义嵌套潜变量模型在应用于风力涡轮机场景的损失编码中的应用

摘要: 通过神经网络进行速率失真优化在压缩效率和图像质量方面取得了竞争性结果。这种基于学习的方法旨在通过自动提取和保留关键信息，同时丢弃较不重要的细节，从而最小化压缩速率和重建图像质量之间的折衷。一种成功的技术是引入一个深度超先验，它在一个2级嵌套潜变量模型内运作，通过捕获复杂的数据依赖性增强压缩。本文通过设计一个具有马尔可夫链结构的广义L级嵌套生成模型来扩展这一概念。我们展示当L增加时，可训练先验是有害的，并探索不同潜变量之间的共同维度，以提高压缩性能。由于这种结构化框架可以代表自回归编码器，我们在减少计算成本的同时，胜过超先验模型并实现了最先进的性能。我们的实验评估是在风力涡轮机场景上进行的，以研究其在视觉检查中的应用。

更新时间: 2024-06-10 11:00:26

领域: cs.CV,cs.AI,cs.IT,cs.LG,math.IT

下载: http://arxiv.org/abs/2406.06165v1

Time to Separate from StackOverflow and Match with ChatGPT for Encryption

Cryptography is known as a challenging topic for developers. We studied StackOverflow posts to identify the problems that developers encounter when using Java Cryptography Architecture (JCA) for symmetric encryption. We investigated security risks that are disseminated in these posts, and we examined whether ChatGPT helps avoid cryptography issues. We found that developers frequently struggle with key and IV generations, as well as padding. Security is a top concern among developers, but security issues are pervasive in code snippets. ChatGPT can effectively aid developers when they engage with it properly. Nevertheless, it does not substitute human expertise, and developers should remain alert.

Updated: 2024-06-10 10:56:59

标题: 与StackOverflow分开并与ChatGPT匹配进行加密

摘要: 密码学被认为是开发人员的一个具有挑战性的话题。我们研究了StackOverflow帖子，以确定开发人员在使用Java密码体系结构（JCA）进行对称加密时遇到的问题。我们调查了这些帖子中传播的安全风险，并检查了ChatGPT是否有助于避免加密问题。我们发现开发人员经常在密钥和IV生成以及填充方面遇到困难。安全是开发人员的首要关注点，但安全问题在代码片段中普遍存在。当开发人员正确使用ChatGPT时，它可以有效地帮助他们。然而，它并不能替代人类专业知识，开发人员应保持警惕。

更新时间: 2024-06-10 10:56:59

领域: cs.CR

下载: http://arxiv.org/abs/2406.06164v1

Do Vision & Language Decoders use Images and Text equally? How Self-consistent are their Explanations?

Vision and language model (VLM) decoders are currently the best-performing architectures on multimodal tasks. Next to predictions, they can also produce explanations, either in post-hoc or CoT settings. However, it is not clear how much they use the vision and text modalities when generating predictions or explanations. In this work, we investigate if VLMs rely on modalities differently when they produce explanations as opposed to providing answers. We also evaluate the self-consistency of VLM decoders in both post-hoc and CoT explanation settings, by extending existing unimodal tests and measures to VLM decoders. We find that VLMs are less self-consistent than LLMs. Text contributions in VL decoders are more important than image contributions in all examined tasks. Moreover, the contributions of images are significantly stronger for explanation generation compared to answer generation. This difference is even larger in CoT compared to post-hoc explanations. Lastly, we provide an up-to-date benchmarking of state-of-the-art VL decoders on the VALSE benchmark, which before only covered VL encoders. We find that VL decoders still struggle with most phenomena tested by VALSE.

Updated: 2024-06-10 10:43:20

标题: 这个标题的翻译是：视觉和语言解码器是否平等使用图像和文本？它们的解释有多自洽？

摘要: 视觉和语言模型（VLM）解码器目前是多模式任务中表现最佳的架构。除了预测外，它们还可以在事后或CoT设置中产生解释。然而，目前并不清楚它们在生成预测或解释时使用了多少视觉和文本模态。在这项工作中，我们调查了VLM在生成解释时是否依赖于不同的模态，与提供答案时相比。我们还通过将现有的单模态测试和措施扩展到VLM解码器，评估了VLM解码器在事后和CoT解释设置中的自我一致性。我们发现VLM比LLM缺乏自我一致性。在所有检查的任务中，VL解码器中的文本贡献比图像贡献更重要。此外，与回答生成相比，图像在解释生成中的贡献显著更大。这种差异在CoT解释中比在事后解释中更大。最后，我们在VALSE基准测试中提供了最新的最先进的VL解码器的基准测试，该基准测试之前只覆盖了VL编码器。我们发现VL解码器仍然在VALSE测试中大多数现象上面临挑战。

更新时间: 2024-06-10 10:43:20

领域: cs.CL,cs.AI,cs.CV,cs.LG,68Txx,I.2.7; I.2.10

下载: http://arxiv.org/abs/2404.18624v2

Get rich quick: exact solutions reveal how unbalanced initializations promote rapid feature learning

While the impressive performance of modern neural networks is often attributed to their capacity to efficiently extract task-relevant features from data, the mechanisms underlying this rich feature learning regime remain elusive, with much of our theoretical understanding stemming from the opposing lazy regime. In this work, we derive exact solutions to a minimal model that transitions between lazy and rich learning, precisely elucidating how unbalanced layer-specific initialization variances and learning rates determine the degree of feature learning. Our analysis reveals that they conspire to influence the learning regime through a set of conserved quantities that constrain and modify the geometry of learning trajectories in parameter and function space. We extend our analysis to more complex linear models with multiple neurons, outputs, and layers and to shallow nonlinear networks with piecewise linear activation functions. In linear networks, rapid feature learning only occurs with balanced initializations, where all layers learn at similar speeds. While in nonlinear networks, unbalanced initializations that promote faster learning in earlier layers can accelerate rich learning. Through a series of experiments, we provide evidence that this unbalanced rich regime drives feature learning in deep finite-width networks, promotes interpretability of early layers in CNNs, reduces the sample complexity of learning hierarchical data, and decreases the time to grokking in modular arithmetic. Our theory motivates further exploration of unbalanced initializations to enhance efficient feature learning.

Updated: 2024-06-10 10:42:37

标题: 快速致富：确切解决方案揭示了不平衡的初始化如何促进快速特征学习

摘要: 现代神经网络的出色性能常常归因于它们能够高效地从数据中提取与任务相关的特征，然而支撑这一丰富特征学习机制的机制仍然难以捉摸，我们的理论理解大部分来自相反的懒惰机制。在这项工作中，我们推导出一个过渡于懒惰和丰富学习之间的最小模型的精确解，准确阐明了不平衡的层特定初始化方差和学习率如何决定特征学习的程度。我们的分析揭示了它们通过一组守恒量来影响学习机制，这些守恒量限制并修改了参数和函数空间中学习轨迹的几何形状。我们将我们的分析扩展到具有多个神经元、输出和层的更复杂的线性模型，以及具有分段线性激活函数的浅非线性网络。在线性网络中，只有在平衡初始化的情况下才会出现快速特征学习，所有层学习速度相似。而在非线性网络中，促进早期层快速学习的不平衡初始化可以加速丰富学习。通过一系列实验，我们提供证据表明这种不平衡的丰富学习机制推动了深度有限宽度网络中的特征学习，促进了CNN中早期层的可解释性，降低了学习分层数据的样本复杂性，并缩短了模块化算术的理解时间。我们的理论激励进一步探索不平衡初始化以增强高效特征学习。

更新时间: 2024-06-10 10:42:37

领域: cs.LG,cs.AI,stat.ML

下载: http://arxiv.org/abs/2406.06158v1

Bandit Convex Optimisation

Bandit convex optimisation is a fundamental framework for studying zeroth-order convex optimisation. These notes cover the many tools used for this problem, including cutting plane methods, interior point methods, continuous exponential weights, gradient descent and online Newton step. The nuances between the many assumptions and setups are explained. Although there is not much truly new here, some existing tools are applied in novel ways to obtain new algorithms. A few bounds are improved in minor ways.

Updated: 2024-06-10 10:38:59

标题: 强盗凸优化

摘要: 赌徒凸优化是研究零阶凸优化的基本框架。这些笔记涵盖了用于解决这个问题的许多工具，包括切割平面方法、内点方法、连续指数权重、梯度下降和在线牛顿步。解释了许多假设和设置之间的微妙差别。虽然这里没有太多真正新颖的内容，但一些现有工具以新颖的方式应用，从而获得新的算法。有一些边界在某种程度上得到改善。

更新时间: 2024-06-10 10:38:59

领域: math.OC,cs.LG,stat.ML

下载: http://arxiv.org/abs/2402.06535v2

Gameful Introduction to Cryptography for Dyslexic Students

Cryptography has a pivotal role in securing our digital world. Nonetheless, it is a challenging topic to learn. In this paper, we show that despite its complex nature, dyslexia$-$a learning disorder that influences reading and writing skills$-$does not hinder one's ability to comprehend cryptography. In particular, we conducted a gameful workshop with 14 high-school dyslexic students and taught them fundamental encryption methods. The students engaged well, learned the techniques, and enjoyed the training. We conclude that with a proper approach, dyslexia cannot hinder learning a complex subject such as cryptography.

Updated: 2024-06-10 10:30:43

标题: 《面向阅读障碍学生的游戏化密码学介绍》

摘要: 密码学在保护我们的数字世界中起着至关重要的作用。然而，它是一个具有挑战性的学习主题。在这篇论文中，我们展示了尽管其复杂性，阅读障碍-一种影响阅读和写作能力的学习障碍-并不妨碍人们理解密码学。特别地，我们与14名高中阅读障碍学生进行了一场有趣的研讨会，并教授了他们基本的加密方法。学生们参与积极，学会了这些技术，并享受了培训过程。我们得出结论，通过适当的方法，阅读障碍不会妨碍学习密码学这样复杂的学科。

更新时间: 2024-06-10 10:30:43

领域: cs.SE,cs.CR

下载: http://arxiv.org/abs/2406.06153v1

Stochastic Gradient Flow Dynamics of Test Risk and its Exact Solution for Weak Features

We investigate the test risk of continuous-time stochastic gradient flow dynamics in learning theory. Using a path integral formulation we provide, in the regime of a small learning rate, a general formula for computing the difference between test risk curves of pure gradient and stochastic gradient flows. We apply the general theory to a simple model of weak features, which displays the double descent phenomenon, and explicitly compute the corrections brought about by the added stochastic term in the dynamics, as a function of time and model parameters. The analytical results are compared to simulations of discrete-time stochastic gradient descent and show good agreement.

Updated: 2024-06-10 10:25:14

标题: 弱特征的测试风险的随机梯度流动动力学及其精确解

摘要: 我们研究了学习理论中连续时间随机梯度流动力学的测试风险。使用路径积分公式，我们在小学习率的情况下提供了一个通用公式，用于计算纯梯度和随机梯度流动的测试风险曲线之间的差异。我们将这一通用理论应用于一个简单的弱特征模型，该模型展示了双谷现象，并明确计算了动态中添加随机项带来的修正，作为时间和模型参数的函数。分析结果与离散时间随机梯度下降的模拟进行了比较，并显示出良好的一致性。

更新时间: 2024-06-10 10:25:14

领域: stat.ML,cond-mat.dis-nn,cs.LG

下载: http://arxiv.org/abs/2402.07626v2

MLLMReID: Multimodal Large Language Model-based Person Re-identification

Multimodal large language models (MLLM) have achieved satisfactory results in many tasks. However, their performance in the task of ReID (ReID) has not been explored to date. This paper will investigate how to adapt them for the task of ReID. An intuitive idea is to fine-tune MLLM with ReID image-text datasets, and then use their visual encoder as a backbone for ReID. However, there still exist two apparent issues: (1) Designing instructions for ReID, MLLMs may overfit specific instructions, and designing a variety of instructions will lead to higher costs. (2) When fine-tuning the visual encoder of a MLLM, it is not trained synchronously with the ReID task. As a result, the effectiveness of the visual encoder fine-tuning cannot be directly reflected in the performance of the ReID task. To address these problems, this paper proposes MLLMReID: Multimodal Large Language Model-based ReID. Firstly, we proposed Common Instruction, a simple approach that leverages the essence ability of LLMs to continue writing, avoiding complex and diverse instruction design. Secondly, we propose a multi-task learning-based synchronization module to ensure that the visual encoder of the MLLM is trained synchronously with the ReID task. The experimental results demonstrate the superiority of our method.

Updated: 2024-06-10 10:21:19

标题: MLLMReID: 基于多模态大型语言模型的人员再识别

摘要: 多模态大型语言模型（MLLM）在许多任务中取得了令人满意的结果。然而，在ReID（重新识别）任务中，它们的表现尚未得到探索。本文将调查如何将它们适应于ReID任务。一个直观的想法是使用ReID图像文本数据集对MLLM进行微调，然后将它们的视觉编码器作为ReID的主干。然而，仍然存在两个明显的问题：（1）设计ReID指导时，MLLM可能会过度拟合特定指导，而设计各种指导会导致更高的成本。（2）在微调MLLM的视觉编码器时，它与ReID任务并不同步训练。因此，视觉编码器微调的有效性无法直接反映在ReID任务的性能中。为了解决这些问题，本文提出了MLLMReID：基于多模态大型语言模型的ReID。首先，我们提出了通用指导，这是一种利用LLM的本质能力继续编写的简单方法，避免了复杂和多样化的指导设计。其次，我们提出了基于多任务学习的同步模块，以确保MLLM的视觉编码器与ReID任务同步训练。实验结果证明了我们方法的优越性。

更新时间: 2024-06-10 10:21:19

领域: cs.CV,cs.AI,cs.CL

下载: http://arxiv.org/abs/2401.13201v3

BRAIn: Bayesian Reward-conditioned Amortized Inference for natural language generation from feedback

Distribution matching methods for language model alignment such as Generation with Distributional Control (GDC) and Distributional Policy Gradient (DPG) have not received the same level of attention in reinforcement learning from human feedback (RLHF) as contrastive methods such as Sequence Likelihood Calibration (SLiC), Direct Preference Optimization (DPO) and its variants. We identify high variance of the gradient estimate as the primary reason for the lack of success of these methods and propose a self-normalized baseline to reduce the variance. We further generalize the target distribution in DPG, GDC and DPO by using Bayes' rule to define the reward-conditioned posterior. The resulting approach, referred to as BRAIn - Bayesian Reward-conditioned Amortized Inference acts as a bridge between distribution matching methods and DPO and significantly outperforms prior art in summarization and Antropic HH tasks.

Updated: 2024-06-10 10:18:46

标题: BRAIn: 基于贝叶斯奖励条件化摊销推断的自然语言生成文献

摘要: 分布匹配方法，如具有分布控制的生成（GDC）和分布策略梯度（DPG）等语言模型对齐方法，在强化学习中并没有像对比方法（如序列可能性校准（SLiC），直接偏好优化（DPO）及其变体）那样受到人类反馈的同等关注。我们确定梯度估计的高方差是这些方法缺乏成功的主要原因，并提出了一种自标准化基线以减少方差。我们进一步通过使用Bayes定律将目标分布在DPG、GDC和DPO中进行了泛化，以定义奖励条件后验。由此产生的方法，称为BRAIn - 基于贝叶斯奖励条件化推理，充当分布匹配方法和DPO之间的桥梁，并在摘要和Antropic HH任务中明显优于先前的研究成果。

更新时间: 2024-06-10 10:18:46

领域: cs.LG,cs.AI,cs.CL,cs.HC

下载: http://arxiv.org/abs/2402.02479v2

Physics-Informed Bayesian Optimization of Variational Quantum Circuits

In this paper, we propose a novel and powerful method to harness Bayesian optimization for Variational Quantum Eigensolvers (VQEs) -- a hybrid quantum-classical protocol used to approximate the ground state of a quantum Hamiltonian. Specifically, we derive a VQE-kernel which incorporates important prior information about quantum circuits: the kernel feature map of the VQE-kernel exactly matches the known functional form of the VQE's objective function and thereby significantly reduces the posterior uncertainty. Moreover, we propose a novel acquisition function for Bayesian optimization called Expected Maximum Improvement over Confident Regions (EMICoRe) which can actively exploit the inductive bias of the VQE-kernel by treating regions with low predictive uncertainty as indirectly ``observed''. As a result, observations at as few as three points in the search domain are sufficient to determine the complete objective function along an entire one-dimensional subspace of the optimization landscape. Our numerical experiments demonstrate that our approach improves over state-of-the-art baselines.

Updated: 2024-06-10 10:17:06

标题: 物理学信息贝叶斯优化变分量子电路

摘要: 在这篇论文中，我们提出了一种新颖而强大的方法，利用贝叶斯优化来为变分量子本征求解器（VQE）服务 - 一种用于近似量子哈密顿量基态的混合量子-经典协议。具体来说，我们推导了一个VQE-kernel，其中包含重要的有关量子电路的先验信息：VQE-kernel的核特征映射与已知的VQE目标函数的函数形式完全匹配，从而显著降低了后验不确定性。此外，我们提出了一种名为期望最大改进覆盖区域（EMICoRe）的贝叶斯优化新型获取函数，可以通过将具有低预测不确定性的区域视为间接“观察到”，积极利用VQE-kernel的归纳偏差。结果，仅在搜索域中的三个点的观察就足以确定整个一维优化景观中的目标函数。我们的数值实验表明，我们的方法改进了现有技术基线。

更新时间: 2024-06-10 10:17:06

领域: cs.LG,quant-ph

下载: http://arxiv.org/abs/2406.06150v1

Asymptotics of Learning with Deep Structured (Random) Features

For a large class of feature maps we provide a tight asymptotic characterisation of the test error associated with learning the readout layer, in the high-dimensional limit where the input dimension, hidden layer widths, and number of training samples are proportionally large. This characterization is formulated in terms of the population covariance of the features. Our work is partially motivated by the problem of learning with Gaussian rainbow neural networks, namely deep non-linear fully-connected networks with random but structured weights, whose row-wise covariances are further allowed to depend on the weights of previous layers. For such networks we also derive a closed-form formula for the feature covariance in terms of the weight matrices. We further find that in some cases our results can capture feature maps learned by deep, finite-width neural networks trained under gradient descent.

Updated: 2024-06-10 10:16:19

标题: 深度结构化（随机）特征学习的渐近性

摘要: 对于一大类特征映射，我们提供了一个紧密的渐近特性描述，该描述与学习输出层相关的测试错误，在高维极限下，输入维数、隐藏层宽度和训练样本数量成比例增加。这种特性描述是以特征的总体协方差来形成的。我们的工作部分受到了使用高斯彩虹神经网络进行学习的问题的启发，即具有随机但结构化权重的深度非线性全连接网络，其按行协方差进一步允许依赖于前几层的权重。对于这样的网络，我们还导出了一个以权重矩阵为基础的特征协方差的封闭形式公式。此外，我们发现在某些情况下，我们的结果可以捕捉由使用梯度下降训练的有限宽度神经网络学习的特征映射。

更新时间: 2024-06-10 10:16:19

领域: stat.ML,cond-mat.dis-nn,cs.LG,math.ST,stat.TH

下载: http://arxiv.org/abs/2402.13999v2

Decoupled Marked Temporal Point Process using Neural Ordinary Differential Equations

A Marked Temporal Point Process (MTPP) is a stochastic process whose realization is a set of event-time data. MTPP is often used to understand complex dynamics of asynchronous temporal events such as money transaction, social media, healthcare, etc. Recent studies have utilized deep neural networks to capture complex temporal dependencies of events and generate embedding that aptly represent the observed events. While most previous studies focus on the inter-event dependencies and their representations, how individual events influence the overall dynamics over time has been under-explored. In this regime, we propose a Decoupled MTPP framework that disentangles characterization of a stochastic process into a set of evolving influences from different events. Our approach employs Neural Ordinary Differential Equations (Neural ODEs) to learn flexible continuous dynamics of these influences while simultaneously addressing multiple inference problems, such as density estimation and survival rate computation. We emphasize the significance of disentangling the influences by comparing our framework with state-of-the-art methods on real-life datasets, and provide analysis on the model behavior for potential applications.

Updated: 2024-06-10 10:15:32

标题: 使用神经常微分方程的解耦标记时间点过程

摘要: 一种显著的时间点过程（MTPP）是一个随机过程，其实现是一组事件时间数据。 MTPP通常用于理解异步时间事件的复杂动态，如金钱交易、社交媒体、医疗保健等。最近的研究利用深度神经网络捕捉事件的复杂时间依赖关系，并生成适当表示观察到的事件的嵌入。虽然大多数先前的研究都集中在事件间的依赖关系及其表示上，但个别事件如何影响随时间变化的整体动态尚未得到充分探讨。在这种情况下，我们提出了一种解耦MTPP框架，将随机过程的表征分解为来自不同事件的一组不断发展的影响。我们的方法采用神经常微分方程（Neural ODEs）来学习这些影响的灵活连续动态，同时解决多个推断问题，如密度估计和生存率计算。我们通过将我们的框架与现实生活数据集上的最先进方法进行比较，并对潜在应用的模型行为进行分析，强调了分解影响的重要性。

更新时间: 2024-06-10 10:15:32

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2406.06149v1

Evaluating the Efficacy of Prompt-Engineered Large Multimodal Models Versus Fine-Tuned Vision Transformers in Image-Based Security Applications

The success of Large Language Models (LLMs) has led to a parallel rise in the development of Large Multimodal Models (LMMs), which have begun to transform a variety of applications. These sophisticated multimodal models are designed to interpret and analyze complex data by integrating multiple modalities such as text and images, thereby opening new avenues for a range of applications. This paper investigates the applicability and effectiveness of prompt-engineered LMMs that process both images and text, including models such as LLaVA, BakLLaVA, Moondream, Gemini-pro-vision, and GPT-4o, compared to fine-tuned Vision Transformer (ViT) models in addressing critical security challenges. We focus on two distinct security tasks: 1) a visually evident task of detecting simple triggers, such as small pixel variations in images that could be exploited to access potential backdoors in the models, and 2) a visually non-evident task of malware classification through visual representations. In the visually evident task, some LMMs, such as Gemini-pro-vision and GPT-4o, have demonstrated the potential to achieve good performance with careful prompt engineering, with GPT-4o achieving the highest accuracy and F1-score of 91.9\% and 91\%, respectively. However, the fine-tuned ViT models exhibit perfect performance in this task due to its simplicity. For the visually non-evident task, the results highlight a significant divergence in performance, with ViT models achieving F1-scores of 97.11\% in predicting 25 malware classes and 97.61\% in predicting 5 malware families, whereas LMMs showed suboptimal performance despite iterative prompt improvements. This study not only showcases the strengths and limitations of prompt-engineered LMMs in cybersecurity applications but also emphasizes the unmatched efficacy of fine-tuned ViT models for precise and dependable tasks.

Updated: 2024-06-10 10:07:24

标题: 评估快速工程化的大型多模态模型与微调的视觉变换器在基于图像的安全应用中的有效性

摘要: 大型语言模型（LLMs）的成功导致了大型多模态模型（LMMs）的发展，这些模型已开始改变各种应用。这些复杂的多模态模型旨在通过集成文本和图像等多种模态来解释和分析复杂数据，从而为各种应用开辟了新的途径。本文调查了处理图像和文本的提示工程化LMMs（如LLaVA、BakLLaVA、Moondream、Gemini-pro-vision和GPT-4o）在处理关键安全挑战时的适用性和有效性，与微调的Vision Transformer（ViT）模型进行比较。我们关注两个不同的安全任务：1）检测简单触发器的视觉明显任务，例如图像中的小像素变化可能被利用来访问模型中的潜在后门；2）通过视觉表示进行恶意软件分类的视觉不明显任务。在视觉明显任务中，一些LMMs（如Gemini-pro-vision和GPT-4o）通过精心设计的提示工程表现出潜力，其中GPT-4o分别实现了最高准确率和F1分数为91.9\%和91\%。然而，由于其简单性，微调的ViT模型在此任务中表现出完美性能。对于视觉不明显任务，结果突显了性能上的显著差异，ViT模型在预测25种恶意软件类别时实现了97.11\%的F1分数，在预测5种恶意软件家族时实现了97.61\%，而LMMs尽管进行了迭代提示改进，但表现不佳。这项研究不仅展示了提示工程化LMMs在网络安全应用中的优势和局限性，还强调了微调的ViT模型在精确和可靠任务中的无与伦比的功效。

更新时间: 2024-06-10 10:07:24

领域: cs.AI,cs.CR,cs.CV

下载: http://arxiv.org/abs/2403.17787v2

How to Benchmark Vision Foundation Models for Semantic Segmentation?

Recent vision foundation models (VFMs) have demonstrated proficiency in various tasks but require supervised fine-tuning to perform the task of semantic segmentation effectively. Benchmarking their performance is essential for selecting current models and guiding future model developments for this task. The lack of a standardized benchmark complicates comparisons. Therefore, the primary objective of this paper is to study how VFMs should be benchmarked for semantic segmentation. To do so, various VFMs are fine-tuned under various settings, and the impact of individual settings on the performance ranking and training time is assessed. Based on the results, the recommendation is to fine-tune the ViT-B variants of VFMs with a 16x16 patch size and a linear decoder, as these settings are representative of using a larger model, more advanced decoder and smaller patch size, while reducing training time by more than 13 times. Using multiple datasets for training and evaluation is also recommended, as the performance ranking across datasets and domain shifts varies. Linear probing, a common practice for some VFMs, is not recommended, as it is not representative of end-to-end fine-tuning. The benchmarking setup recommended in this paper enables a performance analysis of VFMs for semantic segmentation. The findings of such an analysis reveal that pretraining with promptable segmentation is not beneficial, whereas masked image modeling (MIM) with abstract representations is crucial, even more important than the type of supervision used. The code for efficiently fine-tuning VFMs for semantic segmentation can be accessed through the project page at: https://tue-mps.github.io/benchmark-vfm-ss/.

Updated: 2024-06-10 10:05:01

标题: 如何对语义分割视觉基础模型进行基准测试？

摘要: 最近的视觉基础模型（VFMs）已经展示出在各种任务中的熟练表现，但需要监督微调才能有效地执行语义分割任务。对它们的性能进行基准测试对于选择当前模型并指导未来模型发展至关重要。缺乏标准化基准使比较变得复杂。因此，本文的主要目标是研究如何对VFMs进行语义分割的基准测试。为此，在不同设置下对各种VFMs进行微调，并评估单个设置对性能排名和训练时间的影响。根据结果，建议使用ViT-B变种的VFMs进行微调，采用16x16补丁大小和线性解码器，因为这些设置代表了使用更大的模型、更先进的解码器和更小的补丁大小，同时能够减少训练时间超过13倍。还建议使用多个数据集进行训练和评估，因为跨数据集和领域转移的性能排名会有所不同。不建议使用线性探测，因为它不代表端到端微调。本文推荐的基准测试设置可以对VFMs进行语义分割的性能分析。这样的分析结果显示，预训练具有提示性的分割并不有益，而具有抽象表示的遮罩图像建模（MIM）则至关重要，甚至比所使用的监督类型更重要。可以通过项目页面（https://tue-mps.github.io/benchmark-vfm-ss/）访问用于高效微调VFMs进行语义分割的代码。

更新时间: 2024-06-10 10:05:01

领域: cs.CV,cs.AI,cs.LG,cs.RO

下载: http://arxiv.org/abs/2404.12172v2

Language Models Resist Alignment

Large language models (LLMs) may exhibit undesirable behaviors. Recent efforts have focused on aligning these models to prevent harmful generation. Despite these efforts, studies have shown that even a well-conducted alignment process can be easily circumvented, whether intentionally or accidentally. Do alignment fine-tuning have robust effects on models, or are merely superficial? In this work, we answer this question through both theoretical and empirical means. Empirically, we demonstrate the elasticity of post-alignment models, i.e., the tendency to revert to the behavior distribution formed during the pre-training phase upon further fine-tuning. Using compression theory, we formally derive that such fine-tuning process \textit{disproportionately} undermines alignment compared to pre-training, potentially by orders of magnitude. We conduct experimental validations to confirm the presence of elasticity across models of varying types and sizes. Specifically, we find that model performance declines rapidly before reverting to the pre-training distribution, after which the rate of decline drops significantly. We further reveal that elasticity positively correlates with increased model size and the expansion of pre-training data. Our discovery signifies the importance of taming the inherent elasticity of LLMs, thereby overcoming the resistance of LLMs to alignment finetuning.

Updated: 2024-06-10 10:03:16

标题: 语言模型抗拒对齐

摘要: 大型语言模型（LLMs）可能表现出不良行为。最近的努力集中在对齐这些模型以防止有害生成。尽管已经做出了这些努力，研究表明，即使进行了良好的对齐过程，模型也很容易被故意或意外地规避。对齐微调是否对模型产生强大的影响，还是仅仅表面上？在这项工作中，我们通过理论和实证手段回答了这个问题。从实证角度看，我们展示了后对齐模型的弹性，即在进一步微调时倾向于恢复在预训练阶段形成的行为分布。使用压缩理论，我们正式推导出这种微调过程与预训练相比\textit{不成比例}地损害对齐，潜在地相差数倍。我们进行实验证实以确认各种类型和规模的模型之间存在弹性。具体而言，我们发现在恢复到预训练分布之前，模型性能迅速下降，之后下降速度显著降低。我们进一步揭示，弹性与增加的模型大小和预训练数据扩展呈正相关。我们的发现表明了驯服LLMs固有的弹性的重要性，从而克服LLMs对齐微调的抵抗。

更新时间: 2024-06-10 10:03:16

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.06144v1

AttnLRP: Attention-Aware Layer-Wise Relevance Propagation for Transformers

Large Language Models are prone to biased predictions and hallucinations, underlining the paramount importance of understanding their model-internal reasoning process. However, achieving faithful attributions for the entirety of a black-box transformer model and maintaining computational efficiency is an unsolved challenge. By extending the Layer-wise Relevance Propagation attribution method to handle attention layers, we address these challenges effectively. While partial solutions exist, our method is the first to faithfully and holistically attribute not only input but also latent representations of transformer models with the computational efficiency similar to a single backward pass. Through extensive evaluations against existing methods on LLaMa 2, Mixtral 8x7b, Flan-T5 and vision transformer architectures, we demonstrate that our proposed approach surpasses alternative methods in terms of faithfulness and enables the understanding of latent representations, opening up the door for concept-based explanations. We provide an LRP library at https://github.com/rachtibat/LRP-eXplains-Transformers.

Updated: 2024-06-10 09:58:55

标题: AttnLRP：面向Transformer的注意力感知逐层相关性传播

摘要: 大型语言模型容易产生偏见预测和幻觉，强调理解其模型内部推理过程的至关重要性。然而，为整个黑匣子变压器模型实现忠实的归因并保持计算效率是一个尚未解决的挑战。通过将逐层相关性传播归因方法扩展到处理注意力层，我们有效地解决了这些挑战。尽管存在部分解决方案，我们的方法是第一个能够忠实和全面地归因变压器模型的输入和潜在表示的方法，其计算效率类似于单次反向传播。通过在LLaMa 2、Mixtral 8x7b、Flan-T5和视觉变换器架构上对现有方法进行广泛评估，我们证明了我们提出的方法在忠实性方面超越了替代方法，并实现了对潜在表示的理解，为基于概念的解释打开了大门。我们提供了一个LRP库，网址为https://github.com/rachtibat/LRP-eXplains-Transformers。

更新时间: 2024-06-10 09:58:55

领域: cs.CL,cs.AI,cs.CV,cs.LG

下载: http://arxiv.org/abs/2402.05602v2

Fed-Sophia: A Communication-Efficient Second-Order Federated Learning Algorithm

Federated learning is a machine learning approach where multiple devices collaboratively learn with the help of a parameter server by sharing only their local updates. While gradient-based optimization techniques are widely adopted in this domain, the curvature information that second-order methods exhibit is crucial to guide and speed up the convergence. This paper introduces a scalable second-order method, allowing the adoption of curvature information in federated large models. Our method, coined Fed-Sophia, combines a weighted moving average of the gradient with a clipping operation to find the descent direction. In addition to that, a lightweight estimation of the Hessian's diagonal is used to incorporate the curvature information. Numerical evaluation shows the superiority, robustness, and scalability of the proposed Fed-Sophia scheme compared to first and second-order baselines.

Updated: 2024-06-10 09:57:30

标题: Fed-Sophia：一种通信高效的二阶联邦学习算法

摘要: 联邦学习是一种机器学习方法，多个设备通过参数服务器共同学习，仅分享本地更新。虽然在这个领域广泛采用基于梯度的优化技术，但第二阶方法展示的曲率信息对于引导和加快收敛至关重要。本文介绍了一种可扩展的第二阶方法，允许在联邦大模型中采用曲率信息。我们的方法，被称为Fed-Sophia，将梯度的加权移动平均与剪切操作相结合，以找到下降方向。除此之外，还使用Hessian矩阵对角线的轻量级估计来整合曲率信息。数值评估显示，与一阶和二阶基准方法相比，所提出的Fed-Sophia方案具有更好的优越性、鲁棒性和可扩展性。

更新时间: 2024-06-10 09:57:30

领域: cs.LG,cs.AI,cs.DC

下载: http://arxiv.org/abs/2406.06655v1

Can I understand what I create? Self-Knowledge Evaluation of Large Language Models

Large language models (LLMs) have achieved remarkable progress in linguistic tasks, necessitating robust evaluation frameworks to understand their capabilities and limitations. Inspired by Feynman's principle of understanding through creation, we introduce a self-knowledge evaluation framework that is easy to implement, evaluating models on their ability to comprehend and respond to self-generated questions. Our findings, based on testing multiple models across diverse tasks, reveal significant gaps in the model's self-knowledge ability. Further analysis indicates these gaps may be due to misalignment with human attention mechanisms. Additionally, fine-tuning on self-generated math task may enhance the model's math performance, highlighting the potential of the framework for efficient and insightful model evaluation and may also contribute to the improvement of LLMs.

Updated: 2024-06-10 09:53:54

标题: 我可以理解我所创造的吗？大型语言模型的自我认识评估

摘要: 大型语言模型(LLMs)在语言任务中取得了显著进展，这需要健壮的评估框架来了解它们的能力和局限性。受费曼的理解通过创造原则的启发，我们引入了一个自我知识评估框架，易于实施，评估模型对于理解和回答自行生成的问题的能力。我们的研究结果基于在不同任务中测试多个模型，揭示了模型自我知识能力中的显著差距。进一步分析表明，这些差距可能是由于与人类注意力机制的不匹配。此外，在自行生成的数学任务上微调可能会增强模型的数学表现，突显了该框架对于有效和深入地评估模型的潜力，也可以促进LLMs的改进。

更新时间: 2024-06-10 09:53:54

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2406.06140v1

Thunder : Unified Regression-Diffusion Speech Enhancement with a Single Reverse Step using Brownian Bridge

Diffusion-based speech enhancement has shown promising results, but can suffer from a slower inference time. Initializing the diffusion process with the enhanced audio generated by a regression-based model can be used to reduce the computational steps required. However, these approaches often necessitate a regression model, further increasing the system's complexity. We propose Thunder, a unified regression-diffusion model that utilizes the Brownian bridge process which can allow the model to act in both modes. The regression mode can be accessed by setting the diffusion time step closed to 1. However, the standard score-based diffusion modeling does not perform well in this setup due to gradient instability. To mitigate this problem, we modify the diffusion model to predict the clean speech instead of the score function, achieving competitive performance with a more compact model size and fewer reverse steps.

Updated: 2024-06-10 09:52:25

标题: 雷声：使用布朗桥进行统一回归扩散单向逆向步骤语音增强

摘要: 基于扩散的语音增强显示出有希望的结果，但可能会受到推理时间较慢的影响。使用基于回归模型生成的增强音频初始化扩散过程可以减少所需的计算步骤。然而，这些方法通常需要一个回归模型，进一步增加了系统的复杂性。我们提出了Thunder，一个统一的回归-扩散模型，利用布朗桥过程，使模型能够同时在两种模式下运行。通过将扩散时间步骤设置为接近1，可以访问回归模式。然而，在这种设置下，基于标准分数的扩散建模由于梯度不稳定而表现不佳。为了缓解这个问题，我们修改了扩散模型，以预测干净语音而不是分数函数，实现了与更紧凑的模型尺寸和更少的反向步骤相比具有竞争性的性能。

更新时间: 2024-06-10 09:52:25

领域: cs.SD,cs.AI,cs.CL,eess.AS

下载: http://arxiv.org/abs/2406.06139v1

A Comparative Survey of Vision Transformers for Feature Extraction in Texture Analysis

Texture, a significant visual attribute in images, has been extensively investigated across various image recognition applications. Convolutional Neural Networks (CNNs), which have been successful in many computer vision tasks, are currently among the best texture analysis approaches. On the other hand, Vision Transformers (ViTs) have been surpassing the performance of CNNs on tasks such as object recognition, causing a paradigm shift in the field. However, ViTs have so far not been scrutinized for texture recognition, hindering a proper appreciation of their potential in this specific setting. For this reason, this work explores various pre-trained ViT architectures when transferred to tasks that rely on textures. We review 21 different ViT variants and perform an extensive evaluation and comparison with CNNs and hand-engineered models on several tasks, such as assessing robustness to changes in texture rotation, scale, and illumination, and distinguishing color textures, material textures, and texture attributes. The goal is to understand the potential and differences among these models when directly applied to texture recognition, using pre-trained ViTs primarily for feature extraction and employing linear classifiers for evaluation. We also evaluate their efficiency, which is one of the main drawbacks in contrast to other methods. Our results show that ViTs generally outperform both CNNs and hand-engineered models, especially when using stronger pre-training and tasks involving in-the-wild textures (images from the internet). We highlight the following promising models: ViT-B with DINO pre-training, BeiTv2, and the Swin architecture, as well as the EfficientFormer as a low-cost alternative. In terms of efficiency, although having a higher number of GFLOPs and parameters, ViT-B and BeiT(v2) can achieve a lower feature extraction time on GPUs compared to ResNet50.

Updated: 2024-06-10 09:48:13

标题: 一项纹理分析中用于特征提取的视觉Transformer的比较调查

摘要: 纹理是图像中一个重要的视觉属性，已经在各种图像识别应用中得到了广泛的研究。卷积神经网络（CNNs），在许多计算机视觉任务中取得了成功，目前是最好的纹理分析方法之一。另一方面，Vision Transformers（ViTs）在诸如物体识别等任务上的性能已经超越了CNNs，引起了该领域的范式转变。然而，迄今为止，ViTs还没有被用于纹理识别，这妨碍了对它们在这一特定环境中潜力的正确评估。因此，本文探讨了各种预训练的ViT架构在依赖纹理的任务中的应用。我们回顾了21种不同的ViT变体，并在几个任务上进行了广泛的评估和与CNNs和手工设计模型的比较，如评估对纹理旋转、缩放和光照变化的鲁棒性，以及区分颜色纹理、材质纹理和纹理属性。我们的目标是了解这些模型在直接应用于纹理识别时的潜力和差异，主要使用预训练的ViTs进行特征提取，并使用线性分类器进行评估。我们还评估它们的效率，这是与其他方法相比的主要缺点之一。我们的结果显示，ViTs通常优于CNNs和手工设计模型，特别是在使用更强的预训练和涉及野外纹理的任务（互联网图像）时。我们强调以下有希望的模型：具有DINO预训练的ViT-B，BeiTv2和Swin架构，以及EfficientFormer作为低成本替代方案。在效率方面，尽管具有更高的GFLOPs和参数数量，与ResNet50相比，ViT-B和BeiT（v2）可以在GPU上实现更低的特征提取时间。

更新时间: 2024-06-10 09:48:13

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2406.06136v1

DiffInject: Revisiting Debias via Synthetic Data Generation using Diffusion-based Style Injection

Dataset bias is a significant challenge in machine learning, where specific attributes, such as texture or color of the images are unintentionally learned resulting in detrimental performance. To address this, previous efforts have focused on debiasing models either by developing novel debiasing algorithms or by generating synthetic data to mitigate the prevalent dataset biases. However, generative approaches to date have largely relied on using bias-specific samples from the dataset, which are typically too scarce. In this work, we propose, DiffInject, a straightforward yet powerful method to augment synthetic bias-conflict samples using a pretrained diffusion model. This approach significantly advances the use of diffusion models for debiasing purposes by manipulating the latent space. Our framework does not require any explicit knowledge of the bias types or labelling, making it a fully unsupervised setting for debiasing. Our methodology demonstrates substantial result in effectively reducing dataset bias.

Updated: 2024-06-10 09:45:38

标题: DiffInject：通过基于扩散的风格注入重新审视去偏见的合成数据生成

摘要: 数据集偏差是机器学习中的一个重要挑战，特定属性，如图像的纹理或颜色，被无意中学习，导致性能下降。为了解决这个问题，先前的工作集中在通过开发新的去偏算法或生成合成数据来减轻普遍存在的数据集偏差。然而，迄今为止的生成方法主要依赖于使用数据集中特定偏差的样本，这些样本通常太少。在这项工作中，我们提出了DiffInject，一种简单而强大的方法，使用预训练扩散模型来增强合成的偏差冲突样本。这种方法通过操纵潜在空间显著促进了扩散模型用于去偏的目的。我们的框架不需要对偏差类型或标记有任何明确的知识，使其成为一个完全无监督的去偏设置。我们的方法论展示了在有效减少数据集偏差方面的显著结果。

更新时间: 2024-06-10 09:45:38

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2406.06134v1

Fréchet Wavelet Distance: A Domain-Agnostic Metric for Image Generation

Modern metrics for generative learning like Fr\'echet Inception Distance (FID) demonstrate impressive performance. However, they suffer from various shortcomings, like a bias towards specific generators and datasets. To address this problem, we propose the Fr\'echet Wavelet Distance (FWD) as a domain-agnostic metric based on Wavelet Packet Transform ($W_p$). FWD provides a sight across a broad spectrum of frequencies in images with a high resolution, along with preserving both spatial and textural aspects. Specifically, we use Wp to project generated and dataset images to packet coefficient space. Further, we compute Fr\'echet distance with the resultant coefficients to evaluate the quality of a generator. This metric is general-purpose and dataset-domain agnostic, as it does not rely on any pre-trained network while being more interpretable because of frequency band transparency. We conclude with an extensive evaluation of a wide variety of generators across various datasets that the proposed FWD is able to generalize and improve robustness to domain shift and various corruptions compared to other metrics.

Updated: 2024-06-10 09:45:32

标题: 弗雷歇小波距离：一种面向领域的图像生成度量

摘要: 像Fr\'echet Inception Distance（FID）这样的生成式学习的现代度量标准展现出了令人印象深刻的性能。然而，它们存在各种缺陷，比如对特定生成器和数据集的偏见。为了解决这个问题，我们提出了基于小波包变换（$W_p$）的领域无关度量Fr\'echet Wavelet Distance（FWD）。FWD能够在高分辨率图像中跨越广泛频率范围，同时保留空间和纹理方面。具体来说，我们使用Wp将生成的和数据集图像投影到包系数空间。此外，我们计算Fr\'echet距离与结果系数以评估生成器的质量。这个度量标准是通用的，并且不依赖于任何预训练网络，同时由于频带透明性更具可解释性。最后，我们通过对各种生成器在不同数据集上的广泛评估得出结论，即所提出的FWD能够相对于其他度量泛化和提高对领域转移和各种破坏的稳健性。

更新时间: 2024-06-10 09:45:32

领域: cs.CV,cs.LG,eess.IV

下载: http://arxiv.org/abs/2312.15289v2

Comparing Data Augmentation Methods for End-to-End Task-Oriented Dialog Systems

Creating effective and reliable task-oriented dialog systems (ToDSs) is challenging, not only because of the complex structure of these systems, but also due to the scarcity of training data, especially when several modules need to be trained separately, each one with its own input/output training examples. Data augmentation (DA), whereby synthetic training examples are added to the training data, has been successful in other NLP systems, but has not been explored as extensively in ToDSs. We empirically evaluate the effectiveness of DA methods in an end-to-end ToDS setting, where a single system is trained to handle all processing stages, from user inputs to system outputs. We experiment with two ToDSs (UBAR, GALAXY) on two datasets (MultiWOZ, KVRET). We consider three types of DA methods (word-level, sentence-level, dialog-level), comparing eight DA methods that have shown promising results in ToDSs and other NLP systems. We show that all DA methods considered are beneficial, and we highlight the best ones, also providing advice to practitioners. We also introduce a more challenging few-shot cross-domain ToDS setting, reaching similar conclusions.

Updated: 2024-06-10 09:36:05

标题: 比较端到端任务导向对话系统的数据增强方法

摘要: 创建有效可靠的面向任务的对话系统（ToDS）具有挑战性，不仅因为这些系统的复杂结构，而且由于训练数据的稀缺，特别是当需要单独训练多个模块时，每个模块都有自己的输入/输出训练示例。数据增强（DA）通过向训练数据添加合成训练示例，在其他自然语言处理系统中取得了成功，但在ToDS中尚未得到广泛探索。我们在端到端ToDS设置中实证评估了DA方法的有效性，即训练单个系统处理所有处理阶段，从用户输入到系统输出。我们在两个数据集（MultiWOZ、KVRET）上尝试了两个ToDS（UBAR、GALAXY）。我们考虑三种类型的DA方法（单词级、句子级、对话级），比较了在ToDS和其他自然语言处理系统中显示出有希望结果的八种DA方法。我们表明所有考虑的DA方法都是有益的，并突出了最好的方法，同时为从业者提供建议。我们还引入了一个更具挑战性的少样本跨域ToDS设置，并得出类似的结论。

更新时间: 2024-06-10 09:36:05

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.06127v1

VNN: Verification-Friendly Neural Networks with Hard Robustness Guarantees

Machine learning techniques often lack formal correctness guarantees, evidenced by the widespread adversarial examples that plague most deep-learning applications. This lack of formal guarantees resulted in several research efforts that aim at verifying Deep Neural Networks (DNNs), with a particular focus on safety-critical applications. However, formal verification techniques still face major scalability and precision challenges. The over-approximation introduced during the formal verification process to tackle the scalability challenge often results in inconclusive analysis. To address this challenge, we propose a novel framework to generate Verification-Friendly Neural Networks (VNNs). We present a post-training optimization framework to achieve a balance between preserving prediction performance and verification-friendliness. Our proposed framework results in VNNs that are comparable to the original DNNs in terms of prediction performance, while amenable to formal verification techniques. This essentially enables us to establish robustness for more VNNs than their DNN counterparts, in a time-efficient manner.

Updated: 2024-06-10 09:35:57

标题: VNN: 具有硬性鲁棒性保证的验证友好型神经网络

摘要: 机器学习技术通常缺乏正式的正确性保证，这一点可以从困扰大多数深度学习应用程序的广泛对抗性示例得到证实。这种缺乏正式保证导致了几项研究工作，旨在验证深度神经网络（DNNs），特别关注安全关键应用。然而，正式验证技术仍然面临重大的可扩展性和精确性挑战。在正式验证过程中引入的过度近似通常导致分析结果不确定。为了解决这一挑战，我们提出了一个新颖的框架，用于生成适合验证的神经网络（VNNs）。我们提出了一个后期优化框架，以实现在保持预测性能和易于验证性之间的平衡。我们提出的框架产生的VNNs在预测性能方面与原始DNNs相当，同时适用于正式验证技术。这本质上使我们能够以时间有效的方式为更多VNNs建立稳健性，而不是它们的DNN对应物。

更新时间: 2024-06-10 09:35:57

领域: cs.LG,cs.SE

下载: http://arxiv.org/abs/2312.09748v2

Theoretical Guarantees for Variational Inference with Fixed-Variance Mixture of Gaussians

Variational inference (VI) is a popular approach in Bayesian inference, that looks for the best approximation of the posterior distribution within a parametric family, minimizing a loss that is typically the (reverse) Kullback-Leibler (KL) divergence. Despite its empirical success, the theoretical properties of VI have only received attention recently, and mostly when the parametric family is the one of Gaussians. This work aims to contribute to the theoretical study of VI in the non-Gaussian case by investigating the setting of Mixture of Gaussians with fixed covariance and constant weights. In this view, VI over this specific family can be casted as the minimization of a Mollified relative entropy, i.e. the KL between the convolution (with respect to a Gaussian kernel) of an atomic measure supported on Diracs, and the target distribution. The support of the atomic measure corresponds to the localization of the Gaussian components. Hence, solving variational inference becomes equivalent to optimizing the positions of the Diracs (the particles), which can be done through gradient descent and takes the form of an interacting particle system. We study two sources of error of variational inference in this context when optimizing the mollified relative entropy. The first one is an optimization result, that is a descent lemma establishing that the algorithm decreases the objective at each iteration. The second one is an approximation error, that upper bounds the objective between an optimal finite mixture and the target distribution.

Updated: 2024-06-10 09:32:49

标题: 固定方差高斯混合变分推断的理论保证

摘要: 变分推断（VI）是贝叶斯推断中一种流行的方法，它寻求在参数族内寻找后验分布的最佳逼近，最小化的损失通常是（逆）Kullback-Leibler（KL）散度。尽管它在实证上取得了成功，但变分推断的理论性质直到最近才受到关注，而且主要是在参数族为高斯分布时。本文旨在通过研究具有固定协方差和恒定权重的高斯混合的情况，为非高斯情况下的变分推断的理论研究做出贡献。在这种观点下，对于这个特定族的变分推断可以被看作是一个Mollified相对熵的最小化，即在Diracs上支持的原子测度与目标分布之间的KL卷积（关于高斯核）。原子测度的支持对应于高斯分量的定位。因此，解决变分推断等同于优化Diracs的位置（颗粒），可以通过梯度下降来实现，并且采取了一个相互作用的颗粒系统的形式。在这种情况下，我们研究了变分推断优化Mollified相对熵时的两种误差来源。第一个是一个优化结果，即一个下降引理，建立算法在每次迭代时减少目标函数。第二个是一个近似误差，它限制了目标函数在最优有限混合和目标分布之间的上限。

更新时间: 2024-06-10 09:32:49

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2406.04012v2

From Alexnet to Transformers: Measuring the Non-linearity of Deep Neural Networks with Affine Optimal Transport

In the last decade, we have witnessed the introduction of several novel deep neural network (DNN) architectures exhibiting ever-increasing performance across diverse tasks. Explaining the upward trend of their performance, however, remains difficult as different DNN architectures of comparable depth and width -- common factors associated with their expressive power -- may exhibit a drastically different performance even when trained on the same dataset. In this paper, we introduce the concept of the non-linearity signature of DNN, the first theoretically sound solution for approximately measuring the non-linearity of deep neural networks. Built upon a score derived from closed-form optimal transport mappings, this signature provides a better understanding of the inner workings of a wide range of DNN architectures and learning paradigms, with a particular emphasis on the computer vision task. We provide extensive experimental results that highlight the practical usefulness of the proposed non-linearity signature and its potential for long-reaching implications. The code for our work is available at https://github.com/qbouniot/AffScoreDeep

Updated: 2024-06-10 09:29:21

标题: 从Alexnet到Transformers：用仿射优化传输测量深度神经网络的非线性

摘要: 在过去的十年中，我们目睹了几种新颖的深度神经网络（DNN）架构的引入，这些架构在各种任务中的性能不断提高。然而，解释它们性能上升的趋势仍然很困难，因为即使在相同数据集上训练的深度和宽度相当的不同DNN架构，也可能表现出截然不同的性能。在本文中，我们介绍了DNN的非线性签名概念，这是第一个理论上合理的解决方案，用于近似测量深度神经网络的非线性。基于从封闭形式最优传输映射导出的得分，这个签名提供了对各种DNN架构和学习范式内部工作原理的更好理解，特别强调计算机视觉任务。我们提供了广泛的实验结果，突显了所提出的非线性签名的实用性及其可能具有深远影响的潜力。我们的工作代码可在https://github.com/qbouniot/AffScoreDeep 上找到。

更新时间: 2024-06-10 09:29:21

领域: cs.LG,cs.AI,stat.ML

下载: http://arxiv.org/abs/2310.11439v2

Enhancing Long-Term Memory using Hierarchical Aggregate Tree for Retrieval Augmented Generation

Large language models have limited context capacity, hindering reasoning over long conversations. We propose the Hierarchical Aggregate Tree memory structure to recursively aggregate relevant dialogue context through conditional tree traversals. HAT encapsulates information from children nodes, enabling broad coverage with depth control. We formulate finding best context as optimal tree traversal. Experiments show HAT improves dialog coherence and summary quality over baseline contexts, demonstrating the techniques effectiveness for multi turn reasoning without exponential parameter growth. This memory augmentation enables more consistent, grounded longform conversations from LLMs

Updated: 2024-06-10 09:29:08

标题: 利用分层聚合树增强检索增强生成的长期记忆

摘要: 大型语言模型具有有限的上下文容量，阻碍了对长对话的推理。我们提出了分层聚合树记忆结构，通过条件树遍历递归地聚合相关对话上下文。HAT封装了来自子节点的信息，实现了广泛覆盖和深度控制。我们将找到最佳上下文的问题规定为最优树遍历。实验表明，HAT提高了对话的连贯性和摘要质量，证明了该技术对于多轮推理而言的有效性，而且不会导致参数呈指数增长。这种记忆增强使LLMs能够进行更一致、扎实的长篇对话。

更新时间: 2024-06-10 09:29:08

领域: cs.CL,cs.AI,I.2.7

下载: http://arxiv.org/abs/2406.06124v1

Training and Validating a Treatment Recommender with Partial Verification Evidence

Current clinical decision support systems (DSS) are trained and validated on observational data from the target clinic. This is problematic for treatments validated in a randomized clinical trial (RCT), but not yet introduced in any clinic. In this work, we report on a method for training and validating the DSS using the RCT data. The key challenges we address are of missingness -- missing rationale for treatment assignment (the assignment is at random), and missing verification evidence, since the effectiveness of a treatment for a patient can only be verified (ground truth) for treatments what were actually assigned to a patient. We use data from a multi-armed RCT that investigated the effectiveness of single- and combination- treatments for 240+ tinnitus patients recruited and treated in 5 clinical centers. To deal with the 'missing rationale' challenge, we re-model the target variable (outcome) in order to suppress the effect of the randomly-assigned treatment, and control on the effect of treatment in general. Our methods are also robust to missing values in features and with a small number of patients per RCT arm. We deal with 'missing verification evidence' by using counterfactual treatment verification, which compares the effectiveness of the DSS recommendations to the effectiveness of the RCT assignments when they are aligned v/s not aligned. We demonstrate that our approach leverages the RCT data for learning and verification, by showing that the DSS suggests treatments that improve the outcome. The results are limited through the small number of patients per treatment; while our ensemble is designed to mitigate this effect, the predictive performance of the methods is affected by the smallness of the data. We provide a basis for the establishment of decision supporting routines on treatments that have been tested in RCTs but have not yet been deployed clinically.

Updated: 2024-06-10 09:23:00

标题: 训练和验证具有部分验证证据的治疗推荐系统

摘要: 目前的临床决策支持系统（DSS）是在目标诊所的观察数据上进行训练和验证的。对于在随机临床试验（RCT）中验证过但尚未在任何诊所引入的治疗方法来说，这是有问题的。在这项工作中，我们报告了一种使用RCT数据进行DSS训练和验证的方法。我们解决的关键挑战是缺失性问题——治疗分配的缺失理由（分配是随机的），以及缺失验证证据，因为对于患者的治疗效果只能对实际分配给患者的治疗进行验证（地面真相）。我们使用了一项多臂RCT的数据，该研究调查了针对240多名耳鸣患者进行的单一和组合治疗的有效性，这些患者在5个临床中心进行了招募和治疗。为了应对“缺失理由”挑战，我们重新对目标变量（结果）进行建模，以抑制随机分配治疗的影响，并控制治疗的总体效果。我们的方法也能够应对特征值的缺失和每个RCT臂的患者数量较少的情况。我们通过使用对照治疗验证来处理“缺失验证证据”，该方法比较DSS建议的有效性与RCT分配的有效性，当它们一致或不一致时。我们展示了我们的方法利用RCT数据进行学习和验证，通过显示DSS建议的治疗方法可以改善结果。由于治疗每组患者数量较少，结果受到限制；虽然我们的集成设计旨在缓解这种影响，但方法的预测性能受到数据量的限制。我们为建立对在RCT中进行过测试但尚未在临床上使用的治疗方法的决策支持程序提供了基础。

更新时间: 2024-06-10 09:23:00

领域: cs.LG,stat.ME

下载: http://arxiv.org/abs/2406.06654v1

On-line conformalized neural networks ensembles for probabilistic forecasting of day-ahead electricity prices

Probabilistic electricity price forecasting (PEPF) is subject of increasing interest, following the demand for proper quantification of prediction uncertainty, to support the operation in complex power markets with increasing share of renewable generation. Distributional neural networks ensembles have been recently shown to outperform state of the art PEPF benchmarks. Still, they require critical reliability enhancements, as fail to pass the coverage tests at various steps on the prediction horizon. In this work, we propose a novel approach to PEPF, extending the state of the art neural networks ensembles based methods through conformal inference based techniques, deployed within an on-line recalibration procedure. Experiments have been conducted on multiple market regions, achieving day-ahead forecasts with improved hourly coverage and stable probabilistic scores.

Updated: 2024-06-10 09:13:29

标题: 在线一致性神经网络集成用于概率预测次日电力价格

摘要: 概率电价预测（PEPF）是一个越来越受关注的主题，随着对预测不确定性适当量化的需求增加，以支持在日益增加可再生能源的复杂电力市场中的运营。最近已经证明，分布式神经网络集合优于最先进的PEPF基准。然而，它们需要关键的可靠性增强，因为在预测的不同步骤上未能通过覆盖率测试。在这项工作中，我们提出了一种新颖的PEPF方法，通过在在线重新校准过程中部署基于符合推断的技术，扩展了最先进的神经网络集合方法。在多个市场区域进行了实验，实现了改进的次日预测，具有更好的小时覆盖率和稳定的概率得分。

更新时间: 2024-06-10 09:13:29

领域: cs.LG

下载: http://arxiv.org/abs/2404.02722v2

A Survey on Incomplete Multi-label Learning: Recent Advances and Future Trends

In reality, data often exhibit associations with multiple labels, making multi-label learning (MLL) become a prominent research topic. The last two decades have witnessed the success of MLL, which is indispensable from complete and accurate supervised information. However, obtaining such information in practice is always laborious and sometimes even impossible. To circumvent this dilemma, incomplete multi-label learning (InMLL) has emerged, aiming to learn from incomplete labeled data. To date, enormous InMLL works have been proposed to narrow the performance gap with complete MLL, whereas a systematic review for InMLL is still absent. In this paper, we not only attempt to fill the lacuna but also strive to pave the way for innovative research. Specifically, we retrospect the origin of InMLL, analyze the challenges of InMLL, and make a taxonomy of InMLL from the data-oriented and algorithm-oriented perspectives, respectively. Besides, we also present real applications of InMLL in various domains. More importantly, we highlight several potential future trends, including four open problems that are more in line with practice and three under-explored/unexplored techniques in addressing the challenges of InMLL, which may shed new light on developing novel research directions in the field of InMLL.

Updated: 2024-06-10 09:11:30

标题: 一份关于不完全多标签学习的调查：最新进展和未来趋势

摘要: 在现实中，数据经常展现出与多个标签相关的关联性，使得多标签学习（MLL）成为一个突出的研究主题。过去的二十年见证了MLL的成功，这是离不开完整和准确的监督信息的。然而，在实践中获得这样的信息通常是费力的，有时甚至是不可能的。为了避免这一困境，不完全的多标签学习（InMLL）应运而生，旨在从不完整的标记数据中学习。迄今为止，已经提出了大量的InMLL作品，以缩小与完整MLL的性能差距，但对InMLL的系统性评估仍然缺失。在本文中，我们不仅尝试填补这一空白，还努力为创新研究铺平道路。具体而言，我们回顾了InMLL的起源，分析了InMLL面临的挑战，并从数据导向和算法导向的角度分别对InMLL进行分类。此外，我们还展示了InMLL在各个领域的实际应用。更重要的是，我们强调了几个潜在的未来趋势，包括四个与实践更符合的开放问题，以及在解决InMLL挑战方面尚未被充分探索/未被探索的三种技术，这些可能为InMLL领域的新研究方向开辟新的视角。

更新时间: 2024-06-10 09:11:30

领域: cs.LG

下载: http://arxiv.org/abs/2406.06119v1

Towards Computational Performance Engineering for Unsupervised Concept Drift Detection -- Complexities, Benchmarking, Performance Analysis

Concept drift detection is crucial for many AI systems to ensure the system's reliability. These systems often have to deal with large amounts of data or react in real-time. Thus, drift detectors must meet computational requirements or constraints with a comprehensive performance evaluation. However, so far, the focus of developing drift detectors is on inference quality, e.g. accuracy, but not on computational performance, such as runtime. Many of the previous works consider computational performance only as a secondary objective and do not have a benchmark for such evaluation. Hence, we propose and explain performance engineering for unsupervised concept drift detection that reflects on computational complexities, benchmarking, and performance analysis. We provide the computational complexities of existing unsupervised drift detectors and discuss why further computational performance investigations are required. Hence, we state and substantiate the aspects of a benchmark for unsupervised drift detection reflecting on inference quality and computational performance. Furthermore, we demonstrate performance analysis practices that have proven their effectiveness in High-Performance Computing, by tracing two drift detectors and displaying their performance data.

Updated: 2024-06-10 09:09:15

标题: 朝向无监督概念漂移检测的计算性能工程 -- 复杂性、基准测试、性能分析

摘要: 概念漂移检测对于许多人工智能系统来说至关重要，以确保系统的可靠性。这些系统通常需要处理大量数据或实时反应。因此，漂移检测器必须满足计算要求或约束，并进行全面的性能评估。然而，到目前为止，开发漂移检测器的重点在于推理质量，例如准确性，而不是计算性能，比如运行时间。许多先前的工作只将计算性能视为次要目标，并没有针对这种评估制定基准。因此，我们提出并解释了反映在计算复杂性、基准测试和性能分析上的无监督概念漂移检测的性能工程。我们提供了现有无监督漂移检测器的计算复杂性，并讨论了为什么需要进一步进行计算性能研究。因此，我们阐明并证实了一个反映推理质量和计算性能的无监督漂移检测基准的方面。此外，我们展示了在高性能计算中证明有效的性能分析实践，通过跟踪两个漂移检测器并展示它们的性能数据。

更新时间: 2024-06-10 09:09:15

领域: cs.LG,cs.PF

下载: http://arxiv.org/abs/2304.08319v3

DKDL-Net: A Lightweight Bearing Fault Detection Model via Decoupled Knowledge Distillation and Low-Rank Adaptation Fine-tuning

Rolling bearing fault detection has developed rapidly in the field of fault diagnosis technology, and it occupies a very important position in this field. Deep learning-based bearing fault diagnosis models have achieved significant success. At the same time, with the continuous improvement of new signal processing technologies such as Fourier transform, wavelet transform and empirical mode decomposition, the fault diagnosis technology of rolling bearings has also been greatly developed, and it can be said that it has entered a new research stage. However, most of the existing methods are limited to varying degrees in the industrial field. The main ones are fast feature extraction and computational complexity. The key to this paper is to propose a lightweight bearing fault diagnosis model DKDL-Net to solve these challenges. The model is trained on the CWRU data set by decoupling knowledge distillation and low rank adaptive fine tuning. Specifically, we built and trained a teacher model based on a 6-layer neural network with 69,626 trainable parameters, and on this basis, using decoupling knowledge distillation (DKD) and Low-Rank adaptive (LoRA) fine-tuning, we trained the student sag model DKDL-Net, which has only 6838 parameters. Experiments show that DKDL-Net achieves 99.48\% accuracy in computational complexity on the test set while maintaining model performance, which is 0.58\% higher than the state-of-the-art (SOTA) model, and our model has lower parameters. Our code is available at Github link: https://github.com/SPBU-LiPengyi/DKDL-Net.git.

Updated: 2024-06-10 09:09:08

标题: DKDL-Net：一种轻量级轴承故障检测模型，通过解耦知识蒸馏和低秩适应微调实现

摘要: 滚动轴承故障检测在故障诊断技术领域发展迅速，并在该领域占据着非常重要的位置。基于深度学习的轴承故障诊断模型取得了显著成功。同时，随着傅里叶变换、小波变换和经验模态分解等新信号处理技术的不断改进，滚动轴承的故障诊断技术也得到了极大发展，可以说已经进入了新的研究阶段。然而，大多数现有方法在工业领域存在不同程度的限制，主要是特征提取速度快和计算复杂性。本文的关键在于提出了一种轻量级轴承故障诊断模型DKDL-Net来解决这些挑战。该模型在CWRU数据集上经过分离知识蒸馏和低秩自适应微调进行训练。具体来说，我们基于具有69,626个可训练参数的6层神经网络构建并训练了一个教师模型，然后在此基础上，使用分离知识蒸馏（DKD）和低秩自适应（LoRA）微调，训练出参数仅为6838的学生模型DKDL-Net。实验证明，DKDL-Net在测试集上的计算复杂度达到了99.48\%的准确率，同时保持了模型性能，比最先进的模型高出0.58\%，而且我们的模型参数更少。我们的代码可在Github链接https://github.com/SPBU-LiPengyi/DKDL-Net.git上找到。

更新时间: 2024-06-10 09:09:08

领域: cs.LG

下载: http://arxiv.org/abs/2406.06653v1

Knowledgeable Preference Alignment for LLMs in Domain-specific Question Answering

Deploying large language models (LLMs) to real scenarios for domain-specific question answering (QA) is a key thrust for LLM applications, which poses numerous challenges, especially in ensuring that responses are both accommodating to user requirements and appropriately leveraging domain-specific knowledge bases. They are the two major difficulties for LLM application as vanilla fine-tuning falls short of addressing. Combining these requirements, we conceive of them as the requirement for the model's preference to be harmoniously aligned with humans'. Thus, we introduce Knowledgeable Preference AlignmenT (KnowPAT), which constructs two kinds of preference sets to tackle the two issues. Besides, we design a new alignment objective to align the LLM preference with different human preferences uniformly, aiming to optimize LLM performance in real-world, domain-specific QA settings. Adequate experiments and comprehensive comparisons with 15 baseline methods illustrate that our KnowPAT is a superior pipeline for real-scenario domain-specific QA with LLMs.

Updated: 2024-06-10 09:06:10

标题: 领域特定问答中LLM的知识偏好对齐

摘要: 将大型语言模型（LLMs）部署到实际场景中进行特定领域的问题回答（QA）是LLM应用的关键推动力，这提出了许多挑战，特别是要确保响应既符合用户需求，又适当地利用特定领域的知识库。这是LLM应用的两个主要困难，因为普通的微调无法解决这些问题。结合这些要求，我们将其构想为模型优先级与人类之间的和谐一致性要求。因此，我们引入了知识偏好对齐（KnowPAT），它构建了两种偏好集来解决这两个问题。此外，我们设计了一个新的对齐目标，以统一地将LLM偏好与不同人类偏好对齐，旨在优化LLM在真实世界、特定领域的QA设置中的性能。充分的实验和与15种基线方法的全面比较表明，我们的KnowPAT是LLMs实际场景特定领域QA的优越管道。

更新时间: 2024-06-10 09:06:10

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2311.06503v3

Strokes2Surface: Recovering Curve Networks From 4D Architectural Design Sketches

We present Strokes2Surface, an offline geometry reconstruction pipeline that recovers well-connected curve networks from imprecise 4D sketches to bridge concept design and digital modeling stages in architectural design. The input to our pipeline consists of 3D strokes' polyline vertices and their timestamps as the 4th dimension, along with additional metadata recorded throughout sketching. Inspired by architectural sketching practices, our pipeline combines a classifier and two clustering models to achieve its goal. First, with a set of extracted hand-engineered features from the sketch, the classifier recognizes the type of individual strokes between those depicting boundaries (Shape strokes) and those depicting enclosed areas (Scribble strokes). Next, the two clustering models parse strokes of each type into distinct groups, each representing an individual edge or face of the intended architectural object. Curve networks are then formed through topology recovery of consolidated Shape clusters and surfaced using Scribble clusters guiding the cycle discovery. Our evaluation is threefold: We confirm the usability of the Strokes2Surface pipeline in architectural design use cases via a user study, we validate our choice of features via statistical analysis and ablation studies on our collected dataset, and we compare our outputs against a range of reconstructions computed using alternative methods.

Updated: 2024-06-10 09:04:11

标题: Strokes2Surface：从4D建筑设计草图中恢复曲线网络

摘要: 我们提出了Strokes2Surface，这是一个离线几何重建流程，可以从不精确的4D草图中恢复出良好连接的曲线网络，以在建筑设计的概念设计和数字建模阶段之间建立桥梁。我们的流程的输入包括3D笔画的折线顶点及其时间戳作为第四维度，以及在整个草图过程中记录的附加元数据。受建筑草图实践的启发，我们的流程结合了一个分类器和两个聚类模型来实现其目标。首先，通过从草图中提取的手工设计特征集，分类器识别出描述边界的笔画（形状笔画）和描述封闭区域的笔画（涂鸦笔画）之间的类型。接下来，两个聚类模型将每种类型的笔画解析为不同的组，每个组代表所需建筑物体的一个单独的边缘或面。通过拓扑恢复整合的形状组形成曲线网络，并使用涂鸦组指导循环发现进行表面化。我们的评估有三个方面：通过用户研究确认Strokes2Surface流程在建筑设计用例中的可用性，通过对我们收集的数据集进行统计分析和消融研究验证我们特征选择的合理性，并将我们的输出与使用替代方法计算的一系列重建进行比较。

更新时间: 2024-06-10 09:04:11

领域: cs.GR,cs.AI,cs.LG

下载: http://arxiv.org/abs/2306.07220v4

Improving Generalization of Neural Vehicle Routing Problem Solvers Through the Lens of Model Architecture

Neural models produce promising results when solving Vehicle Routing Problems (VRPs), but often fall short in generalization. Recent attempts to enhance model generalization often incur unnecessarily large training cost or cannot be directly applied to other models solving different VRP variants. To address these issues, we take a novel perspective on model architecture in this study. Specifically, we propose a plug-and-play Entropy-based Scaling Factor (ESF) and a Distribution-Specific (DS) decoder to enhance the size and distribution generalization, respectively. ESF adjusts the attention weight pattern of the model towards familiar ones discovered during training when solving VRPs of varying sizes. The DS decoder explicitly models VRPs of multiple training distribution patterns through multiple auxiliary light decoders, expanding the model representation space to encompass a broader range of distributional scenarios. We conduct extensive experiments on both synthetic and widely recognized real-world benchmarking datasets and compare the performance with seven baseline models. The results demonstrate the effectiveness of using ESF and DS decoder to obtain a more generalizable model and showcase their applicability to solve different VRP variants, i.e., travelling salesman problem and capacitated VRP. Notably, our proposed generic components require minimal computational resources, and can be effortlessly integrated into conventional generalization strategies to further elevate model generalization.

Updated: 2024-06-10 09:03:17

标题: 通过模型架构的视角改进神经车辆路径问题求解器的泛化能力

摘要: 神经模型在解决车辆路径问题（VRPs）时取得了令人期待的结果，但通常在泛化方面表现不佳。最近一些试图增强模型泛化能力的尝试往往会导致不必要的训练成本过大，或者无法直接应用于解决不同VRP变种的其他模型。为了解决这些问题，在本研究中我们采用了一种新颖的模型架构视角。具体地，我们提出了一种即插即用的基于熵的缩放因子（ESF）和一种特定分布（DS）解码器，分别用来增强大小和分布泛化性。ESF调整模型的注意力权重模式，使其朝向在解决不同规模的VRPs时发现的熟悉模式。DS解码器通过多个辅助轻量级解码器明确地建模多个训练分布模式的VRPs，扩展了模型表示空间，涵盖了更广泛的分布情景。我们在合成和广泛认可的真实世界基准数据集上进行了大量实验，并将性能与七个基线模型进行了比较。结果表明，使用ESF和DS解码器可以获得更具泛化性的模型，并展示它们解决不同VRP变种，即旅行推销员问题和容量限制VRP的适用性。值得注意的是，我们提出的通用组件需要最少的计算资源，并且可以轻松集成到传统的泛化策略中，进一步提升模型的泛化能力。

更新时间: 2024-06-10 09:03:17

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.06652v1

A Survey on Federated Unlearning: Challenges, Methods, and Future Directions

In recent years, the notion of ``the right to be forgotten" (RTBF) has become a crucial aspect of data privacy, requiring the provision of mechanisms that support the removal of personal data of individuals upon their requests. Consequently, given the extensive adoption of data-intensive machine learning (ML) algorithms and increasing concerns for personal data privacy protection, the concept of machine unlearning (MU) has gained considerable attention. MU empowers an ML model to selectively eliminate identifiable information. Evolving from the foundational principles of MU, federated unlearning (FU) has emerged to confront the challenge of data erasure within federated learning (FL) settings. This empowers the FL model to unlearn an FL client or identifiable information pertaining to the client. Nevertheless, unlike traditional MU, the distinctive attributes of federated learning introduce specific challenges for FU techniques. These challenges necessitate a tailored design when developing FU algorithms. While various concepts and numerous federated unlearning schemes exist in this field, the unified workflow and tailored design of FU are not yet well understood. Therefore, this comprehensive survey delves into the techniques, methodologies, and recent advancements in federated unlearning. It provides an overview of fundamental concepts and principles, evaluates existing federated unlearning algorithms, and reviews optimizations tailored to federated learning. Additionally, it discusses practical applications and assesses their limitations. Finally, it outlines promising directions for future research.

Updated: 2024-06-10 09:03:03

标题: 一项关于联邦式遗忘的调查：挑战、方法和未来方向

摘要: 近年来，“被遗忘的权利”（RTBF）的概念已成为数据隐私的一个关键方面，要求提供支持根据个人要求删除个人数据的机制。因此，鉴于数据密集型机器学习（ML）算法的广泛采用以及对个人数据隐私保护日益增加的关注，机器遗忘（MU）的概念引起了广泛关注。MU使ML模型能够有选择地消除可识别信息。从MU的基本原则发展而来，联邦遗忘（FU）已经出现，以应对联邦学习（FL）设置中数据消除的挑战。这使FL模型能够遗忘FL客户端或与客户端相关的可识别信息。然而，与传统的MU不同，联邦学习的独特属性为FU技术带来了特定挑战。这些挑战要求在开发FU算法时进行定制设计。尽管在这一领域存在各种概念和众多联邦遗忘方案，但联邦遗忘的统一工作流程和定制设计尚未被充分理解。因此，本综合调查深入探讨了联邦遗忘的技术、方法论和最新进展。它提供了基本概念和原则的概述，评估了现有的联邦遗忘算法，并审查了针对联邦学习量身定制的优化方法。此外，它讨论了实际应用，并评估了它们的局限性。最后，它概述了未来研究的有前途的方向。

更新时间: 2024-06-10 09:03:03

领域: cs.CR

下载: http://arxiv.org/abs/2310.20448v3

Short-Term Electricity Demand Forecasting of Dhaka City Using CNN with Stacked BiLSTM

The precise forecasting of electricity demand also referred to as load forecasting, is essential for both planning and managing a power system. It is crucial for many tasks, including choosing which power units to commit to, making plans for future power generation capacity, enhancing the power network, and controlling electricity consumption. As Bangladesh is a developing country, the electricity infrastructure is critical for economic growth and employment in this country. Accurate forecasting of electricity demand is crucial for ensuring that this country has a reliable and sustainable electricity supply to meet the needs of its growing population and economy. The complex and nonlinear behavior of such energy systems inhibits the creation of precise algorithms. Within this context, this paper aims to propose a hybrid model of Convolutional Neural Network (CNN) and stacked Bidirectional Long-short Term Memory (BiLSTM) architecture to perform an accurate short-term forecast of the electricity demand of Dhaka city. Short-term forecasting is ordinarily done to anticipate load for the following few hours to a few weeks. Normalization techniques have been also investigated because of the sensitivity of these models towards the input range. The proposed approach produced the best prediction results in comparison to the other benchmark models (LSTM, CNN- BiLSTM and CNN-LSTM) used in the study, with MAPE 1.64%, MSE 0.015, RMSE 0.122 and MAE 0.092. The result of the proposed model also outperformed some of the existing works on load-forecasting.

Updated: 2024-06-10 09:02:07

标题: 《使用堆叠BiLSTM的CNN进行达卡市短期电力需求预测》

摘要: 电力需求的精确预测，也称为负荷预测，对规划和管理电力系统至关重要。对于许多任务来说，包括选择哪些电力单元进行承诺、制定未来发电容量计划、增强电力网络和控制电力消耗，都是至关重要的。由于孟加拉国是一个发展中国家，电力基础设施对于该国的经济增长和就业至关重要。准确预测电力需求对确保该国拥有可靠和可持续的电力供应以满足不断增长的人口和经济需求至关重要。这种能源系统的复杂和非线性行为阻碍了精确算法的创建。在这种情况下，本文旨在提出一个卷积神经网络（CNN）和堆叠双向长短期记忆（BiLSTM）架构的混合模型，以实现对达卡市电力需求的准确短期预测。短期预测通常用于预测接下来几个小时到几周的负荷。由于这些模型对输入范围的敏感性，归一化技术也进行了研究。与在研究中使用的其他基准模型（LSTM、CNN-BiLSTM和CNN-LSTM）相比，所提出的方法产生了最佳的预测结果，MAPE为1.64％，MSE为0.015，RMSE为0.122，MAE为0.092。所提出模型的结果也优于一些现有的负荷预测工作。

更新时间: 2024-06-10 09:02:07

领域: cs.LG

下载: http://arxiv.org/abs/2406.06651v1

Are EEG-to-Text Models Working?

This work critically analyzes existing models for open-vocabulary EEG-to-Text translation. We identify a crucial limitation: previous studies often employed implicit teacher-forcing during evaluation, artificially inflating performance metrics. Additionally, they lacked a critical benchmark - comparing model performance on pure noise inputs. We propose a methodology to differentiate between models that truly learn from EEG signals and those that simply memorize training data. Our analysis reveals that model performance on noise data can be comparable to that on EEG data. These findings highlight the need for stricter evaluation practices in EEG-to-Text research, emphasizing transparent reporting and rigorous benchmarking with noise inputs. This approach will lead to more reliable assessments of model capabilities and pave the way for robust EEG-to-Text communication systems.

Updated: 2024-06-10 09:01:18

标题: 脑电图到文本模型有效吗？

摘要: 这项工作对现有的开放词汇量脑电图到文本翻译模型进行了批判性分析。我们确定了一个关键限制：先前的研究在评估过程中通常采用隐式教师强迫，人为地夸大了性能指标。此外，它们缺乏一个关键的基准 - 比较模型在纯噪声输入上的性能。我们提出了一种方法来区分那些真正从脑电信号中学习的模型和那些只是记忆训练数据的模型。我们的分析表明，模型在噪声数据上的性能可以与在脑电数据上的性能相媲美。这些发现凸显了在脑电图到文本研究中需要更严格的评估实践，强调透明的报告和用噪声输入进行严格基准测试。这种方法将带来对模型能力的更可靠评估，并为稳健的脑电图到文本通信系统铺平道路。

更新时间: 2024-06-10 09:01:18

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2405.06459v2

Topological Expressivity of ReLU Neural Networks

We study the expressivity of ReLU neural networks in the setting of a binary classification problem from a topological perspective. Recently, empirical studies showed that neural networks operate by changing topology, transforming a topologically complicated data set into a topologically simpler one as it passes through the layers. This topological simplification has been measured by Betti numbers, which are algebraic invariants of a topological space. We use the same measure to establish lower and upper bounds on the topological simplification a ReLU neural network can achieve with a given architecture. We therefore contribute to a better understanding of the expressivity of ReLU neural networks in the context of binary classification problems by shedding light on their ability to capture the underlying topological structure of the data. In particular the results show that deep ReLU neural networks are exponentially more powerful than shallow ones in terms of topological simplification. This provides a mathematically rigorous explanation why deeper networks are better equipped to handle complex and topologically rich data sets.

Updated: 2024-06-10 08:58:42

标题: ReLU神经网络的拓扑表达能力

摘要: 我们从拓扑的角度研究了ReLU神经网络在二元分类问题中的表现能力。最近的实证研究表明，神经网络通过改变拓扑结构来运作，将一个拓扑复杂的数据集转化为一个拓扑简单的数据集。这种拓扑简化可以通过贝蒂数来衡量，它是拓扑空间的代数不变量。我们使用相同的度量标准来建立ReLU神经网络在给定架构下可以实现的拓扑简化的下限和上限。因此，我们通过揭示ReLU神经网络捕捉数据的潜在拓扑结构的能力，为更好地理解ReLU神经网络在二元分类问题中的表现能力做出贡献。特别是，结果表明，深层ReLU神经网络在拓扑简化方面比浅层网络具有指数级的优势。这提供了一个数学严谨的解释，说明为什么深度网络更适合处理复杂和拓扑丰富的数据集。

更新时间: 2024-06-10 08:58:42

领域: cs.LG

下载: http://arxiv.org/abs/2310.11130v2

JenGAN: Stacked Shifted Filters in GAN-Based Speech Synthesis

Non-autoregressive GAN-based neural vocoders are widely used due to their fast inference speed and high perceptual quality. However, they often suffer from audible artifacts such as tonal artifacts in their generated results. Therefore, we propose JenGAN, a new training strategy that involves stacking shifted low-pass filters to ensure the shift-equivariant property. This method helps prevent aliasing and reduce artifacts while preserving the model structure used during inference. In our experimental evaluation, JenGAN consistently enhances the performance of vocoder models, yielding significantly superior scores across the majority of evaluation metrics.

Updated: 2024-06-10 08:51:04

标题: JenGAN: 基于GAN的语音合成中的堆叠移位滤波器

摘要: 非自回归基于GAN的神经声码器因其快速推理速度和高感知质量而被广泛使用。然而，它们经常受到可听的伪影的困扰，如其生成结果中的音调伪影。因此，我们提出了JenGAN，一种新的训练策略，其中包括堆叠的低通滤波器，以确保移位等变性质。这种方法有助于防止混叠和减少伪影，同时保留了推理过程中使用的模型结构。在我们的实验评估中，JenGAN持续提升声码器模型的性能，在大多数评估指标上获得显著优异的分数。

更新时间: 2024-06-10 08:51:04

领域: eess.AS,cs.AI,cs.SD,eess.SP

下载: http://arxiv.org/abs/2406.06111v1

Recurrent Context Compression: Efficiently Expanding the Context Window of LLM

To extend the context length of Transformer-based large language models (LLMs) and improve comprehension capabilities, we often face limitations due to computational resources and bounded memory storage capacity. This work introduces a method called Recurrent Context Compression (RCC), designed to efficiently expand the context window length of LLMs within constrained storage space. We also investigate the issue of poor model responses when both instructions and context are compressed in downstream tasks, and propose an instruction reconstruction method to mitigate this problem. We validated the effectiveness of our approach on multiple tasks, achieving a compression rate of up to 32x on text reconstruction tasks with a BLEU4 score close to 0.95, and nearly 100\% accuracy on a passkey retrieval task with a sequence length of 1M. Finally, our method demonstrated competitive performance in long-text question-answering tasks compared to non-compressed methods, while significantly saving storage resources in long-text inference tasks. Our code, models, and demo are available at https://github.com/WUHU-G/RCC_Transformer

Updated: 2024-06-10 08:50:59

标题: 循环上下文压缩：高效扩展LLM的上下文窗口

摘要: 为了扩展基于Transformer的大型语言模型（LLMs）的上下文长度并提高理解能力，我们经常面临由于计算资源和有限的内存存储容量而受限的局限性。本文介绍了一种称为Recurrent Context Compression（RCC）的方法，旨在在受限存储空间内有效扩展LLMs的上下文窗口长度。我们还研究了在下游任务中压缩指令和上下文时模型响应不佳的问题，并提出了一种指令重建方法来缓解这个问题。我们在多个任务上验证了我们方法的有效性，实现了在文本重建任务上的最高32倍的压缩率，BLEU4分数接近0.95，并在一个具有100万序列长度的密码检索任务上实现了近100\%的准确率。最后，我们的方法在长文本问答任务中表现出与未压缩方法相比的竞争性能，并在长文本推理任务中显著节省了存储资源。我们的代码、模型和演示可在https://github.com/WUHU-G/RCC_Transformer上找到。

更新时间: 2024-06-10 08:50:59

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.06110v1

EXPIL: Explanatory Predicate Invention for Learning in Games

Reinforcement learning (RL) has proven to be a powerful tool for training agents that excel in various games. However, the black-box nature of neural network models often hinders our ability to understand the reasoning behind the agent's actions. Recent research has attempted to address this issue by using the guidance of pretrained neural agents to encode logic-based policies, allowing for interpretable decisions. A drawback of such approaches is the requirement of large amounts of predefined background knowledge in the form of predicates, limiting its applicability and scalability. In this work, we propose a novel approach, Explanatory Predicate Invention for Learning in Games (EXPIL), that identifies and extracts predicates from a pretrained neural agent, later used in the logic-based agents, reducing the dependency on predefined background knowledge. Our experimental evaluation on various games demonstrate the effectiveness of EXPIL in achieving explainable behavior in logic agents while requiring less background knowledge.

Updated: 2024-06-10 08:46:49

标题: EXPIL：游戏学习中的解释性谓词创造

摘要: 强化学习（RL）已被证明是训练在各种游戏中表现优异的代理的强大工具。然而，神经网络模型的黑盒特性通常阻碍了我们理解代理行为背后推理的能力。最近的研究尝试通过利用预训练的神经代理的指导来编码基于逻辑的策略来解决这个问题，从而实现可解释的决策。这种方法的一个缺点是需要大量预定义的背景知识，以谓词的形式，限制了其适用性和可扩展性。在这项工作中，我们提出了一种新颖的方法，即游戏中的解释性谓词发明（EXPIL），它从预训练的神经代理中识别和提取谓词，然后用于基于逻辑的代理中，减少对预定义背景知识的依赖。我们在各种游戏上的实验评估显示，EXPIL在实现逻辑代理中可解释行为方面的有效性，同时需要更少的背景知识。

更新时间: 2024-06-10 08:46:49

领域: cs.AI

下载: http://arxiv.org/abs/2406.06107v1

MolTC: Towards Molecular Relational Modeling In Language Models

Molecular Relational Learning (MRL), aiming to understand interactions between molecular pairs, plays a pivotal role in advancing biochemical research. Recently, the adoption of large language models (LLMs), known for their vast knowledge repositories and advanced logical inference capabilities, has emerged as a promising way for efficient and effective MRL. Despite their potential, these methods predominantly rely on the textual data, thus not fully harnessing the wealth of structural information inherent in molecular graphs. Moreover, the absence of a unified framework exacerbates the issue of information underutilization, as it hinders the sharing of interaction mechanism learned across diverse datasets. To address these challenges, this work proposes a novel LLM-based multi-modal framework for Molecular inTeraction prediction following Chain-of-Thought (CoT) theory, termed MolTC, which effectively integrate graphical information of two molecules in pair. To train MolTC efficiently, we introduce a Multi-hierarchical CoT concept to refine its training paradigm, and conduct a comprehensive Molecular Interactive Instructions dataset for the development of biochemical LLMs involving MRL. Our experiments, conducted across various datasets involving over 4,000,000 molecular pairs, exhibit the superiority of our method over current GNN and LLM-based baselines. Code is available at https://github.com/MangoKiller/MolTC.

Updated: 2024-06-10 08:45:51

标题: MolTC: 迈向语言模型中的分子关系建模

摘要: 分子关系学习（MRL）旨在理解分子对之间的相互作用，在推进生化研究方面发挥着关键作用。最近，采用大型语言模型（LLMs），以其庞大的知识存储库和先进的逻辑推理能力而闻名，已成为高效和有效的MRL的一种有前途的方式。尽管它们具有潜力，但这些方法主要依赖于文本数据，因此没有充分利用分子图中固有的丰富结构信息。此外，缺乏统一框架加剧了信息未充分利用的问题，因为它阻碍了在不同数据集之间共享学习的相互作用机制。为了解决这些挑战，本文提出了一种基于LLM的多模态框架，用于遵循Chain-of-Thought（CoT）理论的分子相互作用预测，称为MolTC，它有效地整合了成对两种分子的图形信息。为了有效训练MolTC，我们引入了一个多层次的CoT概念来完善其训练范式，并为涉及MRL的生化LLMs的开发提供了一个全面的分子交互指令数据集。我们的实验跨越各种数据集，涉及超过400万个分子对，展示了我们的方法优于当前基于GNN和LLM的基线方法。代码可在https://github.com/MangoKiller/MolTC 上找到。

更新时间: 2024-06-10 08:45:51

领域: q-bio.QM,cs.AI,cs.LG

下载: http://arxiv.org/abs/2402.03781v6

Testably Learning Polynomial Threshold Functions

Rubinfeld & Vasilyan recently introduced the framework of testable learning as an extension of the classical agnostic model. It relaxes distributional assumptions which are difficult to verify by conditions that can be checked efficiently by a tester. The tester has to accept whenever the data truly satisfies the original assumptions, and the learner has to succeed whenever the tester accepts. We focus on the setting where the tester has to accept standard Gaussian data. There, it is known that basic concept classes such as halfspaces can be learned testably with the same time complexity as in the (distribution-specific) agnostic model. In this work, we ask whether there is a price to pay for testably learning more complex concept classes. In particular, we consider polynomial threshold functions (PTFs), which naturally generalize halfspaces. We show that PTFs of arbitrary constant degree can be testably learned up to excess error $\varepsilon > 0$ in time $n^{\mathrm{poly}(1/\varepsilon)}$. This qualitatively matches the best known guarantees in the agnostic model. Our results build on a connection between testable learning and fooling. In particular, we show that distributions that approximately match at least $\mathrm{poly}(1/\varepsilon)$ moments of the standard Gaussian fool constant-degree PTFs (up to error $\varepsilon$). As a secondary result, we prove that a direct approach to show testable learning (without fooling), which was successfully used for halfspaces, cannot work for PTFs.

Updated: 2024-06-10 08:42:48

标题: 可以翻译为：可测学习多项式阈值函数

摘要: Rubinfeld＆Vasilyan最近引入了可检测学习的框架，作为经典不可知模型的扩展。它通过可以被测试者有效检查的条件放宽了难以验证的分布假设。测试者必须在数据真正满足原始假设时接受，学习者必须在测试者接受时成功。我们关注的设置是测试者必须接受标准高斯数据。在那里，人们知道基本概念类别，如半空间，可以以与（特定分布）不可知模型相同的时间复杂度被可检测地学习。在这项工作中，我们询问是否有一个价格可以为更复杂的概念类别的可检测学习付出。特别是，我们考虑多项式阈值函数（PTFs），自然地概括了半空间。我们展示了任意常数次幂的PTFs可以在时间$n^{\mathrm{poly}(1/\varepsilon)}$内被可检测地学习，直到多余误差$\varepsilon > 0$。这在定性上与不可知模型中已知的最佳保证相匹配。我们的结果建立在可检测学习与愚弄之间的联系上。特别是，我们展示了近似匹配标准高斯分布至少$\mathrm{poly}(1/\varepsilon)$时刻的分布愚弄常数次幂PTFs（直到误差$\varepsilon$）。作为次要结果，我们证明了用于半空间成功的直接方法不能适用于PTFs的可检测学习。

更新时间: 2024-06-10 08:42:48

领域: cs.LG,cs.DS

下载: http://arxiv.org/abs/2406.06106v1

Adaptive Control in Assistive Application -- A Study Evaluating Shared Control by Users with Limited Upper Limb Mobility

Shared control in assistive robotics blends human autonomy with computer assistance, thus simplifying complex tasks for individuals with physical impairments. This study assesses an adaptive Degrees of Freedom control method specifically tailored for individuals with upper limb impairments. It employs a between-subjects analysis with 24 participants, conducting 81 trials across three distinct input devices in a realistic everyday-task setting. Given the diverse capabilities of the vulnerable target demographic and the known challenges in statistical comparisons due to individual differences, the study focuses primarily on subjective qualitative data. The results reveal consistently high success rates in trial completions, irrespective of the input device used. Participants appreciated their involvement in the research process, displayed a positive outlook, and quick adaptability to the control system. Notably, each participant effectively managed the given task within a short time frame.

Updated: 2024-06-10 08:36:55

标题: 辅助应用中的自适应控制--一项评估受限上肢运动能力用户共享控制的研究

摘要: 在辅助机器人中的共享控制将人类自治与计算机辅助融合在一起，从而简化了对身体功能障碍个体复杂任务的完成。本研究评估了一种专门为上肢功能障碍个体量身定制的自适应自由度控制方法。该研究采用了24名参与者的组间分析，在一个真实的日常任务环境中通过三种不同的输入设备进行了81次试验。考虑到脆弱目标人群的多样能力和由于个体差异而导致的统计比较挑战，该研究主要关注主观定性数据。结果显示，无论使用哪种输入设备，试验完成的成功率一直很高。参与者对他们在研究过程中的参与感到满意，表现出积极的态度，并且迅速适应了控制系统。值得注意的是，每位参与者都在短时间内有效地完成了所给定的任务。

更新时间: 2024-06-10 08:36:55

领域: cs.HC,cs.AI,cs.RO

下载: http://arxiv.org/abs/2406.06103v1

On the Consistency of Kernel Methods with Dependent Observations

The consistency of a learning method is usually established under the assumption that the observations are a realization of an independent and identically distributed (i.i.d.) or mixing process. Yet, kernel methods such as support vector machines (SVMs), Gaussian processes, or conditional kernel mean embeddings (CKMEs) all give excellent performance under sampling schemes that are obviously non-i.i.d., such as when data comes from a dynamical system. We propose the new notion of empirical weak convergence (EWC) as a general assumption explaining such phenomena for kernel methods. It assumes the existence of a random asymptotic data distribution and is a strict weakening of previous assumptions in the field. Our main results then establish consistency of SVMs, kernel mean embeddings, and general Hilbert-space valued empirical expectations with EWC data. Our analysis holds for both finite- and infinite-dimensional outputs, as we extend classical results of statistical learning to the latter case. In particular, it is also applicable to CKMEs. Overall, our results open new classes of processes to statistical learning and can serve as a foundation for a theory of learning beyond i.i.d. and mixing.

Updated: 2024-06-10 08:35:01

标题: 关于具有相关观测数据的核方法的一致性

摘要: 通常情况下，学习方法的一致性是建立在观察结果是独立同分布（i.i.d.）或混合过程的实现的假设下。然而，像支持向量机（SVM）、高斯过程或条件核均值嵌入（CKME）这样的核方法在明显非i.i.d.的抽样方案下表现出色，比如当数据来自动力系统时。我们提出了经验弱收敛（EWC）的新概念，作为解释核方法中这种现象的一般假设。它假设了随机渐近数据分布的存在，并且是该领域先前假设的严格削弱。然后，我们的主要结果建立了SVM、核均值嵌入和一般希尔伯特空间值的经验期望与EWC数据的一致性。我们的分析适用于有限和无限维输出，因为我们将统计学习的经典结果扩展到后者的情况。特别地，它也适用于CKME。总体而言，我们的结果为统计学习打开了新的过程类别，并可以作为超越i.i.d.和混合学习理论的基础。

更新时间: 2024-06-10 08:35:01

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2406.06101v1

Sequential Binary Classification for Intrusion Detection in Software Defined Networks

Software-Defined Networks (SDN) are the standard architecture for network deployment. Intrusion Detection Systems (IDS) are a pivotal part of this technology as networks become more vulnerable to new and sophisticated attacks. Machine Learning (ML)-based IDS are increasingly seen as the most effective approach to handle this issue. However, IDS datasets suffer from high class imbalance, which impacts the performance of standard ML models. We propose Sequential Binary Classification (SBC) - an algorithm for multi-class classification to address this issue. SBC is a hierarchical cascade of base classifiers, each of which can be modelled on any general binary classifier. Extensive experiments are reported on benchmark datasets that evaluate the performance of SBC under different scenarios.

Updated: 2024-06-10 08:34:13

标题: 在软件定义网络中用于入侵检测的顺序二进制分类

摘要: 软件定义网络（SDN）是网络部署的标准架构。入侵检测系统（IDS）是这项技术的关键部分，因为网络越来越容易受到新的和复杂的攻击。基于机器学习（ML）的IDS越来越被认为是处理这个问题的最有效方法。然而，IDS数据集存在高类别不平衡问题，影响了标准ML模型的性能。我们提出了一种用于多类分类的Sequential Binary Classification（SBC）算法，以解决这个问题。SBC是一种基于基分类器的层级级联，每个基分类器可以基于任何通用的二元分类器进行建模。我们对基准数据集进行了大量实验，评估了SBC在不同场景下的性能。

更新时间: 2024-06-10 08:34:13

领域: cs.CR,cs.LG,cs.NI

下载: http://arxiv.org/abs/2406.06099v1

StreamAtt: Direct Streaming Speech-to-Text Translation with Attention-based Audio History Selection

Streaming speech-to-text translation (StreamST) is the task of automatically translating speech while incrementally receiving an audio stream. Unlike simultaneous ST (SimulST), which deals with pre-segmented speech, StreamST faces the challenges of handling continuous and unbounded audio streams. This requires additional decisions about what to retain of the previous history, which is impractical to keep entirely due to latency and computational constraints. Despite the real-world demand for real-time ST, research on streaming translation remains limited, with existing works solely focusing on SimulST. To fill this gap, we introduce StreamAtt, the first StreamST policy, and propose StreamLAAL, the first StreamST latency metric designed to be comparable with existing metrics for SimulST. Extensive experiments across all 8 languages of MuST-C v1.0 show the effectiveness of StreamAtt compared to a naive streaming baseline and the related state-of-the-art SimulST policy, providing a first step in StreamST research.

Updated: 2024-06-10 08:27:58

标题: StreamAtt：基于注意力机制的音频历史选择的直接流式语音转文本翻译

摘要: 流式语音到文本翻译（StreamST）是指在接收音频流的同时自动翻译语音的任务。与同时翻译（SimulST）处理预分段语音不同，StreamST面临着处理连续和无限音频流的挑战。这需要关于保留先前历史的决策，由于延迟和计算约束，完全保留先前历史是不切实际的。尽管实时ST的实际需求很大，但针对流式翻译的研究仍然有限，现有研究仅专注于SimulST。为填补这一空白，我们引入了StreamAtt，第一个StreamST策略，并提出了StreamLAAL，第一个旨在与SimulST现有指标相媲美的StreamST延迟度量。对MuST-C v1.0的所有8种语言进行的广泛实验显示，与天真的流式基线和相关的SimulST最新策略相比，StreamAtt的有效性，为StreamST研究迈出了第一步。

更新时间: 2024-06-10 08:27:58

领域: cs.SD,cs.AI,cs.CL,eess.AS

下载: http://arxiv.org/abs/2406.06097v1

Semantica: An Adaptable Image-Conditioned Diffusion Model

We investigate the task of adapting image generative models to different datasets without finetuneing. To this end, we introduce Semantica, an image-conditioned diffusion model capable of generating images based on the semantics of a conditioning image. Semantica is trained exclusively on web-scale image pairs, that is it receives a random image from a webpage as conditional input and models another random image from the same webpage. Our experiments highlight the expressivity of pretrained image encoders and necessity of semantic-based data filtering in achieving high-quality image generation. Once trained, it can adaptively generate new images from a dataset by simply using images from that dataset as input. We study the transfer properties of Semantica on ImageNet, LSUN Churches, LSUN Bedroom and SUN397.

Updated: 2024-06-10 08:23:03

标题: Semantica: 一种适应图像条件扩散模型

摘要: 我们研究了将图像生成模型适应不同数据集的任务，而无需微调。为此，我们引入了Semantica，一种基于图像的扩散模型，能够根据条件图像的语义生成图像。Semantica仅在网络规模的图像对上进行训练，即接收来自网页的随机图像作为条件输入，并对同一网页上的另一个随机图像建模。我们的实验突出了预训练图像编码器的表现力以及在实现高质量图像生成时基于语义的数据过滤的必要性。一旦训练完成，它可以通过简单地使用该数据集中的图像作为输入来自适应地生成新图像。我们研究了Semantica在ImageNet、LSUN教堂、LSUN卧室和SUN397上的迁移属性。

更新时间: 2024-06-10 08:23:03

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2405.14857v2

Movement Primitive Diffusion: Learning Gentle Robotic Manipulation of Deformable Objects

Policy learning in robot-assisted surgery (RAS) lacks data efficient and versatile methods that exhibit the desired motion quality for delicate surgical interventions. To this end, we introduce Movement Primitive Diffusion (MPD), a novel method for imitation learning (IL) in RAS that focuses on gentle manipulation of deformable objects. The approach combines the versatility of diffusion-based imitation learning (DIL) with the high-quality motion generation capabilities of Probabilistic Dynamic Movement Primitives (ProDMPs). This combination enables MPD to achieve gentle manipulation of deformable objects, while maintaining data efficiency critical for RAS applications where demonstration data is scarce. We evaluate MPD across various simulated and real world robotic tasks on both state and image observations. MPD outperforms state-of-the-art DIL methods in success rate, motion quality, and data efficiency. Project page: https://scheiklp.github.io/movement-primitive-diffusion/

Updated: 2024-06-10 08:11:00

标题: 运动基元扩散：学习对可变形物体进行轻柔机器人操作

摘要: 机器人辅助手术（RAS）中的政策学习缺乏数据效率和多功能性方法，这些方法展示了对于精细外科干预所需的运动质量。为此，我们引入了Movement Primitive Diffusion（MPD），这是一种新颖的在RAS中进行模仿学习（IL）的方法，重点放在对可变形物体进行轻柔操作上。该方法将基于扩散的模仿学习（DIL）的多功能性与概率动态运动原语（ProDMPs）的高质量运动生成能力相结合。这种组合使MPD能够实现对可变形物体的轻柔操作，同时保持数据效率，这对RAS应用至关重要，因为示范数据很少。我们评估了MPD在各种模拟和真实世界的机器人任务中的表现，包括状态和图像观察。MPD在成功率、运动质量和数据效率方面优于最先进的DIL方法。项目页面：https://scheiklp.github.io/movement-primitive-diffusion/

更新时间: 2024-06-10 08:11:00

领域: cs.RO,cs.AI,cs.LG

下载: http://arxiv.org/abs/2312.10008v2

When Authentication Is Not Enough: On the Security of Behavioral-Based Driver Authentication Systems

Many research papers have recently focused on behavioral-based driver authentication systems in vehicles. Pushed by Artificial Intelligence (AI) advancements, these works propose powerful models to identify drivers through their unique biometric behavior. However, these models have never been scrutinized from a security point of view, rather focusing on the performance of the AI algorithms. Several limitations and oversights make implementing the state-of-the-art impractical, such as their secure connection to the vehicle's network and the management of security alerts. Furthermore, due to the extensive use of AI, these systems may be vulnerable to adversarial attacks. However, there is currently no discussion on the feasibility and impact of such attacks in this scenario. Driven by the significant gap between research and practical application, this paper seeks to connect these two domains. We propose the first security-aware system model for behavioral-based driver authentication. We develop two lightweight driver authentication systems based on Random Forest and Recurrent Neural Network architectures designed for our constrained environments. We formalize a realistic system and threat model reflecting a real-world vehicle's network for their implementation. When evaluated on real driving data, our models outclass the state-of-the-art with an accuracy of up to 0.999 in identification and authentication. Moreover, we are the first to propose attacks against these systems by developing two novel evasion attacks, SMARTCAN and GANCAN. We show how attackers can still exploit these systems with a perfect attack success rate (up to 1.000). Finally, we discuss requirements for deploying driver authentication systems securely. Through our contributions, we aid practitioners in safely adopting these systems, help reduce car thefts, and enhance driver security.

Updated: 2024-06-10 08:09:27

标题: 当认证不足以保证安全性：关于基于行为的驾驶员认证系统的安全性

摘要: 最近许多研究论文集中讨论了车辆中基于行为的驾驶员认证系统。在人工智能（AI）的推动下，这些作品提出了强大的模型，通过独特的生物特征行为识别驾驶员。然而，这些模型从未从安全角度进行审查，而是专注于AI算法的性能。一些限制和疏忽使得实施最先进的技术变得不切实际，例如它们与车辆网络的安全连接以及安全警报的管理。此外，由于广泛使用AI，这些系统可能容易受到对抗性攻击的影响。然而，目前在这种情况下对这些攻击的可行性和影响没有讨论。鉴于研究和实际应用之间存在重大差距，本文旨在连接这两个领域。我们提出了第一个注重安全的基于行为的驾驶员认证系统模型。我们基于随机森林和循环神经网络架构开发了两种轻量级驾驶员认证系统，设计用于我们受限环境。我们制定了一个反映真实世界车辆网络的现实系统和威胁模型以供实施。在真实驾驶数据上评估时，我们的模型在识别和认证方面的准确率高达0.999，超越了最先进技术。此外，我们是第一个通过开发两种新型规避攻击，SMARTCAN和GANCAN，针对这些系统提出攻击的。我们展示了攻击者如何仍然可以利用这些系统，攻击成功率达到完美（高达1.000）。最后，我们讨论了部署驾驶员认证系统安全性的要求。通过我们的贡献，我们帮助从业者安全地采用这些系统，帮助减少车辆盗窃，并增强驾驶员安全。

更新时间: 2024-06-10 08:09:27

领域: cs.CR

下载: http://arxiv.org/abs/2306.05923v4

Random Function Descent

Classical worst-case optimization theory neither explains the success of optimization in machine learning, nor does it help with step size selection. We establish a connection between Bayesian Optimization (i.e. average case optimization theory) and classical optimization using a 'stochastic Taylor approximation' to rediscover gradient descent. This rediscovery yields a step size schedule we call Random Function Descent (RFD), which, in contrast to classical derivations, is scale invariant. Furthermore, our analysis of RFD step sizes yields a theoretical foundation for common step size heuristics such as gradient clipping and gradual learning rate warmup. We finally propose a statistical procedure for estimating the RFD step size schedule and validate this theory with a case study on the MNIST dataset.

Updated: 2024-06-10 07:57:24

标题: 随机函数下降

摘要: 经典最坏情况优化理论既无法解释机器学习中优化的成功，也无法帮助选择步长。我们通过使用“随机泰勒近似”建立了贝叶斯优化（即平均情况优化理论）与经典优化之间的联系，从而重新发现了梯度下降。这一重新发现产生了我们称之为随机函数下降（RFD）的步长安排，与经典推导相比，RFD是尺度不变的。此外，我们对RFD步长大小的分析为常见的步长启发式方法，如梯度裁剪和逐渐学习率预热，提供了理论基础。最后，我们提出了一种用于估计RFD步长安排的统计程序，并通过对MNIST数据集的案例研究验证了这一理论。

更新时间: 2024-06-10 07:57:24

领域: math.OC,cs.LG,stat.ML

下载: http://arxiv.org/abs/2305.01377v2

Using Deep Learning to Find the Next Unicorn: A Practical Synthesis

Startups often represent newly established business models associated with disruptive innovation and high scalability. They are commonly regarded as powerful engines for economic and social development. Meanwhile, startups are heavily constrained by many factors such as limited financial funding and human resources. Therefore, the chance for a startup to eventually succeed is as rare as "spotting a unicorn in the wild". Venture Capital (VC) strives to identify and invest in unicorn startups during their early stages, hoping to gain a high return. To avoid entirely relying on human domain expertise and intuition, investors usually employ data-driven approaches to forecast the success probability of startups. Over the past two decades, the industry has gone through a paradigm shift moving from conventional statistical approaches towards becoming machine-learning (ML) based. Notably, the rapid growth of data volume and variety is quickly ushering in deep learning (DL), a subset of ML, as a potentially superior approach in terms of capacity and expressivity. In this work, we carry out a literature review and synthesis on DL-based approaches, covering the entire DL life cycle. The objective is a) to obtain a thorough and in-depth understanding of the methodologies for startup evaluation using DL, and b) to distil valuable and actionable learning for practitioners. To the best of our knowledge, our work is the first of this kind.

Updated: 2024-06-10 07:56:11

标题: 使用深度学习寻找下一个独角兽：一个实用的综合方法

摘要: 初创企业通常代表着与颠覆性创新和高度扩展性相关的新业务模式。它们通常被视为经济和社会发展的强大引擎。与此同时，初创企业受到许多因素的严格限制，如有限的财务资金和人力资源。因此，初创企业最终成功的机会就像“在野外发现独角兽”一样罕见。风险投资（VC）致力于在初创企业的早期阶段识别和投资独角兽初创企业，希望获得高回报。为了避免完全依赖于人类领域专业知识和直觉，投资者通常采用数据驱动方法来预测初创企业的成功概率。在过去的二十年中，该行业经历了一场范式转变，从传统的统计方法转向基于机器学习（ML）的方法。值得注意的是，数据量和种类的快速增长迅速引入了深度学习（DL），作为在能力和表现力方面可能更优趀的方法。在这项工作中，我们进行了基于DL的方法的文献回顾和综合，涵盖了整个DL生命周期。其目的是a）深入和全面了解使用DL评估初创企业的方法论，b）为从业者提炼有价值和可操作的经验教训。据我们所知，我们的工作是第一种这种类型的工作。

更新时间: 2024-06-10 07:56:11

领域: q-fin.CP,cs.LG,68T07,H.1.0

下载: http://arxiv.org/abs/2210.14195v2

An Open and Large-Scale Dataset for Multi-Modal Climate Change-aware Crop Yield Predictions

Precise crop yield predictions are of national importance for ensuring food security and sustainable agricultural practices. While AI-for-science approaches have exhibited promising achievements in solving many scientific problems such as drug discovery, precipitation nowcasting, etc., the development of deep learning models for predicting crop yields is constantly hindered by the lack of an open and large-scale deep learning-ready dataset with multiple modalities to accommodate sufficient information. To remedy this, we introduce the CropNet dataset, the first terabyte-sized, publicly available, and multi-modal dataset specifically targeting climate change-aware crop yield predictions for the contiguous United States (U.S.) continent at the county level. Our CropNet dataset is composed of three modalities of data, i.e., Sentinel-2 Imagery, WRF-HRRR Computed Dataset, and USDA Crop Dataset, for over 2200 U.S. counties spanning 6 years (2017-2022), expected to facilitate researchers in developing versatile deep learning models for timely and precisely predicting crop yields at the county-level, by accounting for the effects of both short-term growing season weather variations and long-term climate change on crop yields. Besides, we develop the CropNet package, offering three types of APIs, for facilitating researchers in downloading the CropNet data on the fly over the time and region of interest, and flexibly building their deep learning models for accurate crop yield predictions. Extensive experiments have been conducted on our CropNet dataset via employing various types of deep learning solutions, with the results validating the general applicability and the efficacy of the CropNet dataset in climate change-aware crop yield predictions.

Updated: 2024-06-10 07:54:56

标题: 一个用于多模态气候变化感知作物产量预测的开放且大规模的数据集

摘要: 精确的作物产量预测对于确保食品安全和可持续农业实践具有国家重要性。虽然人工智能在科学领域已经取得了许多有希望的成就，如药物发现、降水现场预测等，但深度学习模型用于预测作物产量的发展一直受到一个问题的阻碍，即缺乏一个开放且大规模的深度学习准备好的数据集，具有多种模式以容纳足够的信息。为了解决这个问题，我们引入了CropNet数据集，这是第一个以太字节大小的、公开可用的、以多模式为目标的数据集，特别针对美国连续的县级大陆的气候变化感知作物产量预测。我们的CropNet数据集由三种数据模式组成，即Sentinel-2图像、WRF-HRRR计算数据集和USDA作物数据集，涵盖了超过2200个美国县，跨越6年（2017-2022），预计将帮助研究人员开发多功能的深度学习模型，用于及时和准确地预测县级作物产量，考虑到短期生长季节天气变化和长期气候变化对作物产量的影响。此外，我们开发了CropNet软件包，提供三种类型的API，以便研究人员在需要的时间和区域内即时下载CropNet数据，并灵活构建他们的深度学习模型，以准确预测作物产量。通过采用各种类型的深度学习解决方案，在我们的CropNet数据集上进行了大量实验，结果验证了CropNet数据集在气候变化感知作物产量预测中的普适性和有效性。

更新时间: 2024-06-10 07:54:56

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2406.06081v1

Multi-target stain normalization for histology slides

Traditional staining normalization approaches, e.g. Macenko, typically rely on the choice of a single representative reference image, which may not adequately account for the diverse staining patterns of datasets collected in practical scenarios. In this study, we introduce a novel approach that leverages multiple reference images to enhance robustness against stain variation. Our method is parameter-free and can be adopted in existing computational pathology pipelines with no significant changes. We evaluate the effectiveness of our method through experiments using a deep-learning pipeline for automatic nuclei segmentation on colorectal images. Our results show that by leveraging multiple reference images, better results can be achieved when generalizing to external data, where the staining can widely differ from the training set.

Updated: 2024-06-10 07:49:05

标题: 组织切片的多目标染色归一化

摘要: 传统的染色规范化方法，例如Macenko，通常依赖于选择一个代表性的参考图像，这可能无法充分考虑在实际场景中收集的数据集的多样性染色模式。在本研究中，我们引入了一种新颖的方法，利用多个参考图像来增强对染色变化的鲁棒性。我们的方法是无参数的，并且可以在现有的计算病理学流程中采用，无需进行重大更改。我们通过在结肠图像上使用深度学习流程进行自动细胞核分割的实验来评估我们方法的有效性。我们的结果表明，通过利用多个参考图像，在推广到外部数据时可以获得更好的结果，其中染色可能与训练集大不相同。

更新时间: 2024-06-10 07:49:05

领域: eess.IV,cs.AI,cs.CV,68U10,I.4.0

下载: http://arxiv.org/abs/2406.02077v3

Adapting Pretrained ViTs with Convolution Injector for Visuo-Motor Control

Vision Transformers (ViT), when paired with large-scale pretraining, have shown remarkable performance across various computer vision tasks, primarily due to their weak inductive bias. However, while such weak inductive bias aids in pretraining scalability, this may hinder the effective adaptation of ViTs for visuo-motor control tasks as a result of the absence of control-centric inductive biases. Such absent inductive biases include spatial locality and translation equivariance bias which convolutions naturally offer. To this end, we introduce Convolution Injector (CoIn), an add-on module that injects convolutions which are rich in locality and equivariance biases into a pretrained ViT for effective adaptation in visuo-motor control. We evaluate CoIn with three distinct types of pretrained ViTs (CLIP, MVP, VC-1) across 12 varied control tasks within three separate domains (Adroit, MetaWorld, DMC), and demonstrate that CoIn consistently enhances control task performance across all experimented environments and models, validating the effectiveness of providing pretrained ViTs with control-centric biases.

Updated: 2024-06-10 07:36:24

标题: 使用卷积注入器对预训练的ViTs进行调整以进行视觉-运动控制

摘要: Vision Transformers（ViT）与大规模预训练结合时，在各种计算机视觉任务中表现出卓越的性能，主要是由于它们的弱归纳偏差。然而，虽然这种弱归纳偏差有助于预训练的可扩展性，但由于缺乏以控制为中心的归纳偏差，这可能会妨碍ViTs在视觉-运动控制任务中的有效适应。这种缺失的归纳偏差包括空间局部性和平移等价性偏差，而卷积自然地提供了这些偏差。为此，我们引入了Convolution Injector（CoIn），这是一个附加模块，将富含局部性和等价性偏差的卷积注入预训练的ViT，以便在视觉-运动控制中实现有效的适应。我们使用三种不同类型的预训练ViTs（CLIP、MVP、VC-1）在三个不同领域（Adroit、MetaWorld、DMC）中的12种不同的控制任务中评估CoIn，并证明CoIn始终提高了所有实验环境和模型中的控制任务性能，验证了为预训练ViTs提供以控制为中心的偏差的有效性。

更新时间: 2024-06-10 07:36:24

领域: cs.CV,cs.LG,cs.RO

下载: http://arxiv.org/abs/2406.06072v1

Fast Two-Time-Scale Stochastic Gradient Method with Applications in Reinforcement Learning

Two-time-scale optimization is a framework introduced in Zeng et al. (2024) that abstracts a range of policy evaluation and policy optimization problems in reinforcement learning (RL). Akin to bi-level optimization under a particular type of stochastic oracle, the two-time-scale optimization framework has an upper level objective whose gradient evaluation depends on the solution of a lower level problem, which is to find the root of a strongly monotone operator. In this work, we propose a new method for solving two-time-scale optimization that achieves significantly faster convergence than the prior arts. The key idea of our approach is to leverage an averaging step to improve the estimates of the operators in both lower and upper levels before using them to update the decision variables. These additional averaging steps eliminate the direct coupling between the main variables, enabling the accelerated performance of our algorithm. We characterize the finite-time convergence rates of the proposed algorithm under various conditions of the underlying objective function, including strong convexity, convexity, Polyak-Lojasiewicz condition, and general non-convexity. These rates significantly improve over the best-known complexity of the standard two-time-scale stochastic approximation algorithm. When applied to RL, we show how the proposed algorithm specializes to novel online sample-based methods that surpass or match the performance of the existing state of the art. Finally, we support our theoretical results with numerical simulations in RL.

Updated: 2024-06-10 07:32:12

标题: 快速双时间尺度随机梯度方法及其在强化学习中的应用

摘要: 双时间尺度优化是在曾等人（2024年）中引入的一个框架，它抽象了强化学习（RL）中一系列政策评估和政策优化问题。类似于在特定类型的随机预言者下的双层优化，双时间尺度优化框架具有一个上层目标，其梯度评估取决于解决一个强单调算子的根的下层问题。在这项工作中，我们提出了一种新的解决双时间尺度优化的方法，其收敛速度显著快于之前的方法。我们的方法的关键思想是利用平均步骤来改进下层和上层算子的估计，然后再用它们来更新决策变量。这些额外的平均步骤消除了主要变量之间的直接耦合，从而加速了我们算法的性能。我们表征了所提出算法在底层目标函数的不同条件下的有限时间收敛速度，包括强凸性、凸性、Polyak-Lojasiewicz条件和一般非凸性。这些速度显著优于标准双时间尺度随机逼近算法的已知复杂性。当应用于RL时，我们展示了所提出算法如何专门化为超越或匹配现有最先进技术性能的新型在线基于样本的方法。最后，我们通过RL中的数值模拟支持我们的理论结果。

更新时间: 2024-06-10 07:32:12

领域: math.OC,cs.LG

下载: http://arxiv.org/abs/2405.09660v2

A High Dimensional Statistical Model for Adversarial Training: Geometry and Trade-Offs

This work investigates adversarial training in the context of margin-based linear classifiers in the high-dimensional regime where the dimension $d$ and the number of data points $n$ diverge with a fixed ratio $\alpha = n / d$. We introduce a tractable mathematical model where the interplay between the data and adversarial attacker geometries can be studied, while capturing the core phenomenology observed in the adversarial robustness literature. Our main theoretical contribution is an exact asymptotic description of the sufficient statistics for the adversarial empirical risk minimiser, under generic convex and non-increasing losses. Our result allow us to precisely characterise which directions in the data are associated with a higher generalisation/robustness trade-off, as defined by a robustness and a usefulness metric. In particular, we unveil the existence of directions which can be defended without penalising accuracy. Finally, we show the advantage of defending non-robust features during training, identifying a uniform protection as an inherently effective defence mechanism.

Updated: 2024-06-10 07:24:37

标题: 一个用于对抗训练的高维统计模型：几何和权衡

摘要: 这项工作在高维度情况下探讨了基于边际的线性分类器中的对抗训练，其中维度$d$和数据点数量$n$随着固定比率$\alpha = n / d$发散。我们引入了一个可处理的数学模型，可以研究数据和对抗攻击者几何结构之间的相互作用，同时捕捉对抗鲁棒性文献中观察到的核心现象。我们的主要理论贡献是对对抗性经验风险最小化器的充分统计量的精确渐近描述，在通用凸损失和非递增损失下。我们的结果使我们能够准确地描述在数据中哪些方向与更高的泛化/鲁棒性权衡相关联，如鲁棒性和有用性度量所定义。特别地，我们揭示了可以在不损害准确性的情况下进行防御的方向的存在。最后，我们展示了在训练过程中防御非鲁棒特征的优势，确定统一保护作为一种本质有效的防御机制。

更新时间: 2024-06-10 07:24:37

领域: stat.ML,cond-mat.dis-nn,cs.LG

下载: http://arxiv.org/abs/2402.05674v2

ProcessPainter: Learn Painting Process from Sequence Data

The painting process of artists is inherently stepwise and varies significantly among different painters and styles. Generating detailed, step-by-step painting processes is essential for art education and research, yet remains largely underexplored. Traditional stroke-based rendering methods break down images into sequences of brushstrokes, yet they fall short of replicating the authentic processes of artists, with limitations confined to basic brushstroke modifications. Text-to-image models utilizing diffusion processes generate images through iterative denoising, also diverge substantially from artists' painting process. To address these challenges, we introduce ProcessPainter, a text-to-video model that is initially pre-trained on synthetic data and subsequently fine-tuned with a select set of artists' painting sequences using the LoRA model. This approach successfully generates painting processes from text prompts for the first time. Furthermore, we introduce an Artwork Replication Network capable of accepting arbitrary-frame input, which facilitates the controlled generation of painting processes, decomposing images into painting sequences, and completing semi-finished artworks. This paper offers new perspectives and tools for advancing art education and image generation technology.

Updated: 2024-06-10 07:18:41

标题: ProcessPainter：从序列数据学习绘画过程

摘要: 艺术家的绘画过程本质上是分步进行的，并且在不同的画家和风格之间存在显著差异。生成详细的、逐步的绘画过程对艺术教育和研究至关重要，但目前仍然鲜有涉及。传统的基于笔触的渲染方法将图像分解为一系列笔触，但它们无法复制艺术家的真实过程，局限在基本笔触修改上。利用扩散过程的文本到图像模型通过迭代去噪生成图像，但也与艺术家的绘画过程有很大不同。为了解决这些挑战，我们引入了ProcessPainter，这是一个文本到视频模型，最初在合成数据上进行预训练，然后通过使用LoRA模型对一组艺术家的绘画序列进行微调。这种方法成功地首次从文本提示中生成绘画过程。此外，我们引入了一个能够接受任意帧输入的艺术品复制网络，这有助于控制生成绘画过程，将图像分解为绘画序列，并完成半成品艺术品。本文为推进艺术教育和图像生成技术提供了新的视角和工具。

更新时间: 2024-06-10 07:18:41

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2406.06062v1

Greedy SLIM: A SLIM-Based Approach For Preference Elicitation

Preference elicitation is an active learning approach to tackle the cold-start problem of recommender systems. Roughly speaking, new users are asked to rate some carefully selected items in order to compute appropriate recommendations for them. To the best of our knowledge, we are the first to propose a method for preference elicitation that is based on SLIM , a state-of-the-art technique for top-N recommendation. Our approach mainly consists of a new training technique for SLIM, which we call Greedy SLIM. This technique iteratively selects items for the training in order to minimize the SLIM loss greedily. We conduct offline experiments as well as a user study to assess the performance of this new method. The results are remarkable, especially with respect to the user study. We conclude that Greedy SLIM seems to be more suitable for preference elicitation than widely used methods based on latent factor models.

Updated: 2024-06-10 07:18:24

标题: 贪婪SLIM：一种基于SLIM的偏好引导方法

摘要: 偏好引导是解决推荐系统冷启动问题的一种主动学习方法。简而言之，新用户被要求对一些精心挑选的物品进行评分，以便为他们计算出适当的推荐。据我们所知，我们是第一个提出一种基于SLIM的偏好引导方法的人，SLIM是一种用于top-N推荐的最先进技术。我们的方法主要包括一种新的SLIM训练技术，我们称之为贪婪SLIM。这种技术迭代地选择物品进行训练，以贪心地最小化SLIM损失。我们进行了离线实验以及用户研究，以评估这种新方法的性能。结果令人瞩目，特别是在用户研究方面。我们得出结论，贪婪SLIM似乎比基于潜在因素模型的广泛使用的方法更适合偏好引导。

更新时间: 2024-06-10 07:18:24

领域: cs.IR,cs.AI

下载: http://arxiv.org/abs/2406.06061v1

Learning Physical Simulation with Message Passing Transformer

Machine learning methods for physical simulation have achieved significant success in recent years. We propose a new universal architecture based on Graph Neural Network, the Message Passing Transformer, which incorporates a Message Passing framework, employs an Encoder-Processor-Decoder structure, and applies Graph Fourier Loss as loss function for model optimization. To take advantage of the past message passing state information, we propose Hadamard-Product Attention to update the node attribute in the Processor, Hadamard-Product Attention is a variant of Dot-Product Attention that focuses on more fine-grained semantics and emphasizes on assigning attention weights over each feature dimension rather than each position in the sequence relative to others. We further introduce Graph Fourier Loss (GFL) to balance high-energy and low-energy components. To improve time performance, we precompute the graph's Laplacian eigenvectors before the training process. Our architecture achieves significant accuracy improvements in long-term rollouts for both Lagrangian and Eulerian dynamical systems over current methods.

Updated: 2024-06-10 07:14:56

标题: 学习物理模拟的消息传递变压器

摘要: 近年来，机器学习方法在物理模拟方面取得了显著的成功。我们提出了一种基于图神经网络的新型通用架构，即消息传递变压器，它融合了消息传递框架，采用了编码器-处理器-解码器结构，并应用了图傅立叶损失作为模型优化的损失函数。为了利用过去的消息传递状态信息，我们提出了Hadamard-Product注意力机制来更新处理器中的节点属性，Hadamard-Product 注意力是点积注意力的一种变体，侧重于更精细的语义，并强调在每个特征维度上分配注意力权重，而不是相对于其他位置在序列中分配。我们进一步引入了图傅里叶损失（GFL）来平衡高能量和低能量成分。为了提高时间性能，我们在训练过程之前预先计算图的拉普拉斯特征向量。我们的架构在长期滚动过程中显著提高了对拉格朗日和欧拉动力系统的准确性，比当前方法有了显著的改进。

更新时间: 2024-06-10 07:14:56

领域: cs.LG

下载: http://arxiv.org/abs/2406.06060v1

Information-Theoretic Generalization Bounds for Transductive Learning and its Applications

We develop generalization bounds for transductive learning algorithms in the context of information theory and PAC-Bayesian theory, covering both the random sampling setting and the random splitting setting. We show that the transductive generalization gap can be bounded by the mutual information between training labels selection and the hypothesis. By introducing the concept of transductive supersamples, we translate results depicted by various information measures from the inductive learning setting to the transductive learning setting. We further establish PAC-Bayesian bounds with weaker assumptions on the loss function and numbers of training and test data points. Finally, we present the upper bounds for adaptive optimization algorithms and demonstrate the applications of results on semi-supervised learning and graph learning scenarios. Our theoretic results are validated on both synthetic and real-world datasets.

Updated: 2024-06-10 06:50:09

标题: 信息论泛化界限用于传导学习及其应用

摘要: 我们在信息理论和PAC-Bayesian理论的背景下，针对转导学习算法开发了泛化边界，涵盖了随机抽样设置和随机分割设置。我们展示了转导泛化差距可以通过训练标签选择与假设之间的互信息来界定。通过引入转导超样本的概念，我们将各种信息度量所描述的结果从归纳学习设置转换到转导学习设置。我们进一步建立了PAC-Bayesian边界，对损失函数和训练数据点和测试数据点的数量提出了更弱的假设。最后，我们提出自适应优化算法的上界，并展示了结果在半监督学习和图学习场景中的应用。我们的理论结果在合成和真实世界数据集上得到验证。

更新时间: 2024-06-10 06:50:09

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2311.04561v2

On the Utility of Accounting for Human Beliefs about AI Behavior in Human-AI Collaboration

To enable effective human-AI collaboration, merely optimizing AI performance while ignoring humans is not sufficient. Recent research has demonstrated that designing AI agents to account for human behavior leads to improved performance in human-AI collaboration. However, a limitation of most existing approaches is their assumption that human behavior is static, irrespective of AI behavior. In reality, humans may adjust their action plans based on their observations of AI behavior. In this paper, we address this limitation by enabling a collaborative AI agent to consider the beliefs of its human partner, i.e., what the human partner thinks the AI agent is doing, and design its action plan to facilitate easier collaboration with its human partner. Specifically, we developed a model of human beliefs that accounts for how humans reason about the behavior of their AI partners. Based on this belief model, we then developed an AI agent that considers both human behavior and human beliefs in devising its strategy for working with humans. Through extensive real-world human-subject experiments, we demonstrated that our belief model more accurately predicts humans' beliefs about AI behavior. Moreover, we showed that our design of AI agents that accounts for human beliefs enhances performance in human-AI collaboration.

Updated: 2024-06-10 06:39:37

标题: 关于考虑人类对人工智能行为的信念在人工智能与人类合作中的实用性

摘要: 为了实现有效的人工智能与人类的协作，仅仅优化人工智能的性能而忽视人类是不够的。最近的研究表明，设计考虑人类行为的人工智能代理可以提升人工智能与人类协作的性能。然而，大多数现有方法的局限性在于它们假设人类行为是静态的，不考虑人工智能的行为。实际上，人类可能会根据他们对人工智能行为的观察来调整他们的行动计划。在本文中，我们通过使协作人工智能代理考虑其人类伙伴的信念，即人类伙伴认为人工智能代理正在做什么，并设计其行动计划以便更容易与人类伙伴协作，来解决这一限制。具体而言，我们开发了一个人类信念模型，考虑了人类如何推理其人工智能伙伴的行为。基于这个信念模型，我们开发了一个人工智能代理，考虑了人类行为和人类信念，制定了与人类合作的策略。通过大量的真实世界人类实验，我们证明了我们的信念模型更准确地预测了人类对人工智能行为的信念。此外，我们还展示了我们设计的考虑人类信念的人工智能代理增强了人工智能与人类协作的性能。

更新时间: 2024-06-10 06:39:37

领域: cs.AI,cs.HC,cs.LG

下载: http://arxiv.org/abs/2406.06051v1

Robust Latent Representation Tuning for Image-text Classification

Large models have demonstrated exceptional generalization capabilities in computer vision and natural language processing. Recent efforts have focused on enhancing these models with multimodal processing abilities. However, addressing the challenges posed by scenarios where one modality is absent remains a significant hurdle. In response to this issue, we propose a robust latent representation tuning method for large models. Specifically, our approach introduces a modality latent translation module to maximize the correlation between modalities. Following this, a newly designed fusion module is employed to facilitate information interaction between the modalities. In this framework, not only are common semantics refined during training, but the method also yields robust representations in the absence of one modality. Importantly, our method maintains the frozen state of the image and text foundation models to preserve their abilities acquired through large-scale pretraining. We conduct experiments on several public datasets, and the results underscore the effectiveness of our proposed method.

Updated: 2024-06-10 06:29:00

标题: 稳健的潜在表征调整用于图像文本分类

摘要: 大型模型在计算机视觉和自然语言处理中展现出卓越的泛化能力。最近的努力集中在增强这些模型的多模态处理能力。然而，解决一个模态缺失的情况所带来的挑战仍然是一个重要障碍。针对这个问题，我们提出了一种针对大型模型的稳健潜在表示调整方法。具体地，我们的方法引入了一个模态潜在转换模块，以最大化模态之间的相关性。随后，采用一个新设计的融合模块来促进模态之间的信息交互。在这个框架中，不仅在训练过程中对常见语义进行了精细化处理，而且在一个模态缺失的情况下，该方法也产生了稳健的表示。重要的是，我们的方法保持图像和文本基础模型的冻结状态，以保留它们通过大规模预训练获得的能力。我们在几个公共数据集上进行了实验，结果突显了我们提出的方法的有效性。

更新时间: 2024-06-10 06:29:00

领域: cs.CV,cs.AI,cs.MM

下载: http://arxiv.org/abs/2406.06048v1

MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models

Pretraining data selection has the potential to improve language model pretraining efficiency by utilizing higher-quality data from massive web data corpora. Current data selection methods, which rely on either hand-crafted rules or larger reference models, are conducted statically and do not capture the evolving data preferences during pretraining. In this paper, we introduce model-aware data selection with data influence models (MATES), where a data influence model continuously adapts to the evolving data preferences of the pretraining model and then selects the data most effective for the current pretraining progress. Specifically, we fine-tune a small data influence model to approximate oracle data preference signals collected by locally probing the pretraining model and to select data accordingly for the next pretraining stage. Experiments on Pythia and the C4 dataset demonstrate that MATES significantly outperforms random data selection on extensive downstream tasks in both zero- and few-shot settings. It doubles the gains achieved by recent data selection approaches that leverage larger reference models and reduces the total FLOPs required to reach certain performances by half. Further analysis validates the ever-changing data preferences of pretraining models and the effectiveness of our data influence models to capture them. Our code is open-sourced at https://github.com/cxcscmu/MATES.

Updated: 2024-06-10 06:27:42

标题: MATES: 模型感知数据选择，用于具有数据影响模型的高效预训练

摘要: 预训练数据选择有潜力通过利用来自大规模网络数据语料库的高质量数据来提高语言模型的预训练效率。当前的数据选择方法，依赖于手工制定的规则或更大的参考模型，是静态进行的，并不能捕捉预训练过程中数据偏好的演变。在本文中，我们引入了一种基于模型感知的数据选择方法，即数据影响模型（MATES），其中数据影响模型不断适应预训练模型的演变数据偏好，然后选择最有效的数据用于当前的预训练进度。具体来说，我们微调一个小的数据影响模型，以近似通过在本地探测预训练模型收集的oracle数据偏好信号，并相应地为下一个预训练阶段选择数据。在Pythia和C4数据集上的实验证明，MATES在零次和少次迁移学习设置中明显优于随机数据选择。它将最近利用更大参考模型的数据选择方法取得的收益翻倍，并将达到一定性能所需的总FLOP减少了一半。进一步的分析验证了预训练模型的不断变化的数据偏好以及我们的数据影响模型捕获它们的有效性。我们的代码在https://github.com/cxcscmu/MATES上开源。

更新时间: 2024-06-10 06:27:42

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2406.06046v1

Synthesizing Efficient Data with Diffusion Models for Person Re-Identification Pre-Training

Existing person re-identification (Re-ID) methods principally deploy the ImageNet-1K dataset for model initialization, which inevitably results in sub-optimal situations due to the large domain gap. One of the key challenges is that building large-scale person Re-ID datasets is time-consuming. Some previous efforts address this problem by collecting person images from the internet e.g., LUPerson, but it struggles to learn from unlabeled, uncontrollable, and noisy data. In this paper, we present a novel paradigm Diffusion-ReID to efficiently augment and generate diverse images based on known identities without requiring any cost of data collection and annotation. Technically, this paradigm unfolds in two stages: generation and filtering. During the generation stage, we propose Language Prompts Enhancement (LPE) to ensure the ID consistency between the input image sequence and the generated images. In the diffusion process, we propose a Diversity Injection (DI) module to increase attribute diversity. In order to make the generated data have higher quality, we apply a Re-ID confidence threshold filter to further remove the low-quality images. Benefiting from our proposed paradigm, we first create a new large-scale person Re-ID dataset Diff-Person, which consists of over 777K images from 5,183 identities. Next, we build a stronger person Re-ID backbone pre-trained on our Diff-Person. Extensive experiments are conducted on four person Re-ID benchmarks in six widely used settings. Compared with other pre-training and self-supervised competitors, our approach shows significant superiority.

Updated: 2024-06-10 06:26:03

标题: 使用扩散模型合成高效数据，用于人员再识别预训练

摘要: 现有的人员重新识别（Re-ID）方法主要使用ImageNet-1K数据集进行模型初始化，这不可避免地导致由于较大的领域差异而出现次优的情况。其中一个关键挑战是建立大规模人员Re-ID数据集是耗时的。一些先前的努力通过从互联网收集人员图像来解决这个问题，例如LUPerson，但它难以从未标记、不可控制和嘈杂的数据中学习。在本文中，我们提出了一种新的范式Diffusion-ReID，可以在不需要任何数据收集和注释成本的情况下有效地增广和生成多样化的图像。从技术上讲，这种范式分为两个阶段：生成和过滤。在生成阶段，我们提出了语言提示增强（LPE）以确保输入图像序列和生成图像之间的ID一致性。在扩散过程中，我们提出了多样性注入（DI）模块来增加属性多样性。为了使生成的数据质量更高，我们应用了一个Re-ID置信度阈值过滤器来进一步去除低质量的图像。受益于我们提出的范式，我们首先创建了一个新的大规模人员Re-ID数据集Diff-Person，其中包含来自5,183个身份的超过777K张图像。接下来，我们构建了一个在我们的Diff-Person上预训练的更强大的人员Re-ID骨干。在六个广泛使用的设置中对四个人员Re-ID基准进行了广泛的实验。与其他预训练和自监督竞争对手相比，我们的方法表现出明显的优势。

更新时间: 2024-06-10 06:26:03

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2406.06045v1

Risk Sensitivity in Markov Games and Multi-Agent Reinforcement Learning: A Systematic Review

Markov games (MGs) and multi-agent reinforcement learning (MARL) are studied to model decision making in multi-agent systems. Traditionally, the objective in MG and MARL has been risk-neutral, i.e., agents are assumed to optimize a performance metric such as expected return, without taking into account subjective or cognitive preferences of themselves or of other agents. However, ignoring such preferences leads to inaccurate models of decision making in many real-world scenarios in finance, operations research, and behavioral economics. Therefore, when these preferences are present, it is necessary to incorporate a suitable measure of risk into the optimization objective of agents, which opens the door to risk-sensitive MG and MARL. In this paper, we systemically review the literature on risk sensitivity in MG and MARL that has been growing in recent years alongside other areas of reinforcement learning and game theory. We define and mathematically describe different risk measures used in MG and MARL and individually for each measure, discuss articles that incorporate it. Finally, we identify recent trends in theoretical and applied works in the field and discuss possible directions of future research.

Updated: 2024-06-10 06:19:33

标题: 《马尔可夫博弈和多智能体强化学习中的风险敏感性：系统综述》

摘要: 马尔可夫博弈（MGs）和多智能体强化学习（MARL）被研究用于建模多智能体系统中的决策制定。传统上，在MG和MARL中的目标是风险中性的，即，智能体被假定为优化性能指标，如预期回报，而不考虑自身或其他智能体的主观或认知偏好。然而，忽略这些偏好会导致在金融、运营研究和行为经济学等许多实际场景中的决策制定模型不准确。因此，当存在这些偏好时，有必要将风险的适当度量纳入智能体的优化目标中，从而打开了风险敏感的MG和MARL之门。在本文中，我们系统地审查了近年来与强化学习和博弈论其他领域一道不断增长的MG和MARL中的风险敏感性文献。我们定义并数学描述了在MG和MARL中使用的不同风险度量，并针对每种度量分别讨论了包含它的文章。最后，我们确定了该领域理论和应用作品中的最新趋势，并讨论了未来研究的可能方向。

更新时间: 2024-06-10 06:19:33

领域: cs.GT,cs.LG,cs.MA,cs.SY,eess.SY,I.2.11

下载: http://arxiv.org/abs/2406.06041v1

Homomorphism Counts for Graph Neural Networks: All About That Basis

A large body of work has investigated the properties of graph neural networks and identified several limitations, particularly pertaining to their expressive power. Their inability to count certain patterns (e.g., cycles) in a graph lies at the heart of such limitations, since many functions to be learned rely on the ability of counting such patterns. Two prominent paradigms aim to address this limitation by enriching the graph features with subgraph or homomorphism pattern counts. In this work, we show that both of these approaches are sub-optimal in a certain sense and argue for a more fine-grained approach, which incorporates the homomorphism counts of all structures in the ``basis'' of the target pattern. This yields strictly more expressive architectures without incurring any additional overhead in terms of computational complexity compared to existing approaches. We prove a series of theoretical results on node-level and graph-level motif parameters and empirically validate them on standard benchmark datasets.

Updated: 2024-06-10 06:14:34

标题: 图神经网络的同态计数：一切关乎基础

摘要: 大量研究已经探讨了图神经网络的特性，并确定了一些限制，特别是与它们的表达能力有关的限制。它们无法计算图中某些模式（例如循环）是这些限制的核心，因为许多需要学习的功能依赖于计算这些模式的能力。两个重要的范例旨在通过使用子图或同态模式计数来解决这种限制。在这项工作中，我们展示了这两种方法在某种程度上都不够理想，并主张采用更细粒度的方法，将目标模式的“基础”中所有结构的同态计数纳入考虑。这会产生更具表现力的架构，而与现有方法相比并不会增加任何计算复杂性方面的额外开销。我们在标准基准数据集上证明了一系列关于节点级和图级模式参数的理论结果，并进行了实证验证。

更新时间: 2024-06-10 06:14:34

领域: cs.LG

下载: http://arxiv.org/abs/2402.08595v5

Investigating Pre-Training Objectives for Generalization in Vision-Based Reinforcement Learning

Recently, various pre-training methods have been introduced in vision-based Reinforcement Learning (RL). However, their generalization ability remains unclear due to evaluations being limited to in-distribution environments and non-unified experimental setups. To address this, we introduce the Atari Pre-training Benchmark (Atari-PB), which pre-trains a ResNet-50 model on 10 million transitions from 50 Atari games and evaluates it across diverse environment distributions. Our experiments show that pre-training objectives focused on learning task-agnostic features (e.g., identifying objects and understanding temporal dynamics) enhance generalization across different environments. In contrast, objectives focused on learning task-specific knowledge (e.g., identifying agents and fitting reward functions) improve performance in environments similar to the pre-training dataset but not in varied ones. We publicize our codes, datasets, and model checkpoints at https://github.com/dojeon-ai/Atari-PB.

Updated: 2024-06-10 06:06:38

标题: 调查视觉强化学习中泛化的预训练目标

摘要: 最近，在基于视觉的强化学习（RL）中引入了各种预训练方法。然而，由于评估仅限于分布环境和非统一的实验设置，它们的泛化能力仍不清楚。为了解决这个问题，我们引入了Atari预训练基准（Atari-PB），该基准在50个Atari游戏的1000万次转换中对ResNet-50模型进行预训练，并在不同的环境分布下对其进行评估。我们的实验表明，以学习任务无关特征为重点的预训练目标（例如，识别对象和理解时间动态）增强了在不同环境中的泛化能力。相比之下，以学习特定任务知识为重点的目标（例如，识别代理和拟合奖励函数）提高了在类似于预训练数据集的环境中的性能，但在多样化环境中并未表现出改进。我们在https://github.com/dojeon-ai/Atari-PB 上公开了我们的代码，数据集和模型检查点。

更新时间: 2024-06-10 06:06:38

领域: cs.LG,cs.AI,cs.CV

下载: http://arxiv.org/abs/2406.06037v1

2DQuant: Low-bit Post-Training Quantization for Image Super-Resolution

Low-bit quantization has become widespread for compressing image super-resolution (SR) models for edge deployment, which allows advanced SR models to enjoy compact low-bit parameters and efficient integer/bitwise constructions for storage compression and inference acceleration, respectively. However, it is notorious that low-bit quantization degrades the accuracy of SR models compared to their full-precision (FP) counterparts. Despite several efforts to alleviate the degradation, the transformer-based SR model still suffers severe degradation due to its distinctive activation distribution. In this work, we present a dual-stage low-bit post-training quantization (PTQ) method for image super-resolution, namely 2DQuant, which achieves efficient and accurate SR under low-bit quantization. The proposed method first investigates the weight and activation and finds that the distribution is characterized by coexisting symmetry and asymmetry, long tails. Specifically, we propose Distribution-Oriented Bound Initialization (DOBI), using different searching strategies to search a coarse bound for quantizers. To obtain refined quantizer parameters, we further propose Distillation Quantization Calibration (DQC), which employs a distillation approach to make the quantized model learn from its FP counterpart. Through extensive experiments on different bits and scaling factors, the performance of DOBI can reach the state-of-the-art (SOTA) while after stage two, our method surpasses existing PTQ in both metrics and visual effects. 2DQuant gains an increase in PSNR as high as 4.52dB on Set5 (x2) compared with SOTA when quantized to 2-bit and enjoys a 3.60x compression ratio and 5.08x speedup ratio. The code and models will be available at https://github.com/Kai-Liu001/2DQuant.

Updated: 2024-06-10 06:06:11

标题: 2DQuant：图像超分辨率的低比特后训练量化

摘要: 低比特量化已经在压缩图像超分辨率（SR）模型的边缘部署中变得普遍，这使得先进的SR模型可以享受紧凑的低比特参数和高效的整数/位运算构造，分别用于存储压缩和推理加速。然而，众所周知，与全精度（FP）对应物相比，低比特量化会降低SR模型的准确性。尽管已经做出了一些努力来缓解这种降级，但基于Transformer的SR模型仍然因其独特的激活分布而遭受严重的降级。在这项工作中，我们提出了一种用于图像超分辨率的双阶段低比特后训练量化（PTQ）方法，即2DQuant，它在低比特量化下实现了高效且准确的SR。所提出的方法首先研究了权重和激活，并发现其分布特征为共存的对称性和不对称性，长尾。具体来说，我们提出了基于分布的边界初始化（DOBI），使用不同的搜索策略来搜索量化器的粗略边界。为了获得精细的量化器参数，我们进一步提出了蒸馏量化校准（DQC），该方法采用蒸馏方法，使量化模型能够从其FP对应物中学习。通过对不同比特和缩放因子进行广泛实验，DOBI的性能可以达到最新技术水平（SOTA），而在第二阶段之后，我们的方法在指标和视觉效果方面超过了现有的PTQ。2DQuant在Set5（x2）上的PSNR提高了高达4.52dB，与SOTA相比，当量化为2比特时，压缩比为3.60倍，加速比为5.08倍。代码和模型将可在https://github.com/Kai-Liu001/2DQuant 上获取。

更新时间: 2024-06-10 06:06:11

领域: eess.IV,cs.AI,cs.CV,cs.LG

下载: http://arxiv.org/abs/2406.06649v1

Shesha: Multi-head Microarchitectural Leakage Discovery in new-generation Intel Processors

Transient execution attacks have been one of the widely explored microarchitectural side channels since the discovery of Spectre and Meltdown. However, much of the research has been driven by manual discovery of new transient paths through well-known speculative events. Although a few attempts exist in literature on automating transient leakage discovery, such tools focus on finding variants of known transient attacks and explore a small subset of instruction set. Further, they take a random fuzzing approach that does not scale as the complexity of search space increases. In this work, we identify that the search space of bad speculation is disjointedly fragmented into equivalence classes, and then use this observation to develop a framework named Shesha, inspired by Particle Swarm Optimization, which exhibits faster convergence rates than state-of-the-art fuzzing techniques for automatic discovery of transient execution attacks. We then use Shesha to explore the vast search space of extensions to the x86 Instruction Set Architecture (ISEs), thereby focusing on previously unexplored avenues of bad speculation. As such, we report five previously unreported transient execution paths in Instruction Set Extensions (ISEs) on new generation of Intel processors. We then perform extensive reverse engineering of each of the transient execution paths and provide root-cause analysis. Using the discovered transient execution paths, we develop attack building blocks to exhibit exploitable transient windows. Finally, we demonstrate data leakage from Fused Multiply-Add instructions through SIMD buffer and extract victim data from various cryptographic implementations.

Updated: 2024-06-10 05:56:34

标题: Shesha：新一代英特尔处理器中的多头微体系结构泄漏发现

摘要: 短暂执行攻击自发现Spectre和Meltdown以来一直是广泛探讨的微体系结构侧信道之一。然而，许多研究都是通过手动发现新的短暂路径来驱动的，这些路径是通过众所周知的推测事件实现的。虽然文献中存在一些自动化短暂泄漏发现的尝试，但这些工具主要集中在寻找已知短暂攻击的变体，并只探索指令集的一小部分。此外，它们采用的是随机模糊化方法，随着搜索空间复杂度的增加，这种方法并不具备可扩展性。在这项工作中，我们发现不良推测的搜索空间被不连续地分割成等价类，并利用这一观察结果开发了一个名为Shesha的框架，受粒子群优化启发，展现出比最先进的模糊技术更快的收敛速率，用于自动发现短暂执行攻击。然后我们使用Shesha来探索x86指令集架构的扩展的广阔搜索空间，重点关注以前未曾探索的不良推测途径。因此，我们报告了在新一代英特尔处理器的指令集扩展中之前未曾报道的五条短暂执行路径。然后我们对每个短暂执行路径进行了广泛的逆向工程，并提供了根本原因分析。利用发现的短暂执行路径，我们开发攻击构建模块，以展示可利用的短暂窗口。最后，我们演示了通过SIMD缓冲区从融合乘加指令中泄露数据，并从各种加密实现中提取受害者数据。

更新时间: 2024-06-10 05:56:34

领域: cs.CR

下载: http://arxiv.org/abs/2406.06034v1

Unsupervised Real-Time Hallucination Detection based on the Internal States of Large Language Models

Hallucinations in large language models (LLMs) refer to the phenomenon of LLMs producing responses that are coherent yet factually inaccurate. This issue undermines the effectiveness of LLMs in practical applications, necessitating research into detecting and mitigating hallucinations of LLMs. Previous studies have mainly concentrated on post-processing techniques for hallucination detection, which tend to be computationally intensive and limited in effectiveness due to their separation from the LLM's inference process. To overcome these limitations, we introduce MIND, an unsupervised training framework that leverages the internal states of LLMs for real-time hallucination detection without requiring manual annotations. Additionally, we present HELM, a new benchmark for evaluating hallucination detection across multiple LLMs, featuring diverse LLM outputs and the internal states of LLMs during their inference process. Our experiments demonstrate that MIND outperforms existing state-of-the-art methods in hallucination detection.

Updated: 2024-06-10 05:48:30

标题: 基于大型语言模型内部状态的无监督实时幻觉检测

摘要: 大型语言模型（LLMs）中的幻觉是指LLMs产生的响应是连贯的，但事实上不准确的现象。这个问题削弱了LLMs在实际应用中的有效性，需要研究如何检测和减轻LLMs的幻觉。先前的研究主要集中在用于幻觉检测的后处理技术上，这些技术往往计算密集且受限于与LLMs推理过程的分离。为了克服这些限制，我们引入了MIND，这是一个无监督训练框架，利用LLMs的内部状态进行实时幻觉检测，无需手动注释。此外，我们介绍了HELM，一个新的用于评估跨多个LLMs的幻觉检测的基准，包括多样化的LLMs输出和LLMs在推理过程中的内部状态。我们的实验证明，MIND在幻觉检测方面胜过现有的最先进方法。

更新时间: 2024-06-10 05:48:30

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2403.06448v2

HeSum: a Novel Dataset for Abstractive Text Summarization in Hebrew

While large language models (LLMs) excel in various natural language tasks in English, their performance in lower-resourced languages like Hebrew, especially for generative tasks such as abstractive summarization, remains unclear. The high morphological richness in Hebrew adds further challenges due to the ambiguity in sentence comprehension and the complexities in meaning construction. In this paper, we address this resource and evaluation gap by introducing HeSum, a novel benchmark specifically designed for abstractive text summarization in Modern Hebrew. HeSum consists of 10,000 article-summary pairs sourced from Hebrew news websites written by professionals. Linguistic analysis confirms HeSum's high abstractness and unique morphological challenges. We show that HeSum presents distinct difficulties for contemporary state-of-the-art LLMs, establishing it as a valuable testbed for generative language technology in Hebrew, and MRLs generative challenges in general.

Updated: 2024-06-10 05:45:25

标题: HeSum：一个用于希伯来语抽象文本摘要的新型数据集

摘要: 尽管大型语言模型（LLMs）在英语中在各种自然语言任务中表现出色，但它们在希伯来语等资源较少的语言中，特别是在抽象总结等生成任务中的表现仍不清楚。希伯来语中的高度形态丰富性增加了句子理解的模糊性和意义构建的复杂性。本文通过引入HeSum来解决这一资源和评估差距，这是一个专门设计用于现代希伯来语的抽象文本总结的新基准。HeSum包括来自专业人士撰写的希伯来语新闻网站的1万篇文章摘要对。语言分析证实了HeSum的高抽象性和独特的形态挑战。我们展示了HeSum对当今最先进的LLMs具有明显困难，将其确立为希伯来语和MRLs生成挑战的宝贵测试平台。

更新时间: 2024-06-10 05:45:25

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.03897v2

A Universal Class of Sharpness-Aware Minimization Algorithms

Recently, there has been a surge in interest in developing optimization algorithms for overparameterized models as achieving generalization is believed to require algorithms with suitable biases. This interest centers on minimizing sharpness of the original loss function; the Sharpness-Aware Minimization (SAM) algorithm has proven effective. However, most literature only considers a few sharpness measures, such as the maximum eigenvalue or trace of the training loss Hessian, which may not yield meaningful insights for non-convex optimization scenarios like neural networks. Additionally, many sharpness measures are sensitive to parameter invariances in neural networks, magnifying significantly under rescaling parameters. Motivated by these challenges, we introduce a new class of sharpness measures in this paper, leading to new sharpness-aware objective functions. We prove that these measures are \textit{universally expressive}, allowing any function of the training loss Hessian matrix to be represented by appropriate hyperparameters. Furthermore, we show that the proposed objective functions explicitly bias towards minimizing their corresponding sharpness measures, and how they allow meaningful applications to models with parameter invariances (such as scale-invariances). Finally, as instances of our proposed general framework, we present \textit{Frob-SAM} and \textit{Det-SAM}, which are specifically designed to minimize the Frobenius norm and the determinant of the Hessian of the training loss, respectively. We also demonstrate the advantages of our general framework through extensive experiments.

Updated: 2024-06-10 05:40:59

标题: 一种通用的锐度感知最小化算法类

摘要: 最近，人们对为过度参数化模型开发优化算法产生了浓厚的兴趣，因为人们认为实现泛化需要具有适当偏差的算法。这种兴趣集中在最小化原始损失函数的尖锐度上；尖锐度感知最小化（SAM）算法已被证明是有效的。然而，大多数文献只考虑了一些尖锐度度量，比如训练损失Hessian的最大特征值或迹，这可能不会为非凸优化场景（如神经网络）提供有意义的见解。此外，许多尖锐度度量对神经网络中的参数不变性很敏感，在重新调整参数时会显著放大。受到这些挑战的启发，我们在本文中介绍了一类新的尖锐度度量，从而导致新的尖锐度感知目标函数。我们证明这些度量是“普遍表达式”，允许任何训练损失Hessian矩阵的函数通过适当的超参数来表示。此外，我们展示了所提出的目标函数明确偏向于最小化其对应的尖锐度度量，以及它们如何允许对具有参数不变性（如尺度不变性）的模型进行有意义的应用。最后，作为我们提出的一般框架的实例，我们提出了Frob-SAM和Det-SAM，专门设计用于最小化训练损失Hessian的Frobenius范数和行列式，分别。我们还通过大量实验展示了我们一般框架的优势。

更新时间: 2024-06-10 05:40:59

领域: cs.LG

下载: http://arxiv.org/abs/2406.03682v2

A Large-Scale Exploration of $μ$-Transfer

Large neural network models have become a mainstay of language, vision, and audio processing and synthesis, yet their initialization and learning rates are still set in a largely unsophisticated and potentially expensive fashion, with a potentially high cost incurred every time a new architecture or model size is to be trained. The $\mu$-Parameterization ($\mu$P) offers a potential solution to these challenges, yielding scaling rules for model initialization and learning rates, and reportedly enabling zero-shot hyperparameter transfer from small to large models in a variety of cases. Despite its evident promise, the $\mu$P method is not yet widely adopted, perhaps due to higher implementation complexity, many variations, or complex theoretical background. This work investigates $\mu$P empirically, focusing on the ubiquitous transformer architecture, and aims to answer a simple question: does $\mu$-Transfer yield optimal learning rates in practice? Studying models of up to 10B parameters and training budgets of up to 190B tokens, we find $\mu$-Transfer works as intended for the majority of important cases, yet also identify a few cases where it may not. Our experiment codebase is available at https://github.com/lucaslingle/mu_transformer/

Updated: 2024-06-10 05:19:55

标题: 一个关于$μ$-迁移的大规模探索

摘要: 大型神经网络模型已经成为语言、视觉和音频处理和合成的主要工具，然而它们的初始化和学习率仍然采用一种相对不成熟和潜在昂贵的方式进行设置，每次需要训练新的架构或模型规模时都可能产生高昂的成本。μ-参数化（μP）提供了一个潜在的解决方案，为模型初始化和学习率提供了缩放规则，并据报道在各种情况下使得从小型模型到大型模型的零射击超参数转移成为可能。尽管μP方法有着明显的潜力，但目前尚未被广泛采用，或许是由于实现复杂性较高、存在许多变体或背后有复杂的理论基础。本研究从经验的角度对μP进行了调查，重点关注普遍存在的transformer架构，并旨在回答一个简单的问题：μ-转移在实践中是否产生最佳学习率？通过研究多达10B参数和高达190B令牌的训练预算的模型，我们发现在大多数重要情况下，μ-转移的效果符合预期，但也确定了一些情况下可能不适用的情况。我们的实验代码库可在https://github.com/lucaslingle/mu_transformer/获取。

更新时间: 2024-06-10 05:19:55

领域: cs.LG

下载: http://arxiv.org/abs/2404.05728v4

FAIIR: Building Toward A Conversational AI Agent Assistant for Youth Mental Health Service Provision

World's healthcare systems and mental health agencies face both a growing demand for youth mental health services, alongside a simultaneous challenge of limited resources. Given these constraints, this work presents our experience in the creation and evaluation of the FAIIR (Frontline Assistant: Issue Identification and Recommendation) tool, an ensemble of domain-adapted and fine-tuned transformer models, leveraging natural language processing to identify issues that youth may be experiencing. We explore the technical development, performance, and validation processes leveraged for the FAIIR tool in application to situations of frontline crisis response via Kids Help Phone. Frontline Crisis Responders assign an issue tag from a defined list following each conversation. Assisting with the identification of issues of relevance helps reduce the burden on CRs, ensuring that appropriate resources can be provided and that active rescues and mandatory reporting can take place in critical situations requiring immediate de-escalation.

Updated: 2024-06-10 05:17:28

标题: FAIIR：为青少年心理健康服务提供构建对话式人工智能助手

摘要: 世界各地的医疗保健系统和心理健康机构面临着青少年心理健康服务需求不断增长的挑战，同时也面临着资源有限的困境。鉴于这些限制，本研究介绍了我们在创建和评估FAIIR（前线助手：问题识别和建议）工具方面的经验，该工具是一组经过领域适应和微调的变压器模型，利用自然语言处理来识别青少年可能遇到的问题。我们探讨了在Kids Help Phone的前线危机应对情况中应用FAIIR工具时所利用的技术开发、性能和验证过程。前线危机应对人员在每次对话后根据定义列表为问题分配标签。帮助识别相关问题有助于减轻危机应对人员的负担，确保能够提供适当的资源，并在需要立即降级的关键情况下进行积极的营救和强制性报告。

更新时间: 2024-06-10 05:17:28

领域: cs.AI

下载: http://arxiv.org/abs/2405.18553v2

UniHead: Unifying Multi-Perception for Detection Heads

The detection head constitutes a pivotal component within object detectors, tasked with executing both classification and localization functions. Regrettably, the commonly used parallel head often lacks omni perceptual capabilities, such as deformation perception, global perception and cross-task perception. Despite numerous methods attempting to enhance these abilities from a single aspect, achieving a comprehensive and unified solution remains a significant challenge. In response to this challenge, we develop an innovative detection head, termed UniHead, to unify three perceptual abilities simultaneously. More precisely, our approach (1) introduces deformation perception, enabling the model to adaptively sample object features; (2) proposes a Dual-axial Aggregation Transformer (DAT) to adeptly model long-range dependencies, thereby achieving global perception; and (3) devises a Cross-task Interaction Transformer (CIT) that facilitates interaction between the classification and localization branches, thus aligning the two tasks. As a plug-and-play method, the proposed UniHead can be conveniently integrated with existing detectors. Extensive experiments on the COCO dataset demonstrate that our UniHead can bring significant improvements to many detectors. For instance, the UniHead can obtain +2.7 AP gains in RetinaNet, +2.9 AP gains in FreeAnchor, and +2.1 AP gains in GFL. The code is available at https://github.com/zht8506/UniHead.

Updated: 2024-06-10 05:17:16

标题: UniHead：将多感知统一为检测头

摘要: 检测头是目标检测器中的一个关键组件，负责执行分类和定位功能。遗憾的是，通常使用的并行头往往缺乏全方位感知能力，如变形感知、全局感知和跨任务感知。尽管有许多方法试图从单一方面增强这些能力，但实现全面统一的解决方案仍然是一个重大挑战。为了应对这一挑战，我们开发了一种创新的检测头，称为UniHead，同时统一了三种感知能力。更具体地说，我们的方法（1）引入了变形感知，使模型能够自适应地采样对象特征；（2）提出了双轴聚合变换器（DAT）来熟练地建模远程依赖性，从而实现全局感知；（3）设计了一种跨任务交互变换器（CIT），促进了分类和定位分支之间的交互，从而使两个任务保持一致。作为一种即插即用的方法，所提出的UniHead可以方便地与现有的检测器集成。对COCO数据集的大量实验证明，我们的UniHead可以显著改进许多检测器。例如，UniHead在RetinaNet中可以获得+2.7 AP增益，在FreeAnchor中可以获得+2.9 AP增益，在GFL中可以获得+2.1 AP增益。代码可在https://github.com/zht8506/UniHead上找到。

更新时间: 2024-06-10 05:17:16

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2309.13242v2

RepoQA: Evaluating Long Context Code Understanding

Recent advances have been improving the context windows of Large Language Models (LLMs). To quantify the real long-context capabilities of LLMs, evaluators such as the popular Needle in a Haystack have been developed to test LLMs over a large chunk of raw texts. While effective, current evaluations overlook the insight of how LLMs work with long-context code, i.e., repositories. To this end, we initiate the RepoQA benchmark to evaluate LLMs on long-context code understanding. Traditional needle testers ask LLMs to directly retrieve the answer from the context without necessary deep understanding. In RepoQA, we built our initial task, namely Searching Needle Function (SNF), which exercises LLMs to search functions given their natural-language description, i.e., LLMs cannot find the desired function if they cannot understand the description and code. RepoQA is multilingual and comprehensive: it includes 500 code search tasks gathered from 50 popular repositories across 5 modern programming languages. By evaluating 26 general and code-specific LLMs on RepoQA, we show (i) there is still a small gap between the best open and proprietary models; (ii) different models are good at different languages; and (iii) models may understand code better without comments.

Updated: 2024-06-10 05:15:30

标题: RepoQA：评估长上下文代码理解

摘要: 最近的进展不断改进了大型语言模型（LLMs）的上下文窗口。为了量化LLMs真正的长上下文能力，评估器如流行的“草堆中的针”已经被开发用于在大量原始文本中测试LLMs。虽然有效，当前的评估忽视了LLMs如何处理长上下文代码，即仓库的见解。为此，我们发起了RepoQA基准测试，以评估LLMs对长上下文代码的理解。传统的针对测试器要求LLMs直接从上下文中检索答案，而无需深入理解。在RepoQA中，我们建立了我们的初始任务，即搜索Needle函数（SNF），它让LLMs根据它们的自然语言描述搜索函数，即如果LLMs无法理解描述和代码，则无法找到所需的函数。RepoQA是多语言和全面的：它包括从5种现代编程语言中收集的50个流行存储库中获取的500个代码搜索任务。通过在RepoQA上评估26个通用和特定代码的LLMs，我们展示了（i）最佳开放和专有模型之间仍存在一小差距；（ii）不同模型擅长不同语言；以及（iii）模型可能在没有注释的情况下更好地理解代码。

更新时间: 2024-06-10 05:15:30

领域: cs.SE,cs.CL,cs.LG

下载: http://arxiv.org/abs/2406.06025v1

SignBLEU: Automatic Evaluation of Multi-channel Sign Language Translation

Sign languages are multi-channel languages that communicate information through not just the hands (manual signals) but also facial expressions and upper body movements (non-manual signals). However, since automatic sign language translation is usually performed by generating a single sequence of glosses, researchers eschew non-manual and co-occurring manual signals in favor of a simplified list of manual glosses. This can lead to significant information loss and ambiguity. In this paper, we introduce a new task named multi-channel sign language translation (MCSLT) and present a novel metric, SignBLEU, designed to capture multiple signal channels. We validated SignBLEU on a system-level task using three sign language corpora with varied linguistic structures and transcription methodologies and examined its correlation with human judgment through two segment-level tasks. We found that SignBLEU consistently correlates better with human judgment than competing metrics. To facilitate further MCSLT research, we report benchmark scores for the three sign language corpora and release the source code for SignBLEU at https://github.com/eq4all-projects/SignBLEU.

Updated: 2024-06-10 05:01:26

标题: SignBLEU：多通道手语翻译的自动评估

摘要: 手语是一种多通道语言，通过手部（手势信号）、面部表情和上半身动作（非手势信号）来传达信息。然而，由于自动手语翻译通常是通过生成单个手语序列来执行的，研究人员避开了非手势和同时出现的手势信号，而更倾向于简化的手势列表。这可能导致重要信息的丢失和歧义。在本文中，我们引入了一个名为多通道手语翻译（MCSLT）的新任务，并提出了一种新的度量标准，SignBLEU，旨在捕捉多个信号通道。我们在一个系统级任务中使用三个手语语料库验证了SignBLEU，这些语料库具有不同的语言结构和转录方法，并通过两个片段级任务研究其与人类判断的相关性。我们发现，SignBLEU与竞争性度量标准相比，与人类判断更一致。为了促进进一步的MCSLT研究，我们报告了三个手语语料库的基准分数，并在https://github.com/eq4all-projects/SignBLEU发布了SignBLEU的源代码。

更新时间: 2024-06-10 05:01:26

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2406.06648v1

Enhancing Size Generalization in Graph Neural Networks through Disentangled Representation Learning

Although most graph neural networks (GNNs) can operate on graphs of any size, their classification performance often declines on graphs larger than those encountered during training. Existing methods insufficiently address the removal of size information from graph representations, resulting in sub-optimal performance and reliance on backbone models. In response, we propose DISGEN, a novel and model-agnostic framework designed to disentangle size factors from graph representations. DISGEN employs size- and task-invariant augmentations and introduces a decoupling loss that minimizes shared information in hidden representations, with theoretical guarantees for its effectiveness. Our empirical results show that DISGEN outperforms the state-of-the-art models by up to 6% on real-world datasets, underscoring its effectiveness in enhancing the size generalizability of GNNs. Our codes are available at: https://github.com/GraphmindDartmouth/DISGEN.

Updated: 2024-06-10 04:56:40

标题: 通过解耦表示学习增强图神经网络中的尺寸泛化

摘要: 尽管大多数图神经网络（GNNs）可以在任何大小的图上运行，但它们的分类性能在大于训练过程中遇到的图形时通常会下降。现有方法未能充分解决从图形表示中删除大小信息的问题，导致性能次优并依赖于骨干模型。为此，我们提出了DISGEN，这是一个新颖的、与模型无关的框架，旨在将大小因素从图形表示中解开。DISGEN采用大小和任务不变的增强，并引入了一个解耦损失，最小化隐藏表示中的共享信息，具有其有效性的理论保证。我们的实证结果表明，DISGEN在真实数据集上的性能优于最先进的模型多达6%，突显了其增强图神经网络尺寸泛化能力的效果。我们的代码可在以下链接找到：https://github.com/GraphmindDartmouth/DISGEN。

更新时间: 2024-06-10 04:56:40

领域: cs.LG

下载: http://arxiv.org/abs/2406.04601v2

GraphStorm: all-in-one graph machine learning framework for industry applications

Graph machine learning (GML) is effective in many business applications. However, making GML easy to use and applicable to industry applications with massive datasets remain challenging. We developed GraphStorm, which provides an end-to-end solution for scalable graph construction, graph model training and inference. GraphStorm has the following desirable properties: (a) Easy to use: it can perform graph construction and model training and inference with just a single command; (b) Expert-friendly: GraphStorm contains many advanced GML modeling techniques to handle complex graph data and improve model performance; (c) Scalable: every component in GraphStorm can operate on graphs with billions of nodes and can scale model training and inference to different hardware without changing any code. GraphStorm has been used and deployed for over a dozen billion-scale industry applications after its release in May 2023. It is open-sourced in Github: https://github.com/awslabs/graphstorm.

Updated: 2024-06-10 04:56:16

标题: GraphStorm：面向行业应用的一体化图机器学习框架

摘要: 图机器学习（GML）在许多商业应用中是有效的。然而，使GML易于使用并适用于具有海量数据集的行业应用仍然具有挑战性。我们开发了GraphStorm，它提供了一个端到端的解决方案，用于可扩展的图构建、图模型训练和推断。GraphStorm具有以下可取之处：（a）易于使用：它可以通过一个简单的命令执行图构建和模型训练和推断；（b）专家友好：GraphStorm包含许多高级GML建模技术，用于处理复杂的图数据并提高模型性能；（c）可扩展：GraphStorm中的每个组件都可以在具有数十亿节点的图上运行，并且可以将模型训练和推断扩展到不同的硬件而无需更改任何代码。自2023年5月发布以来，GraphStorm已在十几个规模达十亿的行业应用中使用和部署。它在Github上开源：https://github.com/awslabs/graphstorm。

更新时间: 2024-06-10 04:56:16

领域: cs.LG,cs.DC

下载: http://arxiv.org/abs/2406.06022v1

HeadEvolver: Text to Head Avatars via Expressive and Attribute-Preserving Mesh Deformation

We present HeadEvolver, a novel framework to generate stylized head avatars from text guidance. HeadEvolver uses locally learnable mesh deformation from a template head mesh, producing high-quality digital assets for detail-preserving editing and animation. To tackle the challenges of lacking fine-grained and semantic-aware local shape control in global deformation through Jacobians, we introduce a trainable parameter as a weighting factor for the Jacobian at each triangle to adaptively change local shapes while maintaining global correspondences and facial features. Moreover, to ensure the coherence of the resulting shape and appearance from different viewpoints, we use pretrained image diffusion models for differentiable rendering with regularization terms to refine the deformation under text guidance. Extensive experiments demonstrate that our method can generate diverse head avatars with an articulated mesh that can be edited seamlessly in 3D graphics software, facilitating downstream applications such as more efficient animation with inherited blend shapes and semantic consistency.

Updated: 2024-06-10 04:50:36

标题: HeadEvolver：通过表达和属性保持网格变形实现文本到头像的转换

摘要: 我们提出了HeadEvolver，这是一个新颖的框架，可以根据文本指导生成风格化的头部头像。HeadEvolver使用来自模板头部网格的局部可学习网格变形，生成高质量的数字资产，可用于保留细节的编辑和动画。为了解决全局变形中缺乏细粒度和语义感知的局部形状控制的挑战，我们引入了一个可训练的参数作为每个三角形的Jacobian的加权因子，以自适应地改变局部形状同时保持全局对应关系和面部特征。此外，为了确保来自不同视角的结果形状和外观的一致性，我们使用预训练的图像扩散模型进行可微渲染，并使用正则化项来在文本指导下优化变形。大量实验证明，我们的方法可以生成具有关节网格的多样化头部头像，这些头像可以在3D图形软件中无缝编辑，有助于后续应用，例如继承混合形状和语义一致性，实现更高效的动画。

更新时间: 2024-06-10 04:50:36

领域: cs.GR,cs.AI,I.2.6; I.3.8

下载: http://arxiv.org/abs/2403.09326v2

ClashEval: Quantifying the tug-of-war between an LLM's internal prior and external evidence

Retrieval augmented generation (RAG) is frequently used to mitigate hallucinations and provide up-to-date knowledge for large language models (LLMs). However, given that document retrieval is an imprecise task and sometimes results in erroneous or even harmful content being presented in context, this raises the question of how LLMs handle retrieved information: If the provided content is incorrect, does the model know to ignore it, or does it recapitulate the error? Conversely, when the model's initial response is incorrect, does it always know to use the retrieved information to correct itself, or does it insist on its wrong prior response? To answer this, we curate a dataset of over 1200 questions across six domains (e.g., drug dosages, Olympic records, locations) along with content relevant to answering each question. We further apply precise perturbations to the answers in the content that range from subtle to blatant errors. We benchmark six top-performing LLMs, including GPT-4o, on this dataset and find that LLMs are susceptible to adopting incorrect retrieved content, overriding their own correct prior knowledge over 60% of the time. However, the more unrealistic the retrieved content is (i.e. more deviated from truth), the less likely the model is to adopt it. Also, the less confident a model is in its initial response (via measuring token probabilities), the more likely it is to adopt the information in the retrieved content. We exploit this finding and demonstrate simple methods for improving model accuracy where there is conflicting retrieved content. Our results highlight a difficult task and benchmark for LLMs -- namely, their ability to correctly discern when it is wrong in light of correct retrieved content and to reject cases when the provided content is incorrect.

Updated: 2024-06-10 04:44:57

标题: ClashEval：量化LLM内部先验和外部证据之间的拉锯战

摘要: 检索增强生成（RAG）经常用于减轻幻觉并为大型语言模型（LLMs）提供最新知识。然而，考虑到文档检索是一个不精确的任务，有时会导致错误甚至有害内容在上下文中呈现，这就引发了一个问题：LLMs如何处理检索到的信息？如果提供的内容是错误的，模型是否知道忽略它，还是重复错误？相反，当模型的初始回应是错误的时，是否总是知道使用检索到的信息来纠正自己，还是坚持其错误的先前回应？为了回答这个问题，我们整理了一个包含超过1200个问题的数据集，涵盖六个领域（如药物剂量、奥运纪录、地点），以及回答每个问题所需的相关内容。我们进一步对内容中的答案施加精确的扰动，范围从微妙到明显的错误。我们在这个数据集上对包括GPT-4o在内的六个表现最佳的LLMs进行基准测试，发现LLMs容易采用错误的检索内容，超过60%的时间覆盖其自己正确的先前知识。然而，检索到的内容越不现实（即与真相相去甚远），模型就越不可能采用它。另外，模型对其初始回应的信心越低（通过衡量标记概率），就越有可能采用检索到的内容中的信息。我们利用这一发现，并展示了在存在冲突的检索内容时改进模型准确性的简单方法。我们的结果突显了LLMs的一个困难任务和基准--即，它们在正确辨别在正确检索内容的情况下何时出错，并拒绝提供内容不正确的情况。

更新时间: 2024-06-10 04:44:57

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2404.10198v2

Probabilistic Regular Tree Priors for Scientific Symbolic Reasoning

Symbolic Regression (SR) allows for the discovery of scientific equations from data. To limit the large search space of possible equations, prior knowledge has been expressed in terms of formal grammars that characterize subsets of arbitrary strings. However, there is a mismatch between context-free grammars required to express the set of syntactically correct equations, missing closure properties of the former, and a tree structure of the latter. Our contributions are to (i) compactly express experts' prior beliefs about which equations are more likely to be expected by probabilistic Regular Tree Expressions (pRTE), and (ii) adapt Bayesian inference to make such priors efficiently available for symbolic regression encoded as finite state machines. Our scientific case studies show its effectiveness in soil science to find sorption isotherms and for modeling hyper-elastic materials.

Updated: 2024-06-10 04:39:52

标题: 概率正则树先验用于科学符号推理

摘要: 象征回归（SR）允许从数据中发现科学方程。为了限制可能方程的大搜索空间，先前的知识已用形式语法表达，这些形式语法表征任意字符串的子集。然而，需要表达语法正确的方程集的无上下文语法，以及前者缺失的封闭特性之间存在不匹配，而后者具有树结构。我们的贡献是通过概率正则树表达式（pRTE）紧凑地表达专家对哪些方程更有可能被预期的先验信念，并将贝叶斯推理调整为使这种先验对象作为有限状态机编码的象征回归有效可用。我们的科学案例研究显示了其在土壤科学中寻找吸附等温线和建模超弹性材料方面的有效性。

更新时间: 2024-06-10 04:39:52

领域: cs.LG,cs.AI,cs.FL

下载: http://arxiv.org/abs/2306.08506v2

Neuro-TransUNet: Segmentation of stroke lesion in MRI using transformers

Accurate segmentation of the stroke lesions using magnetic resonance imaging (MRI) is associated with difficulties due to the complicated anatomy of the brain and the different properties of the lesions. This study introduces the Neuro-TransUNet framework, which synergizes the U-Net's spatial feature extraction with SwinUNETR's global contextual processing ability, further enhanced by advanced feature fusion and segmentation synthesis techniques. The comprehensive data pre-processing pipeline improves the framework's efficiency, which involves resampling, bias correction, and data standardization, enhancing data quality and consistency. Ablation studies confirm the significant impact of the advanced integration of U-Net with SwinUNETR and data pre-processing pipelines on performance and demonstrate the model's effectiveness. The proposed Neuro-TransUNet model, trained with the ATLAS v2.0 \emph{training} dataset, outperforms existing deep learning algorithms and establishes a new benchmark in stroke lesion segmentation.

Updated: 2024-06-10 04:36:21

标题: 《神经-TransUNet：使用transformers在MRI中对中风损伤进行分割》

摘要: 磁共振成像（MRI）准确分割中风病变与复杂的大脑解剖结构和病变不同特性有关。本研究引入了Neuro-TransUNet框架，将U-Net的空间特征提取与SwinUNETR的全局背景处理能力相结合，进一步通过先进的特征融合和分割合成技术进行增强。全面的数据预处理流程提高了框架的效率，包括重新采样、偏差校正和数据标准化，增强了数据质量和一致性。消融研究证实了U-Net与SwinUNETR的先进集成以及数据预处理流程对性能的显著影响，并证明了模型的有效性。提出的Neuro-TransUNet模型使用ATLAS v2.0训练数据集进行训练，优于现有深度学习算法，并在中风病变分割中建立了新的基准。

更新时间: 2024-06-10 04:36:21

领域: eess.IV,cs.AI

下载: http://arxiv.org/abs/2406.06017v1

EpiLearn: A Python Library for Machine Learning in Epidemic Modeling

EpiLearn is a Python toolkit developed for modeling, simulating, and analyzing epidemic data. Although there exist several packages that also deal with epidemic modeling, they are often restricted to mechanistic models or traditional statistical tools. As machine learning continues to shape the world, the gap between these packages and the latest models has become larger. To bridge the gap and inspire innovative research in epidemic modeling, EpiLearn not only provides support for evaluating epidemic models based on machine learning, but also incorporates comprehensive tools for analyzing epidemic data, such as simulation, visualization, transformations, etc. For the convenience of both epidemiologists and data scientists, we provide a unified framework for training and evaluation of epidemic models on two tasks: Forecasting and Source Detection. To facilitate the development of new models, EpiLearn follows a modular design, making it flexible and easy to use. In addition, an interactive web application is also developed to visualize the real-world or simulated epidemic data. Our package is available at https://github.com/Emory-Melody/EpiLearn.

Updated: 2024-06-10 04:35:14

标题: EpiLearn：一个用于流行病模型中机器学习的Python库

摘要: EpiLearn是一个用于建模、模拟和分析流行病数据的Python工具包。虽然存在一些处理流行病建模的软件包，但它们通常局限于机械模型或传统统计工具。随着机器学习继续塑造世界，这些软件包与最新模型之间的差距变得更大。为了弥合这一差距，激发流行病建模的创新研究，EpiLearn不仅提供了基于机器学习评估流行病模型的支持，还结合了用于分析流行病数据的综合工具，如模拟、可视化、转换等。为了方便流行病学家和数据科学家，我们提供了一个统一的框架，用于在两个任务上训练和评估流行病模型：预测和源检测。为了促进新模型的开发，EpiLearn采用了模块化设计，使其灵活易用。此外，还开发了一个交互式网络应用程序，用于可视化真实或模拟的流行病数据。我们的软件包可以在https://github.com/Emory-Melody/EpiLearn上找到。

更新时间: 2024-06-10 04:35:14

领域: cs.LG

下载: http://arxiv.org/abs/2406.06016v1

Quantum algorithms for spectral sums

We propose new quantum algorithms for estimating spectral sums of positive semi-definite (PSD) matrices. The spectral sum of an PSD matrix $A$, for a function $f$, is defined as $ \text{Tr}[f(A)] = \sum_j f(\lambda_j)$, where $\lambda_j$ are the eigenvalues of $A$. Typical examples of spectral sums are the von Neumann entropy, the trace of $A^{-1}$, the log-determinant, and the Schatten $p$-norm, where the latter does not require the matrix to be PSD. The current best classical randomized algorithms estimating these quantities have a runtime that is at least linearly in the number of nonzero entries of the matrix and quadratic in the estimation error. Assuming access to a block-encoding of a matrix, our algorithms are sub-linear in the matrix size, and depend at most quadratically on other parameters, like the condition number and the approximation error, and thus can compete with most of the randomized and distributed classical algorithms proposed in the literature, and polynomially improve the runtime of other quantum algorithms proposed for the same problems. We show how the algorithms and techniques used in this work can be applied to three problems in spectral graph theory: approximating the number of triangles, the effective resistance, and the number of spanning trees within a graph.

Updated: 2024-06-10 04:21:41

标题: 量子算法用于谱和算。

摘要: 我们提出了一种新的量子算法，用于估计正半定（PSD）矩阵的谱和。对于一个函数$f$，PSD矩阵$A$的谱和被定义为$ \text{Tr}[f(A)] = \sum_j f(\lambda_j)$，其中$\lambda_j$是$A$的特征值。谱和的典型例子包括冯·诺依曼熵、$A^{-1}$的迹、对数行列式和Schatten $p$-范数，后者不要求矩阵为PSD。当前最佳的经典随机算法估计这些量的运行时间至少与矩阵的非零元素数量成线性关系，并且与估计误差成二次关系。假设可以访问矩阵的块编码，我们的算法在矩阵大小上是次线性的，并且最多与其他参数，如条件数和近似误差成二次关系，因此可以与文献中提出的大多数随机和分布式经典算法竞争，并且在运行时间上多项式地改进了其他用于相同问题的量子算法。我们展示了本文中使用的算法和技术如何应用于谱图理论中的三个问题：近似三角形数量、有效电阻和图中生成树数量。

更新时间: 2024-06-10 04:21:41

领域: quant-ph,cs.AI,cs.DS,cs.LG

下载: http://arxiv.org/abs/2011.06475v2

How Efficient is LLM-Generated Code? A Rigorous & High-Standard Benchmark

The emergence of large language models (LLMs) has significantly pushed the frontiers of program synthesis. Advancement of LLM-based program synthesis calls for a thorough evaluation of LLM-generated code. Most evaluation frameworks focus on the (functional) correctness of generated code; efficiency, as an important measure of code quality, has been overlooked in existing evaluations. In this work, we develop ENAMEL (EfficeNcy AutoMatic EvaLuator), a rigorous and high-standard benchmark for evaluating the capability of LLMs in generating efficient code. Firstly, we propose a new efficiency metric called eff@k, which generalizes the pass@k metric from correctness to efficiency and appropriately handles right-censored execution time. Furthermore, we derive an unbiased and variance-reduced estimator of eff@k via Rao--Blackwellization; we also provide a numerically stable implementation for the new estimator. Secondly, to set a high-standard for efficiency evaluation, we employ a human expert to design best algorithms and implementations as our reference solutions of efficiency, many of which are much more efficient than existing canonical solutions in HumanEval and HumanEval+. Moreover, to ensure a rigorous evaluation, we employ a human expert to curate strong test case generators to filter out wrong code and differentiate suboptimal algorithms. An extensive study across 30 popular LLMs using our benchmark ENAMEL shows that LLMs still fall short of generating expert-level efficient code. Using two subsets of our problem set, we demonstrate that such deficiency is because current LLMs struggle in designing advanced algorithms and are barely aware of implementation optimization. Our benchmark is publicly available at https://github.com/q-rz/enamel .

Updated: 2024-06-10 04:19:20

标题: LLM生成的代码有多有效？一个严谨和高标准的基准测试

摘要: 大型语言模型（LLMs）的出现显著推动了程序综合的前沿。基于LLM的程序综合的进展需要对LLM生成的代码进行彻底评估。大多数评估框架侧重于生成代码的（功能）正确性；作为代码质量的重要衡量标准，效率在现有评估中被忽视了。在这项工作中，我们开发了ENAMEL（EfficeNcy AutoMatic EvaLuator），这是一个严格的高标准基准，用于评估LLM在生成高效代码方面的能力。首先，我们提出了一种新的效率度量标准，称为eff@k，它将正确性度量pass@k从正确性推广到效率，并适当处理了右截尾的执行时间。此外，我们通过Rao-Blackwell化推导出了eff@k的无偏和方差减少的估计器；我们还为新估计器提供了一个数值稳定的实现。其次，为了为效率评估设定高标准，我们雇用了一名人类专家设计最佳算法和实现作为我们效率的参考解决方案，其中许多比现有的经典解决方案在HumanEval和HumanEval+中更有效。此外，为了确保严格评估，我们雇用了一名人类专家策划强大的测试用例生成器，以过滤错误代码并区分次优算法。在使用我们的基准ENAMEL对30个流行的LLMs进行广泛研究后，发现LLMs仍然无法生成专家级高效代码。使用我们问题集的两个子集，我们证明了这种不足是因为当前的LLMs在设计高级算法方面存在困难，并且几乎没有意识到实现优化。我们的基准可在https://github.com/q-rz/enamel 上公开获得。

更新时间: 2024-06-10 04:19:20

领域: cs.SE,cs.AI,cs.LG

下载: http://arxiv.org/abs/2406.06647v1

The Impact of AI on Academic Research and Publishing

Generative artificial intelligence (AI) technologies like ChatGPT, have significantly impacted academic writing and publishing through their ability to generate content at levels comparable to or surpassing human writers. Through a review of recent interdisciplinary literature, this paper examines ethical considerations surrounding the integration of AI into academia, focusing on the potential for this technology to be used for scholarly misconduct and necessary oversight when using it for writing, editing, and reviewing of scholarly papers. The findings highlight the need for collaborative approaches to AI usage among publishers, editors, reviewers, and authors to ensure that this technology is used ethically and productively.

Updated: 2024-06-10 04:10:18

标题: 人工智能对学术研究和出版的影响

摘要: 生成式人工智能（AI）技术，如ChatGPT，通过其能够生成与甚至超过人类作者相媲美水平的内容，已经显著影响了学术写作和出版。通过对最近跨学科文献的审查，本文探讨了围绕将AI整合到学术界的伦理考虑，重点关注这种技术可能被用于学术不端行为以及在写作、编辑和评审学术论文时的必要监督。研究结果强调了出版商、编辑、评审人员和作者之间合作使用AI的必要性，以确保这种技术在伦理和生产力方面得到正确应用。

更新时间: 2024-06-10 04:10:18

领域: cs.DL,cs.AI,cs.CY

下载: http://arxiv.org/abs/2406.06009v1

CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models

Artificial intelligence has significantly impacted medical applications, particularly with the advent of Medical Large Vision Language Models (Med-LVLMs), sparking optimism for the future of automated and personalized healthcare. However, the trustworthiness of Med-LVLMs remains unverified, posing significant risks for future model deployment. In this paper, we introduce CARES and aim to comprehensively evaluate the Trustworthiness of Med-LVLMs across the medical domain. We assess the trustworthiness of Med-LVLMs across five dimensions, including trustfulness, fairness, safety, privacy, and robustness. CARES comprises about 41K question-answer pairs in both closed and open-ended formats, covering 16 medical image modalities and 27 anatomical regions. Our analysis reveals that the models consistently exhibit concerns regarding trustworthiness, often displaying factual inaccuracies and failing to maintain fairness across different demographic groups. Furthermore, they are vulnerable to attacks and demonstrate a lack of privacy awareness. We publicly release our benchmark and code in https://github.com/richard-peng-xia/CARES.

Updated: 2024-06-10 04:07:09

标题: CARES：医学视觉语言模型可信度综合基准

摘要: 人工智能已经在医疗应用中产生了显著影响，特别是随着医疗大视觉语言模型（Med-LVLMs）的出现，为自动化和个性化医疗的未来带来了乐观情绪。然而，Med-LVLMs的可信度尚未经过验证，为未来模型部署带来了重大风险。在本文中，我们介绍了CARES，并旨在全面评估医疗领域Med-LVLMs的可信度。我们评估了Med-LVLMs在信任度、公平性、安全性、隐私性和稳健性等五个维度上的可信度。CARES包含约41K个封闭和开放式格式的问题-答案对，涵盖了16种医学图像模态和27个解剖区域。我们的分析表明，这些模型一贯存在可信度方面的问题，经常出现事实错误并未能在不同人口群体之间保持公平。此外，它们容易受到攻击并展现出缺乏隐私意识。我们在https://github.com/richard-peng-xia/CARES上公开发布了我们的基准和代码。

更新时间: 2024-06-10 04:07:09

领域: cs.LG,cs.CL,cs.CV,cs.CY

下载: http://arxiv.org/abs/2406.06007v1

FLEUR: An Explainable Reference-Free Evaluation Metric for Image Captioning Using a Large Multimodal Model

Most existing image captioning evaluation metrics focus on assigning a single numerical score to a caption by comparing it with reference captions. However, these methods do not provide an explanation for the assigned score. Moreover, reference captions are expensive to acquire. In this paper, we propose FLEUR, an explainable reference-free metric to introduce explainability into image captioning evaluation metrics. By leveraging a large multimodal model, FLEUR can evaluate the caption against the image without the need for reference captions, and provide the explanation for the assigned score. We introduce score smoothing to align as closely as possible with human judgment and to be robust to user-defined grading criteria. FLEUR achieves high correlations with human judgment across various image captioning evaluation benchmarks and reaches state-of-the-art results on Flickr8k-CF, COMPOSITE, and Pascal-50S within the domain of reference-free evaluation metrics. Our source code and results are publicly available at: https://github.com/Yebin46/FLEUR.

Updated: 2024-06-10 03:57:39

标题: FLEUR：一种基于大型多模型的图像字幕可解释性无参考评估指标

摘要: 大多数现有的图像字幕评估指标专注于将一个字幕与参考字幕进行比较，从而为其分配一个单一的数值分数。然而，这些方法并未为所分配的分数提供解释。此外，获取参考字幕是昂贵的。在本文中，我们提出了FLEUR，一种可解释的无参考度量，以将可解释性引入图像字幕评估指标中。通过利用一个大型多模态模型，FLEUR可以在无需参考字幕的情况下评估字幕与图像之间的关系，并为所分配的分数提供解释。我们引入了分数平滑以尽可能与人类判断保持一致，并对用户定义的评分标准具有鲁棒性。FLEUR在各种图像字幕评估基准上与人类判断高度相关，并在Flickr8k-CF、COMPOSITE和Pascal-50S领域内达到了最先进的结果。我们的源代码和结果可以在以下网址公开获取：https://github.com/Yebin46/FLEUR。

更新时间: 2024-06-10 03:57:39

领域: cs.CV,cs.AI,cs.CL

下载: http://arxiv.org/abs/2406.06004v1

Computational and Statistical Guarantees for Tensor-on-Tensor Regression with Tensor Train Decomposition

Recently, a tensor-on-tensor (ToT) regression model has been proposed to generalize tensor recovery, encompassing scenarios like scalar-on-tensor regression and tensor-on-vector regression. However, the exponential growth in tensor complexity poses challenges for storage and computation in ToT regression. To overcome this hurdle, tensor decompositions have been introduced, with the tensor train (TT)-based ToT model proving efficient in practice due to reduced memory requirements, enhanced computational efficiency, and decreased sampling complexity. Despite these practical benefits, a disparity exists between theoretical analysis and real-world performance. In this paper, we delve into the theoretical and algorithmic aspects of the TT-based ToT regression model. Assuming the regression operator satisfies the restricted isometry property (RIP), we conduct an error analysis for the solution to a constrained least-squares optimization problem. This analysis includes upper error bound and minimax lower bound, revealing that such error bounds polynomially depend on the order $N+M$. To efficiently find solutions meeting such error bounds, we propose two optimization algorithms: the iterative hard thresholding (IHT) algorithm (employing gradient descent with TT-singular value decomposition (TT-SVD)) and the factorization approach using the Riemannian gradient descent (RGD) algorithm. When RIP is satisfied, spectral initialization facilitates proper initialization, and we establish the linear convergence rate of both IHT and RGD.

Updated: 2024-06-10 03:51:38

标题: 张量-张量回归的计算和统计保证与张量列车分解

摘要: 最近，提出了一种张量对张量（ToT）回归模型，旨在推广张量恢复，涵盖了类似标量对张量回归和张量对向量回归的情景。然而，张量复杂度的指数增长给ToT回归中的存储和计算带来挑战。为了克服这一障碍，引入了张量分解，基于张量列（TT）的ToT模型在实践中证明了其高效性，因为其降低了存储需求，增强了计算效率，并降低了采样复杂性。尽管存在这些实际好处，理论分析与实际表现之间存在差距。在本文中，我们深入探讨了基于TT的ToT回归模型的理论和算法方面。假设回归算子满足受限等距性质（RIP），我们对受约束最小二乘优化问题的解进行了误差分析。这个分析包括上误差界和极小下界，揭示了这些误差界在多项式上依赖于阶数 $N+M$。为了高效找到满足这些误差界的解，我们提出了两种优化算法：迭代硬阈值（IHT）算法（采用梯度下降与TT奇异值分解（TT-SVD））和使用黎曼梯度下降（RGD）算法的因子化方法。当RIP被满足时，谱初始化有助于适当初始化，并且我们建立了IHT和RGD的线性收敛速度。

更新时间: 2024-06-10 03:51:38

领域: cs.LG,eess.SP,math.OC

下载: http://arxiv.org/abs/2406.06002v1

QuickCent: a fast and frugal heuristic for harmonic centrality estimation on scale-free networks

We present a simple and quick method to approximate network centrality indexes. Our approach, called QuickCent, is inspired by so-called fast and frugal heuristics, which are heuristics initially proposed to model some human decision and inference processes. The centrality index that we estimate is the harmonic centrality, which is a measure based on shortest-path distances, so infeasible to compute on large networks. We compare QuickCent with known machine learning algorithms on synthetic data generated with preferential attachment, and some empirical networks. Our experiments show that QuickCent is able to make estimates that are competitive in accuracy with the best alternative methods tested, either on synthetic scale-free networks or empirical networks. QuickCent has the feature of achieving low error variance estimates, even with a small training set. Moreover, QuickCent is comparable in efficiency -- accuracy and time cost -- to those produced by more complex methods. We discuss and provide some insight into how QuickCent exploits the fact that in some networks, such as those generated by preferential attachment, local density measures such as the in-degree, can be a proxy for the size of the network region to which a node has access, opening up the possibility of approximating centrality indices based on size such as the harmonic centrality. Our initial results show that simple heuristics and biologically inspired computational methods are a promising line of research in the context of network measure estimations.

Updated: 2024-06-10 03:47:29

标题: QuickCent：一种在无标度网络上快速估算谐波中心性的简便启发式算法

摘要: 我们提出了一种简单而快速的方法来近似网络中心性指标。我们的方法称为QuickCent，受到所谓的快速和节俭启发式的启发，这些启发式最初被提出来模拟一些人类决策和推理过程。我们估计的中心性指标是谐波中心性，这是一种基于最短路径距离的度量，因此在大型网络上计算是不可行的。我们将QuickCent与已知的机器学习算法在使用优先附着生成的合成数据和一些实证网络上进行了比较。我们的实验表明，QuickCent能够做出与已测试的最佳替代方法相竞争的准确估计，无论是在合成的无标度网络还是实证网络上。QuickCent具有实现低误差方差估计的特点，即使是在小训练集的情况下也是如此。此外，QuickCent在效率上与更复杂方法产生的结果相当 -- 准确性和时间成本。我们讨论并提供了一些关于QuickCent如何利用某些网络的事实的见解，比如那些由优先附着生成的网络，局部密度度量如入度可以作为节点访问的网络区域大小的代理，从而可能近似基于大小的中心性指标，如谐波中心性。我们的初步结果表明，简单启发式和受生物启发的计算方法在网络度量估计方面是一个有前途的研究方向。

更新时间: 2024-06-10 03:47:29

领域: cs.SI,cs.LG

下载: http://arxiv.org/abs/2303.00927v2

fSEAD: a Composable FPGA-based Streaming Ensemble Anomaly Detection Library

Machine learning ensembles combine multiple base models to produce a more accurate output. They can be applied to a range of machine learning problems, including anomaly detection. In this paper, we investigate how to maximize the composability and scalability of an FPGA-based streaming ensemble anomaly detector (fSEAD). To achieve this, we propose a flexible computing architecture consisting of multiple partially reconfigurable regions, pblocks, which each implement anomaly detectors. Our proof-of-concept design supports three state-of-the-art anomaly detection algorithms: Loda, RS-Hash and xStream. Each algorithm is scalable, meaning multiple instances can be placed within a pblock to improve performance. Moreover, fSEAD is implemented using High-level synthesis (HLS), meaning further custom anomaly detectors can be supported. Pblocks are interconnected via an AXI-switch, enabling them to be composed in an arbitrary fashion before combining and merging results at run-time to create an ensemble that maximizes the use of FPGA resources and accuracy. Through utilizing reconfigurable Dynamic Function eXchange (DFX), the detector can be modified at run-time to adapt to changing environmental conditions. We compare fSEAD to an equivalent central processing unit (CPU) implementation using four standard datasets, with speed-ups ranging from $3\times$ to $8\times$.

Updated: 2024-06-10 03:38:35

标题: fSEAD：一种基于FPGA的可组合流式集成异常检测库

摘要: 机器学习集成将多个基本模型组合在一起，产生更准确的输出。它们可以应用于各种机器学习问题，包括异常检测。在本文中，我们研究如何最大化基于FPGA的流式集成异常检测器（fSEAD）的可组合性和可扩展性。为实现这一目标，我们提出了一个包含多个部分可重配置区域pblocks的灵活计算架构，每个区域实现异常检测器。我们的概念验证设计支持三种最先进的异常检测算法：Loda、RS-Hash和xStream。每种算法都具有可扩展性，意味着可以在pblock内放置多个实例以提高性能。此外，fSEAD使用高级综合（HLS）实现，这意味着可以支持进一步定制的异常检测器。pblocks通过AXI开关互连，使它们能够以任意方式组合，然后在运行时合并结果，创建一个最大化FPGA资源利用率和准确性的集成。通过利用可重配置的动态功能交换（DFX），检测器可以在运行时修改以适应环境条件的变化。我们将fSEAD与等价的中央处理单元（CPU）实现在四个标准数据集上进行比较，加速范围为3倍至8倍。

更新时间: 2024-06-10 03:38:35

领域: cs.AR,cs.AI,cs.LG

下载: http://arxiv.org/abs/2406.05999v1

Ricci flow-guided autoencoders in learning time-dependent dynamics

We present a manifold-based autoencoder method for learning dynamics in time, notably partial differential equations (PDEs), in which the manifold latent space evolves according to Ricci flow. This can be accomplished by simulating Ricci flow in a physics-informed setting, and manifold quantities can be matched so that Ricci flow is empirically achieved. With our method, the manifold is discerned through the training procedure, while the latent evolution due to Ricci flow induces a more accommodating representation over static methods. We present our method on a range of experiments consisting of PDE data that encompasses desirable characteristics such as periodicity and randomness. The dynamical manifold latent space facilitates qualities such as learning for out-of-distribution data, and robustness. We showcase our method by demonstrating these features.

Updated: 2024-06-10 03:33:25

标题: 翻译：Ricci流引导的自动编码器在学习时变动力学中的应用

摘要: 我们提出了一种基于流形的自动编码器方法，用于学习时间动态，特别是偏微分方程（PDEs），其中流形潜在空间根据里奇流演化。通过在物理信息设定中模拟里奇流，可以实现这一点，并且可以匹配流形量，以便实现里奇流。通过我们的方法，在训练过程中可以识别流形，而由于里奇流引起的潜在演化会比静态方法产生更具包容性的表示。我们在一系列实验中展示了我们的方法，这些实验包含具有周期性和随机性等理想特征的PDE数据。动态流形潜在空间促进了学习超出分布数据和鲁棒性等特性。我们通过展示这些特性来展示我们的方法。

更新时间: 2024-06-10 03:33:25

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2401.14591v7

Large Stepsize Gradient Descent for Logistic Loss: Non-Monotonicity of the Loss Improves Optimization Efficiency

We consider gradient descent (GD) with a constant stepsize applied to logistic regression with linearly separable data, where the constant stepsize $\eta$ is so large that the loss initially oscillates. We show that GD exits this initial oscillatory phase rapidly -- in $\mathcal{O}(\eta)$ steps -- and subsequently achieves an $\tilde{\mathcal{O}}(1 / (\eta t) )$ convergence rate after $t$ additional steps. Our results imply that, given a budget of $T$ steps, GD can achieve an accelerated loss of $\tilde{\mathcal{O}}(1/T^2)$ with an aggressive stepsize $\eta:= \Theta( T)$, without any use of momentum or variable stepsize schedulers. Our proof technique is versatile and also handles general classification loss functions (where exponential tails are needed for the $\tilde{\mathcal{O}}(1/T^2)$ acceleration), nonlinear predictors in the neural tangent kernel regime, and online stochastic gradient descent (SGD) with a large stepsize, under suitable separability conditions.

Updated: 2024-06-10 03:32:05

标题: 大步长梯度下降法用于逻辑回归损失函数：损失函数的非单调性提高了优化效率

摘要: 我们考虑使用恒定步长的梯度下降（GD）应用于具有线性可分数据的逻辑回归，其中恒定步长$\eta$非常大，使得损失最初会振荡。我们展示了GD能够快速退出这个初始振荡阶段 - 在$\mathcal{O}(\eta)$步之内 - 并在额外的$t$步之后实现$\tilde{\mathcal{O}}(1/(\eta t))$的收敛速率。我们的结果表明，给定$T$步的预算，GD可以在激进的步长$\eta:= \Theta(T)$的情况下实现加速损失$\tilde{\mathcal{O}}(1/T^2)$，而无需使用任何动量或变步长调度器。我们的证明技术是多才多艺的，也适用于一般分类损失函数（其中需要指数尾部以实现$\tilde{\mathcal{O}}(1/T^2)$的加速）、神经切线核区域中的非线性预测以及在线随机梯度下降（SGD）在适当的可分离条件下使用大步长。

更新时间: 2024-06-10 03:32:05

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2402.15926v2

A Dual-View Approach to Classifying Radiology Reports by Co-Training

Radiology report analysis provides valuable information that can aid with public health initiatives, and has been attracting increasing attention from the research community. In this work, we present a novel insight that the structure of a radiology report (namely, the Findings and Impression sections) offers different views of a radiology scan. Based on this intuition, we further propose a co-training approach, where two machine learning models are built upon the Findings and Impression sections, respectively, and use each other's information to boost performance with massive unlabeled data in a semi-supervised manner. We conducted experiments in a public health surveillance study, and results show that our co-training approach is able to improve performance using the dual views and surpass competing supervised and semi-supervised methods.

Updated: 2024-06-10 03:29:23

标题: 一个双视角方法来通过协同训练对放射学报告进行分类

摘要: 放射学报告分析提供了有价值的信息，可以帮助公共卫生倡议，并且已经引起了研究界的日益关注。在这项工作中，我们提出了一个新颖的见解，即放射学报告的结构（即，发现和印象部分）提供了不同的放射学扫描视图。基于这种直觉，我们进一步提出了一种协同训练方法，其中两个机器学习模型分别建立在发现和印象部分，并利用彼此的信息以半监督方式利用大量未标记数据提高性能。我们在一项公共卫生监测研究中进行了实验，结果表明我们的协同训练方法能够通过双重视图提高性能，并超越竞争的监督和半监督方法。

更新时间: 2024-06-10 03:29:23

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2406.05995v1

Discovering Multiple Solutions from a Single Task in Offline Reinforcement Learning

Recent studies on online reinforcement learning (RL) have demonstrated the advantages of learning multiple behaviors from a single task, as in the case of few-shot adaptation to a new environment. Although this approach is expected to yield similar benefits in offline RL, appropriate methods for learning multiple solutions have not been fully investigated in previous studies. In this study, we therefore addressed the problem of finding multiple solutions from a single task in offline RL. We propose algorithms that can learn multiple solutions in offline RL, and empirically investigate their performance. Our experimental results show that the proposed algorithm learns multiple qualitatively and quantitatively distinctive solutions in offline RL.

Updated: 2024-06-10 03:25:49

标题: 在离线强化学习中从单个任务中发现多个解决方案

摘要: 近期关于在线强化学习（RL）的研究表明，从单个任务中学习多种行为的优势，例如在适应新环境时进行少次迭代。尽管这种方法在离线RL中也有望带来类似的好处，但以前的研究并未完全探讨学习多种解决方案的适当方法。因此，在本研究中，我们解决了在离线RL中从单个任务中找到多种解决方案的问题。我们提出了可以在离线RL中学习多个解决方案的算法，并对其性能进行了实证研究。我们的实验结果表明，所提出的算法在离线RL中学习了多个在质量和数量上具有显著差异的解决方案。

更新时间: 2024-06-10 03:25:49

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2406.05993v1

Federated Representation Learning in the Under-Parameterized Regime

Federated representation learning (FRL) is a popular personalized federated learning (FL) framework where clients work together to train a common representation while retaining their personalized heads. Existing studies, however, largely focus on the over-parameterized regime. In this paper, we make the initial efforts to investigate FRL in the under-parameterized regime, where the FL model is insufficient to express the variations in all ground-truth models. We propose a novel FRL algorithm FLUTE, and theoretically characterize its sample complexity and convergence rate for linear models in the under-parameterized regime. To the best of our knowledge, this is the first FRL algorithm with provable performance guarantees in this regime. FLUTE features a data-independent random initialization and a carefully designed objective function that aids the distillation of subspace spanned by the global optimal representation from the misaligned local representations. On the technical side, we bridge low-rank matrix approximation techniques with the FL analysis, which may be of broad interest. We also extend FLUTE beyond linear representations. Experimental results demonstrate that FLUTE outperforms state-of-the-art FRL solutions in both synthetic and real-world tasks.

Updated: 2024-06-10 03:14:21

标题: 在欠参数化情况下的联邦表示学习

摘要: 联合表示学习（FRL）是一种流行的个性化联合学习（FL）框架，其中客户端共同合作训练一个共同的表示，同时保留他们的个性化头部。然而，现有研究主要集中在过度参数化的情况下。本文初步探讨了在欠参数化情况下的FRL，即FL模型不足以表达所有地面真实模型的变化。我们提出了一种新颖的FRL算法FLUTE，并在欠参数化情况下对线性模型的样本复杂性和收敛速度进行了理论表征。据我们所知，这是该领域首个具有可证明性能保证的FRL算法。FLUTE具有数据无关的随机初始化和精心设计的目标函数，有助于从错位的局部表示中提取全局最优表示所涵盖的子空间。在技术方面，我们将低秩矩阵逼近技术与FL分析相结合，可能引起广泛兴趣。我们还将FLUTE扩展到超出线性表示。实验结果表明，FLUTE在合成和真实任务中均优于最先进的FRL解决方案。

更新时间: 2024-06-10 03:14:21

领域: cs.LG

下载: http://arxiv.org/abs/2406.04596v2

Learning the Uncertainty Sets for Control Dynamics via Set Membership: A Non-Asymptotic Analysis

This paper studies uncertainty set estimation for unknown linear systems. Uncertainty sets are crucial for the quality of robust control since they directly influence the conservativeness of the control design. Departing from the confidence region analysis of least squares estimation, this paper focuses on set membership estimation (SME). Though good numerical performances have attracted applications of SME in the control literature, the non-asymptotic convergence rate of SME for linear systems remains an open question. This paper provides the first convergence rate bounds for SME and discusses variations of SME under relaxed assumptions. We also provide numerical results demonstrating SME's practical promise.

Updated: 2024-06-10 03:05:58

标题: 通过集合成员学习控制动态的不确定性集：非渐近分析

摘要: 本文研究未知线性系统的不确定性集估计。不确定性集对鲁棒控制的质量至关重要，因为它们直接影响控制设计的保守性。本文从最小二乘估计的置信区间分析出发，重点关注集成员估计（SME）。尽管SME在控制文献中具有良好的数值性能，但对于线性系统的非渐近收敛速率仍然是一个未解问题。本文提供了SME的首次收敛率界限，并讨论了在放宽假设条件下的SME变体。我们还提供了展示SME实用前景的数值结果。

更新时间: 2024-06-10 03:05:58

领域: math.OC,cs.LG,math.ST,stat.TH

下载: http://arxiv.org/abs/2309.14648v2

Neural-g: A Deep Learning Framework for Mixing Density Estimation

Mixing (or prior) density estimation is an important problem in machine learning and statistics, especially in empirical Bayes $g$-modeling where accurately estimating the prior is necessary for making good posterior inferences. In this paper, we propose neural-$g$, a new neural network-based estimator for $g$-modeling. Neural-$g$ uses a softmax output layer to ensure that the estimated prior is a valid probability density. Under default hyperparameters, we show that neural-$g$ is very flexible and capable of capturing many unknown densities, including those with flat regions, heavy tails, and/or discontinuities. In contrast, existing methods struggle to capture all of these prior shapes. We provide justification for neural-$g$ by establishing a new universal approximation theorem regarding the capability of neural networks to learn arbitrary probability mass functions. To accelerate convergence of our numerical implementation, we utilize a weighted average gradient descent approach to update the network parameters. Finally, we extend neural-$g$ to multivariate prior density estimation. We illustrate the efficacy of our approach through simulations and analyses of real datasets. A software package to implement neural-$g$ is publicly available at https://github.com/shijiew97/neuralG.

Updated: 2024-06-10 03:00:28

标题: Neural-g：一种用于混合密度估计的深度学习框架

摘要: 混合（或先验）密度估计是机器学习和统计学中的一个重要问题，特别是在经验贝叶斯$g$建模中，准确估计先验对于进行良好的后验推断是必要的。在本文中，我们提出了一种新的基于神经网络的$g$建模估计器 neural-$g$。Neural-$g$使用softmax输出层来确保估计的先验是一个有效的概率密度。在默认超参数下，我们展示了 neural-$g$ 非常灵活，能够捕捉许多未知密度，包括具有平坦区域、长尾和/或不连续性的密度。相比之下，现有方法很难捕捉所有这些先验形状。我们通过建立一个关于神经网络学习任意概率质量函数能力的新的通用逼近定理来证明 neural-$g$ 的合理性。为了加速我们的数值实现的收敛，我们利用加权平均梯度下降方法来更新网络参数。最后，我们将 neural-$g$ 扩展到多元先验密度估计。我们通过模拟和真实数据集的分析展示了我们方法的有效性。一个用于实现 neural-$g$ 的软件包可以在 https://github.com/shijiew97/neuralG 公开获取。

更新时间: 2024-06-10 03:00:28

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2406.05986v1

Explainable AI for Mental Disorder Detection via Social Media: A survey and outlook

Mental health constitutes a complex and pervasive global challenge, affecting millions of lives and often leading to severe consequences. In this paper, we conduct a thorough survey to explore the intersection of data science, artificial intelligence, and mental healthcare, focusing on the recent developments of mental disorder detection through online social media (OSM). A significant portion of the population actively engages in OSM platforms, creating a vast repository of personal data that holds immense potential for mental health analytics. The paper navigates through traditional diagnostic methods, state-of-the-art data- and AI-driven research studies, and the emergence of explainable AI (XAI) models for mental healthcare. We review state-of-the-art machine learning methods, particularly those based on modern deep learning, while emphasising the need for explainability in healthcare AI models. The experimental design section provides insights into prevalent practices, including available datasets and evaluation approaches. We also identify key issues and challenges in the field and propose promising future research directions. As mental health decisions demand transparency, interpretability, and ethical considerations, this paper contributes to the ongoing discourse on advancing XAI in mental healthcare through social media. The comprehensive overview presented here aims to guide researchers, practitioners, and policymakers in developing the area of mental disorder detection.

Updated: 2024-06-10 02:51:16

标题: 通过社交媒体进行心理障碍检测的可解释人工智能：调查与展望

摘要: 精神健康构成了一个复杂且普遍的全球性挑战，影响着数百万人的生活，通常会导致严重后果。本文进行了彻底的调查，探索数据科学、人工智能和精神保健的交叉领域，重点关注通过在线社交媒体（OSM）进行精神障碍检测的最新发展。大部分人口积极参与OSM平台，创造了一个拥有巨大潜力进行精神健康分析的个人数据库。本文穿越传统诊断方法、最新数据和人工智能驱动的研究，以及精神保健的可解释人工智能（XAI）模型的出现。我们回顾了最新的机器学习方法，尤其是基于现代深度学习的方法，同时强调了在医疗人工智能模型中需要解释性的需求。实验设计部分提供了关于普遍实践的见解，包括可用数据集和评估方法。我们还确定了该领域的关键问题和挑战，并提出了有前景的未来研究方向。由于精神健康决策需要透明性、可解释性和道德考量，本文有助于推动社交媒体在精神保健中的XAI进步的持续讨论。这里呈现的全面概述旨在引导研究人员、从业者和决策者发展精神障碍检测领域。

更新时间: 2024-06-10 02:51:16

领域: cs.LG,cs.AI,cs.IR

下载: http://arxiv.org/abs/2406.05984v1

Artificial Intelligence for Neuro MRI Acquisition: A Review

Magnetic resonance imaging (MRI) has significantly benefited from the resurgence of artificial intelligence (AI). By leveraging AI's capabilities in large-scale optimization and pattern recognition, innovative methods are transforming the MRI acquisition workflow, including planning, sequence design, and correction of acquisition artifacts. These emerging algorithms demonstrate substantial potential in enhancing the efficiency and throughput of acquisition steps. This review discusses several pivotal AI-based methods in neuro MRI acquisition, focusing on their technological advances, impact on clinical practice, and potential risks.

Updated: 2024-06-10 02:50:33

标题: 人工智能在神经MRI采集中的应用：一项综述

摘要: 核磁共振成像（MRI）在人工智能（AI）的复苏中受益匪浅。通过利用AI在大规模优化和模式识别方面的能力，创新方法正在改变MRI采集工作流程，包括规划、序列设计和校正采集工件。这些新兴算法展示了在增强采集步骤的效率和吞吐量方面的巨大潜力。本综述讨论了神经MRI采集中几种关键的基于AI的方法，着重讨论它们的技术进步、对临床实践的影响以及潜在风险。

更新时间: 2024-06-10 02:50:33

领域: eess.IV,cs.LG,physics.med-ph

下载: http://arxiv.org/abs/2406.05982v1

ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization

Large language models (LLMs) have shown impressive performance on language tasks but face challenges when deployed on resource-constrained devices due to their extensive parameters and reliance on dense multiplications, resulting in high memory demands and latency bottlenecks. Shift-and-add reparameterization offers a promising solution by replacing costly multiplications with hardware-friendly primitives in both the attention and multi-layer perceptron (MLP) layers of an LLM. However, current reparameterization techniques require training from scratch or full parameter fine-tuning to restore accuracy, which is resource-intensive for LLMs. To address this, we propose accelerating pretrained LLMs through post-training shift-and-add reparameterization, creating efficient multiplication-free models, dubbed ShiftAddLLM. Specifically, we quantize each weight matrix into binary matrices paired with group-wise scaling factors. The associated multiplications are reparameterized into (1) shifts between activations and scaling factors and (2) queries and adds according to the binary matrices. To reduce accuracy loss, we present a multi-objective optimization method to minimize both weight and output activation reparameterization errors. Additionally, based on varying sensitivity across layers to reparameterization, we develop an automated bit allocation strategy to further reduce memory usage and latency. Experiments on five LLM families and eight tasks consistently validate the effectiveness of ShiftAddLLM, achieving average perplexity improvements of 5.6 and 22.7 points at comparable or lower latency compared to the most competitive quantized LLMs at 3 and 2 bits, respectively, and more than 80% memory and energy reductions over the original LLMs. Codes and models are available at https://github.com/GATECH-EIC/ShiftAddLLM.

Updated: 2024-06-10 02:47:55

标题: ShiftAddLLM: 通过后训练无乘法重参数化加速预训练LLMs

摘要: 大型语言模型（LLM）在语言任务上表现出色，但在资源受限设备上部署时面临挑战，原因是它们具有庞大的参数并依赖于密集的乘法运算，导致内存需求高和延迟瓶颈。移位加法重新参数化通过在LLM的注意力和多层感知器（MLP）层中用硬件友好的基元替换昂贵的乘法，提供了一个有前途的解决方案。然而，当前的重新参数化技术需要从头开始训练或全参数微调以恢复准确性，这对LLM来说是资源密集型的。为了解决这个问题，我们提出通过后训练移位加法重新参数化来加速预训练的LLM，创建高效的无乘法模型，称为ShiftAddLLM。具体来说，我们将每个权重矩阵量化为与组别缩放因子配对的二进制矩阵。相关的乘法被重新参数化为（1）激活和缩放因子之间的移位和（2）查询和根据二进制矩阵添加。为了减少准确性损失，我们提出了一种多目标优化方法，以最小化权重和输出激活重新参数化错误。此外，基于各层对重新参数化的敏感度不同，我们制定了一种自动位分配策略，进一步减少内存使用和延迟。在五个LLM家族和八个任务上的实验证实了ShiftAddLLM的有效性，相比于最具竞争力的3位和2位量化LLM，分别在可比或更低的延迟下实现了平均困惑度提高了5.6和22.7个点，原始LLM的内存和能耗减少超过80％。代码和模型可在https://github.com/GATECH-EIC/ShiftAddLLM上找到。

更新时间: 2024-06-10 02:47:55

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2406.05981v1

Combinatorial Optimization with Automated Graph Neural Networks

In recent years, graph neural networks (GNNs) have become increasingly popular for solving NP-hard combinatorial optimization (CO) problems, such as maximum cut and maximum independent set. The core idea behind these methods is to represent a CO problem as a graph and then use GNNs to learn the node/graph embedding with combinatorial information. Although these methods have achieved promising results, given a specific CO problem, the design of GNN architectures still requires heavy manual work with domain knowledge. Existing automated GNNs are mostly focused on traditional graph learning problems, which is inapplicable to solving NP-hard CO problems. To this end, we present a new class of \textbf{AUTO}mated \textbf{G}NNs for solving \textbf{NP}-hard problems, namely \textbf{AutoGNP}. We represent CO problems by GNNs and focus on two specific problems, i.e., mixed integer linear programming and quadratic unconstrained binary optimization. The idea of AutoGNP is to use graph neural architecture search algorithms to automatically find the best GNNs for a given NP-hard combinatorial optimization problem. Compared with existing graph neural architecture search algorithms, AutoGNP utilizes two-hop operators in the architecture search space. Moreover, AutoGNP utilizes simulated annealing and a strict early stopping policy to avoid local optimal solutions. Empirical results on benchmark combinatorial problems demonstrate the superiority of our proposed model.

Updated: 2024-06-10 02:45:41

标题: 自动图神经网络在组合优化中的应用

摘要: 最近几年，图神经网络（GNNs）在解决NP难的组合优化（CO）问题，如最大割和最大独立集等方面变得越来越受欢迎。这些方法背后的核心思想是将一个CO问题表示为一个图，然后使用GNNs学习带有组合信息的节点/图嵌入。尽管这些方法已经取得了令人满意的结果，但对于特定的CO问题，设计GNN架构仍需要大量手工工作和领域知识。现有的自动化GNN主要集中在传统图学习问题上，这对解决NP难的CO问题是不适用的。为此，我们提出了一种新的用于解决NP难问题的自动化GNN类别，即AutoGNP。我们通过GNNs表示CO问题，并专注于两个具体问题，即混合整数线性规划和二次无约束二进制优化。AutoGNP的理念是利用图神经网络架构搜索算法来自动找到最适合给定NP难的组合优化问题的GNNs。与现有的图神经网络架构搜索算法相比，AutoGNP在架构搜索空间中利用了两跳操作符。此外，AutoGNP利用模拟退火和严格的早停策略来避免局部最优解。在基准组合问题上的实证结果证明了我们提出模型的优越性。

更新时间: 2024-06-10 02:45:41

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.02872v2

On the Variance of Neural Network Training with respect to Test Sets and Distributions

Typical neural network trainings have substantial variance in test-set performance between repeated runs, impeding hyperparameter comparison and training reproducibility. In this work we present the following results towards understanding this variation. (1) Despite having significant variance on their test-sets, we demonstrate that standard CIFAR-10 and ImageNet trainings have little variance in performance on the underlying test-distributions from which their test-sets are sampled. (2) We show that these trainings make approximately independent errors on their test-sets. That is, the event that a trained network makes an error on one particular example does not affect its chances of making errors on other examples, relative to their average rates over repeated runs of training with the same hyperparameters. (3) We prove that the variance of neural network trainings on their test-sets is a downstream consequence of the class-calibration property discovered by Jiang et al. (2021). Our analysis yields a simple formula which accurately predicts variance for the binary classification case. (4) We conduct preliminary studies of data augmentation, learning rate, finetuning instability and distribution-shift through the lens of variance between runs.

Updated: 2024-06-10 02:25:33

标题: 关于神经网络训练与测试集和分布之间的方差

摘要: 典型的神经网络训练在重复运行中在测试集性能上存在显著的方差，阻碍了超参数比较和训练的可重复性。在这项工作中，我们提出以下结果，以了解这种变化。(1)尽管它们在测试集上具有显著的方差，但我们证明标准的CIFAR-10和ImageNet训练在其测试集抽样的基础测试分布上性能变化小。(2) 我们展示这些训练在其测试集上几乎独立地产生错误。也就是说，训练网络在特定示例上产生错误的事件并不影响其在其他示例上产生错误的机会，相对于重复运行相同超参数的训练的平均错误率。(3) 我们证明神经网络在其测试集上的方差是姜等人发现的类校准性质的下游结果。我们的分析得出了一个简单的公式，可以准确预测二元分类情况的方差。(4) 我们通过运行之间的方差的视角进行了数据增强、学习率、微调不稳定性和分布转移的初步研究。

更新时间: 2024-06-10 02:25:33

领域: cs.LG

下载: http://arxiv.org/abs/2304.01910v4

Fisher-Rao distance and pullback SPD cone distances between multivariate normal distributions

Data sets of multivariate normal distributions abound in many scientific areas like diffusion tensor imaging, structure tensor computer vision, radar signal processing, machine learning, just to name a few. In order to process those normal data sets for downstream tasks like filtering, classification or clustering, one needs to define proper notions of dissimilarities between normals and paths joining them. The Fisher-Rao distance defined as the Riemannian geodesic distance induced by the Fisher information metric is such a principled metric distance which however is not known in closed-form excepts for a few particular cases. In this work, we first report a fast and robust method to approximate arbitrarily finely the Fisher-Rao distance between multivariate normal distributions. Second, we introduce a class of distances based on diffeomorphic embeddings of the normal manifold into a submanifold of the higher-dimensional symmetric positive-definite cone corresponding to the manifold of centered normal distributions. We show that the projective Hilbert distance on the cone yields a metric on the embedded normal submanifold and we pullback that cone distance with its associated straight line Hilbert cone geodesics to obtain a distance and smooth paths between normal distributions. Compared to the Fisher-Rao distance approximation, the pullback Hilbert cone distance is computationally light since it requires to compute only the extreme minimal and maximal eigenvalues of matrices. Finally, we show how to use those distances in clustering tasks.

Updated: 2024-06-10 02:21:14

标题: 费舍尔-拉奥距离与多元正态分布之间的拉回SPD锥距离

摘要: 多元正态分布的数据集在许多科学领域中随处可见，如扩散张量成像、结构张量计算机视觉、雷达信号处理、机器学习等等。为了处理这些正态数据集以进行诸如过滤、分类或聚类等下游任务，需要定义合适的正态之间的不相似性概念和连接它们的路径。费舍尔-劳距离被定义为由费舍尔信息度量诱导的里曼测地线距离，是一种合理的度量距离，但除了少数特殊情况外，并没有已知的闭合形式。在这项工作中，我们首先报告了一种快速而鲁棒的方法，以任意精细的方式近似多元正态分布之间的费舍尔-劳距离。其次，我们引入了一类基于正态流形的微分嵌入到高维对称正定锥的子流形上的距离。我们展示了在锥上的投影希尔伯特距离在嵌入的正态子流形上产生了度量，并通过拉回该锥距离及其相关的直线希尔伯特锥测地线来获得正态分布之间的距离和平滑路径。与费舍尔-劳距离近似相比，拉回希尔伯特锥距离在计算上更轻，因为只需要计算矩阵的极小和极大特征值。最后，我们展示了如何在聚类任务中使用这些距离。

更新时间: 2024-06-10 02:21:14

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2307.10644v3

Decision-Making Behavior Evaluation Framework for LLMs under Uncertain Context

When making decisions under uncertainty, individuals often deviate from rational behavior, which can be evaluated across three dimensions: risk preference, probability weighting, and loss aversion. Given the widespread use of large language models (LLMs) in decision-making processes, it is crucial to assess whether their behavior aligns with human norms and ethical expectations or exhibits potential biases. Several empirical studies have investigated the rationality and social behavior performance of LLMs, yet their internal decision-making tendencies and capabilities remain inadequately understood. This paper proposes a framework, grounded in behavioral economics, to evaluate the decision-making behaviors of LLMs. Through a multiple-choice-list experiment, we estimate the degree of risk preference, probability weighting, and loss aversion in a context-free setting for three commercial LLMs: ChatGPT-4.0-Turbo, Claude-3-Opus, and Gemini-1.0-pro. Our results reveal that LLMs generally exhibit patterns similar to humans, such as risk aversion and loss aversion, with a tendency to overweight small probabilities. However, there are significant variations in the degree to which these behaviors are expressed across different LLMs. We also explore their behavior when embedded with socio-demographic features, uncovering significant disparities. For instance, when modeled with attributes of sexual minority groups or physical disabilities, Claude-3-Opus displays increased risk aversion, leading to more conservative choices. These findings underscore the need for careful consideration of the ethical implications and potential biases in deploying LLMs in decision-making scenarios. Therefore, this study advocates for developing standards and guidelines to ensure that LLMs operate within ethical boundaries while enhancing their utility in complex decision-making environments.

Updated: 2024-06-10 02:14:19

标题: 不确定环境下LLMs的决策行为评估框架

摘要: 在面对不确定性时做出决策时，个体通常会偏离理性行为，这可以从三个维度进行评估：风险偏好、概率加权和损失规避。鉴于大型语言模型（LLMs）在决策过程中的广泛应用，评估它们的行为是否符合人类的规范和道德期望，或者是否存在潜在偏见至关重要。几项实证研究已经调查了LLMs的理性和社会行为表现，但它们的内部决策倾向和能力仍然不够了解。本文提出了一个基于行为经济学的框架，用于评估LLMs的决策行为。通过一个多项选择列表实验，我们估计了三个商业LLMs（ChatGPT-4.0-Turbo、Claude-3-Opus和Gemini-1.0-pro）在一个无上下文环境中的风险偏好、概率加权和损失规避程度。我们的结果表明，LLMs通常表现出与人类类似的模式，例如风险规避和损失规避，倾向于过度看重小概率。然而，这些行为在不同LLMs之间表现出显著的差异程度。我们还探讨了它们在嵌入社会人口特征时的行为，发现了显著的差异。例如，当模拟具有性少数群体或身体残疾属性时，Claude-3-Opus表现出增加的风险规避，导致更保守的选择。这些发现强调了在决策情境中部署LLMs时需要仔细考虑道德影响和潜在偏见的必要性。因此，本研究主张制定标准和指南，以确保LLMs在道德界限内运作，同时增强它们在复杂决策环境中的实用性。

更新时间: 2024-06-10 02:14:19

领域: cs.AI,cs.CY,cs.HC,cs.LG,econ.TH

下载: http://arxiv.org/abs/2406.05972v1

Compressible Dynamics in Deep Overparameterized Low-Rank Learning & Adaptation

While overparameterization in machine learning models offers great benefits in terms of optimization and generalization, it also leads to increased computational requirements as model sizes grow. In this work, we show that by leveraging the inherent low-dimensional structures of data and compressible dynamics within the model parameters, we can reap the benefits of overparameterization without the computational burdens. In practice, we demonstrate the effectiveness of this approach for deep low-rank matrix completion as well as fine-tuning language models. Our approach is grounded in theoretical findings for deep overparameterized low-rank matrix recovery, where we show that the learning dynamics of each weight matrix are confined to an invariant low-dimensional subspace. Consequently, we can construct and train compact, highly compressed factorizations possessing the same benefits as their overparameterized counterparts. In the context of deep matrix completion, our technique substantially improves training efficiency while retaining the advantages of overparameterization. For language model fine-tuning, we propose a method called "Deep LoRA", which improves the existing low-rank adaptation (LoRA) technique, leading to reduced overfitting and a simplified hyperparameter setup, while maintaining comparable efficiency. We validate the effectiveness of Deep LoRA on natural language tasks, particularly when fine-tuning with limited data. Our code is available at https://github.com/cjyaras/deep-lora-transformers.

Updated: 2024-06-10 02:05:26

标题: 深度过度参数化低秩学习与适应性中的可压缩动力学

摘要: 在机器学习模型中的过参数化提供了优化和泛化方面的巨大好处，但随着模型规模的增长也导致了增加的计算需求。在这项工作中，我们展示了通过利用数据的固有低维结构和模型参数中可压缩的动态，我们可以在不增加计算负担的情况下获得过参数化的好处。在实践中，我们展示了这种方法在深度低秩矩阵完成和微调语言模型方面的有效性。我们的方法基于深度过参数化低秩矩阵恢复的理论发现，我们表明每个权重矩阵的学习动态被限制在一个不变的低维子空间中。因此，我们可以构建和训练紧凑、高度压缩的因子分解，具有与过参数化对应物相同的好处。在深度矩阵完成的情况下，我们的技术显著提高了训练效率，同时保留了过参数化的优势。对于语言模型微调，我们提出了一种称为"Deep LoRA"的方法，改进了现有的低秩适应（LoRA）技术，减少了过拟合并简化了超参数设置，同时保持了可比较的效率。我们验证了Deep LoRA在自然语言任务中的有效性，特别是在有限数据微调时。我们的代码可在https://github.com/cjyaras/deep-lora-transformers找到。

更新时间: 2024-06-10 02:05:26

领域: cs.LG,cs.AI,eess.SP,stat.ML

下载: http://arxiv.org/abs/2406.04112v2

CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark

Visual Question Answering (VQA) is an important task in multimodal AI, and it is often used to test the ability of vision-language models to understand and reason on knowledge present in both visual and textual data. However, most of the current VQA models use datasets that are primarily focused on English and a few major world languages, with images that are typically Western-centric. While recent efforts have tried to increase the number of languages covered on VQA datasets, they still lack diversity in low-resource languages. More importantly, although these datasets often extend their linguistic range via translation or some other approaches, they usually keep images the same, resulting in narrow cultural representation. To address these limitations, we construct CVQA, a new Culturally-diverse multilingual Visual Question Answering benchmark, designed to cover a rich set of languages and cultures, where we engage native speakers and cultural experts in the data collection process. As a result, CVQA includes culturally-driven images and questions from across 28 countries on four continents, covering 26 languages with 11 scripts, providing a total of 9k questions. We then benchmark several Multimodal Large Language Models (MLLMs) on CVQA, and show that the dataset is challenging for the current state-of-the-art models. This benchmark can serve as a probing evaluation suite for assessing the cultural capability and bias of multimodal models and hopefully encourage more research efforts toward increasing cultural awareness and linguistic diversity in this field.

Updated: 2024-06-10 01:59:00

标题: CVQA: 多元文化多语言视觉问答基准

摘要: 视觉问答（VQA）是多模态人工智能中的一个重要任务，通常用于测试视觉-语言模型理解和推理视觉和文本数据中的知识的能力。然而，大多数当前的VQA模型使用的数据集主要集中在英语和少数主要世界语言上，其中的图像通常以西方为中心。尽管最近的努力试图增加VQA数据集中涵盖的语言数量，但仍然缺乏低资源语言的多样性。更重要的是，尽管这些数据集通常通过翻译或其他方法扩展其语言范围，但通常保持图像不变，导致狭隘的文化代表性。为了解决这些限制，我们构建了CVQA，一个新的跨文化多语言视觉问答基准，旨在涵盖丰富的语言和文化，我们在数据收集过程中邀请了本土说话者和文化专家。因此，CVQA包括来自四大洲28个国家的以文化为驱动的图像和问题，涵盖26种语言，其中包括11种文字，提供了总共9k个问题。然后我们在CVQA上对几个多模态大语言模型（MLLMs）进行基准测试，并展示该数据集对当前最先进的模型具有挑战性。这个基准测试可以作为评估多模态模型文化能力和偏见的探索性评估套件，并希望鼓励更多的研究努力，以增加这一领域的文化意识和语言多样性。

更新时间: 2024-06-10 01:59:00

领域: cs.CV,cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2406.05967v1

MakeSinger: A Semi-Supervised Training Method for Data-Efficient Singing Voice Synthesis via Classifier-free Diffusion Guidance

In this paper, we propose MakeSinger, a semi-supervised training method for singing voice synthesis (SVS) via classifier-free diffusion guidance. The challenge in SVS lies in the costly process of gathering aligned sets of text, pitch, and audio data. MakeSinger enables the training of the diffusion-based SVS model from any speech and singing voice data regardless of its labeling, thereby enhancing the quality of generated voices with large amount of unlabeled data. At inference, our novel dual guiding mechanism gives text and pitch guidance on the reverse diffusion step by estimating the score of masked input. Experimental results show that the model trained in a semi-supervised manner outperforms other baselines trained only on the labeled data in terms of pronunciation, pitch accuracy and overall quality. Furthermore, we demonstrate that by adding Text-to-Speech (TTS) data in training, the model can synthesize the singing voices of TTS speakers even without their singing voices.

Updated: 2024-06-10 01:47:52

标题: MakeSinger：一种半监督训练方法，通过无分类器的扩散引导实现数据高效的歌声合成

摘要: 本文提出了一种名为MakeSinger的半监督训练方法，用于通过无分类器扩散引导进行歌声合成（SVS）。SVS中的挑战在于收集文本、音高和音频数据的对齐集合的昂贵过程。MakeSinger使得能够从任何语音和歌声数据对扩散基础的SVS模型进行训练，无论其标记如何，从而提高了生成声音质量，尤其是在大量无标记数据的情况下。在推断时，我们的新颖的双引导机制通过估计遮罩输入的分数，在反向扩散步骤上给出文本和音高引导。实验结果表明，以半监督方式训练的模型在发音、音高准确性和整体质量方面优于其他仅在标记数据上训练的基线。此外，我们证明通过在训练中添加文本到语音（TTS）数据，即使没有他们的歌声，模型也能合成TTS说话者的歌声。

更新时间: 2024-06-10 01:47:52

领域: eess.AS,cs.AI

下载: http://arxiv.org/abs/2406.05965v1

Distributionally Robust Safe Sample Screening

In this study, we propose a machine learning method called Distributionally Robust Safe Sample Screening (DRSSS). DRSSS aims to identify unnecessary training samples, even when the distribution of the training samples changes in the future. To achieve this, we effectively combine the distributionally robust (DR) paradigm, which aims to enhance model robustness against variations in data distribution, with the safe sample screening (SSS), which identifies unnecessary training samples prior to model training. Since we need to consider an infinite number of scenarios regarding changes in the distribution, we applied SSS because it does not require model training after the change of the distribution. In this paper, we employed the covariate shift framework to represent the distribution of training samples and reformulated the DR covariate-shift problem as a weighted empirical risk minimization problem, where the weights are subject to uncertainty within a predetermined range. By extending the existing SSS technique to accommodate this weight uncertainty, the DRSSS method is capable of reliably identifying unnecessary samples under any future distribution within a specified range. We provide a theoretical guarantee for the DRSSS method and validate its performance through numerical experiments on both synthetic and real-world datasets.

Updated: 2024-06-10 01:46:42

标题: 分布鲁棒的安全样本筛选

摘要: 在本研究中，我们提出了一种称为分布鲁棒安全样本筛选（DRSSS）的机器学习方法。DRSSS旨在识别不必要的训练样本，即使训练样本的分布在未来发生变化。为了实现这一目标，我们有效地将分布鲁棒（DR）范式与安全样本筛选（SSS）相结合，SSS用于在模型训练之前识别不必要的训练样本。由于我们需要考虑关于分布变化的无限数量的场景，我们应用了SSS，因为在分布变化后不需要模型训练。在本文中，我们利用协变量转移框架来表示训练样本的分布，并将DR协变量转移问题重新构造为加权经验风险最小化问题，其中权重在预定范围内存在不确定性。通过将现有的SSS技术扩展以适应这种权重不确定性，DRSSS方法能够可靠地识别在指定范围内任何未来分布下不必要的样本。我们为DRSSS方法提供了理论保证，并通过对合成和真实世界数据集的数值实验验证了其性能。

更新时间: 2024-06-10 01:46:42

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2406.05964v1

Solution for SMART-101 Challenge of CVPR Multi-modal Algorithmic Reasoning Task 2024

In this paper, the solution of HYU MLLAB KT Team to the Multimodal Algorithmic Reasoning Task: SMART-101 CVPR 2024 Challenge is presented. Beyond conventional visual question-answering problems, the SMART-101 challenge aims to achieve human-level multimodal understanding by tackling complex visio-linguistic puzzles designed for children in the 6-8 age group. To solve this problem, we suggest two main ideas. First, to utilize the reasoning ability of a large-scale language model (LLM), the given visual cues (images) are grounded in the text modality. For this purpose, we generate highly detailed text captions that describe the context of the image and use these captions as input for the LLM. Second, due to the nature of puzzle images, which often contain various geometric visual patterns, we utilize an object detection algorithm to ensure these patterns are not overlooked in the captioning process. We employed the SAM algorithm, which can detect various-size objects, to capture the visual features of these geometric patterns and used this information as input for the LLM. Under the puzzle split configuration, we achieved an option selection accuracy Oacc of 29.5 on the test set and a weighted option selection accuracy (WOSA) of 27.1 on the challenge set.

Updated: 2024-06-10 01:45:55

标题: CVPR多模态算法推理任务2024年的SMART-101挑战解决方案

摘要: 在本文中，介绍了HYU MLLAB KT团队对多模态算法推理任务的解决方案：SMART-101 CVPR 2024挑战。超越传统的视觉问答问题，SMART-101挑战旨在通过解决设计给6-8岁儿童的复杂视听难题，实现人类级别的多模态理解。为了解决这个问题，我们提出了两个主要想法。首先，利用大规模语言模型（LLM）的推理能力，将给定的视觉线索（图像）扎根于文本模态。为此，我们生成了描述图像上下文的高度详细的文本标题，并将这些标题用作LLM的输入。其次，由于难题图像的特性，通常包含各种几何视觉模式，我们利用物体检测算法来确保这些模式在标题过程中不被忽视。我们采用了SAM算法，可以检测各种大小的对象，捕获这些几何模式的视觉特征，并将这些信息作为LLM的输入。在难题分割配置下，我们在测试集上实现了29.5的选项选择准确率Oacc，挑战集上的加权选项选择准确率（WOSA）为27.1。

更新时间: 2024-06-10 01:45:55

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2406.05963v1

MAGNOLIA: Matching Algorithms via GNNs for Online Value-to-go Approximation

Online Bayesian bipartite matching is a central problem in digital marketplaces and exchanges, including advertising, crowdsourcing, ridesharing, and kidney exchange. We introduce a graph neural network (GNN) approach that emulates the problem's combinatorially-complex optimal online algorithm, which selects actions (e.g., which nodes to match) by computing each action's value-to-go (VTG) -- the expected weight of the final matching if the algorithm takes that action, then acts optimally in the future. We train a GNN to estimate VTG and show empirically that this GNN returns high-weight matchings across a variety of tasks. Moreover, we identify a common family of graph distributions in spatial crowdsourcing applications, such as rideshare, under which VTG can be efficiently approximated by aggregating information within local neighborhoods in the graphs. This structure matches the local behavior of GNNs, providing theoretical justification for our approach.

Updated: 2024-06-10 01:39:04

标题: 梧桐：基于GNN的匹配算法，用于在线价值估计近似

摘要: 在线贝叶斯二分匹配是数字市场和交易中的一个中心问题，包括广告、众包、拼车和肾脏交换。我们介绍了一种图神经网络（GNN）方法，模拟了问题的组合复杂最优在线算法，该算法通过计算每个动作的预期最终匹配权重来选择动作（例如，要匹配哪些节点），然后在未来进行最优操作。我们训练了一个GNN来估计VTG，并从经验上证明，这个GNN在各种任务中返回高权重的匹配。此外，我们在空间众包应用中识别出一类常见的图分布，例如拼车，在这种分布下，可以通过在图中的局部邻域内聚合信息来有效地逼近VTG。这种结构与GNN的局部行为相匹配，为我们的方法提供了理论上的理据。

更新时间: 2024-06-10 01:39:04

领域: cs.LG,cs.DS

下载: http://arxiv.org/abs/2406.05959v1

Online Speculative Decoding

Speculative decoding is a pivotal technique to accelerate the inference of large language models (LLMs) by employing a smaller draft model to predict the target model's outputs. However, its efficacy can be limited due to the low predictive accuracy of the draft model, particularly when faced with diverse text inputs and a significant capability gap between the draft and target models. We introduce online speculative decoding to address this challenge. The main idea is to continuously update the (multiple) draft model(s) on observed user query data. Adapting to query distribution mitigates the shifts between the training distribution of the draft model and the query distribution, enabling the draft model to more accurately predict the target model's outputs. We develop a prototype of online speculative decoding based on knowledge distillation and evaluate it using both synthetic and real query data. The results show a substantial increase in the token acceptance rate by 0.1 to 0.65, bringing 1.42x to 2.17x latency reduction. Our code is available at https://github.com/LiuXiaoxuanPKU/OSD.

Updated: 2024-06-10 01:36:31

标题: 在线推测解码

摘要: 投机解码是一种加速大型语言模型（LLMs）推理的关键技术，通过利用较小的草案模型来预测目标模型的输出。然而，由于草案模型的预测准确性较低，特别是面对多样化的文本输入和草案模型与目标模型之间的显著能力差距时，其有效性可能受到限制。我们引入在线投机解码来解决这一挑战。其主要思想是持续更新（多个）草案模型以适应观察到的用户查询数据。适应查询分布有助于减轻草案模型的训练分布与查询分布之间的偏移，使草案模型能够更准确地预测目标模型的输出。我们基于知识蒸馏开发了在线投机解码的原型，并使用合成和真实查询数据进行评估。结果显示，标记接受率增加了0.1到0.65，带来了1.42倍到2.17倍的延迟减少。我们的代码可在https://github.com/LiuXiaoxuanPKU/OSD上找到。

更新时间: 2024-06-10 01:36:31

领域: cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2310.07177v4

Increasing Trust in Language Models through the Reuse of Verified Circuits

Language Models (LMs) are increasingly used for a wide range of prediction tasks, but their training can often neglect rare edge cases, reducing their reliability. Here, we define a stringent standard of trustworthiness whereby the task algorithm and circuit implementation must be verified, accounting for edge cases, with no known failure modes. We show that a transformer model can be trained to meet this standard if built using mathematically and logically specified frameworks. In this paper, we fully verify a model for n-digit integer addition. To exhibit the reusability of verified modules, we insert the trained integer addition model into an untrained model and train the combined model to perform both addition and subtraction. We find extensive reuse of the addition circuits for both tasks, easing verification of the more complex subtractor model. We discuss how inserting verified task modules into LMs can leverage model reuse to improve verifiability and trustworthiness of language models built using them. The reuse of verified circuits reduces the effort to verify more complex composite models which we believe to be a significant step towards safety of language models.

Updated: 2024-06-10 01:32:09

标题: 通过重复使用经过验证的电路增加对语言模型的信任

摘要: 语言模型（LMs）越来越被广泛用于各种预测任务，但它们的训练往往会忽略罕见的边缘情况，降低了它们的可靠性。在这里，我们定义了一个严格的可信度标准，即任务算法和电路实现必须经过验证，考虑到边缘情况，并且没有已知的故障模式。我们展示了一个transformer模型可以通过使用数学和逻辑指定的框架构建来训练以满足这一标准。在本文中，我们完全验证了一个n位整数加法模型。为了展示验证模块的可重用性，我们将训练好的整数加法模型插入一个未经训练的模型中，并训练组合模型来执行加法和减法。我们发现加法电路在两个任务中都有广泛的重用，减轻了更复杂减法模型的验证工作。我们讨论了如何将经过验证的任务模块插入到LMs中，以利用模型重用来提高使用它们构建的语言模型的可验证性和可信度。经过验证电路的重复使用减少了验证更复杂的复合模型的努力，我们认为这是向语言模型的安全性迈出的重要一步。

更新时间: 2024-06-10 01:32:09

领域: cs.LG,cs.CL

下载: http://arxiv.org/abs/2402.02619v6

Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters

Exploiting activation sparsity is a promising approach to significantly accelerating the inference process of large language models (LLMs) without compromising performance. However, activation sparsity is determined by activation functions, and commonly used ones like SwiGLU and GeGLU exhibit limited sparsity. Simply replacing these functions with ReLU fails to achieve sufficient sparsity. Moreover, inadequate training data can further increase the risk of performance degradation. To address these challenges, we propose a novel dReLU function, which is designed to improve LLM activation sparsity, along with a high-quality training data mixture ratio to facilitate effective sparsification. Additionally, we leverage sparse activation patterns within the Feed-Forward Network (FFN) experts of Mixture-of-Experts (MoE) models to further boost efficiency. By applying our neuron sparsification method to the Mistral and Mixtral models, only 2.5 billion and 4.3 billion parameters are activated per inference iteration, respectively, while achieving even more powerful model performance. Evaluation results demonstrate that this sparsity achieves a 2-5x decoding speedup. Remarkably, on mobile phones, our TurboSparse-Mixtral-47B achieves an inference speed of 11 tokens per second. Our models are available at \url{https://huggingface.co/PowerInfer}

Updated: 2024-06-10 01:21:59

标题: 超级稀疏：以最少激活参数实现LLM SOTA性能

摘要: 利用激活稀疏性是显著加速大型语言模型(LLMs)推断过程的一种有前途的方法，而不会影响性能。然而，激活稀疏性取决于激活函数，常用的如SwiGLU和GeGLU等表现出有限的稀疏性。简单地用ReLU替换这些函数无法达到足够的稀疏性。此外，不足的训练数据还会进一步增加性能下降的风险。为了解决这些挑战，我们提出了一种新颖的dReLU函数，旨在改善LLM激活稀疏性，同时采用高质量的训练数据混合比例来促进有效的稀疏化。此外，我们利用混合专家(MoE)模型中前馈网络(FFN)专家内的稀疏激活模式来进一步提高效率。通过将我们的神经元稀疏化方法应用于Mistral和Mixtral模型，在每次推断迭代中，分别仅激活25亿和43亿个参数，同时实现更强大的模型性能。评估结果表明，这种稀疏性实现了2-5倍的解码加速。值得注意的是，在手机上，我们的TurboSparse-Mixtral-47B实现了每秒11个标记的推断速度。我们的模型可在\url{https://huggingface.co/PowerInfer}上获得。

更新时间: 2024-06-10 01:21:59

领域: cs.LG,cs.CL

下载: http://arxiv.org/abs/2406.05955v1

Aligning Large Language Models with Representation Editing: A Control Perspective

Aligning large language models (LLMs) with human objectives is crucial for real-world applications. However, fine-tuning LLMs for alignment often suffers from unstable training and requires substantial computing resources. Test-time alignment techniques, such as prompting and guided decoding, do not modify the underlying model, and their performance remains dependent on the original model's capabilities. To address these challenges, we propose aligning LLMs through representation editing. The core of our method is to view a pre-trained autoregressive LLM as a discrete-time stochastic dynamical system. To achieve alignment for specific objectives, we introduce external control signals into the state space of this language dynamical system. We train a value function directly on the hidden states according to the Bellman equation, enabling gradient-based optimization to obtain the optimal control signals at test time. Our experiments demonstrate that our method outperforms existing test-time alignment techniques while requiring significantly fewer resources compared to fine-tuning methods.

Updated: 2024-06-10 01:21:31

标题: 使用表示编辑对齐大型语言模型：控制视角

摘要: 将大型语言模型（LLMs）与人类目标对齐对于现实世界应用至关重要。然而，为了对齐而对LLMs进行微调通常会遇到训练不稳定的问题，并且需要大量的计算资源。在测试时间对齐技术，如提示和引导解码，不会修改基础模型，并且它们的性能仍然依赖于原始模型的能力。为了解决这些挑战，我们提出通过表示编辑来对齐LLMs。我们方法的核心是将预先训练的自回归LLM视为一个离散时间随机动力系统。为了实现特定目标的对齐，我们将外部控制信号引入到这种语言动力系统的状态空间中。我们根据贝尔曼方程直接在隐藏状态上训练一个值函数，使得基于梯度的优化可以在测试时间获得最佳的控制信号。我们的实验证明，我们的方法在测试时间对齐技术方面表现优于现有方法，同时与微调方法相比需要更少的资源。

更新时间: 2024-06-10 01:21:31

领域: cs.AI,cs.LG,cs.SY,eess.SY

下载: http://arxiv.org/abs/2406.05954v1

Decoupling regularization from the action space

Regularized reinforcement learning (RL), particularly the entropy-regularized kind, has gained traction in optimal control and inverse RL. While standard unregularized RL methods remain unaffected by changes in the number of actions, we show that it can severely impact their regularized counterparts. This paper demonstrates the importance of decoupling the regularizer from the action space: that is, to maintain a consistent level of regularization regardless of how many actions are involved to avoid over-regularization. Whereas the problem can be avoided by introducing a task-specific temperature parameter, it is often undesirable and cannot solve the problem when action spaces are state-dependent. In the state-dependent action context, different states with varying action spaces are regularized inconsistently. We introduce two solutions: a static temperature selection approach and a dynamic counterpart, universally applicable where this problem arises. Implementing these changes improves performance on the DeepMind control suite in static and dynamic temperature regimes and a biological sequence design task.

Updated: 2024-06-10 01:20:31

标题: 解耦规范化与动作空间

摘要: 常规化强化学习（RL），特别是熵正则化类型，在最优控制和逆向RL中已经获得了一定的关注。尽管标准的非正则化RL方法不受行动数量变化的影响，但我们表明它可能严重影响它们的正则化对应物。本文展示了将正则化器与动作空间解耦的重要性：即保持一致的正则化水平，无论涉及多少动作，以避免过度正则化。虽然通过引入一个特定于任务的温度参数可以避免这个问题，但这通常是不可取的，而且不能解决动作空间依赖于状态的问题。在状态依赖的动作环境中，具有不同动作空间的不同状态被不一致地正则化。我们引入了两种解决方案：静态温度选择方法和动态对应物，普遍适用于出现这个问题的地方。实施这些改变在静态和动态温度制度以及生物序列设计任务中提高了DeepMind控制套件的性能。

更新时间: 2024-06-10 01:20:31

领域: cs.LG

下载: http://arxiv.org/abs/2406.05953v1

Conformal Prediction Sets Improve Human Decision Making

In response to everyday queries, humans explicitly signal uncertainty and offer alternative answers when they are unsure. Machine learning models that output calibrated prediction sets through conformal prediction mimic this human behaviour; larger sets signal greater uncertainty while providing alternatives. In this work, we study the usefulness of conformal prediction sets as an aid for human decision making by conducting a pre-registered randomized controlled trial with conformal prediction sets provided to human subjects. With statistical significance, we find that when humans are given conformal prediction sets their accuracy on tasks improves compared to fixed-size prediction sets with the same coverage guarantee. The results show that quantifying model uncertainty with conformal prediction is helpful for human-in-the-loop decision making and human-AI teams.

Updated: 2024-06-10 01:12:10

标题: Conformal Prediction Sets Improve Human Decision Making （注：Conformal Prediction Sets 是一种用于机器学习中进行置信区间估计的方法）

摘要: 针对日常查询，人类在不确定时会明确表示不确定性，并在不确定时提供替代答案。通过遵循预测，输出校准预测集的机器学习模型模仿了这种人类行为；更大的集合表明更大的不确定性，同时提供替代选项。在这项工作中，我们通过进行预先注册的随机对照试验，向人类实验对象提供符合预测集，研究了符合预测集作为辅助人类决策的有效性。通过统计显著性，我们发现，当人类被提供符合预测集时，他们在任务上的准确性比具有相同覆盖保证的固定大小预测集有所提高。结果表明，用符合预测量化模型不确定性有助于人机协作决策和人工智能团队。

更新时间: 2024-06-10 01:12:10

领域: cs.LG,cs.HC,stat.ML

下载: http://arxiv.org/abs/2401.13744v3

Adapting Open-Source Large Language Models for Cost-Effective, Expert-Level Clinical Note Generation with On-Policy Reinforcement Learning

Proprietary Large Language Models (LLMs) such as GPT-4 and Gemini have demonstrated promising capabilities in clinical text summarization tasks. However, due to patient data privacy concerns and computational costs, many healthcare providers prefer using small, locally-hosted models over external generic LLMs. This study presents a comprehensive domain- and task-specific adaptation process for the open-source LLaMA-2 13 billion parameter model, enabling it to generate high-quality clinical notes from outpatient patient-doctor dialogues. Our process incorporates continued pre-training, supervised fine-tuning, and reinforcement learning from both AI and human feedback. We introduced a new approach, DistillDirect, for performing on-policy reinforcement learning with Gemini 1.0 Pro as the teacher model. Our resulting model, LLaMA-Clinic, can generate clinical notes comparable in quality to those authored by physicians. In a blinded physician reader study, the majority (90.4%) of individual evaluations rated the notes generated by LLaMA-Clinic as "acceptable" or higher across all three criteria: real-world readiness, completeness, and accuracy. In the more challenging "Assessment and Plan" section, LLaMA-Clinic scored higher (4.2/5) in real-world readiness than physician-authored notes (4.1/5). Our cost analysis for inference shows that our LLaMA-Clinic model achieves a 3.75-fold cost reduction compared to an external generic LLM service. Additionally, we highlight key considerations for future clinical note-generation tasks, emphasizing the importance of pre-defining a best-practice note format, rather than relying on LLMs to determine this for clinical practice. We have made our newly created synthetic clinic dialogue-note dataset and the physician feedback dataset publicly available to foster future research.

Updated: 2024-06-10 01:09:03

标题: 使用基于政策的强化学习，将开源大型语言模型转化为成本效益高、专家水平的临床笔记生成工具

摘要: 专有的大型语言模型（LLM）如GPT-4和Gemini在临床文本摘要任务中展示了有希望的能力。然而，由于患者数据隐私和计算成本的考虑，许多医疗保健提供者更喜欢使用小型、本地托管的模型而不是外部通用的LLM。本研究提出了一个全面的领域和任务特定的适应过程，用于开源的LLaMA-2 13亿参数模型，使其能够从门诊患者医生对话中生成高质量的临床笔记。我们的过程包括持续的预训练、监督微调以及来自AI和人类反馈的强化学习。我们引入了一种新方法，DistillDirect，用于使用Gemini 1.0 Pro作为教师模型进行政策驱动的强化学习。我们的结果模型，LLaMA-Clinic，可以生成与医生撰写的临床笔记相当质量的笔记。在一个被盲目的医生阅读研究中，大多数（90.4%）个体评估将LLaMA-Clinic生成的笔记评为“可接受”或更高，涵盖真实世界准备、完整性和准确性这三个标准。在更具挑战性的“评估和计划”部分，LLaMA-Clinic在真实世界准备方面得分更高（4.2/5）比医生撰写的笔记（4.1/5）高。我们的推断成本分析显示，与外部通用LLM服务相比，我们的LLaMA-Clinic模型实现了3.75倍的成本降低。此外，我们强调未来临床笔记生成任务的关键考虑因素，强调预先定义最佳实践笔记格式的重要性，而不是依赖LLM来确定临床实践的格式。我们已经公开提供了我们新创建的合成诊所对话-笔记数据集和医生反馈数据集，以促进未来研究。

更新时间: 2024-06-10 01:09:03

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2405.00715v4

Set Features for Anomaly Detection

This paper proposes to use set features for detecting anomalies in samples that consist of unusual combinations of normal elements. Many leading methods discover anomalies by detecting an unusual part of a sample. For example, state-of-the-art segmentation-based approaches, first classify each element of the sample (e.g., image patch) as normal or anomalous and then classify the entire sample as anomalous if it contains anomalous elements. However, such approaches do not extend well to scenarios where the anomalies are expressed by an unusual combination of normal elements. In this paper, we overcome this limitation by proposing set features that model each sample by the distribution of its elements. We compute the anomaly score of each sample using a simple density estimation method, using fixed features. Our approach outperforms the previous state-of-the-art in image-level logical anomaly detection and sequence-level time series anomaly detection.

Updated: 2024-06-10 01:06:49

标题: 异常检测的集合特征

摘要: 本文提出使用集合特征来检测由正常元素的非常规组合构成的样本中的异常。许多主要方法通过检测样本的异常部分来发现异常。例如，基于分割的最先进方法首先将样本的每个元素（例如，图像块）分类为正常或异常，然后如果样本包含异常元素，则将整个样本分类为异常。然而，这种方法在异常由正常元素的非常规组合表达的场景中不适用。在本文中，我们通过提出集合特征来克服这一限制，通过模拟每个样本的元素分布来对每个样本计算异常分数。我们使用固定特征的简单密度估计方法。我们的方法在图像级逻辑异常检测和序列级时间序列异常检测方面优于先前的最先进方法。

更新时间: 2024-06-10 01:06:49

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2311.14773v3

Decision-focused Graph Neural Networks for Combinatorial Optimization

In recent years, there has been notable interest in investigating combinatorial optimization (CO) problems by neural-based framework. An emerging strategy to tackle these challenging problems involves the adoption of graph neural networks (GNNs) as an alternative to traditional algorithms, a subject that has attracted considerable attention. Despite the growing popularity of GNNs and traditional algorithm solvers in the realm of CO, there is limited research on their integrated use and the correlation between them within an end-to-end framework. The primary focus of our work is to formulate a more efficient and precise framework for CO by employing decision-focused learning on graphs. Additionally, we introduce a decision-focused framework that utilizes GNNs to address CO problems with auxiliary support. To realize an end-to-end approach, we have designed two cascaded modules: (a) an unsupervised trained graph predictive model, and (b) a solver for quadratic binary unconstrained optimization. Empirical evaluations are conducted on various classical tasks, including maximum cut, maximum independent set, and minimum vertex cover. The experimental results on classical CO problems (i.e. MaxCut, MIS, and MVC) demonstrate the superiority of our method over both the standalone GNN approach and classical methods.

Updated: 2024-06-10 00:53:40

标题: 面向决策的图神经网络在组合优化中的应用

摘要: 近年来，人们对利用基于神经网络的框架来研究组合优化（CO）问题表现出了明显的兴趣。一种新兴的应对这些具有挑战性问题的策略是采用图神经网络（GNNs）作为传统算法的替代方案，这一主题引起了相当大的关注。尽管在CO领域中，GNNs和传统算法求解器的普及程度不断增加，但对它们的整合使用以及在端到端框架内的相关性的研究仍然有限。我们工作的主要重点是通过在图上采用以决策为中心的学习，制定一个更高效和精确的CO框架。此外，我们引入了一个利用GNNs解决具有辅助支持的CO问题的决策集中框架。为了实现端到端方法，我们设计了两个级联模块：（a）一个无监督训练的图形预测模型；（b）一个解决二次二进制无约束优化问题的求解器。我们对各种经典任务进行了实证评估，包括最大割、最大独立集和最小顶点覆盖。对经典CO问题（即MaxCut、MIS和MVC）的实验结果表明，我们的方法优于独立的GNN方法和传统方法。

更新时间: 2024-06-10 00:53:40

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.03647v2

Chain-of-Scrutiny: Detecting Backdoor Attacks for Large Language Models

Backdoor attacks present significant threats to Large Language Models (LLMs), particularly with the rise of third-party services that offer API integration and prompt engineering. Untrustworthy third parties can plant backdoors into LLMs and pose risks to users by embedding malicious instructions into user queries. The backdoor-compromised LLM will generate malicious output when and input is embedded with a specific trigger predetermined by an attacker. Traditional defense strategies, which primarily involve model parameter fine-tuning and gradient calculation, are inadequate for LLMs due to their extensive computational and clean data requirements. In this paper, we propose a novel solution, Chain-of-Scrutiny (CoS), to address these challenges. Backdoor attacks fundamentally create a shortcut from the trigger to the target output, thus lack reasoning support. Accordingly, CoS guides the LLMs to generate detailed reasoning steps for the input, then scrutinizes the reasoning process to ensure consistency with the final answer. Any inconsistency may indicate an attack. CoS only requires black-box access to LLM, offering a practical defense, particularly for API-accessible LLMs. It is user-friendly, enabling users to conduct the defense themselves. Driven by natural language, the entire defense process is transparent to users. We validate the effectiveness of CoS through extensive experiments across various tasks and LLMs. Additionally, experiments results shows CoS proves more beneficial for more powerful LLMs.

Updated: 2024-06-10 00:53:25

标题: “Chain-of-Scrutiny: 检测大型语言模型的后门攻击”

摘要: 背景攻击对大型语言模型（LLMs）构成重大威胁，尤其随着第三方服务的兴起，提供API集成和即时工程。不可信任的第三方可能向LLMs植入后门，并通过将恶意指令嵌入用户查询中对用户构成风险。当输入中嵌入攻击者预先确定的特定触发器时，被后门破坏的LLM将生成恶意输出。传统的防御策略，主要涉及模型参数微调和梯度计算，对LLMs来说由于其对计算和干净数据的广泛需求而不足。在本文中，我们提出了一种新颖的解决方案，Chain-of-Scrutiny（CoS），以解决这些挑战。背景攻击从根本上创造了从触发器到目标输出的快捷方式，因此缺乏推理支持。因此，CoS指导LLMs生成输入的详细推理步骤，然后审查推理过程以确保与最终答案的一致性。任何不一致都可能表明存在攻击。CoS仅需要对LLM的黑盒访问，提供了一种实用的防御方法，特别适用于API可访问的LLMs。它用户友好，使用户能够自行进行防御。由自然语言驱动，整个防御过程对用户透明。我们通过在各种任务和LLMs上进行广泛实验来验证CoS的有效性。此外，实验结果表明CoS对于更强大的LLMs更有益。

更新时间: 2024-06-10 00:53:25

领域: cs.CR,cs.AI

下载: http://arxiv.org/abs/2406.05948v1

Network-Based Transfer Learning Helps Improve Short-Term Crime Prediction Accuracy

Deep learning architectures enhanced with human mobility data have been shown to improve the accuracy of short-term crime prediction models trained with historical crime data. However, human mobility data may be scarce in some regions, negatively impacting the correct training of these models. To address this issue, we propose a novel transfer learning framework for short-term crime prediction models, whereby weights from the deep learning crime prediction models trained in source regions with plenty of mobility data are transferred to target regions to fine-tune their local crime prediction models and improve crime prediction accuracy. Our results show that the proposed transfer learning framework improves the F1 scores for target cities with mobility data scarcity, especially when the number of months of available mobility data is small. We also show that the F1 score improvements are pervasive across different types of crimes and diverse cities in the US.

Updated: 2024-06-10 00:51:20

标题: 基于网络的迁移学习有助于提高短期犯罪预测准确性

摘要: 利用人类移动数据增强的深度学习架构已被证明可以提高使用历史犯罪数据训练的短期犯罪预测模型的准确性。然而，在一些地区，人类移动数据可能稀缺，这会对模型的正确训练产生负面影响。为了解决这个问题，我们提出了一种新颖的迁移学习框架，用于短期犯罪预测模型，通过将在具有充足移动数据的源地区训练的深度学习犯罪预测模型的权重转移到目标地区，以微调其本地犯罪预测模型并提高犯罪预测准确性。我们的结果表明，所提出的迁移学习框架提高了对移动数据稀缺的目标城市的F1分数，特别是当可用移动数据的月数较少时。我们还展示了F1分数的提升在美国不同类型犯罪和不同城市中普遍存在。

更新时间: 2024-06-10 00:51:20

领域: cs.LG,cs.CY

下载: http://arxiv.org/abs/2406.06645v1

Safety Alignment Should Be Made More Than Just a Few Tokens Deep

The safety alignment of current Large Language Models (LLMs) is vulnerable. Relatively simple attacks, or even benign fine-tuning, can jailbreak aligned models. We argue that many of these vulnerabilities are related to a shared underlying issue: safety alignment can take shortcuts, wherein the alignment adapts a model's generative distribution primarily over only its very first few output tokens. We refer to this issue as shallow safety alignment. In this paper, we present case studies to explain why shallow safety alignment can exist and provide evidence that current aligned LLMs are subject to this issue. We also show how these findings help explain multiple recently discovered vulnerabilities in LLMs, including the susceptibility to adversarial suffix attacks, prefilling attacks, decoding parameter attacks, and fine-tuning attacks. Importantly, we discuss how this consolidated notion of shallow safety alignment sheds light on promising research directions for mitigating these vulnerabilities. For instance, we show that deepening the safety alignment beyond just the first few tokens can often meaningfully improve robustness against some common exploits. Finally, we design a regularized finetuning objective that makes the safety alignment more persistent against fine-tuning attacks by constraining updates on initial tokens. Overall, we advocate that future safety alignment should be made more than just a few tokens deep.

Updated: 2024-06-10 00:35:23

标题: 安全对齐不应该只停留在表面

摘要: 当前大型语言模型（LLMs）的安全对准存在漏洞。相对简单的攻击，甚至是良性微调，都可以越狱对准的模型。我们认为许多这些漏洞与一个共同的潜在问题有关：安全对准可能会采取捷径，其中对准主要调整模型的生成分布仅在其最初的几个输出标记上。我们将这个问题称为浅层安全对准。在本文中，我们提供案例研究来解释为什么浅层安全对准会存在，并提供证据表明当前对齐的LLMs受到这个问题的影响。我们还展示了这些发现如何帮助解释LLMs中多个最近发现的漏洞，包括对敌对后缀攻击、预填充攻击、解码参数攻击和微调攻击的易感性。重要的是，我们讨论了这个浅层安全对准的整体概念如何为缓解这些漏洞提供了有希望的研究方向。例如，我们表明，将对准的深度延伸到仅仅前几个标记之外通常可以显著提高对一些常见利用的鲁棒性。最后，我们设计了一个正则化微调目标，通过限制对初始标记的更新，使安全对准更加持久，抵御微调攻击。总的来说，我们主张未来的安全对准应该不仅仅局限于几个标记。

更新时间: 2024-06-10 00:35:23

领域: cs.CR,cs.AI

下载: http://arxiv.org/abs/2406.05946v1

uSF: Learning Neural Semantic Field with Uncertainty

Recently, there has been an increased interest in NeRF methods which reconstruct differentiable representation of three-dimensional scenes. One of the main limitations of such methods is their inability to assess the confidence of the model in its predictions. In this paper, we propose a new neural network model for the formation of extended vector representations, called uSF, which allows the model to predict not only color and semantic label of each point, but also estimate the corresponding values of uncertainty. We show that with a small number of images available for training, a model quantifying uncertainty performs better than a model without such functionality. Code of the uSF approach is publicly available at https://github.com/sevashasla/usf/.

Updated: 2024-06-10 00:22:46

标题: uSF：学习具有不确定性的神经语义场

摘要: 最近，对NeRF方法的兴趣日益增加，这些方法可以重建三维场景的可微表示。这类方法的主要局限性之一是它们无法评估模型对其预测的置信度。在本文中，我们提出了一种新的神经网络模型，用于形成扩展向量表示，称为uSF，该模型允许预测每个点的颜色和语义标签，还可以估计相应的不确定性值。我们表明，即使训练可用的图像数量很少，一个能够量化不确定性的模型表现比没有这种功能的模型更好。uSF方法的代码可以在https://github.com/sevashasla/usf/上公开获取。

更新时间: 2024-06-10 00:22:46

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2312.08012v2

Triplet Interaction Improves Graph Transformers: Accurate Molecular Graph Learning with Triplet Graph Transformers

Graph transformers typically lack third-order interactions, limiting their geometric understanding which is crucial for tasks like molecular geometry prediction. We propose the Triplet Graph Transformer (TGT) that enables direct communication between pairs within a 3-tuple of nodes via novel triplet attention and aggregation mechanisms. TGT is applied to molecular property prediction by first predicting interatomic distances from 2D graphs and then using these distances for downstream tasks. A novel three-stage training procedure and stochastic inference further improve training efficiency and model performance. Our model achieves new state-of-the-art (SOTA) results on open challenge benchmarks PCQM4Mv2 and OC20 IS2RE. We also obtain SOTA results on QM9, MOLPCBA, and LIT-PCBA molecular property prediction benchmarks via transfer learning. We also demonstrate the generality of TGT with SOTA results on the traveling salesman problem (TSP).

Updated: 2024-06-10 00:22:17

标题: 三元交互改进图变换器：三元图变换器实现准确的分子图学习

摘要: 图转换器通常缺乏三阶交互作用，限制了它们对几何理解的能力，而这对于分子几何预测等任务至关重要。我们提出了三元图转换器（TGT），通过新颖的三元关注和聚合机制，实现了三元节点组内对之间的直接通信。TGT首先从2D图中预测原子间距离，然后将这些距离用于下游任务的分子性质预测。一种新颖的三阶段训练过程和随机推理进一步提高了训练效率和模型性能。我们的模型在开放挑战基准PCQM4Mv2和OC20 IS2RE上实现了最新的最佳结果。我们还通过迁移学习在QM9、MOLPCBA和LIT-PCBA分子性质预测基准上获得了最新的最佳结果。我们还通过旅行推销员问题（TSP）的最新最佳结果展示了TGT的普适性。

更新时间: 2024-06-10 00:22:17

领域: cs.LG

下载: http://arxiv.org/abs/2402.04538v2

SETC: A Vulnerability Telemetry Collection Framework

As emerging software vulnerabilities continuously threaten enterprises and Internet services, there is a critical need for improved security research capabilities. This paper introduces the Security Exploit Telemetry Collection (SETC) framework - an automated framework to generate reproducible vulnerability exploit data at scale for robust defensive security research. SETC deploys configurable environments to execute and record rich telemetry of vulnerability exploits within isolated containers. Exploits, vulnerable services, monitoring tools, and logging pipelines are defined via modular JSON configurations and deployed on demand. Compared to current manual processes, SETC enables automated, customizable, and repeatable vulnerability testing to produce diverse security telemetry. This research enables scalable exploit data generation to drive innovations in threat modeling, detection methods, analysis techniques, and remediation strategies. The capabilities of the framework are demonstrated through an example scenario. By addressing key barriers in security data generation, SETC represents a valuable platform to support impactful vulnerability and defensive security research.

Updated: 2024-06-10 00:13:35

标题: SETC：一种漏洞遥测收集框架

摘要: 随着新兴软件漏洞不断威胁企业和互联网服务，急需改进安全研究能力。本文介绍了安全漏洞遥测收集（SETC）框架 - 一种自动化框架，用于大规模生成可复现的漏洞利用数据，以支持强大的防御安全研究。SETC部署可配置环境，以在隔离容器内执行和记录漏洞利用的丰富遥测。利用模块化JSON配置定义漏洞利用、易受攻击的服务、监控工具和记录管道，并根据需求部署。与当前的手动流程相比，SETC实现了自动化、可定制化和可重复的漏洞测试，以生成多样化的安全遥测。该研究实现了可扩展的漏洞利用数据生成，推动威胁建模、检测方法、分析技术和应对策略的创新。该框架的功能通过一个示例场景进行了演示。通过解决安全数据生成中的关键障碍，SETC代表了一个有价值的平台，支持有影响力的漏洞和防御安全研究。

更新时间: 2024-06-10 00:13:35

领域: cs.CR,D.4.6

下载: http://arxiv.org/abs/2406.05942v1

Jailbreaking Quantum Computers

This work presented the first thorough exploration of the attacks on the interface between gate-level and pulse-level quantum circuits and pulse-level quantum circuits themselves. Typically, quantum circuits and programs that execute on quantum computers, are defined using gate-level primitives. However, to improve the expressivity of quantum circuits and to allow better optimization, pulse-level circuits are now often used. The attacks presented in this work leverage the inconsistency between the gate-level description of the custom gate, and the actual, low-level pulse implementation of this gate. By manipulating the custom gate specification, this work proposes numerous attacks: qubit plunder, qubit block, qubit reorder, timing mismatch, frequency mismatch, phase mismatch, and waveform mismatch. This work demonstrates these attacks on the real quantum computer and simulator, and shows that most current software development kits are vulnerable to these new types of attacks. In the end, this work proposes a defense framework. The exploration of security and privacy issues of the rising pulse-level quantum circuits provides insight into the future development of secure quantum software development kits and quantum computer systems.

Updated: 2024-06-10 00:11:05

标题: 越狱量子计算机

摘要: 这项工作首次对针对门级和脉冲级量子电路之间的接口以及脉冲级量子电路本身的攻击进行了彻底的探讨。通常，在量子计算机上执行的量子电路和程序是使用门级基元定义的。然而，为了提高量子电路的表达能力并实现更好的优化，现在经常使用脉冲级电路。本文提出的攻击利用了自定义门的门级描述与该门的实际低级脉冲实现之间的不一致性。通过操纵自定义门规范，本文提出了许多攻击：量子掠夺、量子阻塞、量子重新排序、时间不匹配、频率不匹配、相位不匹配和波形不匹配。这项工作在真实量子计算机和模拟器上展示了这些攻击，并表明大多数当前的软件开发工具包都容易受到这些新类型的攻击。最后，这项工作提出了一个防御框架。对新兴的脉冲级量子电路的安全和隐私问题的探讨为未来安全的量子软件开发工具包和量子计算机系统的发展提供了洞察。

更新时间: 2024-06-10 00:11:05

领域: cs.CR,quant-ph

下载: http://arxiv.org/abs/2406.05941v1

Liouville Flow Importance Sampler

We present the Liouville Flow Importance Sampler (LFIS), an innovative flow-based model for generating samples from unnormalized density functions. LFIS learns a time-dependent velocity field that deterministically transports samples from a simple initial distribution to a complex target distribution, guided by a prescribed path of annealed distributions. The training of LFIS utilizes a unique method that enforces the structure of a derived partial differential equation to neural networks modeling velocity fields. By considering the neural velocity field as an importance sampler, sample weights can be computed through accumulating errors along the sample trajectories driven by neural velocity fields, ensuring unbiased and consistent estimation of statistical quantities. We demonstrate the effectiveness of LFIS through its application to a range of benchmark problems, on many of which LFIS achieved state-of-the-art performance.

Updated: 2024-06-10 00:08:07

标题: 勒维尔流重要性采样器

摘要: 我们提出了Liouville流重要采样器（LFIS），这是一种创新的基于流的模型，用于从非标准化密度函数中生成样本。LFIS学习一个时间相关的速度场，确定地将样本从简单的初始分布传输到复杂的目标分布，由一个预设的逐渐退火分布路径引导。LFIS的训练利用一种独特的方法，强制将一个导出的偏微分方程的结构施加到模拟速度场的神经网络上。通过将神经速度场视为重要采样器，可以通过在由神经速度场驱动的样本轨迹上累积误差来计算样本权重，从而保证对统计量的无偏和一致估计。我们通过将LFIS应用于一系列基准问题来展示其有效性，在其中许多问题上，LFIS达到了最先进的性能水平。

更新时间: 2024-06-10 00:08:07

领域: stat.ML,cs.LG,math.PR,physics.data-an,stat.CO

下载: http://arxiv.org/abs/2405.06672v2