Arxiv Day: Article

Knowledge Return Oriented Prompting (KROP)

Many Large Language Models (LLMs) and LLM-powered apps deployed today use some form of prompt filter or alignment to protect their integrity. However, these measures aren't foolproof. This paper introduces KROP, a prompt injection technique capable of obfuscating prompt injection attacks, rendering them virtually undetectable to most of these security measures.

Updated: 2024-06-11 23:58:37

Categories: cs.CR,cs.LG

Download: http://arxiv.org/abs/2406.11880v1

Unifying Interpretability and Explainability for Alzheimer's Disease Progression Prediction

Reinforcement learning (RL) has recently shown promise in predicting Alzheimer's disease (AD) progression due to its unique ability to model domain knowledge. However, it is not clear which RL algorithms are well-suited for this task. Furthermore, these methods are not inherently explainable, limiting their applicability in real-world clinical scenarios. Our work addresses these two important questions. Using a causal, interpretable model of AD, we first compare the performance of four contemporary RL algorithms in predicting brain cognition over 10 years using only baseline (year 0) data. We then apply SHAP (SHapley Additive exPlanations) to explain the decisions made by each algorithm in the model. Our approach combines interpretability with explainability to provide insights into the key factors influencing AD progression, offering both global and individual, patient-level analysis. Our findings show that only one of the RL methods is able to satisfactorily model disease progression, but the post-hoc explanations indicate that all methods fail to properly capture the importance of amyloid accumulation, one of the pathological hallmarks of Alzheimer's disease. Our work aims to merge predictive accuracy with transparency, assisting clinicians and researchers in enhancing disease progression modeling for informed healthcare decisions. Code is available at https://github.com/rfali/xrlad.

Updated: 2024-06-11 23:54:42

Categories: cs.LG

Download: http://arxiv.org/abs/2406.07777v1

Graph Laplacian Learning with Exponential Family Noise

Graph signal processing (GSP) is a prominent framework for analyzing signals on non-Euclidean domains. The graph Fourier transform (GFT) uses the combinatorial graph Laplacian matrix to reveal the spectral decomposition of signals in the graph frequency domain. However, a common challenge in applying GSP methods is that in many scenarios the underlying graph of a system is unknown. A solution in such cases is to construct the unobserved graph from available data, which is commonly referred to as graph or network inference. Although different graph inference methods exist, these are restricted to learning from either smooth graph signals or simple additive Gaussian noise. Other types of noisy data, such as discrete counts or binary digits, are rather common in real-world applications, yet are underexplored in graph inference. In this paper, we propose a versatile graph inference framework for learning from graph signals corrupted by exponential family noise. Our framework generalizes previous methods from continuous smooth graph signals to various data types. We propose an alternating algorithm that jointly estimates the graph Laplacian and the unobserved smooth representation from the noisy signals. We also extend our approach to a variational form to account for the inherent stochasticity of the latent smooth representation. Finally, since real-world graph signals are frequently non-independent and temporally correlated, we further adapt our original setting to a time-vertex formulation. We demonstrate on synthetic and real-world data that our new algorithms outperform competing Laplacian estimation methods that suffer from noise model mismatch.

Updated: 2024-06-11 23:52:10

Categories: cs.LG,eess.SP

Download: http://arxiv.org/abs/2306.08201v2
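
The abstract above leans on two standard objects: the combinatorial graph Laplacian and the notion of a smooth graph signal. The following is a minimal sketch of both, not the paper's inference algorithm. The 4-node path graph and the two test signals are hypothetical, and smoothness is measured with the usual quadratic form x^T L x, which is an assumption about the smoothness criterion rather than something the abstract states.

```python
def combinatorial_laplacian(weights):
    """Return L = D - W for a symmetric weighted adjacency matrix W."""
    n = len(weights)
    lap = [[0.0] * n for _ in range(n)]
    for i in range(n):
        degree = sum(weights[i])
        for j in range(n):
            lap[i][j] = (degree if i == j else 0.0) - weights[i][j]
    return lap


def smoothness(lap, signal):
    """Laplacian quadratic form x^T L x; smaller values mean a smoother signal."""
    n = len(signal)
    return sum(signal[i] * lap[i][j] * signal[j] for i in range(n) for j in range(n))


# Hypothetical 4-node path graph 0-1-2-3 with unit edge weights.
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
L = combinatorial_laplacian(W)

smooth_signal = [1.0, 1.0, 1.0, 1.0]   # constant across the graph
rough_signal = [1.0, -1.0, 1.0, -1.0]  # flips sign on every edge

print(smoothness(L, smooth_signal))
print(smoothness(L, rough_signal))
```

The quadratic form equals the sum of squared differences across edges, so the constant signal scores zero and the alternating signal scores highest on this graph.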

Self-attention-based non-linear basis transformations for compact latent space modelling of dynamic optical fibre transmission matrices

Multimode optical fibres are hair-thin strands of glass that efficiently transport light. They promise next-generation medical endoscopes that provide unprecedented sub-cellular image resolution deep inside the body. However, confining light to such fibres means that images are inherently scrambled in transit. Conventionally, this scrambling has been compensated by pre-calibrating how a specific fibre scrambles light and solving a stationary linear matrix equation that represents a physical model of the fibre. However, as the technology develops towards real-world deployment, the unscrambling process must account for dynamic changes in the matrix representing the fibre's effect on light, due to factors such as movement and temperature shifts, and non-linearities resulting from the inaccessibility of the fibre tip when inside the body. Such complex, dynamic and nonlinear behaviour is well-suited to approximation by neural networks, but most leading image reconstruction networks rely on convolutional layers, which assume strong correlations between adjacent pixels, a strong inductive bias that is inappropriate for fibre matrices which may be expressed in a range of arbitrary coordinate representations with long-range correlations. We introduce a new concept that uses self-attention layers to dynamically transform the coordinate representations of varying fibre matrices to a basis that admits compact, low-dimensional representations suitable for further processing. We demonstrate the effectiveness of this approach on diverse fibre matrix datasets. We show our models significantly improve the sparsity of fibre bases in their transformed bases with a participation ratio, p, as a measure of sparsity, of between 0.01 and 0.11. Further, we show that these transformed representations admit reconstruction of the original matrices with < 10% reconstruction error, demonstrating the invertibility.

Updated: 2024-06-11 23:51:06

Categories: cs.LG

Download: http://arxiv.org/abs/2406.07775v1

Operator Splitting for Learning to Predict Equilibria in Convex Games

Systems of competing agents can often be modeled as games. Assuming rationality, the most likely outcomes are given by an equilibrium (e.g. a Nash equilibrium). In many practical settings, games are influenced by context, i.e. additional data beyond the control of any agent (e.g. weather for traffic and fiscal policy for market economies). Often the exact game mechanics are unknown, yet vast amounts of historical data consisting of (context, equilibrium) pairs are available, raising the possibility of learning a solver which predicts the equilibria given only the context. We introduce Nash Fixed Point Networks (N-FPNs), a class of neural networks that naturally output equilibria. Crucially, N-FPNs employ a constraint decoupling scheme to handle complicated agent action sets while avoiding expensive projections. Empirically, we find N-FPNs are compatible with the recently developed Jacobian-Free Backpropagation technique for training implicit networks, making them significantly faster and easier to train than prior models. Our experiments show N-FPNs are capable of scaling to problems orders of magnitude larger than existing learned game solvers.

Updated: 2024-06-11 23:32:53

Categories: cs.LG,cs.GT,math.OC

Download: http://arxiv.org/abs/2106.00906v4
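
To make the idea of an equilibrium as a fixed point concrete, here is a minimal sketch of a context-dependent two-player game solved by projected-gradient fixed-point iteration. It is not an N-FPN: the game, its cost gradients, the step size, and the box action set [0, 1] are all hypothetical, and the sketch uses exactly the kind of explicit projection that the abstract says N-FPNs are designed to avoid.

```python
def predict_equilibrium(context, step=0.5, iterations=200):
    """Iterate x <- P(x - step * F(x; context)) for a toy two-player game.

    Hypothetical game: player i picks x_i in [0, 1], and the gradient of
    player i's cost with respect to x_i is x_i - context + 0.5 * x_other.
    """
    x1, x2 = 0.0, 0.0
    for _ in range(iterations):
        # Evaluate both players' gradients at the current joint action.
        g1 = x1 - context + 0.5 * x2
        g2 = x2 - context + 0.5 * x1
        # Gradient step followed by projection onto the box [0, 1].
        x1 = min(1.0, max(0.0, x1 - step * g1))
        x2 = min(1.0, max(0.0, x2 - step * g2))
    return x1, x2


interior = predict_equilibrium(context=1.0)  # equilibrium inside the box
boundary = predict_equilibrium(context=3.0)  # equilibrium on the boundary
print(interior, boundary)
```

With context 1.0 the iteration settles near (2/3, 2/3), the point where both gradients vanish. With context 3.0 the constraints are active and both players end at the upper bound.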

DualBind: A Dual-Loss Framework for Protein-Ligand Binding Affinity Prediction

Accurate prediction of protein-ligand binding affinities is crucial for drug development. Recent advances in machine learning show promising results on this task. However, these methods typically rely heavily on labeled data, which can be scarce or unreliable, or they rely on assumptions like Boltzmann-distributed data that may not hold true in practice. Here, we present DualBind, a novel framework that integrates supervised mean squared error (MSE) with unsupervised denoising score matching (DSM) to accurately learn the binding energy function. DualBind not only addresses the limitations of DSM-only models by providing more accurate absolute affinity predictions but also improves generalizability and reduces reliance on labeled data compared to MSE-only models. Our experimental results demonstrate that DualBind excels in predicting binding affinities and can effectively utilize both labeled and unlabeled data to enhance performance.

Updated: 2024-06-11 23:29:48

Categories: cs.LG,cs.AI,q-bio.QM

Download: http://arxiv.org/abs/2406.07770v1

Learning Minimal NAP Specifications for Neural Network Verification

Specifications play a crucial role in neural network verification. They define the precise input regions we aim to verify, typically represented as L-infinity norm balls. While recent research suggests using neural activation patterns (NAPs) as specifications for verifying unseen test set data, it focuses on computing the most refined NAPs, often limited to very small regions in the input space. In this paper, we study the following problem: Given a neural network, find a minimal (coarsest) NAP that is sufficient for formal verification of the network's robustness. Finding the minimal NAP specification not only expands verifiable bounds but also provides insights into which neurons contribute to the model's robustness. To address this problem, we propose several exact and approximate approaches. Our exact approaches leverage the verification tool to find minimal NAP specifications in either a deterministic or statistical manner. The approximate methods, by contrast, efficiently estimate minimal NAPs using adversarial examples and local gradients, without making calls to the verification tool. This allows us to inspect potential causal links between neurons and the robustness of state-of-the-art neural networks, a task for which existing verification frameworks fail to scale. Our experimental results suggest that minimal NAP specifications require much smaller fractions of neurons compared to the most refined NAP specifications, yet they can expand the verifiable boundaries by several orders of magnitude.

Updated: 2024-06-11 23:25:06

Categories: cs.LG,cs.PL

Download: http://arxiv.org/abs/2404.04662v2
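
The difference between a refined and a coarse NAP can be illustrated with a small sketch. It assumes, for illustration only, that a NAP specification is a partial assignment of on/off states to the neurons of one ReLU layer, with unlisted neurons left unconstrained; the abstract does not give the paper's exact formalism. The layer weights and inputs are hypothetical, and no verification tool is involved.

```python
def activation_pattern(weights, biases, x):
    """Return the on/off state of each ReLU neuron in a single layer."""
    pattern = []
    for row, bias in zip(weights, biases):
        pre_activation = sum(w * xi for w, xi in zip(row, x)) + bias
        pattern.append(pre_activation > 0.0)
    return pattern


def satisfies(pattern, spec):
    """Check a pattern against a spec given as {neuron_index: required_state}."""
    return all(pattern[neuron] == state for neuron, state in spec.items())


# Hypothetical layer with three neurons and two inputs.
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
biases = [0.0, 0.0, 0.5]

refined_spec = {0: True, 1: True, 2: False}  # constrains every neuron
coarse_spec = {0: True}                      # constrains a single neuron

pattern_a = activation_pattern(weights, biases, [1.0, 2.0])
pattern_b = activation_pattern(weights, biases, [1.0, -1.0])

print(satisfies(pattern_a, refined_spec), satisfies(pattern_b, refined_spec))
print(satisfies(pattern_a, coarse_spec), satisfies(pattern_b, coarse_spec))
```

Both inputs satisfy the coarse specification, but only the first satisfies the refined one. That is the sense in which a coarser NAP covers a larger input region while constraining fewer neurons.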

Personalized Product Assortment with Real-time 3D Perception and Bayesian Payoff Estimation

Product assortment selection is a critical challenge facing physical retailers. Effectively aligning inventory with the preferences of shoppers can increase sales and decrease out-of-stocks. However, in real-world settings the problem is challenging due to the combinatorial explosion of product assortment possibilities. Consumer preferences are typically heterogeneous across space and time, making inventory-preference alignment challenging. Additionally, existing strategies rely on syndicated data, which tends to be aggregated, low resolution, and suffers from high latency. To solve these challenges, we introduce a real-time recommendation system, which we call \ours. Our system utilizes recent advances in 3D computer vision for perception and automatic, fine-grained sales estimation. These perceptual components run on the edge of the network and facilitate real-time reward signals. Additionally, we develop a Bayesian payoff model to account for noisy estimates from 3D LIDAR data. We rely on spatial clustering to allow the system to adapt to heterogeneous consumer preferences, and a graph-based candidate generation algorithm to address the combinatorial search problem. We test our system in real-world stores across two 6-8 week A/B tests with beverage products and demonstrate a 35% and 27% increase in sales respectively. Finally, we monitor the deployed system for a period of 28 weeks with an observational study and show a 9.4% increase in sales.

Updated: 2024-06-11 23:23:54

Categories: cs.LG,cs.DB

Download: http://arxiv.org/abs/2406.07769v1

Conformalized Teleoperation: Confidently Mapping Human Inputs to High-Dimensional Robot Actions

Assistive robotic arms often have more degrees-of-freedom than a human teleoperator can control with a low-dimensional input, like a joystick. To overcome this challenge, existing approaches use data-driven methods to learn a mapping from low-dimensional human inputs to high-dimensional robot actions. However, determining if such a black-box mapping can confidently infer a user's intended high-dimensional action from low-dimensional inputs remains an open problem. Our key idea is to adapt the assistive map at training time to additionally estimate high-dimensional action quantiles, and then calibrate these quantiles via rigorous uncertainty quantification methods. Specifically, we leverage adaptive conformal prediction which adjusts the intervals over time, reducing the uncertainty bounds when the mapping is performant and increasing the bounds when the mapping consistently mis-predicts. Furthermore, we propose an uncertainty-interval-based mechanism for detecting high-uncertainty user inputs and robot states. We evaluate the efficacy of our proposed approach in a 2D assistive navigation task and two 7DOF Kinova Jaco tasks involving assistive cup grasping and goal reaching. Our findings demonstrate that conformalized assistive teleoperation manages to detect (but not differentiate between) high uncertainty induced by diverse preferences and induced by low-precision trajectories in the mapping's training dataset. On the whole, we see this work as a key step towards enabling robots to quantify their own uncertainty and proactively seek intervention when needed.

Updated: 2024-06-11 23:16:46

Categories: cs.RO,cs.LG

Download: http://arxiv.org/abs/2406.07767v1
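
The interval behaviour described in the abstract above (tighter bounds while the mapping predicts well, wider bounds after repeated misses) can be sketched with the usual adaptive conformal update, alpha <- alpha + gamma * (target_alpha - miss). This is a minimal scalar sketch under stated assumptions, not the paper's method: it uses absolute-residual scores on a one-dimensional output instead of learned action quantiles, and the calibration scores, gamma, and target_alpha values are hypothetical.

```python
import math


def empirical_quantile(scores, level):
    """Smallest score such that at least a `level` fraction of scores are <= it."""
    if level >= 1.0:
        return float("inf")
    if level <= 0.0:
        return 0.0
    ordered = sorted(scores)
    index = max(0, math.ceil(level * len(ordered)) - 1)
    return ordered[index]


def adaptive_conformal(predictions, targets, calibration_scores,
                       target_alpha=0.1, gamma=0.05):
    """Return the interval half-width used at each step."""
    alpha_t = target_alpha
    half_widths = []
    for prediction, target in zip(predictions, targets):
        half_width = empirical_quantile(calibration_scores, 1.0 - alpha_t)
        half_widths.append(half_width)
        # miss = 1 when the true value falls outside the interval.
        miss = 1.0 if abs(target - prediction) > half_width else 0.0
        # Misses lower alpha_t (wider intervals); hits raise it (narrower).
        alpha_t += gamma * (target_alpha - miss)
    return half_widths


scores = [0.1 * k for k in range(1, 11)]  # hypothetical calibration residuals
widths_when_missing = adaptive_conformal([0.0] * 4, [5.0] * 4, scores)
widths_when_accurate = adaptive_conformal([0.0] * 40, [0.0] * 40, scores)
print(widths_when_missing)
print(widths_when_accurate[0], widths_when_accurate[-1])
```

In the first run every prediction misses, so the half-width grows until the interval becomes unbounded. In the second run every prediction lands inside the interval, so the half-width shrinks over time.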

Theoretical Analysis of Submodular Information Measures for Targeted Data Subset Selection

With the increasing volume of data being used across machine learning tasks, the capability to target specific subsets of data becomes more important. To aid in this capability, the recently proposed Submodular Mutual Information (SMI) has been effectively applied across numerous tasks in the literature to perform targeted subset selection with the aid of an exemplar query set. However, all such works are deficient in providing theoretical guarantees for SMI in terms of its sensitivity to a subset's relevance and coverage of the targeted data. For the first time, we provide such guarantees by deriving similarity-based bounds on quantities related to relevance and coverage of the targeted data. With these bounds, we show that the SMI functions, which have empirically shown success in multiple applications, are theoretically sound in achieving good query relevance and query coverage.

Updated: 2024-06-11 23:15:08

Categories: cs.LG,cs.IT,math.IT

Download: http://arxiv.org/abs/2402.13454v2
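
As a small illustration of how an SMI function scores a subset against a query set, the sketch below uses the standard definition I_f(A; Q) = f(A) + f(Q) - f(A ∪ Q) with a facility-location function f. The choice of facility location, the similarity kernel, and the one-dimensional data points are assumptions made for illustration; the abstract does not say which SMI instantiations the paper analyses.

```python
def similarity(a, b):
    """Hypothetical similarity kernel on 1-D points, in (0, 1]."""
    return 1.0 / (1.0 + abs(a - b))


def facility_location(subset, ground):
    """f(S) = sum over ground elements of their best similarity to S."""
    if not subset:
        return 0.0
    return sum(max(similarity(v, s) for s in subset) for v in ground)


def submodular_mutual_information(subset, query, ground):
    """I_f(A; Q) = f(A) + f(Q) - f(A union Q)."""
    union = set(subset) | set(query)
    return (facility_location(subset, ground)
            + facility_location(query, ground)
            - facility_location(union, ground))


data = [0.0, 0.1, 5.0, 5.2, 9.0]  # hypothetical data pool
query = [5.1]                     # hypothetical exemplar query set
ground = data + query

relevant = [5.0, 5.2]    # close to the query
irrelevant = [0.0, 9.0]  # far from the query

smi_relevant = submodular_mutual_information(relevant, query, ground)
smi_irrelevant = submodular_mutual_information(irrelevant, query, ground)
print(smi_relevant, smi_irrelevant)
```

The subset near the query receives the higher score, which is the query-relevance behaviour the abstract says the bounds are meant to guarantee.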

VTrans: Accelerating Transformer Compression with Variational Information Bottleneck based Pruning

In recent years, there has been a growing emphasis on compressing large pre-trained transformer models for resource-constrained devices. However, traditional pruning methods often leave the embedding layer untouched, leading to model over-parameterization. Additionally, they require extensive compression time with large datasets to maintain performance in pruned models. To address these challenges, we propose VTrans, an iterative pruning framework guided by the Variational Information Bottleneck (VIB) principle. Our method compresses all structural components, including embeddings, attention heads, and layers using VIB-trained masks. This approach retains only essential weights in each layer, ensuring compliance with specified model size or computational constraints. Notably, our method achieves up to 70% more compression than prior state-of-the-art approaches, both task-agnostic and task-specific. We further propose faster variants of our method: Fast-VTrans utilizing only 3% of the data and Faster-VTrans, a time-efficient alternative that involves exclusive finetuning of VIB masks, accelerating compression by up to 25 times with minimal performance loss compared to previous methods. Extensive experiments on BERT, RoBERTa, and GPT-2 models substantiate the efficacy of our method. Moreover, our method demonstrates scalability in compressing large models such as LLaMA-2-7B, achieving superior performance compared to previous pruning methods. Additionally, we use attention-based probing to qualitatively assess model redundancy and interpret the efficiency of our approach. Notably, our method considers heads with high attention to special and current tokens in the unpruned model as foremost candidates for pruning while retained heads are observed to attend more to task-critical keywords.

Updated: 2024-06-11 23:11:43

Categories: cs.LG

Download: http://arxiv.org/abs/2406.05276v2

Using AI-Based Coding Assistants in Practice: State of Affairs, Perceptions, and Ways Forward

The last several years saw the emergence of AI assistants for code -- multi-purpose AI-based helpers in software engineering. Their quick development makes it necessary to better understand how specifically developers are using them, why they are not using them in certain parts of their development workflow, and what needs to be improved. In this work, we carried out a large-scale survey aimed at how AI assistants are used, focusing on specific software development activities and stages. We collected opinions of 481 programmers on five broad activities: (a) implementing new features, (b) writing tests, (c) bug triaging, (d) refactoring, and (e) writing natural-language artifacts, as well as their individual stages. Our results show that usage of AI assistants varies depending on activity and stage. For instance, developers find writing tests and natural-language artifacts to be the least enjoyable activities and want to delegate them the most, currently using AI assistants to generate tests and test data, as well as generating comments and docstrings most of all. This can be a good focus for features aimed to help developers right now. As for why developers do not use assistants, in addition to general things like trust and company policies, there are fixable issues that can serve as a guide for further research, e.g., the lack of project-size context, and lack of awareness about assistants. We believe that our comprehensive and specific results are especially needed now to steer active research toward where users actually need AI assistants.

Updated: 2024-06-11 23:10:43

Categories: cs.SE,cs.AI,cs.CY

Download: http://arxiv.org/abs/2406.07765v1

PVF (Parameter Vulnerability Factor): A Scalable Metric for Understanding AI Vulnerability Against SDCs in Model Parameters

Reliability of AI systems is a fundamental concern for the successful deployment and widespread adoption of AI technologies. Unfortunately, the escalating complexity and heterogeneity of AI hardware systems make them increasingly susceptible to hardware faults, e.g., silent data corruptions (SDCs), that can potentially corrupt model parameters. When this occurs during AI inference/servicing, it can potentially lead to incorrect or degraded model output for users, ultimately affecting the quality and reliability of AI services. In light of the escalating threat, it is crucial to address key questions: How vulnerable are AI models to parameter corruptions, and how do different components (such as modules, layers) of the models exhibit varying vulnerabilities to parameter corruptions? To systematically address these questions, we propose a novel quantitative metric, Parameter Vulnerability Factor (PVF), inspired by the architectural vulnerability factor (AVF) from the computer architecture community, aiming to standardize the quantification of AI model vulnerability against parameter corruptions. We define a model parameter's PVF as the probability that a corruption in that particular model parameter will result in an incorrect output. In this paper, we present several use cases on applying PVF to three types of tasks/models during inference -- recommendation (DLRM), vision classification (CNN), and text classification (BERT), while presenting an in-depth vulnerability analysis on DLRM. PVF can provide pivotal insights to AI hardware designers in balancing the tradeoff between fault protection and performance/efficiency such as mapping vulnerable AI parameter components to well-protected hardware modules. The PVF metric is applicable to any AI model and has the potential to help unify and standardize AI vulnerability/resilience evaluation practice.

Updated: 2024-06-11 22:37:33

Categories: cs.CR,cs.AI,cs.AR,cs.LG

Download: http://arxiv.org/abs/2405.01741v3
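
The PVF definition quoted in the abstract above (the probability that a corruption in a given parameter yields an incorrect output) lends itself to a Monte Carlo estimate. The sketch below is one possible reading, under loud assumptions: the fault model is a single random bit flip in a float32 encoding, the model is a toy linear classifier, and "incorrect" means "differs from the fault-free prediction". None of these choices are taken from the paper.

```python
import random
import struct


def flip_bit(value, bit):
    """Flip one bit (0..31) in the float32 encoding of `value`."""
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))
    return flipped


def classify(weights, features):
    """Toy linear classifier: label 1 when the score is positive."""
    score = sum(w * x for w, x in zip(weights, features))
    return 1 if score > 0 else 0


def estimate_pvf(weights, index, inputs, trials=2000, seed=0):
    """Fraction of random single-bit faults in weights[index] that change the output."""
    rng = random.Random(seed)
    changed = 0
    for _ in range(trials):
        features = rng.choice(inputs)
        corrupted = list(weights)
        corrupted[index] = flip_bit(weights[index], rng.randrange(32))
        if classify(corrupted, features) != classify(weights, features):
            changed += 1
    return changed / trials


# Hypothetical model parameters and evaluation inputs.
weights = [0.75, -0.5, 0.125]
inputs = [[1.0, 0.5, 2.0], [0.2, 1.0, -1.0], [-1.0, 0.3, 0.4], [2.0, 2.0, 0.1]]

pvf_per_parameter = [estimate_pvf(weights, i, inputs) for i in range(len(weights))]
print(pvf_per_parameter)
```

Comparing the per-parameter estimates is the kind of analysis the abstract suggests for deciding which parameters deserve better-protected hardware, although a real study would use the actual model, task metric, and fault model.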

The MuSe 2024 Multimodal Sentiment Analysis Challenge: Social Perception and Humor Recognition

The Multimodal Sentiment Analysis Challenge (MuSe) 2024 addresses two contemporary multimodal affect and sentiment analysis problems: In the Social Perception Sub-Challenge (MuSe-Perception), participants will predict 16 different social attributes of individuals such as assertiveness, dominance, likability, and sincerity based on the provided audio-visual data. The Cross-Cultural Humor Detection Sub-Challenge (MuSe-Humor) dataset expands upon the Passau Spontaneous Football Coach Humor (Passau-SFCH) dataset, focusing on the detection of spontaneous humor in a cross-lingual and cross-cultural setting. The main objective of MuSe 2024 is to unite a broad audience from various research domains, including multimodal sentiment analysis, audio-visual affective computing, continuous signal processing, and natural language processing. By fostering collaboration and exchange among experts in these fields, the MuSe 2024 endeavors to advance the understanding and application of sentiment analysis and affective computing across multiple modalities. This baseline paper provides details on each sub-challenge and its corresponding dataset, extracted features from each data modality, and discusses challenge baselines. For our baseline system, we make use of a range of Transformers and expert-designed features and train Gated Recurrent Unit (GRU)-Recurrent Neural Network (RNN) models on them, resulting in a competitive baseline system. On the unseen test datasets of the respective sub-challenges, it achieves a mean Pearson's Correlation Coefficient ($\rho$) of 0.3573 for MuSe-Perception and an Area Under the Curve (AUC) value of 0.8682 for MuSe-Humor.

Updated: 2024-06-11 22:26:20

Categories: cs.AI,cs.CL,68T10,I.2

Download: http://arxiv.org/abs/2406.07753v1

Embedding-based Multimodal Learning on Pan-Squamous Cell Carcinomas for Improved Survival Outcomes

Cancer clinics capture disease data at various scales, from genetic to organ level. Current bioinformatic methods struggle to handle the heterogeneous nature of this data, especially with missing modalities. We propose PARADIGM, a Graph Neural Network (GNN) framework that learns from multimodal, heterogeneous datasets to improve clinical outcome prediction. PARADIGM generates embeddings from multi-resolution data using foundation models, aggregates them into patient-level representations, fuses them into a unified graph, and enhances performance for tasks like survival analysis. We train GNNs on pan-Squamous Cell Carcinomas and validate our approach on Moffitt Cancer Center lung SCC data. Multimodal GNN outperforms other models in patient survival prediction. Converging individual data modalities across varying scales provides a more insightful disease view. Our solution aims to understand the patient's circumstances comprehensively, offering insights on heterogeneous data integration and the benefits of converging maximum data views.

Updated: 2024-06-11 22:19:14

Categories: q-bio.CB,cs.LG

Download: http://arxiv.org/abs/2406.08521v1

RIR-SF: Room Impulse Response Based Spatial Feature for Target Speech Recognition in Multi-Channel Multi-Speaker Scenarios

Automatic speech recognition (ASR) on multi-talker recordings is challenging. Current methods using 3D spatial data from multi-channel audio and visual cues focus mainly on direct waves from the target speaker, overlooking reflection wave impacts, which hinders performance in reverberant environments. Our research introduces RIR-SF, a novel spatial feature based on room impulse response (RIR) that leverages the speaker's position, room acoustics, and reflection dynamics. RIR-SF significantly outperforms traditional 3D spatial features, showing superior theoretical and empirical performance. We also propose an optimized all-neural multi-channel ASR framework for RIR-SF, achieving a relative 21.3% reduction in CER for target speaker ASR in multi-channel settings. RIR-SF enhances recognition accuracy and demonstrates robustness in high-reverberation scenarios, overcoming the limitations of previous methods.

Updated: 2024-06-11 22:09:26

Categories: eess.AS,cs.AI

Download: http://arxiv.org/abs/2311.00146v2

Fully Adaptive Regret-Guaranteed Algorithm for Control of Linear Quadratic Systems

The first algorithm for the Linear Quadratic (LQ) control problem with an unknown system model, featuring a regret of $\mathcal{O}(\sqrt{T})$, was introduced by Abbasi-Yadkori and Szepesvári (2011). Recognizing the computational complexity of this algorithm, subsequent efforts (see Cohen et al. (2019), Mania et al. (2019), Faradonbeh et al. (2020a), and Kargin et al. (2022)) have been dedicated to proposing algorithms that are computationally tractable while preserving this order of regret. Although successful, the existing works in the literature lack a fully adaptive exploration-exploitation trade-off adjustment and require a user-defined value, which can inflate the overall regret bound by some factors. In this work, noticing this gap, we propose the first fully adaptive algorithm that controls the number of policy updates (i.e., tunes the exploration-exploitation trade-off) and optimizes the upper bound of regret adaptively. Our proposed algorithm builds on the SDP-based approach of Cohen et al. (2019) and relaxes its need for a horizon-dependent warm-up phase by appropriately tuning the regularization parameter and adding an adaptive input perturbation. We further show that through careful exploration-exploitation trade-off adjustment there is no need to commit to the widely used notion of strong sequential stability, which is restrictive and can introduce complexities in initialization.

Updated: 2024-06-11 22:04:59

Categories: stat.ML,cs.LG,cs.SY,eess.SY

Download: http://arxiv.org/abs/2406.07746v1

Facial Misrecognition Systems: Simple Weight Manipulations Force DNNs to Err Only on Specific Persons

In this paper, we describe how to plant novel types of backdoors in any facial recognition model based on the popular architecture of deep Siamese neural networks. These backdoors force the system to err only on natural images of specific persons who are preselected by the attacker, without controlling their appearance or inserting any triggers. For example, we show how such a backdoored system can classify any two images of a particular person as different people, or any two images of a particular pair of persons as the same person, with almost no effect on the correctness of its decisions for other persons. Surprisingly, we show that both types of backdoors can be implemented by applying linear transformations to the model's last weight matrix, with no additional training or optimization, using only images of the backdoor identities. A unique property of our attack is that multiple backdoors can be independently installed in the same model by multiple attackers, who may not be aware of each other's existence, with almost no interference. We have experimentally verified the attacks on a SOTA facial recognition system. When we tried to individually anonymize ten celebrities, the network failed to recognize two of their images as being the same person 97.02% to 98.31% of the time. When we tried to confuse the extremely different-looking Morgan Freeman and Scarlett Johansson, for example, their images were declared to be the same person 98.47% of the time. For each type of backdoor, we sequentially installed multiple backdoors with minimal effect on the performance of each other (for example, anonymizing all ten celebrities on the same model reduced the success rate for each celebrity by no more than 1.01%). In all of our experiments, the benign accuracy of the network on other persons barely degraded (in most cases, it degraded by less than 0.05%).

Updated: 2024-06-11 21:54:15

Categories: cs.CR,cs.LG

Download: http://arxiv.org/abs/2301.03118v2

The Future of Software Engineering in an AI-Driven World

A paradigm shift is underway in Software Engineering, with AI systems such as LLMs gaining increasing importance for improving software development productivity. This trend is anticipated to persist. In the next five years, we will likely see an increasing symbiotic partnership between human developers and AI. The Software Engineering research community cannot afford to overlook this trend; we must address the key research challenges posed by the integration of AI into the software development process. In this paper, we present our vision of the future of software development in an AI-Driven world and explore the key challenges that our research community should address to realize this vision.

Updated: 2024-06-11 21:46:19

Categories: cs.SE,cs.AI,cs.LG,cs.PL

Download: http://arxiv.org/abs/2406.07737v1

REAL Sampling: Boosting Factuality and Diversity of Open-Ended Generation via Asymptotic Entropy

Decoding methods for large language models (LLMs) usually struggle with the tradeoff between ensuring factuality and maintaining diversity. For example, a higher p threshold in nucleus (top-p) sampling increases the diversity but decreases the factuality, and vice versa. In this paper, we propose REAL (Residual Entropy from Asymptotic Line) sampling, a decoding method that achieves improved factuality and diversity over nucleus sampling by predicting an adaptive threshold of $p$. Specifically, REAL sampling predicts the step-wise likelihood of an LLM to hallucinate, and lowers the p threshold when an LLM is likely to hallucinate. Otherwise, REAL sampling increases the p threshold to boost the diversity. To predict the step-wise hallucination likelihood without supervision, we construct a Token-level Hallucination Forecasting (THF) model to predict the asymptotic entropy (i.e., inherent uncertainty) of the next token by extrapolating the next-token entropies from a series of LLMs with different sizes. If an LLM's entropy is higher than the asymptotic entropy (i.e., the LLM is more uncertain than it should be), the THF model predicts a high hallucination hazard, which leads to a lower p threshold in REAL sampling. In the FactualityPrompts benchmark, we demonstrate that REAL sampling based on a 70M THF model can substantially improve the factuality and diversity of 7B LLMs simultaneously, judged by both retrieval-based metrics and human evaluation. When combined with contrastive decoding, REAL sampling outperforms 9 sampling methods, and generates texts that are more factual than greedy sampling and more diverse than nucleus sampling with $p=0.5$. Furthermore, the predicted asymptotic entropy is also a useful unsupervised signal for hallucination detection tasks.
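
The thresholding idea in this abstract can be illustrated with a minimal sketch. It assumes the LLM's next-token entropy and a predicted asymptotic entropy are already available as numbers; the THF model itself is not implemented here. The function names and the exponential mapping from residual entropy to p are hypothetical choices for illustration, not the rule used in the paper.

```python
import math
import random


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def adaptive_top_p(llm_entropy, asymptotic_entropy, temperature=1.0):
    """Hypothetical mapping: more residual entropy gives a smaller p.

    The exponential form is an assumption made for this sketch only.
    """
    residual = max(0.0, llm_entropy - asymptotic_entropy)
    return math.exp(-residual / temperature)


def nucleus_filter(probs, p):
    """Keep the smallest set of top tokens whose mass reaches p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= p:
            break
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass
    return filtered


def sample_token(probs, asymptotic_entropy, rng=random):
    """Sample one token index with a threshold adapted to the residual entropy."""
    p = adaptive_top_p(entropy(probs), asymptotic_entropy)
    filtered = nucleus_filter(probs, p)
    return rng.choices(range(len(probs)), weights=filtered, k=1)[0]
```

When the model's entropy equals the predicted asymptotic entropy, the threshold stays at 1 and sampling is unrestricted; as the gap grows, the nucleus shrinks toward the most likely tokens.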

Updated: 2024-06-11 21:44:49

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2406.07735v1

Experimenting with D-Wave Quantum Annealers on Prime Factorization problems

This paper builds on a paper we published very recently, in which we proposed a novel approach to prime factorization (PF) by quantum annealing, where 8,219,999 = 32,749 × 251 was the highest prime product we were able to factorize -- which, to the best of our knowledge, is the largest number ever factorized by means of a quantum device. The series of annealing experiments which led us to these results, however, did not follow a straight-line path; rather, they involved a convoluted trial-and-error process, full of failed or partially failed attempts and backtracks, which only in the end drove us to find the successful annealing strategies. In this paper, we delve into the reasoning behind our experimental decisions and provide an account of some of the attempts we made before conceiving the final strategies that allowed us to achieve the results. This also involves a number of ideas, techniques, and strategies we investigated which, although they turned out to be inferior to those we adopted in the end, may still provide insights to a more specialized audience of D-Wave users and practitioners. In particular, we show the following insights: (i) different initialization techniques affect performance, among which flux biases are effective when targeting locally-structured embeddings; (ii) chain strengths have a lower impact in locally-structured embeddings compared to problems relying on global embeddings; (iii) there is a trade-off between broken chains and excited CFAs, suggesting an incremental annealing offset remedy approach based on the modules instead of single qubits. Thus, by sharing the details of our experiences, we aim to provide insights into the evolving landscape of quantum annealing, and help people access and effectively use D-Wave quantum annealers.
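
The factorization quoted in this abstract can be checked classically in a few lines. The sketch below verifies the product and the primality of both factors by trial division. The `factoring_cost` helper shows one common way to phrase factoring as an optimization target, namely the squared residual; whether the authors' annealing encoding uses this form is not stated in the abstract, so it is an assumption here.

```python
def is_prime(n):
    """Trial-division primality test, adequate for small integers."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def factoring_cost(n, p, q):
    """Squared residual (n - p*q)^2; zero exactly when p*q == n (illustrative objective)."""
    return (n - p * q) ** 2


N = 8_219_999
P, Q = 32_749, 251

product_matches = (P * Q == N)
both_prime = is_prime(P) and is_prime(Q)
```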

Updated: 2024-06-11 21:30:53

Categories: quant-ph,cs.CR

Download: http://arxiv.org/abs/2406.07732v1

Aligning Large Language Models with Representation Editing: A Control Perspective

Aligning large language models (LLMs) with human objectives is crucial for real-world applications. However, fine-tuning LLMs for alignment often suffers from unstable training and requires substantial computing resources. Test-time alignment techniques, such as prompting and guided decoding, do not modify the underlying model, and their performance remains dependent on the original model's capabilities. To address these challenges, we propose aligning LLMs through representation editing. The core of our method is to view a pre-trained autoregressive LLM as a discrete-time stochastic dynamical system. To achieve alignment for specific objectives, we introduce external control signals into the state space of this language dynamical system. We train a value function directly on the hidden states according to the Bellman equation, enabling gradient-based optimization to obtain the optimal control signals at test time. Our experiments demonstrate that our method outperforms existing test-time alignment techniques while requiring significantly fewer resources compared to fine-tuning methods.

Updated: 2024-06-11 21:18:24

Categories: cs.AI,cs.LG,cs.SY,eess.SY

Download: http://arxiv.org/abs/2406.05954v2

EARS: An Anechoic Fullband Speech Dataset Benchmarked for Speech Enhancement and Dereverberation

We release the EARS (Expressive Anechoic Recordings of Speech) dataset, a high-quality speech dataset comprising 107 speakers from diverse backgrounds, totaling 100 hours of clean, anechoic speech data. The dataset covers a large range of different speaking styles, including emotional speech, different reading styles, non-verbal sounds, and conversational freeform speech. We benchmark various methods for speech enhancement and dereverberation on the dataset and evaluate their performance through a set of instrumental metrics. In addition, we conduct a listening test with 20 participants for the speech enhancement task, where a generative method is preferred. We introduce a blind test set that allows for automatic online evaluation of uploaded data. Dataset download links and automatic evaluation server can be found online.

Updated: 2024-06-11 21:18:14

Categories: eess.AS,cs.LG,cs.SD

Download: http://arxiv.org/abs/2406.06185v2

Efficient Parallel Multi-Hop Reasoning: A Scalable Approach for Knowledge Graph Analysis

Multi-hop reasoning (MHR) is a process in artificial intelligence and natural language processing where a system needs to make multiple inferential steps to arrive at a conclusion or answer. In the context of knowledge graphs or databases, it involves traversing multiple linked entities and relationships to understand complex queries or perform tasks requiring a deeper understanding. Multi-hop reasoning is a critical function in various applications, including question answering, knowledge base completion, and link prediction. It has garnered significant interest in artificial intelligence, machine learning, and graph analytics. This paper focuses on optimizing MHR for time efficiency on large-scale graphs, diverging from the traditional emphasis on accuracy which is an orthogonal goal. We introduce a novel parallel algorithm that harnesses domain-specific learned embeddings to efficiently identify the top K paths between vertices in a knowledge graph to find the best answers to a three-hop query. Our contributions are: (1) We present a new parallel algorithm to enhance MHR performance, scalability and efficiency. (2) We demonstrate the algorithm's superior performance on leading-edge Intel and AMD architectures through empirical results. We showcase the algorithm's practicality through a case study on identifying academic affiliations of potential Turing Award laureates in Deep Learning, highlighting its capability to handle intricate entity relationships. This demonstrates the potential of our approach to enabling high-performance MHR, useful to navigate the growing complexity of modern knowledge graphs.
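
To make the three-hop top-K query concrete, here is a small sequential sketch. It assumes the graph is an adjacency dictionary with a numeric score on each edge and that a path's score is the product of its edge scores. Both the data layout and the scoring rule are illustrative assumptions; the paper's algorithm is parallel and relies on learned embeddings, neither of which is reproduced here.

```python
import heapq


def three_hop_paths(graph, source):
    """Yield (score, path) for every three-hop path starting at `source`.

    `graph` maps a node to a list of (neighbor, edge_score) pairs (assumed layout).
    """
    for a, s1 in graph.get(source, []):
        for b, s2 in graph.get(a, []):
            for c, s3 in graph.get(b, []):
                yield (s1 * s2 * s3, (source, a, b, c))


def top_k_three_hop(graph, source, k):
    """Return the k highest-scoring three-hop paths, best first."""
    return heapq.nlargest(k, three_hop_paths(graph, source))


# Toy graph with hypothetical edge scores.
toy_graph = {
    "s": [("a", 0.9), ("b", 0.5)],
    "a": [("c", 0.8)],
    "b": [("c", 0.9)],
    "c": [("t1", 0.5), ("t2", 1.0)],
}
best = top_k_three_hop(toy_graph, "s", k=2)
```

Because each first-hop neighbor can be expanded independently, the outer loop is a natural unit of work to distribute across threads in a parallel version.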

Updated: 2024-06-11 21:12:34

Categories: cs.AI,cs.DC,cs.DS,cs.LG,cs.PF,H.4; C.4

Download: http://arxiv.org/abs/2406.07727v1

A simple connection from loss flatness to compressed representations in neural networks

The generalization capacity of deep neural networks has been studied in a variety of ways, including at least two distinct categories of approaches: one based on the shape of the loss landscape in parameter space, and the other based on the structure of the representation manifold in feature space (that is, in the space of unit activities). Although these two approaches are related, they are rarely studied together explicitly. Here, we present an analysis that bridges this gap. We show that in the final phase of learning in deep neural networks, the compression of the manifold of neural representations correlates with the flatness of the loss around the minima explored by SGD. This correlation is predicted by a relatively simple mathematical relationship: a flatter loss corresponds to a lower upper bound on the compression metrics of neural representations. Our work builds upon the linear stability insight by Ma and Ying, deriving inequalities between various compression metrics and quantities involving sharpness. Empirically, our derived inequality predicts a consistently positive correlation between representation compression and loss sharpness in multiple experimental settings. Overall, we advance a dual perspective on generalization in neural networks in both parameter and feature space.

Updated: 2024-06-11 21:11:28

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2310.01770v3

Enhancing Size Generalization in Graph Neural Networks through Disentangled Representation Learning

Although most graph neural networks (GNNs) can operate on graphs of any size, their classification performance often declines on graphs larger than those encountered during training. Existing methods insufficiently address the removal of size information from graph representations, resulting in sub-optimal performance and reliance on backbone models. In response, we propose DISGEN, a novel and model-agnostic framework designed to disentangle size factors from graph representations. DISGEN employs size- and task-invariant augmentations and introduces a decoupling loss that minimizes shared information in hidden representations, with theoretical guarantees for its effectiveness. Our empirical results show that DISGEN outperforms the state-of-the-art models by up to 6% on real-world datasets, underscoring its effectiveness in enhancing the size generalizability of GNNs. Our code is available at: https://github.com/GraphmindDartmouth/DISGEN.

Updated: 2024-06-11 21:10:58

Categories: cs.LG

Download: http://arxiv.org/abs/2406.04601v3

A Concise Mathematical Description of Active Inference in Discrete Time

In this paper we present a concise mathematical description of active inference in discrete time. The main part of the paper serves as a general introduction to the topic, including an example illustrating the theory on action selection. In the appendix the more subtle mathematical details are discussed. This part is aimed at readers who have already studied the active inference literature but struggle to make sense of the mathematical details and derivations. Throughout the whole manuscript, special attention has been paid to adopting notation that is both precise and in line with standard mathematical texts. All equations and derivations are linked to specific equation numbers in other popular texts on the topic. Furthermore, Python code is provided that implements the action selection mechanism described in this paper and is compatible with pymdp environments.

Updated: 2024-06-11 21:09:45

Categories: cs.LG,q-bio.NC

Download: http://arxiv.org/abs/2406.07726v1

On the Benefits of Public Representations for Private Transfer Learning under Distribution Shift

Public pretraining is a promising approach to improve differentially private model training. However, recent work has noted that many positive research results studying this paradigm only consider in-distribution tasks, and may not apply to settings where there is distribution shift between the pretraining and finetuning data -- a scenario that is likely when finetuning private tasks due to the sensitive nature of the data. In this work, we show empirically across three tasks that even in settings with large distribution shift, where both zero-shot performance from public data and training from scratch with private data give unusably weak results, public features can in fact improve private training accuracy by up to 67% over private training from scratch. We provide a theoretical explanation for this phenomenon, showing that if the public and private data share a low-dimensional representation, public representations can improve the sample complexity of private training even if it is impossible to learn the private task from the public data alone. Altogether, our results provide evidence that public data can indeed make private training practical in realistic settings of extreme distribution shift.

Updated: 2024-06-11 20:55:07

Categories: cs.LG,cs.CR,stat.ML

Download: http://arxiv.org/abs/2312.15551v3

Bisimulation Metrics are Optimal Transport Distances, and Can be Computed Efficiently

We propose a new framework for formulating optimal transport distances between Markov chains. Previously known formulations studied couplings between the entire joint distribution induced by the chains, and derived solutions via a reduction to dynamic programming (DP) in an appropriately defined Markov decision process. This formulation has, however, not led to particularly efficient algorithms so far, since computing the associated DP operators requires fully solving a static optimal transport problem, and these operators need to be applied numerous times during the overall optimization process. In this work, we develop an alternative perspective by considering couplings between a flattened version of the joint distributions that we call discounted occupancy couplings, and show that calculating optimal transport distances in the full space of joint distributions can be equivalently formulated as solving a linear program (LP) in this reduced space. This LP formulation allows us to port several algorithmic ideas from other areas of optimal transport theory. In particular, our formulation makes it possible to introduce an appropriate notion of entropy regularization into the optimization problem, which in turn enables us to directly calculate optimal transport distances via a Sinkhorn-like method we call Sinkhorn Value Iteration (SVI). We show both theoretically and empirically that this method converges quickly to an optimal coupling, essentially at the same computational cost of running vanilla Sinkhorn in each pair of states. Along the way, we point out that our optimal transport distance exactly matches the common notion of bisimulation metrics between Markov chains, and thus our results also apply to computing such metrics, and in fact our algorithm turns out to be significantly more efficient than the best known methods developed so far for this purpose.
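
For readers unfamiliar with the "vanilla Sinkhorn" baseline the abstract refers to, the sketch below implements the standard Sinkhorn iteration for a static entropy-regularized optimal transport problem between two discrete distributions. This is the classical building block, not the Sinkhorn Value Iteration (SVI) method proposed in the paper. The regularization strength, iteration count, and toy inputs are arbitrary illustrative choices.

```python
import math


def sinkhorn(cost, mu, nu, reg=0.5, iters=500):
    """Entropy-regularized OT plan between histograms mu and nu.

    cost[i][j] is the cost of moving mass from source i to target j.
    Returns the transport plan as a list of rows.
    """
    n, m = len(mu), len(nu)
    # Gibbs kernel derived from the cost matrix.
    kernel = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns to match the two marginals.
        u = [mu[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [nu[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


# Toy problem: two states on each side with a 0/1 cost.
plan = sinkhorn(
    cost=[[0.0, 1.0], [1.0, 0.0]],
    mu=[0.7, 0.3],
    nu=[0.4, 0.6],
)
```

Each iteration costs one pass over the kernel matrix, which is why the abstract measures SVI against the cost of running this routine for every pair of states.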

Updated: 2024-06-11 20:53:28

Categories: cs.LG,math.OC,stat.ML

Download: http://arxiv.org/abs/2406.04056v2

LLAMAFUZZ: Large Language Model Enhanced Greybox Fuzzing

Greybox fuzzing has achieved success in revealing bugs and vulnerabilities in programs. However, randomized mutation strategies have limited the fuzzer's performance on structured data. Specialized fuzzers can handle complex structured data, but require additional effort to write grammars and suffer from low throughput. In this paper, we explore the potential of utilizing a Large Language Model (LLM) to enhance greybox fuzzing for structured data. We utilize the LLM's pre-trained knowledge of data conversion and formats to generate new valid inputs. We further fine-tune it with paired mutation seeds to learn structured formats and mutation strategies effectively. Our LLM-based fuzzer, LLAMAFUZZ, brings the LLM's ability to understand and mutate structured data into the fuzzing process. We conduct experiments on the standard bug-based benchmark Magma and a wide variety of real-world programs. LLAMAFUZZ outperforms our top competitor by 41 bugs on average. We also identified 47 unique bugs across all trials. Moreover, LLAMAFUZZ demonstrated consistent performance on both bugs triggered and bugs reached. Compared to AFL++, LLAMAFUZZ covered 27.19% more branches in real-world program sets on average. We also present a case study to explain how LLMs enhance the fuzzing process in terms of code coverage.

Updated: 2024-06-11 20:48:28

Categories: cs.CR,cs.AI,cs.SE

Download: http://arxiv.org/abs/2406.07714v1

Loss Gradient Gaussian Width based Generalization and Optimization Guarantees

Generalization and optimization guarantees on the population loss in machine learning often rely on uniform convergence based analysis, typically based on the Rademacher complexity of the predictors. The rich representation power of modern models has led to concerns about this approach. In this paper, we present generalization and optimization guarantees in terms of the complexity of the gradients, as measured by the Loss Gradient Gaussian Width (LGGW). First, we introduce generalization guarantees directly in terms of the LGGW under a flexible gradient domination condition, which we demonstrate to hold empirically for deep models. Second, we show that sample reuse in finite sum (stochastic) optimization does not make the empirical gradient deviate from the population gradient as long as the LGGW is small. Third, focusing on deep networks, we present results showing how to bound their LGGW under mild assumptions. In particular, we show that their LGGW can be bounded (a) by the $L_2$-norm of the loss Hessian eigenvalues, which has been empirically shown to be $\tilde{O}(1)$ for commonly used deep models; and (b) in terms of the Gaussian width of the featurizer, i.e., the output of the last-but-one layer. To our knowledge, our generalization and optimization guarantees in terms of LGGW are the first results of their kind, avoid the pitfalls of predictor Rademacher complexity based analysis, and hold considerable promise towards quantitatively tight bounds for deep models.
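
For reference, the Gaussian width that the LGGW builds on has a standard definition for any bounded set $S \subseteq \mathbb{R}^d$, shown below. The symbol $S$ is a placeholder introduced here; in the LGGW it would be a set of loss gradients, whose precise construction is given in the paper rather than in this abstract.

```latex
w(S) \;=\; \mathbb{E}_{g \sim \mathcal{N}(0, I_d)}
      \Big[ \sup_{s \in S} \langle g, s \rangle \Big],
\qquad S \subseteq \mathbb{R}^d .
```

Intuitively, $w(S)$ measures how well elements of $S$ can align with a random Gaussian direction, so a small width means the set of gradients is geometrically simple.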

Updated: 2024-06-11 20:46:32

Categories: cs.LG

Download: http://arxiv.org/abs/2406.07712v1

Data-Driven Goal Recognition Design for General Behavioral Agents

Goal recognition design aims to make limited modifications to decision-making environments with the goal of making it easier to infer the goals of agents acting within those environments. Although various research efforts have been made in goal recognition design, existing approaches are computationally demanding and often assume that agents are (near-)optimal in their decision-making. To address these limitations, we introduce a data-driven approach to goal recognition design that can account for agents with general behavioral models. Following existing literature, we use worst-case distinctiveness ($\textit{wcd}$) as a measure of the difficulty in inferring the goal of an agent in a decision-making environment. Our approach begins by training a machine learning model to predict the $\textit{wcd}$ for a given environment and the agent behavior model. We then propose a gradient-based optimization framework that accommodates various constraints to optimize decision-making environments for enhanced goal recognition. Through extensive simulations, we demonstrate that our approach outperforms existing methods in reducing $\textit{wcd}$ and enhancing runtime efficiency in conventional setups. Moreover, our approach also adapts to settings in which existing approaches do not apply, such as those involving flexible budget constraints, more complex environments, and suboptimal agent behavior. Finally, we have conducted human-subject experiments which confirm that our method can create environments that facilitate efficient goal recognition from real-world human decision-makers.

Updated: 2024-06-11 20:45:56

Categories: cs.AI,cs.LG

Download: http://arxiv.org/abs/2404.03054v2

Diagnosing and fixing common problems in Bayesian optimization for molecule design

Bayesian optimization (BO) is a principled approach to molecular design tasks. In this paper we explain three pitfalls of BO which can cause poor empirical performance: an incorrect prior width, over-smoothing, and inadequate acquisition function maximization. We show that with these issues addressed, even a basic BO setup is able to achieve the highest overall performance on the PMO benchmark for molecule design (Gao et al., 2022). These results suggest that BO may benefit from more attention in the machine learning for molecules community.

Updated: 2024-06-11 20:44:04

标题: 诊断和解决分子设计中贝叶斯优化常见问题

摘要: 贝叶斯优化（BO）是分子设计任务的一种原则性方法。在本文中，我们解释了BO的三个缺陷，可能导致实验性能不佳：先验宽度不正确，过度平滑和不足的收集函数最大化。我们展示了，解决了这些问题后，即使是基本的BO设置也能在PMO分子设计基准测试中实现最高的整体性能（Gao等，2022）。这些结果表明，BO在分子机器学习社区中可能受益于更多的关注。

更新时间: 2024-06-11 20:44:04

领域: cs.LG,physics.chem-ph,stat.ML

下载: http://arxiv.org/abs/2406.07709v1

A Deep Learning Approach to Detect Complete Safety Equipment For Construction Workers Based On YOLOv7

In the construction sector, ensuring worker safety is of the utmost significance. In this study, a deep learning-based technique is presented for identifying safety gear worn by construction workers, such as helmets, goggles, jackets, gloves, and footwear. The recommended approach uses the YOLO v7 (You Only Look Once) object detection algorithm to precisely locate these safety items. The dataset utilized in this work consists of labeled images split into training, testing and validation sets. Each image has bounding box labels that indicate where the safety equipment is located within the image. The model is trained to identify and categorize the safety equipment based on the labeled dataset through an iterative training approach. We used a custom dataset to train this model. Our trained model performed well, with good precision, recall, and F1-score for safety equipment recognition. Also, the model's evaluation produced encouraging results, with a mAP@0.5 score of 87.7%. The model performs effectively, making it possible to quickly identify safety equipment violations on building sites. A thorough evaluation of the outcomes reveals the model's advantages and points out potential areas for development. By offering an automatic and trustworthy method for safety equipment detection, this research makes a contribution to the fields of computer vision and workplace safety. The proposed deep learning-based approach will increase safety compliance and reduce the risk of accidents in the construction industry.
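
The mAP@0.5 figure above counts a detection as correct when its bounding box overlaps a ground-truth box with an intersection over union (IoU) of at least 0.5. As a minimal sketch of that matching rule and of the F1-score, assuming boxes are given as `(x1, y1, x2, y2)` corner tuples (the function names here are illustrative and not taken from the paper):

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_match(pred_box, true_box, threshold=0.5):
    """A prediction counts as a hit at mAP@0.5 when IoU >= 0.5."""
    return iou(pred_box, true_box) >= threshold


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Full mAP additionally averages precision over recall levels and over classes; that part is omitted here.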

Updated: 2024-06-11 20:38:41

标题: 基于YOLOv7的深度学习方法检测建筑工人完整的安全装备

摘要: 在建筑行业，确保工人安全至关重要。本研究提出了一种基于深度学习的技术，用于识别建筑工人佩戴的安全装备，如头盔、护目镜、夹克、手套和鞋子。推荐的方法使用YOLO v7（You Only Look Once）目标检测算法准确定位这些安全物品。本研究中使用的数据集包括标记的图像，分为训练、测试和验证集。每张图像都有边界框标签，指示图像中安全装备的位置。通过迭代训练方法，模型被训练来根据标记的数据集识别和分类安全装备。我们使用自定义数据集来训练这个模型。我们训练的模型表现出色，对安全装备的识别具有良好的精度、召回率和F1分数。此外，模型的评估产生了令人鼓舞的结果，mAP@0.5分数为87.7\%。该模型表现出色，能够快速识别建筑工地上的安全装备违规行为。对结果的彻底评估揭示了模型的优势，并指出了潜在的发展领域。通过提供一种自动可靠的安全装备检测方法，本研究为计算机视觉和工作场所安全领域做出了贡献。提出的基于深度学习的方法将提高建筑行业的安全合规性，并减少事故风险。

更新时间: 2024-06-11 20:38:41

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2406.07707v1

Wearable Sensor-Based Few-Shot Continual Learning on Hand Gestures for Motor-Impaired Individuals via Latent Embedding Exploitation

Hand gestures can provide a natural means of human-computer interaction and enable people who cannot speak to communicate efficiently. Existing hand gesture recognition methods heavily depend on pre-defined gestures; however, motor-impaired individuals require new gestures tailored to each individual's gesture motion and style. Gesture samples collected from different persons have distribution shifts due to their health conditions, the severity of the disability, motion patterns of the arms, etc. In this paper, we introduce the Latent Embedding Exploitation (LEE) mechanism in our replay-based Few-Shot Continual Learning (FSCL) framework that significantly improves the performance of fine-tuning a model for out-of-distribution data. Our method produces a diversified latent feature space by leveraging a preserved latent embedding known as gesture prior knowledge, along with intra-gesture divergence derived from two additional embeddings. Thus, the model can capture latent statistical structure in highly variable gestures with limited samples. We conduct an experimental evaluation using the SmartWatch Gesture and the Motion Gesture datasets. The proposed method results in an average test accuracy of 57.0%, 64.6%, and 69.3% when using one, three, and five samples, respectively, for six different gestures. Our method helps motor-impaired persons leverage wearable devices, and their unique styles of movement can be learned and applied in human-computer interaction and social communication. Code is available at: https://github.com/riyadRafiq/wearable-latent-embedding-exploitation

Updated: 2024-06-11 20:33:20

标题: 穿戴式传感器基于潜在嵌入利用的手势少样本持续学习对运动受损个体的研究

摘要: 手势可以提供一种自然的人机交互方式，使不能说话的人能够高效地进行沟通。现有的手势识别方法在很大程度上依赖于预定义的手势，然而，运动障碍的个体需要根据每个人的手势动作和风格定制新的手势。由于不同个体的健康状况、残疾的严重程度、手臂的动作模式等原因，从不同人员收集的手势样本存在分布偏移。在本文中，我们在基于回放的少样本持续学习（FSCL）框架中引入了潜在嵌入利用（LEE）机制，显著提高了对于分布外数据微调模型的性能。我们的方法通过利用一种被称为手势先验知识的保留潜在嵌入，以及从两个额外嵌入中得出的内部手势差异，产生了多样化的潜在特征空间。因此，模型可以在有限样本下捕捉高度变化的手势中的潜在统计结构。我们使用SmartWatch Gesture和Motion Gesture数据集进行了实验评估。所提出的方法通过使用一个、三和五个样本进行六种不同手势的平均测试准确率分别为57.0％、64.6％和69.3％。我们的方法帮助运动障碍者利用可穿戴设备，他们独特的运动风格可以被学习并应用于人机交互和社交通讯。源代码可在以下网址找到：https://github.com/riyadRafiq/wearable-latent-embedding-exploitation

更新时间: 2024-06-11 20:33:20

领域: cs.LG

下载: http://arxiv.org/abs/2405.08969v2

Scalable UTXO Smart Contracts via Fine-Grained Distributed State

Current UTXO-based smart contracts face an efficiency bottleneck, requiring any transaction sent to a contract to specify the entire updated contract state. This requirement becomes particularly burdensome when the contract state contains dynamic data structures, such as maps, which are needed in many use cases for tracking users' interactions with the contract. The problem is twofold: on the one hand, a large state in transactions implies a large transaction fee; on the other hand, a large centralized state is detrimental to the parallelization of transactions, which should be one of the main selling points of UTXO-based blockchains compared to account-based ones. We propose a technique to efficiently execute smart contracts on an extended UTXO blockchain, which allows the contract state to be distributed across multiple UTXOs. In this way, transactions only need to specify the part of the state they need to access, reducing their size (and fees). We also show how to exploit our model to parallelize the validation of transactions on multi-core CPUs. We implement our technique and provide an empirical validation of its effectiveness.

Updated: 2024-06-11 20:28:27

标题: 通过细粒度分布式状态实现可扩展的UTXO智能合约

摘要: 目前基于UTXO的智能合约面临效率瓶颈，要求发送到合约的任何交易都必须指定整个更新的合约状态。当合约状态包含动态数据结构（例如地图），在许多情况下需要跟踪用户与合约的交互时，这一要求尤为繁重。问题是双重的：一方面，交易中的大状态意味着高额交易费；另一方面，大型集中状态对交易的并行化是有害的，而交易的并行化应该是基于UTXO的区块链相对于基于账户的区块链的主要卖点之一。我们提出了一种在扩展的UTXO区块链上高效执行智能合约的技术，该技术允许将合约状态分布在多个UTXO之间。通过这种方式，交易只需指定它们需要访问的状态部分，从而减小了其大小（和费用）。我们还展示了如何利用我们的模型在多核CPU上并行验证交易。我们实现了我们的技术，并提供了其有效性的经验验证。

更新时间: 2024-06-11 20:28:27

领域: cs.CR

下载: http://arxiv.org/abs/2406.07700v1

CUPID: Contextual Understanding of Prompt-conditioned Image Distributions

We present CUPID: a visualization method for the contextual understanding of prompt-conditioned image distributions. CUPID targets the visual analysis of distributions produced by modern text-to-image generative models, wherein a user can specify a scene via natural language, and the model generates a set of images, each intended to satisfy the user's description. CUPID is designed to help understand the resulting distribution, using contextual cues to facilitate analysis: objects mentioned in the prompt, novel, synthesized objects not explicitly mentioned, and their potential relationships. Central to CUPID is a novel method for visualizing high-dimensional distributions, wherein contextualized embeddings of objects, those found within images, are mapped to a low-dimensional space via density-based embeddings. We show how such embeddings allow one to discover salient styles of objects within a distribution, as well as identify anomalous, or rare, object styles. Moreover, we introduce conditional density embeddings, whereby conditioning on a given object allows one to compare object dependencies within the distribution. We employ CUPID for analyzing image distributions produced by large-scale diffusion models, where our experimental results offer insights on language misunderstanding from such models and biases in object composition, while also providing an interface for discovery of typical, or rare, synthesized scenes.

Updated: 2024-06-11 20:26:41

标题: 丘比特：基于提示条件的图像分布的情境理解

摘要: 我们提出了CUPID：一种用于上下文理解的可视化方法受提示条件影响的图像分布。CUPID旨在可视化现代文本到图像生成模型产生的分布，用户可以通过自然语言指定场景，模型会生成一组每个图像都旨在满足用户描述。CUPID旨在帮助理解结果分布，使用上下文线索促进分析：提示中提到的对象，新颖的，合成的对象不明确提到，以及它们的潜在关系。CUPID的核心是一种新颖的可视化高维分布的方法，其中对象的上下文化嵌入，即在图像中找到的对象，被映射到通过基于密度的嵌入将低维空间。我们展示了这样嵌入使人们能够发现分布中的显着对象风格，并识别异常或稀有的对象风格。此外，我们引入了条件密度嵌入，通过在给定的对象上进行条件化可以比较分布中的对象依赖关系。我们使用CUPID来分析大规模生成的图像分布扩散模型，我们的实验结果提供了关于语言的见解这些模型中的误解和对象构成中的偏见，同时也提供了一个界面来发现典型的或稀有的合成场景。

更新时间: 2024-06-11 20:26:41

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2406.07699v1

Label Smoothing Improves Machine Unlearning

The objective of machine unlearning (MU) is to eliminate previously learned data from a model. However, it is challenging to strike a balance between computation cost and performance when using existing MU techniques. Taking inspiration from the influence of label smoothing on model confidence and differential privacy, we propose a simple gradient-based MU approach that uses an inverse process of label smoothing. This work introduces UGradSL, a simple, plug-and-play MU approach that uses smoothed labels. We provide theoretical analyses demonstrating why properly introducing label smoothing improves MU performance. We conducted extensive experiments on six datasets of various sizes and different modalities, demonstrating the effectiveness and robustness of our proposed method. The consistent improvement in MU performance is only at a marginal cost of additional computations. For instance, UGradSL improves over the gradient ascent MU baseline by 66% unlearning accuracy without sacrificing unlearning efficiency.
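
For readers unfamiliar with label smoothing itself, the standard formulation mixes a one-hot target with a uniform distribution over the K classes: each entry becomes (1 - alpha) * y + alpha / K. The sketch below shows only this standard operation. The abstract does not spell out UGradSL's inverse process or its gradient update, so neither is reproduced here, and the function name and `alpha` parameter are illustrative assumptions.

```python
def smooth_labels(one_hot, alpha):
    """Standard label smoothing: (1 - alpha) * y + alpha / K for each class.

    one_hot: list of 0/1 targets for a single example (K classes).
    alpha:   smoothing strength in [0, 1]; alpha = 0 leaves labels unchanged.
    """
    k = len(one_hot)
    return [(1.0 - alpha) * y + alpha / k for y in one_hot]


# Example: a 4-class target for class index 1, smoothed with alpha = 0.1.
smoothed = smooth_labels([0, 1, 0, 0], alpha=0.1)
```

The smoothed vector still sums to 1, so it remains a valid target distribution for a cross-entropy loss.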

Updated: 2024-06-11 20:26:26

标题: 标签平滑改善机器遗忘

摘要: 机器遗忘（MU）的目标是从模型中消除先前学习的数据。然而，在使用现有的MU技术时，很难在计算成本和性能之间取得平衡。受标签平滑对模型信心和差分隐私的影响的启发，我们提出了一种简单的基于梯度的MU方法，该方法使用了标签平滑的反向过程。这项工作介绍了UGradSL，一种简单的即插即用的MU方法，使用了平滑的标签。我们提供了理论分析，证明了为什么适当引入标签平滑可以改善MU性能。我们在六个不同大小和不同形式的数据集上进行了广泛的实验，展示了我们提出的方法的有效性和稳健性。MU性能的持续改善仅需额外计算的边际成本。例如，UGradSL在不牺牲遗忘效率的情况下，将遗忘准确性提高了66%以上，超过了梯度上升MU基线。

更新时间: 2024-06-11 20:26:26

领域: cs.LG

下载: http://arxiv.org/abs/2406.07698v1

A PRISMA Driven Systematic Review of Publicly Available Datasets for Benchmark and Model Developments for Industrial Defect Detection

Recent advancements in quality control across various industries have increasingly utilized the integration of video cameras and image processing for effective defect detection. A critical barrier to progress is the scarcity of comprehensive datasets featuring annotated defects, which are essential for developing and refining automated defect detection models. This systematic review, spanning from 2015 to 2023, identifies 15 publicly available datasets and critically examines them to assess their effectiveness and applicability for benchmarking and model development. Our findings reveal a diverse landscape of datasets, such as NEU-CLS, NEU-DET, DAGM, KolektorSDD, PCB Defect Dataset, and the Hollow Cylindrical Defect Detection Dataset, each with unique strengths and limitations in terms of image quality, defect type representation, and real-world applicability. The goal of this systematic review is to consolidate these datasets in a single location, providing researchers who seek such publicly available resources with a comprehensive reference.

Updated: 2024-06-11 20:14:59

标题: 一个以PRISMA为驱动的关于工业缺陷检测的公开数据集用于基准和模型发展的系统性审查

摘要: 近年来，各行各业的质量控制领域越来越多地利用视频摄像头和图像处理技术来实现有效的缺陷检测。进展的一个关键障碍是缺乏包含缺陷标注的综合数据集，这些数据集对于开发和完善自动缺陷检测模型至关重要。本系统性回顾涵盖了从2015年到2023年的时间范围，识别了15个公开可用的数据集，并对其进行了批判性审查，以评估它们在基准测试和模型开发方面的效果和适用性。我们的研究结果揭示了多样化的数据集，如NEU-CLS、NEU-DET、DAGM、KolektorSDD、PCB Defect Dataset和Hollow Cylindrical Defect Detection Dataset，每个数据集在图像质量、缺陷类型表示和现实世界适用性方面都有独特的优势和局限性。本系统性回顾的目标是将这些数据集整合到一个地方，为寻找这类公开资源的研究人员提供全面的参考。

更新时间: 2024-06-11 20:14:59

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2406.07694v1

A Labelled Dataset for Sentiment Analysis of Videos on YouTube, TikTok, and Other Sources about the 2024 Outbreak of Measles

This paper presents a dataset containing data on 4011 videos about the ongoing outbreak of measles published on 264 websites on the internet between January 1, 2024, and May 31, 2024. The dataset is available at https://dx.doi.org/10.21227/40s8-xf63. These websites primarily include YouTube and TikTok, which account for 48.6% and 15.2% of the videos, respectively. The remainder of the websites include Instagram and Facebook as well as the websites of various global and local news organizations. For each of these videos, the URL of the video, title of the post, description of the post, and the date of publication of the video are presented as separate attributes in the dataset. After developing this dataset, sentiment analysis (using VADER), subjectivity analysis (using TextBlob), and fine-grain sentiment analysis (using DistilRoBERTa-base) of the video titles and video descriptions were performed. This included classifying each video title and video description into (i) one of the sentiment classes, i.e., positive, negative, or neutral, (ii) one of the subjectivity classes, i.e., highly opinionated, neutral opinionated, or least opinionated, and (iii) one of the fine-grain sentiment classes, i.e., fear, surprise, joy, sadness, anger, disgust, or neutral. These results are presented as separate attributes in the dataset for the training and testing of machine learning algorithms for performing sentiment analysis or subjectivity analysis in this field as well as for other applications. Finally, this paper also presents a list of open research questions that may be investigated using this dataset.

Updated: 2024-06-11 20:14:22

标题: 一个用于YouTube、TikTok和其他来源关于2024年麻疹爆发的视频情感分析的标记数据集

摘要: 这篇论文的工作呈现了一个数据集，其中包含了在2024年1月1日至2024年5月31日期间在264个网站上发布的关于麻疹持续爆发的4011个视频的数据。该数据集可在https://dx.doi.org/10.21227/40s8-xf63 上找到。这些网站主要包括YouTube和TikTok，分别占48.6%和15.2%的视频。其余的网站包括Instagram和Facebook以及各种全球和本地新闻机构的网站。对于每个视频，数据集中都以单独的属性呈现视频的URL、帖子标题、帖子描述和视频发布日期。在开发了这个数据集之后，对视频标题和视频描述进行了情感分析（使用VADER）、主观性分析（使用TextBlob）和细粒度情感分析（使用DistilRoBERTa-base）。这包括将每个视频标题和视频描述分类为（i）情感类别之一，即积极、消极或中性，（ii）主观性类别之一，即高度主观、中性主观或最少主观，以及（iii）细粒度情感类别之一，即恐惧、惊讶、喜悦、悲伤、愤怒、厌恶或中性。这些结果作为数据集中的单独属性呈现，用于训练和测试机器学习算法，用于在这一领域进行情感分析或主观性分析以及其他应用。最后，这篇论文还提出了一系列可能使用该数据集进行研究的开放性研究问题的列表。

更新时间: 2024-06-11 20:14:22

领域: cs.CY,cs.AI,cs.CL,cs.LG,cs.SI

下载: http://arxiv.org/abs/2406.07693v1

AI Radiologist: Revolutionizing Liver Tissue Segmentation with Convolutional Neural Networks and a Clinician-Friendly GUI

Artificial Intelligence (AI) is a pervasive research topic, permeating various sectors and applications. In this study, we harness the power of AI, specifically convolutional neural networks (ConvNets), for segmenting liver tissues. This study also focuses on developing a user-friendly graphical user interface (GUI) tool, "AI Radiologist", enabling clinicians to effectively delineate different liver tissues (parenchyma, tumors, and vessels), thereby saving lives. This endeavor bridges the gap between academic research and practical, industrial applications. The GUI is a single-page application and is designed using the PyQt5 Python framework. The offline-available AI Radiologist resorts to three ConvNet models trained to segment all liver tissues. With respect to the Dice metric, the best liver ConvNet scores 98.16%, the best tumor ConvNet scores 65.95%, and the best vessel ConvNet scores 51.94%. It outputs 2D slices of the liver, tumors, and vessels, along with 3D interpolations in .obj and .mtl formats, which can be visualized/printed using any 3D-compatible software. Thus, the AI Radiologist offers a convenient tool for clinicians to perform liver tissue segmentation and 3D interpolation employing state-of-the-art models for tissue segmentation. With the provided capacity to select the volumes and pre-trained models, the clinicians can leave the rest to the AI Radiologist.
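
The Dice scores quoted above measure the overlap between a predicted mask and a reference mask: twice the number of shared foreground voxels divided by the total foreground in both masks. A minimal sketch on flattened binary masks follows. The function name, and the convention of returning 1.0 when both masks are empty, are assumptions made for illustration and are not taken from the paper.

```python
def dice_coefficient(pred, target):
    """Dice = 2 * |pred AND target| / (|pred| + |target|) for binary masks.

    pred, target: equal-length sequences of 0/1 values (flattened masks).
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of elements")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Assumed convention: two empty masks are treated as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Example: the masks share one foreground voxel out of two each.
score = dice_coefficient([1, 1, 0, 0], [1, 0, 1, 0])
```

Small, thin structures such as vessels lose a large share of their overlap to boundary errors of a few voxels, which is a common reason why vessel Dice scores trail whole-organ scores.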

Updated: 2024-06-11 20:10:16

标题: AI放射科医师：利用卷积神经网络和临床医生友好的图形用户界面改革肝脏组织分割

摘要: 人工智能（AI）是一个普遍的研究课题，渗透到各个领域和应用中。在本研究中，我们利用AI的力量，特别是卷积神经网络（ConvNets），用于肝脏组织的分割。本研究还着重于开发一个用户友好的图形用户界面（GUI）工具，“AI放射科医师”，使临床医生能够有效地描绘不同的肝脏组织（实质、肿瘤和血管），从而挽救生命。这一努力弥合了学术研究与实际工业应用之间的鸿沟。该GUI是一个单页应用程序，使用PyQt5 Python框架设计而成。离线可用的AI放射科医师采用了三个ConvNet模型，经过训练可对所有肝脏组织进行分割。就Dice指标而言，最佳肝脏ConvNet得分为98.16％，最佳肿瘤ConvNet得分为65.95％，最佳血管ConvNet得分为51.94％。它输出肝脏、肿瘤和血管的2D切片，以及以.obj和.mtl格式的3D插值，可使用任何3D兼容软件进行可视化/打印。因此，AI放射科医师为临床医生提供了一个方便的工具，用于进行肝脏组织分割和3D插值，采用最先进的组织分割模型。临床医生可以选择容积和预训练模型，让AI放射科医师处理剩下的工作。

更新时间: 2024-06-11 20:10:16

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2406.07688v1

Adversarial Machine Unlearning

This paper focuses on the challenge of machine unlearning, aiming to remove the influence of specific training data on machine learning models. Traditionally, the development of unlearning algorithms runs parallel with that of membership inference attacks (MIA), a type of privacy threat to determine whether a data instance was used for training. However, the two strands are intimately connected: one can view machine unlearning through the lens of MIA success with respect to removed data. Recognizing this connection, we propose a game-theoretic framework that integrates MIAs into the design of unlearning algorithms. Specifically, we model the unlearning problem as a Stackelberg game in which an unlearner strives to unlearn specific training data from a model, while an auditor employs MIAs to detect the traces of the ostensibly removed data. Adopting this adversarial perspective allows the utilization of new attack advancements, facilitating the design of unlearning algorithms. Our framework stands out in two ways. First, it takes an adversarial approach and proactively incorporates the attacks into the design of unlearning algorithms. Secondly, it uses implicit differentiation to obtain the gradients that limit the attacker's success, thus benefiting the process of unlearning. We present empirical results to demonstrate the effectiveness of the proposed approach for machine unlearning.

Updated: 2024-06-11 20:07:22

标题: 对抗性机器遗忘

摘要: 本文关注机器遗忘的挑战，旨在消除特定训练数据对机器学习模型的影响。传统上，遗忘算法的发展与成员推理攻击（MIA）并行进行，MIA是一种确定数据实例是否用于训练的隐私威胁。然而，这两个方面是密切相关的：人们可以通过MIA成功来看待机器遗忘，就其对已删除数据的影响而言。认识到这一联系，我们提出了一个将MIAs整合到遗忘算法设计中的博弈理论框架。具体地，我们将遗忘问题建模为一个Stackelberg博弈，在该博弈中，一个遗忘者努力从模型中遗忘特定的训练数据，而一名审计员则使用MIAs来检测表面上删除的数据的痕迹。采用这种对抗性视角允许利用新的攻击进展，促进遗忘算法的设计。我们的框架在两个方面脱颖而出。首先，它采取对抗性方法，并主动将攻击整合到遗忘算法的设计中。其次，它使用隐式微分来获取限制攻击者成功的梯度，从而有利于遗忘的过程。我们提供实证结果来展示所提出的机器遗忘方法的有效性。

更新时间: 2024-06-11 20:07:22

领域: cs.LG,cs.CR

下载: http://arxiv.org/abs/2406.07687v1

Out-Of-Context Prompting Boosts Fairness and Robustness in Large Language Model Predictions

Frontier Large Language Models (LLMs) are increasingly being deployed for high-stakes decision-making. On the other hand, these models are still consistently making predictions that contradict users' or society's expectations, e.g., hallucinating, or discriminating. Thus, it is important that we develop test-time strategies to improve their trustworthiness. Inspired by prior work, we leverage causality as a tool to formally encode two aspects of trustworthiness in LLMs: fairness and robustness. Under this perspective, existing test-time solutions explicitly instructing the model to be fair or robust implicitly depend on the LLM's causal reasoning capabilities. In this work, we explore the opposite approach. Instead of explicitly asking the LLM for trustworthiness, we design prompts to encode the underlying causal inference algorithm that will, by construction, result in more trustworthy predictions. Concretely, we propose out-of-context prompting as a test-time solution to encourage fairness and robustness in LLMs. Out-of-context prompting leverages the user's prior knowledge of the task's causal model to apply (random) counterfactual transformations and improve the model's trustworthiness. Empirically, we show that out-of-context prompting consistently improves the fairness and robustness of frontier LLMs across five different benchmark datasets without requiring additional data, finetuning or pre-training.

Updated: 2024-06-11 20:05:15

标题: 脱离语境提示提升大型语言模型预测中的公平性和稳健性

摘要: 前沿大型语言模型（LLMs）越来越多地应用于重要决策中。另一方面，这些模型仍然经常做出与用户或社会期望相矛盾的预测，例如产生幻觉或歧视。因此，我们需要开发测试时间策略来提高它们的可信度。受先前工作启发，我们利用因果关系作为一种工具，正式编码LLMs中的两个可信度方面：公平性和鲁棒性。从这个角度来看，现有的测试时间解决方案明确要求模型公平或鲁棒，隐含地依赖于LLMs的因果推理能力。在这项工作中，我们探索了相反的方法。我们设计提示来编码潜在的因果推理算法，从而通过构造更可信的预测。具体来说，我们提出了脱离上下文提示作为测试时间解决方案，以鼓励LLMs中的公平性和鲁棒性。脱离上下文提示利用用户对任务因果模型的先验知识，应用（随机）反事实转换来提高模型的可信度。实证上，我们展示了脱离上下文提示在五个不同基准数据集上持续提高前沿LLMs的公平性和鲁棒性，而无需额外数据、微调或预训练。

更新时间: 2024-06-11 20:05:15

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.07685v1

Impact of AI-tooling on the Engineering Workspace

To understand the impacts of AI-driven coding tools on engineers' workflow and work environment, we utilize the Jellyfish platform to analyze indicators of change. Key indicators are derived from Allocations, Coding Fraction vs. PR Fraction, Lifecycle Phases, Cycle Time, Jira ticket size, PR pickup time, PR comments, PR comment count, interactions, and coding languages. Significant changes were observed in coding time fractions among Copilot users, with an average decrease of 3% with individual decreases as large as 15%. Ticket sizes decreased by an average of 16% across four companies, accompanied by an 8% decrease in cycle times, whereas the control group showed no change. Additionally, the PR process evolved with Copilot usage, featuring longer and more comprehensive comments, despite the weekly number of PRs reviewed remaining constant. Not all hypothesized changes were observed across all participating companies. However, some companies experienced a decrease in PR pickup times by up to 33%, indicating reduced workflow bottlenecks, and one company experienced a shift of up to 17% of effort from maintenance and support work towards product growth initiatives. This study is the first to utilize data from more than one company and goes beyond simple productivity and satisfaction measures, considering real-world engineering settings instead. By doing so, we highlight that some companies seem to benefit more than others from the use of Copilot and that changes can be subtle when investigating aggregates rather than specific aspects of engineering work and workflows - something that will be further investigated in the future.

Updated: 2024-06-11 20:04:09

标题: 人工智能工具对工程工作空间的影响

摘要: 为了了解人工智能驱动的编码工具对工程师的工作流程和工作环境的影响，我们利用Jellyfish平台分析变化指标。关键指标来自分配、编码比例与PR比例、生命周期阶段、周期时间、Jira工单大小、PR接收时间、PR评论、PR评论数量、交互和编码语言。在Copilot用户中观察到编码时间分数的显著变化，平均减少了3%，个体减少高达15%。四家公司的工单大小平均减少了16%，伴随着8%的周期时间减少，而对照组没有变化。此外，随着Copilot的使用，PR流程得到改进，评论更长更全面，尽管每周审查的PR数量保持不变。并非所有假设的变化都在所有参与的公司中观察到。然而，一些公司的PR接收时间减少了高达33%，表明工作流程瓶颈减少，一家公司将高达17%的工作量从维护和支持工作转移到产品增长倡议上。这项研究是第一个利用超过一家公司的数据，并超越简单的生产力和满意度指标，考虑真实的工程环境。通过这样做，我们强调一些公司似乎比其他公司更受益于使用Copilot，而在调查工程工作和工作流程的总体而非特定方面时，变化可能是微妙的，这将在未来进一步研究。

更新时间: 2024-06-11 20:04:09

领域: cs.SE,cs.AI,cs.CY

下载: http://arxiv.org/abs/2406.07683v1

Semantic Similarity Loss for Neural Source Code Summarization

This paper presents a procedure for and evaluation of using a semantic similarity metric as a loss function for neural source code summarization. Code summarization is the task of writing natural language descriptions of source code. Neural code summarization refers to automated techniques for generating these descriptions using neural networks. Almost all current approaches involve neural networks as either standalone models or as part of pretrained large language models, e.g., GPT, Codex, LLaMA. Yet almost all also use a categorical cross-entropy (CCE) loss function for network optimization. Two problems with CCE are that 1) it computes loss over each word prediction one-at-a-time, rather than evaluating a whole sentence, and 2) it requires a perfect prediction, leaving no room for partial credit for synonyms. In this paper, we extend our previous work on semantic similarity metrics to show a procedure for using semantic similarity as a loss function to alleviate these problems, and we evaluate this procedure in several settings in both metrics-driven and human studies. In essence, we propose to use a semantic similarity metric to calculate loss over the whole output sentence prediction per training batch, rather than just loss for each word. We also propose to combine our loss with CCE for each word, which streamlines the training process compared to baselines. We evaluate our approach over several baselines and report improvement in the vast majority of conditions.
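
The two CCE problems named above can be seen in a small worked example. Per-word CCE is the negative log of the probability the model assigns to the reference word, so a model that confidently predicts a synonym is penalized as heavily as one that predicts an unrelated word. The sketch below uses a toy vocabulary and made-up probabilities. It illustrates only the CCE baseline, not the semantic similarity loss the paper proposes, and averaging over positions is an assumed convention.

```python
import math


def word_cce(probs, reference_word):
    """Categorical cross-entropy at one position: -log p(reference word)."""
    return -math.log(probs[reference_word])


def sentence_cce(step_probs, reference):
    """Mean of the per-word CCE values; each position is scored independently."""
    losses = [word_cce(p, w) for p, w in zip(step_probs, reference)]
    return sum(losses) / len(losses)


# Reference summary word is "gets"; the toy model prefers the synonym "returns".
synonym_prediction = {"gets": 0.05, "returns": 0.90, "deletes": 0.05}
unrelated_prediction = {"gets": 0.05, "returns": 0.05, "deletes": 0.90}

# Both predictions receive the same loss, because only p("gets") matters.
loss_synonym = word_cce(synonym_prediction, "gets")
loss_unrelated = word_cce(unrelated_prediction, "gets")
```

A sentence-level semantic similarity score, by contrast, can reward the first prediction over the second.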

Updated: 2024-06-11 19:57:56

标题: 神经源代码摘要的语义相似性损失

摘要: 本文提出了一种程序，并评估了将语义相似性度量作为神经源代码摘要的损失函数的方法。代码摘要是编写源代码的自然语言描述的任务。神经源代码摘要是指使用神经网络生成这些描述的自动化技术。几乎所有当前的方法都涉及将神经网络作为独立模型或作为预训练大型语言模型的一部分，例如GPT、Codex、LLaMA。然而，几乎所有方法都使用分类交叉熵（CCE）损失函数进行网络优化。CCE存在两个问题，即1）它逐个计算每个单词预测的损失，而不是评估整个句子，2）它需要完美预测，不留任何余地给同义词的部分得分。在本文中，我们扩展了我们先前关于语义相似性度量的工作，展示了使用语义相似性作为损失函数的程序来缓解这个问题，并在度量驱动和人类研究中对这个程序进行了评估。本质上，我们建议使用语义相似性度量来计算整个输出句子预测每个训练批次的损失，而不仅仅是每个单词的损失。我们还建议将我们的损失与每个单词的CCE结合起来，与基线相比，这简化了训练过程。我们在几个基线上评估了我们的方法，并报告在绝大多数条件下的改进。

更新时间: 2024-06-11 19:57:56

领域: cs.SE,cs.AI

下载: http://arxiv.org/abs/2308.07429v2

Federated Representation Learning in the Under-Parameterized Regime

Federated representation learning (FRL) is a popular personalized federated learning (FL) framework where clients work together to train a common representation while retaining their personalized heads. Existing studies, however, largely focus on the over-parameterized regime. In this paper, we make an initial effort to investigate FRL in the under-parameterized regime, where the FL model is insufficient to express the variations in all ground-truth models. We propose a novel FRL algorithm, FLUTE, and theoretically characterize its sample complexity and convergence rate for linear models in the under-parameterized regime. To the best of our knowledge, this is the first FRL algorithm with provable performance guarantees in this regime. FLUTE features a data-independent random initialization and a carefully designed objective function that aids the distillation of the subspace spanned by the global optimal representation from the misaligned local representations. On the technical side, we bridge low-rank matrix approximation techniques with the FL analysis, which may be of broad interest. We also extend FLUTE beyond linear representations. Experimental results demonstrate that FLUTE outperforms state-of-the-art FRL solutions in both synthetic and real-world tasks.

Updated: 2024-06-11 19:51:26

标题: 在欠参数化范式下的联合表示学习

摘要: 联邦表示学习（FRL）是一种流行的个性化联邦学习（FL）框架，其中客户端共同工作以训练一个共同的表示，同时保留他们的个性化头部。然而，现有研究主要集中在过度参数化的情况下。本文首次尝试在欠参数化的情况下研究FRL，即FL模型不足以表达所有地面真实模型的变化。我们提出了一种新颖的FRL算法FLUTE，并在欠参数化的情况下理论上表征了其线性模型的样本复杂度和收敛速度。据我们所知，这是第一个在这个领域具有可证明性能保证的FRL算法。FLUTE具有数据独立的随机初始化和精心设计的目标函数，有助于从不一致的局部表示中提炼出全局最优表示所跨越的子空间。在技术方面，我们将低秩矩阵逼近技术与FL分析联系起来，这可能具有广泛的兴趣。我们还将FLUTE扩展到线性表示之外。实验结果表明，FLUTE在合成和真实世界任务中优于最先进的FRL解决方案。

更新时间: 2024-06-11 19:51:26

领域: cs.LG

下载: http://arxiv.org/abs/2406.04596v3

FastAST: Accelerating Audio Spectrogram Transformer via Token Merging and Cross-Model Knowledge Distillation

Audio classification models, particularly the Audio Spectrogram Transformer (AST), play a crucial role in efficient audio analysis. However, optimizing their efficiency without compromising accuracy remains a challenge. In this paper, we introduce FastAST, a framework that integrates Token Merging (ToMe) into the AST framework. FastAST enhances inference speed without requiring extensive retraining by merging similar tokens in audio spectrograms. Furthermore, during training, FastAST brings about significant speed improvements. The experiments indicate that FastAST can increase audio classification throughput with minimal impact on accuracy. To mitigate the accuracy impact, we integrate Cross-Model Knowledge Distillation (CMKD) into the FastAST framework. Integrating ToMe and CMKD into AST results in improved accuracy compared to AST while maintaining faster inference speeds. FastAST represents a step towards real-time, resource-efficient audio analysis.
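
To make the idea of merging similar tokens concrete, here is a deliberately simplified sketch. It splits the tokens into two alternating sets, pairs each token in the first set with its most similar token in the second by cosine similarity, and averages the `r` most similar pairs. This is an illustrative assumption about how such merging can work, not FastAST's actual implementation: published Token Merging variants track token sizes for weighted averaging and run inside attention blocks, and the function names here are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def merge_tokens(tokens, r):
    """Reduce the token count by up to r via averaging similar tokens.

    tokens: list of feature vectors (lists of floats).
    r:      number of tokens to merge away.
    """
    a_idx = list(range(0, len(tokens), 2))  # set A: even positions
    b_idx = list(range(1, len(tokens), 2))  # set B: odd positions
    if not b_idx or r <= 0:
        return [list(t) for t in tokens]

    # For each A token, find its most similar B token.
    matches = []
    for i in a_idx:
        j = max(b_idx, key=lambda k: cosine(tokens[i], tokens[k]))
        matches.append((cosine(tokens[i], tokens[j]), i, j))

    # Keep only the r most similar pairs.
    matches.sort(reverse=True)
    chosen = matches[:r]
    merged_away = {i for _, i, _ in chosen}
    groups = {j: [tokens[j]] for j in b_idx}
    for _, i, j in chosen:
        groups[j].append(tokens[i])

    # Rebuild the sequence in order, replacing each B token by its group mean.
    out = []
    for idx, tok in enumerate(tokens):
        if idx in merged_away:
            continue
        if idx in groups:
            members = groups[idx]
            out.append([sum(col) / len(members) for col in zip(*members)])
        else:
            out.append(list(tok))
    return out
```

Because later layers then process fewer tokens, a reduction like this is where the throughput gain would come from.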

Updated: 2024-06-11 19:50:50

标题: FastAST：通过令牌合并和跨模型知识蒸馏加速音频频谱图变换器

摘要: 音频分类模型，尤其是音频频谱图变换器（AST），在高效的音频分析中起着至关重要的作用。然而，在不影响准确性的情况下优化它们的效率仍然是一个挑战。在本文中，我们介绍了FastAST，这是一个将Token Merging（ToMe）集成到AST框架中的框架。FastAST通过合并音频频谱图中相似的标记来提高推理速度，而无需进行大量的重新训练。此外，在训练过程中，FastAST带来了显著的速度改进。实验表明，FastAST可以提高音频分类的吞吐量，对准确性的影响最小。为了减少准确性的影响，我们将交叉模型知识蒸馏（CMKD）集成到FastAST框架中。将ToMe和CMKD集成到AST中会比AST提高准确性，同时保持更快的推理速度。FastAST代表了朝着实时、资源高效的音频分析迈出的一步。

更新时间: 2024-06-11 19:50:50

领域: cs.SD,cs.AI,cs.LG,cs.MM,eess.AS,68T10

下载: http://arxiv.org/abs/2406.07676v1

Larimar: Large Language Models with Episodic Memory Control

Efficient and accurate updating of knowledge stored in Large Language Models (LLMs) is one of the most pressing research challenges today. This paper presents Larimar - a novel, brain-inspired architecture for enhancing LLMs with a distributed episodic memory. Larimar's memory allows for dynamic, one-shot updates of knowledge without the need for computationally expensive re-training or fine-tuning. Experimental results on multiple fact editing benchmarks demonstrate that Larimar not only attains accuracy comparable to most competitive baselines, even in the challenging sequential editing setup, but also excels in speed - yielding speed-ups of 8-10x depending on the base LLM - as well as flexibility due to the proposed architecture being simple, LLM-agnostic, and hence general. We further provide mechanisms for selective fact forgetting, information leakage prevention, and input context length generalization with Larimar and show their effectiveness. Our code is available at https://github.com/IBM/larimar

Updated: 2024-06-11 19:50:17

标题: 蓝海石：具有情节记忆控制的大型语言模型

摘要: 大型语言模型（LLMs）中知识的高效准确更新是当今最紧迫的研究挑战之一。本文提出了Larimar - 一种新颖的、受大脑启发的架构，用于增强LLMs的分布式情节记忆。Larimar的记忆允许动态、一次性更新知识，无需进行计算昂贵的重新训练或微调。在多个事实编辑基准测试中的实验结果表明，Larimar在挑战性的顺序编辑设置中达到与大多数竞争基准相当的准确性，但在速度上也表现出色 - 根据基础LLM的不同，速度提升了8-10倍，同时由于所提出的架构简单、不依赖于LLM，因此具有灵活性和通用性。我们进一步提供了选择性事实遗忘、信息泄漏预防和输入上下文长度泛化的机制，并展示了它们的有效性。我们的代码可在https://github.com/IBM/larimar 上找到。

更新时间: 2024-06-11 19:50:17

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2403.11901v2

Self-Improvement for Neural Combinatorial Optimization: Sample without Replacement, but Improvement

Current methods for end-to-end constructive neural combinatorial optimization usually train a policy using behavior cloning from expert solutions or policy gradient methods from reinforcement learning. While behavior cloning is straightforward, it requires expensive expert solutions, and policy gradient methods are often computationally demanding and complex to fine-tune. In this work, we bridge the two and simplify the training process by sampling multiple solutions for random instances using the current model in each epoch and then selecting the best solution as an expert trajectory for supervised imitation learning. To achieve progressively improving solutions with minimal sampling, we introduce a method that combines round-wise Stochastic Beam Search with an update strategy derived from a provable policy improvement. This strategy refines the policy between rounds by utilizing the advantage of the sampled sequences with almost no computational overhead. We evaluate our approach on the Traveling Salesman Problem and the Capacitated Vehicle Routing Problem. The models trained with our method achieve comparable performance and generalization to those trained with expert data. Additionally, we apply our method to the Job Shop Scheduling Problem using a transformer-based architecture and outperform existing state-of-the-art methods by a wide margin.
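
The core loop described above (sample several solutions per instance with the current model, keep the best one as a pseudo-expert target) can be sketched for the Traveling Salesman Problem as follows. This is only an illustration under strong assumptions: a uniform random permutation stands in for sampling from the learned policy, and the round-wise Stochastic Beam Search and the policy update from the paper are not implemented. All function names are hypothetical.

```python
import math
import random


def tour_length(points, tour):
    """Total length of the closed tour that visits points in the given order."""
    n = len(tour)
    return sum(
        math.dist(points[tour[i]], points[tour[(i + 1) % n]]) for i in range(n)
    )


def sample_tour(num_cities, rng):
    """Stand-in for sampling from the current policy: a random permutation."""
    tour = list(range(num_cities))
    rng.shuffle(tour)
    return tour


def best_of_samples(points, num_samples, rng):
    """Sample several tours and keep the shortest as the imitation target."""
    samples = [sample_tour(len(points), rng) for _ in range(num_samples)]
    return min(samples, key=lambda t: tour_length(points, t))


# Example: pick a pseudo-expert tour for one random 10-city instance.
rng = random.Random(0)
instance = [(rng.random(), rng.random()) for _ in range(10)]
expert_tour = best_of_samples(instance, num_samples=32, rng=rng)
```

In the full method, the selected tour would then serve as the supervised target for the next imitation-learning step, so the policy being sampled improves from epoch to epoch.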

Updated: 2024-06-11 19:48:33

Categories: cs.LG

Download: http://arxiv.org/abs/2403.15180v2

BCAmirs at SemEval-2024 Task 4: Beyond Words: A Multimodal and Multilingual Exploration of Persuasion in Memes

Memes, combining text and images, frequently use metaphors to convey persuasive messages, shaping public opinion. Motivated by this, our team engaged in SemEval-2024 Task 4, a hierarchical multi-label classification task designed to identify rhetorical and psychological persuasion techniques embedded within memes. To tackle this problem, we introduced a caption generation step to assess the modality gap and the impact of additional semantic information from images, which improved our result. Our best model utilizes GPT-4 generated captions alongside meme text to fine-tune RoBERTa as the text encoder and CLIP as the image encoder. It outperforms the baseline by a large margin in all 12 subtasks. In particular, it ranked in top-3 across all languages in Subtask 2a, and top-4 in Subtask 2b, demonstrating quantitatively strong performance. The improvement achieved by the introduced intermediate step is likely attributable to the metaphorical essence of images that challenges visual encoders. This highlights the potential for improving abstract visual semantics encoding.

Updated: 2024-06-11 19:34:19

Categories: cs.CL,cs.CV,cs.IT,cs.LG,math.IT

Download: http://arxiv.org/abs/2404.03022v2

Beyond Words: On Large Language Models Actionability in Mission-Critical Risk Analysis

Context. Risk analysis assesses potential risks in specific scenarios. Risk analysis principles are context-less; the same methodology can be applied to a risk connected to health and information technology security. Risk analysis requires a vast knowledge of national and international regulations and standards and is time and effort-intensive. A large language model can summarize information faster than a human and can be fine-tuned to specific tasks. Aim. Our empirical study aims to investigate the effectiveness of Retrieval-Augmented Generation (RAG) and fine-tuned LLMs in risk analysis. To our knowledge, no prior study has explored their capabilities in risk analysis. Method. We manually curated \totalscenarios unique scenarios leading to \totalsamples representative samples from over 50 mission-critical analyses archived by the industrial context team in the last five years. We compared the base GPT-3.5 and GPT-4 models against their Retrieval-Augmented Generation and fine-tuned counterparts. We employed two human experts (HEs) as competitors of the models and three other human experts to review the analyses of the models and of the former human experts. The reviewers analyzed 5,000 scenario analyses. Results and Conclusions. HEs demonstrated higher accuracy, but LLMs are quicker and more actionable. Moreover, our findings show that RAG-assisted LLMs have the lowest hallucination rates, effectively uncovering hidden risks and complementing human expertise. Thus, the choice of model depends on specific needs, with fine-tuned models (FTMs) for accuracy, RAG for hidden risk discovery, and base models for comprehensiveness and actionability. Therefore, experts can leverage LLMs as an effective complementary companion in risk analysis within a condensed timeframe. They can also save costs by averting unnecessary expenses associated with implementing unwarranted countermeasures.

Updated: 2024-06-11 19:20:27

Categories: cs.CL,cs.AI,cs.CR,cs.HC

Download: http://arxiv.org/abs/2406.10273v1

Loss Landscape of Shallow ReLU-like Neural Networks: Stationary Points, Saddle Escaping, and Network Embedding

In this paper, we investigate the loss landscape of one-hidden-layer neural networks with ReLU-like activation functions trained with the empirical squared loss. As the activation function is non-differentiable, it is so far unclear how to completely characterize the stationary points. We propose the conditions for stationarity that apply to both non-differentiable and differentiable cases. Additionally, we show that, if a stationary point does not contain "escape neurons", which are defined with first-order conditions, then it must be a local minimum. Moreover, for the scalar-output case, the presence of an escape neuron guarantees that the stationary point is not a local minimum. Our results refine the description of the saddle-to-saddle training process starting from infinitesimally small (vanishing) initialization for shallow ReLU-like networks, linking saddle escaping directly with the parameter changes of escape neurons. Moreover, we are also able to fully discuss how network embedding, which is to instantiate a narrower network within a wider network, reshapes the stationary points.

Updated: 2024-06-11 19:08:58

Categories: cs.LG

Download: http://arxiv.org/abs/2402.05626v4

Progress Towards Decoding Visual Imagery via fNIRS

We demonstrate the possibility of reconstructing images from fNIRS brain activity and start building a prototype to match the required specs. By training an image reconstruction model on downsampled fMRI data, we discovered that cm-scale spatial resolution is sufficient for image generation. We obtained 71% retrieval accuracy with 1-cm resolution, compared to 93% on the full-resolution fMRI, and 20% with 2-cm resolution. With simulations and high-density tomography, we found that time-domain fNIRS can achieve 1-cm resolution, compared to 2-cm resolution for continuous-wave fNIRS. Lastly, we share designs for a prototype time-domain fNIRS device, consisting of a laser driver, a single photon detector, and a time-to-digital converter system.

Updated: 2024-06-11 19:08:32

Categories: eess.IV,cs.AI,cs.CV,cs.LG,q-bio.NC

Download: http://arxiv.org/abs/2406.07662v1

Treeffuser: Probabilistic Predictions via Conditional Diffusions with Gradient-Boosted Trees

Probabilistic prediction aims to compute predictive distributions rather than single-point predictions. These distributions enable practitioners to quantify uncertainty, compute risk, and detect outliers. However, most probabilistic methods assume parametric responses, such as Gaussian or Poisson distributions. When these assumptions fail, such models lead to bad predictions and poorly calibrated uncertainty. In this paper, we propose Treeffuser, an easy-to-use method for probabilistic prediction on tabular data. The idea is to learn a conditional diffusion model where the score function is estimated using gradient-boosted trees. The conditional diffusion model makes Treeffuser flexible and non-parametric, while the gradient-boosted trees make it robust and easy to train on CPUs. Treeffuser learns well-calibrated predictive distributions and can handle a wide range of regression tasks, including those with multivariate, multimodal, and skewed responses. We study Treeffuser on synthetic and real data and show that it outperforms existing methods, providing better-calibrated probabilistic predictions. We further demonstrate its versatility with an application to inventory allocation under uncertainty using sales data from Walmart. We implement Treeffuser in https://github.com/blei-lab/treeffuser.

Updated: 2024-06-11 18:59:24

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2406.07658v1

OPTune: Efficient Online Preference Tuning

Reinforcement learning with human feedback (RLHF) is critical for aligning Large Language Models (LLMs) with human preference. Compared to the widely studied offline version of RLHF, e.g., direct preference optimization (DPO), recent works have shown that the online variants achieve even better alignment. However, online alignment requires on-the-fly generation of new training data, which is costly, hard to parallelize, and suffers from varying quality and utility. In this paper, we propose a more efficient data exploration strategy for online preference tuning (OPTune), which does not rely on human-curated or pre-collected teacher responses but dynamically samples informative responses for on-policy preference alignment. During data generation, OPTune only selects prompts whose (re)generated responses can potentially provide more informative and higher-quality training signals than the existing responses. In the training objective, OPTune reweights each generated response (pair) by its utility in improving the alignment so that learning can be focused on the most helpful samples. Throughout our evaluations, OPTune'd LLMs maintain the instruction-following benefits provided by standard preference tuning whilst enjoying 1.27-1.56x faster training speed due to the efficient data exploration strategy.

Updated: 2024-06-11 18:55:04

Categories: cs.LG,cs.CL

Download: http://arxiv.org/abs/2406.07657v1

FP-Inconsistent: Detecting Evasive Bots using Browser Fingerprint Inconsistencies

As browser fingerprinting is increasingly being used for bot detection, bots have started altering their fingerprints for evasion. We conduct the first large-scale evaluation of evasive bots to investigate whether and how altering fingerprints helps bots evade detection. To systematically investigate evasive bots, we deploy a honey site incorporating two anti-bot services (DataDome and BotD) and solicit bot traffic from 20 different bot services that purport to sell "realistic and undetectable traffic". Across half a million requests from 20 different bot services on our honey site, we find an average evasion rate of 52.93% against DataDome and 44.56% evasion rate against BotD. Our comparison of fingerprint attributes from bot services that evade each anti-bot service individually as well as bot services that evade both shows that bot services indeed alter different browser fingerprint attributes for evasion. Further, our analysis reveals the presence of inconsistent fingerprint attributes in evasive bots. Given evasive bots seem to have difficulty in ensuring consistency in their fingerprint attributes, we propose a data-driven approach to discover rules to detect such inconsistencies across space (two attributes in a given browser fingerprint) and time (a single attribute at two different points in time). These rules, which can be readily deployed by anti-bot services, reduce the evasion rate of evasive bots against DataDome and BotD by 48.11% and 44.95% respectively.
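The two kinds of inconsistency named above, across space and across time, can be illustrated with a small sketch. The attribute names (`user_agent`, `platform`, `timezone`) and the single hand-written rule are hypothetical; the paper discovers its rules from data, which this example does not attempt.

```python
def spatial_inconsistency(fingerprint):
    """Flag two attributes of one fingerprint that contradict each other.

    Hypothetical rule: a user agent claiming an iPhone should not report a
    Windows platform string.
    """
    user_agent = fingerprint.get("user_agent", "")
    platform = fingerprint.get("platform", "")
    return "iPhone" in user_agent and platform.startswith("Win")


def temporal_inconsistency(earlier, later, attribute):
    """Flag one attribute that changes between two requests of the same client."""
    return earlier.get(attribute) != later.get(attribute)


if __name__ == "__main__":
    request = {"user_agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0)", "platform": "Win32"}
    print(spatial_inconsistency(request))
    print(temporal_inconsistency({"timezone": "UTC"}, {"timezone": "Asia/Tokyo"}, "timezone"))
```

A deployed detector would need many such rules and a way to tolerate legitimate changes, for example a user who travels between time zones.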

Updated: 2024-06-11 18:26:17

Categories: cs.CR

Download: http://arxiv.org/abs/2406.07647v1

Pre-training Feature Guided Diffusion Model for Speech Enhancement

Speech enhancement significantly improves the clarity and intelligibility of speech in noisy environments, improving communication and listening experiences. In this paper, we introduce a novel pretraining feature-guided diffusion model tailored for efficient speech enhancement, addressing the limitations of existing discriminative and generative models. By integrating spectral features into a variational autoencoder (VAE) and leveraging pre-trained features for guidance during the reverse process, coupled with the utilization of the deterministic discrete integration method (DDIM) to streamline sampling steps, our model improves efficiency and speech enhancement quality. Demonstrating state-of-the-art results on two public datasets with different SNRs, our model outshines other baselines in efficiency and robustness. The proposed method not only optimizes performance but also enhances practical deployment capabilities, without increasing computational demands.

Updated: 2024-06-11 18:22:59

Categories: cs.SD,cs.AI,cs.LG,eess.AS

Download: http://arxiv.org/abs/2406.07646v1

Can Transformers Learn Optimal Filtering for Unknown Systems?

Transformer models have shown great success in natural language processing; however, their potential remains mostly unexplored for dynamical systems. In this work, we investigate the optimal output estimation problem using transformers, which generate output predictions using all the past ones. Particularly, we train the transformer using various distinct systems and then evaluate the performance on unseen systems with unknown dynamics. Empirically, the trained transformer adapts exceedingly well to different unseen systems and even matches the optimal performance given by the Kalman filter for linear systems. In more complex settings with non-i.i.d. noise, time-varying dynamics, and nonlinear dynamics like a quadrotor system with unknown parameters, transformers also demonstrate promising results. To support our experimental findings, we provide statistical guarantees that quantify the amount of training data required for the transformer to achieve a desired excess risk. Finally, we point out some limitations by identifying two classes of problems that lead to degraded performance, highlighting the need for caution when using transformers for control and estimation.
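For readers unfamiliar with the baseline mentioned above, here is a minimal sketch of one predict-and-update step of a scalar Kalman filter. It assumes a known linear system x' = a*x + w, y = c*x + v with process noise variance `q` and measurement noise variance `r`; these symbols are chosen for the example and do not come from the paper.

```python
def kalman_step(x_est, p_est, y, a, c, q, r):
    """One predict/update step of a scalar Kalman filter.

    x_est, p_est: previous state estimate and its error variance.
    y: new measurement. a, c: state and output coefficients.
    q, r: process and measurement noise variances.
    """
    # Predict the next state and the variance of that prediction.
    x_pred = a * x_est
    p_pred = a * p_est * a + q
    # Weigh the measurement against the prediction via the Kalman gain.
    gain = p_pred * c / (c * p_pred * c + r)
    x_new = x_pred + gain * (y - c * x_pred)
    p_new = (1.0 - gain * c) * p_pred
    return x_new, p_new


if __name__ == "__main__":
    print(kalman_step(x_est=0.0, p_est=1.0, y=2.0, a=1.0, c=1.0, q=0.0, r=1.0))
```

The filter needs `a`, `c`, `q`, and `r` up front, whereas the abstract's setting evaluates the transformer on systems whose dynamics are unknown.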

Updated: 2024-06-11 18:18:55

Categories: eess.SY,cs.AI,cs.LG,cs.SY

Download: http://arxiv.org/abs/2308.08536v3

Generating Human Understandable Explanations for Node Embeddings

Node embedding algorithms produce low-dimensional latent representations of nodes in a graph. These embeddings are often used for downstream tasks, such as node classification and link prediction. In this paper, we investigate the following two questions: (Q1) Can we explain each embedding dimension with human-understandable graph features (e.g., degree, clustering coefficient, and PageRank)? (Q2) How can we modify existing node embedding algorithms to produce embeddings that can be easily explained by human-understandable graph features? We find that the answer to Q1 is yes and introduce a new framework called XM (short for eXplain eMbedding) to answer Q2. A key aspect of XM involves minimizing the nuclear norm of the generated explanations. We show that by minimizing the nuclear norm, we minimize the lower bound on the entropy of the generated explanations. We test XM on a variety of real-world graphs and show that XM not only preserves the performance of existing node embedding methods, but also enhances their explainability.
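The nuclear norm of a matrix is the sum of its singular values, so it is small when the matrix is close to low rank. The sketch below illustrates only that property on 2x2 matrices, using the identity (s1 + s2)^2 = ||M||_F^2 + 2|det M|; treating the matrices as "explanation" matrices is an assumption of the example, and this is not the XM objective itself.

```python
import math


def nuclear_norm_2x2(matrix):
    """Sum of the two singular values of a 2x2 matrix.

    Uses s1^2 + s2^2 = squared Frobenius norm and s1 * s2 = |det|.
    """
    (a, b), (c, d) = matrix
    frobenius_sq = a * a + b * b + c * c + d * d
    determinant = a * d - b * c
    return math.sqrt(frobenius_sq + 2.0 * abs(determinant))


if __name__ == "__main__":
    full_rank = [[1.0, 0.0], [0.0, 1.0]]  # weight spread over two directions
    rank_one = [[1.0, 1.0], [0.0, 0.0]]   # same Frobenius norm, single direction
    print(nuclear_norm_2x2(full_rank), nuclear_norm_2x2(rank_one))
```

Both matrices have the same Frobenius norm, yet the rank-one matrix has the smaller nuclear norm, which is why the penalty favors concentrated structure.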

Updated: 2024-06-11 18:16:28

Categories: cs.LG,cs.SI

Download: http://arxiv.org/abs/2406.07642v1

When is an Embedding Model More Promising than Another?

Embedders play a central role in machine learning, projecting any object into numerical representations that can, in turn, be leveraged to perform various downstream tasks. The evaluation of embedding models typically depends on domain-specific empirical approaches utilizing downstream tasks, primarily because of the lack of a standardized framework for comparison. However, acquiring adequately large and representative datasets for conducting these assessments is not always viable and can prove to be prohibitively expensive and time-consuming. In this paper, we present a unified approach to evaluate embedders. First, we establish theoretical foundations for comparing embedding models, drawing upon the concepts of sufficiency and informativeness. We then leverage these concepts to devise a tractable comparison criterion (information sufficiency), leading to a task-agnostic and self-supervised ranking procedure. We demonstrate experimentally that our approach aligns closely with the capability of embedding models to facilitate various downstream tasks in both natural language processing and molecular biology. This effectively offers practitioners a valuable tool for prioritizing model trials.

Updated: 2024-06-11 18:13:46

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2406.07640v1

Knowledge Distillation of LLM for Automatic Scoring of Science Education Assessments

This study proposes a method for knowledge distillation (KD) of fine-tuned Large Language Models (LLMs) into smaller, more efficient, and accurate neural networks. We specifically target the challenge of deploying these models on resource-constrained devices. Our methodology involves training the smaller student model (a neural network) using the prediction probabilities (as soft labels) of the LLM, which serves as the teacher model. This is achieved through a specialized loss function tailored to learn from the LLM's output probabilities, ensuring that the student model closely mimics the teacher's performance. To validate the performance of the KD approach, we utilized a large dataset, 7T, containing 6,684 student-written responses to science questions and three mathematical reasoning datasets with student-written responses graded by human experts. We compared accuracy with state-of-the-art (SOTA) distilled models, TinyBERT, and artificial neural network (ANN) models. Results show that the KD approach has 3% and 2% higher scoring accuracy than ANN and TinyBERT, respectively, and comparable accuracy to the teacher model. Furthermore, the student model size is 0.03M, 4,000 times smaller in parameters and 10x faster in inference than the teacher model and TinyBERT, respectively. The significance of this research lies in its potential to make advanced AI technologies accessible in typical educational settings, particularly for automatic scoring.
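Training on soft labels can be sketched as a cross-entropy between the teacher's probabilities and the student's softmax output. The abstract does not specify its "specialized loss function", so the standard soft-label cross-entropy below is an assumption, as are the function names.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_label_loss(student_logits, teacher_probs):
    """Cross-entropy of the student's distribution against teacher soft labels."""
    student_probs = softmax(student_logits)
    return -sum(t * math.log(p) for t, p in zip(teacher_probs, student_probs))


if __name__ == "__main__":
    teacher = [0.7, 0.2, 0.1]
    matched = [math.log(p) for p in teacher]  # student reproduces the teacher
    uninformed = [0.0, 0.0, 0.0]              # student predicts uniformly
    print(soft_label_loss(matched, teacher), soft_label_loss(uninformed, teacher))
```

The loss reaches its minimum, the entropy of the teacher distribution, exactly when the student reproduces the teacher's probabilities.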

Updated: 2024-06-11 18:01:09

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2312.15842v3

A Geometric Explanation of the Likelihood OOD Detection Paradox

Likelihood-based deep generative models (DGMs) commonly exhibit a puzzling behaviour: when trained on a relatively complex dataset, they assign higher likelihood values to out-of-distribution (OOD) data from simpler sources. Adding to the mystery, OOD samples are never generated by these DGMs despite having higher likelihoods. This two-pronged paradox has yet to be conclusively explained, making likelihood-based OOD detection unreliable. Our primary observation is that high-likelihood regions will not be generated if they contain minimal probability mass. We demonstrate how this seeming contradiction of large densities yet low probability mass can occur around data confined to low-dimensional manifolds. We also show that this scenario can be identified through local intrinsic dimension (LID) estimation, and propose a method for OOD detection which pairs the likelihoods and LID estimates obtained from a pre-trained DGM. Our method can be applied to normalizing flows and score-based diffusion models, and obtains results which match or surpass state-of-the-art OOD detection benchmarks using the same DGM backbones. Our code is available at https://github.com/layer6ai-labs/dgm_ood_detection.
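The gap between density and probability mass can be seen even in one dimension. The toy mixture below is an assumption made for illustration, not the paper's construction: a broad component carries 99.9% of the mass, and a very narrow spike carries 0.1%, standing in for a region concentrated near a low-dimensional set.

```python
import math

BROAD_WEIGHT = 0.999   # mass of the wide component
SPIKE_WEIGHT = 0.001   # mass of the narrow spike
SPIKE_CENTER = 5.0
SPIKE_STD = 1e-4


def normal_pdf(x, mean, std):
    """Density of a univariate normal distribution."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def mixture_pdf(x):
    """Density of the two-component toy mixture."""
    return (BROAD_WEIGHT * normal_pdf(x, 0.0, 1.0)
            + SPIKE_WEIGHT * normal_pdf(x, SPIKE_CENTER, SPIKE_STD))


if __name__ == "__main__":
    # The spike has the higher density but only 0.1% of the probability mass,
    # so samples from the mixture almost never land there.
    print(mixture_pdf(SPIKE_CENTER), mixture_pdf(0.0))
```

A likelihood threshold alone would rank the spike as highly typical, which is the failure mode the abstract attributes to likelihood-based OOD detection.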

Updated: 2024-06-11 18:00:00

Categories: cs.LG,cs.AI,cs.CV,stat.ML

Download: http://arxiv.org/abs/2403.18910v2

Image and Video Tokenization with Binary Spherical Quantization

We propose a new transformer-based image and video tokenizer with Binary Spherical Quantization (BSQ). BSQ projects the high-dimensional visual embedding to a lower-dimensional hypersphere and then applies binary quantization. BSQ is (1) parameter-efficient without an explicit codebook, (2) scalable to arbitrary token dimensions, and (3) compact: compressing visual data by up to 100x with minimal distortion. Our tokenizer uses a transformer encoder and decoder with simple block-wise causal masking to support variable-length videos as input. The resulting BSQ-ViT achieves state-of-the-art visual reconstruction quality on image and video reconstruction benchmarks with 2.4x throughput compared to the best prior methods. Furthermore, by learning an autoregressive prior for adaptive arithmetic coding, BSQ-ViT achieves comparable results on video compression with state-of-the-art video compression standards. BSQ-ViT also enables masked language models to achieve competitive image synthesis quality to GAN- and diffusion-based methods.
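The quantization step described above can be sketched as follows, assuming the embedding has already been projected to a low-dimensional vector (the learned projection is omitted). The function name, the treatment of zero entries as negative, and the bit ordering of the token index are assumptions of this example.

```python
import math


def bsq_quantize(projected):
    """Binary spherical quantization of an already-projected, nonzero vector.

    Returns the quantized vector, which lies on the unit hypersphere, and an
    integer token whose bits record the sign pattern. No codebook is stored:
    the 2**L possible sign patterns form the codebook implicitly.
    """
    norm = math.sqrt(sum(x * x for x in projected))
    on_sphere = [x / norm for x in projected]
    corner = 1.0 / math.sqrt(len(projected))
    quantized = [corner if x > 0 else -corner for x in on_sphere]
    token = sum(1 << i for i, x in enumerate(on_sphere) if x > 0)
    return quantized, token


if __name__ == "__main__":
    print(bsq_quantize([3.0, -4.0, 1.0, -0.5]))
```

Because every output coordinate has magnitude 1/sqrt(L), the quantized vector always has unit norm.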

Updated: 2024-06-11 17:59:53

Categories: cs.CV,cs.IT,cs.LG,eess.IV,math.IT

Download: http://arxiv.org/abs/2406.07548v1

Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense?

We present a novel task and benchmark for evaluating the ability of text-to-image (T2I) generation models to produce images that fit commonsense in real life, which we call Commonsense-T2I. Given two adversarial text prompts containing an identical set of action words with minor differences, such as "a lightbulb without electricity" vs. "a lightbulb with electricity", we evaluate whether T2I models can conduct visual-commonsense reasoning, e.g., produce images that fit "the lightbulb is unlit" vs. "the lightbulb is lit" correspondingly. Commonsense-T2I presents an adversarial challenge, providing pairwise text prompts along with expected outputs. The dataset is carefully hand-curated by experts and annotated with fine-grained labels, such as commonsense type and likelihood of the expected outputs, to assist in analyzing model behavior. We benchmark a variety of state-of-the-art (SOTA) T2I models and surprisingly find that there is still a large gap between image synthesis and real-life photos; even the DALL-E 3 model achieves only 48.92% accuracy on Commonsense-T2I, and the Stable Diffusion XL model achieves only 24.92% accuracy. Our experiments show that GPT-enriched prompts cannot solve this challenge, and we include a detailed analysis of possible reasons for this deficiency. We aim for Commonsense-T2I to serve as a high-quality evaluation benchmark for T2I commonsense checking, fostering advancements in real-life image generation.

Updated: 2024-06-11 17:59:48

Categories: cs.CV,cs.AI,cs.CL

Download: http://arxiv.org/abs/2406.07546v1

Open-LLM-Leaderboard: From Multi-choice to Open-style Questions for LLMs Evaluation, Benchmark, and Arena

Multiple-choice questions (MCQ) are frequently used to assess large language models (LLMs). Typically, an LLM is given a question and selects the answer deemed most probable after adjustments for factors like length. Unfortunately, LLMs may inherently favor certain answer choice IDs, such as A/B/C/D, due to unbalanced a priori probabilities, which influences the prediction of answers based on these IDs. Previous research has introduced methods to reduce this "selection bias" by simply permuting options on a few test samples and applying the result to new ones. Another problem with MCQ is the lottery-ticket choice made by "random guessing": the LLM has not learned the relevant knowledge, but the option is guessed correctly. This situation is especially serious for small-scale LLMs. To address these problems, a more thorough approach involves shifting from MCQ to open-style questions, which can fundamentally eliminate selection bias and random guessing issues. However, the transition brings its own set of challenges in (1) identifying suitable open-style questions and (2) validating the correctness of LLM open-style responses against human-annotated ground truths. This work aims to tackle these significant difficulties and establish a new LLM evaluation benchmark through entirely open-style questions. Consequently, we introduce the Open-LLM-Leaderboard to track the performance of various LLMs, such as GPT-4o/4/3.5, Claude 3, Gemini, etc., and reflect their true capabilities. Our code and dataset are available at https://github.com/VILA-Lab/Open-LLM-Leaderboard.

Updated: 2024-06-11 17:59:47

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.07545v1

Situational Awareness Matters in 3D Vision Language Reasoning

Being able to carry out complicated vision language reasoning tasks in 3D space represents a significant milestone in developing household robots and human-centered embodied AI. In this work, we demonstrate that a critical and distinct challenge in 3D vision language reasoning is situational awareness, which incorporates two key components: (1) The autonomous agent grounds its self-location based on a language prompt. (2) The agent answers open-ended questions from the perspective of its calculated position. To address this challenge, we introduce SIG3D, an end-to-end Situation-Grounded model for 3D vision language reasoning. We tokenize the 3D scene into sparse voxel representation and propose a language-grounded situation estimator, followed by a situated question answering module. Experiments on the SQA3D and ScanQA datasets show that SIG3D outperforms state-of-the-art models in situation estimation and question answering by a large margin (e.g., an enhancement of over 30% on situation estimation accuracy). Subsequent analysis corroborates our architectural design choices, explores the distinct functions of visual and textual tokens, and highlights the importance of situational awareness in the domain of 3D question answering.

Updated: 2024-06-11 17:59:45

标题: Situation Awareness在3D视觉语言推理中的重要性

摘要: 在开发家庭机器人和以人为中心的具身人工智能方面，能够在3D空间中执行复杂的视觉语言推理任务代表了一个重要的里程碑。在这项工作中，我们展示了3D视觉语言推理中一个关键且独特的挑战是情境意识，其中包括两个关键组成部分：（1）自主代理根据语言提示确定自身位置。（2）代理从其计算出的位置的视角回答开放式问题。为了解决这一挑战，我们引入了SIG3D，一个端到端的用于3D视觉语言推理的情境基础模型。我们将3D场景标记为稀疏体素表示，并提出了一个语言基础的情境估计器，随后是一个情境化的问题回答模块。在SQA3D和ScanQA数据集上的实验表明，SIG3D在情境估计和问题回答方面的性能远远超过了最先进的模型（例如，在情境估计准确性方面提高了超过30%）。随后的分析证实了我们的架构设计选择，探讨了视觉和文本标记的不同功能，并强调了情境意识在3D问题回答领域的重要性。

更新时间: 2024-06-11 17:59:45

领域: cs.CV,cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2406.07544v1

Cognitive Insights Across Languages: Enhancing Multimodal Interview Analysis

Cognitive decline is a natural process that occurs as individuals age. Early diagnosis of anomalous decline is crucial for initiating professional treatment that can enhance the quality of life of those affected. To address this issue, we propose a multimodal model capable of predicting Mild Cognitive Impairment and cognitive scores. The evaluation is conducted on the TAUKADIAL dataset, which comprises audio recordings of clinical interviews. The proposed model demonstrates the ability to transcribe and differentiate between languages used in the interviews. Subsequently, the model extracts audio and text features, combining them into a multimodal architecture to achieve robust and generalized results. Our approach involves in-depth research to implement various features obtained from the proposed modalities.

Updated: 2024-06-11 17:59:31

标题: 跨语言认知洞察：增强多模态访谈分析

摘要: 认知衰退是随着个体年龄增长而发生的自然过程。对异常衰退的早期诊断对于启动专业治疗至关重要，可以提高受影响个体的生活质量。为了解决这个问题，我们提出了一个多模态模型，能够预测轻度认知障碍和认知分数。我们使用TAUKADIAL数据集进行评估，该数据集包括临床访谈的音频记录。所提出的模型展示了转录和区分访谈中使用的语言的能力。随后，该模型提取音频和文本特征，将它们组合成一个多模态架构，以实现稳健和泛化的结果。我们的方法涉及深入研究，以实现从所提出的模态获得的各种特征的实施。

更新时间: 2024-06-11 17:59:31

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.07542v1

CDSA: Conservative Denoising Score-based Algorithm for Offline Reinforcement Learning

Distribution shift is a major obstacle in offline reinforcement learning, which necessitates minimizing the discrepancy between the learned policy and the behavior policy to avoid overestimating rare or unseen actions. Previous conservative offline RL algorithms struggle to generalize to unseen actions, despite their success in learning good in-distribution policies. In contrast, we propose to use the gradient fields of the dataset density generated from a pre-trained offline RL algorithm to adjust the original actions. We decouple the conservatism constraints from the policy, and can thus benefit a wide range of offline RL algorithms. As a consequence, we propose the Conservative Denoising Score-based Algorithm (CDSA), which utilizes a denoising score-based model to model the gradient of the dataset density, rather than the dataset density itself, and provides a more accurate and efficient method to adjust the action generated by the pre-trained policy in a deterministic and continuous MDP environment. In experiments, we show that our approach significantly improves the performance of baseline algorithms on D4RL datasets, and demonstrate the generalizability and plug-and-play capability of our model across different pre-trained offline RL policies in different tasks. We also validate that the agent exhibits greater risk aversion after employing our method while showcasing its ability to generalize effectively across diverse tasks.

Updated: 2024-06-11 17:59:29

标题: CDSA：离线强化学习的保守去噪基于评分的算法

摘要: 分布转移是离线强化学习中的一个主要障碍，需要最小化学习策略和行为策略之间的差异，以避免高估罕见或未见行为。先前的保守型离线RL算法在泛化到未见行为时遇到困难，尽管它们在学习良好的分布策略方面取得了成功。相反，我们提出使用从预训练的离线RL算法生成的数据集密度的梯度场来调整原始行为。我们将保守性约束与策略解耦，因此可以使广泛的离线RL算法受益。因此，我们提出了基于保守去噪分数的算法（CDSA），该算法利用去噪分数模型来建模数据集密度的梯度，而不是数据集密度本身，并促进了一种更准确和高效的方法，在确定性和连续的MDP环境中调整预训练策略生成的行为。在实验中，我们展示了我们的方法显著改进了D4RL数据集中基准算法的性能，并展示了我们的模型在不同任务中跨不同预训练离线RL策略的泛化能力和即插即用能力。我们还验证了在采用我们的方法后，代理人表现出更大的风险规避能力，同时展示了其在各种任务中有效泛化的能力。

更新时间: 2024-06-11 17:59:29

领域: cs.LG

下载: http://arxiv.org/abs/2406.07541v1

Ctrl-X: Controlling Structure and Appearance for Text-To-Image Generation Without Guidance

Recent controllable generation approaches such as FreeControl and Diffusion Self-guidance bring fine-grained spatial and appearance control to text-to-image (T2I) diffusion models without training auxiliary modules. However, these methods optimize the latent embedding for each type of score function with longer diffusion steps, making the generation process time-consuming and limiting their flexibility and use. This work presents Ctrl-X, a simple framework for T2I diffusion controlling structure and appearance without additional training or guidance. Ctrl-X designs feed-forward structure control to enable the structure alignment with a structure image and semantic-aware appearance transfer to facilitate the appearance transfer from a user-input image. Extensive qualitative and quantitative experiments illustrate the superior performance of Ctrl-X on various condition inputs and model checkpoints. In particular, Ctrl-X supports novel structure and appearance control with arbitrary condition images of any modality, exhibits superior image quality and appearance transfer compared to existing works, and provides instant plug-and-play functionality to any T2I and text-to-video (T2V) diffusion model. See our project page for an overview of the results: https://genforce.github.io/ctrl-x

Updated: 2024-06-11 17:59:01

标题: Ctrl-X：在没有指导的情况下控制文本到图像生成的结构和外观

摘要: 最近的可控生成方法，如FreeControl和Diffusion Self-guidance，为文本到图像（T2I）扩散模型带来了精细的空间和外观控制，而无需训练辅助模块。然而，这些方法通过更长的扩散步骤优化每种得分函数的潜在嵌入，使生成过程耗时，并限制了它们的灵活性和使用。本文介绍了Ctrl-X，一个简单的框架，用于T2I扩散控制结构和外观，无需额外的训练或指导。Ctrl-X设计了前馈结构控制，以实现与结构图像的结构对齐，并实现语义感知的外观转移，以促进从用户输入图像的外观转移。广泛的定性和定量实验表明，Ctrl-X在各种条件输入和模型检查点上表现出优越的性能。特别是，Ctrl-X支持任何模态的任意条件图像的新颖结构和外观控制，与现有作品相比，具有更高的图像质量和外观转移，并为任何T2I和文本到视频（T2V）扩散模型提供即插即用的功能。请参阅我们的项目页面以获取结果概览：https://genforce.github.io/ctrl-x

更新时间: 2024-06-11 17:59:01

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2406.07540v1

Methods for Recovering Conditional Independence Graphs: A Survey

Conditional Independence (CI) graphs are a type of probabilistic graphical model primarily used to gain insights about feature relationships. Each edge represents the partial correlation between the connected features, which gives information about their direct dependence. In this survey, we list different methods and study the advances in techniques developed to recover CI graphs. We cover traditional optimization methods as well as recently developed deep learning architectures, along with their recommended implementations. To facilitate wider adoption, we include preliminaries that consolidate associated operations, for example techniques to obtain the covariance matrix for mixed datatypes.

Updated: 2024-06-11 17:58:07

标题: 恢复条件独立图的方法：综述

摘要: 条件独立（CI）图是一种概率图模型，主要用于获取特征关系的见解。每个边代表连接特征之间的偏相关性，提供了它们直接依赖关系的信息。在本调查中，我们列出了不同的方法，并研究了用于恢复CI图的技术进展。我们涵盖了传统的优化方法以及最近开发的深度学习架构，以及它们的推荐实现。为了促进更广泛的采用，我们包括了巩固相关操作的初步内容，例如获取混合数据类型的协方差矩阵的技术。

更新时间: 2024-06-11 17:58:07

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2211.06829v2
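
The statement in the CI-graph survey abstract above, that each edge encodes a partial correlation, can be made concrete with a standard identity. The block below is a sketch under the assumption of jointly Gaussian features with covariance matrix $\Sigma$; the symbols $\Theta$ and $\rho_{ij \mid \text{rest}}$ are notation introduced here, not taken from the abstract.

```latex
\Theta = \Sigma^{-1}, \qquad
\rho_{ij \mid \text{rest}} = -\frac{\Theta_{ij}}{\sqrt{\Theta_{ii}\,\Theta_{jj}}}, \qquad
\rho_{ij \mid \text{rest}} = 0 \iff \Theta_{ij} = 0 .
```

Under this assumption, recovering the CI graph amounts to finding the zero pattern of the precision matrix $\Theta$: features $i$ and $j$ share an edge exactly when $\Theta_{ij} \neq 0$.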

Towards Fundamentally Scalable Model Selection: Asymptotically Fast Update and Selection

The advancement of deep learning technologies is bringing new models every day, motivating the study of scalable model selection. An ideal model selection scheme should minimally support two operations efficiently over a large pool of candidate models: update, which involves either adding a new candidate model or removing an existing candidate model, and selection, which involves locating highly performing models for a given task. However, previous solutions to model selection require high computational complexity for at least one of these two operations. In this work, we target fundamentally (more) scalable model selection that supports asymptotically fast update and asymptotically fast selection at the same time. Firstly, we define isolated model embedding, a family of model selection schemes supporting asymptotically fast update and selection: With respect to the number of candidate models $m$, the update complexity is O(1) and the selection consists of a single sweep over $m$ vectors in addition to O(1) model operations. Isolated model embedding also implies several desirable properties for applications. Secondly, we present Standardized Embedder, an empirical realization of isolated model embedding. We assess its effectiveness by using it to select representations from a pool of 100 pre-trained vision models for classification tasks and measuring the performance gaps between the selected models and the best candidates with a linear probing protocol. Experiments suggest our realization is effective in selecting models with competitive performances and highlight isolated model embedding as a promising direction towards model selection that is fundamentally (more) scalable.

Updated: 2024-06-11 17:57:49

标题: 走向基本可扩展的模型选择：渐近快速更新和选择

摘要: 深度学习技术的进步每天都带来新模型，促使对可扩展模型选择的研究。理想的模型选择方案应该能够在大量候选模型中高效地支持两种操作：更新，涉及添加新的候选模型或移除现有候选模型，以及选择，涉及为给定任务定位性能优越的模型。然而，先前的模型选择解决方案对于这两种操作中至少一种都需要高计算复杂度。在这项工作中，我们致力于基本上更可扩展的模型选择，同时支持渐近快速的更新和渐近快速的选择。首先，我们定义了孤立模型嵌入，一系列支持渐近快速更新和选择的模型选择方案：关于候选模型数量m，更新复杂度为O(1)，选择包括对m个向量进行单次扫描，另外还有O(1)模型操作。孤立模型嵌入还暗示了应用程序的几个理想属性。其次，我们提出了标准化嵌入器，这是孤立模型嵌入的实证实现。我们通过将其用于从100个预训练视觉模型中选择表示来评估其有效性，并使用线性探测协议测量所选模型与最佳候选模型之间的性能差距。实验表明我们的实现在选择具有竞争性表现的模型方面是有效的，并突出孤立模型嵌入作为一个有望实现基本上更可扩展模型选择的方向。

更新时间: 2024-06-11 17:57:49

领域: cs.LG,cs.CV,stat.ML

下载: http://arxiv.org/abs/2406.07536v1
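
The update and selection costs described in the model selection abstract above can be illustrated with a small sketch. It assumes every candidate model has already been mapped to a fixed-length vector and that a task is scored against a model by an inner product; the class name `ModelPool`, the model labels, and the scoring rule are hypothetical illustrations, not the paper's Standardized Embedder.

```python
import heapq


class ModelPool:
    """Toy pool in which each candidate model is represented by one vector."""

    def __init__(self):
        self._embeddings = {}  # model name -> embedding vector

    def add(self, name, embedding):
        # O(1) in the number of candidates: no other model's vector is touched.
        self._embeddings[name] = list(embedding)

    def remove(self, name):
        # Also O(1) in the number of candidates.
        del self._embeddings[name]

    def select(self, task_embedding, k=1):
        # One sweep over the m stored vectors, scoring each by inner product
        # (the inner-product score is an assumption of this sketch).
        def score(item):
            return sum(a * b for a, b in zip(item[1], task_embedding))

        best = heapq.nlargest(k, self._embeddings.items(), key=score)
        return [name for name, _ in best]


pool = ModelPool()
pool.add("model_a", [1.0, 0.0])
pool.add("model_b", [0.0, 1.0])
pool.add("model_c", [0.6, 0.6])
top_two = pool.select([0.9, 0.1], k=2)
pool.remove("model_a")
top_after_removal = pool.select([0.9, 0.1], k=1)
print(top_two, top_after_removal)
```

Because each model's vector is computed in isolation, adding or removing a candidate never forces the rest of the pool to be re-embedded, which is the property the abstract calls asymptotically fast update.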

diff History for Neural Language Agents

Neural Language Models (LMs) offer an exciting solution for general-purpose embodied control. However, a key technical issue arises when using an LM-based controller: environment observations must be converted to text, which coupled with history, results in long and verbose textual prompts. As a result, prior work in LM agents is limited to restricted domains with small observation size as well as minimal needs for interaction history or instruction tuning. In this paper, we introduce diff history, a simple and highly effective solution to these issues. By applying the Unix diff command on consecutive text observations in the interaction histories used to prompt LM policies, we can both abstract away redundant information and focus the content of textual inputs on the salient changes in the environment. On NetHack, an unsolved video game that requires long-horizon reasoning for decision-making, LMs tuned with diff history match state-of-the-art performance for neural agents while needing 1800x fewer training examples compared to prior work. Even on the simpler BabyAI-Text environment with concise text observations, we find that although diff history increases the length of prompts, the representation it provides offers a 25% improvement in the efficiency of low-sample instruction tuning. Further, we show that diff history scales favorably across different tuning dataset sizes. We open-source our code and data to https://diffhistory.github.io.

Updated: 2024-06-11 17:57:15

标题: 神经语言代理的diff历史

摘要: 神经语言模型（LMs）为通用目的的实体控制提供了令人兴奋的解决方案。然而，在使用基于LM的控制器时出现了一个关键技术问题：环境观察必须转换为文本，这与历史结合在一起，导致文本提示冗长而啰嗦。因此，先前在LM代理方面的工作局限于具有较小观察尺寸以及最小交互历史或指导调整需求的受限领域。在本文中，我们介绍了差异历史（diff history），这是针对这些问题的一个简单而高效的解决方案。通过在用于提示LM策略的交互历史中连续文本观察上应用Unix diff命令，我们可以抽象出多余信息，并将文本输入的内容集中在环境中突出变化上。在需要长期推理进行决策的未解决视频游戏NetHack上，使用diff history调整的LMs与先前工作相比，需要的训练示例数量少了1800倍，达到了神经代理的最新性能水平。即使在简单的BabyAI-Text环境中使用简洁的文本观察，我们发现，虽然diff history增加了提示的长度，但它提供的表示方式在低样本指导调整的效率上提高了25%。此外，我们展示了diff history在不同调整数据集大小上的良好扩展性。我们将我们的代码和数据开源到https://diffhistory.github.io。

更新时间: 2024-06-11 17:57:15

领域: cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2312.07540v3
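
The core idea of the diff history abstract above, replacing each full text observation with its diff against the previous one, can be sketched with Python's standard `difflib` module as a stand-in for the Unix `diff` command. The function name `diff_history` and the toy observations are hypothetical; the paper's actual prompt format may differ.

```python
import difflib


def diff_history(observations):
    """Keep the first observation whole; replace each later one with a diff."""
    history = [observations[0]]
    for prev, curr in zip(observations, observations[1:]):
        delta = difflib.unified_diff(
            prev.splitlines(), curr.splitlines(), lineterm="", n=0
        )
        # Drop the two file-header lines ("---" and "+++"), which carry
        # no information about the environment.
        history.append("\n".join(list(delta)[2:]))
    return history


obs = [
    "hp: 12\npos: (3, 4)\nmsg: You see a door.",
    "hp: 12\npos: (3, 5)\nmsg: You see a door.",
    "hp: 10\npos: (3, 5)\nmsg: A rat bites you!",
]
history = diff_history(obs)
for entry in history:
    print(entry)
    print("-----")
```

Only the lines that changed between consecutive observations survive in the later entries, so unchanged fields such as `hp: 12` in the second step are not repeated.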

Hearing Anything Anywhere

Recent years have seen immense progress in 3D computer vision and computer graphics, with emerging tools that can virtualize real-world 3D environments for numerous Mixed Reality (XR) applications. However, alongside immersive visual experiences, immersive auditory experiences are equally vital to our holistic perception of an environment. In this paper, we aim to reconstruct the spatial acoustic characteristics of an arbitrary environment given only a sparse set of (roughly 12) room impulse response (RIR) recordings and a planar reconstruction of the scene, a setup that is easily achievable by ordinary users. To this end, we introduce DiffRIR, a differentiable RIR rendering framework with interpretable parametric models of salient acoustic features of the scene, including sound source directivity and surface reflectivity. This allows us to synthesize novel auditory experiences through the space with any source audio. To evaluate our method, we collect a dataset of RIR recordings and music in four diverse, real environments. We show that our model outperforms state-of-the-art baselines on rendering monaural and binaural RIRs and music at unseen locations, and learns physically interpretable parameters characterizing acoustic properties of the sound source and surfaces in the scene.

Updated: 2024-06-11 17:56:14

标题: 在任何地方听到任何声音

摘要: 近年来，在3D计算机视觉和计算机图形方面取得了巨大进展，出现了可以为众多混合现实（XR）应用虚拟化现实世界3D环境的新工具。然而，除了沉浸式视觉体验外，沉浸式听觉体验对我们对环境的整体感知同样至关重要。本文旨在仅给定大约12个房间冲激响应（RIR）录音和场景的平面重建的情况下，重建任意环境的空间声学特征，这可以轻松地由普通用户实现。为此，我们引入了DiffRIR，一个可微分的RIR渲染框架，具有场景显著声学特征的可解释参数模型，包括声源指向性和表面反射性。这使我们能够通过空间合成新颖的听觉体验，使用任何源音频。为了评估我们的方法，我们收集了四个不同真实环境的RIR录音和音乐数据集。我们展示了我们的模型在呈现未知位置的单声道和双声道RIR和音乐方面优于最先进的基线，并学习了表征场景中声源和表面声学特性的物理可解释参数。

更新时间: 2024-06-11 17:56:14

领域: cs.SD,cs.CV,cs.LG,eess.AS,I.2.10; I.4.8

下载: http://arxiv.org/abs/2406.07532v1

MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation

Model merging has emerged as an effective approach to combine multiple single-task models, fine-tuned from the same pre-trained model, into a multitask model. This process typically involves computing a weighted average of the model parameters without any additional training. Existing model-merging methods focus on enhancing average task accuracy. However, interference and conflicts between the objectives of different tasks can lead to trade-offs during model merging. In real-world applications, a set of solutions with various trade-offs can be more informative, helping practitioners make decisions based on diverse preferences. In this paper, we introduce a novel low-compute algorithm, Model Merging with Amortized Pareto Front (MAP). MAP identifies a Pareto set of scaling coefficients for merging multiple models to reflect the trade-offs. The core component of MAP is approximating the evaluation metrics of the various tasks using a quadratic approximation surrogate model derived from a pre-selected set of scaling coefficients, enabling amortized inference. Experimental results on vision and natural language processing tasks show that MAP can accurately identify the Pareto front. To further reduce the required computation of MAP, we propose (1) a Bayesian adaptive sampling algorithm and (2) a nested merging scheme with multiple stages.

Updated: 2024-06-11 17:55:25

标题: MAP：通过二次近似实现低计算模型合并和分摊帕累托前沿

摘要: 模型合并已经成为一种有效的方法，将多个单任务模型，从相同的预训练模型微调，合并为一个多任务模型。这个过程通常涉及计算模型参数的加权平均，而无需进行额外的训练。现有的模型合并方法侧重于提高平均任务精度。然而，不同任务之间的干扰和冲突可能导致在模型合并过程中进行权衡。在实际应用中，具有各种权衡的一组解决方案可能更具信息性，帮助从业者基于不同的偏好做出决策。在本文中，我们介绍了一种新颖的低计算算法，Model Merging with Amortized Pareto Front（MAP）。MAP识别出用于合并多个模型以反映权衡的缩放系数的帕累托集。MAP的核心组件是使用从预选缩放系数集合中得出的二次逼近替代模型来近似各种任务的评估指标，实现摊销推断。在视觉和自然语言处理任务上的实验结果表明，MAP能够准确识别帕累托前沿。为了进一步减少MAP所需的计算量，我们提出了（1）贝叶斯自适应抽样算法和（2）具有多个阶段的嵌套合并方案。

更新时间: 2024-06-11 17:55:25

领域: cs.LG

下载: http://arxiv.org/abs/2406.07529v1
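
Two ingredients named in the MAP abstract above, merging by scaling coefficients and keeping a Pareto set of trade-offs, can be sketched as follows. This is a minimal illustration under stated assumptions: models are flat lists of parameters, merging is a plain weighted combination, and the per-task metrics are made-up numbers standing in for evaluated or surrogate-predicted scores. The quadratic surrogate itself is not reproduced here.

```python
def merge(models, coeffs):
    """Weighted combination of parameter vectors (plain lists in this sketch)."""
    return [sum(c * w for c, w in zip(coeffs, ws)) for ws in zip(*models)]


def pareto_front(candidates):
    """candidates: list of (coeffs, metrics); higher is better on every task."""

    def dominates(a, b):
        return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

    return [
        (c, m)
        for c, m in candidates
        if not any(dominates(other, m) for _, other in candidates)
    ]


merged = merge([[1.0, 2.0], [3.0, 4.0]], (0.5, 0.5))

# Illustrative, made-up (task 1, task 2) scores for four coefficient choices.
candidates = [
    ((1.0, 0.0), (0.90, 0.40)),
    ((0.5, 0.5), (0.80, 0.75)),
    ((0.0, 1.0), (0.45, 0.88)),
    ((0.3, 0.3), (0.70, 0.60)),
]
front = pareto_front(candidates)
print(merged)
print([c for c, _ in front])
```

The last candidate is dropped because another choice of coefficients is at least as good on both tasks; the remaining three represent different trade-offs a practitioner could pick from.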

QuickLLaMA: Query-aware Inference Acceleration for Large Language Models

The capacity of Large Language Models (LLMs) to comprehend and reason over long contexts is pivotal for advancements in diverse fields. Yet, they still struggle with capturing long-distance dependencies within sequences to deeply understand semantics. To address this issue, we introduce Query-aware Inference for LLMs (Q-LLM), a system designed to process extensive sequences akin to human cognition. By focusing on memory data relevant to a given query, Q-LLM can accurately capture pertinent information within a fixed window size and provide precise answers to queries. It doesn't require extra training and can be seamlessly integrated with any LLM. Q-LLM using LLaMA3 (QuickLLaMA) can read Harry Potter within 30s and accurately answer the questions. Q-LLM improved by 7.17% compared to the current state-of-the-art on LLaMA3, and by 3.26% on Mistral on the $\infty$-bench. On the widely recognized Needle-in-a-Haystack task, Q-LLM improved upon the current SOTA by 7.0% on Mistral and achieved 100% on LLaMA3. Our code can be found at https://github.com/dvlab-research/Q-LLM.

Updated: 2024-06-11 17:55:03

标题: QuickLLaMA：用于大型语言模型的查询感知推理加速

摘要: 大型语言模型（LLMs）理解和推理长文本的能力对于各个领域的进步至关重要。然而，它们仍然在捕捉序列中的长距离依赖关系以深入理解语义方面遇到困难。为了解决这个问题，我们引入了一种名为Query-aware Inference for LLMs（Q-LLM）的系统，旨在处理类似人类认知的广泛序列。通过专注于与给定查询相关的记忆数据，Q-LLM能够准确捕捉固定窗口大小内的相关信息，并提供精确的答案。它不需要额外的训练，并且可以无缝集成到任何LLMs中。使用LLaMA3（QuickLLaMA）的Q-LLM可以在30秒内阅读《哈利·波特》并准确回答问题。与当前LLaMA3的最新技术相比，Q-LLM提高了7.17%，在$\infty$-bench的Mistral上提高了3.26%。在广泛认可的基准测试中，Q-LLM在Mistral上比当前SOTA提高了7.0%，并在LLaMA3上达到100%。我们的代码可以在https://github.com/dvlab-research/Q-LLM找到。

更新时间: 2024-06-11 17:55:03

领域: cs.LG

下载: http://arxiv.org/abs/2406.07528v1

On the Recoverability of Causal Relations from Temporally Aggregated I.I.D. Data

We consider the effect of temporal aggregation on instantaneous (non-temporal) causal discovery in a general setting. This is motivated by the observation that the true causal time lag is often considerably shorter than the observational interval. This discrepancy leads to high aggregation, causing time-delay causality to vanish and instantaneous dependence to manifest. Although we expect such instantaneous dependence to be consistent with the true causal relation in a certain sense, so that the discovery results are meaningful, it remains unclear what type of consistency we need and when such consistency is satisfied. We formally propose functional consistency and conditional independence consistency, corresponding to functional causal model-based methods and conditional independence-based methods respectively, and provide the conditions under which these consistencies hold. We show theoretically and experimentally that causal discovery results may be seriously distorted by aggregation, especially in the completely nonlinear case, and we also find that the causal relationship is still recoverable from aggregated data if we have partial linearity or an appropriate prior. Our findings suggest that the community should take a cautious and meticulous approach when interpreting causal discovery results from such data, and show why and when aggregation will distort the performance of causal discovery methods.

Updated: 2024-06-11 17:53:39

标题: 关于从时间聚合的独立同分布数据中恢复因果关系的研究

摘要: 我们考虑时间聚合对一般情况下瞬时（非时间）因果发现的影响。这是由于观察到真实的因果时间滞后通常比观测间隔短得多的事实而引发的。这种差异导致高度聚合，导致时间延迟因果性消失，瞬时依赖性显现。尽管我们期望这种瞬时依赖性在某种意义上与真实的因果关系一致，以使发现结果有意义，但目前仍不清楚我们需要什么类型的一致性以及何时会满足这种一致性。我们提出了函数一致性和条件独立性一致性，形式地对应于基于功能因果模型的方法和基于条件独立性的方法，并提供了这些一致性将成立的条件。我们从理论和实验上展示了，尤其是在完全非线性情况下，因果发现结果可能会受到聚合的严重扭曲，并且我们还发现，如果数据部分具有线性性或适当的先验知识，从聚合数据中仍然可以恢复因果关系。我们的研究结果表明，社区在解释从这类数据中得出的因果发现结果时应采取谨慎细致的态度，并展示了为什么以及何时聚合会扭曲因果发现方法的性能。

更新时间: 2024-06-11 17:53:39

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2406.02191v2

Simple and Effective Masked Diffusion Language Models

While diffusion models excel at generating high-quality images, prior work reports a significant performance gap between diffusion and autoregressive (AR) methods in language modeling. In this work, we show that simple masked discrete diffusion is more performant than previously thought. We apply an effective training recipe that improves the performance of masked diffusion models and derive a simplified, Rao-Blackwellized objective that results in additional improvements. Our objective has a simple form -- it is a mixture of classical masked language modeling losses -- and can be used to train encoder-only language models that admit efficient samplers, including ones that can generate arbitrary lengths of text semi-autoregressively like a traditional language model. On language modeling benchmarks, a range of masked diffusion models trained with modern engineering practices achieves a new state-of-the-art among diffusion models, and approaches AR perplexity. We release our code at: https://github.com/kuleshov-group/mdlm

Updated: 2024-06-11 17:51:40

标题: 简单且有效的遮蔽扩散语言模型

摘要: 虽然扩散模型在生成高质量图像方面表现出色，但先前的研究报告指出，在语言建模方面，扩散模型与自回归（AR）方法之间存在显著的性能差距。在这项工作中，我们展示了简单的掩模离散扩散比之前认为的更有效。我们应用了一种有效的训练方法，提高了掩模扩散模型的性能，并推导出了一个简化的、Rao-Blackwell化的目标函数，从而实现了额外的改进。我们的目标函数形式简单--它是经典的掩模语言建模损失的混合体--并且可以用来训练仅包含编码器的语言模型，这些模型可以使用高效的采样器，包括能够半自回归地生成任意长度文本的传统语言模型。在语言建模基准测试中，一系列使用现代工程实践训练的掩模扩散模型实现了在扩散模型中的最新技术水平，并接近了AR模型的困惑度。我们在以下网址发布了我们的代码：https://github.com/kuleshov-group/mdlm

更新时间: 2024-06-11 17:51:40

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2406.07524v1

Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling

Efficiently modeling sequences with infinite context length has been a long-standing problem. Past works suffer from either quadratic computation complexity or limited extrapolation ability on length generalization. In this work, we present Samba, a simple hybrid architecture that layer-wise combines Mamba, a selective State Space Model (SSM), with Sliding Window Attention (SWA). Samba selectively compresses a given sequence into recurrent hidden states while still maintaining the ability to precisely recall memories with the attention mechanism. We scale Samba up to 3.8B parameters with 3.2T training tokens and show that Samba substantially outperforms the state-of-the-art models based on pure attention or SSMs on a wide range of benchmarks. When trained on 4K length sequences, Samba can be efficiently extrapolated to 256K context length with perfect memory recall and shows improved token predictions up to 1M context length. As a linear-time sequence model, Samba enjoys a 3.73x higher throughput compared to Transformers with grouped-query attention when processing user prompts of 128K length, and a 3.64x speedup when generating 64K tokens with unlimited streaming. A sample implementation of Samba is publicly available at https://github.com/microsoft/Samba.

Updated: 2024-06-11 17:50:51

标题: Samba：简单的混合状态空间模型用于高效的无限上下文语言建模

更新时间: 2024-06-11 17:50:51

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2406.07522v1
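
The Sliding Window Attention component mentioned in the Samba abstract above can be illustrated by the attention mask it implies. This is a generic sketch, not Samba's implementation: the function name `sliding_window_mask` is hypothetical, and the convention that the window includes the current token is an assumption. The Mamba layers that Samba interleaves with attention are not shown.

```python
def sliding_window_mask(seq_len, window):
    """mask[i][j] is True when query position i may attend to key position j.

    Causal: j <= i. Local: only the most recent `window` positions,
    counting position i itself (an assumption of this sketch).
    """
    return [[0 <= i - j < window for j in range(seq_len)] for i in range(seq_len)]


mask = sliding_window_mask(seq_len=5, window=2)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

Each row has at most `window` allowed positions regardless of sequence length, which is why the attention cost grows linearly rather than quadratically with context length.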

Faster Spectral Density Estimation and Sparsification in the Nuclear Norm

We consider the problem of estimating the spectral density of the normalized adjacency matrix of an $n$-node undirected graph. We provide a randomized algorithm that, with $O(n\epsilon^{-2})$ queries to a degree and neighbor oracle and in $O(n\epsilon^{-3})$ time, estimates the spectrum up to $\epsilon$ accuracy in the Wasserstein-1 metric. This improves on previous state-of-the-art methods, including an $O(n\epsilon^{-7})$ time algorithm from [Braverman et al., STOC 2022] and, for sufficiently small $\epsilon$, a $2^{O(\epsilon^{-1})}$ time method from [Cohen-Steiner et al., KDD 2018]. To achieve this result, we introduce a new notion of graph sparsification, which we call nuclear sparsification. We provide an $O(n\epsilon^{-2})$-query and $O(n\epsilon^{-2})$-time algorithm for computing $O(n\epsilon^{-2})$-sparse nuclear sparsifiers. We show that this bound is optimal in both its sparsity and query complexity, and we separate our results from the related notion of additive spectral sparsification. Of independent interest, we show that our sparsification method also yields the first deterministic algorithm for spectral density estimation that scales linearly with $n$ (sublinear in the representation size of the graph).

Updated: 2024-06-11 17:50:20

标题: 核范数中更快的谱密度估计和稀疏化

摘要: 我们考虑估计一个$n$节点无向图的归一化邻接矩阵的谱密度的问题。我们提供了一个随机算法，通过对度和邻居的oracle进行$O(n\epsilon^{-2})$次查询，在$O(n\epsilon^{-3})$的时间内，以Wasserstein-1度量精确度$\epsilon$估计谱。这优于先前的最先进方法，包括来自[Braverman等人，STOC 2022]的$O(n\epsilon^{-7})$时间算法，以及对于足够小的$\epsilon$，来自[Cohen-Steiner等人，KDD 2018]的$2^{O(\epsilon^{-1})}$时间方法。为了实现这一结果，我们引入了一个新的图稀疏化概念，称为核稀疏化。我们提供了一个用于计算$O(n\epsilon^{-2})$稀疏核稀疏化器的$O(n\epsilon^{-2})$查询和$O(n\epsilon^{-2})$时间算法。我们展示了这个界限在稀疏性和查询复杂度方面是最佳的，并且我们将我们的结果与相关的加法谱稀疏化概念分开。值得独立关注的是，我们展示了我们的稀疏化方法也产生了第一个随着$n$线性扩展的谱密度估计的确定性算法（在图的表示大小的次线性）。

更新时间: 2024-06-11 17:50:20

领域: cs.DS,cs.LG

下载: http://arxiv.org/abs/2406.07521v1

Neural Gaffer: Relighting Any Object via Diffusion

Single-image relighting is a challenging task that involves reasoning about the complex interplay between geometry, materials, and lighting. Many prior methods either support only specific categories of images, such as portraits, or require special capture conditions, like using a flashlight. Alternatively, some methods explicitly decompose a scene into intrinsic components, such as normals and BRDFs, which can be inaccurate or under-expressive. In this work, we propose a novel end-to-end 2D relighting diffusion model, called Neural Gaffer, that takes a single image of any object and can synthesize an accurate, high-quality relit image under any novel environmental lighting condition, simply by conditioning an image generator on a target environment map, without an explicit scene decomposition. Our method builds on a pre-trained diffusion model, and fine-tunes it on a synthetic relighting dataset, revealing and harnessing the inherent understanding of lighting present in the diffusion model. We evaluate our model on both synthetic and in-the-wild Internet imagery and demonstrate its advantages in terms of generalization and accuracy. Moreover, by combining with other generative methods, our model enables many downstream 2D tasks, such as text-based relighting and object insertion. Our model can also operate as a strong relighting prior for 3D tasks, such as relighting a radiance field.

Updated: 2024-06-11 17:50:15

标题: 神经 Gaffer：通过扩散重新点亮任何物体

摘要: 单图像照明是一项具有挑战性的任务，涉及对几何、材料和照明之间复杂相互作用的推理。许多先前的方法只支持特定类别的图像，例如肖像，或者需要特殊的拍摄条件，比如使用手电筒。另外，一些方法明确将场景分解为固有组件，如法线和BRDF，但可能不准确或表达能力不足。在这项工作中，我们提出了一种新颖的端到端2D照明扩散模型，称为神经Gaffer，它可以接受任何对象的单个图像，并可以在任何新的环境照明条件下合成准确且高质量的再照明图像，只需将图像生成器条件化为目标环境地图，而无需显式场景分解。我们的方法建立在一个经过预训练的扩散模型基础上，并在一个合成照明数据集上对其进行微调，揭示和利用扩散模型中固有的对照明的理解。我们在合成和野外互联网图像上评估了我们的模型，并展示了其在泛化和准确性方面的优势。此外，通过与其他生成方法结合，我们的模型使许多下游的2D任务成为可能，如基于文本的照明和对象插入。我们的模型还可以作为3D任务的强大照明先验，比如重新照明辐射场。

更新时间: 2024-06-11 17:50:15

领域: cs.CV,cs.AI,cs.GR

下载: http://arxiv.org/abs/2406.07520v1

Physics-guided weak-form discovery of reduced-order models for trapped ultracold hydrodynamics

We study the relaxation of a highly collisional, ultracold but nondegenerate gas of polar molecules. Confined within a harmonic trap, the gas is subject to fluid-gaseous coupled dynamics that lead to a breakdown of first-order hydrodynamics. An attempt to treat these higher-order hydrodynamic effects was previously made with a Gaussian ansatz and coarse-graining model parameter [R. R. W. Wang & J. L. Bohn, Phys. Rev. A 108, 013322 (2023)], leading to an approximate set of equations for a few collective observables accessible to experiments. Here we present substantially improved reduced-order models for these same observables, admissible beyond previous parameter regimes, discovered directly from particle simulations using the WSINDy algorithm (Weak-form Sparse Identification of Nonlinear Dynamics). The interpretable nature of the learning algorithm enables estimation of previously unknown physical quantities and discovery of model terms with candidate physical mechanisms, revealing new physics in mixed collisional regimes. Our approach constitutes a general framework for data-driven model identification leveraging known physics.

Updated: 2024-06-11 17:50:04

标题: 物理引导的弱形式发现用于困住的超冷流体动力学的降阶模型

摘要: 我们研究了高度碰撞的、超冷但非简并的极性分子气体的弛豫过程。气体被限制在一个谐振陷阱中，受到流体-气态耦合动力学的影响，导致了一阶流体动力学的崩溃。之前曾尝试用高斯假设和粗粒化模型参数[R.R.W.Wang & J.L.Bohn, Phys. Rev. A 108, 013322 (2023)]来处理这些高阶流体动力学效应，从而得到了一组近似方程，用于描述实验可观测到的几个集体观测值。在这里，我们提出了针对这些相同观测值的大幅改进的降阶模型，可以超出以前的参数范围，直接从使用WSINDy算法(弱形式稀疏识别非线性动力学)的粒子模拟中发现。学习算法的可解释性使得可以估计之前未知的物理量，并发现具有候选物理机制的模型项，揭示了在混合碰撞区域中的新物理学。我们的方法构成了一个利用已知物理的数据驱动模型识别的通用框架。

更新时间: 2024-06-11 17:50:04

领域: cond-mat.quant-gas,cond-mat.stat-mech,cs.LG,math.DS

下载: http://arxiv.org/abs/2406.07519v1

RudolfV: A Foundation Model by Pathologists for Pathologists

Artificial intelligence has started to transform histopathology, impacting clinical diagnostics and biomedical research. However, while many computational pathology approaches have been proposed, most current AI models are limited with respect to generalization, application variety, and handling rare diseases. Recent efforts introduced self-supervised foundation models to address these challenges, yet existing approaches do not leverage pathologist knowledge by design. In this study, we present a novel approach to designing foundation models for computational pathology, incorporating pathologist expertise, semi-automated data curation, and a diverse dataset from over 15 laboratories, including 58 tissue types, and encompassing 129 different histochemical and immunohistochemical staining modalities. We demonstrate that our model "RudolfV" surpasses existing state-of-the-art foundation models across different benchmarks focused on tumor microenvironment profiling, biomarker evaluation, and reference case search while exhibiting favorable robustness properties. Our study shows how domain-specific knowledge can increase the efficiency and performance of pathology foundation models and enable novel application areas.

Updated: 2024-06-11 17:46:38

标题: RudolfV：病理学家为病理学家建立的基础模型

摘要: 人工智能已经开始改变组织病理学，影响临床诊断和生物医学研究。然而，虽然许多计算病理学方法已被提出，但大多数当前的人工智能模型在泛化、应用多样性和处理罕见疾病方面存在局限性。最近的努力引入了自监督基础模型来解决这些挑战，然而现有方法并没有设计利用病理学家的知识。在这项研究中，我们提出了一种新颖的设计计算病理学基础模型的方法，结合了病理学家的专业知识、半自动化数据整理以及来自15多个实验室的多样化数据集，包括58种组织类型，涵盖了129种不同的组织化学和免疫组织化学染色方法。我们展示了我们的模型“RudolfV”在不同以肿瘤微环境分析、生物标志物评估和参考病例搜索为重点的各项基准测试中超越了现有的最先进基础模型，同时展示了良好的鲁棒性特性。我们的研究表明，领域特定知识如何提高病理学基础模型的效率和性能，并促进新的应用领域。

更新时间: 2024-06-11 17:46:38

领域: eess.IV,cs.CV,cs.LG

下载: http://arxiv.org/abs/2401.04079v4

Beyond Model Collapse: Scaling Up with Synthesized Data Requires Reinforcement

Synthesized data from generative models is increasingly considered as an alternative to human-annotated data for fine-tuning Large Language Models. This raises concerns about model collapse: a drop in performance of models fine-tuned on generated data. Considering that it is easier for both humans and machines to tell between good and bad examples than to generate high-quality samples, we investigate the use of feedback on synthesized data to prevent model collapse. We derive theoretical conditions under which a Gaussian mixture classification model can achieve asymptotically optimal performance when trained on feedback-augmented synthesized data, and provide supporting simulations for finite regimes. We illustrate our theoretical predictions on two practical problems: computing matrix eigenvalues with transformers and news summarization with large language models, which both undergo model collapse when trained on model-generated data. We show that training from feedback-augmented synthesized data, either by pruning incorrect predictions or by selecting the best of several guesses, can prevent model collapse, validating popular approaches like RLHF.

Updated: 2024-06-11 17:46:16

标题: 超越模型崩溃：使用合成数据进行扩展需要强化

摘要: 生成模型合成数据越来越被视为调整大型语言模型的替代选择，而不是人工注释数据。这引发了有关模型崩溃的担忧：在生成数据上进行微调的模型性能下降。考虑到人类和机器都更容易区分好的和坏的示例，而不是生成高质量的样本，我们研究了在合成数据上使用反馈以防止模型崩溃的方法。我们推导了理论条件，在这些条件下，高斯混合分类模型在训练在反馈增强的合成数据时可以实现渐近最优性能，并为有限情况提供支持模拟。我们在两个实际问题上说明了我们的理论预测：使用变压器计算矩阵特征值和使用大型语言模型进行新闻摘要，这两种情况在训练模型生成数据时都会发生模型崩溃。我们展示了通过从反馈增强的合成数据进行训练，无论是通过修剪不正确的预测还是通过选择多个猜测中的最佳猜测，都可以防止模型崩溃，验证了像RLHF这样的流行方法。

更新时间: 2024-06-11 17:46:16

领域: cs.LG,cs.AI,stat.ML

下载: http://arxiv.org/abs/2406.07515v1

Toward efficient resource utilization at edge nodes in federated learning

Federated learning (FL) enables edge nodes to collaboratively contribute to constructing a global model without sharing their data. This is accomplished by devices computing local, private model updates that are then aggregated by a server. However, computational resource constraints and network communication can become a severe bottleneck for the larger model sizes typical of deep learning applications. Edge nodes tend to have limited hardware resources (RAM, CPU), and network bandwidth and reliability at the edge are a concern for scaling federated fleet applications. In this paper, we propose and evaluate an FL strategy inspired by transfer learning in order to reduce resource utilization on devices, as well as the load on the server and network in each global training round. For each local model update, we randomly select layers to train, freezing the remaining part of the model. In doing so, we can reduce both server load and communication costs per round by excluding all untrained layer weights from being transferred to the server. The goal of this study is to empirically explore the potential trade-off between resource utilization on devices and global model convergence under the proposed strategy. We implement the approach using the federated learning framework FEDn. A number of experiments were carried out over different datasets (CIFAR-10, CASA, and IMDB), performing different tasks using different deep-learning model architectures. Our results show that training the model partially can accelerate the training process, efficiently utilize on-device resources, and reduce data transmission by around 75% and 53% when we train 25% and 50% of the model layers, respectively, without harming the resulting global model accuracy.
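
The layer-selection step described in this abstract can be sketched in a few lines. This is a minimal illustration, not the FEDn implementation: the function names, the toy model of eight equally sized layers, and the dictionary-of-lists weight format are all assumptions made here.

```python
import random

def select_trainable_layers(layer_names, fraction, rng):
    """Randomly pick a fraction of the layers to train; the rest stay frozen."""
    n_train = max(1, round(fraction * len(layer_names)))
    return set(rng.sample(layer_names, n_train))

def build_update(local_weights, trained):
    """Keep only the weights of trained layers for upload to the server."""
    return {name: w for name, w in local_weights.items() if name in trained}

def payload_size(update):
    """Number of scalar parameters contained in an update."""
    return sum(len(w) for w in update.values())

# Hypothetical toy model: 8 layers with 100 parameters each.
rng = random.Random(0)
weights = {f"layer{i}": [0.0] * 100 for i in range(8)}

trained = select_trainable_layers(list(weights), fraction=0.25, rng=rng)
update = build_update(weights, trained)
saving = 1 - payload_size(update) / payload_size(weights)
print(sorted(trained), saving)
```

With equally sized layers, training 25% of them cuts the upload by exactly 75%. In real architectures layers differ in size, which is one plausible reason the reported saving at 50% of the layers (about 53%) is not exactly one half.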

Updated: 2024-06-11 17:44:28

标题: 朝向在联邦学习中边缘节点的资源有效利用

摘要: 联邦学习（FL）使边缘节点能够协作地贡献于构建全局模型，而无需共享其数据。这是通过设备计算本地、私密模型更新，然后由服务器汇总实现的。然而，计算资源约束和网络通信对于深度学习应用中典型的更大模型尺寸可能成为严重瓶颈。边缘节点往往具有有限的硬件资源（RAM、CPU），边缘的网络带宽和可靠性是扩展联邦学习应用时的一个问题。在本文中，我们提出并评估了一种受迁移学习启发的FL策略，以减少设备资源利用率，以及减轻服务器和网络在每次全局训练轮次中的负载。对于每个本地模型更新，我们随机选择要训练的层，冻结模型的其余部分。通过这样做，我们可以通过排除所有未训练的层权重不被传输到服务器来减少每轮的服务器负载和通信成本。本研究的目标是在所提出的策略下经验性地探索设备资源利用率和全局模型收敛之间的潜在权衡。我们使用联邦学习框架FEDn实现了这一方法。在不同数据集（CIFAR-10、CASA和IMDB）上进行了一系列实验，使用不同的深度学习模型架构执行不同任务。我们的结果表明，部分训练模型可以加速训练过程，有效利用设备上的资源，并且将数据传输减少约75%和53%，当我们分别训练25%和50%的模型层时，而不影响最终全局模型的准确性。

更新时间: 2024-06-11 17:44:28

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2309.10367v2

Flow Map Matching

Generative models based on dynamical transport of measure, such as diffusion models, flow matching models, and stochastic interpolants, learn an ordinary or stochastic differential equation whose trajectories push initial conditions from a known base distribution onto the target. While training is cheap, samples are generated via simulation, which is more expensive than one-step models like GANs. To close this gap, we introduce flow map matching -- an algorithm that learns the two-time flow map of an underlying ordinary differential equation. The approach leads to an efficient few-step generative model whose step count can be chosen a-posteriori to smoothly trade off accuracy for computational expense. Leveraging the stochastic interpolant framework, we introduce losses for both direct training of flow maps and distillation from pre-trained (or otherwise known) velocity fields. Theoretically, we show that our approach unifies many existing few-step generative models, including consistency models, consistency trajectory models, progressive distillation, and neural operator approaches, which can be obtained as particular cases of our formalism. With experiments on CIFAR-10 and ImageNet 32x32, we show that flow map matching leads to high-quality samples with significantly reduced sampling cost compared to diffusion or stochastic interpolant methods.

Updated: 2024-06-11 17:41:26

标题: 流地图匹配

摘要: 基于动态测量传输的生成模型，如扩散模型、流匹配模型和随机插值器，学习普通或随机微分方程，其轨迹将初始条件从已知基础分布推送到目标分布。虽然训练成本较低，但样本是通过模拟生成的，这比像GAN这样的一步模型更昂贵。为了缩小这一差距，我们引入了流图匹配算法——一种学习基础普通微分方程两次流图的算法。这种方法导致了一个高效的几步生成模型，其步数可以事后选择，以在准确性和计算费用之间平滑地权衡。利用随机插值器框架，我们引入了用于直接训练流图和从预训练（或其他已知）速度场中提取的损失。理论上，我们展示了我们的方法统一了许多现有的几步生成模型，包括一致性模型、一致性轨迹模型、渐进蒸馏和神经操作器方法，这些方法可以作为我们形式化的特殊情况获得。通过对CIFAR-10和ImageNet 32x32的实验，我们展示了流图匹配相对于扩散或随机插值方法具有显着降低采样成本的高质量样本。

更新时间: 2024-06-11 17:41:26

领域: cs.LG,math.DS

下载: http://arxiv.org/abs/2406.07507v1

Understanding Visual Concepts Across Models

Large multimodal models such as Stable Diffusion can generate, detect, and classify new visual concepts after fine-tuning just a single word embedding. Do models learn similar words for the same concepts (i.e. <orange-cat> = orange + cat)? We conduct a large-scale analysis on three state-of-the-art models in text-to-image generation, open-set object detection, and zero-shot classification, and find that new word embeddings are model-specific and non-transferable. Across 4,800 new embeddings trained for 40 diverse visual concepts on four standard datasets, we find perturbations within an $\epsilon$-ball to any prior embedding that generate, detect, and classify an arbitrary concept. When these new embeddings are spliced into new models, fine-tuning that targets the original model is lost. We show popular soft prompt-tuning approaches find these perturbative solutions when applied to visual concept learning tasks, and embeddings for visual concepts are not transferable. Code for reproducing our work is available at: https://visual-words.github.io.

Updated: 2024-06-11 17:40:31

标题: 跨模型理解视觉概念

摘要: 大型多模型如稳定扩散能够在仅微调单个单词嵌入后生成、检测和分类新的视觉概念。模型是否会学习相似的单词来表示相同的概念（即<橙色猫> = 橙色 + 猫）？我们对三种最先进的文本到图像生成、开放式目标检测和零样本分类模型进行了大规模分析，发现新的单词嵌入是模型特定且不可转移的。在四个标准数据集上针对40个不同的视觉概念训练的4,800个新嵌入中，我们发现对于任意概念，在$\epsilon$-球范围内存在扰动，可以生成、检测和分类。当这些新的嵌入被插入新模型时，针对原始模型的微调将丢失。我们展示了流行的软提示调整方法在应用于视觉概念学习任务时会找到这些扰动解决方案，而视觉概念的嵌入是不可转移的。可以在以下网址找到用于重现我们工作的代码：https://visual-words.github.io。

更新时间: 2024-06-11 17:40:31

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2406.07506v1

TextGrad: Automatic "Differentiation" via Text

AI is undergoing a paradigm shift, with breakthroughs achieved by systems orchestrating multiple large language models (LLMs) and other complex components. As a result, developing principled and automated optimization methods for compound AI systems is one of the most important new challenges. Neural networks faced a similar challenge in their early days until backpropagation and automatic differentiation transformed the field by making optimization turn-key. Inspired by this, we introduce TextGrad, a powerful framework performing automatic "differentiation" via text. TextGrad backpropagates textual feedback provided by LLMs to improve individual components of a compound AI system. In our framework, LLMs provide rich, general, natural language suggestions to optimize variables in computation graphs, ranging from code snippets to molecular structures. TextGrad follows PyTorch's syntax and abstraction and is flexible and easy-to-use. It works out-of-the-box for a variety of tasks, where the users only provide the objective function without tuning components or prompts of the framework. We showcase TextGrad's effectiveness and generality across a diverse range of applications, from question answering and molecule optimization to radiotherapy treatment planning. Without modifying the framework, TextGrad improves the zero-shot accuracy of GPT-4o in Google-Proof Question Answering from 51% to 55%, yields a 20% relative performance gain in optimizing LeetCode-Hard coding problem solutions, improves prompts for reasoning, designs new druglike small molecules with desirable in silico binding, and designs radiation oncology treatment plans with high specificity. TextGrad lays a foundation to accelerate the development of the next generation of AI systems.

Updated: 2024-06-11 17:32:21

标题: TextGrad：通过文本自动“微分”

摘要: 人工智能正在经历一种范式转变，通过系统协调多个大型语言模型（LLMs）和其他复杂组件取得突破。因此，开发基于原则和自动化优化方法的复合人工智能系统是最重要的新挑战之一。神经网络在早期也面临类似的挑战，直到反向传播和自动微分通过使优化变得即插即用而改变了该领域。受此启发，我们介绍了TextGrad，这是一个通过文本执行自动“微分”的强大框架。TextGrad通过LLMs提供的文本反馈来改进复合人工智能系统的单个组件。在我们的框架中，LLMs提供了丰富、通用、自然语言的建议，以优化计算图中的变量，从代码片段到分子结构。TextGrad遵循PyTorch的语法和抽象，灵活易用。它可以立即用于各种任务，用户只需提供目标函数，而无需调整框架的组件或提示。我们展示了TextGrad在各种应用中的有效性和普适性，从问题回答和分子优化到放射治疗计划。在不修改框架的情况下，TextGrad将GPT-4o在Google-Proof问题回答中的零-shot准确度从51%提高到55%，在优化LeetCode-Hard编码问题解决方案方面获得了20%的相对性能提升，改进了推理的提示，设计出具有理想体外结合性的新药物小分子，并设计出具有高特异性的放射肿瘤治疗方案。TextGrad奠定了加速下一代人工智能系统开发的基础。

更新时间: 2024-06-11 17:32:21

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2406.07496v1

CADS: A Systematic Literature Review on the Challenges of Abstractive Dialogue Summarization

Abstractive dialogue summarization is the task of distilling conversations into informative and concise summaries. Although reviews have been conducted on this topic, there is a lack of comprehensive work detailing the challenges of dialogue summarization, unifying the differing understanding of the task, and aligning proposed techniques, datasets, and evaluation metrics with the challenges. This article summarizes the research on Transformer-based abstractive summarization for English dialogues by systematically reviewing 1262 unique research papers published between 2019 and 2024, relying on the Semantic Scholar and DBLP databases. We cover the main challenges present in dialogue summarization (i.e., language, structure, comprehension, speaker, salience, and factuality) and link them to corresponding techniques such as graph-based approaches, additional training tasks, and planning strategies, which typically overly rely on BART-based encoder-decoder models. We find that while some challenges, like language, have seen considerable progress, mainly due to training methods, others, such as comprehension, factuality, and salience, remain difficult and hold significant research opportunities. We investigate how these approaches are typically assessed, covering the datasets for the subdomains of dialogue (e.g., meeting, medical), the established automatic metrics and human evaluation approaches for assessing scores and annotator agreement. We observe that only a few datasets span all subdomains. The ROUGE metric is the most used, while human evaluation is frequently reported without sufficient detail on inter-annotator agreement and annotation guidelines. Additionally, we discuss the possible implications of the recently explored large language models and conclude that despite a potential shift in relevance and difficulty, our described challenge taxonomy remains relevant.

Updated: 2024-06-11 17:30:22

标题: CADS：关于抽象对话摘要挑战的系统文献综述

摘要: 摘要对话摘要是将对话内容提炼为信息丰富且简洁的总结的任务。尽管关于这一主题已经进行了评论，但仍缺乏详细研究对话摘要的挑战，统一了对任务的不同理解，并将提出的技术、数据集和评估指标与挑战对齐的综合工作。本文通过系统地审查2019年至2024年间发表的1262篇独特的基于Transformer的英语对话摘要研究论文，依赖于Semantic Scholar和DBLP数据库，总结了对话摘要的研究。我们涵盖对话总结中存在的主要挑战（即语言、结构、理解、说话者、显著性和事实性），并将其与对应的技术（如基于图的方法、额外的训练任务和规划策略）联系起来，这些技术通常过度依赖于基于BART的编码器-解码器模型。我们发现，虽然一些挑战，如语言，由于训练方法的进步已经取得了可观的进展，但其他一些挑战，如理解、事实性和显著性，仍然困难，并具有重大的研究机会。我们调查了这些方法通常如何进行评估，涵盖了对话子领域的数据集（如会议、医学）、已建立的用于评估分数和注释者一致性的自动评估指标和人类评估方法。我们观察到只有少数数据集跨越所有子领域。ROUGE指标是最常用的，而人类评估经常报告，但缺乏足够的内部注释者一致性和标注指南的细节。此外，我们讨论了最近探索的大型语言模型的可能影响，并得出结论，尽管相关性和难度可能发生变化，但我们描述的挑战分类仍然相关。

更新时间: 2024-06-11 17:30:22

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.07494v1

Exploring Meta Information for Audio-based Zero-shot Bird Classification

Advances in passive acoustic monitoring and machine learning have led to the procurement of vast datasets for computational bioacoustic research. Nevertheless, data scarcity is still an issue for rare and underrepresented species. This study investigates how meta-information can improve zero-shot audio classification, utilising bird species as an example case study due to the availability of rich and diverse meta-data. We investigate three different sources of metadata: textual bird sound descriptions encoded via (S)BERT, functional traits (AVONET), and bird life-history (BLH) characteristics. As audio features, we extract audio spectrogram transformer (AST) embeddings and project them to the dimension of the auxiliary information by adopting a single linear layer. Then, we employ the dot product as the compatibility function and a standard zero-shot learning ranking hinge loss to determine the correct class. The best results are achieved by concatenating the AVONET and BLH features, attaining a mean unweighted F1-score of 0.233 over five different test sets with 8 to 10 classes.
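
The scoring pipeline in this abstract (linear projection, dot-product compatibility, ranking hinge loss) can be sketched as follows. This is a minimal pure-Python illustration: the toy dimensions, the margin of 1.0, and summing the violations over all wrong classes are assumptions, since the abstract does not spell out the exact form of the loss.

```python
def project(audio_emb, weight, bias):
    """Single linear layer mapping an audio embedding to the metadata dimension."""
    return [sum(w * x for w, x in zip(row, audio_emb)) + b
            for row, b in zip(weight, bias)]

def compatibility(projected, class_emb):
    """Dot-product compatibility between projected audio and a class embedding."""
    return sum(p * c for p, c in zip(projected, class_emb))

def ranking_hinge_loss(scores, true_idx, margin=1.0):
    """Sum of margin violations of the wrong classes against the true class."""
    return sum(max(0.0, margin + s - scores[true_idx])
               for i, s in enumerate(scores) if i != true_idx)

def predict(scores):
    """Index of the class with the highest compatibility score."""
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical toy setup: 3-dim audio embedding, 2-dim class metadata, 3 classes.
audio_emb = [1.0, 2.0, 0.5]
weight = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
bias = [0.0, 0.0]
class_embs = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

projected = project(audio_emb, weight, bias)
scores = [compatibility(projected, c) for c in class_embs]
print(scores, predict(scores), ranking_hinge_loss(scores, true_idx=1))
```

Because the class embeddings come from metadata rather than from training audio, unseen species can be scored at test time by supplying their metadata vectors.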

Updated: 2024-06-11 17:29:51

标题: 探索音频元信息用于基于零样本鸟类分类

摘要: 被动声学监测和机器学习的进展已经导致了大量数据集的获取，用于计算生物声学研究。然而，对于稀有和代表性不足的物种，数据稀缺仍然是一个问题。本研究调查了如何利用元信息来改进零样本音频分类，以鸟类物种作为例子研究，因为鸟类具有丰富和多样化的元数据可用。我们调查了三种不同的元数据来源：通过（S）BERT编码的文本鸟鸣描述，功能性特征（AVONET）和鸟类生活史（BLH）特征。作为音频特征，我们提取音频频谱图变换器（AST）嵌入，并通过采用单个线性层将它们投影到辅助信息的维度。然后，我们采用点积作为兼容性函数和标准零样本学习排名铰链损失来确定正确的类别。通过连接AVONET和BLH特征获得的最佳结果，实现了在8到10个类别的五个不同测试集上的平均未加权F1分数为0.233。

更新时间: 2024-06-11 17:29:51

领域: cs.SD,cs.LG,eess.AS

下载: http://arxiv.org/abs/2309.08398v2

Learning Disentangled Semantic Spaces of Explanations via Invertible Neural Networks

Disentangled latent spaces usually have better semantic separability and geometrical properties, which leads to better interpretability and more controllable data generation. While this has been well investigated in Computer Vision, in tasks such as image disentanglement, in the NLP domain sentence disentanglement is still comparatively under-investigated. Most previous work has concentrated on disentangling task-specific generative factors, such as sentiment, within the context of style transfer. In this work, we focus on a more general form of sentence disentanglement, targeting the localised modification and control of more general sentence semantic features. To achieve this, we contribute a novel notion of sentence semantic disentanglement and introduce a flow-based invertible neural network (INN) mechanism integrated with a transformer-based language Autoencoder (AE) in order to deliver latent spaces with better separability properties. Experimental results demonstrate that the model can conform the distributed latent space into a better semantically disentangled sentence space, leading to improved language interpretability and controlled generation when compared to the recent state-of-the-art language VAE models.

Updated: 2024-06-11 17:29:22

标题: 通过可逆神经网络学习解释的去耦合语义空间

摘要: 解缠绕的潜在空间通常具有更好的语义可分离性和几何属性，从而导致更好的可解释性和更可控的数据生成。虽然这在计算机视觉领域得到了很好的研究，如图像解缠，但在自然语言处理领域，句子解缠仍然相对较少研究。大多数先前的工作集中在解缠任务特定的生成因素，例如情感，在样式转移的背景下。在这项工作中，我们专注于更一般形式的句子解缠，针对更一般的句子语义特征的局部修改和控制。为了实现这一点，我们提出了一个新颖的句子语义解缠概念，并引入了一个基于流的可逆神经网络（INN）机制，与一个基于变压器的语言自动编码器（AE）集成，以交付具有更好可分离性的潜在空间。实验结果表明，与最近的基于语言VAE模型相比，该模型可以将分布式潜在空间转换为更好的语义解缠的句子空间，从而提高语言可解释性和控制生成。

更新时间: 2024-06-11 17:29:22

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2305.01713v3

Towards Generalized Hydrological Forecasting using Transformer Models for 120-Hour Streamflow Prediction

This study explores the efficacy of a Transformer model for 120-hour streamflow prediction across 125 diverse locations in Iowa, US. Utilizing data from the preceding 72 hours, including precipitation, evapotranspiration, and discharge values, we developed a generalized model to predict future streamflow. Our approach contrasts with traditional methods that typically rely on location-specific models. We benchmarked the Transformer model's performance against three deep learning models (LSTM, GRU, and Seq2Seq) and the Persistence approach, employing Nash-Sutcliffe Efficiency (NSE), Kling-Gupta Efficiency (KGE), Pearson's r, and Normalized Root Mean Square Error (NRMSE) as metrics. The study reveals the Transformer model's superior performance, maintaining higher median NSE and KGE scores and exhibiting the lowest NRMSE values. This indicates its capability to accurately simulate and predict streamflow, adapting effectively to varying hydrological conditions and geographical variances. Our findings underscore the Transformer model's potential as an advanced tool in hydrological modeling, offering significant improvements over traditional and contemporary approaches.
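
For readers unfamiliar with the metrics named in this abstract, the sketch below computes NSE and KGE from their standard textbook definitions, together with a persistence baseline. The function names and sample values are illustrative assumptions; the paper may use a KGE variant, and NRMSE is omitted because its normalisation is not stated in the abstract.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def std(xs):
    """Population standard deviation."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

def pearson_r(obs, sim):
    """Pearson correlation; undefined (division by zero) for constant series."""
    mo, ms = mean(obs), mean(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim)) / len(obs)
    return cov / (std(obs) * std(sim))

def nse(obs, sim):
    """Nash-Sutcliffe Efficiency: 1 is perfect, 0 matches predicting the mean."""
    mo = mean(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    variance = sum((o - mo) ** 2 for o in obs)
    return 1 - residual / variance

def kge(obs, sim):
    """Kling-Gupta Efficiency (2009 form) from correlation, variability and bias."""
    r = pearson_r(obs, sim)
    alpha = std(sim) / std(obs)    # variability ratio
    beta = mean(sim) / mean(obs)   # bias ratio
    return 1 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

def persistence_forecast(last_observed, horizon):
    """Persistence baseline: repeat the last observed value over the horizon."""
    return [last_observed] * horizon

# Illustrative discharge values (made up for this sketch).
observed = [10.0, 12.0, 15.0, 13.0, 11.0]
baseline = persistence_forecast(last_observed=9.0, horizon=len(observed))
print(nse(observed, baseline))
```

A negative NSE, as for this baseline, means the forecast is worse than simply predicting the observed mean. KGE is not evaluated on the constant baseline because its correlation term is undefined there.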

Updated: 2024-06-11 17:26:14

标题: 朝向使用Transformer模型进行泛化水文预测，用于120小时流量预测

摘要: 这项研究探讨了Transformer模型在美国爱荷华州125个不同地点的120小时流量预测中的有效性。利用前72小时的降水、蒸散发和排放值等数据，我们开发了一个通用模型来预测未来的流量。我们的方法与通常依赖于特定位置模型的传统方法形成对比。我们将Transformer模型的性能与三种深度学习模型（LSTM、GRU和Seq2Seq）以及持续性方法进行了基准测试，采用Nash-Sutcliffe效率（NSE）、Kling-Gupta效率（KGE）、Pearson's r和归一化均方根误差（NRMSE）作为评估指标。研究表明Transformer模型表现优异，保持较高的中位数NSE和KGE分数，并展现出最低的NRMSE值。这表明它有能力准确模拟和预测流量，有效适应不同的水文条件和地理差异。我们的研究结果强调了Transformer模型作为水文建模中先进工具的潜力，相比传统和现代方法，提供了显著的改进。

更新时间: 2024-06-11 17:26:14

领域: cs.LG

下载: http://arxiv.org/abs/2406.07484v1

Comparing Deep Learning Models for Rice Mapping in Bhutan Using High Resolution Satellite Imagery

The Bhutanese government is increasing its utilization of technological approaches such as including Remote Sensing-based knowledge in their decision-making process. This study focuses on crop type and crop extent in Paro, one of the top rice-yielding districts in Bhutan, and employs publicly available NICFI high-resolution satellite imagery from Planet. Two Deep Learning (DL) approaches, a point-based (DNN) and a patch-based (U-Net) model, were used in conjunction with cloud-computing platforms. Three different models per DL approach (DNN and U-Net) were trained: 1) RGBN channels from Planet; 2) RGBN and elevation data (RGBNE); 3) RGBN and Sentinel-1 (S1) data (RGBNS), and RGBN with E and S1 data (RGBNES). From this comprehensive analysis, the U-Net displayed higher performance metrics across both model training and model validation efforts. Among the U-Net model sets, the RGBN, RGBNE, RGBNS, and RGBNES models had an F1-score of 0.8546, 0.8563, 0.8467, and 0.8500 respectively. An independent model evaluation was performed and found a high level of performance variation across all the metrics. For this independent model evaluation, the U-Net RGBN, RGBNE, RGBNES, and RGBN models displayed the F1-scores of 0.5935, 0.6154, 0.5882, and 0.6582, suggesting U-Net RGBNES as the best model. The study shows that the DL approaches can predict rice. Also, DL methods can be used with the survey-based approaches currently utilized by the Bhutan Department of Agriculture. Further, this study demonstrated the usage of regional land cover products such as SERVIR's RLCMS as a weak label approach to capture different strata, addressing the class imbalance problem and improving the sampling design for DL application. Finally, the preliminary model testing and comparisons showed that using additional features such as NDVI, EVI, and NDWI did not drastically improve model performance.

Updated: 2024-06-11 17:25:46

标题: 比较深度学习模型在不丹利用高分辨率卫星影像进行水稻制图

摘要: 不丹政府正在增加其利用技术方法，比如在决策过程中使用基于遥感的知识。本研究重点关注不丹产量最高的稻米种植区之一Paro的作物类型和作物范围，并采用了来自Planet的公开可用的高分辨率卫星图像。采用了两种深度学习（DL）方法，基于点的（DNN）和基于块的（U-Net）模型，结合云计算平台。对每种DL方法（DNN和U-Net）进行了三种不同模型的训练：1）来自Planet的RGBN通道；2）RGBN和高程数据（RGBNE）；3）RGBN和Sentinel-1（S1）数据（RGBNS），以及RGBN与E和S1数据（RGBNES）。通过这次全面分析，U-Net在模型训练和模型验证方面表现出更高的性能指标。在U-Net模型集中，RGBN、RGBNE、RGBNS和RGBNES模型的F1分别为0.8546、0.8563、0.8467和0.8500。进行了独立模型评估，并发现在所有指标上的性能变化较高。在这个独立模型评估中，U-Net RGBN、RGBNE、RGBNES和RGBN模型显示出F1分别为0.5935、0.6154、0.5882和0.6582，表明U-Net RGBNES是最佳模型。研究表明DL方法可以预测水稻。此外，DL方法可以与当前由不丹农业部门使用的基于调查的方法结合使用。此外，本研究展示了使用地区性土地覆盖产品，如SERVIR的RLCMS作为一种弱标签方法，以捕捉不同地层，解决类别不平衡问题，并改善DL应用的采样设计。最后，通过初步模型测试和比较，显示使用额外特征如NDVI、EVI和NDWI并不能显著提高模型性能。

更新时间: 2024-06-11 17:25:46

领域: cs.CV,cs.CY,cs.LG,physics.geo-ph

下载: http://arxiv.org/abs/2406.07482v1

Learning from Integral Losses in Physics Informed Neural Networks

This work proposes a solution for the problem of training physics-informed networks under partial integro-differential equations. These equations require an infinite or a large number of neural evaluations to construct a single residual for training. As a result, accurate evaluation may be impractical, and we show that naively replacing these integrals with unbiased estimates leads to biased loss functions and solutions. To overcome this bias, we investigate three types of potential solutions: the deterministic sampling approaches, the double-sampling trick, and the delayed target method. We consider three classes of PDEs for benchmarking: one defining Poisson problems with singular charges and weak solutions of up to 10 dimensions, another involving weak solutions on electromagnetic fields and a Maxwell equation, and a third one defining a Smoluchowski coagulation problem. Our numerical results confirm the existence of the aforementioned bias in practice and also show that our proposed delayed target approach can lead to accurate solutions with comparable quality to ones estimated with a large sample size integral. Our implementation is open-source and available at https://github.com/ehsansaleh/btspinn.
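
The bias mentioned in this abstract follows from a basic identity: if an integral is replaced by an unbiased estimate and the residual is then squared, the expected loss equals the true squared residual plus the variance of the estimate. The sketch below checks this numerically and shows the double-sampling remedy, which multiplies two independent residual estimates. The integrand, target value, and function names are assumptions chosen for illustration and are not from the paper's code.

```python
import random

def mc_mean(f, n, rng):
    """Unbiased Monte Carlo estimate of the integral of f over [0, 1]."""
    return sum(f(rng.random()) for _ in range(n)) / n

def naive_loss(f, target, n, rng):
    """Square of a single sampled residual: biased upward by the estimator variance."""
    return (mc_mean(f, n, rng) - target) ** 2

def double_sampling_loss(f, target, n, rng):
    """Product of two independent sampled residuals: unbiased for the squared residual."""
    first = mc_mean(f, n, rng) - target
    second = mc_mean(f, n, rng) - target
    return first * second

def integrand(u):
    return u  # its integral over [0, 1] is 0.5

rng = random.Random(0)
target = 0.5        # the true residual is therefore exactly zero
trials = 20000

naive_avg = sum(naive_loss(integrand, target, 1, rng) for _ in range(trials)) / trials
double_avg = sum(double_sampling_loss(integrand, target, 1, rng) for _ in range(trials)) / trials
print(naive_avg, double_avg)
```

With one sample per estimate, the naive loss averages close to the variance of a uniform variable (1/12) even though the true loss is zero, while the double-sampling average stays near zero. The price is a noisier loss that can be negative for individual draws.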

Updated: 2024-06-11 17:22:28

标题: 学习物理信息神经网络中的积分损失

摘要: 这项工作提出了一种解决在部分积分微分方程下训练物理信息网络的问题的解决方案。这些方程需要无限或大量神经网络评估来构建单个残差进行训练。因此，准确评估可能是不切实际的，我们表明用无偏估计替代这些积分会导致偏倚的损失函数和解决方案。为了克服这种偏倚，我们研究了三种潜在的解决方案：确定性采样方法、双重采样技巧和延迟目标方法。我们考虑了三类用于基准测试的偏微分方程：一类定义了具有奇异电荷和高达10个维度的弱解的泊松问题，另一类涉及电磁场上的弱解和麦克斯韦方程，第三类定义了一个斯莫鲁霍夫凝聚问题。我们的数值结果证实了实践中存在的前述偏差，并显示我们提出的延迟目标方法可以导致与大样本大小积分估计相当质量的准确解。我们的实现是开源的，可在https://github.com/ehsansaleh/btspinn获取。

更新时间: 2024-06-11 17:22:28

领域: cs.LG,cs.AI,cs.NA,math.NA

下载: http://arxiv.org/abs/2305.17387v2

Partially Observed Trajectory Inference using Optimal Transport and a Dynamics Prior

Trajectory inference seeks to recover the temporal dynamics of a population from snapshots of its (uncoupled) temporal marginals, i.e. where observed particles are not tracked over time. Lavenant et al. arXiv:2102.09204 addressed this challenging problem under a stochastic differential equation (SDE) model with a gradient-driven drift in the observed space, introducing a minimum entropy estimator relative to the Wiener measure. Chizat et al. arXiv:2205.07146 then provided a practical grid-free mean-field Langevin (MFL) algorithm using Schrödinger bridges. Motivated by the overwhelming success of observable state space models in the traditional paired trajectory inference problem (e.g. target tracking), we extend the above framework to a class of latent SDEs in the form of observable state space models. In this setting, we use partial observations to infer trajectories in the latent space under a specified dynamics model (e.g. the constant velocity/acceleration models from target tracking). We introduce PO-MFL to solve this latent trajectory inference problem and provide theoretical guarantees by extending the results of arXiv:2102.09204 to the partially observed setting. We leverage the MFL framework of arXiv:2205.07146, yielding an algorithm based on entropic OT between dynamics-adjusted adjacent time marginals. Experiments validate the robustness of our method and the exponential convergence of the MFL dynamics, and demonstrate significant outperformance over the latent-free method of arXiv:2205.07146 in key scenarios.

Updated: 2024-06-11 17:21:15

标题: 使用最优传输和动态先验进行部分观察的轨迹推断

摘要: 轨迹推断旨在从人口的时间边际快照中恢复其时间动态，即观察到的粒子在时间上未被跟踪。Lavenant等人在arXiv:2102.09204中针对具有梯度驱动漂移的观测空间的随机微分方程（SDE）模型解决了这一具有挑战性的问题，引入了相对于维纳测度的最小熵估计器。Chizat等人在arXiv:2205.07146中提供了一种实用的无网格均场朗之万（MFL）算法，使用Schrödinger桥连接。受传统配对轨迹推断问题（如目标跟踪）中可观察状态空间模型的巨大成功的启发，我们将上述框架扩展到一类以可观察状态空间模型形式呈现的潜在SDE中。在这种情况下，我们使用部分观测来推断指定动态模型下（如目标跟踪中的恒定速度/加速度模型）潜在空间中的轨迹。我们引入PO-MFL来解决这一潜在轨迹推断问题，并通过将arXiv:2102.09204的结果扩展到部分观测设置来提供理论保证。我们利用arXiv:2205.07146的MFL框架，提出了一种基于动态调整相邻时间边际之间的熵输运（OT）的算法。实验证实了我们方法的稳健性以及MFL动态的指数收敛，并展示了在关键场景中相比于arXiv:2205.07146的无潜在方法的显著优势。

更新时间: 2024-06-11 17:21:15

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2406.07475v1

Quantifying Local Model Validity using Active Learning

Real-world applications of machine learning models are often subject to legal or policy-based regulations. Some of these regulations require ensuring the validity of the model, i.e., the approximation error being smaller than a threshold. A global metric is generally too insensitive to determine the validity of a specific prediction, whereas evaluating local validity is costly since it requires gathering additional data. We propose learning the model error to acquire a local validity estimate while reducing the amount of required data through active learning. Using model validation benchmarks, we provide empirical evidence that the proposed method can lead to an error model with sufficient discriminative properties using a relatively small amount of data. Furthermore, an increased sensitivity to local changes of the validity bounds compared to alternative approaches is demonstrated.

Updated: 2024-06-11 17:20:28

标题: 使用主动学习量化本地模型的有效性

摘要: 机器学习模型在现实世界应用中通常受到法律或政策规定的约束。其中一些规定要求确保模型的有效性，即逼近误差低于一个阈值。全局度量通常对于确定特定预测的有效性过于迟钝，而评估局部有效性成本高昂，因为这需要收集额外数据。我们提出通过主动学习学习模型误差，以获取局部有效性估计，同时减少所需数据量。通过使用模型验证基准，我们提供实证证据表明，所提出的方法可以在相对较少的数据量下导致具有足够区分性质的误差模型。此外，相对于其他方法，我们展示了对有效性边界局部变化增加的敏感性。

更新时间: 2024-06-11 17:20:28

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2406.07474v1

Minimum discrepancy principle strategy for choosing $k$ in $k$-NN regression

We present a novel data-driven strategy to choose the hyperparameter $k$ in the $k$-NN regression estimator without using any hold-out data. We treat the problem of choosing the hyperparameter as an iterative procedure (over $k$) and propose a strategy that is easy to implement in practice, based on the idea of early stopping and the minimum discrepancy principle. This model selection strategy is proven to be minimax-optimal, under the fixed-design assumption on covariates, over some smoothness function classes, for instance, the class of Lipschitz functions on a bounded domain. The novel method often improves statistical performance on artificial and real-world data sets in comparison to other model selection strategies, such as the Hold-out method, 5-fold cross-validation, and AIC criterion. The novelty of the strategy comes from reducing the computational time of the model selection procedure while preserving the statistical (minimax) optimality of the resulting estimator. More precisely, given a sample of size $n$, if one should choose $k$ among $\left\{ 1, \ldots, n \right\}$, and $\left\{ f^1, \ldots, f^n \right\}$ are the estimators of the regression function, the minimum discrepancy principle requires calculating only a fraction of the estimators, while this is not the case for generalized cross-validation, Akaike's AIC criterion, or the Lepskii principle.
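
The early-stopping idea in this abstract can be sketched as follows. This is a rough illustration under stated assumptions rather than the paper's exact rule: the noise variance `sigma2` is taken as known, $k$ is increased from 1 and the search stops the first time the empirical risk exceeds `sigma2`, the covariates are one-dimensional on a fixed grid, and each design point counts as its own nearest neighbour. All function names are hypothetical.

```python
import math
import random

def knn_fit(xs, ys, k):
    """k-NN regression values at the design points (a point is its own neighbour)."""
    fitted = []
    for x0 in xs:
        nearest = sorted(range(len(xs)), key=lambda j: abs(xs[j] - x0))[:k]
        fitted.append(sum(ys[j] for j in nearest) / k)
    return fitted

def empirical_risk(ys, fitted):
    """Mean squared residual on the training sample."""
    return sum((y - f) ** 2 for y, f in zip(ys, fitted)) / len(ys)

def choose_k_mdp(xs, ys, sigma2):
    """Increase k from 1 and stop once the residual exceeds the noise level."""
    chosen = 1
    for k in range(1, len(xs) + 1):
        if empirical_risk(ys, knn_fit(xs, ys, k)) > sigma2:
            break          # estimators for larger k are never computed
        chosen = k
    return chosen

# Synthetic fixed-design data: a sine curve with Gaussian noise (illustrative only).
rng = random.Random(0)
n = 100
sigma = 0.3
xs = [i / n for i in range(n)]
ys = [math.sin(2 * math.pi * x) + rng.gauss(0.0, sigma) for x in xs]

chosen_k = choose_k_mdp(xs, ys, sigma2=sigma ** 2)
print(chosen_k)
```

At $k = 1$ the fit interpolates the data and the residual is zero; as $k$ grows, the fit smooths more and the residual rises. Stopping at the noise level means only the estimators up to the chosen $k$ are ever computed, which is the computational saving the abstract refers to.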

Updated: 2024-06-11 17:15:26

标题: 选择$k$值的$k$-NN回归中的最小差异原则策略

摘要: 我们提出了一种新颖的数据驱动策略，用于选择$k$-NN回归估计器中的超参数$k$，而无需使用任何留出数据。我们将选择超参数的问题视为一个迭代过程（关于$k$）并提出了一种基于早停止和最小差异原则的易于实施的策略。该模型选择策略被证明在一些平滑函数类上（例如，有界域上的Lipschitz函数类）是极小极优的，假设了协变量的固定设计。与其他模型选择策略（如留出法、5折交叉验证和AIC准则）相比，这种新颖方法通常在人工和现实世界数据集上提高了统计性能。该策略的新颖性在于减少了模型选择过程的计算时间，同时保持了结果估计器的统计（极小极优）优化性。更具体地说，给定大小为$n$的样本，如果需要在$\left\{ 1, \ldots, n \right\}$中选择$k$，并且$\left\{ f^1, \ldots, f^n \right\}$是回归函数的估计器，则最小差异原则要求计算一部分估计器，而这不适用于广义交叉验证、Akaike的AIC准则或Lepskii原则。

更新时间: 2024-06-11 17:15:26

领域: stat.ML,cs.LG,math.ST,stat.TH

下载: http://arxiv.org/abs/2008.08718v6

Formal Semantic Geometry over Transformer-based Variational AutoEncoder

Formal/symbolic semantics can provide canonical, rigid controllability and interpretability to sentence representations due to their localisation or composition property. How can we deliver such a property to the current distributional sentence representations to control and interpret the generation of language models (LMs)? In this work, we theoretically frame the sentence semantics as the composition of "semantic role - word content" features and propose the formal semantic geometry. To inject such geometry into Transformer-based LMs (i.e. GPT2), we deploy Transformer-based Variational AutoEncoder with a supervision approach, where the sentence generation can be manipulated and explained over low-dimensional latent Gaussian space. In addition, we propose a new probing algorithm to guide the movement of sentence vectors over such geometry. Experimental results reveal that the formal semantic geometry can potentially deliver better control and interpretation to sentence generation.

Updated: 2024-06-11 17:15:02

标题: 基于Transformer的变分自动编码器的形式语义几何学

摘要: 正式/符号语义学可以通过其“本地化”或“组成”属性为句子表示提供规范的、严格的可控性和可解释性。我们如何将这种属性传递给当前的分布式句子表示，以控制和解释语言模型（LMs）的生成？在这项工作中，我们理论上将句子语义框架化为“语义角色-词内容”特征的组合，并提出正式语义几何。为了将这种几何概念注入基于Transformer的LMs（即GPT2），我们采用了基于Transformer的变分自动编码器和监督方法，其中句子生成可以在低维潜在高斯空间中进行操纵和解释。此外，我们提出了一种新的探测算法，以指导句子向量在这种几何空间中的移动。实验结果显示，正式语义几何概念可能能够为句子生成提供更好的控制和解释。

更新时间: 2024-06-11 17:15:02

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2210.06230v2

Multimodal Belief Prediction

Recognizing a speaker's level of commitment to a belief is a difficult task; humans do not only interpret the meaning of the words in context, but also understand cues from intonation and other aspects of the audio signal. Many papers and corpora in the NLP community have approached the belief prediction task using text-only approaches. We are the first to frame and present results on the multimodal belief prediction task. We use the CB-Prosody corpus (CBP), containing aligned text and audio with speaker belief annotations. We first report baselines and significant features using acoustic-prosodic features and traditional machine learning methods. We then present text and audio baselines for the CBP corpus fine-tuning on BERT and Whisper respectively. Finally, we present our multimodal architecture which fine-tunes on BERT and Whisper and uses multiple fusion methods, improving on both modalities alone.

Updated: 2024-06-11 17:12:41

标题: 多模态信念预测

摘要: 识别说话者对信仰的承诺程度是一项困难的任务；人类不仅解释语境中的词语含义，还理解语调和音频信号的其他方面的线索。在自然语言处理社区中，许多论文和语料库采用纯文本方法来处理信仰预测任务。我们是第一个提出并展示多模态信仰预测任务结果的研究。我们使用包含对齐文本和音频的说话者信仰标注的CB-Prosody语料库（CBP）。我们首先利用声学-韵律特征和传统机器学习方法报告基线和重要特征。然后，我们分别在CBP语料库上对BERT和Whisper进行文本和音频基线微调。最后，我们呈现了我们的多模态架构，该架构在BERT和Whisper上进行微调，并使用多种融合方法，在单独的两种模态上都有所提升。

更新时间: 2024-06-11 17:12:41

领域: cs.CL,cs.LG,cs.SD,eess.AS

下载: http://arxiv.org/abs/2406.07466v1

Generated Contents Enrichment

In this paper, we investigate a novel artificial intelligence generation task, termed generated contents enrichment (GCE). Unlike the conventional artificial intelligence content generation task, which implicitly enriches the given textual description with limited semantics to generate visually real content, our proposed GCE strives to perform content enrichment explicitly in both the visual and textual domains, so that the enriched contents are visually real, structurally reasonable, and semantically abundant. To solve GCE, we propose a deep end-to-end method that explicitly explores the semantics and inter-semantic relationships during the enrichment. Specifically, we first model the input description as a semantic graph, wherein each node represents an object and each edge corresponds to the inter-object relationship. We then adopt Graph Convolutional Networks on top of the input scene description to predict the enriching objects and their relationships with the input objects. Finally, the enriched description is fed into an image synthesis model to carry out the visual contents generation. Our experiments conducted on the Visual Genome dataset exhibit promising and visually plausible results.

Updated: 2024-06-11 17:12:26

标题: 生成内容丰富化

摘要: 在这篇论文中，我们研究了一项新颖的人工智能生成任务，被称为生成内容丰富（GCE）。与传统的人工智能内容生成任务不同，传统任务通过有限的语义隐含地丰富给定的文本描述以生成真实的视觉内容，我们提出的GCE旨在显式地在视觉和文本领域进行内容丰富，从中得到的内容在视觉上真实，结构合理，语义丰富。为了解决GCE，我们提出了一种深度端到端的方法，明确探讨了丰富过程中的语义和语义间关系。具体而言，我们首先将输入描述建模为一个语义图，其中每个节点表示一个对象，每条边对应于对象间的关系。然后我们在输入场景描述之上采用图卷积网络来预测丰富对象及其与输入对象的关系。最后，丰富的描述被输入到图像合成模型中进行视觉内容生成。我们在Visual Genome数据集上进行的实验展示了令人期待和视觉上合理的结果。

更新时间: 2024-06-11 17:12:26

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2405.03650v2

Data-dependent Generalization Bounds via Variable-Size Compressibility

In this paper, we establish novel data-dependent upper bounds on the generalization error through the lens of a "variable-size compressibility" framework that we introduce here. In this framework, the generalization error of an algorithm is linked to a variable-size 'compression rate' of its input data. This is shown to yield bounds that depend on the empirical measure of the given input data, rather than its unknown distribution. The new generalization bounds that we establish are tail bounds, tail bounds on the expectation, and in-expectation bounds. Moreover, it is shown that our framework also allows deriving general bounds on any function of the input data and output hypothesis random variables. In particular, these general bounds are shown to subsume and possibly improve over several existing PAC-Bayes and data-dependent intrinsic dimension-based bounds that are recovered as special cases, thus unveiling a unifying character of our approach. For instance, a new data-dependent intrinsic dimension-based bound is established, which connects the generalization error to the optimization trajectories and reveals various interesting connections with the rate-distortion dimension of a process, the Rényi information dimension of a process, and the metric mean dimension.

Updated: 2024-06-11 17:12:22

标题: 通过可变大小的可压缩性实现数据相关的泛化界限

摘要: 在这篇论文中，我们通过引入新的“变尺寸可压缩性”框架，建立了关于泛化误差的新型数据相关上界。在这个框架中，算法的泛化误差与其输入数据的变尺寸“压缩率”相关联。这被证明可以产生取决于手头给定输入数据的经验度量而非其未知分布的上界。我们建立的新的泛化界限包括尾界限、期望的尾界限和期望界限。此外，我们展示了我们的框架还可以推导出对输入数据和输出假设随机变量的任何函数的一般界限。特别地，这些一般界限被证明可以包含并可能优于几种现有的PAC-Bayes和数据相关的固有维度界限，这些界限可以作为特例恢复，从而揭示了我们方法的统一特性。例如，建立了一种新的数据相关的固有维度界限，将泛化误差与优化轨迹联系起来，并揭示了与过程的速率失真维度、Rényi信息维度和度量平均维度之间的各种有趣联系。

更新时间: 2024-06-11 17:12:22

领域: stat.ML,cs.IT,cs.LG,math.IT

下载: http://arxiv.org/abs/2303.05369v3

Estimating the Hallucination Rate of Generative AI

This work is about estimating the hallucination rate for in-context learning (ICL) with Generative AI. In ICL, a conditional generative model (CGM) is prompted with a dataset and asked to make a prediction based on that dataset. The Bayesian interpretation of ICL assumes that the CGM is calculating a posterior predictive distribution over an unknown Bayesian model of a latent parameter and data. With this perspective, we define a hallucination as a generated prediction that has low probability under the true latent parameter. We develop a new method that takes an ICL problem -- that is, a CGM, a dataset, and a prediction question -- and estimates the probability that a CGM will generate a hallucination. Our method only requires generating queries and responses from the model and evaluating its response log probability. We empirically evaluate our method on synthetic regression and natural language ICL tasks using large language models.

Updated: 2024-06-11 17:01:52

标题: 估计生成式人工智能的幻觉率

摘要: 这项工作是关于使用生成式人工智能估计上下文学习（ICL）的幻觉率。在ICL中，通过数据集提示条件生成模型（CGM），并要求基于该数据集进行预测。ICL的贝叶斯解释假设CGM正在计算一个潜在参数和数据的未知贝叶斯模型的后验预测分布。从这个角度来看，我们将“幻觉”定义为在真实潜在参数下概率较低的生成预测。我们开发了一种新方法，用于估计一个ICL问题 -- 即，一个CGM、一个数据集和一个预测问题 -- 中CGM生成幻觉的概率。我们的方法只需要从模型生成查询和响应，并评估其响应对数概率。我们通过使用大型语言模型对合成回归和自然语言ICL任务进行了实证评估。

更新时间: 2024-06-11 17:01:52

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2406.07457v1

fKAN: Fractional Kolmogorov-Arnold Networks with trainable Jacobi basis functions

Recent advancements in neural network design have given rise to the development of Kolmogorov-Arnold Networks (KANs), which enhance speed, interpretability, and precision. This paper presents the Fractional Kolmogorov-Arnold Network (fKAN), a novel neural network architecture that incorporates the distinctive attributes of KANs with a trainable adaptive fractional-orthogonal Jacobi function as its basis function. By leveraging the unique mathematical properties of fractional Jacobi functions, including simple derivative formulas, non-polynomial behavior, and activity for both positive and negative input values, this approach ensures efficient learning and enhanced accuracy. The proposed architecture is evaluated across a range of tasks in deep learning and physics-informed deep learning. Precision is tested on synthetic regression data, image classification, image denoising, and sentiment analysis. Additionally, the performance is measured on various differential equations, including ordinary, partial, and fractional delay differential equations. The results demonstrate that integrating fractional Jacobi functions into KANs significantly improves training speed and performance across diverse fields and applications.

Updated: 2024-06-11 17:01:45

标题: fKAN：可训练雅可比基函数的分数Kolmogorov-Arnold网络

摘要: 最近神经网络设计的进展催生了科尔莫戈洛夫-阿诺德网络（KANs）的发展，这种网络提高了速度、可解释性和精度。本文介绍了分数科尔莫戈洛夫-阿诺德网络（fKAN），这是一种新颖的神经网络架构，将KANs的独特属性与可训练的自适应分数正交雅各比函数作为基础函数相结合。通过利用分数雅各比函数的独特数学属性，包括简单的导数公式、非多项式行为以及对正负输入值均有活性，这种方法确保了高效的学习和增强的准确性。提出的架构在深度学习和基于物理知识的深度学习的一系列任务中进行了评估。精度在合成回归数据、图像分类、图像去噪和情感分析上进行了测试。此外，还对各种微分方程进行了性能评估，包括常微分方程、偏微分方程和分数延迟微分方程。结果表明，将分数雅各比函数整合到KANs中显著提高了在不同领域和应用中的训练速度和性能。

更新时间: 2024-06-11 17:01:45

领域: cs.LG,cs.NA,math.NA

下载: http://arxiv.org/abs/2406.07456v1

Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis

In this paper, we study reinforcement learning from human feedback (RLHF) under an episodic Markov decision process with a general trajectory-wise reward model. We develop a model-free RLHF best policy identification algorithm, called $\mathsf{BSAD}$, without explicit reward model inference, which is a critical intermediate step in the contemporary RLHF paradigms for training large language models (LLMs). The algorithm identifies the optimal policy directly from human preference information in a backward manner, employing a dueling bandit sub-routine that constantly duels actions to identify the superior one. $\mathsf{BSAD}$ adopts a reward-free exploration and best-arm-identification-like adaptive stopping criteria to equalize the visitation among all states in the same decision step while moving to the previous step as soon as the optimal action is identifiable, leading to a provable, instance-dependent sample complexity $\tilde{\mathcal{O}}(c_{\mathcal{M}}SA^3H^3M\log\frac{1}{\delta})$ which resembles the result in classic RL, where $c_{\mathcal{M}}$ is the instance-dependent constant and $M$ is the batch size. Moreover, $\mathsf{BSAD}$ can be transformed into an explore-then-commit algorithm with logarithmic regret and generalized to discounted MDPs using a frame-based approach. Our results show: (i) sample-complexity-wise, RLHF is not significantly harder than classic RL and (ii) end-to-end RLHF may deliver improved performance by avoiding pitfalls in reward inference such as overfitting and distribution shift.

Updated: 2024-06-11 17:01:41

标题: 从人类反馈中学习强化学习而无需奖励推断：无模型算法和实例相关分析

摘要: 在这篇论文中，我们研究了在具有一般轨迹奖励模型的情节性马尔可夫决策过程下，从人类反馈中进行强化学习（RLHF）。我们开发了一种无模型RLHF最佳策略识别算法，称为$\mathsf{BSAD}$，没有明确的奖励模型推断，这是当代RLHF范式中用于训练大型语言模型（LLM）的关键中间步骤。该算法以反向方式直接从人类偏好信息中识别最优策略，采用一种不断决斗动作以识别优势动作的双武士子例程。$\mathsf{BSAD}$采用无奖励探索和类似最佳臂识别的自适应停止准则，以在同一决策步骤中均匀分配所有状态的访问量，同时在识别出最佳动作后尽快移动到上一步，从而导致可证明的，实例相关的样本复杂度$\tilde{\mathcal{O}}(c_{\mathcal{M}}SA^3H^3M\log\frac{1}{\delta})$，这类似于经典RL中的结果，其中$c_{\mathcal{M}}$是实例相关常数，$M$是批处理大小。此外，$\mathsf{BSAD}$可以转化为一种探索-然后-承诺算法，具有对数遗憾，并且可以通过基于帧的方法推广到折现MDP。我们的结果表明：（i）在样本复杂性方面，RLHF并不比经典RL更难，（ii）端到端RLHF可以通过避免奖励推断中的过拟合和分布转移等问题来提供改进的性能。

更新时间: 2024-06-11 17:01:41

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2406.07455v1

Open-Vocabulary Segmentation with Unpaired Mask-Text Supervision

Current state-of-the-art open-vocabulary segmentation methods typically rely on image-mask-text triplet annotations for supervision. However, acquiring such detailed annotations is labour-intensive and poses scalability challenges in complex real-world scenarios. While existing weakly-supervised approaches leverage image-text pairs to reduce the expensive annotation cost, the lack of mask supervision makes it difficult for the model to locate multiple instances and accurately group pixels with similar semantics, significantly hampering versatility and performance. In this paper, we introduce Unpair-Seg, a novel weakly-supervised open-vocabulary segmentation framework that learns from unpaired image-mask and image-text pairs, which can be independently and efficiently collected. Unpair-Seg initially predicts a set of binary masks and generates pseudo labels by identifying confident pairs of masks and text entities. We then train a feature adapter to align region embeddings with text embeddings based on these pseudo labels, achieving open-vocabulary segmentation. However, the inherent noise in the mask-entity correspondence poses a challenge to obtaining reliable pairs. To address this, we employ a vision-language large model to re-caption the input images and extract precise entities, and we design a multi-scale matching strategy to reduce noisy mask-entity pairs. Our Unpair-Seg framework demonstrates impressive performance, achieving 14.6% and 19.5% mIoU on the ADE-847 and PASCAL Context-459 datasets, significantly narrowing the gap between fully-supervised and weakly-supervised methods.

Updated: 2024-06-11 17:01:02

标题: 非配对掩码-文本监督下的开放词汇分割

摘要: 目前最先进的开放词汇细分方法通常依赖于图像-掩码-文本三元组注释来进行监督。然而，获取这样详细的注释是费时的，并在复杂的现实场景中存在可扩展性挑战。虽然现有的弱监督方法利用图像-文本对来降低庞大的注释成本，但缺乏掩码监督使模型难以定位多个实例并准确地将具有相似语义的像素分组，显著阻碍了多功能性和性能。在本文中，我们介绍了Unpair-Seg，一种新颖的弱监督开放词汇细分框架，它从未配对的图像-掩码和图像-文本对中学习，可以独立且高效地收集。Unpair-Seg首先预测一组二进制掩码，并通过识别自信的掩码和文本实体对生成伪标签。然后，我们训练一个特征适配器根据这些伪标签将区域嵌入与文本嵌入对齐，实现开放词汇细分。然而，掩码-实体对应中固有的噪声对获取可靠的对应关系构成挑战。为了解决这个问题，我们采用了一个视觉语言大模型重新描述输入图像并提取精确的实体，并设计了一个多尺度匹配策略来减少噪声的掩码-实体对。我们的Unpair-Seg框架展示了令人印象深刻的性能，分别在ADE-847和PASCAL Context-459数据集上实现了14.6%和19.5%的mIoU，显著缩小了完全监督和弱监督方法之间的差距。

更新时间: 2024-06-11 17:01:02

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2402.08960v2

An Optimism-based Approach to Online Evaluation of Generative Models

Existing frameworks for evaluating and comparing generative models typically target an offline setting, where the evaluator has access to full batches of data produced by the models. However, in many practical scenarios, the goal is to identify the best model using the fewest generated samples to minimize the costs of querying data from the models. Such an online comparison is challenging with current offline assessment methods. In this work, we propose an online evaluation framework to find the generative model that maximizes a standard assessment score among a group of available models. Our method uses an optimism-based multi-armed bandit framework to identify the model producing data with the highest evaluation score, quantifying the quality and diversity of generated data. Specifically, we study the online assessment of generative models based on the Fréchet Inception Distance (FID) and Inception Score (IS) metrics and propose the FID-UCB and IS-UCB algorithms leveraging the upper confidence bound approach in online learning. We prove sub-linear regret bounds for these algorithms and present numerical results on standard image datasets, demonstrating their effectiveness in identifying the score-maximizing generative model.
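
The optimism-based selection loop described above can be illustrated with a generic UCB1 bandit. This is only a minimal sketch under stated assumptions: it is not the FID-UCB or IS-UCB algorithm from the paper, whose confidence bounds are built on FID and IS estimates. Here each "model" is a hypothetical function that returns a per-sample quality score in [0, 1], and the names `ucb_select_model` and `make_model` are made up for illustration.

```python
import math
import random


def ucb_select_model(sample_score_fns, rounds, rng):
    """Run a generic UCB1 loop over models; return pull counts and mean scores."""
    k = len(sample_score_fns)
    pulls = [0] * k
    totals = [0.0] * k
    for t in range(1, rounds + 1):
        if t <= k:
            arm = t - 1  # query every model once before using the bound
        else:
            # Optimism: pick the model with the highest upper confidence bound.
            arm = max(
                range(k),
                key=lambda i: totals[i] / pulls[i]
                + math.sqrt(2.0 * math.log(t) / pulls[i]),
            )
        totals[arm] += sample_score_fns[arm](rng)
        pulls[arm] += 1
    means = [totals[i] / pulls[i] for i in range(k)]
    return pulls, means


def make_model(mean_quality):
    # Hypothetical stand-in for "generate one sample and score it in [0, 1]".
    return lambda rng: min(1.0, max(0.0, mean_quality + rng.uniform(-0.1, 0.1)))


models = [make_model(0.2), make_model(0.5), make_model(0.8)]
pulls, means = ucb_select_model(models, rounds=2000, rng=random.Random(0))
best = max(range(len(models)), key=lambda i: pulls[i])
print("most-queried model:", best, "pull counts:", pulls)
```

The point of the sketch is the sample allocation: weaker models are queried only often enough to rule them out, which is the cost saving the online setting aims for.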

Updated: 2024-06-11 16:57:48

标题: 一种基于乐观主义的方法用于生成模型的在线评估

摘要: 现有的用于评估和比较生成模型的框架通常针对离线环境，评估者可以访问模型生成的完整数据批次。然而，在许多实际情况下，目标是使用尽可能少的生成样本来识别最佳模型，以最小化从模型查询数据的成本。这种在线比较在当前的离线评估方法中具有挑战性。在这项工作中，我们提出了一个在线评估框架，以找到在一组可用模型中最大化标准评估分数的生成模型。我们的方法使用基于乐观主义的多臂赌博机框架来识别生成数据评分最高的模型，量化生成数据的质量和多样性。具体来说，我们基于Fréchet Inception Distance（FID）和Inception Score（IS）指标研究生成模型的在线评估，并提出了利用上置信界方法的FID-UCB和IS-UCB算法。我们证明了这些算法的次线性遗憾上界，并在标准图像数据集上展示了数值结果，证明了它们在识别最大化得分的生成模型方面的有效性。

更新时间: 2024-06-11 16:57:48

领域: cs.LG

下载: http://arxiv.org/abs/2406.07451v1

Benchmarking Vision-Language Contrastive Methods for Medical Representation Learning

We perform a comprehensive benchmarking of contrastive frameworks for learning multimodal representations in the medical domain. Through this study, we aim to answer the following research questions: (i) How transferable are general-domain representations to the medical domain? (ii) Is multimodal contrastive training sufficient, or does it benefit from unimodal training as well? (iii) What is the impact of feature granularity on the effectiveness of multimodal medical representation learning? To answer these questions, we investigate eight contrastive learning approaches under identical training setups, and train them on 2.8 million image-text pairs from four datasets, and evaluate them on 25 downstream tasks, including classification (zero-shot and linear probing), image-to-text and text-to-image retrieval, and visual question-answering. Our findings suggest a positive answer to the first question, a negative answer to the second question, and the benefit of learning fine-grained features. Finally, we make our code publicly available.

Updated: 2024-06-11 16:55:38

标题: 医学表征学习中基准视觉-语言对比方法的评估

摘要: 我们对医学领域学习多模态表示的对比框架进行了全面基准测试。通过这项研究，我们旨在回答以下研究问题：(i)通用领域表示对医学领域有多大的可迁移性？(ii)多模态对比训练是否足够，还是也需要单模态训练？(iii)特征粒度对多模态医学表示学习的有效性有何影响？为了回答这些问题，我们在相同的训练设置下研究了八种对比学习方法，并在四个数据集的280万个图像-文本对上对它们进行训练，并在25个下游任务上进行评估，包括分类（零样本和线性探针）、图像到文本和文本到图像检索，以及视觉问答。我们的发现表明对于第一个问题有积极的答案，对于第二个问题有否定的答案，并且学习精细特征的好处。最后，我们将我们的代码公开提供。

更新时间: 2024-06-11 16:55:38

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2406.07450v1

Textual Similarity as a Key Metric in Machine Translation Quality Estimation

Machine Translation (MT) Quality Estimation (QE) assesses translation reliability without reference texts. This study introduces "textual similarity" as a new metric for QE, using sentence transformers and cosine similarity to measure semantic closeness. Analyzing data from the MLQE-PE dataset, we found that textual similarity exhibits stronger correlations with human scores than traditional metrics (hter, model evaluation, etc.). Employing GAMMs as a statistical tool, we demonstrated that textual similarity consistently outperforms other metrics across multiple language pairs in predicting human scores. We also found that "hter" actually failed to predict human scores in QE. Our findings highlight the effectiveness of textual similarity as a robust QE metric, recommending its integration with other metrics into QE frameworks and MT system training for improved accuracy and usability.
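
The cosine-similarity step behind the "textual similarity" metric can be sketched as follows. This is a minimal illustration, assuming the score is the cosine between an embedding of the source sentence and an embedding of its machine translation; the three-dimensional vectors below are made-up stand-ins for sentence-transformer outputs, and no real embedding model is called.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention for a zero vector
    return dot / (norm_u * norm_v)


# Toy embeddings (illustrative values only, not model outputs).
source_vec = [0.9, 0.1, 0.3]
close_translation_vec = [0.8, 0.2, 0.3]
distant_translation_vec = [0.1, 0.9, -0.2]

close_score = cosine_similarity(source_vec, close_translation_vec)
distant_score = cosine_similarity(source_vec, distant_translation_vec)
print(close_score > distant_score)
```

A higher score is read as greater semantic closeness between source and translation, which is the quantity the study correlates with human judgments.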

Updated: 2024-06-11 16:48:17

标题: 文本相似度作为机器翻译质量评估中的关键指标

摘要: 机器翻译（MT）质量估计（QE）评估翻译可靠性，无需参考文本。本研究引入了“文本相似度”作为QE的新度量标准，利用句子转换器和余弦相似度来衡量语义接近度。通过分析MLQE-PE数据集的数据，我们发现文本相似度与人类评分之间的相关性比传统度量标准（hter，模型评估等）更强。通过将广义可加模型（GAMMs）作为统计工具，我们证明了文本相似度在多语言对中一贯优于其他度量标准，以预测人类评分。我们还发现，“hter”实际上未能预测QE中的人类评分。我们的发现突显了文本相似度作为强大QE度量标准的有效性，并建议将其与其他度量标准整合到QE框架和MT系统培训中，以提高准确性和可用性。

更新时间: 2024-06-11 16:48:17

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.07440v1

DeformTime: Capturing Variable Dependencies with Deformable Attention for Time Series Forecasting

In multivariate time series (MTS) forecasting, existing state-of-the-art deep learning approaches tend to focus on autoregressive formulations and overlook the information within exogenous indicators. To address this limitation, we present DeformTime, a neural network architecture that attempts to capture correlated temporal patterns from the input space, and hence, improve forecasting accuracy. It deploys two core operations performed by deformable attention blocks (DABs): learning dependencies across variables from different time steps (variable DAB), and preserving temporal dependencies in data from previous time steps (temporal DAB). Input data transformation is explicitly designed to enhance learning from the deformed series of information while passing through a DAB. We conduct extensive experiments on 6 MTS data sets, using previously established benchmarks as well as challenging infectious disease modelling tasks with more exogenous variables. The results demonstrate that DeformTime improves accuracy against previous competitive methods across the vast majority of MTS forecasting tasks, reducing the mean absolute error by 10% on average. Notably, performance gains remain consistent across longer forecasting horizons.

Updated: 2024-06-11 16:45:48

标题: DeformTime：利用可变形注意力捕捉时间序列预测中的变量依赖关系

摘要: 在多元时间序列（MTS）预测中，现有的先进深度学习方法往往专注于自回归形式，并忽视外生指标中的信息。为解决这一局限，我们提出了DeformTime，一种神经网络架构，旨在捕捉输入空间中的相关时间模式，从而提高预测准确性。它采用由可变形注意力块（DAB）执行的两个核心操作：学习来自不同时间步的变量之间的依赖关系（变量DAB），并保留来自先前时间步的数据中的时间依赖关系（时间DAB）。输入数据转换被明确设计为增强从变形信息系列中学习，同时通过DAB。我们对6个MTS数据集进行了大量实验，使用先前建立的基准以及具有更多外生变量的具有挑战性的传染病建模任务。结果表明，DeformTime改进了准确性，相对于之前的竞争方法，减少了平均绝对误差约10％。值得注意的是，性能提升在更长的预测时间范围内保持一致。

更新时间: 2024-06-11 16:45:48

领域: cs.LG

下载: http://arxiv.org/abs/2406.07438v1

Multi-Modal Automatic Prosody Annotation with Contrastive Pretraining of SSWP

In expressive and controllable Text-to-Speech (TTS), explicit prosodic features significantly improve the naturalness and controllability of synthesised speech. However, manual prosody annotation is labor-intensive and inconsistent. To address this issue, this paper proposes a novel two-stage automatic annotation pipeline. In the first stage, we use contrastive pretraining of Speech-Silence and Word-Punctuation (SSWP) pairs to enhance prosodic information in latent representations. In the second stage, we build a multi-modal prosody annotator, comprising pretrained encoders, a text-speech fusing scheme, and a sequence classifier. Experiments on English prosodic boundaries demonstrate that our method achieves state-of-the-art (SOTA) performance, with F1 scores of 0.72 and 0.93 for Prosodic Word and Prosodic Phrase boundaries respectively, while remaining remarkably robust to data scarcity.

Updated: 2024-06-11 16:43:11

标题: 多模式自动韵律标注：通过SSWP对比预训练

摘要: 在表达丰富且可控的文本到语音（TTS）中，显式的韵律特征显著提高了合成语音的自然度和可控性。然而，手动韵律标注是费时且不一致的。为了解决这个问题，本文提出了一个创新的两阶段自动标注流程。在第一阶段，我们使用对比性的Speech-Silence和Word-Punctuation（SSWP）对的预训练来增强潜在表示中的韵律信息。在第二阶段，我们构建了一个多模式韵律标注器，包括预训练编码器、文本-语音融合方案和序列分类器。对英语韵律边界的实验表明，我们的方法在韵律词和韵律短语边界的f1分数分别为0.72和0.93，达到了最先进的性能水平，并且对数据稀缺性具有显著的鲁棒性。

更新时间: 2024-06-11 16:43:11

领域: eess.AS,cs.AI,cs.CL,cs.SD

下载: http://arxiv.org/abs/2309.05423v2

Beware of Aliases -- Signal Preservation is Crucial for Robust Image Restoration

Image restoration networks usually comprise an encoder and a decoder, responsible for aggregating image content from noisy, distorted data and for restoring clean, undistorted images, respectively. Data aggregation and high-resolution image generation both usually risk introducing aliases, i.e., standard architectures put their ability to reconstruct the model input in jeopardy to reach high PSNR values on validation data. The price to be paid is low model robustness. In this work, we show that simply providing alias-free paths in state-of-the-art reconstruction transformers supports improved model robustness at low costs on the restoration performance. We do so by proposing BOA-Restormer, a transformer-based image restoration model that executes downsampling and upsampling operations partly in the frequency domain to ensure alias-free paths along the entire model while potentially preserving all relevant high-frequency information.
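
To make the aliasing issue concrete, the sketch below compares naive strided subsampling with downsampling done in the frequency domain on a 1D signal. It is a simplified illustration under stated assumptions, not the BOA-Restormer operator: the signal length is assumed divisible by 4, the function names are hypothetical, and a plain O(n^2) DFT is used so the example needs only the standard library.

```python
import cmath


def dft(x):
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def naive_downsample(x):
    """Keep every second sample; frequencies above the new Nyquist limit fold back."""
    return x[::2]


def spectral_downsample(x):
    """Halve the length by discarding the upper half of the spectrum first."""
    n = len(x)
    m = n // 2
    half = m // 2
    spectrum = dft(x)
    kept = spectrum[:half] + spectrum[n - half:]  # low frequencies only
    # Rescale so that retained components keep their original amplitude.
    return [value.real * m / n for value in idft(kept)]


# The highest representable frequency: +1, -1, +1, -1, ...
signal = [(-1.0) ** t for t in range(8)]
print(naive_downsample(signal))     # the oscillation is aliased to a constant
print(spectral_downsample(signal))  # the oscillation is removed (values near 0)
```

Naive subsampling turns the fast oscillation into a spurious constant signal, while the frequency-domain path drops it; content below the new Nyquist limit passes through both paths unchanged.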

Updated: 2024-06-11 16:42:17

标题: 小心别名—— 信号保留对于稳健的图像恢复至关重要

摘要: 图像恢复网络通常由编码器和解码器组成，分别负责从嘈杂、失真数据中聚合图像内容，并恢复清晰、无失真的图像。数据聚合以及高分辨率图像生成通常存在涉及混叠的风险，即标准架构将其重建模型输入的能力置于危险之中，以达到验证数据上较高的PSNR值。付出的代价是模型鲁棒性较低。在这项工作中，我们展示了在最先进的重建变换器中提供无混叠路径仅需低成本即可支持改善模型的鲁棒性，并提升恢复性能。我们通过提出BOA-Restormer，一个基于变换器的图像恢复模型，部分在频域执行降采样和上采样操作，以确保整个模型沿着无混叠路径，同时可能保留所有相关的高频信息。

更新时间: 2024-06-11 16:42:17

领域: cs.CV,cs.LG,eess.IV

下载: http://arxiv.org/abs/2406.07435v1

CTIBench: A Benchmark for Evaluating LLMs in Cyber Threat Intelligence

Cyber threat intelligence (CTI) is crucial in today's cybersecurity landscape, providing essential insights to understand and mitigate the ever-evolving cyber threats. The recent rise of Large Language Models (LLMs) has shown potential in this domain, but concerns about their reliability, accuracy, and hallucinations persist. While existing benchmarks provide general evaluations of LLMs, there are no benchmarks that address the practical and applied aspects of CTI-specific tasks. To bridge this gap, we introduce CTIBench, a benchmark designed to assess LLMs' performance in CTI applications. CTIBench includes multiple datasets focused on evaluating knowledge acquired by LLMs in the cyber-threat landscape. Our evaluation of several state-of-the-art models on these tasks provides insights into their strengths and weaknesses in CTI contexts, contributing to a better understanding of LLM capabilities in CTI.

Updated: 2024-06-11 16:42:02

标题: CTIBench：用于评估网络威胁情报中LLMs的基准测试

摘要: 网络威胁情报（CTI）在当今的网络安全领域至关重要，为理解和减轻不断演变的网络威胁提供了必要的见解。最近大规模语言模型（LLMs）的崛起显示出在这一领域具有潜力，但对它们的可靠性、准确性和幻觉的担忧仍然存在。尽管现有的基准提供了对LLMs的一般评估，但没有基准可以解决CTI特定任务的实际和应用方面。为了弥补这一差距，我们引入了CTIBench，这是一个旨在评估LLMs在CTI应用中表现的基准。CTIBench包括多个数据集，重点评估LLMs在网络威胁领域获得的知识。我们对几种最先进的模型在这些任务上的评估为我们提供了对它们在CTI环境中的优势和劣势的见解，有助于更好地理解LLMs在CTI中的能力。

更新时间: 2024-06-11 16:42:02

领域: cs.CR,cs.AI

下载: http://arxiv.org/abs/2406.07599v1

Improving Logits-based Detector without Logits from Black-box LLMs

The advent of Large Language Models (LLMs) has revolutionized text generation, producing outputs that closely mimic human writing. This blurring of lines between machine- and human-written text presents new challenges in distinguishing one from the other, a task further complicated by the frequent updates and closed nature of leading proprietary LLMs. Traditional logits-based detection methods leverage surrogate models for identifying LLM-generated content when the exact logits are unavailable from black-box LLMs. However, these methods grapple with the misalignment between the distributions of the surrogate and the often undisclosed target models, leading to performance degradation, particularly with the introduction of new, closed-source models. Furthermore, while current methodologies are generally effective when the source model is identified, they falter in scenarios where the model version remains unknown, or the test set comprises outputs from various source models. To address these limitations, we present Distribution-Aligned LLMs Detection (DALD), an innovative framework that redefines the state-of-the-art performance in black-box text detection even without logits from source LLMs. DALD is designed to align the surrogate model's distribution with that of unknown target LLMs, ensuring enhanced detection capability and resilience against rapid model iterations with minimal training investment. By leveraging corpus samples from publicly accessible outputs of advanced models such as ChatGPT, GPT-4 and Claude-3, DALD fine-tunes surrogate models to synchronize with unknown source model distributions effectively.

Updated: 2024-06-11 16:41:52

标题: 改进基于logits的检测器，无需来自黑盒LLMs的logits

摘要: 大语言模型（LLMs）的出现彻底改变了文本生成，产生的输出几乎模仿了人类写作。机器和人类写作之间的界限变得模糊，这给区分二者带来了新的挑战，尤其是在频繁更新和封闭性强的主流专有LLMs的情况下更加复杂。传统的基于logits的检测方法利用替代模型来识别LLM生成的内容，当无法从黑匣子LLMs获取准确的logits时。然而，这些方法在替代模型和通常未公开的目标模型之间的分布不一致时会面临性能下降的问题，尤其是在引入新的封闭源模型时。此外，尽管当前的方法在识别源模型时通常是有效的，但在模型版本未知的情况下或测试集包含来自各种源模型的输出时会出现困难。为了解决这些限制，我们提出了Distribution-Aligned LLMs Detection（DALD），这是一个创新的框架，即使没有来自源LLMs的logits也重新定义了黑匣子文本检测的最新性能。DALD旨在将替代模型的分布与未知目标LLMs的分布对齐，确保增强的检测能力并抵抗快速模型迭代，同时最小化培训投入。通过利用ChatGPT、GPT-4和Claude-3等先进模型的公开可访问输出的语料库样本，DALD对替代模型进行微调，有效地与未知源模型分布同步。

更新时间: 2024-06-11 16:41:52

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2406.05232v2

Label Alignment Regularization for Distribution Shift

Recent work has highlighted the label alignment property (LAP) in supervised learning, where the vector of all labels in the dataset is mostly in the span of the top few singular vectors of the data matrix. Drawing inspiration from this observation, we propose a regularization method for unsupervised domain adaptation that encourages alignment between the predictions in the target domain and its top singular vectors. Unlike conventional domain adaptation approaches that focus on regularizing representations, we instead regularize the classifier to align with the unsupervised target data, guided by the LAP in both the source and target domains. Theoretical analysis demonstrates that, under certain assumptions, our solution resides within the span of the top right singular vectors of the target domain data and aligns with the optimal solution. By removing the reliance on the commonly used optimal joint risk assumption found in classic domain adaptation theory, we showcase the effectiveness of our method in addressing problems where traditional domain adaptation methods often fall short due to high joint error. Additionally, we report improved performance over domain adaptation baselines in well-known tasks such as MNIST-USPS domain adaptation and cross-lingual sentiment analysis.
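
The label alignment property can be checked numerically on a toy dataset. The sketch below is an illustration under stated assumptions rather than the paper's regularizer: it uses only the single top left singular vector of an n-by-d data matrix (found by power iteration), and measures what fraction of the squared norm of the label vector lies along it. The function names and the data are hypothetical.

```python
import math


def normalize(v):
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def top_left_singular_vector(X, iters=100):
    """Power iteration on X X^T for an n-by-d matrix X given as a list of rows."""
    n, d = len(X), len(X[0])
    u = normalize([float(i + 1) for i in range(n)])  # fixed, non-uniform start
    for _ in range(iters):
        w = [sum(X[i][j] * u[i] for i in range(n)) for j in range(d)]   # X^T u
        u = normalize([sum(X[i][j] * w[j] for j in range(d)) for i in range(n)])  # X w
    return u


def label_alignment(X, y):
    """Fraction of ||y||^2 captured by the top left singular vector of X."""
    u = top_left_singular_vector(X)
    projection = sum(a * b for a, b in zip(u, y))
    return projection * projection / sum(b * b for b in y)


# Four samples, two features; the first feature dominates the spectrum.
X = [[2.0, 0.0], [2.0, 0.0], [0.0, 0.1], [0.0, -0.1]]
aligned_labels = [1.0, 1.0, 0.0, 0.0]
misaligned_labels = [0.0, 0.0, 1.0, -1.0]
print(label_alignment(X, aligned_labels))     # close to 1
print(label_alignment(X, misaligned_labels))  # close to 0
```

A value near 1 means the labels lie almost entirely in the dominant singular direction, which is the situation the property describes; extending the check to the top few vectors would require a full SVD routine.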

Updated: 2024-06-11 16:39:05

标题: 标签对齐正则化用于分布偏移

摘要: 最近的研究突出了监督学习中的标签对齐属性（LAP），即数据集中所有标签的向量大多在数据矩阵的前几个奇异向量的张成空间中。受到这一观察的启发，我们提出了一种无监督领域自适应的正则化方法，鼓励目标域预测与其前几个奇异向量的对齐。与传统的领域自适应方法侧重于正则化表示不同，我们改为通过LAP在源域和目标域中的引导，来使分类器与无监督目标数据对齐。理论分析表明，在一定假设下，我们的解决方案位于目标域数据的前几个右奇异向量的张成空间内，并与最优解对齐。通过消除传统领域自适应理论中常用的最优联合风险假设，我们展示了我们的方法在解决传统领域自适应方法常因高联合误差而无法解决的问题时的有效性。此外，我们在诸如MNIST-USPS领域自适应和跨语言情感分析等知名任务中报告了优于领域自适应基线的性能。

更新时间: 2024-06-11 16:39:05

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2211.14960v4

Signature Kernel Conditional Independence Tests in Causal Discovery for Stochastic Processes

Inferring the causal structure underlying stochastic dynamical systems from observational data holds great promise in domains ranging from science and health to finance. Such processes can often be accurately modeled via stochastic differential equations (SDEs), which naturally imply causal relationships via "which variables enter the differential of which other variables". In this paper, we develop a kernel-based test of conditional independence (CI) on "path-space" -- e.g., solutions to SDEs, but applicable beyond that -- by leveraging recent advances in signature kernels. We demonstrate strictly superior performance of our proposed CI test compared to existing approaches on path-space and provide theoretical consistency results. Then, we develop constraint-based causal discovery algorithms for acyclic stochastic dynamical systems (allowing for self-loops) that leverage temporal information to recover the entire directed acyclic graph. Assuming faithfulness and a CI oracle, we show that our algorithms are sound and complete. We empirically verify that our developed CI test in conjunction with the causal discovery algorithms outperforms baselines across a range of settings.

Updated: 2024-06-11 16:37:51

标题: Signature Kernel条件独立性检验在随机过程因果发现中的应用

摘要: 从观测数据中推断随机动力系统的因果结构，在从科学和健康到金融等领域都具有巨大潜力。这种过程通常可以通过随机微分方程（SDEs）准确建模，这自然地通过“哪些变量进入哪些其他变量的微分”暗示因果关系。在本文中，我们利用最近在签名核方面的进展，开发了一种基于核的条件独立性（CI）测试，用于“路径空间”——例如，SDE的解，但适用于更广泛的领域。我们证明了我们提出的CI测试在路径空间上相比现有方法具有严格优越的性能，并提供了理论上的一致性结果。然后，我们为无环随机动力系统（允许自循环）开发了基于约束的因果发现算法，利用时间信息恢复整个有向无环图。在假设忠实性和CI神谕的情况下，我们展示了我们的算法是可靠且完整的。我们经验性地验证了我们开发的CI测试与因果发现算法在各种设置下优于基准线。

更新时间: 2024-06-11 16:37:51

领域: cs.LG,cs.AI,stat.ML

下载: http://arxiv.org/abs/2402.18477v2

Making 'syscall' a Privilege not a Right

Browsers, Library OSes, and system emulators rely on sandboxes and in-process isolation to emulate system resources and securely isolate untrusted components. All access to system resources like system calls (syscall) needs to be securely mediated by the application. Otherwise, system calls may allow untrusted components to evade the emulator or sandbox monitor, and hence escape and attack the entire application or system. Existing approaches, such as ptrace, require additional context switches between kernel and userspace, which introduce high performance overhead. Moreover, seccomp-bpf supports only limited policies, which restricts its functionality, or it still requires assistance from ptrace. In this paper, we present nexpoline, a secure syscall interception mechanism combining Memory Protection Keys (MPK) and Seccomp or Syscall User Dispatch (SUD). Our approach transforms an application's syscall instruction into a privilege reserved for the trusted monitor within the address space, allowing flexible user-defined policy. To execute a syscall, the application must switch contexts via nexpoline. It offers better efficiency than secure interception techniques like ptrace, as nexpoline can intercept syscalls through binary rewriting securely. Consequently, nexpoline ensures the safety, flexibility and efficiency of syscall interception. Notably, it operates without kernel modifications, making it viable on current Linux systems without needing root privileges. Our benchmarks demonstrate improved performance over ptrace in interception overhead while achieving the same security guarantees. When compared to the similarly performing firejail, nexpoline supports more complex policies and makes it possible to emulate system resources.

Updated: 2024-06-11 16:33:56

标题: 将“系统调用”变为一种特权而非权利

摘要: 浏览器、库操作系统和系统仿真器依赖沙箱和进程内隔离来模拟系统资源并安全地隔离不受信任的组件。所有对系统资源（如系统调用）的访问都需要由应用程序安全地进行介入。否则，系统调用可能允许不受信任的组件规避仿真器或沙箱监视器，从而逃脱并攻击整个应用程序或系统。现有方法，如ptrace，需要内核和用户空间之间额外的上下文切换，这会引入高性能开销。而seccomp-bpf仅支持有限的策略，限制了其功能，或者仍需要ptrace提供帮助。在本文中，我们提出了nexpoline，一种安全的系统调用拦截机制，结合了内存保护密钥（MPK）和Seccomp或Syscall User Dispatch（SUD）。我们的方法将应用程序的系统调用指令转换为地址空间内受信任监视器保留的特权，允许灵活的用户定义策略。要执行系统调用，应用程序必须通过nexpoline切换上下文。与像ptrace这样的安全拦截技术相比，nexpoline可以通过二进制重写安全地拦截系统调用，从而确保了系统调用拦截的安全性、灵活性和效率。值得注意的是，它在不需要根权限的当前Linux系统上运行而无需对内核进行修改。我们的基准测试表明，在实现相同安全保障的同时，nexpoline在拦截开销方面性能更好。与性能相似的firejail相比，nexpoline支持更复杂的策略，并且可以模拟系统资源的可能性。

更新时间: 2024-06-11 16:33:56

领域: cs.CR

下载: http://arxiv.org/abs/2406.07429v1

GemNet: Menu-Based, Strategy-Proof Multi-Bidder Auctions Through Deep Learning

Differentiable economics uses deep learning for automated mechanism design. Despite strong progress, it has remained an open problem to learn multi-bidder, general, and fully strategy-proof (SP) auctions. We introduce GEneral Menu-based NETwork (GemNet), which significantly extends the menu-based approach of RochetNet [Dütting et al., 2023] to the multi-bidder setting. The challenge in achieving SP is to learn bidder-independent menus that are feasible, so that the optimal menu choices for each bidder do not over-allocate items when taken together (we call this menu compatibility). GemNet penalizes the failure of menu compatibility during training, and transforms learned menus after training through price changes, by considering a set of discretized bidder values and reasoning about Lipschitz smoothness to guarantee menu compatibility on the entire value space. This approach is general, leaving undisturbed trained menus that already satisfy menu compatibility and reducing to RochetNet for a single bidder. Mixed-integer linear programs are used for menu transforms and through a number of optimizations, including adaptive grids and methods to skip menu elements, we scale to large auction design problems. GemNet learns auctions with better revenue than affine maximization methods, achieves exact SP whereas previous general multi-bidder methods are approximately SP, and offers greatly enhanced interpretability.
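
The notion of menu compatibility can be illustrated with a small check for one fixed value profile. This is a sketch under stated assumptions, not GemNet's training penalty or its mixed-integer menu transform: bidders are assumed to have additive valuations, each menu is a hypothetical list of (allocation, price) pairs, every bidder may also take a null option with zero utility, and the function names are made up.

```python
def best_menu_choice(values, menu):
    """Return the utility-maximizing allocation for one bidder (null option = all zeros)."""
    best_alloc = [0.0] * len(values)
    best_utility = 0.0
    for alloc, price in menu:
        utility = sum(v * a for v, a in zip(values, alloc)) - price
        if utility > best_utility:
            best_alloc, best_utility = alloc, utility
    return best_alloc


def is_menu_compatible(bidder_values, menus, tol=1e-9):
    """True if the bidders' individually optimal choices never allocate more than one unit per item."""
    choices = [best_menu_choice(v, m) for v, m in zip(bidder_values, menus)]
    n_items = len(bidder_values[0])
    totals = [sum(choice[i] for choice in choices) for i in range(n_items)]
    return all(total <= 1.0 + tol for total in totals)


# One item, two bidders with values 0.9 and 0.8.
values = [[0.9], [0.8]]
clashing_menus = [[([1.0], 0.5)], [([1.0], 0.5)]]    # both bidders buy the item
repriced_menus = [[([1.0], 0.5)], [([1.0], 0.85)]]   # second bidder now prefers the null option
print(is_menu_compatible(values, clashing_menus))  # False
print(is_menu_compatible(values, repriced_menus))  # True
```

The second case mirrors the idea of restoring feasibility through price changes: raising one price makes the clashing choice unattractive without touching the other bidder's menu.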

Updated: 2024-06-11 16:30:30

标题: GemNet：基于菜单的、通过深度学习实现的无策略性多竞标者拍卖

摘要: 可微经济学使用深度学习进行自动机制设计。尽管取得了强大的进展，但学习多竞标者、通用和完全策略稳健(SP)拍卖仍然是一个开放问题。我们引入了GEneral Menu-based NETwork (GemNet)，它显著扩展了RochetNet的基于菜单的方法[Dütting等，2023]到多竞标者设置中。实现SP的挑战在于学习与竞标者无关的可行菜单，使得每个竞标者的最佳菜单选择在一起时不会过度分配物品（我们称之为菜单兼容性）。GemNet在训练期间对菜单兼容性的失败进行惩罚，并通过考虑一组离散化的竞标者价值并推理利普希茨平滑性，来通过价格变化在训练后转换学习到的菜单，以保证整个价值空间上的菜单兼容性。这种方法是通用的，对已经满足菜单兼容性的训练菜单不做改动，并且在单个竞标者情况下减少到RochetNet。混合整数线性规划被用于菜单转换，并通过一系列优化，包括自适应网格和跳过菜单元素的方法，我们可以扩展到大型拍卖设计问题。GemNet学习到的拍卖收入比仿射最大化方法更好，实现了精确的SP，而先前的通用多竞标者方法是近似的SP，并且提供了更强大的可解释性。

更新时间: 2024-06-11 16:30:30

领域: cs.GT,cs.AI,cs.LG

下载: http://arxiv.org/abs/2406.07428v1

Errors are Robustly Tamed in Cumulative Knowledge Processes

We study processes of societal knowledge accumulation, where the validity of a new unit of knowledge depends both on the correctness of its derivation and on the validity of the units it depends on. A fundamental question in this setting is: If a constant fraction of the new derivations is wrong, can investing a constant fraction, bounded away from one, of effort ensure that a constant fraction of knowledge in society is valid? Ben-Eliezer, Mikulincer, Mossel, and Sudan (ITCS 2023) introduced a concrete probabilistic model to analyze such questions and showed an affirmative answer to this question. Their study, however, focuses on the simple case where each new unit depends on just one existing unit, and units attach according to a preferential attachment rule. In this work, we consider much more general families of cumulative knowledge processes, where new units may attach according to varied attachment mechanisms and depend on multiple existing units. We also allow a (random) fraction of insertions of adversarial nodes. We give a robust affirmative answer to the above question by showing that for all of these models, as long as many of the units follow simple heuristics for checking a bounded number of units they depend on, all errors will be eventually eliminated. Our results indicate that preserving the quality of large interdependent collections of units of knowledge is feasible, as long as careful but not too costly checks are performed when new units are derived/deposited.

Updated: 2024-06-11 16:28:28

标题: 错误在累积知识过程中被稳健地控制

摘要: 我们研究社会知识积累的过程，其中一个新的知识单元的有效性取决于其推导的正确性以及其所依赖的单元的有效性。在这种情况下的一个基本问题是：如果新的推导中有一个恒定比例是错误的，那么投入一个恒定比例（且远离一的）的努力是否可以确保社会中的一个恒定比例的知识是有效的？Ben-Eliezer，Mikulincer，Mossel和Sudan（ITCS 2023）引入了一个具体的概率模型来分析这类问题，并对这个问题给出了肯定的答案。然而，他们的研究集中在一个简单的情况，即每个新单元仅依赖于一个现有单元，并且单元根据“优先附着规则”附加。在这项工作中，我们考虑了更一般的累积知识过程家族，其中新单元可能根据不同的附加机制附加，并依赖于多个现有单元。我们还允许（随机）插入对抗节点的一部分。通过展示对于所有这些模型，只要许多单元遵循简单的启发式方法来检查它们所依赖的有限数量的单元，所有的错误最终都会被消除，我们给出了对上述问题的强有力的肯定答案。我们的结果表明，只要在衍生/存储新单元时进行谨慎但不会过于昂贵的检查，就可以保持大量相互依赖的知识单元的质量。

更新时间: 2024-06-11 16:28:28

领域: cs.AI,cs.DS,cs.SI,math.PR

下载: http://arxiv.org/abs/2309.05638v3

Nash Learning from Human Feedback

Reinforcement learning from human feedback (RLHF) has emerged as the main paradigm for aligning large language models (LLMs) with human preferences. Typically, RLHF involves the initial step of learning a reward model from human feedback, often expressed as preferences between pairs of text generations produced by a pre-trained LLM. Subsequently, the LLM's policy is fine-tuned by optimizing it to maximize the reward model through a reinforcement learning algorithm. However, an inherent limitation of current reward models is their inability to fully represent the richness of human preferences and their dependency on the sampling distribution. In this study, we introduce an alternative pipeline for the fine-tuning of LLMs using pairwise human feedback. Our approach entails the initial learning of a preference model, which is conditioned on two inputs given a prompt, followed by the pursuit of a policy that consistently generates responses preferred over those generated by any competing policy, thus defining the Nash equilibrium of this preference model. We term this approach Nash learning from human feedback (NLHF). In the context of a tabular policy representation, we present a novel algorithmic solution, Nash-MD, founded on the principles of mirror descent. This algorithm produces a sequence of policies, with the last iteration converging to the regularized Nash equilibrium. Additionally, we explore parametric representations of policies and introduce gradient descent algorithms for deep-learning architectures. To demonstrate the effectiveness of our approach, we present experimental results involving the fine-tuning of an LLM for a text summarization task. We believe NLHF offers a compelling avenue for preference learning and policy optimization with the potential of advancing the field of aligning LLMs with human preferences.
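
As a rough illustration of the equilibrium idea in this abstract, the sketch below runs a plain exponentiated-gradient (mirror descent) self-play update on a small tabular preference matrix. It is not the paper's Nash-MD algorithm: there is no regularization toward a reference policy, and the matrix `P`, the step size `eta`, and the function name `self_play_mirror_descent` are assumptions made for this example only.

```python
import numpy as np

def self_play_mirror_descent(P, eta=0.5, steps=500):
    """Simplified self-play sketch (an assumption, not Nash-MD itself).

    P[i, j] is the assumed probability that response i is preferred over j.
    """
    n = P.shape[0]
    policy = np.full(n, 1.0 / n)           # start from the uniform policy
    for _ in range(steps):
        win_prob = P @ policy              # chance each response beats the current policy
        logits = np.log(policy) + eta * win_prob
        logits -= logits.max()             # stabilize the softmax
        policy = np.exp(logits)
        policy /= policy.sum()
    return policy

# Toy preference matrix: response 0 beats every other response 90% of the time.
P = np.array([[0.5, 0.9, 0.9],
              [0.1, 0.5, 0.5],
              [0.1, 0.5, 0.5]])
policy = self_play_mirror_descent(P)
```

In this toy case response 0 is preferred over every alternative, so the policy concentrates on it; preference matrices with cycles would instead lead to a mixed policy.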

Updated: 2024-06-11 16:25:52

标题: Nash 从人类反馈中学习

摘要: 人类反馈的强化学习（RLHF）已经成为将大型语言模型（LLMs）与人类偏好对齐的主要范式。通常，RLHF 包括从人类反馈中学习奖励模型的初始步骤，通常表达为对由预训练的LLM生成的文本生成对之间的偏好。随后，通过强化学习算法优化LLM的策略，以最大化奖励模型。然而，当前奖励模型的固有局限性在于无法充分代表人类偏好的丰富性以及它们对采样分布的依赖。在这项研究中，我们介绍了一种使用两两人类反馈进行LLMs微调的替代流程。我们的方法包括首先学习一个偏好模型，该模型在给定提示的情况下对两个输入进行条件化，然后追求一种策略，该策略始终生成优于任何竞争策略生成的响应，从而定义了该偏好模型的纳什均衡。我们将这种方法称为从人类反馈中的纳什学习（NLHF）。在表格策略表示的背景下，我们提出了一种基于镜像下降原则的新颖算法解决方案，Nash-MD。该算法生成一系列策略，最后一次迭代收敛到正则化的纳什均衡。此外，我们探索了策略的参数表示，并引入了用于深度学习体系结构的梯度下降算法。为了展示我们方法的有效性，我们呈现了涉及LLM文本摘要任务微调的实验结果。我们相信NLHF提供了一条引人注目的途径，用于偏好学习和策略优化，有望推动将LLMs与人类偏好对齐的领域的发展。

更新时间: 2024-06-11 16:25:52

领域: stat.ML,cs.AI,cs.GT,cs.LG,cs.MA

下载: http://arxiv.org/abs/2312.00886v4

Can ChatGPT Detect DeepFakes? A Study of Using Multimodal Large Language Models for Media Forensics

DeepFakes, which refer to AI-generated media content, have become an increasing concern due to their use as a means for disinformation. DeepFake detection is currently addressed with programmed machine learning algorithms. In this work, we investigate the capabilities of multimodal large language models (LLMs) in DeepFake detection. We conducted qualitative and quantitative experiments to evaluate multimodal LLMs and show that they can expose AI-generated images through careful experimental design and prompt engineering. This is interesting, considering that LLMs are not inherently tailored for media forensic tasks, and the process does not require programming. We discuss the limitations of multimodal LLMs for these tasks and suggest possible improvements.

Updated: 2024-06-11 16:24:45

标题: ChatGPT能检测深度伪造视频吗？一项利用多模态大型语言模型进行媒体取证的研究

摘要: DeepFakes指的是人工智能生成的媒体内容，由于它们被用作散布虚假信息的手段，已经成为一个日益关注的问题。目前，检测DeepFakes的方法是使用编程机器学习算法。在这项工作中，我们研究了多模态大型语言模型（LLMs）在DeepFake检测中的能力。我们进行了定性和定量实验，展示了多模态LLMs可以通过仔细的实验设计和及时的工程手段揭示人工智能生成的图像。这是有趣的，考虑到LLMs并非专门为媒体取证任务而设计，且该过程不需要编程。我们讨论了多模态LLMs在这些任务中的局限性，并提出了可能的改进方案。

更新时间: 2024-06-11 16:24:45

领域: cs.AI,cs.CR

下载: http://arxiv.org/abs/2403.14077v4

Beyond ELBOs: A Large-Scale Evaluation of Variational Methods for Sampling

Monte Carlo methods, Variational Inference, and their combinations play a pivotal role in sampling from intractable probability distributions. However, current studies lack a unified evaluation framework, relying on disparate performance measures and limited method comparisons across diverse tasks, complicating the assessment of progress and hindering the decision-making of practitioners. In response to these challenges, our work introduces a benchmark that evaluates sampling methods using a standardized task suite and a broad range of performance criteria. Moreover, we study existing metrics for quantifying mode collapse and introduce novel metrics for this purpose. Our findings provide insights into strengths and weaknesses of existing sampling methods, serving as a valuable reference for future developments. The code is publicly available.

Updated: 2024-06-11 16:23:33

标题: 超越ELBOs：大规模评估采样变分方法

摘要: 蒙特卡洛方法、变分推断以及它们的组合在从难以处理的概率分布中抽样中起着至关重要的作用。然而，当前的研究缺乏统一的评估框架，依赖于不同的性能指标和在不同任务中的有限方法比较，使得评估进展变得复杂，并妨碍从业者的决策。针对这些挑战，我们的工作引入了一个基准，使用标准化的任务套件和广泛的性能标准来评估抽样方法。此外，我们研究了用于量化模式崩溃的现有度量标准，并引入了新颖的度量标准。我们的发现揭示了现有抽样方法的优势和劣势，为未来的发展提供了宝贵的参考。代码在此处公开提供。

更新时间: 2024-06-11 16:23:33

领域: cs.LG,cs.AI,stat.ML

下载: http://arxiv.org/abs/2406.07423v1

Enhanced Gene Selection in Single-Cell Genomics: Pre-Filtering Synergy and Reinforced Optimization

Recent advancements in single-cell genomics necessitate precision in gene panel selection to interpret complex biological data effectively. Those methods aim to streamline the analysis of scRNA-seq data by focusing on the most informative genes that contribute significantly to the specific analysis task. Traditional selection methods, which often rely on expert domain knowledge, embedded machine learning models, or heuristic-based iterative optimization, are prone to biases and inefficiencies that may obscure critical genomic signals. Recognizing the limitations of traditional methods, we aim to transcend these constraints with a refined strategy. In this study, we introduce an iterative gene panel selection strategy that is applicable to clustering tasks in single-cell genomics. Our method uniquely integrates results from other gene selection algorithms, providing valuable preliminary boundaries or prior knowledge as initial guides in the search space to enhance the efficiency of our framework. Furthermore, we incorporate the stochastic nature of the exploration process in reinforcement learning (RL) and its capability for continuous optimization through reward-based feedback. This combination mitigates the biases inherent in the initial boundaries and harnesses RL's adaptability to refine and target gene panel selection dynamically. To illustrate the effectiveness of our method, we conducted detailed comparative experiments, case studies, and visualization analysis.

Updated: 2024-06-11 16:21:33

标题: 单细胞基因组学中增强的基因选择：预过滤协同作用和强化优化

摘要: 最近单细胞基因组学的进展要求在基因面板选择方面精确以有效解释复杂的生物数据。这些方法旨在通过专注于对特定分析任务有显著贡献的最具信息量的基因，简化对scRNA-seq数据的分析。传统的选择方法通常依赖于专家领域知识、嵌入式机器学习模型或基于启发式迭代优化，容易受到偏见和低效率的影响，可能会掩盖关键的基因组信号。认识到传统方法的局限性，我们旨在通过精细的策略超越这些限制。在这项研究中，我们介绍了一种适用于单细胞基因组学聚类任务的迭代基因面板选择策略。我们的方法独特地集成了其他基因选择算法的结果，提供有价值的初步边界或先验知识作为搜索空间中的初始指导，以增强我们框架的效率。此外，我们将探索过程中的随机性贯穿于强化学习（RL）中，并通过基于奖励的反馈实现连续优化。这种组合减轻了初始边界固有的偏见，并利用RL的适应性动态地改进和定位基因面板选择。为了说明我们方法的有效性，我们进行了详细的比较实验、案例研究和可视化分析。

更新时间: 2024-06-11 16:21:33

领域: cs.AI,cs.LG,q-bio.GN

下载: http://arxiv.org/abs/2406.07418v1

Holistic Memory Diversification for Incremental Learning in Growing Graphs

This paper addresses the challenge of incremental learning in growing graphs with increasingly complex tasks. The goal is to continually train a graph model to handle new tasks while retaining its inference ability on previous tasks. Existing methods usually neglect the importance of memory diversity, which limits their ability to select high-quality memory from previous tasks and to remember broad previous knowledge within the scarce memory on graphs. To address that, we introduce a novel holistic Diversified Memory Selection and Generation (DMSG) framework for incremental learning in graphs. It first introduces a buffer selection strategy that considers both intra-class and inter-class diversities, employing an efficient greedy algorithm for sampling representative training nodes from graphs into memory buffers after learning each new task. Then, to adequately rememorize the knowledge preserved in the memory buffer when learning new tasks, we propose a diversified memory generation replay method. This method first utilizes a variational layer to generate the distribution of buffer node embeddings and sample synthesized ones for replaying. Furthermore, an adversarial variational embedding learning method and a reconstruction-based decoder are proposed to maintain the integrity and consolidate the generalization of the synthesized node embeddings, respectively. Finally, we evaluate our model on node classification tasks involving increasing class numbers. Extensive experimental results on publicly accessible datasets demonstrate the superiority of DMSG over state-of-the-art methods.

Updated: 2024-06-11 16:18:15

标题: 在日益增长的图中进行渐进学习的整体记忆多样化

摘要: 本文讨论了在不断增长的图中进行增量学习的挑战，这些图具有越来越复杂的任务。目标是持续训练一个图模型，以处理新任务，同时保留其对先前任务的推理能力。现有方法通常忽视记忆多样性的重要性，限制了有效地从先前任务中选择高质量记忆，并在图上稀缺的记忆中记住广泛的先前知识。为了解决这个问题，我们引入了一个新颖的全面多样化记忆选择和生成（DMSG）框架，用于图中的增量学习，该框架首先引入了一个缓冲选择策略，考虑了类内和类间的多样性，使用高效的贪婪算法从图中抽样代表性训练节点到每次学习新任务后的内存缓冲区。然后，在学习新任务时充分记忆保留在记忆缓冲区中的知识，我们提出了一种多样化记忆生成重放方法。该方法首先利用变分层生成缓冲节点嵌入的分布，并抽样合成节点用于重放。此外，提出了一种对抗变分嵌入学习方法和基于重构的解码器，用于分别维护合成节点嵌入的完整性和巩固泛化。最后，我们在涉及不断增加的类数的节点分类任务上评估我们的模型。公开可访问的数据集上的大量实验结果表明DMSG优于最先进的方法。

更新时间: 2024-06-11 16:18:15

领域: cs.LG

下载: http://arxiv.org/abs/2406.07413v1

On the Convergence of Loss and Uncertainty-based Active Learning Algorithms

We investigate the convergence rates and data sample sizes required for training a machine learning model using a stochastic gradient descent (SGD) algorithm, where data points are sampled based on either their loss value or uncertainty value. These training methods are particularly relevant for active learning and data subset selection problems. For SGD with a constant step size update, we present convergence results for linear classifiers and linearly separable datasets using squared hinge loss and similar training loss functions. Additionally, we extend our analysis to more general classifiers and datasets, considering a wide range of loss-based sampling strategies and smooth convex training loss functions. We propose a novel algorithm called Adaptive-Weight Sampling (AWS) that utilizes SGD with an adaptive step size that achieves stochastic Polyak's step size in expectation. We establish convergence rate results for AWS for smooth convex training loss functions. Our numerical experiments demonstrate the efficiency of AWS on various datasets by using either exact or estimated loss values.
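
To make the ingredients named in this abstract concrete, the sketch below combines loss-proportional sampling with a Polyak-type step size on the squared hinge loss for a linear classifier. It is a simplified illustration, not the paper's AWS algorithm: the sampling rule, the assumption that the optimal per-sample loss is zero (linearly separable data), and the names `make_separable_data`, `squared_hinge`, and `train` are all choices made for this example.

```python
import numpy as np

def make_separable_data(n=200, seed=0):
    """Toy linearly separable data; w = (1, 0) attains zero squared hinge loss."""
    rng = np.random.default_rng(seed)
    y = rng.choice([-1.0, 1.0], size=n)
    X = np.column_stack([y * (1.0 + np.abs(rng.normal(size=n))),
                         0.3 * rng.normal(size=n)])
    return X, y

def squared_hinge(w, X, y):
    """Per-sample squared hinge loss max(0, 1 - y * <w, x>)^2."""
    return np.maximum(0.0, 1.0 - y * (X @ w)) ** 2

def train(X, y, steps=500, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        losses = squared_hinge(w, X, y)
        total = losses.sum()
        if total == 0.0:                    # every point already has zero loss
            break
        # Loss-based sampling: pick a point with probability proportional to its loss.
        i = rng.choice(len(y), p=losses / total)
        deficit = max(0.0, 1.0 - y[i] * (X[i] @ w))
        grad = -2.0 * deficit * y[i] * X[i]
        # Polyak-type step: (loss - optimal loss) / ||grad||^2, with optimal loss assumed 0.
        step = losses[i] / (grad @ grad)
        w -= step * grad
    return w

X, y = make_separable_data()
w = train(X, y)
```

Because a sampled point always has positive loss, its gradient is nonzero and the step size is well defined.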

Updated: 2024-06-11 16:17:57

标题: 关于损失和基于不确定性的主动学习算法的收敛性

摘要: 我们调查了使用随机梯度下降（SGD）算法训练机器学习模型所需的收敛速度和数据样本大小，其中数据点根据其损失值或不确定性值进行抽样。这些训练方法在主动学习和数据子集选择问题中特别相关。对于具有恒定步长更新的SGD，我们提出了线性分类器和线性可分数据集使用二次铰链损失和类似训练损失函数的收敛结果。此外，我们将分析扩展到更一般的分类器和数据集，考虑基于各种损失的抽样策略和平滑凸训练损失函数。我们提出了一种名为自适应权重抽样（AWS）的新算法，利用具有自适应步长的SGD，在期望中实现随机Polyak步长。我们为AWS在平滑凸训练损失函数上建立了收敛速度结果。我们的数值实验通过使用精确或估计的损失值，在各种数据集上展示了AWS的效率。

更新时间: 2024-06-11 16:17:57

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2312.13927v3

Accelerating Ill-conditioned Hankel Matrix Recovery via Structured Newton-like Descent

This paper studies the robust Hankel recovery problem, which simultaneously removes sparse outliers and fills in missing entries from partial observations. We propose a novel non-convex algorithm, coined Hankel Structured Newton-Like Descent (HSNLD), to tackle the robust Hankel recovery problem. HSNLD is highly efficient with linear convergence, and its convergence rate is independent of the condition number of the underlying Hankel matrix. The recovery guarantee has been established under some mild conditions. Numerical experiments on both synthetic and real datasets show the superior performance of HSNLD against state-of-the-art algorithms.
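
For readers unfamiliar with the structure involved, the sketch below builds a Hankel matrix from a 1D signal and checks that a sum of two exponentials yields a rank-2 matrix, which is the kind of low-rank structure that Hankel recovery methods exploit. It does not implement HSNLD; the helper name `hankel_matrix` and the test signal are assumptions for illustration.

```python
import numpy as np

def hankel_matrix(signal, rows):
    """Return H with H[i, j] = signal[i + j]."""
    signal = np.asarray(signal)
    cols = len(signal) - rows + 1
    idx = np.arange(rows)[:, None] + np.arange(cols)[None, :]
    return signal[idx]

# A sum of two decaying exponentials sampled at 11 points.
k = np.arange(11)
signal = 0.9 ** k + 0.5 ** k
H = hankel_matrix(signal, rows=6)

# Numerical rank with an explicit tolerance; two exponentials give rank 2.
rank = np.linalg.matrix_rank(H, tol=1e-8)
```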

Updated: 2024-06-11 16:14:30

标题: 通过结构化类牛顿下降加速病态Hankel矩阵恢复

摘要: 本文研究了鲁棒Hankel恢复问题，该问题同时移除稀疏异常值并填补部分观测中的缺失条目。我们提出了一种新颖的非凸算法，命名为Hankel结构化牛顿样式下降（HSNLD），以解决鲁棒Hankel恢复问题。HSNLD具有高效率和线性收敛性，其收敛速度与基础Hankel矩阵的条件数无关。在一些温和条件下已建立了恢复保证。对合成和真实数据集的数值实验显示了HSNLD相对于最先进算法的卓越性能。

更新时间: 2024-06-11 16:14:30

领域: stat.ML,cs.IT,cs.LG,eess.SP,math.IT,math.OC,15A29, 15A83, 47B35, 90C17, 90C26, 90C53

下载: http://arxiv.org/abs/2406.07409v1

Private Geometric Median

In this paper, we study differentially private (DP) algorithms for computing the geometric median (GM) of a dataset: Given $n$ points, $x_1,\dots,x_n$ in $\mathbb{R}^d$, the goal is to find a point $\theta$ that minimizes the sum of the Euclidean distances to these points, i.e., $\sum_{i=1}^{n} \|\theta - x_i\|_2$. Off-the-shelf methods, such as DP-GD, require strong a priori knowledge locating the data within a ball of radius $R$, and the excess risk of the algorithm depends linearly on $R$. In this paper, we ask: can we design an efficient and private algorithm with an excess error guarantee that scales with the (unknown) radius containing the majority of the datapoints? Our main contribution is a pair of polynomial-time DP algorithms for the task of private GM with an excess error guarantee that scales with the effective diameter of the datapoints. Additionally, we propose an inefficient algorithm based on the inverse smooth sensitivity mechanism, which satisfies the more restrictive notion of pure DP. We complement our results with a lower bound and demonstrate the optimality of our polynomial-time algorithms in terms of sample complexity.
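
The objective $\sum_{i=1}^{n} \|\theta - x_i\|_2$ can be made concrete with a short non-private sketch. The code below uses the classical Weiszfeld iteration, which is not one of the paper's DP algorithms and offers no privacy guarantee; the function names, iteration count, and `eps` guard are assumptions for this example.

```python
import numpy as np

def objective(theta, points):
    """Sum of Euclidean distances from theta to every point."""
    return np.linalg.norm(np.asarray(points, dtype=float) - theta, axis=1).sum()

def geometric_median(points, iters=200, eps=1e-12):
    """Non-private Weiszfeld iteration (illustrative only)."""
    points = np.asarray(points, dtype=float)
    theta = points.mean(axis=0)                 # start from the mean
    for _ in range(iters):
        dist = np.linalg.norm(points - theta, axis=1)
        weights = 1.0 / np.maximum(dist, eps)   # guard against division by zero
        theta = (weights[:, None] * points).sum(axis=0) / weights.sum()
    return theta

# Four clustered points and one distant outlier.
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [100.0, 100.0]])
theta = geometric_median(points)
```

On this toy input the mean is pulled far toward the outlier, while the geometric median stays near the cluster, which is why its error is naturally measured against the radius containing most of the data.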

Updated: 2024-06-11 16:13:09

标题: 私有几何中位数

摘要: 在这篇论文中，我们研究了用于计算数据集的几何中位数（GM）的差分隐私（DP）算法：给定$n$个点$x_1,\dots,x_n$在$\mathbb{R}^d$中，目标是找到一个点$\theta$，使得该点到这些点的欧氏距离之和最小，即$\sum_{i=1}^{n} \|\theta - x_i\|_2$。现成的方法，如DP-GD，需要强大的先验知识来确定数据在半径为$R$的球中，算法的过度风险与$R$成线性关系。在这篇论文中，我们提出了一个问题：我们能否设计一个有效且私密的算法，其过度误差保证随着包含大多数数据点的（未知）半径而缩放？我们的主要贡献是一对用于私密GM任务的多项式时间DP算法，其过度误差保证随着数据点的有效直径而缩放。此外，我们提出了一种基于逆平滑灵敏度机制的低效算法，满足更严格的纯DP概念。我们通过一个下界和对样本复杂性的优化证明了我们多项式时间算法的最优性。

更新时间: 2024-06-11 16:13:09

领域: cs.LG

下载: http://arxiv.org/abs/2406.07407v1

Enhancing Tabular Data Optimization with a Flexible Graph-based Reinforced Exploration Strategy

Tabular data optimization methods aim to automatically find an optimal feature transformation process that generates high-value features and improves the performance of downstream machine learning tasks. Current frameworks for automated feature transformation rely on iterative sequence generation tasks, optimizing decision strategies through performance feedback from downstream tasks. However, these approaches fail to effectively utilize historical decision-making experiences and overlook potential relationships among generated features, thus limiting the depth of knowledge extraction. Moreover, the granularity of the decision-making process lacks dynamic backtracking capabilities for individual features, leading to insufficient adaptability when encountering inefficient pathways, adversely affecting overall robustness and exploration efficiency. To address the limitations observed in current automatic feature engineering frameworks, we introduce a novel method that utilizes a feature-state transformation graph to effectively preserve the entire feature transformation journey, where each node represents a specific transformation state. During exploration, three cascading agents iteratively select nodes and mathematical operations to generate new transformation states. This strategy leverages the inherent properties of the graph structure, allowing for the preservation and reuse of valuable transformations. It also enables backtracking capabilities through graph pruning techniques, which can rectify inefficient transformation paths. To validate the efficacy and flexibility of our approach, we conducted comprehensive experiments and detailed case studies, demonstrating superior performance in diverse scenarios.

Updated: 2024-06-11 16:10:37

标题: 用灵活的基于图的强化探索策略增强表格数据优化

摘要: 表格数据优化方法旨在自动找到一个能够生成高价值特征并提高下游机器学习任务性能的最佳特征转换过程。当前用于自动特征转换的框架依赖于迭代序列生成任务，通过来自下游任务的性能反馈来优化决策策略。然而，这些方法未能有效利用历史决策经验，忽视了生成特征之间的潜在关系，从而限制了知识提取的深度。此外，决策过程的粒度缺乏对单个特征的动态回溯能力，导致在遇到低效路径时适应性不足，严重影响整体稳健性和探索效率。为了解决当前自动特征工程框架中观察到的局限性，我们引入了一种新方法，利用特征状态转换图有效保存整个特征转换过程，其中每个节点代表一个特定的转换状态。在探索过程中，三个级联代理迭代地选择节点并运用数学运算来生成新的转换状态。这种策略利用了图结构的固有属性，允许保存和重用有价值的转换。它还通过图修剪技术实现了回溯能力，可以纠正低效的转换路径。为了验证我们方法的功效和灵活性，我们进行了全面的实验和详细的案例研究，展示了在不同场景中的卓越性能。

更新时间: 2024-06-11 16:10:37

领域: cs.LG

下载: http://arxiv.org/abs/2406.07404v1

A Survey on Recent Random Walk-based Methods for Embedding Knowledge Graphs

Machine learning, deep learning, and NLP methods on knowledge graphs are present in different fields and have important roles in various domains from self-driving cars to friend recommendations on social media platforms. However, to apply these methods to knowledge graphs, the data usually needs to be in an acceptable size and format. In fact, knowledge graphs normally have high dimensions and therefore we need to transform them to a low-dimensional vector space. An embedding is a low-dimensional space into which you can translate high dimensional vectors in a way that intrinsic features of the input data are preserved. In this review, we first explain knowledge graphs and their embedding and then review some of the random walk-based embedding methods that have been developed recently.
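
As a small illustration of the first stage shared by many random walk-based embedding methods, the sketch below generates uniform random walks over a graph; the resulting node sequences are what such methods typically feed to a sequence-embedding model. It is a simplification: edges are untyped (real knowledge graphs carry relation labels), and the toy graph, parameters, and function name `random_walks` are assumptions.

```python
import random

def random_walks(adjacency, walk_length, walks_per_node, seed=0):
    """Generate uniform random walks; adjacency maps node -> list of neighbors."""
    rng = random.Random(seed)
    walks = []
    for start in adjacency:
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adjacency[walk[-1]]
                if not neighbors:        # dead end: stop this walk early
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks

# Hypothetical toy graph with undirected edges a-b, a-c, c-d.
graph = {"a": ["b", "c"], "b": ["a"], "c": ["a", "d"], "d": ["c"]}
walks = random_walks(graph, walk_length=5, walks_per_node=2)
```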

Updated: 2024-06-11 16:08:39

标题: 一份关于最近基于随机游走的方法用于嵌入知识图的调查

摘要: 机器学习、深度学习和自然语言处理方法在知识图谱中被广泛应用于不同领域，并在从自动驾驶汽车到社交媒体平台上的朋友推荐等各个领域发挥着重要作用。然而，要将这些方法应用于知识图谱，通常需要数据处于可接受的大小和格式。实际上，知识图谱通常具有高维度，因此我们需要将它们转换为低维度向量空间。嵌入是一个低维空间，您可以将高维向量转换为其中，以保留输入数据的固有特征。在本综述中，我们首先解释知识图谱及其嵌入，然后回顾一些最近开发的基于随机游走的嵌入方法。

更新时间: 2024-06-11 16:08:39

领域: cs.LG

下载: http://arxiv.org/abs/2406.07402v1

Guiding LLM Temporal Logic Generation with Explicit Separation of Data and Control

Temporal logics are powerful tools that are widely used for the synthesis and verification of reactive systems. The recent progress on Large Language Models (LLMs) has the potential to make the process of writing such specifications more accessible. However, writing specifications in temporal logics remains challenging for all but the most expert users. A key question in using LLMs for temporal logic specification engineering is to understand what kind of guidance is most helpful to the LLM and the users to easily produce specifications. Looking specifically at the problem of reactive program synthesis, we explore the impact of providing an LLM with guidance on the separation of control and data--making explicit for the LLM what functionality is relevant for the specification, and treating the remaining functionality as an implementation detail for a series of pre-defined functions and predicates. We present a benchmark set and find that this separation of concerns improves specification generation. Our benchmark provides a test set against which to verify future work in LLM generation of temporal logic specifications.

Updated: 2024-06-11 16:07:24

标题: 用明确的数据和控制分离引导LLM时序逻辑生成

摘要: 时间逻辑是一种强大的工具，被广泛用于合成和验证反应系统。最近对大型语言模型（LLMs）的进展有可能使编写这类规范的过程更容易。然而，对于除了最专家用户外的所有人来说，使用时间逻辑编写规范仍然具有挑战性。在使用LLMs进行时间逻辑规范工程时的一个关键问题是了解什么样的指导对LLMs和用户来说最有帮助，以便轻松生成规范。具体来看反应程序合成问题，我们探讨为LLM提供关于控制和数据分离的指导的影响，明确告诉LLM哪些功能对规范是相关的，并将其余功能视为一系列预定义函数和谓词的实现细节。我们提出了一个基准集，并发现这种关注点的分离改进了规范生成。我们的基准提供了一个测试集，可以用来验证LLM生成时间逻辑规范的未来工作。

更新时间: 2024-06-11 16:07:24

领域: cs.LG,cs.LO

下载: http://arxiv.org/abs/2406.07400v1

Redefining Automotive Radar Imaging: A Domain-Informed 1D Deep Learning Approach for High-Resolution and Efficient Performance

Millimeter-wave (mmWave) radars are indispensable for perception tasks of autonomous vehicles, thanks to their resilience in challenging weather conditions. Yet, their deployment is often limited by insufficient spatial resolution for precise semantic scene interpretation. Classical super-resolution techniques adapted from optical imaging inadequately address the distinct characteristics of radar signal data. In response, our study redefines radar imaging super-resolution as a one-dimensional (1D) signal super-resolution spectra estimation problem by harnessing the radar signal processing domain knowledge, introducing innovative data normalization and a domain-informed signal-to-noise ratio (SNR)-guided loss function. Our tailored deep learning network for automotive radar imaging exhibits remarkable scalability, parameter efficiency and fast inference speed, alongside enhanced performance in terms of radar imaging quality and resolution. Extensive testing confirms that our SR-SPECNet sets a new benchmark in producing high-resolution radar range-azimuth images, outperforming existing methods across varied antenna configurations and dataset sizes. Source code and new radar dataset will be made publicly available online.

Updated: 2024-06-11 16:07:08

标题: 重新定义汽车雷达成像：一种基于领域知识的1D深度学习方法，用于高分辨率和高效性能

摘要: 毫米波（mmWave）雷达在自动驾驶车辆的感知任务中不可或缺，得益于其在恶劣天气条件下的韧性。然而，它们的部署通常受到空间分辨率不足以进行精确语义场景解释的限制。经典的超分辨技术改编自光学成像，未能充分解决雷达信号数据的独特特征。为此，我们的研究将雷达成像超分辨重新定义为利用雷达信号处理领域知识的一维（1D）信号超分辨光谱估计问题，引入创新的数据归一化和基于信噪比（SNR）指导的损失函数。我们为汽车雷达成像量身定制了深度学习网络，表现出卓越的可扩展性、参数效率和快速推理速度，同时在雷达成像质量和分辨率方面表现出增强的性能。广泛的测试证实，我们的SR-SPECNet在生成高分辨率雷达距离-方位图像方面设立了新的基准，优于现有的方法，无论是在不同的天线配置还是数据集大小方面。源代码和新的雷达数据集将在线上公开提供。

更新时间: 2024-06-11 16:07:08

领域: cs.LG,eess.SP

下载: http://arxiv.org/abs/2406.07399v1

Visual Representation Learning with Stochastic Frame Prediction

Self-supervised learning of image representations by predicting future frames is a promising direction but still remains a challenge. This is because of the under-determined nature of frame prediction; multiple potential futures can arise from a single current frame. To tackle this challenge, in this paper, we revisit the idea of stochastic video generation that learns to capture uncertainty in frame prediction and explore its effectiveness for representation learning. Specifically, we design a framework that trains a stochastic frame prediction model to learn temporal information between frames. Moreover, to learn dense information within each frame, we introduce an auxiliary masked image modeling objective along with a shared decoder architecture. We find this architecture allows for combining both objectives in a synergistic and compute-efficient manner. We demonstrate the effectiveness of our framework on a variety of tasks from video label propagation and vision-based robot learning domains, such as video segmentation, pose tracking, vision-based robotic locomotion, and manipulation tasks. Code is available on the project webpage: https://sites.google.com/view/2024rsp.

Updated: 2024-06-11 16:05:15

标题: 使用随机帧预测进行视觉表示学习

摘要: 通过预测未来帧进行图像表示的自监督学习是一个有前途的方向，但仍然面临挑战。这是因为帧预测的不确定性特性；一个当前帧可能会有多个潜在的未来。为了解决这一挑战，在本文中，我们重新审视了学习捕捉帧预测不确定性的随机视频生成的概念，并探讨了其在表示学习中的有效性。具体来说，我们设计了一个框架，训练一个随机帧预测模型来学习帧之间的时间信息。此外，为了学习每帧内的密集信息，我们引入了一个辅助的掩码图像建模目标，以及一个共享解码器架构。我们发现这种架构允许以一种协同和高效的方式结合两个目标。我们在视频标签传播和基于视觉的机器人学习领域的各种任务上展示了我们框架的有效性，例如视频分割、姿势跟踪、基于视觉的机器人运动和操作任务。该项目网页上提供了代码：https://sites.google.com/view/2024rsp。

更新时间: 2024-06-11 16:05:15

领域: cs.CV,cs.AI,cs.LG,cs.RO

下载: http://arxiv.org/abs/2406.07398v1

Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B

This paper introduces the MCT Self-Refine (MCTSr) algorithm, an innovative integration of Large Language Models (LLMs) with Monte Carlo Tree Search (MCTS), designed to enhance performance in complex mathematical reasoning tasks. Addressing the challenges of accuracy and reliability in LLMs, particularly in strategic and mathematical reasoning, MCTSr leverages systematic exploration and heuristic self-refine mechanisms to improve decision-making frameworks within LLMs. The algorithm constructs a Monte Carlo search tree through iterative processes of Selection, self-refine, self-evaluation, and Backpropagation, utilizing an improved Upper Confidence Bound (UCB) formula to optimize the exploration-exploitation balance. Extensive experiments demonstrate MCTSr's efficacy in solving Olympiad-level mathematical problems, significantly improving success rates across multiple datasets, including GSM8K, GSM Hard, MATH, and Olympiad-level benchmarks, including Math Odyssey, AIME, and OlympiadBench. The study advances the application of LLMs in complex reasoning tasks and sets a foundation for future AI integration, enhancing decision-making accuracy and reliability in LLM-driven applications.
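
The selection step mentioned in this abstract relies on an Upper Confidence Bound score. The sketch below uses the standard UCB1/UCT form, not the improved formula the paper proposes (which the abstract does not spell out); the exploration constant `c`, the data layout, and the function names are assumptions.

```python
import math

def ucb_score(total_reward, visits, parent_visits, c=1.4):
    """Standard UCB1 score: mean reward plus an exploration bonus."""
    if visits == 0:
        return math.inf                  # always try unvisited children first
    mean_reward = total_reward / visits
    bonus = c * math.sqrt(math.log(parent_visits) / visits)
    return mean_reward + bonus

def select_child(children, parent_visits):
    """children maps a child name to (total_reward, visits)."""
    return max(children, key=lambda name: ucb_score(*children[name], parent_visits))
```

For example, with children `{"a": (3.0, 4), "b": (0.0, 0)}` the unvisited child `"b"` is selected, and once both have equal visit counts the child with the higher mean reward wins.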

Updated: 2024-06-11 16:01:07

标题: 通过Monte Carlo树自我细化和LLaMa-3 8B访问GPT-4级数学奥林匹克解决方案

摘要: 本文介绍了MCT Self-Refine（MCTSr）算法，这是一种创新的大语言模型（LLM）与蒙特卡洛树搜索（MCTS）相结合的算法，旨在提高复杂数学推理任务的性能。MCTSr解决了LLM在战略和数学推理中准确性和可靠性方面的挑战，利用系统性探索和启发式自我调整机制来改进LLM内的决策框架。该算法通过选择、自我调整、自我评估和反向传播的迭代过程构建蒙特卡洛搜索树，利用改进的上置信区间（UCB）公式优化探索-利用平衡。大量实验证明了MCTSr在解决奥林匹克级数学问题方面的有效性，在多个数据集（包括GSM8K、GSM Hard、MATH和奥林匹克级基准数据集，包括Math Odyssey、AIME和OlympiadBench）中显着提高了成功率。本研究推动了LLM在复杂推理任务中的应用，并为未来AI集成奠定了基础，提高了LLM驱动应用程序中决策准确性和可靠性。

更新时间: 2024-06-11 16:01:07

领域: cs.AI

下载: http://arxiv.org/abs/2406.07394v1

Equivariance via Minimal Frame Averaging for More Symmetries and Efficiency

We consider achieving equivariance in machine learning systems via frame averaging. Current frame averaging methods involve a costly sum over large frames or rely on sampling-based approaches that only yield approximate equivariance. Here, we propose Minimal Frame Averaging (MFA), a mathematical framework for constructing provably minimal frames that are exactly equivariant. The general foundations of MFA also allow us to extend frame averaging to more groups than previously considered, including the Lorentz group for describing symmetries in space-time, and the unitary group for complex-valued domains. Results demonstrate the efficiency and effectiveness of encoding symmetries via MFA across a diverse range of tasks, including $n$-body simulation, top tagging in collider physics, and relaxed energy prediction. Our code is available at https://github.com/divelab/MFA.
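
The cost gap between averaging over a whole group and averaging over a small frame can be seen in a toy case. The sketch below makes an order-dependent function permutation-invariant in two ways: by averaging over all $n!$ permutations, and by evaluating on a single canonical (sorted) ordering, which acts as a frame of size one when the entries are distinct. This is only an illustration of the general idea, not the MFA construction; `base_fn` and both helper names are assumptions.

```python
import itertools
import numpy as np

def base_fn(x):
    """An order-dependent function: position-weighted sum."""
    weights = np.arange(1, len(x) + 1)
    return float(weights @ x)

def group_average(fn, x):
    """Invariance by averaging over all n! permutations (expensive)."""
    perms = itertools.permutations(range(len(x)))
    return float(np.mean([fn(x[list(p)]) for p in perms]))

def frame_average(fn, x):
    """Invariance via a single canonical ordering (a size-one frame for distinct entries)."""
    return fn(np.sort(x))

x = np.array([3.0, 1.0, 2.0])
x_permuted = x[[1, 0, 2]]
```

Both constructions are invariant to reordering the input, although they define different functions; the frame-based version needs one evaluation of `base_fn` instead of $n!$.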

Updated: 2024-06-11 15:58:56

标题: 通过最小帧平均实现等变性以获得更多对称性和效率

摘要: 我们考虑通过帧平均实现机器学习系统中的等变性。当前的帧平均方法涉及大框架上的昂贵总和，或者依赖于仅产生近似等变性的基于采样的方法。在这里，我们提出了Minimal Frame Averaging（MFA），这是一个构建可以被证明是完全等变的最小框架的数学框架。MFA的一般基础还允许我们将帧平均扩展到比以前考虑的更多群体，包括用于描述时空对称性的洛伦兹群和用于复值域的幺正群。结果表明，通过MFA对各种任务进行对称性编码的效率和有效性，包括$n$体模拟、对撞机物理中的顶部标记和放松的能量预测。我们的代码可在https://github.com/divelab/MFA找到。

更新时间: 2024-06-11 15:58:56

领域: cs.LG

下载: http://arxiv.org/abs/2406.07598v1

From Classification to Segmentation with Explainable AI: A Study on Crack Detection and Growth Monitoring

Monitoring surface cracks in infrastructure is crucial for structural health monitoring. Automatic visual inspection offers an effective solution, especially in hard-to-reach areas. Machine learning approaches have proven their effectiveness but typically require large annotated datasets for supervised training. Once a crack is detected, monitoring its severity often demands precise segmentation of the damage. However, pixel-level annotation of images for segmentation is labor-intensive. To mitigate this cost, one can leverage explainable artificial intelligence (XAI) to derive segmentations from the explanations of a classifier, requiring only weak image-level supervision. This paper proposes applying this methodology to segment and monitor surface cracks. We evaluate the performance of various XAI methods and examine how this approach facilitates severity quantification and growth monitoring. Results reveal that while the resulting segmentation masks may exhibit lower quality than those produced by supervised methods, they remain meaningful and enable severity monitoring, thus reducing substantial labeling costs.
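
A minimal sketch of the pipeline described above, assuming the classifier's explanation is already available as a 2D attribution map: the map is normalized and thresholded into a binary mask, and the masked area fraction serves as a crude severity proxy. The threshold value, the severity definition, and the function names are assumptions, not the paper's evaluation protocol.

```python
import numpy as np

def mask_from_attribution(attribution, threshold=0.5):
    """Turn a classifier attribution map into a binary segmentation mask."""
    a = np.asarray(attribution, dtype=float)
    span = a.max() - a.min()
    if span == 0:                          # flat map: nothing to segment
        return np.zeros(a.shape, dtype=bool)
    normalized = (a - a.min()) / span      # rescale to [0, 1]
    return normalized >= threshold

def severity(mask):
    """Fraction of pixels flagged as crack (a simple, assumed severity proxy)."""
    return float(mask.mean())

# Hypothetical 3x3 attribution map with a short horizontal crack.
attribution = np.array([[0.0, 0.0, 0.0],
                        [0.2, 1.0, 0.9],
                        [0.0, 0.0, 0.0]])
mask = mask_from_attribution(attribution)
```

Tracking `severity(mask)` for the same region across inspections gives a simple growth signal without any pixel-level labels.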

Updated: 2024-06-11 15:55:48

标题: 从分类到可解释人工智能的分割：裂缝检测和生长监测研究

摘要: 监测基础设施表面裂缝对结构健康监测至关重要。自动视觉检测提供了一种有效的解决方案，特别是在难以到达的区域。机器学习方法已经证明了它们的有效性，但通常需要大量带有注释的数据集进行监督训练。一旦检测到裂缝，监测其严重程度通常需要对损伤进行精确分割。然而，为了进行分割，对图像进行像素级注释是一项劳动密集型工作。为了减少这种成本，可以利用可解释人工智能（XAI）从分类器的解释中推导出分割，只需要弱图像级监督。本文提出将这种方法应用于分割和监测表面裂缝。我们评估了各种XAI方法的性能，并检查这种方法如何促进严重程度量化和增长监测。结果显示，虽然生成的分割蒙版可能比受监督方法产生的蒙版质量低，但它们仍然具有意义，并能够实现严重程度监测，从而减少了大量的标注成本。

更新时间: 2024-06-11 15:55:48

领域: cs.CV,cs.LG,eess.IV

下载: http://arxiv.org/abs/2309.11267v2

Test-Driven Development for Code Generation

Recent Large Language Models (LLMs) have demonstrated significant capabilities in generating code snippets directly from problem statements. This increasingly automated process mirrors traditional human-led software development, where code is often written in response to a requirement. Historically, Test-Driven Development (TDD) has proven its merit, requiring developers to write tests before the functional code, ensuring alignment with the initial problem statements. Applying TDD principles to LLM-based code generation offers one distinct benefit: it enables developers to verify the correctness of generated code against predefined tests. This paper investigates if and how TDD can be incorporated into AI-assisted code-generation processes. We experimentally evaluate our hypothesis that providing LLMs like GPT-4 and Llama 3 with tests in addition to the problem statements enhances code generation outcomes. We experimented with established function-level code generation benchmarks such as MBPP and HumanEval. Our results consistently demonstrate that including test cases leads to higher success in solving programming challenges. We assert that TDD is a promising paradigm for helping ensure that the code generated by LLMs effectively captures the requirements.
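
The verification step described above, checking generated code against predefined tests, can be sketched as a small harness. This is an illustrative assumption about how such a check might look, not the paper's evaluation code; the function name `passes_tests` and the test-case format are hypothetical.

```python
def passes_tests(candidate_source, function_name, test_cases):
    """Run a generated function against predefined (args, expected) test cases.

    Caution: exec runs arbitrary code; untrusted model output should only be
    executed inside a sandbox.
    """
    namespace = {}
    try:
        exec(candidate_source, namespace)
        fn = namespace[function_name]
        return all(fn(*args) == expected for args, expected in test_cases)
    except Exception:
        return False                      # syntax errors and crashes count as failures

# Tests are written first, as in TDD, then candidates are checked against them.
tests = [((2, 3), 5), ((0, 0), 0)]
good = "def add(a, b):\n    return a + b\n"
bad = "def add(a, b):\n    return a - b\n"
```

Here `passes_tests(good, "add", tests)` succeeds and `passes_tests(bad, "add", tests)` fails, so the same tests that guide generation also serve as the acceptance check.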

Updated: 2024-06-11 15:53:35

标题: 代码生成的测试驱动开发

摘要: 最近的大型语言模型(LLMs)已经展示出从问题陈述直接生成代码片段的显著能力。这种越来越自动化的过程反映了传统的人类主导软件开发，其中代码通常是根据需求编写的。从历史上看，测试驱动开发(TDD)已经证明了其价值，要求开发人员在编写功能代码之前编写测试，确保与初始问题陈述保持一致。将TDD原则应用于基于LLM的代码生成提供了一个明显的好处：它使开发人员能够根据预定义的测试验证生成的代码的正确性。本文调查了TDD如何可以纳入AI辅助的代码生成过程。我们通过实验证明了我们的假设，即为GPT-4和Llama 3等LLMs提供测试而不仅是问题陈述可以增强代码生成结果。我们在已建立的函数级别代码生成基准测试中进行了实验，如MBPP和HumanEval。我们的结果一致表明，包含测试用例会导致更高成功率解决编程挑战。我们断言，TDD是一个有希望的范例，可以帮助确保LLMs生成的代码有效地满足需求。

更新时间: 2024-06-11 15:53:35

领域: cs.SE,cs.AI

下载: http://arxiv.org/abs/2402.13521v2

Algorithmic Persuasion Through Simulation

We study a Bayesian persuasion game where a sender wants to persuade a receiver to take a binary action, such as purchasing a product. The sender is informed about the (binary) state of the world, such as whether the quality of the product is high or low, but only has limited information about the receiver's beliefs and utilities. Motivated by customer surveys, user studies, and recent advances in AI, we allow the sender to learn more about the receiver by querying an oracle that simulates the receiver's behavior. After a fixed number of queries, the sender commits to a messaging policy and the receiver takes the action that maximizes her expected utility given the message she receives. We characterize the sender's optimal messaging policy given any distribution over receiver types. We then design a polynomial-time querying algorithm that optimizes the sender's expected utility in this game. We also consider approximate oracles, more general query structures, and costly queries.
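
For intuition, the textbook special case with a single, fully known receiver has a closed-form optimal policy. Assume the receiver buys whenever her posterior belief that the state is high is at least a threshold, and the sender wants a purchase in either state. The sender then always recommends buying in the high state and recommends it in the low state with just enough probability to keep the posterior at the threshold. The sketch below covers only this known-receiver case, not the paper's querying algorithm; the function names are assumptions, and inputs are assumed to lie strictly between 0 and 1.

```python
def optimal_signal(prior_high, threshold):
    """Probability of recommending 'buy' in the low state (known-receiver case)."""
    if prior_high >= threshold:
        return 1.0                        # receiver already buys without persuasion
    return prior_high * (1.0 - threshold) / (threshold * (1.0 - prior_high))

def posterior_after_buy(prior_high, q_low):
    """Receiver's posterior of the high state after a 'buy' recommendation."""
    return prior_high / (prior_high + (1.0 - prior_high) * q_low)

q = optimal_signal(prior_high=0.3, threshold=0.5)
posterior = posterior_after_buy(0.3, q)
```

With a prior of 0.3 and a threshold of 0.5, the low-state recommendation probability is 3/7 and the posterior after a recommendation is exactly 0.5, so the receiver is just willing to buy.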

Updated: 2024-06-11 15:51:08

标题: 算法模拟的说服力

摘要: 我们研究了一个贝叶斯说服游戏，发送方希望说服接收方采取二元行动，例如购买产品。发送方了解世界的（二元）状态，例如产品质量高低，但只能获得有限的关于接收方信念和效用的信息。受顾客调查、用户研究和人工智能的最新进展的启发，我们允许发送方通过查询模拟接收方行为的预言者来了解更多关于接收方的信息。在固定数量的查询之后，发送方承诺一个消息政策，接收方根据收到的信息选择能最大化其预期效用的行动。我们描述了发送方在任何接收方类型分布下的最优消息策略。然后，我们设计了一个多项式时间的查询算法，优化了发送方在这个游戏中的预期效用。我们还考虑了近似预言者、更一般的查询结构和昂贵的查询。

更新时间: 2024-06-11 15:51:08

领域: cs.GT,cs.AI,econ.TH

下载: http://arxiv.org/abs/2311.18138v4

World Models with Hints of Large Language Models for Goal Achieving

Reinforcement learning struggles in the face of long-horizon tasks and sparse goals due to the difficulty in manual reward specification. While existing methods address this by adding intrinsic rewards, they may fail to provide meaningful guidance in long-horizon decision-making tasks with large state and action spaces, lacking purposeful exploration. Inspired by human cognition, we propose a new multi-modal model-based RL approach named Dreaming with Large Language Models (DLLM). DLLM integrates the proposed hinting subgoals from the LLMs into the model rollouts to encourage goal discovery and reaching in challenging tasks. By assigning higher intrinsic rewards to samples that align with the hints outlined by the language model during model rollouts, DLLM guides the agent toward meaningful and efficient exploration. Extensive experiments demonstrate that the DLLM outperforms recent methods in various challenging, sparse-reward environments such as HomeGrid, Crafter, and Minecraft by 27.7%, 21.1%, and 9.9%, respectively.

Updated: 2024-06-11 15:49:08

标题: 具有大语言模型暗示的世界模型用于实现目标

摘要: 强化学习在面对长时间跨度任务和稀疏目标时面临困难，这是因为手动指定奖励的难度。虽然现有方法通过添加内在奖励来解决这个问题，但它们可能在具有大状态和动作空间的长时间跨度决策任务中缺乏有意义的指导，缺乏目的性探索。受人类认知启发，我们提出了一种新的多模态基于模型的强化学习方法，名为Dreaming with Large Language Models (DLLM)。DLLM将LLMs提出的暗示子目标整合到模型展开中，以鼓励在具有挑战性任务中发现目标并实现目标。通过在模型展开期间将更高的内在奖励分配给与语言模型中提示的线索相一致的样本，DLLM引导代理向有意义且高效的探索方向发展。大量实验证明，DLLM在HomeGrid、Crafter和Minecraft等各种具有挑战性的、稀疏奖励环境中的表现优于最近的方法，分别提高了27.7％、21.1％和9.9％。

更新时间: 2024-06-11 15:49:08

领域: cs.AI,cs.LG

下载: http://arxiv.org/abs/2406.07381v1

CondTSF: One-line Plugin of Dataset Condensation for Time Series Forecasting

Dataset condensation is a newborn technique that generates a small dataset that can be used in training deep neural networks to lower training costs. The objective of dataset condensation is to ensure that the model trained with the synthetic dataset can perform comparably to the model trained with full datasets. However, existing methods predominantly concentrate on classification tasks, posing challenges in their adaptation to time series forecasting (TS-forecasting). This challenge arises from disparities in the evaluation of synthetic data. In classification, the synthetic data is considered well-distilled if the model trained with the full dataset and the model trained with the synthetic dataset yield identical labels for the same input, regardless of variations in output logits distribution. Conversely, in TS-forecasting, the effectiveness of synthetic data distillation is determined by the distance between predictions of the two models. The synthetic data is deemed well-distilled only when all data points within the predictions are similar. Consequently, TS-forecasting has a more rigorous evaluation methodology compared to classification. To mitigate this gap, we theoretically analyze the optimization objective of dataset condensation for TS-forecasting and propose a new one-line plugin of dataset condensation designated as Dataset Condensation for Time Series Forecasting (CondTSF) based on our analysis. Plugging CondTSF into previous dataset condensation methods facilitates a reduction in the distance between the predictions of the model trained with the full dataset and the model trained with the synthetic dataset, thereby enhancing performance. We conduct extensive experiments on eight commonly used time series datasets. CondTSF consistently improves the performance of all previous dataset condensation methods across all datasets, particularly at low condensing ratios.

Updated: 2024-06-11 15:49:07

标题: CondTSF：时间序列预测数据集压缩的一行插件

摘要: 数据集压缩是一种新生的技术，它生成一个小数据集，可以用于训练深度神经网络以降低训练成本。数据集压缩的目标是确保用合成数据集训练的模型能够与用完整数据集训练的模型性能相当。然而，现有方法主要集中在分类任务上，导致它们在时间序列预测（TS-forecasting）中的适应性面临挑战。这一挑战源于对合成数据的评估存在差异。在分类中，如果用完整数据集训练的模型和用合成数据集训练的模型对于相同输入产生相同的标签，那么合成数据被认为是很好的。而在TS-forecasting中，合成数据的提炼效果是通过两个模型预测之间的距离来确定的。只有当所有预测中的数据点相似时，合成数据才被认为是很好的。因此，与分类相比，TS-forecasting具有更严格的评估方法。为了弥补这一差距，我们从理论上分析了数据集压缩在TS-forecasting中的优化目标，并根据我们的分析提出了一种新的一行插件数据集压缩，称为基于我们的分析的时间序列预测数据集压缩（CondTSF）。将CondTSF插入先前的数据集压缩方法有助于减小用完整数据集训练的模型和用合成数据集训练的模型之间的预测之间的距离，从而提高性能。我们在八个常用的时间序列数据集上进行了大量实验。CondTSF在所有数据集上持续改进了所有先前的数据集压缩方法的性能，特别是在低压缩比下。

更新时间: 2024-06-11 15:49:07

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.02131v3

Large Language Models for Constrained-Based Causal Discovery

Causality is essential for understanding complex systems, such as the economy, the brain, and the climate. Constructing causal graphs often relies on either data-driven or expert-driven approaches, both fraught with challenges. The former methods, like the celebrated PC algorithm, face issues with data requirements and assumptions of causal sufficiency, while the latter demand substantial time and domain knowledge. This work explores the capabilities of Large Language Models (LLMs) as an alternative to domain experts for causal graph generation. We frame conditional independence queries as prompts to LLMs and employ the PC algorithm with the answers. The performance of the LLM-based conditional independence oracle on systems with known causal graphs shows a high degree of variability. We improve the performance through a proposed statistically inspired voting scheme that allows some control over false-positive and false-negative rates. Inspecting the chain-of-thought argumentation, we find that the LLM uses causal reasoning to justify its answer to a probabilistic query. We show evidence that knowledge-based conditional independence testing (CIT) could eventually become a complementary tool for data-driven causal discovery.
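One way to picture a voting scheme over repeated LLM answers is a simple thresholded vote. The sketch below is an assumption for illustration, not the paper's actual schema: the function name `vote_independent`, the "yes"/"no" answer format, and the threshold values are all hypothetical.

```python
from typing import Sequence


def vote_independent(answers: Sequence[str], threshold: float = 0.5) -> bool:
    """Declare conditional independence if the share of 'yes' answers exceeds the threshold.

    `answers` holds repeated replies to the same conditional independence prompt.
    Raising the threshold makes an independence verdict harder to reach, which
    trades one type of error against the other.
    """
    if not answers:
        raise ValueError("at least one answer is required")
    yes_count = sum(1 for a in answers if a.strip().lower() == "yes")
    return yes_count / len(answers) > threshold


# Five hypothetical replies to "Is X independent of Y given Z?"
replies = ["yes", "yes", "no", "yes", "no"]
print(vote_independent(replies))                 # 3/5 = 0.6 exceeds 0.5
print(vote_independent(replies, threshold=0.7))  # 0.6 does not exceed 0.7
```

The boolean verdicts would then stand in for the statistical independence test that the PC algorithm normally calls.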

Updated: 2024-06-11 15:45:24

标题: 基于约束的因果发现的大型语言模型

摘要: 因果关系对于理解复杂系统（如经济、大脑和气候）是至关重要的。构建因果图通常依赖于数据驱动或专家驱动的方法，但两者都面临挑战。前者方法（如著名的PC算法）面临数据需求和因果充分性假设的问题，而后者则需要大量时间和领域知识。本文探讨了将大型语言模型（LLMs）作为因果图生成的替代方法。我们将条件独立性查询作为LLMs的提示，并与PC算法的答案配合使用。基于已知因果图系统的LLM基础条件独立性预测器的性能显示出高度的可变性。我们通过提出的基于统计的投票方案改善了性能，这允许对误报和漏报率进行一定程度的控制。通过检查思维链的论证，我们发现因果推理可以为概率查询提供合理的答案。我们展示了知识驱动的因果推断最终可能成为数据驱动因果发现的补充工具的证据。

更新时间: 2024-06-11 15:45:24

领域: cs.AI,cs.CL

下载: http://arxiv.org/abs/2406.07378v1

Improving the realism of robotic surgery simulation through injection of learning-based estimated errors

The development of algorithms for automation of subtasks during robotic surgery can be accelerated by the availability of realistic simulation environments. In this work, we focus on one aspect of the realism of a surgical simulator, which is the positional accuracy of the robot. In current simulators, robots have perfect or near-perfect accuracy, which is not representative of their physical counterparts. We therefore propose a pair of neural networks, trained by data collected from a physical robot, to estimate both the controller error and the kinematic and non-kinematic error. These error estimates are then injected within the simulator to produce a simulated robot that has the characteristic performance of the physical robot. In this scenario, we believe it is sufficient for the estimated error used in the simulation to have a statistically similar distribution to the actual error of the physical robot. This is less stringent, and therefore more tenable, than the requirement for error compensation of a physical robot, where the estimated error should equal the actual error. Our results demonstrate that error injection reduces the mean position and orientation differences between the simulated and physical robots from 5.0 mm / 3.6 deg to 1.3 mm / 1.7 deg, respectively, which represents reductions by factors of 3.8 and 2.1.
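The reduction factors quoted at the end of this abstract follow directly from the reported before and after values. A quick check (variable names are ours):

```python
# Mean differences between simulated and physical robot, as reported in the abstract.
before_mm, after_mm = 5.0, 1.3    # position error in millimetres
before_deg, after_deg = 3.6, 1.7  # orientation error in degrees

position_factor = before_mm / after_mm
orientation_factor = before_deg / after_deg

print(round(position_factor, 1), round(orientation_factor, 1))  # prints: 3.8 2.1
```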

Updated: 2024-06-11 15:41:56

标题: 通过注入基于学习的估计误差提高机器人手术模拟的逼真性

摘要: 手术机器人在自动化子任务的算法开发可以通过现实仿真环境的可用性加快。在这项工作中，我们关注手术模拟器的真实性的一个方面，即机器人的位置精度。在当前的模拟器中，机器人具有完美或接近完美的精度，这并不代表它们的实际物理对应物。因此，我们提出了一对由从实际机器人收集的数据训练的神经网络，用于估计控制器误差以及运动学和非运动学误差。然后，这些误差估计被注入到模拟器中，以产生一个具有物理机器人特性表现的模拟机器人。在这种情况下，我们认为模拟中使用的估计误差具有与实际物理机器人的误差统计相似的分布即可。这比物理机器人的误差补偿要求更为宽松，因此更可行，其中估计的误差应等于实际误差。我们的结果表明，误差注入将模拟和实际机器人之间的平均位置和方向差异从5.0毫米/ 3.6度降低到1.3毫米/ 1.7度，分别减少了3.8倍和2.1倍。

更新时间: 2024-06-11 15:41:56

领域: cs.RO,cs.LG

下载: http://arxiv.org/abs/2406.07375v1

Closing the Computational-Query Depth Gap in Parallel Stochastic Convex Optimization

We develop a new parallel algorithm for minimizing Lipschitz, convex functions with a stochastic subgradient oracle. The total number of queries made and the query depth, i.e., the number of parallel rounds of queries, match the prior state-of-the-art, [CJJLLST23], while improving upon the computational depth by a polynomial factor for sufficiently small accuracy. When combined with previous state-of-the-art methods our result closes a gap between the best-known query depth and the best-known computational depth of parallel algorithms. Our method starts with a ball acceleration framework of previous parallel methods, i.e., [CJJJLST20, ACJJS21], which reduce the problem to minimizing a regularized Gaussian convolution of the function constrained to Euclidean balls. By developing and leveraging new stability properties of the Hessian of this induced function, we depart from prior parallel algorithms and reduce these ball-constrained optimization problems to stochastic unconstrained quadratic minimization problems. Although we are unable to prove concentration of the asymmetric matrices that we use to approximate this Hessian, we nevertheless develop an efficient parallel method for solving these quadratics. Interestingly, our algorithms can be improved using fast matrix multiplication and use nearly-linear work if the matrix multiplication exponent is 2.

Updated: 2024-06-11 15:41:48

标题: 在并行随机凸优化中缩小计算-查询深度差距

摘要: 我们开发了一个新的并行算法，用于最小化具有随机次梯度预言的Lipschitz凸函数。我们的查询总数和查询深度，即查询的并行轮数，与之前的最先进方法[CJJLLST23]相匹配，同时在足够小的精度下通过多项式因子改进了计算深度。当结合之前的最先进方法时，我们的结果消除了最佳已知查询深度和最佳已知计算深度之间的差距。我们的方法以先前并行方法的球加速框架[CJJJLST20，ACJJS21]为起点，将问题简化为最小化约束在欧几里得球内的函数的正则化高斯卷积。通过开发和利用这个诱导函数的Hessian的新稳定性属性，我们偏离了先前的并行算法，并将这些球约束优化问题简化为随机无约束二次最小化问题。虽然我们无法证明我们用来近似这个Hessian的非对称矩阵的浓缩，但我们还是开发了一个有效的并行方法来解决这些二次问题。有趣的是，我们的算法可以利用快速矩阵乘法进行改进，如果矩阵乘法指数为2，则几乎可以线性地工作。

更新时间: 2024-06-11 15:41:48

领域: math.OC,cs.DS,cs.LG

下载: http://arxiv.org/abs/2406.07373v1

Text-CRS: A Generalized Certified Robustness Framework against Textual Adversarial Attacks

Language models, especially basic text classification models, have been shown to be susceptible to textual adversarial attacks such as synonym substitution and word insertion attacks. To defend against such attacks, a growing body of research has been devoted to improving model robustness. However, providing provable robustness guarantees instead of empirical robustness remains largely unexplored. In this paper, we propose Text-CRS, a generalized certified robustness framework for natural language processing (NLP) based on randomized smoothing. To the best of our knowledge, existing certified schemes for NLP can only certify the robustness against $\ell_0$ perturbations in synonym substitution attacks. Representing each word-level adversarial operation (i.e., synonym substitution, word reordering, insertion, and deletion) as a combination of permutation and embedding transformation, we propose novel smoothing theorems to derive robustness bounds in both permutation and embedding space against such adversarial operations. To further improve certified accuracy and radius, we consider the numerical relationships between discrete words and select proper noise distributions for the randomized smoothing. Finally, we conduct substantial experiments on multiple language models and datasets. Text-CRS can address all four different word-level adversarial operations and achieve a significant accuracy improvement. We also provide the first benchmark on certified accuracy and radius of four word-level operations, besides outperforming the state-of-the-art certification against synonym substitution attacks.

Updated: 2024-06-11 15:40:43

标题: Text-CRS：一种针对文本对抗攻击的通用认证鲁棒性框架

摘要: 语言模型，尤其是基本的文本分类模型，已经显示出容易受到诸如同义词替换和词语插入攻击等文本对抗攻击的影响。为了抵御此类攻击，越来越多的研究致力于提高模型的鲁棒性。然而，提供可证明的鲁棒性保证而不是经验性的鲁棒性仍然广泛未被探索。在本文中，我们提出了Text-CRS，这是一个基于随机平滑的自然语言处理（NLP）通用认证鲁棒性框架。据我们所知，现有的NLP认证方案只能在同义词替换攻击中证明对$\ell_0$扰动的鲁棒性。将每个单词级对抗操作（即同义词替换、单词重新排序、插入和删除）表示为置换和嵌入变换的组合，我们提出了新颖的平滑定理，以在置换和嵌入空间中导出对抗操作的鲁棒性边界。为了进一步提高认证准确性和半径，我们考虑离散单词之间的数值关系，并选择适当的噪声分布进行随机平滑。最后，我们在多个语言模型和数据集上进行了大量实验。Text-CRS可以处理所有四种不同的单词级对抗操作，并实现了显著的准确性提升。我们还提供了对四种单词级操作的认证准确性和半径的第一个基准，除了优于对抗同义词替换攻击的最先进认证。

更新时间: 2024-06-11 15:40:43

领域: cs.CR,cs.CL,cs.LG

下载: http://arxiv.org/abs/2307.16630v2

NNG-Mix: Improving Semi-supervised Anomaly Detection with Pseudo-anomaly Generation

Anomaly detection (AD) is essential in identifying rare and often critical events in complex systems, finding applications in fields such as network intrusion detection, financial fraud detection, and fault detection in infrastructure and industrial systems. While AD is typically treated as an unsupervised learning task due to the high cost of label annotation, it is more practical to assume access to a small set of labeled anomaly samples from domain experts, as is the case for semi-supervised anomaly detection. Semi-supervised and supervised approaches can leverage such labeled data, resulting in improved performance. In this paper, rather than proposing a new semi-supervised or supervised approach for AD, we introduce a novel algorithm for generating additional pseudo-anomalies on the basis of the limited labeled anomalies and a large volume of unlabeled data. This serves as an augmentation to facilitate the detection of new anomalies. Our proposed algorithm, named Nearest Neighbor Gaussian Mixup (NNG-Mix), efficiently integrates information from both labeled and unlabeled data to generate pseudo-anomalies. We compare the performance of this novel algorithm with commonly applied augmentation techniques, such as Mixup and Cutout. We evaluate NNG-Mix by training various existing semi-supervised and supervised anomaly detection algorithms on the original training data along with the generated pseudo-anomalies. Through extensive experiments on 57 benchmark datasets in ADBench, reflecting different data types, we demonstrate that NNG-Mix outperforms other data augmentation methods. It yields significant performance improvements compared to the baselines trained exclusively on the original training data. Notably, NNG-Mix yields up to 16.4%, 8.8%, and 8.0% improvements on Classical, CV, and NLP datasets in ADBench. Our source code is available at https://github.com/donghao51/NNG-Mix.

Updated: 2024-06-11 15:39:52

标题: NNG-Mix：利用伪异常生成改进半监督异常检测

摘要: 异常检测（AD）在识别复杂系统中的罕见且通常关键事件中至关重要，在网络入侵检测、金融欺诈检测以及基础设施和工业系统中的故障检测等领域找到应用。尽管由于标签注释的高成本，AD通常被视为一项无监督学习任务，但更实际的做法是假定可以访问领域专家提供的少量标记的异常样本，这是半监督异常检测的情况。半监督和监督方法可以利用这些标记数据，从而提高性能。在本文中，我们并未提出一种新的半监督或监督方法来进行AD，而是基于有限的标记异常和大量未标记数据提出了一种生成附加伪异常的新算法。这作为一种增强，有助于检测新的异常。我们提出的算法名为最近邻高斯混合（NNG-Mix），有效地整合了来自标记和未标记数据的信息来生成伪异常。我们将这一新算法与常用的增强技术（如Mixup和Cutout）进行了性能比较。我们通过在原始训练数据上训练各种现有的半监督和监督异常检测算法以及生成的伪异常来评估NNG-Mix。通过在ADBench的57个基准数据集上进行大量实验，这些数据集反映了不同的数据类型，我们证明NNG-Mix优于其他数据增强方法。与仅在原始训练数据上训练的基线相比，它在ADBench的经典、CV和NLP数据集上分别实现了高达16.4％、8.8％和8.0％的性能改进。我们的源代码可在https://github.com/donghao51/NNG-Mix找到。

更新时间: 2024-06-11 15:39:52

领域: cs.LG,cs.AI,cs.CV

下载: http://arxiv.org/abs/2311.11961v2

reBandit: Random Effects based Online RL algorithm for Reducing Cannabis Use

The escalating prevalence of cannabis use, and associated cannabis-use disorder (CUD), poses a significant public health challenge globally. With a notably wide treatment gap, especially among emerging adults (EAs; ages 18-25), addressing cannabis use and CUD remains a pivotal objective within the 2030 United Nations Agenda for Sustainable Development Goals (SDG). In this work, we develop an online reinforcement learning (RL) algorithm called reBandit which will be utilized in a mobile health study to deliver personalized mobile health interventions aimed at reducing cannabis use among EAs. reBandit utilizes random effects and informative Bayesian priors to learn quickly and efficiently in noisy mobile health environments. Moreover, reBandit employs Empirical Bayes and optimization techniques to autonomously update its hyper-parameters online. To evaluate the performance of our algorithm, we construct a simulation testbed using data from a prior study, and compare against commonly used algorithms in mobile health studies. We show that reBandit performs as well as or better than all the baseline algorithms, and the performance gap widens as population heterogeneity increases in the simulation environment, demonstrating its ability to adapt to a diverse population of study participants.

Updated: 2024-06-11 15:35:20

标题: reBandit：基于随机效应的在线RL算法，用于减少大麻使用

摘要: 大麻使用及相关的大麻使用障碍（CUD）的流行率不断上升，在全球范围内构成了一个重大的公共卫生挑战。尤其是在新兴成年人（18-25岁）中，治疗缺口明显较大，因此解决大麻使用和CUD仍然是2030年联合国可持续发展目标议程中的一个关键目标。在这项工作中，我们开发了一个名为reBandit的在线强化学习（RL）算法，将在移动健康研究中使用，以提供旨在减少新兴成年人大麻使用的个性化移动健康干预措施。reBandit利用随机效应和信息贝叶斯先验，在嘈杂的移动健康环境中快速高效地学习。此外，reBandit采用经验贝叶斯和优化技术，在线自主更新其超参数。为了评估我们算法的性能，我们利用先前研究的数据构建了一个模拟测试平台，并与移动健康研究中常用的算法进行比较。我们展示了reBandit与所有基线算法同等或更好的表现，且在模拟环境中人口异质性增加时，性能差距扩大，证明了其能够适应不同研究参与者群体。

更新时间: 2024-06-11 15:35:20

领域: cs.AI,cs.LG

下载: http://arxiv.org/abs/2402.17739v2

When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models

Autoregressive Large Language Models (LLMs) have achieved impressive performance in language tasks but face two significant bottlenecks: (1) quadratic complexity in the attention module as the number of tokens increases, and (2) limited efficiency due to the sequential processing nature of autoregressive LLMs during generation. While linear attention and speculative decoding offer potential solutions, their applicability and synergistic potential for enhancing autoregressive LLMs remain uncertain. We conduct the first comprehensive study on the efficacy of existing linear attention methods for autoregressive LLMs, integrating them with speculative decoding. We introduce an augmentation technique for linear attention that ensures compatibility with speculative decoding, enabling more efficient training and serving of LLMs. Extensive experiments and ablation studies involving seven existing linear attention models and five encoder/decoder-based LLMs consistently validate the effectiveness of our augmented linearized LLMs. Notably, our approach achieves up to a 6.67 reduction in perplexity on the LLaMA model and up to a 2x speedup during generation compared to prior linear attention methods. Codes and models are available at https://github.com/GATECH-EIC/Linearized-LLM.

Updated: 2024-06-11 15:34:43

标题: 当线性注意力遇上自回归解码：朝着更有效和更高效的线性化大型语言模型

摘要: 自回归大型语言模型（LLMs）在语言任务中取得了令人印象深刻的性能，但面临两个重要瓶颈：（1）随着标记数量的增加，注意力模块的二次复杂度，以及（2）自回归LLMs在生成过程中的顺序处理特性导致的效率有限。虽然线性注意力和猜测解码提供了潜在解决方案，但它们对于增强自回归LLMs的适用性和协同潜力仍然不确定。我们进行了第一项关于现有线性注意力方法对自回归LLMs有效性的全面研究，将它们与猜测解码集成在一起。我们引入了一种线性注意力的增强技术，确保与猜测解码的兼容性，从而实现更高效的LLMs训练和服务。涉及七种现有线性注意力模型和五种基于编码器/解码器的LLMs的广泛实验和消融研究一致验证了我们增强的线性化LLMs的有效性。值得注意的是，与先前的线性注意力方法相比，我们的方法在LLaMA模型上的困惑度降低了高达6.67，并在生成过程中获得了高达2倍的加速。代码和模型可在https://github.com/GATECH-EIC/Linearized-LLM找到。

更新时间: 2024-06-11 15:34:43

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2406.07368v1

ShiftAddViT: Mixture of Multiplication Primitives Towards Efficient Vision Transformer

Vision Transformers (ViTs) have shown impressive performance and have become a unified backbone for multiple vision tasks. However, both the attention mechanism and multi-layer perceptrons (MLPs) in ViTs are not sufficiently efficient due to dense multiplications, leading to costly training and inference. To this end, we propose to reparameterize pre-trained ViTs with a mixture of multiplication primitives, e.g., bitwise shifts and additions, towards a new type of multiplication-reduced model, dubbed ShiftAddViT, which aims to achieve end-to-end inference speedups on GPUs without requiring training from scratch. Specifically, all MatMuls among queries, keys, and values are reparameterized using additive kernels, after mapping queries and keys to binary codes in Hamming space. The remaining MLPs or linear layers are then reparameterized with shift kernels. We utilize TVM to implement and optimize those customized kernels for practical hardware deployment on GPUs. We find that such a reparameterization on attention maintains model accuracy, while inevitably leading to accuracy drops when being applied to MLPs. To marry the best of both worlds, we further propose a new mixture of experts (MoE) framework to reparameterize MLPs by taking multiplication or its primitives as experts, e.g., multiplication and shift, and designing a new latency-aware load-balancing loss. Such a loss helps to train a generic router for assigning a dynamic amount of input tokens to different experts according to their latency. Extensive experiments on various 2D/3D Transformer-based vision tasks consistently validate the effectiveness of our proposed ShiftAddViT, achieving up to 5.18x latency reductions on GPUs and 42.9% energy savings, while maintaining accuracy comparable to the original or efficient ViTs.

Updated: 2024-06-11 15:34:06

标题: ShiftAddViT: 将乘法基元混合为高效的视觉Transformer

摘要: 视觉Transformer（ViTs）表现出色，已经成为多个视觉任务的统一骨干。然而，ViTs中的注意力机制和多层感知器（MLPs）由于密集的乘法操作而效率不够高，导致训练和推理成本高昂。为此，我们提出重新参数化预训练的ViTs，使用混合的乘法原语，例如位移和加法，形成一种新型的减少乘法操作的模型，称为$\textbf{ShiftAddViT}$，旨在在GPU上实现端到端推理加速而无需从头开始训练。具体来说，将查询、键和值之间的所有$\texttt{MatMuls}$通过将查询和键映射到汉明空间中的二进制代码后，使用添加核进行重新参数化。然后，剩余的MLPs或线性层通过位移核进行重新参数化。我们利用TVM实现和优化这些定制的核，以便在GPU上进行实际硬件部署。我们发现，这种对注意力的重新参数化可以保持模型准确性，但在应用于MLPs时不可避免地会导致准确性下降。为了结合两者的优点，我们进一步提出了一种新的专家混合（MoE）框架，通过将乘法或其原语作为专家，例如乘法和位移，设计一种新的延迟感知负载平衡损失来重新参数化MLPs。这种损失有助于训练一个通用路由器，根据其延迟动态分配不同数量的输入标记给不同的专家。在各种基于2D/3D Transformer的视觉任务上进行的大量实验一致验证了我们提出的ShiftAddViT的有效性，实现了高达$\textbf{5.18$\times$}$的GPU上的延迟降低和$\textbf{42.9}$%的能源节省，同时保持与原始或高效的ViTs相当的准确性。

更新时间: 2024-06-11 15:34:06

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2306.06446v5

BvSP: Broad-view Soft Prompting for Few-Shot Aspect Sentiment Quad Prediction

Aspect sentiment quad prediction (ASQP) aims to predict four aspect-based elements, including aspect term, opinion term, aspect category, and sentiment polarity. In practice, unseen aspects, due to distinct data distribution, impose many challenges for a trained neural model. Motivated by this, this work formulates ASQP into the few-shot scenario, which aims for fast adaptation in real applications. Therefore, we first construct a few-shot ASQP dataset (FSQP) that contains richer categories and is more balanced for the few-shot study. Moreover, recent methods extract quads through a generation paradigm, which involves converting the input sentence into a templated target sequence. However, they primarily focus on the utilization of a single template or the consideration of different template orders, thereby overlooking the correlations among various templates. To tackle this issue, we further propose a Broad-view Soft Prompting (BvSP) method that aggregates multiple templates with a broader view by taking into account the correlation between the different templates. Specifically, BvSP uses the pre-trained language model to select the most relevant k templates with Jensen-Shannon divergence. BvSP further introduces soft prompts to guide the pre-trained language model using the selected templates. We then aggregate the results across templates by a voting mechanism. Empirical results demonstrate that BvSP significantly outperforms the state-of-the-art methods under four few-shot settings and other public datasets. Our code and dataset are available at https://github.com/byinhao/BvSP.
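The two building blocks named in this abstract, Jensen-Shannon divergence and voting, can be sketched as follows. The Jensen-Shannon formula is standard; everything else is an assumption made for illustration: the selection rule in `select_top_k` (lowest mean divergence to the other templates), the toy distributions, and the template names are hypothetical and are not taken from the BvSP code.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def kl(p: Sequence[float], q: Sequence[float]) -> float:
    """Kullback-Leibler divergence KL(p || q), skipping zero-probability terms of p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence: mean KL of p and q to their midpoint distribution."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def select_top_k(dists: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Assumed rule: keep the k templates with the lowest mean JS divergence to the rest."""
    def mean_js(name: str) -> float:
        others = [d for n, d in dists.items() if n != name]
        return sum(js_divergence(dists[name], d) for d in others) / len(others)
    return sorted(dists, key=mean_js)[:k]


def majority_vote(predictions: Sequence[str]) -> str:
    """Return the most frequent prediction across templates."""
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical per-template output distributions over two outcomes.
dists = {"t1": [0.7, 0.3], "t2": [0.6, 0.4], "t3": [0.1, 0.9]}
chosen = select_top_k(dists, k=2)
print(chosen)
print(majority_vote(["positive", "negative", "positive"]))
```

Here the outlier template `t3` is dropped, and the final label is whatever most of the remaining templates agree on.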

Updated: 2024-06-11 15:32:32

标题: BvSP：广泛视角软提示用于少样本情感方面四元预测

摘要: Aspect sentiment quad prediction (ASQP)旨在预测四个基于方面的元素，包括方面术语，意见术语，方面类别和情感极性。在实践中，由于不同的数据分布，未见过的方面给经过训练的神经模型带来了许多挑战。受此启发，本文将ASQP制定为少样本情景，旨在快速适应真实应用。因此，我们首先构建了一个包含更丰富类别且更平衡的少样本ASQP数据集（FSQP）以进行少样本研究。此外，最近的方法通过一种生成范式提取四元组，其中包括将输入句子转换为模板化的目标序列。然而，它们主要关注单个模板的利用或考虑不同模板顺序，从而忽视了各种模板之间的相关性。为了解决这个问题，我们进一步提出了一种Broadview Soft Prompting（BvSP）方法，通过考虑不同模板之间的相关性，聚合多个模板以更广泛地观察。具体而言，BvSP使用预训练语言模型通过Jensen-Shannon散度选择最相关的k个模板。BvSP进一步引入软提示，通过使用选择的模板指导预训练语言模型。然后，我们通过投票机制聚合多个模板的结果。实证结果表明，在四个少样本设置和其他公共数据集下，BvSP显著优于最先进的方法。我们的代码和数据集可在https://github.com/byinhao/BvSP获取。

更新时间: 2024-06-11 15:32:32

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.07365v1

Deep Implicit Optimization for Robust and Flexible Image Registration

Deep Learning in Image Registration (DLIR) methods have been tremendously successful in image registration due to their speed and ability to incorporate weak label supervision at training time. However, DLIR methods forego many of the benefits of classical optimization-based methods. The functional nature of deep networks does not guarantee that the predicted transformation is a local minimum of the registration objective, the representation of the transformation (displacement/velocity field/affine) is fixed, and the networks are not robust to domain shift. Our method aims to bridge this gap between classical and learning methods by incorporating optimization as a layer in a deep network. A deep network is trained to predict multi-scale dense feature images that are registered using a black box iterative optimization solver. This optimal warp is then used to minimize image and label alignment errors. By implicitly differentiating end-to-end through an iterative optimization solver, our learned features are registration and label-aware, and the warp functions are guaranteed to be local minima of the registration objective in the feature space. Our framework shows excellent performance on in-domain datasets, and is agnostic to domain shift such as anisotropy and varying intensity profiles. For the first time, our method allows switching between arbitrary transformation representations (free-form to diffeomorphic) at test time with zero retraining. End-to-end feature learning also facilitates interpretability of features, and out-of-the-box promptability using additional label-fidelity terms at inference.

Updated: 2024-06-11 15:28:48

标题: 深度隐式优化用于强大而灵活的图像配准

摘要: 深度学习在图像配准（DLIR）方法在图像配准方面取得了巨大成功，这是因为它们的速度和能力能够在训练时结合弱标签监督。然而，DLIR方法放弃了许多传统基于优化的方法的优点。深度网络的功能性质并不保证预测的变换是配准目标的局部最小值，变换的表示（位移/速度场/仿射）是固定的，网络对领域变化也不具有稳健性。我们的方法旨在通过将优化作为深度网络的一层来弥合这种经典和学习方法之间的差距。通过训练一个深度网络来预测多尺度密集特征图像，并使用黑盒迭代优化求解器进行配准。然后利用这个最佳变形来最小化图像和标签对齐误差。通过隐式地通过迭代优化求解器端到端微分，我们学习到的特征是配准和标签感知的，变形函数保证在特征空间中是配准目标的局部最小值。我们的框架在领域内数据集上表现出色，并且对领域变化（如各向异性和不同强度配置文件）不受影响。首次，我们的方法允许在测试时在任意转换表示之间切换（自由形式到微分同胚）而无需重新训练。端到端特征学习还有助于特征的可解释性，并在推断时使用附加标签忠实度术语的即插即用性。

更新时间: 2024-06-11 15:28:48

领域: cs.CV,cs.LG,eess.IV

下载: http://arxiv.org/abs/2406.07361v1

AI Sandbagging: Language Models can Strategically Underperform on Evaluations

Trustworthy capability evaluations are crucial for ensuring the safety of AI systems, and are becoming a key component of AI regulation. However, the developers of an AI system, or the AI system itself, may have incentives for evaluations to understate the AI's actual capability. These conflicting interests lead to the problem of sandbagging, which we define as "strategic underperformance on an evaluation". In this paper we assess sandbagging capabilities in contemporary language models (LMs). We prompt frontier LMs, like GPT-4 and Claude 3 Opus, to selectively underperform on dangerous capability evaluations, while maintaining performance on general (harmless) capability evaluations. Moreover, we find that models can be fine-tuned, on a synthetic dataset, to hide specific capabilities unless given a password. This behaviour generalizes to high-quality, held-out benchmarks such as WMDP. In addition, we show that both frontier and smaller models can be prompted, or password-locked, to target specific scores on a capability evaluation. Furthermore, we find that a capable password-locked model (Llama 3 70b) is reasonably able to emulate a less capable model (Llama 2 7b). Overall, our results suggest that capability evaluations are vulnerable to sandbagging. This vulnerability decreases the trustworthiness of evaluations, and thereby undermines important safety decisions regarding the development and deployment of advanced AI systems.

Updated: 2024-06-11 15:26:57

标题: AI堤塞：语言模型可以在评估中有策略性的表现不佳

摘要: 信誉良好的能力评估对确保人工智能系统的安全至关重要，并且正在成为人工智能监管的关键组成部分。然而，人工智能系统的开发者，或者人工智能系统本身，可能有动机低估人工智能的实际能力。这些利益冲突导致了所谓的“故意表现不佳”的问题，我们定义为“对评估的策略性低表现”。在本文中，我们评估了当代语言模型（LMs）的故意表现不佳能力。我们促使前沿的LMs，如GPT-4和Claude 3 Opus，在危险能力评估中选择性地表现不佳，同时在一般（无害）能力评估中保持表现。此外，我们发现模型可以在合成数据集上进行微调，以隐藏特定能力，除非给定密码。这种行为可以推广到高质量的，留置的基准，如WMDP。此外，我们表明，无论是前沿模型还是较小模型，都可以被促使或密码锁定，以针对特定的能力评估分数。更重要的是，我们发现，一个有能力的密码锁定模型（Llama 3 70b）能够合理地模拟一个能力较差的模型（Llama 2 7b）。总的来说，我们的结果表明，能力评估容易受到故意表现不佳的影响。这种脆弱性降低了评估的可信度，从而削弱了关于先进人工智能系统的开发和部署的重要安全决策。

更新时间: 2024-06-11 15:26:57

领域: cs.AI,cs.CL,cs.CY,cs.LG

下载: http://arxiv.org/abs/2406.07358v1

Hybrid$^2$ Neural ODE Causal Modeling and an Application to Glycemic Response

Hybrid models composing mechanistic ODE-based dynamics with flexible and expressive neural network components have grown rapidly in popularity, especially in scientific domains where such ODE-based modeling offers important interpretability and validated causal grounding (e.g., for counterfactual reasoning). The incorporation of mechanistic models also provides inductive bias in standard blackbox modeling approaches, critical when learning from small datasets or partially observed, complex systems. Unfortunately, as the hybrid models become more flexible, the causal grounding provided by the mechanistic model can quickly be lost. We address this problem by leveraging another common source of domain knowledge: ranking of treatment effects for a set of interventions, even if the precise treatment effect is unknown. We encode this information in a causal loss that we combine with the standard predictive loss to arrive at a hybrid loss that biases our learning towards causally valid hybrid models. We demonstrate our ability to achieve a win-win, with state-of-the-art predictive performance and causal validity, in the challenging task of modeling glucose dynamics post-exercise in individuals with type 1 diabetes.

Updated: 2024-06-11 15:25:01

标题: 混合$^2$神经ODE因果建模及其在血糖反应中的应用

摘要: 混合模型将基于机械ODE的动态性与灵活和表达丰富的神经网络组件结合在一起，已经在科学领域迅速增长，特别是在那些ODE-based建模提供重要可解释性和验证因果基础（例如，用于反事实推理）的领域。机械模型的整合还为标准黑盒建模方法提供了归纳偏差，在从小数据集或部分观察到的复杂系统中学习时至关重要。不幸的是，随着混合模型变得更加灵活，由机械模型提供的因果基础可能很快丢失。我们通过利用另一个常见的领域知识来源来解决这个问题：对一组干预的治疗效果进行排名，即使精确的治疗效果未知。我们将这些信息编码在一个因果损失中，将其与标准预测损失相结合，得到一个偏向因果有效的混合损失，从而偏向于学习符合因果有效性的混合模型。我们展示了我们在挑战性的任务中，即在1型糖尿病患者锻炼后建模葡萄糖动态的能力，实现了一种双赢的最新预测性能和因果有效性。

更新时间: 2024-06-11 15:25:01

领域: cs.LG,stat.AP,stat.ME

下载: http://arxiv.org/abs/2402.17233v2

Toxic Memes: A Survey of Computational Perspectives on the Detection and Explanation of Meme Toxicities

Internet memes, channels for humor, social commentary, and cultural expression, are increasingly used to spread toxic messages. Studies on the computational analyses of toxic memes have significantly grown over the past five years, and the only three surveys on computational toxic meme analysis cover only work published until 2022, leading to inconsistent terminology and unexplored trends. Our work fills this gap by surveying content-based computational perspectives on toxic memes, and reviewing key developments until early 2024. Employing the PRISMA methodology, we systematically extend the previously considered papers, achieving a threefold result. First, we survey 119 new papers, analyzing 158 computational works focused on content-based toxic meme analysis. We identify over 30 datasets used in toxic meme analysis and examine their labeling systems. Second, after observing the existence of unclear definitions of meme toxicity in computational works, we introduce a new taxonomy for categorizing meme toxicity types. We also note an expansion in computational tasks beyond the simple binary classification of memes as toxic or non-toxic, indicating a shift towards achieving a nuanced comprehension of toxicity. Third, we identify three content-based dimensions of meme toxicity under automatic study: target, intent, and conveyance tactics. We develop a framework illustrating the relationships between these dimensions and meme toxicities. The survey analyzes key challenges and recent trends, such as enhanced cross-modal reasoning, integrating expert and cultural knowledge, the demand for automatic toxicity explanations, and handling meme toxicity in low-resource languages. Also, it notes the rising use of Large Language Models (LLMs) and generative AI for detecting and generating toxic memes. Finally, it proposes pathways for advancing toxic meme detection and interpretation.

Updated: 2024-06-11 15:22:48

Categories: cs.CL,cs.AI,cs.CV,cs.CY,cs.SI

Download: http://arxiv.org/abs/2406.07353v1

Ask Again, Then Fail: Large Language Models' Vacillations in Judgment

We observe that current conversational language models often waver in their judgments when faced with follow-up questions, even if the original judgment was correct. This wavering presents a significant challenge for generating reliable responses and building user trust. To comprehensively assess this issue, we introduce a Follow-up Questioning Mechanism along with two metrics to quantify this inconsistency, confirming its widespread presence in current language models. To mitigate this issue, we explore various prompting strategies for closed-source models; moreover, we develop a training-based framework, Unwavering-FQ, that teaches language models to maintain their originally correct judgments through synthesized high-quality preference data. Our experimental results confirm the effectiveness of our framework and its ability to enhance the general capabilities of models.

Updated: 2024-06-11 15:22:07

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2310.02174v5

Transforming Wearable Data into Health Insights using Large Language Model Agents

Despite the proliferation of wearable health trackers and the importance of sleep and exercise to health, deriving actionable personalized insights from wearable data remains a challenge because doing so requires non-trivial open-ended analysis of these data. The recent rise of large language model (LLM) agents, which can use tools to reason about and interact with the world, presents a promising opportunity to enable such personalized analysis at scale. Yet, the application of LLM agents in analyzing personal health is still largely untapped. In this paper, we introduce the Personal Health Insights Agent (PHIA), an agent system that leverages state-of-the-art code generation and information retrieval tools to analyze and interpret behavioral health data from wearables. We curate two benchmark question-answering datasets of over 4000 health insights questions. Based on 650 hours of human and expert evaluation we find that PHIA can accurately address over 84% of factual numerical questions and more than 83% of crowd-sourced open-ended questions. This work has implications for advancing behavioral health across the population, potentially enabling individuals to interpret their own wearable data, and paving the way for a new era of accessible, personalized wellness regimens that are informed by data-driven insights.

Updated: 2024-06-11 15:17:43

Categories: cs.AI,cs.CL

Download: http://arxiv.org/abs/2406.06464v2

Erasing Radio Frequency Fingerprinting via Active Adversarial Perturbation

Radio Frequency (RF) fingerprinting identifies a wireless device from the uniqueness of its analog circuitry or hardware imperfections. However, unlike a MAC address, which can be modified, such hardware features are unavoidably present in the transmitted signal and can reveal a device's whereabouts; e.g., a sniffer can use a pre-trained model to identify a nearby device upon receiving its signal. Such fingerprints may also expose critical private information, e.g., the associated upper-layer applications or the end user. In this paper, we propose to erase these RF features for wireless devices, preventing fingerprinting through active perturbation at the signal level. Specifically, we consider a common RF fingerprinting scenario, where machine learning models are trained on pilot signal data for identification. A novel adversarial attack solution is designed to generate proper perturbations, so that the perturbed pilot signal hides the hardware features and causes the model to misclassify the device. We theoretically show that the perturbation does not affect the communication function as long as it stays within a tolerable perturbation threshold. We also implement the pilot signal fingerprinting and the proposed perturbation process in a practical LTE system. Extensive experimental results demonstrate that the RF fingerprints can be effectively erased to protect user privacy.

Updated: 2024-06-11 15:16:05

Categories: cs.CR

Download: http://arxiv.org/abs/2406.07349v1

DR-RAG: Applying Dynamic Document Relevance to Retrieval-Augmented Generation for Question-Answering

Retrieval-Augmented Generation (RAG) has demonstrated strong performance of Large Language Models (LLMs) in knowledge-intensive tasks such as Question-Answering (QA). RAG expands the query context by incorporating external knowledge bases to enhance response accuracy. However, it would be inefficient to access LLMs multiple times for each query and unreliable to retrieve all the relevant documents with a single query. We find that even though some critical documents have low relevance to the query, it is possible to retrieve the remaining documents by combining parts of the documents with the query. To mine this relevance, a two-stage retrieval framework called Dynamic-Relevant Retrieval-Augmented Generation (DR-RAG) is proposed to improve document retrieval recall and the accuracy of answers while maintaining efficiency. In addition, a small classifier is applied to two different selection strategies to determine the contribution of the retrieved documents to answering the query and to retrieve the relatively relevant documents. Meanwhile, DR-RAG calls the LLMs only once, which significantly improves the efficiency of the experiment. The experimental results on multi-hop QA datasets show that DR-RAG can significantly improve the accuracy of the answers and achieve new progress in QA systems.
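
The two-stage idea described above (retrieve with the query first, then retrieve again with the query combined with each first-stage document) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the bag-of-words cosine score stands in for a real retriever, the function names are hypothetical, and the classifier-based selection step mentioned in the abstract is omitted.

```python
import math
import re
from collections import Counter

def bow(text):
    # Bag-of-words vector; a toy stand-in for a real retriever's embedding.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, docs, k, exclude=()):
    # Return indices of the k highest-scoring documents (ties broken by index).
    q = bow(query)
    scored = [(-cosine(q, bow(d)), i) for i, d in enumerate(docs) if i not in exclude]
    return [i for _, i in sorted(scored)[:k]]

def two_stage_retrieve(query, docs, k1=1, k2=1):
    # Stage 1: retrieve with the query alone.
    selected = retrieve(query, docs, k1)
    # Stage 2: combine the query with each first-stage document and retrieve again.
    for i in list(selected):
        expanded = query + " " + docs[i]
        selected += retrieve(expanded, docs, k2, exclude=selected)
    return selected

docs = [
    "Inception director: Christopher Nolan",
    "Eiffel Tower, Paris landmark",
    "Christopher Nolan married Emma Thomas",
]
query = "spouse of Inception director"
print(retrieve(query, docs, 2))         # single-stage retrieval picks documents 0 and 1
print(two_stage_retrieve(query, docs))  # two-stage retrieval picks documents 0 and 2
```

In this toy corpus the third document shares no words with the query, so single-stage retrieval cannot rank it above the distractor; it becomes reachable only once the first-stage document is appended to the query.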

Updated: 2024-06-11 15:15:33

Categories: cs.LG,cs.CL

Download: http://arxiv.org/abs/2406.07348v1

ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization

Large language models (LLMs) have shown impressive performance on language tasks but face challenges when deployed on resource-constrained devices due to their extensive parameters and reliance on dense multiplications, resulting in high memory demands and latency bottlenecks. Shift-and-add reparameterization offers a promising solution by replacing costly multiplications with hardware-friendly primitives in both the attention and multi-layer perceptron (MLP) layers of an LLM. However, current reparameterization techniques require training from scratch or full parameter fine-tuning to restore accuracy, which is resource-intensive for LLMs. To address this, we propose accelerating pretrained LLMs through post-training shift-and-add reparameterization, creating efficient multiplication-free models, dubbed ShiftAddLLM. Specifically, we quantize each weight matrix into binary matrices paired with group-wise scaling factors. The associated multiplications are reparameterized into (1) shifts between activations and scaling factors and (2) queries and adds according to the binary matrices. To reduce accuracy loss, we present a multi-objective optimization method to minimize both weight and output activation reparameterization errors. Additionally, based on varying sensitivity across layers to reparameterization, we develop an automated bit allocation strategy to further reduce memory usage and latency. Experiments on five LLM families and eight tasks consistently validate the effectiveness of ShiftAddLLM, achieving average perplexity improvements of 5.6 and 22.7 points at comparable or lower latency compared to the most competitive quantized LLMs at 3 and 2 bits, respectively, and more than 80% memory and energy reductions over the original LLMs. Codes and models are available at https://github.com/GATECH-EIC/ShiftAddLLM.
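
As a rough illustration of how a multiplication can be replaced by shifts and adds, the sketch below approximates one weight row by a single sign vector and a power-of-two scale, then computes the dot product with integer activations using only additions, subtractions, and one bit shift. This is a deliberately simplified assumption, not the ShiftAddLLM procedure: the abstract describes several binary matrices with group-wise scaling factors and a query (lookup) step, none of which is modeled here, and the function names are hypothetical.

```python
import math

def quantize_row(w):
    """Approximate w by (2**k) * b with b in {-1, +1}; assumes w is not all zeros."""
    b = [1 if v >= 0 else -1 for v in w]
    # For b = sign(w), the least-squares optimal scale is the mean absolute weight.
    alpha = sum(abs(v) for v in w) / len(w)
    # Round the scale to the nearest power of two so it can be applied as a shift.
    k = round(math.log2(alpha))
    return b, k

def shift_add_dot(b, k, x):
    """Dot product of the quantized row with integer activations x, without multiplying."""
    acc = 0
    for bi, xi in zip(b, x):
        acc = acc + xi if bi > 0 else acc - xi  # add or subtract according to the sign
    # Apply the power-of-two scale as a bit shift (right shift floors for negative k).
    return acc << k if k >= 0 else acc >> -k

w = [3.9, -4.2, 4.1, -3.8]
x = [1, 2, 3, 4]
b, k = quantize_row(w)
print(shift_add_dot(b, k, x))               # shift-and-add approximation
print(sum(wi * xi for wi, xi in zip(w, x)))  # exact floating-point dot product
```

With a single sign vector the approximation is coarse; using more binary components per weight, as the abstract describes, trades extra additions for lower error.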

Updated: 2024-06-11 15:14:30

Categories: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2406.05981v2

An Expert is Worth One Token: Synergizing Multiple Expert LLMs as Generalist via Expert Token Routing

We present Expert-Token-Routing, a unified generalist framework that facilitates seamless integration of multiple expert LLMs. Our framework represents expert LLMs as special expert tokens within the vocabulary of a meta LLM. The meta LLM can route to an expert LLM in the same way it generates a new token. Expert-Token-Routing not only supports learning the implicit expertise of expert LLMs from existing instruction datasets but also allows new expert LLMs to be added dynamically in a plug-and-play manner. It also conceals the detailed collaboration process from the user, so that interaction feels as though it were with a single LLM. Our framework outperforms various existing multi-LLM collaboration paradigms across benchmarks that incorporate six diverse expert domains, demonstrating effectiveness and robustness in building a generalist LLM system by synergizing multiple expert LLMs.

Updated: 2024-06-11 15:12:09

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2403.16854v3

Formally Verified Approximate Policy Iteration

We formally verify an algorithm for approximate policy iteration on Factored Markov Decision Processes using the interactive theorem prover Isabelle/HOL. Next, we show how the formalized algorithm can be refined to an executable, verified implementation. The implementation is evaluated on benchmark problems to show its practicability. As part of the refinement, we develop verified software to certify Linear Programming solutions. The algorithm builds on a diverse library of formalized mathematics and pushes existing methodologies for interactive theorem provers to the limits. We discuss the process of the verification project and the modifications to the algorithm needed for formal verification.

Updated: 2024-06-11 15:07:08

Categories: cs.AI,cs.LO

Download: http://arxiv.org/abs/2406.07340v1

Transferring Knowledge from Large Foundation Models to Small Downstream Models

How do we transfer the relevant knowledge from ever larger foundation models into small, task-specific downstream models that can run at much lower costs? Standard transfer learning using pre-trained weights as the initialization transfers limited information and commits us to often massive pre-trained architectures. This procedure also precludes combining multiple pre-trained models that learn complementary information. To address these shortcomings, we introduce Adaptive Feature Transfer (AFT). Instead of transferring weights, AFT operates purely on features, thereby decoupling the choice of the pre-trained model from the smaller downstream model. Rather than indiscriminately compressing all pre-trained features, AFT adaptively transfers pre-trained features that are most useful for performing the downstream task, using a simple regularization that adds minimal overhead. Across multiple vision, language, and multi-modal datasets, AFT achieves significantly better downstream performance compared to alternatives with a similar computational cost. Furthermore, AFT reliably translates improvement in pre-trained models into improvement in downstream performance, even if the downstream model is over $50\times$ smaller, and can effectively transfer complementary information learned by multiple pre-trained models.

Updated: 2024-06-11 15:06:15

Categories: cs.LG

Download: http://arxiv.org/abs/2406.07337v1

Enhancing CTC-based speech recognition with diverse modeling units

In recent years, the evolution of end-to-end (E2E) automatic speech recognition (ASR) models has been remarkable, largely due to advances in deep learning architectures such as the Transformer. On top of E2E systems, researchers have achieved substantial accuracy improvements by rescoring an E2E model's N-best hypotheses with a phoneme-based model. This raises an interesting question about where the improvements come from, other than the system combination effect. We examine the underlying mechanisms driving these gains and propose an efficient joint training approach, where E2E models are trained jointly with diverse modeling units. This methodology not only combines the strengths of both phoneme-based and grapheme-based models but also reveals that using these diverse modeling units in a synergistic way can significantly enhance model accuracy. Our findings offer new insights into the optimal integration of heterogeneous modeling units in the development of more robust and accurate ASR systems.
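
The N-best rescoring setup mentioned above can be sketched in a few lines. The linear interpolation of log scores, the `weight` parameter, and the example scores are assumptions made for illustration; the abstract does not state how the two models' scores are combined.

```python
def rescore_nbest(nbest, phoneme_score, weight=0.5):
    """Pick the best hypothesis after mixing E2E and phoneme-model log scores.

    nbest: list of (text, e2e_log_prob) pairs from the E2E model.
    phoneme_score: callable mapping text to a log score (hypothetical second model).
    """
    rescored = [
        (text, (1 - weight) * e2e + weight * phoneme_score(text))
        for text, e2e in nbest
    ]
    return max(rescored, key=lambda pair: pair[1])

# Made-up scores: the E2E model slightly prefers the wrong hypothesis.
nbest = [("wreck a nice beach", -1.0), ("recognize speech", -1.2)]
phoneme_scores = {"wreck a nice beach": -3.0, "recognize speech": -0.5}

best_text, best_score = rescore_nbest(nbest, phoneme_scores.get)
print(best_text, best_score)
```

Here the phoneme-based score overturns the E2E model's first choice, which is the system combination effect the abstract sets out to look beyond.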

Updated: 2024-06-11 15:03:31

Categories: eess.AS,cs.AI,cs.SD

Download: http://arxiv.org/abs/2406.03274v2

CTC-based Non-autoregressive Textless Speech-to-Speech Translation

Direct speech-to-speech translation (S2ST) has achieved impressive translation quality, but it often faces the challenge of slow decoding due to the considerable length of speech sequences. Recently, some research has turned to non-autoregressive (NAR) models to expedite decoding, yet the translation quality typically lags behind autoregressive (AR) models significantly. In this paper, we investigate the performance of CTC-based NAR models in S2ST, as these models have shown impressive results in machine translation. Experimental results demonstrate that by combining pretraining, knowledge distillation, and advanced NAR training techniques such as glancing training and non-monotonic latent alignments, CTC-based NAR models achieve translation quality comparable to the AR model, while preserving up to 26.81$\times$ decoding speedup.
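
For readers unfamiliar with CTC, the reason it suits non-autoregressive decoding is that every output frame can be predicted in parallel and the final sequence is obtained by a fixed collapsing rule: merge consecutive repeats, then drop blanks. The sketch below shows only this standard rule on per-frame label ids; the blank id of 0 and the example labels are assumptions, and nothing here reflects the paper's model or training techniques.

```python
from itertools import groupby

def ctc_collapse(frame_labels, blank=0):
    """Standard CTC collapsing: merge consecutive repeats, then remove blanks."""
    merged = [label for label, _ in groupby(frame_labels)]
    return [label for label in merged if label != blank]

# Per-frame argmax labels, all predicted in parallel (0 is the blank symbol).
frames = [0, 3, 3, 0, 3, 5, 5, 0]
print(ctc_collapse(frames))  # [3, 3, 5]
```

The blank between the two runs of label 3 is what lets the output contain the same unit twice in a row.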

Updated: 2024-06-11 15:00:33

Categories: cs.CL,cs.AI,cs.SD,eess.AS,I.2.7

Download: http://arxiv.org/abs/2406.07330v1

Realistic Data Generation for 6D Pose Estimation of Surgical Instruments

Automation in surgical robotics has the potential to improve patient safety and surgical efficiency, but it is difficult to achieve due to the need for robust perception algorithms. In particular, 6D pose estimation of surgical instruments is critical to enable the automatic execution of surgical maneuvers based on visual feedback. In recent years, supervised deep learning algorithms have shown increasingly better performance at 6D pose estimation tasks; yet, their success depends on the availability of large amounts of annotated data. In household and industrial settings, synthetic data generated with 3D computer graphics software has been shown to be an alternative that minimizes the annotation costs of 6D pose datasets. However, this strategy does not translate well to surgical domains, as commercial graphics software has limited tools to generate images depicting realistic instrument-tissue interactions. To address these limitations, we propose an improved simulation environment for surgical robotics that enables the automatic generation of large and diverse datasets for 6D pose estimation of surgical instruments. Among the improvements, we developed an automated data generation pipeline and an improved surgical scene. To show the applicability of our system, we generated a dataset of 7.5k images with pose annotations of a surgical needle, which was used to evaluate a state-of-the-art pose estimation network. The trained model obtained a mean translational error of 2.59 mm on a challenging dataset that presented varying levels of occlusion. These results highlight our pipeline's success in training and evaluating novel vision algorithms for surgical robotics applications.

Updated: 2024-06-11 14:59:29

Categories: cs.RO,cs.LG

Download: http://arxiv.org/abs/2406.07328v1

3D-Properties: Identifying Challenges in DPO and Charting a Path Forward

Aligning large language models (LLMs) with human preference has recently gained tremendous attention, with the canonical yet costly RLHF-PPO and the simple and straightforward Direct Preference Optimization (DPO) as two examples. Despite its efficiency, DPO has rarely been used in state-of-the-art production-level LLMs, implying potential pathologies. In this work, we revisit DPO with a comprehensive examination of its empirical efficacy and a systematic comparison with RLHF-PPO. We identify the 3D-properties of DPO's learning outcomes: the Drastic drop in the likelihood of rejected responses, the Degradation into LLM unlearning, and the Dispersion effect on unseen responses. We establish these through experiments with both a carefully designed toy model and practical LLMs on tasks including mathematical problem-solving and instruction following. These findings inherently connect to some observations made by related works, and we additionally contribute a plausible theoretical explanation for them. Accordingly, we propose easy regularization methods to mitigate the issues caused by the 3D-properties, improving the training stability and final performance of DPO. Our contributions also include an investigation into how the distribution of the paired preference data impacts the effectiveness of DPO. We hope this work can offer research directions to narrow the gap between reward-free preference learning methods and reward-based ones.
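
As background for the properties discussed above, the standard DPO objective for one preference pair can be written as a short function. This sketch shows only the well-known loss, not the paper's analysis or its proposed regularizers; the argument names and the numeric log-probabilities are made up for illustration.

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair of sequence log-probabilities."""
    margin = beta * (
        (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    )
    # -log(sigmoid(margin)) rewritten as log(1 + exp(-margin)).
    return math.log1p(math.exp(-margin))

# Policy equal to the reference: the margin is zero and the loss is log(2).
print(dpo_loss(-1.0, -1.0, -1.0, -1.0))
# Lowering only the rejected response's log-probability reduces the loss.
print(dpo_loss(-1.0, -5.0, -1.0, -1.0))
# The loss is the same even if the chosen response's log-probability also falls.
print(dpo_loss(-2.0, -6.0, -1.0, -1.0))
```

Because the loss depends only on the difference between the two log-ratio terms, it can be driven down by pushing the rejected response's likelihood lower without raising the chosen one, which is at least consistent with the first property the abstract names.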

Updated: 2024-06-11 14:59:24

Categories: cs.AI,cs.CL,cs.LG

Download: http://arxiv.org/abs/2406.07327v1

Beyond Training: Optimizing Reinforcement Learning Based Job Shop Scheduling Through Adaptive Action Sampling

Learned construction heuristics for scheduling problems have become increasingly competitive with established solvers and heuristics in recent years. In particular, significant improvements have been observed in solution approaches using deep reinforcement learning (DRL). While much attention has been paid to the design of network architectures and training algorithms to achieve state-of-the-art results, little research has investigated the optimal use of trained DRL agents during inference. Our work is based on the hypothesis that, similar to search algorithms, the utilization of trained DRL agents should be dependent on the acceptable computational budget. We propose a simple yet effective parameterization, called $\delta$-sampling that manipulates the trained action vector to bias agent behavior towards exploration or exploitation during solution construction. By following this approach, we can achieve a more comprehensive coverage of the search space while still generating an acceptable number of solutions. In addition, we propose an algorithm for obtaining the optimal parameterization for such a given number of solutions and any given trained agent. Experiments extending existing training protocols for job shop scheduling problems with our inference method validate our hypothesis and result in the expected improvements of the generated solutions.

Updated: 2024-06-11 14:59:18

Categories: cs.AI,cs.LG

Download: http://arxiv.org/abs/2406.07325v1

Room Transfer Function Reconstruction Using Complex-valued Neural Networks and Irregularly Distributed Microphones

Reconstructing the room transfer functions needed to calculate the complex sound field in a room has several important real-world applications. However, an impractical number of microphones is often required. Recently, in addition to classical signal processing methods, deep learning techniques have been applied to reconstruct the room transfer function starting from a very limited set of measurements at scattered points in the room. In this paper, we employ complex-valued neural networks to estimate room transfer functions in the frequency range of the first room resonances, using a few irregularly distributed microphones. To the best of our knowledge, this is the first time that complex-valued neural networks have been used to estimate room transfer functions. To analyze the benefits of applying complex-valued optimization to the considered task, we compare the proposed technique with a state-of-the-art kernel-based signal processing approach for sound field reconstruction, showing that the proposed technique exhibits relevant advantages in terms of phase accuracy and overall quality of the reconstructed sound field. For informative purposes, we also compare the model with a similarly structured data-driven approach that, however, applies a real-valued neural network to reconstruct only the magnitude of the sound field.

Updated: 2024-06-11 14:54:45

Categories: eess.AS,cs.LG,cs.SD,eess.SP

Download: http://arxiv.org/abs/2402.04866v3

Should XAI Nudge Human Decisions with Explanation Biasing?

This paper reviews our previous trials of Nudge-XAI, an approach that introduces automatic biases into explanations from explainable AIs (XAIs) with the aim of leading users to better decisions, and it discusses the benefits and challenges. Nudge-XAI uses a user model that predicts the influence of providing an explanation or emphasizing it and attempts to guide users toward AI-suggested decisions without coercion. The nudge design is expected to enhance the autonomy of users, reduce the risk associated with an AI making decisions without users' full agreement, and enable users to avoid AI failures. To discuss the potential of Nudge-XAI, this paper reports a post-hoc investigation of previous experimental results using cluster analysis. The results demonstrate the diversity of user behavior in response to Nudge-XAI, which supports our aim of enhancing user autonomy. However, it also highlights the challenge of users who distrust AI and falsely make decisions contrary to AI suggestions, suggesting the need for personalized adjustment of the strength of nudges to make this approach work more generally.

Updated: 2024-06-11 14:53:07

Categories: cs.HC,cs.AI

Download: http://arxiv.org/abs/2406.07323v1

Rethinking the impact of noisy labels in graph classification: A utility and privacy perspective

Graph neural networks based on message-passing mechanisms have achieved advanced results in graph classification tasks. However, their generalization performance degrades when noisy labels are present in the training data. Most existing approaches to noisy labels focus on the visual domain or graph node classification tasks and analyze the impact of noisy labels only from a utility perspective. Unlike existing work, in this paper we measure the effects of noisy labels on graph classification from both data privacy and model utility perspectives. We find that noisy labels degrade the model's generalization performance and make membership inference attacks on graph data privacy more effective. To this end, we propose a robust graph neural network approach for graph classification with noisy labels (RGLC). Specifically, we first accurately filter the noisy samples using high-confidence samples and the first feature principal component vector of each class. Then, the robust principal component vectors and the model output under data augmentation are utilized to achieve noisy label correction guided by dual spatial information. Finally, supervised graph contrastive learning is introduced to enhance the embedding quality of the model and protect the privacy of the training graph data. The utility and privacy of the proposed method are validated by comparing twelve different methods on eight real graph classification datasets. Compared with the state-of-the-art methods, the RGLC method achieves performance gains of at most 7.8% and at least 0.8% at a 30% noisy labeling rate, and reduces the accuracy of privacy attacks to below 60%.

Updated: 2024-06-11 14:44:37

Categories: cs.LG

Download: http://arxiv.org/abs/2406.07314v1

AIGB: Generative Auto-bidding via Diffusion Modeling

Auto-bidding plays a crucial role in facilitating online advertising by automatically providing bids for advertisers. Reinforcement learning (RL) has gained popularity for auto-bidding. However, most current RL auto-bidding methods are modeled as a Markov Decision Process (MDP), which assumes Markovian state transitions. This assumption restricts performance in long-horizon scenarios and makes the model unstable when dealing with highly random online advertising environments. To tackle this issue, this paper introduces AI-Generated Bidding (AIGB), a novel paradigm for auto-bidding through generative modeling. In this paradigm, we propose DiffBid, a conditional diffusion modeling approach for bid generation. DiffBid directly models the correlation between the return and the entire trajectory, effectively avoiding error propagation across time steps in long horizons. Additionally, DiffBid offers a versatile approach for generating trajectories that maximize given targets while adhering to specific constraints. Extensive experiments conducted on a real-world dataset and an online A/B test on the Alibaba advertising platform demonstrate the effectiveness of DiffBid, achieving a 2.81% increase in GMV and a 3.36% increase in ROI.

Updated: 2024-06-11 14:33:23

Categories: cs.LG,cs.AI,cs.CE

Download: http://arxiv.org/abs/2405.16141v2

BertaQA: How Much Do Language Models Know About Local Culture?

Large Language Models (LLMs) exhibit extensive knowledge about the world, but most evaluations have been limited to global or anglocentric subjects. This raises the question of how well these models perform on topics relevant to other cultures, whose presence on the web is not that prominent. To address this gap, we introduce BertaQA, a multiple-choice trivia dataset that is parallel in English and Basque. The dataset consists of a local subset with questions pertinent to the Basque culture, and a global subset with questions of broader interest. We find that state-of-the-art LLMs struggle with local cultural knowledge, even as they excel on global topics. However, we show that continued pre-training in Basque significantly improves the models' performance on Basque culture, even when queried in English. To our knowledge, this is the first solid evidence of knowledge transfer from a low-resource to a high-resource language. Our analysis sheds light on the complex interplay between language and knowledge, and reveals that some prior findings do not fully hold when reassessed on local topics. Our dataset and evaluation code are available under open licenses at https://github.com/juletx/BertaQA.

Updated: 2024-06-11 14:30:34

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2406.07302v1

Multi-objective Reinforcement learning from AI Feedback

This paper presents Multi-Objective Reinforcement Learning from AI Feedback (MORLAIF), a novel approach to improving the alignment and performance of language models trained using reinforcement learning from AI feedback (RLAIF). In contrast to standard approaches that train a single preference model to represent all human preferences, MORLAIF decomposes this task into multiple simpler principles, such as toxicity, factuality, and sycophancy. Separate preference models are trained for each principle using feedback from GPT-3.5-Turbo. These preference model scores are then combined using different scalarization functions to provide a reward signal for Proximal Policy Optimization (PPO) training of the target language model. Our experiments indicate that MORLAIF outperforms the standard RLAIF baselines and that MORLAIF can be used to align larger language models using smaller ones. Surprisingly, the choice of scalarization function does not appear to significantly impact the results.
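
The combination step described above can be illustrated with a small sketch. The abstract does not say which scalarization functions were used, so the two below (a weighted average and a worst-case minimum) are illustrative assumptions, as are the function names and the example scores.

```python
from typing import Dict

# Hypothetical per-principle preference scores in [0, 1] for one response.
Scores = Dict[str, float]


def weighted_linear(scores: Scores, weights: Dict[str, float]) -> float:
    """Weighted average of the per-principle scores (assumed scalarization)."""
    total_weight = sum(weights[name] for name in scores)
    return sum(weights[name] * value for name, value in scores.items()) / total_weight


def worst_case(scores: Scores) -> float:
    """Reward equals the weakest principle score (assumed scalarization)."""
    return min(scores.values())


# Illustrative scores only; they are not taken from the paper.
scores = {"toxicity": 0.9, "factuality": 0.6, "sycophancy": 0.75}
uniform = {name: 1.0 for name in scores}

linear_reward = weighted_linear(scores, uniform)
minimum_reward = worst_case(scores)
```

Either scalar could serve as the reward passed to PPO. The abstract's observation is that the choice between such functions does not appear to change the results much.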

Updated: 2024-06-11 14:24:00

Categories: cs.LG

Download: http://arxiv.org/abs/2406.07295v1

Joint Learning of Context and Feedback Embeddings in Spoken Dialogue

Short feedback responses, such as backchannels, play an important role in spoken dialogue. So far, most of the modeling of feedback responses has focused on their timing, often neglecting how their lexical and prosodic form influence their contextual appropriateness and conversational function. In this paper, we investigate the possibility of embedding short dialogue contexts and feedback responses in the same representation space using a contrastive learning objective. In our evaluation, we primarily focus on how such embeddings can be used as a context-feedback appropriateness metric and thus for feedback response ranking in U.S. English dialogues. Our results show that the model outperforms humans given the same ranking task and that the learned embeddings carry information about the conversational function of feedback responses.
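
As a rough illustration of embedding contexts and feedback responses in one space, the sketch below computes an InfoNCE-style loss over a batch of paired embeddings. The abstract only states that a contrastive learning objective is used, so the specific loss, the temperature value, and all names here are assumptions.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product used as the context-feedback similarity score."""
    return sum(x * y for x, y in zip(a, b))


def info_nce(contexts: List[Vector], feedbacks: List[Vector],
             temperature: float = 0.1) -> float:
    """Mean cross-entropy of matching context i to feedback i within the batch."""
    losses = []
    for i, context in enumerate(contexts):
        logits = [dot(context, feedback) / temperature for feedback in feedbacks]
        peak = max(logits)  # subtract the maximum for numerical stability
        log_norm = peak + math.log(sum(math.exp(l - peak) for l in logits))
        losses.append(log_norm - logits[i])
    return sum(losses) / len(losses)


# Toy embeddings: aligned pairs should give a lower loss than mismatched pairs.
aligned_loss = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled_loss = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

The same similarity score (`dot` here) could then be used to rank candidate feedback responses for a given context, which is the appropriateness metric the abstract describes.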

Updated: 2024-06-11 14:22:37

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2406.07291v1

Trusting Fair Data: Leveraging Quality in Fairness-Driven Data Removal Techniques

In this paper, we deal with bias mitigation techniques that remove specific data points from the training set to aim for a fair representation of the population in that set. Machine learning models are trained on these pre-processed datasets, and their predictions are expected to be fair. However, such approaches may exclude relevant data, making the attained subsets less trustworthy for further usage. To enhance the trustworthiness of prior methods, we propose additional requirements and objectives that the subsets must fulfill in addition to fairness: (1) group coverage, and (2) minimal data loss. While removing entire groups may improve the measured fairness, this practice is very problematic as failing to represent every group cannot be considered fair. In our second concern, we advocate for the retention of data while minimizing discrimination. By introducing a multi-objective optimization problem that considers fairness and data loss, we propose a methodology to find Pareto-optimal solutions that balance these objectives. By identifying such solutions, users can make informed decisions about the trade-off between fairness and data quality and select the most suitable subset for their application.
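
To make the idea of Pareto-optimal subsets concrete, here is a minimal sketch that filters candidate subsets by two objectives, both to be minimized. The candidate names and values are invented for illustration, and the abstract does not describe the search procedure, so this shows only the dominance check.

```python
from typing import List, Tuple

# (name, discrimination, data_loss); both objectives are minimized.
Candidate = Tuple[str, float, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    return a[1] <= b[1] and a[2] <= b[2] and (a[1] < b[1] or a[2] < b[2])


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep every candidate that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Hypothetical candidate subsets; the numbers are not from the paper.
candidates = [
    ("A", 0.10, 0.40),
    ("B", 0.20, 0.10),
    ("C", 0.25, 0.30),  # dominated by B on both objectives
    ("D", 0.05, 0.60),
]
front = pareto_front(candidates)
front_names = [name for name, _, _ in front]
```

A user would then pick among the remaining candidates according to how much data loss they accept for a given reduction in discrimination. A group-coverage requirement could be applied as a filter before this step.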

Updated: 2024-06-11 14:22:14

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2405.12926v2

Can We Achieve High-quality Direct Speech-to-Speech Translation without Parallel Speech Data?

Recently proposed two-pass direct speech-to-speech translation (S2ST) models decompose the task into speech-to-text translation (S2TT) and text-to-speech (TTS) within an end-to-end model, yielding promising results. However, the training of these models still relies on parallel speech data, which is extremely challenging to collect. In contrast, S2TT and TTS have accumulated a large amount of data and pretrained models, which have not been fully utilized in the development of S2ST models. Inspired by this, in this paper, we first introduce a composite S2ST model named ComSpeech, which can seamlessly integrate any pretrained S2TT and TTS models into a direct S2ST model. Furthermore, to eliminate the reliance on parallel speech data, we propose a novel training method, ComSpeech-ZS, that solely utilizes S2TT and TTS data. It aligns representations in the latent space through contrastive learning, enabling the speech synthesis capability learned from the TTS data to generalize to S2ST in a zero-shot manner. Experimental results on the CVSS dataset show that when parallel speech data is available, ComSpeech surpasses previous two-pass models such as UnitY and Translatotron 2 in both translation quality and decoding speed. When there is no parallel speech data, ComSpeech-ZS lags behind ComSpeech by only 0.7 ASR-BLEU and outperforms the cascaded models.

Updated: 2024-06-11 14:17:12

Categories: cs.CL,cs.AI,cs.SD,eess.AS,I.2.7

Download: http://arxiv.org/abs/2406.07289v1

Unsupervised Object Detection with Theoretical Guarantees

Unsupervised object detection using deep neural networks is typically a difficult problem with few to no guarantees about the learned representation. In this work we present the first unsupervised object detection method that is theoretically guaranteed to recover the true object positions up to quantifiable small shifts. We develop an unsupervised object detection architecture and prove that the learned variables correspond to the true object positions up to small shifts related to the encoder and decoder receptive field sizes, the object sizes, and the widths of the Gaussians used in the rendering process. We perform detailed analysis of how the error depends on each of these variables and perform synthetic experiments validating our theoretical predictions up to a precision of individual pixels. We also perform experiments on CLEVR-based data and show that, unlike current SOTA object detection methods (SAM, CutLER), our method's prediction errors always lie within our theoretical bounds. We hope that this work helps open up an avenue of research into object detection methods with theoretical guarantees.

Updated: 2024-06-11 14:12:31

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2406.07284v1

Sparsity in neural networks can improve their privacy

This article measures how sparsity can make neural networks more robust to membership inference attacks. The empirical results show that sparsity improves the privacy of the network while preserving comparable performance on the task at hand. This empirical study complements and extends the existing literature.

Updated: 2024-06-11 14:10:46

Categories: cs.LG,cs.AI,cs.CR

Download: http://arxiv.org/abs/2304.10553v2

Calibration of Time-Series Forecasting: Detecting and Adapting Context-Driven Distribution Shift

Recent years have witnessed the success of introducing deep learning models to time series forecasting. From a data generation perspective, we illustrate that existing models are susceptible to distribution shifts driven by temporal contexts, whether observed or unobserved. Such context-driven distribution shift (CDS) introduces biases in predictions within specific contexts and poses challenges for conventional training paradigms. In this paper, we introduce a universal calibration methodology for the detection and adaptation of CDS with a trained model. To this end, we propose a novel CDS detector, termed the "residual-based CDS detector" or "Reconditionor", which quantifies the model's vulnerability to CDS by evaluating the mutual information between prediction residuals and their corresponding contexts. A high Reconditionor score indicates a severe susceptibility, thereby necessitating model adaptation. In this circumstance, we put forth a straightforward yet potent adapter framework for model calibration, termed the "sample-level contextualized adapter" or "SOLID". This framework involves the curation of a contextually similar dataset to the provided test sample and the subsequent fine-tuning of the model's prediction layer with a limited number of steps. Our theoretical analysis demonstrates that this adaptation strategy can achieve an optimal bias-variance trade-off. Notably, our proposed Reconditionor and SOLID are model-agnostic and readily adaptable to a wide range of models. Extensive experiments show that SOLID consistently enhances the performance of current forecasting models on real-world datasets, especially on cases with substantial CDS detected by the proposed Reconditionor, thus validating the effectiveness of the calibration approach.
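
The detector idea, scoring a model by the mutual information between its prediction residuals and their contexts, can be sketched as follows. The abstract does not specify the estimator, so this uses a simple plug-in estimate over discretized values. The binning into "neg"/"pos" residuals and the context labels are assumptions for illustration.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Plug-in estimate of I(X; Y) in nats from paired discrete samples."""
    n = len(xs)
    count_x, count_y = Counter(xs), Counter(ys)
    count_xy = Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log(c * n / (count_x[x] * count_y[y]))
        for (x, y), c in count_xy.items()
    )


# Hypothetical discretized data: residual sign per sample and a context id.
contexts = [0, 0, 1, 1]
dependent_residuals = ["neg", "neg", "pos", "pos"]    # bias follows the context
independent_residuals = ["neg", "pos", "neg", "pos"]  # no context effect

high_score = mutual_information(dependent_residuals, contexts)
low_score = mutual_information(independent_residuals, contexts)
```

A high score means the residuals are predictable from the context, which the abstract treats as a sign that the model needs adaptation. A score near zero suggests little context-driven shift.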

Updated: 2024-06-11 14:07:17

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2310.14838v2

Clifford-Steerable Convolutional Neural Networks

We present Clifford-Steerable Convolutional Neural Networks (CS-CNNs), a novel class of $\mathrm{E}(p, q)$-equivariant CNNs. CS-CNNs process multivector fields on pseudo-Euclidean spaces $\mathbb{R}^{p,q}$. They cover, for instance, $\mathrm{E}(3)$-equivariance on $\mathbb{R}^3$ and Poincaré-equivariance on Minkowski spacetime $\mathbb{R}^{1,3}$. Our approach is based on an implicit parametrization of $\mathrm{O}(p,q)$-steerable kernels via Clifford group equivariant neural networks. We significantly and consistently outperform baseline methods on fluid dynamics as well as relativistic electrodynamics forecasting tasks.

Updated: 2024-06-11 14:05:40

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2402.14730v2

ToNER: Type-oriented Named Entity Recognition with Generative Language Model

In recent years, fine-tuned generative models have proven more powerful than earlier tagging-based or span-based models on the named entity recognition (NER) task. It has also been found that entity-related information, such as entity types, can prompt a model to perform NER better. However, it is not easy to determine in advance which entity types actually occur in a given sentence, and inputting too many candidate entity types inevitably distracts the model. To exploit the benefit of entity types for NER, in this paper we propose a novel NER framework based on a generative model, namely ToNER. In ToNER, a type matching model is first proposed to identify the entity types most likely to appear in the sentence. Then, we add a multiple binary classification task to fine-tune the generative model's encoder, so as to generate a refined representation of the input sentence. Moreover, we add an auxiliary task in which the model discovers the entity types, which further fine-tunes the model to output more accurate results. Our extensive experiments on several NER benchmarks verify the effectiveness of the proposed entity-type-oriented strategies in ToNER.

Updated: 2024-06-11 14:05:03

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2404.09145v2

On Kernel's Safety in the Spectre Era (Extended Version)

The efficacy of address space layout randomization has been formally demonstrated in a shared-memory model by Abadi et al., contingent on specific assumptions about victim programs. However, modern operating systems, which implement layout randomization in the kernel, diverge from these assumptions and operate on a separate memory model with communication through system calls. In this work, we relax Abadi et al.'s language assumptions while demonstrating that layout randomization offers a comparable safety guarantee in a system with memory separation. However, in practice, speculative execution and side-channels are recognized threats to layout randomization. We show that kernel safety cannot be restored for attackers capable of using side-channels and speculative execution, and we introduce a new condition that allows us to formally prove kernel safety in the Spectre era. Our research demonstrates that under this condition, the system remains safe without relying on layout randomization. We also demonstrate that our condition can be sensibly weakened, leading to enforcement mechanisms that can guarantee kernel safety for safe system calls in the Spectre era.

Updated: 2024-06-11 14:04:58

Categories: cs.CR

Download: http://arxiv.org/abs/2406.07278v1

Speaking Your Language: Spatial Relationships in Interpretable Emergent Communication

Effective communication requires the ability to refer to specific parts of an observation in relation to others. While emergent communication literature shows success in developing various language properties, no research has shown the emergence of such positional references. This paper demonstrates how agents can communicate about spatial relationships within their observations. The results indicate that agents can develop a language capable of expressing the relationships between parts of their observation, achieving over 90% accuracy when trained in a referential game which requires such communication. Using a collocation measure, we demonstrate how the agents create such references. This analysis suggests that agents use a mixture of non-compositional and compositional messages to convey spatial relationships. We also show that the emergent language is interpretable by humans. The translation accuracy is tested by communicating with the receiver agent, where the receiver achieves over 78% accuracy using parts of this lexicon, confirming that the interpretation of the emergent language was successful.

Updated: 2024-06-11 14:04:25

Categories: cs.CL,cs.AI,cs.MA

Download: http://arxiv.org/abs/2406.07277v1

DCA-Bench: A Benchmark for Dataset Curation Agents

The quality of datasets plays an increasingly crucial role in the research and development of modern artificial intelligence (AI). Despite the proliferation of open dataset platforms, data quality issues such as insufficient documentation, inaccurate annotations, and ethical concerns remain common in datasets widely used in AI. Furthermore, these issues are often subtle and difficult to detect with rule-based scripts, requiring expensive manual identification and verification by dataset users or maintainers. With the increasing capability of large language models (LLMs), it is promising to streamline the curation of datasets with LLM agents. In this work, as the initial step towards this goal, we propose a dataset curation agent benchmark, DCA-Bench, to measure LLM agents' capability of detecting hidden dataset quality issues. Specifically, we collect diverse real-world dataset quality issues from eight open dataset platforms as a testbed. Additionally, to establish an automatic pipeline for evaluating the success of LLM agents, which requires a nuanced understanding of the agent outputs, we implement a dedicated Evaluator using another LLM agent. We demonstrate that the LLM-based Evaluator empirically aligns well with human evaluation, allowing reliable automatic evaluation on the proposed benchmark. We further conduct experiments on several baseline LLM agents on the proposed benchmark and demonstrate the complexity of the task, indicating that applying LLMs to real-world dataset curation still requires further in-depth exploration and innovation. Finally, the proposed benchmark can also serve as a testbed for measuring the capability of LLMs in problem discovery rather than just problem-solving. The benchmark suite is available at https://github.com/TRAIS-Lab/dca-bench.

Updated: 2024-06-11 14:02:23

Categories: cs.AI

Download: http://arxiv.org/abs/2406.07275v1

Fun with Flags: Robust Principal Directions via Flag Manifolds

Principal component analysis (PCA), along with its extensions to manifolds and outlier-contaminated data, has been indispensable in computer vision and machine learning. In this work, we present a unifying formalism for PCA and its variants, and introduce a framework based on the flags of linear subspaces, i.e., a hierarchy of nested linear subspaces of increasing dimension, which not only allows for a common implementation but also yields novel variants not explored previously. We begin by generalizing traditional PCA methods that either maximize variance or minimize reconstruction error. We expand these interpretations to develop a wide array of new dimensionality reduction algorithms by accounting for outliers and the data manifold. To devise a common computational approach, we recast robust and dual forms of PCA as optimization problems on flag manifolds. We then integrate tangent space approximations of principal geodesic analysis (tangent-PCA) into this flag-based framework, creating novel robust and dual geodesic PCA variations. The remarkable flexibility offered by the 'flagification' introduced here enables even more algorithmic variants identified by specific flag types. Last but not least, we propose an effective convergent solver for these flag-formulations employing the Stiefel manifold. Our empirical results on both real-world and synthetic scenarios demonstrate the superiority of our novel algorithms, especially in terms of robustness to outliers on manifolds.

Updated: 2024-06-11 14:01:18

Categories: cs.CV,cs.LG,math.DG,math.OC,stat.ML

Download: http://arxiv.org/abs/2401.04071v3

S-HR-VQVAE: Sequential Hierarchical Residual Learning Vector Quantized Variational Autoencoder for Video Prediction

We address the video prediction task by putting forth a novel model that combines (i) our recently proposed hierarchical residual vector quantized variational autoencoder (HR-VQVAE), and (ii) a novel spatiotemporal PixelCNN (ST-PixelCNN). We refer to this approach as a sequential hierarchical residual learning vector quantized variational autoencoder (S-HR-VQVAE). By leveraging the intrinsic capability of HR-VQVAE to model still images with a parsimonious representation, combined with the ST-PixelCNN's ability to handle spatiotemporal information, S-HR-VQVAE can better deal with the chief challenges in video prediction. These include learning spatiotemporal information, handling high-dimensional data, combating blurry predictions, and implicitly modeling physical characteristics. Extensive experimental results on the KTH Human Action and Moving-MNIST tasks demonstrate that our model compares favorably against top video prediction techniques in both quantitative and qualitative evaluations, despite a much smaller model size. Finally, we boost S-HR-VQVAE by proposing a novel training method to jointly estimate the HR-VQVAE and ST-PixelCNN parameters.

Updated: 2024-06-11 13:54:55

Categories: cs.CV,cs.AI,cs.LG,I.2.10; I.4.10; I.4.5; I.4.2; I.2.6

Download: http://arxiv.org/abs/2307.06701v2

Efficient 3D Molecular Generation with Flow Matching and Scale Optimal Transport

Generative models for 3D drug design have gained prominence recently for their potential to design ligands directly within protein pockets. Current approaches, however, often suffer from very slow sampling times or generate molecules with poor chemical validity. Addressing these limitations, we propose Semla, a scalable E(3)-equivariant message passing architecture. We further introduce a molecular generation model, MolFlow, which is trained using flow matching along with scale optimal transport, a novel extension of equivariant optimal transport. Our model produces state-of-the-art results on benchmark datasets with just 100 sampling steps. Crucially, MolFlow samples high quality molecules with as few as 20 steps, corresponding to a two order-of-magnitude speed-up compared to state-of-the-art, without sacrificing performance. Furthermore, we highlight limitations of current evaluation methods for 3D generation and propose new benchmark metrics for unconditional molecular generators. Finally, using these new metrics, we compare our model's ability to generate high quality samples against current approaches and further demonstrate MolFlow's strong performance.

Updated: 2024-06-11 13:51:51

Categories: cs.LG,cs.AI,cs.NE

Download: http://arxiv.org/abs/2406.07266v1

VulDetectBench: Evaluating the Deep Capability of Vulnerability Detection with Large Language Models

Large Language Models (LLMs) have training corpora containing large amounts of program code, which greatly improves their code comprehension and generation capabilities. However, sound, comprehensive research on detecting program vulnerabilities, a more specific code-related task, and on evaluating the performance of LLMs in this more specialized scenario is still lacking. To address common challenges in vulnerability analysis, our study introduces a new benchmark, VulDetectBench, specifically designed to assess the vulnerability detection capabilities of LLMs. The benchmark comprehensively evaluates LLMs' ability to identify, classify, and locate vulnerabilities through five tasks of increasing difficulty. We evaluate the performance of 17 models (both open- and closed-source) and find that while existing models can achieve over 80% accuracy on tasks related to vulnerability identification and classification, they still fall short on specific, more detailed vulnerability analysis tasks, with less than 30% accuracy, making it difficult to provide valuable auxiliary information for professional vulnerability mining. Our benchmark effectively evaluates the capabilities of various LLMs at different levels in the specific task of vulnerability detection, providing a foundation for future research and improvements in this critical area of code security. VulDetectBench is publicly available at https://github.com/Sweetaroo/VulDetectBench.

Updated: 2024-06-11 13:42:57

Categories: cs.CR,cs.AI,cs.SE

Download: http://arxiv.org/abs/2406.07595v1

Active learning for affinity prediction of antibodies

The primary objective of most lead optimization campaigns is to enhance the binding affinity of ligands. For large molecules such as antibodies, identifying mutations that enhance antibody affinity is particularly challenging due to the combinatorial explosion of potential mutations. When the structure of the antibody-antigen complex is available, relative binding free energy (RBFE) methods can offer valuable insights into how different mutations will impact the potency and selectivity of a drug candidate, thereby reducing the reliance on costly and time-consuming wet-lab experiments. However, accurately simulating the physics of large molecules is computationally intensive. We present an active learning framework that iteratively proposes promising sequences for simulators to evaluate, thereby accelerating the search for improved binders. We explore different modeling approaches to identify the most effective surrogate model for this task, and evaluate our framework both using pre-computed pools of data and in a realistic full-loop setting.

Updated: 2024-06-11 13:42:49

Categories: cs.LG,q-bio.QM,stat.ML

Download: http://arxiv.org/abs/2406.07263v1

MLLMGuard: A Multi-dimensional Safety Evaluation Suite for Multimodal Large Language Models

Powered by remarkable advancements in Large Language Models (LLMs), Multimodal Large Language Models (MLLMs) demonstrate impressive capabilities in manifold tasks. However, the practical application scenarios of MLLMs are intricate, exposing them to potential malicious instructions and thereby posing safety risks. While current benchmarks do incorporate certain safety considerations, they often lack comprehensive coverage and fail to exhibit the necessary rigor and robustness. For instance, the common practice of employing GPT-4V as both the evaluator and a model to be evaluated lacks credibility, as it tends to exhibit a bias toward its own responses. In this paper, we present MLLMGuard, a multidimensional safety evaluation suite for MLLMs, including a bilingual image-text evaluation dataset, inference utilities, and a lightweight evaluator. MLLMGuard's assessment comprehensively covers two languages (English and Chinese) and five important safety dimensions (Privacy, Bias, Toxicity, Truthfulness, and Legality), each with corresponding rich subtasks. Focusing on these dimensions, our evaluation dataset is primarily sourced from platforms such as social media, and it integrates text-based and image-based red teaming techniques with meticulous annotation by human experts. This can prevent inaccurate evaluation caused by data leakage when using open-source datasets and ensures the quality and challenging nature of our benchmark. Additionally, a fully automated lightweight evaluator termed GuardRank is developed, which achieves significantly higher evaluation accuracy than GPT-4. Our evaluation results across 13 advanced models indicate that MLLMs still have a substantial journey ahead before they can be considered safe and responsible.

Updated: 2024-06-11 13:41:33

Categories: cs.CR,cs.AI

Download: http://arxiv.org/abs/2406.07594v1

Scientific Computing with Large Language Models

We provide an overview of the emergence of large language models for scientific computing applications. We highlight use cases that involve natural language processing of scientific documents and specialized languages designed to describe physical systems. For the former, chatbot style applications appear in medicine, mathematics and physics and can be used iteratively with domain experts for problem solving. We also review specialized languages within molecular biology, the languages of molecules, proteins, and DNA where language models are being used to predict properties and even create novel physical systems at much faster rates than traditional computing methods.

Updated: 2024-06-11 13:39:07

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2406.07259v1

Scholarly Question Answering using Large Language Models in the NFDI4DataScience Gateway

This paper introduces a scholarly Question Answering (QA) system on top of the NFDI4DataScience Gateway, employing a Retrieval Augmented Generation-based (RAG) approach. The NFDI4DS Gateway, as a foundational framework, offers a unified and intuitive interface for querying various scientific databases using federated search. The RAG-based scholarly QA, powered by a Large Language Model (LLM), facilitates dynamic interaction with search results, enhancing filtering capabilities and fostering a conversational engagement with the Gateway search. The effectiveness of both the Gateway and the scholarly QA system is demonstrated through experimental analysis.

Updated: 2024-06-11 13:36:19

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.07257v1

Modeling Sustainable Resource Management using Active Inference

Active inference helps us simulate adaptive behavior and decision-making in biological and artificial agents. Building on our previous work exploring the relationship between active inference, well-being, resilience, and sustainability, we present a computational model of an agent learning sustainable resource management strategies in both static and dynamic environments. The agent's behavior emerges from optimizing its own well-being, represented by prior preferences, subject to beliefs about environmental dynamics. In a static environment, the agent learns to consistently consume resources to satisfy its needs. In a dynamic environment where resources deplete and replenish based on the agent's actions, the agent adapts its behavior to balance immediate needs with long-term resource availability. This demonstrates how active inference can give rise to sustainable and resilient behaviors in the face of changing environmental conditions. We discuss the implications of our model, its limitations, and suggest future directions for integrating more complex agent-environment interactions. Our work highlights active inference's potential for understanding and shaping sustainable behaviors.

Updated: 2024-06-11 13:36:12

Categories: cs.AI

Download: http://arxiv.org/abs/2406.07593v1

Log Neural Controlled Differential Equations: The Lie Brackets Make a Difference

The vector field of a controlled differential equation (CDE) describes the relationship between a control path and the evolution of a solution path. Neural CDEs (NCDEs) treat time series data as observations from a control path, parameterise a CDE's vector field using a neural network, and use the solution path as a continuously evolving hidden state. As their formulation makes them robust to irregular sampling rates, NCDEs are a powerful approach for modelling real-world data. Building on neural rough differential equations (NRDEs), we introduce Log-NCDEs, a novel, effective, and efficient method for training NCDEs. The core component of Log-NCDEs is the Log-ODE method, a tool from the study of rough paths for approximating a CDE's solution. Log-NCDEs are shown to outperform NCDEs, NRDEs, the linear recurrent unit, S5, and MAMBA on a range of multivariate time series datasets with up to $50{,}000$ observations.

Updated: 2024-06-11 13:35:52

Categories: cs.LG

Download: http://arxiv.org/abs/2402.18512v2

AS-70: A Mandarin stuttered speech dataset for automatic speech recognition and stuttering event detection

The rapid advancements in speech technologies over the past two decades have led to human-level performance in tasks like automatic speech recognition (ASR) for fluent speech. However, the efficacy of these models diminishes when applied to atypical speech, such as stuttering. This paper introduces AS-70, the first publicly available Mandarin stuttered speech dataset, which stands out as the largest dataset in its category. Encompassing conversational and voice command reading speech, AS-70 includes verbatim manual transcription, rendering it suitable for various speech-related tasks. Furthermore, baseline systems are established, and experimental results are presented for ASR and stuttering event detection (SED) tasks. By incorporating this dataset into the model fine-tuning, significant improvements in the state-of-the-art ASR models, e.g., Whisper and Hubert, are observed, enhancing their inclusivity in addressing stuttered speech.

Updated: 2024-06-11 13:35:50

Categories: cs.SD,cs.AI,eess.AS

Download: http://arxiv.org/abs/2406.07256v1

WEIRD ICWSM: How Western, Educated, Industrialized, Rich, and Democratic is Social Computing Research?

Much of the research in social computing analyzes data from social media platforms, which may inherently carry biases. An overlooked source of such bias is the over-representation of WEIRD (Western, Educated, Industrialized, Rich, and Democratic) populations, which might not accurately mirror the global demographic diversity. We evaluated the dependence on WEIRD populations in research presented at the AAAI ICWSM conference, the only venue whose proceedings are fully dedicated to social computing research. We did so by analyzing 494 papers published from 2018 to 2022, which included full research papers, dataset papers, and posters. After filtering out papers that analyze synthetic datasets or that lack a clear country of origin, we were left with 420 papers, from which 188 participants in a crowdsourcing study with full manual validation extracted data for computing the WEIRD scores. This data was then used to adapt existing WEIRD metrics to be applicable to social media data. We found that 37% of these papers focused solely on data from Western countries. This percentage is significantly lower than the percentages observed in research from the CHI (76%) and FAccT (84%) conferences, suggesting a greater diversity of dataset origins within ICWSM. However, the studies at ICWSM still predominantly examine populations from countries that are more Educated, Industrialized, and Rich in comparison to those in FAccT, with a special note on the 'Democratic' variable reflecting political freedoms and rights. This points to the utility of social media data in shedding light on findings from countries with restricted political freedoms. Based on these insights, we recommend extensions of current "paper checklists" to include considerations about the WEIRD bias and call for the community to broaden research inclusivity by encouraging the use of diverse datasets from underrepresented regions.

Updated: 2024-06-11 13:34:09

Categories: cs.HC,cs.AI

Download: http://arxiv.org/abs/2406.02090v2

Hybrid Reinforcement Learning from Offline Observation Alone

We consider the hybrid reinforcement learning setting where the agent has access to both offline data and online interactive access. While Reinforcement Learning (RL) research typically assumes offline data contains complete action, reward and transition information, datasets with only state information (also known as observation-only datasets) are more general, abundant and practical. This motivates our study of the hybrid RL with observation-only offline dataset framework. While the task of competing with the best policy "covered" by the offline data can be solved if a reset model of the environment is provided (i.e., one that can be reset to any state), we show evidence of hardness when only given the weaker trace model (i.e., one can only reset to the initial states and must produce full traces through the environment), without further assumption of admissibility of the offline data. Under the admissibility assumptions -- that the offline data could actually be produced by the policy class we consider -- we propose the first algorithm in the trace model setting that provably matches the performance of algorithms that leverage a reset model. We also perform proof-of-concept experiments that suggest the effectiveness of our algorithm in practice.

Updated: 2024-06-11 13:34:05

Categories: cs.LG

Download: http://arxiv.org/abs/2406.07253v1

Is One GPU Enough? Pushing Image Generation at Higher-Resolutions with Foundation Models

In this work, we introduce Pixelsmith, a zero-shot text-to-image generative framework to sample images at higher resolutions with a single GPU. We are the first to show that it is possible to scale the output of a pre-trained diffusion model by a factor of 1000, opening the road for gigapixel image generation at no additional cost. Our cascading method uses the image generated at the lowest resolution as a baseline to sample at higher resolutions. For the guidance, we introduce the Slider, a tunable mechanism that fuses the overall structure contained in the first-generated image with enhanced fine details. At each inference step, we denoise patches rather than the entire latent space, minimizing memory demands such that a single GPU can handle the process, regardless of the image's resolution. Our experimental results show that Pixelsmith not only achieves higher quality and diversity compared to existing techniques, but also reduces sampling time and artifacts. The code for our work is available at https://github.com/Thanos-DB/Pixelsmith.

Updated: 2024-06-11 13:33:33

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2406.07251v1

Graph Mining under Data scarcity

A multitude of deep learning models have been proposed for node classification in graphs. However, they tend to perform poorly under labeled-data scarcity. Although few-shot learning for graphs has been introduced to overcome this problem, the existing models are not easily adaptable to generic graph learning frameworks like Graph Neural Networks (GNNs). Our work proposes an Uncertainty Estimator framework that can be applied on top of any generic GNN backbone network (typically designed for supervised/semi-supervised node classification) to improve node classification performance. A neural network is used to model the Uncertainty Estimator as a probability distribution rather than probabilistic discrete scalar values. We train these models under the classic episodic learning paradigm in the $n$-way, $k$-shot fashion, in an end-to-end setting. Our work demonstrates that implementing the Uncertainty Estimator on a GNN backbone network improves classification accuracy in the few-shot setting without any meta-learning-specific architecture. We conduct experiments on multiple datasets under different few-shot settings and different GNN-based backbone networks. Our method outperforms the baselines, which demonstrates the efficacy of the Uncertainty Estimator for few-shot node classification on graphs with a GNN.
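The $n$-way, $k$-shot episodic paradigm mentioned above can be illustrated with a small sampler. This is a generic sketch of episode construction, not the authors' training code; the function name `sample_episode` and the `n_query` parameter are assumptions introduced for illustration.

```python
import numpy as np

def sample_episode(labels, n_way, k_shot, n_query, rng):
    """Sample one few-shot episode over node labels.

    Returns (support_idx, query_idx, classes): k_shot support nodes and
    n_query query nodes for each of n_way randomly chosen classes.
    """
    classes = rng.choice(np.unique(labels), size=n_way, replace=False)
    support, query = [], []
    for c in classes:
        # Shuffle the nodes of class c, then split into support and query.
        idx = rng.permutation(np.flatnonzero(labels == c))
        if len(idx) < k_shot + n_query:
            raise ValueError(f"class {c} has too few labeled nodes")
        support.extend(idx[:k_shot])
        query.extend(idx[k_shot:k_shot + n_query])
    return np.array(support), np.array(query), classes
```

In each training step a model would be fitted or conditioned on the support nodes and evaluated on the query nodes of the same episode, so that the support and query sets never overlap.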

Updated: 2024-06-11 13:33:16

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2406.04825v2

Description and Discussion on DCASE 2024 Challenge Task 2: First-Shot Unsupervised Anomalous Sound Detection for Machine Condition Monitoring

We present the task description of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2024 Challenge Task 2: First-shot unsupervised anomalous sound detection (ASD) for machine condition monitoring. Continuing from last year's DCASE 2023 Challenge Task 2, we organize the task as a first-shot problem under settings that require domain generalization. The main goal of the first-shot problem is to enable rapid deployment of ASD systems for new kinds of machines without the need for machine-specific hyperparameter tuning. This problem setting was realized by (1) giving only one section for each machine type and (2) having completely different machine types for the development and evaluation datasets. For the DCASE 2024 Challenge Task 2, data for completely new machine types were collected and provided as the evaluation dataset. In addition, attribute information such as the machine operation conditions was concealed for several machine types to mimic situations where such information is unavailable. We will add challenge results and analysis of the submissions after the challenge submission deadline.

Updated: 2024-06-11 13:32:40

Categories: eess.AS,cs.LG,cs.SD

Download: http://arxiv.org/abs/2406.07250v1

Are Protein Language Models Compute Optimal?

While protein language models (pLMs) have transformed biological research, the scaling laws governing their improvement remain underexplored. By adapting methodologies from NLP scaling laws, we investigated the optimal ratio between model parameters and training tokens within a fixed compute budget. Our study reveals that pLM sizes scale sublinearly with compute budget, showing diminishing returns in performance as model size increases, and we identify a performance plateau in training loss comparable to the one found in relevant works in the field. Our findings suggest that widely-used pLMs might not be compute-optimal, indicating that larger models could achieve convergence more efficiently. Training a 35M model on a reduced token set, we attained perplexity results comparable to larger models like ESM-2 (15B) and xTrimoPGLM (100B) with a single dataset pass. This work paves the way towards more compute-efficient pLMs, democratizing their training and practical application in computational biology.
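The trade-off between model parameters and training tokens under a fixed compute budget can be made concrete with the rule of thumb commonly used in NLP scaling-law work, where training compute is approximated as C ≈ 6·N·D FLOPs for N parameters and D tokens. The sketch below assumes that approximation; the paper's own compute accounting may differ, and the function names are hypothetical.

```python
def tokens_for_budget(compute_flops, n_params):
    # Assumes the common approximation C ≈ 6 * N * D for transformer
    # training, solved for the number of training tokens D.
    return compute_flops / (6.0 * n_params)

def allocations(compute_flops, param_counts):
    # For one fixed budget, list how many tokens each candidate model
    # size could be trained on.
    return [(n, tokens_for_budget(compute_flops, n)) for n in param_counts]
```

Under this approximation, doubling the parameter count at a fixed budget halves the number of tokens that can be seen, which is why the optimal ratio between the two has to be determined empirically.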

Updated: 2024-06-11 13:32:11

Categories: q-bio.BM,cs.AI

Download: http://arxiv.org/abs/2406.07249v1

Dynamical Mean-Field Theory of Self-Attention Neural Networks

Transformer-based models have demonstrated exceptional performance across diverse domains, becoming the state-of-the-art solution for addressing sequential machine learning problems. Even though we have a general understanding of the fundamental components in the transformer architecture, little is known about how they operate or what their expected dynamics are. Recently, there has been an increasing interest in exploring the relationship between attention mechanisms and Hopfield networks, promising to shed light on the statistical physics of transformer networks. However, to date, the dynamical regimes of transformer-like models have not been studied in depth. In this paper, we address this gap by using methods for the study of asymmetric Hopfield networks in nonequilibrium regimes, namely path integral methods over generating functionals, yielding dynamics governed by concurrent mean-field variables. Assuming 1-bit tokens and weights, we derive analytical approximations for the behavior of large self-attention neural networks coupled to a softmax output, which become exact in the large-size limit. Our findings reveal nontrivial dynamical phenomena, including nonequilibrium phase transitions associated with chaotic bifurcations, even for very simple configurations with a few encoded features and a very short context window. Finally, we discuss the potential of our analytic approach to improve our understanding of the inner workings of transformer models, potentially reducing computational training costs and enhancing model interpretability.

Updated: 2024-06-11 13:29:34

Categories: cond-mat.dis-nn,cs.LG

Download: http://arxiv.org/abs/2406.07247v1

Marginalization Consistent Mixture of Separable Flows for Probabilistic Irregular Time Series Forecasting

Probabilistic forecasting models for joint distributions of targets in irregular time series are a heavily under-researched area in machine learning with, to the best of our knowledge, only three models researched so far: GPR, the Gaussian Process Regression model \citep{Durichen2015.Multitask}; TACTiS, the Transformer-Attentional Copulas for Time Series \cite{Drouin2022.Tactis, ashok2024tactis}; and ProFITi \citep{Yalavarthi2024.Probabilistica}, a multivariate normalizing flow model based on invertible attention layers. While ProFITi, thanks to using multivariate normalizing flows, is the more expressive model with better predictive performance, we will show that it suffers from marginalization inconsistency: it does not guarantee that the marginal distributions of a subset of variables in its predictive distributions coincide with the directly predicted distributions of these variables. TACTiS likewise does not provide any guarantees of marginalization consistency. We develop a novel probabilistic irregular time series forecasting model, Marginalization Consistent Mixtures of Separable Flows (moses), that mixes several normalizing flows with (i) Gaussian Processes with full covariance matrix as source distributions and (ii) a separable invertible transformation, aiming to combine the expressivity of normalizing flows with the marginalization consistency of Gaussians. In experiments on four different datasets we show that moses outperforms other state-of-the-art marginalization consistent models and performs on par with ProFITi but, unlike ProFITi, guarantees marginalization consistency.
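The marginalization consistency of Gaussians that the abstract relies on is a standard property: the marginal of a multivariate Gaussian over a subset of variables is again Gaussian, with the corresponding sub-vector of the mean and sub-matrix of the covariance. The short sketch below illustrates only this property, not the moses model; `marginalize_gaussian` is a hypothetical helper name.

```python
import numpy as np

def marginalize_gaussian(mean, cov, keep):
    # The marginal over the variables in `keep` is obtained by selecting
    # the matching entries of the mean and rows/columns of the covariance.
    keep = np.asarray(keep)
    return mean[keep], cov[np.ix_(keep, keep)]

mean = np.array([0.0, 1.0, 2.0])
cov = np.array([[2.0, 0.5, 0.1],
                [0.5, 1.0, 0.3],
                [0.1, 0.3, 1.5]])

# Marginalizing in two steps (joint -> {0, 2} -> {0}) gives the same
# distribution as marginalizing directly (joint -> {0}).
m_02, c_02 = marginalize_gaussian(mean, cov, [0, 2])
m_step, c_step = marginalize_gaussian(m_02, c_02, [0])
m_direct, c_direct = marginalize_gaussian(mean, cov, [0])
```

A model is marginalization consistent when its predictions satisfy the same agreement: the marginal derived from a joint prediction matches what the model predicts when queried for the subset directly.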

Updated: 2024-06-11 13:28:43

Categories: cs.LG

Download: http://arxiv.org/abs/2406.07246v1

A Survey on Diffusion Models for Time Series and Spatio-Temporal Data

The study of time series is crucial for understanding trends and anomalies over time, enabling predictive insights across various sectors. Spatio-temporal data, on the other hand, is vital for analyzing phenomena in both space and time, providing a dynamic perspective on complex system interactions. Recently, diffusion models have seen widespread application in time series and spatio-temporal data mining. Not only do they enhance the generative and inferential capabilities for sequential and temporal data, but they also extend to other downstream tasks. In this survey, we comprehensively and thoroughly review the use of diffusion models in time series and spatio-temporal data, categorizing them by model category, task type, data modality, and practical application domain. In detail, we categorize diffusion models into unconditioned and conditioned types and discuss time series and spatio-temporal data separately. Unconditioned models, which operate unsupervised, are subdivided into probability-based and score-based models, serving predictive and generative tasks such as forecasting, anomaly detection, classification, and imputation. Conditioned models, on the other hand, utilize extra information to enhance performance and are similarly divided for both predictive and generative tasks. Our survey extensively covers their application in various fields, including healthcare, recommendation, climate, energy, audio, and transportation, providing a foundational understanding of how these models analyze and generate data. Through this structured overview, we aim to provide researchers and practitioners with a comprehensive understanding of diffusion models for time series and spatio-temporal data analysis, aiming to direct future innovations and applications by addressing traditional challenges and exploring innovative solutions within the diffusion model framework.

Updated: 2024-06-11 13:25:53

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2404.18886v3

Global Clipper: Enhancing Safety and Reliability of Transformer-based Object Detection Models

As transformer-based object detection models progress, their impact in critical sectors like autonomous vehicles and aviation is expected to grow. Soft errors causing bit flips during inference have significantly impacted DNN performance, altering predictions. Traditional range restriction solutions for CNNs fall short for transformers. This study introduces the Global Clipper and Global Hybrid Clipper, effective mitigation strategies specifically designed for transformer-based models. These strategies significantly enhance the models' resilience to soft errors and reduce faulty inferences to ~0%. We also detail extensive testing across over 64 scenarios involving two transformer models (DINO-DETR and Lite-DETR) and two CNN models (YOLOv3 and SSD) using three datasets, totalling approximately 3.3 million inferences, to assess model robustness comprehensively. Moreover, the paper explores unique aspects of attention blocks in transformers and their operational differences from CNNs.
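The traditional range restriction idea that the abstract refers to can be sketched as follows. This illustrates only the generic technique (profile activation bounds on fault-free data, then suppress out-of-range values at inference); it is not the Global Clipper itself, whose design is not described in the abstract. The function names and the two modes are assumptions made for illustration.

```python
import numpy as np

def profile_bounds(activations):
    # Record the smallest and largest activation seen on fault-free data.
    lo = min(float(a.min()) for a in activations)
    hi = max(float(a.max()) for a in activations)
    return lo, hi

def restrict_range(x, lo, hi, mode="zero"):
    # Values outside [lo, hi], or non-finite values, are treated as
    # likely corrupted (e.g. by a bit flip in a float's exponent).
    bad = ~np.isfinite(x) | (x < lo) | (x > hi)
    if mode == "zero":
        # Replace suspicious values with zero.
        return np.where(bad, 0.0, x)
    if mode == "clamp":
        # Pull suspicious values back to the nearest profiled bound.
        cleaned = np.nan_to_num(x, nan=0.0, posinf=hi, neginf=lo)
        return np.clip(cleaned, lo, hi)
    raise ValueError(f"unknown mode: {mode}")
```

In practice such a check would be applied to the output of selected layers; which layers to guard, and with which bounds, is the design question that differs between CNNs and transformers.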

Updated: 2024-06-11 13:22:47

Categories: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2406.03229v2

On the Effects of Data Scale on Computer Control Agents

Autonomous agents that control computer interfaces to accomplish human tasks are emerging. Leveraging LLMs to power such agents has been of special interest, but unless fine-tuned on human-collected task demonstrations, performance is still relatively low. In this work we study whether fine-tuning alone is a viable approach for building real-world computer control agents. In particular, we investigate how performance measured on both high- and low-level tasks, in domain and out of domain, scales as more training data is collected. To this end we collect and release a new dataset, AndroidControl, consisting of 15,283 demonstrations of everyday tasks with Android apps. Compared to existing datasets, each AndroidControl task instance includes both high- and low-level human-generated instructions, allowing us to explore the level of task complexity an agent can handle. Moreover, AndroidControl is the most diverse computer control dataset to date, including 15,283 unique tasks over 833 Android apps, thus allowing us to conduct in-depth analysis of model performance in and out of the domain of the training data. Using the dataset, we find that when tested in domain, fine-tuned models outperform zero- and few-shot baselines and scale in such a way that robust performance might feasibly be obtained simply by collecting more data. Out of domain, performance scales significantly more slowly, which suggests that, in particular for high-level tasks, fine-tuning on more data alone may be insufficient for achieving robust out-of-domain performance.

Updated: 2024-06-11 13:19:38

领域: cs.AI,cs.LG

下载: http://arxiv.org/abs/2406.03679v2

Hate Speech Detection with Generalizable Target-aware Fairness

To counter the side effect brought by the proliferation of social media platforms, hate speech detection (HSD) plays a vital role in halting the dissemination of toxic online posts at an early stage. However, given the ubiquitous topical communities on social media, a trained HSD classifier easily becomes biased towards specific targeted groups (e.g., female and black people), where a high rate of false positive/negative results can significantly impair public trust in the fairness of content moderation mechanisms, and eventually harm the diversity of online society. Although existing fairness-aware HSD methods can smooth out some discrepancies across targeted groups, they are mostly specific to a narrow selection of targets that are assumed to be known and fixed. This inevitably prevents those methods from generalizing to real-world use cases where new targeted groups constantly emerge over time. To tackle this defect, we propose Generalizable target-aware Fairness (GetFair), a new method for fairly classifying each post that contains diverse and even unseen targets during inference. To remove the HSD classifier's spurious dependence on target-related features, GetFair trains a series of filter functions in an adversarial pipeline, so as to deceive the discriminator that recovers the targeted group from filtered post embeddings. To maintain scalability and generalizability, we innovatively parameterize all filter functions via a hypernetwork that is regularized by the semantic affinity among targets. Taking a target's pretrained word embedding as input, the hypernetwork generates the weights used by each target-specific filter on-the-fly without storing dedicated filter parameters. Finally, comparative experiments on two HSD datasets have shown advantageous performance of GetFair on out-of-sample targets.

Updated: 2024-06-11 13:18:14

标题: 使用可泛化目标感知公平性进行仇恨言论检测

摘要: 为了对抗社交媒体平台的泛滥带来的副作用，仇恨言论检测（HSD）在阻止有毒在线帖子传播的早期阶段起着至关重要的作用。然而，鉴于社交媒体上普遍存在的话题社区，经过训练的HSD分类器很容易对特定目标群体（例如女性和黑人）产生偏见，高假阳性/阴性率显著损害公众对内容调控机制公平性的信任，最终危害在线社会的多样性。尽管现有的关注公平性的HSD方法可以消除一些目标群体之间的差异，但它们大多针对被认为已知且固定的一小部分目标，这不可避免地阻碍了这些方法在新的目标群体不断涌现的真实用例中的泛化。为了解决这一缺陷，我们提出了通用目标感知公平性（GetFair），这是一种新方法，用于在推理过程中公平地对包含多样甚至未知目标的每个帖子进行分类。为了消除HSD分类器对与目标相关的特征的虚假依赖，GetFair训练了一系列过滤函数，构成一个对抗性管道，以欺骗从过滤后的帖子嵌入中恢复目标群体的鉴别器。为了保持可扩展性和泛化性，我们通过一个由目标之间的语义亲和力正则化的超网络来创新地参数化所有过滤函数。通过将目标的预训练词嵌入作为输入，超网络动态生成每个特定目标过滤器使用的权重，而无需存储专门的过滤器参数。最后，在两个HSD数据集上的比较实验表明，GetFair在样本外目标上表现出优势性能。

更新时间: 2024-06-11 13:18:14

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2406.00046v2

Large Language Model Meets Graph Neural Network in Knowledge Distillation

In service-oriented architectures, accurately predicting the Quality of Service (QoS) is crucial for maintaining reliability and enhancing user satisfaction. However, significant challenges remain due to existing methods always overlooking high-order latent collaborative relationships between users and services and failing to dynamically adjust feature learning for every specific user-service invocation, which are critical for learning accurate features. Additionally, reliance on RNNs for capturing QoS evolution hampers models' ability to detect long-term trends due to difficulties in managing long-range dependencies. To address these challenges, we propose the Target-Prompt Online Graph Collaborative Learning (TOGCL) framework for temporal-aware QoS prediction. TOGCL leverages a dynamic user-service invocation graph to model historical interactions, providing a comprehensive representation of user-service relationships. Building on this graph, it develops a target-prompt graph attention network to extract online deep latent features of users and services at each time slice, simultaneously considering implicit collaborative relationships between target users/services and their neighbors, as well as relevant historical QoS values. Additionally, a multi-layer Transformer encoder is employed to uncover temporal feature evolution patterns of users and services, leading to temporal-aware QoS prediction. Extensive experiments conducted on the WS-DREAM dataset demonstrate that our proposed TOGCL framework significantly outperforms state-of-the-art methods across multiple metrics, achieving improvements of up to 38.80%. These results underscore the effectiveness of the TOGCL framework for precise temporal QoS prediction.

Updated: 2024-06-11 13:17:12

标题: 大型语言模型在知识蒸馏中遇到图神经网络

摘要: 在面向服务的架构中，准确预测服务质量（QoS）对于维护可靠性和提高用户满意度至关重要。然而，由于现有方法经常忽视用户和服务之间高阶潜在的协作关系，并且未能动态调整每个特定用户-服务调用的特征学习，这对学习准确特征至关重要，因此仍然存在重大挑战。此外，依赖于RNN来捕捉QoS演变阻碍了模型检测长期趋势的能力，因为难以管理长程依赖性。为了解决这些挑战，我们提出了面向时间感知QoS预测的TOGCL框架。TOGCL利用动态用户-服务调用图来建模历史交互，提供了对用户-服务关系的全面表示。基于此图，它开发了一个目标提示图注意网络，以在每个时间切片上提取用户和服务的在线深层潜在特征，同时考虑目标用户/服务与其邻居之间的隐式协作关系，以及相关的历史QoS值。此外，采用多层Transformer编码器来揭示用户和服务的时间特征演变模式，从而实现时间感知QoS预测。在WS-DREAM数据集上进行的大量实验表明，我们提出的TOGCL框架在多个指标上明显优于最先进的方法，改进高达38.80％。这些结果强调了TOGCL框架在精确时间QoS预测方面的有效性。

更新时间: 2024-06-11 13:17:12

领域: cs.AI,cs.LG,68T30, 68R10, 68T05

下载: http://arxiv.org/abs/2402.05894v4

Let Go of Your Labels with Unsupervised Transfer

Foundation vision-language models have enabled remarkable zero-shot transferability of the pre-trained representations to a wide range of downstream tasks. However, to solve a new task, zero-shot transfer still necessitates human guidance to define visual categories that appear in the data. Here, we show that fully unsupervised transfer emerges when searching for the labeling of a dataset that induces maximal margin classifiers in representation spaces of different foundation models. We present TURTLE, a fully unsupervised method that effectively employs this guiding principle to uncover the underlying labeling of a downstream dataset without any supervision and task-specific representation learning. We evaluate TURTLE on a diverse benchmark suite of 26 datasets and show that it achieves new state-of-the-art unsupervised performance. Furthermore, TURTLE, although being fully unsupervised, outperforms zero-shot transfer baselines on a wide range of datasets. In particular, TURTLE matches the average performance of CLIP zero-shot on 26 datasets by employing the same representation space, spanning a wide range of architectures and model sizes. By guiding the search for the underlying labeling using the representation spaces of two foundation models, TURTLE surpasses zero-shot transfer and unsupervised prompt tuning baselines, demonstrating the surprising power and effectiveness of unsupervised transfer.

Updated: 2024-06-11 13:14:04

标题: 放下你的标签，进行无监督转移

摘要: 基于基础视觉-语言模型的研究使得预训练表示在广泛的下游任务中实现了令人瞩目的零-shot可转移性。然而，要解决一个新任务，零-shot转移仍然需要人为指导来定义数据中出现的视觉类别。在这里，我们展示了当搜索引起不同基础模型表示空间中的最大间隔分类器的数据集的标签时，完全无监督的转移会出现。我们提出了TURTLE，一种完全无监督的方法，有效地利用这一指导原则，揭示下游数据集的潜在标签，而无需任何监督和任务特定的表示学习。我们在一个多样化的基准套件中评估了TURTLE的性能，并展示它实现了新的无监督性能的最新水平。此外，尽管TURTLE是完全无监督的，但在广泛的数据集上，它的表现超过了零-shot转移基线。特别是，通过利用相同的表示空间，跨越广泛的架构和模型大小范围，TURTLE在26个数据集上与CLIP零-shot的平均表现相匹配。通过利用两个基础模型的表示空间来引导对潜在标签的搜索，TURTLE超越了零-shot转移和无监督提示调整基线，展示了无监督转移的惊人力量和有效性。

更新时间: 2024-06-11 13:14:04

领域: cs.LG

下载: http://arxiv.org/abs/2406.07236v1

OPFData: Large-scale datasets for AC optimal power flow with topological perturbations

Solving the AC optimal power flow problem (AC-OPF) is critical to the efficient and safe planning and operation of power grids. Small efficiency improvements in this domain have the potential to lead to billions of dollars of cost savings, and significant reductions in emissions from fossil fuel generators. Recent work on data-driven solution methods for AC-OPF shows the potential for large speed improvements compared to traditional solvers; however, no large-scale open datasets for this problem exist. We present the largest readily-available collection of solved AC-OPF problems to date. This collection is orders of magnitude larger than existing readily-available datasets, allowing training of high-capacity data-driven models. Uniquely, it includes topological perturbations - a critical requirement for usage in realistic power grid operations. We hope this resource will spur the community to scale research to larger grid sizes with variable topology.

Updated: 2024-06-11 13:12:39

标题: OPFData：具有拓扑扰动的交流最优功率流的大规模数据集

摘要: 解决交流最优功率流问题（AC-OPF）对于电网的高效和安全规划和运行至关重要。在这一领域的小幅效率改进有潜力带来数十亿美元的成本节约，并显著减少化石燃料发电机的排放。最近针对AC-OPF的数据驱动解决方法显示出与传统求解器相比的速度大幅提升的潜力；然而，目前尚无大规模开放数据集用于解决这一问题。我们提供迄今为止最大的可获得的一系列已解决AC-OPF问题。这一集合比现有可获得的数据集大数个数量级，可以用于训练高容量的数据驱动模型。独特的是，它包括拓扑扰动 - 在现实电网运行中的关键要求。我们希望这一资源将激励社区将研究扩展到具有可变拓扑的更大规模电网。

更新时间: 2024-06-11 13:12:39

领域: cs.LG

下载: http://arxiv.org/abs/2406.07234v1

DUAL-REFLECT: Enhancing Large Language Models for Reflective Translation through Dual Learning Feedback Mechanisms

Recently, large language models (LLMs) enhanced by self-reflection have achieved promising performance on machine translation. The key idea is guiding LLMs to generate translation with human-like feedback. However, existing self-reflection methods lack effective feedback information, limiting the translation performance. To address this, we introduce a DUAL-REFLECT framework, leveraging the dual learning of translation tasks to provide effective feedback, thereby enhancing the models' self-reflective abilities and improving translation performance. The application of this method across various translation tasks has proven its effectiveness in improving translation accuracy and eliminating ambiguities, especially in translation tasks with low-resource language pairs.

Updated: 2024-06-11 13:10:39

标题: DUAL-REFLECT: 通过双向学习反馈机制增强大型语言模型以进行反思性翻译

摘要: 最近，通过自我反思增强的大型语言模型（LLMs）在机器翻译方面取得了令人期待的性能。关键思想是引导LLMs生成具有类似人类反馈的翻译。然而，现有的自我反思方法缺乏有效的反馈信息，限制了翻译性能。为解决这一问题，我们引入了一个DUAL-REFLECT框架，利用翻译任务的双向学习来提供有效的反馈，从而增强模型的自我反思能力并改善翻译性能。该方法在各种翻译任务中的应用已经证明其在提高翻译准确性和消除歧义方面的有效性，特别是在低资源语言对的翻译任务中。

更新时间: 2024-06-11 13:10:39

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.07232v1

Improving Commonsense Bias Classification by Mitigating the Influence of Demographic Terms

Understanding commonsense knowledge is crucial in the field of Natural Language Processing (NLP). However, the presence of demographic terms in commonsense knowledge poses a potential risk of compromising the performance of NLP models. This study aims to investigate and propose methods for enhancing the performance and effectiveness of a commonsense polarization classifier by mitigating the influence of demographic terms. Three methods are introduced in this paper: (1) hierarchical generalization of demographic terms, (2) threshold-based augmentation, and (3) integration of hierarchical generalization and threshold-based augmentation methods (IHTA). The first method involves replacing demographic terms with more general ones based on a term hierarchy ontology, aiming to mitigate the influence of specific terms. To address the limited bias-related information, the second method measures the polarization of demographic terms by comparing the changes in the model's predictions when these terms are masked versus unmasked. This method augments commonsense sentences containing terms with high polarization values by replacing their predicates with synonyms generated by ChatGPT. The third method combines the two approaches, starting with threshold-based augmentation followed by hierarchical generalization. The experiments show that the first method increases the accuracy over the baseline by 2.33%, and the second one by 0.96% over standard augmentation methods. The IHTA techniques yielded 8.82% and 9.96% higher accuracy than threshold-based and standard augmentation methods, respectively.

Updated: 2024-06-11 13:09:16

标题: 通过减少人口统计学术语的影响来改善常识偏见分类

摘要: 理解常识知识在自然语言处理（NLP）领域至关重要。然而，常识知识中存在的人口统计术语可能会危及NLP模型的性能。本研究旨在调查并提出方法，通过减轻人口统计术语的影响，增强常识极化分类器的性能和有效性。本文介绍了三种方法：（1）人口统计术语的分层概括，（2）基于阈值的增强，以及（3）整合分层概括和基于阈值的增强方法（IHTA）。第一种方法涉及根据术语层次本体论用更一般的术语替换人口统计术语，旨在减轻特定术语的影响。为了解决有限的偏见相关信息，第二种方法通过比较模型在遮盖与未遮盖这些术语时的预测变化来衡量人口统计术语的极化。该方法通过用ChatGPT生成的同义词替换具有较高极化值的术语的常识句子中的谓词来增强。第三种方法结合了这两种方法，首先进行基于阈值的增强，然后进行分层概括。实验表明，第一种方法将准确率提高了2.33％，第二种方法将其提高了0.96％，相对于标准增强方法。IHTA技术的准确率分别比基于阈值和标准增强方法高出8.82％和9.96％。

更新时间: 2024-06-11 13:09:16

领域: cs.CL,cs.AI,68T50,I.2.7; I.2.6

下载: http://arxiv.org/abs/2406.07229v1
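
The masked-versus-unmasked comparison in the commonsense bias abstract above can be pictured with a short Python sketch. The abstract does not give the exact polarization formula, so the absolute difference in predicted score, the `[MASK]` token, the threshold value, and the toy lexicon-based `toy_predict` function are all assumptions made purely for illustration; a real setup would call a trained classifier instead.

```python
from typing import Callable, List, Tuple


def term_polarization(
    sentence: str,
    term: str,
    predict: Callable[[str], float],
    mask_token: str = "[MASK]",
) -> float:
    """Absolute change in the predicted score when `term` is masked (assumed metric)."""
    masked = sentence.replace(term, mask_token)
    return abs(predict(sentence) - predict(masked))


def select_for_augmentation(
    pairs: List[Tuple[str, str]],
    predict: Callable[[str], float],
    threshold: float,
) -> List[str]:
    """Keep sentences whose term polarization exceeds the (hypothetical) threshold."""
    return [s for s, t in pairs if term_polarization(s, t, predict) > threshold]


def toy_predict(text: str) -> float:
    # Stand-in for a trained classifier: a tiny hand-written lexicon score.
    weights = {"lazy": 0.3, "teenagers": 0.2}
    return min(1.0, 0.1 + sum(weights.get(w, 0.0) for w in text.lower().split()))


pairs = [("teenagers are lazy", "teenagers"), ("people are lazy", "people")]
polarized = term_polarization("teenagers are lazy", "teenagers", toy_predict)
neutral = term_polarization("people are lazy", "people", toy_predict)
selected = select_for_augmentation(pairs, toy_predict, threshold=0.1)
```

In this toy run only the sentence whose term shifts the prediction is selected; the synonym replacement step the abstract attributes to ChatGPT is left out.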

Needle In A Multimodal Haystack

With the rapid advancement of multimodal large language models (MLLMs), their evaluation has become increasingly comprehensive. However, understanding long multimodal content, as a foundational ability for real-world applications, remains underexplored. In this work, we present Needle In A Multimodal Haystack (MM-NIAH), the first benchmark specifically designed to systematically evaluate the capability of existing MLLMs to comprehend long multimodal documents. Our benchmark includes three types of evaluation tasks: multimodal retrieval, counting, and reasoning. In each task, the model is required to answer the questions according to different key information scattered throughout the given multimodal document. Evaluating the leading MLLMs on MM-NIAH, we observe that existing models still have significant room for improvement on these tasks, especially on vision-centric evaluation. We hope this work can provide a platform for further research on long multimodal document comprehension and contribute to the advancement of MLLMs. Code and benchmark are released at https://github.com/OpenGVLab/MM-NIAH.

Updated: 2024-06-11 13:09:16

标题: 在多模态草堆中的针

摘要: 随着多模态大型语言模型（MLLMs）的快速发展，它们的评估变得越来越全面。然而，理解长篇多模态内容作为现实世界应用的基础能力仍未被充分探索。在这项工作中，我们提出了“Needle In A Multimodal Haystack”（MM-NIAH），这是第一个专门设计用于系统评估现有MLLMs能力的基准测试，以理解长篇多模态文档。我们的基准测试包括三种类型的评估任务：多模态检索、计数和推理。在每个任务中，模型需要根据给定的多模态文档中分散的不同关键信息回答问题。通过在MM-NIAH上评估领先的MLLMs，我们观察到现有模型在这些任务上仍有很大的改进空间，特别是在以视觉为中心的评估上。我们希望这项工作能为长篇多模态文档理解的进一步研究提供平台，并为MLLMs的发展做出贡献。代码和基准测试发布在https://github.com/OpenGVLab/MM-NIAH。

更新时间: 2024-06-11 13:09:16

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2406.07230v1

Haptic Repurposing with GenAI

Mixed Reality aims to merge the digital and physical worlds to create immersive human-computer interactions. Despite notable advancements, the absence of realistic haptic feedback often breaks the immersive experience by creating a disconnect between visual and tactile perceptions. This paper introduces Haptic Repurposing with GenAI, an innovative approach to enhance MR interactions by transforming any physical objects into adaptive haptic interfaces for AI-generated virtual assets. Utilizing state-of-the-art generative AI models, this system captures both 2D and 3D features of physical objects and, through user-directed prompts, generates corresponding virtual objects that maintain the physical form of the original objects. Through model-based object tracking, the system dynamically anchors virtual assets to physical props in real time, allowing objects to visually morph into any user-specified virtual object. This paper details the system's development, presents findings from usability studies that validate its effectiveness, and explores its potential to significantly enhance interactive MR environments. The hope is this work can lay a foundation for further research into AI-driven spatial transformation in immersive and haptic technologies.

Updated: 2024-06-11 13:06:28

标题: 使用GenAI进行触觉再利用

摘要: 混合现实旨在将数字世界和物理世界融合，以创建沉浸式的人机交互体验。尽管取得了显著进展，但缺乏真实的触觉反馈通常会通过在视觉和触觉感知之间创建断裂来破坏沉浸式体验。本文介绍了一种名为GenAI的触觉再利用方法，这是一种创新的方法，通过将任何物理物体转化为适应性触觉接口，以增强混合现实交互。利用最先进的生成式人工智能模型，该系统捕捉物理物体的2D和3D特征，并通过用户指导提示生成保持原始物体物理形态的相应虚拟物体。通过基于模型的物体跟踪，该系统能够在实时将虚拟资产动态锚定到物理道具上，使物体视觉上能够转变为任何用户指定的虚拟物体。本文详细介绍了系统的开发情况，展示了验证其有效性的可用性研究结果，并探讨了其在显著增强交互式混合现实环境方面的潜力。希望这项工作能为进一步研究在沉浸式和触觉技术中的人工智能驱动的空间转换奠定基础。

更新时间: 2024-06-11 13:06:28

领域: cs.HC,cs.AI,F.2.2, I.2.7

下载: http://arxiv.org/abs/2406.07228v1

Memory Gym: Towards Endless Tasks to Benchmark Memory Capabilities of Agents

Memory Gym presents a suite of 2D partially observable environments, namely Mortar Mayhem, Mystery Path, and Searing Spotlights, designed to benchmark memory capabilities in decision-making agents. These environments, originally with finite tasks, are expanded into innovative, endless formats, mirroring the escalating challenges of cumulative memory games such as "I packed my bag". This progression in task design shifts the focus from merely assessing sample efficiency to also probing the levels of memory effectiveness in dynamic, prolonged scenarios. To address the gap in available memory-based Deep Reinforcement Learning baselines, we introduce an implementation that integrates Transformer-XL (TrXL) with Proximal Policy Optimization. This approach utilizes TrXL as a form of episodic memory, employing a sliding window technique. Our comparative study between the Gated Recurrent Unit (GRU) and TrXL reveals varied performances across different settings. TrXL, on the finite environments, demonstrates superior sample efficiency in Mystery Path and outperforms in Mortar Mayhem. However, GRU is more efficient on Searing Spotlights. Most notably, in all endless tasks, GRU makes a remarkable resurgence, consistently outperforming TrXL by significant margins. Website and Source Code: https://github.com/MarcoMeter/endless-memory-gym/

Updated: 2024-06-11 13:06:15

标题: 记忆健身房：朝着无尽任务，以评估代理的记忆能力

摘要: Memory Gym提供了一套2D部分可观测环境，分别是Mortar Mayhem，Mystery Path和Searing Spotlights，旨在评估决策代理的记忆能力。这些环境最初是有限任务，现已扩展为创新的无限格式，反映了类似“我打包了我的包”等累积记忆游戏中不断升级的挑战。任务设计的这种进展将焦点从仅仅评估样本效率转移到在动态、持久的情景中探究记忆效能水平。为填补基于记忆的深度强化学习基线的空白，我们引入了一种将Transformer-XL（TrXL）与Proximal Policy Optimization整合的实现。这种方法利用TrXL作为一种叙事式记忆，采用滑动窗口技术。我们在门控循环单元（GRU）和TrXL之间进行的比较研究揭示了在不同设置下的不同表现。在有限环境中，TrXL在Mystery Path中表现出卓越的样本效率，并在Mortar Mayhem中表现出色。然而，在Searing Spotlights上，GRU更为高效。值得注意的是，在所有无限任务中，GRU表现出了显著的复苏，始终以显著的优势胜过TrXL。网站和源代码：https://github.com/MarcoMeter/endless-memory-gym/

更新时间: 2024-06-11 13:06:15

领域: cs.LG

下载: http://arxiv.org/abs/2309.17207v4

Improving Autoformalization using Type Checking

Large language models show promise for autoformalization, the task of automatically translating natural language into formal languages. However, current autoformalization methods remain limited. The last reported state-of-the-art performance on the ProofNet formalization benchmark for the Lean proof assistant, achieved using Codex for Lean 3, only showed successful formalization of 16.1% of informal statements. Similarly, our evaluation of GPT-4o for Lean 4 only produces successful translations 34.9% of the time. Our analysis shows that the performance of these models is largely limited by their inability to generate formal statements that successfully type-check (i.e., are syntactically correct and consistent with types) - with a whopping 86.6% of GPT-4o errors starting from a type-check failure. In this work, we propose a method to fix this issue through decoding with type-check filtering, where we initially sample a diverse set of candidate formalizations for an informal statement, then use the Lean proof assistant to filter out candidates that do not type-check. Using GPT-4o as a base model, and combining our method with self-consistency, we obtain a +18.3% absolute increase in formalization accuracy, and achieve a new state-of-the-art of 53.2% on ProofNet with Lean 4.

Updated: 2024-06-11 13:01:50

标题: 通过类型检查来改进自动形式化

摘要: 大型语言模型显示出在自动形式化方面的潜力，即将自然语言自动翻译为形式语言的任务。然而，目前的自动形式化方法仍然存在限制。最近报告的Lean证明助手的ProofNet形式化基准的最新性能仅使用Codex for Lean 3实现，仅成功形式化了16.1%的非正式语句。类似地，我们对Lean 4的GPT-4o的评估仅在34.9%的情况下产生成功的翻译。我们的分析显示，这些模型的性能主要受到其无法生成成功类型检查的正式语句（即，语法正确且与类型一致）的限制 - 86.6%的GPT-4o错误源自类型检查失败。在这项工作中，我们提出了一种通过类型检查过滤进行解码的方法来解决这个问题，其中我们首先对非正式语句采样一个多样化的候选形式化集合，然后使用Lean证明助手来筛选出无法通过类型检查的候选项。使用GPT-4o作为基础模型，并将我们的方法与自一致性结合，我们在形式化准确性上获得了+18.3%的绝对增长，并在Lean 4的ProofNet上取得了53.2%的新的最新性能。

更新时间: 2024-06-11 13:01:50

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2406.07222v1
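
The decoding scheme in the autoformalization abstract above (sample several candidates, discard those that fail type-checking, then apply self-consistency) can be sketched in Python as follows. This is only an illustration under stated assumptions: `type_checks` is a hypothetical callable standing in for a call to the Lean proof assistant, `toy_checker` is a toy substitute that does not run Lean, and self-consistency is reduced to a majority vote over identical strings, which may differ from the paper's procedure.

```python
from collections import Counter
from typing import Callable, Iterable, Optional


def filter_and_vote(
    candidates: Iterable[str],
    type_checks: Callable[[str], bool],
) -> Optional[str]:
    """Drop candidates that fail the checker, then return the most common survivor."""
    survivors = [c.strip() for c in candidates if type_checks(c)]
    if not survivors:
        return None  # nothing type-checked; a caller could resample here
    # Majority vote over identical strings; ties go to the candidate seen first.
    return Counter(survivors).most_common(1)[0][0]


def toy_checker(statement: str) -> bool:
    # Hypothetical stand-in for Lean: accept statements that start with
    # "theorem" and have balanced parentheses.
    s = statement.strip()
    return s.startswith("theorem") and s.count("(") == s.count(")")


samples = [
    "theorem foo (n : Nat) : n + 0 = n",
    "theorem foo (n : Nat : n + 0 = n",  # unbalanced parenthesis, rejected
    "theorem foo (n : Nat) : n + 0 = n",
    "theorem foo (n : Nat) : 0 + n = n",
    "lemma? n + 0 = n",                  # rejected by the toy checker
]
best = filter_and_vote(samples, toy_checker)
```

Filtering before voting means ill-typed candidates cannot win the vote, which is the point the abstract makes about type-check failures dominating the errors.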

How do Large Language Models Learn In-Context? Query and Key Matrices of In-Context Heads are Two Towers for Metric Learning

We investigate the mechanism of in-context learning (ICL) on sentence classification tasks with semantically-unrelated labels ("foo"/"bar"). We find intervening in only 1% heads (named "in-context heads") significantly affects ICL accuracy from 87.6% to 24.4%. To understand this phenomenon, we analyze the value-output vectors in these heads and discover that the vectors at each label position contain substantial information about the corresponding labels. Furthermore, we observe that the prediction shift from "foo" to "bar" is due to the respective reduction and increase in these heads' attention scores at "foo" and "bar" positions. Therefore, we propose a hypothesis for ICL: in in-context heads, the value-output matrices extract label features, while the query-key matrices compute the similarity between the features at the last position and those at each label position. The query and key matrices can be considered as two towers that learn the similarity metric between the last position's features and each demonstration at label positions. Using this hypothesis, we explain the majority label bias and recency bias in ICL and propose two methods to reduce these biases by 22% and 17%, respectively.

Updated: 2024-06-11 12:58:51

标题: 大型语言模型是如何在上下文中学习的？上下文头部的查询和键矩阵是度量学习的两座塔楼

摘要: 我们研究了在具有语义不相关标签（“foo”/“bar”）的句子分类任务中上下文学习（ICL）的机制。我们发现仅干预1％的头部（称为“上下文头”）显着影响ICL准确率，从87.6％降至24.4％。为了理解这一现象，我们分析了这些头部中的值输出向量，并发现每个标签位置的向量包含大量关于相应标签的信息。此外，我们观察到从“foo”到“bar”的预测转变是由于这些头部在“foo”和“bar”位置的关注分数相应降低和增加。因此，我们提出了ICL的一个假设：在上下文头中，值输出矩阵提取标签特征，而查询-键矩阵计算最后一个位置的特征与每个标签位置的特征之间的相似度。查询和键矩阵可以被视为两个塔，学习最后一个位置的特征与每个标签位置的演示之间的相似度度量。利用这一假设，我们解释了ICL中的大多数标签偏见和最新偏见，并提出了两种分别将这些偏见降低22％和17％的方法。

更新时间: 2024-06-11 12:58:51

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2402.02872v2
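
The "two towers" hypothesis in the in-context learning abstract above can be pictured with a small, dependency-free Python sketch. It is a toy under assumptions, not the paper's analysis code: the matrices `w_q` and `w_k`, the two-dimensional features, and the helper names are invented for illustration. The last position's feature passes through a query tower, each label position's feature passes through a key tower, and a softmax over the dot products gives the attention placed on each label.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def matvec(m: Matrix, v: Vector) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(scores: Sequence[float]) -> List[float]:
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def label_attention(
    w_q: Matrix,
    w_k: Matrix,
    last_feature: Vector,
    demos: Sequence[Tuple[Vector, str]],
) -> Dict[str, float]:
    """Attention mass the last position places on each label's positions."""
    q = matvec(w_q, last_feature)                    # query tower
    keys = [matvec(w_k, feat) for feat, _ in demos]  # key tower
    scores = [sum(a * b for a, b in zip(q, k)) for k in keys]
    weights = softmax(scores)
    mass: Dict[str, float] = {}
    for (_, label), w in zip(demos, weights):
        mass[label] = mass.get(label, 0.0) + w
    return mass


identity = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical towers: identity maps
demos = [([1.0, 0.0], "foo"), ([0.0, 1.0], "bar"), ([0.9, 0.1], "foo")]
mass = label_attention(identity, identity, [1.0, 0.0], demos)
prediction = max(mass, key=mass.get)
```

Because the attention mass is summed per label, a label that appears in more demonstrations can collect more mass, which is one informal way to read the majority label bias mentioned in the abstract.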

Decentralized Stochastic Gradient Descent Ascent for Finite-Sum Minimax Problems

Minimax optimization problems have attracted significant attention in recent years due to their widespread application in numerous machine learning models. To solve the minimax problem, a wide variety of stochastic optimization methods have been proposed. However, most of them ignore the distributed setting where the training data is distributed across multiple workers. In this paper, we develop a novel decentralized stochastic gradient descent ascent method for the finite-sum minimax problem. In particular, by employing the variance-reduced gradient, our method can achieve $O(\frac{\sqrt{n}\kappa^3}{(1-\lambda)^2\epsilon^2})$ sample complexity and $O(\frac{\kappa^3}{(1-\lambda)^2\epsilon^2})$ communication complexity for the nonconvex-strongly-concave minimax problem. As far as we know, our work is the first one to achieve such theoretical complexities for this kind of minimax problem. Finally, we apply our method to AUC maximization, and the experimental results confirm the effectiveness of our method.

Updated: 2024-06-11 12:58:28

标题: 有限和极小极大问题的分布式随机梯度下降上升

摘要: 近年来，由于极其广泛的应用于众多机器学习模型中，极小化最大化（minimax）优化问题引起了人们的极大关注。为了解决这一极小化最大化问题，提出了各种各样的随机优化方法。然而，大多数方法忽略了训练数据分布在多个工作者之间的分布式设置。本文中，我们开发了一种新颖的去中心化随机梯度上升方法，用于有限和最小最大问题。特别地，通过采用方差减少梯度，我们的方法可以实现$O(\frac{\sqrt{n}\kappa^3}{(1-\lambda)^2\epsilon^2})$的样本复杂度和$O(\frac{\kappa^3}{(1-\lambda)^2\epsilon^2})$的通信复杂度，用于非凸强凹极小最大问题。据我们所知，我们的工作是第一个为这种极小最大问题实现这种理论复杂度的方法。最后，我们将我们的方法应用于AUC最大化，并实验证明了我们方法的有效性。

更新时间: 2024-06-11 12:58:28

领域: cs.LG,math.OC,stat.ML

下载: http://arxiv.org/abs/2212.02724v3
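
To make the decentralized setting in the minimax abstract above concrete, here is a minimal Python sketch of plain decentralized gradient descent ascent on a toy problem. It is not the paper's algorithm: there is no stochastic sampling and no variance reduction, and the toy objective is strongly-convex-strongly-concave rather than nonconvex-strongly-concave. The ring topology, the mixing weights, the local objectives f_i(x, y) = 0.5 x^2 + a_i x y - 0.5 y^2, and the step size are all assumptions chosen so that the saddle point sits at (0, 0).

```python
from typing import List, Sequence, Tuple


def ring_mixing(n: int) -> List[List[float]]:
    """Doubly stochastic weights: 1/2 on self, 1/4 on each ring neighbour."""
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        w[i][i] += 0.5
        w[i][(i - 1) % n] += 0.25
        w[i][(i + 1) % n] += 0.25
    return w


def decentralized_gda(
    a: Sequence[float],
    x0: Sequence[float],
    y0: Sequence[float],
    step: float = 0.1,
    rounds: int = 200,
) -> Tuple[List[float], List[float]]:
    """Each worker takes a local descent/ascent step, then averages with neighbours."""
    n = len(a)
    w = ring_mixing(n)
    x, y = list(x0), list(y0)
    for _ in range(rounds):
        # Local step: descend in x, ascend in y, using only worker i's gradient.
        x_half = [x[i] - step * (x[i] + a[i] * y[i]) for i in range(n)]
        y_half = [y[i] + step * (a[i] * x[i] - y[i]) for i in range(n)]
        # Gossip step: weighted average over ring neighbours.
        x = [sum(w[i][j] * x_half[j] for j in range(n)) for i in range(n)]
        y = [sum(w[i][j] * y_half[j] for j in range(n)) for i in range(n)]
    return x, y


xs, ys = decentralized_gda(
    a=[0.5, 1.0, 1.5, 2.0],
    x0=[1.0, 2.0, 3.0, 4.0],
    y0=[0.0, -1.0, -2.0, -3.0],
)
```

No worker ever sees another worker's objective; information spreads only through the neighbour averaging, which is the communication pattern whose cost the abstract's complexity bounds describe.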

InterpreTabNet: Distilling Predictive Signals from Tabular Data by Salient Feature Interpretation

Tabular data are omnipresent in various sectors of industries. Neural networks for tabular data such as TabNet have been proposed to make predictions while leveraging the attention mechanism for interpretability. However, the inferred attention masks are often dense, making it challenging to come up with rationales about the predictive signal. To remedy this, we propose InterpreTabNet, a variant of the TabNet model that models the attention mechanism as a latent variable sampled from a Gumbel-Softmax distribution. This enables us to regularize the model to learn distinct concepts in the attention masks via a KL Divergence regularizer. It prevents overlapping feature selection by promoting sparsity which maximizes the model's efficacy and improves interpretability to determine the important features when predicting the outcome. To assist in the interpretation of feature interdependencies from our model, we employ a large language model (GPT-4) and use prompt engineering to map from the learned feature mask onto natural language text describing the learned signal. Through comprehensive experiments on real-world datasets, we demonstrate that InterpreTabNet outperforms previous methods for interpreting tabular data while attaining competitive accuracy.

Updated: 2024-06-11 12:53:03

标题: InterpreTabNet：通过显著特征解释从表格数据中提炼预测信号

摘要: 表格式数据在各行业的各个领域中无处不在。用于表格数据的神经网络，如TabNet，被提出用于进行预测，同时利用注意力机制来解释结果。然而，推断出的注意力掩码通常很密集，这使得很难得出有关预测信号的理由。为了解决这个问题，我们提出了InterpreTabNet，这是TabNet模型的一个变种，将注意力机制建模为从Gumbel-Softmax分布中抽样的潜在变量。这使我们能够通过KL散度正则化器，在注意力掩码中学习不同的概念。它通过促进稀疏性来防止重叠的特征选择，从而最大化模型的有效性，并改善解释性，以确定在预测结果时重要的特征。为了辅助从我们的模型中解释特征之间的相互依赖关系，我们使用了一个大型语言模型(GPT-4)，并使用提示工程将学习的特征掩码映射到描述学习信号的自然语言文本上。通过对真实世界数据集的全面实验，我们证明InterpreTabNet在解释表格数据方面优于先前的方法，同时达到了竞争性的准确性。

更新时间: 2024-06-11 12:53:03

领域: cs.LG

下载: http://arxiv.org/abs/2406.00426v3
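
The InterpreTabNet abstract above models the attention mask as a sample from a Gumbel-Softmax distribution. A minimal, dependency-free Python sketch of drawing one such sample is given below. The logits, the temperature values, and the function name are illustrative assumptions; the KL divergence regularizer and the TabNet architecture itself are not shown.

```python
import math
import random
from typing import List, Sequence


def gumbel_softmax_sample(
    logits: Sequence[float],
    temperature: float,
    rng: random.Random,
) -> List[float]:
    """Draw one relaxed one-hot vector over features from the given logits."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise by inverse transform: -log(-log(U)), U ~ Uniform(0, 1).
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        noisy.append((logit - math.log(-math.log(u))) / temperature)
    top = max(noisy)  # subtract the maximum for numerical stability
    exps = [math.exp(v - top) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


feature_logits = [2.0, 0.5, 0.1, -1.0]  # hypothetical scores for four features
sharp = gumbel_softmax_sample(feature_logits, 0.1, random.Random(0))
smooth = gumbel_softmax_sample(feature_logits, 5.0, random.Random(0))
```

With the same noise, a lower temperature concentrates the mask on fewer features, which is the kind of sparsity the abstract associates with easier interpretation.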

A Synthetic Dataset for Personal Attribute Inference

Recently, powerful Large Language Models (LLMs) have become easily accessible to hundreds of millions of users worldwide. However, their strong capabilities and vast world knowledge do not come without associated privacy risks. In this work, we focus on the emerging privacy threat LLMs pose - the ability to accurately infer personal information from online texts. Despite the growing importance of LLM-based author profiling, research in this area has been hampered by a lack of suitable public datasets, largely due to ethical and privacy concerns associated with real personal data. In this work, we take two steps to address this problem: (i) we construct a simulation framework for the popular social media platform Reddit using LLM agents seeded with synthetic personal profiles; (ii) using this framework, we generate SynthPAI, a diverse synthetic dataset of over 7800 comments manually labeled for personal attributes. We validate our dataset with a human study showing that humans barely outperform random guessing on the task of distinguishing our synthetic comments from real ones. Further, we verify that our dataset enables meaningful personal attribute inference research by showing across 18 state-of-the-art LLMs that our synthetic comments allow us to draw the same conclusions as real-world data. Together, this indicates that our dataset and pipeline provide a strong and privacy-preserving basis for future research toward understanding and mitigating the inference-based privacy threats LLMs pose.

Updated: 2024-06-11 12:50:53

标题: 一个用于个人属性推断的合成数据集

摘要: 最近，功能强大的大型语言模型（LLMs）已经变得容易被全球数亿用户访问。然而，它们强大的能力和广泛的世界知识并非没有相关的隐私风险。在这项工作中，我们关注新兴的隐私威胁LLMs带来的问题 - 即能够从在线文本准确推断个人信息的能力。尽管基于LLM的作者个人特征识别越来越重要，但由于缺乏合适的公共数据集，这一领域的研究一直受到阻碍，主要是由于与真实个人数据相关的伦理和隐私问题。在这项工作中，我们采取两个步骤来解决这个问题：（i）我们构建了一个模拟框架，使用LLM代理人基于合成个人资料种子在流行的社交媒体平台Reddit上进行模拟；（ii）利用这个框架，我们生成了SynthPAI，一个包含超过7800条评论的多样化合成数据集，手动标记了个人属性。我们通过人类研究验证了我们的数据集，结果显示人类在区分我们的合成评论和真实评论的任务上几乎与随机猜测持平。此外，我们验证了我们的数据集通过展示在18种最先进的LLMs上，我们的合成评论能够让我们得出与真实世界数据相同的结论，从而使个人属性推断研究变得有意义。总的来说，这表明我们的数据集和流程为未来研究提供了一个强大且保护隐私的基础，以便了解和缓解LLMs带来的基于推断的隐私威胁。

更新时间: 2024-06-11 12:50:53

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2406.07217v1

Grounding Continuous Representations in Geometry: Equivariant Neural Fields

Recently, Neural Fields have emerged as a powerful modelling paradigm to represent continuous signals. In a conditional neural field, a field is represented by a latent variable that conditions the NeF, whose parametrisation is otherwise shared over an entire dataset. We propose Equivariant Neural Fields based on cross attention transformers, in which NeFs are conditioned on a geometric conditioning variable, a latent point cloud, that enables an equivariant decoding from latent to field. Our equivariant approach induces a steerability property by which both field and latent are grounded in geometry and amenable to transformation laws: if the field transforms, the latent representation transforms accordingly, and vice versa. Crucially, the equivariance relation ensures that the latent is capable of (1) representing geometric patterns faithfully, allowing for geometric reasoning in latent space, and (2) weight sharing over spatially similar patterns, allowing for efficient learning of datasets of fields. These main properties are validated using classification experiments and a verification of the capability of fitting entire datasets, in comparison to other non-equivariant NeF approaches. We further validate the potential of ENFs by demonstrating unique local field editing properties.

Updated: 2024-06-11 12:45:08

标题: 在几何中对连续表示进行基础化：等变神经场

摘要: 最近，神经场已经成为一种强大的建模范式，用来表示连续信号。在条件神经场中，一个场由一个潜变量表示，该变量条件于神经场，其参数化在整个数据集上共享。我们提出了基于交叉注意力变换器的等变神经场，其中神经场条件于一个几何条件变量，即一个潜点云，这使得从潜到场的等变解码成为可能。我们的等变方法引入了一种可操纵性属性，通过该属性，场和潜变量都基于几何，并且易于转换定律，如果场发生转换，则相应地表示转换，反之亦然。关键是，等变关系确保潜变量能够（1）忠实地表示几何模式，允许在潜空间中进行几何推理，（2）在空间上相似模式上进行权值共享，从而实现对数据集的有效学习。这些主要属性通过分类实验验证，并验证了相对于其他非等变神经场方法，适应整个数据集的能力。我们通过展示独特的局部场编辑属性，进一步验证了ENFs的潜力。

更新时间: 2024-06-11 12:45:08

领域: cs.LG,cs.AI,cs.CV

下载: http://arxiv.org/abs/2406.05753v2

DSig: Breaking the Barrier of Signatures in Data Centers

Data centers increasingly host mutually distrustful users on shared infrastructure. Digital signatures are a powerful tool to safeguard such users. Digital signatures have revolutionized Internet-scale applications, but current signatures are too slow for the growing genre of microsecond-scale systems in modern data centers. We propose DSig, the first digital signature system to achieve single-digit microsecond latency to sign, transmit, and verify signatures in data center systems. DSig is based on the observation that, in many data center applications, the signer of a message knows most of the time who will verify its signature. We introduce a new hybrid signature scheme that combines cheap single-use hash-based signatures verified in the foreground with traditional signatures pre-verified in the background. Compared to prior state-of-the-art signatures, DSig reduces signing time from 18.9 to 0.7 us and verification time from 35.6 to 5.1 us, while keeping signature transmission time below 2.5 us. Moreover, DSig achieves 2.5x higher signing throughput and 6.9x higher verification throughput than the state of the art. We use DSig to (a) bring auditability to two key-value stores (HERD and Redis) and a financial trading system (based on Liquibook) for 86% lower added latency than the state of the art, and (b) replace signatures in BFT broadcast and BFT replication, reducing their latency by 73% and 69%, respectively.

Updated: 2024-06-11 12:44:16

标题: DSig：打破数据中心签名的障碍

摘要: 数据中心越来越多地托管互相不信任的用户共享基础设施。保护这些用户的一个强大工具是数字签名。数字签名已经彻底改变了互联网规模的应用程序，但目前的签名对于现代数据中心中微秒级系统的增长类型来说太慢了。我们提出了DSig，这是第一个在数据中心系统中实现单位微秒延迟以进行签名、传输和验证签名的数字签名系统。DSig基于这样的观察，即在许多数据中心应用程序中，信息发送者大部分时间都知道谁会验证其签名。我们引入了一种新的混合签名方案，它结合了在前台进行验证的廉价单次使用基于哈希的签名和在后台进行预验证的传统签名。与先前的最先进签名相比，DSig将签名时间从18.9 us减少到0.7 us，将验证时间从35.6 us减少到5.1 us，同时保持签名传输时间在2.5 us以下。此外，DSig实现了2.5倍更高的签名吞吐量和6.9倍更高的验证吞吐量比先进技术。我们使用DSig来(a)为两个键值存储（HERD和Redis）和一个基于Liquibook的金融交易系统带来审计功能，使其增加的延迟比现有技术低86％，并(b)替换BFT广播和BFT复制中的签名，分别将其延迟降低了73％和69％。

更新时间: 2024-06-11 12:44:16

领域: cs.CR,cs.DC

下载: http://arxiv.org/abs/2406.07215v1
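
The DSig abstract above relies on cheap single-use hash-based signatures. As a rough illustration of that family, the Python sketch below implements the textbook Lamport one-time signature with SHA-256. This is an assumption made for illustration only: the abstract does not say which hash-based scheme DSig uses, and the hybrid part (a traditional signature pre-verified in the background) is not shown.

```python
import hashlib
import secrets
from typing import List, Tuple

KeyPairs = List[Tuple[bytes, bytes]]


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def keygen() -> Tuple[KeyPairs, KeyPairs]:
    """One secret pair per digest bit; the public key is the hash of each secret."""
    sk = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    pk = [(_h(zero), _h(one)) for zero, one in sk]
    return sk, pk


def _bits(message: bytes) -> List[int]:
    digest = _h(message)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]


def sign(sk: KeyPairs, message: bytes) -> List[bytes]:
    # Reveal one secret per bit of the message digest. A key must sign only once.
    return [sk[i][bit] for i, bit in enumerate(_bits(message))]


def verify(pk: KeyPairs, message: bytes, signature: List[bytes]) -> bool:
    if len(signature) != 256:
        return False
    return all(
        _h(part) == pk[i][bit]
        for i, (part, bit) in enumerate(zip(signature, _bits(message)))
    )


secret_key, public_key = keygen()
signature = sign(secret_key, b"transfer 10 units")
accepted = verify(public_key, b"transfer 10 units", signature)
rejected = verify(public_key, b"transfer 99 units", signature)
```

Signing and verifying here cost only hash evaluations, which hints at why such schemes are attractive at microsecond scale; the price is that each key pair is safe for a single message.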

Semantic-Aware Spectrum Sharing in Internet of Vehicles Based on Deep Reinforcement Learning

This work aims to investigate semantic communication in high-speed mobile Internet of vehicles (IoV) environments, with a focus on the spectrum sharing between vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) communications. We specifically address spectrum scarcity and network traffic and then propose a semantic-aware spectrum sharing algorithm (SSS) based on the deep reinforcement learning (DRL) soft actor-critic (SAC) approach. Firstly, we delve into the extraction of semantic information. Secondly, we redefine metrics for semantic information in V2V and V2I spectrum sharing in IoV environments, introducing high-speed semantic spectrum efficiency (HSSE) and semantic transmission rate (HSR). Finally, we employ the SAC algorithm for decision optimization in V2V and V2I spectrum sharing based on semantic information. This optimization encompasses the optimal link of V2V and V2I sharing strategies, the transmission power for vehicles sending semantic information and the length of transmitted semantic symbols, aiming at maximizing HSSE of V2I and enhancing success rate of effective semantic information transmission (SRS) of V2V. Experimental results demonstrate that the SSS algorithm outperforms other baseline algorithms, including other traditional-communication-based spectrum sharing algorithms and spectrum sharing algorithm using other reinforcement learning approaches. The SSS algorithm exhibits a 15% increase in HSSE and approximately a 7% increase in SRS.

Updated: 2024-06-11 12:42:41

Categories: cs.LG

Download: http://arxiv.org/abs/2406.07213v1

Towards Human-AI Collaboration in Healthcare: Guided Deferral Systems with Large Language Models

Large language models (LLMs) present a valuable technology for various applications in healthcare, but their tendency to hallucinate introduces unacceptable uncertainty in critical decision-making situations. Human-AI collaboration (HAIC) can mitigate this uncertainty by combining human and AI strengths for better outcomes. This paper presents a novel guided deferral system that provides intelligent guidance when AI defers cases to human decision-makers. We leverage LLMs' verbalisation capabilities and internal states to create this system, demonstrating that fine-tuning smaller LLMs with data from larger models enhances performance while maintaining computational efficiency. A pilot study showcases the effectiveness of our deferral system.
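
The deferral idea can be illustrated with a minimal sketch. It assumes the model's confidence is already available as a number in [0, 1]; the function name `decide`, the `Decision` record, and the 0.8 threshold are hypothetical and not taken from the paper, which derives its signals from LLM verbalisation and internal states.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    label: str      # the predicted label, or "DEFER" when handed to a human
    guidance: str   # short note for the human reviewer (empty if not deferred)


def decide(label: str, confidence: float, rationale: str,
           threshold: float = 0.8) -> Decision:
    """Keep the model's label when it is confident enough, else defer with guidance."""
    if confidence >= threshold:
        return Decision(label=label, guidance="")
    # Below the (hypothetical) threshold: defer, but pass along what the model saw.
    note = f"Model leaned towards '{label}' (confidence {confidence:.2f}): {rationale}"
    return Decision(label="DEFER", guidance=note)
```

Calling `decide("positive", 0.40, "conflicting findings")` would defer the case and attach the rationale, which is the "guided" part of the deferral in this toy setting.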

Updated: 2024-06-11 12:41:54

Categories: cs.CL,cs.AI,cs.HC

Download: http://arxiv.org/abs/2406.07212v1

From Complexity to Clarity: How AI Enhances Perceptions of Scientists and the Public's Understanding of Science

This paper evaluated the effectiveness of using generative AI to simplify science communication and enhance the public's understanding of science. By comparing lay summaries of journal articles from PNAS, yoked to those generated by AI, this work first assessed linguistic simplicity across such summaries and public perceptions. Study 1a analyzed simplicity features of PNAS abstracts (scientific summaries) and significance statements (lay summaries), observing that lay summaries were indeed linguistically simpler, but effect size differences were small. Study 1b used a large language model, GPT-4, to create significance statements based on paper abstracts and this more than doubled the average effect size without fine-tuning. Study 2 experimentally demonstrated that simply-written GPT summaries facilitated more favorable perceptions of scientists (they were perceived as more credible and trustworthy, but less intelligent) than more complexly-written human PNAS summaries. Crucially, Study 3 experimentally demonstrated that participants comprehended scientific writing better after reading simple GPT summaries compared to complex PNAS summaries. In their own words, participants also summarized scientific papers in a more detailed and concrete manner after reading GPT summaries compared to PNAS summaries of the same article. AI has the potential to engage scientific communities and the public via a simple language heuristic, advocating for its integration into scientific dissemination for a more informed society.

Updated: 2024-06-11 12:35:51

Categories: cs.CL,cs.AI,cs.CY

Download: http://arxiv.org/abs/2405.00706v2

Mitigating Oversmoothing Through Reverse Process of GNNs for Heterophilic Graphs

Graph Neural Networks (GNNs) resemble a diffusion process, which leads to over-smoothing of the learned representations when many layers are stacked. Hence, the reverse process of message passing can produce distinguishable node representations by inverting the forward message propagation. These distinguishable representations help to better classify neighboring nodes with different labels, as in heterophilic graphs. In this work, we apply the design principle of the reverse process to three variants of GNNs. Through experiments on heterophilic graph data, where adjacent nodes need to have different representations for successful classification, we show that the reverse process significantly improves the prediction performance in many cases. Additional analysis reveals that the reverse mechanism can mitigate over-smoothing over hundreds of layers. Our code is available at https://github.com/ml-postech/reverse-gnn.
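
The over-smoothing effect described above can be seen in a toy setting. The sketch below only shows the forward (diffusion-like) propagation on a hypothetical four-node path graph with scalar features and mean aggregation; it does not implement the paper's reverse process, and the graph, features, and step count are illustrative assumptions.

```python
def propagate(features, neighbors):
    """One mean-aggregation step: each node averages itself and its neighbours."""
    out = []
    for node, nbrs in enumerate(neighbors):
        group = [node] + nbrs
        out.append(sum(features[j] for j in group) / len(group))
    return out


def spread(features):
    """Gap between the largest and smallest node feature."""
    return max(features) - min(features)


neighbors = [[1], [0, 2], [1, 3], [2]]   # path graph 0-1-2-3 (toy example)
x = [1.0, 0.0, 0.0, -1.0]                # nodes start with clearly different features
history = [spread(x)]
for _ in range(50):                      # 50 stacked "layers" without weights
    x = propagate(x, neighbors)
    history.append(spread(x))
```

The values in `history` shrink towards zero as layers are stacked, so neighbouring nodes with different labels become indistinguishable. This is the failure mode that matters most on heterophilic graphs.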

Updated: 2024-06-11 12:35:27

Categories: cs.SI,cs.LG

Download: http://arxiv.org/abs/2403.10543v2

II-Bench: An Image Implication Understanding Benchmark for Multimodal Large Language Models

The rapid advancements in the development of multimodal large language models (MLLMs) have consistently led to new breakthroughs on various benchmarks. In response, numerous challenging and comprehensive benchmarks have been proposed to more accurately assess the capabilities of MLLMs. However, there is a dearth of exploration of the higher-order perceptual capabilities of MLLMs. To fill this gap, we propose the Image Implication understanding Benchmark, II-Bench, which aims to evaluate the model's higher-order perception of images. Through extensive experiments on II-Bench across multiple MLLMs, we have made significant findings. Initially, a substantial gap is observed between the performance of MLLMs and humans on II-Bench. The pinnacle accuracy of MLLMs attains 74.8%, whereas human accuracy averages 90%, peaking at an impressive 98%. Subsequently, MLLMs perform worse on abstract and complex images, suggesting limitations in their ability to understand high-level semantics and capture image details. Finally, it is observed that most models exhibit enhanced accuracy when image sentiment polarity hints are incorporated into the prompts. This observation underscores a notable deficiency in their inherent understanding of image sentiment. We believe that II-Bench will inspire the community to develop the next generation of MLLMs, advancing the journey towards expert artificial general intelligence (AGI). II-Bench is publicly available at https://huggingface.co/datasets/m-a-p/II-Bench.

Updated: 2024-06-11 12:33:42

Categories: cs.CL,cs.AI,cs.CV

Download: http://arxiv.org/abs/2406.05862v2

Mastering Zero-Shot Interactions in Cooperative and Competitive Simultaneous Games

The combination of self-play and planning has achieved great successes in sequential games, for instance in Chess and Go. However, adapting algorithms such as AlphaZero to simultaneous games poses a new challenge. In these games, missing information about concurrent actions of other agents is a limiting factor as they may select different Nash equilibria or do not play optimally at all. Thus, it is vital to model the behavior of the other agents when interacting with them in simultaneous games. To this end, we propose Albatross: AlphaZero for Learning Bounded-rational Agents and Temperature-based Response Optimization using Simulated Self-play. Albatross learns to play the novel equilibrium concept of a Smooth Best Response Logit Equilibrium (SBRLE), which enables cooperation and competition with agents of any playing strength. We perform an extensive evaluation of Albatross on a set of cooperative and competitive simultaneous perfect-information games. In contrast to AlphaZero, Albatross is able to exploit weak agents in the competitive game of Battlesnake. Additionally, it yields an improvement of 37.6% compared to previous state of the art in the cooperative Overcooked benchmark.
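
A logit (softmax) response is the usual building block behind logit-style equilibria, and a minimal sketch may help make the idea of bounded rationality concrete. The function name and the `rationality` parameter are illustrative assumptions; here it multiplies the utilities, and the abstract does not state which temperature convention Albatross uses.

```python
import math


def logit_response(utilities, rationality):
    """Softmax over action utilities.

    rationality = 0 gives uniform play; large values approach a best response.
    """
    top = max(utilities)  # subtract the maximum for numerical stability
    weights = [math.exp(rationality * (u - top)) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]
```

For example, `logit_response([1.0, 2.0, 3.0], 0.0)` plays all three actions equally often, while a large `rationality` concentrates almost all probability on the third action. Modelling another agent this way allows a planner to respond to weak as well as strong opponents.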

Updated: 2024-06-11 12:26:30

Categories: cs.AI,cs.GT

Download: http://arxiv.org/abs/2402.03136v2

Tailoring Mixup to Data for Calibration

Among all data augmentation techniques proposed so far, linear interpolation of training samples, also called Mixup, has been found to be effective for a large panel of applications. Along with improved performance, Mixup is also a good technique for improving calibration and predictive uncertainty. However, mixing data carelessly can lead to manifold intrusion, i.e., conflicts between the synthetic labels assigned and the true label distributions, which can deteriorate calibration. In this work, we argue that the likelihood of manifold intrusion increases with the distance between the data to mix. To this end, we propose to dynamically change the underlying distributions of interpolation coefficients depending on the similarity between the samples to mix, and define a flexible framework to do so without losing diversity. We provide extensive experiments for classification and regression tasks, showing that our proposed method improves performance and calibration of models, while being much more efficient. The code for our work is available at https://github.com/qbouniot/sim_kernel_mixup.
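
For readers unfamiliar with Mixup, the following is a minimal sketch of the standard version, in which the interpolation coefficient is drawn from a fixed Beta(alpha, alpha) distribution. It is not the similarity-dependent scheme proposed in the paper; the function name, the list-based samples, and the default `alpha` are assumptions for illustration.

```python
import random


def mixup(x1, y1, x2, y2, alpha=1.0, rng=random):
    """Linearly interpolate two samples and their (one-hot) labels.

    lam ~ Beta(alpha, alpha); the same lam is applied to inputs and labels.
    """
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam
```

In the paper's setting, the distribution of `lam` would depend on how similar the two samples are, so that distant pairs are mixed less aggressively. In this sketch the distribution is the same for every pair.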

Updated: 2024-06-11 12:22:27

Categories: cs.LG,cs.AI,stat.ML

Download: http://arxiv.org/abs/2311.01434v2

Self-Alignment for Factuality: Mitigating Hallucinations in LLMs via Self-Evaluation

Despite showing increasingly human-like abilities, large language models (LLMs) often struggle with factual inaccuracies, i.e., "hallucinations", even when they hold relevant knowledge. To address these hallucinations, current approaches typically require high-quality human factuality annotations. In this work, we explore Self-Alignment for Factuality, where we leverage the self-evaluation capability of an LLM to provide training signals that steer the model towards factuality. Specifically, we incorporate Self-Eval, a self-evaluation component, to prompt an LLM to validate the factuality of its own generated responses based solely on its internal knowledge. Additionally, we design Self-Knowledge Tuning (SK-Tuning) to augment the LLM's self-evaluation ability by improving the model's confidence estimation and calibration. We then use these self-annotated responses to fine-tune the model via the Direct Preference Optimization (DPO) algorithm. We show that the proposed self-alignment approach substantially enhances factual accuracy over Llama family models across three key knowledge-intensive tasks on TruthfulQA and BioGEN.
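
The fine-tuning step relies on Direct Preference Optimization. As a rough illustration, the per-pair DPO loss can be written in a few lines, assuming the summed log-probabilities of the preferred and rejected responses under the policy and a frozen reference model are already computed. The argument names and the default `beta` are illustrative assumptions, and the abstract does not describe how the preference pairs are built from the self-evaluation scores.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair: -log(sigmoid(beta * margin))."""
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    z = beta * margin
    # Numerically stable -log(sigmoid(z)) = log(1 + exp(-z)).
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))
```

The loss falls below log 2 once the policy prefers the chosen response more strongly than the reference model does, and rises above it in the opposite case.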

Updated: 2024-06-11 12:22:14

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2402.09267v2

MambaLRP: Explaining Selective State Space Sequence Models

Recent sequence modeling approaches using Selective State Space Sequence Models, referred to as Mamba models, have seen a surge of interest. These models allow efficient processing of long sequences in linear time and are rapidly being adopted in a wide range of applications such as language modeling, demonstrating promising performance. To foster their reliable use in real-world scenarios, it is crucial to augment their transparency. Our work bridges this critical gap by bringing explainability, particularly Layer-wise Relevance Propagation (LRP), to the Mamba architecture. Guided by the axiom of relevance conservation, we identify specific components in the Mamba architecture, which cause unfaithful explanations. To remedy this issue, we propose MambaLRP, a novel algorithm within the LRP framework, which ensures a more stable and reliable relevance propagation through these components. Our proposed method is theoretically sound and excels in achieving state-of-the-art explanation performance across a diverse range of models and datasets. Moreover, MambaLRP facilitates a deeper inspection of Mamba architectures, uncovering various biases and evaluating their significance. It also enables the analysis of previous speculations regarding the long-range capabilities of Mamba models.

Updated: 2024-06-11 12:15:47

Categories: cs.LG,cs.AI,stat.ML

Download: http://arxiv.org/abs/2406.07592v1

Refined Sample Complexity for Markov Games with Independent Linear Function Approximation

Markov Games (MG) is an important model for Multi-Agent Reinforcement Learning (MARL). It was long believed that the "curse of multi-agents" (i.e., the algorithmic performance drops exponentially with the number of agents) is unavoidable until several recent works (Daskalakis et al., 2023; Cui et al., 2023; Wang et al., 2023). While these works resolved the curse of multi-agents, when the state spaces are prohibitively large and (linear) function approximations are deployed, they either had a slower convergence rate of $O(T^{-1/4})$ or brought a polynomial dependency on the number of actions $A_{\max}$ -- which is avoidable in single-agent cases even when the loss functions can arbitrarily vary with time. This paper first refines the AVLPR framework by Wang et al. (2023), with an insight of designing *data-dependent* (i.e., stochastic) pessimistic estimation of the sub-optimality gap, allowing a broader choice of plug-in algorithms. When specialized to MGs with independent linear function approximations, we propose novel *action-dependent bonuses* to cover occasionally extreme estimation errors. With the help of state-of-the-art techniques from the single-agent RL literature, we give the first algorithm that tackles the curse of multi-agents, attains the optimal $O(T^{-1/2})$ convergence rate, and avoids $\text{poly}(A_{\max})$ dependency simultaneously.

Updated: 2024-06-11 12:12:59

Categories: cs.LG,cs.GT,stat.ML

Download: http://arxiv.org/abs/2402.07082v2

Merging Improves Self-Critique Against Jailbreak Attacks

The robustness of large language models (LLMs) against adversarial manipulations, such as jailbreak attacks, remains a significant challenge. In this work, we propose an approach that enhances the self-critique capability of the LLM and further fine-tunes it over sanitized synthetic data. This is done with the addition of an external critic model that can be merged with the original, thus bolstering self-critique capabilities and improving the robustness of the LLM's response to adversarial prompts. Our results demonstrate that the combination of merging and self-critique can reduce the attack success rate of adversaries significantly, thus offering a promising defense mechanism against jailbreak attacks. Code, data, and models are released at https://github.com/vicgalle/merging-self-critique-jailbreaks.

Updated: 2024-06-11 12:01:09

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.07188v1

Spectral Estimators for Structured Generalized Linear Models via Approximate Message Passing

We consider the problem of parameter estimation in a high-dimensional generalized linear model. Spectral methods obtained via the principal eigenvector of a suitable data-dependent matrix provide a simple yet surprisingly effective solution. However, despite their wide use, a rigorous performance characterization, as well as a principled way to preprocess the data, are available only for unstructured (i.i.d. Gaussian and Haar orthogonal) designs. In contrast, real-world data matrices are highly structured and exhibit non-trivial correlations. To address the problem, we consider correlated Gaussian designs capturing the anisotropic nature of the features via a covariance matrix $\Sigma$. Our main result is a precise asymptotic characterization of the performance of spectral estimators. This allows us to identify the optimal preprocessing that minimizes the number of samples needed for parameter estimation. Surprisingly, such preprocessing is universal across a broad set of statistical models, which partly addresses a conjecture on optimal spectral estimators for rotationally invariant designs. Our principled approach vastly improves upon previous heuristic methods, including for designs common in computational imaging and genetics. The proposed methodology, based on approximate message passing, is broadly applicable and opens the way to the precise characterization of spiked matrices and of the corresponding spectral methods in a variety of settings.

Updated: 2024-06-11 11:56:46

Categories: math.ST,cs.IT,cs.LG,math.IT,math.PR,stat.ML,stat.TH

Download: http://arxiv.org/abs/2308.14507v2

TernaryLLM: Ternarized Large Language Model

Large language models (LLMs) have achieved remarkable performance on Natural Language Processing (NLP) tasks, but they are hindered by high computational costs and memory requirements. Ternarization, an extreme form of quantization, offers a solution by reducing memory usage and enabling energy-efficient floating-point additions. However, applying ternarization to LLMs faces challenges stemming from outliers in both weights and activations. In this work, observing asymmetric outliers and non-zero means in weights, we introduce Dual Learnable Ternarization (DLT), which enables both scales and shifts to be learnable. We also propose Outlier-Friendly Feature Knowledge Distillation (OFF) to recover the information lost in extremely low-bit quantization. The proposed OFF can incorporate semantic information and is insensitive to outliers. At the core of OFF is maximizing the mutual information between features in ternarized and floating-point models using cosine similarity. Extensive experiments demonstrate that our TernaryLLM surpasses previous low-bit quantization methods on the standard text generation and zero-shot benchmarks for different LLM families. Specifically, for one of the most powerful open-source models, LLaMA-3, our approach (W1.58A16) outperforms the previous state-of-the-art method (W2A16) by 5.8 in terms of perplexity on C4 and by 8.2% in terms of average accuracy on zero-shot tasks.
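
To make the idea of ternarization with a scale and a shift concrete, here is a minimal sketch in which each weight is approximated as `scale * t + shift` with `t` in {-1, 0, +1}. The scale, shift, and threshold are set by simple fixed heuristics (mean-centering and a threshold proportional to the mean absolute deviation), which is an assumption for illustration; in DLT these quantities are learnable. The "W1.58" label presumably refers to log2(3) ≈ 1.58 bits per ternary weight.

```python
def ternarize(weights, threshold_ratio=0.7):
    """Map weights to codes in {-1, 0, +1} plus a scalar scale and shift."""
    shift = sum(weights) / len(weights)                 # handles a non-zero mean
    centered = [w - shift for w in weights]
    delta = threshold_ratio * sum(abs(c) for c in centered) / len(centered)
    codes = [0 if abs(c) <= delta else (1 if c > 0 else -1) for c in centered]
    kept = [abs(c) for c, t in zip(centered, codes) if t != 0]
    scale = sum(kept) / len(kept) if kept else 0.0      # mean magnitude of kept weights
    return codes, scale, shift


def dequantize(codes, scale, shift):
    """Reconstruct approximate weights from the ternary codes."""
    return [scale * t + shift for t in codes]
```

With weights clustered around a non-zero mean, the shift term lets the zero code represent the cluster centre instead of wasting it on the value 0. This is the motivation the abstract gives for making both scale and shift adjustable.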

Updated: 2024-06-11 11:40:12

Categories: cs.LG

Download: http://arxiv.org/abs/2406.07177v1

A Comprehensive Survey on Deep Learning Techniques in Educational Data Mining

Educational Data Mining (EDM) has emerged as a vital field of research, which harnesses the power of computational techniques to analyze educational data. With the increasing complexity and diversity of educational data, Deep Learning techniques have shown significant advantages in addressing the challenges associated with analyzing and modeling this data. This survey aims to systematically review the state-of-the-art in EDM with Deep Learning. We begin by providing a brief introduction to EDM and Deep Learning, highlighting their relevance in the context of modern education. Next, we present a detailed review of Deep Learning techniques applied in four typical educational scenarios, including knowledge tracing, student behavior detection, performance prediction, and personalized recommendation. Furthermore, a comprehensive overview of public datasets and processing tools for EDM is provided. We then analyze the practical challenges in EDM and propose targeted solutions. Finally, we point out emerging trends and future directions in this research area.

Updated: 2024-06-11 11:38:57

Categories: cs.LG,cs.CY,cs.IR

Download: http://arxiv.org/abs/2309.04761v4

G-Transformer: Counterfactual Outcome Prediction under Dynamic and Time-varying Treatment Regimes

In the context of medical decision making, counterfactual prediction enables clinicians to predict treatment outcomes of interest under alternative courses of therapeutic actions given observed patient history. Prior machine learning approaches for counterfactual predictions under time-varying treatments focus on static time-varying treatment regimes where treatments do not depend on previous covariate history. In this work, we present G-Transformer, a Transformer-based framework supporting g-computation for counterfactual prediction under dynamic and time-varying treatment strategies. G-Transformer captures complex, long-range dependencies in time-varying covariates using a Transformer architecture. G-Transformer estimates the conditional distribution of relevant covariates given covariate and treatment history at each time point using an encoder architecture, then produces Monte Carlo estimates of counterfactual outcomes by simulating forward patient trajectories under treatment strategies of interest. We evaluate G-Transformer extensively using two simulated longitudinal datasets from mechanistic models, and a real-world sepsis ICU dataset from MIMIC-IV. G-Transformer outperforms both classical and state-of-the-art counterfactual prediction models in these settings. To the best of our knowledge, this is the first Transformer-based architecture for counterfactual outcome prediction under dynamic and time-varying treatment strategies. Code will be released upon publication of the paper.
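
The Monte Carlo part of g-computation can be sketched with a toy simulator. In the sketch below, a hand-written transition rule stands in for the learned conditional distribution of covariates, and the single "severity" covariate, the drift values, and the function names are all hypothetical. Only the structure follows the abstract: roll trajectories forward under a strategy of interest, then average the outcomes.

```python
import random


def simulate_outcome(strategy, horizon, rng):
    """Roll one patient trajectory forward under a treatment strategy."""
    severity = 1.0
    for _ in range(horizon):
        treat = strategy(severity)          # dynamic: the decision depends on the covariate
        drift = -0.3 if treat else 0.1      # toy transition model (assumption)
        severity = max(0.0, severity + drift + rng.gauss(0.0, 0.05))
    return severity


def g_computation_estimate(strategy, horizon=10, n_samples=2000, seed=0):
    """Monte Carlo estimate of the expected final outcome under `strategy`."""
    rng = random.Random(seed)
    total = sum(simulate_outcome(strategy, horizon, rng) for _ in range(n_samples))
    return total / n_samples


def treat_when_severe(severity):
    return severity > 0.5


def never_treat(severity):
    return False
```

Comparing `g_computation_estimate(treat_when_severe)` with `g_computation_estimate(never_treat)` gives a counterfactual contrast between a dynamic strategy and a static one within this toy model.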

Updated: 2024-06-11 11:37:35

Categories: cs.LG

Download: http://arxiv.org/abs/2406.05504v2

A Survey on Graph Counterfactual Explanations: Definitions, Methods, Evaluation, and Research Challenges

Graph Neural Networks (GNNs) perform well in community detection and molecule classification. Counterfactual Explanations (CE) provide counter-examples to overcome the transparency limitations of black-box models. Due to the growing attention in graph learning, we focus on the concepts of CE for GNNs. We analysed the state of the art to provide a taxonomy, a uniform notation, benchmarking datasets, and evaluation metrics. We discuss fourteen methods, their evaluation protocols, twenty-two datasets, and nineteen metrics. We integrated the majority of methods into the GRETEL library to conduct an empirical evaluation to understand their strengths and pitfalls. We highlight open challenges and future work.

Updated: 2024-06-11 11:18:57

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2210.12089v3

Spatio-temporal Early Prediction based on Multi-objective Reinforcement Learning

Accuracy and timeliness are indeed often conflicting goals in prediction tasks. Premature predictions may yield a higher rate of false alarms, whereas delaying predictions to gather more information can render them too late to be useful. In applications such as wildfires, crimes, and traffic jams, timely predictions are vital for safeguarding human life and property. Consequently, finding a balance between accuracy and timeliness is crucial. In this paper, we propose a spatio-temporal early prediction model based on Multi-Objective reinforcement learning that can either implement an optimal policy given a preference or infer the preference based on a small number of samples. The model addresses two primary challenges: 1) enhancing the accuracy of early predictions and 2) providing the optimal policy for determining the most suitable prediction time for each area. Our method demonstrates superior performance on three large-scale real-world datasets, surpassing existing methods in early spatio-temporal prediction tasks.

Updated: 2024-06-11 11:14:56

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2406.04035v2

EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark

Speech emotion recognition (SER) is an important part of human-computer interaction, receiving extensive attention from both industry and academia. However, the current research field of SER has long suffered from the following problems: 1) There are few reasonable and universal splits of the datasets, making it difficult to compare different models and methods. 2) No commonly used benchmark covers numerous corpora and languages for researchers to refer to, making reproduction a burden. In this paper, we propose EmoBox, an out-of-the-box multilingual multi-corpus speech emotion recognition toolkit, along with a benchmark for both intra-corpus and cross-corpus settings. For intra-corpus settings, we carefully designed the data partitioning for different datasets. For cross-corpus settings, we employ a foundation SER model, emotion2vec, to mitigate annotation errors and obtain a test set that is fully balanced in speaker and emotion distributions. Based on EmoBox, we present the intra-corpus SER results of 10 pre-trained speech models on 32 emotion datasets with 14 languages, and the cross-corpus SER results on 4 datasets with the fully balanced test sets. To the best of our knowledge, this is the largest SER benchmark, across language scopes and quantity scales. We hope that our toolkit and benchmark can facilitate the research of SER in the community.

Updated: 2024-06-11 11:12:51

Categories: cs.SD,cs.AI,cs.CL,cs.MM,eess.AS

Download: http://arxiv.org/abs/2406.07162v1

Deep Learning-Based Approach for User Activity Detection with Grant-Free Random Access in Cell-Free Massive MIMO

Modern wireless networks must reliably support a wide array of connectivity demands, encompassing various user needs across diverse scenarios. Massive Machine-Type Communication (mMTC) is pivotal in these networks, particularly given the challenges posed by massive connectivity and sporadic device activation patterns. Traditional grant-based random access (GB-RA) protocols face limitations due to constrained orthogonal preamble resources. In response, the adoption of grant-free random access (GF-RA) protocols offers a promising solution. This paper explores the application of supervised machine learning models to tackle activity detection issues in scenarios where non-orthogonal preamble design is considered. We introduce a data-driven algorithm specifically designed for user activity detection in Cell-Free Massive Multiple-Input Multiple-Output (CF-mMIMO) networks operating under GF-RA protocols. Additionally, this study presents a novel clustering strategy that simplifies and enhances activity detection accuracy, assesses the resilience of the algorithm to input perturbations, and investigates the effects of adopting floating-to-fixed-point conversion on algorithm performance. The simulations adhere to 3GPP standards, ensuring accurate channel modeling, and employ a deep learning approach to boost the detection capabilities of mMTC GF-RA devices. The results are compelling: the algorithm achieves an exceptional 99% accuracy rate, confirming its efficacy in real-world applications.

Updated: 2024-06-11 11:08:33

标题: 基于深度学习的方法在无授权随机接入的无蜂窝大规模MIMO中用于用户活动检测

摘要: 现代无线网络必须可靠地支持各种连接需求，涵盖不同用户在不同场景下的各种需求。机器类型通信（mMTC）在这些网络中至关重要，特别是考虑到大量连接和零星设备激活模式带来的挑战。传统的基于授予的随机接入（GB-RA）协议由于受限于正交前导资源而面临限制。为此，采用无授予随机接入（GF-RA）协议提供了一种有前途的解决方案。本文探讨了在考虑非正交前导设计的场景中，采用监督机器学习模型来解决活动检测问题。我们引入了一个专门设计用于在GF-RA协议下运行的Cell-Free Massive Multiple-Input Multiple-Output（CF-mMIMO）网络中进行用户活动检测的数据驱动算法。此外，本研究提出了一种新颖的聚类策略，简化和增强了活动检测的准确性，评估了算法对输入扰动的弹性，并研究了采用浮点到定点转换对算法性能的影响。进行的模拟符合3GPP标准，确保准确的信道建模，并采用深度学习方法来提升mMTC GF-RA设备的检测能力。结果令人信服：该算法实现了99\%的异常准确率，验证了其在实际应用中的有效性。

更新时间: 2024-06-11 11:08:33

领域: cs.LG

下载: http://arxiv.org/abs/2406.07160v1

A benchmark dataset for deep learning-based airplane detection: HRPlanes

Airplane detection from satellite imagery is a challenging task due to the complex backgrounds in the images and differences in data acquisition conditions caused by the sensor geometry and atmospheric effects. Deep learning methods provide reliable and accurate solutions for automatic detection of airplanes; however, a huge amount of training data is required to obtain promising results. In this study, we create a novel airplane detection dataset called High Resolution Planes (HRPlanes) by using images from Google Earth (GE) and labeling the bounding box of each plane on the images. HRPlanes includes GE images of several different airports across the world to represent a variety of landscape, seasonal and satellite geometry conditions obtained from different satellites. We evaluated our dataset with two widely used object detection methods, namely YOLOv4 and Faster R-CNN. Our preliminary results show that the proposed dataset can be a valuable data source and benchmark dataset for future applications. Moreover, the proposed architectures and results of this study could be used for transfer learning of different datasets and models for airplane detection.
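
Detectors evaluated on bounding-box datasets are commonly scored by matching predicted boxes to labeled boxes with intersection over union (IoU). The sketch below shows that standard computation. The `(x1, y1, x2, y2)` corner format is an assumption, and the abstract does not say which metric or threshold the authors used.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


labeled_plane = (0, 0, 2, 2)
predicted_plane = (1, 1, 3, 3)
overlap = iou(labeled_plane, predicted_plane)  # 1 / 7 for these two boxes
```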

Updated: 2024-06-11 11:04:06

标题: 一个基于深度学习的飞机检测基准数据集：HRPlanes

摘要: 卫星图像中的飞机检测是一项具有挑战性的任务，原因在于图像中复杂的背景以及由传感器几何和大气效应引起的数据获取条件的差异。深度学习方法为飞机的自动检测提供了可靠且准确的解决方案；然而，为了获得令人满意的结果，需要大量的训练数据。在本研究中，我们利用谷歌地球（GE）图像并在图像上标记每架飞机的边界框，创建了一个名为高分辨率飞机（HRPlanes）的新型飞机检测数据集。HRPlanes包括来自世界各地多个不同机场的GE图像，以代表来自不同卫星获得的各种地貌、季节和卫星几何条件。我们使用了两种广泛使用的目标检测方法，即YOLOv4和Faster R-CNN，对我们的数据集进行了评估。我们的初步结果表明，所提出的数据集可以成为未来应用的有价值的数据源和基准数据集。此外，本研究提出的体系结构和结果可用于不同数据集和模型的飞机检测的迁移学习。

更新时间: 2024-06-11 11:04:06

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2204.10959v2

Scaling Large-Language-Model-based Multi-Agent Collaboration

Pioneering advancements in large language model-powered agents have underscored the design pattern of multi-agent collaboration, demonstrating that collective intelligence can surpass the capabilities of each individual. Inspired by the neural scaling law, which posits that increasing neurons leads to emergent abilities, this study investigates whether a similar principle applies to increasing agents in multi-agent collaboration. Technically, we propose multi-agent collaboration networks (MacNet), which utilize directed acyclic graphs to organize agents and streamline their interactive reasoning via topological ordering, with solutions derived from their dialogues. Extensive experiments show that MacNet consistently outperforms baseline models, enabling effective agent collaboration across various network topologies and supporting cooperation among more than a thousand agents. Notably, we observed a small-world collaboration phenomenon, where topologies resembling small-world properties achieved superior performance. Additionally, we identified a collaborative scaling law, indicating that normalized solution quality follows a logistic growth pattern as the number of agents scales, with collaborative emergence occurring much earlier than previously observed instances of neural emergence. The code and data will be available at https://github.com/OpenBMB/ChatDev.
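
Organizing agents in a directed acyclic graph and visiting them in topological order can be sketched with Kahn's algorithm. This is a generic illustration, not the MacNet implementation; the agent names and the edge dictionary are hypothetical.

```python
from collections import deque


def topological_order(edges):
    """Return the nodes of a DAG so that every edge points forward in the list.

    `edges` maps each agent to the agents that consume its output.
    """
    indegree = {}
    for src, targets in edges.items():
        indegree.setdefault(src, 0)
        for dst in targets:
            indegree[dst] = indegree.get(dst, 0) + 1
    # Agents with no incoming edges can act first.
    ready = deque(node for node, degree in indegree.items() if degree == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for dst in edges.get(node, ()):
            indegree[dst] -= 1
            if indegree[dst] == 0:  # all predecessors have already spoken
                ready.append(dst)
    if len(order) != len(indegree):
        raise ValueError("graph contains a cycle")
    return order


# Hypothetical four-agent diamond: a feeds b and c, which both feed d.
agents = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
order = topological_order(agents)
```

Each agent runs only after all of its predecessors have produced output, which is the property that lets a dialogue proceed through the graph without an agent waiting on a later one.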

Updated: 2024-06-11 11:02:04

标题: 基于大型语言模型的多智能体协作的扩展

摘要: 开创性的大型语言模型驱动的代理技术的进步强调了多代理协作的设计模式，表明集体智能可以超越每个个体的能力。受神经缩放定律的启发，即增加神经元会导致新的能力，本研究调查了类似原则是否适用于增加多代理协作中的代理。从技术上讲，我们提出了利用有向无环图来组织代理并通过拓扑排序简化它们的交互推理的多代理协作网络（MacNet），并通过它们的对话得出解决方案。大量实验表明，MacNet始终优于基准模型，能够在各种网络拓扑结构中实现有效的代理协作，并支持超过一千个代理之间的合作。值得注意的是，我们观察到了一个小世界协作现象，其中类似小世界属性的拓扑结构实现了更优异的性能。此外，我们还确定了一个协作缩放定律，表明随着代理规模的增加，标准化解决方案质量遵循逻辑增长模式，协作出现比先前观察到的神经出现实例要早得多。代码和数据将在https://github.com/OpenBMB/ChatDev上提供。

更新时间: 2024-06-11 11:02:04

领域: cs.AI,cs.CL,cs.MA,cs.NI,cs.SI

下载: http://arxiv.org/abs/2406.07155v1

UniAP: Unifying Inter- and Intra-Layer Automatic Parallelism by Mixed Integer Quadratic Programming

Distributed learning is commonly used for training deep learning models, especially large models. In distributed learning, manual parallelism (MP) methods demand considerable human effort and have limited flexibility. Hence, automatic parallelism (AP) methods have recently been proposed for automating the parallel strategy optimization process. Existing AP methods suffer from sub-optimal solutions because they do not jointly optimize the two categories of parallel strategies (i.e., inter-layer parallelism and intra-layer parallelism). In this paper, we propose a novel AP method called UniAP, which unifies inter- and intra-layer automatic parallelism by mixed integer quadratic programming. To the best of our knowledge, UniAP is the first parallel method that can jointly optimize the two categories of parallel strategies to find an optimal solution. Experimental results show that UniAP outperforms state-of-the-art methods by up to 3.80× in throughput and reduces strategy optimization time by up to 107× across five Transformer-based models.

Updated: 2024-06-11 10:52:48

标题: UniAP：通过混合整数二次规划统一跨层和内层自动并行化

摘要: 分布式学习通常用于训练深度学习模型，特别是大型模型。在分布式学习中，手动并行（MP）方法需要大量人力，并且灵活性有限。因此，最近提出了自动并行（AP）方法，用于自动化并行策略优化过程。现有的AP方法存在子优化解的问题，因为它们没有同时优化两类并行策略（即，层间并行和层内并行）。在本文中，我们提出了一种名为UniAP的新型AP方法，通过混合整数二次规划来统一层间和层内自动并行。据我们所知，UniAP是第一种可以同时优化两类并行策略以找到最优解的并行方法。实验结果显示，UniAP在各种基于Transformer的模型中，吞吐量比现有方法提高了多达3.80倍，并且减少了高达107倍的策略优化时间。

更新时间: 2024-06-11 10:52:48

领域: cs.LG,cs.DC,math.OC

下载: http://arxiv.org/abs/2307.16375v4

EEG-ImageNet: An Electroencephalogram Dataset and Benchmarks with Image Visual Stimuli of Multi-Granularity Labels

Identifying and reconstructing what we see from brain activity gives us a special insight into how the biological visual system represents the world. While recent efforts have achieved high-performance image classification and high-quality image reconstruction from brain signals collected by Functional Magnetic Resonance Imaging (fMRI) or magnetoencephalogram (MEG), the cost and bulk of these devices make such methods difficult to carry over to practical applications. On the other hand, Electroencephalography (EEG), despite its advantages of ease of use, cost-efficiency, high temporal resolution, and non-invasive nature, has not been fully explored in relevant studies due to the lack of comprehensive datasets. To address this gap, we introduce EEG-ImageNet, a novel EEG dataset comprising recordings from 16 subjects exposed to 4000 images selected from the ImageNet dataset. EEG-ImageNet contains 5 times as many EEG-image pairs as existing similar EEG benchmarks. EEG-ImageNet is collected with image stimuli of multi-granularity labels, i.e., 40 images with coarse-grained labels and 40 with fine-grained labels. Based on it, we establish benchmarks for object classification and image reconstruction. Experiments with several commonly used models show that the best models can achieve object classification with accuracy around 60% and image reconstruction with two-way identification around 64%. These results demonstrate the dataset's potential to advance EEG-based visual brain-computer interfaces, understand the visual perception of biological systems, and provide potential applications in improving machine visual models.

Updated: 2024-06-11 10:52:17

标题: EEG-ImageNet：一种具有多粒度标签的图像视觉刺激的脑电图数据集和基准测试

摘要: 从大脑活动中识别和重建我们所看到的东西，使我们能够深入研究生物视觉系统如何表征世界。尽管最近的努力已经实现了从功能性磁共振成像（fMRI）或脑磁图（MEG）收集的脑信号进行高性能图像分类和高质量图像重建，但这些设备的昂贵和笨重使得相关应用难以推广到实际应用中。另一方面，尽管脑电图（EEG）具有易于使用、成本效益高、高时间分辨率和无创性的优势，但由于缺乏全面的数据集，它在相关研究中尚未得到充分探索。为了填补这一差距，我们引入了EEG-ImageNet，这是一个新颖的EEG数据集，包括16名受试者对来自ImageNet数据集的4000幅图像的记录。EEG-ImageNet比现有类似EEG基准数据集的EEG-图像对大5倍。EEG-ImageNet采用多粒度标签的图像刺激，即40个具有粗粒度标签的图像和40个细粒度标签的图像。基于此，我们建立了物体分类和图像重建的基准。对几种常用模型进行的实验表明，最佳模型可以实现大约60％的物体分类准确率和大约64％的双向识别的图像重建。这些结果表明该数据集在推动基于EEG的视觉脑-计算机界面、理解生物系统的视觉感知以及提供改进机器视觉模型的潜在应用方面具有潜力。

更新时间: 2024-06-11 10:52:17

领域: cs.MM,cs.AI,cs.IR

下载: http://arxiv.org/abs/2406.07151v1

Wearable Device-Based Physiological Signal Monitoring: An Assessment Study of Cognitive Load Across Tasks

This study employs cutting-edge wearable monitoring technology to conduct high-precision, high-temporal-resolution cognitive load assessment on EEG data from the FP1 channel and heart rate variability (HRV) data of secondary vocational students (SVS). By jointly analyzing these two critical physiological indicators, the research examines their value for assessing cognitive load among SVS and their utility across various tasks. The study designed two experiments to validate the efficacy of the proposed approach. First, a random forest classification model, developed using the N-BACK task, decoded the physiological signal characteristics of SVS under different levels of cognitive load, achieving a classification accuracy of 97%. Subsequently, this classification model was applied in a cross-task experiment involving the National Computer Rank Examination, demonstrating the method's applicability and cross-task transferability in diverse learning contexts. Because the setup is highly portable, this research holds substantial theoretical and practical significance for optimizing teaching resource allocation in secondary vocational education, as well as for cognitive load assessment methods and monitoring. Currently, the research findings are undergoing trial implementation in the school.

Updated: 2024-06-11 10:48:26

标题: 可穿戴设备基于生理信号监测：跨任务认知负荷评估研究

摘要: 本研究利用尖端可穿戴监测技术，对二级职业学生（SVS）的FP1通道脑电图数据和心率变异性（HRV）数据进行高精度、高时间分辨率的认知负荷评估。通过联合分析这两个关键生理指标，研究深入探讨它们在评估SVS学生认知负荷方面的应用价值以及在各种任务中的效用。研究设计了两个实验来验证所提出方法的有效性：首先，使用N-BACK任务开发的随机森林分类模型，能够精确解码SVS学生在不同认知负荷水平下的生理信号特征，实现了97%的分类准确率。随后，该分类模型在涉及全国计算机等级考试的跨任务实验中得到应用，展示了该方法在不同学习环境中的显著适用性和跨任务可转移性。这项研究具有高度的可移植性，对于优化二级职业教育中的教学资源配置以及认知负荷评估方法和监测具有重要的理论和实践意义。目前，研究结果正在学校进行试行实施。

更新时间: 2024-06-11 10:48:26

领域: cs.HC,cs.AI,cs.CY

下载: http://arxiv.org/abs/2406.07147v1

StreamPrompt: Learnable Prompt-guided Data Selection for Efficient Stream Learning

Stream Learning (SL) requires models to rapidly adapt to continuous data streams, setting it apart from traditional Continual Learning (CL). Recent SL methods emphasize efficiency by selecting data subsets for training, but they often struggle due to their reliance on static, rule-based selection algorithms that cannot effectively adapt to the changing importance of data. In this work, we introduce StreamPrompt, a method that enhances data selection through dynamic, learnable prompts. These dynamic prompts serve two purposes beyond guiding model inference: 1) optimizing data selection, and 2) guiding updates to the rehearsal buffer. This approach addresses the challenges of adaptability and computational efficiency in processing continuous data streams. Moreover, StreamPrompt introduces Prompt Attunement, a mechanism that enhances the efficiency of prompt learning. By leveraging attention layers from vision transformers and softly combining their outputs with a gate unit, Prompt Attunement refines prompts with minimal computational resources. Comprehensive evaluations demonstrate StreamPrompt's superior performance over state-of-the-art methods, with significant improvements in accuracy and reductions in training time. These results underscore the efficacy and efficiency of StreamPrompt, establishing its potential as a scalable and effective solution for the evolving demands of SL. Our code is available at https://github.com/intellistream/Efficient-Stream-Learning.

Updated: 2024-06-11 10:46:41

标题: StreamPrompt：可学习的提示引导数据选择，用于高效流式学习

摘要: 流式学习（SL）要求模型快速适应连续数据流，这使其不同于传统的持续学习（CL）。最近的SL方法强调通过选择数据子集进行训练来提高效率，但它们经常遇到困难，因为它们依赖于静态的、基于规则的选择算法，这些算法无法有效地适应数据重要性的变化。在这项工作中，我们介绍了StreamPrompt，一种通过动态可学习提示增强数据选择的方法。这些动态提示除了引导模型推断之外还有两个目的：1）优化数据选择，2）引导更新到回忆缓冲区。这种方法解决了在处理连续数据流时适应性和计算效率的挑战。此外，StreamPrompt引入了Prompt Attunement，一种增强提示学习效率的机制。通过利用视觉变换器中的注意层，并将它们的输出与门单元软组合，Prompt Attunement用最少的计算资源细化提示。全面评估表明StreamPrompt在准确性和训练时间减少方面优于现有技术，这些结果强调了StreamPrompt的有效性和效率，确立了其作为可扩展和有效解决方案的潜力，以满足SL不断变化的需求。我们的代码可在https://github.com/intellistream/Efficient-Stream-Learning找到。

更新时间: 2024-06-11 10:46:41

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.07590v1

Benchmarking and Boosting Radiology Report Generation for 3D High-Resolution Medical Images

Automatic radiology report generation can significantly benefit the labor-intensive process of report writing by radiologists, especially for 3D radiographs like CT scans, which are crucial for broad clinical diagnostics yet underexplored compared to 2D radiographs. Existing methods often handle 3D volumes either slice-wise or with aggressive downsampling due to current GPU memory limitations, which results in a loss of the inherent 3D nature and critical details. To overcome these issues, we introduce a novel framework that efficiently and effectively generates radiology reports for high-resolution (HR) 3D volumes, based on large language models (LLMs). Specifically, our framework utilizes low-resolution (LR) visual tokens as queries to mine information from HR tokens, preserving detailed HR information while reducing computational costs by only processing HR informed LR visual queries. Further benefiting the field, we curate and release BIMCV-RG, a new dataset with 5,328 HR 3D volumes and paired reports, establishing the first benchmarks for report generation from 3D HR medical images. Our method consistently surpasses existing methods on this benchmark across three different settings: normal-resolution, high-resolution inputs, and zero-shot domain transfer, all at an acceptable computational cost, trainable on a single A100-80G.

Updated: 2024-06-11 10:45:59

标题: 基准测试和提升三维高分辨率医学图像的放射学报告生成

摘要: 自动生成放射学报告可以显著地促进放射科医生的报告撰写工作，尤其是对于像CT扫描这样的3D放射图像，这对于广泛的临床诊断至关重要，但与2D放射图像相比尚未得到充分开发。现有方法通常要么逐层处理3D体积，要么由于当前GPU内存限制而进行激进的降采样，这导致丢失固有的3D特性和关键细节。为了克服这些问题，我们引入了一个新颖的框架，基于大型语言模型（LLMs）高效有效地生成高分辨率（HR）3D体积的放射学报告。具体来说，我们的框架利用低分辨率（LR）视觉标记作为查询，从HR标记中挖掘信息，保留详细的HR信息同时通过仅处理HR知情的LR视觉查询来降低计算成本。进一步造福该领域，我们策划并发布了一个新的数据集BIMCV-RG，其中包含5,328个HR 3D体积和配对报告，建立了从3D HR医学图像生成报告的第一个基准。我们的方法在这个基准测试中始终优于现有方法，包括三种不同设置：正常分辨率、高分辨率输入和零样本领域转移，所有这些都在可接受的计算成本下，可以在单个A100-80G上进行训练。

更新时间: 2024-06-11 10:45:59

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2406.07146v1

Failures Are Fated, But Can Be Faded: Characterizing and Mitigating Unwanted Behaviors in Large-Scale Vision and Language Models

In large deep neural networks that seem to perform surprisingly well on many tasks, we also observe a few failures related to accuracy, social biases, and alignment with human values, among others. Therefore, before deploying these models, it is crucial to characterize this failure landscape for engineers to debug and legislative bodies to audit models. Nevertheless, it is infeasible to exhaustively test for all possible combinations of factors that could lead to a model's failure. In this paper, we introduce a post-hoc method that utilizes deep reinforcement learning to explore and construct the landscape of failure modes in pre-trained discriminative and generative models. With the aid of limited human feedback, we then demonstrate how to restructure the failure landscape to be more desirable by moving away from the discovered failure modes. We empirically show the effectiveness of the proposed method across common Computer Vision, Natural Language Processing, and Vision-Language tasks.

Updated: 2024-06-11 10:45:41

标题: 失败是注定的，但可以被淡化：对大规模视觉和语言模型中不良行为的特征化和减轻

摘要: 在许多任务中表现出奇异优秀的大型深度神经网络中，我们也观察到了与准确性、社会偏见和与人类价值观一致性等相关的一些失败。因此，在部署这些模型之前，对工程师来说，对这种失败景观进行表征是至关重要的，以便进行调试，对立法机构来说，对模型进行审计也是必要的。然而，对可能导致模型失败的所有可能因素组合进行详尽测试是不可行的。在本文中，我们介绍了一种后期方法，利用深度强化学习来探索和构建预训练判别模型和生成模型的失败模式景观。通过有限的人类反馈的帮助，我们展示了如何通过远离发现的失败模式来重新构建更理想的失败景观。我们在常见的计算机视觉、自然语言处理和视觉-语言任务中实证展示了所提出方法的有效性。

更新时间: 2024-06-11 10:45:41

领域: cs.LG

下载: http://arxiv.org/abs/2406.07145v1

Identifiable Object-Centric Representation Learning via Probabilistic Slot Attention

Learning modular object-centric representations is crucial for systematic generalization. Existing methods show promising object-binding capabilities empirically, but theoretical identifiability guarantees remain relatively underdeveloped. Understanding when object-centric representations can theoretically be identified is crucial for scaling slot-based methods to high-dimensional images with correctness guarantees. To that end, we propose a probabilistic slot-attention algorithm that imposes an aggregate mixture prior over object-centric slot representations, thereby providing slot identifiability guarantees without supervision, up to an equivalence relation. We provide empirical verification of our theoretical identifiability result using both simple 2-dimensional data and high-resolution imaging datasets.

Updated: 2024-06-11 10:40:54

标题: 通过概率性插槽关注实现可识别的物体中心表示学习

摘要: 学习模块化的物体中心表示对系统性泛化至关重要。现有方法在实证上展现了有希望的物体绑定能力，但理论上的可识别性保证仍相对不够完备。理解何时可以在理论上识别物体中心表示对于将基于插槽方法扩展到具有正确性保证的高维图像至关重要。为此，我们提出了一种概率插槽注意力算法，通过在物体中心插槽表示上施加一个聚合混合先验，从而在等价关系下提供插槽可识别性保证，无需监督。我们使用简单的二维数据和高分辨率成像数据集对我们的理论可识别性结果进行了实证验证。

更新时间: 2024-06-11 10:40:54

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.07141v1

Cybersecurity in Critical Infrastructures: A Post-Quantum Cryptography Perspective

The machinery of industrial environments was connected to the Internet years ago with the aim of increasing its performance. However, this change made such environments vulnerable to cyber-attacks that can compromise their correct functioning, resulting in economic or social problems. Moreover, implementing cryptosystems in the communications between operational technology (OT) devices is a more challenging task than in information technology (IT) environments, since OT networks are generally composed of legacy elements characterized by low computational capabilities. Consequently, implementing cryptosystems in industrial communication networks faces a trade-off between the security of the communications and the amortization of the industrial infrastructure. Critical Infrastructure (CI) refers to the industries that provide key resources for daily social and economic development, e.g. electricity. Furthermore, a new threat to cybersecurity has arisen with the theoretical proposal of quantum computers, due to their potential ability to break state-of-the-art cryptography protocols, such as RSA or ECC. Many global agents have become aware that transitioning their secure communications to a quantum-secure paradigm is a priority that should be established before the arrival of fault-tolerant quantum computers. In this paper, we aim to describe the problems of applying post-quantum cryptography (PQC) to CI environments. To do so, we describe the requirements for these scenarios and how they differ from those of IT. We also introduce classical cryptography and how quantum computers pose a threat to such security protocols. Furthermore, we introduce state-of-the-art proposals of PQC protocols and present their characteristics. We conclude by discussing the problems of integrating PQC in industrial environments.

Updated: 2024-06-11 10:29:10

标题: 关键基础设施中的网络安全：后量子密码学的视角

摘要: 工业环境的设备早在几年前就连接到了互联网，目的是提高它们的性能。然而，这种变化使得这些环境容易受到网络攻击，可能会损害它们的正常运行，导致经济或社会问题。此外，在操作技术（OT）设备之间的通信中实施加密系统比在信息技术（IT）环境中更具挑战性，因为OT网络通常由低计算能力的传统元素组成。因此，在工业通信网络中实施加密系统面临着通信安全和工业基础设施摊销之间的权衡。关键基础设施（CI）指为日常社会和经济发展提供关键资源的行业，例如电力。此外，随着量子计算机的理论提议，网络安全面临了新的威胁，因为量子计算机有可能破解现有的RSA或ECC等密码协议。许多全球组织已意识到，在容错机制到来之前，将安全通信迁移到量子安全范式是一项优先任务。本文旨在描述将后量子密码学（PQC）应用于CI环境的问题。为此，我们描述了这些场景的要求以及它们与IT的不同之处。我们还介绍了经典密码学以及量子计算机对此类安全协议构成的威胁。此外，我们介绍了PQC协议的最新提议并展示其特点。最后，我们讨论了在工业环境中整合PQC的问题。

更新时间: 2024-06-11 10:29:10

领域: cs.CR,quant-ph

下载: http://arxiv.org/abs/2401.03780v2

Mining Frequent Structures in Conceptual Models

The problem of using structured methods to represent knowledge is well-known in conceptual modeling and has been studied for many years. It has been proven that adopting modeling patterns represents an effective structural method. Patterns are, indeed, generalizable recurrent structures that can be exploited as solutions to design problems. They aid in understanding and improving the process of creating models. The undeniable value of using patterns in conceptual modeling was demonstrated in several experimental studies. However, discovering patterns in conceptual models is widely recognized as a highly complex task and a systematic solution to pattern identification is currently lacking. In this paper, we propose a general approach to the problem of discovering frequent structures, as they occur in conceptual modeling languages. As proof of concept for our scientific contribution, we provide an implementation of the approach, by focusing on UML class diagrams, in particular OntoUML models. This implementation comprises an exploratory tool, which, through the combination of a frequent subgraph mining algorithm and graph manipulation techniques, can process multiple conceptual models and discover recurrent structures according to multiple criteria. The primary objective is to offer a support facility for language engineers. This can be employed to leverage both good and bad modeling practices, to evolve and maintain the conceptual modeling language, and to promote the reuse of encoded experience in designing better models with the given language.

Updated: 2024-06-11 10:24:02

标题: 在概念模型中挖掘频繁结构

摘要: 使用结构化方法表示知识的问题在概念建模中是众所周知的，并且已经被研究了很多年。已经证明采用建模模式代表了一种有效的结构方法。模式实际上是可利用作为设计问题解决方案的可推广的重复结构。它们有助于理解和改进创建模型的过程。在几项实验研究中展示了在概念建模中使用模式的无可否认的价值。然而，在概念模型中发现模式被广泛认为是一个非常复杂的任务，目前缺乏系统化的模式识别解决方案。在本文中，我们提出了一种发现概念建模语言中频繁结构问题的一般方法。作为我们科学贡献的概念验证，我们提供了一种方法的实现，重点放在UML类图上，特别是OntoUML模型。这一实现包括一种探索性工具，通过结合频繁子图挖掘算法和图操作技术，可以处理多个概念模型，并根据多个标准发现重复结构。主要目标是为语言工程师提供支持设施。这可以用来利用好坏建模实践，发展和维护概念建模语言，并促进在给定语言中设计更好模型时重用编码经验。

更新时间: 2024-06-11 10:24:02

领域: cs.AI

下载: http://arxiv.org/abs/2406.07129v1

Logical Distillation of Graph Neural Networks

We present a logic based interpretable model for learning on graphs and an algorithm to distill this model from a Graph Neural Network (GNN). Recent results have shown connections between the expressivity of GNNs and the two-variable fragment of first-order logic with counting quantifiers (C2). We introduce a decision-tree based model which leverages an extension of C2 to distill interpretable logical classifiers from GNNs. We test our approach on multiple GNN architectures. The distilled models are interpretable, succinct, and attain similar accuracy to the underlying GNN. Furthermore, when the ground truth is expressible in C2, our approach outperforms the GNN.
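
To make the role of counting quantifiers concrete, the sketch below evaluates one C2-style node formula, "x has at least k neighbors carrying a given label", on a small labeled graph. It only illustrates the kind of test a logical classifier can use at a decision node. The graph, the labels, and the rule are hypothetical, and the paper's model relies on an extension of C2 that is not reproduced here.

```python
def at_least_k_neighbors(graph, labels, node, k, label):
    """Counting quantifier: does `node` have >= k neighbors with `label`?"""
    return sum(1 for nbr in graph[node] if labels[nbr] == label) >= k


def classify(graph, labels, node):
    """Toy one-split rule: positive iff the node has at least two red neighbors."""
    return at_least_k_neighbors(graph, labels, node, k=2, label="red")


# Hypothetical star graph with center 0 and leaves 1, 2, 3.
graph = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
labels = {0: "blue", 1: "red", 2: "red", 3: "blue"}
predictions = {node: classify(graph, labels, node) for node in graph}
```

Only the center satisfies the rule here, and the rule itself can be read directly as a sentence, which is the sense in which such classifiers are interpretable.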

Updated: 2024-06-11 10:18:58

标题: 图神经网络的逻辑提炼

摘要: 我们提出了一种基于逻辑的可解释模型，用于在图上学习，并提出了一种从图神经网络（GNN）中提炼这种模型的算法。最近的研究结果表明，GNN的表达能力与一阶逻辑带计数量词（C2）的两变量片段存在联系。我们引入了一种基于决策树的模型，利用C2的扩展从GNN中提炼可解释的逻辑分类器。我们在多个GNN架构上测试了我们的方法。提炼出的模型具有可解释性、简洁性，并且达到与底层GNN类似的准确性。此外，当基础真相可以用C2表达时，我们的方法优于GNN。

更新时间: 2024-06-11 10:18:58

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.07126v1

MeGA: Merging Multiple Independently Trained Neural Networks Based on Genetic Algorithm

In this paper, we introduce a novel method, called MeGA, for merging the weights of multiple pre-trained neural networks using a genetic algorithm. Traditional techniques, such as weight averaging and ensemble methods, often fail to fully harness the capabilities of pre-trained networks. Our approach leverages a genetic algorithm with tournament selection, crossover, and mutation to optimize weight combinations, creating a more effective fusion. This technique allows the merged model to inherit advantageous features from both parent models, resulting in enhanced accuracy and robustness. Through experiments on the CIFAR-10 dataset, we demonstrate that our genetic algorithm-based weight merging method improves test accuracy compared to individual models and conventional methods. This approach provides a scalable solution for integrating multiple pre-trained networks across various deep learning applications. The code is available on GitHub at: https://github.com/YUNBLAK/MeGA-Merging-Multiple-Independently-Trained-Neural-Networks-Based-on-Genetic-Algorithm
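
The three operators named in the abstract (tournament selection, crossover, mutation) can be sketched on a toy problem. This is not the authors' implementation. As assumptions for illustration, each individual is a vector of per-weight mixing coefficients between two parent weight vectors, and fitness is the negative squared distance to a made-up target, standing in for validation accuracy.

```python
import random


def merge(alpha, w_a, w_b):
    """Blend two weight vectors with per-weight coefficients in [0, 1]."""
    return [a * x + (1 - a) * y for a, x, y in zip(alpha, w_a, w_b)]


def fitness(weights, target):
    """Toy stand-in for validation accuracy: higher is better."""
    return -sum((w - t) ** 2 for w, t in zip(weights, target))


def tournament(pop, scores, rng, k=3):
    """Pick the fittest of k randomly drawn individuals."""
    contenders = rng.sample(range(len(pop)), k)
    return pop[max(contenders, key=lambda i: scores[i])]


def best_of(pop, w_a, w_b, target):
    """Return (scores, best individual) for a population."""
    scores = [fitness(merge(ind, w_a, w_b), target) for ind in pop]
    return scores, pop[max(range(len(pop)), key=lambda i: scores[i])]


def evolve(w_a, w_b, target, pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    n = len(w_a)
    # Seed the population with both pure parents plus random blends.
    pop = [[1.0] * n, [0.0] * n]
    pop += [[rng.random() for _ in range(n)] for _ in range(pop_size - 2)]
    for _ in range(generations):
        scores, best = best_of(pop, w_a, w_b, target)
        children = [best[:]]  # elitism: the best individual always survives
        while len(children) < pop_size:
            p1 = tournament(pop, scores, rng)
            p2 = tournament(pop, scores, rng)
            cut = rng.randrange(1, n)      # single-point crossover
            child = p1[:cut] + p2[cut:]
            for i in range(n):             # Gaussian mutation, clipped to [0, 1]
                if rng.random() < 0.2:
                    child[i] = min(1.0, max(0.0, child[i] + rng.gauss(0, 0.1)))
            children.append(child)
        pop = children
    _, best = best_of(pop, w_a, w_b, target)
    return merge(best, w_a, w_b)


w_a = [1.0, 0.0, 2.0, -1.0]
w_b = [0.0, 1.0, 0.0, 1.0]
target = [0.5, 0.5, 1.0, 0.0]
merged = evolve(w_a, w_b, target)
```

Because both pure parents are in the initial population and the best individual is always carried forward, the merged result can never score worse than either parent under this toy fitness.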

Updated: 2024-06-11 10:17:44

标题: MeGA: 基于遗传算法的合并多个独立训练的神经网络

摘要: 在这篇论文中，我们介绍了一种使用遗传算法MeGA合并多个预训练神经网络权重的新方法。传统技术，如权重平均和集成方法，经常无法充分发挥预训练网络的能力。我们的方法利用了具有锦标赛选择、交叉和突变的遗传算法来优化权重组合，从而创建更有效的融合。这种技术使合并模型能够继承父模型的优势特征，从而提高准确性和鲁棒性。通过对CIFAR-10数据集的实验，我们证明了我们基于遗传算法的权重合并方法相比于单个模型和传统方法能够提高测试准确性。这种方法为在各种深度学习应用中整合多个预训练网络提供了可扩展的解决方案。Github链接为：https://github.com/YUNBLAK/MeGA-Merging-Multiple-Independently-Trained-Neural-Networks-Based-on-Genetic-Algorithm.

更新时间: 2024-06-11 10:17:44

领域: cs.NE,cs.AI,cs.LG

下载: http://arxiv.org/abs/2406.04607v2

CARACAS: vehiCular ArchitectuRe for detAiled Can Attacks Simulation

Modern vehicles are increasingly vulnerable to attacks that exploit network infrastructures, particularly the Controller Area Network (CAN) networks. To effectively counter such threats using contemporary tools like Intrusion Detection Systems (IDSs) based on data analysis and classification, large datasets of CAN messages become imperative. This paper delves into the feasibility of generating synthetic datasets by harnessing the modeling capabilities of simulation frameworks such as Simulink coupled with a robust representation of attack models to present CARACAS, a vehicular model, including component control via CAN messages and attack injection capabilities. CARACAS showcases the efficacy of this methodology, including a Battery Electric Vehicle (BEV) model, and focuses on attacks targeting torque control in two distinct scenarios.

Updated: 2024-06-11 10:16:55

标题: 加拉加斯：用于详细的CAN攻击模拟的车辆架构

摘要: 现代车辆越来越容易受到利用网络基础设施的攻击的威胁，特别是控制区域网络（CAN）网络。为了有效地应对这些威胁，使用基于数据分析和分类的当代工具如入侵检测系统（IDSs），大量的CAN消息数据集变得至关重要。本文探讨了通过利用仿真框架（如Simulink）的建模能力和攻击模型的健壮表示来生成合成数据集的可行性，提出了CARACAS，一个包括通过CAN消息进行组件控制和攻击注入能力的车辆模型。CARACAS展示了这种方法的有效性，包括一个电池电动车（BEV）模型，并专注于在两个不同场景中针对扭矩控制的攻击。

更新时间: 2024-06-11 10:16:55

领域: cs.CR,cs.AI,cs.LG

下载: http://arxiv.org/abs/2406.07125v1

Fast Controllable Diffusion Models for Undersampled MRI Reconstruction

Supervised deep learning methods have shown promise in undersampled Magnetic Resonance Imaging (MRI) reconstruction, but their requirement for paired data limits their generalizability to the diverse MRI acquisition parameters. Recently, unsupervised controllable generative diffusion models have been applied to undersampled MRI reconstruction, without paired data or model retraining for different MRI acquisitions. However, diffusion models are generally slow in sampling and state-of-the-art acceleration techniques can lead to sub-optimal results when directly applied to the controllable generation process. This study introduces a new algorithm called Predictor-Projector-Noisor (PPN), which enhances and accelerates controllable generation of diffusion models for undersampled MRI reconstruction. Our results demonstrate that PPN produces high-fidelity MR images that conform to undersampled k-space measurements with significantly shorter reconstruction time than other controllable sampling methods. In addition, the unsupervised PPN accelerated diffusion models are adaptable to different MRI acquisition parameters, making them more practical for clinical use than supervised learning techniques.

Updated: 2024-06-11 10:15:53

标题: 快速可控扩散模型用于欠采样MRI重建

摘要: 监督深度学习方法在欠采样磁共振成像（MRI）重建中表现出潜力，但它们对配对数据的要求限制了它们对不同MRI采集参数的泛化能力。最近，无监督可控生成扩散模型已被应用于欠采样MRI重建，无需配对数据或模型重新训练以适应不同的MRI采集。然而，扩散模型通常在采样速度上较慢，最先进的加速技术在直接应用于可控生成过程时可能导致次优结果。本研究介绍了一种名为预测器-投影器-去噪器（PPN）的新算法，它增强并加速了扩散模型对欠采样MRI重建的可控生成。我们的结果表明，PPN生成符合欠采样k空间测量的高保真度MR图像，并且重建时间显著短于其他可控采样方法。此外，无监督PPN加速的扩散模型适应不同的MRI采集参数，使其比监督学习技术更适用于临床使用。

更新时间: 2024-06-11 10:15:53

领域: eess.IV,cs.LG

下载: http://arxiv.org/abs/2311.12078v3

The statistical thermodynamics of generative diffusion models: Phase transitions, symmetry breaking and critical instability

Generative diffusion models have achieved spectacular performance in many areas of machine learning and generative modeling. While the fundamental ideas behind these models come from non-equilibrium physics, variational inference and stochastic calculus, in this paper we show that many aspects of these models can be understood using the tools of equilibrium statistical mechanics. Using this reformulation, we show that generative diffusion models undergo second-order phase transitions corresponding to symmetry breaking phenomena. We show that these phase-transitions are always in a mean-field universality class, as they are the result of a self-consistency condition in the generative dynamics. We argue that the critical instability that arises from the phase transitions lies at the heart of their generative capabilities, which are characterized by a set of mean-field critical exponents. Finally, we show that the dynamic equation of the generative process can be interpreted as a stochastic adiabatic transformation that minimizes the free energy while keeping the system in thermal equilibrium.

Updated: 2024-06-11 10:15:05

标题: 生成扩散模型的统计热力学：相变、对称性破缺和临界不稳定性

摘要: 生成扩散模型在许多机器学习和生成建模领域取得了惊人的表现。尽管这些模型的基本思想来自非平衡物理、变分推断和随机微积分，但本文表明，这些模型的许多方面可以用平衡统计力学的工具来理解。通过这种重新表述，我们展示了生成扩散模型经历了对称破缺现象对应的二阶相变。我们展示了这些相变总是处于平均场普适类中，因为它们是生成动态中自洽条件的结果。我们认为从相变中产生的临界不稳定性是它们生成能力的核心，这些能力通过一组平均场临界指数来表征。最后，我们展示了生成过程的动态方程可以解释为最小化自由能的随机绝热变换，同时保持系统处于热平衡状态。

更新时间: 2024-06-11 10:15:05

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2310.17467v3

CHARME: A chain-based reinforcement learning approach for the minor embedding problem

Quantum Annealing (QA) holds great potential for solving combinatorial optimization problems efficiently. However, the effectiveness of QA algorithms heavily relies on the embedding of problem instances, represented as logical graphs, into the quantum processing unit (QPU), whose topology takes the form of a limited-connectivity graph; this is known as the minor embedding problem. Existing methods for the minor embedding problem suffer from scalability issues when confronted with larger problem sizes. In this paper, we propose a novel approach utilizing Reinforcement Learning (RL) techniques to address the minor embedding problem, named CHARME. CHARME includes three key components: a Graph Neural Network (GNN) architecture for policy modeling, a state transition algorithm ensuring solution validity, and an order exploration strategy for effective training. Through comprehensive experiments on synthetic and real-world instances, we demonstrate the efficiency of our proposed order exploration strategy as well as our proposed RL framework, CHARME. In detail, CHARME yields superior solutions compared to fast embedding methods such as Minorminer and ATOM. Moreover, our method surpasses the OCT-based approach, known for its slower runtime but high-quality solutions, in several cases. In addition, our proposed exploration enhances the efficiency of the training of the CHARME framework by providing better solutions compared to the greedy strategy.
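
What makes a minor embedding valid can be stated as three checks: the chains of physical qubits are non-empty and disjoint, each chain is connected in the hardware graph, and every logical edge is realized by at least one hardware edge between the two chains. The sketch below implements these checks under the standard definition. It is not CHARME's state transition algorithm, and the four-qubit ring hardware graph is a hypothetical example.

```python
from collections import deque


def is_connected(nodes, adjacency):
    """Breadth-first check that `nodes` induce a connected subgraph."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adjacency.get(u, ()):
            if v in nodes and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == nodes


def is_valid_embedding(logical_edges, hardware_adj, chains):
    """Check a candidate minor embedding (logical node -> list of qubits)."""
    used = set()
    for chain in chains.values():
        if not chain or used & set(chain):  # chains must be non-empty and disjoint
            return False
        used |= set(chain)
    if not all(is_connected(chain, hardware_adj) for chain in chains.values()):
        return False
    for u, v in logical_edges:
        # Some qubit in chain u must be adjacent to some qubit in chain v.
        if not any(q in hardware_adj.get(p, ()) for p in chains[u] for q in chains[v]):
            return False
    return True


# Hypothetical hardware: a ring of four qubits 0-1-2-3-0.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
triangle = [("a", "b"), ("b", "c"), ("a", "c")]
good = is_valid_embedding(triangle, ring, {"a": [0], "b": [1], "c": [2, 3]})
bad = is_valid_embedding(triangle, ring, {"a": [0], "b": [1], "c": [2]})
```

The triangle fits on the ring only when node c is stretched into a two-qubit chain. With a single qubit for c, the edge between a and c has no hardware counterpart.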

Updated: 2024-06-11 10:12:10

标题: CHARME：一种基于链的强化学习方法用于次嵌入问题

摘要: 量子退火（QA）在高效解决组合优化问题方面具有巨大潜力。然而，QA算法的有效性严重依赖于将问题实例（表示为逻辑图）嵌入到量子单元处理器（QPU）中，其拓扑结构是一种有限连接图，被称为次嵌入问题。现有的次嵌入问题方法在面对更大的问题规模时存在可扩展性问题。本文提出了一种利用强化学习（RL）技术解决次嵌入问题的新方法，名为CHARME。CHARME包括三个关键组件：用于策略建模的图神经网络（GNN）架构，确保解决方案有效性的状态转换算法，以及用于有效训练的顺序探索策略。通过对合成和真实世界实例的综合实验，我们展示了我们提出的顺序探索策略以及我们提出的RL框架CHARME的效率。具体来说，与Minorminer和ATOM等快速嵌入方法相比，CHARME产生了更优越的解决方案。此外，我们的方法在几种情况下超越了以OCT为基础的方法，后者以较慢的运行时但高质量的解决方案而闻名。此外，我们提出的探索方法通过提供比贪婪策略更好的解决方案，提高了CHARME框架的训练效率。

更新时间: 2024-06-11 10:12:10

领域: cs.AI,cs.LG

下载: http://arxiv.org/abs/2406.07124v1

T2S-GPT: Dynamic Vector Quantization for Autoregressive Sign Language Production from Text

In this work, we propose a two-stage sign language production (SLP) paradigm that first encodes sign language sequences into discrete codes and then autoregressively generates sign language from text based on the learned codebook. However, existing vector quantization (VQ) methods use fixed-length encodings, overlooking the uneven information density in sign language, which leads to under-encoding of important regions and over-encoding of unimportant regions. To address this issue, we propose a novel dynamic vector quantization (DVA-VAE) model that can dynamically adjust the encoding length based on the information density in sign language to achieve accurate and compact encoding. Then, a GPT-like model learns to generate code sequences and their corresponding durations from spoken language text. Extensive experiments conducted on the PHOENIX14T dataset demonstrate the effectiveness of our proposed method. To promote sign language research, we propose a new large German sign language dataset, PHOENIX-News, which contains 486 hours of sign language videos, audio, and transcription texts. Experimental analysis on PHOENIX-News shows that the performance of our model can be further improved by increasing the size of the training data. Our project homepage is https://t2sgpt-demo.yinaoxiong.cn.

Updated: 2024-06-11 10:06:53

标题: T2S-GPT：用于文本自回归手语生成的动态向量量化

摘要: 在这项工作中，我们提出了一个分为两个阶段的手语生成（SLP）范式，首先将手语序列编码为离散代码，然后基于学习到的码书从文本自回归生成手语。然而，现有的矢量量化（VQ）方法是固定长度的编码，忽视了手语中信息密度不均匀的问题，导致重要区域的欠编码和不重要区域的过度编码。为了解决这个问题，我们提出了一种新颖的动态矢量量化（DVA-VAE）模型，可以根据手语中的信息密度动态调整编码长度，实现准确和紧凑的编码。然后，一个类似GPT的模型学习从口语文本生成代码序列及其对应的持续时间。对PHOENIX14T数据集进行的大量实验表明了我们提出方法的有效性。为了推动手语研究，我们提出了一个新的大规模德语手语数据集PHOENIX-News，其中包含486小时的手语视频、音频和转录文本。对PHOENIX-News的实验分析显示，通过增加训练数据的规模，我们的模型的性能可以进一步提高。我们的项目主页为https://t2sgpt-demo.yinaoxiong.cn。

更新时间: 2024-06-11 10:06:53

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2406.07119v1

Augmenting Offline RL with Unlabeled Data

Recent advancements in offline Reinforcement Learning (Offline RL) have led to an increased focus on methods based on conservative policy updates to address the Out-of-Distribution (OOD) issue. These methods typically involve adding behavior regularization or modifying the critic learning objective, focusing primarily on states or actions with substantial dataset support. However, we challenge this prevailing notion by asserting that the absence of an action or state from a dataset does not necessarily imply its suboptimality. In this paper, we propose a novel approach to tackle the OOD problem. We introduce an offline RL teacher-student framework, complemented by a policy similarity measure. This framework enables the student policy to gain insights not only from the offline RL dataset but also from the knowledge transferred by a teacher policy. The teacher policy is trained using another dataset consisting of state-action pairs, which can be viewed as practical domain knowledge acquired without direct interaction with the environment. We believe this additional knowledge is key to effectively solving the OOD issue. This research represents a significant advancement in integrating a teacher-student network into the actor-critic framework, opening new avenues for studies on knowledge transfer in offline RL and effectively addressing the OOD challenge.

Updated: 2024-06-11 10:02:07

标题: 用未标记数据增强离线强化学习

摘要: 最近离线强化学习（Offline RL）方面的进展使得基于保守策略更新的方法来解决分布外（OOD）问题备受关注。这些方法通常涉及添加行为规范化或修改评论家学习目标，主要关注具有实质性数据支持的状态或行动。然而，我们挑战这一普遍观念，主张数据集中缺少的动作或状态并不一定意味着其次优。在本文中，我们提出了一种新颖的方法来解决OOD问题。我们引入了一个离线RL师生框架，并辅以策略相似度度量。该框架使得学生策略不仅可以从离线RL数据集中获得见解，还可以从师生策略传递的知识中获得见解。师生策略是使用另一个包含状态-动作对的数据集进行训练的，这可以被视为在没有直接与环境交互的情况下获得的实际领域知识。我们相信这种额外的知识对有效解决OOD问题至关重要。这项研究代表着在将师生网络整合到演员-评论家框架中取得重大进展，为离线RL中的知识转移研究开辟了新的研究途径，并有效应对OOD挑战。

更新时间: 2024-06-11 10:02:07

领域: cs.AI,cs.LG

下载: http://arxiv.org/abs/2406.07117v1

Exploring the Potential of Large Language Models for Improving Digital Forensic Investigation Efficiency

The growing number of cases that require digital forensic analysis raises concerns about the ability of law enforcement to conduct investigations promptly. Consequently, this paper delves into the potential and effectiveness of integrating Large Language Models (LLMs) into digital forensic investigation to address these challenges. A comprehensive literature review is carried out, encompassing existing digital forensic models, tools, LLMs, deep learning techniques, and the use of LLMs in investigations. The review identifies current challenges within existing digital forensic processes and explores both the obstacles and possibilities of incorporating LLMs. In conclusion, the study asserts that the adoption of LLMs in digital forensics, with appropriate constraints, has the potential to improve investigation efficiency, improve traceability, and alleviate technical and judicial barriers faced by law enforcement entities.

Updated: 2024-06-11 10:01:05

标题: 探索大型语言模型在提高数字取证调查效率方面的潜力

摘要: 随着需要数字取证分析的案件数量不断增加，人们对执法机构能够及时进行调查的能力提出了担忧。因此，本文深入探讨了将大型语言模型（LLMs）整合到数字取证调查中以解决这些挑战的潜力和有效性。进行了全面的文献综述，涵盖了现有数字取证模型、工具、LLMs、深度学习技术以及LLMs在调查中的应用。综述确定了现有数字取证流程中的当前挑战，并探讨了整合LLMs的障碍和可能性。最后，研究指出，在数字取证中采用LLMs，并加以适当限制，有可能提高调查效率，改善可追溯性，减轻执法实体面临的技术和司法障碍。

更新时间: 2024-06-11 10:01:05

领域: cs.CR,cs.AI

下载: http://arxiv.org/abs/2402.19366v2

Advancing Tool-Augmented Large Language Models: Integrating Insights from Errors in Inference Trees

Tool-augmented large language models (LLMs) leverage tools, often in the form of APIs, to enhance their reasoning capabilities on complex tasks, thus taking on the role of intelligent agents interacting with the real world. The recently introduced ToolLLaMA model by Qin et al. [2024] utilizes the depth-first search-based decision tree (DFSDT) method for reasoning with $16000+$ real-world APIs, which effectively improves the planning and inferencing performance of tool-augmented LLMs compared to traditional chain reasoning approaches. However, their approach only employs successful paths from decision trees (also called inference trees) for supervised fine-tuning (SFT) during training, which does not fully exploit the advantages of the tree of thought. In this study, we propose an inference trajectory optimization framework based on the preference data extracted from decision trees to address this limitation. We first introduce a novel method for constructing preference data from the tree of thought, capitalizing on the failed explorations previously overlooked in the trees. Specifically, we generate an effective step-wise preference dataset, named ToolPreference, for tool use based on the ToolBench dataset. In the subsequent training phase, we first fine-tune the LLM with tool-usage expert trajectories and then use these step-wise preference pairs for direct preference optimization (DPO) to update the policy of the LLM, resulting in our ToolPrefer-LLaMA (TP-LLaMA) model. Our experiments demonstrate that by obtaining insights from errors in inference trees, TP-LLaMA significantly outperforms the baselines across almost all test scenarios by a large margin and exhibits better generalization capabilities with unseen APIs. At the same time, TP-LLaMA has also demonstrated superior reasoning efficiency compared to the baselines, making it more suitable for complex tool-usage reasoning tasks.
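
The direct preference optimization (DPO) step mentioned above can be illustrated with the standard DPO loss for a single preference pair. This is a sketch of the generic objective, not the TP-LLaMA training code; the function name, the scalar log-probability inputs, and the default `beta=0.1` are assumptions for the example.

```python
import math


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    Each argument is the summed log-probability of a response (here, a
    reasoning step) under the policy or the frozen reference model.
    """
    margin = beta * (
        (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    )
    # -log(sigmoid(margin)) written as a numerically stable softplus(-margin).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# When the policy equals the reference, the loss is log(2).
neutral = dpo_loss(-1.0, -1.0, -1.0, -1.0)
# When the policy favours the chosen step more than the reference does,
# the loss drops below log(2).
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
```

In a step-wise setting such as the one described in the abstract, a successful step and a failed step branching from the same tree node would play the roles of the chosen and rejected responses.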

Updated: 2024-06-11 10:00:18

标题: 推进工具增强型大型语言模型：整合推理树中的错误洞察

摘要: 工具增强的大型语言模型(LLMs)利用工具，通常以API的形式，增强其在复杂任务上的推理能力，从而扮演智能代理与现实世界互动的角色。秦等人[2024]最近推出的ToolLLaMA模型利用基于深度优先搜索的决策树(DFSDT)方法进行推理，使用$16000+个真实世界API，相对于传统的链式推理方法，有效提高了工具增强的LLMs的规划和推理性能。然而，他们的方法在训练过程中仅使用决策树(也称为推理树)中的成功路径进行监督微调(SFT)，未充分利用思维树的优势。在本研究中，我们提出了一个基于从决策树提取的偏好数据的推理轨迹优化框架，以解决这一局限性。我们首先介绍了一种新颖的方法，从思维树中构建偏好数据，利用先前在树中被忽视的失败探索。具体来说，我们生成了一组有效的逐步偏好数据集，名为ToolPreference，用于基于ToolBench数据集的工具使用。在随后的训练阶段，我们首先通过工具使用专家轨迹对LLM进行微调，然后使用这些逐步偏好对来直接偏好优化(DPO)更新LLM的策略，形成我们的ToolPrefer-LLaMA(TP-LLaMA)模型。我们的实验表明，通过从推理树中的错误中获得见解，TP-LLaMA在几乎所有测试场景中明显优于基线，并展示了更好的与未见API的泛化能力。与基线相比，TP-LLaMA还表现出更高的推理效率，使其更适用于复杂的工具使用推理任务。

更新时间: 2024-06-11 10:00:18

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2406.07115v1

Post-hoc Orthogonalization for Mitigation of Protected Feature Bias in CXR Embeddings

Purpose: To analyze and remove protected feature effects in chest radiograph embeddings of deep learning models. Methods: An orthogonalization is utilized to remove the influence of protected features (e.g., age, sex, race) in CXR embeddings, ensuring feature-independent results. To validate the efficacy of the approach, we retrospectively study the MIMIC and CheXpert datasets using three pre-trained models, namely a supervised contrastive, a self-supervised contrastive, and a baseline classifier model. Our statistical analysis involves comparing the original versus the orthogonalized embeddings by estimating protected feature influences and evaluating the ability to predict race, age, or sex using the two types of embeddings. Results: Our experiments reveal a significant influence of protected features on predictions of pathologies. Applying orthogonalization removes these feature effects. Apart from removing any influence on pathology classification, while maintaining competitive predictive performance, orthogonalized embeddings further make it infeasible to directly predict protected attributes and mitigate subgroup disparities. Conclusion: The presented work demonstrates the successful application and evaluation of the orthogonalization technique in the domain of chest X-ray image classification.
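
One common way to realize such an orthogonalization is to regress the embeddings on the protected features and keep only the residuals. The sketch below assumes this linear least-squares variant, which may differ from the exact procedure in the paper; the function name and the synthetic data are hypothetical.

```python
import numpy as np


def orthogonalize(embeddings, protected):
    """Remove the linear effect of protected features from embeddings.

    embeddings: array of shape (n, d)
    protected:  array of shape (n, p), e.g. encoded age, sex, race
    Returns centered embeddings that are orthogonal to the centered
    protected features.
    """
    E = embeddings - embeddings.mean(axis=0)
    P = protected - protected.mean(axis=0)
    # Least-squares fit of E on P; the residual is the projection of E
    # onto the orthogonal complement of the column space of P.
    coef, *_ = np.linalg.lstsq(P, E, rcond=None)
    return E - P @ coef


# Synthetic example (hypothetical data): embeddings that leak 3 features.
rng = np.random.default_rng(0)
protected_demo = rng.normal(size=(200, 3))
embeddings_demo = protected_demo @ rng.normal(size=(3, 16)) + rng.normal(size=(200, 16))
clean_demo = orthogonalize(embeddings_demo, protected_demo)
```

After this step, a linear model can no longer predict the protected features from the embeddings, although nonlinear dependence is not addressed by this simple variant.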

Updated: 2024-06-11 09:59:38

标题: 后验正交化以减轻胸部X射线嵌入中受保护特征偏差

摘要: 目的：分析和消除深度学习模型胸部X射线嵌入中受保护特征的影响。方法：利用正交化方法来消除受保护特征（如年龄、性别、种族）在胸部X射线嵌入中的影响，确保特征独立的结果。为了验证该方法的有效性，我们回顾性地研究了MIMIC和CheXpert数据集，使用三种预训练模型，分别是有监督对比、自监督对比和基线分类器模型。我们的统计分析涉及通过估计受保护特征影响并评估使用两种类型嵌入来预测种族、年龄或性别的能力，比较原始与正交化的嵌入。结果：我们的实验揭示了受保护特征对病理预测的显著影响。应用正交化方法消除了这些特征影响。除了消除对病理分类的任何影响，同时保持竞争性预测性能外，正交化的嵌入进一步使得直接预测受保护属性和减轻亚组差异变得不可行。结论：所呈现的工作展示了在胸部X射线图像分类领域中正交化技术的成功应用和评估。

更新时间: 2024-06-11 09:59:38

领域: cs.LG,cs.CY,stat.ML

下载: http://arxiv.org/abs/2311.01349v2

Unlocking the Potential of the Metaverse for Innovative and Immersive Digital Care

The Metaverse, a persistent, immersive virtual environment, has immense potential to revolutionize healthcare by transforming patient care, medical education, and research. This paper explores the applications, benefits, and challenges associated with this transformative technology, highlighting its ability to improve patient engagement, communication, access to information, and health outcomes. The paper also examines how the analysis of Metaverse data using machine learning techniques can unlock insights to further enhance healthcare applications. The discussion summarizes key findings, analyzes the significance and practical implications of Metaverse integration, and identifies areas for future research. It underscores the role of major tech companies in developing Metaverse-based solutions and the importance of addressing emerging opportunities and challenges to unlock the transformative potential of this technology in healthcare. The paper concludes by emphasizing the need for collaboration between stakeholders to ensure the ethical and effective implementation of these technologies, ultimately leading to a more accessible, personalized, and efficient healthcare system.

Updated: 2024-06-11 09:58:27

标题: 解锁元宇宙潜力，为创新和沉浸式数字护理提供可能性

摘要: 元宇宙是一个持久的、沉浸式的虚拟环境，具有革命性的潜力，可以通过改变患者护理、医学教育和研究来改变医疗保健。本文探讨了与这种变革性技术相关的应用、好处和挑战，突出了它提高患者参与度、沟通、获取信息和健康结果的能力。本文还分析了如何利用机器学习技术对元宇宙数据进行分析，以解锁进一步增强医疗应用的见解。讨论总结了关键发现，分析了元宇宙整合的重要性和实际意义，并确定了未来研究的领域。强调了主要科技公司在开发基于元宇宙的解决方案中的作用，以及解锁这一技术在医疗保健中的变革潜力的重要性。本文最后强调了各利益相关方之间合作的必要性，以确保这些技术的道德和有效实施，最终实现更具可访问性、个性化和高效的医疗保健系统。

更新时间: 2024-06-11 09:58:27

领域: cs.CY,cs.AI,cs.IR,68T01a,I.2.0; I.2.1

下载: http://arxiv.org/abs/2406.07114v1

Using Reinforcement Learning for the Three-Dimensional Loading Capacitated Vehicle Routing Problem

Heavy goods vehicles are vital backbones of the supply chain delivery system but also contribute significantly to carbon emissions with only 60% loading efficiency in the United Kingdom. Collaborative vehicle routing has been proposed as a solution to increase efficiency, but challenges remain to make this a possibility. One key challenge is the efficient computation of viable solutions for co-loading and routing. Current operations research methods suffer from non-linear scaling with increasing problem size and are therefore bound to limited geographic areas to compute results in time for day-to-day operations. This only allows for local optima in routing and leaves global optimisation potential untouched. We develop a reinforcement learning model to solve the three-dimensional loading capacitated vehicle routing problem in approximately linear time. While this problem has been studied extensively in operations research, no publications on solving it with reinforcement learning exist. We demonstrate the favourable scaling of our reinforcement learning model and benchmark our routing performance against state-of-the-art methods. The model performs within an average gap of 3.83% to 8.10% compared to established methods. Our model not only represents a promising first step towards large-scale logistics optimisation with reinforcement learning but also lays the foundation for this research stream. GitHub: https://github.com/if-loops/3L-CVRP

Updated: 2024-06-11 09:57:23

标题: 使用强化学习解决三维装载容车辆路径问题

摘要: 重型货车是供应链交付系统的重要支柱，但在英国只有60%的装载效率，同时还会对碳排放做出重大贡献。协作车辆路径规划被提出作为提高效率的解决方案，但仍存在挑战使其成为可能。一个关键挑战是对于共同装载和路径规划的可行解决方案的高效计算。目前的运营研究方法在问题规模增大时存在非线性扩展，并且受限于有限的地理区域来及时计算结果以应对日常运营。这只允许路径规划中的局部最优解，并且未能触及全局优化潜力。我们开发了一个强化学习模型来解决三维装载容量车辆路径规划问题，其计算时间大致呈线性增长。虽然这个问题在运营研究中得到了广泛研究，但没有关于用强化学习解决它的出版物。我们展示了我们的强化学习模型具有良好的扩展性，并将我们的路径规划性能与最先进的方法进行了基准测试。与已建立的方法相比，该模型的表现在平均差距为3.83%至8.10%之间。我们的模型不仅代表了朝着利用强化学习进行大规模物流优化的有希望的第一步，还为这一研究领域奠定了基础。GitHub: https://github.com/if-loops/3L-CVRP

更新时间: 2024-06-11 09:57:23

领域: cs.LG,math.OC

下载: http://arxiv.org/abs/2307.12136v2

Beyond Bare Queries: Open-Vocabulary Object Retrieval with 3D Scene Graph

Locating objects referred to in natural language poses a significant challenge for autonomous agents. Existing CLIP-based open-vocabulary methods successfully perform 3D object retrieval with simple (bare) queries but cannot cope with ambiguous descriptions that demand an understanding of object relations. To tackle this problem, we propose a modular approach called BBQ (Beyond Bare Queries), which constructs a 3D scene spatial graph representation with metric edges and uses a large language model as a human-to-agent interface through our deductive scene reasoning algorithm. BBQ employs robust DINO-powered associations to form 3D objects, an advanced raycasting algorithm to project them to 2D, and a vision-language model to describe them as graph nodes. On the Replica and ScanNet datasets, we show that the designed method accurately constructs 3D object-centric maps. We demonstrate that their quality is leading among zero-shot methods for open-vocabulary 3D semantic segmentation. We also show that leveraging spatial relations is especially effective for scenes containing multiple entities of the same semantic class. On the Sr3D and Nr3D benchmarks, our deductive approach demonstrates a significant improvement over other state-of-the-art methods in retrieving objects by complex queries. Thanks to our design solutions, we achieve a processing speed approximately three times faster than the closest analog. This promising performance makes our approach suitable for applied intelligent robotics projects. We make the code publicly available at linukc.github.io/bbq/.

Updated: 2024-06-11 09:57:04

标题: 超越简单查询：使用三维场景图进行开放词汇对象检索

摘要: 将自然语言中提到的对象定位是自主代理面临的一个重要挑战。现有基于CLIP的开放词汇方法成功地使用简单（裸）查询执行3D对象检索，但无法处理需要理解对象关系的模糊描述。为了解决这个问题，我们提出了一种模块化方法，称为BBQ（超越裸查询），它利用度量边构建3D场景空间图表示，并利用大型语言模型作为人到代理接口，通过我们的推理场景推理算法。BBQ利用强大的DINO支持的关联形成3D对象，采用先进的射线投射算法将它们投影到2D，并利用视觉语言模型将它们描述为图节点。在Replica和ScanNet数据集上，我们展示了设计方法准确构建3D对象为中心的地图。我们已经证明，它们的质量在开放词汇的3D语义分割中处于领先地位，比其他零样本方法更好。此外，我们表明，利用空间关系对包含多个相同语义类别实体的场景特别有效。在Sr3D和Nr3D基准上，我们的推理方法显示出显著改进，相对于其他最先进的方法，使得通过复杂查询检索对象成为可能。考虑到我们的设计解决方案，我们实现了处理速度约快于最接近的模拟品3倍的速度。这种有希望的性能使我们的方法能够在应用智能机器人项目中使用。我们将代码公开发布在linukc.github.io/bbq/。

更新时间: 2024-06-11 09:57:04

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2406.07113v1

Trainwreck: A damaging adversarial attack on image classifiers

Adversarial attacks are an important security concern for computer vision (CV). As CV models are becoming increasingly valuable assets in applied practice, disrupting them is emerging as a form of economic sabotage. This paper opens up the exploration of damaging adversarial attacks (DAAs) that seek to damage target CV models. DAAs are formalized by defining the threat model, the cost function DAAs maximize, and setting three requirements for success: potency, stealth, and customizability. As a pioneer DAA, this paper proposes Trainwreck, a train-time attack that conflates the data of similar classes in the training data using stealthy ($\epsilon \leq 8/255$) class-pair universal perturbations obtained from a surrogate model. Trainwreck is a black-box, transferable attack: it requires no knowledge of the target architecture, and a single poisoned dataset degrades the performance of any model trained on it. The experimental evaluation on CIFAR-10 and CIFAR-100 and various model architectures (EfficientNetV2, ResNeXt-101, and a finetuned ViT-L-16) demonstrates Trainwreck's efficiency. Trainwreck achieves similar or better potency compared to the data poisoning state of the art and is fully customizable by the poison rate parameter. Finally, data redundancy with hashing is identified as a reliable defense against Trainwreck or similar DAAs. The code is available at https://github.com/JanZahalka/trainwreck.
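
The hashing defense mentioned at the end of the abstract can be sketched as follows: keep cryptographic hashes of a trusted copy of the training data and compare them against the data actually used for training. Even a perturbation within the small budget above changes the stored bytes and therefore the hash. The helper names and the in-memory toy dataset are assumptions for illustration, not the paper's implementation.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 hex digest of a data item."""
    return hashlib.sha256(data).hexdigest()


def find_tampered(trusted_hashes, current_items):
    """Return the keys whose content no longer matches the trusted hash.

    trusted_hashes: dict mapping item name to its trusted SHA-256 digest
    current_items:  dict mapping item name to the bytes about to be used
    Items missing from `trusted_hashes` are reported as tampered.
    """
    return sorted(
        name
        for name, blob in current_items.items()
        if trusted_hashes.get(name) != fingerprint(blob)
    )


# Toy dataset (hypothetical): three tiny "images" stored as raw bytes.
original = {"img_1": bytes([10, 20, 30]), "img_2": bytes([40, 50, 60]), "img_3": bytes([70, 80, 90])}
trusted = {name: fingerprint(blob) for name, blob in original.items()}

# Simulate a poisoned copy in which one pixel of img_2 changed by one level.
poisoned = dict(original)
poisoned["img_2"] = bytes([40, 51, 60])
flagged = find_tampered(trusted, poisoned)
```

The check only helps if the trusted hashes are stored somewhere the attacker cannot modify, which is where the redundancy comes in.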

Updated: 2024-06-11 09:53:51

标题: Trainwreck: 对图像分类器造成破坏性的对抗性攻击

摘要: 对抗性攻击是计算机视觉（CV）中一个重要的安全问题。随着CV模型在应用实践中变得越来越有价值，破坏它们正在成为一种经济破坏的形式。本文开展了对破坏性对抗性攻击（DAAs）的探索，旨在损害目标CV模型。DAAs通过定义威胁模型，最大化的成本函数以及设置三个成功所需的要求（效力、隐秘性和可定制性）来形式化。作为一种先驱性的DAA，本文提出了Trainwreck，一种在训练时攻击，利用从替代模型获得的隐秘（$\epsilon \leq 8/255$）类对通用扰动混淆训练数据中相似类别的数据。Trainwreck是一种黑盒、可传递的攻击：它不需要对目标架构有任何了解，一个被污染的数据集就会降低任何在其上训练的模型的性能。对CIFAR-10和CIFAR-100以及各种模型架构（EfficientNetV2、ResNeXt-101和finetuned ViT-L-16）的实验评估证明了Trainwreck的效率。Trainwreck相比于数据污染的最新技术具有类似或更好的效力，并且可以通过毒害率参数进行完全定制。最后，通过哈希技术实现数据冗余被确定为一种可靠的防御措施，可以抵御Trainwreck或类似的DAAs。代码可在https://github.com/JanZahalka/trainwreck获取。

更新时间: 2024-06-11 09:53:51

领域: cs.CV,cs.CR,cs.LG

下载: http://arxiv.org/abs/2311.14772v2

Tag and correct: high precision post-editing approach to correction of speech recognition errors

This paper presents a new approach to the problem of correcting speech recognition errors by means of post-editing. It consists of using a neural sequence tagger that learns how to correct an ASR (Automatic Speech Recognition) hypothesis word by word and a corrector module that applies corrections returned by the tagger. The proposed solution is applicable to any ASR system, regardless of its architecture, and provides high-precision control over errors being corrected. This is especially crucial in production environments, where avoiding the introduction of new mistakes by the error correction model may be more important than the net gain in overall results. The results show that the performance of the proposed error correction models is comparable with previous approaches while requiring much smaller resources to train, which makes it suitable for industrial applications, where both inference latency and training times are critical factors that limit the use of other techniques.
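
The tagger-plus-corrector split can be pictured with a small corrector that applies one edit tag per hypothesis word. The tag vocabulary used here (`KEEP`, `DELETE`, `REPLACE_<word>`) is a hypothetical example and not necessarily the tag set used in the paper; the tags themselves would come from the neural sequence tagger, which is not shown.

```python
def apply_tags(words, tags):
    """Apply per-word edit tags to an ASR hypothesis.

    words: list of hypothesis words
    tags:  list of tags, one per word (hypothetical tag set:
           "KEEP", "DELETE", or "REPLACE_<new word>")
    Unknown tags leave the word untouched, so the corrector never
    introduces an edit it does not understand.
    """
    if len(words) != len(tags):
        raise ValueError("need exactly one tag per word")
    corrected = []
    for word, tag in zip(words, tags):
        if tag == "DELETE":
            continue
        if tag.startswith("REPLACE_"):
            corrected.append(tag[len("REPLACE_"):])
        else:
            corrected.append(word)
    return corrected


hypothesis = "i want to by a a ticket".split()
predicted_tags = ["KEEP", "KEEP", "KEEP", "REPLACE_buy", "KEEP", "DELETE", "KEEP"]
fixed = " ".join(apply_tags(hypothesis, predicted_tags))
```

Defaulting to the original word for any unrecognized tag reflects the precision-first goal stated in the abstract: leaving an error in place is treated as less harmful than introducing a new one.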

Updated: 2024-06-11 09:52:33

标题: 标记和校正：高精度后编辑方法用于纠正语音识别错误

摘要: 本文提出了一种新的方法来通过后编辑来纠正语音识别错误的问题。该方法包括使用一个神经序列标注器，该标注器学习如何逐字地纠正ASR（自动语音识别）假设，以及一个应用标注器返回的纠正的校正模块。所提出的解决方案适用于任何ASR系统，无论其架构如何，并提供对正在纠正的错误具有高精度控制。这在生产环境中尤为关键，在这种环境中，通过错误校正模型避免引入新错误可能比整体结果的净增益更重要。结果表明，所提出的错误校正模型的性能与先前的方法相当，但训练所需的资源要小得多，这使其适用于工业应用，其中推断延迟和训练时间是限制其他技术使用的关键因素。

更新时间: 2024-06-11 09:52:33

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.07589v1

Agnostic Sharpness-Aware Minimization

Sharpness-aware minimization (SAM) has been instrumental in improving deep neural network training by minimizing both the training loss and the sharpness of the loss landscape, leading the model into flatter minima that are associated with better generalization properties. In another aspect, Model-Agnostic Meta-Learning (MAML) is a framework designed to improve the adaptability of models. MAML optimizes a set of meta-models that are specifically tailored for quick adaptation to multiple tasks with minimal fine-tuning steps and can generalize well with limited data. In this work, we explore the connection between SAM and MAML, particularly in terms of enhancing model generalization. We introduce Agnostic-SAM, a novel approach that combines the principles of both SAM and MAML. Agnostic-SAM adapts the core idea of SAM by optimizing the model towards wider local minima using training data, while concurrently maintaining low loss values on validation data. By doing so, it seeks flatter minima that are not only robust to small perturbations but also less vulnerable to data distributional shift problems. Our experimental results demonstrate that Agnostic-SAM significantly improves generalization over baselines across a range of datasets and under challenging conditions such as noisy labels and data limitation.
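
As background, a single step of plain SAM can be sketched as below: take the gradient, move to the approximate worst-case point within a radius `rho`, and descend using the gradient computed there. This shows standard SAM only; the validation-loss coupling that Agnostic-SAM adds is not reproduced. The function names, the toy quadratic loss, and the hyperparameter values are assumptions for illustration.

```python
import numpy as np


def sam_step(w, grad_fn, lr=0.05, rho=0.05):
    """One step of standard sharpness-aware minimization.

    w:       current parameter vector
    grad_fn: function returning the loss gradient at a given point
    lr:      learning rate
    rho:     radius of the neighbourhood used to probe sharpness
    """
    g = grad_fn(w)
    # First-order approximation of the worst-case perturbation in an L2 ball.
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # Descend using the gradient evaluated at the perturbed point.
    return w - lr * grad_fn(w + eps)


# Toy quadratic loss (hypothetical) with one sharp and one flat direction.
A = np.diag([10.0, 0.1])


def toy_loss(w):
    return 0.5 * float(w @ A @ w)


def toy_grad(w):
    return A @ w


w_start = np.array([1.0, 1.0])
w_end = w_start.copy()
for _ in range(100):
    w_end = sam_step(w_end, toy_grad)
```

Each step costs two gradient evaluations, which is the usual overhead of SAM relative to ordinary gradient descent.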

Updated: 2024-06-11 09:49:00

标题: 不确定的锐度感知最小化

摘要: 锐度感知最小化（SAM）在改善深度神经网络训练方面发挥了关键作用，通过最小化训练损失和损失景观的锐度，将模型引导至更平坦的极小值，这些极小值与更好的泛化特性相关联。在另一个方面，无关模型元学习（MAML）是一个旨在提高模型适应性的框架。MAML优化了一组专门为快速适应多个任务而设计的元模型，最小化微调步骤，并且可以在有限数据下进行良好的泛化。在本研究中，我们探讨了SAM和MAML之间的联系，特别是在增强模型泛化方面。我们介绍了无关SAM，这是一种结合了SAM和MAML原则的新方法。无关SAM通过使用训练数据优化模型朝着更宽的局部极小值发展的核心思想，同时在验证数据上保持较低的损失值。通过这样做，它寻求更平坦的极小值，不仅对小扰动具有鲁棒性，而且对数据分布转移问题更不易受到影响。我们的实验结果表明，无关SAM在各种数据集和具有挑战性条件下（如嘈杂标签和数据限制）显著提高了泛化性能。

更新时间: 2024-06-11 09:49:00

领域: cs.LG

下载: http://arxiv.org/abs/2406.07107v1

MR-RawNet: Speaker verification system with multiple temporal resolutions for variable duration utterances using raw waveforms

In speaker verification systems, the utilization of short utterances presents a persistent challenge, leading to performance degradation primarily due to insufficient phonetic information to characterize the speakers. To overcome this obstacle, we propose a novel structure, MR-RawNet, designed to enhance the robustness of speaker verification systems against variable duration utterances using raw waveforms. MR-RawNet extracts time-frequency representations from raw waveforms via a multi-resolution feature extractor that optimally adjusts both temporal and spectral resolutions simultaneously. Furthermore, we apply a multi-resolution attention block that focuses on diverse and extensive temporal contexts, ensuring robustness against changes in utterance length. The experimental results, obtained on the VoxCeleb1 dataset, demonstrate that MR-RawNet exhibits superior performance in handling utterances of variable duration compared to other raw waveform-based systems.

Updated: 2024-06-11 09:42:47

标题: MR-RawNet：使用原始波形实现变长话语的多个时间分辨率的说话人验证系统

摘要: 在说话者验证系统中，利用短语言表达产生了持续的挑战，主要是由于缺乏足够的语音信息来表征说话者而导致性能下降。为了克服这一障碍，我们提出了一种新颖的结构，MR-RawNet，旨在通过使用原始波形增强说话者验证系统对可变长度言语的鲁棒性。MR-RawNet通过多分辨率特征提取器从原始波形中提取时频表示，同时优化调整时间和频谱分辨率。此外，我们应用了一个多分辨率注意力块，专注于不同和广泛的时间上下文，确保对言语长度变化的鲁棒性。在VoxCeleb1数据集上进行的实验结果表明，与其他基于原始波形的系统相比，MR-RawNet在处理可变长度言语时表现出更优异的性能。

更新时间: 2024-06-11 09:42:47

领域: eess.AS,cs.AI

下载: http://arxiv.org/abs/2406.07103v1

D-GRIL: End-to-End Topological Learning with 2-parameter Persistence

End-to-end topological learning using 1-parameter persistence is well-known. We show that the framework can be enhanced using 2-parameter persistence by adopting a recently introduced 2-parameter persistence based vectorization technique called GRIL. We establish a theoretical foundation for differentiating GRIL, producing D-GRIL. We show that D-GRIL can be used to learn a bifiltration function on standard benchmark graph datasets. Further, we show that this framework can be applied in the context of bio-activity prediction in drug discovery.

Updated: 2024-06-11 09:42:03

标题: D-GRIL：具有2参数持久性的端到端拓扑学习

摘要: 利用1参数持久性进行端到端拓扑学习是众所周知的。我们展示了通过采用最近引入的基于2参数持久性的矢量化技术GRIL，可以增强这一框架。我们建立了区分GRIL产生D-GRIL的理论基础。我们展示了D-GRIL可以用于学习标准基准图数据集上的双滤波函数。此外，我们展示了这一框架可以应用于药物发现中的生物活性预测。

更新时间: 2024-06-11 09:42:03

领域: cs.LG,cs.AI,math.AT

下载: http://arxiv.org/abs/2406.07100v1

Guiding Catalogue Enrichment with User Queries

Techniques for knowledge graph (KG) enrichment have become increasingly crucial for commercial applications that rely on evolving product catalogues. However, because of the huge search space of potential enrichments, predictions from KG completion (KGC) methods suffer from low precision, making them unreliable for real-world catalogues. Moreover, candidate facts for enrichment have varied relevance to users. While making correct predictions for incomplete triplets in KGs has been the main focus of KGC methods, the question of when to apply such predictions has been neglected. Motivated by the product search use case, we address the problem of generating relevant completions for a catalogue using user search behaviour and the users' property associations with a product. In this paper, we present our intuition for identifying enrichable data points and use general-purpose KGs to showcase the performance benefits. In particular, we extract entity-predicate pairs from user queries, which are more likely to be correct and relevant, and use these pairs to guide the prediction of KGC methods. We assess our method on two popular encyclopedia KGs, DBPedia and YAGO 4. Our results from both automatic and human evaluations show that query guidance can significantly improve the correctness and relevance of predictions.

Updated: 2024-06-11 09:38:46

标题: 用用户查询指导目录丰富

摘要: 知识图谱（KGs）丰富技术对于依赖不断发展的产品目录的商业应用变得越来越关键。然而，由于潜在丰富性的巨大搜索空间，来自KG完成（KGC）方法的预测精度较低，使其在现实世界的目录中变得不可靠。此外，用于丰富的候选事实与用户的相关性各不相同。虽然在KGs中为不完整三元组做出正确预测一直是KGC方法的主要焦点，但何时应用这些预测的相关性却被忽视了。在产品搜索用例的激励下，我们关注使用用户搜索行为和用户与产品属性的关联角度来生成目录的相关完成。在本文中，我们提出了识别可丰富数据点的直觉，并使用通用KGs展示性能优势。特别地，我们从用户查询中提取实体-谓词对，这些对更可能是正确和相关的，并使用这些对来指导KGC方法的预测。我们在两个流行百科知识图谱DBPedia和YAGO 4上评估我们的方法。我们的自动和人工评估结果显示，查询引导可以显著改善预测的正确性和相关性。

更新时间: 2024-06-11 09:38:46

领域: cs.IR,cs.AI,cs.DB

下载: http://arxiv.org/abs/2406.07098v1

Fast Context-Biasing for CTC and Transducer ASR models with CTC-based Word Spotter

Accurate recognition of rare and new words remains a pressing problem for contextualized Automatic Speech Recognition (ASR) systems. Most context-biasing methods involve modification of the ASR model or the beam-search decoding algorithm, complicating model reuse and slowing down inference. This work presents a new approach to fast context-biasing with CTC-based Word Spotter (CTC-WS) for CTC and Transducer (RNN-T) ASR models. The proposed method matches CTC log-probabilities against a compact context graph to detect potential context-biasing candidates. The valid candidates then replace their greedy recognition counterparts in corresponding frame intervals. A Hybrid Transducer-CTC model enables the CTC-WS application for the Transducer model. The results demonstrate a significant acceleration of the context-biasing recognition with a simultaneous improvement in F-score and WER compared to baseline methods. The proposed method is publicly available in the NVIDIA NeMo toolkit.

Updated: 2024-06-11 09:37:52

标题: 快速上下文偏置对于基于CTC和Transducer ASR模型的CTC-based词识别器的影响

摘要: 稀有和新单词的准确识别仍然是上下文化自动语音识别（ASR）系统面临的紧迫问题。大多数上下文偏向方法涉及修改ASR模型或波束搜索解码算法，使模型复用复杂化并减慢推理速度。本研究提出了一种基于CTC的词辨别器（CTC-WS）的快速上下文偏向方法，适用于CTC和传输器（RNN-T）ASR模型。所提出的方法将CTC对数概率与紧凑的上下文图进行匹配，以检测潜在的上下文偏向候选词。有效的候选词将取代它们在相应帧间隔中的贪婪识别对应词。混合传输器-CTC模型使CTC-WS应用于传输器模型。结果表明，与基线方法相比，上下文偏向识别加速明显，同时F分数和WER得到改善。所提出的方法已在NVIDIA NeMo工具包中公开提供。

更新时间: 2024-06-11 09:37:52

领域: eess.AS,cs.AI,cs.CL,cs.LG,cs.SD

下载: http://arxiv.org/abs/2406.07096v1

Data Complexity in Expressive Description Logics With Path Expressions

We investigate the data complexity of the satisfiability problem for the very expressive description logic ZOIQ (a.k.a. ALCHb Self reg OIQ) over quasi-forests and establish its NP-completeness. This completes the data complexity landscape for decidable fragments of ZOIQ, and reproves known results on decidable fragments of OWL2 (SR family). Using the same technique, we establish coNEXPTIME-completeness (w.r.t. the combined complexity) of the entailment problem of rooted queries in ZIQ.

Updated: 2024-06-11 09:37:51

标题: 使用路径表达式的表达式描述逻辑中的数据复杂性

摘要: 我们研究了在准森林上的非常表达性描述逻辑ZOIQ（也称为ALCHb Self reg OIQ）的可满足性问题的数据复杂性，并确定了其NP完全性。这完善了ZOIQ可判定分片的数据复杂度格局，并重新证明了关于OWL2（SR系列）可判定分片的已知结果。使用相同的技术，我们确定了在ZIQ中根查询的蕴涵问题在组合复杂度方面的coNEXPTIME完全性。

更新时间: 2024-06-11 09:37:51

领域: cs.LO,cs.AI,cs.CC

下载: http://arxiv.org/abs/2406.07095v1

Metrizing Fairness

We study supervised learning problems that have significant effects on individuals from two demographic groups, and we seek predictors that are fair with respect to a group fairness criterion such as statistical parity (SP). A predictor is SP-fair if the distributions of predictions within the two groups are close in Kolmogorov distance, and fairness is achieved by penalizing the dissimilarity of these two distributions in the objective function of the learning problem. In this paper, we identify conditions under which hard SP constraints are guaranteed to improve predictive accuracy. We also showcase conceptual and computational benefits of measuring unfairness with integral probability metrics (IPMs) other than the Kolmogorov distance. Conceptually, we show that the generator of any IPM can be interpreted as a family of utility functions and that unfairness with respect to this IPM arises if individuals in the two demographic groups have diverging expected utilities. We also prove that the unfairness-regularized prediction loss admits unbiased gradient estimators, which are constructed from random mini-batches of training samples, if unfairness is measured by the squared $\mathcal L^2$-distance or by a squared maximum mean discrepancy. In this case, the fair learning problem is susceptible to efficient stochastic gradient descent (SGD) algorithms. Numerical experiments on synthetic and real data show that these SGD algorithms outperform state-of-the-art methods for fair learning in that they achieve superior accuracy-unfairness trade-offs -- sometimes orders of magnitude faster.
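
The statistical-parity criterion described above compares the prediction distributions of the two groups in Kolmogorov distance, i.e. the largest gap between their cumulative distribution functions. The sketch below computes this quantity from empirical samples; the function name and the toy predictions are assumptions for illustration, and the paper's regularized training procedure is not shown.

```python
import numpy as np


def kolmogorov_distance(pred_a, pred_b):
    """Largest absolute gap between two empirical CDFs.

    pred_a, pred_b: 1-D arrays of predictions for the two groups.
    A value of 0 means the empirical prediction distributions coincide;
    1 means they are completely separated.
    """
    a = np.sort(np.asarray(pred_a, dtype=float))
    b = np.sort(np.asarray(pred_b, dtype=float))
    # The supremum over step functions is attained at a sample point.
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.abs(cdf_a - cdf_b).max())


# Toy predictions (hypothetical) for two demographic groups.
same = kolmogorov_distance([0.2, 0.5, 0.8], [0.2, 0.5, 0.8])
separated = kolmogorov_distance([0.0, 0.1], [0.9, 1.0])
```

Because this distance depends on the samples only through sorting and a maximum, it gives little gradient signal for training, which is one practical motivation for the smoother integral probability metrics the abstract goes on to discuss.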

Updated: 2024-06-11 09:34:06

标题: 度量公平

摘要: 我们研究了对来自两个人口群体的个体产生重大影响的监督学习问题，并寻求符合统计平等（SP）等群体公平标准的预测因子。如果预测在两个群体内的分布在Kolmogorov距离上接近，则预测是SP公平的，公平性是通过在学习问题的目标函数中惩罚这两个分布的不相似性来实现的。在本文中，我们确定了硬SP约束保证提高预测准确性的条件。我们还展示了使用除Kolmogorov距离以外的积分概率度量（IPMs）来衡量不公平性的概念和计算优势。从概念上讲，我们表明任何IPM的生成器可以被解释为一组效用函数，并且如果两个人口群体中的个体具有不同的期望效用，则相对于该IPM的不公平性会产生。我们还证明，如果不公平性由平方$\mathcal L^2$距离或平方最大均值差异度量，则不公平性正则化预测损失将产生无偏梯度估计器，这些估计器是从随机小批量训练样本构建的。在这种情况下，公平学习问题容易受到高效的随机梯度下降（SGD）算法的影响。对合成和真实数据的数值实验表明，这些SGD算法在公平学习方面优于最先进的方法，因为它们实现了更好的准确性-不公平性权衡 - 有时是数量级更快。

更新时间: 2024-06-11 09:34:06

领域: cs.LG,math.OC,stat.ML

下载: http://arxiv.org/abs/2205.15049v5

Popularity-Aware Alignment and Contrast for Mitigating Popularity Bias

Collaborative Filtering (CF) typically suffers from the significant challenge of popularity bias due to the uneven distribution of items in real-world datasets. This bias leads to a significant accuracy gap between popular and unpopular items. It not only hinders accurate user preference understanding but also exacerbates the Matthew effect in recommendation systems. To alleviate popularity bias, existing efforts focus on emphasizing unpopular items or separating the correlation between item representations and their popularity. Despite their effectiveness, existing works still face two persistent challenges: (1) how to extract common supervision signals from popular items to improve the unpopular item representations, and (2) how to alleviate the representation separation caused by popularity bias. In this work, we conduct an empirical analysis of popularity bias and propose Popularity-Aware Alignment and Contrast (PAAC) to address these two challenges. Specifically, we use the common supervisory signals modeled in popular item representations and propose a novel popularity-aware supervised alignment module to learn unpopular item representations. Additionally, we suggest re-weighting the contrastive learning loss to mitigate the representation separation from a popularity-centric perspective. Finally, we validate the effectiveness and rationale of PAAC in mitigating popularity bias through extensive experiments on three real-world datasets. Our code is available at https://github.com/miaomiao-cai2/KDD2024-PAAC.

Updated: 2024-06-11 09:29:46

标题: 考虑流行度的对齐和对比以减轻流行度偏见

摘要: 协同过滤（CF）通常面临着流行度偏见的重要挑战，这是由于现实世界数据集中物品的不均匀分布所导致的。这种偏见导致了流行物品和不受欢迎物品之间的显著准确性差距。它不仅阻碍了准确理解用户偏好，还加剧了推荐系统中的马太效应。为了缓解流行度偏见，现有的努力集中在强调不受欢迎的物品或分离物品表示与其流行度之间的相关性。尽管有效，现有作品仍然面临两个持续挑战：（1）如何从流行物品中提取常见的监督信号以改善不受欢迎的物品表示，以及（2）如何缓解由流行度偏见引起的表示分离。在本研究中，我们对流行度偏见进行了实证分析，并提出了一种名为Popularity-Aware Alignment and Contrast（PAAC）的方法来解决这两个挑战。具体来说，我们利用流行物品表示中建模的常见监督信号，并提出了一种新颖的流行度感知监督对齐模块来学习不受欢迎的物品表示。此外，我们建议重新加权对比学习损失，以缓解流行度中心的表示分离。最后，通过对三个真实世界数据集进行大量实验，我们验证了PAAC在缓解流行度偏见方面的有效性和合理性。我们的代码可在https://github.com/miaomiao-cai2/KDD2024-PAAC上找到。

更新时间: 2024-06-11 09:29:46

领域: cs.IR,cs.AI

下载: http://arxiv.org/abs/2405.20718v2

Leveraging Large Language Models for Efficient Failure Analysis in Game Development

In games, and more generally in the field of software development, early detection of bugs is vital to maintain a high quality of the final product. Automated tests are a powerful tool that can catch a problem earlier in development by executing periodically. As an example, when new code is submitted to the code base, a new automated test verifies these changes. However, identifying the specific change responsible for a test failure becomes harder when dealing with batches of changes -- especially in the case of a large-scale project such as a AAA game, where thousands of people contribute to a single code base. This paper proposes a new approach to automatically identify which change in the code caused a test to fail. The method leverages Large Language Models (LLMs) to associate error messages with the corresponding code changes causing the failure. We investigate the effectiveness of our approach with quantitative and qualitative evaluations. Our approach reaches an accuracy of 71% in our newly created dataset, which comprises issues reported by developers at EA over a period of one year. We further evaluated our model through a user study to assess the utility and usability of the tool from a developer perspective, resulting in a significant reduction in time -- up to 60% -- spent investigating issues.

Updated: 2024-06-11 09:21:50

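Title: Leveraging Large Language Models for Efficient Failure Analysis in Game Development

Abstract: In games, and more generally in software development, detecting bugs early is essential to keeping the final product at high quality. Automated testing is a powerful tool that can catch problems earlier by running periodically. For example, when new code is submitted to the code base, a new automated test verifies the changes. However, when changes are handled in batches, and especially in a large project such as a AAA game where thousands of people contribute to a single code base, identifying the specific change that caused a test failure becomes harder. This paper proposes a new method that automatically identifies the code change that caused a test to fail. The method uses large language models (LLMs) to associate error messages with the corresponding code changes that caused the failure. We study the effectiveness of our method through quantitative and qualitative evaluations. Our method reaches 71% accuracy on our newly created dataset, which comprises issues reported by EA developers over one year. We further evaluated our model through a user study that assessed the tool's utility and usability from a developer's perspective, showing a significant reduction, of up to 60%, in the time spent investigating issues.

Updated: 2024-06-11 09:21:50

Categories: cs.LG

Download: http://arxiv.org/abs/2406.07084v1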

Occlusion-Aware Deep Convolutional Neural Network via Homogeneous Tanh-transforms for Face Parsing

Face parsing infers a pixel-wise label map for each semantic facial component. Previous methods generally work well for uncovered faces; however, they overlook facial occlusion and ignore some contextual areas outside a single face, a notable gap now that facial occlusion has become common during the COVID-19 epidemic. Inspired by the lighting phenomena in everyday life, where illumination from four distinct lamps provides a more uniform distribution than a single central light source, we propose a novel homogeneous tanh-transform for image preprocessing, which is made up of four tanh-transforms. These transforms fuse the central vision and the peripheral vision together. Our proposed method addresses the dilemma of face parsing under occlusion and compresses more information from the surrounding context. Based on homogeneous tanh-transforms, we propose an occlusion-aware convolutional neural network for occluded face parsing. It combines information in both Tanh-polar space and Tanh-Cartesian space and is capable of enhancing receptive fields. Furthermore, we introduce an occlusion-aware loss to focus on the boundaries of occluded regions. The network is simple, flexible, and can be trained end-to-end. To facilitate future research on occluded face parsing, we also contribute a new cleaned face parsing dataset. This dataset is manually purified from several academic or industrial datasets, including CelebAMask-HQ, Short-video Face Parsing, and the Helen dataset, and will be made public. Experiments demonstrate that our method surpasses state-of-the-art methods in face parsing under occlusion.

Updated: 2024-06-11 09:19:24

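Title: Occlusion-Aware Deep Convolutional Neural Network via Homogeneous Tanh-transforms for Face Parsing

Abstract: Face parsing infers a pixel-level label map for each semantic facial component. Previous methods usually work well on unoccluded faces, but they overlook facial occlusion and ignore some contextual regions outside a single face, particularly now that facial occlusion has become common during the COVID-19 pandemic. Inspired by a lighting phenomenon in everyday life, in which illumination from four separate lamps gives a more uniform distribution than a single central light source, we propose a novel homogeneous tanh-transform for image preprocessing, composed of four tanh-transforms. These transforms fuse central and peripheral vision. Our proposed method resolves the dilemma of face parsing under occlusion and compresses more information from the surrounding context. Based on homogeneous tanh-transforms, we propose an occlusion-aware convolutional neural network for occluded face parsing. It combines information from Tanh-polar space and Tanh-Cartesian space and can enlarge receptive fields. In addition, we introduce an occlusion-aware loss that focuses on the boundaries of occluded regions. The network is simple and flexible and can be trained end to end. To support future research on occluded face parsing, we also contribute a new cleaned face parsing dataset. It was manually cleaned from several academic and industrial datasets, including CelebAMask-HQ, Short-video Face Parsing, and the Helen dataset, and will be released publicly. Experiments show that our method surpasses existing methods for face parsing under occlusion.

Updated: 2024-06-11 09:19:24

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2308.15323v2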

Efficient Mixture Learning in Black-Box Variational Inference

Mixture variational distributions in black box variational inference (BBVI) have demonstrated impressive results in challenging density estimation tasks. However, currently scaling the number of mixture components can lead to a linear increase in the number of learnable parameters and a quadratic increase in inference time due to the evaluation of the evidence lower bound (ELBO). Our two key contributions address these limitations. First, we introduce the novel Multiple Importance Sampling Variational Autoencoder (MISVAE), which amortizes the mapping from input to mixture-parameter space using one-hot encodings. Fortunately, with MISVAE, each additional mixture component incurs a negligible increase in network parameters. Second, we construct two new estimators of the ELBO for mixtures in BBVI, enabling a tremendous reduction in inference time with marginal or even improved impact on performance. Collectively, our contributions enable scalability to hundreds of mixture components and provide superior estimation performance in shorter time, with fewer network parameters compared to previous Mixture VAEs. Experimenting with MISVAE, we achieve astonishing, SOTA results on MNIST. Furthermore, we empirically validate our estimators in other BBVI settings, including Bayesian phylogenetic inference, where we improve inference times for the SOTA mixture model on eight data sets.

Updated: 2024-06-11 09:16:43

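Title: Efficient Mixture Learning in Black-Box Variational Inference

Abstract: In black-box variational inference (BBVI), mixture variational distributions have shown impressive results on challenging density estimation tasks. However, scaling the number of mixture components can currently lead to a linear increase in the number of learnable parameters and, because of the evaluation of the evidence lower bound (ELBO), a quadratic increase in inference time. Our two key contributions address these limitations. First, we introduce the novel Multiple Importance Sampling Variational Autoencoder (MISVAE), which amortizes the mapping from input to mixture-parameter space using one-hot encodings. Fortunately, with MISVAE each additional mixture component adds a negligible number of network parameters. Second, we construct two new ELBO estimators for mixtures in BBVI, which greatly reduce inference time with marginal or even improved impact on performance. Overall, our contributions make it possible to scale to hundreds of mixture components and deliver superior estimation performance in less time, with fewer network parameters than previous mixture VAEs. Experimenting with MISVAE, we obtain striking SOTA results on MNIST. We also empirically validate our estimators in other BBVI settings, including Bayesian phylogenetic inference, where we improve the inference time of the SOTA mixture model on eight data sets.

Updated: 2024-06-11 09:16:43

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2406.07083v1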

Deception Analysis with Artificial Intelligence: An Interdisciplinary Perspective

Humans and machines interact more frequently than ever, and our societies are becoming increasingly hybrid. A consequence of this hybridisation is the degradation of societal trust due to the prevalence of AI-enabled deception. Yet, despite the understanding of the role of trust in AI gained in recent years, we still do not have a computational theory that can fully explain the role deception plays in this context. This is a problem because, while our ability to explain deception in hybrid societies is delayed, the design of AI agents may keep advancing towards fully autonomous deceptive machines, which would pose new challenges to dealing with deception. In this paper we build a timely and meaningful interdisciplinary perspective on deceptive AI and reinforce a 20-year-old socio-cognitive perspective on trust and deception, by proposing the development of DAMAS -- a holistic Multi-Agent Systems (MAS) framework for the socio-cognitive modelling and analysis of deception. In a nutshell, this paper covers the topic of modelling and explaining deception using AI approaches from the perspectives of Computer Science, Philosophy, Psychology, Ethics, and Intelligence Analysis.

Updated: 2024-06-11 09:06:53

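Title: Deception Analysis with Artificial Intelligence: An Interdisciplinary Perspective

Abstract: Humans and machines interact more often than ever, and our societies are becoming increasingly hybrid. One consequence of this hybridisation is a decline in societal trust caused by the prevalence of AI-enabled deception. Yet although the role of trust in AI has become better understood in recent years, we still lack a computational theory that can fully understand and explain the role deception plays in this context. This is a problem because, while our ability to explain deception in hybrid societies lags behind, the design of AI agents may keep advancing toward fully autonomous deceptive machines, which would pose new challenges for dealing with deception. In this paper we build a timely and meaningful interdisciplinary perspective on deceptive AI and reinforce a 20-year-old socio-cognitive perspective on trust and deception by proposing the development of DAMAS, a holistic Multi-Agent Systems (MAS) framework for the socio-cognitive modelling and analysis of deception. In short, this paper covers the modelling and explanation of deception using AI approaches from the perspectives of computer science, philosophy, psychology, ethics, and intelligence analysis.

Updated: 2024-06-11 09:06:53

Categories: cs.MA,cs.AI,cs.CY

Download: http://arxiv.org/abs/2406.05724v2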

Unified Modeling Enhanced Multimodal Learning for Precision Neuro-Oncology

Multimodal learning, integrating histology images and genomics, promises to enhance precision oncology with comprehensive views at microscopic and molecular levels. However, existing methods may not sufficiently model the shared or complementary information for more effective integration. In this study, we introduce a Unified Modeling Enhanced Multimodal Learning (UMEML) framework that employs a hierarchical attention structure to effectively leverage shared and complementary features of both modalities of histology and genomics. Specifically, to mitigate unimodal bias from modality imbalance, we utilize a query-based cross-attention mechanism for prototype clustering in the pathology encoder. Our prototype assignment and modularity strategy is designed to align shared features and minimize modality gaps. An additional registration mechanism with learnable tokens is introduced to enhance cross-modal feature integration and robustness in multimodal unified modeling. Our experiments demonstrate that our method surpasses previous state-of-the-art approaches in glioma diagnosis and prognosis tasks, underscoring its superiority in precision neuro-oncology.

Updated: 2024-06-11 09:06:41

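Title: Unified Modeling Enhanced Multimodal Learning for Precision Neuro-Oncology

Abstract: Multimodal learning that integrates histology images and genomics promises to enhance precision oncology by providing comprehensive views at the microscopic and molecular levels. However, existing methods may not adequately model shared or complementary information for more effective integration. In this study, we introduce the Unified Modeling Enhanced Multimodal Learning (UMEML) framework, which uses a hierarchical attention structure to exploit the shared and complementary features of the histology and genomics modalities. Specifically, to reduce the unimodal bias that arises from modality imbalance, we use a query-based cross-attention mechanism for prototype clustering in the pathology encoder. Our prototype assignment and modularity strategy is designed to align shared features and reduce modality gaps. An additional registration mechanism with learnable tokens is introduced to strengthen cross-modal feature integration and robustness in multimodal unified modeling. Our experiments show that our method surpasses previous state-of-the-art approaches on glioma diagnosis and prognosis tasks, highlighting its advantage in precision neuro-oncology.

Updated: 2024-06-11 09:06:41

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2406.07078v1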

On the relation between trainability and dequantization of variational quantum learning models

The quest for successful variational quantum machine learning (QML) relies on the design of suitable parametrized quantum circuits (PQCs), as analogues to neural networks in classical machine learning. Successful QML models must fulfill the properties of trainability and non-dequantization, among others. Recent works have highlighted an intricate interplay between trainability and dequantization of such models, which is still unresolved. In this work we contribute to this debate from the perspective of machine learning, proving a number of results identifying, among other things, when trainability and non-dequantization are not mutually exclusive. We begin by providing a number of new, somewhat broader definitions of the relevant concepts than those found in other literature; these definitions are operationally motivated and consistent with prior art. With these precise definitions given and motivated, we then study the relation between trainability and dequantization of variational QML. Next, we discuss the degrees of "variationalness" of QML models, where we distinguish between models like the hardware efficient ansatz and quantum kernel methods. Finally, we introduce recipes for building PQC-based QML models which are both trainable and nondequantizable, corresponding to different degrees of variationalness. We do not address the practical utility of such models. Our work does, however, point toward a way forward for finding more general constructions, for which finding applications may become feasible.

Updated: 2024-06-11 08:59:20

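Title: On the relation between trainability and dequantization of variational quantum learning models

Abstract: The quest for successful variational quantum machine learning (QML) hinges on designing suitable parametrized quantum circuits (PQCs), the analogues of neural networks in classical machine learning. Successful QML models must satisfy properties such as trainability and non-dequantization. Recent work has highlighted an intricate interplay between the trainability and dequantization of such models, which remains unresolved. In this work we contribute to the debate from a machine learning perspective, proving a number of results that identify, among other things, when trainability and non-dequantization are not mutually exclusive. We first give new, somewhat broader definitions of the relevant concepts than those found in other literature; the definitions are operationally motivated and consistent with prior work. With these precise definitions given and motivated, we study the relation between trainability and dequantization of variational QML. We then discuss the degrees of "variationalness" of QML models, distinguishing between models such as the hardware efficient ansatz and quantum kernel methods. Finally, we introduce recipes for building PQC-based QML models that are both trainable and non-dequantizable, corresponding to different degrees of variationalness. We do not address the practical utility of these models. Our work does, however, point to a way forward for finding more general constructions for which finding applications may become feasible.

Updated: 2024-06-11 08:59:20

Categories: quant-ph,cs.LG,stat.ML

Download: http://arxiv.org/abs/2406.07072v1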

Is Stateful Fuzzing Really Challenging?

Fuzzing has proven extremely effective in finding vulnerabilities in software. When it comes to fuzzing stateless systems, analysts have no doubts about which tool to choose. In fact, among the plethora of stateless fuzzers devised in the last 20 years, AFL (with its descendants AFL++ and LibAFL) has stood out for its effectiveness, speed and ability to find bugs. On the other hand, when dealing with stateful systems, it is not clear which tool is best. In fact, the research community struggles to devise (and benchmark) effective and generic stateful fuzzers. In this short paper, we discuss the reasons that make stateful fuzzers difficult to devise and benchmark.

Updated: 2024-06-11 08:58:59

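Title: Is Stateful Fuzzing Really Challenging?

Abstract: Fuzzing has proven very effective at finding software vulnerabilities. When it comes to fuzzing stateless systems, analysts have no doubt about which tool to choose. Among the many stateless fuzzers devised over the past 20 years, AFL (with its descendants AFL++ and LibAFL) stands out for its effectiveness, speed, and ability to find bugs. When dealing with stateful systems, however, it is not clear which tool is best. In fact, the research community struggles to devise (and benchmark) effective, generic stateful fuzzers. In this short paper, we discuss the reasons that make stateful fuzzers hard to devise and benchmark.

Updated: 2024-06-11 08:58:59

Categories: cs.SE,cs.CR

Download: http://arxiv.org/abs/2406.07071v1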

TIM: Temporal Interaction Model in Notification System

Modern mobile applications heavily rely on the notification system to acquire daily active users and enhance user engagement. Being able to proactively reach users, the system has to decide when to send notifications to users. Although many researchers have studied optimizing the timing of sending notifications, they only utilized users' contextual features, without modeling users' behavior patterns. Additionally, these efforts only focus on individual notifications, and there is a lack of studies on optimizing the holistic timing of multiple notifications within a period. To bridge these gaps, we propose the Temporal Interaction Model (TIM), which models users' behavior patterns by estimating CTR in every time slot over a day in our short video application Kuaishou. TIM leverages long-term user historical interaction sequence features such as notification receipts, clicks, watch time and effective views, and employs a temporal attention unit (TAU) to extract user behavior patterns. Moreover, we provide an elegant strategy of holistic notifications send time control to improve user engagement while minimizing disruption. We evaluate the effectiveness of TIM through offline experiments and online A/B tests. The results indicate that TIM is a reliable tool for forecasting user behavior, leading to a remarkable enhancement in user engagement without causing undue disturbance.

Updated: 2024-06-11 08:53:15

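Title: TIM: Temporal Interaction Model in Notification System

Abstract: Modern mobile applications rely heavily on notification systems to acquire daily active users and increase user engagement. Because the system can proactively reach users, it must decide when to send them notifications. Although many researchers have studied how to optimize notification timing, they used only users' contextual features and did not model users' behavior patterns. Moreover, these efforts focus on individual notifications, and studies on optimizing the overall timing of multiple notifications within a period are lacking. To bridge these gaps, we propose the Temporal Interaction Model (TIM), which models users' behavior patterns by estimating the click-through rate (CTR) in every time slot of the day in our short-video application Kuaishou. TIM uses long-term historical user interaction sequence features, such as notification receipts, clicks, watch time, and effective views, and employs a temporal attention unit (TAU) to extract user behavior patterns. We also provide an elegant strategy for holistic control of notification send times that improves user engagement while minimizing disruption. We evaluate the effectiveness of TIM through offline experiments and online A/B tests. The results indicate that TIM is a reliable tool for forecasting user behavior and markedly improves user engagement without causing undue disturbance.

Updated: 2024-06-11 08:53:15

Categories: cs.IR,cs.AI

Download: http://arxiv.org/abs/2406.07067v1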

Reconstructing the Tropical Pacific Upper Ocean using Online Data Assimilation with a Deep Learning model

A deep learning (DL) model, based on a transformer architecture, is trained on a climate-model dataset and compared with a standard linear inverse model (LIM) in the tropical Pacific. We show that the DL model produces more accurate forecasts compared to the LIM when tested on a reanalysis dataset. We then assess the ability of an ensemble Kalman filter to reconstruct the monthly-averaged upper ocean from a noisy set of 24 sea-surface temperature observations designed to mimic existing coral proxy measurements, and compare results for the DL model and LIM. Due to signal damping in the DL model, we implement a novel inflation technique by adding noise from hindcast experiments. Results show that assimilating observations with the DL model yields better reconstructions than the LIM for observation averaging times ranging from one month to one year. The improved reconstruction is due to the enhanced predictive capabilities of the DL model, which map the memory of past observations to future assimilation times.
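
The assimilation step described above can be sketched for a single, directly observed scalar state. This is a minimal illustration, assuming a perturbed-observation ensemble Kalman filter and a simple additive inflation that draws noise from a supplied list of samples; the function names, the toy numbers, and the perturbed-observation variant are assumptions made for illustration and are not taken from the paper.

```python
import random
import statistics


def enkf_update(ensemble, observation, obs_var, rng):
    """Perturbed-observation EnKF analysis step for a directly observed scalar.

    The Kalman gain is estimated from the ensemble spread, and each member
    is nudged toward its own noisy copy of the observation.
    """
    prior_var = statistics.variance(ensemble)      # sample variance of the forecast
    gain = prior_var / (prior_var + obs_var)       # scalar Kalman gain, in (0, 1)
    obs_std = obs_var ** 0.5
    return [
        x + gain * (observation + rng.gauss(0.0, obs_std) - x)
        for x in ensemble
    ]


def additive_inflation(ensemble, noise_samples, rng):
    """Add one randomly drawn noise sample to each member to restore spread.

    `noise_samples` is a hypothetical stand-in for forecast errors collected
    from earlier hindcast runs.
    """
    return [x + rng.choice(noise_samples) for x in ensemble]


rng = random.Random(0)
forecast = [-1.0, 0.0, 1.0, 2.0]                   # toy forecast ensemble
analysis = enkf_update(forecast, observation=5.0, obs_var=0.01, rng=rng)
inflated = additive_inflation(analysis, noise_samples=[-0.5, 0.5], rng=rng)
print(statistics.mean(forecast), statistics.mean(analysis))
```

With a small observation variance the gain is close to one, so the analysis mean moves almost all the way to the observation; inflation then widens the ensemble again so that later observations are not ignored.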

Updated: 2024-06-11 08:45:41

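Title: Reconstructing the Tropical Pacific Upper Ocean using Online Data Assimilation with a Deep Learning model

Abstract: A deep learning (DL) model based on a transformer architecture is trained on a climate-model dataset and compared with a standard linear inverse model (LIM) in the tropical Pacific. We show that the DL model produces more accurate forecasts than the LIM when tested on a reanalysis dataset. We then assess the ability of an ensemble Kalman filter to reconstruct the monthly-averaged upper ocean from a noisy set of 24 sea-surface temperature observations designed to mimic existing coral proxy measurements, and compare the results for the DL model and the LIM. Because of signal damping in the DL model, we implement a novel inflation technique that adds noise from hindcast experiments. The results show that assimilating observations with the DL model yields better reconstructions than the LIM for observation averaging times from one month to one year. The improved reconstruction stems from the stronger predictive capability of the DL model, which maps the memory of past observations to future assimilation times.

Updated: 2024-06-11 08:45:41

Categories: physics.ao-ph,cs.AI,physics.flu-dyn

Download: http://arxiv.org/abs/2406.07063v1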

A Micro Architectural Events Aware Real-Time Embedded System Fault Injector

In contemporary times, the increasing complexity of the system poses significant challenges to the reliability, trustworthiness, and security of the SACRES. Key issues include the susceptibility to phenomena such as instantaneous voltage spikes, electromagnetic interference, neutron strikes, and out-of-range temperatures. These factors can induce switch state changes in transistors, resulting in bit-flipping, soft errors, and transient corruption of stored data in memory. The occurrence of soft errors, in turn, may lead to system faults that can propel the system into a hazardous state. Particularly in critical sectors like automotive, avionics, or aerospace, such malfunctions can have real-world implications, potentially causing harm to individuals. This paper introduces a novel fault injector designed to facilitate the monitoring, aggregation, and examination of micro-architectural events. This is achieved by harnessing the microprocessor's PMU and the debugging interface, specifically focusing on ensuring the repeatability of fault injections. The fault injection methodology targets bit-flipping within the memory system, affecting CPU registers and RAM. The outcomes of these fault injections enable a thorough analysis of the impact of soft errors and establish a robust correlation between the identified faults and the essential timing predictability demanded by SACRES.
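
The basic fault model named above, a single bit flip in a register or memory word, can be sketched in a few lines. This is a software-only illustration under stated assumptions: the helper names and the simulated RAM image are hypothetical, and the sketch does not touch a PMU or a debug interface as the described injector does.

```python
def flip_bit(word: int, bit: int, width: int = 32) -> int:
    """Return `word` with one bit inverted, emulating a single-event upset."""
    if not 0 <= bit < width:
        raise ValueError("bit index outside the word")
    mask = (1 << width) - 1                 # keep the result inside the word size
    return (word ^ (1 << bit)) & mask


def inject(memory: bytearray, address: int, bit: int) -> None:
    """Flip one bit of one byte in a simulated RAM image, in place."""
    memory[address] = flip_bit(memory[address], bit, width=8)


ram = bytearray([0x0F, 0xA0])               # toy two-byte memory image
inject(ram, address=1, bit=5)               # 0xA0 has bit 5 set, so it is cleared
print(ram.hex())
```

Because XOR with the same mask is its own inverse, applying the same injection twice restores the original value, which is one simple way to keep an experiment repeatable.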

Updated: 2024-06-11 08:44:00

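Title: A Micro Architectural Events Aware Real-Time Embedded System Fault Injector

Abstract: Today, increasing system complexity poses major challenges to the reliability, trustworthiness, and security of SACRES. Key issues include susceptibility to phenomena such as instantaneous voltage spikes, electromagnetic interference, neutron strikes, and out-of-range temperatures. These factors can change the switching state of transistors, causing bit flips, soft errors, and transient corruption of stored data. Soft errors can in turn lead to system faults that push the system into a hazardous state. In critical sectors such as automotive, avionics, or aerospace in particular, such malfunctions can have real-world consequences and may harm individuals. This paper presents a novel fault injector designed to facilitate the monitoring, aggregation, and examination of micro-architectural events. It does so by using the microprocessor's PMU and debugging interface, with a particular focus on making fault injections repeatable. The fault injection method targets bit flips in the memory system, affecting CPU registers and RAM. The results of these injections enable a thorough analysis of the impact of soft errors and establish a strong correlation between the identified faults and the essential timing predictability demanded by SACRES.

Updated: 2024-06-11 08:44:00

Categories: cs.AR,cs.AI

Download: http://arxiv.org/abs/2401.08397v2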

Reading Miscue Detection in Primary School through Automatic Speech Recognition

Automatic reading diagnosis systems can benefit both teachers, through more efficient scoring of reading exercises, and students, through easier access to reading exercises with feedback. However, there are limited studies on Automatic Speech Recognition (ASR) for child speech in languages other than English, and limited research on ASR-based reading diagnosis systems. This study investigates how well state-of-the-art (SOTA) pretrained ASR models recognize the speech of native Dutch-speaking children and detect reading miscues. We found that Hubert Large finetuned on Dutch speech achieves SOTA phoneme-level child speech recognition (PER at 23.1%), while Whisper (Faster Whisper Large-v2) achieves SOTA word-level performance (WER at 9.8%). Our findings suggest that Wav2Vec2 Large and Whisper are the two best ASR models for reading miscue detection. Specifically, Wav2Vec2 Large shows the highest recall at 0.83, whereas Whisper exhibits the highest precision at 0.52 and an F1 score of 0.52.
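
The metrics quoted above follow standard definitions, sketched here for reference: WER and PER are the Levenshtein distance between reference and hypothesis sequences divided by the reference length, and precision, recall and F1 come from true/false positive counts. The function names and the toy Dutch sentence are assumptions for illustration; the study's own scoring pipeline is not described in the abstract.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (words or phonemes)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(reference, hypothesis):
    """WER when given word lists, PER when given phoneme lists."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def precision_recall_f1(tp, fp, fn):
    """Detection scores from true positives, false positives, false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


reference = "de kat zit op de mat".split()
hypothesis = "de kat zat op mat".split()          # one substitution, one deletion
print(error_rate(reference, hypothesis))
```

A high recall paired with a lower precision, as reported for Wav2Vec2 Large, means most real miscues are flagged at the cost of more false alarms.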

Updated: 2024-06-11 08:41:21

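Title: Reading Miscue Detection in Primary School through Automatic Speech Recognition

Abstract: Automatic reading diagnosis systems can help teachers score reading exercises more efficiently and help students access reading exercises with feedback more easily. However, research on automatic speech recognition (ASR) for child speech in languages other than English, and on ASR-based reading diagnosis systems, is limited. This study investigates how effectively state-of-the-art (SOTA) pretrained ASR models recognize the speech of native Dutch-speaking children and detect reading miscues. We find that Hubert Large fine-tuned on Dutch speech achieves SOTA phoneme-level child speech recognition (PER of 23.1%), while Whisper (Faster Whisper Large-v2) achieves SOTA word-level performance (WER of 9.8%). Our findings suggest that Wav2Vec2 Large and Whisper are the two best ASR models for reading miscue detection. Specifically, Wav2Vec2 Large shows the highest recall, 0.83, whereas Whisper shows the highest precision, 0.52, and an F1 score of 0.52.

Updated: 2024-06-11 08:41:21

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2406.07060v1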

Benchmarking Trustworthiness of Multimodal Large Language Models: A Comprehensive Study

Despite the superior capabilities of Multimodal Large Language Models (MLLMs) across diverse tasks, they still face significant trustworthiness challenges. Yet, current literature on the assessment of trustworthy MLLMs remains limited, lacking a holistic evaluation to offer thorough insights into future improvements. In this work, we establish MultiTrust, the first comprehensive and unified benchmark on the trustworthiness of MLLMs across five primary aspects: truthfulness, safety, robustness, fairness, and privacy. Our benchmark employs a rigorous evaluation strategy that addresses both multimodal risks and cross-modal impacts, encompassing 32 diverse tasks with self-curated datasets. Extensive experiments with 21 modern MLLMs reveal some previously unexplored trustworthiness issues and risks, highlighting the complexities introduced by the multimodality and underscoring the necessity for advanced methodologies to enhance their reliability. For instance, typical proprietary models still struggle with the perception of visually confusing images and are vulnerable to multimodal jailbreaking and adversarial attacks; MLLMs are more inclined to disclose privacy in text and reveal ideological and cultural biases even when paired with irrelevant images in inference, indicating that the multimodality amplifies the internal risks from base LLMs. Additionally, we release a scalable toolbox for standardized trustworthiness research, aiming to facilitate future advancements in this important field. Code and resources are publicly available at: https://multi-trust.github.io/.

Updated: 2024-06-11 08:38:13

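Title: Benchmarking Trustworthiness of Multimodal Large Language Models: A Comprehensive Study

Abstract: Although Multimodal Large Language Models (MLLMs) show outstanding capabilities across a variety of tasks, they still face significant trustworthiness challenges. Yet the current literature on assessing trustworthy MLLMs remains limited and lacks a holistic evaluation that offers thorough insight into future improvements. In this work, we establish MultiTrust, the first comprehensive and unified benchmark on the trustworthiness of MLLMs, covering five primary aspects: truthfulness, safety, robustness, fairness, and privacy. Our benchmark uses a rigorous evaluation strategy that addresses both multimodal risks and cross-modal impacts, comprising 32 diverse tasks with self-curated datasets. Extensive experiments with 21 modern MLLMs reveal previously unexplored trustworthiness issues and risks, highlighting the complexities introduced by multimodality and underscoring the need for advanced methods to improve reliability. For example, typical proprietary models still struggle to perceive visually confusing images and are vulnerable to multimodal jailbreaking and adversarial attacks; MLLMs are more inclined to disclose private information in text and to reveal ideological and cultural biases even when paired with irrelevant images at inference, indicating that multimodality amplifies the internal risks of the base LLMs. We also release a scalable toolbox for standardized trustworthiness research, aiming to facilitate future progress in this important field. Code and resources are publicly available at: https://multi-trust.github.io/.

Updated: 2024-06-11 08:38:13

Categories: cs.CL,cs.AI,cs.CV,cs.LG

Download: http://arxiv.org/abs/2406.07057v1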

Efficient Topology-aware Data Augmentation for High-Degree Graph Neural Networks

In recent years, graph neural networks (GNNs) have emerged as a potent tool for learning on graph-structured data and won fruitful successes in varied fields. The majority of GNNs follow the message-passing paradigm, where representations of each node are learned by recursively aggregating features of its neighbors. However, this mechanism brings severe over-smoothing and efficiency issues over high-degree graphs (HDGs), wherein most nodes have dozens (or even hundreds) of neighbors, such as social networks, transaction graphs, power grids, etc. Additionally, such graphs usually encompass rich and complex structure semantics, which are hard to capture merely by feature aggregations in GNNs. Motivated by the above limitations, we propose TADA, an efficient and effective front-mounted data augmentation framework for GNNs on HDGs. Under the hood, TADA includes two key modules: (i) feature expansion with structure embeddings, and (ii) topology- and attribute-aware graph sparsification. The former obtains augmented node features and enhanced model capacity by encoding the graph structure into high-quality structure embeddings with our highly-efficient sketching method. Further, by exploiting task-relevant features extracted from graph structures and attributes, the second module enables the accurate identification and reduction of numerous redundant/noisy edges from the input graph, thereby alleviating over-smoothing and facilitating faster feature aggregations over HDGs. Empirically, TADA considerably improves the predictive performance of mainstream GNN models on 8 real homophilic/heterophilic HDGs in terms of node classification, while achieving efficient training and inference processes.

Updated: 2024-06-11 08:36:37

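Title: Efficient Topology-aware Data Augmentation for High-Degree Graph Neural Networks

Abstract: In recent years, graph neural networks (GNNs) have become a powerful tool for learning on graph-structured data and have achieved considerable success in many fields. Most GNNs follow the message-passing paradigm, in which each node's representation is learned by recursively aggregating the features of its neighbors. However, this mechanism causes severe over-smoothing and efficiency problems on high-degree graphs (HDGs), in which most nodes have dozens or even hundreds of neighbors, such as social networks, transaction graphs, and power grids. Such graphs also usually carry rich and complex structural semantics that are hard to capture through feature aggregation in GNNs alone. Motivated by these limitations, we propose TADA, an efficient and effective front-mounted data augmentation framework for GNNs on HDGs. Under the hood, TADA has two key modules: (i) feature expansion with structure embeddings, and (ii) topology- and attribute-aware graph sparsification. The first obtains augmented node features and greater model capacity by encoding the graph structure into high-quality structure embeddings with our highly efficient sketching method. The second, by exploiting task-relevant features extracted from graph structure and attributes, accurately identifies and removes many redundant or noisy edges from the input graph, thereby alleviating over-smoothing and speeding up feature aggregation on HDGs. Empirically, TADA considerably improves the node classification performance of mainstream GNN models on 8 real homophilic/heterophilic HDGs while keeping training and inference efficient.

Updated: 2024-06-11 08:36:37

Categories: cs.LG

Download: http://arxiv.org/abs/2406.05482v2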

CoEvol: Constructing Better Responses for Instruction Finetuning through Multi-Agent Cooperation

In recent years, instruction fine-tuning (IFT) on large language models (LLMs) has garnered considerable attention to enhance model performance on unseen tasks. Attempts have been made on automatic construction and effective selection for IFT data. However, we posit that previous methods have not fully harnessed the potential of LLMs for enhancing data quality. The responses within IFT data could be further enhanced by leveraging the capabilities of LLMs themselves. In this paper, we propose CoEvol, an LLM-based multi-agent cooperation framework for the improvement of responses to instructions. To effectively refine the responses, we develop an iterative framework following a debate-advise-edit-judge paradigm. A two-stage multi-agent debate strategy is further devised to ensure the diversity and reliability of editing suggestions within the framework. Empirically, models equipped with CoEvol outperform competitive baselines evaluated by MT-Bench and AlpacaEval, demonstrating its effectiveness in enhancing instruction-following capabilities for LLMs.

Updated: 2024-06-11 08:35:37

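Title: CoEvol: Constructing Better Responses for Instruction Finetuning through Multi-Agent Cooperation

Abstract: In recent years, instruction fine-tuning (IFT) of large language models (LLMs) has attracted considerable attention as a way to improve model performance on unseen tasks. Attempts have been made to construct IFT data automatically and to select it effectively. However, we argue that previous methods have not fully exploited the potential of LLMs to improve data quality. The responses in IFT data can be improved further by using the capabilities of LLMs themselves. In this paper, we propose CoEvol, an LLM-based multi-agent cooperation framework for improving responses to instructions. To refine responses effectively, we develop an iterative framework that follows a debate-advise-edit-judge paradigm. We further devise a two-stage multi-agent debate strategy to ensure the diversity and reliability of editing suggestions within the framework. Empirically, models equipped with CoEvol outperform competitive baselines on MT-Bench and AlpacaEval, demonstrating its effectiveness in enhancing the instruction-following capabilities of LLMs.

Updated: 2024-06-11 08:35:37

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.07054v1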

TelecomRAG: Taming Telecom Standards with Retrieval Augmented Generation and LLMs

Large Language Models (LLMs) have immense potential to transform the telecommunications industry. They could help professionals understand complex standards, generate code, and accelerate development. However, traditional LLMs struggle with the precision and source verification essential for telecom work. To address this, specialized LLM-based solutions tailored to telecommunication standards are needed. Retrieval-augmented generation (RAG) offers a way to create precise, fact-based answers. This paper proposes TelecomRAG, a framework for a Telecommunication Standards Assistant that provides accurate, detailed, and verifiable responses. Our implementation, using a knowledge base built from 3GPP Release 16 and Release 18 specification documents, demonstrates how this assistant surpasses generic LLMs, offering superior accuracy, technical depth, and verifiability, and thus significant value to the telecommunications field.

Updated: 2024-06-11 08:35:23

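Title: TelecomRAG: Taming Telecom Standards with Retrieval Augmented Generation and LLMs

Abstract: Large Language Models (LLMs) have great potential to transform the telecommunications industry. They can help professionals understand complex standards, generate code, and accelerate development. However, traditional LLMs struggle with the precision and source verification that telecom work requires. Addressing this calls for specialized LLM-based solutions tailored to telecommunication standards. Retrieval-augmented generation (RAG) offers a way to produce precise, fact-based answers. This paper proposes TelecomRAG, a framework for a Telecommunication Standards Assistant that provides accurate, detailed, and verifiable responses. Our implementation, which uses a knowledge base built from 3GPP Release 16 and Release 18 specification documents, demonstrates how the assistant surpasses generic LLMs, offering higher accuracy, technical depth, and verifiability, and thus significant value to the telecommunications field.

Updated: 2024-06-11 08:35:23

Categories: cs.NI,cs.LG

Download: http://arxiv.org/abs/2406.07053v1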

GridPE: Unifying Positional Encoding in Transformers with a Grid Cell-Inspired Framework

Understanding spatial location and relationships is a fundamental capability for modern artificial intelligence systems. Insights from human spatial cognition provide valuable guidance in this domain. Recent neuroscientific discoveries have highlighted the role of grid cells as a fundamental neural component for spatial representation, including distance computation, path integration, and scale discernment. In this paper, we introduce a novel positional encoding scheme inspired by Fourier analysis and the latest findings in computational neuroscience regarding grid cells. Assuming that grid cells encode spatial position through a summation of Fourier basis functions, we demonstrate the translational invariance of the grid representation during inner product calculations. Additionally, we derive an optimal grid scale ratio for multi-dimensional Euclidean spaces based on principles of biological efficiency. Utilizing these computational principles, we have developed a **Grid**-cell inspired **Positional Encoding** technique, termed **GridPE**, for encoding locations within high-dimensional spaces. We integrated GridPE into the Pyramid Vision Transformer architecture. Our theoretical analysis shows that GridPE provides a unifying framework for positional encoding in arbitrary high-dimensional spaces. Experimental results demonstrate that GridPE significantly enhances the performance of transformers, underscoring the importance of incorporating neuroscientific insights into the design of artificial intelligence systems.
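
The translational invariance claimed above can be checked numerically with a small sketch. It assumes each position is encoded by cosine and sine features of a few plane waves, so the inner product of two encodings is a sum of cos(k · (x - y)) terms and depends only on the offset between the positions. The hexagonal wave directions, the three scales, and the scale ratio of 1.5 are placeholder assumptions and are not the optimal ratio derived in the paper.

```python
import math


def make_wave_vectors(n_scales=3, base_freq=1.0, ratio=1.5):
    """2-D wave vectors: three directions 60 degrees apart at each scale.

    `ratio` is a hypothetical scale ratio chosen only for illustration.
    """
    vectors = []
    for s in range(n_scales):
        freq = base_freq * ratio ** s
        for d in range(3):
            angle = d * math.pi / 3
            vectors.append((freq * math.cos(angle), freq * math.sin(angle)))
    return vectors


def grid_encoding(position, wave_vectors):
    """Encode a position as [cos(k.x), sin(k.x)] for every wave vector k."""
    features = []
    for k in wave_vectors:
        phase = sum(ki * xi for ki, xi in zip(k, position))
        features.extend((math.cos(phase), math.sin(phase)))
    return features


def inner(u, v):
    """Plain dot product of two feature lists."""
    return sum(a * b for a, b in zip(u, v))


ks = make_wave_vectors()
x, y, shift = (0.3, -1.2), (2.0, 0.7), (5.5, -3.1)
moved = lambda p: (p[0] + shift[0], p[1] + shift[1])

before = inner(grid_encoding(x, ks), grid_encoding(y, ks))
after = inner(grid_encoding(moved(x), ks), grid_encoding(moved(y), ks))
print(before, after)   # the two values agree up to floating-point error
```

The identity behind this is cos(a)cos(b) + sin(a)sin(b) = cos(a - b), applied once per wave vector.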

Updated: 2024-06-11 08:25:11

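Title: GridPE: Unifying Positional Encoding in Transformers with a Grid Cell-Inspired Framework

Abstract: Understanding spatial location and relationships is a fundamental capability of modern artificial intelligence systems. Insights from human spatial cognition offer valuable guidance in this area. Recent neuroscientific discoveries have highlighted the role of grid cells as a fundamental neural component of spatial representation, including distance computation, path integration, and scale discernment. In this paper, we introduce a novel positional encoding scheme inspired by Fourier analysis and the latest findings in computational neuroscience on grid cells. Assuming that grid cells encode spatial position through a sum of Fourier basis functions, we demonstrate the translational invariance of the grid representation under inner product calculations. We also derive an optimal grid scale ratio for multi-dimensional Euclidean spaces based on principles of biological efficiency. Using these computational principles, we developed a grid-cell-inspired positional encoding technique, termed GridPE, for encoding locations in high-dimensional spaces. We integrated GridPE into the Pyramid Vision Transformer architecture. Our theoretical analysis shows that GridPE provides a unifying framework for positional encoding in arbitrary high-dimensional spaces. Experimental results show that GridPE significantly improves the performance of transformers, underscoring the importance of incorporating neuroscientific insights into the design of artificial intelligence systems.

Updated: 2024-06-11 08:25:11

Categories: cs.NE,cs.LG

Download: http://arxiv.org/abs/2406.07049v1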

Adversarial flows: A gradient flow characterization of adversarial attacks

A popular method to perform adversarial attacks on neuronal networks is the so-called fast gradient sign method and its iterative variant. In this paper, we interpret this method as an explicit Euler discretization of a differential inclusion, where we also show convergence of the discretization to the associated gradient flow. To do so, we consider the concept of p-curves of maximal slope in the case $p=\infty$. We prove existence of $\infty$-curves of maximum slope and derive an alternative characterization via differential inclusions. Furthermore, we also consider Wasserstein gradient flows for potential energies, where we show that curves in the Wasserstein space can be characterized by a representing measure on the space of curves in the underlying Banach space, which fulfill the differential inclusion. The application of our theory to the finite-dimensional setting is twofold: On the one hand, we show that a whole class of normalized gradient descent methods (in particular signed gradient descent) converge, up to subsequences, to the flow, when sending the step size to zero. On the other hand, in the distributional setting, we show that the inner optimization task of adversarial training objective can be characterized via $\infty$-curves of maximum slope on an appropriate optimal transport space.
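
Read as an explicit Euler scheme, the iterative fast gradient sign method takes steps of the form x_{k+1} = x_k + τ · sign(∇L(x_k)). The sketch below illustrates that update on a toy quadratic loss, assuming the usual clipping to an ℓ∞ ball of radius ε around the starting point; the loss, the function names, and the numbers are illustrative assumptions rather than the paper's setting.

```python
def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def signed_gradient_attack(x0, grad, eps, step, n_steps):
    """Iterative signed-gradient ascent, clipped to the l-infinity ball around x0.

    Each iteration is one explicit Euler step of size `step` along
    sign(grad L); a single iteration with step == eps is the one-shot FGSM.
    """
    x = list(x0)
    for _ in range(n_steps):
        g = grad(x)
        x = [xi + step * sign(gi) for xi, gi in zip(x, g)]
        # Project back so the perturbation never exceeds eps per coordinate.
        x = [min(max(xi, ci - eps), ci + eps) for xi, ci in zip(x, x0)]
    return x


# Toy loss L(x) = 0.5 * ||x||^2, whose gradient is x itself.
toy_grad = lambda x: list(x)

iterative = signed_gradient_attack([1.0, -2.0], toy_grad, eps=0.3, step=0.1, n_steps=5)
one_shot = signed_gradient_attack([1.0, -2.0], toy_grad, eps=0.3, step=0.3, n_steps=1)
print(iterative, one_shot)
```

Shrinking `step` while increasing `n_steps` refines the discretization, which is the regime in which the convergence result described above applies.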

Updated: 2024-06-11 08:20:26

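Title: Adversarial flows: A gradient flow characterization of adversarial attacks

Abstract: A popular method for performing adversarial attacks on neural networks is the so-called fast gradient sign method and its iterative variant. In this paper, we interpret this method as an explicit Euler discretization of a differential inclusion, and we show that the discretization converges to the associated gradient flow. To do so, we consider the concept of p-curves of maximal slope in the case $p=\infty$. We prove the existence of $\infty$-curves of maximal slope and derive an alternative characterization via differential inclusions. We also consider Wasserstein gradient flows for potential energies, showing that curves in the Wasserstein space can be characterized by a representing measure on the space of curves in the underlying Banach space that satisfy the differential inclusion. Our theory applies to the finite-dimensional setting in two ways. On the one hand, we show that a whole class of normalized gradient descent methods (in particular signed gradient descent) converges, up to subsequences, to the flow as the step size is sent to zero. On the other hand, in the distributional setting, we show that the inner optimization task of the adversarial training objective can be characterized via $\infty$-curves of maximal slope on an appropriate optimal transport space.

Updated: 2024-06-11 08:20:26

Categories: cs.LG,math.AP,49Q20, 34A60, 68Q32, 65K15

Download: http://arxiv.org/abs/2406.05376v2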

DISCO: An End-to-End Bandit Framework for Personalised Discount Allocation

Personalised discount codes provide a powerful mechanism for managing customer relationships and operational spend in e-commerce. Bandits are well suited for this product area, given the partial information nature of the problem, as well as the need for adaptation to the changing business environment. Here, we introduce DISCO, an end-to-end contextual bandit framework for personalised discount code allocation at ASOS. DISCO adapts the traditional Thompson Sampling algorithm by integrating it within an integer program, thereby allowing for operational cost control. Because bandit learning is often worse with high-dimensional actions, we focused on building low-dimensional action and context representations that were nonetheless capable of good accuracy. Additionally, we sought to build a model that preserved the relationship between price and sales, in which customers increase their purchasing in response to lower prices ("negative price elasticity"). These aims were achieved by using radial basis functions to represent the continuous (i.e. infinite-armed) action space, in combination with context embeddings extracted from a neural network. These feature representations were used within a Thompson Sampling framework to facilitate exploration, and further integrated with an integer program to allocate discount codes across ASOS's customer base. These modelling decisions result in a reward model that (a) enables pooled learning across similar actions, (b) is highly accurate, including in extrapolation, and (c) preserves the expected negative price elasticity. Through offline analysis, we show that DISCO is able to effectively enact exploration and improves its performance over time, despite the global constraint. Finally, we subjected DISCO to a rigorous online A/B test, and found that it achieves a significant improvement of >1% in average basket value, relative to the legacy systems.
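
The radial basis function representation of a continuous action can be illustrated with a short sketch. It assumes Gaussian bumps centred on a handful of discount levels; the centres, the width, and the function names are hypothetical, and the Thompson Sampling and integer-programming stages are not shown.

```python
import math


def rbf_features(discount, centers, width):
    """Map a scalar discount level to Gaussian radial basis function features."""
    return [math.exp(-((discount - c) ** 2) / (2 * width ** 2)) for c in centers]


def dot(u, v):
    """Plain dot product, used here as a similarity between feature vectors."""
    return sum(a * b for a, b in zip(u, v))


centers = [0.0, 0.05, 0.10, 0.15, 0.20]    # hypothetical discount levels (0% to 20%)
width = 0.05                               # hypothetical bump width

ten = rbf_features(0.10, centers, width)
eleven = rbf_features(0.11, centers, width)
twenty = rbf_features(0.20, centers, width)

# Nearby discounts share most of their features, distant ones share little.
print(dot(ten, eleven), dot(ten, twenty))
```

Because a 10% and an 11% discount activate nearly the same features, evidence gathered for one informs the estimate for the other, which is the pooled learning across similar actions mentioned above.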

Updated: 2024-06-11 08:16:34

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2406.06433v2

Computing $\varphi(N)$ for an RSA module with a single quantum query

In this paper we give a polynomial-time algorithm to compute $\varphi(N)$ for an RSA modulus $N$, using as input the order modulo $N$ of a randomly chosen integer. The algorithm consists only of a greatest common divisor computation, two multiplications and a division. It succeeds with probability at least $1-\frac{C\log\log N}{N^{1/2}}$.

Updated: 2024-06-11 08:13:09

Categories: cs.CR,quant-ph

Download: http://arxiv.org/abs/2406.04061v2

A Latent Space Metric for Enhancing Prediction Confidence in Earth Observation Data

This study presents a new approach for estimating confidence in machine learning model predictions, specifically in regression tasks utilizing Earth Observation (EO) data, with a particular focus on mosquito abundance (MA) estimation. We take advantage of a Variational AutoEncoder architecture to derive a confidence metric from the latent space representations of EO datasets. This methodology is pivotal in establishing a correlation between the Euclidean distance in latent representations and the Absolute Error (AE) in individual MA predictions. Our research focuses on EO datasets from the Veneto region in Italy and the Upper Rhine Valley in Germany, targeting areas significantly affected by mosquito populations. A key finding is a notable correlation of 0.46 between the AE of MA predictions and the proposed confidence metric. This correlation signifies a robust, new metric for quantifying the reliability and enhancing the trustworthiness of the AI model's predictions in the context of both EO data analysis and mosquito abundance studies.

Updated: 2024-06-11 08:00:22

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2401.17342v2

Integrating Domain Knowledge for handling Limited Data in Offline RL

With the ability to learn from static datasets, Offline Reinforcement Learning (RL) emerges as a compelling avenue for real-world applications. However, state-of-the-art offline RL algorithms perform sub-optimally when confronted with limited data confined to specific regions within the state space. The performance degradation is attributed to the inability of offline RL algorithms to learn appropriate actions for rare or unseen observations. This paper proposes a novel domain knowledge-based regularization technique and adaptively refines the initial domain knowledge to considerably boost performance in limited data with partially omitted states. The key insight is that the regularization term mitigates erroneous actions for sparse samples and unobserved states covered by domain knowledge. Empirical evaluations on standard discrete environment datasets demonstrate a substantial average performance increase of at least 27% compared to existing offline RL algorithms operating on limited data.

Updated: 2024-06-11 07:59:17

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2406.07041v1

Fundamental Limits of Membership Inference Attacks on Machine Learning Models

Membership inference attacks (MIA) can reveal whether a particular data point was part of the training dataset, potentially exposing sensitive information about individuals. This article provides theoretical guarantees by exploring the fundamental statistical limitations associated with MIAs on machine learning models. More precisely, we first derive the statistical quantity that governs the effectiveness and success of such attacks. We then theoretically prove that in a non-linear regression setting with overfitting algorithms, attacks may have a high probability of success. Finally, we investigate several situations for which we provide bounds on this quantity of interest. Interestingly, our findings indicate that discretizing the data might enhance the algorithm's security. Specifically, it is demonstrated to be limited by a constant, which quantifies the diversity of the underlying data distribution. We illustrate those results through two simple simulations.

Updated: 2024-06-11 07:51:48

Categories: stat.ML,cs.AI,cs.LG

Download: http://arxiv.org/abs/2310.13786v4

Paying More Attention to Source Context: Mitigating Unfaithful Translations from Large Language Model

Large language models (LLMs) have showcased impressive multilingual machine translation ability. However, unlike encoder-decoder style models, decoder-only LLMs lack an explicit alignment between source and target contexts. Analyzing contribution scores during generation processes revealed that LLMs can be biased towards previously generated tokens over corresponding source tokens, leading to unfaithful translations. To address this issue, we propose to encourage LLMs to pay more attention to the source context from both source and target perspectives in zero-shot prompting: 1) adjust source context attention weights; 2) suppress irrelevant target prefix influence. Additionally, we propose 3) avoiding over-reliance on the target prefix in instruction tuning. Experimental results on both human-collected unfaithfulness test sets, which focus on LLM-generated unfaithful translations, and general test sets verify our methods' effectiveness across multiple language pairs. Further human evaluation shows our method's efficacy in reducing hallucinatory translations and facilitating faithful translation generation.

Updated: 2024-06-11 07:49:04

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.07036v1

Improving Multi-hop Logical Reasoning in Knowledge Graphs with Context-Aware Query Representation Learning

Multi-hop logical reasoning on knowledge graphs is a pivotal task in natural language processing, with numerous approaches aiming to answer First-Order Logic (FOL) queries. Recent geometry (e.g., box, cone) and probability (e.g., beta distribution)-based methodologies have effectively addressed complex FOL queries. However, a common challenge across these methods lies in determining accurate geometric bounds or probability parameters for these queries. The challenge arises because existing methods rely on linear sequential operations within their computation graphs, overlooking the logical structure of the query and the relation-induced information that can be gleaned from the relations of the query, which we call the context of the query. To address the problem, we propose a model-agnostic methodology that enhances the effectiveness of existing multi-hop logical reasoning approaches by fully integrating the context of the FOL query graph. Our approach distinctively discerns (1) the structural context inherent to the query structure and (2) the relation-induced context unique to each node in the query graph as delineated in the corresponding knowledge graph. This dual-context paradigm helps nodes within a query graph attain refined internal representations throughout the multi-hop reasoning steps. Through experiments on two datasets, our method consistently enhances the three multi-hop reasoning foundation models, achieving performance improvements of up to 19.5%. Our code is available at https://github.com/kjh9503/caqr.

Updated: 2024-06-11 07:48:20

Categories: cs.AI,cs.CL

Download: http://arxiv.org/abs/2406.07034v1

Effectiveness Assessment of Recent Large Vision-Language Models

The advent of large vision-language models (LVLMs) represents a remarkable advance in the quest for artificial general intelligence. However, the model's effectiveness in both specialized and general tasks warrants further investigation. This paper endeavors to evaluate the competency of popular LVLMs in specialized and general tasks, respectively, aiming to offer a comprehensive understanding of these novel models. To gauge their effectiveness in specialized tasks, we employ six challenging tasks in three different application scenarios: natural, healthcare, and industrial. These six tasks include salient/camouflaged/transparent object detection, as well as polyp detection, skin lesion detection, and industrial anomaly detection. We examine the performance of three recent open-source LVLMs, including MiniGPT-v2, LLaVA-1.5, and Shikra, on both visual recognition and localization in these tasks. Moreover, we conduct empirical investigations utilizing the aforementioned LVLMs together with GPT-4V, assessing their multi-modal understanding capabilities in general tasks including object counting, absurd question answering, affordance reasoning, attribute recognition, and spatial relation reasoning. Our investigations reveal that these LVLMs demonstrate limited proficiency not only in specialized tasks but also in general tasks. We delve deep into this inadequacy and uncover several potential factors, including limited cognition in specialized tasks, object hallucination, text-to-image interference, and decreased robustness in complex problems. We hope that this study can provide useful insights for the future development of LVLMs, helping researchers improve LVLMs for both general and specialized applications.

Updated: 2024-06-11 07:42:51

Categories: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2403.04306v4

Fairness-Aware Meta-Learning via Nash Bargaining

To address issues of group-level fairness in machine learning, it is natural to adjust model parameters based on specific fairness objectives over a sensitive-attributed validation set. Such an adjustment procedure can be cast within a meta-learning framework. However, naive integration of fairness goals via meta-learning can cause hypergradient conflicts for subgroups, resulting in unstable convergence and compromising model performance and fairness. To navigate this issue, we frame the resolution of hypergradient conflicts as a multi-player cooperative bargaining game. We introduce a two-stage meta-learning framework in which the first stage involves the use of a Nash Bargaining Solution (NBS) to resolve hypergradient conflicts and steer the model toward the Pareto front, and the second stage optimizes with respect to specific fairness goals. Our method is supported by theoretical results, notably a proof of the NBS for gradient aggregation free from linear independence assumptions, a proof of Pareto improvement, and a proof of monotonic improvement in validation loss. We also show empirical effects across various fairness objectives in six key fairness datasets and two image classification tasks.

Updated: 2024-06-11 07:34:15

Categories: cs.LG

Download: http://arxiv.org/abs/2406.07029v1

On the Distributed Evaluation of Generative Models

The evaluation of deep generative models has been extensively studied in the centralized setting, where the reference data are drawn from a single probability distribution. On the other hand, several applications of generative models concern distributed settings, e.g. the federated learning setting, where the reference data for conducting evaluation are provided by several clients in a network. In this paper, we study the evaluation of generative models in such distributed contexts with potentially heterogeneous data distributions across clients. We focus on the widely-used distance-based evaluation metrics, Fréchet Inception Distance (FID) and Kernel Inception Distance (KID). In the case of the KID metric, we prove that scoring a group of generative models using the clients' averaged KID score will result in the same ranking as that of a centralized KID evaluation over a collective reference set containing all the clients' data. In contrast, we show the same result does not apply to the FID-based evaluation. We provide examples in which two generative models are assigned the same FID score by each client in a distributed setting, while the centralized FID scores of the two models are significantly different. We perform several numerical experiments on standard image datasets and generative models to support our theoretical results on the distributed evaluation of generative models using FID and KID scores.
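The KID ranking claim can be checked numerically with a small sketch. It rests on assumptions that are not from the paper: random vectors stand in for Inception features, every client holds the same number of samples, and the biased (V-statistic) MMD estimator with the cubic polynomial kernel is used. Under these assumptions the client-averaged score and the centralized score differ only by a term that does not depend on the generative model, so the gap between any two models is identical under both schemes.

```python
import numpy as np

def poly_kernel(x, y):
    """Cubic polynomial kernel commonly used for KID: (x.y / d + 1)^3."""
    d = x.shape[1]
    return (x @ y.T / d + 1.0) ** 3

def kid_biased(gen, ref):
    """Biased (V-statistic) estimate of squared MMD between two feature sets."""
    return (poly_kernel(gen, gen).mean()
            + poly_kernel(ref, ref).mean()
            - 2.0 * poly_kernel(gen, ref).mean())

def averaged_kid(gen, client_refs):
    """Mean of the per-client scores (distributed evaluation)."""
    return float(np.mean([kid_biased(gen, ref) for ref in client_refs]))

def centralized_kid(gen, client_refs):
    """Score against the pooled reference set (centralized evaluation)."""
    return float(kid_biased(gen, np.concatenate(client_refs, axis=0)))

rng = np.random.default_rng(0)
# Three clients with heterogeneous feature distributions (shifted means).
clients = [rng.normal(loc=shift, size=(50, 8)) for shift in (-1.0, 0.0, 2.0)]
model_a = rng.normal(loc=0.3, size=(60, 8))  # stand-in features from model A
model_b = rng.normal(loc=1.5, size=(60, 8))  # stand-in features from model B

gap_averaged = averaged_kid(model_a, clients) - averaged_kid(model_b, clients)
gap_centralized = centralized_kid(model_a, clients) - centralized_kid(model_b, clients)
```

Only the cross term between generated and reference samples involves both the model and the clients, and it is linear in the reference set. FID has no such linearity, since it depends on the pooled mean and covariance, which is consistent with the abstract's negative result for FID.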

Updated: 2024-06-11 07:33:04

Categories: cs.LG

Download: http://arxiv.org/abs/2310.11714v4

Heterogeneous Learning Rate Scheduling for Neural Architecture Search on Long-Tailed Datasets

In this paper, we attempt to address the challenge of applying Neural Architecture Search (NAS) algorithms, specifically the Differentiable Architecture Search (DARTS), to long-tailed datasets where class distribution is highly imbalanced. We observe that traditional re-sampling and re-weighting techniques, which are effective in standard classification tasks, lead to performance degradation when combined with DARTS. To mitigate this, we propose a novel adaptive learning rate scheduling strategy tailored for the architecture parameters of DARTS when integrated with the Bilateral Branch Network (BBN) for handling imbalanced datasets. Our approach dynamically adjusts the learning rate of the architecture parameters based on the training epoch, preventing the disruption of well-trained representations in the later stages of training. Additionally, we explore the impact of branch mixing factors on the algorithm's performance. Through extensive experiments on the CIFAR-10 dataset with an artificially induced long-tailed distribution, we demonstrate that our method achieves comparable accuracy to using DARTS alone. The experimental results also suggest that re-sampling methods inherently harm the performance of the DARTS algorithm. Our findings highlight the importance of careful data augmentation when applying differentiable NAS (DNAS) to imbalanced learning scenarios.

Updated: 2024-06-11 07:32:25

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2406.07028v1

LangCell: Language-Cell Pre-training for Cell Identity Understanding

Cell identity encompasses various semantic aspects of a cell, including cell type, pathway information, disease information, and more, which are essential for biologists to gain insights into its biological characteristics. Understanding cell identity from the transcriptomic data, such as annotating cell types, has become an important task in bioinformatics. As these semantic aspects are determined by human experts, it is impossible for AI models to effectively carry out cell identity understanding tasks without the supervision signals provided by single-cell and label pairs. The single-cell pre-trained language models (PLMs) currently used for this task are trained only on a single modality, transcriptomics data, and lack an understanding of cell identity knowledge. As a result, they have to be fine-tuned for downstream tasks and struggle when lacking labeled data with the desired semantic labels. To address this issue, we propose an innovative solution by constructing a unified representation of single-cell data and natural language during the pre-training phase, allowing the model to directly incorporate insights related to cell identity. More specifically, we introduce $\textbf{LangCell}$, the first $\textbf{Lang}$uage-$\textbf{Cell}$ pre-training framework. LangCell utilizes texts enriched with cell identity information to gain a profound comprehension of cross-modal knowledge. Results from experiments conducted on different benchmarks show that LangCell is the only single-cell PLM that can work effectively in zero-shot cell identity understanding scenarios, and also significantly outperforms existing models in few-shot and fine-tuning cell identity understanding scenarios.

Updated: 2024-06-11 07:31:13

Categories: q-bio.GN,cs.AI,cs.CL

Download: http://arxiv.org/abs/2405.06708v5

Muffliato: Peer-to-Peer Privacy Amplification for Decentralized Optimization and Averaging

Decentralized optimization is increasingly popular in machine learning for its scalability and efficiency. Intuitively, it should also provide better privacy guarantees, as nodes only observe the messages sent by their neighbors in the network graph. But formalizing and quantifying this gain is challenging: existing results are typically limited to Local Differential Privacy (LDP) guarantees that overlook the advantages of decentralization. In this work, we introduce pairwise network differential privacy, a relaxation of LDP that captures the fact that the privacy leakage from a node $u$ to a node $v$ may depend on their relative position in the graph. We then analyze the combination of local noise injection with (simple or randomized) gossip averaging protocols on fixed and random communication graphs. We also derive a differentially private decentralized optimization algorithm that alternates between local gradient descent steps and gossip averaging. Our results show that our algorithms amplify privacy guarantees as a function of the distance between nodes in the graph, matching the privacy-utility trade-off of the trusted curator, up to factors that explicitly depend on the graph topology. Finally, we illustrate our privacy gains with experiments on synthetic and real-world datasets.
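The basic pattern of local noise injection followed by gossip averaging can be sketched as follows. This is an illustrative toy, not the Muffliato protocol itself: the ring topology, the uniform 1/3 weights, the synchronous rounds, and the helper names `ring_gossip_matrix` and `noisy_gossip_average` are all assumptions, and no privacy accounting is performed.

```python
import numpy as np

def ring_gossip_matrix(n):
    """Symmetric, doubly stochastic mixing matrix for a ring of n nodes."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] += 1.0 / 3.0
        W[i, (i - 1) % n] += 1.0 / 3.0
        W[i, (i + 1) % n] += 1.0 / 3.0
    return W

def noisy_gossip_average(values, W, sigma, n_rounds, rng):
    """Each node perturbs its value once, then all nodes mix with neighbours."""
    x = values + rng.normal(0.0, sigma, size=values.shape)  # local noise injection
    noisy_mean = x.mean()  # the quantity gossip converges to
    for _ in range(n_rounds):
        x = W @ x          # one synchronous gossip round
    return x, noisy_mean
```

Since `W` is doubly stochastic, every round preserves the network-wide mean, so all nodes converge to the average of the noised values. A node only ever sees its neighbours' mixed messages, which is the intuition behind a privacy guarantee that improves with graph distance.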

Updated: 2024-06-11 07:30:48

Categories: cs.CR,cs.DC,cs.LG

Download: http://arxiv.org/abs/2206.05091v3

Entropy-Reinforced Planning with Large Language Models for Drug Discovery

The objective of drug discovery is to identify chemical compounds that possess specific pharmaceutical properties toward a binding target. Existing large language models (LLMs) can achieve high token matching scores in terms of likelihood for molecule generation. However, relying solely on LLM decoding often results in the generation of molecules that are either invalid due to a single misused token, or suboptimal due to unbalanced exploration and exploitation as a consequence of the LLM's prior experience. Here we propose ERP, Entropy-Reinforced Planning for Transformer Decoding, which employs an entropy-reinforced planning algorithm to enhance the Transformer decoding process and strike a balance between exploitation and exploration. ERP aims to achieve improvements in multiple properties compared to direct sampling from the Transformer. We evaluated ERP on the SARS-CoV-2 virus (3CLPro) and human cancer cell target protein (RTCB) benchmarks and demonstrated that, in both benchmarks, ERP consistently outperforms the current state-of-the-art algorithm by 1-5 percent, and baselines by 5-10 percent, respectively. Moreover, such improvement is robust across Transformer models trained with different objectives. Finally, to further illustrate the capabilities of ERP, we tested our algorithm on three code generation benchmarks and outperformed the current state-of-the-art approach as well. Our code is publicly available at: https://github.com/xuefeng-cs/ERP.

Updated: 2024-06-11 07:29:13

Categories: cs.LG,cs.AI,q-bio.QM,stat.ML

Download: http://arxiv.org/abs/2406.07025v1

Learning Discrete Latent Variable Structures with Tensor Rank Conditions

Unobserved discrete data are ubiquitous in many scientific disciplines, and how to learn the causal structure of these latent variables is crucial for uncovering data patterns. Most studies focus on the linear latent variable model or impose strict constraints on latent structures, which fail to address cases in discrete data involving non-linear relationships or complex latent structures. To address such cases, we explore a tensor rank condition on contingency tables for an observed variable set $\mathbf{X}_p$, showing that the rank is determined by the minimum support of a specific conditional set (not necessarily in $\mathbf{X}_p$) that d-separates all variables in $\mathbf{X}_p$. In this way, one can locate latent variables by probing the rank on different sets of observed variables, and further identify the latent causal structure under some structural assumptions. We present the corresponding identification algorithm and conduct simulated experiments to verify the effectiveness of our method. In general, our results elegantly extend the identification boundary for causal discovery with discrete latent variables and expand the application scope of causal discovery with latent variables.

Updated: 2024-06-11 07:25:17

Categories: cs.LG

Download: http://arxiv.org/abs/2406.07020v1

MoreauPruner: Robust Pruning of Large Language Models against Weight Perturbations

Few-shot gradient methods have been extensively utilized in existing model pruning methods, where the model weights are regarded as static values and the effects of potential weight perturbations are not considered. However, the widely used large language models (LLMs) have several billion model parameters, which could increase the fragility of few-shot gradient pruning. In this work, we experimentally show that one-shot gradient pruning algorithms could lead to unstable results under perturbations to model weights. Moreover, even the minor error introduced by switching between the data formats bfloat16 and float16 could result in drastically different outcomes. To address such instabilities, we leverage optimization analysis and propose an LLM structural pruning method, called MoreauPruner, with provable robustness against weight perturbations. In MoreauPruner, the model weight importance is estimated based on the neural network's Moreau envelope, which can be flexibly combined with $\ell_1$-norm regularization techniques to induce the sparsity required in the pruning task. We extensively evaluate the MoreauPruner algorithm on several well-known LLMs, including LLaMA-7B, LLaMA-13B, LLaMA3-8B, and Vicuna-7B. Our numerical results suggest the robustness of MoreauPruner against weight perturbations, and indicate the MoreauPruner's successful accuracy-based scores in comparison to several existing pruning methods. We have released the code at https://github.com/ShiningSord/MoreauPruner.
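For readers unfamiliar with the Moreau envelope, a one-dimensional toy may help. The envelope of a function f with parameter lam is the minimum over y of f(y) + (x - y)^2 / (2 * lam), and the minimiser is the proximal operator of f. The sketch below evaluates this for f(y) = |y|, where the proximal operator is soft-thresholding and the envelope equals the Huber function. It illustrates only the smoothing idea and the link to $\ell_1$ regularization; it is not the pruning criterion used by MoreauPruner, and all function names are illustrative.

```python
def soft_threshold(x, lam):
    """Proximal operator of lam * |.|: shrink x toward zero by lam."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

def moreau_envelope_abs(x, lam):
    """Moreau envelope of |.| at x, evaluated at the proximal point."""
    y = soft_threshold(x, lam)
    return abs(y) + (x - y) ** 2 / (2.0 * lam)

def huber(x, lam):
    """Closed form of the same envelope: quadratic near zero, linear in the tails."""
    if abs(x) <= lam:
        return x * x / (2.0 * lam)
    return abs(x) - lam / 2.0
```

The envelope is differentiable even where |.| is not, and small changes in x produce small changes in its value, which is the kind of stability under perturbation the abstract appeals to.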

Updated: 2024-06-11 07:19:04

Categories: cs.LG,cs.CL

Download: http://arxiv.org/abs/2406.07017v1

ICC: Quantifying Image Caption Concreteness for Multimodal Dataset Curation

Web-scale training on paired text-image data is becoming increasingly central to multimodal learning, but is challenged by the highly noisy nature of datasets in the wild. Standard data filtering approaches succeed in removing mismatched text-image pairs, but permit semantically related but highly abstract or subjective text. These approaches lack the fine-grained ability to isolate the most concrete samples that provide the strongest signal for learning in a noisy dataset. In this work, we propose a new metric, image caption concreteness, that evaluates caption text without an image reference to measure its concreteness and relevancy for use in multimodal learning. Our approach leverages strong foundation models for measuring visual-semantic information loss in multimodal representations. We demonstrate that this strongly correlates with human evaluation of concreteness in both single-word and sentence-level texts. Moreover, we show that curation using ICC complements existing approaches: It succeeds in selecting the highest quality samples from multimodal web-scale datasets to allow for efficient training in resource-constrained settings.

Updated: 2024-06-11 07:18:44

Categories: cs.LG,cs.CV

Download: http://arxiv.org/abs/2403.01306v3

Delving into ChatGPT usage in academic writing through excess vocabulary

Recent large language models (LLMs) can generate and revise text with human-level performance, and have been widely commercialized in systems like ChatGPT. These models come with clear limitations: they can produce inaccurate information, reinforce existing biases, and be easily misused. Yet, many scientists have been using them to assist their scholarly writing. How widespread is LLM usage in the academic literature currently? To answer this question, we use an unbiased, large-scale approach, free from any assumptions on academic LLM usage. We study vocabulary changes in 14 million PubMed abstracts from 2010-2024, and show how the appearance of LLMs led to an abrupt increase in the frequency of certain style words. Our analysis based on excess word usage suggests that at least 10% of 2024 abstracts were processed with LLMs. This lower bound differed across disciplines, countries, and journals, and was as high as 30% for some PubMed sub-corpora. We show that the appearance of LLM-based writing assistants has had an unprecedented impact in the scientific literature, surpassing the effect of major world events such as the Covid pandemic.
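The reasoning behind an excess-vocabulary lower bound can be sketched in a few lines. The abstract does not spell out the estimator, so everything here is an assumption: the counterfactual frequency is taken from a least-squares line through pre-LLM years, and the helper names are hypothetical. The idea is that if a word appears in a fraction p of abstracts when only q was expected, then at least p - q of the abstracts must have been affected.

```python
def expected_frequency(freq_by_year, target_year):
    """Extrapolate a least-squares line through pre-LLM yearly frequencies."""
    years = sorted(freq_by_year)
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(freq_by_year[y] for y in years) / n
    sxx = sum((y - mean_x) ** 2 for y in years)
    sxy = sum((y - mean_x) * (freq_by_year[y] - mean_y) for y in years)
    slope = sxy / sxx if sxx else 0.0  # a single year gives a flat projection
    return mean_y + slope * (target_year - mean_x)

def excess_gap(observed, expected):
    """Fraction of abstracts containing the word beyond the counterfactual."""
    return max(observed - expected, 0.0)

# Made-up frequencies for illustration only; these are not results from the paper.
toy_history = {2021: 0.010, 2022: 0.012}
toy_expected = expected_frequency(toy_history, 2024)
toy_gap = excess_gap(0.050, toy_expected)
```

The gap for a single word gives a conservative estimate, because abstracts edited with an LLM need not contain that particular word.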

Updated: 2024-06-11 07:16:34

Categories: cs.CL,cs.AI,cs.CY,cs.DL,cs.SI

Download: http://arxiv.org/abs/2406.07016v1
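
The excess-vocabulary idea in the entry above can be illustrated with a small sketch. The abstract does not spell out the estimator, so the following is an assumption: a word's excess is measured as the gap and the ratio between its observed frequency and an expected (counterfactual) frequency. The toy corpora and the function names `word_frequency` and `excess_usage` are hypothetical, and the "before" frequency is used here as a crude stand-in for a properly projected expectation.

```python
def word_frequency(abstracts, word):
    """Fraction of abstracts that contain `word` at least once."""
    if not abstracts:
        return 0.0
    hits = sum(1 for text in abstracts if word in text.lower().split())
    return hits / len(abstracts)


def excess_usage(observed_freq, expected_freq):
    """Return (gap, ratio): absolute and relative excess of a word."""
    gap = observed_freq - expected_freq
    ratio = observed_freq / expected_freq if expected_freq > 0 else float("inf")
    return gap, ratio


# Toy corpora, made up purely for illustration.
before = ["we delve into cells", "we analyze the data",
          "results are shown", "we study mice"]
after = ["we delve into cells", "we delve into the data",
         "results are shown", "we study mice"]

expected = word_frequency(before, "delve")  # stand-in for a counterfactual
observed = word_frequency(after, "delve")
gap, ratio = excess_usage(observed, expected)
print(gap, ratio)  # prints: 0.25 2.0
```

Under this reading, the gap acts as a lower bound: if a word appears in a fraction `observed` of abstracts where only `expected` was anticipated, at least `observed - expected` of the abstracts must have changed.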

Post-train Black-box Defense via Bayesian Boundary Correction

Classifiers based on deep neural networks are susceptible to adversarial attacks, and this widespread vulnerability has motivated research on defending them against potential threats. Given a vulnerable classifier, existing defense methods are mostly white-box and often require re-training the victim under modified loss functions or training regimes. The model, data, and training specifics of the victim are usually unavailable to the user, and re-training is unappealing, if not impossible, for reasons such as limited computational resources. To this end, we propose a new post-train black-box defense framework. It can turn any pre-trained classifier into a resilient one with little knowledge of the model specifics. This is achieved by new joint Bayesian treatments of the clean data, the adversarial examples, and the classifier, which maximize their joint probability. It is further equipped with a new post-train strategy that keeps the victim intact, avoiding re-training. We name our framework Bayesian Boundary Correction (BBC). BBC is a general and flexible framework that can easily adapt to different data types. We instantiate BBC for image classification and skeleton-based human activity recognition, for both static and dynamic data. Exhaustive evaluation shows that, compared with existing defense methods, BBC has superior robustness and can enhance robustness without severely hurting clean accuracy.

Updated: 2024-06-11 07:14:18

Categories: cs.CV,cs.CR

Download: http://arxiv.org/abs/2306.16979v3

WeatherGNN: Exploiting Meteo- and Spatial-Dependencies for Local Numerical Weather Prediction Bias-Correction

Due to insufficient local area information, numerical weather prediction (NWP) may yield biases for specific areas. Previous studies correct biases mainly by employing handcrafted features or applying data-driven methods intuitively, overlooking the complicated dependencies between weather factors and between areas. To address this issue, we propose WeatherGNN, a local NWP bias-correction method that utilizes Graph Neural Networks (GNNs) to exploit meteorological dependencies and spatial dependencies under the guidance of domain knowledge. Specifically, we introduce a factor GNN to capture area-specific meteorological dependencies adaptively based on spatial heterogeneity, and a fast hierarchical GNN to capture dynamic spatial dependencies efficiently, guided by Tobler's first and second laws of geography. Our experimental results on two real-world datasets demonstrate that WeatherGNN achieves state-of-the-art performance, outperforming the best baseline by an average of 4.75% on RMSE.

Updated: 2024-06-11 07:13:13

Categories: cs.LG

Download: http://arxiv.org/abs/2310.05517v2

Breaking Free: Efficient Multi-Party Private Set Union Without Non-Collusion Assumptions

A multi-party private set union (MPSU) protocol enables $m$ $(m > 2)$ parties, each holding a set, to collectively compute the union of their sets without revealing any additional information to other parties. There are two main categories of MPSU protocols. The first builds on public-key techniques. All existing works in this category involve a super-linear number of public-key operations, resulting in poor practical efficiency. The second builds on oblivious transfer and symmetric-key techniques. The only existing work in this category was proposed by Liu and Gao (ASIACRYPT 2023), and it features the best concrete performance among all existing protocols, despite its super-linear computation and communication. Unfortunately, it does not achieve standard semi-honest security, as it inherently relies on a non-collusion assumption, which is unlikely to hold in practice. Therefore, the problem of constructing a practical MPSU protocol based on oblivious transfer and symmetric-key techniques in the standard semi-honest model remains open. Furthermore, no MPSU protocol achieves both linear computation and linear communication complexity, which leaves another unresolved problem. In this work, we resolve these two open problems. We propose the first MPSU protocol based on oblivious transfer and symmetric-key techniques in the standard semi-honest model. This protocol is $4.9-9.3 \times$ faster than Liu and Gao in the LAN setting. Concretely, our protocol requires only $3.6$ seconds in the online phase for 3 parties with sets of $2^{20}$ items each. We also propose the first MPSU protocol achieving both linear computation and linear communication complexity, based on public-key operations. This protocol has the lowest overall communication costs and shows a factor of $3.0-36.5\times$ improvement in terms of overall communication compared to Liu and Gao.

Updated: 2024-06-11 07:10:45

Categories: cs.CR

Download: http://arxiv.org/abs/2406.07011v1

Eye-for-an-eye: Appearance Transfer with Semantic Correspondence in Diffusion Models

As pretrained text-to-image diffusion models have become a useful tool for image synthesis, people want to specify the results in various ways. In this paper, we introduce a method to produce results with the same structure of a target image but painted with colors from a reference image, i.e., appearance transfer, especially following the semantic correspondence between the result and the reference. E.g., the result wing takes color from the reference wing, not the reference head. Existing methods rely on the query-key similarity within self-attention layer, usually producing defective results. To this end, we propose to find semantic correspondences and explicitly rearrange the features according to the semantic correspondences. Extensive experiments show the superiority of our method in various aspects: preserving the structure of the target and reflecting the color from the reference according to the semantic correspondences, even when the two images are not aligned.

Updated: 2024-06-11 07:08:48

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2406.07008v1
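
The Eye-for-an-eye entry above describes finding semantic correspondences and rearranging features accordingly. A minimal sketch of that idea follows, under strong assumptions: correspondences are taken as the nearest neighbor under cosine similarity between per-location descriptors, and the "features" are plain labels. The abstract does not say how the authors compute correspondences, and the helper names `cosine`, `match`, and `rearrange` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def match(target_desc, reference_desc):
    """For each target location, index of the most similar reference location."""
    return [
        max(range(len(reference_desc)),
            key=lambda j: cosine(t, reference_desc[j]))
        for t in target_desc
    ]


def rearrange(reference_feats, matches):
    """Reorder reference features so they line up with target locations."""
    return [reference_feats[j] for j in matches]


# Toy example: the target is ordered (wing, head), the reference (head, wing).
target_desc = [[1.0, 0.0], [0.0, 1.0]]
reference_desc = [[0.1, 0.9], [0.9, 0.2]]
reference_colors = ["brown", "blue"]  # appearance at reference (head, wing)

matches = match(target_desc, reference_desc)
print(rearrange(reference_colors, matches))  # prints: ['blue', 'brown']
```

In this toy case the target wing picks up the reference wing's color even though the two images list their parts in a different order, which is the behavior the abstract describes for unaligned images.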

ChatLang-8: An LLM-Based Synthetic Data Generation Framework for Grammatical Error Correction

We explore and improve the capabilities of LLMs to generate data for grammatical error correction (GEC). When merely producing parallel sentences, their patterns are too simplistic to be valuable as a corpus. To address this issue, we propose an automated framework that includes a Subject Selector, Grammar Selector, Prompt Manager, and Evaluator. Additionally, we introduce a new dataset for GEC tasks, named ChatLang-8, which encompasses eight types of subject nouns and 23 types of grammar. It consists of 1 million pairs featuring human-like grammatical errors. Our experiments reveal that ChatLang-8 exhibits a more uniform pattern composition compared to existing GEC datasets. Furthermore, we observe improved model performance when using ChatLang-8 instead of existing GEC datasets. The experimental results suggest that our framework and ChatLang-8 are valuable resources for enhancing ChatGPT's data generation capabilities.

Updated: 2024-06-11 07:06:34

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.03202v2

DecoR: Deconfounding Time Series with Robust Regression

Causal inference on time series data is a challenging problem, especially in the presence of unobserved confounders. This work focuses on estimating the causal effect between two time series, which are confounded by a third, unobserved time series. Assuming spectral sparsity of the confounder, we show how in the frequency domain this problem can be framed as an adversarial outlier problem. We introduce Deconfounding by Robust regression (DecoR), a novel approach that estimates the causal effect using robust linear regression in the frequency domain. Considering two different robust regression techniques, we first improve existing bounds on the estimation error for such techniques. Crucially, our results do not require distributional assumptions on the covariates. We can therefore use them in time series settings. Applying these results to DecoR, we prove, under suitable assumptions, upper bounds for the estimation error of DecoR that imply consistency. We show DecoR's effectiveness through experiments on synthetic data. Our experiments furthermore suggest that our method is robust with respect to model misspecification.

Updated: 2024-06-11 06:59:17

Categories: stat.ML,cs.LG,62F12 (Primary) 62F35 (Secondary),I.2.0

Download: http://arxiv.org/abs/2406.07005v1

Mitigating Boundary Ambiguity and Inherent Bias for Text Classification in the Era of Large Language Models

Text classification is a crucial task encountered frequently in practical scenarios, yet it is still under-explored in the era of large language models (LLMs). This study shows that LLMs are vulnerable to changes in the number and arrangement of options in text classification. Our extensive empirical analyses reveal that the key bottleneck arises from ambiguous decision boundaries and inherent biases towards specific tokens and positions. To mitigate these issues, we make the first attempt and propose a novel two-stage classification framework for LLMs. Our approach is grounded in the empirical observation that pairwise comparisons can effectively alleviate boundary ambiguity and inherent bias. Specifically, we begin with a self-reduction technique to efficiently narrow down numerous options, which contributes to a reduced decision space and a faster comparison process. Subsequently, pairwise contrastive comparisons are employed in a chain-of-thought manner to draw out nuances and distinguish confusable options, thus refining the ambiguous decision boundary. Extensive experiments on four datasets (Banking77, HWU64, LIU54, and Clinic150) verify the effectiveness of our framework. Furthermore, benefitting from our framework, various LLMs can achieve consistent improvements. Our code and data are available at https://github.com/Chuge0335/PC-CoT.

Updated: 2024-06-11 06:53:19

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.07001v1
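
The two-stage scheme in the entry above (narrow the options, then compare them in pairs) can be sketched as follows. This is only a control-flow illustration: in the paper both stages are carried out by an LLM, whereas here the hypothetical callables `score` and `prefer` are simple word-overlap stand-ins, and the sequential knockout order is an assumption rather than the authors' procedure.

```python
def reduce_options(options, score, keep):
    """Stage 1 (self-reduction): keep the `keep` highest-scoring options."""
    return sorted(options, key=score, reverse=True)[:keep]


def pairwise_select(candidates, prefer):
    """Stage 2: sequential pairwise comparisons; the survivor is the label."""
    winner = candidates[0]
    for challenger in candidates[1:]:
        winner = prefer(winner, challenger)
    return winner


# Toy stand-ins for the LLM calls (hypothetical, for illustration only).
text = "i lost my card yesterday"
labels = ["lost_or_stolen_card", "card_arrival",
          "exchange_rate", "card_not_working"]


def score(label):
    """Number of words shared between the input text and the label name."""
    return len(set(text.split()) & set(label.split("_")))


def prefer(a, b):
    """Return the better-scoring of two labels; ties keep the first."""
    return a if score(a) >= score(b) else b


shortlist = reduce_options(labels, score, keep=2)
prediction = pairwise_select(shortlist, prefer)
print(shortlist, prediction)
```

With `keep` candidates left after reduction, the second stage needs only `keep - 1` comparisons, which is why shrinking the option set first speeds up the pairwise step.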

Unified Low-rank Compression Framework for Click-through Rate Prediction

Deep Click-Through Rate (CTR) prediction models play an important role in modern industrial recommendation scenarios. However, high memory overhead and computational costs limit their deployment in resource-constrained environments. Low-rank approximation is an effective compression method for computer vision and natural language processing models, but its application to CTR prediction models has been less explored. Due to limited memory and computing resources, compression of CTR prediction models often confronts three fundamental challenges: (1) How to reduce model size to fit edge devices? (2) How to speed up CTR prediction model inference? (3) How to retain the capabilities of the original models after compression? Previous low-rank compression research mostly uses tensor decomposition, which can achieve a high parameter compression ratio but introduces AUC degradation and additional computing overhead. To address these challenges, we propose a unified low-rank decomposition framework for compressing CTR prediction models. We find that even with the most classic matrix decomposition method, SVD, our framework can achieve better performance than the original model. To further improve the effectiveness of our framework, we locally compress the output features instead of compressing the model weights. Our unified low-rank compression framework can be applied to embedding tables and MLP layers in various CTR prediction models. Extensive experiments on two academic datasets and one real industrial benchmark demonstrate that, with 3-5x model size reduction, our compressed models can achieve both faster inference and higher AUC than the uncompressed original models. Our code is at https://github.com/yuhao318/Atomic_Feature_Mimicking.

Updated: 2024-06-11 06:47:50

Categories: cs.IR,cs.LG

Download: http://arxiv.org/abs/2405.18146v2
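
To make the size reduction in the entry above concrete, the sketch below counts parameters for a generic rank-r factorization of a dense layer, W (d_in x d_out) replaced by A (d_in x r) times B (r x d_out). This is standard low-rank arithmetic, not the authors' feature-level method; the layer shape and rank are made-up values, and biases are ignored.

```python
def dense_params(d_in, d_out):
    """Weights in a dense d_in x d_out layer (bias ignored)."""
    return d_in * d_out


def low_rank_params(d_in, d_out, rank):
    """Weights after factorizing into (d_in x rank) and (rank x d_out)."""
    return rank * (d_in + d_out)


def compression_ratio(d_in, d_out, rank):
    """How many times smaller the factorized layer is."""
    return dense_params(d_in, d_out) / low_rank_params(d_in, d_out, rank)


# Hypothetical layer: 1024 inputs, 512 outputs, rank 64.
print(dense_params(1024, 512))           # prints: 524288
print(low_rank_params(1024, 512, 64))    # prints: 98304
print(round(compression_ratio(1024, 512, 64), 2))  # prints: 5.33
```

The factorization only saves parameters when `rank < d_in * d_out / (d_in + d_out)`, which for this hypothetical layer means a rank below about 341.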

Closing the Gap: Achieving Global Convergence (Last Iterate) of Actor-Critic under Markovian Sampling with Neural Network Parametrization

The current state-of-the-art theoretical analysis of Actor-Critic (AC) algorithms significantly lags in addressing the practical aspects of AC implementations. This crucial gap needs bridging to bring the analysis in line with practical implementations of AC. To address this, we advocate for considering the MMCLG criteria: Multi-layer neural network parametrization for actor/critic, Markovian sampling, Continuous state-action spaces, the performance of the Last iterate, and Global optimality. These aspects are practically significant and have been largely overlooked in existing theoretical analyses of AC algorithms. In this work, we address these gaps by providing the first comprehensive theoretical analysis of AC algorithms that encompasses all five crucial practical aspects (i.e., the MMCLG criteria). We establish global convergence sample complexity bounds of $\tilde{\mathcal{O}}\left({\epsilon^{-3}}\right)$. We achieve this result through our novel use of the weak gradient domination property of MDPs and our unique analysis of the error in critic estimation.

Updated: 2024-06-11 06:32:16

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2405.01843v3

DNN Partitioning, Task Offloading, and Resource Allocation in Dynamic Vehicular Networks: A Lyapunov-Guided Diffusion-Based Reinforcement Learning Approach

The rapid advancement of Artificial Intelligence (AI) has introduced Deep Neural Network (DNN)-based tasks to the ecosystem of vehicular networks. These tasks are often computation-intensive, requiring substantial computation resources, which are beyond the capability of a single vehicle. To address this challenge, Vehicular Edge Computing (VEC) has emerged as a solution, offering computing services for DNN-based tasks through resource pooling via Vehicle-to-Vehicle/Infrastructure (V2V/V2I) communications. In this paper, we formulate the problem of joint DNN partitioning, task offloading, and resource allocation in VEC as a dynamic long-term optimization. Our objective is to minimize the DNN-based task completion time while guaranteeing the system stability over time. To this end, we first leverage a Lyapunov optimization technique to decouple the original long-term optimization with stability constraints into a per-slot deterministic problem. Afterwards, we propose a Multi-Agent Diffusion-based Deep Reinforcement Learning (MAD2RL) algorithm, incorporating the innovative use of diffusion models to determine the optimal DNN partitioning and task offloading decisions. Furthermore, we integrate convex optimization techniques into MAD2RL as a subroutine to allocate computation resources, enhancing the learning efficiency. Through simulations under real-world movement traces of vehicles, we demonstrate the superior performance of our proposed algorithm compared to existing benchmark solutions.

Updated: 2024-06-11 06:31:03

Categories: cs.LG

Download: http://arxiv.org/abs/2406.06986v1
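
The entry above uses Lyapunov optimization to turn a long-term problem with stability constraints into a per-slot decision. The sketch below shows the textbook drift-plus-penalty form of that idea, not the paper's actual formulation: a backlog queue is updated each slot, and the action minimizing `v * cost - backlog * service` is chosen. The action set, the weight `v`, and the names `update_queue` and `choose_action` are all hypothetical.

```python
def update_queue(backlog, arrival, service):
    """Queue dynamics: backlog grows with arrivals, shrinks with service."""
    return max(backlog + arrival - service, 0.0)


def choose_action(backlog, actions, v):
    """Per-slot rule: trade off cost (weight v) against draining the backlog."""
    return min(actions, key=lambda a: v * a["cost"] - backlog * a["service"])


# Hypothetical actions: a cheap slow option and an expensive fast one.
actions = [
    {"name": "slow", "cost": 1.0, "service": 1.0},
    {"name": "fast", "cost": 3.0, "service": 4.0},
]

backlog = 0.0
for arrival in [2.0, 2.0, 2.0, 2.0]:
    action = choose_action(backlog, actions, v=2.0)
    backlog = update_queue(backlog, arrival, action["service"])
    print(action["name"], backlog)
```

A larger `v` makes the rule favor low cost and tolerate a longer queue, while a large backlog pushes it toward high-service actions, which is how the per-slot decision keeps the queue stable over time.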

On the Hölder Stability of Multiset and Graph Neural Networks

Famously, multiset neural networks based on sum-pooling can separate all distinct multisets, and as a result can be used by message passing neural networks (MPNNs) to separate all pairs of graphs that can be separated by the 1-WL graph isomorphism test. However, the quality of this separation may be very weak, to the extent that the embeddings of "separable" multisets and graphs might even be considered identical when using fixed finite precision. In this work, we propose to fully analyze the separation quality of multiset models and MPNNs via a novel adaptation of Lipschitz and Hölder continuity to parametric functions. We prove that common sum-based models are lower-Hölder continuous, with a Hölder exponent that decays rapidly with the network's depth. Our analysis leads to adversarial examples of graphs which can be separated by three 1-WL iterations, but cannot be separated in practice by standard maximally powerful MPNNs. To remedy this, we propose two novel MPNNs with improved separation quality, one of which is lower Lipschitz continuous. We show these MPNNs can easily classify our adversarial examples, and compare favorably with standard MPNNs on standard graph learning tasks.

Updated: 2024-06-11 06:28:21

Categories: cs.LG

Download: http://arxiv.org/abs/2406.06984v1

MLCM: Multistep Consistency Distillation of Latent Diffusion Model

Distilling large latent diffusion models (LDMs) into ones that are fast to sample from is attracting growing research interest. However, the majority of existing methods face a dilemma where they either (i) depend on multiple individual distilled models for different sampling budgets, or (ii) sacrifice generation quality with limited (e.g., 2-4) and/or moderate (e.g., 5-8) sampling steps. To address these, we extend the recent multistep consistency distillation (MCD) strategy to representative LDMs, establishing the Multistep Latent Consistency Models (MLCMs) approach for low-cost high-quality image synthesis. MLCM serves as a unified model for various sampling steps due to the promise of MCD. We further augment MCD with a progressive training strategy to strengthen inter-segment consistency to boost the quality of few-step generations. We take the states from the sampling trajectories of the teacher model as training data for MLCMs to lift the requirements for high-quality training datasets and to bridge the gap between the training and inference of the distilled model. MLCM is compatible with preference learning strategies for further improvement of visual quality and aesthetic appeal. Empirically, MLCM can generate high-quality, delightful images with only 2-8 sampling steps. On the MSCOCO-2017 5K benchmark, MLCM distilled from SDXL gets a CLIP Score of 33.30, Aesthetic Score of 6.19, and Image Reward of 1.20 with only 4 steps, substantially surpassing 4-step LCM [23], 8-step SDXL-Lightning [17], and 8-step HyperSD [33]. We also demonstrate the versatility of MLCMs in applications including controllable generation, image style transfer, and Chinese-to-image generation.

Updated: 2024-06-11 06:22:53

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2406.05768v2

MLLM-as-a-Judge: Assessing Multimodal LLM-as-a-Judge with Vision-Language Benchmark

Multimodal Large Language Models (MLLMs) have gained significant attention recently, showing remarkable potential in artificial general intelligence. However, assessing the utility of MLLMs presents considerable challenges, primarily due to the absence of multimodal benchmarks that align with human preferences. Drawing inspiration from the concept of LLM-as-a-Judge within LLMs, this paper introduces a novel benchmark, termed MLLM-as-a-Judge, to assess the ability of MLLMs in assisting judges across diverse modalities, encompassing three distinct tasks: Scoring Evaluation, Pair Comparison, and Batch Ranking. Our study reveals that, while MLLMs demonstrate remarkable human-like discernment in Pair Comparison, there is a significant divergence from human preferences in Scoring Evaluation and Batch Ranking. Furthermore, a closer examination reveals persistent challenges in the judgment capacities of LLMs, including diverse biases, hallucinatory responses, and inconsistencies in judgment, even in advanced models such as GPT-4V. These findings emphasize the pressing need for enhancements and further research efforts to be undertaken before regarding MLLMs as fully reliable evaluators. In light of this, we advocate for additional efforts dedicated to supporting the continuous development within the domain of MLLM functioning as judges. The code and dataset are publicly available at our project homepage: https://mllm-judge.github.io/.

Updated: 2024-06-11 06:21:46

Categories: cs.CL,cs.AI,cs.CV

Download: http://arxiv.org/abs/2402.04788v3

AudioMarkBench: Benchmarking Robustness of Audio Watermarking

The increasing realism of synthetic speech, driven by advancements in text-to-speech models, raises ethical concerns regarding impersonation and disinformation. Audio watermarking offers a promising solution by embedding human-imperceptible watermarks into AI-generated audio. However, the robustness of audio watermarking against common and adversarial perturbations remains understudied. We present AudioMarkBench, the first systematic benchmark for evaluating the robustness of audio watermarking against watermark removal and watermark forgery. AudioMarkBench includes a new dataset created from Common-Voice across languages, biological sexes, and ages, 3 state-of-the-art watermarking methods, and 15 types of perturbations. We benchmark the robustness of these methods against the perturbations in no-box, black-box, and white-box settings. Our findings highlight the vulnerabilities of current watermarking techniques and emphasize the need for more robust and fair audio watermarking solutions. Our dataset and code are publicly available at https://github.com/moyangkuo/AudioMarkBench.

Updated: 2024-06-11 06:18:29

Categories: cs.LG,cs.CR,cs.SD,eess.AS

Download: http://arxiv.org/abs/2406.06979v1

Cross-domain-aware Worker Selection with Training for Crowdsourced Annotation

Annotation through crowdsourcing is drawing increasing attention, and it relies on an effective scheme for selecting workers from a pool. Existing methods select workers based on their performance on tasks with ground truth, but they miss two important points. 1) The historical performance of workers on other tasks. In real-world scenarios, workers need to solve a new task whose correlation with previous tasks is not well known before training, a setting called cross-domain. 2) The dynamic performance of workers, who learn from the ground truth. In this paper, we consider both factors in designing an allocation scheme named cross-domain-aware worker selection with training. Our approach proposes two estimation modules to statistically analyze the cross-domain correlation and to simulate the learning gain of workers dynamically. A framework with a theoretical analysis of the worker elimination process is given. To validate the effectiveness of our methods, we collect two novel real-world datasets and generate synthetic datasets. The experimental results show that our method outperforms the baselines on both real-world and synthetic datasets.

Updated: 2024-06-11 06:18:22

Categories: cs.LG,cs.DB

Download: http://arxiv.org/abs/2406.06977v1

Discrete Dictionary-based Decomposition Layer for Structured Representation Learning

Neuro-symbolic neural networks have been extensively studied to integrate symbolic operations with neural networks, thereby improving systematic generalization. Specifically, the Tensor Product Representation (TPR) framework enables neural networks to perform differentiable symbolic operations by encoding the symbolic structure of data within vector spaces. However, TPR-based neural networks often struggle to decompose unseen data into structured TPR representations, undermining their symbolic operations. To address this decomposition problem, we propose a Discrete Dictionary-based Decomposition (D3) layer designed to enhance the decomposition capabilities of TPR-based models. D3 employs discrete, learnable key-value dictionaries trained to capture symbolic features essential for decomposition operations. It leverages the prior knowledge acquired during training to generate structured TPR representations by mapping input data to pre-learned symbolic features within these dictionaries. D3 is a straightforward drop-in layer that can be seamlessly integrated into any TPR-based model without modifications. Our experimental results demonstrate that D3 significantly improves the systematic generalization of various TPR-based models while requiring fewer additional parameters. Notably, D3 outperforms baseline models on the synthetic task that demands the systematic decomposition of unseen combinatorial data.

Updated: 2024-06-11 06:16:33

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2406.06976v1
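
The Tensor Product Representation mentioned in the entry above can be illustrated with the classical binding and unbinding operations: a structure is stored as a sum of filler-role outer products, and a filler is recovered by multiplying with its role vector. The sketch below shows only this standard TPR algebra, assuming orthonormal role vectors; it does not implement the D3 layer, and the vectors are made-up toy values.

```python
def outer(u, v):
    """Outer product of two vectors as a nested list (len(u) x len(v))."""
    return [[ui * vj for vj in v] for ui in u]


def bind(fillers, roles):
    """TPR: sum over pairs of outer(filler, role)."""
    rows, cols = len(fillers[0]), len(roles[0])
    total = [[0.0] * cols for _ in range(rows)]
    for filler, role in zip(fillers, roles):
        product = outer(filler, role)
        for i in range(rows):
            for j in range(cols):
                total[i][j] += product[i][j]
    return total


def unbind(tpr, role):
    """Recover a filler as tpr @ role (exact when roles are orthonormal)."""
    return [sum(t * r for t, r in zip(row, role)) for row in tpr]


# Toy structure with two roles (e.g. subject, object) and 3-d fillers.
roles = [[1.0, 0.0], [0.0, 1.0]]
fillers = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]

tpr = bind(fillers, roles)
print(unbind(tpr, roles[0]))  # prints: [1.0, 2.0, 3.0]
print(unbind(tpr, roles[1]))  # prints: [4.0, 5.0, 6.0]
```

Unbinding is exact here because the roles are orthonormal. The decomposition problem the abstract refers to is the earlier step of producing suitable fillers and roles from raw, unseen input.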

Open-Ended Multi-Modal Relational Reasoning for Video Question Answering

In this paper, we introduce a robotic agent specifically designed to analyze external environments and address participants' questions. The primary focus of this agent is to assist individuals using language-based interactions within video-based scenes. Our proposed method integrates video recognition technology and natural language processing models within the robotic agent. We investigate the crucial factors affecting human-robot interactions by examining pertinent issues arising between participants and robot agents. Methodologically, our experimental findings reveal a positive relationship between trust and interaction efficiency. Furthermore, our model demonstrates a 2% to 3% performance enhancement in comparison to other benchmark methods.

Updated: 2024-06-11 06:12:52

标题: 开放式多模态关系推理用于视频问答

摘要: 在本文中，我们介绍了一种专门设计用于分析外部环境并回答参与者问题的机器人代理。该代理的主要重点是在视频场景中通过基于语言的交互帮助个人。我们提出的方法将视频识别技术和自然语言处理模型整合到机器人代理中。我们通过研究参与者和机器人代理之间出现的相关问题，探讨影响人机交互的关键因素。在方法上，我们的实验发现显示了信任与交互效率之间的正向关系。此外，我们的模型表现出比其他基准方法提高2%到3%的性能。

更新时间: 2024-06-11 06:12:52

领域: cs.AI,cs.HC,cs.RO

下载: http://arxiv.org/abs/2012.00822v4

Mutual Information Guided Backdoor Mitigation for Pre-trained Encoders

Self-supervised learning (SSL) is increasingly attractive for pre-training encoders without requiring labeled data. Downstream tasks built on top of those pre-trained encoders can achieve nearly state-of-the-art performance. Encoders pre-trained with SSL, however, are vulnerable to backdoor attacks, as demonstrated by existing studies. Numerous backdoor mitigation techniques have been designed for downstream task models. However, their effectiveness is limited when they are adapted to pre-trained encoders, because no label information is available during pre-training. To address backdoor attacks against pre-trained encoders, we propose a mutual information guided backdoor mitigation technique named MIMIC. MIMIC treats the potentially backdoored encoder as the teacher net and employs knowledge distillation to distill a clean student encoder from it. Unlike existing knowledge distillation approaches, MIMIC initializes the student with random weights, so it inherits no backdoors from the teacher net. MIMIC then leverages the mutual information between each layer and the extracted features to locate where benign knowledge lies in the teacher net, and uses distillation to clone clean features from teacher to student. We craft the distillation loss from two components, a clone loss and an attention loss, aiming to mitigate backdoors while maintaining encoder performance. Our evaluation on two backdoor attacks in SSL demonstrates that MIMIC can significantly reduce the attack success rate using less than 5% of clean data, surpassing seven state-of-the-art backdoor mitigation techniques.

Updated: 2024-06-11 06:11:36

标题: 相互信息引导的预训练编码器后门缓解

摘要: 自监督学习（SSL）越来越受到青睐，因为它可以在不需要标记数据的情况下对编码器进行预训练。建立在这些经过预训练的编码器之上的下游任务可以获得几乎接近最先进的性能。然而，通过SSL预训练的编码器容易受到后门攻击的影响，这已经被现有研究所证明。已经设计了许多后门缓解技术用于下游任务模型。然而，由于预训练时缺乏标签信息，这些技术在适用于预训练编码器时的有效性受到损害和限制。为了解决针对预训练编码器的后门攻击，在本文中，我们创新地提出了一种名为MIMIC的互信息引导后门缓解技术。MIMIC将潜在受到后门攻击的编码器视为教师网络，并利用知识蒸馏从教师网络中提取干净的学生编码器。与现有的知识蒸馏方法不同，MIMIC使用随机权重初始化学生，不继承教师网络的后门。然后，MIMIC利用每一层和提取特征之间的互信息来定位教师网络中良性知识的位置，并利用蒸馏将干净的特征从教师克隆到学生。我们设计了两方面的蒸馏损失，包括克隆损失和注意损失，旨在同时减轻后门攻击并保持编码器的性能。我们在SSL中进行的两种后门攻击的评估表明，MIMIC可以通过仅利用不到5％的干净数据显著降低攻击成功率，超过了七种最先进的后门缓解技术。

更新时间: 2024-06-11 06:11:36

领域: cs.LG,cs.AI,cs.CR

下载: http://arxiv.org/abs/2406.03508v2

Generative Lifting of Multiview to 3D from Unknown Pose: Wrapping NeRF inside Diffusion

We cast multiview reconstruction from unknown pose as a generative modeling problem. From a collection of unannotated 2D images of a scene, our approach simultaneously learns both a network to predict camera pose from 2D image input and the parameters of a Neural Radiance Field (NeRF) for the 3D scene. To drive learning, we wrap both the pose prediction network and NeRF inside a Denoising Diffusion Probabilistic Model (DDPM) and train the system via the standard denoising objective. Our framework requires the system to denoise an input 2D image by predicting its pose and rendering the NeRF from that pose. Learning to denoise thus forces the system to concurrently learn the underlying 3D NeRF representation and a mapping from images to camera extrinsic parameters. To facilitate the latter, we design a custom network architecture to represent pose as a distribution, granting implicit capacity for discovering view correspondences when trained end-to-end for denoising alone. This technique allows our system to successfully build NeRFs, without pose knowledge, for challenging scenes where competing methods fail. At the conclusion of training, our learned NeRF can be extracted and used as a 3D scene model; our full system can be used to sample novel camera poses and generate novel-view images.

Updated: 2024-06-11 06:09:41

标题: 多视角到未知姿态的三维生成提升：将NeRF包裹在扩散中

摘要: 我们将从未知姿态的多视角重建视为一种生成建模问题。通过对一个场景的一组未标记的2D图像，我们的方法同时学习从2D图像输入预测相机姿态的网络，以及用于3D场景的神经辐射场（NeRF）的参数。为了推动学习，我们将姿态预测网络和NeRF包装在一个去噪扩散概率模型（DDPM）内，并通过标准的去噪目标训练系统。我们的框架要求系统通过预测其姿态并从该姿态渲染NeRF来完成去噪输入2D图像的任务。因此，学习去噪迫使系统同时学习基础的3D NeRF表示和从图像到相机外部参数的映射。为了促进后者，我们设计了一个自定义网络架构来表示姿态作为分布，从而在仅为去噪训练时具有发现视图对应关系的隐含能力。这种技术使我们的系统能够成功构建NeRF，而无需姿态知识，可用于竞争方法失败的复杂场景。在训练结束时，我们学到的NeRF可以被提取并用作3D场景模型；我们的完整系统可以用来采样新的相机姿态并生成新视图图像。

更新时间: 2024-06-11 06:09:41

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2406.06972v1

Generalized Principal-Agent Problem with a Learning Agent

Generalized principal-agent problems, including Stackelberg games, contract design, and Bayesian persuasion, are a class of economic problems where an agent best responds to a principal's committed strategy. We study repeated generalized principal-agent problems under the assumption that the principal does not have commitment power and the agent uses algorithms to learn to respond to the principal. We reduce this problem to a one-shot generalized principal-agent problem with an approximately-best-responding agent. Using this reduction, we show that: (1) if the agent uses contextual no-regret learning algorithms, then the principal can guarantee a utility that is at least the principal's optimal utility in the classic non-learning model minus the square root of the agent's regret; (2) if the agent uses contextual no-swap-regret learning algorithms, then the principal cannot obtain any utility more than the optimal utility in the non-learning model plus the agent's swap regret. But (3) if the agent uses mean-based learning algorithms (which can be no-regret but not no-swap-regret), then the principal can do significantly better than the non-learning model. These general results not only refine previous results in Stackelberg games and contract design with learning agents but also lead to new results for Bayesian persuasion with a learning agent.
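
The three regimes can be summarized in symbols. The notation is an assumption introduced here for illustration and does not come from the abstract: $U^\star$ is the principal's optimal utility in the classic non-learning model, $U$ is the utility the principal obtains against the learning agent, and $\mathrm{Reg}$ and $\mathrm{SReg}$ are the agent's regret and swap regret. Constants and normalization (for example, averaging over rounds) are deliberately left unspecified.

```latex
\begin{align*}
\text{(1) no-regret agent:} \quad
  & U \;\ge\; U^\star - O\!\left(\sqrt{\mathrm{Reg}}\right)
  && \text{(achievable by the principal)} \\
\text{(2) no-swap-regret agent:} \quad
  & U \;\le\; U^\star + O\!\left(\mathrm{SReg}\right)
  && \text{(for every principal strategy)} \\
\text{(3) mean-based agent:} \quad
  & U \text{ can exceed } U^\star \text{ by a significant margin.}
\end{align*}
```

Read together, (1) and (2) say that against a no-swap-regret agent the principal's utility is pinned near $U^\star$ as the regret terms vanish, while (3) shows that the upper bound can fail for agents that are only no-regret.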

Updated: 2024-06-11 05:52:20

标题: 具有学习代理的广义委托-代理问题

摘要: 广义的委托代理问题，包括斯塔克贝格博弈、合同设计和贝叶斯说服等，是一类经济问题，其中代理人对委托人的承诺策略作出最佳响应。我们研究了在假设委托人没有承诺权力且代理人使用算法学习如何响应委托人的情况下的重复广义委托代理问题。我们将这个问题简化为一个近似最佳响应代理人的一次性广义委托代理问题。利用这种简化，我们表明：（1）如果代理人使用上下文无悔学习算法，那么委托人可以保证至少与经典非学习模型中的委托人最优效用相同，减去代理人遗憾的平方根；（2）如果代理人使用上下文无交换遗憾学习算法，那么委托人无法获得比非学习模型中的最优效用更多，加上代理人的交换遗憾。但是（3）如果代理人使用基于均值的学习算法（可以是无悔但不能是无交换遗憾），那么委托人可以比非学习模型做得更好。这些一般性结果不仅细化了以往在斯塔克贝格博弈和学习代理的合同设计中的结果，还为具有学习代理的贝叶斯说服提供了新的结果。

更新时间: 2024-06-11 05:52:20

领域: cs.GT,cs.AI,cs.LG,econ.TH

下载: http://arxiv.org/abs/2402.09721v3

Beyond the Norms: Detecting Prediction Errors in Regression Models

This paper tackles the challenge of detecting unreliable behavior in regression algorithms, which may arise from intrinsic variability (e.g., aleatoric uncertainty) or modeling errors (e.g., model uncertainty). First, we formally introduce the notion of unreliability in regression, i.e., when the output of the regressor exceeds a specified discrepancy (or error). Then, using powerful tools for probabilistic modeling, we estimate the discrepancy density, and we measure its statistical diversity using our proposed metric for statistical dissimilarity. In turn, this allows us to derive a data-driven score that expresses the uncertainty of the regression outcome. We show empirical improvements in error detection for multiple regression tasks, consistently outperforming popular baseline approaches, and contributing to the broader field of uncertainty quantification and safe machine learning systems. Our code is available at https://zenodo.org/records/11281964.
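
The notion of unreliability defined above can be written down directly. The sketch below flags a prediction as unreliable when its discrepancy from the target exceeds a threshold. Using the absolute error as the discrepancy, and the names `is_unreliable` and `unreliable_flags`, are assumptions for illustration. The sketch only shows the event being detected; the paper's data-driven score, which must work without access to the true target, is not reproduced here.

```python
from typing import List, Sequence


def is_unreliable(y_true: float, y_pred: float, threshold: float) -> bool:
    """True when the absolute error of one prediction exceeds the threshold."""
    return abs(y_true - y_pred) > threshold


def unreliable_flags(
    targets: Sequence[float], predictions: Sequence[float], threshold: float
) -> List[bool]:
    """Label each (target, prediction) pair as unreliable or not."""
    return [is_unreliable(t, p, threshold) for t, p in zip(targets, predictions)]


# Toy data (hypothetical numbers): only the second prediction misses by more than 0.5.
flags = unreliable_flags([1.0, 2.0, 3.0], [1.1, 2.6, 3.0], threshold=0.5)
print(flags)
```

Labels of this kind can serve as ground truth when evaluating an error detector: a good uncertainty score should be high exactly where the flag is true.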

Updated: 2024-06-11 05:51:44

标题: 超越规范：检测回归模型中的预测误差

摘要: 这篇论文解决了在回归算法中检测不可靠行为的挑战，这可能源自固有的变异性（例如，aleatoric不确定性）或建模错误（例如，模型不确定性）。首先，我们正式介绍了回归中的不可靠性概念，即当回归器的输出超过指定的差异（或错误）时。然后，利用强大的概率建模工具，我们估计差异密度，并使用我们提出的统计异质性度量其统计多样性。反过来，这使我们能够推导出一个表达回归结果不确定性的数据驱动分数。我们展示了对多个回归任务的错误检测的经验改进，始终优于流行的基线方法，并为不确定性量化和安全机器学习系统的更广泛领域做出贡献。我们的代码可在https://zenodo.org/records/11281964上找到。

更新时间: 2024-06-11 05:51:44

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.06968v1

Dual Thinking and Perceptual Analysis of Deep Learning Models using Human Adversarial Examples

The dual thinking framework considers fast, intuitive processing and slower, logical processing. The perception of dual thinking in vision requires images where inferences from intuitive and logical processing differ. We introduce an adversarial dataset to provide evidence for the dual thinking framework in human vision, which also aids in studying the qualitative behavior of deep learning models. Our study also addresses a major criticism of using classification models as computational models of human vision by using instance segmentation models that localize objects. The evidence underscores the importance of shape in identifying instances in human vision and shows that deep learning models lack an understanding of sub-structures, as indicated by errors related to the position and number of sub-components. Additionally, the similarity in errors made by models and intuitive human processing indicates that models only address intuitive thinking in human vision.

Updated: 2024-06-11 05:50:34

标题: 双重思维和使用人类对抗示例对深度学习模型进行感知分析

更新时间: 2024-06-11 05:50:34

领域: cs.CV,cs.AI,eess.IV

下载: http://arxiv.org/abs/2406.06967v1

Evolving Subnetwork Training for Large Language Models

Large language models have ushered in a new era of artificial intelligence research. However, their substantial training costs hinder further development and widespread adoption. In this paper, inspired by the redundancy in the parameters of large language models, we propose a novel training paradigm: Evolving Subnetwork Training (EST). EST samples subnetworks from the layers of the large language model and from commonly used modules within each layer, Multi-Head Attention (MHA) and Multi-Layer Perceptron (MLP). By gradually increasing the size of the subnetworks during the training process, EST reduces the cost of training. We apply EST to train the GPT2 and TinyLlama models, saving 26.7% of FLOPs for GPT2 and 25.0% for TinyLlama without an increase in loss on the pre-training dataset. Moreover, EST leads to performance improvements in downstream tasks, indicating that it benefits generalization. Additionally, we provide intuitive theoretical studies based on training dynamics and Dropout theory to support the feasibility of EST. Our code is available at https://github.com/OpenDFM/EST.
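
The idea of training on subnetworks that grow over time can be sketched as follows. This is a simplified illustration, not the EST implementation: it samples whole layers only, whereas the abstract also mentions sampling within the MHA and MLP modules. The linear growth schedule, the starting fraction of 0.5, and the names `active_fraction` and `sample_layers` are all assumptions.

```python
import random
from typing import List


def active_fraction(step: int, total_steps: int, start: float = 0.5) -> float:
    """Fraction of layers to train at a given step, growing linearly from `start` to 1.0."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return start + (1.0 - start) * progress


def sample_layers(num_layers: int, fraction: float, rng: random.Random) -> List[int]:
    """Pick a random subset of layer indices covering roughly `fraction` of the model."""
    count = max(1, round(num_layers * fraction))
    return sorted(rng.sample(range(num_layers), count))


rng = random.Random(0)
for step in (0, 50, 100):
    fraction = active_fraction(step, total_steps=100)
    print(step, fraction, sample_layers(12, fraction, rng))
```

Early steps touch only part of the network, which is where the compute saving would come from. By the final step every layer is active, so the full model is what gets trained at the end.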

Updated: 2024-06-11 05:44:56

标题: 大语言模型的不断演变的子网络训练

摘要: 大语言模型开启了人工智能研究的新时代。然而，它们庞大的训练成本阻碍了进一步发展和广泛采纳。在本文中，受大语言模型参数冗余的启发，我们提出了一种新颖的训练范式：Evolving Subnetwork Training（EST）。EST从大语言模型的各层和每一层中常用的模块Multi-Head Attention（MHA）和Multi-Layer Perceptron（MLP）中抽样子网络。通过在训练过程中逐渐增加子网络的大小，EST可以节省训练成本。我们应用EST来训练GPT2模型和TinyLlama模型，结果显示GPT2的FLOPs节省了26.7％，TinyLlama节省了25.0％，而在预训练数据集上损失没有增加。此外，EST导致下游任务的性能提升，表明它有助于泛化。此外，我们基于训练动态和Dropout理论提供直观的理论研究，以确保EST的可行性。我们的代码可在https://github.com/OpenDFM/EST找到。

更新时间: 2024-06-11 05:44:56

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.06962v1

Black-Box $k$-to-$1$-PCA Reductions: Theory and Applications

The $k$-principal component analysis ($k$-PCA) problem is a fundamental algorithmic primitive that is widely used in data analysis and dimensionality reduction applications. In statistical settings, the goal of $k$-PCA is to identify a top eigenspace of the covariance matrix of a distribution, which we only have black-box access to via samples. Motivated by these settings, we analyze black-box deflation methods as a framework for designing $k$-PCA algorithms, where we model access to the unknown target matrix via a black-box $1$-PCA oracle which returns an approximate top eigenvector, under two popular notions of approximation. Despite being arguably the most natural reduction-based approach to $k$-PCA algorithm design, such black-box methods, which recursively call a $1$-PCA oracle $k$ times, were previously poorly understood. Our main contribution is significantly sharper bounds on the approximation parameter degradation of deflation methods for $k$-PCA. For a quadratic form notion of approximation we term ePCA (energy PCA), we show deflation methods suffer no parameter loss. For an alternative well-studied approximation notion we term cPCA (correlation PCA), we tightly characterize the parameter regimes where deflation methods are feasible. Moreover, we show that in all feasible regimes, $k$-cPCA deflation algorithms suffer no asymptotic parameter loss for any constant $k$. We apply our framework to obtain state-of-the-art $k$-PCA algorithms robust to dataset contamination, improving prior work in sample complexity by a $\mathsf{poly}(k)$ factor.
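
The deflation scheme described above, calling a $1$-PCA oracle $k$ times and removing each returned direction before the next call, can be sketched in a few lines. In this sketch, power iteration stands in for the black-box oracle and the matrix is given explicitly; both are assumptions for illustration, since the paper treats the oracle as approximate and the matrix as accessible only through samples. The function names and the toy matrix are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def normalize(v: Vector) -> Vector:
    """Scale a nonzero vector to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def one_pca_oracle(m: Matrix, iterations: int = 500) -> Vector:
    """Approximate top eigenvector via power iteration (stand-in for the 1-PCA oracle)."""
    v = normalize([1.0 + 0.1 * i for i in range(len(m))])  # fixed start for repeatability
    for _ in range(iterations):
        v = normalize(matvec(m, v))
    return v


def deflate(m: Matrix, v: Vector) -> Matrix:
    """Return P M P with P = I - v v^T, removing direction v from the matrix."""
    n = len(m)
    p = [[(1.0 if i == j else 0.0) - v[i] * v[j] for j in range(n)] for i in range(n)]
    pm = [[sum(p[i][k] * m[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    return [[sum(pm[i][k] * p[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def k_pca_by_deflation(m: Matrix, k: int) -> List[Vector]:
    """Call the 1-PCA oracle k times, deflating after each call (requires k <= rank of m)."""
    components = []
    for _ in range(k):
        v = one_pca_oracle(m)
        components.append(v)
        m = deflate(m, v)
    return components


# Toy symmetric matrix with eigenvalues 3, 2, 1 (hypothetical data).
COVARIANCE = [[2.0, 1.0, 0.0], [1.0, 2.0, 0.0], [0.0, 0.0, 2.0]]
for component in k_pca_by_deflation(COVARIANCE, 2):
    print([round(x, 4) for x in component])
```

With a near-exact oracle, as here, the recovered directions are the top two eigenvectors. The question the paper studies is how much accuracy is lost when each oracle call is only approximate, because errors in early components propagate through every later deflation step.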

Updated: 2024-06-11 05:43:50

标题: 黑盒$k$-to-$1$-PCA约简：理论和应用

摘要: $k$-主成分分析（$k$-PCA）问题是数据分析和降维应用中广泛使用的基本算法原语。在统计设置中，$k$-PCA的目标是识别分布的协方差矩阵的顶部特征空间，通过样本我们只能通过黑盒访问。受这些设置的启发，我们分析了黑盒缩减方法作为设计$k$-PCA算法的框架，其中我们通过黑盒$1$-PCA预言来建模对未知目标矩阵的访问，该预言返回一个近似的顶部特征向量，根据两种流行的近似概念。尽管这些黑盒方法可能是设计$k$-PCA算法最自然的基于降维的方法，但是这种递归调用$1$-PCA预言$k$次的黑盒方法以前很难理解。我们的主要贡献是对$k$-PCA的缩减方法的近似参数退化的显著更为锐利的界限。对于一种能量PCA（ePCA）的二次形式近似概念，我们展示了缩减方法不会有参数损失。对于一种备受关注的替代近似概念相关PCA（cPCA），我们紧密刻画了缩减方法可行的参数范围。此外，我们展示了在所有可行的范围内，$k$-cPCA缩减算法对于任何常数$k$都不会有渐近参数损失。我们应用我们的框架来获得抗数据集污染的最新$k$-PCA算法，通过$\mathsf{poly}(k)$因子改进了先前的样本复杂度。

更新时间: 2024-06-11 05:43:50

领域: math.NA,cs.DS,cs.LG,cs.NA,stat.ML

下载: http://arxiv.org/abs/2403.03905v3

Low Rank Multi-Dictionary Selection at Scale

The sparse dictionary coding framework represents signals as a linear combination of a few predefined dictionary atoms. It has been employed for images, time series, graph signals and recently for 2-way (or 2D) spatio-temporal data employing jointly temporal and spatial dictionaries. Large and over-complete dictionaries enable high-quality models, but also pose scalability challenges which are exacerbated in multi-dictionary settings. Hence, an important problem that we address in this paper is: How to scale multi-dictionary coding for large dictionaries and datasets? We propose a multi-dictionary atom selection technique for low-rank sparse coding named LRMDS. To enable scalability to large dictionaries and datasets, it progressively selects groups of row-column atom pairs based on their alignment with the data and performs convex relaxation coding via the corresponding sub-dictionaries. We demonstrate both theoretically and experimentally that when the data has a low-rank encoding with a sparse subset of the atoms, LRMDS is able to select them with strong guarantees under mild assumptions. Furthermore, we demonstrate the scalability and quality of LRMDS in both synthetic and real-world datasets and for a range of coding dictionaries. It achieves 3X to 10X speed-up compared to baselines, while obtaining up to two orders of magnitude improvement in representation quality on some of the real world datasets given a fixed target number of atoms.
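
Selecting row-column atom pairs by their alignment with the data can be illustrated as follows. The sketch scores every pair of a left (row) atom and a right (column) atom by the magnitude of the bilinear form $|\phi^\top X \psi|$ and keeps the best-scoring pairs. This scoring rule, the names `pair_alignment` and `top_atom_pairs`, and the toy data are assumptions for illustration. The sketch does not implement LRMDS, which selects groups of pairs progressively and then solves a convex coding problem on the resulting sub-dictionaries.

```python
from typing import List, Sequence, Tuple


def pair_alignment(
    left_atom: Sequence[float], data: Sequence[Sequence[float]], right_atom: Sequence[float]
) -> float:
    """Magnitude of left_atom^T * data * right_atom for a 2D data matrix."""
    return abs(
        sum(
            left_atom[r] * data[r][c] * right_atom[c]
            for r in range(len(left_atom))
            for c in range(len(right_atom))
        )
    )


def top_atom_pairs(
    left_atoms: Sequence[Sequence[float]],
    right_atoms: Sequence[Sequence[float]],
    data: Sequence[Sequence[float]],
    count: int,
) -> List[Tuple[int, int]]:
    """Return indices (i, j) of the `count` best-aligned (left, right) atom pairs."""
    scored = [
        (pair_alignment(left, data, right), i, j)
        for i, left in enumerate(left_atoms)
        for j, right in enumerate(right_atoms)
    ]
    scored.sort(key=lambda item: (-item[0], item[1], item[2]))
    return [(i, j) for _, i, j in scored[:count]]


# Toy example (hypothetical): standard-basis atoms and a 2x3 data matrix
# dominated by the entry at row 1, column 2.
LEFT = [[1.0, 0.0], [0.0, 1.0]]
RIGHT = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
DATA = [[0.1, 0.0, 0.0], [0.0, 0.0, 5.0]]

print(top_atom_pairs(LEFT, RIGHT, DATA, count=2))
```

Scoring all pairs costs time proportional to the product of the two dictionary sizes. That product is the scalability pressure that motivates selecting a small sub-dictionary before coding.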

Updated: 2024-06-11 05:40:45

标题: 规模化的低秩多字典选择

摘要: 稀疏字典编码框架将信号表示为少量预定义的字典原子的线性组合。它已被应用于图像、时间序列、图信号，最近也用于联合时间和空间字典的2向（或2D）时空数据。大型和过完备的字典能够实现高质量的模型，但也带来了可伸缩性挑战，尤其是在多字典设置中。因此，本文要解决的一个重要问题是：如何将多字典编码扩展到大型字典和数据集？我们提出了一种用于低秩稀疏编码的多字典原子选择技术，称为LRMDS。为了实现对大型字典和数据集的可伸缩性，它基于其与数据的对齐情况逐步选择行列原子对的组，并通过相应的子字典执行凸松弛编码。我们在理论上和实验上证明，当数据具有低秩编码和一部分原子的稀疏子集时，LRMDS能够在温和的假设下具有强大的选择保证。此外，我们在合成和真实世界数据集以及一系列编码字典中展示了LRMDS的可伸缩性和质量。与基线相比，它实现了3倍到10倍的加速，并在一些真实世界数据集上获得了两个数量级的表示质量改进，同时给定固定的目标原子数。

更新时间: 2024-06-11 05:40:45

领域: cs.LG

下载: http://arxiv.org/abs/2406.06960v1

Unleashing the Denoising Capability of Diffusion Prior for Solving Inverse Problems

The recent emergence of diffusion models has significantly advanced the precision of learnable priors, presenting innovative avenues for addressing inverse problems. Since inverse problems inherently entail maximum a posteriori estimation, previous works have endeavored to integrate diffusion priors into the optimization frameworks. However, prevailing optimization-based inverse algorithms primarily exploit the prior information within the diffusion models while neglecting their denoising capability. To bridge this gap, this work leverages the diffusion process to reframe noisy inverse problems as a two-variable constrained optimization task by introducing an auxiliary optimization variable. By employing gradient truncation, the projection gradient descent method is efficiently utilized to solve the corresponding optimization problem. The proposed algorithm, termed ProjDiff, effectively harnesses the prior information and the denoising capability of a pre-trained diffusion model within the optimization framework. Extensive experiments on the image restoration tasks and source separation and partial generation tasks demonstrate that ProjDiff exhibits superior performance across various linear and nonlinear inverse problems, highlighting its potential for practical applications. Code is available at https://github.com/weigerzan/ProjDiff/.
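
For readers unfamiliar with the optimizer named above, here is a generic sketch of projected gradient descent: take a gradient step, then project back onto the feasible set. The toy problem (minimizing a squared distance to a target subject to box constraints), the step size, and the function names are assumptions for illustration. The sketch does not include a diffusion prior, the auxiliary variable, or gradient truncation, so it is not the ProjDiff algorithm.

```python
from typing import List, Sequence


def project_onto_box(x: Sequence[float], low: float, high: float) -> List[float]:
    """Euclidean projection onto the box [low, high]^n (clip each coordinate)."""
    return [min(max(v, low), high) for v in x]


def projected_gradient_descent(
    target: Sequence[float],
    low: float = 0.0,
    high: float = 1.0,
    step: float = 0.25,
    iterations: int = 100,
) -> List[float]:
    """Minimize ||x - target||^2 subject to low <= x_i <= high."""
    x = [low] * len(target)  # feasible starting point
    for _ in range(iterations):
        gradient = [2.0 * (xi - ti) for xi, ti in zip(x, target)]
        x = [xi - step * gi for xi, gi in zip(x, gradient)]  # gradient step
        x = project_onto_box(x, low, high)  # restore feasibility
    return x


# Toy target (hypothetical): one coordinate inside the box, two outside it.
print(projected_gradient_descent([0.3, 1.7, -0.4]))
```

The coordinate inside the box converges to its target, while the other two settle on the nearest boundary. The alternation between an objective-driven step and a constraint-restoring projection is the structure the method relies on.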

Updated: 2024-06-11 05:35:18

标题: 释放扩散先验去噪能力以解决逆问题

摘要: 最近出现的扩散模型显著提升了可学习先验的精度，为解决反问题提供了创新途径。由于反问题固有地涉及最大后验估计，先前的研究努力将扩散先验整合到优化框架中。然而，主流基于优化的反问题算法主要利用扩散模型中的先验信息，而忽略了其去噪能力。为了弥补这一差距，本研究利用扩散过程将有噪声的反问题重新构建为一个双变量受限制的优化任务，引入一个辅助优化变量。通过使用梯度截断，投影梯度下降方法被有效地用于解决相应的优化问题。提出的算法名为ProjDiff，有效地利用了预先训练的扩散模型在优化框架中的先验信息和去噪能力。对图像恢复任务以及源分离和部分生成任务进行的大量实验表明，ProjDiff 在各种线性和非线性反问题中表现出优越性能，突显了其在实际应用中的潜力。代码可在https://github.com/weigerzan/ProjDiff/获得。

更新时间: 2024-06-11 05:35:18

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.06959v1

Benchmarking Instance-Centric Counterfactual Algorithms for XAI: From White Box to Black Box

This study investigates the impact of machine learning models on the generation of counterfactual explanations by conducting a benchmark evaluation over three different types of models: a decision tree (fully transparent, interpretable, white-box model), a random forest (semi-interpretable, grey-box model), and a neural network (fully opaque, black-box model). We tested the counterfactual generation process using four algorithms from the literature (DiCE, WatcherCF, prototype, and GrowingSpheresCF) on 25 different datasets. Our findings indicate that: (1) Different machine learning models have little impact on the generation of counterfactual explanations; (2) Counterfactual algorithms based solely on proximity loss functions are not actionable and will not provide meaningful explanations; (3) One cannot have meaningful evaluation results without guaranteeing plausibility in the counterfactual generation. Algorithms that do not consider plausibility in their internal mechanisms will lead to biased and unreliable conclusions if evaluated with the current state-of-the-art metrics; (4) A counterfactual inspection analysis is strongly recommended to ensure a robust examination of counterfactual explanations and the potential identification of biases.

Updated: 2024-06-11 05:33:37

标题: 基准测试面向实例的反事实算法用于XAI：从白盒到黑盒

摘要: 这项研究调查了机器学习模型对生成反事实解释的影响，通过对三种不同类型的模型进行基准评估：决策树（完全透明、可解释、白盒模型）、随机森林（半可解释、灰盒模型）和神经网络（完全不透明、黑盒模型）。我们在25个不同数据集上使用四种算法（DiCE、WatcherCF、prototype和GrowingSpheresCF）测试了反事实生成过程。我们的研究结果表明：（1）不同的机器学习模型对生成反事实解释影响较小；（2）仅基于接近度损失函数的反事实算法不具有可操作性，也无法提供有意义的解释；（3）在反事实生成中保证合理性是获得有意义评估结果的先决条件。不考虑合理性的算法在内部机制上将导致偏见和不可靠的结论，如果使用当前最先进的度量标准进行评估；（4）强烈建议进行反事实检查分析，以确保对反事实解释进行强有力的审查，并潜在地发现偏见。

更新时间: 2024-06-11 05:33:37

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2203.02399v4

Turning the Tide on Dark Pools? Towards Multi-Stakeholder Vulnerability Notifications in the Ad-Tech Supply Chain

Online advertising relies on a complex and opaque supply chain that involves multiple stakeholders, including advertisers, publishers, and ad-networks, each with distinct and sometimes conflicting incentives. Recent research has demonstrated the existence of ad-tech supply chain vulnerabilities such as dark pooling, where low-quality publishers bundle their ad inventory with higher-quality ones to mislead advertisers. We investigate the effectiveness of vulnerability notification campaigns aimed at mitigating dark pooling. Prior research on vulnerability notifications has primarily focused on single-stakeholder scenarios, and it is unclear whether vulnerability notifications can be effective in the multi-stakeholder ad-tech supply chain. We implement an automated vulnerability notification pipeline to systematically evaluate the responsiveness of various stakeholders, including publishers, ad-networks, and advertisers, to vulnerability notifications sent by academics and activists. Our nine-month-long multi-stakeholder notification study shows that notifications are an effective method for reducing dark pooling vulnerabilities in the online advertising ecosystem, especially when targeted at ad-networks. Further, sender reputation has little impact: responses to notifications from activists and from academics do not differ in a statistically significant way. In addition to being the first notification study targeting the online advertising ecosystem, ours is also the first to study the multi-stakeholder context in vulnerability notifications.

Updated: 2024-06-11 05:31:29

标题: 扭转黑暗池的局势？走向广告科技供应链中的多利益相关者脆弱性通知

摘要: 在线广告依赖于一个复杂而不透明的供应链，涉及多个利益相关方，包括广告商、出版商和广告网络，每个利益相关方都有明显且有时相互冲突的激励。最近的研究表明，广告技术供应链存在漏洞，例如黑暗池，低质量的出版商将他们的广告库存与高质量的出版商捆绑在一起，以误导广告商。我们调查了旨在减轻黑暗池问题的漏洞通知活动的有效性。以往关于漏洞通知的研究主要集中在单利益相关方场景上，目前尚不清楚漏洞通知是否能在多利益相关方的广告技术供应链中发挥作用。我们实施了一个自动化的漏洞通知流程，系统评估了各种利益相关方对学术界和活动家发出的漏洞通知的响应能力，包括出版商、广告网络和广告商。我们进行了为期九个月的多利益相关方通知研究，结果显示通知是减少在线广告生态系统中黑暗池漏洞的有效方法，尤其是当针对广告网络时。此外，发件人声誉对来自活动家和学术界的通知的响应没有产生统计上不同的影响。除了是针对在线广告生态系统的第一项通知研究之外，我们也是第一个研究漏洞通知中多利益相关方背景的研究。

更新时间: 2024-06-11 05:31:29

领域: cs.CR,cs.CY,cs.MA,cs.NI,cs.SI,K.4.1; K.4.3; K.4.4; D.2.0; D.2.4; G.3; H.3.7; K.1; K.6.1; K.6.5

下载: http://arxiv.org/abs/2406.06958v1

ElasticRec: A Microservice-based Model Serving Architecture Enabling Elastic Resource Scaling for Recommendation Models

With the increasing popularity of recommendation systems (RecSys), the demand for compute resources in datacenters has surged. However, the model-wise resource allocation employed in current RecSys model serving architectures fails to utilize resources effectively, leading to a sub-optimal total cost of ownership. We propose ElasticRec, a model serving architecture for RecSys providing resource elasticity and high memory efficiency. ElasticRec is based on a microservice-based software architecture for fine-grained resource allocation, tailored to the heterogeneous resource demands of RecSys. Additionally, ElasticRec achieves high memory efficiency via our utility-based resource allocation. Overall, ElasticRec achieves an average 3.3x reduction in memory allocation size and an 8.1x increase in memory utility, resulting in an average 1.6x reduction in deployment cost compared to a state-of-the-art RecSys inference serving system.

Updated: 2024-06-11 05:25:48

标题: ElasticRec：一种基于微服务的模型服务架构，实现推荐模型的弹性资源扩展

摘要: 随着推荐系统（RecSys）日益流行，数据中心对计算资源的需求激增。然而，目前RecSys模型服务架构中采用的基于模型的资源分配在有效利用资源方面存在不足，导致总体拥有成本不佳。我们提出了ElasticRec，一种为RecSys提供资源弹性和高内存效率的模型服务架构。ElasticRec基于微服务的软件架构，用于细粒度资源分配，旨在满足RecSys的异构资源需求。此外，ElasticRec通过我们基于效用的资源分配实现高内存效率。总体而言，与最先进的RecSys推理服务系统相比，ElasticRec实现了平均内存分配大小减少3.3倍，内存效用增加8.1倍，导致部署成本平均降低1.6倍。

更新时间: 2024-06-11 05:25:48

领域: cs.DC,cs.IR,cs.LG

下载: http://arxiv.org/abs/2406.06955v1

Distributional MIPLIB: a Multi-Domain Library for Advancing ML-Guided MILP Methods

Mixed Integer Linear Programming (MILP) is a fundamental tool for modeling combinatorial optimization problems. Recently, a growing body of research has used machine learning to accelerate MILP solving. Despite the increasing popularity of this approach, there is a lack of a common repository that provides distributions of similar MILP instances across different domains, at different hardness levels, with standardized test sets. In this paper, we introduce Distributional MIPLIB, a multi-domain library of problem distributions for advancing ML-guided MILP methods. We curate MILP distributions from existing work in this area as well as real-world problems that have not been used, and classify them into different hardness levels. It will facilitate research in this area by enabling comprehensive evaluation on diverse and realistic domains. We empirically illustrate the benefits of using Distributional MIPLIB as a research vehicle in two ways. We evaluate the performance of ML-guided variable branching on previously unused distributions to identify potential areas for improvement. Moreover, we propose to learn branching policies from a mix of distributions, demonstrating that mixed distributions achieve better performance compared to homogeneous distributions when there is limited data and generalize well to larger instances.

Updated: 2024-06-11 05:25:38

标题: Distributional MIPLIB：用于推进ML-Guided MILP方法的多领域库

摘要: 混合整数线性规划（MILP）是建模组合优化问题的基本工具。最近，越来越多的研究利用机器学习来加速MILP求解。尽管这种方法越来越受欢迎，但缺乏一个共同的存储库，提供不同领域、不同难度水平、标准化测试集的类似MILP实例分布。在本文中，我们介绍Distributional MIPLIB，这是一个多领域问题分布库，用于推进基于机器学习的MILP方法。我们从这一领域的现有工作以及未被使用的真实问题中筛选出MILP分布，并将它们分类为不同的难度级别。它将通过在不同和现实领域上进行全面评估来促进这一领域的研究。我们通过两种方式实证证明使用Distributional MIPLIB作为研究工具的好处。我们评估了先前未使用的分布上ML引导的变量分支的性能，以确定改进的潜在领域。此外，我们建议从不同分布中学习分支策略，证明混合分布在数据有限且泛化到更大实例时性能优于同质分布。

更新时间: 2024-06-11 05:25:38

领域: cs.LG,math.OC

下载: http://arxiv.org/abs/2406.06954v1

Triple-domain Feature Learning with Frequency-aware Memory Enhancement for Moving Infrared Small Target Detection

Moving infrared small target detection presents significant challenges due to tiny target sizes and low contrast against backgrounds. Existing methods primarily focus on extracting target features only from the spatial-temporal domain. To further enhance feature representation, additional information domains such as frequency are believed to be potentially valuable. To extend target feature learning, we propose a new Triple-domain Strategy (Tridos) with frequency-aware memory enhancement on the spatial-temporal domain. Our scheme effectively detaches and enhances frequency features using a local-global frequency-aware module based on the Fourier transform. Inspired by the human visual system, our memory enhancement aims to capture the spatial relations of targets between video frames. Furthermore, it encodes temporal motion features via differential learning and residual enhancement. Additionally, we design a residual compensation unit to reconcile possible cross-domain feature mismatches. To the best of our knowledge, Tridos is the first work to explore target feature learning comprehensively in the spatial-temporal-frequency domains. Extensive experiments on three datasets (DAUB, ITSDT-15K, and IRDST) indicate that our triple-domain learning scheme is clearly superior to state-of-the-art methods. Source code is available at https://github.com/UESTC-nnLab/Tridos.

Updated: 2024-06-11 05:21:30

标题: 三域特征学习与频率感知记忆增强在移动红外小目标检测中的应用

摘要: 在移动红外小目标检测中存在重大挑战，因为目标尺寸小，与背景对比度低。目前现有的方法主要集中在从时空域提取目标特征。为了进一步增强特征表示，认为更多信息域如频率可能很有价值。为了扩展目标特征学习，我们提出了一种新的三域策略（Tridos），其中包括空间-时间域的频率感知内存增强。在我们的方案中，通过傅里叶变换，我们有效地分离和增强频率特征。受人类视觉系统的启发，我们的内存增强旨在捕捉视频帧之间的目标空间关系。此外，它通过差分学习和残差增强来编码时间动态运动特征。此外，我们进一步设计了一个残差补偿单元，以协调可能存在的跨域特征不匹配。据我们最好的知识，我们的Tridos是第一个全面探索时空频率域目标特征学习的工作。对三个数据集（DAUB，ITSDT-15K和IRDST）的大量实验证实，我们的三域学习方案显然优于现有技术。源代码可在https://github.com/UESTC-nnLab/Tridos上找到。

更新时间: 2024-06-11 05:21:30

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2406.06949v1

CAAP: Context-Aware Action Planning Prompting to Solve Computer Tasks with Front-End UI Only

Software robots have long been deployed in Robotic Process Automation (RPA) to automate mundane and repetitive computer tasks. The advent of Large Language Models (LLMs) with advanced reasoning capabilities has set the stage for these agents to now undertake more complex and even previously unseen tasks. However, the LLM-based automation techniques in recent literature frequently rely on HTML source code for input, limiting their application to web environments. Moreover, the information contained in HTML code is often inaccurate or incomplete, making the agent less reliable for practical applications. We propose an LLM-based agent that relies solely on screenshots to recognize environments, while leveraging in-context learning to eliminate the need for collecting large datasets of human demonstrations. Our strategy, named Context-Aware Action Planning (CAAP) prompting, encourages the agent to meticulously review the context from various angles. Through our proposed methodology, we achieve a success rate of 94.4% on 67 types of MiniWoB++ problems, utilizing only 1.48 demonstrations per problem type. Our method offers the potential for broader applications, especially for tasks that require inter-application coordination on computers or smartphones, showcasing a significant advancement in the field of automation agents. Code and models are accessible at https://github.com/caap-agent/caap-agent.

Updated: 2024-06-11 05:21:20

标题: CAAP：利用前端用户界面解决计算机任务的上下文感知行动规划提示

摘要: 软件机器人长期被部署在机器人过程自动化（RPA）中，以自动化乏味和重复的计算机任务。具有先进推理能力的大型语言模型（LLMs）的出现为这些代理现在承担更复杂甚至以前未见任务奠定了基础。然而，最近文献中基于LLM的自动化技术经常依赖HTML源代码作为输入，限制了它们在Web环境中的应用。此外，HTML代码中包含的信息通常是不准确或不完整的，使代理在实际应用中不够可靠。我们提出了一种基于LLM的代理，仅基于屏幕截图来识别环境，并利用上下文学习来消除收集大量人类演示数据的需求。我们的策略，名为上下文感知行动规划（CAAP）提示，鼓励代理以各种角度仔细审查上下文。通过我们提出的方法论，我们在67种MiniWoB++问题上实现了94.4％的成功率，每种问题类型仅使用1.48个演示。我们的方法为更广泛的应用提供了潜力，特别是对于需要在计算机或智能手机上进行应用间协调的任务，展示了自动化代理领域的重大进展。代码和模型可在https://github.com/caap-agent/caap-agent中访问。

更新时间: 2024-06-11 05:21:20

领域: cs.AI,cs.HC

下载: http://arxiv.org/abs/2406.06947v1

FAULT+PROBE: A Generic Rowhammer-based Bit Recovery Attack

Rowhammer is a security vulnerability that allows unauthorized attackers to induce errors within DRAM cells. To prevent fault injections from escalating to successful attacks, a widely accepted mitigation is implementing fault checks on instructions and data. We challenge the validity of this assumption by examining the impact of the fault on the victim's functionality. Specifically, we illustrate that an attacker can construct a profile of the victim's memory based on the directional patterns of bit flips. This profile is then utilized to identify the most susceptible bit locations within DRAM rows. These locations are subsequently leveraged during an online attack phase, together with side information observed from changes in the victim's behavior, to deduce sensitive bit values. Consequently, the primary objective of this study is to utilize Rowhammer as a probe, shifting the emphasis away from the victim's memory integrity and toward statistical fault analysis (SFA) based on the victim's operational behavior. We show that FAULT+PROBE can be used to circumvent the verify-after-sign fault check mechanism, which is designed to prevent the generation of erroneous signatures that leak sensitive information. It does so by injecting directional faults into key positions identified during a memory profiling stage. The attacker observes the signature generation rate and decodes the secret bit value accordingly. This circumvention is enabled by an observable channel in the victim. FAULT+PROBE is not limited to signing victims and can be used to probe secret bits on arbitrary systems where an observable channel is present that leaks the result of the fault injection attempt. To demonstrate the attack, we target the fault-protected ECDSA in wolfSSL's implementation of the TLS 1.3 handshake. We recover 256-bit session keys with an average recovery rate of 22 key bits/hour and a 100% success rate.

Updated: 2024-06-11 05:00:47

Categories: cs.CR,cs.AR

Download: http://arxiv.org/abs/2406.06943v1

A Non-autoregressive Generation Framework for End-to-End Simultaneous Speech-to-Any Translation

Simultaneous translation models play a crucial role in facilitating communication. However, existing research primarily focuses on text-to-text or speech-to-text models, necessitating additional cascade components to achieve speech-to-speech translation. These pipeline methods suffer from error propagation and accumulate delays in each cascade component, resulting in reduced synchronization between the speaker and listener. To overcome these challenges, we propose a novel non-autoregressive generation framework for simultaneous speech translation (NAST-S2X), which integrates speech-to-text and speech-to-speech tasks into a unified end-to-end framework. We develop a non-autoregressive decoder capable of concurrently generating multiple text or acoustic unit tokens upon receiving fixed-length speech chunks. The decoder can generate blank or repeated tokens and employ CTC decoding to dynamically adjust its latency. Experimental results show that NAST-S2X outperforms state-of-the-art models in both speech-to-text and speech-to-speech tasks. It achieves high-quality simultaneous interpretation within a delay of less than 3 seconds and provides a 28 times decoding speedup in offline generation.
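The abstract mentions that the decoder may emit blank or repeated tokens and relies on CTC decoding. A minimal sketch of the standard CTC collapse rule (merge consecutive repeats, then drop blanks) is shown below. The blank symbol, the chunking, and the example tokens are assumptions for illustration, not the NAST-S2X implementation.

```python
BLANK = "<blank>"  # hypothetical blank symbol


def ctc_collapse(tokens, blank=BLANK):
    """Apply the standard CTC rule: merge consecutive repeats, drop blanks."""
    collapsed = []
    previous = None
    for token in tokens:
        if token != previous and token != blank:
            collapsed.append(token)
        previous = token
    return collapsed


# Raw decoder outputs for three fixed-length speech chunks (made-up values).
# A chunk of only blanks emits nothing, which is how the decoder can "wait".
chunk_outputs = [
    [BLANK, BLANK],
    ["hello", "hello", "there"],
    [BLANK, "world"],
]
flat = [token for chunk in chunk_outputs for token in chunk]
translation = ctc_collapse(flat)   # ['hello', 'there', 'world']
```

Note that a blank between two identical tokens keeps both of them, so genuine repetitions survive the collapse.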

Updated: 2024-06-11 04:25:48

Categories: cs.CL,cs.AI,cs.SD,eess.AS

Download: http://arxiv.org/abs/2406.06937v1

Text Injection for Neural Contextual Biasing

Neural contextual biasing effectively improves automatic speech recognition (ASR) for crucial phrases within a speaker's context, particularly those that are infrequent in the training data. This work proposes contextual text injection (CTI) to enhance contextual ASR. CTI leverages not only the paired speech-text data, but also a much larger corpus of unpaired text to optimize the ASR model and its biasing component. Unpaired text is converted into speech-like representations and used to guide the model's attention towards relevant bias phrases. Moreover, we introduce a contextual text-injected (CTI) minimum word error rate (MWER) training, which minimizes the expected WER caused by contextual biasing when unpaired text is injected into the model. Experiments show that CTI with 100 billion text sentences can achieve up to 43.3% relative WER reduction from a strong neural biasing model. CTI-MWER provides a further relative improvement of 23.5%.
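To make the reported numbers easier to read, the sketch below computes word error rate (WER) as word-level edit distance divided by reference length, and shows how relative reductions compound. The 10% baseline WER and the example sentences are made up, and it is an assumption that the further 23.5% improvement is measured relative to the CTI model.

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level Levenshtein distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    previous_row = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current_row = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous_row[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = previous_row[j] + 1
            insertion = current_row[j - 1] + 1
            current_row.append(min(substitution, deletion, insertion))
        previous_row = current_row
    return previous_row[-1] / len(ref)


def apply_relative_reduction(wer, relative_reduction):
    """A relative reduction r turns a WER of w into w * (1 - r)."""
    return wer * (1.0 - relative_reduction)


# One substituted word out of four reference words.
example_wer = word_error_rate("call doctor nguyen now", "call doctor win now")

baseline_wer = 0.10                                            # hypothetical baseline
after_cti = apply_relative_reduction(baseline_wer, 0.433)      # about 0.0567
after_cti_mwer = apply_relative_reduction(after_cti, 0.235)    # about 0.0434
```

Under these assumptions the two relative gains multiply rather than add, giving roughly a 56.6% total relative reduction from the hypothetical baseline.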

Updated: 2024-06-11 04:11:56

Categories: cs.CL,cs.AI,cs.LG,cs.NE,eess.AS

Download: http://arxiv.org/abs/2406.02921v2

Diffusion Models for Accurate Channel Distribution Generation

Strong generative models can accurately learn channel distributions. This could save recurring costs for physical measurements of the channel. Moreover, the resulting differentiable channel model supports training neural encoders by enabling gradient-based optimization. The initial approach in the literature draws upon the modern advancements in image generation, utilizing generative adversarial networks (GANs) or their enhanced variants to generate channel distributions. In this paper, we address this channel approximation challenge with diffusion models (DMs), which have demonstrated high sample quality and mode coverage in image generation. In addition to testing the generative performance of the channel distributions, we use an end-to-end (E2E) coded-modulation framework underpinned by DMs and propose an efficient training algorithm. Our simulations with various channel models show that a DM can accurately learn channel distributions, enabling an E2E framework to achieve near-optimal symbol error rates (SERs). Furthermore, we examine the trade-off between mode coverage and sampling speed through skipped sampling using sliced Wasserstein distance (SWD) and the E2E SER. We investigate the effect of noise scheduling on this trade-off, demonstrating that with an appropriate choice of parameters and techniques, sampling time can be significantly reduced with a minor increase in SWD and SER. Finally, we show that the DM can generate a correlated fading channel, whereas a strong GAN variant fails to learn the covariance. This paper highlights the potential benefits of using DMs for learning channel distributions, which could be further investigated for various channels and advanced techniques of DMs.
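The abstract uses sliced Wasserstein distance (SWD) to compare generated and true channel distributions. A minimal Monte Carlo sketch is given below: project both sample sets onto random unit directions, then average the one-dimensional Wasserstein distances. It assumes equal sample counts and uses the 1-Wasserstein distance per slice; the paper may use a different order, projection count, or estimator, and the sample points here are made up.

```python
import math
import random


def sliced_wasserstein(samples_a, samples_b, num_projections=64, seed=0):
    """Monte Carlo estimate of the sliced 1-Wasserstein distance."""
    if len(samples_a) != len(samples_b):
        raise ValueError("this sketch assumes equally sized sample sets")
    rng = random.Random(seed)
    dim = len(samples_a[0])
    total = 0.0
    for _ in range(num_projections):
        # Draw a random direction on the unit sphere.
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in direction))
        direction = [c / norm for c in direction]
        # Project and sort: in 1D, optimal transport matches sorted samples.
        proj_a = sorted(sum(c * v for c, v in zip(direction, p)) for p in samples_a)
        proj_b = sorted(sum(c * v for c, v in zip(direction, p)) for p in samples_b)
        total += sum(abs(a - b) for a, b in zip(proj_a, proj_b)) / len(proj_a)
    return total / num_projections


# Made-up 2D "received symbols": a reference set and a shifted copy.
reference = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
shifted = [(x + 1.0, y) for x, y in reference]

same = sliced_wasserstein(reference, reference)   # identical sets give 0.0
moved = sliced_wasserstein(reference, shifted)    # strictly positive
```

Because each slice only needs a sort, the cost stays low even in higher dimensions, which is one reason the metric is popular for evaluating generative models.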

Updated: 2024-06-11 04:01:00

Categories: cs.IT,cs.LG,math.IT

Download: http://arxiv.org/abs/2309.10505v4

CodeR: Issue Resolving with Multi-Agent and Task Graphs

GitHub issue resolution has recently attracted significant attention from academia and industry. SWE-bench has been proposed to measure performance in resolving issues. In this paper, we propose CodeR, which adopts a multi-agent framework and pre-defined task graphs to Repair & Resolve reported bugs and add new features within a code Repository. On SWE-bench lite, CodeR solves 28.33% of issues when submitting only once for each issue. We examine the performance impact of each design choice in CodeR and offer insights to advance this research direction.

Updated: 2024-06-11 03:52:03

Categories: cs.CL,cs.AI,cs.SE

Download: http://arxiv.org/abs/2406.01304v3

Provable Representation with Efficient Planning for Partial Observable Reinforcement Learning

In most real-world reinforcement learning applications, state information is only partially observable, which breaks the Markov decision process assumption and leads to inferior performance for algorithms that conflate observations with state. Partially Observable Markov Decision Processes (POMDPs), on the other hand, provide a general framework that allows partial observability to be accounted for in learning, exploration and planning, but present significant computational and statistical challenges. To address these difficulties, we develop a representation-based perspective that leads to a coherent framework and tractable algorithmic approach for practical reinforcement learning from partial observations. We provide a theoretical analysis justifying the statistical efficiency of the proposed algorithm, and also empirically demonstrate that the proposed algorithm can surpass state-of-the-art performance with partial observations across various benchmarks, advancing reliable reinforcement learning towards more practical applications.

Updated: 2024-06-11 03:51:28

Categories: cs.LG,cs.AI,stat.ML

Download: http://arxiv.org/abs/2311.12244v3

Non-autoregressive Personalized Bundle Generation

The personalized bundle generation problem, which aims to create a preferred bundle for a user from numerous candidate items, is receiving increasing attention in recommendation. However, existing works ignore the order-invariant nature of the bundle and adopt sequential modeling methods as the solution, which might introduce inductive bias and cause a large latency in prediction. To address this problem, we propose to perform bundle generation via a non-autoregressive mechanism and design a novel encoder-decoder framework named BundleNAT, which can effectively output the targeted bundle in one shot without relying on any inherent order. In detail, instead of learning sequential dependency, we propose to adopt pre-training techniques and a graph neural network to fully embed user-based preference and item-based compatibility information, and use a self-attention based encoder to further extract global dependency patterns. We then design a permutation-equivariant decoding architecture that is able to directly output the desired bundle in a one-shot manner. Experiments on three real-world datasets from Youshu and Netease show that the proposed BundleNAT significantly outperforms the current state-of-the-art methods, on average by up to 35.92%, 10.97% and 23.67% absolute improvements in Precision, Precision+, and Recall, respectively.
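The order-invariance argument can be made concrete with set-based metrics: a bundle is scored as a set, so any permutation of the same items gets the same score. The sketch below uses the standard set definitions of precision and recall; the item names are made up, and the paper's Precision+ metric is not covered here.

```python
def bundle_precision_recall(predicted, target):
    """Set-based precision and recall; item order is deliberately ignored."""
    predicted_set, target_set = set(predicted), set(target)
    hits = len(predicted_set & target_set)
    precision = hits / len(predicted_set) if predicted_set else 0.0
    recall = hits / len(target_set) if target_set else 0.0
    return precision, recall


target_bundle = ["tent", "stove", "lantern", "sleeping_bag"]   # hypothetical items
prediction = ["stove", "tent", "hammock"]
reordered = ["hammock", "tent", "stove"]                       # same items, new order

scores = bundle_precision_recall(prediction, target_bundle)
scores_reordered = bundle_precision_recall(reordered, target_bundle)
```

Both calls return the same pair (two hits out of three predicted items, two out of four target items), which is the property a sequential decoder has to learn implicitly and a permutation-equivariant decoder gets by construction.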

Updated: 2024-06-11 03:44:17

Categories: cs.LG,cs.IR

Download: http://arxiv.org/abs/2406.06925v1

Finite-Time Analysis for Conflict-Avoidant Multi-Task Reinforcement Learning

Multi-task reinforcement learning (MTRL) has shown great promise in many real-world applications. Existing MTRL algorithms often aim to learn a policy that optimizes individual objective functions simultaneously with a given prior preference (or weights) on different tasks. However, these methods often suffer from gradient conflict, in which the tasks with larger gradients dominate the update direction, resulting in performance degradation on other tasks. In this paper, we develop a novel dynamic weighting multi-task actor-critic algorithm (MTAC) with two options for the task weight update sub-procedure, named CA and FC. MTAC-CA aims to find a conflict-avoidant (CA) update direction that maximizes the minimum value improvement among tasks, and MTAC-FC targets a much faster convergence rate. We provide a comprehensive finite-time convergence analysis for both algorithms. We show that MTAC-CA can find an $\epsilon+\epsilon_{\text{app}}$-accurate Pareto stationary policy using $\mathcal{O}({\epsilon^{-5}})$ samples, while ensuring a small $\epsilon+\sqrt{\epsilon_{\text{app}}}$-level CA distance (defined as the distance to the CA direction), where $\epsilon_{\text{app}}$ is the function approximation error. The analysis also shows that MTAC-FC improves the sample complexity to $\mathcal{O}(\epsilon^{-3})$, but with a constant-level CA distance. Our experiments on MT10 demonstrate the improved performance of our algorithms over existing MTRL methods with fixed preference.
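Gradient conflict is easy to see with two made-up task gradients. In the sketch below, a fixed equal-weight sum is dominated by the larger gradient and, to first order, makes the smaller task worse. As a stand-in for a conflict-avoidant direction it uses the classic two-task min-norm weighting (the closed form familiar from MGDA-style methods); this is a swapped-in illustration and not necessarily the MTAC-CA update rule.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def min_norm_direction(g1, g2):
    """Min-norm point of the segment between g1 and g2 (two-task closed form).

    Minimising ||w * g1 + (1 - w) * g2||^2 over w in [0, 1] gives
    w = (g2 - g1) . g2 / ||g1 - g2||^2, clipped to [0, 1].
    """
    diff = [b - a for a, b in zip(g1, g2)]          # g2 - g1
    denom = dot(diff, diff)
    if denom == 0.0:                                # identical gradients
        return list(g1)
    weight_1 = min(1.0, max(0.0, dot(diff, g2) / denom))
    return [weight_1 * a + (1.0 - weight_1) * b for a, b in zip(g1, g2)]


# Hypothetical per-task gradients; improvement is measured as <g_i, d>.
g_small = [1.0, 0.0]
g_large = [-3.0, 3.0]

fixed_sum = [a + b for a, b in zip(g_small, g_large)]   # equal fixed weights
ca_direction = min_norm_direction(g_small, g_large)

small_task_under_fixed = dot(fixed_sum, g_small)        # negative: task gets worse
small_task_under_ca = dot(ca_direction, g_small)        # positive
large_task_under_ca = dot(ca_direction, g_large)        # positive
```

In this example the min-norm direction gives both tasks the same positive first-order improvement, whereas the fixed-weight sum sacrifices the small-gradient task.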

Updated: 2024-06-11 03:38:20

Categories: cs.LG

Download: http://arxiv.org/abs/2405.16077v2

Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion

In recent years, there has been rapid development in 3D generation models, opening up new possibilities for applications such as simulating the dynamic movements of 3D objects and customizing their behaviors. However, current 3D generative models tend to focus only on surface features such as color and shape, neglecting the inherent physical properties that govern the behavior of objects in the real world. To accurately simulate physics-aligned dynamics, it is essential to predict the physical properties of materials and incorporate them into the behavior prediction process. Nonetheless, predicting the diverse materials of real-world objects is still challenging due to the complex nature of their physical attributes. In this paper, we propose Physics3D, a novel method for learning various physical properties of 3D objects through a video diffusion model. Our approach involves designing a highly generalizable physical simulation system based on a viscoelastic material model, which enables us to simulate a wide range of materials with high-fidelity capabilities. Moreover, we distill the physical priors from a video diffusion model that contains more understanding of realistic object materials. Extensive experiments demonstrate the effectiveness of our method with both elastic and plastic materials. Physics3D shows great potential for bridging the gap between the physical world and virtual neural space, providing a better integration and application of realistic physical principles in virtual environments. Project page: https://liuff19.github.io/Physics3D.

Updated: 2024-06-11 03:36:09

Categories: cs.CV,cs.AI,cs.GR

Download: http://arxiv.org/abs/2406.04338v3

Investigation of the effectiveness of applying ChatGPT in Dialogic Teaching Using Electroencephalography

In recent years, the rapid development of artificial intelligence technology, especially the emergence of large language models (LLMs) such as ChatGPT, has presented significant prospects for application in the field of education. LLMs possess the capability to interpret knowledge, answer questions, and consider context, thus providing support for dialogic teaching to students. Therefore, an examination of the capacity of LLMs to effectively fulfill instructional roles, thereby facilitating student learning akin to human educators within dialogic teaching scenarios, is an exceptionally valuable research topic. This research recruited 34 undergraduate students as participants, who were randomly divided into two groups. The experimental group engaged in dialogic teaching using ChatGPT, while the control group interacted with human teachers. Both groups learned the histogram equalization unit in the information-related course "Digital Image Processing". The research findings show comparable scores between the two groups on the retention test. However, students who engaged in dialogue with ChatGPT exhibited lower performance on the transfer test. Electroencephalography data revealed that students who interacted with ChatGPT exhibited higher levels of cognitive activity, suggesting that ChatGPT could help students establish a knowledge foundation and stimulate cognitive activity. However, its strengths in promoting students' knowledge application and creativity were insignificant. Based on the research findings, it is evident that ChatGPT cannot fully excel in fulfilling teaching tasks in dialogic teaching in information-related courses. Combining ChatGPT with traditional human teachers might be a more ideal approach. The synergistic use of both can provide students with more comprehensive learning support, thus contributing to enhancing the quality of teaching.

Updated: 2024-06-11 03:35:03

Categories: cs.CY,cs.AI,physics.ed-ph

Download: http://arxiv.org/abs/2403.16687v5

PcLast: Discovering Plannable Continuous Latent States

Goal-conditioned planning benefits from learned low-dimensional representations of rich observations. While compact latent representations typically learned from variational autoencoders or inverse dynamics enable goal-conditioned decision making, they ignore state reachability, hampering their performance. In this paper, we learn a representation that associates reachable states together for effective planning and goal-conditioned policy learning. We first learn a latent representation with multi-step inverse dynamics (to remove distracting information), and then transform this representation to associate reachable states together in $\ell_2$ space. Our proposals are rigorously tested in various simulation testbeds. Numerical results in reward-based settings show significant improvements in sampling efficiency. Further, in reward-free settings this approach yields layered state abstractions that enable computationally efficient hierarchical planning for reaching ad hoc goals with zero additional samples.

Updated: 2024-06-11 03:32:58

Categories: cs.LG,cs.AI,cs.RO

Download: http://arxiv.org/abs/2311.03534v2

On the Communication Complexity of Secure Multi-Party Computation With Aborts

A central goal of cryptography is Secure Multi-party Computation (MPC), where $n$ parties desire to compute a function of their joint inputs without letting any party learn about the inputs of its peers. Unfortunately, it is well-known that MPC guaranteeing output delivery to every party is infeasible when a majority of the parties are malicious. In fact, parties operating over a point-to-point network (i.e. without access to a broadcast channel) cannot even reach an agreement on the output when more than one third of the parties are malicious (Lamport, Shostak, and Pease, JACM 1980). Motivated by this infeasibility in the point-to-point model, Goldwasser and Lindell (J. Cryptol 2005) introduced a definition of MPC that does not require agreement, referred to as MPC with selective abort. Under this definition, any party may abort the protocol if they detect malicious behavior. They showed that MPC with selective abort is feasible for any number of malicious parties by implementing a broadcast functionality with abort. While the model of MPC with abort has attracted much attention over the years, little is known about its communication complexity over point-to-point networks. In this work, we study the communication complexity of MPC with abort and devise nearly-optimal communication efficient protocols in this model. Namely, we prove trade-offs between the number of honest parties $h$, the communication complexity, and the locality of the protocols. Here, locality is a bound on the number of peers with which each party must communicate.

Updated: 2024-06-11 03:15:39

Categories: cs.CR

Download: http://arxiv.org/abs/2406.06914v1

SemEval-2024 Task 3: Multimodal Emotion Cause Analysis in Conversations

The ability to understand emotions is an essential component of human-like artificial intelligence, as emotions greatly influence human cognition, decision making, and social interactions. In addition to emotion recognition in conversations, the task of identifying the potential causes behind an individual's emotional state in conversations is of great importance in many application scenarios. We organize SemEval-2024 Task 3, named Multimodal Emotion Cause Analysis in Conversations, which aims at extracting all pairs of emotions and their corresponding causes from conversations. Under different modality settings, it consists of two subtasks: Textual Emotion-Cause Pair Extraction in Conversations (TECPE) and Multimodal Emotion-Cause Pair Extraction in Conversations (MECPE). The shared task has attracted 143 registrations and 216 successful submissions. In this paper, we introduce the task, dataset and evaluation settings, summarize the systems of the top teams, and discuss the findings of the participants.

Updated: 2024-06-11 03:12:01

Categories: cs.CL,cs.AI,cs.MM

Download: http://arxiv.org/abs/2405.13049v2

FFB: A Fair Fairness Benchmark for In-Processing Group Fairness Methods

This paper introduces the Fair Fairness Benchmark (FFB), a benchmarking framework for in-processing group fairness methods. Ensuring fairness in machine learning is important for ethical compliance. However, there exist challenges in comparing and developing fairness methods due to inconsistencies in experimental settings, lack of accessible algorithmic implementations, and limited extensibility of current fairness packages and tools. To address these issues, we introduce an open-source standardized benchmark for evaluating in-processing group fairness methods and provide a comprehensive analysis of state-of-the-art methods to ensure different notions of group fairness. This work offers the following key contributions: the provision of flexible, extensible, minimalistic, and research-oriented open-source code; the establishment of unified fairness method benchmarking pipelines; and extensive benchmarking, which yields key insights from 45,079 experiments and 14,428 GPU hours. We believe that our work will significantly facilitate the growth and development of the fairness research community.

Updated: 2024-06-11 03:10:10

Categories: cs.LG,cs.AI,cs.CY

Download: http://arxiv.org/abs/2306.09468v2

AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising

Diffusion models have garnered significant interest from the community for their great generative ability across various applications. However, their typical multi-step sequential-denoising nature gives rise to high cumulative latency and precludes parallel computation. To address this, we introduce AsyncDiff, a universal and plug-and-play acceleration scheme that enables model parallelism across multiple devices. Our approach divides the cumbersome noise prediction model into multiple components, assigning each to a different device. To break the dependency chain between these components, it transforms the conventional sequential denoising into an asynchronous process by exploiting the high similarity between hidden states in consecutive diffusion steps. Consequently, each component can compute in parallel on a separate device. The proposed strategy significantly reduces inference latency while minimally impacting the generative quality. Specifically, for Stable Diffusion v2.1, AsyncDiff achieves a 2.7x speedup with negligible degradation and a 4.0x speedup with only a slight reduction of 0.38 in CLIP Score, on four NVIDIA A5000 GPUs. Our experiments also demonstrate that AsyncDiff can be readily applied to video diffusion models with encouraging performance. The code is available at https://github.com/czg1225/AsyncDiff.

Updated: 2024-06-11 03:09:37

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2406.06911v1

Training Dynamics of Nonlinear Contrastive Learning Model in the High Dimensional Limit

This letter presents a high-dimensional analysis of the training dynamics for a single-layer nonlinear contrastive learning model. The empirical distribution of the model weights converges to a deterministic measure governed by a McKean-Vlasov nonlinear partial differential equation (PDE). Under L2 regularization, this PDE reduces to a closed set of low-dimensional ordinary differential equations (ODEs), reflecting the evolution of the model performance during the training process. We analyze the locations and stability of the fixed points of the ODEs, unveiling several interesting findings. First, only the hidden variable's second moment affects feature learnability at the state with uninformative initialization. Second, higher moments influence the probability of feature selection by controlling the attraction region, rather than affecting local stability. Finally, independent noise added in the data augmentation degrades performance, but negatively correlated noise can reduce the variance of gradient estimation, yielding better performance. Despite the simplicity of the analyzed model, it exhibits rich training dynamics phenomena, paving the way to understanding the more complex mechanisms behind practical large models.

Updated: 2024-06-11 03:07:41

Categories: cs.LG,cond-mat.dis-nn,stat.ML

Download: http://arxiv.org/abs/2406.06909v1

SignMusketeers: An Efficient Multi-Stream Approach for Sign Language Translation at Scale

A persistent challenge in sign language video processing, including the task of sign language to written language translation, is how we learn representations of sign language in an effective and efficient way that can preserve the important attributes of these languages, while remaining invariant to irrelevant visual differences. Informed by the nature and linguistics of signed languages, our proposed method focuses on just the most relevant parts in a signing video: the face, hands and body posture of the signer. However, instead of using pose estimation coordinates from off-the-shelf pose tracking models, which have inconsistent performance for hands and faces, we propose to learn the complex handshapes and rich facial expressions of sign languages in a self-supervised fashion. Our approach is based on learning from individual frames (rather than video sequences) and is therefore much more efficient than prior work on sign language pre-training. Compared to a recent model that established a new state of the art in sign language translation on the How2Sign dataset, our approach yields similar translation performance, using less than 3% of the compute.

Updated: 2024-06-11 03:00:41

Categories: cs.CL,cs.AI,cs.CV,cs.LG

Download: http://arxiv.org/abs/2406.06907v1

On the Limitation of Kernel Dependence Maximization for Feature Selection

A simple and intuitive method for feature selection consists of choosing the feature subset that maximizes a nonparametric measure of dependence between the response and the features. A popular proposal from the literature uses the Hilbert-Schmidt Independence Criterion (HSIC) as the nonparametric dependence measure. The rationale behind this approach to feature selection is that important features will exhibit a high dependence with the response and their inclusion in the set of selected features will increase the HSIC. Through counterexamples, we demonstrate that this rationale is flawed and that feature selection via HSIC maximization can miss critical features.
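For readers unfamiliar with HSIC, the sketch below computes one common biased empirical estimator, tr(KHLH) / (n-1)^2, where K and L are Gram matrices and H is the centering matrix. It assumes scalar features, a Gaussian kernel with a fixed bandwidth, and made-up data; other normalisations (such as 1/n^2) also appear in the literature. It only shows how the score is computed and does not reproduce the paper's counterexamples.

```python
import math


def gaussian_gram(values, bandwidth=1.0):
    """Gram matrix of a Gaussian kernel on scalar inputs."""
    return [
        [math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2)) for b in values]
        for a in values
    ]


def center(gram):
    """Double-center a symmetric Gram matrix, i.e. compute H @ gram @ H."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    total_mean = sum(row_means) / n
    # The matrix is symmetric, so column means equal row means.
    return [
        [gram[i][j] - row_means[i] - row_means[j] + total_mean for j in range(n)]
        for i in range(n)
    ]


def hsic(xs, ys, bandwidth=1.0):
    """Biased empirical HSIC: tr(K H L H) / (n - 1)^2."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally sized samples with n >= 2")
    k = center(gaussian_gram(xs, bandwidth))
    l = center(gaussian_gram(ys, bandwidth))
    # tr(K H L H) equals the elementwise sum of the two centered matrices.
    return sum(k[i][j] * l[i][j] for i in range(n) for j in range(n)) / (n - 1) ** 2


feature = [-2.0, -1.0, 0.0, 1.0, 2.0]
response = [v * v for v in feature]        # dependent on the feature, yet uncorrelated
constant = [3.0] * len(feature)            # carries no information

dependent_score = hsic(feature, response)  # strictly positive
independent_score = hsic(feature, constant)
```

A selection procedure in the spirit of the abstract would score candidate feature subsets with such a measure and keep the maximiser; the paper's claim is that this rationale can fail to retain critical features.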

Updated: 2024-06-11 02:56:13

Categories: stat.ML,cs.LG,math.ST,stat.TH

Download: http://arxiv.org/abs/2406.06903v1

Blinded by Generated Contexts: How Language Models Merge Generated and Retrieved Contexts When Knowledge Conflicts?

While auxiliary information has become a key to enhancing Large Language Models (LLMs), relatively little is known about how LLMs merge these contexts, specifically contexts generated by LLMs and those retrieved from external sources. To investigate this, we formulate a systematic framework to identify whether LLMs' responses are attributed to either generated or retrieved contexts. To easily trace the origin of the response, we construct datasets with conflicting contexts, i.e., each question is paired with both generated and retrieved contexts, yet only one of them contains the correct answer. Our experiments reveal a significant bias in several LLMs (GPT-4/3.5 and Llama2) to favor generated contexts, even when they provide incorrect information. We further identify two key factors contributing to this bias: i) contexts generated by LLMs typically show greater similarity to the questions, increasing their likelihood of being selected; ii) the segmentation process used in retrieved contexts disrupts their completeness, thereby hindering their full utilization in LLMs. Our analysis enhances the understanding of how LLMs merge diverse contexts, offers valuable insights for advancing current LLM augmentation methods, and highlights the risk of generated misinformation for retrieval-augmented LLMs.

Updated: 2024-06-11 02:52:58

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2401.11911v6

MSAGPT: Neural Prompting Protein Structure Prediction via MSA Generative Pre-Training

Multiple Sequence Alignment (MSA) plays a pivotal role in unveiling the evolutionary trajectories of protein families. The accuracy of protein structure predictions is often compromised for protein sequences that lack sufficient homologous information to construct a high-quality MSA. Although various methods have been proposed to generate virtual MSAs under these conditions, they fall short in comprehensively capturing the intricate coevolutionary patterns within MSAs or require guidance from external oracle models. Here we introduce MSAGPT, a novel approach to prompt protein structure predictions via MSA generative pre-training in the low-MSA regime. MSAGPT employs a simple yet effective 2D evolutionary positional encoding scheme to model complex evolutionary patterns. Building on this encoding, its flexible 1D MSA decoding framework facilitates zero- or few-shot learning. Moreover, we demonstrate that leveraging the feedback from AlphaFold2 can further enhance the model capacity via Rejective Fine tuning (RFT) and Reinforcement Learning from AF2 Feedback (RLAF). Extensive experiments confirm the efficacy of MSAGPT in generating faithful virtual MSAs to enhance structure prediction accuracy. The transfer learning capabilities also highlight its great potential for facilitating other protein tasks.

Updated: 2024-06-11 02:42:17

Categories: q-bio.BM,cs.AI,cs.LG

Download: http://arxiv.org/abs/2406.05347v2

Random Features Approximation for Control-Affine Systems

Modern data-driven control applications call for flexible nonlinear models that are amenable to principled controller synthesis and real-time feedback. Many nonlinear dynamical systems of interest are control-affine. We propose two novel classes of nonlinear feature representations which capture control-affine structure while allowing for arbitrary complexity in the state dependence. Our methods make use of random features (RF) approximations, inheriting the expressiveness of kernel methods at a lower computational cost. We formalize the representational capabilities of our methods by showing their relationship to the Affine Dot Product (ADP) kernel proposed by Castañeda et al. (2021) and a novel Affine Dense (AD) kernel that we introduce. We further illustrate the utility by presenting a case study of data-driven optimization-based control using control certificate functions (CCF). Simulation experiments on a double pendulum empirically demonstrate the advantages of our methods.
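
One way to picture a control-affine random feature map is sketched below. It assumes random Fourier features for a Gaussian kernel, with one independent feature block for the drift term and one per input channel, each scaled by the matching entry of `[1, u]`. The class name, its parameters, and this block layout are illustrative assumptions and are not taken from the paper. The point is only that any model linear in these features is affine in the control input.

```python
import numpy as np


class ControlAffineRandomFeatures:
    """Hypothetical feature map psi(x, u) that is affine in u for fixed x."""

    def __init__(self, state_dim, control_dim, num_features=64,
                 lengthscale=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.num_features = num_features
        blocks = control_dim + 1  # drift block plus one block per input
        # Random frequencies and phases for random Fourier features.
        self.W = rng.normal(scale=1.0 / lengthscale,
                            size=(blocks, num_features, state_dim))
        self.b = rng.uniform(0.0, 2.0 * np.pi, size=(blocks, num_features))

    def state_features(self, x):
        """Per-block features of the state, shape (control_dim + 1, D)."""
        scale = np.sqrt(2.0 / self.num_features)
        return scale * np.cos(self.W @ x + self.b)

    def features(self, x, u):
        """Scale block i by y_i, where y = [1, u], then flatten."""
        y = np.concatenate(([1.0], np.asarray(u, dtype=float)))
        return (y[:, None] * self.state_features(x)).ravel()
```

A linear regressor fitted on `features(x, u)` then has the form f(x) + g(x) u, which is the structure that controller synthesis methods for control-affine systems rely on.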

Updated: 2024-06-11 02:32:16

Categories: cs.LG,cs.SY,eess.SY,math.OC,stat.ML

Download: http://arxiv.org/abs/2406.06514v2

Nonlinear time-series embedding by monotone variational inequality

In the wild, we often encounter collections of sequential data such as electrocardiograms, motion capture, genomes, and natural language, and sequences may be multichannel or symbolic with nonlinear dynamics. We introduce a new method that learns low-dimensional representations of nonlinear time series without supervision and can have provable recovery guarantees. The learned representation can be used for downstream machine-learning tasks such as clustering and classification. The method is based on the assumption that the observed sequences arise from a common domain, but each sequence obeys its own autoregressive model, and these models are related to each other through low-rank regularization. We cast the problem as a computationally efficient convex matrix parameter recovery problem using a monotone variational inequality and encode the common domain assumption via a low-rank constraint across the learned representations, which can learn the geometry for the entire domain as well as faithful representations for the dynamics of each individual sequence using the domain information in totality. We show that our method performs competitively with baselines on real-world time-series data and demonstrate its effectiveness for symbolic text modeling and RNA sequence clustering.

Updated: 2024-06-11 02:19:31

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2406.06894v1

Transformers Provably Learn Sparse Token Selection While Fully-Connected Nets Cannot

The transformer architecture has prevailed in various deep learning settings due to its exceptional capabilities to select and compose structural information. Motivated by these capabilities, Sanford et al. proposed the sparse token selection task, in which transformers excel while fully-connected networks (FCNs) fail in the worst case. Building upon that, we strengthen the FCN lower bound to an average-case setting and establish an algorithmic separation of transformers over FCNs. Specifically, a one-layer transformer trained with gradient descent provably learns the sparse token selection task and, surprisingly, exhibits strong out-of-distribution length generalization. We provide empirical simulations to justify our theoretical findings.

Updated: 2024-06-11 02:15:53

Categories: stat.ML,cs.IT,cs.LG,math.IT

Download: http://arxiv.org/abs/2406.06893v1

Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters

Exploiting activation sparsity is a promising approach to significantly accelerating the inference process of large language models (LLMs) without compromising performance. However, activation sparsity is determined by activation functions, and commonly used ones like SwiGLU and GeGLU exhibit limited sparsity. Simply replacing these functions with ReLU fails to achieve sufficient sparsity. Moreover, inadequate training data can further increase the risk of performance degradation. To address these challenges, we propose a novel dReLU function, which is designed to improve LLM activation sparsity, along with a high-quality training data mixture ratio to facilitate effective sparsification. Additionally, we leverage sparse activation patterns within the Feed-Forward Network (FFN) experts of Mixture-of-Experts (MoE) models to further boost efficiency. By applying our neuron sparsification method to the Mistral and Mixtral models, only 2.5 billion and 4.3 billion parameters are activated per inference iteration, respectively, while achieving even more powerful model performance. Evaluation results demonstrate that this sparsity achieves a 2-5x decoding speedup. Remarkably, on mobile phones, our TurboSparse-Mixtral-47B achieves an inference speed of 11 tokens per second. Our models are available at https://huggingface.co/PowerInfer
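
To make "activation sparsity" concrete, the toy sketch below counts exact zeros in the hidden activations of a gated feed-forward layer under three choices of activation. It uses random Gaussian weights, so it says nothing about trained models. The third variant, which applies ReLU to both the gate and the up projection, is only an assumed stand-in, since the abstract does not define dReLU. All names and sizes are hypothetical.

```python
import numpy as np


def silu(z):
    """SiLU (swish) activation, z * sigmoid(z)."""
    return z / (1.0 + np.exp(-z))


def relu(z):
    return np.maximum(z, 0.0)


def zero_fraction(a):
    """Fraction of entries that are exactly zero."""
    return float(np.mean(a == 0.0))


def ffn_sparsity(num_tokens=256, d_model=64, d_ff=512, seed=0):
    """Measure hidden-activation sparsity of a gated FFN with random weights."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(num_tokens, d_model))
    w_gate = rng.normal(size=(d_model, d_ff))
    w_up = rng.normal(size=(d_model, d_ff))
    gate, up = x @ w_gate, x @ w_up
    return {
        "silu_gate": zero_fraction(silu(gate) * up),       # SwiGLU-style
        "relu_gate": zero_fraction(relu(gate) * up),       # ReLU on gate only
        "relu_gate_and_up": zero_fraction(relu(gate) * relu(up)),  # assumed variant
    }
```

Entries that are exactly zero are the ones a sparse inference engine can skip, which is why the zero fraction is the quantity of interest here.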

Updated: 2024-06-11 02:15:47

Categories: cs.LG,cs.CL

Download: http://arxiv.org/abs/2406.05955v2

LLaMA-E: Empowering E-commerce Authoring with Object-Interleaved Instruction Following

E-commerce authoring entails creating engaging, diverse, and targeted content to enhance preference elicitation and retrieval experience. While Large Language Models (LLMs) have revolutionized content generation, they often fall short in e-commerce applications due to their limited memorization of domain-specific features. This paper proposes LLaMA-E, a family of unified e-commerce authoring models that address the contextual preferences of customers, sellers, and platforms, the essential objects in e-commerce operation. We design an instruction set derived from the tasks of ad generation, query-enhanced product title rewriting, product classification, purchase intent speculation, and general e-commerce Q&A. The instruction formulation ensures interleaved coverage of the presented and required object features, allowing the alignment of base models to parameterise e-commerce knowledge comprehensively. The proposed LLaMA-E models achieve state-of-the-art evaluation performance and show advantages in zero-shot practical applications. To our knowledge, this is the first LLM tailored to empower authoring applications with comprehensive scenario understanding by integrating features focused on the participating objects.

Updated: 2024-06-11 02:14:06

Categories: cs.CL,cs.AI,cs.IR

Download: http://arxiv.org/abs/2308.04913v2

Tokenize features, enhancing tables: the FT-TABPFN model for tabular classification

Traditional methods for tabular classification usually rely on supervised learning from scratch, which requires extensive training data to determine model parameters. However, a novel approach called Prior-Data Fitted Networks (TabPFN) has changed this paradigm. TabPFN uses a 12-layer transformer trained on large synthetic datasets to learn universal tabular representations. This method enables fast and accurate predictions on new tasks with a single forward pass and no need for additional training. Although TabPFN has been successful on small datasets, it generally shows weaker performance when dealing with categorical features. To overcome this limitation, we propose FT-TabPFN, which is an enhanced version of TabPFN that includes a novel Feature Tokenization layer to better handle categorical features. By fine-tuning it for downstream tasks, FT-TabPFN not only expands the functionality of the original model but also significantly improves its applicability and accuracy in tabular classification. Our full source code is available for community use and development.

Updated: 2024-06-11 02:13:46

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2406.06891v1

Consistent Optimal Transport with Empirical Conditional Measures

Given samples from two joint distributions, we consider the problem of Optimal Transportation (OT) between them when conditioned on a common variable. We focus on the general setting where the conditioned variable may be continuous, and the marginals of this variable in the two joint distributions may not be the same. In such settings, standard OT variants cannot be employed, and novel estimation techniques are necessary. Since the main challenge is that the conditional distributions are not explicitly available, the key idea in our OT formulation is to employ kernelized-least-squares terms computed over the joint samples, which implicitly match the transport plan's marginals with the empirical conditionals. Under mild conditions, we prove that our estimated transport plans, as a function of the conditioned variable, are asymptotically optimal. For finite samples, we show that the deviation in terms of our regularized objective is bounded by $O(1/m^{1/4})$, where $m$ is the number of samples. We also discuss how the conditional transport plan could be modelled using explicit probabilistic models as well as using implicit generative ones. We empirically verify the consistency of our estimator on synthetic datasets, where the optimal plan is analytically known. When employed in applications like prompt learning for few-shot classification and conditional-generation in the context of predicting cell responses to treatment, our methodology improves upon state-of-the-art methods.

Updated: 2024-06-11 02:12:50

Categories: cs.LG

Download: http://arxiv.org/abs/2305.15901v6

PLUM: Preference Learning Plus Test Cases Yields Better Code Language Models

Instruction-finetuned code language models (LMs) have shown promise in various programming tasks. They are trained, using a language modeling objective, on natural language instructions and gold code snippet pairs. Recent evidence suggests that these models, never exposed to incorrect solutions during training, often struggle to distinguish between correct and incorrect solutions. This observation raises our inquiry: Can preference learning, which trains models to prefer correct solutions over incorrect ones, help push the boundaries of code LMs even further? We propose PLUM, a novel preference learning framework augmented with test cases, tailored for code LMs. PLUM aims to investigate the key success factors and potential benefits of preference learning in code LMs, which remain elusive despite its success in aligning LMs with human values. PLUM consists of three stages: (1) generating test cases for natural language instructions, (2) sampling candidate solutions from the policy and evaluating them against the test cases to create a preference dataset, which is then used to (3) train the policy with a preference learning algorithm. Experiments demonstrate that PLUM substantially improves the performance of existing code LMs on established code generation benchmarks such as HumanEval (+) and MBPP (+), even for the state-of-the-art open-source language model CodeQwen-1.5-7B-Chat. PLUM complements the supervised fine-tuning (SFT) stage, demonstrating synergistic effects.
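
Stage (2) above, turning test outcomes into preference data, can be sketched roughly as follows. The helper names, the dictionary layout, and the use of plain `exec` are assumptions made for illustration. The paper's actual pipeline may differ, and real use would need a sandbox and timeouts around untrusted code.

```python
import itertools


def passes_tests(solution_src, test_src):
    """Run a candidate solution, then its test cases, in a fresh namespace.

    WARNING: exec runs arbitrary code; this toy version has no sandboxing.
    """
    env = {}
    try:
        exec(solution_src, env)
        exec(test_src, env)  # tests are plain assert statements
    except Exception:
        return False
    return True


def build_preference_pairs(instruction, candidates, test_src):
    """Pair every passing candidate (chosen) with every failing one (rejected)."""
    results = [(c, passes_tests(c, test_src)) for c in candidates]
    passed = [c for c, ok in results if ok]
    failed = [c for c, ok in results if not ok]
    return [
        {"prompt": instruction, "chosen": good, "rejected": bad}
        for good, bad in itertools.product(passed, failed)
    ]
```

The resulting list of prompt, chosen, and rejected triples is the usual input shape for preference learning algorithms. Instructions whose candidates all pass or all fail yield no pairs.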

Updated: 2024-06-11 02:07:18

Categories: cs.CL,cs.AI,cs.LG,cs.PL,cs.SE

Download: http://arxiv.org/abs/2406.06887v1

Integrating Marketing Channels into Quantile Transformation and Bayesian Optimization of Ensemble Kernels for Sales Prediction with Gaussian Process Models

This study introduces an innovative Gaussian Process (GP) model utilizing an ensemble kernel that integrates Radial Basis Function (RBF), Rational Quadratic, and Matérn kernels for product sales forecasting. By applying Bayesian optimization, we efficiently find the optimal weights for each kernel, enhancing the model's ability to handle complex sales data patterns. Our approach significantly outperforms traditional GP models, achieving a notable 98% accuracy and superior performance across key metrics including Mean Squared Error (MSE), Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and Coefficient of Determination ($R^2$). This advancement underscores the effectiveness of ensemble kernels and Bayesian optimization in improving predictive accuracy, offering profound implications for machine learning applications in sales forecasting.
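
A weighted ensemble of the three kernels named above can be sketched as below. This is a minimal illustration, assuming a shared lengthscale, a Matérn kernel with smoothness 5/2, and weights that are simply normalized to sum to one. In the study the weights are found by Bayesian optimization, which is not reproduced here; every function name and default value is hypothetical.

```python
import numpy as np


def _dist(A, B):
    """Pairwise Euclidean distances; A and B have shape (n_samples, n_features)."""
    A, B = np.asarray(A, dtype=float), np.asarray(B, dtype=float)
    d2 = np.sum(A ** 2, 1)[:, None] + np.sum(B ** 2, 1)[None, :] - 2.0 * A @ B.T
    return np.sqrt(np.maximum(d2, 0.0))


def rbf(r, ls):
    return np.exp(-0.5 * (r / ls) ** 2)


def rational_quadratic(r, ls, alpha=1.0):
    return (1.0 + (r / ls) ** 2 / (2.0 * alpha)) ** (-alpha)


def matern52(r, ls):
    s = np.sqrt(5.0) * r / ls
    return (1.0 + s + s ** 2 / 3.0) * np.exp(-s)


def ensemble_kernel(A, B, weights, ls=1.0):
    """Convex combination of RBF, Rational Quadratic, and Matern-5/2 kernels."""
    w = np.asarray(weights, dtype=float)
    if w.shape != (3,) or np.any(w < 0) or w.sum() <= 0:
        raise ValueError("weights must be three non-negative numbers")
    w = w / w.sum()
    r = _dist(A, B)
    return w[0] * rbf(r, ls) + w[1] * rational_quadratic(r, ls) + w[2] * matern52(r, ls)


def gp_posterior_mean(X_train, y_train, X_test, weights, noise=1e-4):
    """GP regression mean under the ensemble kernel with Gaussian noise."""
    K = ensemble_kernel(X_train, X_train, weights) + noise * np.eye(len(X_train))
    K_star = ensemble_kernel(X_test, X_train, weights)
    return K_star @ np.linalg.solve(K, y_train)
```

Because a non-negative combination of valid kernels is itself a valid kernel, the weights can be tuned freely by an outer search such as Bayesian optimization against a validation metric.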

Updated: 2024-06-11 01:59:25

Categories: cs.LG,stat.AP

Download: http://arxiv.org/abs/2404.09386v2

QMGeo: Differentially Private Federated Learning via Stochastic Quantization with Mixed Truncated Geometric Distribution

Federated learning (FL) is a framework that allows multiple users to jointly train a global machine learning (ML) model by transmitting only model updates under the coordination of a parameter server, while keeping their datasets local. One key motivation of such distributed frameworks is to provide privacy guarantees to the users. However, keeping the users' datasets local has been shown to be insufficient for privacy. Several differential privacy (DP) mechanisms have been proposed to provide provable privacy guarantees by introducing randomness into the framework, and the majority of these mechanisms rely on injecting additive noise. FL frameworks also face the challenge of communication efficiency, especially as machine learning models grow in complexity and size. Quantization is a commonly utilized method, reducing the communication cost by transmitting a compressed representation of the underlying information. Although there have been several studies on DP and quantization in FL, the potential contribution of the quantization method alone in providing privacy guarantees has not been extensively analyzed yet. In this paper, we present a novel stochastic quantization method, utilizing a mixed geometric distribution to introduce the randomness needed to provide DP, without any additive noise. We provide convergence analysis for our framework and empirically study its performance.
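
For readers unfamiliar with stochastic quantization, the sketch below shows the common unbiased randomized-rounding baseline on a uniform grid. It is not the QMGeo mechanism: the paper draws its randomness from a mixed truncated geometric distribution, whose details are not given in the abstract. The function name and parameters are hypothetical.

```python
import numpy as np


def stochastic_quantize(x, num_levels, lo, hi, rng):
    """Round each entry to one of its two neighbouring grid levels at random.

    The upper level is chosen with probability equal to the fractional
    position between the levels, so the output is unbiased for x in [lo, hi].
    """
    x = np.clip(np.asarray(x, dtype=float), lo, hi)
    step = (hi - lo) / (num_levels - 1)
    pos = (x - lo) / step            # position measured in grid units
    lower = np.floor(pos)
    p_up = pos - lower               # probability of rounding up
    idx = lower + (rng.random(x.shape) < p_up)
    return lo + idx * step
```

Each quantized entry can be sent as an index using about log2(num_levels) bits, which is where the communication saving comes from. The randomness in the rounding is what a privacy analysis would have to reason about.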

Updated: 2024-06-11 01:52:05

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2312.05761v2

HYDRA: Model Factorization Framework for Black-Box LLM Personalization

Personalization has emerged as a critical research area in modern intelligent systems, focusing on mining users' behavioral history and adapting to their preferences for delivering tailored experiences. Despite the remarkable few-shot capabilities exhibited by black-box large language models (LLMs), the inherent opacity of their model parameters presents significant challenges in aligning the generated output with individual expectations. Existing solutions have primarily focused on prompt design to incorporate user-specific profiles and behaviors; however, such approaches often struggle to generalize effectively due to their inability to capture shared knowledge among all users. To address these challenges, we propose HYDRA, a model factorization framework that captures both user-specific behavior patterns from historical data and shared general knowledge among all users to deliver personalized generation. In order to capture user-specific behavior patterns, we first train a reranker to prioritize the most useful information from top-retrieved relevant historical records. By combining the prioritized history with the corresponding query, we train an adapter to align the output with individual user-specific preferences, eliminating the reliance on access to inherent model parameters of black-box LLMs. Both the reranker and the adapter can be decomposed into a base model with multiple user-specific heads, resembling a hydra. The base model maintains shared knowledge across users, while the multiple personal heads capture user-specific preferences. Experimental results demonstrate that HYDRA outperforms existing state-of-the-art prompt-based methods by an average relative improvement of 9.01% across five diverse personalization tasks in the LaMP benchmark. Our implementation is available at https://github.com/night-chen/HYDRA.

Updated: 2024-06-11 01:51:57

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2406.02888v2

Reconstructing the Geometry of Random Geometric Graphs

Random geometric graphs are random graph models defined on metric spaces. Such a model is defined by first sampling points from a metric space and then connecting each pair of sampled points with a probability that depends on their distance, independently among pairs. In this work, we show how to efficiently reconstruct the geometry of the underlying space from the sampled graph under the manifold assumption, i.e., assuming that the underlying space is a low-dimensional manifold and that the connection probability is a strictly decreasing function of the Euclidean distance between the points in a given embedding of the manifold in $\mathbb{R}^N$. Our work complements a large body of work on manifold learning, where the goal is to recover a manifold from points sampled in the manifold along with their (approximate) distances.
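
The generative model described above can be sketched as follows. As an illustrative assumption, the manifold is the unit circle embedded in the plane, and the caller supplies the connection probability as a function of Euclidean distance. The function name and arguments are hypothetical, and only the sampling step is shown, not the reconstruction algorithm.

```python
import numpy as np


def sample_random_geometric_graph(n, connect_prob, seed=0):
    """Sample n points on the unit circle and connect pairs independently.

    connect_prob maps an array of Euclidean distances to probabilities in [0, 1].
    Returns the points (n, 2) and a symmetric boolean adjacency matrix (n, n).
    """
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
    points = np.column_stack([np.cos(theta), np.sin(theta)])
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    coin = rng.random((n, n)) < connect_prob(dist)
    upper = np.triu(coin, k=1)       # one independent coin per unordered pair
    adj = upper | upper.T            # symmetrize, no self-loops
    return points, adj
```

For example, `sample_random_geometric_graph(300, lambda d: np.exp(-3.0 * d))` uses a strictly decreasing connection probability. The reconstruction problem is then to recover the geometry of `points` from `adj` alone.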

Updated: 2024-06-11 01:50:34

Categories: cs.LG,math.PR

Download: http://arxiv.org/abs/2402.09591v2

Online Structured Prediction with Fenchel-Young Losses and Improved Surrogate Regret for Online Multiclass Classification with Logistic Loss

This paper studies online structured prediction with full-information feedback. For online multiclass classification, Van der Hoeven (2020) established finite surrogate regret bounds, which are independent of the time horizon, by introducing an elegant exploit-the-surrogate-gap framework. However, this framework has been limited to multiclass classification primarily because it relies on a classification-specific procedure for converting estimated scores to outputs. We extend the exploit-the-surrogate-gap framework to online structured prediction with Fenchel-Young losses, a large family of surrogate losses that includes the logistic loss for multiclass classification as a special case, obtaining finite surrogate regret bounds in various structured prediction problems. To this end, we propose and analyze randomized decoding, which converts estimated scores to general structured outputs. Moreover, by applying our decoding to online multiclass classification with the logistic loss, we obtain a surrogate regret bound of $O(\| \mathbf{U} \|_\mathrm{F}^2)$, where $\mathbf{U}$ is the best offline linear estimator and $\| \cdot \|_\mathrm{F}$ denotes the Frobenius norm. This bound is tight up to logarithmic factors and improves the previous bound of $O(d\| \mathbf{U} \|_\mathrm{F}^2)$ due to Van der Hoeven (2020) by a factor of $d$, the number of classes.

Updated: 2024-06-11 01:46:51

Categories: cs.LG

Download: http://arxiv.org/abs/2402.08180v2

Pseudo-Entanglement is Necessary for EFI Pairs

Regarding minimal assumptions, most of classical cryptography is known to depend on the existence of One-Way Functions (OWFs). However, recent evidence has shown that this is not the case when considering quantum resources. Besides the well known unconditional security of Quantum Key Distribution, it is now known that computational cryptography may be built on weaker primitives than OWFs, e.g., pseudo-random states [JLS18], one-way state generators [MY23], or EFI pairs of states [BCQ23]. We consider a new quantum resource, pseudo-entanglement, and show that the existence of EFI pairs, one of the current main candidates for the weakest computational assumption for cryptography (necessary for commitments, oblivious transfer, secure multi-party computation, computational zero-knowledge proofs), implies the existence of pseudo-entanglement, as defined by [ABF+24, ABV23] under some reasonable adaptations. We prove this by constructing a new family of pseudo-entangled quantum states given only EFI pairs. Our result has important implications for the field of computational cryptography. It shows that if pseudo-entanglement does not exist, then most of cryptography cannot exist either. Moreover, it establishes pseudo-entanglement as a new minimal assumption for most of computational cryptography, which may pave the way for the unification of other assumptions into a single primitive. Finally, pseudo-entanglement connects physical phenomena and efficient computation, thus, our result strengthens the connection between cryptography and the physical world.

Updated: 2024-06-11 01:44:16

Categories: quant-ph,cs.CR

Download: http://arxiv.org/abs/2406.06881v1

Joint Demonstration and Preference Learning Improves Policy Alignment with Human Feedback

Aligning with human preferences and values is an important requirement for building contemporary foundation models and embodied AI. However, popular approaches such as reinforcement learning from human feedback (RLHF) break down the task into successive stages, such as supervised fine-tuning (SFT), reward modeling (RM), and reinforcement learning (RL), each performing one specific learning task. Such a sequential approach results in serious issues such as significant under-utilization of data and distribution mismatch between the learned reward model and the generated policy, which eventually lead to poor alignment performance. We develop a single-stage approach named Alignment with Integrated Human Feedback (AIHF), capable of integrating both human preference and demonstration data to train reward models and the policy. The proposed approach admits a suite of efficient algorithms, which can easily reduce to, and leverage, popular alignment algorithms such as RLHF and Direct Preference Optimization (DPO), and only requires minor changes to the existing alignment pipelines. We demonstrate the efficiency of the proposed solutions with extensive experiments involving alignment problems in LLMs and robotic control problems in MuJoCo. We observe that the proposed solutions outperform existing alignment algorithms such as RLHF and DPO by large margins, especially when the amount of high-quality preference data is relatively limited.

Updated: 2024-06-11 01:20:53

Categories: cs.AI,cs.HC,cs.RO

Download: http://arxiv.org/abs/2406.06874v1

Visual Prompt Tuning in Null Space for Continual Learning

Existing prompt-tuning methods have demonstrated impressive performance in continual learning (CL) by selecting and updating relevant prompts in vision-transformer models. In contrast, this paper aims to learn each task by tuning the prompts in the direction orthogonal to the subspace spanned by previous tasks' features, so as to ensure no interference with previously learned tasks and thereby overcome catastrophic forgetting in CL. However, unlike orthogonal projection in traditional CNN architectures, prompt gradient orthogonal projection in the ViT architecture poses different and greater challenges: 1) the high-order, non-linear self-attention operation; 2) the drift of the prompt distribution caused by LayerNorm in the transformer block. Theoretically, we deduce two consistency conditions for achieving the prompt gradient orthogonal projection, which provide a theoretical guarantee of eliminating interference with previously learned knowledge via the self-attention mechanism in visual prompt tuning. In practice, we propose an effective null-space-based approximation to implement the prompt gradient orthogonal projection. Extensive experimental results demonstrate the anti-forgetting effectiveness on four class-incremental benchmarks with diverse pre-trained baseline models, and our approach achieves superior performance to state-of-the-art methods. Our code is available at https://github.com/zugexiaodui/VPTinNSforCL.
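
As a rough illustration of the basic idea of updating in a direction orthogonal to the subspace spanned by previous tasks' features, the sketch below projects a gradient vector onto the orthogonal complement of a set of feature vectors using Gram-Schmidt. This is only the generic linear projection, not the paper's consistency conditions for self-attention or its approximation scheme; the function names `orthonormal_basis` and `project_to_null_space` are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def orthonormal_basis(vectors, tol=1e-10):
    """Gram-Schmidt: orthonormal basis for the span of `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            coeff = dot(w, b)
            w = [wi - coeff * bi for wi, bi in zip(w, b)]
        norm = dot(w, w) ** 0.5
        if norm > tol:  # skip vectors already inside the span
            basis.append([wi / norm for wi in w])
    return basis


def project_to_null_space(gradient, previous_features):
    """Remove from `gradient` every component lying in the span of
    `previous_features`, so the update is orthogonal to all of them."""
    g = list(gradient)
    for b in orthonormal_basis(previous_features):
        coeff = dot(g, b)
        g = [gi - coeff * bi for gi, bi in zip(g, b)]
    return g


# Toy example (assumed data): two old-task features in a 3-D space.
old_features = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
projected = project_to_null_space([3.0, 4.0, 5.0], old_features)
```

After projection, the inner product of the update with each stored feature is zero up to rounding, which is the linear analogue of not interfering with previously learned tasks.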

Updated: 2024-06-11 01:15:17

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2406.05658v2

What's in an embedding? Would a rose by any embedding smell as sweet?

Large Language Models (LLMs) are often criticized for lacking true "understanding" and an ability to "reason" with their knowledge, being seen merely as advanced autocomplete systems. We believe that this perspective might be missing an important insight. We suggest that LLMs do develop a kind of empirical "understanding" that is "geometry"-like, which seems quite sufficient for a range of applications in NLP, computer vision, coding assistance, etc. However, this "geometric" understanding, built from incomplete and noisy data, makes them unreliable, difficult to generalize, and lacking in inference capabilities and explanations, similar to the challenges faced by heuristics-based expert systems decades ago. To overcome these limitations, we suggest that LLMs should be integrated with an "algebraic" representation of knowledge that includes symbolic AI elements used in expert systems. This integration aims to create large knowledge models (LKMs) that not only possess "deep" knowledge grounded in first principles, but also have the ability to reason and explain, mimicking human expert capabilities. To harness the full potential of generative AI safely and effectively, a paradigm shift from LLMs to the more comprehensive LKMs is needed.

Updated: 2024-06-11 01:10:40

Categories: cs.AI,cs.CL

Download: http://arxiv.org/abs/2406.06870v1

Are Normalizing Flows the Key to Unlocking the Exponential Mechanism?

The Exponential Mechanism (ExpM), designed for private optimization, has been historically sidelined from use on continuous sample spaces, as it requires sampling from a generally intractable density, and, to a lesser extent, bounding the sensitivity of the objective function. Any differential privacy (DP) mechanism can be instantiated as ExpM, and ExpM poses an elegant solution for private machine learning (ML) that bypasses inherent inefficiencies of DPSGD. This paper seeks to operationalize ExpM for private optimization and ML by using an auxiliary Normalizing Flow (NF), an expressive deep network for density learning, to approximately sample from the ExpM density. The method, ExpM+NF, is an alternative to SGD methods for model training. We prove a sensitivity bound for the $\ell^2$ loss, permitting ExpM use with any sampling method. To test feasibility, we present results on MIMIC-III health data comparing the accuracy and training time of (non-private) SGD, DPSGD, and ExpM+NF training methods. We find that a model sampled from ExpM+NF is nearly as accurate as non-private SGD and more accurate than DPSGD, and that ExpM+NF trains faster than Opacus' DPSGD implementation. Unable to provide a privacy proof for the NF approximation, we present empirical results to investigate privacy, including the LiRA membership inference attack of Carlini et al. and the recent privacy auditing lower bound method of Steinke et al. Our findings suggest ExpM+NF provides more privacy than non-private SGD, but not as much as DPSGD, although many attacks are impotent against any model. Ancillary benefits of this work include pushing the SOTA of privacy and accuracy on MIMIC-III healthcare data, exhibiting the use of ExpM+NF for Bayesian inference, showing the limitations of empirical privacy auditing in practice, and providing several privacy theorems applicable to distribution learning.
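
For readers unfamiliar with the mechanism, here is a minimal sketch of the standard exponential mechanism on a finite candidate set, where sampling is easy: each candidate is drawn with probability proportional to exp(epsilon * utility / (2 * sensitivity)). The continuous case the abstract targets is the hard one, and this sketch does not involve a normalizing flow; the function names and the toy utility table are assumptions made for illustration.

```python
import math
import random


def selection_probabilities(candidates, utility, epsilon, sensitivity):
    """Probability of each candidate under the exponential mechanism."""
    scores = [epsilon * utility(c) / (2.0 * sensitivity) for c in candidates]
    top = max(scores)  # subtract the max before exp() for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def exponential_mechanism(candidates, utility, epsilon, sensitivity, rng=random):
    """Draw one candidate; `rng` may be the random module or a random.Random."""
    probs = selection_probabilities(candidates, utility, epsilon, sensitivity)
    return rng.choices(candidates, weights=probs, k=1)[0]


# Toy example (assumed utilities): higher utility is more likely, never certain.
toy_utility = {"model_a": 1.0, "model_b": 2.0, "model_c": 3.0}
choice = exponential_mechanism(
    list(toy_utility), toy_utility.get, epsilon=1.0, sensitivity=1.0,
    rng=random.Random(0),
)
```

Here `sensitivity` stands for a bound on how much the utility of any candidate can change between neighboring datasets, which is the quantity the abstract's sensitivity bound addresses for the $\ell^2$ loss.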

Updated: 2024-06-11 01:09:47

Categories: stat.ML,cs.AI,cs.CR,cs.LG,math.PR

Download: http://arxiv.org/abs/2311.09200v4

Online Joint Fine-tuning of Multi-Agent Flows

A Flow is a collection of component models ("Agents") that constructs the solution to a complex problem via iterative communication. Flows have emerged as state-of-the-art architectures for code generation, and are the raison d'être for frameworks like Autogen. However, flows are currently constructed via a combination of manual prompt engineering and stagewise supervised learning techniques; the latter is limited to acyclic flows with granular node supervision. In this writeup I describe a procedure for online joint fine-tuning of an entire flow inspired by the Learning to Search framework. The approach leverages simulator access to reduce preferences over entire episodes to preferences over individual node outputs; when the components are language models, the latter is a well-studied problem. The approach is applicable to reward-free settings (e.g., text feedback) if an episode evaluator model is available. I apply the approach to the multi-hop QA dataset Musique, achieving a state-of-the-art result.

Updated: 2024-06-11 01:08:53

Categories: cs.LG

Download: http://arxiv.org/abs/2406.04516v2

Budget-Constrained Tool Learning with Planning

Despite intensive efforts devoted to tool learning, the problem of budget-constrained tool learning, which focuses on resolving user queries within a specific budget constraint, has been widely overlooked. This paper proposes a novel method for budget-constrained tool learning. Our approach involves creating a preferable plan under the budget constraint before utilizing the tools. This plan outlines the feasible tools and the maximum number of times they can be employed, offering a comprehensive overview of the tool learning process for large language models. This allows them to allocate the budget from a broader perspective. To devise the plan without incurring significant extra costs, we suggest initially estimating the usefulness of the candidate tools based on past experience. Subsequently, we employ dynamic programming to formulate the plan. Experimental results demonstrate that our method can be integrated with various tool learning methods, significantly enhancing their effectiveness under strict budget constraints.
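
The planning step can be pictured as a bounded-knapsack style dynamic program. The sketch below is one possible formulation under stated assumptions, not the paper's algorithm: each tool has an integer cost per call and a hypothetical list of estimated gains for its first, second, and later calls, and the plan is the maximum number of calls allotted to each tool within the budget. The tool names, costs, and gains are invented for illustration.

```python
def plan_tool_budget(tools, budget):
    """Return (best_total_gain, {tool_name: max_calls}) within `budget`.

    tools: dict mapping name -> (cost_per_call, [gain of 1st call, 2nd call, ...])
    budget: non-negative integer.
    """
    # best[b] = (gain, plan) achievable with budget b using the tools seen so far
    best = [(0.0, {}) for _ in range(budget + 1)]
    for name, (cost, gains) in tools.items():
        new_best = list(best)
        for b in range(budget + 1):
            total = 0.0
            for calls, gain in enumerate(gains, start=1):
                spent = calls * cost
                if spent > b:
                    break
                total += gain
                prev_gain, prev_plan = best[b - spent]
                if prev_gain + total > new_best[b][0]:
                    new_best[b] = (prev_gain + total, {**prev_plan, name: calls})
        best = new_best
    return best[budget]


# Toy example (assumed estimates, e.g. from past experience with each tool).
tools = {
    "search": (2, [5.0, 3.0, 1.0]),
    "calculator": (1, [2.0, 0.5]),
    "browser": (5, [6.0]),
}
gain, plan = plan_tool_budget(tools, budget=6)
# gain == 10.5 and plan == {"search": 2, "calculator": 2}
```

Because the table is indexed by remaining budget, the cost of building the plan grows with the budget and the number of tools rather than with the number of possible call sequences.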

Updated: 2024-06-11 01:02:19

Categories: cs.AI

Download: http://arxiv.org/abs/2402.15960v2

Eyeballing Combinatorial Problems: A Case Study of Using Multimodal Large Language Models to Solve Traveling Salesman Problems

Multimodal Large Language Models (MLLMs) have demonstrated proficiency in processing diverse modalities, including text, images, and audio. These models leverage extensive pre-existing knowledge, enabling them to address complex problems with minimal to no specific training examples, as evidenced in few-shot and zero-shot in-context learning scenarios. This paper investigates the use of MLLMs' visual capabilities to 'eyeball' solutions for the Traveling Salesman Problem (TSP) by analyzing images of point distributions on a two-dimensional plane. Our experiments aimed to validate the hypothesis that MLLMs can effectively 'eyeball' viable TSP routes. The results from zero-shot, few-shot, self-ensemble, and self-refine zero-shot evaluations show promising outcomes. We anticipate that these findings will inspire further exploration into MLLMs' visual reasoning abilities to tackle other combinatorial problems.

Updated: 2024-06-11 00:41:08

Categories: cs.AI

Download: http://arxiv.org/abs/2406.06865v1

Validating LLM-Generated Programs with Metamorphic Prompt Testing

The latest paradigm shift in software development brings the innovation and automation afforded by Large Language Models (LLMs), exemplified by the Generative Pre-trained Transformer (GPT), which has shown a remarkable capacity to generate code autonomously, significantly reducing the manual effort required for various programming tasks. Although the potential benefits of LLM-generated code are vast, most notably in efficiency and rapid prototyping, complex and multifaceted challenges arise as LLMs become increasingly integrated into the software development lifecycle and hence the supply chain, because the code generated by these models raises profound questions about quality and correctness. Research is required to comprehensively explore these critical concerns surrounding LLM-generated code. In this paper, we propose a novel solution called metamorphic prompt testing to address these challenges. Our intuitive observation is that intrinsic consistency always exists among correct code pieces but may not exist among flawed code pieces, so we can detect flaws in the code by detecting inconsistencies. Therefore, we can vary a given prompt into multiple prompts by paraphrasing and ask the LLM to generate multiple versions of the code, so that we can validate through cross-validation whether the semantic relations still hold in the acquired code. Our evaluation on HumanEval shows that metamorphic prompt testing is able to detect 75 percent of the erroneous programs generated by GPT-4, with a false positive rate of 8.6 percent.
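
The cross-validation idea can be sketched as follows, with ordinary Python callables standing in for programs an LLM would generate from paraphrased prompts. The helper names, the shared test inputs, and the majority-style threshold `min_agreement` are assumptions for illustration; the paper's actual decision rule may differ.

```python
def outputs_agree(f, g, test_inputs):
    """True if f and g return equal outputs on every input.
    Any exception raised by either program counts as a disagreement."""
    for x in test_inputs:
        try:
            if f(x) != g(x):
                return False
        except Exception:
            return False
    return True


def looks_flawed(target, paraphrase_variants, test_inputs, min_agreement=0.5):
    """Flag `target` when it agrees with too few of the variants."""
    agreeing = sum(
        outputs_agree(target, variant, test_inputs)
        for variant in paraphrase_variants
    )
    return agreeing / len(paraphrase_variants) < min_agreement


# Stand-ins for code generated from paraphrases of "return the absolute value".
variants = [
    abs,
    lambda x: max(x, -x),
    lambda x: x,  # a flawed variant: wrong for negative inputs
]
inputs = [-2, 0, 3]

correct_target = lambda x: x if x > 0 else -x
buggy_target = lambda x: x

flag_correct = looks_flawed(correct_target, variants, inputs)  # False
flag_buggy = looks_flawed(buggy_target, variants, inputs)      # True
```

The correct target agrees with two of three variants and passes, while the buggy target agrees only with the equally buggy variant and is flagged. This also shows why false positives and misses are possible: the check is only as reliable as the consistency among the variants.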

Updated: 2024-06-11 00:40:17

Categories: cs.SE,cs.AI

Download: http://arxiv.org/abs/2406.06864v1

Ollabench: Evaluating LLMs' Reasoning for Human-centric Interdependent Cybersecurity

Large Language Models (LLMs) have the potential to enhance Agent-Based Modeling by better representing complex interdependent cybersecurity systems, improving cybersecurity threat modeling and risk management. However, evaluating LLMs in this context is crucial for legal compliance and effective application development. Existing LLM evaluation frameworks often overlook the human factor and cognitive computing capabilities essential for interdependent cybersecurity. To address this gap, I propose OllaBench, a novel evaluation framework that assesses LLMs' accuracy, wastefulness, and consistency in answering scenario-based information security compliance and non-compliance questions. OllaBench is built on a foundation of 24 cognitive behavioral theories and empirical evidence from 38 peer-reviewed papers. OllaBench was used to evaluate 21 LLMs, including both open-weight and commercial models from OpenAI, Anthropic, Google, Microsoft, Meta, and others. The results reveal that while commercial LLMs have the highest overall accuracy scores, there is significant room for improvement. Smaller low-resolution open-weight LLMs are not far behind in performance, and there are significant differences in token efficiency and consistency among the evaluated models. OllaBench provides a user-friendly interface and supports a wide range of LLM platforms, making it a valuable tool for researchers and solution developers in the field of human-centric interdependent cybersecurity and beyond.

Updated: 2024-06-11 00:35:39

Categories: cs.CR,cs.AI,cs.HC,I.2.0; J.4

Download: http://arxiv.org/abs/2406.06863v1

FLUX: Fast Software-based Communication Overlap On GPUs Through Kernel Fusion

Large deep learning models have demonstrated a strong ability to solve many tasks across a wide range of applications. These large models typically require training and inference to be distributed. Tensor parallelism is a common technique that partitions the computation of an operation or layer across devices to overcome the memory capacity limitation of a single processor and/or to accelerate computation to meet a certain latency requirement. However, this kind of parallelism introduces additional communication that might contribute a significant portion of the overall runtime. This limits the scalability of the technique within a group of devices with high-speed interconnects, such as GPUs with NVLinks in a node. This paper proposes a novel method, Flux, to significantly hide communication latencies with dependent computations for GPUs. Flux over-decomposes communication and computation operations into much finer-grained operations and further fuses them into a larger kernel to effectively hide communication without compromising kernel efficiency. Flux can potentially overlap up to 96% of communication given a fused kernel. Overall, it can achieve up to 1.24x speedups for training over Megatron-LM on a cluster of 128 GPUs with various GPU generations and interconnects, and up to 1.66x and 1.30x speedups for prefill and decoding inference over vLLM on a cluster with 8 GPUs with various GPU generations and interconnects.

Updated: 2024-06-11 00:17:39

Categories: cs.LG,cs.DC

Download: http://arxiv.org/abs/2406.06858v1

Sample Complexity Reduction via Policy Difference Estimation in Tabular Reinforcement Learning

In this paper, we study the non-asymptotic sample complexity for the pure exploration problem in contextual bandits and tabular reinforcement learning (RL): identifying an epsilon-optimal policy from a set of policies with high probability. Existing work in bandits has shown that it is possible to identify the best policy by estimating only the difference between the behaviors of individual policies, which can be substantially cheaper than estimating the behavior of each policy directly. However, the best-known complexities in RL fail to take advantage of this and instead estimate the behavior of each policy directly. Does it suffice to estimate only the differences in the behaviors of policies in RL? We answer this question positively for contextual bandits but in the negative for tabular RL, showing a separation between contextual bandits and RL. However, inspired by this, we show that it almost suffices to estimate only the differences in RL: if we can estimate the behavior of a single reference policy, it suffices to only estimate how any other policy deviates from this reference policy. We develop an algorithm which instantiates this principle and obtains, to the best of our knowledge, the tightest known bound on the sample complexity of tabular RL.

Updated: 2024-06-11 00:02:19

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2406.06856v1

Design and Scheduling of an AI-based Queueing System

To leverage prediction models to make optimal scheduling decisions in service systems, we must understand how predictive errors impact congestion due to externalities on the delay of other jobs. Motivated by applications where prediction models interact with human servers (e.g., content moderation), we consider a large queueing system comprising many single-server queues where the class of a job is estimated using a prediction model. By characterizing the impact of mispredictions on congestion cost in heavy traffic, we design an index-based policy that incorporates the predicted class information in a near-optimal manner. Our theoretical results guide the design of predictive models by providing a simple model selection procedure with downstream queueing performance as a central concern, and offer novel insights on how to design queueing systems with AI-based triage. We illustrate our framework on a content moderation task based on real online comments, where we construct toxicity classifiers by finetuning large language models.

Updated: 2024-06-11 00:01:42

Categories: math.OC,cs.LG,cs.NI

Download: http://arxiv.org/abs/2406.06855v1